Data

Correction: More Gloom than Doom

I did some more research and came up with a much more complicated but focused formula. Instead of using 18-week cycles, which leave plenty of room for missed variables, I decided to view the problem week by week.

The good news: the Egg layer population isn’t crashing out.

The bad news: eggs previously available for retail are being used to replenish egg layer populations.

TLDR, what’s the situation today?

Assuming most growers are attempting to replace all bird losses and increase their population to the three-year median level, my model estimates the Breeder Stock population in the US is sitting around 3,774,627, a decrease of 5,373 from 3,780,000 in Jan 2024.


The same model suggests the Table Egg Layer population is currently at 302,954,162, a decrease of 8,045,838 from 311,000,000 in Jan 2024.


Taking a look at last week: 7,505,025 (39.22%) of Breeder Stock eggs were used to replenish lost Breeders and Table Egg Layers instead of going to other uses such as Retail and Live sales.


Table Egg Layers laid 1,846,019,083 Eggs: 5.12% short of what they would have produced with no Avian Flu deaths, and just 96.09% of the output a median bird population would have maintained.


A total of 2,383,952 Table Eggs were lost this week, bringing the 2025 total to 22,501,221.

Terms

  1. Table Egg Layers are birds that lay Eggs meant for retail/further processing.

  2. Breeder Stock Hens are birds bred to produce fertilized eggs that hatch into Table Egg Layers. If their numbers decline, so does the future population of Table Egg Layers and therefore the inventory of Retail Eggs.

  3. Pullets are Chicks that haven’t matured into Table Egg Layers yet.

Limited Data

The USDA does not currently report Avian Flu cases for Breeder Stock Hens. However, Breeder Stock Hens make up 1.21% of the birds involved in Table Egg laying, so we will estimate their exposure by applying that percentage to the Avian Flu losses reported for Table Egg Layers.
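As a rough sketch of that allocation (the function and variable names here are mine, not from any USDA source):

```python
# Estimate Breeder Stock Avian Flu losses from the reported Table Egg Layer
# figures, assuming Breeders are hit in proportion to their 1.21% share.
BREEDER_SHARE = 0.0121

def estimated_breeder_flu_losses(reported_table_layer_losses: float) -> float:
    return reported_table_layer_losses * BREEDER_SHARE
```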

Assumptions, Averages and Medians

  1. For 2022-2024 the Median number of Table Egg Layers was 315,963,300 with Breeding Stock at 3,840,480. They make up 98.79% and 1.21% of the Egg laying population respectively. These are the population numbers we will start at and attempt to maintain.

  2. Per the USDA, for every 100 Egg Layers there is an average of 80 Eggs laid daily.

  3. Let’s assume there is a 60% replacement rate each year due to old age, declining productivity and other natural causes (not Avian Flu). Meaning, 60% of the year's starting population will need to be replaced by year end.

  4. Assume an 85% Hatchability rate for all Eggs.

  5. 50% chance of either Male or Female Chick for all Eggs.

  6. Chick survival rate of 90%.

  7. Pullets/Chicks take 18 weeks (126 Days) to Mature into Egg laying Hens.
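Collected as constants, the assumptions above look something like this (my own variable names, reused in the sketches that follow):

```python
# Model constants taken from the assumptions above.
MEDIAN_TABLE_LAYERS = 315_963_300    # 2022-2024 median Table Egg Layer population
MEDIAN_BREEDER_STOCK = 3_840_480     # 2022-2024 median Breeder Stock population
EGGS_PER_100_LAYERS_DAILY = 80       # USDA: 80 Eggs laid daily per 100 layers
ANNUAL_REPLACEMENT_RATE = 0.60       # 60% of the starting population replaced per year
HATCHABILITY = 0.85                  # 85% of Eggs hatch
FEMALE_CHICK_RATE = 0.50             # 50/50 Male/Female split
CHICK_SURVIVAL = 0.90                # 90% of Chicks survive
PULLET_MATURITY_DAYS = 126           # 18 weeks to mature into a laying Hen
```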

Formulas

Without getting too complicated:


We have two running populations: Breeder Stock and Table Layers. Each day we subtract their respective share of the Avian Flu deaths and the annual 60% replacements. [(Population * 60%)/365] We then add the matured Pullet population minus any Pullet Avian Flu deaths.
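As a minimal sketch of that daily step, reusing the constants above (the function and argument names are mine):

```python
def daily_population_update(population: float,
                            flu_deaths_today: float,
                            matured_pullets_today: float,
                            pullet_flu_deaths_today: float) -> float:
    """One day of a running population: subtract Avian Flu deaths and the daily
    slice of the 60% annual replacement, then add the Pullets that matured today
    (less any Pullets lost to Avian Flu along the way)."""
    daily_replacements = (population * ANNUAL_REPLACEMENT_RATE) / 365
    population -= flu_deaths_today + daily_replacements
    population += matured_pullets_today - pullet_flu_deaths_today
    return population
```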


Breeder Stock Egg production is determined by a quick formula: 80 Eggs for every 100 layers, a 93% chance of Fertilization, and 85% Hatchability. [(((Population / 100) * 80) * 93%) * 85%]
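In code, that works out to something like this (the Fertilization rate is hard-coded here since it is not one of the constants listed above):

```python
FERTILIZATION_RATE = 0.93

def breeder_hatching_eggs(breeder_population: float) -> float:
    """Daily Breeder Stock Eggs expected to hatch:
    80 Eggs per 100 Hens, 93% Fertilization, 85% Hatchability."""
    return ((breeder_population / 100) * EGGS_PER_100_LAYERS_DAILY
            * FERTILIZATION_RATE * HATCHABILITY)
```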


Table Egg Layer Egg production is determined slightly differently. 80 Eggs for every 100 layers and a 95% chance the Egg is usable. This allows 5% uncertainty for accidents/issues.
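And the Table Egg side, the same way, reusing the constants above:

```python
def table_egg_production(table_layer_population: float) -> float:
    """Daily usable Table Eggs: 80 Eggs per 100 layers, minus the 5% set aside
    for accidents and other issues."""
    return (table_layer_population / 100) * EGGS_PER_100_LAYERS_DAILY * 0.95
```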


Eggs to Pullets are estimated by adding up several variables: birds needing replacement, Avian Flu deaths, and the refill back to the median population. These are subtracted from the Breeder Stock Egg production and held onto for 18 weeks to mature.
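A rough sketch of that 18-week holding pattern, assuming the Eggs set aside have already passed the Fertilization and Hatchability steps above (the pipeline structure is my own illustration, reusing the constants from earlier):

```python
from collections import deque

# Eggs set aside today become laying Hens 126 days later. A fixed-length deque
# acts as the 18-week waiting line: push today's batch in, pop the batch that
# matures today.
pullet_pipeline: deque = deque([0.0] * PULLET_MATURITY_DAYS, maxlen=PULLET_MATURITY_DAYS)

def eggs_to_set_aside(replacements_due: float,
                      flu_deaths_today: float,
                      median_shortfall: float) -> float:
    """Hatching Eggs pulled from Breeder Stock production today to cover
    replacements, Avian Flu losses, and the refill back to the median."""
    return replacements_due + flu_deaths_today + median_shortfall

def advance_pipeline(eggs_set_aside_today: float) -> float:
    """Apply the 50% Female and 90% Chick survival rates on the way in, and
    return the batch that matures into Table Egg Layers today."""
    matured_today = pullet_pipeline[0]
    pullet_pipeline.append(eggs_set_aside_today * FEMALE_CHICK_RATE * CHICK_SURVIVAL)
    return matured_today
```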

NLP Model: A problem set one year in

I first had the idea for this particular project about three years ago. I'm now a year into the project and I have learned much. Yet I feel like I know nothing.

What is an NLP?

A machine learning technology that enables computers to understand, process, and manipulate human language. NLP is a branch of artificial intelligence, computer science, and linguistics. It uses techniques like machine learning, neural networks, and text mining to interpret language, translate between languages, and recognize patterns. NLP is used in many everyday products and services, including search engines, chatbots, voice-activated digital assistants, and translation apps.

Why an NLP?

I had tried several approaches and tested a few ideas. Ultimately I realized that my raw inputs would be too messy for most models. I needed something flexible and capable of comparing words within a sentence. NLP seemed like the best option that required the least physical resources.

What model am I using?

I am currently using a pre-trained DistilBERT model that I am fine-tuning with my custom data.
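For reference, loading a checkpoint like that with the Hugging Face transformers library looks roughly like this; the checkpoint name and the 360-class figure are placeholders, not necessarily my exact setup:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# A standard DistilBERT checkpoint set up for multi-class classification.
checkpoint = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=360)
```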

How do I train it?

I will probably get into this with more detail at a later date, mostly because I want to update this part. However, I am converting the training data into a DataFrame with Python and then splitting that data into train and validation sets. But I feel like I can improve this significantly.
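The current split is nothing fancy; something along these lines, with placeholder file and column names:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder file/column names; the real data is custom-scraped.
df = pd.read_csv("training_data.csv")   # columns: "text", "label"

# Stratifying keeps the 360+ classes proportionally represented in both sets,
# though it requires at least two examples per class.
train_df, val_df = train_test_split(
    df, test_size=0.2, random_state=42, stratify=df["label"]
)
```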

Goal

Classify an incredibly large dataset with at least 85% accuracy on an hourly basis without human assistance.

Problems

  1. There are no training datasets for public consumption. I need to create my own.

  2. The dataset needs to be fairly big in order to get the best results.

  3. The custom dataset needs to be classified manually, which takes longer the bigger it is.

  4. There will be over 360 different classes in the dataset. There needs to be a balance.

  5. My processing power is limited to a 6-year-old GTX 1070.

The first couple of problems have already been solved, or are still being solved. I have created 36,000+ lines of training data by scraping data that gets sent directly to me on a daily basis. And since I’m a data hoarder, I still have three years’ worth of raw data to convert into usable training data.

The third problem is ongoing. It takes a long, long time to classify 36,000+ lines of training data, and my plan for the summer is to have 40,000 lines. My next problem is that while I gain good training data for some classes, I’m still lacking for others, so I have to hunt for training points in lesser-used classes. And they are lesser used for a reason: it takes time to claw together examples of these points, which slows down the overall progress of the project.

CUE THE PONZI SCHEME

This is when I came up with an idea. The NLP model reads the raw data and makes a classification effort for 18 different categories. Each category could relate to anywhere from 1 to 5 words in the raw data. Those words can be spelled or abbreviated in several different ways that all mean the same thing. By swapping the source words with their alternatives, I can inflate the training data. And depending on the number of alternative words in a single source sentence, that sentence can be transformed as many as 15 times. Now the model not only gets reinforcement training, but exposure to all spelling and abbreviation types.
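A simplified sketch of that swap, with a made-up alternatives table standing in for the real one:

```python
import itertools

# Made-up example table; the real one maps each source word to every spelling
# and abbreviation that means the same thing.
ALTERNATIVES = {
    "number": ["no.", "num"],
    "package": ["pkg", "pkge"],
}

def augment(sentence: str) -> list[str]:
    """Produce every variant of a sentence by swapping words for their
    alternative spellings/abbreviations (the original is included)."""
    options = [[word] + ALTERNATIVES.get(word, []) for word in sentence.split()]
    return [" ".join(combo) for combo in itertools.product(*options)]
```

With that table, a sentence containing two words that each have two alternatives comes back as nine variants, which is where the inflation comes from.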

That approach turned my 36,000 lines into 293,000.

NEXT

So now I need to ponder my processing power problem. My GTX 1070 doesn’t do a terrible job. But the bigger the model gets, the longer it takes to train. A few ways I think I can approach this without buying hardware:

  1. Adjust Training Parameters

  2. Play with the padding / truncation (see the sketch after this list)

  3. Clean up the training data

  4. Consolidate the Category with 140+ possible classes

  5. Research
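For the padding/truncation idea, the lever is roughly this; the 128-token cap is a guess, not a tuned value:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Capping max_length keeps every sequence short, which directly cuts the
# per-batch compute on an older card like the GTX 1070.
encoded = tokenizer(
    ["example raw line to classify"],
    padding="max_length",
    truncation=True,
    max_length=128,
    return_tensors="pt",
)
```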