Turning a Food Safety Idea Into a Real Prototype: My Data & ML Build Journey
How I taught myself practical machine learning and engineered a working prototype using Python, OCR, and rule-based logic
Why I started building and not just thinking
After mapping the food safety problem, I reached a point where thinking wasn’t enough.
Ideas can sound convincing in words. Diagrams can make them look coherent. But without a working prototype, everything stays hypothetical. I didn’t want this project to live as a concept or a case study. I wanted to know whether it could actually work.
Building felt like the only honest way to validate the idea.
So I decided to commit to a fixed window and treat it like an engineering challenge, not a side thought. Sixty days. One end-to-end prototype. No shortcuts.
Starting point: theory heavy, practice light
During my Master’s, I had studied machine learning, NLP, and Python.
I understood models conceptually. I knew how algorithms worked on paper. I had written isolated scripts and notebooks. But I had never built a complete system where data ingestion, processing, modelling, logic, and outputs had to work together.
This project was my first true end-to-end build.
That gap mattered. It forced me to move beyond “knowing” into debugging, decision making, and trade-offs. It also made me realise how different real-world ML feels compared with coursework.
The self learning plan
To avoid drifting, I broke the build into weekly goals. Each week had a clear outcome, not just topics to read.
Weeks 1–2: Machine learning refresh
I revisited core scikit-learn workflows. Feature engineering. Train-test splits. Model selection. Evaluation metrics. The goal was confidence and speed, not novelty.
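That kind of refresh looks roughly like the sketch below: synthetic data, a train-test split, a simple model, and one evaluation metric. The data and model choice here are illustrative, not the project's actual features.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic stand-in data; the real project used engineered food-label features.
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# Hold out a test set so evaluation reflects unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
```

The point of drilling this loop is speed: once split-fit-evaluate is muscle memory, attention shifts to the data and the problem.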
Week 3: ETL pipelines
I focused on how data flows through a system. Raw inputs, cleaning, transformation, validation, and storage. This was where the project stopped feeling like a notebook and started feeling like software.
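Those stages can be sketched as a toy ETL pass in plain Python. The record fields (`product`, `expiry`) and the validation rule are made up for illustration, not the project's actual schema:

```python
from datetime import date

# Extract: raw records as they might arrive, messy and untrusted.
raw_records = [
    {"product": " Milk ", "expiry": "2025-03-01"},
    {"product": "Bread", "expiry": "not-a-date"},  # should fail validation
]

def transform(record):
    # Transform: normalise whitespace and casing.
    return {
        "product": record["product"].strip().lower(),
        "expiry": record["expiry"],
    }

def validate(record):
    # Validate: reject records whose expiry field is not a real date.
    try:
        date.fromisoformat(record["expiry"])
        return True
    except ValueError:
        return False

# Load: only cleaned, valid records reach storage.
store = [r for r in map(transform, raw_records) if validate(r)]
```

Separating the stages like this is what makes the flow feel like software: each step can be tested and replaced on its own.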
Week 4: OCR and text parsing
This was the most experimental phase. Extracting expiry dates and allergen text from labels meant dealing with noisy, imperfect input. I learned quickly that clean data is a luxury.
Week 5: Freshness scoring logic
I designed a scoring system that reflected gradual risk rather than binary outcomes. This forced me to think carefully about variables, weights, and explainability.
Week 6: Rule engine and system architecture
Finally, I connected everything. Models fed into rules. Rules produced guidance. Components had defined responsibilities. This was the point where the system felt real.
Designing the freshness scoring model
Freshness isn’t a yes or no condition. It decays.
I treated freshness as a score influenced by multiple variables:
- Time since production
- Time remaining until expiry
- Storage conditions
- Product category sensitivity
Instead of jumping straight to a complex model, I started with interpretable approaches. Regression for scoring. Classification for safety thresholds.
Weighting mattered. A dairy product behaves differently from dry goods. Storage violations mattered more than calendar time in some cases.
To keep the model trustworthy, I focused on explainability. I used feature contribution analysis to understand why a score moved up or down. If I couldn’t explain a prediction, it didn’t belong in a food safety context.
Evaluation wasn’t just about accuracy. It was about whether the output aligned with common sense safety expectations.
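A minimal sketch of that scoring idea is below. The weights, the category table, and the storage penalty are illustrative assumptions, not the project's calibrated values; the point is that every factor is visible and explainable.

```python
from datetime import date

# Hypothetical sensitivity weights: dairy decays faster than dry goods.
CATEGORY_SENSITIVITY = {"dairy": 1.5, "dry_goods": 0.5}

def freshness_score(produced, expires, today, category, storage_ok=True):
    """Return a 0-100 freshness score; higher means fresher.
    Illustrative formula, not the project's actual weighting."""
    shelf_life = (expires - produced).days
    remaining = (expires - today).days
    if shelf_life <= 0 or remaining <= 0:
        return 0.0  # expired or invalid dates: no freshness left

    # Base score: fraction of shelf life remaining.
    score = 100.0 * remaining / shelf_life

    # Sensitive categories decay faster; insensitive ones more slowly.
    score /= CATEGORY_SENSITIVITY.get(category, 1.0)

    # Storage violations outweigh calendar time: heavy penalty.
    if not storage_ok:
        score *= 0.4

    return round(min(score, 100.0), 1)
```

Because each adjustment is a named, inspectable step, it is always possible to say why a score moved, which matters more here than squeezing out accuracy.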
Building the rule based safety engine
Machine learning alone isn’t enough for food safety.
Some rules are non-negotiable. A product past its use-by date is unsafe, regardless of what a model predicts. Best-before dates require different handling. Storage temperature violations can override freshness scores.
I implemented a rule engine that:
- Differentiates use-by and best-before logic
- Applies hard safety constraints
- Overrides model outputs when required
- Produces clear, explainable decisions
This layer was critical. It turned predictions into responsibility.
It also forced me to research food safety standards more deeply. Rules had to be grounded, not guessed.
OCR and label understanding
For the system to work in practice, it needed to read real labels.
I used OCR libraries to extract text from food packaging. The output was messy. Dates appeared in multiple formats. Allergen information was inconsistent. Noise was unavoidable.
So I built cleaning pipelines. Regex parsing. Text normalisation. Keyword mapping for allergen detection.
Once extracted, label data flowed into the rule engine. OCR didn’t need to be perfect. It needed to be reliable enough to support safety decisions.
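A simplified version of that cleaning layer is sketched below: regex patterns for two common date formats and a keyword map for allergen detection. The patterns and the allergen table are small illustrative samples, not the full set the project handled.

```python
import re
from datetime import date

# Hypothetical keyword map: raw label words -> allergen groups.
ALLERGEN_KEYWORDS = {"milk": "dairy", "whey": "dairy",
                     "wheat": "gluten", "barley": "gluten"}

# Two example date formats; real labels need many more.
DATE_PATTERNS = [
    # DD/MM/YYYY
    (re.compile(r"(\d{2})/(\d{2})/(\d{4})"),
     lambda m: date(int(m[3]), int(m[2]), int(m[1]))),
    # ISO YYYY-MM-DD
    (re.compile(r"(\d{4})-(\d{2})-(\d{2})"),
     lambda m: date(int(m[1]), int(m[2]), int(m[3]))),
]

def parse_expiry(text):
    """Return the first recognisable date in noisy OCR text, else None."""
    for pattern, build in DATE_PATTERNS:
        match = pattern.search(text)
        if match:
            return build(match)
    return None  # fail openly rather than guess a date

def detect_allergens(text):
    """Map label words to allergen groups via keyword lookup."""
    words = re.findall(r"[a-z]+", text.lower())
    return sorted({ALLERGEN_KEYWORDS[w] for w in words if w in ALLERGEN_KEYWORDS})
```

Returning `None` on unreadable dates is the "reliable enough" principle in practice: a missing value can be escalated, while a guessed one quietly corrupts a safety decision.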
That distinction changed how I evaluated success.
System architecture overview
Once everything was connected, the architecture looked like this:

Raw label input → OCR and text parsing → cleaning and validation → freshness scoring model → rule-based safety engine → guidance output

This diagram represents more than flow. It reflects separation of concerns. Each component can evolve without breaking the system.
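That separation of concerns can be shown in miniature: each stage is one callable with one responsibility, and the pipeline merely composes them. The stage functions here are stand-in stubs, not the real components.

```python
def run_pipeline(raw_input, stages):
    """Pass data through each stage in order; stages know nothing
    about each other, so any one can be swapped out independently."""
    data = raw_input
    for stage in stages:
        data = stage(data)
    return data

# Hypothetical stages wired in the order the architecture describes:
stages = [
    lambda raw: raw.strip().lower(),        # OCR text cleanup
    lambda txt: {"label_text": txt},        # parsing into structured fields
    lambda rec: {**rec, "score": 72.0},     # freshness scoring (stubbed)
    lambda rec: {**rec, "decision": "ok"},  # rule engine (stubbed)
]

result = run_pipeline("  USE BY 05/03/2025  ", stages)
```

Because every stage takes the previous stage's output and nothing else, a better OCR library or a retrained scoring model slots in without touching the rest of the system.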
Reflection from analyst to engineer
Before this project, I thought like an analyst.
I focused on insights, metrics, and outcomes. This build forced me to think like an engineer. About failure modes. About responsibility. About how systems behave under imperfect input.
I learned that real-world ML isn’t about clever models. It’s about decisions you’re willing to stand behind.
This prototype didn’t just validate an idea. It changed how I approach problems.
It taught me how to build, not just analyse.