Turning a Food Safety Idea Into a Real Prototype: My Data & ML Build Journey

How I taught myself practical machine learning and engineered a working prototype using Python, OCR, and rule-based logic

Why I started building and not just thinking

After mapping the food safety problem, I reached a point where thinking wasn’t enough.

Ideas can sound convincing in words. Diagrams can make them look coherent. But without a working prototype, everything stays hypothetical. I didn’t want this project to live as a concept or a case study. I wanted to know whether it could actually work.

Building felt like the only honest way to validate the idea.

So I decided to commit to a fixed window and treat it like an engineering challenge, not an afterthought. Sixty days. One end-to-end prototype. No shortcuts.

Starting point: theory heavy, practice light

During my Master’s, I had studied machine learning, NLP, and Python.

I understood models conceptually. I knew how algorithms worked on paper. I had written isolated scripts and notebooks. But I had never built a complete system where data ingestion, processing, modelling, logic, and outputs had to work together.

This project was my first true end-to-end build.

That gap mattered. It forced me to move beyond "knowing" into debugging, decision making, and trade-offs. It also made me realise how different real-world ML feels compared to coursework.

The self learning plan

To avoid drifting, I broke the build into weekly goals. Each week had a clear outcome, not just topics to read.

Weeks 1–2: Machine learning refresh

I revisited core scikit-learn workflows. Feature engineering. Train/test splits. Model selection. Evaluation metrics. The goal was confidence and speed, not novelty.
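The shape of that refresh can be sketched in a few lines. This is a minimal, self-contained scikit-learn workflow on synthetic data, not anything from the actual project; the features and target are invented purely to show the split-fit-evaluate loop.

```python
# Minimal scikit-learn loop: synthetic data, train/test split, fit, evaluate.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))            # three numeric features
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # simple, nearly separable target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = LogisticRegression().fit(X_train, y_train)
print(f"accuracy: {accuracy_score(y_test, model.predict(X_test)):.2f}")
```

The point of drilling loops like this is exactly what the week was for: making the mechanics fast and automatic so later weeks could focus on the problem, not the API.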

Week 3: ETL pipelines

I focused on how data flows through a system. Raw inputs, cleaning, transformation, validation, and storage. This was where the project stopped feeling like a notebook and started feeling like software.
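A toy version of that flow, using only the standard library, looks like this. The record fields and cleaning steps are hypothetical stand-ins, chosen to show the ingest-clean-validate-store shape rather than the project's real pipeline.

```python
# Toy ETL pass: ingest raw records, transform them, validate, store survivors.
from datetime import date

raw_records = [
    {"product": " Milk ", "expiry": "2025-03-01"},
    {"product": "Bread",  "expiry": "not-a-date"},   # will fail validation
]

def clean(record):
    # Transform: strip whitespace and normalise casing.
    return {
        "product": record["product"].strip().title(),
        "expiry": record["expiry"],
    }

def validate(record):
    # Validation: reject records whose expiry date does not parse.
    try:
        date.fromisoformat(record["expiry"])
        return True
    except ValueError:
        return False

store = [r for r in map(clean, raw_records) if validate(r)]
print(store)  # only the well-formed Milk record survives
```

Separating clean, validate, and store into distinct steps is what makes this feel like software rather than a notebook: each stage can be tested and swapped independently.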

Week 4: OCR and text parsing

This was the most experimental phase. Extracting expiry dates and allergen text from labels meant dealing with noisy, imperfect input. I learned quickly that clean data is a luxury.

Week 5: Freshness scoring logic

I designed a scoring system that reflected gradual risk rather than binary outcomes. This forced me to think carefully about variables, weights, and explainability.

Week 6: Rule engine and system architecture

Finally, I connected everything. Models fed into rules. Rules produced guidance. Components had defined responsibilities. This was the point where the system felt real.

Designing the freshness scoring model

Freshness isn’t a yes-or-no condition. It decays.

I treated freshness as a score influenced by multiple variables:

  • Time since production

  • Time remaining until expiry

  • Storage conditions

  • Product category sensitivity

Instead of jumping straight to a complex model, I started with interpretable approaches. Regression for scoring. Classification for safety thresholds.

Weighting mattered. A dairy product behaves differently from dry goods. Storage violations mattered more than calendar time in some cases.
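One way to sketch that category-dependent weighting is a simple hand-weighted score. The categories, weights, and the 0-100 scale below are invented for illustration; they are not the project's actual values, only the shape of the idea.

```python
# Hand-weighted freshness score: storage violations hit dairy harder
# than dry goods. All weights here are illustrative.

CATEGORY_WEIGHTS = {
    "dairy":     {"time": 0.4, "storage": 0.6},
    "dry_goods": {"time": 0.8, "storage": 0.2},
}

def freshness_score(category, shelf_life_remaining, storage_ok):
    """Both inputs are fractions in [0, 1]; higher is better."""
    w = CATEGORY_WEIGHTS[category]
    score = 100 * (w["time"] * shelf_life_remaining
                   + w["storage"] * storage_ok)
    return round(score, 1)

# Same inputs, different categories: the storage violation matters
# far more for dairy than for dry goods.
print(freshness_score("dairy", 0.5, 0.2))      # 32.0
print(freshness_score("dry_goods", 0.5, 0.2))  # 44.0
```

A linear weighted sum like this is crude, but it is transparent, which is exactly the trade-off the design favoured: every score can be traced back to its inputs.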

To keep the model trustworthy, I focused on explainability. I used feature contribution analysis to understand why a score moved up or down. If I couldn’t explain a prediction, it didn’t belong in a food safety context.
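For a linear model, feature contribution analysis can be as simple as coefficient-times-value: each term shows how much a feature pushed the prediction up or down. The data and feature names below are invented to demonstrate the technique, not taken from the project.

```python
# Per-feature contributions for a linear regression: coef * value.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[10, 0.8], [2, 0.3], [7, 0.9], [1, 0.1]])  # [days_left, storage_ok]
y = np.array([80.0, 25.0, 75.0, 10.0])                   # freshness scores

model = LinearRegression().fit(X, y)

sample = np.array([5, 0.5])
contributions = model.coef_ * sample  # how each feature moved this score
print(dict(zip(["days_left", "storage_ok"], contributions.round(1))))
```

By linearity, the intercept plus these contributions reconstructs the prediction exactly, which is what makes the explanation trustworthy rather than approximate.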

Evaluation wasn’t just about accuracy. It was about whether the output aligned with common sense safety expectations.

Building the rule based safety engine

Machine learning alone isn’t enough for food safety.

Some rules are non-negotiable. A product past its use-by date is unsafe, regardless of what a model predicts. Best-before dates require different handling. Storage temperature violations can override freshness scores.

I implemented a rule engine that:

  • Differentiates use-by and best-before logic

  • Applies hard safety constraints

  • Overrides model outputs when required

  • Produces clear, explainable decisions
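The shape of that layer can be sketched as an ordered set of hard constraints checked before the model's score is trusted. The thresholds, labels, and precedence below are illustrative assumptions, not the project's actual rules.

```python
# Hard-constraint layer: non-negotiable rules that can override the model.
from datetime import date

def safety_decision(model_score, use_by=None, best_before=None,
                    storage_violation=False, today=None):
    today = today or date.today()
    # Hard constraint: past use-by is unsafe, whatever the model says.
    if use_by is not None and today > use_by:
        return ("unsafe", "past use-by date")
    # Storage violations override an otherwise healthy score.
    if storage_violation:
        return ("caution", "storage temperature violation")
    # Best-before is about quality, not safety: flag, don't forbid.
    if best_before is not None and today > best_before:
        return ("check quality", "past best-before date")
    return ("ok", f"model score {model_score}")

print(safety_decision(90, use_by=date(2025, 1, 1), today=date(2025, 2, 1)))
# a high model score still loses to the use-by rule
```

Note the ordering: safety rules fire first and short-circuit, so the model's opinion is only consulted once every hard constraint has passed.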

This layer was critical. It turned predictions into responsibility.

It also forced me to research food safety standards more deeply. Rules had to be grounded, not guessed.

OCR and label understanding

For the system to work in practice, it needed to read real labels.

I used OCR libraries to extract text from food packaging. The output was messy. Dates appeared in multiple formats. Allergen information was inconsistent. Noise was unavoidable.

So I built cleaning pipelines. Regex parsing. Text normalisation. Keyword mapping for allergen detection.
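A stripped-down version of that cleaning step might look like this. The date pattern and allergen keyword map are small illustrative samples; real labels need far more formats and synonyms than shown here.

```python
# Normalise OCR text, extract a date with a regex, map keywords to allergens.
import re

ALLERGEN_KEYWORDS = {"milk": "dairy", "whey": "dairy", "wheat": "gluten"}

def parse_label(raw_text):
    text = " ".join(raw_text.lower().split())          # normalise whitespace
    # Match dates like 01/03/2025 or 01-03-2025 (formats vary by label).
    m = re.search(r"\b(\d{2})[/-](\d{2})[/-](\d{4})\b", text)
    expiry = m.group(0) if m else None
    allergens = sorted({v for k, v in ALLERGEN_KEYWORDS.items() if k in text})
    return {"expiry": expiry, "allergens": allergens}

print(parse_label("USE BY  01/03/2025\nContains: WHEAT flour, whey powder"))
# {'expiry': '01/03/2025', 'allergens': ['dairy', 'gluten']}
```

Returning `None` when no date matches, rather than guessing, is the "reliable enough" posture: downstream rules can treat a missing date as a reason for caution instead of acting on a bad parse.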

Once extracted, label data flowed into the rule engine. OCR didn’t need to be perfect. It needed to be reliable enough to support safety decisions.

That distinction changed how I evaluated success.

System architecture overview

Once everything was connected, the architecture looked like this:

[Architecture diagram: label image → OCR extraction → text parsing and cleaning → freshness scoring model → rule-based safety engine → user guidance]

This diagram represents more than flow. It reflects separation of concerns. Each component can evolve without breaking the system.
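That separation of concerns can be sketched as small stages with single responsibilities, composed into one pipeline. Everything below is a stub with hard-coded returns, meant only to show the shape, not the project's real components.

```python
# Each stage owns one responsibility; the pipeline just composes them.

def ocr_stage(image_bytes):
    # Stand-in for the OCR library call.
    return "USE BY 01/03/2025"

def parse_stage(raw_text):
    # Stand-in for cleaning and field extraction.
    return {"expiry": raw_text.split()[-1]}

def score_stage(fields):
    # Stand-in for the freshness scoring model.
    return 72

def rules_stage(fields, score):
    # Stand-in for the safety rule engine.
    return {"score": score, "decision": "ok", **fields}

def pipeline(image_bytes):
    fields = parse_stage(ocr_stage(image_bytes))
    return rules_stage(fields, score_stage(fields))

print(pipeline(b"fake-image-bytes"))
```

Because each stage only talks to its neighbours through plain data, any one of them can be replaced, say a better OCR engine, without touching the rest.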


Reflection: from analyst to engineer

Before this project, I thought like an analyst.

I focused on insights, metrics, and outcomes. This build forced me to think like an engineer. About failure modes. About responsibility. About how systems behave under imperfect input.

I learned that real-world ML isn’t about clever models. It’s about decisions you’re willing to stand behind.

This prototype didn’t just validate an idea. It changed how I approach problems.

It taught me how to build, not just analyse.
