Inside the Smart Food Safety System: Architecture, Data Pipelines, and ML Models Explained

A deep technical walkthrough of the data pipelines, algorithms, and design decisions behind my food safety prototype

Architecture overview

Once the prototype moved beyond experimentation, I needed a structure that could survive real-world input.

Food labels are noisy. OCR is imperfect. Safety decisions cannot rely on a single model prediction. The architecture reflects that reality by separating concerns clearly and defensively.

At a high level, the system flows as follows:

Image / Label Input
→ OCR + Text Parsing
→ ETL + Validation Layer
→ Feature Engineering
→ Freshness ML Model
→ Rule-Based Safety Engine
→ Human-Readable Output
Each layer is designed to fail safely: when it cannot trust its own output, it flags the record instead of passing corrupted data to the next layer.
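To make "fail safely" concrete, here is a minimal sketch, assuming each layer is a plain function that returns an enriched record, or None when it cannot trust its own output. The names `run_pipeline`, `Stage` and the `NEEDS_REVIEW` status are hypothetical illustrations, not the prototype's actual code.

```python
from typing import Callable, Optional

# Each stage receives the current record and returns an enriched copy,
# or None when it cannot produce output it trusts.
Stage = Callable[[dict], Optional[dict]]


def run_pipeline(record: dict, stages: list[tuple[str, Stage]]) -> dict:
    """Run stages in order and stop at the first one that declines or crashes."""
    current = record
    for name, stage in stages:
        try:
            result = stage(current)
        except Exception as exc:  # a crash is treated the same as a refusal
            return {"status": "NEEDS_REVIEW", "failed_stage": name, "error": repr(exc)}
        if result is None:
            return {"status": "NEEDS_REVIEW", "failed_stage": name}
        current = result
    return {"status": "OK", "record": current}
```

The point of the design is that a later stage never sees output an earlier stage refused to vouch for; the flagged result also records which layer gave up, which is what a reviewer needs first.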

Data Engineering layer (ETL, validation, anonymisation)

This layer exists to answer one question:

Can this data be trusted enough to make a safety decision?

ETL ingestion

Raw inputs enter the system either as:

  • OCR-extracted text

  • Structured label metadata (during testing)

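from datetime import datetime, timezone

def ingest_label(raw_text: str, source: str) -> dict:
    # Wrap the raw input with provenance only; nothing is parsed or trusted yet
    return {
        "raw_text": raw_text,
        "source": source,
        "ingested_at": datetime.now(timezone.utc),  # utcnow() is deprecated since Python 3.12
    }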

Nothing downstream assumes correctness.

Validation logic

Before feature engineering, every record is validated.

def validate_label(label: dict) -> bool:
    # A record is usable only if every field a safety decision depends on is present
    required_fields = ["expiry_date", "product_type"]
    for field in required_fields:
        if label.get(field) is None:
            return False
    return True

Ambiguous expiry dates or missing fields are flagged and routed for manual review or conservative scoring.
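"Ambiguous" has a precise meaning for OCR'd dates: a string like 03/04/2025 is a valid date under both day-first and month-first readings. A minimal sketch, assuming the OCR step yields the expiry date as text in one of a few formats (the `FORMATS` list and `parse_expiry` are hypothetical), returns a date only when every matching format agrees, and None otherwise so the record can be routed to review.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical set of formats the OCR step might produce
FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y", "%d %b %Y"]


def parse_expiry(text: str) -> Optional[date]:
    """Return a date only if all matching formats agree; otherwise None (flag it)."""
    candidates = set()
    for fmt in FORMATS:
        try:
            candidates.add(datetime.strptime(text.strip(), fmt).date())
        except ValueError:
            continue
    # No match means unreadable; several different dates means ambiguous
    return candidates.pop() if len(candidates) == 1 else None
```

For example, "25/12/2025" parses cleanly (there is no month 25), while "03/04/2025" returns None because it could be 3 April or 4 March. Refusing to guess is the conservative choice for a safety date.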

Anonymisation (intentional design choice)

The system does not require business identifiers or customer data.

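from uuid import uuid4

def anonymise_record(record: dict) -> dict:
    record = dict(record)  # work on a copy so the caller's data is untouched
    record.pop("restaurant_id", None)  # drop the business identifier if present
    record["record_id"] = uuid4().hex  # random ID, not derived from the source
    return record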

This makes the system:

  • Privacy-preserving by default

  • Easier to deploy across vendors

  • Safer for regulatory environments

This decision was architectural, not cosmetic.

Structured schema

After validation, all data conforms to a fixed schema.

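from datetime import date

LabelSchema = {
    "record_id": str,
    "product_type": str,
    "expiry_date": date,
    "expiry_type": str,  # "use_by" or "best_before"; read by the expiry rule
    "storage_temp": float,
    "allergens": list,
}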

Downstream logic never handles raw text directly.
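A dict of types documents the schema but does not enforce it. One way to enforce it is a frozen dataclass that every validated record must pass through; this is a sketch under that assumption (`Label` and `from_record` are hypothetical names). It includes `expiry_type` because the expiry rule later reads that field.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Label:
    """Typed view of LabelSchema; downstream code receives this, never raw text."""
    record_id: str
    product_type: str
    expiry_date: date
    expiry_type: str  # "use_by" or "best_before", read by the expiry rule
    storage_temp: float
    allergens: tuple[str, ...]  # tuple rather than list keeps the record immutable

    @classmethod
    def from_record(cls, record: dict) -> "Label":
        """Build a Label, failing loudly on the type errors most likely after OCR."""
        if not isinstance(record.get("expiry_date"), date):
            raise TypeError("expiry_date must be a datetime.date")
        temp = record.get("storage_temp")
        if isinstance(temp, bool) or not isinstance(temp, (int, float)):
            raise TypeError("storage_temp must be a number")
        return cls(
            record_id=str(record["record_id"]),
            product_type=str(record["product_type"]),
            expiry_date=record["expiry_date"],
            expiry_type=str(record["expiry_type"]),
            storage_temp=float(temp),
            allergens=tuple(record.get("allergens", [])),
        )
```

A date left as an unparsed string such as "2025-06-10" is rejected here rather than surfacing later as a confusing subtraction error inside feature engineering.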

Machine Learning layer (freshness estimation)

The ML layer estimates gradual freshness risk; it does not make safety decisions.

Feature engineering

Freshness decays non-linearly and differently across food types.

from datetime import date

# IDEAL_TEMP and SENSITIVITY_MAP are per-product lookup tables defined elsewhere
def compute_features(label: dict) -> dict:
    days_to_expiry = (label["expiry_date"] - date.today()).days
    return {
        "days_to_expiry": days_to_expiry,
        "temp_deviation": abs(label["storage_temp"] - IDEAL_TEMP[label["product_type"]]),
        "product_sensitivity": SENSITIVITY_MAP[label["product_type"]],
    }

Features were chosen for explainability, not model cleverness.
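To see the three features on a concrete label, here is a self-contained variant. The lookup values are hypothetical placeholders, not the prototype's tables, and `today` is made injectable (an assumption) so results are reproducible in tests.

```python
from datetime import date
from typing import Optional

# Hypothetical lookup tables for illustration only
IDEAL_TEMP = {"dairy": 4.0, "poultry": 2.0, "bakery": 18.0}     # degrees Celsius
SENSITIVITY_MAP = {"dairy": 0.7, "poultry": 0.9, "bakery": 0.3}  # assumed scale: 0 robust, 1 fragile


def compute_features(label: dict, today: Optional[date] = None) -> dict:
    """Same three features as above, with 'today' injectable for reproducible runs."""
    if today is None:
        today = date.today()
    product = label["product_type"]
    return {
        "days_to_expiry": (label["expiry_date"] - today).days,
        "temp_deviation": abs(label["storage_temp"] - IDEAL_TEMP[product]),
        "product_sensitivity": SENSITIVITY_MAP[product],
    }
```

A dairy item stored at 7.5 °C, three days before expiry, yields `{"days_to_expiry": 3, "temp_deviation": 3.5, "product_sensitivity": 0.7}`. An unknown `product_type` raises `KeyError` instead of silently falling back to a default, so the record is flagged rather than scored on a guess.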

Model choice

I intentionally avoided deep models at this stage.

from sklearn.linear_model import LinearRegression

# X_train: one row of engineered features per label; y_train: freshness scores
model = LinearRegression()
model.fit(X_train, y_train)

Why:

  • Deterministic behaviour

  • Stable predictions near thresholds

  • Easier debugging when things go wrong

Evaluation metrics

A single aggregate metric, whether accuracy or overall error, is not enough here.

I focused on:

  • Error near expiry boundaries

  • Stability across similar products

  • Consistency under small input noise

from sklearn.metrics import mean_absolute_error

y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
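Overall MAE does not capture two of the concerns listed above: error near expiry boundaries and consistency under small input noise. The helpers below are a pure-Python sketch of both checks; the function names, the 0.1 margin, and the noise level are assumptions, and the 0.5/0.8 thresholds are taken from the rule engine's bucketing.

```python
import random
from typing import Callable, Sequence

THRESHOLDS = (0.5, 0.8)  # CAUTION and SAFE cut-offs used by the rule engine


def boundary_mae(y_true: Sequence[float], y_pred: Sequence[float], margin: float = 0.1) -> float:
    """MAE over samples whose true score lies within `margin` of a threshold."""
    errors = [
        abs(t - p)
        for t, p in zip(y_true, y_pred)
        if any(abs(t - th) <= margin for th in THRESHOLDS)
    ]
    return sum(errors) / len(errors) if errors else float("nan")


def bucket_flip_rate(predict: Callable[[list[float]], float], rows: list[list[float]],
                     noise: float = 0.05, trials: int = 20, seed: int = 0) -> float:
    """Fraction of slightly perturbed inputs that land in a different bucket."""
    rng = random.Random(seed)

    def bucket(score: float) -> int:
        return sum(score >= th for th in THRESHOLDS)  # 0, 1 or 2

    flips = 0
    for row in rows:
        base = bucket(predict(row))
        for _ in range(trials):
            noisy = [x + rng.uniform(-noise, noise) for x in row]
            flips += bucket(predict(noisy)) != base
    return flips / (len(rows) * trials)
```

With a fitted scikit-learn model, `predict` could be `lambda row: model.predict([row])[0]`. A high flip rate on items near a threshold is the numerical form of "unstable near thresholds".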

Interpretability

Every freshness score can be decomposed:

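import numpy as np

# Per-feature contributions for one label (same column order as training)
contribution = model.coef_ * np.asarray(feature_vector)
# For a linear model the score decomposes exactly into intercept + contributions
score = model.intercept_ + contribution.sum()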

If I can’t explain why a score dropped, the model isn’t production-worthy.
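Turning a decomposition into a sentence such as the API's "Low freshness score due to temperature deviation" needs one more step: comparing each contribution against a reference item, since a raw `coef * x` term says nothing about whether that feature *lowered* the score. This sketch assumes a baseline such as the training-set feature means; `explain`, `REASONS`, and the baseline choice are hypothetical.

```python
# Hypothetical wording for each feature when it pulls the score down
REASONS = {
    "days_to_expiry": "approaching expiry date",
    "temp_deviation": "temperature deviation",
    "product_sensitivity": "high product sensitivity",
}


def explain(coefs: dict[str, float], intercept: float,
            features: dict[str, float], baseline: dict[str, float]) -> tuple[float, str]:
    """Score one item and name the feature that lowered it most versus a baseline."""
    # The score itself is the linear decomposition: intercept + sum(coef * x)
    score = intercept + sum(coefs[n] * features[n] for n in coefs)
    # Attribution relative to a reference item, e.g. training-set feature means
    deltas = {n: coefs[n] * (features[n] - baseline[n]) for n in coefs}
    worst = min(deltas, key=deltas.get)
    if deltas[worst] >= 0:
        return score, "No feature lowered the score below the baseline"
    return score, f"Low freshness score due to {REASONS[worst]}"
```

For a linear model this attribution is exact: the per-feature deltas sum to the difference between the item's score and the baseline item's score.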

Rule Engine layer (non-negotiable safety logic)

This layer exists because ML cannot be trusted alone in food safety.

Expiry logic

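from datetime import date

def expiry_rule(label: dict):
    # "use_by" is the safety date; a passed "best_before" date is a quality issue
    if label["expiry_type"] == "use_by" and date.today() > label["expiry_date"]:
        return "UNSAFE"
    return None  # no hard rule triggered; later checks decide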

No model can override this.

Threshold mapping

def freshness_bucket(score: float) -> str:
    if score >= 0.8:
        return "SAFE"
    elif score >= 0.5:
        return "CAUTION"
    else:
        return "UNSAFE"

Clear, configurable, explicit.
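The function above hard-codes its cut-offs. A minimal sketch of making them configurable, assuming a small immutable config object (`Thresholds` is a hypothetical name), keeps the same defaults while rejecting configurations where the buckets would overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical config object; defaults match the cut-offs above."""
    safe: float = 0.8
    caution: float = 0.5

    def __post_init__(self):
        # Reject configs where the buckets would overlap or fall outside [0, 1]
        if not 0.0 <= self.caution < self.safe <= 1.0:
            raise ValueError("expected 0 <= caution < safe <= 1")


def freshness_bucket(score: float, t: Thresholds = Thresholds()) -> str:
    if score >= t.safe:
        return "SAFE"
    if score >= t.caution:
        return "CAUTION"
    return "UNSAFE"
```

Because the config is frozen, sharing the default instance is safe, and a stricter profile such as `Thresholds(safe=0.9, caution=0.6)` can be passed per product category without touching the function.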

Allergen override

def allergen_check(allergens: list, user_allergies: list) -> bool:
    # True if any declared allergen matches one of the user's allergies
    return bool(set(allergens) & set(user_allergies))

If triggered, freshness is irrelevant.
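Exact string intersection misses "Milk" on a label versus "milk" in a profile, which is a common OCR and data-entry mismatch. This sketch assumes case and whitespace are the only variations worth normalising (`normalise` and `matched_allergens` are hypothetical names); it also returns *which* allergens matched, so the explanation can name them.

```python
def normalise(term: str) -> str:
    """Lower-case and collapse internal whitespace: ' Soy  Lecithin' -> 'soy lecithin'."""
    return " ".join(term.lower().split())


def matched_allergens(allergens: list[str], user_allergies: list[str]) -> set[str]:
    """Allergens present on the label and in the user's profile, after normalisation."""
    return {normalise(a) for a in allergens} & {normalise(u) for u in user_allergies}


def allergen_check(allergens: list[str], user_allergies: list[str]) -> bool:
    # Same contract as the original check: True means the override fires
    return bool(matched_allergens(allergens, user_allergies))
```

This does not handle plurals or synonyms ("Peanuts" versus "peanut"); that would need a curated mapping, and until one exists a non-match should not be read as proof of absence.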

Final decision engine

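def safety_decision(label: dict, score: float) -> str:
    # 1. Hard expiry rule: nothing overrides a passed use-by date
    if expiry_rule(label) == "UNSAFE":
        return "UNSAFE"
    # 2. Allergen match for the current user (USER_PROFILE holds their allergies)
    if allergen_check(label["allergens"], USER_PROFILE):
        return "ALLERGEN RISK"
    # 3. Only then does the model's freshness score decide
    return freshness_bucket(score)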

This ordering reflects real-world responsibility: hard expiry rules first, individual allergen risk second, and the model's estimate last.

Application layer (API + UI readiness)

The system is built API-first.

POST /evaluate_food_item

Response example:

{
  "status": "CAUTION",
  "reason": "Low freshness score due to temperature deviation",
  "confidence": 0.72
}

The UI never sees raw ML outputs, only decisions and explanations.
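A sketch of the handler body behind `POST /evaluate_food_item` shows how "decisions and explanations only" can be enforced in one place. It assumes the freshness score and its explanation have already been computed upstream; the function signature is hypothetical, and the `confidence` field from the example response is left out because its derivation isn't described here.

```python
from datetime import date
from typing import Optional


def evaluate_food_item(label: dict, score: float, user_allergies: list[str],
                       reason: str, today: Optional[date] = None) -> dict:
    """Hypothetical handler body: returns a status and an explanation, never the raw score."""
    if today is None:
        today = date.today()
    # Hard rule first: a passed use-by date ends the evaluation
    if label["expiry_type"] == "use_by" and today > label["expiry_date"]:
        return {"status": "UNSAFE", "reason": "Past its use-by date"}
    # Allergen override next, naming what matched
    hits = {a.lower() for a in label["allergens"]} & {u.lower() for u in user_allergies}
    if hits:
        return {"status": "ALLERGEN RISK", "reason": "Contains " + ", ".join(sorted(hits))}
    # Only now does the model score matter, and only as a bucket
    status = "SAFE" if score >= 0.8 else "CAUTION" if score >= 0.5 else "UNSAFE"
    return {"status": status, "reason": reason}
```

A past best-before date does not trigger the hard rule here; that item falls through to the freshness bucket, matching the use-by-only rule in the engine.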

Scaling considerations (designed, not promised)

Even as a local prototype, the architecture supports:

  • Batch scoring for retail inventory

  • Cloud containerisation

  • Near-real-time re-evaluation

  • Federated learning without sharing raw data

These paths exist because of early design discipline.
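As an illustration of the batch path, here is a minimal sketch assuming `evaluate` is any single-item decision function (for example one built from `safety_decision`); `score_batch` and the output layout are hypothetical. The key property is the same as in the single-item pipeline: one malformed record is set aside without stopping the whole inventory run.

```python
from typing import Callable, Iterable


def score_batch(records: Iterable[dict], evaluate: Callable[[dict], str]) -> dict[str, list]:
    """Score an inventory batch; a bad record is flagged instead of aborting the run."""
    results: dict[str, list] = {"scored": [], "needs_review": []}
    for record in records:
        try:
            results["scored"].append((record.get("record_id"), evaluate(record)))
        except (KeyError, TypeError, ValueError) as exc:
            # Same fail-safe idea as the single-item path: flag it, keep going
            results["needs_review"].append((record.get("record_id"), repr(exc)))
    return results
```

Catching only the data errors a malformed record is likely to cause, rather than every exception, keeps genuine bugs visible instead of quietly routing them to review.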

Reflection

This project forced me to think beyond models.

I had to consider:

  • What happens when OCR fails

  • How unsafe data propagates

  • Where human override is required

  • How trust is built through explanation

This is no longer an “analytics project.”

It is a system designed around responsibility, uncertainty, and real-world constraints.

And it is still evolving.

