Posts

Showing posts from 2025

Turning a Food Safety Idea Into a Real Prototype: My Data & ML Build Journey

How I taught myself practical machine learning and engineered a working prototype using Python, OCR, and rule-based logic.

Why I started building and not just thinking
After mapping the food safety problem, I reached a point where thinking wasn’t enough. Ideas can sound convincing in words. Diagrams can make them look coherent. But without a working prototype, everything stays hypothetical. I didn’t want this project to live as a concept or a case study. I wanted to know whether it could actually work. Building felt like the only honest way to validate the idea. So I decided to commit to a fixed window and treat it like an engineering challenge, not a side thought. Sixty days. One end-to-end prototype. No shortcuts.

Starting point: theory heavy, practice light
During my Master’s, I had studied machine learning, NLP, and Python. I understood models conceptually. I knew how algorithms worked on paper. I had written isolated scripts and notebooks. But I had never built a complete system wh...

From Natasha’s Law to India’s Food Safety Gap: Why I Started Building a Smart Food Safety System

A data-driven reflection on why consumer food safety needs innovation, and how my journey began.

Introduction: A quiet turning point
While working in India and travelling back roughly once a year, I noticed many differences, and one of the starkest was the neglect of food safety. That’s when I began noticing something small but unsettling. Customers regularly asked questions about food that surprised me. They confused expiry dates. They didn’t understand allergens. Some assumed that if food smelled fine, it was safe. Others thought expiry labels were optional suggestions. There was no malice. No carelessness. Just a complete lack of clarity. What struck me most was the contrast. In the UK, food labelling and allergen transparency are taken seriously. In India, even well-meaning businesses and customers operate on guesswork. That gap stayed with me.

What I learned studying in the UK: Natasha’s Law, simply explained
While studying in the UK, I learned about Natasha’s Law. It exists because a...

Performance Optimisation in Power BI & SQL Pipelines

Introduction
Many analytics systems work well at small scale, then degrade quietly as usage grows. Dashboards take longer to load. Refreshes fail unpredictably. Simple queries become expensive. Analysts respond by adding workarounds rather than fixing root causes. The issue is rarely a single slow query or visual. It’s that performance was never treated as a design concern across the pipeline.

Why optimisation is required
Performance problems are not just technical inconveniences. They lead to:
- reduced trust in analytics
- stakeholders abandoning dashboards
- analysts spending time firefighting instead of improving insight
- hidden infrastructure and opportunity costs
Optimising performance is not about squeezing milliseconds. It’s about designing analytics systems that remain usable, reliable, and scalable over time.

Performance as a pipeline property
At a programme level, performance must be considered end to end. Poor performance can originate from: ineff...

Operationalising Predictive Scores in Decision Workflows

Introduction
Many predictive models never influence real decisions. Scores are generated and stored. Dashboards show rankings. Spreadsheets list “high risk” or “high value” customers. Then… nothing happens! The problem isn’t model accuracy. It’s that predictive scores are rarely embedded into actual decision workflows. Without clear ownership and action paths, models remain analytical artefacts rather than operational tools.

Why this is required
Predictive models are often built with significant effort, but their value is realised only if:
- someone knows when to trust the score
- someone knows how to act on it
- someone is accountable for outcomes
Without operationalisation:
- stakeholders lose confidence in modelling
- analysts spend time defending scores instead of improving them
- models decay quietly without feedback
Operationalising predictive scores turns modelling into a decision system, not a reporting exercise.

Separating prediction from decision ...
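As a minimal sketch of what embedding a score in a decision workflow can look like: the prediction is kept separate from an explicit, owned decision rule. The thresholds, actions, and owner names here are illustrative assumptions, not taken from the post.

```python
# Hypothetical sketch: turning a raw risk score into an owned, reviewable action.
# Thresholds, action names, and owners are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Decision:
    action: str   # what to do with this customer
    owner: str    # who is accountable for the outcome

def decide(score: float) -> Decision:
    """Map a predictive score to an explicit action path.

    Keeping the decision rule outside the model means it can be
    reviewed, audited, and changed without retraining anything.
    """
    if score >= 0.8:
        return Decision("priority retention call", owner="account team")
    if score >= 0.5:
        return Decision("targeted email offer", owner="marketing")
    return Decision("no intervention", owner="analytics (monitor only)")

print(decide(0.9).action)  # priority retention call
print(decide(0.6).owner)   # marketing
```

The point of the sketch is the separation: the score answers "how likely", while the rule answers "so what, and whose job is it".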

Predictive Modelling for Donor and Customer Behaviour

Introduction
Many predictive models claim to forecast customer or donor behaviour, but struggle to influence real decisions. Scores are produced, yet no one knows how to act on them. Models perform well in validation but degrade quietly over time. Predictions explain what might happen, but not why or what to do next. The problem is rarely algorithm choice. It’s that predictive modelling is treated as an isolated exercise rather than a behavioural decision system.

Why prediction is important
Predictive models increasingly influence:
- targeting and prioritisation
- retention strategies
- resource allocation
- long-term engagement planning
When models are poorly designed:
- stakeholders lose trust
- bias and leakage go unnoticed
- models become brittle as behaviour shifts
- analytics teams spend more time defending outputs than improving them
Well-designed behavioural models do the opposite. They create shared understanding, support action, and adapt as pattern...

Predictive Modelling Without Sensitive Attributes or Sensitive Text Signals

Introduction
Predictive models often perform best when given more data. But more data is not always better data. Sensitive attributes such as exact age, location, income, or raw text signals can boost short-term accuracy while quietly increasing privacy risk, bias, and governance complexity. In many cases, these features are included because they are available, not because they are essential. The real challenge is building predictive models that remain accurate, explainable, and defensible without relying on sensitive attributes or raw text.

Why eliminating sensitive attributes is important
Models influence decisions at scale. When sensitive features are used directly:
- models become harder to audit and explain
- bias and proxy discrimination risks increase
- feature access becomes difficult to justify
- model reuse and sharing are restricted
By contrast, privacy-aware predictive modelling:
- reduces ethical and legal risk
- improves long-term maintainability
- encou...

Designing Privacy Aware NLP Pipelines

Introduction
Text data is one of the most privacy-sensitive assets organisations hold. Customer feedback, emails, chat logs, and notes often contain names, locations, contact details, or contextual clues that can identify individuals. Unlike structured data, this information is embedded in free text and is easy to overlook during analysis. As NLP becomes more common in analytics, the risk is not misuse of models, but unintentional exposure of personal data through text pipelines. The challenge is building NLP workflows that extract insight without retaining or amplifying sensitive information.

Why privacy-aware NLP design is required
NLP pipelines often sit outside traditional governance controls. Text is copied into notebooks. Raw comments are shared for validation. Model outputs inadvertently surface personal details. This creates several risks:
- analysts gain access to information they don’t need
- derived datasets become unsafe to share
- downstream users in...
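One common first step in a privacy-aware text pipeline is redacting obvious identifiers before the text is stored or shared. A minimal sketch, with deliberately simplified regex patterns that are illustrative only, not production-grade PII detection:

```python
# Minimal redaction step for early in an NLP pipeline.
# The patterns below are simplified illustrations; real pipelines
# typically combine rules with trained entity recognisers.
import re

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
PHONE = re.compile(r"\+?\d[\d\s-]{8,}\d")

def redact(text: str) -> str:
    """Replace obvious identifiers with placeholder tokens so the
    downstream pipeline never retains the raw values."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

print(redact("Contact Priya at priya.k@example.com or +44 7700 900123."))
# Contact Priya at [EMAIL] or [PHONE].
```

Running redaction once, at ingestion, means notebooks, validation samples, and derived datasets all inherit the safer version by default.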

Feature Engineering Without Exposing PII

Introduction
Feature engineering often pulls analysts closer to sensitive data. Raw emails are used to infer domains. Exact dates of birth are used to calculate age. Free-text fields accidentally leak names or locations. While these features may improve model performance, they also increase privacy risk and complicate governance. In many cases, analysts don’t need direct identifiers at all. The challenge is engineering informative features while deliberately avoiding exposure to PII.

What feature engineering decisions shape
Feature engineering decisions shape both model outcomes and data risk. When PII is used directly:
- access controls become harder to justify
- datasets become risky to share or reuse
- downstream users inherit unnecessary responsibility
- compliance concerns grow over time
Privacy-aware feature engineering allows analysts to:
- preserve analytical value
- reduce exposure by default
- design models that are easier to maintain and audit
Thi...
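The two examples the excerpt mentions, email domains and ages, can both be derived without keeping the identifier itself. A sketch under assumed band edges (the bands and function names are illustrative, not from the post):

```python
# Sketch: derive informative features while discarding the identifiers.
# Band edges and naming are illustrative assumptions.
from datetime import date

def email_domain(email: str) -> str:
    """Keep only the domain, which distinguishes e.g. corporate vs
    free-mail signups without storing the address itself."""
    return email.split("@")[-1].lower()

def age_band(dob: date, today: date) -> str:
    """Replace an exact date of birth with a coarse age band."""
    age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
    if age < 25:
        return "under-25"
    if age < 45:
        return "25-44"
    if age < 65:
        return "45-64"
    return "65+"

print(email_domain("Jane.Doe@Example.com"))          # example.com
print(age_band(date(1990, 6, 1), date(2025, 5, 1)))  # 25-44
```

Once the derived columns exist, the raw email and date-of-birth columns can be dropped, so downstream users never inherit them.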