Machine learning development is not data science. Data science ends with a notebook; ML development ends with a running system that other humans trust. The difference is enormous, and it is where most ML investment is lost.
This is the eight-step machine learning development process our senior engineers use. It is designed to fail fast in the early steps and move carefully in the late ones, because that is where the cost curve flips.
1. Problem framing — before any data work
The first and highest-leverage step in machine learning development is deciding what you are predicting and why. Wrong framing wastes months. Good framing is falsifiable: you can state the business metric, the minimum acceptable accuracy, and the cost of a false positive and a false negative.
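A falsifiable framing can be written down before any data work. A minimal sketch, assuming a fraud-detection framing with illustrative field names and cost figures (nothing here is a standard schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProblemFraming:
    business_metric: str            # the KPI the model must move
    min_acceptable_accuracy: float  # below this, the project fails
    cost_false_positive: float      # cost of one FP, in KPI currency
    cost_false_negative: float      # cost of one FN, in KPI currency

    def expected_error_cost(self, fp: int, fn: int) -> float:
        """Cost of a given error profile -- this is what makes the framing testable."""
        return fp * self.cost_false_positive + fn * self.cost_false_negative

framing = ProblemFraming(
    business_metric="fraud losses prevented (GBP/month)",
    min_acceptable_accuracy=0.92,
    cost_false_positive=5.0,    # hypothetical: analyst review time
    cost_false_negative=250.0,  # hypothetical: average fraud loss
)
print(framing.expected_error_cost(fp=100, fn=10))  # 100*5 + 10*250 = 3000.0
```

Writing the costs down forces the asymmetry into the open: here a missed fraud case is fifty times worse than a spurious flag, which should shape every later choice of metric and threshold.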
2. Establish unglamorous baselines
Before training any model, build an unglamorous baseline: a hand-written decision rule or a linear model. Most of the time it gets you 70% of the value in a week. If it does not, your data or framing is off, and no deep network will save you.
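A rules-based baseline can be a single function scored against labels. A toy sketch, assuming a hypothetical churn problem with illustrative field names:

```python
def rule_baseline(customer: dict) -> bool:
    """Predict churn if the customer is inactive and on a monthly plan."""
    return customer["days_inactive"] > 30 and customer["plan"] == "monthly"

# Tiny hand-labelled sample; real baselines are scored on a held-out set.
customers = [
    {"days_inactive": 45, "plan": "monthly", "churned": True},
    {"days_inactive": 5,  "plan": "annual",  "churned": False},
    {"days_inactive": 60, "plan": "annual",  "churned": False},
    {"days_inactive": 40, "plan": "monthly", "churned": False},
]

correct = sum(rule_baseline(c) == c["churned"] for c in customers)
accuracy = correct / len(customers)
print(f"baseline accuracy: {accuracy:.2f}")  # 0.75
```

The point is not the rule itself but the number it produces: every model you train afterwards must beat it, and by enough to pay for the added complexity.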
3. Data — the step that actually consumes the budget
- Data availability — do you already own enough signal?
- Data quality — labels, drift, missingness, leakage.
- Data governance — PII, consent, retention, and who can touch what.
- Data infrastructure — feature stores, versioning, reproducibility.
On real machine learning development engagements, we spend 40–60% of hours on data work. Teams that assume otherwise overrun schedule and underdeliver on accuracy.
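Two of the checks above, missingness and leakage, are cheap to automate on day one. A minimal sketch with illustrative field names and plain dicts standing in for real tables:

```python
def missingness_report(rows: list[dict], fields: list[str]) -> dict[str, float]:
    """Fraction of rows where each field is None or absent."""
    n = len(rows)
    return {f: sum(r.get(f) is None for r in rows) / n for f in fields}

def leaked_ids(train_ids: set[str], test_ids: set[str]) -> set[str]:
    """Rows appearing in both splits -- the most common form of leakage."""
    return train_ids & test_ids

rows = [{"age": 34, "income": None}, {"age": None, "income": 52000}]
print(missingness_report(rows, ["age", "income"]))  # {'age': 0.5, 'income': 0.5}
print(leaked_ids({"a", "b", "c"}, {"c", "d"}))      # {'c'}
```

Checks like these belong in the pipeline, not in a notebook: run them on every data refresh so that a quality regression fails loudly instead of surfacing as a mysterious accuracy drop.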
4. Modelling — iterate cheaply, commit carefully
Start cheap: scikit-learn, gradient-boosted trees, small fine-tunes. Only move up the complexity curve when the cheap option has plateaued and the business value justifies the operational cost.
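The "only move up when the cheap option has plateaued" rule can itself be made mechanical. A hedged sketch, with an illustrative window size and gain threshold that any real team would tune:

```python
def has_plateaued(val_scores: list[float], window: int = 3,
                  min_gain: float = 0.005) -> bool:
    """True if the best recent validation score barely beats the best earlier one.

    val_scores: validation metric per iteration, higher is better.
    """
    if len(val_scores) <= window:
        return False  # not enough history to judge
    recent_best = max(val_scores[-window:])
    earlier_best = max(val_scores[:-window])
    return recent_best - earlier_best < min_gain

scores = [0.71, 0.78, 0.81, 0.812, 0.813, 0.811]
print(has_plateaued(scores))  # True: the last three runs gained < 0.005
```

An explicit gate like this keeps the escalation decision honest: you move to a heavier model because the numbers say the cheap one is done, not because the heavier model is more interesting to build.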
5. Evaluation that matches the business
Your evaluation metric should mirror how a human would judge the system in production, not just what is easy to compute. Ranking quality, calibration, top-k precision, and cost-weighted error matter far more than raw accuracy.
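Cost-weighted error is the simplest of these to implement: score mistakes by their business cost instead of counting them equally. A sketch using the same hypothetical asymmetric costs as the fraud example, where a false negative is far more expensive than a false positive:

```python
def cost_weighted_error(y_true: list[int], y_pred: list[int],
                        fp_cost: float, fn_cost: float) -> float:
    """Total business cost of the errors in a batch of binary predictions."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return fp * fp_cost + fn * fn_cost

y_true = [1, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1]
# One FP and one FN; with asymmetric costs the single FN dominates.
print(cost_weighted_error(y_true, y_pred, fp_cost=5.0, fn_cost=250.0))  # 255.0
```

Two models with identical accuracy can differ tenfold on this number, which is exactly why raw accuracy misleads.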
6. Deployment and serving
- Batch vs online vs streaming — pick the cheapest that meets SLAs.
- Versioned artefacts, versioned features, versioned prompts.
- Canary releases and automatic rollback on eval regressions.
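The canary gate in the last bullet reduces to a small, testable decision function. A minimal sketch, assuming a higher-is-better metric such as AUC and an illustrative regression tolerance:

```python
def should_rollback(baseline_metric: float, canary_metric: float,
                    max_regression: float = 0.02) -> bool:
    """Roll back if the canary's eval metric regresses beyond tolerance.

    Assumes higher is better; the 0.02 tolerance is illustrative.
    """
    return (baseline_metric - canary_metric) > max_regression

print(should_rollback(0.91, 0.905))  # False: within tolerance
print(should_rollback(0.91, 0.87))   # True: regression exceeds tolerance
```

The value is that the rollback criterion is versioned code, reviewed like any other change, rather than a judgement call made at 2 a.m. during an incident.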
7. Monitoring — where most ML projects silently die
Every production ML system should monitor prediction distributions, input distributions, latency, error budgets, and business KPIs side by side. Drift alerts that only fire when accuracy drops are too late.
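Monitoring prediction distributions, rather than waiting for accuracy, is what makes drift alerts early. One common signal is the Population Stability Index over binned prediction histograms; the bins and the 0.2 threshold below are conventional but illustrative:

```python
import math

def psi(expected: list[float], actual: list[float], eps: float = 1e-6) -> float:
    """Population Stability Index over matching histogram bins.

    expected/actual: normalised bin frequencies (each list sums to 1).
    Values above ~0.2 are commonly treated as significant drift.
    """
    score = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # guard empty bins
        score += (a - e) * math.log(a / e)
    return score

train_dist = [0.25, 0.25, 0.25, 0.25]  # prediction histogram at training time
live_dist  = [0.10, 0.20, 0.30, 0.40]  # prediction histogram in production
print(round(psi(train_dist, live_dist), 3))  # 0.228: above the 0.2 threshold
```

Note that PSI needs only predictions and inputs, no labels, so it fires in the window between drift starting and accuracy visibly dropping.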
8. The feedback loop
Close the loop: labelled outcomes flow back into training data. Without a feedback loop, your model is frozen while the world moves. With one, it compounds.
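At its core the loop is a merge: labelled production outcomes join the training set on a schedule, deduplicated against what the model has already seen. A toy sketch with illustrative record shapes:

```python
def close_the_loop(training_data: list[dict], outcomes: list[dict]) -> list[dict]:
    """Append newly labelled production outcomes to the training set, keyed by id."""
    seen = {row["id"] for row in training_data}
    fresh = [o for o in outcomes if o["id"] not in seen]
    return training_data + fresh

train = [{"id": "a", "x": 1.0, "y": 0}]
outcomes = [{"id": "a", "x": 1.0, "y": 0},   # already in the training set
            {"id": "b", "x": 2.0, "y": 1}]   # new labelled outcome
merged = close_the_loop(train, outcomes)
print(len(merged))  # 2: one new labelled example joins the training set
```

In practice this runs as a scheduled pipeline with versioned outputs, so every retrain can state exactly which slice of production feedback it learned from.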