ML Interview Q Series: Is 10,000 delivery records from a Singapore beta test sufficient to build an accurate ETA model?
Short Compact solution
Since “how accurate is accurate enough” is subjective, it is critical to clarify expectations around the ETA model’s required precision. Once you understand the goal, you can start by creating a baseline model trained on the 10,000 beta-test deliveries and measuring its performance using metrics such as RMSE or MAE. This baseline tells you whether 10,000 observations are adequate or if you might need additional data to improve accuracy. If the performance is not sufficient, investigate whether to gather more data, add or refine features (e.g., traffic patterns, supply-demand signals, distance to the restaurant), or simplify the model so it can learn effectively from limited samples. You can then use learning curves to see how additional data influences your metrics. If you discover that model improvements plateau long before you exhaust your data, the issue might lie in feature quality or model complexity. Ultimately, decide whether acquiring more data or engineering new features is cost-effective relative to the business impact of more precise ETAs.
Comprehensive Explanation
Clarifying “good enough” ETA accuracy is the first critical step. Different parts of the system might demand different precision levels. For instance, when matching orders to drivers, the platform may require highly accurate timing estimates to minimize driver idle time, while the ETA displayed to customers can tolerate some slack as long as it does not frustrate or mislead them.
After clarifying the desired accuracy, a sensible approach is to construct a baseline model using the available 10,000 beta records. A simple example is a regression that depends on average preparation time at each restaurant and the driving time based on distance. You can evaluate its performance using metrics suitable for continuous predictions. Common metrics include Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the coefficient of determination (R-squared), which measures the proportion of variance explained by your model. Once you have the baseline results, you can judge whether 10,000 examples yield enough accuracy for the intended business use.
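As a rough illustration, here is what such a baseline could look like in scikit-learn. The stand-in data generation and column names (prep_time_avg, distance_km, hour_of_day, delivery_minutes) are assumptions for the sketch; in practice X and y would come from the 10,000 beta-test records.

```python
# Baseline sketch on stand-in data; the schema and coefficients are illustrative.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 10_000
df = pd.DataFrame({
    "prep_time_avg": rng.uniform(5, 25, n),    # restaurant's average prep time (minutes)
    "distance_km": rng.uniform(0.5, 8.0, n),   # driving distance to the customer
    "hour_of_day": rng.integers(10, 23, n),
})
df["delivery_minutes"] = (
    df["prep_time_avg"] + 4.0 * df["distance_km"] + rng.normal(0, 4, n)
)

X, y = df[["prep_time_avg", "distance_km", "hour_of_day"]], df["delivery_minutes"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

baseline = LinearRegression().fit(X_train, y_train)
pred = baseline.predict(X_test)

print("RMSE:", np.sqrt(mean_squared_error(y_test, pred)))
print("MAE :", mean_absolute_error(y_test, pred))
print("R^2 :", r2_score(y_test, pred))
```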
Learning curves are a valuable way to investigate the relationship between model performance and the quantity of training data. By training on progressively larger subsets of data—say 25%, 50%, 75%, and so on—you can observe how the performance metric changes. If the performance grows substantially with more data and does not plateau, it indicates that collecting or synthesizing additional data is likely beneficial. Conversely, if the model’s accuracy levels off, you might be facing issues related to feature selection, data quality, or model complexity rather than just data volume.
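A minimal learning-curve sketch, reusing X and y from the baseline example above; the gradient-boosted model and MAE scoring are arbitrary choices for illustration.

```python
# Train on growing subsets of the data and compare train vs. validation MAE.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import learning_curve

sizes, train_scores, val_scores = learning_curve(
    GradientBoostingRegressor(random_state=0),
    X, y,                                   # features/target from the baseline sketch
    train_sizes=np.linspace(0.1, 1.0, 8),   # growing fractions of the training folds
    cv=5,
    scoring="neg_mean_absolute_error",
)

for size, tr, va in zip(sizes, -train_scores.mean(axis=1), -val_scores.mean(axis=1)):
    print(f"train size={size:5d}  train MAE={tr:5.2f}  val MAE={va:5.2f}")
# If validation MAE is still falling at the largest size, more data is likely to help;
# if it has flattened, look at features or model choice instead.
```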
If you find your model remains unsatisfactory, there are several options. You could incorporate new features that better capture relevant signals (for instance, real-time traffic data, supply-demand ratios in the area, or restaurant-specific time-of-day variations). You might switch to simpler or more regularized algorithms that do not require large datasets, or consider dimension reduction if you have too many features relative to your data size. From a business standpoint, also ask whether an imperfect ETA model actually hinders the launch or if it is an issue that can be improved post-launch as more data is collected in production.
Potential Follow-Up Questions
How do you know if 10,000 data points truly suffice?
Look at learning curves and out-of-sample metrics. If your performance is still improving steadily when using all 10,000 data points, it suggests you might gain accuracy by gathering more examples. If performance levels off long before you reach 10,000, you have indications that data volume is not the bottleneck, and it might be more about feature engineering or model selection. You can also consider cross-validation to ensure you are getting a reliable estimate of model performance from the available data.
Why might you use R-squared, and are there other metrics to consider?
R-squared is intuitive because it expresses how much of the variance in the target is captured by your predictions. However, R-squared can sometimes misrepresent model performance if the target distribution is skewed or if there are many outliers. For delivery-time predictions, metrics like MAE or RMSE often map more directly to the real impact on the end user. You could also consider percentile-based metrics, such as checking whether 90% of the predictions fall within a certain threshold of the actual times.
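For instance, a percentile-style check can be computed directly from held-out predictions; the 5-minute threshold and toy numbers below are arbitrary.

```python
# What share of predictions land within 5 minutes, and what is the P90 absolute error?
import numpy as np

actual = np.array([32.0, 41.5, 27.0, 55.0, 38.0])      # toy values, in minutes
predicted = np.array([30.0, 44.0, 29.5, 47.0, 36.5])

abs_err = np.abs(actual - predicted)
print("share within 5 min :", np.mean(abs_err <= 5.0))
print("P90 absolute error :", np.percentile(abs_err, 90))
```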
If 10,000 deliveries come from a single city or region, how can you generalize to broader scenarios?
You could transfer knowledge from other markets or regions if you have robust data from places with similar traffic patterns, restaurant density, and consumer behavior. Another possibility is to use domain adaptation methods, or to incorporate external data sources like weather or traffic to reduce overfitting to one city. You can also attempt hierarchical modeling that partially pools parameters across different areas, letting you share statistical strength but still adapt to local differences.
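As a toy sketch of the partial-pooling idea, each area's mean delivery time can be shrunk toward the global mean, with sparsely observed areas pulled harder toward it. The pseudo-count k here is an assumption you would tune; a full hierarchical model (e.g., a mixed-effects model) would estimate the amount of pooling from the data.

```python
# Shrink per-area means toward the global mean, weighted by how much data each area has.
import pandas as pd

df = pd.DataFrame({
    "area": ["A", "A", "A", "B", "B", "C"],
    "delivery_minutes": [30, 34, 32, 50, 46, 41],
})

k = 5.0                                   # pseudo-count controlling the shrinkage strength
global_mean = df["delivery_minutes"].mean()
stats = df.groupby("area")["delivery_minutes"].agg(["mean", "count"])
stats["pooled_mean"] = (
    (stats["count"] * stats["mean"] + k * global_mean) / (stats["count"] + k)
)
print(stats)
```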
What if new restaurants and new neighborhoods do not appear in your training data?
This is a “cold start” problem. You might build a feature-based approach that relies on generalizable factors such as geographic location, time of day, day of week, typical prep times for similar restaurant categories, or road traffic data. You might also adopt a collaborative filtering approach among restaurants with similar cuisine or average price range. Another strategy is to initialize new entries with population-level averages until enough data accumulates for that specific location or restaurant.
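A minimal sketch of the fallback logic, assuming hypothetical lookup tables for restaurant-level and category-level prep times and an arbitrary MIN_ORDERS threshold:

```python
# Back off from restaurant-level to category-level to global estimates as data permits.
from typing import Dict, Tuple

restaurant_prep: Dict[str, Tuple[float, int]] = {"r_123": (12.0, 240)}  # (mean minutes, n orders)
category_prep: Dict[str, float] = {"sushi": 14.0, "pizza": 18.0}
global_prep = 16.0
MIN_ORDERS = 30

def estimate_prep_time(restaurant_id: str, category: str) -> float:
    mean_n = restaurant_prep.get(restaurant_id)
    if mean_n is not None and mean_n[1] >= MIN_ORDERS:
        return mean_n[0]                                   # enough history for this restaurant
    return category_prep.get(category, global_prep)        # otherwise fall back

print(estimate_prep_time("r_999", "pizza"))                # unseen restaurant -> 18.0
```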
How do you decide when to invest in gathering more data rather than tuning the model further?
You can weigh the cost of collecting or purchasing additional data against the measurable improvements in your model’s performance. If the learning curve suggests you have not reached an asymptote and each data increment meaningfully reduces error, acquiring more data is attractive. But if you see diminishing returns, it may be more prudent to refine features, try more suitable architectures, or incorporate domain knowledge.
How would you practically add more features for the DoorDash ETA prediction?
You can integrate real-time factors such as traffic congestion, expected preparation times for each restaurant at different hours, driver availability in the vicinity, and historical data on how many orders a restaurant handles concurrently. You can also factor in unexpected events (such as weather or local holidays) and see how they correlate with average delays. Automating data pipelines to collect and preprocess these additional signals can be essential for continuously improving the model’s forecasts.
Could you leverage data from other countries or cities where DoorDash operates?
If the underlying drivers of delivery time are similar (traffic, restaurant behavior, driver behavior), you can transfer or adapt models trained on other regions. However, watch out for differences in road infrastructure, local traffic habits, restaurant density, or cultural norms about ordering times. Domain adaptation techniques and region-specific fine-tuning can help calibrate an existing model to local conditions.
How often would you retrain the model and update your ETA predictions?
Retraining frequency depends on how quickly data distributions shift. If you see that the average delivery time or traffic patterns are changing over weeks or months, retraining on new data may be essential. Setting up an automated or semi-automated system that monitors model drift and triggers retraining once performance degrades is a common approach. A real-time or near-real-time update system may be required in fast-changing environments, though that increases engineering complexity and costs.
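A minimal sketch of such a retraining trigger, assuming you log each prediction's absolute error once the actual delivery time is known; the 20% degradation tolerance is an arbitrary choice.

```python
# Flag retraining when recent MAE degrades beyond a tolerance relative to a reference MAE.
import numpy as np

def should_retrain(recent_abs_errors, reference_mae, tolerance=0.20):
    """Return True if the recent MAE exceeds the reference MAE by more than the tolerance."""
    recent_mae = float(np.mean(recent_abs_errors))
    return recent_mae > reference_mae * (1.0 + tolerance)

reference_mae = 4.2                               # MAE measured at deployment time
recent_errors = np.array([5.0, 6.1, 4.8, 7.2])    # last week's absolute errors (minutes)
print(should_retrain(recent_errors, reference_mae))   # True -> trigger retraining / analysis
```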
What are the risks of over-engineering features with only 10,000 data points?
Overfitting is more likely when you have many features but limited data. Your model might appear very accurate on training data but fail on unseen scenarios. Dimensionality reduction or careful feature selection can mitigate this. Another strategy is to start with a simple model that covers the most influential factors and only expand feature complexity once you have validated that each new feature gives a meaningful improvement and you have enough data to support it.
How would you handle large outliers, such as extremely delayed deliveries?
Outliers can bias average-based metrics and complicate model fitting. To address them, you might use robust regression techniques or transform the target variable (for instance, taking a logarithm of the delivery time if appropriate). You can also isolate outliers if they stem from anomalies, such as driver accidents or restaurants temporarily closing, and decide whether to keep or remove them based on the business need to predict such events accurately.
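Two brief sketches on synthetic data: a Huber-loss regressor that down-weights large residuals, and an ordinary least-squares model fit on a log-transformed target. Whether either is appropriate depends on how the business wants extreme delays treated.

```python
# Robust regression and log-target regression on synthetic, outlier-heavy data.
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import HuberRegressor, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 30 + 5 * X[:, 0] + rng.normal(scale=3, size=200)
y[:5] += 120                                     # a few extremely delayed deliveries

robust = HuberRegressor().fit(X, y)              # Huber loss down-weights large residuals

log_target = TransformedTargetRegressor(
    regressor=LinearRegression(),
    func=np.log1p, inverse_func=np.expm1,        # fit on log(1 + minutes), predict in minutes
).fit(X, y)

print(robust.predict(X[:2]), log_target.predict(X[:2]))
```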
Below are additional follow-up questions
How do you handle real-time updates to ETAs once the order is in progress?
One possible approach is to build a streaming component that re-evaluates the ETA as the driver progresses through each step of the delivery workflow. For instance, if the driver is still waiting at the restaurant and order preparation is delayed, you can dynamically update the ETA. This requires continuous data feeds, such as driver GPS locations and restaurant readiness signals, so that any new information is quickly factored into the existing model. A practical pitfall is ensuring that updates remain stable enough not to confuse users with ETAs that jump around unpredictably. A subtle real-world issue is the trade-off between responsiveness (reflecting actual conditions as soon as they change) and a stable, user-friendly interface. If ETAs fluctuate significantly, users may lose trust. One strategy is to introduce controlled smoothing or short-term rolling averages so that ETA changes feel more gradual.
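One simple way to implement such smoothing is to blend each fresh model estimate with the ETA currently shown to the user; the smoothing factor alpha in this sketch is an assumption you would tune against user research.

```python
# Exponential smoothing of the displayed ETA so updates feel gradual rather than jumpy.
def smoothed_eta(displayed_eta: float, new_eta: float, alpha: float = 0.3) -> float:
    """Move only part of the way toward the latest model estimate."""
    return (1 - alpha) * displayed_eta + alpha * new_eta

shown = 35.0
for fresh in [38.0, 44.0, 41.0]:      # model re-estimates as new signals arrive
    shown = smoothed_eta(shown, fresh)
    print(round(shown, 1))
```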
How do you address potential biases in the data collection process?
Bias can arise in multiple ways. Suppose certain neighborhoods or restaurant categories had fewer orders during beta testing, leading to underrepresented data in those areas. This can cause the model to be less accurate for those subsets of deliveries. Another subtle bias might emerge if courier availability varies dramatically among different regions or times of day, and that variation is not captured uniformly in the training data. Handling bias involves collecting more balanced data samples if possible, or applying methods such as sample re-weighting or stratified sampling to ensure fair representation across different segments. You could also analyze how your model’s error metrics differ by location, daypart, cuisine type, or demographic factors if that data is relevant. A major pitfall is ignoring these biases and inadvertently deploying a model that performs poorly for certain customers, thereby affecting satisfaction and trust.
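A small sketch of a per-segment error audit; the neighborhoods and numbers are invented for illustration.

```python
# Group absolute errors by segment (neighborhood, cuisine, daypart, ...) to spot weak spots.
import pandas as pd

eval_df = pd.DataFrame({
    "neighborhood": ["Bedok", "Bedok", "Orchard", "Orchard", "Jurong"],
    "actual": [40, 36, 28, 31, 52],
    "predicted": [34, 33, 27, 30, 43],
})
eval_df["abs_error"] = (eval_df["actual"] - eval_df["predicted"]).abs()
print(eval_df.groupby("neighborhood")["abs_error"].agg(["mean", "count"]))
```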
How do you deal with an imbalanced temporal distribution of deliveries (e.g., most orders at peak times)?
Real-world data often clusters around lunch and dinner rush hours, leaving certain off-peak times with fewer samples. Models can struggle to generalize to these less-common scenarios. A thorough approach might involve segmenting data by time-of-day or day-of-week, then using these segments to build either separate models or to enrich a single model’s feature set. Another angle is to up-sample off-peak data or down-sample peak data to achieve a more balanced distribution during training. The key pitfall is overfitting to your busiest times and failing to capture the nuances of off-peak situations. You must also be watchful about how you validate the model: a random split might overly favor the majority scenario (peak hours) unless you do a stratified split or time-based split that respects these temporal differences.
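A minimal time-based split, assuming an order-creation timestamp column; the column name and dates are illustrative.

```python
# Train on the earliest orders and validate on the latest ones, mirroring deployment.
import pandas as pd

orders = pd.DataFrame({
    "created_at": pd.to_datetime(
        ["2023-05-01 12:10", "2023-05-02 19:05", "2023-05-04 13:40",
         "2023-05-06 18:55", "2023-05-07 12:30"]
    ),
    "delivery_minutes": [33, 47, 29, 51, 35],
})

orders = orders.sort_values("created_at")
split = int(len(orders) * 0.8)                 # earliest 80% for training
train_df, valid_df = orders.iloc[:split], orders.iloc[split:]
print(len(train_df), len(valid_df))
```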
When should you incorporate external data sources like weather or local events, and how do you maintain these features over time?
Weather, local events, and even holiday schedules can significantly affect traffic and restaurant operations. A model might greatly benefit from features such as precipitation levels, public holiday flags, or presence of large-scale sporting events. However, pulling in these datasets introduces engineering complexity. You must ensure data reliability: if your weather API or event calendar feed is missing data on a given day, your model might produce skewed or incomplete results. Maintaining these features means having robust pipelines for ingesting, cleaning, and aligning external data with your delivery logs. An edge case might be local street closures due to construction or parades, which are irregular and require specialized data sources. If you cannot guarantee consistency or timeliness, it might be more harmful than helpful to include them.
How do you calibrate the model if it systematically underestimates or overestimates ETAs?
Calibration involves ensuring that your model’s predictions correspond well to actual observed times. For instance, if your model constantly predicts deliveries to arrive five minutes earlier than they do, you could apply a post-processing shift to correct for that systematic bias. A more nuanced method is to use calibration curves or isotonic regression to map raw model predictions to a better-aligned output. A subtle pitfall is relying solely on average errors; if the model’s performance is uneven across different segments—such as certain zip codes or restaurant categories—global calibration may not be enough. You might need segmented calibration that adjusts predictions differently depending on the context. Additionally, calibrating can become cyclical: once you correct the bias, the underlying distribution of residuals may shift again, which means you must monitor the calibration process over time.
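A compact calibration sketch using isotonic regression fit on a held-out set; the raw predictions and observed times below are toy values.

```python
# Learn a monotonic mapping from raw model predictions to observed delivery times.
import numpy as np
from sklearn.isotonic import IsotonicRegression

raw_pred = np.array([20.0, 25.0, 30.0, 35.0, 40.0, 45.0])   # held-out raw predictions
observed = np.array([24.0, 29.0, 33.0, 41.0, 44.0, 52.0])   # actual times (minutes)

calibrator = IsotonicRegression(out_of_bounds="clip").fit(raw_pred, observed)
print(calibrator.predict([27.0, 42.0]))    # calibrated ETAs for new raw predictions
```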
How can you address seasonal or sporadic events that cause abrupt distribution changes?
Seasonal shifts, such as a spike in orders during major holidays or a temporary surge in traffic during large conferences, can break a model trained on typical conditions. You can handle this by building models that explicitly include seasonal indicators or by training separate models specialized for certain periods (e.g., a holiday model versus a non-holiday model). Another approach is using online learning or incremental learning algorithms that update as new data flows in, allowing them to adapt to changing conditions. A tricky real-world complication is data lag: you might only realize a big event has caused a shift after it starts happening, and the model may not have enough time to adapt. One possible mitigation is to incorporate external signals that predict these events well in advance (like major holiday schedules or large stadium bookings).
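An incremental-learning sketch using scikit-learn's partial_fit interface, with synthetic mini-batches standing in for each new day of deliveries; the learning rate and scaling choices are assumptions.

```python
# Update an SGD regressor in mini-batches so recent conditions gradually shift the model.
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
scaler = StandardScaler()
model = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=1)

for day in range(5):                                  # pretend each loop is a new day of data
    X_batch = rng.normal(size=(100, 3))
    y_batch = 30 + 4 * X_batch[:, 0] + rng.normal(scale=2, size=100)
    X_scaled = scaler.partial_fit(X_batch).transform(X_batch)
    model.partial_fit(X_scaled, y_batch)              # update weights without full retraining

print(model.coef_)
```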
How do you detect and mitigate performance degradation over time?
Performance drift can occur because of changing traffic patterns, new driver behaviors, or restaurant expansions. One common strategy is to set up real-time or periodic monitoring of key metrics like RMSE, MAE, or coverage on different segments. If you detect a significant spike in errors, you can trigger retraining, a more detailed error analysis, or a revert to a previous model version. Maintaining logs of prediction errors is essential for root-cause analysis. A subtle pitfall is not monitoring the performance on all segments. If you only track aggregate performance, you may miss localized degradation in certain areas or certain restaurants. Another subtlety is deciding the threshold for retraining. Retraining too frequently can introduce noise, while waiting too long can erode user trust if predictions become noticeably off.
How might you implement ensemble or multi-model strategies for ETA predictions?
Ensemble methods often boost predictive accuracy by combining multiple models that each capture different aspects of the data. For instance, one model might be a time-series forecaster specialized for short-term traffic conditions, while another might be a general regression capturing restaurant prep times. You can then combine or average their predictions. One challenge is that ensembles can be more resource-intensive for inference, potentially affecting response times on a real-time application. If computing power or low latency is a concern, you could consider lightweight ensemble methods (like blending or stacking with smaller base models). Another pitfall is making sure the models are not overly similar, which reduces the benefit you get from combining them.
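A small stacking sketch on synthetic data; the base models, meta-learner, and features are illustrative choices rather than a recommended configuration.

```python
# Stack a linear model and a gradient-boosted model with a simple ridge meta-learner.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 4))
y = 25 + 3 * X[:, 0] + 2 * X[:, 1] ** 2 + rng.normal(scale=2, size=500)

ensemble = StackingRegressor(
    estimators=[("linear", Ridge()), ("gbm", GradientBoostingRegressor(random_state=0))],
    final_estimator=Ridge(),
    cv=5,
)
ensemble.fit(X, y)
print(ensemble.predict(X[:3]))
```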
How do you cope if your ground-truth data on arrival times has missing or inaccurate timestamps?
Real-world logs can be messy. Drivers might forget to mark an order as “delivered,” or the system might log the restaurant handoff time instead of the actual arrival time at the customer’s door. Missing or inaccurate labels can severely impact regression performance. You can attempt data cleaning: for example, if the “delivered” timestamp is missing, you might approximate it using GPS data or user confirmation times. Another approach is to discard or down-weight uncertain examples if you have enough data, but that risks losing valuable information. Edge cases include partial coverage for delivery steps, leading to inconsistent definitions of “arrival time.” Clarifying business rules on how arrival times are logged, as well as continuously auditing the data pipeline, is key to ensuring label integrity.
How can simulation or synthetic data generation help with rarely observed scenarios?
Certain traffic conditions, extreme weather events, or highly unusual restaurant delays may be so infrequent that your model sees too few samples for generalization. Simulation can fill in these gaps by artificially creating data under hypothetical or extreme conditions, guided by domain knowledge. For instance, you might simulate a scenario where roads have limited throughput or where a driver is delayed by an unexpected detour. This can help the model learn how to handle tail events. A crucial pitfall is that synthetic data might not perfectly reflect real-world correlations. If your simulation is too simplistic, the model might learn patterns that do not translate back to reality. Thorough validation against real, albeit sparse, examples of these events is essential before you rely on simulation-augmented training data.
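A toy example of simulation-based augmentation: a small share of records gets inflated delivery times to mimic a weather disruption before being mixed back into the training data. The delay multipliers and the 5% share are pure assumptions that would need validation against whatever real tail events you do observe.

```python
# Generate synthetic "heavy delay" deliveries under an assumed disruption scenario.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
base = pd.DataFrame({
    "distance_km": rng.uniform(0.5, 8.0, size=1000),
    "prep_minutes": rng.uniform(5, 25, size=1000),
})
base["delivery_minutes"] = base["prep_minutes"] + 4 * base["distance_km"] + rng.normal(0, 3, 1000)

storm = base.sample(frac=0.05, random_state=3).copy()
storm["delivery_minutes"] *= rng.uniform(1.4, 2.0, size=len(storm))   # simulated weather delays
augmented = pd.concat([base, storm], ignore_index=True)
print(len(augmented))
```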