ML Case-study Interview Question: Predicting Refrigeration Temperature Anomalies at Scale Using Prophet
Case-Study Question
A major retail chain stores perishable goods in over half a million refrigeration units of various types, each requiring different temperature ranges. The company faces recurring malfunctions, leading to product spoilage and reduced customer satisfaction. They have sensor data (pressure, fan speed, temperature, defrost states, compressor lock states, outside air temperature) at ten-minute intervals, aggregated to hourly. The task is to build a scalable automated approach to forecast temperature anomalies 72 hours ahead. How would you design, implement, and validate a reliable end-to-end solution to prioritize issues and reduce overall maintenance burden?
Detailed Solution
Data Extraction and Aggregation
Sensor data arrives every ten minutes and is noisy and sometimes incomplete. Aggregating to hourly intervals reduces the impact of missing values. Cleaning includes removing invalid readings and imputing short gaps. Timestamps are aligned so all features share the same hourly slots.
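A minimal pandas sketch of this step, assuming an illustrative file path and column names (`device_id`, `ts`, `temp`, `pressure`, `fan_speed`) that are not the actual schema:

```python
import pandas as pd

# Raw 10-minute readings; path and columns are illustrative.
raw = pd.read_parquet("sensor_readings.parquet")

# Remove physically implausible readings before aggregating.
raw = raw[(raw["temp"] > -60) & (raw["temp"] < 60)]

# Aggregate six 10-minute readings into one hourly value per device.
hourly = (
    raw.set_index("ts")
       .groupby("device_id")[["temp", "pressure", "fan_speed"]]
       .resample("1h")
       .mean()
)

# Impute only short gaps (up to 2 consecutive missing hours); longer
# outages stay NaN rather than being filled with stale values.
hourly = hourly.groupby(level="device_id").ffill(limit=2)
```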
Feature Engineering
Recent historical data is used as input. For each hour, lags over a window of the previous 96 hours are computed. Rolling means over six-hour periods capture short-term trends. Outside air temperature forecasts come from a trusted third-party source. The presence of defrost or compressor lock states is turned into time-based signals. These features represent typical operational behavior.
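A sketch of these transformations for a single device's hourly series; the lag subset, window sizes, and column names are illustrative assumptions:

```python
import pandas as pd

def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Lag and rolling features for one device's hourly series.

    Expects a DataFrame indexed by hourly timestamps with 'temp' and a
    0/1 'defrost' column (names are illustrative).
    """
    out = df.copy()
    # A subset of lags from the previous 96 hours (using every lag
    # would add 96 columns; a few are shown for brevity).
    for lag in (1, 2, 3, 6, 12, 24, 48, 96):
        out[f"temp_lag_{lag}h"] = out["temp"].shift(lag)
    # Six-hour rolling mean to capture short-term trend.
    out["temp_roll_6h"] = out["temp"].rolling(window=6, min_periods=3).mean()
    # Turn the binary defrost state into a time-based signal:
    # hours elapsed since the most recent defrost.
    last_defrost = out.index.to_series().where(out["defrost"] == 1).ffill()
    out["hours_since_defrost"] = (
        (out.index.to_series() - last_defrost).dt.total_seconds() / 3600
    )
    return out
```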
Modeling Approach
ARIMAX and Bayesian Structural Time Series (BSTS) were tested. BSTS execution time was too high for large-scale deployment, and ARIMAX, while acceptably accurate, lacked an intuitive way to pinpoint rapid changes. A procedure with built-in change-point detection was desired.
Prophet models trend, seasonality, and change-points explicitly, which makes it well suited to highlighting sudden shifts in temperature. The core of Prophet's additive model is:
y(t) = g(t) + s(t) + h(t) + e(t)
where y(t) is the observed time series, g(t) is the piecewise-linear or logistic trend component that can shift at change-points, s(t) represents seasonality (daily, weekly), h(t) captures holiday or event effects, and e(t) is the error term.
Prophet automatically estimates daily and weekly seasonalities and, by default, places potential change-points uniformly across the first 80% of the time series. A sparse (L1-style) prior on the change-point magnitudes ensures only significant change-points remain. Including sensor features as additional regressors helps capture external influences.
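A hedged sketch of fitting such a model with the Prophet library; the regressor names and the `regressor_forecasts` frame are assumptions, and the hyperparameters shown are defaults or illustrative values:

```python
from prophet import Prophet

# df holds one device's history with columns ds (hourly timestamp),
# y (temperature), and the extra regressors added below.
m = Prophet(
    daily_seasonality=True,
    weekly_seasonality=True,
    changepoint_range=0.8,         # candidates in first 80% of history (default)
    changepoint_prior_scale=0.05,  # sparse prior prunes insignificant change-points
)
m.add_regressor("outside_air_temp")     # third-party weather signal
m.add_regressor("hours_since_defrost")  # engineered state feature
m.fit(df)

# 72-hour horizon; regressor values for future hours must be supplied
# (e.g. from the weather forecast). regressor_forecasts is hypothetical.
future = m.make_future_dataframe(periods=72, freq="h")
future = future.merge(regressor_forecasts, on="ds", how="left")
forecast = m.predict(future)[["ds", "yhat", "yhat_lower", "yhat_upper"]]
```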
Anomaly Detection
Each system has a recommended temperature range. Two scenarios guide anomaly tags:
If the recent reading is within range, any forecast drifting beyond that range by a significant margin is flagged.
If the recent reading is already outside range, the forecast is compared against the current temperature level to detect further deterioration.
Defrost periods are excluded from anomaly tagging because temperature is naturally high. Longer and larger deviations rank higher in priority.
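The tagging logic can be expressed as a small rule, sketched below with illustrative thresholds (`margin`, `min_hours`) that require the deviation to be both large and persistent:

```python
def flag_anomaly(forecast, current_temp, lo, hi, margin=1.0, min_hours=3):
    """Apply the two tagging scenarios to a 72-hour forecast.

    forecast: iterable of per-hour predicted temperatures with defrost
    hours already removed. lo/hi is the unit's recommended range;
    margin and min_hours are illustrative thresholds.
    """
    if lo <= current_temp <= hi:
        # Scenario 1: currently in range -- flag forecasts that drift
        # beyond the range by a significant margin.
        breaches = [t for t in forecast if t > hi + margin or t < lo - margin]
    else:
        # Scenario 2: already out of range -- flag further deterioration
        # relative to the current temperature level.
        current_dev = max(current_temp - hi, lo - current_temp)
        breaches = [t for t in forecast
                    if max(t - hi, lo - t) > current_dev + margin]
    is_anomaly = len(breaches) >= min_hours
    # Longer and larger deviations rank higher in priority.
    priority = sum(max(t - hi, lo - t, 0.0) for t in breaches)
    return is_anomaly, priority
```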
Root Cause Indication
A separate Prophet run on each sensor feature helps locate rapid changes in pressure or fan speed near the anomaly time. If a feature shows a similar shift around the same time, it is flagged as a probable cause.
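One way to surface this from a fitted Prophet model is to inspect `model.changepoints` together with the fitted rate changes in `model.params['delta']`; the window and delta threshold below are illustrative:

```python
import numpy as np
import pandas as pd

def changepoints_near(model, anomaly_time, window_hours=24, min_delta=0.01):
    """Significant change-points of a fitted Prophet model near an anomaly.

    model.changepoints holds candidate change-point timestamps and
    model.params['delta'] the fitted rate changes (shrunk toward zero
    by the sparse prior). window_hours and min_delta are illustrative.
    """
    deltas = np.nanmean(model.params["delta"], axis=0)  # average over samples
    cps = pd.DataFrame({
        "ds": model.changepoints.to_numpy(),
        "delta": deltas,
    })
    significant = cps[cps["delta"].abs() >= min_delta]
    w = pd.Timedelta(hours=window_hours)
    mask = ((significant["ds"] >= anomaly_time - w)
            & (significant["ds"] <= anomaly_time + w))
    return significant[mask]
```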
Validation and Deployment
Cross-validation on historical data shows that over 85% of systems achieve forecasts with over 90% accuracy. Around 67% of actual issues are captured when the recent reading is within range, and 84% when it is already out of range.
Flagged anomalies and their associated sensor changes are also reviewed by maintenance experts. Results feed a daily batch process running in a distributed environment, using PySpark pandas UDFs to scale to thousands of stores. Deployment leverages a cloud platform that stores model artifacts for automatic updates, and a front-end interface displays flagged issues and potential causes.
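A sketch of the per-device scaling pattern, here using Spark's grouped-map `applyInPandas` (the modern form of the grouped-map pandas UDF); paths, schema, and column names are illustrative:

```python
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.types import (StructType, StructField, StringType,
                               TimestampType, DoubleType)

spark = SparkSession.builder.appName("fridge-forecast").getOrCreate()

schema = StructType([
    StructField("device_id", StringType()),
    StructField("ds", TimestampType()),
    StructField("yhat", DoubleType()),
    StructField("yhat_lower", DoubleType()),
    StructField("yhat_upper", DoubleType()),
])

def forecast_one_device(pdf: pd.DataFrame) -> pd.DataFrame:
    """Fit Prophet on one device's history; return a 72-hour forecast.

    Prophet is imported inside the function so the import happens on
    the executors, not the driver.
    """
    from prophet import Prophet
    m = Prophet()
    m.fit(pdf.rename(columns={"ts": "ds", "temp": "y"}))
    future = m.make_future_dataframe(periods=72, freq="h")
    out = m.predict(future)[["ds", "yhat", "yhat_lower", "yhat_upper"]]
    out["device_id"] = pdf["device_id"].iloc[0]
    return out[["device_id", "ds", "yhat", "yhat_lower", "yhat_upper"]]

hourly = spark.read.parquet("s3://bucket/hourly_features/")  # path illustrative
forecasts = hourly.groupBy("device_id").applyInPandas(forecast_one_device,
                                                      schema=schema)
```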
How would you handle large-scale real-time data ingestion?
Data ingestion at this scale typically involves a distributed streaming pipeline. Partition data by store or device ID and use a high-throughput system such as Apache Kafka for real-time streams, landing raw events in a data lake or distributed file system. Hourly aggregation can run as a Spark Structured Streaming job that produces hourly feature sets, with checks in place to drop or impute incomplete data. Persist the final aggregated data in a columnar store for fast model inference.
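A Structured Streaming sketch of the hourly aggregation stage; the broker, topic, JSON schema, and sink paths are all illustrative assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col, window, avg

spark = SparkSession.builder.appName("sensor-ingest").getOrCreate()

# Subscribe to the raw sensor topic (names are illustrative).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "fridge-sensors")
       .load())

# Parse JSON payloads into typed columns.
readings = (raw.selectExpr("CAST(value AS STRING) AS json")
            .select(from_json("json",
                              "device_id STRING, ts TIMESTAMP, temp DOUBLE")
                    .alias("r"))
            .select("r.*"))

# Hourly windows per device; the watermark bounds state for late data.
hourly = (readings
          .withWatermark("ts", "2 hours")
          .groupBy(window(col("ts"), "1 hour"), col("device_id"))
          .agg(avg("temp").alias("temp")))

query = (hourly.writeStream
         .format("parquet")
         .option("path", "s3://bucket/hourly/")          # illustrative sink
         .option("checkpointLocation", "s3://bucket/chk/")
         .outputMode("append")
         .start())
```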
Would you choose univariate or multivariate forecasting under heavy missing data?
Multivariate forecasting provides deeper insight. It captures interactions between temperature, pressure, fan speeds, or external weather. Heavy missing data needs robust imputation or a feature selection approach that drops unreliable variables. In extreme cases, partial univariate models can still run on minimal external data. However, without multiple sensors, root-cause analysis becomes harder. A balanced approach is to adapt the data pipeline to reduce missing intervals, perhaps discarding features with persistently high missing rates.
Why is Prophet’s change-point functionality important for diagnostics?
Change-points reveal abrupt shifts. This is critical for systems that fluctuate when compressors fail or defrost cycles go rogue. Traditional ARIMA-based approaches focus on correlations and seasonal patterns but don’t explicitly mark abrupt breaks. Identifying these breaks helps highlight recent mechanical faults or unusual temperature spikes. For cause discovery, each sensor feature is examined in a similar fashion, pinpointing change-points that coincide with target shifts.
How do you ensure explainability of flagged anomalies?
Explainability hinges on clear logic linking sensor behavior to anomalies. One approach is storing time segments around identified anomalies, then automatically retrieving sensor patterns that changed concurrently. Observing a fan speed spike or compressor lock at the same time clarifies what went wrong. Visual overlays of temperature versus key sensor signals near anomalies provide immediate interpretability. This transparency is valuable for maintenance teams who want quick guidance without manual data exploration.
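A minimal matplotlib sketch of such an overlay, assuming an hourly frame indexed by timestamp with illustrative column names:

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_anomaly_context(df, anomaly_time, hours=48):
    """Overlay temperature and a key sensor signal around an anomaly.

    df: hourly frame indexed by timestamp with 'temp' and 'fan_speed'
    columns (names are illustrative).
    """
    w = pd.Timedelta(hours=hours)
    view = df.loc[anomaly_time - w: anomaly_time + w]
    fig, ax1 = plt.subplots(figsize=(10, 4))
    ax1.plot(view.index, view["temp"], color="tab:red", label="temperature")
    ax1.axvline(anomaly_time, linestyle="--", color="black", label="anomaly")
    ax2 = ax1.twinx()  # second y-axis so both signals stay readable
    ax2.plot(view.index, view["fan_speed"], alpha=0.6, label="fan speed")
    ax1.set_ylabel("temperature (°C)")
    ax2.set_ylabel("fan speed")
    fig.legend(loc="upper left")
    fig.tight_layout()
    return fig
```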
What steps would you take to optimize for near real-time predictions?
Hourly batch processing might be replaced with micro-batching every 15 minutes if sensor updates become more frequent. Prophet can still forecast 72-hour horizons, but partial retraining or incremental updates might be needed. Online learning variants of ARIMAX or a streaming approach can also be considered if real-time adjustments are critical. The key is balancing model complexity with the infrastructure’s ability to re-train or update frequently without excessive latency.
How do you confirm model stability over time?
Monitor MAPE or RMSE on a rolling basis against new data. Watch for performance drift, indicating sensor calibration issues or fundamental changes in refrigeration behavior. If the drift exceeds a threshold, retrain or recalibrate with fresh data. Periodic checks compare anomaly forecasts to real mechanical failure logs or system resets. A continuous feedback loop ensures the model stays aligned with the evolving system.
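A small sketch of the rolling-metric check; the one-week window, the denominator floor (temperatures sit near 0 °C, where raw MAPE blows up), and the 1.5x drift factor are illustrative choices:

```python
import pandas as pd

def rolling_mape(actual: pd.Series, predicted: pd.Series, window=168):
    """MAPE over a rolling one-week window of hourly values; the floor
    on the denominator avoids divide-by-near-zero blow-ups."""
    ape = (actual - predicted).abs() / actual.abs().clip(lower=1e-6) * 100
    return ape.rolling(window).mean()

def needs_retrain(actual, predicted, baseline_mape, factor=1.5):
    """Flag drift when recent rolling MAPE exceeds the validation-time
    baseline by an illustrative factor."""
    recent = rolling_mape(actual, predicted).iloc[-1]
    return bool(recent > factor * baseline_mape)
```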
How would you extend this approach to other IoT devices?
Similar logic applies to any machine with sensor data: define an acceptable range, forecast future values using external drivers, identify large and persistent divergences, and cross-check relevant signals for diagnostic clues. Using distributed computing frameworks scales across devices. The method is flexible as long as relevant sensor data and external drivers (like weather or load) are available, and a suitable time-series model with clear interpretability is chosen.