ML Interview Q Series: Which model would best predict agent needs across call centers, and how would you evaluate and balance accuracy?

May 01, 2025

Comprehensive Explanation

Model Choice for Predicting Call Volumes

A common way to approach call center staffing is to forecast the volume of incoming calls (or customer requests) over time. Because call patterns typically follow daily, weekly, and seasonal fluctuations, time series forecasting methods often work well. Some potential model classes include:

• Traditional Time Series Algorithms: ARIMA-based models, Exponential Smoothing, and SARIMA (Seasonal ARIMA) are widely used.
• Machine Learning Regressors: Tree-based methods such as XGBoost or LightGBM can be applied to predict the next step or multiple steps ahead.
• Deep Learning Models: Recurrent Neural Networks (like LSTMs) or Temporal Convolutional Networks can capture longer-range dependencies and complex patterns in call volume.

The key is to capture any repeating seasonal pattern (e.g., more calls during certain hours or days) and to incorporate external factors such as holidays, special promotions, or marketing campaigns.

Core Cost Function for Over-Allocation vs Under-Allocation

One way to decide whether it is better to over-allocate or under-allocate agents is to quantify the cost implications of each scenario. We can define a cost function for a given number of staffed agents S and an actual demand V. If S >= V, we have extra agents; if S < V, we have fewer agents than needed. A simplified piecewise cost function could be written as:

$$
C(S, V) =
\begin{cases}
c_1 \, (S - V), & \text{if } S \ge V \\
c_2 \, (V - S), & \text{if } S < V
\end{cases}
$$

Here, c_1 is the cost per idle agent, while c_2 is the cost per unit of unmet demand (an unanswered call, a lost opportunity, customer dissatisfaction, etc.). Comparing c_1 and c_2 tells us in which direction to bias staffing. If c_2 is significantly higher than c_1, we lean toward overstaffing to avoid damaging service quality.
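To make the trade-off concrete, here is a minimal sketch of the piecewise cost in Python. The demand history and the values of c1 and c2 are made up for illustration. The `critical_ratio` helper uses the standard newsvendor result: under this kind of linear cost, expected cost is minimized by staffing at the c_2 / (c_1 + c_2) quantile of demand.

```python
def staffing_cost(staffed, demand, c1, c2):
    """Asymmetric cost: c1 per idle agent, c2 per unit of unmet demand."""
    if staffed >= demand:
        return c1 * (staffed - demand)
    return c2 * (demand - staffed)

def critical_ratio(c1, c2):
    """Demand quantile that minimizes expected cost (newsvendor result)."""
    return c2 / (c1 + c2)

def best_staffing(demand_samples, c1, c2):
    """Brute force: staffing level with the lowest total cost over sampled demand."""
    candidates = range(min(demand_samples), max(demand_samples) + 1)
    return min(
        candidates,
        key=lambda s: sum(staffing_cost(s, v, c1, c2) for v in demand_samples),
    )

# Hypothetical demand history (agents needed per interval) and hypothetical costs.
demand_samples = [8, 9, 10, 10, 11, 12, 12, 13, 15, 20]
c1, c2 = 1.0, 4.0

print("Target demand quantile:", critical_ratio(c1, c2))  # 0.8
print("Cost-minimizing staffing:", best_staffing(demand_samples, c1, c2))
```

With understaffing four times as costly as idle time, the sketch staffs toward the upper end of observed demand rather than at its mean.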

Relevant Metrics for Model Performance

When evaluating a forecasting model for call volume, the following error metrics are usually used:

Mean Squared Error: MSE = (1 / n) * sum((y_t - y'_t)^2) over t=1 to n, where y_t is the actual volume and y'_t is the forecast. It penalizes large errors heavily, so a few badly missed peaks dominate the score.

Mean Absolute Error measures the average magnitude of errors in absolute terms, making it easier to interpret.

Mean Absolute Percentage Error (MAPE) helps gauge the relative size of the errors compared to the magnitude of the demand. It is useful for comparing accuracy across periods or queues with very different volumes, but it becomes unstable when actual volumes are close to zero (for example, overnight intervals).

However, real-world decisions also require metrics that reflect how well the model satisfies service-level requirements, such as the share of intervals in which the forecast falls below the actual volume (under-forecasting) versus above it (over-forecasting). In many call center contexts, a small under-forecast, which leads to understaffing, can be much worse than an over-forecast of the same size.
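The metrics above can be computed together in a few lines. This is a dependency-free sketch; the function name `forecast_metrics` and the sample numbers are illustrative only, and skipping zero-volume intervals in MAPE is one possible convention among several.

```python
def forecast_metrics(actual, predicted):
    """Return MSE, MAE, MAPE (%) and the share of intervals that were under-forecast."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]  # positive = over-forecast
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when the actual value is 0, so those intervals are skipped.
    pct = [abs(e) / a for e, a in zip(errors, actual) if a != 0]
    mape = 100 * sum(pct) / len(pct) if pct else float("nan")
    under_rate = sum(1 for e in errors if e < 0) / n
    return {"mse": mse, "mae": mae, "mape": mape, "under_rate": under_rate}

actual = [100, 120, 80, 150]
predicted = [110, 115, 80, 140]
metrics = forecast_metrics(actual, predicted)
print(metrics)
```

Here `under_rate` is the directional metric: two models with the same MAE can differ sharply in how often they leave the floor short.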

Trade-Off Between Over-Allocating and Under-Allocating Agents

Whether you prefer over-allocation or under-allocation depends on the cost structure of your business and its priorities:

In most customer support scenarios, under-allocation is very costly due to lost sales, poor customer satisfaction, and reputational damage. Over-allocation might be more acceptable if idle agent cost is not prohibitively high.

In some strict budgeting scenarios, idle time is considered very expensive, so there might be pressure to reduce overstaffing.

To determine this trade-off, you would typically analyze historical data on lost calls, service-level agreements (SLAs), and agent cost per hour. A realistic approach is to define a custom objective function that incorporates these costs and then train or tune your forecasting/staffing model to minimize that function.

Sample Implementation in Python

Below is a small illustration of using a traditional ARIMA model from statsmodels to forecast call volume, which you could then pass to a resource allocation algorithm that decides how many agents to staff. This is just a demonstration of the modeling portion.

import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
import numpy as np

# Suppose df is a DataFrame with a DateTime index (df.index) and a column "calls"
# that contains the number of calls at each time period.

# Split into training and testing
train_size = int(len(df) * 0.8)
train_data = df['calls'][:train_size]
test_data = df['calls'][train_size:]

# Fit ARIMA model
model = ARIMA(train_data, order=(2,1,2))  # (p,d,q) can be tuned
fitted_model = model.fit()

# Generate forecasts
forecast_steps = len(test_data)
forecast = fitted_model.forecast(steps=forecast_steps)

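# Evaluate performance (for instance with MAE).
# Compare raw values: if the index carries no frequency information, the
# forecast index may not match test_data's index, and an index-aligned
# subtraction would then produce NaN.
mae = np.mean(np.abs(np.asarray(forecast) - np.asarray(test_data)))
print("MAE:", mae)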

# Next step would be to feed this forecast into a staffing function
# that decides how many agents to allocate at each future time step
# possibly using the cost function described above.

To incorporate domain knowledge like day-of-week seasonality or holiday effects, you can engineer additional features (in a machine learning approach) or use a seasonal ARIMA or Prophet model that naturally captures seasonality. For neural networks, specialized architectures like LSTMs or Transformers can be employed with time-based features.
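As a rough illustration of the feature-engineering route, the sketch below derives calendar features from a timestamp using only the standard library. The `HOLIDAYS` set and the feature names are assumptions made for this example; in practice the holiday list would come from a business calendar.

```python
from datetime import date, datetime

# Hypothetical holiday list for illustration.
HOLIDAYS = {date(2025, 1, 1), date(2025, 12, 25)}

def calendar_features(ts):
    """Build simple seasonal features for one timestamp."""
    return {
        "hour": ts.hour,
        "day_of_week": ts.weekday(),          # Monday = 0 ... Sunday = 6
        "is_weekend": int(ts.weekday() >= 5),
        "month": ts.month,
        "is_holiday": int(ts.date() in HOLIDAYS),
    }

row = calendar_features(datetime(2025, 12, 25, 14, 0))
print(row)
```

Rows like this, one per interval, can be fed to a tree-based regressor alongside lagged call counts.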

How to Incorporate SLAs and Business Constraints

Service-level agreements often require that a certain percentage of calls be answered within a given time. You could embed this requirement into a penalty function that includes waiting times or the fraction of missed calls. This can be done by simulating queueing behavior under different staffing levels. The ultimate decision is typically reached by balancing that penalty with the cost of extra agents.

Follow-up Question: How do you handle sudden spikes or outliers in the call volume data?

Sudden spikes can occur due to one-time events, technical issues, or marketing campaigns. In many standard forecasting models, large outliers can skew the parameter estimates. Techniques to handle spikes or outliers include:

• Using robust forecasting methods or transforming the data (for example, applying a logarithmic transform if volumes are strictly positive).
• Incorporating exogenous variables (like marketing events or product launches) directly into the model so it “knows” when spikes are more likely.
• Applying an anomaly detection step prior to training, so that abnormal points are identified and handled (either by capping them or treating them as special events).
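One possible version of the anomaly-detection-and-capping step is a robust z-score built from the median and the median absolute deviation (MAD). This is a sketch, not the only option; the threshold of 3.5 and the sample series are illustrative choices.

```python
import statistics

def cap_outliers(values, threshold=3.5):
    """Cap points whose robust z-score (median/MAD based) exceeds the threshold."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return list(values), [False] * len(values)
    scale = 1.4826 * mad  # rescales MAD to match the standard deviation for normal data
    upper = med + threshold * scale
    lower = med - threshold * scale
    flags = [v > upper or v < lower for v in values]
    capped = [min(max(v, lower), upper) for v in values]
    return capped, flags

calls = [100, 105, 98, 102, 500, 101, 99]
capped, flags = cap_outliers(calls)
print(flags)
print(capped)
```

Keeping `flags` rather than silently overwriting values lets flagged intervals be treated as special events later, for instance by joining them against a campaign calendar.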

Follow-up Question: Why might you prefer a probabilistic forecast rather than a single deterministic forecast?

A single deterministic forecast (e.g., “we predict 100 calls at 2 PM tomorrow”) does not capture uncertainty. In practice, you might want to know a range of possible outcomes, such as a confidence interval or a full probability distribution. With a probabilistic forecast, you can:

• Consider different scenarios, such as worst-case (high volume) and best-case (low volume) call loads.
• Allocate agents to meet a chosen service probability target, for example staffing at the 90th percentile of predicted call volume.
• Incorporate more nuanced cost functions that account for the distribution of possible outcomes, not just a point estimate.
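A simple way to get a percentile forecast from a point model is to add an empirical quantile of past forecast errors. The sketch below assumes residuals are defined as actual minus forecast and are roughly stable over time; the residual values are invented for illustration.

```python
import math

def empirical_quantile(values, q):
    """Nearest-rank empirical quantile for 0 < q <= 1."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

def quantile_forecast(point_forecast, past_residuals, q=0.9):
    """Shift a point forecast by the q-th quantile of past errors (actual - forecast)."""
    return point_forecast + empirical_quantile(past_residuals, q)

# Hypothetical residuals from a backtest.
past_residuals = [-12, -8, -5, -3, 0, 2, 4, 7, 10, 15]
p90 = quantile_forecast(100, past_residuals, q=0.9)
print("90th percentile call volume:", p90)
```

Staffing to `p90` instead of the point forecast of 100 builds in a buffer sized by how wrong the model has historically been.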

Follow-up Question: How would you extend or adapt this approach if the client’s demand changes drastically over time (for example, new product launches, expansions, or cyclical changes in business)?

One approach is to periodically retrain or update the model to incorporate the latest data. Refitting on a schedule is a simple form of model adaptation; updating parameters in place as new data arrives is usually called incremental or online learning. Additional considerations include:

• Using an online learning or streaming approach that continually updates the model parameters as new data arrives.
• Setting up automated alerts or triggers to indicate that model performance is declining, which signals a need for immediate retraining.
• Adding dynamic features that measure the real-time changes in call arrival rates or external indicators so that the model remains sensitive to sudden shifts in behavior.

Follow-up Question: How do you account for different time resolutions in the data?

Sometimes call data might be aggregated at a daily or hourly level, while you need minute-by-minute predictions. You can:

• Use hierarchical time-series approaches, forecasting at a coarser resolution first, then refining in a second stage.
• Aggregate or disaggregate data carefully, using domain knowledge (like typical within-hour distributions) so that you do not lose the natural patterns of call arrivals.
• Adopt specialized models that can handle high-frequency time series data directly if minute-by-minute data is both available and of sufficient quantity for training.
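The disaggregation idea can be sketched as splitting each hourly forecast by a fixed within-hour profile. The profile values here are hypothetical; in practice they would be estimated from historical fine-grained data, possibly separately per hour of day.

```python
def disaggregate(hourly_forecast, within_hour_profile):
    """Split each hourly total into sub-intervals using a fixed share profile."""
    total = sum(within_hour_profile)
    shares = [w / total for w in within_hour_profile]  # normalize so shares sum to 1
    return [[h * s for s in shares] for h in hourly_forecast]

# Hypothetical profile: share of an hour's calls in each 15-minute slot.
profile = [0.35, 0.25, 0.20, 0.20]
quarters = disaggregate([120, 80], profile)
print(quarters)
```

Because the shares are normalized, the sub-interval forecasts always add back up to the hourly total, which keeps the two resolutions consistent.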

Follow-up Question: How would you handle a scenario where a subset of calls might be much longer than others, affecting agent availability?

Typical call volume forecasts only predict how many calls might come in, not how long each call may take. Large call durations can reduce effective agent availability. Approaches to manage this include:

• Building an additional model for call duration, possibly as a regression or survival analysis problem.
• Using queueing theory (like Erlang models) to combine both arrival rates and average service times into a final staffing estimate.
• Including call handle time as an exogenous factor in the main model or building a simulation that merges call arrivals with distributions of handle times.

By addressing these considerations comprehensively, you ensure that your model not only forecasts how many calls will come in, but also informs a staffing strategy that balances service-level objectives with operational costs.

Below are additional follow-up questions

How do you incorporate real-time data or streaming updates into your staffing model?

In many call centers, you receive continuous updates about the number of incoming calls, average handle times, and agent availability. Instead of using a pure batch process (e.g., retraining once a day), you can refine your forecasts and staffing plans throughout the day based on real-time data. One approach is to implement an online learning algorithm or a streaming model that updates its parameters whenever new data comes in. A potential pitfall is that if the real-time data has fluctuations too small to justify significant staffing changes (or the lead time to bring in additional agents is too long), frequent adjustments might be counterproductive. Another subtle issue arises if data pipelines have delays or if the streaming data is noisy—aggressive decisions based on very short-term spikes could lead to overreaction, so you need a smoothing factor or a threshold-based mechanism to avoid constant toggling of agent counts.
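The smoothing-plus-threshold mechanism mentioned above might look like the following sketch. The smoothing factor `alpha`, the `deadband` of two agents, and the assumption that one agent handles a fixed number of calls per interval are all illustrative simplifications.

```python
import math

def adjust_staffing(current_agents, observed_rates, calls_per_agent, alpha=0.2, deadband=2):
    """Smooth observed arrival rates; change staffing only when the gap exceeds the deadband."""
    smoothed = observed_rates[0]
    history = []
    for rate in observed_rates:
        smoothed = alpha * rate + (1 - alpha) * smoothed   # exponential smoothing
        target = math.ceil(smoothed / calls_per_agent)     # agents needed at smoothed rate
        if abs(target - current_agents) > deadband:
            current_agents = target
        history.append(current_agents)
    return history

# One isolated spike (180) followed later by a sustained increase (160 and above).
rates = [100, 104, 180, 98, 101, 160, 165, 170, 168]
history = adjust_staffing(10, rates, calls_per_agent=10)
print(history)
```

In this run the isolated spike does not move staffing, while the sustained rise does, which is the behavior the deadband is meant to produce.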

How do you handle agent heterogeneity, such as varying skill sets or different languages?

In many real-world call centers, different agents specialize in distinct areas, or speak different languages, and not all incoming calls can be handled by every agent. You could build a separate demand forecast for each skill or language category. Then you must allocate the appropriate subset of agents to each forecasted demand. One pitfall is ignoring cross-training capabilities: some agents might handle multiple skill types if needed, but at different efficiency levels. A sophisticated solution includes modeling agent capabilities as a bipartite matching or multi-dimensional optimization problem, where you aim to maximize coverage while minimizing idle time. Additionally, you must consider how quickly agents can switch between tasks. If re-skilling or cross-training is possible, the model should incorporate the cost and time required to train agents in multiple skills.
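As a toy illustration of skill-constrained allocation, the sketch below uses a greedy heuristic: it assigns the least flexible agents first so that multi-skilled agents stay available for whatever is still uncovered. It is not an optimal matching and ignores efficiency differences; the agent names and demand figures are made up.

```python
def allocate_agents(demand, agents):
    """Greedy sketch: least-flexible agents first, each to the open skill with most unmet demand."""
    remaining = dict(demand)
    assignment = {}
    for name, skills in sorted(agents.items(), key=lambda kv: len(kv[1])):
        open_skills = [s for s in skills if remaining.get(s, 0) > 0]
        if not open_skills:
            continue  # no uncovered demand this agent can serve; agent stays idle
        skill = max(open_skills, key=lambda s: remaining[s])
        assignment[name] = skill
        remaining[skill] -= 1
    return assignment, remaining

# Hypothetical demand (agents needed per skill) and agent skill sets.
demand = {"english": 2, "spanish": 1}
agents = {
    "ana": {"english", "spanish"},
    "bo": {"english"},
    "cy": {"english"},
    "di": {"spanish"},
}
assignment, remaining = allocate_agents(demand, agents)
print(assignment)
print(remaining)
```

A production system would more likely pose this as a linear or integer program so that efficiency levels and switching costs can enter the objective.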

In what ways could you integrate queueing theory concepts (e.g., Erlang models) into your approach?

Queueing theory provides analytical formulas for metrics such as average waiting time, probability of waiting, and agent utilization given arrival rates and service rates. You can combine the forecasted demand (arrival rates) with a queueing formula to determine the minimal number of agents needed to achieve a target service level. A common approach is the Erlang C formula, which gives the probability that an arriving call must wait:

$$
P_{\text{wait}} = \frac{\dfrac{(c\rho)^{c}}{c!} \cdot \dfrac{1}{1-\rho}}{\displaystyle\sum_{k=0}^{c-1} \dfrac{(c\rho)^{k}}{k!} + \dfrac{(c\rho)^{c}}{c!} \cdot \dfrac{1}{1-\rho}}
$$

Here, c is the number of agents, and rho is the per-agent traffic intensity (utilization), arrival_rate * average_call_duration / c, which must be below 1 for the queue to be stable. From the probability of waiting you can derive other service metrics, such as the share of calls answered within a target time. A subtle edge case arises when the arrival process deviates from Poisson assumptions or when service times have high variability. Pure Erlang formulas might under- or over-estimate staffing needs if assumptions are violated. An even trickier case is when arrivals are non-stationary (e.g., demand changes over time), so you might need an extended or piecewise version of Erlang models to handle time-varying rates.
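A compact sketch of Erlang C in Python follows, together with a search for the smallest agent count that meets a service-level target. It assumes Poisson arrivals, exponential handle times, no abandonment, and consistent time units (minutes here); the example rates are hypothetical.

```python
import math

def erlang_c(agents, arrival_rate, avg_handle_time):
    """Probability that an arriving call has to wait (M/M/c queue)."""
    load = arrival_rate * avg_handle_time   # offered load in Erlangs
    rho = load / agents                     # per-agent utilization
    if rho >= 1:
        return 1.0                          # unstable queue: treat every call as waiting
    top = load ** agents / math.factorial(agents) / (1 - rho)
    bottom = sum(load ** k / math.factorial(k) for k in range(agents)) + top
    return top / bottom

def min_agents(arrival_rate, avg_handle_time, target_answer_time, service_level):
    """Smallest agent count whose share of calls answered within the target meets the level."""
    load = arrival_rate * avg_handle_time
    agents = max(1, math.floor(load) + 1)   # start just above the offered load
    while True:
        p_wait = erlang_c(agents, arrival_rate, avg_handle_time)
        decay = math.exp(-(agents - load) * target_answer_time / avg_handle_time)
        if 1 - p_wait * decay >= service_level:
            return agents
        agents += 1

# Hypothetical interval: 2 calls/min, 3-minute handle time, 80% answered within 20 seconds.
needed = min_agents(arrival_rate=2.0, avg_handle_time=3.0,
                    target_answer_time=20 / 60, service_level=0.8)
print("Agents needed:", needed)
```

Running this once per forecast interval, with the forecast arrival rate as input, turns a call-volume forecast into a staffing curve.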

How do you ensure your model remains robust to data drift or changes in call patterns over time?

Data drift occurs if call patterns shift due to new customer behaviors, new products, or external factors such as economic changes. One method is to monitor model performance metrics (like MAPE or RMSE) in real time to detect a sudden degradation in forecasting accuracy. A rolling retraining pipeline can be scheduled, or triggered automatically, whenever performance falls below a threshold. A challenge arises if the data drift is extremely abrupt—your model might not have enough recent training examples in the new regime. In that case, you could incorporate transfer learning or domain adaptation techniques, or quickly gather enough new data to retrain from scratch. Another subtlety is that if drift is temporary (e.g., a one-time event), automatically retraining might cause the model to overfit that anomaly.
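A threshold-triggered monitor of this kind can be sketched as below. The `patience` parameter, which requires several consecutive breaches before signalling, is one possible guard against retraining on a one-time anomaly; the class name, window size, and threshold are illustrative.

```python
from collections import deque

class DriftMonitor:
    """Track rolling MAPE and flag when it stays above a threshold."""

    def __init__(self, window=24, threshold=20.0, patience=3):
        self.errors = deque(maxlen=window)  # most recent absolute percentage errors
        self.threshold = threshold
        self.patience = patience
        self.breaches = 0

    def update(self, actual, predicted):
        """Add one observation; return True when retraining looks warranted."""
        if actual != 0:
            self.errors.append(abs(actual - predicted) / actual * 100)
        if not self.errors:
            return False
        rolling_mape = sum(self.errors) / len(self.errors)
        # Count consecutive breaches; any healthy reading resets the counter.
        self.breaches = self.breaches + 1 if rolling_mape > self.threshold else 0
        return self.breaches >= self.patience

monitor = DriftMonitor(window=3, threshold=20.0, patience=2)
signals = [monitor.update(a, p) for a, p in [(100, 95), (100, 60), (100, 50)]]
print(signals)
```

The first large miss alone does not trigger anything; the signal fires only once the rolling error has stayed above the threshold for two updates in a row.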

Can you discuss potential issues around data privacy and compliance when dealing with call center data?

Call center logs often include sensitive customer information, like personal details or payment information. Ensuring compliance with regulations such as GDPR or CCPA requires that you properly anonymize or aggregate data before using it for modeling. If your system needs real-time data, you must be certain that you are not inadvertently streaming personally identifiable information to external services or cloud environments that lack appropriate security measures. Another edge case is if you rely on call transcripts for additional features (like topic classification or sentiment). These transcripts could contain highly sensitive text, so advanced encryption and access controls are critical. If anonymization is not handled carefully, you might inadvertently tie call volume spikes to specific individuals or times, creating privacy risks.

How would you address seasonality that spans multiple time scales (e.g., weekly, monthly, and annual patterns all at once)?

Many call centers have layered seasonality: for instance, a daily pattern (mornings are busy, midday might be slower), a weekly pattern (weekends differ from weekdays), and a yearly pattern (holidays or special events). A standard SARIMA model handles only one seasonal period, so one approach is to use models built for multiple seasonal components, such as TBATS, a regression or ARIMA model with Fourier terms for each period, or Prophet (which has built-in support for daily, weekly, and yearly seasonality). For deep learning, you could incorporate time-based embeddings that capture day-of-week, month-of-year, and holiday indicators. A key challenge is ensuring you have enough historical data to reliably detect these seasonalities, especially if you need to capture annual patterns but only have six months of data. Another subtlety arises if you have partial coverage of certain types of seasonality (e.g., you only have two holiday seasons in your dataset). The model may then overfit or underfit rare seasonal events.
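One common way to expose several seasonal cycles to a model is to encode each one as sine/cosine (Fourier) features. The sketch below assumes hourly data and picks periods of 24, 168, and 8766 hours for the daily, weekly, and yearly cycles; the number of harmonics per cycle is an arbitrary choice that would normally be tuned.

```python
import math

def fourier_terms(t, period, order):
    """Sine/cosine pairs encoding position t within a seasonal cycle of the given period."""
    terms = []
    for k in range(1, order + 1):
        angle = 2 * math.pi * k * t / period
        terms.extend([math.sin(angle), math.cos(angle)])
    return terms

def multi_seasonal_features(hour_index):
    """Combine daily (24h), weekly (168h) and yearly (8766h) terms for hourly data."""
    return (
        fourier_terms(hour_index, 24, 2)
        + fourier_terms(hour_index, 168, 2)
        + fourier_terms(hour_index, 8766, 1)
    )

features = multi_seasonal_features(5)
print(len(features), "features for hour index 5")
```

These columns can serve as exogenous regressors for a linear or ARIMA-type model, or as plain inputs to a tree-based or neural model.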

How would you design a simulation or test environment to validate your staffing strategy before deploying it to a live call center?

A simulation environment can replicate incoming calls using historical data, or a synthetic distribution that approximates real-world conditions, and then apply your staffing algorithm to determine outcomes like average wait times, number of abandoned calls, and agent utilization. You might incorporate a queueing simulator that processes individual call arrivals and departures to capture the complexities of agent scheduling. A pitfall is if your simulation makes unrealistic assumptions about agent behavior—like ignoring how breaks or shift handovers are handled. Another subtlety is if your historical data is incomplete or biased (e.g., containing primarily normal periods and few peak events). Your simulation may show optimistic results that fail in a real crisis (like a sudden system outage or viral campaign).
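A bare-bones queueing simulator of the kind described might look like this. It is a sketch under strong assumptions: synthetic exponential arrivals and handle times, first-come-first-served service, no abandonment, and no breaks or shift changes. The function name and parameters are invented for this example.

```python
import heapq
import random

def simulate_queue(num_agents, arrival_rate, mean_handle_time, num_calls=5000, seed=0):
    """Simulate a first-come-first-served queue; return (average wait, agent utilization)."""
    rng = random.Random(seed)
    free_at = [0.0] * num_agents   # min-heap of times at which each agent becomes free
    clock = 0.0
    total_wait = 0.0
    busy_time = 0.0
    last_end = 0.0
    for _ in range(num_calls):
        clock += rng.expovariate(arrival_rate)         # next call arrives
        agent_free = heapq.heappop(free_at)            # earliest available agent
        start = max(clock, agent_free)                 # call waits if nobody is free
        handle = rng.expovariate(1.0 / mean_handle_time)
        total_wait += start - clock
        busy_time += handle
        end = start + handle
        last_end = max(last_end, end)
        heapq.heappush(free_at, end)
    return total_wait / num_calls, busy_time / (num_agents * last_end)

for n in (7, 8, 9):
    wait, util = simulate_queue(n, arrival_rate=2.0, mean_handle_time=3.0)
    print(n, "agents: average wait", round(wait, 2), "utilization", round(util, 2))
```

Replacing the synthetic arrivals with replayed historical timestamps, and adding abandonment and break schedules, would bring the simulation closer to the pitfalls discussed above.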

How do you manage large fluctuations in call volume that occur across different time zones or different geographical regions?

If you operate multiple call centers across the globe, you may experience peaks in different time zones at overlapping hours. One strategy is to pool resources virtually, allowing agents in off-peak regions to handle some of the load from peak regions. However, this requires analyzing the time zone overlap, language capabilities, and local labor laws or shift rules. A subtle challenge arises if your forecast model for one region is too simplistic and doesn’t account for how calls might get rerouted from a high-demand region. Another pitfall is ignoring cultural or language constraints. Even if agents technically speak a foreign language, service quality might drop if they are not fully fluent or trained for that region’s common call types. A further edge case arises when external events in one region cause sudden spikes, which the global routing system might fail to handle if not configured to dynamically distribute calls.
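To analyze time zone overlap, it helps to put every region's demand on a common clock first. The sketch below shifts each region's 24-hour local profile to UTC and sums them. It assumes whole-hour UTC offsets and ignores daylight saving time; the region names and volumes are made up.

```python
def pooled_demand_utc(regional_profiles, utc_offsets):
    """Shift each region's 24-hour local demand profile to UTC and sum across regions."""
    pooled = [0.0] * 24
    for region, profile in regional_profiles.items():
        offset = utc_offsets[region]  # local time = UTC + offset
        for local_hour, calls in enumerate(profile):
            pooled[(local_hour - offset) % 24] += calls
    return pooled

# Hypothetical regions, each with a single 9 AM local peak.
london = [0] * 24
london[9] = 100
new_york = [0] * 24
new_york[9] = 50

pooled = pooled_demand_utc(
    {"london": london, "new_york": new_york},
    {"london": 0, "new_york": -5},
)
print(pooled)
```

The two local 9 AM peaks land five hours apart in UTC, which is the kind of offset that makes virtual pooling worthwhile, subject to the language and labor constraints noted above.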

How can you incorporate agent-specific performance metrics into your staffing decisions?

While forecasting the volume of calls is one part, understanding how efficient or skilled each agent is can refine scheduling. You might build separate productivity scores based on historical average handling time, call resolution rate, or customer satisfaction for each agent. Then your scheduling algorithm can prioritize assigning more calls to high-performing agents. A pitfall is that over-reliance on high-performing agents can lead to burnout or unfair distribution of workload. Another subtlety is the possibility of biases in performance metrics—for example, an agent always handling more complex calls might appear to have a lower throughput, but they are actually dealing with more challenging customer issues. You must incorporate fairness constraints or skill-level adjustments in your model to avoid punishing agents who take on more difficult tasks.
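One simple skill-level adjustment is to compare each agent's handle time against a baseline for the call types they actually handled, rather than against a global average. The baselines, call types, and durations in this sketch are hypothetical.

```python
def adjusted_handle_ratio(agent_calls, baseline_aht):
    """Ratio of an agent's total handle time to what the call-type baselines predict."""
    actual = sum(duration for _, duration in agent_calls)
    expected = sum(baseline_aht[call_type] for call_type, _ in agent_calls)
    return actual / expected  # below 1.0 means faster than baseline for that call mix

# Hypothetical baseline average handle times in minutes per call type.
baseline_aht = {"billing": 4, "technical": 12}

agent_a = [("technical", 11), ("technical", 12), ("billing", 4)]
agent_b = [("billing", 5), ("billing", 5), ("billing", 4)]

raw_a = sum(d for _, d in agent_a) / len(agent_a)
raw_b = sum(d for _, d in agent_b) / len(agent_b)
ratio_a = adjusted_handle_ratio(agent_a, baseline_aht)
ratio_b = adjusted_handle_ratio(agent_b, baseline_aht)
print("Raw AHT:", raw_a, raw_b)
print("Adjusted ratio:", round(ratio_a, 3), round(ratio_b, 3))
```

On raw handle time agent B looks faster, but after adjusting for call mix agent A is the one beating the baseline, which is exactly the bias described above.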

What happens if the cost structure changes over time, for example, if the price of overtime or contractor agents spikes?

In some companies, the cost of labor changes dynamically—perhaps hiring temporary staff in peak seasons is more expensive than normal wages for full-time agents. You might design a dynamic optimization approach that updates staffing strategies based on real-time cost parameters. One risk is that if cost optimization overrides service-level objectives too aggressively, you could end up consistently understaffed and miss SLAs. Conversely, if you overemphasize service levels, you might lock the company into paying high labor costs. A subtle edge case arises if you have complicated union rules or contractual obligations for minimum staffing levels, which can constrain how you adapt to changing costs. Another subtlety is that an abrupt cost increase might happen mid-forecast horizon, meaning your model’s historical data won’t reflect the new cost reality, so you need a mechanism to incorporate real-time cost updates without waiting for long retraining cycles.
