Existing pre-trained time series foundation models lack scale and efficiency, hindering the development of larger, more capable forecasting models for real-world applications.
TIME-MOE scales time series forecasting to billion-parameter models, improving accuracy while reducing computational costs by activating only the experts each prediction needs.
• 23% average MSE reduction in zero-shot forecasting across 6 benchmarks
-----
📚 https://arxiv.org/pdf/2409.16040
Solution in this Paper 🛠️:
• TIME-MOE: A decoder-only transformer with mixture-of-experts layers
• Point-wise tokenization of input time series
• Multi-resolution forecasting heads for flexible prediction horizons
• Pre-training on Time-300B dataset (over 300 billion time points across 9 domains)
• Models scaled up to 2.4 billion parameters (1.1 billion activated)
• Sparse architecture activates only a subset of expert networks for each prediction (see the sketch below)
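
The bullets above describe the architecture at a high level; here is a minimal PyTorch sketch of the same ideas: point-wise tokens, a decoder-style block whose feed-forward is a sparse mixture-of-experts, and one output head per forecast resolution. All names and sizes here (`d_model`, `num_experts`, `top_k`, the `horizons` tuple) are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a Time-MoE-style forecaster (illustrative; not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoEFeedForward(nn.Module):
    """Feed-forward layer that routes each token to its top-k experts only."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int, top_k: int):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, d_model)
        scores = self.gate(x)                             # (batch, seq, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)    # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., slot] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out


class TimeMoEStyleForecaster(nn.Module):
    """Point-wise tokens -> decoder-style block with MoE FFN -> multi-resolution heads."""

    def __init__(self, d_model=64, n_heads=4, d_ff=128, num_experts=8, top_k=2,
                 horizons=(1, 8, 32)):
        super().__init__()
        self.embed = nn.Linear(1, d_model)                # each time point is one token
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.moe = SparseMoEFeedForward(d_model, d_ff, num_experts, top_k)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        # One head per forecast resolution: head `h` predicts h future points at once.
        self.heads = nn.ModuleDict({str(h): nn.Linear(d_model, h) for h in horizons})

    def forward(self, series: torch.Tensor, horizon: int) -> torch.Tensor:
        # series: (batch, context_length); the causal mask keeps the model decoder-only.
        x = self.embed(series.unsqueeze(-1))
        t = x.size(1)
        causal = torch.triu(torch.full((t, t), float("-inf"), device=x.device), diagonal=1)
        attn_out, _ = self.attn(x, x, x, attn_mask=causal)
        x = self.norm1(x + attn_out)
        x = self.norm2(x + self.moe(x))
        last = x[:, -1]                                   # forecast from the final position
        return self.heads[str(horizon)](last)             # (batch, horizon)


# Usage: a 96-step context, an 8-step forecast.
model = TimeMoEStyleForecaster()
context = torch.randn(4, 96)
print(model(context, horizon=8).shape)  # torch.Size([4, 8])
```

Per token, only `top_k` of the `num_experts` feed-forward experts run; that routing is the mechanism behind the 2.4-billion-total / 1.1-billion-activated parameter split quoted above.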
-----
Key Insights from this Paper 💡:
• Sparse mixture-of-experts architecture enhances computational efficiency while maintaining high model capacity
• Scaling laws apply to time series forecasting, with larger models and more training data improving performance
• Multi-resolution forecasting heads enable flexible prediction horizons (see the sketch after this list)
• Sparsely activated design allows effective scaling without significant increase in inference costs
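
One way fixed-resolution heads translate into arbitrary horizons is a greedy schedule that covers the target length with the largest heads first. The head sizes and the `plan_horizon` helper below are hypothetical; the sketch only illustrates the flexibility claim, not the paper's exact rollout.

```python
# Greedy horizon planning over fixed-resolution heads (hypothetical helper).
def plan_horizon(target: int, head_sizes=(32, 8, 1)) -> list[int]:
    """Return the sequence of head calls whose outputs concatenate to `target` steps."""
    plan = []
    remaining = target
    for size in sorted(head_sizes, reverse=True):
        while remaining >= size:
            plan.append(size)
            remaining -= size
    return plan


print(plan_horizon(45))  # [32, 8, 1, 1, 1, 1, 1] -> 45 future points in 7 head calls
```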
-----
Results 📊:
• Outperforms state-of-the-art models in zero-shot and fine-tuned scenarios
• 25% average MSE reduction in in-distribution forecasting
• Maintains superior performance with reduced computational costs compared to dense models