Image by author | Chronos-2: from univariate to universal forecasting
# Introduction
Foundation models didn't start with ChatGPT. Long before large language models became popular, pre-trained models were already driving advances in computer vision and natural language processing, powering tasks such as segmentation, classification, and text understanding.
If you still rely primarily on classical statistical methods or deep learning models trained on a single dataset, you may be missing a fundamental shift in how forecasting systems are built.
# 1. Chronos-2
Key Features:
- Encoder-only architecture inspired by T5
- Zero-shot probabilistic forecasting with quantile outputs
- Native support for past and known future covariates
- Long context length (up to 8,192 time steps) and forecast horizons up to 1,024
- Efficient, high-throughput inference on both CPU and GPU

Use cases:
- Large-scale forecasting across many related time series
- Covariate-driven forecasting in domains such as demand, energy, and pricing
- Rapid prototyping and production deployment without model training

Best use cases:
- Production forecasting systems
- Research and benchmarking
- Complex multivariate forecasting with covariates
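Chronos-2 produces quantile forecasts rather than a single point estimate. As a minimal, library-free sketch of how such forecasts are typically scored, here is the pinball (quantile) loss in plain Python. The function and variable names are illustrative, not part of the Chronos API:

```python
def pinball_loss(y_true, y_pred, q):
    """Pinball (quantile) loss for a single quantile level q in (0, 1)."""
    losses = []
    for yt, yp in zip(y_true, y_pred):
        diff = yt - yp
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        losses.append(max(q * diff, (q - 1) * diff))
    return sum(losses) / len(losses)

# Example: a 0.9-quantile forecast should sit above the actuals most of the time.
actuals = [10.0, 12.0, 11.0, 13.0]
q90_preds = [12.5, 13.0, 12.0, 14.5]
loss = pinball_loss(actuals, q90_preds, q=0.9)
```

Averaging this loss across several quantile levels (e.g. 0.1 through 0.9) gives the weighted quantile loss commonly reported in zero-shot forecasting benchmarks.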
# 2. TiRex
Key Features:
- Pre-trained model built on the xLSTM architecture
- Zero-shot forecasting without dataset-specific training
- Point forecasts and quantile-based uncertainty estimates
- Strong performance on both short- and long-horizon benchmarks
- Optional CUDA kernels for fast GPU inference

Use cases:
- Zero-shot forecasting on new or unseen time series
- Short- and long-horizon forecasting in finance, energy, and operations
- Rapid benchmarking and deployment without model training
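When benchmarking a zero-shot model such as TiRex, a standard sanity check is to compare it against a seasonal-naive baseline, which simply repeats the last observed seasonal cycle. A stdlib-only sketch of that baseline (no TiRex API involved; names are illustrative):

```python
def seasonal_naive(history, horizon, season_length):
    """Forecast by repeating the last full seasonal cycle of the history."""
    last_cycle = history[-season_length:]
    return [last_cycle[i % season_length] for i in range(horizon)]

def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

# A toy series with a period of 4.
history = [1, 3, 5, 3] * 6            # 24 observations
baseline = seasonal_naive(history, horizon=8, season_length=4)
```

If a pre-trained model cannot beat this baseline's MAE on your data, its zero-shot forecasts are adding little value for that series.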
# 3. TimesFM
Key Features:
- Decoder-only foundation model with a 500M-parameter checkpoint
- Zero-shot univariate time series forecasting
- Context length up to 2,048 time points, with support beyond training limits
- Flexible forecast horizons with optional frequency indicators
- Optimized for fast, large-scale point forecasting

Use cases:
- Large-scale univariate forecasting across diverse datasets
- Long-horizon forecasting for operational and infrastructure data
- Rapid experimentation and benchmarking without model training
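Because TimesFM accepts up to 2,048 context points, longer histories must be truncated to the most recent window before inference. A library-free sketch of that preprocessing step (the 2,048 figure comes from the model description above; the function name is illustrative, not part of the timesfm package):

```python
MAX_CONTEXT = 2048  # TimesFM's documented context limit

def prepare_context(series, max_context=MAX_CONTEXT):
    """Keep only the most recent max_context points for model input."""
    if len(series) <= max_context:
        return list(series)
    return list(series[-max_context:])

# A history far longer than the context window.
long_history = list(range(5000))
context = prepare_context(long_history)
# The window ends at the latest observation and drops the oldest points.
```

Truncating from the end preserves the most recent dynamics, which is what an autoregressive forecaster conditions on.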
# 4. IBM Granite TTM R2
Key Features:
- Tiny pre-trained models starting at 1M parameters
- Efficient multivariate forecasting in zero-shot and few-shot settings
- Purpose-built variants tailored to specific context and forecast lengths
- Fast inference and fine-tuning on a single GPU or CPU
- Support for exogenous variables and static categorical features

Use cases:
- Multivariate forecasting in low-resource or edge environments
- Zero-shot baselines with optional light fine-tuning
- Rapid deployment for operational forecasting with limited data
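TTM's few-shot mode fine-tunes on a small slice of recent history. A minimal, generic sketch of carving a series into a handful of (context, target) training windows plus a final holdout for evaluation — this is a common data-preparation pattern, not the Granite/tsfm API:

```python
def few_shot_split(series, context_length, horizon, shots):
    """Build up to `shots` (context, target) pairs from the tail of a series,
    reserving the final `horizon` points as an untouched holdout."""
    holdout = series[-horizon:]
    train = series[:-horizon]
    pairs = []
    for k in range(shots):
        end = len(train) - k * horizon
        start = end - context_length - horizon
        if start < 0:
            break  # not enough history for another window
        window = train[start:end]
        pairs.append((window[:context_length], window[context_length:]))
    return pairs, holdout

series = list(range(100))
pairs, holdout = few_shot_split(series, context_length=20, horizon=5, shots=3)
```

The holdout is never seen during fine-tuning, so the evaluation on it reflects genuine out-of-sample performance.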
# 5. Toto Open Base 1
Key Features:
- Decoder-only transformer with flexible context and prediction lengths
- Zero-shot forecasting without fine-tuning
- Efficient handling of high-dimensional multivariate data
- Probabilistic forecasts via a Student-t mixture output head
- Pre-trained on over two trillion time series data points

Use cases:
- Forecasting observability and monitoring metrics
- Large-scale systems and infrastructure telemetry
- Zero-shot forecasting for large-scale, non-stationary time series
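Toto's probabilistic head is a Student-t mixture, whose heavy tails suit spiky observability metrics better than a Gaussian. A stdlib-only sketch of drawing samples from such a mixture — the parameters below are invented for illustration, not Toto's learned output:

```python
import random

def sample_student_t(df, loc, scale, rng):
    """One Student-t draw: standard normal / sqrt(chi-square / df), shifted and scaled."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    return loc + scale * z / (chi2 / df) ** 0.5

def sample_mixture(components, n, seed=0):
    """components: list of (weight, df, loc, scale) tuples. Returns n samples."""
    rng = random.Random(seed)
    weights = [w for w, *_ in components]
    samples = []
    for _ in range(n):
        _, df, loc, scale = rng.choices(components, weights=weights)[0]
        samples.append(sample_student_t(df, loc, scale, rng))
    return samples

# Two hypothetical components: a sharp mode near 0 and a heavy-tailed one near 5.
mix = [(0.7, 10.0, 0.0, 1.0), (0.3, 3.0, 5.0, 2.0)]
draws = sample_mixture(mix, n=1000)
```

Quantiles computed over such samples give the prediction intervals that monitoring dashboards typically display.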
# Summary
| Model | Parameters | Architecture | Forecasting type | Key strengths |
|---|---|---|---|---|
| Chronos-2 | 120M | Encoder-only | Univariate, multivariate, probabilistic | High zero-shot accuracy, long context and horizon, high inference throughput |
| TiRex | 35M | xLSTM-based | Univariate, probabilistic | Lightweight model with strong short- and long-horizon performance |
| TimesFM | 500M | Decoder-only | Univariate, point forecasts | Supports long contexts and flexible horizons at scale |
| Granite TimeSeries TTM-R2 | 1M+ (tiny) | Compact pre-trained models | Multivariate, point forecasts | Extremely compact, fast inference, strong zero-shot and few-shot results |
| Toto Open Base 1 | 151M | Decoder-only | Multivariate, probabilistic | Optimized for high-dimensional, non-stationary observability data |
Abid Ali Awan (@1abidaliawan) is a certified data science professional who loves building machine learning models. Currently, he focuses on creating content and writing technical blogs about machine learning and data science technologies. Abid holds a Master’s degree in Technology Management and a Bachelor’s degree in Telecommunications Engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.
