Empowering Time Series AI: How Salesforce is Leveraging Synthetic Data
Introduction
Time series analysis is crucial for many business applications, yet it faces significant challenges in data availability, quality, and diversity. Real-world datasets are often constrained by regulatory restrictions, inherent biases, and insufficient annotations. These obstacles hinder the development of effective Time Series Foundation Models (TSFMs) and large language model-based time series models (TSLLMs), affecting essential tasks such as forecasting, classification, and anomaly detection.
Salesforce’s Innovative Approach
Salesforce AI Research proposes synthetic data as a strategic answer to these challenges. Their recent study examines how synthetic data can strengthen TSFMs and TSLLMs by reducing bias, increasing dataset diversity, and enriching contextual information. This approach is particularly valuable in sensitive sectors such as healthcare and finance, where data sharing is tightly regulated.
Key Methodologies
The study examines several synthetic data generation techniques, each tailored to capture specific time series dynamics such as trends, seasonal patterns, and noise characteristics. Notable examples include:
- ForecastPFN: Combines linear-exponential trends with periodic seasonal components to simulate realistic series.
- TimesFM: Combines piecewise linear trends with autoregressive moving average (ARMA) processes.
- KernelSynth (from Chronos): Samples series from Gaussian Process priors whose kernels are randomly composed, through addition and multiplication, from a bank of basis kernels such as linear, periodic, and RBF kernels.
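The three generator families above can be sketched in a few dozen lines of standard-library Python. This is a minimal illustration under stated assumptions: the function names, parameter ranges, Gaussian multiplicative noise, and the three-kernel bank are hypothetical simplifications chosen for readability, not the published ForecastPFN, TimesFM, or KernelSynth settings (KernelSynth's own kernel bank, for instance, is larger).

```python
import math
import random

# Simplified, hypothetical sketches of the three generator families.
# All parameter ranges and noise choices are illustrative assumptions,
# not the settings used by ForecastPFN, TimesFM, or Chronos/KernelSynth.


def forecastpfn_style(n, rng):
    """Linear-exponential trend times periodic seasonality, with multiplicative noise."""
    slope = rng.uniform(0.0, 0.01)        # linear growth per step
    growth = rng.uniform(0.995, 1.005)    # exponential growth factor per step
    period = rng.choice([7, 30, 365])     # weekly, monthly, or yearly cycle
    amp = rng.uniform(0.0, 0.5)           # relative seasonal amplitude
    series = []
    for t in range(n):
        trend = (1 + slope * t) * growth ** t
        season = 1 + amp * math.sin(2 * math.pi * t / period)
        noise = 1 + rng.gauss(0.0, 0.05)  # assumed Gaussian noise centred on 1
        series.append(trend * season * noise)
    return series


def timesfm_style(n, rng):
    """Piecewise linear trend (two random change points) plus an ARMA(1, 1) process."""
    knots = sorted(rng.sample(range(1, n), 2))
    slopes = [rng.uniform(-0.5, 0.5) for _ in range(3)]
    phi, theta = rng.uniform(-0.8, 0.8), rng.uniform(-0.8, 0.8)
    level, prev_x, prev_eps = 0.0, 0.0, 0.0
    series = []
    for t in range(n):
        level += slopes[sum(t >= k for k in knots)]  # slope of the active segment
        eps = rng.gauss(0.0, 1.0)
        x = phi * prev_x + eps + theta * prev_eps    # x_t = phi*x_{t-1} + e_t + theta*e_{t-1}
        prev_x, prev_eps = x, eps
        series.append(level + x)
    return series


def cholesky(a):
    """Lower-triangular L with L @ L.T == a, for a symmetric positive-definite matrix a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(max(a[i][i] - s, 1e-12))
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def kernelsynth_style(n, rng):
    """Sample from a GP prior whose kernel is a random sum/product of basis kernels."""
    period, length = rng.choice([12.0, 24.0]), rng.uniform(5.0, 30.0)
    bank = [
        lambda a, b: a * b / 100.0,                                                # linear
        lambda a, b: math.exp(-((a - b) ** 2) / (2 * length ** 2)),                # RBF
        lambda a, b: math.exp(-2 * math.sin(math.pi * abs(a - b) / period) ** 2),  # periodic
    ]
    kernel = rng.choice(bank)
    for _ in range(rng.randint(0, 2)):  # up to two extra random compositions
        base, other = kernel, rng.choice(bank)
        if rng.random() < 0.5:
            kernel = lambda a, b, f=base, g=other: f(a, b) + g(a, b)
        else:
            kernel = lambda a, b, f=base, g=other: f(a, b) * g(a, b)
    # Covariance over t = 0..n-1; the small diagonal term acts as white noise
    # and keeps the Cholesky factorization numerically stable.
    cov = [[kernel(i, j) + (1e-3 if i == j else 0.0) for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(42)
    for generate in (forecastpfn_style, timesfm_style, kernelsynth_style):
        series = generate(64, rng)
        print(generate.__name__, [round(v, 2) for v in series[:5]])
```

Sums and products of valid kernels remain valid covariance functions, which is why random composition is a cheap way to obtain diverse GP priors. Keeping the ARMA coefficients inside (-0.8, 0.8) keeps the autoregressive part stationary, so the piecewise trend rather than the noise process dominates long-run behaviour.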
Case Studies and Findings
The research findings indicate substantial advantages of using synthetic data throughout the model development lifecycle:
- In pretraining, ForecastPFN, which is trained entirely on synthetic data, achieved strong zero-shot forecasting performance.
- Chronos found that performance peaked when roughly 10% synthetic data was mixed with real-world data; larger synthetic proportions tended to degrade performance.
- Synthetic data also enabled controlled evaluation of model capabilities, letting researchers probe internal representations and identify gaps in learned patterns.
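Two of these findings can be illustrated with a small, hypothetical sketch. `mixed_batches` draws each training series from a synthetic pool with probability `synth_ratio`; a value of 0.1 echoes the roughly 10% mix reported for Chronos, but the per-item sampling scheme is an assumption. The remaining functions show the idea behind controlled evaluation: plant a known property (here, a seasonal period) in synthetic probe series and check whether a model recovers it. A simple DFT-based period estimator stands in for a real model, and all names are illustrative.

```python
import math
import random


def mixed_batches(real, synthetic, batch_size, synth_ratio, rng):
    """Yield batches in which each item is synthetic with probability synth_ratio (assumed scheme)."""
    while True:
        yield [
            rng.choice(synthetic) if rng.random() < synth_ratio else rng.choice(real)
            for _ in range(batch_size)
        ]


def make_probe(n, period, noise, rng):
    """Synthetic probe series with a known (planted) seasonal period plus Gaussian noise."""
    return [math.sin(2 * math.pi * t / period) + rng.gauss(0.0, noise) for t in range(n)]


def dominant_period(series):
    """Stand-in 'model': period of the strongest DFT frequency, excluding the mean term."""
    n = len(series)

    def power(k):
        re = sum(v * math.cos(2 * math.pi * k * t / n) for t, v in enumerate(series))
        im = sum(v * math.sin(2 * math.pi * k * t / n) for t, v in enumerate(series))
        return re * re + im * im

    best_k = max(range(1, n // 2 + 1), key=power)
    return n / best_k


if __name__ == "__main__":
    rng = random.Random(0)
    batch = next(mixed_batches(["real"], ["synthetic"], 1000, 0.1, rng))
    print("synthetic share:", batch.count("synthetic") / len(batch))
    for period in (8, 12, 24):
        estimate = dominant_period(make_probe(240, period, 0.3, rng))
        print(f"planted period {period}, recovered {estimate:.1f}")
```

In a real evaluation, the stand-in estimator would be replaced by the model under test; a mismatch between the planted and recovered property points to a gap in what the model has learned.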
Addressing Limitations and Future Directions
Despite the promising results, the paper identifies limitations in the current use of synthetic data. Key areas for improvement include:
- The need for structured frameworks to systematically integrate synthetic datasets into existing models.
- Exploration of data-driven generative techniques to enhance the realism of synthetic datasets.
- Leveraging synthetic data during fine-tuning phases to address domain-specific gaps more effectively.
Conclusion
Salesforce AI Research highlights that synthetic data is a powerful tool for overcoming data-related challenges in time series analysis. By integrating high-quality synthetic datasets throughout the model development process, TSFMs and TSLLMs can achieve improved generalization, reduced biases, and enhanced performance across various analytical tasks. Future research should focus on enhancing data realism, systematically addressing data gaps, and utilizing iterative synthetic data generation processes. These advancements have the potential to significantly broaden the applicability and reliability of time series models, paving the way for future innovations in artificial intelligence.
Next Steps for Businesses
To leverage AI effectively in your organization, consider the following steps:
- Identify processes that can be automated using AI to enhance efficiency.
- Determine key performance indicators (KPIs) to measure the impact of your AI initiatives.
- Select customizable tools that align with your business objectives.
- Start with small projects to test effectiveness before scaling up.