Exploring various metrics for synthetic time series evaluation with hands-on code examples
This blog post is available as a Jupyter notebook on GitHub and is part of TSGM, a library for time series generative modeling.
Introduction
Today, we will discuss the evaluation of synthetic time series datasets, that is, datasets artificially created to represent real data. Suppose there is a synthetic dataset D* that aims to represent a real dataset D. It is essential to evaluate quantitatively how good these synthetic data are: Does D* represent D well? Are these data secure? Are these data valuable for downstream problems? In this tutorial, we'll walk through the methods used to assess the quality of synthetic time series data, both quantitatively and qualitatively.
An example of original and synthetic sine data.
Use Cases of Synthetic Data
First, let’s consider two cases from [1] that describe possible usages of synthetic data:
- An organization wishes to employ an outside agent to analyze sensitive data or study statistical methods for a given problem. Sharing real data can be complicated due to privacy or commercial concerns. Synthetic counterparts can provide a convenient solution to this problem.
- An organization wishes to train a model on a relatively small dataset. However, the dataset is too small to reach the desired modeling quality. Such limited datasets can be augmented with synthetic data. This synthetic data, which must be similar to the real data, aims to enhance the model’s performance or, in other cases, assist in model reliability tests.
Overall, this tutorial covers the following metrics:
- Real data similarity (Scenarios 1 and 2)
  - Distance metric
  - Discriminative metric
  - Maximum mean discrepancy score
- Predictive consistency (Scenario 1)
- Downstream effectiveness (Scenario 2)
- Privacy (Scenario 1)
- Diversity (Scenarios 1 and 2)
- Fairness (Scenarios 1 and 2)
- Visual comparison (Scenarios 1 and 2)
In TSGM, all metrics are neatly organized in tsgm.metrics. Dive into the details with our comprehensive documentation.
Generating Synthetic Data
Now, let’s kickstart coding examples by installing tsgm:
```bash
pip install tsgm
```
Moving forward, we import tsgm and load an exemplary dataset. A tensor Xr will contain 100 time series that are either sines or constants, depending on the target class yr. We will use (Xr, yr) as the real (= historical = original) dataset. Xs contains synthetic data generated by a variational autoencoder. (Note: we use only one training epoch for demonstration; increase the number of epochs and check training convergence for practical applications.)
The code below imports the necessary libraries and initializes the parameters for the dataset generation process.
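The original notebook builds (Xr, yr) with TSGM utilities and trains a conditional VAE to produce Xs; since those listings are not reproduced here, the following is a minimal NumPy sketch under the same assumptions (100 series of length 64, sine vs. constant classes). The stand-in Xs is simply a perturbed copy of Xr so that the later snippets run; in practice Xs would come from the trained generative model.

```python
import numpy as np

# Hypothetical stand-in for the TSGM dataset + VAE listings in the original notebook.
n, seq_len = 100, 64
rng = np.random.default_rng(42)

t = np.linspace(0, 2 * np.pi, seq_len)
freqs = rng.uniform(0.5, 2.0, size=(n // 2, 1))
sines = np.sin(freqs * t[None, :]) + 0.1 * rng.normal(size=(n // 2, seq_len))
consts = rng.uniform(-1, 1, size=(n // 2, 1)) * np.ones((1, seq_len)) \
         + 0.1 * rng.normal(size=(n // 2, seq_len))

Xr = np.concatenate([sines, consts])[..., None]           # real data, shape (100, 64, 1)
yr = np.concatenate([np.ones(n // 2), np.zeros(n // 2)])   # class: 1 = sine, 0 = constant

# In the post, Xs is generated by a variational autoencoder trained on (Xr, yr).
# Here we use a perturbed copy purely so that the metric snippets below run.
Xs = Xr + 0.2 * rng.normal(size=Xr.shape)
ys = yr.copy()  # labels for the synthetic samples (hypothetical stand-in)
```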
Similarity with Real Data
Starting off, it is convenient to measure the similarity between real and synthetic data. One approach is to calculate the distance between a vector of summary statistics computed on the synthetic data and the same vector computed on the real data.
The smaller the distance, the closer the synthetic data are to the real data. Now, let’s define a set of statistics that will serve as the foundation for our distance metric. The methods tsgm.metrics.statistics.axis_*_s calculate the statistic * over the provided axis.
Moving forward, let us establish the distance metric. For simplicity’s sake, we will opt for the Euclidean norm.
Bringing it all together, we will utilize the tsgm.metrics.DistanceMetric object.
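TSGM wraps this computation in tsgm.metrics.DistanceMetric (see the library documentation for the exact constructor arguments, which are not reproduced here). As an illustration, here is a minimal NumPy sketch of the same idea, reusing Xr and Xs from the snippet above; the summary_vector helper is a hypothetical stand-in for the tsgm.metrics.statistics.axis_*_s functions.

```python
import numpy as np

def summary_vector(x: np.ndarray) -> np.ndarray:
    """Concatenate axis-wise summary statistics of a dataset shaped (n, seq_len, feats).

    Hypothetical stand-in for the tsgm.metrics.statistics.axis_*_s helpers.
    """
    return np.concatenate([
        x.max(axis=(0, 1)),   # per-feature maximum
        x.min(axis=(0, 1)),   # per-feature minimum
        x.mean(axis=(0, 1)),  # per-feature mean
        x.std(axis=(0, 1)),   # per-feature standard deviation
    ])

# Euclidean norm between the two statistic vectors: smaller means more similar.
distance = np.linalg.norm(summary_vector(Xr) - summary_vector(Xs))
print(f"Summary-statistics distance: {distance:.4f}")
```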
MMD Metric
An alternative approach involves comparing the distributions of synthetic and real data. In this context, the Maximum Mean Discrepancy (MMD) [3] proves to be convenient. MMD serves as a non-parametric two-sample test to determine whether two samples are drawn from the same distribution. Through empirical observations, we have found the MMD metric to be a particularly convenient method for assessing the similarity between synthetic and real data.
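Below is a self-contained sketch of an MMD estimate with an RBF (exponentiated quadratic) kernel, in the spirit of [3]. TSGM ships its own MMD-based metric, so treat this as an illustration rather than the library's implementation; the gamma bandwidth is an arbitrary choice here.

```python
import numpy as np

def rbf_kernel(a: np.ndarray, b: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """RBF kernel matrix between two sets of flattened series."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def mmd2(x: np.ndarray, y: np.ndarray, gamma: float = 1.0) -> float:
    """Biased estimate of squared Maximum Mean Discrepancy between two samples."""
    x, y = x.reshape(len(x), -1), y.reshape(len(y), -1)
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2 * rbf_kernel(x, y, gamma).mean())

print(f"MMD^2(real, synthetic) ~ {mmd2(Xr, Xs):.6f}")  # closer to 0 = more similar
```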
Discriminative Metric
In this approach, a model is trained to distinguish between real and synthetic data. In TSGM, tsgm.metrics.DiscriminativeMetric proves to be a valuable tool for this purpose. This metric facilitates the assessment of how effectively a model can discriminate between real and synthetic datasets, providing an additional perspective on data similarity.
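A minimal stand-in for this idea using scikit-learn: label real series 0 and synthetic series 1, train a classifier, and inspect held-out accuracy. Accuracy close to 0.5 means the classifier cannot tell the datasets apart; tsgm.metrics.DiscriminativeMetric plays the same role with a user-supplied model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Label real samples 0 and synthetic samples 1, then train a real-vs-synthetic classifier.
X_all = np.concatenate([Xr, Xs]).reshape(len(Xr) + len(Xs), -1)
y_all = np.concatenate([np.zeros(len(Xr)), np.ones(len(Xs))])

X_train, X_test, y_train, y_test = train_test_split(
    X_all, y_all, test_size=0.2, random_state=42, stratify=y_all
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
print(f"Discriminative accuracy: {acc:.3f}  (0.5 = indistinguishable)")
```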
Consistency Metric
Next, we move on to the consistency metric. The idea is aligned with Scenario 1 described above. Here, our focus is on gauging the consistency of a set of downstream models. In more detail, let us consider a set of models ℳ and an evaluator E: ℳ × 𝒟 → ℝ.
To evaluate the consistency of ℳ on D and D*, we measure p(m₁ ∼ m₂ | m₁, m₂ ∈ ℳ, D, D*), where m₁ ∼ m₂ means that m₁ is consistent with m₂: if m₁ outperforms m₂ when the models are trained on D, then it also outperforms m₂ when they are trained on D*, and vice versa. Estimating this probability involves fixing a finite set ℳ, evaluating the models on the real data, and separately evaluating them on the synthetic data.
In TSGM, our first step is to define a set of evaluators. For this purpose, we’ll leverage a collection of LSTM models, ranging from one to three LSTM blocks.
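The post uses LSTM models with one to three blocks as ℳ; to keep the sketch light and self-contained, the version below substitutes three scikit-learn classifiers as a hypothetical model set and uses cross-validated accuracy as the evaluator E. The consistency estimate, the fraction of model pairs whose ranking agrees on D and D*, follows the definition above.

```python
import itertools
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Stand-in model set M (the post uses LSTMs with 1-3 blocks).
models = {
    "logreg": lambda: LogisticRegression(max_iter=1000),
    "knn": lambda: KNeighborsClassifier(),
    "forest": lambda: RandomForestClassifier(random_state=42),
}

def evaluate_all(X, y):
    """Evaluator E: mean cross-validated accuracy of each model on a dataset."""
    X_flat = X.reshape(len(X), -1)
    return {name: cross_val_score(make(), X_flat, y, cv=3).mean()
            for name, make in models.items()}

scores_real = evaluate_all(Xr, yr)   # performance on the real dataset D
scores_synth = evaluate_all(Xs, ys)  # performance on the synthetic dataset D*

# Fraction of model pairs whose ordering agrees on D and D*.
pairs = list(itertools.combinations(models, 2))
consistent = sum(
    (scores_real[a] - scores_real[b]) * (scores_synth[a] - scores_synth[b]) >= 0
    for a, b in pairs
)
print(f"Consistency: {consistent / len(pairs):.2f}")
```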
Downstream Performance
Now, let’s explore how generated data can contribute to improving predictive performance in a specific downstream problem. We’ll consider two distinct approaches to evaluating downstream performance:
- Augmenting Real Data with Synthetic Data.
- Utilizing Generated Data Exclusively for Downstream Model Training.
The resulting metric reports the accuracy gain from augmenting the training data with synthetic data, compared to a model trained exclusively on the real training data.
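A hedged sketch of the first approach (augmentation): hold out part of the real data as a test set, train one classifier on the remaining real data and another on real plus synthetic data, and report the accuracy difference. The classifier choice is arbitrary; TSGM provides a corresponding metric class in tsgm.metrics (see the documentation).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hold out part of the real data as the downstream test set.
Xr_flat = Xr.reshape(len(Xr), -1)
X_train, X_test, y_train, y_test = train_test_split(
    Xr_flat, yr, test_size=0.3, random_state=0, stratify=yr
)

# Model trained on real data only.
base = RandomForestClassifier(random_state=0).fit(X_train, y_train)
acc_real = accuracy_score(y_test, base.predict(X_test))

# Model trained on real data augmented with synthetic data.
X_aug = np.concatenate([X_train, Xs.reshape(len(Xs), -1)])
y_aug = np.concatenate([y_train, ys])
aug = RandomForestClassifier(random_state=0).fit(X_aug, y_aug)
acc_aug = accuracy_score(y_test, aug.predict(X_test))

print(f"Accuracy gain from augmentation: {acc_aug - acc_real:+.3f}")
```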
Privacy: Membership Inference Attack Metric
One approach to measuring the privacy of synthetic data is to measure its susceptibility to membership inference attacks. The idea of the attack is the following: imagine an attacker who has access to the synthetic data and to a particular data sample (which may or may not exist in the original dataset). The goal is to infer whether this sample is present in the real data.
tsgm.metrics.PrivacyMembershipInferenceMetric measures the susceptibility to membership inference attacks using synthetic data.
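The class above wraps the full procedure; as an illustration of the attack idea, here is a hypothetical nearest-neighbor attacker: samples whose closest synthetic neighbor is unusually near are predicted to be members of the training data, and the attacker's accuracy (0.5 being chance, i.e., best for privacy) quantifies the leak. In this sketch the member/non-member split is artificial, because our stand-in Xs was derived from all of Xr.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Split real data into "members" (assumed used to train the generator) and "non-members".
members, non_members = train_test_split(Xr.reshape(len(Xr), -1),
                                        test_size=0.5, random_state=1)
Xs_flat = Xs.reshape(len(Xs), -1)

def nn_distance(samples: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Distance from each sample to its nearest neighbor in the reference set."""
    d = np.linalg.norm(samples[:, None, :] - reference[None, :, :], axis=-1)
    return d.min(axis=1)

# Attack: predict "member" if the nearest synthetic point is closer than a threshold.
d_members = nn_distance(members, Xs_flat)
d_non = nn_distance(non_members, Xs_flat)
threshold = np.median(np.concatenate([d_members, d_non]))

attack_acc = 0.5 * ((d_members < threshold).mean() + (d_non >= threshold).mean())
print(f"Membership inference attack accuracy: {attack_acc:.3f} (0.5 = private)")
```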
Diversity
With this metric, our goal is to quantify the diversity of synthetic data. Imagine comparing two candidate synthetic datasets against the real data: in one, the synthetic samples cluster around a small region of the real data; in the other, they spread across it. Which option yields a superior synthetic dataset? The more spread-out one appears more favorable, but why? The answer lies in its diversity, making it potentially more versatile and useful. However, diversity alone is not sufficient; it’s crucial to consider other metrics in tandem, such as distance or downstream performance. In our exploration, we’ll exemplify the concept using entropy.
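One hedged way to exemplify entropy-based diversity: histogram all values of each dataset and compare the Shannon entropies; higher entropy for the synthetic data suggests it covers a wider range of behaviors. This is only one of many possible diversity estimators, and the bin count is an arbitrary choice.

```python
import numpy as np

def value_entropy(x: np.ndarray, bins: int = 30) -> float:
    """Shannon entropy of the histogram of all values in a dataset."""
    counts, _ = np.histogram(x.ravel(), bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

print(f"Entropy (real):      {value_entropy(Xr):.3f}")
print(f"Entropy (synthetic): {value_entropy(Xs):.3f}")
```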
Fairness
The topic of fairness intersects with synthetic time series generation in two significant ways. Firstly, it’s crucial to assess whether synthetic data introduces new biases. Secondly, synthetic data presents an opportunity to mitigate biases inherent in the original data. Defining standardized procedures for checking fairness proves challenging, as it often hinges on the specifics of downstream problems.
Qualitative Analysis
In order to evaluate the data qualitatively, it is convenient to:
- Draw samples and visualize individual samples from synthetic and real data.
- Build embeddings of the generated samples and visualize them using, for instance, t-SNE (see the sketch below).
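A minimal sketch of both steps with matplotlib and scikit-learn's t-SNE, reusing Xr and Xs from earlier; the perplexity value is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# 1) Plot a few individual real and synthetic series side by side.
fig, axes = plt.subplots(1, 2, figsize=(10, 3))
for i in range(3):
    axes[0].plot(Xr[i, :, 0])
    axes[1].plot(Xs[i, :, 0])
axes[0].set_title("Real samples")
axes[1].set_title("Synthetic samples")

# 2) Embed flattened series with t-SNE and compare the two point clouds.
X_all = np.concatenate([Xr, Xs]).reshape(len(Xr) + len(Xs), -1)
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_all)

plt.figure(figsize=(4, 4))
plt.scatter(emb[:len(Xr), 0], emb[:len(Xr), 1], label="real", alpha=0.6)
plt.scatter(emb[len(Xr):, 0], emb[len(Xr):, 1], label="synthetic", alpha=0.6)
plt.legend()
plt.show()
```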
Citation
This blog post is a part of the project TSGM, in which we are creating a tool for enhancing time series pipelines via augmentation and synthetic data generation.
Conclusion
In conclusion, we’ve explored various evaluation techniques for synthetic time series data, providing a comprehensive overview of different scenarios. To navigate through these methods effectively, it’s beneficial to consider the described scenarios. Ultimately, selecting the right metric is contingent on the downstream problem, application area, and legal regulations governing the data in use. The diverse set of metrics provided aims to assist in crafting a comprehensive evaluation pipeline tailored to your specific problem.
References
[1] Nikitin, A., Iannucci, L. and Kaski, S., 2023. TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series. arXiv preprint arXiv:2305.11567.
[2] Time Series Augmentations. Towards Data Science post, https://medium.com/towards-data-science/time-series-augmentations-16237134b29b.
[3] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B. and Smola, A., 2012. A kernel two-sample test. The Journal of Machine Learning Research, 13(1), pp.723–773.
[4] Wen, Q., Sun, L., Yang, F., Song, X., Gao, J., Wang, X. and Xu, H., 2020. Time series data augmentation for deep learning: A survey. arXiv preprint arXiv:2002.12478.
[5] Wattenberg, M., Viégas, F. and Hardt, M., 2016. Attacking discrimination with smarter machine learning. Google Research, 17.
[6] Machine Learning Glossary: Fairness. Google Developers Blog.
Unless otherwise noted, all images are by the author. For additional materials on synthetic time series generation, see TSGM on GitHub, and subscribe to Medium posts.