Time Series Generation With Masked Autoencoder

This paper shows that a masked autoencoder with an extrapolator (ExtraMAE) is a scalable self-supervised model for time series generation. ExtraMAE randomly masks patches of the original time series and learns the temporal dynamics by recovering the masked patches. Our approach has two core designs. First, ExtraMAE is self-supervised; this supervision allows it to capture the temporal dynamics of the original series effectively and efficiently. Second, ExtraMAE introduces an extrapolator that disentangles the two jobs of the decoder: recovering the latent representations of the masked patches and mapping them back into feature space. These designs enable ExtraMAE to consistently and significantly outperform state-of-the-art (SoTA) benchmarks in time series generation, and its lightweight architecture makes it fast and scalable. ExtraMAE also performs well on downstream tasks such as time series classification, prediction, and imputation, and, as a self-supervised generative model, it allows explicit management of the synthetic data. We hope this paper will usher in a new era of time series generation with self-supervised models.
Datasets: Google stock data; sinusoidal sequences; UCI Energy data; Wafer; Italy power demand; Strawberry
Metrics: t-SNE and PCA visualization; predictive score; classification score
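
The core masking-and-reconstruction idea can be sketched compactly. The snippet below is not the authors' code; the patch length, GRU widths, mask ratio, and masked-only MSE loss are illustrative assumptions.

```python
# Minimal sketch of masked-patch reconstruction for time series (PyTorch).
# All hyperparameters (patch length, hidden size, mask ratio) are illustrative.
import torch
import torch.nn as nn

class MaskedTSAutoencoder(nn.Module):
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.encoder = nn.GRU(n_features, hidden, batch_first=True)
        self.extrapolator = nn.GRU(hidden, hidden, batch_first=True)  # recovers latents at masked positions
        self.decoder = nn.Linear(hidden, n_features)                  # maps latents back to feature space

    def forward(self, x, mask):
        # x: (batch, time, features); mask: (batch, time) with 1 = masked
        x_visible = x * (1 - mask).unsqueeze(-1)       # zero out masked patches
        z, _ = self.encoder(x_visible)
        z_full, _ = self.extrapolator(z)               # fill in latent representations for masked steps
        return self.decoder(z_full)

def random_patch_mask(batch, time, patch_len=8, ratio=0.4):
    n_patches = time // patch_len
    mask = (torch.rand(batch, n_patches) < ratio).float()
    return mask.repeat_interleave(patch_len, dim=1)[:, :time]

x = torch.randn(32, 96, 5)                             # toy multivariate series
mask = random_patch_mask(32, 96)
model = MaskedTSAutoencoder(n_features=5)
recon = model(x, mask)
# Reconstruction loss computed on masked patches only.
loss = ((recon - x) ** 2 * mask.unsqueeze(-1)).sum() / (mask.sum() * x.size(-1)).clamp(min=1)
loss.backward()
```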


PSA-GAN: Progressive Self Attention GANs For Synthetic Time Series

Realistic synthetic time series data of sufficient length enables practical applications in time series modeling tasks, such as forecasting, but remains a challenge to produce. In this paper we present PSA-GAN, a generative adversarial network (GAN) that generates long, high-quality time series samples using progressive growing of GANs and self-attention. We show that PSA-GAN can be used to reduce the error in two downstream forecasting tasks over baselines that only use real data. We also introduce a Fréchet Inception Distance-like score, Context-FID, that assesses the quality of synthetic time series samples. In our downstream tasks, we find that the lowest-scoring models correspond to the best-performing ones. Context-FID could therefore be a useful tool for developing time series GAN models.
Datasets: The M4 Competition; Solar (hourly solar energy collection in Alabama); Electricity (hourly electricity consumption); Traffic (hourly occupancy rate of San Francisco lanes)
Metrics: Context-FID score; far-forecasting; missing value stretches
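
Context-FID depends on a learned time series embedding network, which is assumed given here. With such an encoder in hand, the Fréchet-distance part of the score can be sketched as below; this is a minimal sketch, not the reference implementation.

```python
# Sketch of a Frechet-distance-style score between real and synthetic
# time series embeddings. The embedding model is assumed given; any function
# mapping a window to a fixed-size vector would play its role here.
import numpy as np
from scipy import linalg

def frechet_distance(emb_real, emb_synth):
    # emb_*: (n_samples, dim) arrays of embeddings
    mu_r, mu_s = emb_real.mean(axis=0), emb_synth.mean(axis=0)
    cov_r = np.cov(emb_real, rowvar=False)
    cov_s = np.cov(emb_synth, rowvar=False)
    covmean, _ = linalg.sqrtm(cov_r @ cov_s, disp=False)
    covmean = covmean.real                      # discard tiny imaginary parts
    diff = mu_r - mu_s
    return diff @ diff + np.trace(cov_r + cov_s - 2.0 * covmean)

rng = np.random.default_rng(0)
emb_real = rng.normal(size=(500, 32))           # stand-ins for encoder outputs
emb_synth = rng.normal(loc=0.1, size=(500, 32))
print(frechet_distance(emb_real, emb_synth))
```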


TimeVAE: A Variational Auto-Encoder For Multivariate Time Series Generation

Recent work on synthetic data generation in the time-series domain has focused on Generative Adversarial Networks. We propose a novel architecture for synthetically generating time-series data using Variational Auto-Encoders (VAEs). The proposed architecture has several distinct properties: interpretability, the ability to encode domain knowledge, and reduced training times. We evaluate data generation quality by similarity and predictability on four multivariate datasets. We experiment with varying sizes of training data to measure the impact of data availability on generation quality, for our VAE method as well as several state-of-the-art data generation methods. Our results on similarity tests show that the VAE approach is able to accurately represent the temporal attributes of the original data. On next-step prediction tasks using generated data, the proposed VAE architecture consistently meets or exceeds the performance of state-of-the-art data generation methods. While noise reduction may cause the generated data to deviate from the original data, we demonstrate that the resulting de-noised data can significantly improve performance on next-step prediction. Finally, the proposed architecture can incorporate domain-specific time patterns such as polynomial trends and seasonalities to provide interpretable outputs. Such interpretability can be highly advantageous in applications requiring transparency of model outputs or where users wish to inject prior knowledge of time-series patterns into the generative model.
Datasets: Sinusoidal sequences; Yahoo stock data; UCI Energy and Air data
Metrics: t-SNE and PCA visualization; discriminative score; predictive score
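
A simplified sketch of the interpretable-decoder idea follows: the output is composed from an explicit polynomial-trend branch, a Fourier seasonality branch, and an unstructured residual. Basis sizes, layer widths, and the residual head are illustrative assumptions, not the TimeVAE reference architecture.

```python
# Sketch of an interpretable VAE decoder that composes the output from an
# explicit polynomial trend, a Fourier seasonality term, and a residual
# branch. Basis sizes and layer widths are illustrative assumptions.
import torch
import torch.nn as nn

class InterpretableDecoder(nn.Module):
    def __init__(self, latent_dim, seq_len, n_features, poly_degree=2, n_harmonics=2):
        super().__init__()
        t = torch.linspace(0, 1, seq_len)
        # Fixed basis functions evaluated on the time grid.
        poly = torch.stack([t ** d for d in range(poly_degree + 1)], dim=1)
        four = torch.cat([torch.sin(2 * torch.pi * k * t).unsqueeze(1) for k in range(1, n_harmonics + 1)]
                         + [torch.cos(2 * torch.pi * k * t).unsqueeze(1) for k in range(1, n_harmonics + 1)], dim=1)
        self.register_buffer("basis", torch.cat([poly, four], dim=1))       # (T, n_basis)
        n_basis = self.basis.shape[1]
        self.coef = nn.Linear(latent_dim, n_basis * n_features)             # latent -> basis coefficients
        self.residual = nn.Linear(latent_dim, seq_len * n_features)         # unstructured remainder
        self.seq_len, self.n_features = seq_len, n_features

    def forward(self, z):
        bsz = z.shape[0]
        coef = self.coef(z).view(bsz, -1, self.n_features)                  # (batch, n_basis, features)
        structured = torch.einsum("tb,nbf->ntf", self.basis, coef)          # (batch, T, features)
        return structured + self.residual(z).view(bsz, self.seq_len, self.n_features)

dec = InterpretableDecoder(latent_dim=8, seq_len=48, n_features=3)
print(dec(torch.randn(16, 8)).shape)   # torch.Size([16, 48, 3])
```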


Time-Series Generation By Contrastive Imitation

Consider learning a generative model for time-series data. The sequential setting poses a unique challenge: Not only should the generator capture the conditional dynamics of (stepwise) transitions, but its open-loop rollouts should also preserve the joint distribution of (multi-step) trajectories. On one hand, autoregressive models trained by MLE allow learning and computing explicit transition distributions, but suffer from compounding error during rollouts. On the other hand, adversarial models based on GAN training alleviate such exposure bias, but transitions are implicit and hard to assess. In this work, we study a generative framework that seeks to combine the strengths of both: Motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy, where the reinforcement signal is provided by a global (but stepwise-decomposable) energy model trained by contrastive estimation. At training, the two components are learned cooperatively, avoiding the instabilities typical of adversarial objectives. At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality. By expressly training a policy to imitate sequential behavior of time-series features in a dataset, this approach embodies “generation by imitation”. Theoretically, we illustrate the correctness of this formulation and the consistency of the algorithm. Empirically, we evaluate its ability to generate predictively useful samples from real-world datasets, verifying that it performs at the standard of existing benchmarks.
Datasets: Multivariate sinusoidal sequences; UCI Energy, Gas, and Metro datasets; Medical Information Mart for Intensive Care (MIMIC-III) database
Metrics: Predictive score; Train-on-Synthetic, Test-on-Real evaluation
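
The cooperative policy/energy training loop can only be sketched very roughly. The toy version below trains an energy network by contrastive (logistic) estimation on transitions and updates an autoregressive policy to lower the energy of its rollouts; the architectures, the stepwise energy, and the direct gradient through the rollout are simplifying assumptions that omit the paper's forward-looking policy and theoretical machinery.

```python
# Highly simplified sketch of cooperative "policy + energy" training:
# the energy network learns to separate real transitions from generated ones
# (contrastive estimation), and the transition policy is updated so that its
# rollouts receive low energy. Everything here is an illustrative toy.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, horizon = 3, 24
policy = nn.GRU(dim, 32, batch_first=True)            # conditions on the past
head = nn.Linear(32, dim)                             # mean of the next step (noise omitted for brevity)
energy = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, 1))  # scores (x_t, x_{t+1}) pairs

opt_e = torch.optim.Adam(energy.parameters(), lr=1e-3)
opt_p = torch.optim.Adam(list(policy.parameters()) + list(head.parameters()), lr=1e-3)

real = torch.randn(64, horizon, dim).cumsum(dim=1)    # toy "real" trajectories

def rollout(x0, steps):
    xs, x, h = [x0], x0, None
    for _ in range(steps - 1):
        out, h = policy(x.unsqueeze(1), h)
        x = head(out[:, -1])
        xs.append(x)
    return torch.stack(xs, dim=1)

for step in range(200):
    fake = rollout(real[:, 0], horizon)

    # Energy step: low energy on real transitions, high energy on generated ones.
    real_pairs = torch.cat([real[:, :-1], real[:, 1:]], dim=-1).reshape(-1, 2 * dim)
    fake_pairs = torch.cat([fake[:, :-1], fake[:, 1:]], dim=-1).reshape(-1, 2 * dim).detach()
    loss_e = F.softplus(energy(real_pairs)).mean() + F.softplus(-energy(fake_pairs)).mean()
    opt_e.zero_grad(); loss_e.backward(); opt_e.step()

    # Policy step: make rollouts low-energy (the "reinforcement signal").
    fake = rollout(real[:, 0], horizon)
    pairs = torch.cat([fake[:, :-1], fake[:, 1:]], dim=-1).reshape(-1, 2 * dim)
    loss_p = energy(pairs).mean()
    opt_p.zero_grad(); loss_p.backward(); opt_p.step()
```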


Towards Generating Real-World Time Series Data

Time series data generation has drawn increasing attention in recent years. Several generative adversarial network (GAN) based methods have been proposed to tackle the problem, usually under the assumption that the targeted time series data are well-formatted and complete. However, real-world time series (RTS) data are far from this ideal: long sequences with variable lengths and informative missing data raise intractable challenges for designing powerful generation algorithms. In this paper, we propose RTSGAN, a novel generative framework that tackles these challenges for RTS data. RTSGAN first learns an encoder-decoder module that maps each time series instance to a fixed-dimension latent vector, and then learns a generation module to produce vectors in the same latent space. By combining the generator and the decoder, RTSGAN is able to generate RTS that respect the original feature distributions and temporal dynamics. To generate time series with missing values, we further equip RTSGAN with an observation embedding layer and a decide-and-generate decoder to better utilize the informative missing patterns. Experiments on four RTS datasets show that the proposed framework outperforms previous generation methods in terms of the utility of synthetic data for downstream classification and prediction tasks.
Datasets: Google stock data; UCI Appliances Energy Prediction dataset; PhysioNet Challenge 2012 dataset; Medical Information Mart for Intensive Care (MIMIC-III) database
Metrics: Discriminative score; predictive score; t-SNE and PCA visualization
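
The two-stage recipe (autoencode each series to a fixed-dimension latent vector, then learn a generator in that latent space) can be sketched as below. The layer sizes and the WGAN-style critic losses are illustrative assumptions rather than the RTSGAN specifics, which also include the observation embedding and decide-and-generate decoder for missing values.

```python
# Sketch of the two-stage idea: (1) an autoencoder maps each series to a
# fixed-size latent vector, (2) a generator learns to sample that latent
# space; generated vectors are decoded back into series.
import torch
import torch.nn as nn

seq_len, n_feat, latent = 48, 4, 16

class SeqAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.GRU(n_feat, latent, batch_first=True)
        self.dec_rnn = nn.GRU(latent, latent, batch_first=True)
        self.out = nn.Linear(latent, n_feat)
    def encode(self, x):
        _, h = self.enc(x)
        return h[-1]                                   # (batch, latent) fixed-dimension vector
    def decode(self, z):
        h, _ = self.dec_rnn(z.unsqueeze(1).repeat(1, seq_len, 1))
        return self.out(h)

ae = SeqAE()
gen = nn.Sequential(nn.Linear(latent, 64), nn.ReLU(), nn.Linear(64, latent))      # noise -> latent
critic = nn.Sequential(nn.Linear(latent, 64), nn.ReLU(), nn.Linear(64, 1))        # latent -> realism score

x = torch.randn(32, seq_len, n_feat)

# Stage 1: reconstruction (one illustrative step).
opt_ae = torch.optim.Adam(ae.parameters(), lr=1e-3)
recon_loss = ((ae.decode(ae.encode(x)) - x) ** 2).mean()
opt_ae.zero_grad(); recon_loss.backward(); opt_ae.step()

# Stage 2: adversarial training in the latent space (one illustrative step).
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-4)
z_real = ae.encode(x).detach()
z_fake = gen(torch.randn(32, latent))
c_loss = critic(z_fake.detach()).mean() - critic(z_real).mean()
opt_c.zero_grad(); c_loss.backward(); opt_c.step()
g_loss = -critic(gen(torch.randn(32, latent))).mean()
opt_g.zero_grad(); g_loss.backward(); opt_g.step()

synthetic = ae.decode(gen(torch.randn(8, latent)))     # (8, seq_len, n_feat) synthetic series
print(synthetic.shape)
```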


Sig-Wasserstein GANs For Time Series Generation

Synthetic data is an emerging technology that can significantly accelerate the development and deployment of AI machine learning pipelines. In this work, we develop a high-fidelity time-series generator, the SigWGAN, by combining continuous-time stochastic models with the newly proposed signature W1 metric. The former are Logsig-RNN models based on stochastic differential equations, whereas the latter originates from universal and principled mathematical features that characterize the measure induced by a time series. SigWGAN turns the computationally challenging GAN min-max problem into supervised learning while generating high-fidelity samples. We validate the proposed model on both synthetic data generated by popular quantitative risk models and empirical financial data.
Datasets: Multi-dimensional geometric Brownian motion; rough volatility model; S&P 500 and DJI market data
Metrics: Sig-W1 metric; marginal distribution metric; correlation metric
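
The signature-based training criterion amounts to matching expected truncated signatures of real and generated paths. The sketch below computes levels 1 and 2 of the signature by hand for piecewise-linear paths and compares the expectations with a plain Euclidean norm; the paper's Sig-W1 metric uses a principled normalization and deeper truncation, so this is only a stand-in.

```python
# Sketch of a signature-based moment-matching loss: compare expected
# truncated signatures of real and generated paths. Only levels 1 and 2
# are computed, by hand, to stay library-free.
import numpy as np

def signature_level2(paths):
    # paths: (n_paths, length, dim) piecewise-linear paths
    dx = np.diff(paths, axis=1)                                  # increments (n, L-1, d)
    level1 = dx.sum(axis=1)                                      # total increment, (n, d)
    cum = np.cumsum(dx, axis=1)
    prev = cum - dx                                              # sum of increments strictly before step k
    cross = np.einsum("nki,nkj->nij", prev, dx)                  # sum_{k<l} dx_k^i dx_l^j
    level2 = cross + 0.5 * np.einsum("nki,nkj->nij", dx, dx)     # + 0.5 * sum_k dx_k^i dx_k^j
    return np.concatenate([level1, level2.reshape(len(paths), -1)], axis=1)

def sig_distance(real, fake):
    # distance between expected truncated signatures (a Sig-W1-style criterion)
    return np.linalg.norm(signature_level2(real).mean(0) - signature_level2(fake).mean(0))

rng = np.random.default_rng(0)
real = rng.normal(scale=0.10, size=(256, 50, 2)).cumsum(axis=1)  # toy Brownian-like paths
fake = rng.normal(scale=0.15, size=(256, 50, 2)).cumsum(axis=1)
print(sig_distance(real, fake))
```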


Conditional Loss And Deep Euler Scheme For Time Series Generation

We introduce three new generative models for time series based on Euler discretization of Stochastic Differential Equations (SDEs) and Wasserstein metrics. Two of these methods rely on adapting generative adversarial networks (GANs) to time series. The third algorithm, called the Conditional Euler Generator (CEGEN), minimizes a dedicated distance between the transition probability distributions over all time steps. In the context of Ito processes, we provide theoretical guarantees that minimizing this criterion implies accurate estimation of the drift and volatility parameters. We demonstrate empirically that CEGEN outperforms state-of-the-art GAN generators on both marginal and temporal dynamics metrics, and that it identifies accurate correlation structures in high dimensions. When few data points are available, we verify the effectiveness of CEGEN when combined with transfer learning methods on Monte Carlo simulations. Finally, we illustrate the robustness of our method on various real-world datasets.
Datasets: Geometric Brownian motion (GBM) process; Ornstein-Uhlenbeck (OU) process; stock data; Jena Climate; electric load
Metrics: Marginal statistics; quadratic variation; correlation structure; underlying process parameters; discriminative score; predictive score
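
A condensed sketch of a deep Euler-scheme generator follows: a network outputs drift and diffusion, paths are rolled out with Euler-Maruyama steps, and the loss matches per-time-step transition statistics of real paths. Matching only conditional means and standard deviations, as below, is a crude stand-in for CEGEN's conditional Wasserstein-type distance.

```python
# Sketch of a deep Euler-scheme generator for SDE-like paths. The per-step
# mean/std matching loss is an illustrative simplification of the paper's
# conditional transition-distribution distance.
import torch
import torch.nn as nn

dt, steps, dim = 0.01, 50, 1
net = nn.Sequential(nn.Linear(dim + 1, 64), nn.ReLU(), nn.Linear(64, 2 * dim))  # (x, t) -> (drift, log-diffusion)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def generate(x0):
    x, path = x0, [x0]
    for k in range(steps):
        t = torch.full((x.shape[0], 1), k * dt)
        out = net(torch.cat([x, t], dim=1))
        drift, diff = out[:, :dim], out[:, dim:].exp()
        x = x + drift * dt + diff * (dt ** 0.5) * torch.randn_like(x)  # Euler-Maruyama step
        path.append(x)
    return torch.stack(path, dim=1)                                    # (batch, steps + 1, dim)

# Toy "real" paths: geometric Brownian motion with mu = 0.05, sigma = 0.2.
with torch.no_grad():
    z = torch.randn(512, steps, dim) * dt ** 0.5
    real = torch.exp((0.05 - 0.5 * 0.2 ** 2) * dt * torch.arange(1, steps + 1).view(1, -1, 1)
                     + 0.2 * z.cumsum(dim=1))
    real = torch.cat([torch.ones(512, 1, dim), real], dim=1)

for it in range(200):
    fake = generate(torch.ones(256, dim))
    d_real, d_fake = real.diff(dim=1), fake.diff(dim=1)                # per-step transitions
    loss = ((d_real.mean(0) - d_fake.mean(0)) ** 2).mean() \
         + ((d_real.std(0) - d_fake.std(0)) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```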


Differentially Private Time Series Generation

Privacy issues prevent data owners from improving Machine Learning (ML) performance, as they constrain external collaborations. To allow data sharing without confidentiality concerns, we propose methods to generate time series in a privacy-preserving manner. We combine existing Generative Adversarial Network (GAN) models for time series, namely TimeGAN, ClaRe-GAN, and C-RNN-GAN, with differential privacy. This is achieved by replacing their original discriminator with a private discriminator trained with differentially private stochastic gradient descent (DP-SGD). Our experiments show that the developed methods, in particular the private TimeGAN and ClaRe-GAN, outperform RCGAN, the only existing differentially private model for time series, in terms of privacy and accuracy.
Datasets: Italy power demand; Two lead ECG; Freezer regular train; Distal Phalanx TW; Yoga
Metrics: TRTS (Train-on-Real, Test-on-Synthetic) classification score; TSTR (Train-on-Synthetic, Test-on-Real) classification score
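
The private-discriminator idea reduces to a DP-SGD update: clip each sample's gradient to a fixed L2 norm, add Gaussian noise, then average. The loop below is a minimal illustration under these assumptions; it omits privacy accounting (tracking epsilon and delta) and the efficiency that a library such as Opacus provides.

```python
# Minimal sketch of a DP-SGD style update for a GAN discriminator:
# per-sample gradients are clipped and Gaussian noise is added before
# averaging. Privacy accounting is intentionally omitted.
import torch
import torch.nn as nn

disc = nn.Sequential(nn.Linear(24, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.SGD(disc.parameters(), lr=0.05)
clip_norm, noise_mult = 1.0, 1.1

real = torch.randn(32, 24)          # stand-in for real series windows
fake = torch.randn(32, 24)          # stand-in for generator output
x = torch.cat([real, fake])
y = torch.cat([torch.ones(32, 1), torch.zeros(32, 1)])

# Accumulate clipped per-sample gradients.
summed = [torch.zeros_like(p) for p in disc.parameters()]
for xi, yi in zip(x, y):
    disc.zero_grad()
    loss = nn.functional.binary_cross_entropy_with_logits(disc(xi.unsqueeze(0)), yi.unsqueeze(0))
    loss.backward()
    grads = [p.grad.detach().clone() for p in disc.parameters()]
    norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    scale = min(1.0, clip_norm / (norm.item() + 1e-12))          # clip this sample's gradient
    for s, g in zip(summed, grads):
        s += g * scale

# Add Gaussian noise and apply the averaged, noised gradient.
opt.zero_grad()
for p, s in zip(disc.parameters(), summed):
    noise = torch.randn_like(s) * noise_mult * clip_norm
    p.grad = (s + noise) / len(x)
opt.step()
```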


Generative Adversarial Networks For Markovian Temporal Dynamics: Stochastic Continuous Data Generation

In this paper, we present a novel generative adversarial network (GAN) that can describe Markovian temporal dynamics. To generate stochastic sequential data, we introduce a novel stochastic differential equation-based conditional generator and spatial-temporal constrained discriminator networks. To stabilize the learning dynamics of the min-max type of the GAN objective function, we propose well-posed constraint terms for both networks. We also propose a novel conditional Markov Wasserstein distance to induce a pathwise Wasserstein distance. The experimental results demonstrate that our method outperforms state-of-the-art methods using several different types of data.
Datasets: Fashion-MNIST; Gaussian process; human action video data; LPC-Sprite animations
Metrics: Fréchet inception distance; kernel inception distance
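
The kernel inception distance used above is an unbiased MMD^2 estimate with the standard cubic polynomial kernel over feature embeddings. Assuming the embeddings (e.g., Inception features of generated frames) are computed elsewhere, the score itself is short to write down.

```python
# Sketch of a Kernel Inception Distance-style score: an unbiased MMD^2
# estimate with the cubic polynomial kernel over feature embeddings.
import numpy as np

def polynomial_kernel(a, b):
    d = a.shape[1]
    return (a @ b.T / d + 1.0) ** 3

def kid(feat_real, feat_fake):
    k_rr = polynomial_kernel(feat_real, feat_real)
    k_ff = polynomial_kernel(feat_fake, feat_fake)
    k_rf = polynomial_kernel(feat_real, feat_fake)
    m, n = len(feat_real), len(feat_fake)
    # Unbiased estimator: drop diagonal terms of the within-set kernels.
    term_rr = (k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
    term_ff = (k_ff.sum() - np.trace(k_ff)) / (n * (n - 1))
    return term_rr + term_ff - 2.0 * k_rf.mean()

rng = np.random.default_rng(0)
feat_real = rng.normal(size=(1000, 64))           # stand-ins for embedding vectors
feat_fake = rng.normal(loc=0.05, size=(1000, 64))
print(kid(feat_real, feat_fake))
```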


Analyzing Deep Generated Financial Time Series For Various Asset Classes

Generative Adversarial Networks (GANs) have shown remarkable success as a framework for training models to produce realistic-looking data. In this work, we propose a GAN that produces realistic real-valued time series, with an emphasis on applications to financial data. Our aim is a GAN, applied to various financial time series across asset classes, that reflects all of their characteristics, including characteristics we may be unaware of, since GANs learn the underlying structure of the data rather than just a set of features. If we are able to achieve this, the synthetic datasets we create could be used for a variety of purposes, including model training and model selection. In this paper we train a GAN on real data representing one asset from each of several asset classes, such as commodities, forex, futures, indices, and shares.
Datasets: Gold futures; EUR/USD foreign exchange reference rate; S&P 500 Volatility Index (VIX) futures; S&P 500 Index (continuous contract); Apple Inc. stock prices
Metrics: Kurtosis; ACF score; histogram
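
The stylized-fact checks listed above are straightforward to compute. The sketch below measures excess kurtosis of log-returns and an ACF score (the mean absolute gap between real and synthetic return autocorrelations over the first lags); the aggregation into a single score is an illustrative choice, not the paper's exact definition.

```python
# Sketch of two stylized-fact checks for financial series: excess kurtosis
# of log-returns and an ACF score comparing real and synthetic returns.
import numpy as np

def log_returns(prices):
    return np.diff(np.log(prices))

def excess_kurtosis(x):
    x = x - x.mean()
    return (x ** 4).mean() / (x ** 2).mean() ** 2 - 3.0

def acf(x, max_lag=20):
    x = x - x.mean()
    denom = (x ** 2).sum()
    return np.array([(x[:-k] * x[k:]).sum() / denom for k in range(1, max_lag + 1)])

def acf_score(real_prices, synth_prices, max_lag=20):
    r, s = log_returns(real_prices), log_returns(synth_prices)
    return np.abs(acf(r, max_lag) - acf(s, max_lag)).mean()

rng = np.random.default_rng(0)
real_prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=2000)))            # toy price path
synth_prices = 100 * np.exp(np.cumsum(rng.standard_t(df=5, size=2000) * 0.01))   # heavier-tailed toy path
print("excess kurtosis (real, synth):",
      excess_kurtosis(log_returns(real_prices)), excess_kurtosis(log_returns(synth_prices)))
print("ACF score:", acf_score(real_prices, synth_prices))
```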