Using Time-Series Generative Adversarial Networks to Synthesize Sensing Data for Pest Incidence Forecasting on Sustainable Agriculture

A sufficient amount of data is crucial for high-performance and accurate trend prediction. However, it is difficult and time-consuming to collect agricultural data over long periods of time; as a result, datasets are often characterized by missing data. In this study, we use a time-series generative adversarial network (TimeGAN) to synthesize multivariate agricultural sensing data and train RNN (Recurrent Neural Network), LSTM (Long Short-Term Memory), and GRU (Gated Recurrent Unit) neural network prediction models on the original and generated data to predict future pest populations. In our experiments, the data generated using TimeGAN and the original data have the smallest EC value, 9.86, in the GRU model. The results show that the generative model effectively synthesizes multivariate agricultural sensing data and can be used to make up for the lack of actual data. The pest prediction model trained on synthetic data using time-series data generation yields results similar to those of the model trained on actual data. Accurate prediction of pest populations would represent a breakthrough in allowing for accurate and timely pest control.


Generative Adversarial Networks in Time Series: A Systematic Literature Review

Generative adversarial network (GAN) studies have grown exponentially in the past few years. Their impact has been seen mainly in the computer vision field with realistic image and video manipulation, especially generation, making significant advancements. Although these computer vision advances have garnered much attention, GAN applications have diversified across disciplines such as time series and sequence generation. As a relatively new niche for GANs, fieldwork is ongoing to develop high-quality, diverse, and private time series data. In this article, we review GAN variants designed for time series related applications. We propose a classification of discrete-variant GANs and continuous-variant GANs, in which GANs deal with discrete time series and continuous time series data. Here we showcase the latest and most popular literature in this field—their architectures, results, and applications. We also provide a list of the most popular evaluation metrics and their suitability across applications. Also presented is a discussion of privacy measures for these GANs and further protections and directions for dealing with sensitive data. We aim to frame clearly and concisely the latest and state-of-the-art research in this area and their applications to real-world technologies.  


Generating multivariate time series with COmmon Source CoordInated GAN (COSCI-GAN)

Generating multivariate time series is a promising approach for sharing sensitive data in many medical, financial, and IoT applications. A common type of multivariate time series originates from a single source, such as the biometric measurements from a medical patient. This leads to complex dynamical patterns between individual time series that are hard to learn by typical generation models such as GANs. There is valuable information in those patterns that machine learning models can use to better classify, predict, or perform other downstream tasks. We propose a novel framework that takes the common origin of the time series into account and favors the preservation of channel/feature relationships. The two key points of our method are: 1) the individual time series are generated from a common point in latent space and 2) a central discriminator favors the preservation of inter-channel/feature dynamics. We demonstrate empirically that our method helps preserve channel/feature correlations and that our synthetic data performs very well in downstream tasks with medical and financial data.


Learning the conditional law: signatures and conditional GANs in filtering and prediction of diffusion processes

We consider the filtering and prediction problem for a diffusion process. The signal and observation are modeled by stochastic differential equations (SDEs) driven by correlated Wiener processes. In classical estimation theory, measure-valued stochastic partial differential equations (SPDEs) are derived for the filtering and prediction measures. These equations can be hard to solve numerically. We provide an approximation algorithm using conditional generative adversarial networks (GANs) in combination with signatures, an object from rough path theory. The signature of a sufficiently smooth path determines the path completely. As a result, in some cases, GANs based on signatures have been shown to efficiently approximate the law of a stochastic process. For our algorithm we extend this method to sample from the conditional law, given noisy, partial observation. Our generator is constructed using neural differential equations (NDEs), relying on their universal approximator property. We show well-posedness in providing a rigorous mathematical framework. Numerical results show the efficiency of our algorithm.  
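As a rough illustration of the signature object mentioned in this abstract, the following sketch computes the first two signature levels of a piecewise-linear path using Chen's identity. It is not the authors' algorithm: the function name `signature_depth2` and the toy path are assumptions made for illustration, and a truncated signature is only a finite summary of the path, not the full signature the abstract refers to.

```python
def signature_depth2(path):
    """Return the level-1 and level-2 signature terms of a piecewise-linear path.

    path: list of points, each a list of d floats (hypothetical input format).
    """
    d = len(path[0])
    level1 = [0.0] * d
    level2 = [[0.0] * d for _ in range(d)]
    for prev, curr in zip(path, path[1:]):
        delta = [c - p for p, c in zip(prev, curr)]
        # Chen's identity for appending one linear segment:
        # new S2 = S2 + S1 (outer) delta + 0.5 * delta (outer) delta
        for i in range(d):
            for j in range(d):
                level2[i][j] += level1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        # Level 1 is just the accumulated increment of the path.
        for i in range(d):
            level1[i] += delta[i]
    return level1, level2


# Toy path: one step along the first axis, then one step along the second.
path = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
level1, level2 = signature_depth2(path)
# The antisymmetric part of level 2 is the Levy area enclosed by the path.
levy_area = 0.5 * (level2[0][1] - level2[1][0])
print(level1, level2, levy_area)
```

The level-2 terms depend on the order in which the coordinates move, which is the kind of information a plain endpoint increment discards.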


Generating Synthetic Time Series for Machine-Learning-Empowered Monitoring of Electric Motor Test Benches

The development of new electric traction machines is a time-consuming process as it involves intensive testing on motor test benches. Machine-Learning-empowered monitoring offers the opportunity to anticipate costly failures early and hence reduce development time. However, machine learning (ML) for process monitoring requires large amounts of training data, especially as the targeted fault states are scarce and yet diverse in their appearances. Therefore, we propose to use synthetic time series data to offset the high cost of acquiring training data from experiments in real test benches. In this article, we present a novel scheme to generate synthetic data based on a sub-dimensional time series representation. We introduce a highly flexible model by mapping the data to a latent representation and approximating the latent data distribution by a Gaussian Mixture Model. In addition, we propose the Fréchet InceptionTime Distance (FITD) as a new distance measure to evaluate the generated data. It allows extracting characteristics at different scales by using multiple kernel sizes. In this way, we ensure that the synthesized data contains characteristics similar to those present in the real data. In our experiment, we train two types of fault detectors, one based on real data of a motor test bench and the other based on synthetic data. We also consider employing fault-aware conditional architectures to generate training data for different fault types explicitly. Our final results show that using synthesized data in the training process increases the performance in terms of classification accuracy score (CAS) up to 29%.
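The abstract does not spell out how FITD is computed. Assuming it follows the usual Fréchet-distance pattern (fit a Gaussian to feature embeddings of real and synthetic data, then compare the two Gaussians), a minimal sketch looks like the following. Two simplifications are assumptions made here, not the paper's method: the InceptionTime feature extractor is omitted (the inputs are hypothetical, already-extracted feature vectors), and covariances are treated as diagonal so that no matrix square root is needed.

```python
import math


def mean_and_variance(features):
    """Per-dimension mean and (population) variance of a list of feature vectors."""
    n = len(features)
    d = len(features[0])
    mean = [sum(row[k] for row in features) / n for k in range(d)]
    var = [sum((row[k] - mean[k]) ** 2 for row in features) / n for k in range(d)]
    return mean, var


def frechet_distance_diag(real_features, synth_features):
    """Squared Frechet distance between Gaussian fits, assuming diagonal covariances.

    With diagonal covariances the general formula
    ||mu_r - mu_s||^2 + Tr(C_r + C_s - 2 (C_r C_s)^(1/2))
    reduces to a per-dimension sum.
    """
    mu_r, var_r = mean_and_variance(real_features)
    mu_s, var_s = mean_and_variance(synth_features)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_s))
    cov_term = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var_r, var_s))
    return mean_term + cov_term


# Hypothetical feature vectors standing in for extractor outputs.
real = [[0.0, 1.0], [2.0, 3.0]]
synth = [[1.0, 1.0], [3.0, 3.0]]
print(frechet_distance_diag(real, synth))
```

A value of zero means the two Gaussian fits coincide; larger values indicate that the synthetic features drift from the real ones in mean, spread, or both.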


Fractional SDE-Net: Generation of Time Series Data with Long-term Memory

In this paper, we focus on the generation of time series data using neural networks. It is often the case that input time-series data have only one realized (and usually irregularly sampled) path, which makes it difficult to extract time-series characteristics, and its noise structure is more complicated than the i.i.d. type. Time series data, especially from hydrology, telecommunications, economics, and finance, exhibit long-term memory, also called long-range dependency (LRD). The main purpose of this paper is to artificially generate time series with the help of neural networks, taking the LRD of paths into account. We propose fSDE-Net: neural fractional Stochastic Differential Equation Network. It generalizes the neural stochastic differential equation model by using fractional Brownian motion with a Hurst index larger than half, which exhibits the LRD property. We derive the solver of fSDE-Net and theoretically analyze the existence and uniqueness of the solution to fSDE-Net. Our experiments with artificial and real time-series data demonstrate that the fSDE-Net model can replicate distributional properties well.
Datasets: fractional Ornstein-Uhlenbeck, SPX, NileMin, ethernetTraffic, NBSdiff1kg, NhemiTemp
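To make the link between the Hurst index and long-range dependency concrete, the sketch below evaluates the standard autocovariance of unit-step increments of fractional Brownian motion (fractional Gaussian noise), gamma(k) = 0.5 (|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H}). It illustrates the driving noise only and is not the fSDE-Net solver; the function name `fgn_autocovariance` is an assumption made for illustration.

```python
def fgn_autocovariance(lag, hurst):
    """Autocovariance at the given lag of unit-step fBm increments (unit variance)."""
    k = abs(lag)
    two_h = 2.0 * hurst
    return 0.5 * ((k + 1) ** two_h - 2.0 * k ** two_h + abs(k - 1) ** two_h)


# H = 0.5 recovers ordinary Brownian motion, whose increments are uncorrelated.
# H > 0.5 gives positively correlated increments with slowly decaying covariance.
for hurst in (0.5, 0.8):
    values = [round(fgn_autocovariance(k, hurst), 4) for k in range(4)]
    print(hurst, values)
```

For a Hurst index above one half, the covariances stay positive at every lag and decay like k^{2H-2}, so their sum diverges; this is the LRD property the abstract refers to.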


Wasserstein generative adversarial networks for modeling marked events

Marked temporal events are ubiquitous in several areas, where the events’ times and marks (types) are usually interrelated. Point processes and their non-functional variations using recurrent neural networks (RNNs) model temporal events using intensity functions. However, since they usually rely on likelihood maximization, they might fail. Moreover, their high simulation complexity makes them inappropriate. Since calculating the intensity function is not always necessary, generative models are utilized for modeling. Generative adversarial networks (GANs) have been successful in modeling point processes, but they still fall short in modeling interdependent types and times of events. In this research, a double Wasserstein GAN (WGAN), using a conditional GAN, is proposed, which generates the types of events, which are categorical data, dependent on their times. Experiments on synthetic and real-world data show that the WGAN methods are efficient or competitive with the compared intensity-based models. Furthermore, these methods simulate faster than intensity-based methods.


Simulating financial time series using attention

Financial time series simulation is a central topic since it extends the limited real data for training and evaluation of trading strategies. It is also challenging because of the complex statistical properties of the real financial data. We introduce two generative adversarial networks (GANs), which utilize the convolutional networks with attention and the transformers, for financial time series simulation. The GANs learn the statistical properties in a data-driven manner and the attention mechanism helps to replicate the long-range dependencies. The proposed GANs are tested on the S&P 500 index and option data, examined by scores based on the stylized facts and are compared with the pure convolutional GAN, i.e. QuantGAN. The attention-based GANs not only reproduce the stylized facts, but also smooth the autocorrelation of returns.  
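The abstract does not list which stylized facts are scored. Two commonly cited ones are the near-absence of linear autocorrelation in returns and the positive autocorrelation of absolute returns (volatility clustering). The sketch below shows how such quantities could be measured on a price series, assuming those two facts are among the targets; the helper names and the toy prices are hypothetical.

```python
import math


def log_returns(prices):
    """Log returns between consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at a positive lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0.0:
        raise ValueError("autocorrelation is undefined for a constant series")
    cov = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return cov / var


# Toy price path (made-up numbers, for illustration only).
prices = [100.0, 101.0, 100.5, 102.0, 101.0, 103.5, 102.5, 104.0]
returns = log_returns(prices)
abs_returns = [abs(r) for r in returns]
print(autocorrelation(returns, 1), autocorrelation(abs_returns, 1))
```

Comparing these statistics between real and simulated series, over a range of lags, is one simple way to check whether a generator reproduces the dependence structure of the data.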


TTS-GAN: A Transformer-Based Time-Series Generative Adversarial Network

Signal measurements appearing in the form of time series are one of the most common types of data used in medical machine learning applications. However, such datasets are often small, making the training of deep neural network architectures ineffective. For time-series, the suite of data augmentation tricks we can use to expand the size of the dataset is limited by the need to maintain the basic properties of the signal. Data generated by a Generative Adversarial Network (GAN) can be utilized as another data augmentation tool. RNN-based GANs suffer from the fact that they cannot effectively model long sequences of data points with irregular temporal relations. To tackle these problems, we introduce TTS-GAN, a transformer-based GAN which can successfully generate realistic synthetic time-series data sequences of arbitrary length, similar to the real ones. Both the generator and discriminator networks of the GAN model are built using a pure transformer encoder architecture. We use visualizations and dimensionality reduction techniques to demonstrate the similarity of real and generated time-series data. We also compare the quality of our generated data with the best existing alternative, which is an RNN-based time-series GAN.
Datasets: Sinusoidal sequence, UniMiB data, PTB Diagnostic ECG data
Evaluation: t-SNE and PCA visualization, cosine similarity, Jensen-Shannon distance
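For reference, the two numeric measures named above, cosine similarity and Jensen-Shannon distance, can be sketched as follows. How TTS-GAN applies them (for example, to which extracted features or histograms) is not stated here, so the inputs below are hypothetical vectors and discrete distributions; the base-2 logarithm is a choice made in this sketch so that the distance lies in [0, 1].

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def js_distance(p, q):
    """Jensen-Shannon distance (square root of the JS divergence, log base 2).

    p and q are discrete probability distributions over the same bins.
    """
    m = [0.5 * (a + b) for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(0.0, divergence))  # guard against tiny negative rounding


print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
print(js_distance([0.7, 0.3], [0.4, 0.6]))
```

Cosine similarity close to 1 and Jensen-Shannon distance close to 0 both indicate that the synthetic summary resembles the real one.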


Time-Series Transformer Generative Adversarial Networks

Many real-world tasks are plagued by limitations on data: in some instances very little data is available and in others, data is protected by privacy-enforcing regulations (e.g. GDPR). We consider limitations posed specifically on time-series data and present a model that can generate synthetic time-series which can be used in place of real data. A model that generates synthetic time-series data has two objectives: 1) to capture the stepwise conditional distribution of real sequences, and 2) to faithfully model the joint distribution of entire real sequences. Autoregressive models trained via maximum likelihood estimation (MLE) can be used in a system where previous predictions are fed back in and used to predict future ones; in such models, errors can accrue over time. Furthermore, a plausible initial value is required, making MLE-based models not really generative. Many downstream tasks learn to model conditional distributions of the time-series; hence, synthetic data drawn from a generative model must satisfy 1) in addition to performing 2). We present TsT-GAN, a framework that capitalises on the Transformer architecture to satisfy the desiderata. We compare its performance against five state-of-the-art models on five datasets and show that TsT-GAN achieves higher predictive performance on all datasets.
Datasets: Sinusoidal sequence, Stock data, UCI Energy and Air data, UCI Hungarian Chickenpox Cases data
Evaluation: Predictive score, discriminative score, t-SNE and PCA visualization
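The predictive and discriminative scores are not defined in this listing. Assuming they follow the convention commonly used for time-series GAN evaluation (discriminative score: how far a post-hoc real-versus-synthetic classifier's held-out accuracy is from chance; predictive score: mean absolute error on real data of a predictor trained on synthetic data), the final arithmetic can be sketched as below. Training the classifier and the predictor is omitted, and the labels and predictions are hypothetical.

```python
def discriminative_score(true_labels, predicted_labels):
    """|accuracy - 0.5| for a real-vs-synthetic classifier; lower is better."""
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    accuracy = correct / len(true_labels)
    return abs(accuracy - 0.5)


def predictive_score(real_targets, predictions):
    """Mean absolute error of a model trained on synthetic data, tested on real data."""
    errors = [abs(t - p) for t, p in zip(real_targets, predictions)]
    return sum(errors) / len(errors)


# Hypothetical held-out labels (1 = real, 0 = synthetic) and classifier outputs.
labels = [1, 1, 0, 0]
guesses = [1, 0, 1, 0]
print(discriminative_score(labels, guesses))

# Hypothetical next-step targets from real sequences and model predictions.
print(predictive_score([1.0, 2.0], [1.5, 2.5]))
```

Under this convention, a discriminative score near 0 means the classifier cannot tell synthetic from real sequences, and a low predictive score means synthetic data preserves the temporal structure needed for forecasting.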