Beyond Transformer: Chinese researchers at MIT and other institutions release TimeMixer++, a general-purpose time series architecture that leads on eight tasks

Author: Eve Cole  Update Time: 2024-11-14 11:42:01

[Introduction] TimeMixer++ is a time series analysis model that uses multi-scale and multi-resolution methods to surpass existing models on multiple tasks. It offers a new perspective on time series analysis and brings greater accuracy and flexibility to tasks such as prediction and classification.

In the data-driven era, time series analysis has become an integral part of many fields, such as weather prediction, medical symptom classification, spacecraft anomaly detection, and filling in missing sensor data. These applications correspond to the tasks of time series prediction, classification, anomaly detection, and missing value imputation, among others.

How can one model be used for all tasks simultaneously?

In recent years, a series of works, including those based on the Transformer architecture, have demonstrated excellent performance on specific tasks. However, because they lack flexible, general-purpose temporal feature extraction, they have not become universal model architectures.

To solve these problems, Chinese teams from MIT, Hong Kong University of Science and Technology, Zhejiang University, and Griffith University jointly introduced TimeMixer++, a new deep model architecture. It handles 8 time series tasks, including long-range prediction, short-range prediction, classification, and anomaly detection, and its performance on them comprehensively surpasses Transformer and other models, enabling general-purpose time series modeling and applications.

Paper link: https://arxiv.org/pdf/2410.16032

The generality of TimeMixer++ comes from its ability to extract general-purpose time series features. For different tasks, the model adaptively learns different latent space representations, which makes it both flexible and effective.

Design motivation

The paper proposes the concept of a "Time Series Pattern Machine" (TSPM): a model that performs well across a wide range of time series tasks. Such a model must be able to extract a variety of temporal features so that it can adapt to the requirements of each task.

Time series are sampled from the continuous real world at different scales (such as seconds, minutes, or hours), and the periodic patterns visible at one scale differ from those visible at another. This multi-scale, multi-period character guided the design of the model architecture.
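The multi-scale idea above can be illustrated with a minimal sketch. It assumes that coarser views are obtained by averaging adjacent pairs of samples; the article does not specify the downsampling operator, and the function name `multi_scale_views` and the toy series are hypothetical, not taken from the paper's code.

```python
import numpy as np

def multi_scale_views(x: np.ndarray, num_scales: int = 3) -> list[np.ndarray]:
    """Return [x_0, x_1, ...]: x_0 is the input, and each later view
    halves the length by averaging adjacent pairs of samples.
    (Assumption: pairwise averaging stands in for the model's downsampling.)"""
    views = [np.asarray(x, dtype=float)]
    for _ in range(num_scales):
        prev = views[-1]
        usable = len(prev) - len(prev) % 2  # drop a trailing odd sample
        if usable < 2:
            break
        views.append(prev[:usable].reshape(-1, 2).mean(axis=1))
    return views

# Toy series: 96 points with a period of 24 samples.
t = np.arange(96)
series = np.sin(2 * np.pi * t / 24)

views = multi_scale_views(series, num_scales=3)
lengths = [len(v) for v in views]  # [96, 48, 24, 12]
```

In this toy case the same cycle spans 24 samples in the original series but only 12, 6, and 3 samples in the coarser views, which is one way to see why the periodicity a model observes depends on the scale.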

TimeMixer++

Core idea

Using time domain (multi-scale) and frequency domain (multi-frequency/period) information, TimeMixer++ converts each time series into multi-resolution time images (Multi-Resolution Time Images). It then decouples and mixes these time images in the deep latent space to extract multi-scale and multi-period features.

Overall structure

The structure of TimeMixer++ is similar to that of a Transformer. It consists of downsampling, an embedding layer (Input Projection), L stacked MixerBlocks, and an output layer. Each MixerBlock applies, in order, (1) multi-resolution time imaging, (2) time image decomposition, (3) multi-scale mixing, and (4) multi-resolution mixing.

Here we briefly introduce the operations within MixerBlock.

1. Multi-Resolution Time Imaging (MRTI): MRTI folds the time series at multiple scales and periods, chosen from frequency domain information, to obtain multiple sets of time images.

2. Time Image Decomposition (TID): TID separates the seasonal and trend components of each time image by applying attention along the horizontal axis and the vertical axis, yielding seasonal images and trend images.

3. Multi-Scale Mixing (MCM): MCM mixes the seasonal images and trend images across scales. Because the data are in image form, the paper uses convolution and deconvolution operations.
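The folding in step 1 can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: it assumes the dominant periods are read off the FFT amplitude spectrum and that each period folds a 1-D series into a 2-D array with one cycle per column. The helper names `top_k_periods` and `fold_to_image` are hypothetical.

```python
import numpy as np

def top_k_periods(x: np.ndarray, k: int = 2) -> list[int]:
    """Pick the k dominant periods of a 1-D series from its FFT amplitudes."""
    amplitude = np.abs(np.fft.rfft(x))
    amplitude[0] = 0.0  # ignore the DC component (the mean)
    top_freqs = np.argsort(amplitude)[::-1][:k]  # strongest frequency bins first
    return [int(round(len(x) / f)) for f in top_freqs if f > 0]

def fold_to_image(x: np.ndarray, period: int) -> np.ndarray:
    """Fold a 1-D series into a (period, num_cycles) image, zero-padding the tail."""
    num_cycles = -(-len(x) // period)  # ceiling division
    padded = np.zeros(num_cycles * period)
    padded[: len(x)] = x
    # Each column holds one full cycle; each row holds one phase across cycles.
    return padded.reshape(num_cycles, period).T

# Toy series with two periodic components (periods 24 and 8).
t = np.arange(96)
x = np.sin(2 * np.pi * t / 24) + 0.5 * np.sin(2 * np.pi * t / 8)

periods = top_k_periods(x, k=2)                 # [24, 8]
images = [fold_to_image(x, p) for p in periods]
shapes = [img.shape for img in images]          # [(24, 4), (8, 12)]
```

In a multi-scale setting the same folding would presumably be repeated for every downsampled view of the series, giving one set of images per scale and period.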

MCM mixes seasonal and trend information in opposite directions. Seasonal images are aggregated step by step from fine-grained to coarse-grained scales. Trend images are aggregated step by step from coarse-grained to fine-grained scales, so that coarse-scale prior knowledge guides the extraction of macro-trend information. Together, the two passes achieve multi-scale mixing when extracting information from past observations.
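The two mixing directions can be sketched as below. This is a simplified illustration, not the paper's layers: pairwise column averaging stands in for a learned strided convolution, column repetition stands in for a learned deconvolution, and the additive update is an assumption. All function names are hypothetical, and image widths are assumed to halve exactly at each coarser scale.

```python
import numpy as np

def downsample(img: np.ndarray) -> np.ndarray:
    """Stand-in for a learned stride-2 convolution: average adjacent column pairs.
    Assumes an even number of columns."""
    h, w = img.shape
    return img.reshape(h, w // 2, 2).mean(axis=2)

def upsample(img: np.ndarray) -> np.ndarray:
    """Stand-in for a learned deconvolution: repeat each column twice."""
    return np.repeat(img, 2, axis=1)

def mix_seasonal(seasonal: list[np.ndarray]) -> list[np.ndarray]:
    """Fine to coarse: each coarser seasonal image absorbs the mixed finer one."""
    mixed = [seasonal[0]]
    for m in range(1, len(seasonal)):
        mixed.append(seasonal[m] + downsample(mixed[m - 1]))
    return mixed

def mix_trend(trend: list[np.ndarray]) -> list[np.ndarray]:
    """Coarse to fine: each finer trend image absorbs the mixed coarser one."""
    mixed = [None] * len(trend)
    mixed[-1] = trend[-1]
    for m in range(len(trend) - 2, -1, -1):
        mixed[m] = trend[m] + upsample(mixed[m + 1])
    return mixed

# Toy input: three scales, finest first, with widths 8, 4, and 2.
seasonal = [np.ones((4, 8 // 2**m)) for m in range(3)]
trend = [np.ones((4, 8 // 2**m)) for m in range(3)]

mixed_seasonal = mix_seasonal(seasonal)
mixed_trend = mix_trend(trend)
```

With all-ones inputs, the coarsest seasonal image and the finest trend image each accumulate contributions from all three scales, which makes the direction of information flow easy to check.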

Model performance

To verify the performance of TimeMixer++, the authors tested it on 8 mainstream time series tasks, including long-range prediction, short-range prediction, time series classification, anomaly detection, imputation, and few-shot/zero-shot prediction. The experimental results show that TimeMixer++ outperforms current state-of-the-art Transformer models on multiple metrics. The specific results are as follows:

In long-range time series prediction, TimeMixer++ surpasses the prediction models of recent years on 9 of 12 metrics.

In univariate and multivariate short-range prediction tasks, TimeMixer++ surpasses other recent models across the board.

In the missing value imputation task, TimeMixer++ also maintains its lead, surpassing other models on almost all metrics and datasets.

In difficult classification and anomaly detection tasks, TimeMixer++ still achieves the best results among all models, beating many time series models designed specifically for these tasks.

In the zero-shot prediction setting, TimeMixer++ ranks first, which suggests that the model extracts general-purpose time series features rather than overfitting.

Representation analysis

Visual analysis shows that TimeMixer++ decomposes a time series into multiple sets of seasonal images and trend images, and can extract its characteristics from both the time domain and the frequency domain. The seasonal and trend components differ significantly across scales and frequencies.

Efficiency analysis

TimeMixer++ is efficient in memory footprint and training time while maintaining competitive MSE scores. On the weather data imputation and ETTm1 long-term prediction tasks, it uses less memory and trains faster than other models, and it effectively captures long-range dependencies.

Ablation experiment

The authors conducted ablation experiments to validate the design of the TimeMixer++ architecture. The results show that the current design, with its multiple groups of modules, achieves the best results on most datasets.

Summary

This article introduced TimeMixer++, a new deep model architecture that comprehensively surpasses Transformer and other models on eight time series analysis tasks and achieves general-purpose time series modeling and applications. Its innovation is to convert time series into images and extract features across the time domain, the frequency domain, multiple scales, and multiple resolutions, thereby improving model performance.

The success of TimeMixer++ brings new ideas to the field of time series analysis and offers a new perspective on understanding time series. As more optimization techniques and application scenarios emerge, I believe TimeMixer++ will further advance time series prediction technology and bring greater value to various industries.