Imputation of Missing Values of a Time Series Based on the Joint Application of Analytical Algorithms and Neural Networks
Authors
M. L. Zymbler
A. A. Yurtin
Keywords:
time series
missing value imputation
time series snippets
MPdist measure
recurrent neural networks
Abstract
Time series data are now processed in a wide range of scientific and practical applications, where a common task is to restore single points or blocks of time series values that are missing because of hardware or software failures or human error. This article presents SANNI (Snippet and Artificial Neural Network-based Imputation), a method for imputing missing values of a time series processed offline. SANNI comprises two neural network models, the Recognizer and the Reconstructor. The Recognizer determines the snippet (typical subsequence) of the series that a given subsequence with a missing point resembles most, and consists of three groups of layers: convolutional, recurrent, and fully connected. The Reconstructor takes the output of the Recognizer together with the input subsequence containing the gap and restores the missing point. The Reconstructor also consists of three groups of layers: convolutional, recurrent, and fully connected. The layer topologies of the Recognizer and the Reconstructor are parameterized by the number of snippets and the snippet length, respectively. Methods for preparing the training sets of both models are presented. Computational experiments show that SANNI ranks among the top three of the state-of-the-art analytical and neural network-based methods it was compared with.
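The two-stage data flow described above (first recognize the closest snippet, then reconstruct the missing point from that snippet and the incomplete subsequence) can be illustrated with a minimal sketch. It is not the SANNI implementation: both neural networks are replaced here by simple stand-ins (a nearest-snippet search by Euclidean distance over the observed points, and a mean-offset copy from the matched snippet), the snippets are assumed to have been found beforehand, and the names `recognize`, `reconstruct`, and `impute` are hypothetical.

```python
import math


def recognize(window, snippets, gap):
    """Stand-in for the Recognizer (assumption: not a neural network).

    Returns the index of the snippet closest to the window, comparing
    only the observed points; the gap position is skipped.
    """
    def dist(snippet):
        return math.sqrt(sum((w - s) ** 2
                             for i, (w, s) in enumerate(zip(window, snippet))
                             if i != gap))
    return min(range(len(snippets)), key=lambda k: dist(snippets[k]))


def reconstruct(window, snippet, gap):
    """Stand-in for the Reconstructor (assumption: not a neural network).

    Takes the snippet value at the gap position and shifts it by the mean
    offset between the window and the snippet over the observed points.
    """
    observed = [i for i in range(len(window)) if i != gap]
    offset = sum(window[i] - snippet[i] for i in observed) / len(observed)
    return snippet[gap] + offset


def impute(window, snippets):
    """Fill the single missing point (marked as None) in the window."""
    gap = next(i for i, v in enumerate(window) if v is None)
    k = recognize(window, snippets, gap)
    restored = list(window)
    restored[gap] = reconstruct(window, snippets[k], gap)
    return k, restored


# Toy data: two snippets of length 5 and a window with one missing point.
snippets = [[0.0, 1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0, 0.0]]
window = [1.0, 2.0, None, 4.0, 5.0]
k, restored = impute(window, snippets)
print(k, restored)
```

In this toy setting the number of snippets fixes how many candidates the recognition step chooses among, and the snippet length fixes the size of the window, which mirrors the two parameters the abstract names for the layer topologies.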