Imputation of missing time series values based on the joint application of analytical algorithms and neural networks
Authors
- M. L. Zymbler
- A. A. Yurtin
Keywords:
time series
missing value imputation
time series snippets
MPdist measure
recurrent neural networks
Abstract
Time series data are currently processed in a wide range of scientific and practical applications, where an important problem is the imputation of single points or blocks of time series values that are missing because of hardware or software failures or human error. The paper presents SANNI (Snippet and Artificial Neural Network-based Imputation), a method for imputing missing values of a time series processed offline. SANNI comprises two neural network models: the Recognizer and the Reconstructor. The Recognizer determines the snippet (a typical subsequence) of the series that a given subsequence with a missing point most closely resembles; it consists of three groups of layers: convolutional layers, a recurrent layer, and fully connected layers. The Reconstructor takes the output of the Recognizer together with the input subsequence containing the gap and imputes the missing point; it also consists of three groups of layers: convolutional, recurrent, and fully connected. The layer topologies of the Recognizer and the Reconstructor are parameterized by the number of snippets and by the snippet length, respectively. Methods for preparing the training sets of both neural network models are presented. Computational experiments show that, among state-of-the-art analytical and neural network-based imputation methods, SANNI ranks in the top three.
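To make the notion of "the snippet a subsequence most closely resembles" concrete, here is a minimal sketch, assuming that similarity is measured with the MPdist measure listed in the keywords (as defined by Gharghabi et al.: a k-th smallest value of the joined matrix profile of two sequences, with k set to 5% of their combined length) and that the closest snippet is simply the one with the smallest distance. In SANNI this decision is learned by the Recognizer network, so the code below is not the authors' procedure; it only illustrates, in brute-force form, how a reference label could be computed for a complete subsequence. The helpers `znorm`, `znorm_dist`, `windows`, `mpdist`, and `nearest_snippet` are hypothetical names introduced for illustration.

```python
import math


def znorm(s):
    """Z-normalize a window; a constant window is mapped to all zeros (a simplification)."""
    m = len(s)
    mu = sum(s) / m
    sd = math.sqrt(sum((v - mu) ** 2 for v in s) / m)
    if sd == 0:
        return [0.0] * m
    return [(v - mu) / sd for v in s]


def znorm_dist(x, y):
    """Euclidean distance between the z-normalized versions of two equal-length windows."""
    a, b = znorm(x), znorm(y)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def windows(s, w):
    """All length-w sliding windows of s (assumes w <= len(s))."""
    return [s[i:i + w] for i in range(len(s) - w + 1)]


def mpdist(a, b, w, percentage=0.05):
    """Brute-force MPdist sketch: value at 0-based index k of the sorted joined AB/BA profile."""
    wa, wb = windows(a, w), windows(b, w)
    # P_AB: for each window of a, the distance to its nearest neighbor among windows of b.
    p_ab = [min(znorm_dist(x, y) for y in wb) for x in wa]
    # P_BA: the same in the opposite direction.
    p_ba = [min(znorm_dist(y, x) for x in wa) for y in wb]
    p = sorted(p_ab + p_ba)
    k = math.ceil(percentage * (len(a) + len(b)))
    # If k runs past the end of the joined profile, fall back to its maximum.
    return p[k] if k < len(p) else p[-1]


def nearest_snippet(subseq, snippets, w):
    """Index of the snippet with the smallest MPdist to subseq (hypothetical helper)."""
    dists = [mpdist(subseq, s, w) for s in snippets]
    return min(range(len(snippets)), key=dists.__getitem__)


if __name__ == "__main__":
    ramp = [0, 1, 2, 3, 4, 5, 6, 7]
    zigzag = [0, 1, 0, 1, 0, 1, 0, 1]
    query = [1, 0, 1, 0, 1, 0, 1, 0]
    print(nearest_snippet(query, [ramp, zigzag], w=4))  # prints 1
```

Because MPdist compares sets of short windows rather than aligning whole sequences, the phase-shifted `query` still matches `zigzag` exactly. The nearest-neighbor search here is quadratic in the series length and only serves to make the definition concrete; practical implementations compute the matrix profile with much faster algorithms.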
Section
Methods and algorithms of computational mathematics and their applications