
Anomalies detection in prognostic data analysis

Authors: Kuzovlev V.I., Orlov A.O. Published: 12.10.2016
Published in issue: #5(110)/2016  
DOI: 10.18698/0236-3933-2016-5-75-85

 
Category: Informatics, Computer Engineering and Control | Chapter: System Analysis, Control, and Information Processing  
Keywords: anomalies, outliers in data, prognostic analysis, decision tree model

Designing data models for prognostic purposes requires an anomaly detection method. This article describes how such a method is chosen and how it is applied within the decision tree model algorithm. The authors describe the methods for finding anomalies in data and also explain the basic steps of the algorithm itself. The work analyzes the search parameters and their major influence on the outcome of the method. Combining anomaly detection with the decision tree model algorithm increases the accuracy of the prognostic model, owing to improved model robustness and a significant improvement in the performance of the analysis.
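
The general idea of screening source data for anomalies before a prognostic model is built can be illustrated with a minimal sketch. It is not the authors' method: the frequency-based rarity score, the function names `anomaly_scores` and `drop_anomalies`, the `threshold` parameter, and the toy records are all assumptions made for illustration. The threshold stands in for the kind of search parameter whose choice affects the outcome.

```python
from collections import Counter


def anomaly_scores(records):
    """Score each categorical record by how rare its attribute values are.

    Hypothetical stand-in score: the mean, over attributes, of
    (1 - relative frequency of the record's value in that attribute).
    Values near 0 mean "common", values near 1 mean "rare".
    """
    if not records:
        return []
    n = len(records)
    n_attrs = len(records[0])
    # Count how often each value occurs in each attribute column.
    counts = [Counter(rec[i] for rec in records) for i in range(n_attrs)]
    scores = []
    for rec in records:
        rarity = [1.0 - counts[i][rec[i]] / n for i in range(n_attrs)]
        scores.append(sum(rarity) / n_attrs)
    return scores


def drop_anomalies(records, threshold):
    """Keep only records whose rarity score does not exceed the threshold."""
    scores = anomaly_scores(records)
    return [rec for rec, s in zip(records, scores) if s <= threshold]


# Toy data (made up): eight ordinary records and one unusual record.
records = [("sunny", "high")] * 4 + [("rain", "high")] * 4 + [("snow", "low")]
kept = drop_anomalies(records, threshold=0.6)
print(len(kept))  # 8: the ("snow", "low") record is filtered out
```

The filtered records would then be passed to whatever decision tree learner is in use. Lowering `threshold` removes more records and raising it removes fewer, so the value has to be chosen with the data set in mind.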
