SOLVING THE PROBLEM OF UNIVERSITY DOCUMENTS CLASSIFICATION BASED ON INTELLECTUAL ANALYSIS METHODS
https://doi.org/10.34822/1999-7604-2021-1-12-19
Abstract
The article considers the problem of classifying text documents of a higher educational institution using machine learning methods: the Bayes method, the k-nearest neighbors algorithm, the decision tree method, and support vector machines. The study is conducted in the Python programming language on a set of documents of the Siberian State Automobile and Highway University, Omsk. The documents are preprocessed and divided into four classes: orders, instructions, letters, and notices of a vacant position.
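As an illustration of one of the listed methods, the following minimal sketch trains a multinomial naive Bayes classifier on a toy corpus with the four document classes. It assumes that the "Bayes method" refers to a naive Bayes classifier. The sample texts, the class label names, the tokenize helper, and the NaiveBayesClassifier class are hypothetical and are not taken from the study, which does not state the libraries or preprocessing steps used.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of Latin letters as tokens.

    Assumption: English toy data; real university documents would need
    language-specific preprocessing.
    """
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.word_counts = defaultdict(Counter)      # token counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / self.total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocabulary)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy corpus: two short documents per class.
TRAIN = [
    ("the rector orders the enrollment of students", "order"),
    ("order on the dismissal of an employee", "order"),
    ("instruction on fire safety in laboratories", "instruction"),
    ("safety instruction for working with equipment", "instruction"),
    ("dear colleagues we are writing this letter to invite you", "letter"),
    ("letter of thanks to the partner organization", "letter"),
    ("notice of a vacant position of associate professor", "vacancy_notice"),
    ("vacant position announced at the department competition", "vacancy_notice"),
]

texts, labels = zip(*TRAIN)
model = NaiveBayesClassifier().fit(texts, labels)
print(model.predict("notice of a vacant position"))
```

The other three methods could be plugged into the same fit/predict interface, which would keep a later comparison of the classifiers uniform.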
The process of classifying the documents of a higher educational institution is presented. The classification results of each machine learning method are compared on the following metrics: accuracy (the share of correctly classified documents), precision, recall, F-measure, and the running time of the algorithm. Based on the results, the author gives recommendations on applying the considered methods to the classification of university documents and suggests using a combination of methods to optimize the operation of the document classifier.
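The quality metrics used in the comparison can be computed as in the sketch below. It reports the share of correct predictions together with precision, recall, and F-measure, and it measures running time with a small helper. Macro-averaging over the four classes is an assumption, since the abstract does not state how per-class values were combined; the evaluate and timed functions and the sample label lists are hypothetical.

```python
import time


def evaluate(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F-measure."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f_scores = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_score = (2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f_scores.append(f_score)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_measure": sum(f_scores) / n,
    }


def timed(func, *args):
    """Call func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Hypothetical true and predicted classes for eight documents.
y_true = ["order", "order", "instruction", "instruction",
          "letter", "letter", "vacancy_notice", "vacancy_notice"]
y_pred = ["order", "order", "instruction", "letter",
          "letter", "letter", "vacancy_notice", "order"]

scores, elapsed = timed(evaluate, y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
print(f"running time: {elapsed:.6f} s")
```

Wrapping each classifier's training and prediction calls in timed would give the running-time column of such a comparison alongside the quality metrics.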
About the Author
A. L. Tkachenko
Russian Federation
E-mail: tanaleo@mail.ru
For citations:
Tkachenko A.L. SOLVING THE PROBLEM OF UNIVERSITY DOCUMENTS CLASSIFICATION BASED ON INTELLECTUAL ANALYSIS METHODS. Proceedings in Cybernetics. 2021;(1 (41)):12-19. (In Russ.) https://doi.org/10.34822/1999-7604-2021-1-12-19