RESISTANCE OF THE ALGORITHMS FOR FEATURE SELECTION TO TYPE II ERRORS
https://doi.org/10.34822/1999-7604-2021-4-78-82
Abstract
The subject of the study is the efficiency of feature selection algorithms in regression problems, considered in terms of how often they detect false statistically significant dependencies.
The aim of the study is to build a suitable methodology, test it on generated data, and test the hypothesis that the frequency of type II errors depends on the distribution of the dependent variable. In total, 7 feature selection methods were studied: Simulated Annealing, Select Difference, Hill-Climbing, Las Vegas, Sequential Forward Selection, Select Slope, and Whale Optimization. Dependent variables were drawn from 8 distribution laws (Beta, Cauchy, exponential, Gamma, log-normal, normal, uniform, Weibull). The study found that, under a rigorous approach to assessing model quality, the probability of adopting a falsely significant model in practice is small.
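The paper's exact simulation design is not reproduced here. The following sketch illustrates the general Monte Carlo idea the abstract describes, under assumed settings: the sample size, the number of candidate features, the log-normal example distribution, and the use of plain sequential forward selection driven by an overall F-test are all illustrative assumptions, not the paper's parameters.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def overall_f_pvalue(X, y):
    """OLS fit with intercept; p-value of the overall F-test of the model."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    rss = resid @ resid
    tss = ((y - y.mean()) ** 2).sum()
    f = ((tss - rss) / k) / (rss / (n - k - 1))
    return stats.f.sf(f, k, n - k - 1)

def forward_select(X, y):
    """Greedy sequential forward selection: keep adding the feature
    that most reduces the overall F-test p-value."""
    n, p = X.shape
    chosen, best_p = [], 1.0
    improved = True
    while improved:
        improved = False
        for j in range(p):
            if j in chosen:
                continue
            pv = overall_f_pvalue(X[:, chosen + [j]], y)
            if pv < best_p:
                best_p, best_j, improved = pv, j, True
        if improved:
            chosen.append(best_j)
    return chosen, best_p

# y is pure noise (here: log-normal) and independent of X by construction,
# so any model the selector reports as "significant" is a false finding.
n, p, trials, alpha = 100, 10, 50, 0.05
false_hits = 0
for _ in range(trials):
    X = rng.normal(size=(n, p))
    y = rng.lognormal(size=n)
    _, pv = forward_select(X, y)
    false_hits += pv < alpha
rate = false_hits / trials
print(f"false-significance rate at alpha={alpha}: {rate:.2f}")
```

Because the selection step itself hunts for the smallest p-value, the naive rate reported here typically exceeds the nominal alpha; a rigorous assessment of model quality (e.g. validation on held-out data), as the abstract concludes, is what keeps such falsely significant models out of practical use.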
About the Author
A. D. Cheremukhin, Russian Federation
E-mail: ngieu.cheremuhin@yandex.ru
For citation:
Cheremukhin A.D. RESISTANCE OF THE ALGORITHMS FOR FEATURE SELECTION TO TYPE II ERRORS. Proceedings in Cybernetics. 2021;(4 (44)):78-82. (In Russ.) https://doi.org/10.34822/1999-7604-2021-4-78-82