Tree-based Machine Learning Ensembles and Feature Importance Approach for the Identification of Intrusions in UNR-IDD Dataset

  • Akinyemi OYELAKIN, Department of Computer Science, Al-Hikmah University, Ilorin
Keywords: Intrusion Detection Datasets, Intrusion Detection, Tree-based Algorithms, Ensemble Learners

Abstract

The detection of intrusions in network data using machine learning techniques has received considerable attention over the past decades. A key problem in the network security domain is the scarcity of representative datasets for testing and evaluation purposes. Despite several efforts by researchers to release datasets for benchmarking attack detection models, some of the released datasets still suffer from one limitation or another. Researchers at the University of Nevada therefore released the UNR-IDD dataset, which is argued to be free from some of the limitations of earlier datasets. This study proposes tree-based ensemble approaches for building binary intrusion identification models from the UNR-IDD dataset. Decision tree algorithms are used as base classifiers in the Extra Trees, Random Forest and AdaBoost-based intrusion detection models. The experimental analyses indicate that the three ensembles performed better when feature selection was used than when all features were applied. For instance, the Extra Trees model achieved an accuracy of 0.97, precision of 0.98, recall of 0.98 and F1-score of 0.98. Similarly, the Random Forest model achieved an accuracy of 0.98, precision of 0.98, recall of 0.99 and F1-score of 0.98, while the AdaBoost-based model achieved an accuracy of 0.96, precision of 0.96, recall of 0.99 and F1-score of 0.98. The Random Forest intrusion classification model thus achieved slightly better overall results than the other models built. It is concluded that the three homogeneous ensemble models achieve very promising results when feature importance is used as the attribute selection method.
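For illustration, the sketch below shows one way the described pipeline could be put together in Python with scikit-learn: feature-importance-based attribute selection followed by the three homogeneous tree-based ensembles evaluated on a binary target. The file name, the "Label" column and its "Normal" value are assumptions about a CSV export of UNR-IDD, and the selector threshold and hyperparameters are illustrative defaults, not the exact settings used in the study.

```python
# Minimal sketch, assuming scikit-learn and a CSV export of UNR-IDD with a
# binary "Label" column (file name and column/label values are assumptions).
import pandas as pd
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    RandomForestClassifier,
)
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("unr_idd.csv")                      # illustrative path
X = pd.get_dummies(df.drop(columns=["Label"]))       # one-hot encode categorical fields
y = (df["Label"] != "Normal").astype(int)            # binary target: intrusion vs. normal

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Feature-importance-based attribute selection: keep features whose
# impurity-based importance exceeds the mean importance (SelectFromModel default).
selector = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=42))
X_train_sel = selector.fit_transform(X_train, y_train)
X_test_sel = selector.transform(X_test)

# Three homogeneous tree-based ensembles; AdaBoost boosts shallow decision trees
# (use base_estimator= instead of estimator= on scikit-learn < 1.2).
models = {
    "Extra Trees": ExtraTreesClassifier(n_estimators=100, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "AdaBoost": AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=1), random_state=42
    ),
}

for name, model in models.items():
    model.fit(X_train_sel, y_train)
    pred = model.predict(X_test_sel)
    print(
        f"{name}: accuracy={accuracy_score(y_test, pred):.2f} "
        f"precision={precision_score(y_test, pred):.2f} "
        f"recall={recall_score(y_test, pred):.2f} "
        f"f1={f1_score(y_test, pred):.2f}"
    )
```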

References

Almutairi Y. S., Alhazmi B. & Munshi A. A. (2022). Network Intrusion Detection Using Machine Learning Techniques. Advances in Science and Technology Research Journal, 16(3), 193–206. https://doi.org/10.12913/22998624/149934
Banerjee M., Reynolds E., Andersson H. B. & Nallamothu B. K. (2019). Tree-Based Analysis. Circulation: Cardiovascular Quality and Outcomes, 12(5), e004879. https://doi.org/10.1161/CIRCOUTCOMES.118.004879
Bauer E. & Kohavi R..(1999). comparison of voting classification algorithms: Bagging, boosting and variants. Machine An empirical Learning, 36(1/2), 525–536
Breiman L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140.
Das T., Hamdan O. A., Shukla R. M., Sengupta S. & Arslan E. (2023). UNR-IDD: Intrusion Detection Dataset using Network Port Statistics. In 2023 IEEE 20th Consumer Communications & Networking Conference (CCNC), Las Vegas, NV, USA, pp. 497–500. https://doi.org/10.1109/CCNC51644.2023.10059640
Dong G, & Liu H (2018). Feature engineering for machine learning and data analytics. Boca Raton: CRC Press; 2018.
Freund Yoav & Schapire Robert E. (1999).A Short Introduction to Boosting, Journal of Japanese Society for Artificial Intelligence, 14(5):771-780, September, 1999. (In Japanese, translation by Naoki Abe, 1-14, https://cseweb.ucsd.edu/~yfreund/papers/IntroToBoosting.pdf
Freund, Y., & Schapire, R. (1996). Experiments with anew boosting algorithm. In Proceedings of the ThirteenthInternational Conference on Machine Learning, pp. 148–156 Bari, Italy.
Alazzam H., Sharieh A. & Sabri K. E. (2020). A feature selection algorithm for intrusion detection system based on Pigeon Inspired Optimizer. Expert Systems With Applications. https://doi.org/10.1016/j.eswa.2020.11324
Jimoh R. G., Oyelakin A. M., Abikoye O. C., Akanbi M. B., Gbolagade M. D., Akanni A. O., Jibrin M. A. & Ogundele T. S. (2023). Efficient Ensemble-based Phishing Website Classification Models using Feature Importance Attribute Selection and Hyperparameter Tuning Approaches. Journal of Information Technology and Computing, 4(2), 1–10. https://doi.org/10.48185/jitc.v4i2.891
Kasongo S. M. & Sun Y. (2020). Performance Analysis of Intrusion Detection Systems Using a Feature Selection Method on the UNSW-NB15 Dataset. Journal of Big Data, 7, 105. https://doi.org/10.1186/s40537-020-00379-6
Khammassi C. & Krichen S. (2017). A GA-LR wrapper approach for feature selection in network intrusion detection. Computers & Security, 70, 255–277. https://doi.org/10.1016/j.cose.2017.06.005
Kotsiantis S. B. (2011). Feature selection for machine learning classification problems: a recent overview. Artificial Intelligence Review, Springer Science+Business Media B.V. https://doi.org/10.1007/s10462-011-9230-1. https://www.cs.upc.edu/~bejar/URL/articulos/Feature%20selection%202011%20Kotsiantis.pdf
Khammassi C & Krichen S. (2017). A GA-LR wrapper approach for feature selection in network intrusion detection. Computer Security, 2017; 70:255–77.
Oyelakin A. M., Akanbi M. B., Ogundele T. S., Akanni A. O., Gbolagade M. D., Rilwan M. D. & Jibrin M. A. (2023). A Machine Learning Approach for the Identification of Network Intrusions Based on Ensemble XGBOOST Classifier. Indonesian Journal of Data and Science, 4(3), 190–197.
Oyelakin A. M., Alimi M. O., Mustapha I. O. & Ajiboye I. K. (2021). Analysis of Single and Ensemble Machine Learning Classifiers for Phishing Attacks Detection. International Journal of Software Engineering and Computer Systems, 7(2), 44–49. https://doi.org/10.15282/ijsecs.7.2.2021.5.0088
Geurts P., Ernst D. & Wehenkel L. (2006). Extremely randomized trees. Machine Learning, 63(1), 3–42.
Singh P. (2019). Supervised Machine Learning. In: Learn PySpark. Springer, pp. 117–159.
Tahri R., Balouki Y., Jarrar A. & Lasbahani A. (2022). Intrusion Detection System Using Machine Learning Algorithms. ITM Web of Conferences, 46, 02003. https://doi.org/10.1051/itmconf/20224602003
Published
2024-05-29