INDONESIAN JOURNAL ON DATA SCIENCE https://ejournal.unjaya.ac.id/index.php/ijds <p><strong>Indonesian Journal of Data Science (IJDS) </strong><span class="Y2IQFc" lang="en">is a scientific journal that publishes research results in the field of data science. The scope of the journal includes:</span></p> <p><br>1. Big Data<br>2. Machine Learning<br>3. Data Mining<br>4. Deep Learning<br>5. Artificial Intelligence</p> Lembaga Penelitian dan Pengabdian Kepada Masyarakat Universitas Achmad Yani Yogyakarta en-US INDONESIAN JOURNAL ON DATA SCIENCE 2987-7423 Tree-based Machine Learning Ensembles and Feature Importance Approach for the Identification of Intrusions in UNR-IDD Dataset https://ejournal.unjaya.ac.id/index.php/ijds/article/view/1302 <p>Detecting intrusions in network data with machine learning techniques has attracted considerable attention over the past decades. One of the key problems in the network security domain is the availability of representative datasets for testing and evaluation. Despite several efforts by researchers to release datasets for benchmarking attack detection models, some of the released datasets still suffer from one limitation or another. In response, researchers at the University of Nevada released the UNR-IDD dataset, which they argue is free from some of the limitations of earlier datasets. This study proposes tree-based ensemble approaches for building binary intrusion identification models from the UNR-IDD dataset. Decision Tree algorithms are used as base classifiers in the Extra Trees, Random Forest and AdaBoost-based intrusion detection models. The experimental results indicate that the three ensembles performed very well when feature selection was used, compared with when all features were applied. For instance, the Extra Trees model achieved an accuracy of 0.97, precision of 0.98, recall of 0.98 and F1-score of 0.98. 
Similarly, the Random Forest model achieved an accuracy of 0.98, precision of 0.98, recall of 0.99 and F1-score of 0.98. The AdaBoost-based model had an accuracy of 0.96, precision of 0.96, recall of 0.99 and F1-score of 0.98. The Random Forest intrusion classification model achieved slightly better overall results than the other models built. It is concluded that the three homogeneous ensemble models achieved very promising results when feature importance was used as the attribute selection method.</p> Akinyemi OYELAKIN Copyright (c) 2024 INDONESIAN JOURNAL ON DATA SCIENCE 2024-05-29 2024-05-29 2 1 1 9 10.30989/ijds.v2i1.1302
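The abstract above reports accuracy, precision, recall and F1-score for each binary intrusion model. As a minimal sketch of how these four figures relate to one another, the following Python computes them from true and predicted labels. The function name `binary_metrics`, the `BinaryMetrics` container and the toy label vectors are hypothetical illustrations; they are not taken from the paper or from the UNR-IDD dataset, and the paper's own evaluation code may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary task.

    `positive` is the label treated as the attack (intrusion) class;
    that choice is an assumption of this sketch.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    # Tally the four confusion-matrix cells.
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1

    accuracy = (tp + tn) / len(y_true)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return BinaryMetrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Toy labels only (1 = intrusion, 0 = normal); not real UNR-IDD data.
    y_true = [1, 1, 1, 0, 0, 1, 0, 1]
    y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
    print(binary_metrics(y_true, y_pred))
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two, which is a quick sanity check when reading reported figures.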