ON THE COMPREHENSIVE ANALYSES OF CTU-13 BOTNET DATASET FOR CYBER SECURITY RESEARCHES

Authors

  • Jimoh Rasheed Gbenga, Department of Computer Science, University of Ilorin, Ilorin, Nigeria
  • Akinyemi OYELAKIN, Department of Computer Science, Crescent University, Abeokuta, Nigeria

Keywords:

Botnet Classification, Exploratory Data Analysis, Machine Learning Methods, Dataset Imbalance

Abstract

Attackers use malware to launch attacks on the internet and on corporate networks. Over the years, machine learning techniques have shown promise for classifying these attacks because they can identify previously unknown threats. Botnets are networks of compromised devices and are powerful threat vectors against modern systems; their command and control (C2) mechanisms make them very difficult to detect. Attack detection models are generally built from intrusion datasets, so a comprehensive study of the benchmark datasets used in intrusion detection research can provide actionable insights to other researchers. Previous studies have analysed datasets for building intrusion detection systems. However, less attention has been paid to the analysis of datasets used specifically for botnet detection. This study reported an overview of a popular botnet dataset named CTU-13 and then carried out a detailed exploratory analysis of it. The study also sought to determine whether the dataset is representative enough for machine learning-based botnet detection studies. All thirteen scenarios in the dataset were used in the experiments, and each scenario was analysed separately to gain a better understanding of the patterns and characteristics of its data. The information obtained from the overview and the exploratory analyses provided actionable insights on how to use the dataset more effectively for improved botnet classification. The challenges of using the captures of the dataset were also identified. In particular, the exploratory investigation of the thirteen captures of the CTU-13 dataset revealed that it has very complex patterns, contains mixed data types, and suffers from a severe class imbalance problem. The results of the exploratory analyses can guide the decisions of future cyber security research. Improved machine learning-based botnet detection models can thus be built by addressing these issues in the dataset.
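
The two dataset issues named above, mixed data types and class imbalance, can be checked with a very small exploratory pass. The following is a minimal sketch and not the authors' procedure: the helper names `summarize_labels` and `column_kinds`, the column names, and the toy flow records are all illustrative assumptions, and no real CTU-13 values are used.

```python
from collections import Counter


def summarize_labels(labels):
    """Return per-class counts and the majority-to-minority imbalance ratio."""
    counts = Counter(labels)
    majority = max(counts.values())
    minority = min(counts.values())
    return counts, majority / minority


def column_kinds(rows):
    """Mark a column 'numeric' if every value parses as a float, else 'categorical'."""
    kinds = {}
    for name in rows[0]:
        try:
            for row in rows:
                float(row[name])
            kinds[name] = "numeric"
        except ValueError:
            kinds[name] = "categorical"
    return kinds


# Toy flow records (hypothetical columns and values, chosen only for illustration).
background = {"Dur": "0.5", "Proto": "udp", "TotBytes": "120", "Label": "Background"}
normal = {"Dur": "1.2", "Proto": "tcp", "TotBytes": "980", "Label": "Normal"}
botnet = {"Dur": "3.0", "Proto": "tcp", "TotBytes": "64", "Label": "Botnet"}
rows = [background] * 8 + [normal, botnet]

counts, ratio = summarize_labels(row["Label"] for row in rows)
kinds = column_kinds(rows)

print(dict(counts))  # class frequencies
print(ratio)         # how many majority-class flows exist per minority-class flow
print(kinds)         # which columns are numeric and which are categorical
```

A large ratio signals that accuracy alone would be a misleading metric, and the categorical columns are the ones that would need encoding before most classifiers can use them.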

Published

2025-06-02