INDONESIAN JOURNAL ON DATA SCIENCE https://ejournal.unjaya.ac.id/index.php/ijds <p><strong>Indonesian Journal on Data Science (IJDS) </strong><span class="Y2IQFc" lang="en">is a scientific journal that publishes research results in the field of data science. The scope of the journal includes:</span></p> <p><br>1. Big Data<br>2. Machine Learning<br>3. Data Mining<br>4. Deep Learning<br>5. Artificial Intelligence</p> en-US ulfiaesyi@gmail.com (Ulfi Saidata Aesyi) muhammadhabibi17@gmail.com (Muhammad Habibi) Wed, 09 Oct 2024 12:50:39 +0700 OJS 3.1.2.1 http://blogs.law.harvard.edu/tech/rss 60 Tree-based Machine Learning Ensembles and Feature Importance Approach for the Identification of Intrusions in UNR-IDD Dataset https://ejournal.unjaya.ac.id/index.php/ijds/article/view/1302 <p>Detecting intrusions in network data using machine learning techniques has gained great attention in the past decades. One of the key problems in the network security domain is the availability of representative datasets for testing and evaluation purposes. Despite several efforts by researchers to release datasets that can be used for benchmarking attack detection models, some of the released datasets still suffer from one limitation or another. In response, researchers at the University of Nevada released the UNR-IDD dataset, which is argued to be free from some of the limitations of earlier datasets. This study proposes tree-based ensemble approaches for building binary intrusion identification models from the UNR-IDD dataset. Decision Tree algorithms are used as base classifiers in the Extra Trees, Random Forest and AdaBoost-based intrusion detection models. The experimental results indicate that the three ensembles performed better when feature selection was used than when all features were applied. 
For instance, the Extra Trees model achieved an accuracy of 0.97, precision of 0.98, recall of 0.98 and F1-score of 0.98. Similarly, the Random Forest model achieved an accuracy of 0.98, precision of 0.98, recall of 0.99 and F1-score of 0.98. The AdaBoost-based model had an accuracy of 0.96, precision of 0.96, recall of 0.99 and F1-score of 0.98. The Random Forest intrusion classification model therefore achieved slightly better overall results than the other models built. It is concluded that the three homogeneous ensemble models achieved very promising results when feature importance was used as the attribute selection method.</p> Akinyemi OYELAKIN Copyright (c) 2024 INDONESIAN JOURNAL ON DATA SCIENCE https://ejournal.unjaya.ac.id/index.php/ijds/article/view/1302 Wed, 29 May 2024 09:23:51 +0700 ANALISIS TRANSFER DATA PADA JARINGAN TERDAMPAK ARP SPOOFING MENGGUNAKAN METODE ARP POISONING DAN STATISTIK DESKRIPTIF https://ejournal.unjaya.ac.id/index.php/ijds/article/view/1375 <table> <tbody> <tr> <td> <p><em>Computer network security is very important and needs to be considered in the development of computer networks. Networks connected to network devices are usually vulnerable to hacking. Hacking is an activity that allows a person or group to change or take data for personal gain. The aim of this research is to carry out testing and analysis to determine the condition and measure the level of security of the ITDA Yogyakarta intra-campus information system and computer network. It also describes security gaps and identifies the security weaknesses that need immediate repair, to help correct failures in maintaining the security of ITDA Yogyakarta intra-campus information systems and networks. This research uses descriptive statistics with 20 PC units as samples. Four tests were carried out in this study, succeeding on a total of 16 out of 20 samples. 
From the results of ARP spoofing on the local network, it can be concluded that once an attacker infiltrates the local network using the ARP spoofing method, the target's traffic is redirected to the attacker's device. This can allow attackers to monitor and understand the contents of data traffic on the local network. Changing the attacker's MAC address is essential, because if the MAC address is not replaced, network traffic will not be redirected to the attacker's device.</em></p> </td> </tr> </tbody> </table> Sudaryanto, Dwi Nugraheny Copyright (c) 2024 INDONESIAN JOURNAL ON DATA SCIENCE https://ejournal.unjaya.ac.id/index.php/ijds/article/view/1375 Tue, 09 Jul 2024 17:39:33 +0700 Metode Latent Dirichlet Allocation Untuk Menentukan Topik Pada Review Drama Korea https://ejournal.unjaya.ac.id/index.php/ijds/article/view/1345 <p>The Hallyu Wave, the spread of South Korean culture and popular media, has grown rapidly over the past two decades. In addition to entertainment industries such as K-pop and K-drama, this phenomenon has also extended into the food and K-beauty sectors. Korean dramas, as the core of Hallyu, have become a global phenomenon with a continuously expanding fan base worldwide. A global survey in 2022 indicated that 36 percent of respondents in 26 countries considered Korean dramas very popular in their respective countries. In Indonesia, Korean films and dramas remain favorites, with 72 percent of streaming audiences choosing them on OTT services throughout 2022. Viu dominates as the most popular Korean drama streaming platform with 57 percent usage, followed by Netflix, Telegram, and WeTV. This research analyzes Korean drama review data from 2015 to 2023 using the Latent Dirichlet Allocation (LDA) method. The goal is to provide a deep understanding of critical aspects such as acting, storyline, and cinematography. 
With LDA, this research aims to identify topics related to these elements, offering specific insights into audience preferences. Using topic coherence to check topic consistency, 10 topics emerged as ideal out of the 20 topics evaluated. From the topic coherence results for these 20 topics, it can be concluded that the coherence score for 10 topics is 0.527, the ideal result for topic modeling under this criterion.</p> Alfun Roehatul Jannah, Ria Kristi, Muhammad Habibi Copyright (c) 2024 INDONESIAN JOURNAL ON DATA SCIENCE https://ejournal.unjaya.ac.id/index.php/ijds/article/view/1345 Wed, 07 Aug 2024 14:12:31 +0700 ANALISIS PROYEKSI KEBUTUHAN TENAGA KERJA BERDASARKAN SKILLS YANG DIBUTUHKAN MENGGUNAKAN ALGORITMA NAIVE BAYES CLASSIFIER https://ejournal.unjaya.ac.id/index.php/ijds/article/view/1346 <p><strong><em>In August 2023, Indonesia had 7.86 million unemployed people, although the unemployment percentage has decreased from the previous year. The unemployed are categorized into four groups: those who are looking for work, those trying to set up a business, those having trouble landing a job, and those who have a job but have not yet started. The Covid-19 pandemic shifted the work paradigm toward remote work, but access to job information remains key. Labor demand projections provide long-term insights into promising sectors and fields, guiding job seekers to develop skills according to labor market trends. This research was conducted using Naive Bayes classification, a text classification method that relies on the likelihood of keywords to compare training and testing documents. 
This classification method is expected to help reduce unemployment and align individual skills with industry needs, and to inform education and training policies that support smart career decisions in the digital era.</em></strong></p> Nur Azizah Firdausa, Ribka Rifanny Br Girsang, Dela Oktaviana, Astr Wahyuningsiam, Muhammad Habibi Copyright (c) 2024 INDONESIAN JOURNAL ON DATA SCIENCE https://ejournal.unjaya.ac.id/index.php/ijds/article/view/1346 Fri, 09 Aug 2024 09:22:48 +0700 Pemetaan Opini Publik Menggunakan Data Mining: Studi Kasus Naturalisasi Pemain Sepak Bola dengan K-Means dan Naive Bayes Classifier https://ejournal.unjaya.ac.id/index.php/ijds/article/view/1318 <p>Naturalization is one of the processes by which foreign nationals become legally recognized Indonesian citizens (Warga Negara Indonesia, WNI). The Indonesian national football team (Timnas Indonesia) currently has several naturalized players. Some groups welcome their presence, seeing it as a strategic step to improve the team's quality and competitiveness. Others are skeptical and doubt that support for local players will be sustained. Data drawn from 3584 YouTube comments, collected through the YouTube Data API, reflect a diversity of opinion that can give a deeper picture of the dynamics of public views. This research is important for understanding public views on the naturalization of national team football players. Using data mining techniques, in particular K-Means Clustering and the Naive Bayes Classifier, this research provides in-depth insight into groups of people with similar or differing perspectives on the issue. The results of the K-Means Clustering process are used as initial labels to train the Naive Bayes Classifier model. Model performance was evaluated using a confusion matrix, yielding an accuracy of 93.17% and an error rate of 6.83%. 
This process was applied to the dataset of YouTube comments that had been labeled through K-Means Clustering. Classification with the Naive Bayes method shows that 3328 comments agree with player naturalization and 256 comments disagree.</p> Tegar Agustian, Emilia Fresia Nandela, Stani A. Sinay, Muhammad Habibi Copyright (c) 2024 INDONESIAN JOURNAL ON DATA SCIENCE https://ejournal.unjaya.ac.id/index.php/ijds/article/view/1318 Sat, 10 Aug 2024 00:00:00 +0700
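The last abstract describes training a Naive Bayes classifier on labels produced by clustering and then reporting accuracy and error rate. The following is a minimal sketch of that idea, not the authors' implementation: the toy comments, the label values (0 for agree, 1 for disagree) and the function names `train_nb`, `predict_nb` and `accuracy_and_error` are all hypothetical, and the clustering step is assumed to have already produced `CLUSTER_LABELS`.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy comments; the real study used 3584 YouTube comments.
TRAIN_DOCS = [
    "agree naturalization good",
    "good support team",
    "disagree local players",
    "reject naturalization local players",
]
# Assumed output of a prior clustering step (0 = agree, 1 = disagree).
CLUSTER_LABELS = [0, 0, 1, 1]


def train_nb(docs, labels):
    """Collect per-class document and word counts for multinomial Naive Bayes."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, doc):
    """Return the class with the highest log-posterior (Laplace smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in doc.lower().split():
            if token in vocab:  # words never seen in training are skipped
                count = word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy_and_error(y_true, y_pred):
    """Accuracy is the share of matching labels; error rate is its complement."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    return accuracy, 1.0 - accuracy


if __name__ == "__main__":
    model = train_nb(TRAIN_DOCS, CLUSTER_LABELS)
    predictions = [predict_nb(model, doc) for doc in TRAIN_DOCS]
    print(accuracy_and_error(CLUSTER_LABELS, predictions))
```

Because the labels come from clustering rather than human annotation, the accuracy measured this way reflects agreement with the cluster assignments, not with ground-truth opinions.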