Machine Learning Application for Classification Prediction of Household’s Welfare Status
Abstract
The Government of Indonesia has attempted various approaches to eradicate poverty throughout the country, one of which is the equitable distribution of social assistance to target households according to their social welfare status classification. This research re-evaluates a prior evaluation of five well-known machine learning techniques (Naïve Bayes, Random Forest, Support Vector Machines, K-Nearest Neighbor, and the C4.5 algorithm) on how well they predict social welfare status classifications. The best-performing technique is then implemented in an executable machine learning application that predicts the user’s social welfare status. Further objectives are to analyze the reliability of the chosen algorithm in predicting a new data set and to generate a simple classification-prediction application. This research uses the Python programming language, the Scikit-Learn library, Jupyter Notebook, and PyInstaller to carry out all methodology processes. The results show that the Random Forest algorithm is the best machine learning technique for predicting a household’s social welfare status, with a classification accuracy of 74.20%, and that the resulting application correctly predicted the social welfare status for 60.00% of 40 user entries.
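The comparison described in the abstract can be sketched roughly as follows with Scikit-Learn. This is only an illustration under stated assumptions, not the authors' code: the data is synthetic (the household data set is not reproduced here), the four-class target, the hyperparameters, and the helper name `compare_classifiers` are hypothetical, and because Scikit-Learn does not ship a C4.5 implementation, an entropy-based decision tree stands in for it.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the household data (assumption: 4 welfare classes).
X, y = make_classification(
    n_samples=400, n_features=12, n_informative=6,
    n_classes=4, random_state=0,
)

# The five techniques named in the abstract, with illustrative settings.
candidates = {
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Support Vector Machine": SVC(),
    "K-Nearest Neighbor": KNeighborsClassifier(n_neighbors=5),
    # Scikit-Learn has no C4.5; an entropy-based tree is a rough substitute.
    "Decision Tree (entropy)": DecisionTreeClassifier(
        criterion="entropy", random_state=0
    ),
}


def compare_classifiers(models, X, y, folds=5):
    """Return the mean cross-validated accuracy for each model (hypothetical helper)."""
    return {
        name: float(cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean())
        for name, model in models.items()
    }


scores = compare_classifiers(candidates, X, y)
best_name = max(scores, key=scores.get)

# Report the models from most to least accurate, then the selected one.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.2%}")
print("Best:", best_name)
```

Which model ranks first on this synthetic data says nothing about the paper's result; the sketch only shows how one accuracy figure per technique can be obtained and how the best-performing one is selected for later packaging.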
Article Details
Please find the rights and licenses in the Journal of Information Technology and Computer Engineering (JITCE).
1. License
Non-commercial use of the article is governed by the Creative Commons license as currently displayed in the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
2. Author(s)’ Warranties
The author(s) warrant that the article is original, written by the stated author(s), has not been published before, contains no unlawful statements, does not infringe the rights of others, is subject to copyright that is vested exclusively in the author and is free of any third-party rights, and that any necessary permissions to quote from other sources have been obtained by the author(s).
3. User Rights
JITCE adopts the spirit of open access and open science and disseminates published articles as freely as possible under the Creative Commons license. JITCE permits users to copy, distribute, display, and perform the work for non-commercial purposes only. Users must also attribute the authors and JITCE when distributing works from the journal.
4. Rights of Authors
Authors retain the following rights:
- copyright and other proprietary rights relating to the article, such as patent rights,
- the right to use the substance of the article in the author's own future works, including lectures and books,
- the right to reproduce the article for the author's own purposes,
- the right to self-archive the article,
- the right to enter into separate, additional contractual arrangements for the non-exclusive distribution of the article's published version (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal (Journal of Information Technology and Computer Engineering).
5. Co-Authorship
If the article was jointly prepared with other authors, then upon submitting the article the submitting author agrees to this form, warrants that he/she has been authorized by all co-authors to act on their behalf, and agrees to inform his/her co-authors. JITCE will be free of liability in any dispute that arises regarding this issue.
7. Royalties
By submitting the article, the authors agree that no fees are payable from JITCE.
8. Miscellaneous
JITCE will publish the article (or have it published) in the journal if the article’s editorial process is successfully completed and JITCE or its sublicensee has become obligated to have the article published. JITCE may adjust the article to a style of punctuation, spelling, capitalization, referencing and usage that it deems appropriate. The author acknowledges that the article may be published so that it will be publicly accessible and such access will be free of charge for the readers.