Machine Learning Application for Classification Prediction of Household’s Welfare Status

Nofriani Nofriani

Abstract

Various approaches have been attempted by the Government of Indonesia to eradicate poverty throughout the country, one of which is the equitable distribution of social assistance to target households according to their classification of social welfare status. This research aims to re-evaluate a prior evaluation of five well-known machine learning techniques (Naïve Bayes, Random Forest, Support Vector Machines, K-Nearest Neighbor, and the C4.5 algorithm) on how well they predict classifications of social welfare status. The best-performing technique is then implemented in an executable machine learning application that predicts a user's social welfare status. Further objectives are to analyze the reliability of the chosen algorithm in predicting new data sets and to generate a simple classification-prediction application. This research uses the Python programming language, the Scikit-Learn library, Jupyter Notebook, and PyInstaller to carry out all the methodological processes. The results show that the Random Forest algorithm is the best machine learning technique for predicting households' social welfare status, with a classification accuracy of 74.20%, and that the resulting application correctly predicted the social welfare status of 60.00% of 40 user entries.
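The comparison described above can be sketched with Scikit-Learn, the library the paper uses. This is a minimal illustration on synthetic data, not the paper's actual experiment: the household welfare data set, features, and hyperparameters are not available here, so `make_classification` stands in for the real data, and since Scikit-Learn has no C4.5 implementation, an entropy-criterion decision tree is used as its closest available analogue.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the household welfare data set (not the real data).
X, y = make_classification(n_samples=1000, n_features=10, n_informative=6,
                           n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# The five techniques compared in the paper, in their Scikit-Learn forms.
models = {
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Support Vector Machines": SVC(),
    "K-Nearest Neighbor": KNeighborsClassifier(),
    # Scikit-Learn has no C4.5; an entropy-based tree is the nearest analogue.
    "Decision Tree (entropy)": DecisionTreeClassifier(
        criterion="entropy", random_state=42),
}

# Fit each model and score its held-out classification accuracy.
scores = {name: accuracy_score(y_test, m.fit(X_train, y_train).predict(X_test))
          for name, m in models.items()}
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.2%}")
```

The best-scoring model would then be serialized (e.g. with `joblib`) and wrapped into a standalone executable with PyInstaller, as the paper describes.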

Article Details

How to Cite
Nofriani, N. (2020, September 30). Machine Learning Application for Classification Prediction of Household’s Welfare Status. JITCE (Journal of Information Technology and Computer Engineering), 4(02), 72-82. https://doi.org/10.25077/jitce.4.02.72-82.2020