Classifying perceptions from Stakeholders about Brazil’s future using machine learning
DOI:
https://doi.org/10.5380/atoz.v12i0.84075
Keywords:
Machine learning, Classification, Brazil, Stakeholders.
Abstract
This paper compares five machine learning (ML) techniques for classifying stakeholders' perceptions of Brazil's future: artificial neural networks, k-nearest neighbors, naïve Bayes, random forest, and support vector machines. The techniques were applied to a World Bank dataset on Brazil's development. The dataset was preprocessed and configured in two versions: the first contained a subset of attributes manually selected by the authors, whereas the second was composed of attributes selected by ranking them according to information gain. All ML techniques performed better on the second version of the dataset. Within each version of the dataset, however, all techniques performed similarly. The research also found that the most relevant attributes relate to business opportunities, development indexes associated with critical subjects, and trust in institutions and organizations.
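The information gain ranking mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the attribute names (`trust_in_institutions`, `region`), and the four toy responses are hypothetical and are not taken from the World Bank dataset. The sketch assumes categorical attributes and computes the gain as the entropy of the class minus its expected entropy after splitting on one attribute.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Class entropy minus the expected class entropy after splitting on one attribute."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(rows, labels):
    """Return (attribute, gain) pairs sorted from most to least informative."""
    gains = {name: information_gain([r[name] for r in rows], labels) for name in rows[0]}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy survey responses (assumption, NOT the World Bank data).
rows = [
    {"trust_in_institutions": "high", "region": "south"},
    {"trust_in_institutions": "high", "region": "north"},
    {"trust_in_institutions": "low", "region": "south"},
    {"trust_in_institutions": "low", "region": "north"},
]
labels = ["optimistic", "optimistic", "pessimistic", "pessimistic"]

ranking = rank_attributes(rows, labels)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In this toy data the hypothetical `trust_in_institutions` attribute separates the two classes perfectly and ranks first, while `region` carries no information about the class. A selection step of the kind the abstract describes would keep the top-ranked attributes and pass them to each classifier.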
License
AtoZ is an open access journal. Authors are permitted and encouraged to deposit their papers on personal web pages, in institutional repositories, or on portals before (pre-print) or after (post-print) publication in AtoZ. They are asked only to cite the AtoZ journal as a bibliographic reference, including the attributed URL, when and where possible.
The authors license AtoZ for the sole purpose of disseminating the published work (peer-reviewed version/post-print) in aggregation, curation, and indexing systems.
AtoZ is a Diadorim/IBICT green academic journal.
All journal content (including instructions, editorial policies, and templates), except where otherwise indicated, has been under a Creative Commons Attribution 4.0 International license since October 2020.
Articles published by this journal may be freely shared (copied and redistributed in any medium or format for any purpose, even commercially) and adapted (remixed, transformed, and built upon for any purpose, even commercially). You must give appropriate credit, provide a link to the license, and indicate if changes were made.
AtoZ does not charge for manuscript submission, processing, or publication.