Data mining and the quality of extracted knowledge from police reports of Brazilian Federal Highways
DOI:
https://doi.org/10.5380/atoz.v3i2.41346
Keywords:
Open Government Data, Data Mining, Association Rules, Knowledge Discovery in Databases
Abstract
Introduction: This paper presents and analyzes the results of applying the Data Mining process to the police occurrence reports for Brazilian federal highways generated by the Federal Highway Police (PRF) in 2012. The purpose of this work is to assess the feasibility of applying the Data Mining process to data provided by the PRF in order to identify associations between variables related to traffic accidents on all Brazilian federal highways. Method: Symbolic supervised learning algorithms and an association rule generation algorithm, all implemented in the Weka tool, were used. From the database, only the records from 2012 were used. This portion of the database was preprocessed, the preprocessed data were used to extract models and patterns in Weka, and, lastly, the extracted models and patterns were evaluated. Results: In supervised learning, the results obtained with the J48 and PART algorithms were considered promising because, for all classes of accident causes, the area under the ROC curve (AUC) was above 0.5, the value expected from random guessing. Furthermore, the Apriori algorithm generated 38 association rules with confidence greater than 0.8. Conclusions: It is important to propose a data distribution model for this database so that it can be used for the data mining process as well as for other knowledge extraction and decision-making tasks. The study also noted the need to improve data quality from the initial stage of data gathering, that is, in the very systems used to record the data.
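The two measures quoted in the abstract, rule confidence and AUC, can be illustrated with a minimal Python sketch. This is not the authors' Weka workflow: the records, attribute names, and the helper functions `count`, `confidence`, and `auc` are hypothetical and were chosen only to show how each number is computed.

```python
from typing import FrozenSet, List, Sequence

Record = FrozenSet[str]

# Hypothetical toy records: these attribute=value items are invented for
# illustration and are NOT taken from the PRF data.
RECORDS: List[Record] = [
    frozenset({"weather=rain", "surface=wet", "cause=speeding"}),
    frozenset({"weather=rain", "surface=wet", "cause=inattention"}),
    frozenset({"weather=clear", "surface=dry", "cause=inattention"}),
    frozenset({"weather=rain", "surface=wet", "cause=speeding"}),
    frozenset({"weather=rain", "surface=dry", "cause=inattention"}),
]


def count(itemset: Record, records: Sequence[Record]) -> int:
    """Number of records that contain every item in `itemset`."""
    return sum(1 for record in records if itemset <= record)


def confidence(antecedent: Record, consequent: Record,
               records: Sequence[Record]) -> float:
    """Confidence of the rule antecedent -> consequent.

    This is the share of records matching the antecedent that also
    match the consequent.
    """
    covered = count(antecedent, records)
    if covered == 0:
        return 0.0  # the rule never fires, so report zero confidence
    return count(antecedent | consequent, records) / covered


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair, so a classifier that gives every
    example the same score gets exactly 0.5.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    rain = frozenset({"weather=rain"})
    wet = frozenset({"surface=wet"})
    min_confidence = 0.8  # threshold quoted in the abstract
    for lhs, rhs in [(rain, wet), (wet, rain)]:
        conf = confidence(lhs, rhs, RECORDS)
        kept = "kept" if conf > min_confidence else "discarded"
        print(sorted(lhs), "->", sorted(rhs), conf, kept)
    print("AUC:", auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

On this toy data the rule from wet surface to rain passes the 0.8 confidence threshold while the reverse rule does not, which shows that confidence is directional. The AUC helper makes the 0.5 baseline concrete: a classifier that cannot separate the classes scores 0.5, so values above it indicate some discriminative ability.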
License
AtoZ is an open access journal, and authors have permission and are encouraged to deposit their papers on personal web pages, in institutional repositories, or on portals before (pre-print) or after (post-print) publication in AtoZ. Authors are asked only, when and where possible, to mention the AtoZ journal as a bibliographic reference (including the attributed URL).
The authors license AtoZ for the sole purpose of disseminating the published work (peer-reviewed version/post-print) in aggregation, curation, and indexing systems.
AtoZ is a Diadorim/IBICT green academic journal.
All journal content (including instructions, editorial policies, and templates), except where otherwise indicated, has been under a Creative Commons Attribution 4.0 International license since October 2020.
When published by this journal, articles are free to share (copy and redistribute the material in any medium or format for any purpose, even commercially) and adapt (remix, transform, and build upon the material for any purpose, even commercially). You must give appropriate credit, provide a link to the license, and indicate if changes were made.
AtoZ does not charge any fees for manuscript submission, processing, or publication.