Application of data cleaning to improve data quality at the Brazilian Civil Aviation Aircraft Accidents Database
DOI:
https://doi.org/10.5380/atoz.v5i2.47303
Keywords:
Data cleansing, Data quality, Data cleansing methods
Abstract
Introduction: This study applies data cleaning techniques to the aeronautical accident records of Brazilian civil aviation in order to measure the degree of improvement in data quality. Method: A literature review on the concepts of data cleaning and data quality was carried out first. Data cleaning techniques were then applied to a database of 4601 records of aviation accidents that occurred in Brazilian civil aviation between 1979 and 2014. The improvement in data quality was measured using the metric "percent of improvement of data". Results: Across all attributes of the database, data quality improved by 9%; some attributes, such as aircraft weight, manufacturer and model, improved by more than 55% after the methodology was applied. Conclusion: Data cleaning techniques can be used to define policies for continuous improvement of databases and to improve decision-making in organizations that deal with aviation, particularly in the area of flight safety.
License
AtoZ is an open access journal, and authors are permitted and encouraged to deposit their papers in personal web pages, institutional repositories or portals before (pre-print) or after (post-print) publication in AtoZ. Authors are asked only to cite the AtoZ journal, when and where possible, as a bibliographic reference (including the attributed URL).
The authors license AtoZ for the sole purpose of disseminating the published work (peer-reviewed version/post-print) in aggregation, curation and indexing systems.
AtoZ is a Diadorim/IBICT green academic journal.
All journal content (including instructions, editorial policies and templates), except where otherwise indicated, has been under a Creative Commons Attribution 4.0 International license since October 2020.
When published by this journal, articles are free to share (copy and redistribute the material in any medium or format for any purpose, even commercially) and adapt (remix, transform, and build upon the material for any purpose, even commercially). You must give appropriate credit, provide a link to the license, and indicate if changes were made.
AtoZ does not charge any fees for manuscript submission, processing, or publication.
























