Indexing and information retrieval of theses and dissertations through noun phrases
DOI:
https://doi.org/10.5380/atoz.v1i1.41280
Keywords:
Noun phrase, Information retrieval, Automatic indexing, Theses and dissertations
Abstract
Introduction: Discusses the use of noun phrases in the automatic indexing of theses and dissertations deposited in the UFPE Digital Library of Theses and Dissertations (BDTD-UFPE), on the assumption that noun phrases are a better unit of knowledge for indexing and information retrieval than individual words, allowing a more adequate response to users' information needs when searching. It presents the state of the art on noun phrases and their automatic extraction, as well as their applicability to automatic indexing and information retrieval.
Method: Using a text analysis tool (OGMA), analyses the applicability of noun phrase extraction to the automatic indexing and information retrieval of theses and dissertations in the context of BDTD-UFPE. The tool was applied to abstracts of Law, Computer Science and Nutrition theses and dissertations, which allowed the research team to assess the extraction using three variables: the accuracy rate (percentage of relevant noun phrases extracted); the error rate (extracted strings that are not noun phrases); and the percentage of non-relevant noun phrases extracted.
Results: The extraction of noun phrases by OGMA performed differently for each graduate program, with the best accuracy rate for abstracts of Law theses and dissertations, followed by Computer Science and then Nutrition. This difference in performance can be partly explained by the different nature of the technical terms used in the abstracts.
Conclusions: It concludes that, although the available tools have limitations, automated extraction and indexing by noun phrases appears quite promising, since noun phrases serve as better descriptors of, and access points to, documents, eliminating the problems caused by the synonymy and polysemy of isolated words.
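The three evaluation variables named in the Method can be illustrated with a minimal sketch. It assumes that each extracted string has been manually labelled by an assessor and that all three rates share the same denominator (the total number of extracted strings); the abstract does not state the exact formulas, so the label names, the function `extraction_rates`, and the sample data below are hypothetical.

```python
from collections import Counter

# Hypothetical labels an assessor might assign to each extracted string.
RELEVANT = "relevant_np"          # a noun phrase useful as a descriptor
NON_RELEVANT = "non_relevant_np"  # a valid noun phrase, but not useful
NOT_NP = "not_np"                 # an extracted string that is not a noun phrase


def extraction_rates(labels):
    """Return the three rates as percentages of all extracted strings.

    Assumption: every rate is computed over the total number of
    extracted strings; the source does not specify the denominators.
    """
    if not labels:
        raise ValueError("no extracted strings to evaluate")
    counts = Counter(labels)
    total = len(labels)
    return {
        "accuracy": 100.0 * counts[RELEVANT] / total,
        "error_rate": 100.0 * counts[NOT_NP] / total,
        "non_relevant": 100.0 * counts[NON_RELEVANT] / total,
    }


# Made-up labels for ten extracted strings (not data from the study).
sample = [RELEVANT] * 6 + [NON_RELEVANT] * 3 + [NOT_NP] * 1
rates = extraction_rates(sample)
for name, value in rates.items():
    print(f"{name}: {value:.1f}%")
```

Under this assumption the three percentages sum to 100, which makes results for different graduate programs directly comparable.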
License
AtoZ is an open access journal. Authors are permitted and encouraged to deposit their papers on personal web pages, in institutional repositories, or on portals before (pre-print) or after (post-print) publication in AtoZ. When and where possible, authors are asked to cite the AtoZ journal as a bibliographic reference, including the assigned URL.
Authors license AtoZ for the sole purpose of disseminating the published work (peer-reviewed version/post-print) in aggregation, curation and indexing systems.
AtoZ is a Diadorim/IBICT green academic journal.
All journal content (including instructions, editorial policies and templates), except where otherwise indicated, has been under a Creative Commons Attribution 4.0 International license since October 2020.
Articles published by this journal are free to share (copy and redistribute the material in any medium or format for any purpose, even commercially) and adapt (remix, transform, and build upon the material for any purpose, even commercially). You must give appropriate credit, provide a link to the license, and indicate if changes were made.
AtoZ does not charge for manuscript submission, processing, or publication.
























