UnderRL Tagger: free software for POS tagging in Under-Resourced Languages (UnderRL Tagger: un software libre para etiquetar POS en Under-Resourced Languages)


Jorge Molina Mejía
Universidad de Antioquia
José Luis Pemberty Tamayo
Universidad de Antioquia


Abstract: This chapter presents a free software program for part-of-speech (POS) tagging in the many languages that lack automatic taggers. The program aims to facilitate work with corpora in these languages through Natural Language Processing. It gradually automates the manual tagging process by recalling and reusing tags that have already been assigned. It can also handle large amounts of text and generates output files in XML format with tags based on the EAGLES system.
Resumen (translated from Spanish): This chapter presents a free software program that can be used for POS tagging in the many languages that have no automatic taggers. The program seeks to facilitate work with corpora in these languages through computational linguistics. Its operation lets the manual tagging process become automatic little by little, thanks to a system that remembers and reuses tags; it likewise makes it possible to handle large amounts of text and to generate output files in XML format with tags based on the EAGLES system.
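The recall-and-reuse idea described in the abstract can be illustrated with a minimal Python sketch. It is not UnderRL Tagger's actual implementation: the function names (`tag_tokens`, `to_xml`), the `<text>`/`<w pos="...">` XML layout, the sample tokens, and the EAGLES-style tag strings are all assumptions made for illustration. The sketch also stores a single tag per word form, which ignores ambiguous words that a real tagger would need to handle.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a human annotator: EAGLES-style tags
# chosen only for illustration.
MANUAL_TAGS = {"el": "DA0MS0", "perro": "NCMS000", "come": "VMIP3S0"}


def tag_tokens(tokens, memory, ask):
    """Tag tokens, reusing remembered tags and asking only for unseen words."""
    tagged = []
    for token in tokens:
        key = token.lower()
        if key not in memory:
            # First encounter: fall back to manual tagging and remember it.
            memory[key] = ask(token)
        tagged.append((token, memory[key]))
    return tagged


def to_xml(tagged):
    """Serialize (token, tag) pairs as XML; the element names are assumed."""
    root = ET.Element("text")
    for token, tag in tagged:
        word = ET.SubElement(root, "w", pos=tag)
        word.text = token
    return ET.tostring(root, encoding="unicode")


memory = {}   # tag memory that persists across tagging sessions
asked = []    # records which tokens required manual input


def ask(token):
    """Simulate the annotator by looking up a predefined answer."""
    asked.append(token)
    return MANUAL_TAGS[token.lower()]


tagged = tag_tokens(["El", "perro", "come", "el", "perro"], memory, ask)
xml_output = to_xml(tagged)
print(xml_output)
```

In this sketch the annotator is consulted only three times for five tokens, because the second occurrences of "el" and "perro" are resolved from memory. As the memory grows over a corpus, a larger share of tokens is tagged without manual input, which is the gradual automation the abstract describes.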

Author Biography

Jorge Molina Mejía, Universidad de Antioquia

Jorge Mauricio Molina Mejía is an associate professor of linguistics at the University of Antioquia, where he teaches computational linguistics and Spanish as a foreign language. He coordinates the research group Corpus Ex Machina and is a member of the Committee of the Doctorate in Linguistics of the Faculty of Communications and Philology (University of Antioquia). His research fields are Computational Linguistics, Natural Language Processing, and the teaching of Spanish as a Foreign Language. He has written articles, book chapters, and books in these fields, notably the book "Lingüística computacional y de corpus: teorías, métodos y aplicaciones" (Editorial Universidad de Antioquia).



September 10, 2023
