Miguel Alexis Solano-Jiménez, Jose Julio Tobar-Cifuentes, Luz Marina Sierra-Martínez, C. Cobos-Lozada
Adaptation, Comparison, and Improvement of Metaheuristic Algorithms to the Part-of-Speech Tagging Problem
Revista Facultad de Ingeniería (Rev. Fac. Ing.), Vol. 29 (54), e11762, 2020. Tunja-Boyacá, Colombia. L-ISSN: 0121-1129, e-ISSN: 2357-5328.
DOI: https://doi.org/10.19053/01211129.v29.n54.2020.11762
Publication date: 2020-09-18
Author affiliations:
1 Universidad del Cauca (Popayán-Cauca, Colombia). miguelsolano@unicauca.edu.co. ORCID: 0000-0003-1936-3488
2 Universidad del Cauca (Popayán-Cauca, Colombia). josej@unicauca.edu.co. ORCID: 0000-0002-5436-0816
3 Ph. D. Universidad del Cauca (Popayán-Cauca, Colombia). lsierra@unicauca.edu.co. ORCID: 0000-0003-3847-3324
4 Ph. D. Universidad del Cauca (Popayán-Cauca, Colombia). ccobos@unicauca.edu.co. ORCID: 0000-0002-6263-1911
Citations: 0
Abstract
Part-of-Speech Tagging (POST) is a complex task in the preprocessing stage of Natural Language Processing applications. Tagging has been tackled with statistical and rule-based approaches, using a range of methods. More recently, metaheuristic algorithms have gained attention, having been used with good results in a wide variety of knowledge areas. Consequently, this research applied them to the POST problem to assign the best sequence of tags (roles) to the words of a sentence based on statistical information. The process was carried out in two cycles, each comprising four phases, which allowed metaheuristic algorithms such as Particle Swarm Optimization, Jaya, Random-Restart Hill Climbing, and a memetic algorithm that uses Global-Best Harmony Search as a global optimizer and Hill Climbing as a local optimizer to be adapted to the tagging problem. In the consolidation of each algorithm, preliminary experiments were carried out (using cross-validation) to adjust its parameters and thus evaluate it on the datasets of the complete tagged corpora: IULA (Spanish), Brown (English), and Nasa Yuwe (Nasa).
The results obtained by the proposed taggers were compared, and the Friedman and Wilcoxon statistical tests were applied, confirming that the proposed memetic tagger, GBHS Tagger, obtained better results in precision. The proposed taggers make an important contribution to POST for traditional languages (English and Spanish), non-traditional languages (Nasa Yuwe), and their application areas.
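To make the idea of tagging as a search problem concrete, the following is a minimal sketch of Random-Restart Hill Climbing over tag sequences. It is not the authors' implementation: the abstract does not specify the fitness function, so the log-probability score below, the toy `EMISSION` and `TRANSITION` tables, the tag names, and all function names are assumptions made purely for illustration.

```python
import math
import random

# Toy statistics (hypothetical values, NOT taken from the paper).
# EMISSION[word][tag] stands in for P(tag | word);
# TRANSITION[(prev, tag)] stands in for P(tag | previous tag).
EMISSION = {
    "the": {"DET": 0.99, "NOUN": 0.01},
    "dog": {"NOUN": 0.9, "VERB": 0.1},
    "barks": {"VERB": 0.7, "NOUN": 0.3},
}
TRANSITION = {
    ("<s>", "DET"): 0.6, ("<s>", "NOUN"): 0.3, ("<s>", "VERB"): 0.1,
    ("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.05, ("DET", "DET"): 0.05,
    ("NOUN", "VERB"): 0.6, ("NOUN", "NOUN"): 0.3, ("NOUN", "DET"): 0.1,
    ("VERB", "NOUN"): 0.4, ("VERB", "DET"): 0.5, ("VERB", "VERB"): 0.1,
}
SMOOTHING = 1e-6  # fallback probability for unseen pairs (assumption)


def fitness(words, tags):
    """Assumed fitness: sum of log emission and log transition scores."""
    score = 0.0
    prev = "<s>"
    for word, tag in zip(words, tags):
        score += math.log(EMISSION[word].get(tag, SMOOTHING))
        score += math.log(TRANSITION.get((prev, tag), SMOOTHING))
        prev = tag
    return score


def hill_climb(words, tags):
    """Accept any single-tag change that improves fitness until none does."""
    best = fitness(words, tags)
    improved = True
    while improved:
        improved = False
        for i, word in enumerate(words):
            for candidate in EMISSION[word]:
                if candidate == tags[i]:
                    continue
                neighbour = tags[:i] + [candidate] + tags[i + 1:]
                score = fitness(words, neighbour)
                if score > best:
                    tags, best, improved = neighbour, score, True
    return tags, best


def random_restart_hill_climbing(words, restarts=10, seed=0):
    """Run hill climbing from several random tag sequences; keep the best."""
    rng = random.Random(seed)
    best_tags, best_score = None, -math.inf
    for _ in range(restarts):
        start = [rng.choice(list(EMISSION[w])) for w in words]
        tags, score = hill_climb(words, start)
        if score > best_score:
            best_tags, best_score = tags, score
    return best_tags, best_score


if __name__ == "__main__":
    print(random_restart_hill_climbing(["the", "dog", "barks"]))
```

On this toy sentence every starting point climbs to the same sequence, `DET NOUN VERB`, so the restarts are redundant here; on realistic tag sets with local optima they are what lets the search escape a poor starting point. A memetic variant in the spirit of the abstract would presumably replace the random restarts with a population-based global optimizer and keep a local search like `hill_climb` as the refinement step.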
About the journal:
Revista Facultad de Ingenieria started in 1984 and is a publication of the School of Engineering at the University of Antioquia.
The main objective of the journal is to promote and stimulate the publication of national and international scientific research results. The journal publishes original articles resulting from scientific research and from experimental and/or simulation studies in engineering sciences, technology, and similar disciplines (Electronics, Telecommunications, Bioengineering, Biotechnology, Electrical, Computer Science, Mechanical, Chemical, Environmental, Materials, Sanitary, Civil, and Industrial Engineering).
In exceptional cases, the journal will publish insightful articles on important current subjects, or review articles that make a significant contribution to contextualizing the state of the art in a known relevant topic. Case reports will be published only when they relate to studies in which the validity of a methodology is being proven for the first time, or when a significant contribution to the knowledge of an unexplored system can be demonstrated.
All published articles have undergone a peer review process, carried out by experts recognized for their knowledge and contributions to the relevant field.
To adapt the journal to international standards, promote the visibility of published articles, and thereby have a greater impact on the global academic community, since November 1, 2013, the journal accepts only manuscripts written in English for review and publication.
Revista Facultad de Ingeniería (redin) is entirely financed by the University of Antioquia.
Since 2015, every article accepted for publication in the journal is assigned a DOI number.