Fernando E. Quiroz, Jr., Chona B. Sabinay, Jeneffer A. Sabonsolin
Building the Waray-waray Neural Language Model using Recurrent Neural Network
Mindanao Journal of Science and Technology, published 2023-06-23
DOI: https://doi.org/10.61310/mndjsteect.1170.23
Abstract
Language modeling in the Philippines is challenging because most Philippine languages are low-resourced. Tagalog and Cebuano are the only Philippine languages available on machine translation platforms such as Google Translate; Winaray, a language spoken in the Eastern Visayas region, is absent. This study therefore developed a Winaray language model that can be used in natural language processing tasks. The text corpus used to build the model was scraped from websites containing Winaray sentences (religious sites, local news sites, and Wikipedia). The model was trained using an encoder-decoder recurrent neural network with four sequential layers and 100 hidden neurons, and it reached a text prediction accuracy of 76.17%. Sentences generated by the model were also evaluated manually on linguistic quality dimensions such as grammaticality, non-redundancy, focus, structure, and coherence. The manual evaluation was promising, with a linguistic quality score of 3.66 (acceptable); however, the training corpus needs to be larger and to include texts from a wider range of genres.
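The abstract does not say how the text prediction accuracy was computed. One common reading, assumed here, is next-word accuracy: each sentence is split into (context, next word) pairs, and the score is the fraction of pairs for which the model's top prediction matches the true next word. The sketch below illustrates that reading only. The function names, the `context_size` parameter, and the placeholder tokens are hypothetical and are not taken from the paper, and `predict` stands in for whatever trained model is being scored.

```python
def make_pairs(sentences, context_size=3):
    """Split whitespace-tokenized sentences into (context, next_word) pairs.

    context_size is an assumed window length, not a value from the paper.
    """
    pairs = []
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i in range(1, len(tokens)):
            # Use up to context_size preceding tokens as the context.
            context = tuple(tokens[max(0, i - context_size):i])
            pairs.append((context, tokens[i]))
    return pairs


def prediction_accuracy(predict, pairs):
    """Return the fraction of pairs where predict(context) equals the target."""
    if not pairs:
        return 0.0
    correct = sum(1 for context, target in pairs if predict(context) == target)
    return correct / len(pairs)


# Placeholder tokens, not real Winaray text.
sentences = ["a b c", "a b d"]
pairs = make_pairs(sentences, context_size=2)


def toy_predict(context):
    """Hypothetical stand-in for a trained model's top-1 prediction."""
    return "b" if context == ("a",) else "c"


accuracy = prediction_accuracy(toy_predict, pairs)
```

In practice `predict` would wrap the trained network's highest-probability output, and the pairs would come from held-out sentences rather than the training corpus.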