Building the Waray-waray Neural Language Model using Recurrent Neural Network

Mindanao Journal of Science and Technology (IF 0.4, Q4, Multidisciplinary Sciences). Pub Date: 2023-06-23. DOI: 10.61310/mndjsteect.1170.23
Fernando E. Quiroz, Jr., Chona B. Sabinay, Jeneffer A. Sabonsolin
Citations: 0

Abstract

In the Philippines, language modeling is challenging because most of the country's languages are low-resourced. Tagalog and Cebuano are the only Philippine languages available on machine translation platforms such as Google Translate; Winaray, a language spoken in the Eastern Visayas region, is absent. This study therefore developed a Winaray language model that could be used in natural language processing tasks. The text corpus used to create the model was scraped from web sources containing Winaray sentences (religious and local news websites, and Wikipedia). The model was trained using an encoder-decoder recurrent neural network with four sequential layers and 100 hidden neurons, and its text prediction accuracy reached 76.17%. Sentences generated by the model were manually evaluated along linguistic quality dimensions such as grammaticality, non-redundancy, focus, structure, and coherence. The manual evaluation was promising, with linguistic quality reaching 3.66 (acceptable); however, the training data must be enlarged with texts from a variety of genres.
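To make the idea of a recurrent language model and its "text prediction accuracy" concrete, here is a minimal pure-Python sketch. It is not the authors' architecture: the abstract describes an encoder-decoder network with four sequential layers, whereas this is a single-layer, untrained Elman-style recurrent cell. The only detail taken from the abstract is the hidden size of 100. The function names, the weight initialisation, the placeholder tokens, and the definition of accuracy as the fraction of correct next-token predictions are all assumptions made for illustration.

```python
import math
import random

HIDDEN_SIZE = 100  # matches the "100 hidden neurons" in the abstract


def build_vocab(tokens):
    """Map each distinct token to an integer id, in first-seen order."""
    vocab = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


def init_params(vocab_size, hidden_size, seed=0):
    """Small random weights for a single-layer recurrent cell (untrained)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_xh": mat(vocab_size, hidden_size),   # one input row per token id
        "W_hh": mat(hidden_size, hidden_size),  # recurrent weights
        "W_hy": mat(hidden_size, vocab_size),   # hidden-to-output weights
    }


def step(params, token_id, h_prev):
    """One recurrent step: h_t = tanh(W_xh[x_t] + h_{t-1} W_hh)."""
    x_row = params["W_xh"][token_id]
    W_hh = params["W_hh"]
    n = len(h_prev)
    return [
        math.tanh(x_row[j] + sum(h_prev[i] * W_hh[i][j] for i in range(n)))
        for j in range(n)
    ]


def next_token_probs(params, h):
    """Softmax over the vocabulary given the current hidden state."""
    W_hy = params["W_hy"]
    vocab_size = len(W_hy[0])
    logits = [sum(h[i] * W_hy[i][k] for i in range(len(h))) for k in range(vocab_size)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prediction_accuracy(params, token_ids):
    """Fraction of positions where the most probable token is the true next token."""
    h = [0.0] * len(params["W_hh"])
    correct = 0
    for current, target in zip(token_ids, token_ids[1:]):
        h = step(params, current, h)
        probs = next_token_probs(params, h)
        predicted = max(range(len(probs)), key=probs.__getitem__)
        correct += predicted == target
    return correct / (len(token_ids) - 1)


# Placeholder tokens, not real Winaray text.
tokens = ["w1", "w2", "w3", "w1", "w2", "w4"]
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]
params = init_params(len(vocab), HIDDEN_SIZE)
accuracy = prediction_accuracy(params, ids)
print(f"untrained accuracy: {accuracy:.2f}")
```

Because the weights are random and no training happens, the printed accuracy is meaningless as a result; the sketch only shows how the hidden state carries context from one token to the next and how a next-token accuracy figure could be tallied.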
Source journal: Mindanao Journal of Science and Technology (Multidisciplinary Sciences)
CiteScore: 0.90
Self-citation rate: 0.00%
Articles published: 18