Indonesian Parsing using Probabilistic Context-Free Grammar (PCFG) and Viterbi-Cocke Younger Kasami (Viterbi-CYK)

D. E. Cahyani, L. Gumilar, Ajie Pangestu
Published in: 2020 3rd International Seminar on Research of Information Technology and Intelligent Systems (ISRITI)
Publication date: 2020-12-10
DOI: 10.1109/ISRITI51436.2020.9315395 (https://doi.org/10.1109/ISRITI51436.2020.9315395)
Citations: 3

Abstract

Parsing is a tool for understanding the grammatical patterns of natural language. Structural ambiguity often arises in parsing when sentence patterns are identified. Syntactic parsing with Probabilistic Context-Free Grammar (PCFG) and the Viterbi-Cocke-Younger-Kasami (Viterbi-CYK) algorithm is one approach to resolving structural ambiguity. Parsing also requires a large amount of Indonesian language resources as machine knowledge. This research builds a parser for Indonesian sentence patterns from an Indonesian tagged corpus and then resolves the ambiguity problem in Indonesian sentence pattern parsing using the PCFG and Viterbi-CYK algorithms. The corpus data is processed to obtain grammar rules using the PCFG algorithm. Each sentence in the corpus is then processed with the generated PCFG rules, and the Viterbi-CYK algorithm selects the parse tree with the highest probability. The evaluation measured the average similarity of production rules, and the highest value obtained was 92.95%. This shows that the parser successfully parses Indonesian sentences and can resolve structural ambiguity in the parsing of Indonesian sentence patterns.
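The two steps named in the abstract, estimating PCFG rules from a corpus and selecting the most probable tree with Viterbi-CYK, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the corpus is available as binarized trees (Chomsky normal form) and uses plain relative-frequency estimates. The tag names (S, NP, VP, PP, V, P), the toy sentence, and the function names `estimate_pcfg` and `viterbi_cyk` are illustrative assumptions that do not come from the paper.

```python
import math
from collections import defaultdict

def estimate_pcfg(trees):
    """Relative-frequency estimate: P(A -> rhs) = count(A -> rhs) / count(A)."""
    rule_counts = defaultdict(int)
    lhs_counts = defaultdict(int)

    def visit(node):
        label, children = node[0], node[1:]
        if len(children) == 1 and isinstance(children[0], str):
            rhs = (children[0],)                      # lexical rule: tag -> word
        else:
            rhs = tuple(child[0] for child in children)
            for child in children:
                visit(child)
        rule_counts[(label, rhs)] += 1
        lhs_counts[label] += 1

    for tree in trees:
        visit(tree)
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}

def viterbi_cyk(words, pcfg, start="S"):
    """Return (probability, tree) of the best parse, or (0.0, None) if none exists."""
    n = len(words)
    best = defaultdict(dict)  # best[(i, j)][A] = (log_prob, tree) over words[i:j]
    for i, word in enumerate(words):
        for (lhs, rhs), p in pcfg.items():
            if rhs == (word,):
                best[(i, i + 1)][lhs] = (math.log(p), (lhs, word))
    binary = [(lhs, rhs, math.log(p))
              for (lhs, rhs), p in pcfg.items() if len(rhs) == 2]
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):                 # split point
                for lhs, (b, c), logp in binary:
                    if b in best[(i, k)] and c in best[(k, j)]:
                        left, right = best[(i, k)][b], best[(k, j)][c]
                        score = logp + left[0] + right[0]
                        if lhs not in best[(i, j)] or score > best[(i, j)][lhs][0]:
                            best[(i, j)][lhs] = (score, (lhs, left[1], right[1]))
    if start not in best[(0, n)]:
        return 0.0, None
    score, tree = best[(0, n)][start]
    return math.exp(score), tree

# Hypothetical toy treebank: "saya melihat orang dengan teropong"
# ("I saw a person with binoculars"), a classic attachment ambiguity.
pp = ("PP", ("P", "dengan"), ("NP", "teropong"))
vp_attach = ("S", ("NP", "saya"),
             ("VP", ("VP", ("V", "melihat"), ("NP", "orang")), pp))
np_attach = ("S", ("NP", "saya"),
             ("VP", ("V", "melihat"), ("NP", ("NP", "orang"), pp)))

pcfg = estimate_pcfg([vp_attach, vp_attach, np_attach])
prob, tree = viterbi_cyk("saya melihat orang dengan teropong".split(), pcfg)
print(prob, tree)
```

Because the verb-phrase attachment appears twice in this toy treebank and the noun-phrase attachment only once, the estimated rule probabilities favor the first reading, and Viterbi-CYK returns it as the single best tree. Log probabilities are used inside the chart to avoid underflow on longer sentences.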