Discovering Correction Rules for Auto Editing

Anta Huang, Tsung-Ting Kuo, Ying-Chun Lai, Shou-de Lin
{"title":"Discovering Correction Rules for Auto Editing","authors":"Anta Huang, Tsung-Ting Kuo, Ying-Chun Lai, Shou-de Lin","doi":"10.30019/IJCLCLP.201009.0004","DOIUrl":null,"url":null,"abstract":"This paper describes a framework that extracts effective correction rules from a sentence-aligned corpus and shows a practical application: auto-editing using the discovered rules. The framework exploits the methodology of finding the Levenshtein distance between sentences to identify the key parts of the rules and uses the editing corpus to filter, condense, and refine the rules. We have produced the rule candidates of such form, A → B, where A stands for the erroneous pattern and B for the correct pattern.The developed framework is language independent; therefore, it can be applied to other languages. The evaluation of the discovered rules reveals that 67.2% of the top 1500 ranked rules are annotated as correct or mostly correct by experts. Based on the rules, we have developed an online auto-editing system for demonstration at http://ppt.cc/02yY.","PeriodicalId":436300,"journal":{"name":"Int. J. Comput. Linguistics Chin. Lang. Process.","volume":"60 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Int. J. Comput. Linguistics Chin. Lang. Process.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.30019/IJCLCLP.201009.0004","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

This paper describes a framework that extracts effective correction rules from a sentence-aligned corpus and shows a practical application: auto-editing using the discovered rules. The framework exploits the methodology of finding the Levenshtein distance between sentences to identify the key parts of the rules and uses the editing corpus to filter, condense, and refine the rules. We have produced the rule candidates of such form, A → B, where A stands for the erroneous pattern and B for the correct pattern.The developed framework is language independent; therefore, it can be applied to other languages. The evaluation of the discovered rules reveals that 67.2% of the top 1500 ranked rules are annotated as correct or mostly correct by experts. Based on the rules, we have developed an online auto-editing system for demonstration at http://ppt.cc/02yY.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
发现自动编辑的更正规则
本文描述了一个从与句子对齐的语料库中提取有效纠错规则的框架,并展示了一个实际应用:利用发现的规则进行自动编辑。该框架利用寻找句子之间Levenshtein距离的方法来识别规则的关键部分,并使用编辑语料库来过滤、浓缩和精炼规则。我们已经产生了A→B这种形式的候选规则,其中A代表错误模式,B代表正确模式。开发的框架是独立于语言的;因此,它可以应用于其他语言。对发现的规则的评估显示,排名前1500的规则中有67.2%被专家注释为正确或基本正确。根据这些规则,我们开发了一个在线自动编辑系统进行演示,网址是http://ppt.cc/02yY。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Enriching Cold Start Personalized Language Model Using Social Network Information Detecting and Correcting Syntactic Errors in Machine Translation Using Feature-Based Lexicalized Tree Adjoining Grammars TQDL: Integrated Models for Cross-Language Document Retrieval Evaluation of TTS Systems in Intelligibility and Comprehension Tasks: a Case Study of HTS-2008 and Multisyn Synthesizers Effects of Combining Bilingual and Collocational Information on Translation of English and Chinese Verb-Noun Pairs
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1