Leveraging graph algorithms to speed up the annotation of large rhymed corpora

Cahiers de Linguistique Asie Orientale (JCR Q2, Arts and Humanities) | Pub Date: 2022-03-17 | DOI: 10.1163/19606028-bja10019
Julien Baley
Citations: 1

Abstract

Rhyming patterns play a crucial role in the phonological reconstruction of earlier stages of Chinese. The past few years have seen the emergence of the use of graphs to model rhyming patterns, notably with List’s (2016) proposal to use graph community detection as a way to go beyond the limits of the link-and-bind method and test new hypotheses regarding phonological reconstruction. List’s approach requires the existence of a rhyme-annotated corpus; such corpora are rare and prohibitively expensive to produce. The present paper solves this problem by introducing several strategies to automate annotation. Among others, the main contribution is the use of graph community detection itself to build an automatic annotator. This annotator requires no previous annotation, no knowledge of phonology, and automatically adapts to corpora of different periods by learning their rhyme categories. Through a series of case studies, we demonstrate the viability of the approach in quickly annotating hundreds of thousands of poems with high accuracy.
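The general idea of annotating rhymes through graph community detection can be illustrated with a small sketch. This is not the paper's implementation: the toy romanized words, the function names (`build_rhyme_graph`, `label_propagation`, `annotate`), and the choice of a simple weighted label-propagation routine as the community detection step are all assumptions made for illustration. Line-final words that co-occur in a poem are linked, communities in the resulting graph stand in for rhyme categories, and each poem is then annotated with one letter per community.

```python
from collections import defaultdict
from itertools import combinations


def build_rhyme_graph(poems):
    """Weighted co-occurrence graph over line-final words (hypothetical helper)."""
    graph = defaultdict(lambda: defaultdict(int))
    for finals in poems:
        for a, b in combinations(sorted(set(finals)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def label_propagation(graph, max_rounds=20):
    """Simple deterministic label propagation, used here as a stand-in
    for whichever community detection algorithm one prefers."""
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            scores = defaultdict(int)
            for neighbour, weight in graph[node].items():
                scores[labels[neighbour]] += weight
            # Highest total weight wins; ties go to the smallest label.
            best = min(scores, key=lambda lab: (-scores[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


def annotate(finals, labels):
    """Turn a poem's line-final words into a rhyme scheme such as 'abab'."""
    letters = {}
    scheme = []
    for word in finals:
        community = labels.get(word, word)  # unseen words form their own group
        if community not in letters:
            letters[community] = chr(ord("a") + len(letters))
        scheme.append(letters[community])
    return "".join(scheme)


# Toy data (invented): each poem is the list of its line-final words.
poems = [
    ["dong", "tong", "zhong"],
    ["dong", "tong"],
    ["guang", "xiang", "yang"],
    ["guang", "yang"],
    ["zhong", "guang"],  # a single spurious co-occurrence between the groups
]

graph = build_rhyme_graph(poems)
labels = label_propagation(graph)
print(annotate(["dong", "yang", "tong", "guang"], labels))  # abab
```

In this toy corpus a purely transitive grouping over the same edges would merge all six words because of the one spurious link, whereas the weighted community step keeps the two groups apart. No prior annotation or phonological knowledge enters the sketch; the categories come only from co-occurrence counts.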
Source journal
Cahiers de Linguistique Asie Orientale (Arts and Humanities: Language and Linguistics)
CiteScore: 0.90
Self-citation rate: 0.00%
Articles published: 11
About the journal: The Cahiers is an international linguistics journal whose mission is to publish new and original research, whether descriptive or theoretical, on the analysis of languages of the Asian region. This clearly reflects the broad research domain of our laboratory: the Centre for Linguistic Research on East Asian Languages (CRLAO). The journal was created in 1977 by Viviane Alleton and Alain Peyraube and has been directed by three successive teams of editors, all professors based at the CRLAO in Paris. An Editorial Board, composed of scholars from around the world, assists in the reviewing process and in a consultative role.