Automating the Detection of Poetic Features: The Limerick as Model Organism

Almas Abdibayev, Yohei Igarashi, A. Riddell, D. Rockmore
Published in: Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
DOI: 10.18653/v1/2021.latechclfl-1.9 (https://doi.org/10.18653/v1/2021.latechclfl-1.9)
Citations: 1

Abstract

In this paper we take up the problem of “limerick detection” and describe a system that classifies five-line poems as limericks or not. This turns out to be a surprisingly difficult challenge with many subtleties. More precisely, we produce an algorithm that focuses on the structural aspects of the limerick, namely rhyme scheme and rhythm (i.e., stress patterns). When tested on a culled data set of 98,454 publicly available limericks, our “limerick filter” accepts 67% as limericks. The primary failure of our filter is in the detection of “non-standard” rhymes, which we highlight as an outstanding challenge in computational poetics. Our accent detection algorithm proves to be very robust. Our main contributions are (1) a novel rhyme detection algorithm that works on English words including rare proper nouns and made-up words (and thus on words not in the widely used CMUDict database); and (2) a novel rhythm-identifying heuristic that is robust to language noise at moderate levels and comparable in accuracy to state-of-the-art scansion algorithms. As a third significant contribution, (3) we make publicly available a large corpus of limericks that includes tags of “limerick” or “not-limerick” as determined by our identification software, thereby providing a benchmark for the community. The poetic tasks that we have identified as challenges for machines suggest that the limerick is a useful “model organism” for the study of machine capabilities in poetry and, more broadly, in literature and language. We also include a list of open challenges. Generally, we anticipate that this work will provide useful material and benchmarks for future explorations in the field.
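To make the structural test concrete, the following is a minimal sketch of an AABBA rhyme-scheme check on a five-line poem. It is not the authors' algorithm: the abstract does not give implementation details, so the approach here (comparing the phonemes from the last primary-stressed vowel onward) is a common textbook simplification. The names `TOY_PRONUNCIATIONS`, `rhyme_part`, `rhyme_scheme`, and `has_limerick_rhyme` are hypothetical, and the pronunciation table is a hand-written toy in ARPAbet-style notation rather than an excerpt from CMUDict.

```python
import string

# Hypothetical, hand-written toy pronunciations (ARPAbet-style, with stress
# digits on vowels). A real system would need a full lexicon plus a fallback
# for out-of-dictionary words; this table exists only for illustration.
TOY_PRONUNCIATIONS = {
    "peru": ["P", "ER0", "UW1"],
    "shoe": ["SH", "UW1"],
    "night": ["N", "AY1", "T"],
    "fright": ["F", "R", "AY1", "T"],
    "true": ["T", "R", "UW1"],
}

POEM = [
    "There was an Old Man of Peru,",
    "Who dreamt he was eating his shoe.",
    "He awoke in the night",
    "In a terrible fright,",
    "And found it was perfectly true.",
]


def last_word(line):
    """Return the final word of a line, lowercased, without punctuation."""
    words = line.lower().split()
    return words[-1].strip(string.punctuation) if words else ""


def rhyme_part(phones):
    """Return the phonemes from the last primary-stressed vowel to the end."""
    for i in range(len(phones) - 1, -1, -1):
        if phones[i].endswith("1"):
            return tuple(phones[i:])
    # No primary stress marked: fall back to the whole word.
    return tuple(phones)


def rhyme_scheme(lines, pronunciations):
    """Label each line by its rhyme class: 'A', 'B', ... or '?' if unknown."""
    labels = {}
    scheme = []
    for line in lines:
        phones = pronunciations.get(last_word(line))
        if phones is None:
            # Out-of-dictionary word: this simplified check cannot decide.
            scheme.append("?")
            continue
        key = rhyme_part(phones)
        if key not in labels:
            labels[key] = chr(ord("A") + len(labels))
        scheme.append(labels[key])
    return "".join(scheme)


def has_limerick_rhyme(lines, pronunciations):
    """True if the poem has five lines and an AABBA rhyme scheme."""
    return len(lines) == 5 and rhyme_scheme(lines, pronunciations) == "AABBA"


print(rhyme_scheme(POEM, TOY_PRONUNCIATIONS))  # AABBA
print(has_limerick_rhyme(POEM, TOY_PRONUNCIATIONS))  # True
```

The `"?"` label marks the point where a dictionary-only approach gives up: any line ending in a proper noun or invented word that is missing from the table cannot be classified, which is the gap that the rhyme detection contribution described above is meant to address. Exact matching of the rhyme part also rejects near rhymes, so a check like this would be expected to fail on the “non-standard” rhymes the abstract highlights.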