基于语言的机器感知:从语言学角度看字幕数据集的编制工作

IF 0.7 3区 文学 0 HUMANITIES, MULTIDISCIPLINARY Digital Scholarship in the Humanities Pub Date : 2024-06-22 DOI:10.1093/llc/fqae029
Laura Hekanaho, Maija Hirvonen, Tuomas Virtanen
{"title":"基于语言的机器感知:从语言学角度看字幕数据集的编制工作","authors":"Laura Hekanaho, Maija Hirvonen, Tuomas Virtanen","doi":"10.1093/llc/fqae029","DOIUrl":null,"url":null,"abstract":"Over the last decade, a plethora of training datasets have been compiled for use in language-based machine perception and in human-centered AI, alongside research regarding their compilation methods. From a primarily linguistic perspective, we add to these studies in two ways. First, we provide an overview of sixty-six training datasets used in automatic image, video, and audio captioning, examining their compilation methods with a metadata analysis. Second, we delve into the annotation process of crowdsourced datasets with an interest in understanding the linguistic factors that affect the form and content of the captions, such as contextualization and perspectivation. With a qualitative content analysis, we examine annotator instructions with a selection of eleven datasets. Drawing from various theoretical frameworks that help assess the effectiveness of the instructions, we discuss the visual and textual presentation of the instructions, as well as the perspective-guidance that is an essential part of the language instructions. While our analysis indicates that some standards in the formulation of instructions seem to have formed in the field, we also identified various reoccurring issues potentially hindering readability and comprehensibility of the instructions, and therefore, caption quality. To enhance readability, we emphasize the importance of text structure, organization of the information, consistent use of typographical cues, and clarity of language use. Last, engaging with previous research, we assess the compilation of both web-sourced and crowdsourced captioning datasets from various perspectives, discussing factors affecting the diversity of the datasets.","PeriodicalId":45315,"journal":{"name":"Digital Scholarship in the Humanities","volume":"44 1","pages":""},"PeriodicalIF":0.7000,"publicationDate":"2024-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Language-based machine perception: linguistic perspectives on the compilation of captioning datasets\",\"authors\":\"Laura Hekanaho, Maija Hirvonen, Tuomas Virtanen\",\"doi\":\"10.1093/llc/fqae029\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Over the last decade, a plethora of training datasets have been compiled for use in language-based machine perception and in human-centered AI, alongside research regarding their compilation methods. From a primarily linguistic perspective, we add to these studies in two ways. First, we provide an overview of sixty-six training datasets used in automatic image, video, and audio captioning, examining their compilation methods with a metadata analysis. Second, we delve into the annotation process of crowdsourced datasets with an interest in understanding the linguistic factors that affect the form and content of the captions, such as contextualization and perspectivation. With a qualitative content analysis, we examine annotator instructions with a selection of eleven datasets. Drawing from various theoretical frameworks that help assess the effectiveness of the instructions, we discuss the visual and textual presentation of the instructions, as well as the perspective-guidance that is an essential part of the language instructions. While our analysis indicates that some standards in the formulation of instructions seem to have formed in the field, we also identified various reoccurring issues potentially hindering readability and comprehensibility of the instructions, and therefore, caption quality. To enhance readability, we emphasize the importance of text structure, organization of the information, consistent use of typographical cues, and clarity of language use. Last, engaging with previous research, we assess the compilation of both web-sourced and crowdsourced captioning datasets from various perspectives, discussing factors affecting the diversity of the datasets.\",\"PeriodicalId\":45315,\"journal\":{\"name\":\"Digital Scholarship in the Humanities\",\"volume\":\"44 1\",\"pages\":\"\"},\"PeriodicalIF\":0.7000,\"publicationDate\":\"2024-06-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Digital Scholarship in the Humanities\",\"FirstCategoryId\":\"98\",\"ListUrlMain\":\"https://doi.org/10.1093/llc/fqae029\",\"RegionNum\":3,\"RegionCategory\":\"文学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"0\",\"JCRName\":\"HUMANITIES, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Digital Scholarship in the Humanities","FirstCategoryId":"98","ListUrlMain":"https://doi.org/10.1093/llc/fqae029","RegionNum":3,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"0","JCRName":"HUMANITIES, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0

摘要

在过去的十年中,已经有大量的训练数据集被编译用于基于语言的机器感知和以人为中心的人工智能,同时还有关于其编译方法的研究。我们主要从语言学的角度,从两个方面对这些研究进行补充。首先,我们概述了用于自动图像、视频和音频字幕的 66 个训练数据集,并通过元数据分析研究了它们的编译方法。其次,我们深入研究了众包数据集的注释过程,希望了解影响字幕形式和内容的语言因素,如语境化和视角化。通过定性内容分析,我们选取了 11 个数据集来研究注释者的说明。我们借鉴了有助于评估说明有效性的各种理论框架,讨论了说明的视觉和文本呈现方式,以及作为语言说明重要组成部分的视角指导。我们的分析表明,在制定说明方面似乎已经形成了一些标准,但我们也发现了一些反复出现的问题,这些问题可能会妨碍说明的可读性和可理解性,从而影响字幕质量。为了提高可读性,我们强调了文本结构、信息组织、排版提示的连贯使用以及语言使用清晰度的重要性。最后,结合之前的研究,我们从不同角度评估了网络来源和众包字幕数据集的编译情况,并讨论了影响数据集多样性的因素。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Language-based machine perception: linguistic perspectives on the compilation of captioning datasets
Over the last decade, a plethora of training datasets have been compiled for use in language-based machine perception and in human-centered AI, alongside research regarding their compilation methods. From a primarily linguistic perspective, we add to these studies in two ways. First, we provide an overview of sixty-six training datasets used in automatic image, video, and audio captioning, examining their compilation methods with a metadata analysis. Second, we delve into the annotation process of crowdsourced datasets with an interest in understanding the linguistic factors that affect the form and content of the captions, such as contextualization and perspectivation. With a qualitative content analysis, we examine annotator instructions with a selection of eleven datasets. Drawing from various theoretical frameworks that help assess the effectiveness of the instructions, we discuss the visual and textual presentation of the instructions, as well as the perspective-guidance that is an essential part of the language instructions. While our analysis indicates that some standards in the formulation of instructions seem to have formed in the field, we also identified various reoccurring issues potentially hindering readability and comprehensibility of the instructions, and therefore, caption quality. To enhance readability, we emphasize the importance of text structure, organization of the information, consistent use of typographical cues, and clarity of language use. Last, engaging with previous research, we assess the compilation of both web-sourced and crowdsourced captioning datasets from various perspectives, discussing factors affecting the diversity of the datasets.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
1.80
自引率
25.00%
发文量
78
期刊介绍: DSH or Digital Scholarship in the Humanities is an international, peer reviewed journal which publishes original contributions on all aspects of digital scholarship in the Humanities including, but not limited to, the field of what is currently called the Digital Humanities. Long and short papers report on theoretical, methodological, experimental, and applied research and include results of research projects, descriptions and evaluations of tools, techniques, and methodologies, and reports on work in progress. DSH also publishes reviews of books and resources. Digital Scholarship in the Humanities was previously known as Literary and Linguistic Computing.
期刊最新文献
Social network analysis of the Babylonian Talmud Ancient classical theatre from the digital humanities: a systematic review 2010–21 Language-based machine perception: linguistic perspectives on the compilation of captioning datasets Personality prediction via multi-task transformer architecture combined with image aesthetics Who wrote the first Constitutions of Freemasonry?
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1