Crowdsourcing Thumbnail Captions: Data Collection and Validation

ACM Transactions on Interactive Intelligent Systems (IF 3.6; Zone 4, Computer Science; JCR Q2, Computer Science, Artificial Intelligence) Pub Date: 2023-03-28 DOI: 10.1145/3589346
Carlos Alejandro Aguirre, Shiye Cao, Amama Mahmood, Chien-Ming Huang
Publication type: Journal Article. Volume: 14 4 1. Pages: 1-28.
Citations: 0

Abstract

Speech interfaces, such as personal assistants and screen readers, read image captions to users. Typically, however, only one caption is available per image, which may not be adequate for all situations (e.g., browsing large quantities of images). Long captions provide a deeper understanding of an image but take more time to listen to, whereas shorter captions are faster to consume but may not allow for such thorough comprehension. We explore how to effectively collect both thumbnail captions (succinct image descriptions meant to be consumed quickly) and comprehensive captions (which allow individuals to understand visual content in greater detail). We consider text-based instructions and time-constrained methods to collect descriptions at these two levels of detail and find that a time-constrained method is the most effective for collecting thumbnail captions while preserving caption accuracy. Additionally, by tracking their eye gaze, we verify that caption authors using this time-constrained method are still able to focus on the most important regions of an image. We evaluate our collected captions along human-rated axes (correctness, fluency, amount of detail, and mentions of important concepts) and discuss the potential for model-based metrics to perform large-scale automatic evaluations in the future.
Source journal
ACM Transactions on Interactive Intelligent Systems (Computer Science: Human-Computer Interaction)
CiteScore: 7.80
Self-citation rate: 2.90%
Articles published: 38
About the journal: The ACM Transactions on Interactive Intelligent Systems (TiiS) publishes papers on research concerning the design, realization, or evaluation of interactive systems that incorporate some form of machine intelligence. TiiS articles come from a wide range of research areas and communities. An article can take any of several complementary views of interactive intelligent systems, focusing on the intelligent technology, the interaction of users with the system, or both aspects at once.