Automated testing of image captioning systems

Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. Pub Date: 2022-06-14. DOI: 10.1145/3533767.3534389

Boxi Yu, Zhiqi Zhong, Xinran Qin, Jiayi Yao, Yuancheng Wang, Pinjia He


Citations: 9

Abstract

Image captioning (IC) systems, which automatically generate a text description of the salient objects in an image (real or synthetic), have made great progress over the past few years thanks to the development of deep neural networks. IC plays an important role in society, for example in labeling massive photo collections for scientific studies and in helping visually impaired people perceive the world. However, even leading IC systems, such as Microsoft Azure Cognitive Services and IBM Image Caption Generator, may return incorrect results, leading to the omission of important objects, serious misunderstanding, and threats to personal safety. To address this problem, we propose MetaIC, the first metamorphic testing approach to validate IC systems. Our core idea is that the object names in a caption should exhibit directional changes after object insertion. Specifically, MetaIC (1) extracts objects from existing images to construct an object corpus; (2) inserts an object into an image via novel object resizing and location tuning algorithms; and (3) reports image pairs whose captions do not differ in the expected way. In our evaluation, we use MetaIC to test one widely adopted image captioning API and five state-of-the-art (SOTA) image captioning models. Using 1,000 seeds, MetaIC reports 16,825 erroneous issues with high precision (84.9%-98.4%). The errors fall into three kinds: misclassification, omission, and incorrect quantity. We visualize the errors reported by MetaIC, which shows that a flexible overlapping setting facilitates IC testing by increasing and diversifying the reported errors. In addition, MetaIC can be generalized to detect label errors in training datasets: it detected 151 incorrect labels in MS COCO Caption, a standard dataset in image captioning.
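The check in step (3) can be illustrated with a minimal sketch. The abstract does not give the exact metamorphic relations, so the code below assumes one plausible reading: after inserting an object of a known class, the caption's object counts should equal the seed caption's counts plus one for that class. The vocabulary, the plural table, the rule that an unquantified plural counts as two, and the names `object_counts` and `violates_insertion_relation` are all hypothetical and chosen only for illustration; this is not MetaIC's implementation.

```python
from collections import Counter

# Hypothetical object vocabulary (assumption); a real pipeline would rely on
# a POS tagger or detector labels rather than a fixed word list.
OBJECT_VOCAB = {"dog", "cat", "person", "car", "bench", "umbrella"}
# Naive plural-to-singular table, for illustration only.
PLURALS = {"dogs": "dog", "cats": "cat", "people": "person",
           "cars": "car", "benches": "bench", "umbrellas": "umbrella"}
NUMBER_WORDS = {"two": 2, "three": 3, "four": 4}


def object_counts(caption):
    """Count known object names in a caption, honoring simple number words."""
    counts = Counter()
    pending = 1  # quantity announced by the most recent number word
    for raw in caption.lower().split():
        tok = raw.strip(".,;:!?")
        if tok in NUMBER_WORDS:
            pending = NUMBER_WORDS[tok]
        elif tok in PLURALS:
            # Assumption: a plural without a number word means two objects.
            counts[PLURALS[tok]] += max(pending, 2)
            pending = 1
        elif tok in OBJECT_VOCAB:
            counts[tok] += 1
            pending = 1
    return counts


def violates_insertion_relation(seed_caption, mutant_caption, inserted):
    """Return True if the caption pair breaks the assumed relation:
    counts(mutant) == counts(seed) + one instance of the inserted class."""
    expected = object_counts(seed_caption)
    expected[inserted] += 1
    return object_counts(mutant_caption) != expected


seed = "a dog sitting on a bench"
print(violates_insertion_relation(seed, "a dog and a cat sitting on a bench", "cat"))
print(violates_insertion_relation(seed, "a dog sitting on a bench", "cat"))
```

Under this reading, a mutant caption that fails to mention the inserted object would correspond to an omission, one that names a different object to a misclassification, and one with the wrong count to an incorrect quantity. Mapping violations to those three categories is left out of the sketch because the abstract does not specify how the classification is done.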
