Image Dense Captioning of Irregular Regions Based on Visual Saliency

Xiaosheng Wen, Ping Jian
2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms (PRMVIA), published 2023-03-01
DOI: https://doi.org/10.1109/PRMVIA58252.2023.00008

Abstract

Traditional dense captioning aims to describe the local details of an image in natural language. It typically runs object detection first and then describes the contents of each detected bounding box, which yields rich descriptions. However, captioning based on object detection often pays little attention to the relationships between objects and their environment, or among the objects themselves. Moreover, no existing dense captioning method can handle irregular regions. To address these problems, we propose a region division method based on visual saliency, which focuses on areas rather than only on objects. Based on this division, we generate local descriptions of irregular regions. For each region, we combine the image with the target region to generate features, which are fed into the caption model. We use the Visual Genome dataset for training and testing. In experiments, our model is comparable to the baseline under the traditional bounding-box setting, and the descriptions it generates for irregular regions are equally good. Our model also performs well in image retrieval experiments and produces less redundant information. In the application, we support manual selection of a region of interest on the image for description, which can assist in expanding the dataset.
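
The abstract does not specify how the saliency map is divided into regions or how the image and the target region are combined. As a purely illustrative sketch, the Python snippet below assumes a simple threshold on a saliency map to obtain an irregular binary mask, stacks that mask onto the image as an extra channel, and averages the result over the region. The function names (`threshold_regions`, `combine_image_with_region`, `masked_average`) and the threshold value are hypothetical and are not taken from the paper.

```python
import numpy as np


def threshold_regions(saliency, threshold=0.5):
    """Return a boolean mask of pixels whose saliency meets the threshold.

    Assumption: a plain threshold stands in for the paper's region division.
    """
    return saliency >= threshold


def combine_image_with_region(image, mask):
    """Stack the region mask onto the image as an extra channel: (H, W, C+1)."""
    if image.shape[:2] != mask.shape:
        raise ValueError("image and mask must share height and width")
    return np.concatenate([image, mask[..., None].astype(image.dtype)], axis=-1)


def masked_average(features, mask):
    """Average per-pixel feature vectors over the pixels inside the region."""
    count = mask.sum()
    if count == 0:
        raise ValueError("region mask is empty")
    return (features * mask[..., None]).sum(axis=(0, 1)) / count


# Toy data: a constant 4x4 RGB image and an L-shaped (non-rectangular) salient area.
image = np.ones((4, 4, 3), dtype=np.float32)
saliency = np.zeros((4, 4), dtype=np.float32)
saliency[1:3, 1:4] = 0.9
saliency[3, 3] = 0.8

mask = threshold_regions(saliency)
combined = combine_image_with_region(image, mask)
region_vector = masked_average(combined, mask)
```

Here `combined` keeps the whole image visible alongside the region indicator, so a downstream caption model could in principle use context outside the region, while `region_vector` summarises only the pixels inside it. Either representation is one plausible reading of "combining the image with the target area", not a reconstruction of the authors' model.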