Tiny TR-CAP: A novel small-scale benchmark dataset for general-purpose image captioning tasks

IF 5.1 · CAS Region 2 (Engineering & Technology) · JCR Q1 (Engineering, Multidisciplinary) · Engineering Science and Technology, an International Journal (JESTECH) · Pub Date: 2025-03-03 · DOI: 10.1016/j.jestch.2025.102009
Abbas Memiş, Serdar Yıldız

Abstract

In the last decade, the outstanding performance of deep learning has driven a rapid rise in automatic image captioning, along with a growing need for large amounts of data. Although well-known, conventional, and publicly available datasets have been proposed for the image captioning task, the lack of ground-truth caption data remains a major challenge in generating accurate image captions. To address this issue, in this paper we introduce a novel image captioning benchmark dataset called Tiny TR-CAP, which consists of 1076 original images and 5380 handwritten captions (five highly diverse captions per image). The captions, translated into English using two web-based translation APIs and a novel multilingual deep machine translation model, were used to benchmark 11 prominent state-of-the-art deep learning-based models: CLIPCap, BLIP, BLIP2, FUSECAP, OFA, PromptCap, Kosmos2, MiniGPT4, LLaVA, BakLLaVA, and GIT. In the experiments, the accuracy of the captions generated by these models was reported in terms of the BLEU, METEOR, ROUGE-L, CIDEr, SPICE, and WMD captioning metrics, and their performance was evaluated comparatively. The analysis showed quite promising captioning performance, with the best results achieved by the OFA model: 0.7097 BLEU-1, 0.5389 BLEU-2, 0.3940 BLEU-3, 0.2875 BLEU-4, 0.1797 METEOR, 0.4627 ROUGE-L, 0.2938 CIDEr, 0.0626 SPICE, and 0.4605 WMD. To support research in the field of image captioning, the image and caption sets of Tiny TR-CAP will also be publicly available on GitHub (https://github.com/abbasmemis/tiny_TR-CAP) for academic research purposes.
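The abstract reports cumulative BLEU-1 through BLEU-4 scores for each model. As a rough illustration of how such scores are computed against multiple reference captions, here is a minimal, self-contained sketch of BLEU with clipped n-gram precision and a brevity penalty. The captions below are invented, and this simplified implementation is not the paper's evaluation code (which presumably uses standard captioning toolkits).

```python
# Illustrative sketch (not the paper's evaluation code): cumulative
# BLEU-N with clipped n-gram precision and a brevity penalty, scoring
# one invented caption against five invented reference captions.
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, references, max_n=4):
    """Cumulative BLEU-max_n of one candidate against several references."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        # Clip each candidate n-gram count by its max count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in Counter(ngrams(ref, n)).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        precision = clipped / max(sum(cand_counts.values()), 1)
        if precision == 0.0:
            return 0.0  # no n-gram overlap at this order
        log_precisions.append(math.log(precision))
    # Brevity penalty against the reference whose length is closest.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity * math.exp(sum(log_precisions) / max_n)

# Five invented reference captions for one image, plus one invented
# model-generated caption (Tiny TR-CAP provides five references per image).
references = [r.split() for r in [
    "a brown dog runs across a green field",
    "a dog is running on the grass",
    "the brown dog sprints through a field",
    "a dog runs in a grassy field",
    "a brown dog running outdoors on grass",
]]
candidate = "a brown dog runs on the grass".split()

for n in (1, 2, 3, 4):
    print(f"BLEU-{n}: {bleu(candidate, references, max_n=n):.4f}")
```

Because the geometric mean is taken over n-gram orders, a single order with zero overlap drives the cumulative score to zero, which is why published evaluations often pair BLEU with softer metrics such as METEOR or WMD.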
Source journal: Engineering Science and Technology, an International Journal (JESTECH)
Subject areas: Engineering; Materials Science (Electronic, Optical and Magnetic Materials)
CiteScore: 11.20
Self-citation rate: 3.50%
Articles per year: 153
Review time: 22 days
About the journal: Engineering Science and Technology, an International Journal (JESTECH) (formerly Technology), a peer-reviewed quarterly engineering journal, publishes both theoretical and experimental high-quality papers of permanent interest, not previously published in journals, in the field of engineering and applied science, and aims to promote the theory and practice of technology and engineering. In addition to peer-reviewed original research papers, the Editorial Board welcomes original research reports, state-of-the-art reviews, and communications in the broadly defined field of engineering science and technology. The scope of JESTECH includes a wide spectrum of subjects, including:
- Electrical/Electronics and Computer Engineering (Biomedical Engineering and Instrumentation; Coding, Cryptography, and Information Protection; Communications, Networks, Mobile Computing and Distributed Systems; Compilers and Operating Systems; Computer Architecture, Parallel Processing, and Dependability; Computer Vision and Robotics; Control Theory; Electromagnetic Waves, Microwave Techniques and Antennas; Embedded Systems; Integrated Circuits, VLSI Design, Testing, and CAD; Microelectromechanical Systems; Microelectronics, and Electronic Devices and Circuits; Power, Energy and Energy Conversion Systems; Signal, Image, and Speech Processing)
- Mechanical and Civil Engineering (Automotive Technologies; Biomechanics; Construction Materials; Design and Manufacturing; Dynamics and Control; Energy Generation, Utilization, Conversion, and Storage; Fluid Mechanics and Hydraulics; Heat and Mass Transfer; Micro-Nano Sciences; Renewable and Sustainable Energy Technologies; Robotics and Mechatronics; Solid Mechanics and Structure; Thermal Sciences)
- Metallurgical and Materials Engineering (Advanced Materials Science; Biomaterials; Ceramic and Inorganic Materials; Electronic-Magnetic Materials; Energy and Environment; Materials Characterization; Metallurgy; Polymers and Nanocomposites)