An Image-Text Matching Method for Multi-Modal Robots

Journal of Organizational and End User Computing (IF 3.6; Zone 3, Management; JCR Q2, Computer Science, Information Systems) Pub Date: 2023-12-08 DOI: 10.4018/joeuc.334701
Ke Zheng, Zhou Li
Citations: 0

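Abstract

With the rapid development of artificial intelligence and deep learning, image-text matching has become an important research topic in cross-modal learning. Matching images and text correctly requires a strong understanding of the correspondence between visual and textual information. In recent years, deep learning-based image-text matching methods have achieved significant success. However, image-text matching requires both a deep understanding of intra-modal information and fine-grained alignment between image regions and textual words, and integrating these two aspects into a single model remains a challenge. Reducing the internal complexity of the model and effectively constructing and using prior knowledge are also worth exploring, since they bear on two issues in existing fine-grained matching methods: excessive computational complexity and the lack of multi-perspective matching.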
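As a rough illustration of what fine-grained alignment between image regions and textual words can look like, the sketch below scores an image-text pair by comparing every word vector with every region vector and keeping each word's best-matching region. This is a generic, minimal example and not the authors' method; the function names (cosine, similarity_matrix, match_score), the toy two-dimensional vectors, and the choice of cosine similarity with max-then-mean pooling are all assumptions made for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(regions: Sequence[Vector], words: Sequence[Vector]) -> List[List[float]]:
    """Entry [i][j] is the cosine similarity between word i and region j."""
    return [[cosine(word, region) for region in regions] for word in words]


def match_score(regions: Sequence[Vector], words: Sequence[Vector]) -> float:
    """Average, over all words, of each word's best-matching region similarity."""
    if not regions or not words:
        raise ValueError("regions and words must both be non-empty")
    sims = similarity_matrix(regions, words)
    return sum(max(row) for row in sims) / len(sims)


# Hypothetical toy features: two image regions and two candidate captions.
regions = [[1.0, 0.0], [0.0, 1.0]]
matching_words = [[2.0, 0.0], [0.0, 3.0]]     # each word points at one region
unrelated_words = [[-1.0, 0.0], [0.0, -1.0]]  # each word opposes one region

print(match_score(regions, matching_words))
print(match_score(regions, unrelated_words))
```

Every word is compared with every region, so the cost of one image-text pair grows with the product of the number of words, the number of regions, and the feature dimension. That growth is one way to picture the computational burden of fine-grained matching that the abstract mentions.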
Source journal
Journal of Organizational and End User Computing (Computer Science, Information Systems)
CiteScore: 6.00
Self-citation rate: 9.20%
Articles published: 77
About the journal: The Journal of Organizational and End User Computing (JOEUC) provides a forum for information technology educators, researchers, and practitioners to advance the practice and understanding of organizational and end user computing. The journal places a major emphasis on how to increase organizational and end user productivity and performance, and how to achieve organizational strategic and competitive advantage. JOEUC publishes full-length research manuscripts, insightful research and practice notes, and case studies from all areas of organizational and end user computing, selected after a rigorous blind review by experts in the field.