{"title":"An Image-Text Matching Method for Multi-Modal Robots","authors":"Ke Zheng, Zhou Li","doi":"10.4018/joeuc.334701","DOIUrl":null,"url":null,"abstract":"With the rapid development of artificial intelligence and deep learning, image-text matching has gradually become an important research topic in cross-modal fields. Achieving correct image-text matching requires a strong understanding of the correspondence between visual and textual information. In recent years, deep learning-based image-text matching methods have achieved significant success. However, image-text matching requires a deep understanding of intra-modal information and the exploration of fine-grained alignment between image regions and textual words. How to integrate these two aspects into a single model remains a challenge. Additionally, reducing the internal complexity of the model and effectively constructing and utilizing prior knowledge are also areas worth exploring, therefore addressing the issues of excessive computational complexity in existing fine-grained matching methods and the lack of multi-perspective matching.","PeriodicalId":49029,"journal":{"name":"Journal of Organizational and End User Computing","volume":"40 28","pages":""},"PeriodicalIF":3.6000,"publicationDate":"2023-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Organizational and End User Computing","FirstCategoryId":"91","ListUrlMain":"https://doi.org/10.4018/joeuc.334701","RegionNum":3,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
With the rapid development of artificial intelligence and deep learning, image-text matching has gradually become an important research topic in cross-modal fields. Achieving correct image-text matching requires a strong understanding of the correspondence between visual and textual information. In recent years, deep learning-based image-text matching methods have achieved significant success. However, image-text matching requires a deep understanding of intra-modal information and the exploration of fine-grained alignment between image regions and textual words. How to integrate these two aspects into a single model remains a challenge. Additionally, reducing the internal complexity of the model and effectively constructing and utilizing prior knowledge are also areas worth exploring, therefore addressing the issues of excessive computational complexity in existing fine-grained matching methods and the lack of multi-perspective matching.
期刊介绍:
The Journal of Organizational and End User Computing (JOEUC) provides a forum to information technology educators, researchers, and practitioners to advance the practice and understanding of organizational and end user computing. The journal features a major emphasis on how to increase organizational and end user productivity and performance, and how to achieve organizational strategic and competitive advantage. JOEUC publishes full-length research manuscripts, insightful research and practice notes, and case studies from all areas of organizational and end user computing that are selected after a rigorous blind review by experts in the field.