An Image-Text Matching Method for Multi-Modal Robots
Ke Zheng, Zhou Li
Journal of Organizational and End User Computing, published 2023-12-08
DOI: 10.4018/joeuc.334701 (https://doi.org/10.4018/joeuc.334701)
With the rapid development of artificial intelligence and deep learning, image-text matching has become an important research topic in cross-modal learning. Matching images and text correctly requires a strong understanding of the correspondence between visual and textual information. In recent years, deep learning-based image-text matching methods have achieved significant success. However, image-text matching requires both a deep understanding of intra-modal information and the exploration of fine-grained alignment between image regions and textual words, and integrating these two aspects into a single model remains a challenge. Reducing the internal complexity of the model and effectively constructing and utilizing prior knowledge are also worth exploring, as ways to address the excessive computational complexity of existing fine-grained matching methods and their lack of multi-perspective matching.
About the journal:
The Journal of Organizational and End User Computing (JOEUC) provides a forum for information technology educators, researchers, and practitioners to advance the practice and understanding of organizational and end user computing. The journal places a major emphasis on how to increase organizational and end user productivity and performance, and how to achieve organizational strategic and competitive advantage. JOEUC publishes full-length research manuscripts, insightful research and practice notes, and case studies from all areas of organizational and end user computing, selected after a rigorous blind review by experts in the field.