Visual language integration: A survey and open challenges

IF 13.3 | Zone 1 (Computer Science) | JCR Q1 (COMPUTER SCIENCE, INFORMATION SYSTEMS) | Computer Science Review | Pub Date: 2023-05-01 | DOI: 10.1016/j.cosrev.2023.100548
Sang-Min Park, Young-Gab Kim
Article URL: https://www.sciencedirect.com/science/article/pii/S1574013723000151
Citations: 1

Abstract

With the recent development of deep learning technology, artificial intelligence (AI) models have come into wide use in various domains. AI performs well on definite-purpose tasks such as image recognition and text classification. Recognition performance on each individual task has become more accurate than with feature engineering, enabling work that could not be done before. In addition, with the development of generation technology (e.g., GPT-3), AI models are showing stable performance in individual recognition and generation tasks. However, few studies have focused on how to integrate these models efficiently to achieve comprehensive human interaction. Each model grows in size as its performance improves, consequently requiring more computing power and more complicated designs to train than before. This requirement increases the complexity of each model and demands more paired data, making model integration difficult. This study surveys visual language integration, using a hierarchical approach to review recent trends in the work that research communities have already carried out on AI models as the interaction component. We also compare the strengths of existing AI models and integration approaches and the limitations they face. Furthermore, we discuss current related issues and the research needed for visual language integration. More specifically, we identify four aspects of visual language integration models: multimodal learning, multi-task learning, end-to-end learning, and embodiment for embodied visual language interaction. Finally, we discuss some current open issues and challenges and conclude our survey by giving possible future directions.

Source journal
Computer Science Review (Computer Science: General Computer Science)
CiteScore: 32.70
Self-citation rate: 0.00%
Articles published: 26
Review time: 51 days
Journal description: Computer Science Review, a publication dedicated to research surveys and expository overviews of open problems in computer science, targets a broad audience within the field seeking comprehensive insights into the latest developments. The journal welcomes articles from various fields as long as their content impacts the advancement of computer science. In particular, articles that review the application of well-known Computer Science methods to other areas are in scope only if these articles advance the fundamental understanding of those methods.
Latest articles in this journal
A systematic review on security aspects of fog computing environment: Challenges, solutions and future directions
A survey of deep learning techniques for detecting and recognizing objects in complex environments
Intervention scenarios and robot capabilities for support, guidance and health monitoring for the elderly
Resilience of deep learning applications: A systematic literature review of analysis and hardening techniques
AI-driven cluster-based routing protocols in WSNs: A survey of fuzzy heuristics, metaheuristics, and machine learning models