Augment Machine Intelligence with Multimodal Information

Zhou Yu
DOI: 10.1145/3423325.3424123
Published in: Proceedings of the 1st International Workshop on Multimodal Conversational AI
Publication date: 2020-10-16
Citations: 1

Abstract

Humans interact with other humans and the world through information from various channels, including vision, audio, language, and haptics. To simulate intelligence, machines require similar abilities to process and combine information from different channels in order to acquire better situational awareness, communication ability, and decision-making ability. In this talk, we describe three projects. In the first study, we enable a robot to utilize both vision and audio information to achieve better user understanding [1]. We then use incremental language generation to improve the robot's communication with a human. In the second study, we utilize multimodal history tracking to optimize policy planning in task-oriented visual dialogs. In the third project, we tackle the well-known trade-off between dialog response relevance and policy effectiveness in visual dialog generation. We propose a new machine learning procedure that alternates between supervised learning and reinforcement learning to jointly optimize language generation and policy planning in visual dialogs [2]. We will also cover some recent ongoing work on image synthesis through dialogs.
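The third project's training procedure alternates between supervised learning (which favors response relevance) and reinforcement learning (which favors policy effectiveness). A minimal sketch of such an alternating schedule is shown below; this is an illustration of the general technique, not the authors' implementation, and the function name, round counts, and phase labels are assumptions for exposition.

```python
def alternating_schedule(num_rounds, sl_epochs, rl_epochs):
    """Return the sequence of training phases, interleaving supervised
    learning (SL) epochs with reinforcement learning (RL) epochs."""
    schedule = []
    for _ in range(num_rounds):
        schedule.extend(["SL"] * sl_epochs)  # supervised phase: fit responses to ground truth
        schedule.extend(["RL"] * rl_epochs)  # RL phase: optimize dialog policy via reward
    return schedule

# Example: 2 rounds, each with 3 SL epochs followed by 1 RL epoch.
print(alternating_schedule(2, 3, 1))
# → ['SL', 'SL', 'SL', 'RL', 'SL', 'SL', 'SL', 'RL']
```

In each "SL" epoch a model would take likelihood-maximizing gradient steps on reference responses, while each "RL" epoch would take reward-weighted steps on sampled dialogs; interleaving the two is one common way to keep generated responses fluent while still improving task success.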