Deep Learning for Multimedia: Science or Technology?

J. Sang, Jun Yu, R. Jain, R. Lienhart, Peng Cui, Jiashi Feng
Proceedings of the 26th ACM International Conference on Multimedia, published 2018-10-15
DOI: https://doi.org/10.1145/3240508.3243931
Citations: 5

Abstract

Deep learning has been successfully applied to a range of multimedia topics in recent years, from object detection, semantic classification, and entity annotation to multimedia captioning, multimedia question answering, and storytelling. Open source libraries and platforms such as TensorFlow, Caffe, and MXNet have significantly promoted the wide deployment of deep learning in real-world applications. On the one hand, deep learning practitioners can set up and use a complex deep network without necessarily understanding the underlying math. One recent deep learning tool based on Keras even provides a graphical interface that enables straightforward 'drag and drop' operation for deep learning programming. On the other hand, some general theoretical problems of learning, such as interpretation and generalization, have seen only limited progress. Most deep learning papers published these days follow the same pipeline: design or modify a network structure, tune parameters, and report performance improvements in specific applications. We have even seen many deep learning application papers without a single equation. Theoretical interpretation and the science behind the study are largely ignored. While excited about the successful application of deep learning to classical and novel problems, we multimedia researchers have a responsibility to think about and solve the fundamental problems of deep learning science. Prof. Guanrong Chen recently wrote an editorial note titled 'Science and Technology, not SciTech' [1]. This panel joins a similar discussion and invites prestigious multimedia researchers and active deep learning practitioners to discuss the positioning of deep learning research now and in the future.

Specifically, each panelist is asked to present their opinions on the following five questions:
1) What do you think of the current phenomenon that deep learning applications are growing explosively while the general theoretical problems see only slow progress?
2) Do you agree that deploying deep learning techniques is getting easy (a low barrier), while deep learning research is difficult (a high barrier)?
3) What do you think are the core problems for deep learning techniques?
4) What do you think are the core problems for deep learning science?
5) What is your suggestion for multimedia research in the post-deep learning era?