
Latest publications from 2020 IEEE International Symposium on Multimedia (ISM)

Can You All Look Here? Towards Determining Gaze Uniformity In Group Images
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00024
Omkar N. Kulkarni, Vikram Patil, Shivam B. Parikh, Shashank Arora, P. Atrey
Since the advent of the smartphone, the number of group images taken every day has been rising exponentially. Photographers struggle to make sure everyone is looking at the camera when the picture is taken. More specifically, in a group image, if everybody is not looking in the same direction, the image's aesthetic quality and utility are diminished. The photographer usually discards such an image and subsequently takes several more to mitigate the issue. Otherwise, users have to check manually whether the image is uniformly gazed, which is tedious and time-consuming. This paper proposes a method for classifying a given group image as uniformly gazed or non-uniformly gazed by calculating a Gaze Uniformity Index. We evaluate the proposed method on a subset of the 'Images of Groups' dataset. The proposed method achieved an accuracy of 67%.
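The abstract does not give the exact formulation of the Gaze Uniformity Index, so the following is only a minimal sketch under stated assumptions: suppose a gaze estimator yields one unit 3D direction vector per detected face, and take the index to be the mean pairwise cosine similarity of those vectors, thresholded to produce the binary label. The function names and the threshold value are hypothetical, not the paper's.

```python
import numpy as np

def gaze_uniformity_index(gazes: np.ndarray) -> float:
    """Mean pairwise cosine similarity of per-face gaze direction vectors.

    gazes: (N, 3) array, one unit 3D gaze vector per detected face.
    Returns a score in [-1, 1]; 1 means everyone looks the same way.
    """
    n = len(gazes)
    if n < 2:
        return 1.0  # a lone face is trivially uniform
    g = gazes / np.linalg.norm(gazes, axis=1, keepdims=True)
    sims = g @ g.T                                 # all pairwise cosines
    return float(sims[~np.eye(n, dtype=bool)].mean())

def classify_group_image(gazes: np.ndarray, threshold: float = 0.8) -> str:
    # The 0.8 cutoff is a placeholder; the paper tunes its own threshold.
    if gaze_uniformity_index(gazes) >= threshold:
        return "uniformly gazed"
    return "non-uniformly gazed"
```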
Citations: 4
An Effective Rotational Invariant Key-point Detector for Image Matching
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00043
Thanh Hong-Phuoc, L. Guan
Traditional detectors such as Harris, SIFT, and SFOP are known to be inflexible across contexts, as they solely target corners, blobs, junctions, or other specific human-designed structures. To address this inflexibility, as well as their unreliability under non-uniform lighting changes, a Sparse Coding based Key-point detector (SCK) that relies on no human-designed structures and is invariant to non-uniform illumination change was recently proposed. Yet geometric transformations such as rotation are not considered in SCK. Thus, a novel Rotationally Invariant SCK, called RI-SCK, is proposed in this paper. To make SCK rotationally invariant, the sparse coding step makes effective use of multiple rotated versions of the original dictionary. A novel strength measure is also introduced for comparing key-points across image pyramid levels when scale invariance is required. Experimental results on three public datasets confirm that the proposed detector achieves significant gains in repeatability and matching score.
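As a rough illustration of the rotated-dictionary idea, the sketch below scores an image patch by its sparse-code strength against several rotated copies of a dictionary and keeps the maximum over rotations. It assumes square atoms, a fixed angle set, and the sum of absolute OMP coefficients as the strength; the paper's actual dictionary, rotation set, and strength measure may differ.

```python
import numpy as np
from scipy.ndimage import rotate
from sklearn.linear_model import OrthogonalMatchingPursuit

def rotated_dictionaries(atoms: np.ndarray, angles=(0, 45, 90, 135)):
    """atoms: (K, s, s) square dictionary atoms.
    Returns one (s*s, K) unit-norm dictionary matrix per rotation angle."""
    dicts = []
    for a in angles:
        rot = np.stack([rotate(atom, a, reshape=False, mode="nearest")
                        for atom in atoms])
        D = rot.reshape(len(atoms), -1).T
        D = D / np.linalg.norm(D, axis=0, keepdims=True)
        dicts.append(D)
    return dicts

def keypoint_strength(patch: np.ndarray, dicts, n_nonzero=5) -> float:
    """Max sparse-code strength of a patch over all rotated dictionaries."""
    x = patch.reshape(-1)
    best = 0.0
    for D in dicts:
        omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero)
        omp.fit(D, x)                       # sparse-code the patch over D
        best = max(best, float(np.abs(omp.coef_).sum()))
    return best
```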
Citations: 0
Real-time Spatio-Temporal Action Localization in 360 Videos
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00018
Bo Chen, A. Ali-Eldin, P. Shenoy, K. Nahrstedt
Spatio-temporal action localization of human actions in video has been a popular topic over the past few years. It aims to localize the bounding boxes, the time span, and the class of each action, which summarizes the information in the video and helps humans understand it. Though many approaches have been proposed to solve this problem, these efforts have focused only on perspective videos. Unfortunately, perspective videos cover only a small field-of-view (FOV), which limits the capability of action localization. In this paper, we develop a comprehensive approach to real-time spatio-temporal localization that can be used to detect actions in 360 videos. We create two datasets, named UCF-101-24-360 and JHMDB-21-360, for our evaluation. Our experiments show that our method consistently outperforms other competing approaches and achieves a real-time processing speed of 15 fps for 360 videos.
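The abstract does not detail the pipeline, but a standard building block of spatio-temporal localization is linking per-frame detections into action tubes. The sketch below performs greedy IoU-based linking; it is a generic illustration rather than the authors' method, and the `iou_thr` value is hypothetical.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def link_tubes(frame_dets, iou_thr=0.3):
    """Greedily link per-frame detections into spatio-temporal tubes.

    frame_dets: one list per frame of (box, score) detections.
    Each detection extends the best-overlapping tube that ended in the
    previous frame, or starts a new tube."""
    tubes = []
    for t, dets in enumerate(frame_dets):
        for box, score in dets:
            best, best_iou = None, iou_thr
            for tube in tubes:
                if tube["end"] == t - 1:           # tube is still active
                    ov = iou(tube["boxes"][-1], box)
                    if ov > best_iou:
                        best, best_iou = tube, ov
            if best is not None:
                best["boxes"].append(box)
                best["score"] += score
                best["end"] = t
            else:
                tubes.append({"boxes": [box], "score": score,
                              "start": t, "end": t})
    return tubes
```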
Citations: 1
Multimodal Classification of Emotions in Latin Music
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00038
L. G. Catharin, Rafael P. Ribeiro, C. Silla, Yandre M. G. Costa, V. D. Feltrim
In this study we classified the songs of the Latin Music Mood Database (LMMD) according to their emotion using two approaches: single-step classification, which classifies the songs directly by emotion, valence, arousal, and quadrant; and multistep classification, which uses the predictions of the best valence and arousal classifiers to classify quadrants, and the best valence, arousal, and quadrant predictions as features to classify emotions. Our hypothesis is that breaking the emotion classification into smaller problems reduces complexity and improves results. Our best single-step emotion and valence classifiers used multimodal sets of features extracted from lyrics and audio. Our best arousal classifier used features extracted from lyrics and SMOTE to mitigate the dataset imbalance. The proposed multistep emotion classifier, which uses the predictions of a multistep quadrant classifier, improved on the single-step classifier's performance, reaching a mean f-measure of 0.605. These results show that using valence, arousal and, consequently, quadrant information can improve the prediction of specific emotions.
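A minimal sketch of the multistep scheme, assuming a feature matrix `X` of combined lyric and audio features and labels for valence, arousal, quadrant, and emotion: out-of-fold valence and arousal predictions feed the quadrant classifier, and all three sets of predictions feed the emotion classifier. The choice of logistic regression and the 5-fold setup are placeholders; the paper's arousal step additionally applies SMOTE.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def multistep_emotion(X, y_val, y_aro, y_quad, y_emo):
    """Stack valence/arousal/quadrant predictions as extra features.

    X: (n, d) multimodal lyric + audio features (hypothetical).
    Returns the final emotion classifier."""
    clf = lambda: LogisticRegression(max_iter=1000)
    # Out-of-fold predictions avoid leaking labels into later steps.
    p_val = cross_val_predict(clf(), X, y_val, cv=5, method="predict_proba")
    p_aro = cross_val_predict(clf(), X, y_aro, cv=5, method="predict_proba")
    X_quad = np.hstack([X, p_val, p_aro])
    p_quad = cross_val_predict(clf(), X_quad, y_quad, cv=5,
                               method="predict_proba")
    X_emo = np.hstack([X, p_val, p_aro, p_quad])
    return clf().fit(X_emo, y_emo)
```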
Citations: 0
SEAWARE: Semantic Aware View Prediction System for 360-degree Video Streaming
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00016
Jounsup Park, Mingyuan Wu, Kuan-Ying Lee, Bo Chen, K. Nahrstedt, M. Zink, R. Sitaraman
Future view prediction for a 360-degree video streaming system is important to save network bandwidth and improve the Quality of Experience (QoE). Historical view data from a single viewer and from multiple viewers have been used for future view prediction. Video semantic information is also useful for predicting the viewer's future behavior. However, extracting video semantic information requires powerful computing hardware and large memory space to perform deep learning-based video analysis, which is not a realistic expectation for most client devices, such as small mobile devices or Head Mounted Displays (HMDs). Therefore, we develop an approach in which video semantic analysis is executed on the media server, and the analysis results are shared with clients via the Semantic Flow Descriptor (SFD) and View-Object State Machine (VOSM). SFD and VOSM become new descriptive additions to the Media Presentation Description (MPD) and Spatial Relation Description (SRD) to support 360-degree video streaming. Using this semantic-based approach, we design the Semantic-Aware View Prediction System (SEAWARE) to improve overall view prediction performance. Evaluation results on 360-degree videos and real HMD view traces show that the SEAWARE system improves view prediction performance and streams high-quality video with limited network bandwidth.
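The paper defines the SFD's actual syntax; the sketch below only illustrates the general idea of shipping per-segment object trajectories from the server to clients alongside the MPD. All field names here are hypothetical.

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class ObjectTrack:
    object_id: str
    label: str            # e.g. "player", "ball"
    positions: list       # per-frame (yaw, pitch) in degrees

@dataclass
class SemanticFlowDescriptor:
    segment_id: str
    tracks: list = field(default_factory=list)

    def to_mpd_supplemental(self) -> str:
        """Serialize for embedding in an MPD SupplementalProperty value."""
        return json.dumps(asdict(self))

# A client could bias its view prediction toward where tracked objects move.
sfd = SemanticFlowDescriptor("seg_0042", [
    ObjectTrack("obj1", "player", [(120.0, -5.0), (122.5, -4.0)]),
])
print(sfd.to_mpd_supplemental())
```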
Citations: 12
Deriving Strategies for the Evaluation of Spaced Repetition Learning in Mobile Learning Applications from Learning Analytics
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00049
Florian Schimanke, R. Mertens
Evaluating the success of learning technologies with respect to improvement in learners' abilities and knowledge is not an easy task. The problem stems from the existence of many different definitions with different perspectives, such as grades on the one hand and workplace performance on the other. This paper reviews definitions from the literature with the aim of finding a suitable definition for evaluating learning success in spaced-repetition-based mobile learning for knowledge improvement. It also borrows approaches from learning analytics to address the fact that learner groups are heterogeneous, which makes it necessary to analyze learning success differently in different groups of learners.
Citations: 1
Two types of flows admission control method for maximizing all user satisfaction considering seek-bar operation
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00048
Keisuke Ode, S. Miyata
In recent years, the available network bandwidth has decreased as mobile devices such as smartphones and tablets have proliferated. Quality of Service (QoS) control is required to guarantee communication quality for users of the network. One QoS control technique is admission control, which judges whether a newly arriving streaming application (flow) can be accommodated in the network. Conventional admission control methods that focus on users' cooperative behaviour have been proposed. In general, some users operate video navigation tools such as a seek-bar to jump to a different time position because they want to watch specific scenes. However, the conventional methods do not take this user behaviour into account. In this paper, we propose an admission control method that maximizes user satisfaction by considering user behaviour that reduces viewing time. We evaluate our proposed method by numerical analysis using queuing theory and show its effectiveness.
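The abstract's numerical analysis rests on queuing theory. As a minimal related illustration (the paper's two-flow-type model is richer than this), the blocking probability of an M/M/c/c loss system, i.e. the chance a newly arriving flow is rejected when all c bandwidth units are busy, can be computed with the numerically stable Erlang B recursion:

```python
def erlang_b(c: int, offered_load: float) -> float:
    """Blocking probability of an M/M/c/c loss system.

    Uses the stable recursion B(0) = 1,
    B(k) = a * B(k-1) / (k + a * B(k-1)), with a = offered load in Erlangs.
    """
    b = 1.0
    for k in range(1, c + 1):
        b = offered_load * b / (k + offered_load * b)
    return b

# Example: 10 bandwidth units, flows offering 8 Erlangs of load.
print(f"blocking probability: {erlang_b(10, 8.0):.4f}")
```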
Citations: 0
Structured Pruning of LSTMs via Eigenanalysis and Geometric Median for Mobile Multimedia and Deep Learning Applications
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00028
Nikolaos Gkalelis, V. Mezaris
In this paper, a novel structured pruning approach for learning efficient long short-term memory (LSTM) network architectures is proposed. More specifically, the eigenvalues of the covariance matrix associated with the responses of each LSTM layer are computed and utilized to quantify the layer's redundancy and automatically obtain an individual pruning rate for each layer. Subsequently, a Geometric Median based (GM-based) criterion is used to identify and prune, in a structured way, the most redundant LSTM units, realizing the pruning rates derived in the previous step. Experimental evaluation on the Penn Treebank text corpus and the large-scale YouTube-8M audio-video dataset, for the tasks of word-level prediction and visual concept detection respectively, shows the efficacy of the proposed approach.
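A rough sketch of the two steps under stated assumptions: the per-layer pruning rate is read off the eigenvalue spectrum of the response covariance (here, the fraction of units not needed to keep 95% of the variance, a plausible reading rather than the paper's exact rule), and the GM-based criterion marks the units whose weight vectors lie closest to the geometric median as most redundant.

```python
import numpy as np

def layer_pruning_rate(responses: np.ndarray, energy: float = 0.95) -> float:
    """responses: (n_samples, n_units) hidden-state responses of one layer.
    Returns the fraction of units deemed redundant by the eigen-spectrum."""
    cov = np.cov(responses, rowvar=False)
    eig = np.sort(np.linalg.eigvalsh(cov))[::-1]       # descending
    k = np.searchsorted(np.cumsum(eig) / eig.sum(), energy) + 1
    return 1.0 - k / len(eig)

def geometric_median(W: np.ndarray, iters: int = 100, eps: float = 1e-8):
    """Weiszfeld iteration; W: (n_units, d) per-unit weight vectors."""
    m = W.mean(axis=0)
    for _ in range(iters):
        d = np.linalg.norm(W - m, axis=1) + eps
        m = (W / d[:, None]).sum(axis=0) / (1.0 / d).sum()
    return m

def units_to_prune(W: np.ndarray, rate: float) -> np.ndarray:
    """GM criterion: units nearest the geometric median are most redundant."""
    d = np.linalg.norm(W - geometric_median(W), axis=1)
    return np.argsort(d)[:int(round(rate * len(W)))]
```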
Citations: 3
SumBot: Summarize Videos Like a Human
Pub Date : 2020-12-01 DOI: 10.1109/ISM.2020.00044
Hongxiang Gu, Stefano Petrangeli, Viswanathan Swaminathan
Video currently accounts for 70% of all internet traffic, and this number is expected to continue to grow. Each minute, more than 500 hours' worth of video is uploaded to YouTube. Generating engaging short videos out of the raw captured content is often a time-consuming and cumbersome activity for content creators. Existing ML-based video summarization and highlight generation approaches often neglect the fact that many summarization tasks require specific domain knowledge of the video content, and that human editors often follow a semi-structured template when creating the summary (e.g., the highlights for a sport event). We therefore address in this paper the challenge of creating domain-specific summaries by actively leveraging this editorial template. In particular, we present an Inverse Reinforcement Learning (IRL)-based framework that can automatically learn the hidden structure or template followed by a human expert when generating a video summary for a specific domain. Specifically, we formulate the video summarization task as a Markov Decision Process, where each state is a combination of the features of the video shots added to the summary, and the possible actions are to include/remove a shot from the summary or leave it as is. Using a set of domain-specific human-generated video highlights as examples, we employ a Maximum Entropy IRL algorithm to learn the implicit reward function governing the summary generation process. The learned reward function is then used to train an RL agent that can produce video summaries for a specific domain, closely resembling what a human expert would create. Learning from expert demonstrations allows our approach to be applicable to any domain or editorial style. To demonstrate the superior performance of our approach, we apply it to the task of soccer game highlight generation and show that it outperforms other state-of-the-art methods, both quantitatively and qualitatively.
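A minimal sketch of the summarization MDP as described: the state combines pooled features of the shots currently in the summary with the features of the shot under review, and the actions are include, remove, or keep. The state encoding and stepping order are assumptions; `reward_fn` stands in for the reward the paper learns via MaxEnt IRL.

```python
import numpy as np

class SummaryMDP:
    """State = features of the current summary + the shot under review;
    actions toggle the shot's membership in the summary."""
    INCLUDE, REMOVE, KEEP = 0, 1, 2

    def __init__(self, shot_features: np.ndarray, reward_fn):
        self.feats = shot_features              # (n_shots, d)
        self.reward_fn = reward_fn              # learned via MaxEnt IRL
        self.selected = np.zeros(len(shot_features), dtype=bool)
        self.t = 0                              # shot index under review

    def state(self) -> np.ndarray:
        sel = self.feats[self.selected]
        pooled = sel.mean(axis=0) if len(sel) else np.zeros(self.feats.shape[1])
        return np.concatenate([pooled, self.feats[self.t]])

    def step(self, action: int):
        if action == self.INCLUDE:
            self.selected[self.t] = True
        elif action == self.REMOVE:
            self.selected[self.t] = False
        reward = self.reward_fn(self.state())   # score the new summary state
        self.t = min(self.t + 1, len(self.feats) - 1)
        return self.state(), reward
```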
Citations: 1
2020 IEEE International Symposium on Multimedia ISM 2020
Pub Date : 2020-12-01 DOI: 10.1109/ism.2020.00001
{"title":"2020 IEEE International Symposium on Multimedia ISM 2020","authors":"","doi":"10.1109/ism.2020.00001","DOIUrl":"https://doi.org/10.1109/ism.2020.00001","url":null,"abstract":"","PeriodicalId":120972,"journal":{"name":"2020 IEEE International Symposium on Multimedia (ISM)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126224254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0