
Latest publications: Proceedings of the 26th ACM international conference on Multimedia

A Unified Framework for Multimodal Domain Adaptation
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240633
Fan Qi, Xiaoshan Yang, Changsheng Xu
Domain adaptation aims to train a model on labeled data from a source domain while minimizing test error on a target domain. Most existing domain adaptation methods focus only on reducing the domain shift of single-modal data. In this paper, we consider a new problem of multimodal domain adaptation and propose a unified framework to solve it. The proposed multimodal domain adaptation neural networks (MDANN) consist of three important modules. (1) A covariant multimodal attention is designed to learn a common feature representation for multiple modalities. (2) A fusion module adaptively fuses the attended features of different modalities. (3) Hybrid domain constraints are proposed to comprehensively learn domain-invariant features by constraining single-modal features, fused features, and attention scores. Through jointly attending and fusing under an adversarial objective, the most discriminative and domain-adaptive parts of the features are adaptively fused together. Extensive experimental results on two real-world cross-domain applications (emotion recognition and cross-media retrieval) demonstrate the effectiveness of the proposed method.
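The adversarial objective mentioned in the abstract is commonly implemented with a gradient reversal layer: the forward pass is the identity, while the backward pass negates the gradient so the feature extractor learns to confuse a domain classifier. The following is a minimal illustrative sketch of that trick (our illustration, not the authors' MDANN code; the class name and `lam` weight are assumptions):

```python
import numpy as np

class GradientReversal:
    """Identity on the forward pass; flips (and scales) gradients on the
    backward pass, pushing features toward domain invariance."""

    def __init__(self, lam=1.0):
        self.lam = lam  # trade-off weight for the adversarial term

    def forward(self, x):
        return x  # pass features through unchanged

    def backward(self, grad_output):
        return -self.lam * grad_output  # reversed gradient reaches the features

layer = GradientReversal(lam=0.5)
x = np.array([1.0, -2.0, 3.0])
g = np.array([0.1, 0.2, -0.3])
y = layer.forward(x)    # identical to x
dx = layer.backward(g)  # -0.5 * g
```

In a full framework the layer sits between the shared feature extractor and the domain classifier, so minimizing the domain classifier's loss simultaneously maximizes domain confusion for the features.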
Citations: 42
Session details: FF-6
Pub Date : 2018-10-15 DOI: 10.1145/3286943
B. Huet
Citations: 0
End2End Semantic Segmentation for 3D Indoor Scenes
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3243933
Na Zhao
This research is concerned with the semantic segmentation of 3D point clouds arising from videos of 3D indoor scenes. It is an important building block of 3D scene understanding and has promising applications such as augmented reality and robotics. Although various deep learning based approaches have been proposed to replicate the success of 2D semantic segmentation in the 3D domain, they either result in severe information loss or fail to model geometric structures well. In this paper, we aim to model the local and global geometric structures of 3D scenes by designing an end-to-end 3D semantic segmentation framework. It captures local geometries through point-level feature learning and voxel-level aggregation, models global structures via a 3D CNN, and enforces label consistency with a high-order CRF. Through preliminary experiments conducted on two indoor datasets, we describe our insights on the proposed approach and present some directions to be pursued in the future.
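The voxel-level aggregation step described above can be pictured as pooling the features of all points that fall into the same voxel before a 3D CNN operates on the grid. A toy sketch under that assumption (our illustration, not the paper's implementation; `voxel_mean_pool` and its mean pooling are hypothetical choices):

```python
import numpy as np

def voxel_mean_pool(points, feats, voxel_size=0.5):
    """Map each 3D point to a voxel index and average the features of
    points sharing a voxel (a simple form of voxel-level aggregation)."""
    keys = np.floor(np.asarray(points) / voxel_size).astype(np.int64)
    pooled = {}
    for key, f in zip(map(tuple, keys), feats):
        pooled.setdefault(key, []).append(f)
    return {k: float(np.mean(v)) for k, v in pooled.items()}

# Three points, two of which share the voxel at the origin.
pooled = voxel_mean_pool(
    [[0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [0.9, 0.9, 0.9]],
    [1.0, 3.0, 5.0],
)
```

The resulting sparse voxel dictionary could then be densified into a grid tensor for 3D convolutions, while the per-point features remain available for the point-level branch.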
Citations: 7
SoMin.ai: Social Multimedia Influencer Discovery Marketplace
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3241387
Aleksandr Farseev, Kirill Lepikhin, H. Schwartz, Eu Khoon Ang, K. Powar
In this technical demonstration, we showcase the first AI-driven social multimedia influencer discovery marketplace, called SoMin. The platform combines advanced data analytics and behavioral science to help marketers find and understand their audience and engage the most relevant social media micro-influencers at a large scale. SoMin harvests brand-specific live social multimedia streams in a specified market domain, followed by rich analytics and semantic-based influencer search. The Individual User Profiling models extrapolate the key personal characteristics of the brand audience, while the influencer retrieval engine reveals the semantically matching social media influencers to the platform users. The influencers are matched in terms of both their posted content and their social media audiences, and the evaluation results demonstrate an excellent performance of the proposed recommender framework. By leveraging influencers at a large scale, marketers will be able to execute more effective marketing campaigns of higher trust and at a lower cost.
Citations: 18
Facial Expression Recognition Enhanced by Thermal Images through Adversarial Learning
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240608
Bowen Pan, Shangfei Wang
Currently, fusing visible and thermal images for facial expression recognition requires two modalities during both training and testing. Visible cameras are commonly used in real-life applications, while thermal cameras are typically only available in lab situations due to their high price, so thermal imaging for facial expression recognition is not frequently used in real-world settings. To address this, we propose a novel thermally enhanced facial expression recognition method which uses thermal images as privileged information to construct better visible feature representations and improved classifiers by incorporating adversarial learning and similarity constraints during training. Specifically, we train two deep neural networks from visible images and thermal images. We impose an adversarial loss to enforce statistical similarity between the learned representations of the two modalities, and a similarity constraint to regulate the mapping functions from visible and thermal representations to expressions. Thus, thermal images are leveraged to simultaneously improve visible feature representation and classification during training. To mimic real-world scenarios, only visible images are available during testing. We further extend the proposed expression recognition method to partially unpaired data to explore the supplementary role of thermal images in visible facial expression recognition when visible and thermal images are not synchronously recorded. Experimental results on the MAHNOB Laughter database demonstrate that our proposed method can effectively regularize visible representation and expression classifiers with the help of thermal images, achieving state-of-the-art recognition performance.
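The privileged-information idea combines a task loss on the visible stream with a term that pulls visible representations toward the thermal ones, which are available only at training time. A simplified numpy sketch of that assumed loss shape (our illustration; the function name, the squared-error similarity term, and the `alpha` weight are assumptions, not the authors' exact objective):

```python
import numpy as np

def privileged_loss(vis_repr, thm_repr, logits, labels, alpha=0.1):
    """Task cross-entropy on visible-stream predictions plus a similarity
    penalty between visible and thermal (privileged) representations."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    ce = -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))
    sim = np.mean((np.asarray(vis_repr) - np.asarray(thm_repr)) ** 2)
    return ce + alpha * sim

loss = privileged_loss(
    vis_repr=[[1.0, 0.0]], thm_repr=[[1.0, 0.0]],  # identical reps: sim term is 0
    logits=[[0.0, 0.0]], labels=[0],               # uniform logits: ce = ln 2
)
```

At test time the thermal branch is dropped entirely, matching the abstract's constraint that only visible images are available during testing.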
Citations: 3
FoV-Aware Edge Caching for Adaptive 360° Video Streaming
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240680
A. Mahzari, A. T. Nasrabadi, Aliehsan Samiei, R. Prakash
In recent years, Virtual Reality (VR) has grown in popularity, enabled by technologies like 360° video streaming. Streaming 360° video is extremely challenging due to high bandwidth and low latency requirements. Some VR solutions employ adaptive 360° video streaming, which tries to reduce bandwidth consumption by streaming high resolution video only for the user's Field of View (FoV), the part of the video being viewed by the user at any given time. Although FoV-adaptive 360° video streaming has been helpful in reducing bandwidth requirements, streaming 360° video from distant content servers is still challenging due to network latency. Caching popular content close to the end users not only decreases network latency, but also alleviates network bandwidth demands by reducing the number of future requests that have to be sent all the way to remote content servers. In this paper, we propose a novel caching policy based on users' FoV, called the FoV-aware caching policy, in which we learn a probabilistic model of the common FoV for each 360° video, based on previous users' viewing histories, to improve caching performance. Through experiments with a real users' head movement dataset, we show that our proposed approach improves the cache hit ratio over the Least Frequently Used (LFU) and Least Recently Used (LRU) caching policies by at least 40% and 17%, respectively.
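The gist of a FoV-aware policy, as opposed to LRU/LFU, is to rank cacheable video tiles by an estimated probability of being viewed, learned from earlier users' FoV traces. A toy stdlib sketch under that assumption (our simplification, not the paper's algorithm; tile identifiers and the count-based probability estimate are illustrative):

```python
from collections import Counter

def fov_aware_cache(fov_traces, capacity):
    """Keep the `capacity` tiles most frequently seen in past FoV traces,
    i.e. those with the highest estimated viewing probability."""
    counts = Counter(tile for trace in fov_traces for tile in trace)
    # Rank by estimated viewing probability; break ties by tile id.
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    return set(ranked[:capacity])

# Three past sessions: tile "t2" dominates the common FoV, "t3" is next.
cache = fov_aware_cache([["t2", "t3"], ["t2", "t4"], ["t2", "t3"]], capacity=2)
```

A production cache would update these estimates online and combine them with recency, but even this static version illustrates why FoV statistics can beat content-agnostic eviction for 360° video.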
Citations: 78
Pseudo Transfer with Marginalized Corrupted Attribute for Zero-shot Learning
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240715
Teng Long, Xing Xu, Youyou Li, Fumin Shen, Jingkuan Song, Heng Tao Shen
Zero-shot learning (ZSL) aims to recognize unseen classes that are excluded from the training classes. ZSL suffers from 1) Zero-shot bias (Z-Bias) --- the model is biased towards seen classes because unseen data is inaccessible during training; and 2) Zero-shot variance (Z-Variance) --- associating different images with the same semantic embedding yields a large association error. To reduce Z-Bias, we propose a pseudo transfer mechanism, where we first synthesize the distribution of unseen data using semantic embeddings, then minimize the mismatch between the seen distribution and the synthesized unseen distribution. To reduce Z-Variance, we implicitly corrupt one semantic embedding multiple times to generate image-wise semantic vectors, with which our model learns robust classifiers. Lastly, we integrate our Z-Bias and Z-Variance reduction techniques with a linear ZSL model to show their usefulness. Our proposed model successfully overcomes the Z-Bias and Z-Variance problems. Extensive experiments on five benchmark datasets including ImageNet-1K demonstrate that our model outperforms state-of-the-art methods with fast training.
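Corrupting one class-level semantic embedding multiple times can be pictured as applying dropout-style noise to the attribute vector to produce many image-wise variants. A minimal sketch of that step (our illustration, assuming dropout corruption; the paper marginalizes over this noise analytically rather than sampling, and `corrupt_attributes` is a hypothetical name):

```python
import numpy as np

def corrupt_attributes(attr, n_samples, drop_p=0.3, seed=0):
    """Generate `n_samples` corrupted copies of one class attribute vector
    by independently zeroing each dimension with probability `drop_p`."""
    attr = np.asarray(attr, dtype=float)
    rng = np.random.default_rng(seed)
    keep = rng.random((n_samples, attr.shape[0])) >= drop_p
    return attr[None, :] * keep  # each row is one corrupted semantic vector

# Five corrupted copies of a 4-dimensional class attribute vector.
samples = corrupt_attributes([0.8, 0.1, 0.5, 0.9], n_samples=5, drop_p=0.3)
```

Training a classifier against many such corrupted copies (or against their closed-form expectation) makes it less sensitive to the gap between a single class embedding and the varied images mapped to it.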
Citations: 35
Session details: Vision-2 (Object & Scene Understanding)
Pub Date : 2018-10-15 DOI: 10.1145/3286922
Zhengjun Zha
Citations: 0
Robust Correlation Filter Tracking with Shepherded Instance-Aware Proposals
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3240709
Yanjie Liang, Qiangqiang Wu, Yi Liu, Y. Yan, Hanzi Wang
In recent years, convolutional neural network (CNN) based correlation filter trackers have achieved state-of-the-art results on the benchmark datasets. However, CNN based correlation filters cannot effectively handle large scale variation and distortion (such as fast motion, background clutter, occlusion, etc.), leading to sub-optimal performance. In this paper, we propose a novel CNN based correlation filter tracker with shepherded instance-aware proposals, namely DeepCFIAP, which automatically estimates the target scale in each frame and re-detects the target when distortion happens. DeepCFIAP is proposed to take advantage of the merits of both instance-aware proposals and CNN based correlation filters. Compared with CNN based correlation filter trackers, DeepCFIAP can successfully solve the problems of large scale variation and distortion via the shepherded instance-aware proposals, resulting in more robust tracking performance. Specifically, we develop a novel proposal ranking algorithm based on the similarities between proposals and instances. In contrast to detection proposal based trackers, DeepCFIAP shepherds the instance-aware proposals towards their optimal positions via the CNN based correlation filters, resulting in more accurate tracking results. Extensive experiments on two challenging benchmark datasets demonstrate that the proposed DeepCFIAP performs favorably against state-of-the-art trackers and is especially feasible for long-term tracking.
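At the core of any correlation filter tracker is frequency-domain correlation: correlating a template with a search patch via the FFT and reading the target's translation off the response peak. A minimal sketch of that core operation (our illustration, not the DeepCFIAP model or its CNN features):

```python
import numpy as np

def correlation_peak(template, patch):
    """Circular cross-correlation via the FFT; the argmax of the response
    map gives the estimated translation of the target."""
    F = np.fft.fft2(template)
    P = np.fft.fft2(patch)
    response = np.real(np.fft.ifft2(np.conj(F) * P))
    return np.unravel_index(np.argmax(response), response.shape)

rng = np.random.default_rng(42)
template = rng.random((8, 8))
patch = np.roll(template, shift=(2, 3), axis=(0, 1))  # target moved by (2, 3)
peak = correlation_peak(template, patch)
```

Real trackers such as the CNN based correlation filters discussed here add a learned filter, regularization, and multi-channel deep features on top of this, but the FFT correlation and peak localization remain the same primitive.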
Citations: 9
ChildAR-bot: Educational Playing Projection-based AR Robot for Children
Pub Date : 2018-10-15 DOI: 10.1145/3240508.3241362
Yoonjung Park, Yoonsik Yang, Hyocheol Ro, Jinwon Cha, Kyuri Kim, T. Han
Children encounter a variety of experiences through play, which can improve their ability to form ideas and undergo multi-faceted development. Using Augmented Reality (AR) technology to integrate various digital learning elements with real environments can lead to increased learning ability. This study proposes a 360° rotatable and portable system specialized for education and development through projection-based AR play. This system allows existing projection-based AR technology, which once could only be experienced at large-scale exhibitions and experience centers, to be used in individual and small-scale spaces. It also promotes the development of multi-sensory abilities through a multi-modality which provides various intuitive and sensory interactions. By experiencing the various educational play applications provided by the proposed system, children can increase their physical, perceptive, and emotional abilities and thinking skills.
Citations: 5