2017 IEEE International Conference on Multimedia and Expo (ICME) — Latest Publications

Gait phase classification for in-home gait assessment
Pub Date: 2017-08-31 | DOI: 10.1109/ICME.2017.8019500
Minxiang Ye, Cheng Yang, V. Stanković, L. Stanković, Samuel Cheng
With a growing ageing population, acquiring joint measurements with sufficient accuracy for reliable gait assessment is essential. Additionally, the quality of gait analysis relies heavily on accurate feature selection and classification. Sensor-driven and one-camera optical motion capture systems are becoming increasingly popular in the scientific literature due to their portability and cost-effectiveness. In this paper, we propose 12 gait parameters to characterise gait patterns and a novel gait-phase classifier, achieving classification performance comparable to that of a state-of-the-art multi-sensor optical motion system. Furthermore, a novel multi-channel time-series segmentation method is proposed that maximizes the temporal information of the gait parameters, improving the final classification success rate after gait event reconstruction. The validation, conducted over 126 experiments on 6 healthy volunteers and 9 stroke patients with hand-labelled ground-truth gait phases, demonstrates high gait classification accuracy.
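As a rough illustration of the classification step (not the authors' pipeline), the sketch below trains a window-based gait-phase classifier on joint-angle trajectories; the joint angles, window statistics, and phase labels are hypothetical stand-ins for the paper's 12 gait parameters and hand-labelled phases.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(angles, width=15):
    """Slide a window over per-frame joint angles (T x J) and
    summarise each window with mean, std and range per joint."""
    T, J = angles.shape
    feats = []
    for t in range(T - width):
        w = angles[t:t + width]
        feats.append(np.hstack([w.mean(0), w.std(0), np.ptp(w, axis=0)]))
    return np.array(feats)

# Hypothetical data: 1000 frames, 4 joint angles, 4 gait phases.
rng = np.random.default_rng(0)
angles = rng.normal(size=(1000, 4))
phases = rng.integers(0, 4, size=1000 - 15)   # stand-in hand labels

X = window_features(angles)
clf = RandomForestClassifier(n_estimators=100).fit(X, phases)
print(clf.predict(X[:5]))                      # predicted gait phases
```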
Citations: 10
Novel view synthesis with light-weight view-dependent texture mapping for a stereoscopic HMD
Pub Date: 2017-08-31 | DOI: 10.1109/ICME.2017.8019417
Thiwat Rongsirigul, Yuta Nakashima, Tomokazu Sato, N. Yokoya
The proliferation of off-the-shelf head-mounted displays (HMDs) lets end-users enjoy virtual reality applications, some of which render a real-world scene using a novel view synthesis (NVS) technique. View-dependent texture mapping (VDTM) has been studied for NVS due to its photo-realistic quality. The VDTM technique renders a novel view by adaptively selecting textures from the most appropriate images. However, this process is computationally expensive because VDTM scans every captured image. For stereoscopic HMDs, the situation is much worse because we need to render novel views once for each eye, almost doubling the cost. This paper proposes light-weight VDTM tailored for an HMD. In order to reduce the computational cost of VDTM, our method leverages the overlapping fields of view between a stereoscopic pair of HMD images and prunes the set of images to be scanned. We show through a user study that the proposed method drastically accelerates the VDTM process without degrading image quality.
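A minimal sketch of the pruning idea, not the authors' implementation: before per-pixel texture selection, keep only the k captured views whose optical axes are closest to the novel view direction, and reuse one pruned candidate set for both eyes of the stereo pair, since their fields of view largely overlap. All names and the choice of cosine similarity are assumptions.

```python
import numpy as np

def prune_views(capture_dirs, novel_dir, k=8):
    """capture_dirs: (N, 3) unit viewing directions of captured images;
    novel_dir: (3,) unit direction of the novel (HMD) view.
    Returns indices of the k most similar captured views."""
    cos_sim = capture_dirs @ novel_dir
    return np.argsort(-cos_sim)[:k]

# Hypothetical capture setup: 200 views on a sphere.
dirs = np.random.default_rng(1).normal(size=(200, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

left_dir = np.array([0.0, 0.0, 1.0])
right_dir = np.array([0.05, 0.0, 0.999])
right_dir /= np.linalg.norm(right_dir)

# One candidate set, centred between the eyes, serves both renders.
mid = (left_dir + right_dir) / np.linalg.norm(left_dir + right_dir)
candidates = prune_views(dirs, mid)
print(candidates)   # only these views are scanned by VDTM
```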
Citations: 2
Enhancing feature modulation spectra with dictionary learning approaches for robust speech recognition
Pub Date: 2017-08-28 | DOI: 10.1109/ICME.2017.8019509
Bi-Cheng Yan, Chin-Hong Shih, Shih-Hung Liu, Berlin Chen
Noise robustness has long garnered much interest from researchers and practitioners of the automatic speech recognition (ASR) community due to its paramount importance to the success of ASR systems. This paper presents a novel approach to improving the noise robustness of speech features, building on top of the dictionary learning paradigm. To this end, we employ the K-SVD method and its variants to create sparse representations with respect to a common set of basis spectral vectors that captures the intrinsic temporal structure inherent in the modulation spectra of clean training speech features. The enhanced modulation spectra of speech features, constructed by mapping the original modulation spectra into the space spanned by these representative basis vectors, can better carry noise-resistant acoustic characteristics. In addition, considering the nonnegative property of modulation spectrum amplitudes, we utilize the nonnegative K-SVD method, in combination with nonnegative sparse coding, to generate even more noise-robust speech features. All experiments were conducted and verified on the standard Aurora-2 database and task. The empirical results show that the proposed dictionary-learning-based approach provides significant average word error rate reductions when integrated with either a GMM-HMM or a DNN-HMM based ASR system.
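To make the flow concrete, here is a minimal sketch of the enhancement pipeline under stated assumptions: scikit-learn has no K-SVD, so its mini-batch dictionary learner stands in for it, and the feature trajectories are random placeholders. The illustrated idea is magnitude modulation spectrum → sparse code over learned basis vectors → reconstruction.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
clean_feats = rng.normal(size=(500, 64))         # frames x feature dims

# Magnitude modulation spectra: FFT along time, one row per dimension.
M = np.abs(np.fft.rfft(clean_feats, axis=0)).T   # (64, n_mod_bins)

dico = MiniBatchDictionaryLearning(n_components=32, alpha=0.5,
                                   transform_algorithm="omp",
                                   transform_n_nonzero_coefs=5)
codes = dico.fit(M).transform(M)                 # sparse representations
M_hat = codes @ dico.components_                 # enhanced spectra

# At test time, the phase of the noisy spectrum would be kept and the
# enhanced magnitude inverted with np.fft.irfft to rebuild trajectories.
```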
Citations: 0
Facial attractiveness computation by label distribution learning with deep CNN and geometric features
Pub Date: 2017-08-28 | DOI: 10.1109/ICME.2017.8019454
Shu Liu, Bo Li, Yangyu Fan, Zhe Guo, A. Samal
Facial attractiveness computation is a challenging task because of the lack of labeled data and discriminative features. In this paper, an end-to-end label distribution learning (LDL) framework with a deep convolutional neural network (CNN) and geometric features is proposed to meet these two challenges. Different from previous work, we recast this task as an LDL problem. Compared with single-label regression, LDL significantly improves the generalization ability of our model. In addition, we propose several types of geometric features, along with an incremental feature selection method that selects hundred-dimensional discriminative geometric features from an exhaustive pool of raw features. More importantly, we find that these selected geometric features are complementary to the CNN features. Extensive experiments are carried out on the SCUT-FBP dataset, where our approach achieves superior performance in comparison to the state-of-the-art methods.
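A minimal sketch of the label-distribution idea: instead of regressing a single beauty score, each training face gets a discrete Gaussian distribution over score bins, and a model is trained to match it (e.g. with a KL-divergence loss). The bin range and sigma are assumptions, not values from the paper.

```python
import numpy as np

def label_distribution(score, bins=np.linspace(1, 5, 41), sigma=0.4):
    """Discrete Gaussian distribution over score bins, centred on score."""
    p = np.exp(-(bins - score) ** 2 / (2 * sigma ** 2))
    return p / p.sum()

def kl_loss(p, q, eps=1e-12):
    """KL divergence between target distribution p and prediction q."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

p = label_distribution(3.2)   # ground-truth distribution for one face
q = label_distribution(3.0)   # a model's predicted distribution
print(kl_loss(p, q))
```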
Citations: 17
Random forest regression based acoustic event detection with bottleneck features
Pub Date: 2017-08-28 | DOI: 10.1109/ICME.2017.8019418
Xianjun Xia, R. Togneri, Ferdous Sohel, David Huang
This paper deals with random forest regression based acoustic event detection (AED), combining acoustic features with bottleneck features (BN). Bottleneck features are widely regarded as inherently discriminative in acoustic signal processing. To deal with unstructured and complex real-world acoustic events, an acoustic event detection system is constructed using bottleneck features combined with acoustic features. Evaluations were carried out on the UPC-TALP and ITC-Irst databases, which consist of highly variable acoustic events. Experimental results demonstrate the usefulness of the low-dimensional and discriminative bottleneck features, with relative error-rate reductions of 5.33% and 5.51%, respectively.
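A minimal sketch of the feature-combination step, under assumptions: frame-level acoustic features are concatenated with bottleneck features (random stand-ins here for the activations of a bottleneck DNN layer), and a random forest regresses a frame-wise target; the target definition below (normalised distance to the nearest event boundary) is illustrative, not the paper's exact formulation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
acoustic = rng.normal(size=(2000, 39))     # e.g. MFCCs plus deltas
bottleneck = rng.normal(size=(2000, 40))   # DNN bottleneck activations
X = np.hstack([acoustic, bottleneck])      # combined frame features
y = rng.uniform(0, 1, size=2000)           # stand-in boundary distance

reg = RandomForestRegressor(n_estimators=200).fit(X, y)
print(reg.predict(X[:3]))                  # frame-wise regression output
```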
Citations: 12
Quality assessment of multi-view-plus-depth images
Pub Date: 2017-07-11 | DOI: 10.1109/ICME.2017.8019542
Jiheng Wang, Shiqi Wang, Kai Zeng, Zhou Wang
Multi-view-plus-depth (MVD) representation has recently gained significant attention as a means to encode 3D scenes, allowing intermediate views to be synthesized on-the-fly at the display site through depth-image-based rendering (DIBR). Automatic quality assessment of MVD images/videos is critical for the optimal design of MVD image/video coding and transmission schemes. Most existing image quality assessment (IQA) and video quality assessment (VQA) methods are applicable only after the DIBR process. Such post-DIBR measures are valuable in assessing overall system performance, but are difficult to employ directly in the encoder optimization process of MVD image/video coding. Here we make one of the first attempts to develop a perceptual pre-DIBR IQA approach for MVD images, employing an information-content-weighted approach that balances local quality measures of the texture and depth images. Experimental results show that the proposed approach achieves competitive performance when compared with state-of-the-art IQA algorithms applied post-DIBR.
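A minimal sketch of information-content-weighted pooling, assuming local quality maps are already available: local signal variance (a common proxy for information content) weights the per-pixel quality of the texture and depth images before they are combined. The weighting function and the texture/depth balance are assumptions; the paper's exact formulation is not reproduced.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def info_weight(img, size=11):
    """Local-variance-based information content map."""
    mu = uniform_filter(img, size)
    var = uniform_filter(img * img, size) - mu * mu
    return np.log1p(np.maximum(var, 0.0))

def pooled_score(quality_map, img):
    """Information-content-weighted average of a local quality map."""
    w = info_weight(img)
    return float((w * quality_map).sum() / (w.sum() + 1e-12))

rng = np.random.default_rng(0)
texture, depth = rng.random((64, 64)), rng.random((64, 64))
q_texture, q_depth = rng.random((64, 64)), rng.random((64, 64))

alpha = 0.7   # assumed texture/depth balance
score = alpha * pooled_score(q_texture, texture) \
      + (1 - alpha) * pooled_score(q_depth, depth)
print(score)
```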
Citations: 6
Saliency detection with two-level fully convolutional networks
Pub Date: 2017-07-10 | DOI: 10.1109/ICME.2017.8019309
Yang Yi, Li Su, Qingming Huang, Zhe Wu, Chunfeng Wang
This paper proposes a deep architecture for saliency detection that fuses pixel-level and superpixel-level predictions. Different from previous methods that either make dense pixel-level predictions with complex networks or region-level predictions for each region with fully-connected layers, this paper investigates an elegant route to make two-level predictions based on the same simple fully convolutional network via a seamless transformation. In the transformation module, we integrate low-level features to model the similarities between pixels and superpixels, as well as among superpixels themselves. The pixel-level saliency map detects and highlights the salient object well, and the superpixel-level saliency map preserves sharp boundaries in a complementary way. A shallow fusion net is applied to learn to fuse the two saliency maps, followed by a CRF post-refinement module. Experiments on four benchmark data sets demonstrate that our method performs favorably against the state-of-the-art methods.
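A minimal sketch of the two-level fusion idea under simplifying assumptions: a pixel-level saliency map (a random stand-in for the network output) is averaged inside SLIC superpixels to give a superpixel-level map, and the two are blended. The paper learns this fusion with a shallow net and refines with a CRF; a fixed average stands in for both here.

```python
import numpy as np
from skimage.segmentation import slic

rng = np.random.default_rng(0)
image = rng.random((128, 128, 3))
pixel_sal = rng.random((128, 128))          # stand-in network output

labels = slic(image, n_segments=200, compactness=10.0)
sp_sal = np.zeros_like(pixel_sal)
for lab in np.unique(labels):
    mask = labels == lab
    sp_sal[mask] = pixel_sal[mask].mean()   # per-superpixel average

fused = 0.5 * pixel_sal + 0.5 * sp_sal      # naive fusion stand-in
```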
Citations: 5
Remembering history with convolutional LSTM for anomaly detection
Pub Date: 2017-07-10 | DOI: 10.1109/ICME.2017.8019325
Weixin Luo, Wen Liu, Shenghua Gao
This paper tackles anomaly detection in videos, an extremely challenging task because anomalies are unbounded. We approach this task by leveraging a Convolutional Neural Network (CNN or ConvNet) to encode the appearance of each frame, and a Convolutional Long Short-Term Memory (ConvLSTM) to memorize all past frames, which corresponds to the motion information. We then integrate the ConvNet and ConvLSTM with an Auto-Encoder, referred to as ConvLSTM-AE, to learn the regularity of appearance and motion for ordinary moments. Compared with 3D Convolutional Auto-Encoder based anomaly detection, our main contribution is a ConvLSTM-AE framework that better encodes the changes of appearance and motion for normal events. To evaluate our method, we first conduct experiments on a synthesized Moving-MNIST dataset under controlled settings, and the results show that our method can easily identify changes of appearance and motion. Extensive experiments on real anomaly datasets further validate the effectiveness of our method for anomaly detection.
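The abstract does not spell out the scoring step, but reconstruction-based detectors of this kind typically turn per-frame reconstruction errors into a regularity score, with low regularity flagging an anomaly. A minimal sketch of that common convention, with random errors standing in for the autoencoder output:

```python
import numpy as np

def regularity_score(errors):
    """Map per-frame reconstruction errors to [0, 1]; low = anomalous."""
    e = np.asarray(errors, dtype=float)
    return 1.0 - (e - e.min()) / (e.max() - e.min() + 1e-12)

recon_errors = np.random.default_rng(0).random(300)   # one per frame
scores = regularity_score(recon_errors)
anomalous_frames = np.where(scores < 0.2)[0]          # assumed threshold
print(anomalous_frames)
```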
Citations: 354
A subjective visual quality assessment method of panoramic videos
Pub Date: 2017-07-10 | DOI: 10.1109/ICME.2017.8019351
Mai Xu, Chen Li, Yufan Liu, Xin Deng, Jiaxin Lu
Different from 2-dimensional (2D) videos, panoramic videos offer spherical viewing directions with the support of head-mounted displays, thus providing an immersive and interactive visual experience. Unfortunately, to our best knowledge, there are few subjective visual quality assessment (VQA) methods for panoramic videos. In this paper, we therefore propose a subjective VQA method for assessing the quality loss of impaired panoramic videos. Specifically, we first establish a database containing viewing-direction data of several subjects watching panoramic videos. We then find that viewing directions on panoramic videos are highly consistent across subjects. Upon this finding, we present a subjective test procedure in which different subjects rate the quality of panoramic videos, yielding differential mean opinion scores (DMOS). To cope with the remaining inconsistency of viewing directions on panoramic videos, we further propose a vectorized DMOS metric. Finally, experimental results verify that our subjective VQA method, in the form of both overall and vectorized DMOS metrics, is effective in measuring the subjective quality of panoramic videos.
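For readers unfamiliar with DMOS, here is a minimal sketch of a standard computation, assuming a matrix of raw opinion scores (subjects × impaired sequences) plus scores for the reference versions; the paper's vectorized, viewing-direction-aware variant is not reproduced, and the final rescaling convention is an assumption.

```python
import numpy as np

def dmos(raw_impaired, raw_reference):
    """Differential mean opinion scores from raw subject ratings."""
    d = raw_reference - raw_impaired          # differential raw scores
    mu = d.mean(axis=1, keepdims=True)        # per-subject statistics
    sd = d.std(axis=1, keepdims=True) + 1e-12
    z = (d - mu) / sd                         # per-subject z-scores
    return (z.mean(axis=0) + 3) * 100 / 6     # rescale to ~[0, 100]

rng = np.random.default_rng(0)
impaired = rng.uniform(1, 5, size=(15, 10))   # 15 subjects, 10 videos
reference = rng.uniform(4, 5, size=(15, 10))
print(dmos(impaired, reference))              # one DMOS per video
```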
Citations: 59
DLML: Deep linear mappings learning for face super-resolution with nonlocal-patch
Pub Date: 2017-07-10 | DOI: 10.1109/ICME.2017.8019298
T. Lu, Lanlan Pan, Junjun Jiang, Yanduo Zhang, Zixiang Xiong
Learning-based face super-resolution approaches rely on a representative dictionary, learned from training samples as a self-similarity prior, to estimate the relationship between low-resolution (LR) and high-resolution (HR) image patches. The most popular approaches learn a mapping function directly from LR patches to HR ones, but neglect the multi-layered nature of the image degradation process (resolution down-sampling), which means that observed LR images are formed gradually from the HR version down to lower resolutions. In this paper, we present a novel deep linear mappings learning framework for face super-resolution that learns the complex relationship between LR and HR features by alternately updating multi-layered embedding dictionaries and linear mapping matrices, instead of mapping directly. Furthermore, in contrast to existing position-based studies that only use local patches for the self-similarity prior, we develop a feature-induced nonlocal dictionary-pair embedding method to support hierarchical multiple linear mappings learning. With the coarse-to-fine nature of the deep architecture, cascaded incremental linear mapping matrices can be used to exploit the complex relationship between LR and HR images. Experimental results demonstrate that this framework outperforms the state of the art (including both general super-resolution approaches and face super-resolution approaches) on the FEI face database.
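A minimal sketch of learning one LR-to-HR linear mapping with regularised least squares (ridge regression); the paper stacks several such mappings with layer-wise embedding dictionaries, which is omitted here, and the patch sizes are assumptions.

```python
import numpy as np

def learn_mapping(X_lr, Y_hr, lam=1e-3):
    """X_lr: (d_lr, n) LR patch features; Y_hr: (d_hr, n) HR targets.
    Returns M such that Y ~= M @ X (regularised least squares)."""
    d = X_lr.shape[0]
    return Y_hr @ X_lr.T @ np.linalg.inv(X_lr @ X_lr.T + lam * np.eye(d))

rng = np.random.default_rng(0)
X = rng.normal(size=(36, 5000))     # e.g. 6x6 LR patches, flattened
Y = rng.normal(size=(144, 5000))    # e.g. 12x12 HR patches

M = learn_mapping(X, Y)
hr_patch = M @ X[:, :1]             # super-resolve one LR patch feature
```

In the cascaded setting, each layer would learn such a matrix between successive resolution levels, so the LR-to-HR relationship is modelled coarse-to-fine rather than in a single jump.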
Citations: 12