2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Papers

Contour-based 3D tongue motion visualization using ultrasound image sequences
Kele Xu, Yin Yang, Clémence Leboullenger, P. Roussel-Ragot, B. Denby
This article describes a contour-based 3D tongue deformation visualization framework using B-mode ultrasound image sequences. A robust, automatic tracking algorithm characterizes tongue motion via a contour, which is then used to drive a generic 3D Finite Element Model (FEM). A novel contour-based 3D dynamic modeling method is presented. Modal reduction and modal warping techniques are applied to model the deformation of the tongue physically and efficiently. This work can be helpful in a variety of fields, such as speech production, silent speech recognition, articulation training, speech disorder study, etc.
DOI: https://doi.org/10.1109/ICASSP.2016.7472705 (published 2016-03-20)
Citations: 6
Depth map estimation using census transform for light field cameras
Takayuki Tomioka, Kazu Mishiba, Y. Oyamada, K. Kondo
Depth estimation for lens-array cameras is a challenging problem because of sensor noise and radiometric distortion, a global brightness change between sub-aperture images caused by the vignetting effect of the micro-lenses. We propose a depth map estimation method that is robust to both sensor noise and radiometric distortion. Our method first binarizes the sub-aperture images by applying the census transform. Next, the binarized images are matched by computing majority operations between corresponding bits and summing the Hamming distances. The initial map obtained by matching is ambiguous because the baselines among sub-aperture images are extremely short. We refine the initial map with an optimization that assumes the variations of the depth values in the depth map and of the pixel values in texture-less objects are similar. Experiments show that our method outperforms conventional methods.
DOI: https://doi.org/10.1109/ICASSP.2016.7471955 (published 2016-03-20)
Citations: 6
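The census transform and Hamming-distance matching named in the abstract above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `census_transform` and `hamming_cost`, the 3×3 window, and the edge padding are assumptions made for illustration, and the majority operation and the refinement step from the paper are not reproduced.

```python
import numpy as np

def census_transform(img, radius=1):
    """Per-pixel bit descriptor: a bit is True where a neighbour is darker than the centre pixel."""
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")  # replicate border pixels (assumed border handling)
    bits = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue  # the centre pixel is not compared with itself
            neighbour = padded[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
            bits.append(neighbour < img)
    # Shape (h, w, (2 * radius + 1) ** 2 - 1), dtype bool.
    return np.stack(bits, axis=-1)

def hamming_cost(census_a, census_b):
    """Per-pixel Hamming distance between two census descriptors of the same shape."""
    return np.count_nonzero(census_a != census_b, axis=-1)
```

Because the descriptor only records the order of intensities inside the window, it does not change when every pixel is mapped through the same increasing brightness change, which is why it suits images affected by a global brightness difference.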
Pushing the limit of non-rigid structure-from-motion by shape clustering
Huizhong Deng, Yuchao Dai
Recovering both camera motions and non-rigid 3D shapes from 2D feature tracks is a challenging problem in computer vision. Long-term, complex non-rigid shape variations in real-world videos further increase the difficulty of non-rigid structure-from-motion (NRSfM). Furthermore, there is no criterion that characterizes how feasible it is to recover the non-rigid shapes and camera motions (i.e., how easy or how difficult the problem could be). In this paper, we first present an analysis of the "reconstructability" measure for NRSfM, where we show that 3D shape complexity and camera motion complexity can be used to index the reconstructability. We propose an iterative shape-clustering-based method for NRSfM, which alternates between 3D shape clustering and 3D shape reconstruction. Thus, the global reconstructability is improved and better reconstruction can be achieved. Experimental results on long-term, complex non-rigid motion sequences show that our method outperforms the current state-of-the-art methods by a margin.
DOI: https://doi.org/10.1109/ICASSP.2016.7472027 (published 2016-03-20)
Citations: 3
Distributed dyadic cyclic descent for non-negative matrix factorization
M. Ulfarsson, V. Solo, J. Sigurdsson, J. R. Sveinsson
Non-negative matrix factorization (NMF) has found use in fields such as remote sensing and computer vision where the signals of interest are usually non-negative. Data dimensions in these applications can be huge, and traditional algorithms break down due to unachievable memory demands. One is then compelled to consider distributed algorithms. In this paper, we develop for the first time a distributed version of NMF using the alternating direction method of multipliers (ADMM) algorithm and dyadic cyclic descent. The algorithm is compared to well-established variants of NMF using simulated data, and is also evaluated using real remote sensing hyperspectral data.
DOI: https://doi.org/10.1109/ICASSP.2016.7472489 (published 2016-03-20)
Citations: 2
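For readers unfamiliar with NMF itself, the factorization V ≈ WH with non-negative factors can be sketched with the classical multiplicative updates for the Frobenius-norm objective. This is deliberately a different, non-distributed baseline, not the ADMM and dyadic cyclic descent algorithm of the paper above; the function name `nmf_multiplicative` and its parameters are assumptions made for illustration.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (m x n) as W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # Strictly positive random initialisation keeps the updates well defined.
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so non-negativity is preserved.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Note that every iteration touches the full matrix V, which is exactly the memory pressure the abstract cites as the motivation for a distributed formulation.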
Improving face detection with depth
Gregory P. Meyer, Steven Alfano, M. Do
Face detection plays an important role in many computer vision systems. Typically, a face detector identifies faces within a grayscale or color image. Due to the recent increase in consumer depth cameras, obtaining both color and depth images of a scene has never been easier. We propose a technique that utilizes depth information to improve face detection. Standard face detection methods, such as the Viola-Jones object detection framework, detect faces by searching an image at every location and scale. Our method increases the speed and accuracy of the Viola-Jones face detector by utilizing depth data to constrain the detector's search over the image. Leveraging a Kinect camera, we are able to detect faces 3.5× faster, while greatly reducing the number of false positives.
DOI: https://doi.org/10.1109/ICASSP.2016.7471884 (published 2016-03-20)
Citations: 3
An improved anthropometry-based customization method of individual head-related transfer functions
Xuejie Liu, Xiaoli Zhong
Individual head-related transfer functions (HRTFs) are necessary for rendering authentic spatial perceptions in spatial audio applications. To obtain individual HRTFs while avoiding tedious and complicated measurement and calculation, an improved customization method based on anthropometry matching is proposed. In the method, the set of HRTFs that best matches the listener's pinna shape, as judged by four pinna-related anatomical parameters, is selected from a pre-acquired HRTF baseline database as the listener's individual HRTFs. A series of subject localization experiments was conducted to verify the effectiveness of the proposed method compared with the existing method. Results show that the median-plane localization performance of the proposed customization method is superior to that of the existing method, though the performance improvement varies with source position.
DOI: https://doi.org/10.1109/ICASSP.2016.7471692 (published 2016-03-20)
Citations: 9
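The selection step described in the abstract above, picking the database subject whose pinna measurements are closest to the listener's, might look like the following sketch. The abstract does not state the matching metric or which four parameters are used, so the standard-deviation-normalised Euclidean distance and the function name `best_matching_subject` are assumptions made purely for illustration.

```python
import numpy as np

def best_matching_subject(database, listener):
    """Return the row index of the database subject closest to the listener.

    database: array of shape (n_subjects, 4), four pinna parameters per subject.
    listener: array of shape (4,), the same parameters measured on the listener.
    """
    database = np.asarray(database, dtype=float)
    listener = np.asarray(listener, dtype=float)
    # Normalise each parameter so that no single measurement dominates (assumed choice).
    scale = database.std(axis=0)
    scale[scale == 0] = 1.0
    distances = np.linalg.norm((database - listener) / scale, axis=1)
    return int(np.argmin(distances))
```

The returned index would then be used to look up that subject's measured HRTF set in the baseline database.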
Robust sparsity-promoting acoustic multi-channel equalization for speech dereverberation
I. Kodrasi, Ante Jukic, S. Doclo
This paper presents a novel signal-dependent method to increase the robustness of acoustic multi-channel equalization techniques against room impulse response (RIR) estimation errors. Aiming at obtaining an output signal which better resembles a clean speech signal, we propose to extend the acoustic multi-channel equalization cost function with a penalty function which promotes sparsity of the output signal in the short-time Fourier transform domain. Two conventionally used sparsity-promoting penalty functions are investigated, i.e., the l0-norm and the l1-norm, and the sparsity-promoting filters are iteratively computed using the alternating direction method of multipliers. Simulation results for several RIR estimation errors show that incorporating a sparsity-promoting penalty function significantly increases the robustness, with the l1-norm penalty function outperforming the l0-norm penalty function.
DOI: https://doi.org/10.1109/ICASSP.2016.7471658 (published 2016-03-20)
Citations: 8
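When penalties of this kind are handled with ADMM, the penalty usually enters through its proximal operator: soft thresholding for the l1-norm and hard thresholding for the l0-norm. The sketch below shows both operators for complex STFT coefficients. Whether the paper's filter update uses exactly these steps is an assumption; the names `prox_l1` and `prox_l0` and the weight `lam` are illustrative only.

```python
import numpy as np

def prox_l1(v, lam):
    """Soft thresholding: shrink each magnitude by lam, keep the phase, floor at zero."""
    mag = np.abs(v)
    scale = np.maximum(1.0 - lam / np.maximum(mag, 1e-12), 0.0)
    return scale * v

def prox_l0(v, lam):
    """Hard thresholding: keep coefficients with magnitude above sqrt(2 * lam), zero the rest."""
    return np.where(np.abs(v) > np.sqrt(2.0 * lam), v, 0.0)
```

Both operators solve min over x of penalty(x) + 0.5 * |x - v|^2 elementwise. The l1 version shrinks every surviving coefficient, while the l0 version leaves survivors untouched.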
A generalized LDPC framework for robust and sublinear compressive sensing
Xu Chen, Dongning Guo
Compressive sensing aims to recover a high-dimensional sparse signal from a relatively small number of measurements. In this paper, a novel design of the measurement matrix is proposed. The design is inspired by the construction of generalized low-density parity-check codes, where capacity-achieving point-to-point codes serve as subcodes to robustly estimate the signal support. In the case that each entry of the n-dimensional k-sparse signal lies in a known discrete alphabet, the proposed scheme requires only O(k log n) measurements and arithmetic operations. In the case of an arbitrary, possibly continuous alphabet, an error propagation graph is proposed to characterize the residual estimation error. With O(k log^2 n) measurements and computational complexity, the reconstruction error can be made arbitrarily small with high probability.
DOI: https://doi.org/10.1109/ICASSP.2016.7472553 (published 2016-03-20)
Citations: 3
Fast intra mode decision and block matching for HEVC screen content compression
Hao Zhang, Qiao-Yan Zhou, Ningning Shi, Feng Yang, Xin Feng, Zhan Ma
Screen content coding (SCC) is the latest extension of High-Efficiency Video Coding (HEVC), aiming to improve the compression efficiency of screen content video. With newly developed tools such as intra block copy (IntraBC) and palette (PLT) mode, SCC compresses desktop screens more efficiently, but at the cost of a significant increase in complexity. In this paper, we improve intra prediction in two respects. First, by leveraging the temporal correlation among coding units (CUs), we develop a fast CU depth prediction scheme. Second, an adaptive search step is employed to further speed up the time-consuming block matching in IntraBC. The overall encoding time is reduced by about 39% and 35% for the All Intra (AI) lossy and lossless encoding scenarios, respectively, with negligible quality loss under the SCC common test conditions.
DOI: https://doi.org/10.1109/ICASSP.2016.7471902 (published 2016-03-20)
Citations: 23
A hierarchical framework for language identification
S. Irtza, V. Sethu, Haris Bavattichalil, E. Ambikairajah, Haizhou Li
Most current language recognition systems model different levels of information (acoustic, prosodic, phonotactic, etc.) independently and combine the model likelihoods in order to make a decision. However, these are single-level systems that treat all languages identically and hence are incapable of exploiting any similarities that may exist within groups of languages. In this paper, a hierarchical language identification (HLID) framework is proposed that involves a series of classification decisions at multiple levels, over language clusters of decreasing size, with individual languages identified only at the final level. The performance of the proposed hierarchical framework is compared with a state-of-the-art LID system on the NIST 2007 database, and the results indicate that the proposed approach outperforms state-of-the-art systems.
DOI: https://doi.org/10.1109/ICASSP.2016.7472793 (published 2016-03-20)
Citations: 13
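The multi-level decision described in the abstract above can be illustrated with a two-level toy example: first choose a language cluster, then choose a language only among that cluster's members. The function name `hierarchical_identify`, the dictionary-based scores, and the fixed two levels are hypothetical simplifications; the paper's actual classifiers and cluster construction are not reproduced here.

```python
def hierarchical_identify(cluster_scores, language_scores, clusters):
    """Pick the best-scoring cluster, then the best-scoring language inside it.

    cluster_scores: dict mapping cluster name to score (higher is better).
    language_scores: dict mapping language code to score (higher is better).
    clusters: dict mapping cluster name to the list of languages it contains.
    """
    best_cluster = max(cluster_scores, key=cluster_scores.get)
    candidates = clusters[best_cluster]
    # Only languages in the chosen cluster compete at the final level.
    return max(candidates, key=lambda lang: language_scores[lang])
```

One consequence of this design is that a language outside the chosen cluster cannot win even if its individual score is the highest overall, so the quality of the early cluster decisions matters.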