Contour-based 3D tongue motion visualization using ultrasound image sequences
Kele Xu, Yin Yang, Clémence Leboullenger, P. Roussel-Ragot, B. Denby
ICASSP 2016 | Pub Date: 2016-03-20 | DOI: 10.1109/ICASSP.2016.7472705
This article describes a contour-based 3D tongue deformation visualization framework using B-mode ultrasound image sequences. A robust, automatic tracking algorithm characterizes tongue motion via a contour, which is then used to drive a generic 3D finite element model (FEM). A novel contour-based 3D dynamic modeling method is presented: modal reduction and modal warping techniques are applied to model the deformation of the tongue physically and efficiently. This work can be useful in a variety of fields, including speech production research, silent speech recognition, articulation training, and the study of speech disorders.
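Modal reduction, which the paper uses to make the FEM tractable, projects the full system onto a few low-frequency vibration modes. Below is a minimal sketch for the common lumped (diagonal) mass-matrix case; the function name and the toy mass-spring chain are illustrative, not taken from the paper:

```python
import numpy as np

def modal_reduction(K, m, r):
    """Reduce a lumped-mass FEM system to its r lowest vibration modes.

    K: (n, n) stiffness matrix; m: (n,) lumped nodal masses.
    Returns reduced stiffness K_r, reduced mass M_r (~identity), and the
    mode basis Phi, so full displacements are approximated as u = Phi @ q.
    """
    d = 1.0 / np.sqrt(m)
    A = d[:, None] * K * d[None, :]      # symmetric whitened problem
    evals, V = np.linalg.eigh(A)         # eigenvalues in ascending order
    Phi = d[:, None] * V[:, :r]          # mass-normalized mode shapes
    K_r = Phi.T @ K @ Phi                # r x r reduced stiffness
    M_r = Phi.T @ (m[:, None] * Phi)     # identity for normalized modes
    return K_r, M_r, Phi

# toy example: a chain of unit masses joined by unit springs
n = 6
K = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
m = np.ones(n)
K_r, M_r, Phi = modal_reduction(K, m, r=2)
```

Time stepping then happens in the r-dimensional reduced coordinates q, which is what makes driving the model from a tracked contour cheap; modal warping (not shown) corrects the linear modes for large rotations.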
Depth map estimation using census transform for light field cameras
Takayuki Tomioka, Kazu Mishiba, Y. Oyamada, K. Kondo
ICASSP 2016 | Pub Date: 2016-03-20 | DOI: 10.1109/ICASSP.2016.7471955
Depth estimation for lens-array cameras is a challenging problem because of sensor noise and radiometric distortion, a global brightness change between sub-aperture images caused by the vignetting effect of the micro-lenses. We propose a depth map estimation method that is robust against both the sensor noise and the radiometric distortion. Our method first binarizes the sub-aperture images by applying the census transform. Next, the binarized images are matched by computing a majority operation over corresponding bits and summing the Hamming distances. The initial map obtained by matching is ambiguous because of the extremely short baselines among sub-aperture images. We refine the initial map with an optimization based on the assumption that the variations of the depth values in the depth map and of the pixel values in texture-less objects are similar. Experiments show that our method outperforms conventional methods.
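The two building blocks named above, census binarization and Hamming-distance matching, can be sketched in a few lines of NumPy. This is a generic census transform, not the authors' exact pipeline (their window size and the majority vote across sub-aperture pairs are omitted); note how a global brightness shift leaves the descriptors, and hence the cost, unchanged:

```python
import numpy as np

def census_transform(img, radius=1):
    """Binarize each pixel against its neighbors in a (2r+1)^2 window.

    Returns an integer image whose bits encode 'neighbor < center';
    the bit pattern is invariant to global monotonic brightness changes.
    """
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.uint64)
    pad = np.pad(img, radius, mode='edge')
    bit = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            nbr = pad[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
            out |= (nbr < img).astype(np.uint64) << np.uint64(bit)
            bit += 1
    return out

def hamming_cost(c1, c2):
    """Per-pixel Hamming distance between two census images."""
    x = c1 ^ c2
    shifts = np.uint64(8) * np.arange(8, dtype=np.uint64)
    as_bytes = ((x[..., None] >> shifts) & np.uint64(0xFF)).astype(np.uint8)
    return np.unpackbits(as_bytes, axis=-1).sum(axis=-1)  # popcount
```

A full matcher would evaluate `hamming_cost` between shifted sub-aperture images over a range of disparities and keep the minimum-cost disparity per pixel.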
Pushing the limit of non-rigid structure-from-motion by shape clustering
Huizhong Deng, Yuchao Dai
ICASSP 2016 | Pub Date: 2016-03-20 | DOI: 10.1109/ICASSP.2016.7472027
Recovering both camera motions and non-rigid 3D shapes from 2D feature tracks is a challenging problem in computer vision. Long-term, complex non-rigid shape variations in real-world videos further increase the difficulty of non-rigid structure-from-motion (NRSfM). Furthermore, no criterion exists to characterize how easy or difficult it is to recover the non-rigid shapes and camera motions. In this paper, we first present an analysis of a "reconstructability" measure for NRSfM, showing that 3D shape complexity and camera motion complexity can be used to index reconstructability. We then propose an iterative shape-clustering method for NRSfM that alternates between 3D shape clustering and 3D shape reconstruction, improving the global reconstructability and thereby the reconstruction quality. Experimental results on long-term, complex non-rigid motion sequences show that our method outperforms current state-of-the-art methods by a clear margin.
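The alternation at the heart of the method — cluster the frames, then reconstruct each cluster under a simpler shape model — can be illustrated on a raw 2D track matrix. This toy version uses mean-based clustering and a truncated SVD per cluster; the paper's reconstructability-guided 3D formulation is considerably more involved:

```python
import numpy as np

def cluster_and_reconstruct(W, labels, rank=1, n_iters=5):
    """Alternate frame clustering and per-cluster low-rank reconstruction.

    W: (frames, 2*points) stacked 2D feature tracks; labels: initial
    cluster assignment per frame. Each cluster is approximated by its
    own rank-`rank` truncated SVD, then frames are reassigned to the
    nearest cluster mean.
    """
    n_clusters = labels.max() + 1
    recon = W.copy()
    for _ in range(n_iters):
        means = np.zeros((n_clusters, W.shape[1]))
        for c in range(n_clusters):
            rows = W[labels == c]
            if len(rows) == 0:
                continue
            U, s, Vt = np.linalg.svd(rows, full_matrices=False)
            recon[labels == c] = (U[:, :rank] * s[:rank]) @ Vt[:rank]
            means[c] = rows.mean(axis=0)
        dist = np.linalg.norm(W[:, None, :] - means[None], axis=-1)
        labels = dist.argmin(axis=1)   # reassign frames to nearest mean
    return recon, labels
```

The point of the clustering is exactly what this sketch shows: a low-rank model that fails globally can fit well within each cluster of similar shapes.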
Distributed dyadic cyclic descent for non-negative matrix factorization
M. Ulfarsson, V. Solo, J. Sigurdsson, J. R. Sveinsson
ICASSP 2016 | Pub Date: 2016-03-20 | DOI: 10.1109/ICASSP.2016.7472489
Non-negative matrix factorization (NMF) has found use in fields such as remote sensing and computer vision, where the signals of interest are usually non-negative. Data dimensions in these applications can be huge, and traditional algorithms break down due to unachievable memory demands; one is then compelled to consider distributed algorithms. In this paper, we develop for the first time a distributed version of NMF using the alternating direction method of multipliers (ADMM) and dyadic cyclic descent. The algorithm is compared to well-established NMF variants on simulated data and is also evaluated on real remote sensing hyperspectral data.
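The ADMM machinery referenced here can be illustrated on the core NMF subproblem: a nonnegative least-squares update for one factor with the other held fixed. This sketch is the plain, single-machine building block only; the paper's dyadic cyclic descent and its distribution scheme are beyond it:

```python
import numpy as np

def admm_nnls(W, X, rho=1.0, n_iters=300):
    """Solve min_{H >= 0} 0.5 * ||X - W H||_F^2 with ADMM (split H = Z).

    H-update: a ridge-regularized least-squares solve; Z-update: the
    projection onto the nonnegative orthant; U: the scaled dual variable.
    The W-update of NMF is the transposed version of the same problem.
    """
    k = W.shape[1]
    G = np.linalg.inv(W.T @ W + rho * np.eye(k))   # cache the k x k solve
    Z = np.zeros((k, X.shape[1]))
    U = np.zeros_like(Z)
    for _ in range(n_iters):
        H = G @ (W.T @ X + rho * (Z - U))          # unconstrained step
        Z = np.maximum(0.0, H + U)                  # enforce Z >= 0
        U += H - Z                                  # dual ascent
    return Z
```

Alternating this update for H and its transpose for W yields a basic ADMM NMF; the small cached inverse is what keeps the per-iteration cost low.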
Improving face detection with depth
Gregory P. Meyer, Steven Alfano, M. Do
ICASSP 2016 | Pub Date: 2016-03-20 | DOI: 10.1109/ICASSP.2016.7471884
Face detection serves an important role in many computer vision systems. Typically, a face detector identifies faces within a grayscale or color image. With the recent proliferation of consumer depth cameras, obtaining both color and depth images of a scene has never been easier. We propose a technique that utilizes depth information to improve face detection. Standard face detection methods, such as the Viola-Jones object detection framework, detect faces by searching an image at every location and scale. Our method increases the speed and accuracy of the Viola-Jones face detector by using depth data to constrain the detector's search over the image. Leveraging a Kinect camera, we are able to detect faces 3.5× faster while greatly reducing the number of false positives.
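The depth constraint works because a pinhole camera maps a face of roughly known physical size to a predictable pixel size at each depth, so the detector only needs to try window sizes near that prediction instead of every scale. A sketch — the focal length and face width below are assumed typical values, not the paper's calibration:

```python
def face_scale_range(depth_m, focal_px=525.0, face_width_m=0.16, slack=0.25):
    """Expected face size in pixels at a given depth (pinhole model).

    focal_px: Kinect-style focal length in pixels (assumed value);
    face_width_m: nominal human face width (assumed value).
    Returns a (min, max) detector window size, so a Viola-Jones search
    can skip every other scale at this image location.
    """
    s = focal_px * face_width_m / depth_m   # projected size in pixels
    return (1 - slack) * s, (1 + slack) * s
```

Per-pixel (or per-region) depth then also prunes locations: a region whose depth implies a face smaller than the detector's minimum window can be skipped outright.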
An improved anthropometry-based customization method of individual head-related transfer functions
Xuejie Liu, Xiaoli Zhong
ICASSP 2016 | Pub Date: 2016-03-20 | DOI: 10.1109/ICASSP.2016.7471692
Individual head-related transfer functions (HRTFs) are necessary for rendering authentic spatial perception in spatial audio applications. To obtain individual HRTFs while avoiding tedious and complicated measurement and calculation, an improved customization method based on anthropometry matching is proposed. The method selects, from a pre-acquired HRTF baseline database, the set of HRTFs that best matches the listener's pinna shape according to four pinna-related anthropometric parameters, and uses it as the listener's individual HRTFs. A series of subjective localization experiments was conducted to compare the proposed method with an existing one. Results show that the median-plane localization performance of the proposed customization method is superior to that of the existing method, though the improvement varies with source position.
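The matching step reduces to a nearest-neighbor search over the four pinna parameters. A sketch follows; the z-score normalization is an assumption added here so that differently scaled measurements (lengths vs. angles) contribute comparably — the paper does not specify its distance normalization:

```python
import numpy as np

def match_hrtf(listener_params, database_params):
    """Index of the database subject whose pinna parameters best match.

    listener_params: 4 pinna-related measurements for the listener;
    database_params: (subjects, 4) matrix of the same measurements for
    every subject in the pre-acquired HRTF database.
    """
    db = np.asarray(database_params, dtype=float)
    mu, sigma = db.mean(axis=0), db.std(axis=0)
    sigma[sigma == 0] = 1.0                      # guard constant columns
    z_db = (db - mu) / sigma                      # z-score the database
    z_q = (np.asarray(listener_params, dtype=float) - mu) / sigma
    return int(np.argmin(np.linalg.norm(z_db - z_q, axis=1)))
```

The returned index picks which subject's measured HRTF set is served to the listener — no per-listener acoustic measurement is needed.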
Robust sparsity-promoting acoustic multi-channel equalization for speech dereverberation
I. Kodrasi, Ante Jukic, S. Doclo
ICASSP 2016 | Pub Date: 2016-03-20 | DOI: 10.1109/ICASSP.2016.7471658
This paper presents a novel signal-dependent method to increase the robustness of acoustic multi-channel equalization techniques against room impulse response (RIR) estimation errors. Aiming at an output signal that better resembles clean speech, we propose to extend the acoustic multi-channel equalization cost function with a penalty function that promotes sparsity of the output signal in the short-time Fourier transform domain. Two conventionally used sparsity-promoting penalty functions are investigated, the l0-norm and the l1-norm, and the sparsity-promoting filters are computed iteratively using the alternating direction method of multipliers. Simulation results for several RIR estimation errors show that incorporating a sparsity-promoting penalty function significantly increases robustness, with the l1-norm penalty outperforming the l0-norm penalty.
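Of the two penalties, the l1-norm has a simple closed-form proximal operator — complex soft-thresholding — which is the sparsity-enforcing update inside each ADMM iteration. A sketch of just that operator (the ADMM outer loop and the equalization filter update are omitted):

```python
import numpy as np

def soft_threshold(S, tau):
    """Proximal operator of tau * ||.||_1 for complex STFT coefficients.

    Shrinks each coefficient's magnitude by tau (to zero if its
    magnitude is below tau) while preserving its phase.
    """
    mag = np.abs(S)
    scale = np.maximum(mag - tau, 0.0) / np.maximum(mag, 1e-12)
    return S * scale
```

Applied to the equalized output's STFT, this drives low-energy time-frequency bins — typically reverberant tails — toward zero, which is exactly the "output resembles clean speech" prior the cost function encodes.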
A generalized LDPC framework for robust and sublinear compressive sensing
Xu Chen, Dongning Guo
ICASSP 2016 | Pub Date: 2016-03-20 | DOI: 10.1109/ICASSP.2016.7472553
Compressive sensing aims to recover a high-dimensional sparse signal from a relatively small number of measurements. In this paper, a novel design of the measurement matrix is proposed, inspired by the construction of generalized low-density parity-check (LDPC) codes, where capacity-achieving point-to-point codes serve as subcodes to robustly estimate the signal support. In the case that each entry of the n-dimensional k-sparse signal lies in a known discrete alphabet, the proposed scheme requires only O(k log n) measurements and arithmetic operations. In the case of an arbitrary, possibly continuous alphabet, an error propagation graph is proposed to characterize the residual estimation error. With O(k log^2 n) measurements and computational complexity, the reconstruction error can be made arbitrarily small with high probability.
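The sparse-graph flavor of such constructions can be illustrated with a toy peeling decoder: hash signal indices into buckets, take two linear measurements per bucket (the sum and the index-weighted sum), then repeatedly resolve singleton buckets with a ratio test and subtract them out. This is a generic illustration of the idea, not the authors' scheme with capacity-achieving subcodes:

```python
import numpy as np

def measure(x, B, hashes):
    """Two linear measurements per bucket and hash row: the sum of the
    bucketed entries and their index-weighted sum."""
    d = len(hashes)
    s0, s1 = np.zeros((d, B)), np.zeros((d, B))
    for i, v in enumerate(x):
        for m, h in enumerate(hashes):
            s0[m, h(i)] += v
            s1[m, h(i)] += i * v
    return s0, s1

def peel_decode(s0, s1, n, hashes):
    """Resolve singleton buckets, subtract them everywhere, repeat."""
    s0, s1 = s0.copy(), s1.copy()
    x_hat = np.zeros(n)
    progress = True
    while progress:
        progress = False
        for m in range(s0.shape[0]):
            for j in range(s0.shape[1]):
                v = s0[m, j]
                if abs(v) < 1e-9:
                    continue
                r = s1[m, j] / v
                i = int(round(r))
                # clean singleton: integer index that hashes to this bucket
                if abs(r - i) > 1e-6 or not 0 <= i < n or hashes[m](i) != j:
                    continue
                x_hat[i] += v
                for h in hashes:               # peel it out of every row
                    s0[hashes.index(h), h(i)] -= v
                    s1[hashes.index(h), h(i)] -= i * v
                progress = True
    return x_hat
```

With the bucket count scaling as O(k) and the number of hash rows as O(log n), the total measurement count matches the O(k log n) regime the abstract describes for discrete alphabets.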
Fast intra mode decision and block matching for HEVC screen content compression
Hao Zhang, Qiao-Yan Zhou, Ningning Shi, Feng Yang, Xin Feng, Zhan Ma
ICASSP 2016 | Pub Date: 2016-03-20 | DOI: 10.1109/ICASSP.2016.7471902
Screen content coding (SCC) is the latest extension of High-Efficiency Video Coding (HEVC), aiming to improve the compression efficiency of screen content video. With newly developed tools such as intra block copy (IntraBC) and palette (PLT) mode, SCC compresses desktop screens more efficiently, but at a significant increase in complexity. In this paper, we improve intra prediction in two respects. First, by leveraging the temporal correlation among coding units (CUs), we develop a fast CU depth prediction scheme. Second, an adaptive search step is employed to further speed up the time-consuming block matching in IntraBC. Overall encoding time is reduced by about 39% and 35% for the All Intra (AI) lossy and lossless encoding scenarios, respectively, with negligible quality loss under the SCC common test conditions.
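Temporal CU-depth reuse can be sketched directly: restrict the current CU's quadtree depth search to a neighborhood of the co-located depths from the previous frame instead of the full range. The ±1 window below is an assumed policy for illustration, not the paper's exact rule:

```python
def candidate_depths(prev_depths, max_depth=3):
    """CU quadtree depths worth trying for the current CU.

    prev_depths: depths of the co-located CUs in the previous frame.
    Instead of exhaustively evaluating 0..max_depth, search only
    [min - 1, max + 1] around the temporal prediction.
    """
    lo = max(0, min(prev_depths) - 1)
    hi = min(max_depth, max(prev_depths) + 1)
    return list(range(lo, hi + 1))
```

On static desktop content co-located depths rarely change, so this typically prunes most of the rate-distortion evaluations that make SCC encoding slow.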
A hierarchical framework for language identification
S. Irtza, V. Sethu, Haris Bavattichalil, E. Ambikairajah, Haizhou Li
ICASSP 2016 | Pub Date: 2016-03-20 | DOI: 10.1109/ICASSP.2016.7472793
Most current language recognition systems model different levels of information, such as acoustic, prosodic, and phonotactic, independently and combine the model likelihoods to make a decision. However, these are single-level systems that treat all languages identically and hence cannot exploit any similarities that may exist within groups of languages. In this paper, a hierarchical language identification (HLID) framework is proposed that involves a series of classification decisions at multiple levels, over language clusters of decreasing size, with individual languages identified only at the final level. The proposed hierarchical framework is compared with a state-of-the-art LID system on the NIST 2007 database, and the results indicate that it outperforms state-of-the-art systems.
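The decision structure is a walk down a tree of clusters: a classifier at each node picks a child cluster until only individual languages remain. A minimal sketch where `classify` stands in for whatever scorer is trained at each node — the example tree and scores below are made up for illustration, not the paper's clustering:

```python
def hierarchical_identify(utt, node, classify):
    """Descend a cluster tree to a language decision.

    node: a dict mapping cluster names to subtrees, or a list of
    language names at a leaf. classify(utt, candidates) returns the
    winning candidate at each level.
    """
    while isinstance(node, dict):
        node = node[classify(utt, list(node))]   # pick a cluster
    return classify(utt, node)                    # pick the language

# made-up two-level tree and a stand-in scorer: here the "utterance"
# is simply a dict of precomputed per-candidate scores
tree = {
    "east_asian": ["mandarin", "korean", "japanese"],
    "romance": ["spanish", "french", "italian"],
}
scores = {"east_asian": 0.9, "romance": 0.2,
          "mandarin": 0.1, "korean": 0.7, "japanese": 0.3}
pick = lambda utt, cands: max(cands, key=lambda c: utt.get(c, 0.0))
lang = hierarchical_identify(scores, tree, pick)
```

Each node only has to discriminate among a few candidates, which is what lets the hierarchy exploit within-group similarities that a flat classifier ignores.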