Face recognition in real-world images
Pub Date: 2017-04-01 | DOI: 10.1109/ICASSP.2017.7952403
Xavier Fontaine, R. Achanta, S. Süsstrunk
Face recognition systems are designed to handle well-aligned images captured under controlled conditions. Real-world images, however, present varying orientations, expressions, and illumination conditions, and traditional face recognition algorithms perform poorly on them. In this paper, we present a face recognition method adapted to real-world conditions that can be trained with very few training examples and is computationally efficient. Our method consists of a novel alignment process followed by classification using sparse representation techniques. We report recognition rates on a difficult dataset of real-world faces, where we significantly outperform state-of-the-art methods.
Patch-based multiple view image denoising with occlusion handling
Pub Date: 2017-03-23 | DOI: 10.1109/ICASSP.2017.7952463
Shiwei Zhou, Y. Hu, Hongrui Jiang
A novel patch-based multi-view image denoising algorithm is proposed. The method leverages the structure of 3D focus image stacks to exploit the self-similarity and redundancy inherent in multiple-view images. A depth-guided adaptive window and a dynamic view selection criterion are then developed to select the most consistent patches for multi-view denoising. Extensive experiments comparing the results against state-of-the-art image denoising algorithms show that the proposed algorithm offers a significant performance advantage.
Affect recognition from lip articulations
Pub Date: 2017-03-23 | DOI: 10.1109/ICASSP.2017.7952593
Rizwan Sadiq, E. Erzin
Lips deliver visually active clues for speech articulation. Affective states define how humans articulate speech; hence, they also change the articulation of lip motion. In this paper, we investigate the effect of phonetic classes on affect recognition from lip articulations. The affect recognition problem is formalized over discrete activation, valence, and dominance attributes. We use the symmetric Kullback-Leibler divergence (KLD) to rate phonetic classes by their discrimination across different affective states. We perform experimental evaluations on the IEMOCAP database. Our results demonstrate that lip articulations over a set of discriminative phonetic classes improve affect recognition performance, attaining 3-class recognition rates for the activation, valence, and dominance (AVD) attributes of 72.16%, 46.44%, and 64.92%, respectively.
Particle PHD filter based multi-target tracking using discriminative group-structured dictionary learning
Pub Date: 2017-03-22 | DOI: 10.1109/ICASSP.2017.7952983
Zeyu Fu, P. Feng, S. M. Naqvi, J. Chambers
Structured sparse representation has recently been found to improve efficiency and robustness when exploiting the target appearance model in tracking systems with both holistic and local information. Therefore, to better discriminate multiple targets from their background simultaneously, we propose a novel video-based multi-target tracking system that combines the particle probability hypothesis density (PHD) filter with discriminative group-structured dictionary learning. The discriminative dictionary with group structure, learned by a hierarchical K-means clustering algorithm, implicitly associates the dictionary atoms with group labels, enforcing target candidates from the same group (class) to share the same structured sparsity pattern. Furthermore, we propose a new joint likelihood calculation that relates the discriminative sparse codes to a maximum-voting technique to enhance the particle PHD update step. Experimental results on two publicly available benchmark video sequences confirm the improved performance of our proposed method over other state-of-the-art techniques in video-based multi-target tracking.
Respiratory airflow estimation from lung sounds based on regression
Pub Date: 2017-03-20 | DOI: 10.1109/ICASSP.2017.7952331
Elmar Messner, Martin Hagmüller, P. Swatek, F. Smolle-Jüttner, F. Pernkopf
The aim of this work is the estimation of respiratory flow from lung sound recordings, i.e. acoustic airflow estimation. With a 16-channel lung sound recording device, we simultaneously record the respiratory flow and the lung sounds on the posterior chest of six lung-healthy subjects in supine position. For recordings from four selected sensor positions, we extract linear frequency cepstral coefficient (LFCC) features and map them onto the airflow signal using multivariate polynomial regression. In contrast to most previous approaches, the proposed method uses lung sounds instead of tracheal sounds. Furthermore, our method estimates the airflow without prior knowledge of the respiratory phase, i.e. no additional algorithm for phase detection is required, and it avoids time-consuming calibration. In experiments, we evaluate the proposed method for various selections of sensor positions in terms of the mean squared error (MSE) between estimated and actual airflow. Moreover, we show the accuracy of the method for frame-based breathing-phase detection.
A PLLR and multi-stage Staircase Regression framework for speech-based emotion prediction
Pub Date: 2017-03-17 | DOI: 10.1109/ICASSP.2017.7953137
Zhaocheng Huang, J. Epps
Continuous prediction of dimensional emotions (e.g. arousal and valence) has recently attracted increasing research interest. When processing emotional speech signals, phonetic features have rarely been used, under the assumption that phonetic variability is a confounding factor that degrades emotion recognition/prediction performance. In this paper, instead of eliminating phonetic variability, we investigate whether Phone Log-Likelihood Ratio (PLLR) features can be used to index arousal and valence in a pairwise low/high framework. A multi-stage staircase regression (SR) framework that enables fusion at three different stages is also investigated. Results on the RECOLA database show that PLLR outperforms eGeMAPS features for arousal and valence. Interestingly, long-term averaged PLLR proved to be more robust and emotionally informative than local frame-level PLLR, which contains more phoneme-specific information. Within the multi-stage SR framework, PLLR yielded 8.2% and 11.6% relative improvements in CCC for arousal and valence, respectively, showing great promise for including phonetic features in emotion prediction systems.
Biologically inspired speech emotion recognition
Pub Date: 2017-03-16 | DOI: 10.1109/ICASSP.2017.7953135
Reza Lotfidereshgi, P. Gournay
Conventional feature-based classification methods do not apply well to automatic recognition of speech emotions, mostly because the precise set of spectral and prosodic features required to identify the emotional state of a speaker has not yet been determined. This paper presents a method that operates directly on the speech signal, thus avoiding the problematic step of feature extraction. Furthermore, this method combines the strengths of the classical source-filter model of human speech production with those of the recently introduced liquid state machine (LSM), a biologically inspired spiking neural network (SNN). The source and vocal tract components of the speech signal are first separated and converted into perceptually relevant spectral representations. These representations are then processed separately by two reservoirs of neurons. The output of each reservoir is reduced in dimensionality and fed to a final classifier. This method is shown to provide very good classification performance on the Berlin Database of Emotional Speech (Emo-DB) and appears to be a promising framework for efficiently solving many other problems in speech processing.
Automatic speech emotion recognition using recurrent neural networks with local attention
Pub Date: 2017-03-15 | DOI: 10.1109/ICASSP.2017.7952552
Seyedmahdad Mirsamadi, Emad Barsoum, Cha Zhang
Automatic emotion recognition from speech is a challenging task which relies heavily on the effectiveness of the speech features used for classification. In this work, we study the use of deep learning to automatically discover emotionally relevant features from speech. It is shown that using a deep recurrent neural network, we can learn both the short-time frame-level acoustic features that are emotionally relevant, as well as an appropriate temporal aggregation of those features into a compact utterance-level representation. Moreover, we propose a novel strategy for feature pooling over time which uses local attention in order to focus on specific regions of a speech signal that are more emotionally salient. The proposed solution is evaluated on the IEMOCAP corpus, and is shown to provide more accurate predictions compared to existing emotion recognition algorithms.
Infrasonic scene fingerprinting for authenticating speaker location
Pub Date: 2017-03-14 | DOI: 10.1109/ICASSP.2017.7952178
K. Aono, S. Chakrabartty, T. Yamasaki
Ambient infrasound, with frequencies well below 20 Hz, is known to carry robust navigation cues that can be exploited to authenticate the location of a speaker. Unfortunately, many mobile devices such as smartphones have been optimized to work in the human auditory range, thereby suppressing information in the infrasonic region. In this paper, we show that these ultra-low-frequency cues can still be extracted from a standard smartphone recording by using acceleration-based cepstral features. To validate our claim, we collected smartphone recordings from more than 30 different scenes and used the cues for scene fingerprinting. We report scene recognition rates in excess of 90%, and a feature set analysis reveals the importance of the infrasonic signatures for achieving state-of-the-art recognition performance.
Surrounding adaptive tone mapping in displayed images under ambient light
Pub Date: 2017-03-14 | DOI: 10.1109/ICASSP.2017.7952505
Lu Wang, Cheolkon Jung
In this paper, we propose surrounding adaptive tone mapping for displayed images under ambient light. Under strong ambient light, images displayed on a screen are perceived as dark by human eyes, especially in dark regions. We address this ambient light problem on mobile devices through brightness enhancement and adaptive tone mapping. First, we perform brightness compensation in dark regions using the Bartleson-Breneman equation, which describes the effect of surround illumination on perceived lightness. Then, we perform adaptive tone mapping to reproduce the whole image under various ambient light conditions; it combines human visual characteristics with a tone mapping operation that accounts for the influence of ambient light. Experimental results demonstrate that the proposed method significantly enhances the readability of displayed images under different surrounding light conditions.