Pub Date : 2014-12-01 DOI: 10.1109/APSIPA.2014.7041724
C. Matcha, S. G. Srinivasa
The partial response maximum likelihood (PRML) scheme is a well-known technique for equalizing the data read from 1D magnetic recording channels. The PRML scheme uses a linear equalizer followed by a maximum likelihood (ML) detector. This paper addresses two aspects: (a) we propose two different methods to design separable and non-separable 2D PR targets that help in signal detection, and (b) we propose an extension of the 1D Viterbi detector for signal detection in 2D ISI channels. We use the detector to study the efficacy of PR targets designed for a particular choice of 2D ISI channel.
{"title":"Target design and low complexity signal detection for two-dimensional magnetic recording","authors":"C. Matcha, S. G. Srinivasa","doi":"10.1109/APSIPA.2014.7041724","DOIUrl":"https://doi.org/10.1109/APSIPA.2014.7041724","journal":{"name":"Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific"},"publicationDate":"2014-12-01","platform":"Semanticscholar","paperid":"129211073"}
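For readers unfamiliar with the 1D detector that the abstract above extends, the following is a minimal sketch of Viterbi detection over a 1D partial-response target. It is not the authors' 2D detector. The bipolar symbol alphabet {-1, +1}, the all -1 initial channel state, and the function name `viterbi_pr` are assumptions made for illustration.

```python
import itertools


def viterbi_pr(received, target):
    """Least-squares sequence detection for y[k] = sum_i target[i] * x[k-i] + noise.

    Assumptions (not from the paper): symbols are in {-1, +1} and the channel
    memory is initialised to all -1 before the first symbol.
    """
    memory = len(target) - 1
    # A state is the tuple of the most recent symbols, newest first.
    states = list(itertools.product((-1, 1), repeat=memory))
    start = (-1,) * memory
    cost = {s: (0.0 if s == start else float("inf")) for s in states}
    paths = {s: [] for s in states}

    for y in received:
        new_cost = {s: float("inf") for s in states}
        new_paths = {s: [] for s in states}
        for s in states:
            if cost[s] == float("inf"):
                continue  # state not reachable yet
            for x in (-1, 1):
                # Noiseless output the target would produce for this branch.
                ideal = target[0] * x + sum(t * p for t, p in zip(target[1:], s))
                metric = cost[s] + (y - ideal) ** 2
                nxt = ((x,) + s)[:memory]
                if metric < new_cost[nxt]:  # keep the survivor path
                    new_cost[nxt] = metric
                    new_paths[nxt] = paths[s] + [x]
        cost, paths = new_cost, new_paths

    best = min(states, key=lambda s: cost[s])
    return paths[best]


# Example with the target [1, 1] and a lightly perturbed channel output.
detected = viterbi_pr([0.2, -0.1, -1.7, 0.3, 1.8, -0.2], [1, 1])
```

The number of states grows as 2 to the power of the target memory, which is why a short, well-chosen PR target keeps the detector complexity low.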
Pub Date : 2014-12-01 DOI: 10.1109/APSIPA.2014.7041534
Rong Zhu, C. Bao, Mao-shen Jia, Bing Bu, Ling-song Zhou
Ambisonic decoders for irregular speaker arrays can be derived by optimization techniques. For higher-order reproduction systems, however, the optimization is hard to guide because of the large number of decoder coefficients. This paper describes a new method for designing higher-order decoders based on the optimal symmetrical virtual microphone response (OSVMR). In the proposed method, the number of decoder coefficients is reduced and the optimal symmetrical polar pattern of the speaker feeds is obtained. A binaural evaluation demonstrates that the proposed method is significantly better than the reference methods in terms of interaural time difference (ITD) and interaural level difference (ILD).
{"title":"The design of HOA irregular decoders based on the optimal symmetrical virtual microphone response","authors":"Rong Zhu, C. Bao, Mao-shen Jia, Bing Bu, Ling-song Zhou","doi":"10.1109/APSIPA.2014.7041534","DOIUrl":"https://doi.org/10.1109/APSIPA.2014.7041534","journal":{"name":"Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific"},"publicationDate":"2014-12-01","platform":"Semanticscholar","paperid":"128646892"}
Pub Date : 2014-12-01 DOI: 10.1109/APSIPA.2014.7041636
Zhizheng Wu, Sheng Gao, Eng Siong Chng, Haizhou Li
Replay, the playback of a pre-recorded speech sample, presents a genuine risk to automatic speaker verification technology. In this study, we evaluate the vulnerability of text-dependent speaker verification systems to replay attacks using a standard benchmarking database, and we propose an anti-spoofing technique to safeguard such systems. The key idea of the spoofing detection technique is to decide, based on a similarity score, whether the presented sample matches any previously stored speech sample. The experiments conducted on the RSR2015 database showed that, as a result of the replay attack, the equal error rate (EER) and false acceptance rate (FAR) both increased from 2.92 % to 25.56 % and 78.36 %, respectively. This confirmed the vulnerability of speaker verification to replay attacks. On the other hand, our proposed spoofing countermeasure was able to reduce the FARs from 78.36 % and 73.14 % to 0.06 % and 0.0 % for the male and female systems, respectively, in the face of replay spoofing. The experiments confirmed the effectiveness of the proposed anti-spoofing technique.
{"title":"A study on replay attack and anti-spoofing for text-dependent speaker verification","authors":"Zhizheng Wu, Sheng Gao, Eng Siong Chng, Haizhou Li","doi":"10.1109/APSIPA.2014.7041636","DOIUrl":"https://doi.org/10.1109/APSIPA.2014.7041636","journal":{"name":"Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific"},"publicationDate":"2014-12-01","platform":"Semanticscholar","paperid":"129278846"}
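The decision rule described in the abstract above (flag a sample if it matches any previously stored sample according to a similarity score) can be sketched as follows. The abstract does not state which features or similarity measure the authors use, so the cosine similarity over fixed-length feature vectors, the threshold of 0.99, and the names `cosine_similarity` and `is_replay` are all assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_replay(presented, stored_samples, threshold=0.99):
    """Flag the presented sample as a replay if it is nearly identical
    to any stored sample. The threshold is a hypothetical value."""
    best = max(
        (cosine_similarity(presented, s) for s in stored_samples),
        default=0.0,
    )
    return best >= threshold


# Hypothetical stored feature vectors from earlier access attempts.
stored = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
replayed = is_replay([1.0, 2.0, 3.0], stored)   # identical to a stored sample
genuine = is_replay([3.0, -1.0, 0.5], stored)   # unlike any stored sample
```

The intuition is that genuine repetitions of a passphrase vary naturally, whereas a replayed recording is a near-exact copy of something the system has already heard.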
Pub Date : 2014-12-01 DOI: 10.1109/APSIPA.2014.7041521
K. Raja, Ramachandra Raghavendra, C. Busch
Smartphones are increasingly used as biometric sensors for many authentication applications because of their computational ability and high-resolution cameras, which can be used to capture biometric information. The objective of this paper is to assess the performance of iris versus periocular recognition for smartphone-based verification in the visible spectrum under non-ideal, real-life conditions (change of illumination, highly pigmented iris, shadows on the iris pattern). We introduce various protocols for real-life verification scenarios using smartphones for iris and periocular recognition. Further, we study the verification performance when enrollment and probe data originate from different smartphones. From the extensive set of experiments conducted on a publicly available smartphone database, it can be observed that information from the periocular region provides substantially better recognition accuracy in cross-sensor and varying-illumination scenarios than the iris under the same conditions.
{"title":"Empirical evaluation of visible spectrum iris versus periocular recognition in unconstrained scenario on smartphones","authors":"K. Raja, Ramachandra Raghavendra, C. Busch","doi":"10.1109/APSIPA.2014.7041521","DOIUrl":"https://doi.org/10.1109/APSIPA.2014.7041521","journal":{"name":"Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific"},"publicationDate":"2014-12-01","platform":"Semanticscholar","paperid":"115396810"}
Pub Date : 2014-12-01 DOI: 10.1109/APSIPA.2014.7041538
Liang-Chih Yu, K. R. Lai
Dimensional emotion representation, such as the valence-arousal (VA) space, is an emerging way to represent emotions. In this representation, emotion words can be projected onto the VA space according to their valence and arousal values. Sentence-level and document-level emotions can then be projected based on the emotion words within them. However, emotion expressions in sentences and documents usually contain various modifier structures such as negation (e.g., not happy), degree (e.g., very happy) and emotion compounds. Such modifier structures can provide more precise information for measuring VA values at both the sentence and document levels. In this study, we analyze various types of modifier structures for emotion expressions. In addition, we investigate the effect of different types of modifier structures on measuring VA values for emotion expressions.
{"title":"Analysis of modifier structure for emotion expressions","authors":"Liang-Chih Yu, K. R. Lai","doi":"10.1109/APSIPA.2014.7041538","DOIUrl":"https://doi.org/10.1109/APSIPA.2014.7041538","journal":{"name":"Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific"},"publicationDate":"2014-12-01","platform":"Semanticscholar","paperid":"114283992"}
Pub Date : 2014-12-01 DOI: 10.1109/APSIPA.2014.7041804
JongWon Kim, Taeheum Na, A. C. Risdianto, Byung-Rae Cha, Sun Park
The lifecycle management of service realization is very challenging. With virtualized playgrounds over Future Internet testbeds, lifecycle experiments can be easily exercised so that all tasks and responsibilities are well defined among developers and operators for the entire set of experiment stages. Also, the dynamic provisioning of hyper-convergent compute/networking/storage resources is appropriately streamlined with the experiment lifecycle. In this paper, by considering these issues, we discuss the agile and economic realization of automated media-centric experiments over the OF@TEIN (OpenFlow @ Trans-Eurasian Information Network) SDI (Software-Defined Infrastructure).
{"title":"Agile and economic media-centric service realization over Software-Defined Infrastructure","authors":"JongWon Kim, Taeheum Na, A. C. Risdianto, Byung-Rae Cha, Sun Park","doi":"10.1109/APSIPA.2014.7041804","DOIUrl":"https://doi.org/10.1109/APSIPA.2014.7041804","journal":{"name":"Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific"},"publicationDate":"2014-12-01","platform":"Semanticscholar","paperid":"115282692"}
Pub Date : 2014-12-01 DOI: 10.1109/APSIPA.2014.7041715
Chuang Shi, H. Nomura, T. Kamakura, W. Gan
The parametric loudspeaker is a type of directional loudspeaker that makes use of nonlinear acoustic effects. Past studies on reproducing three-dimensional audio content with a pair of parametric loudspeakers have demonstrated satisfactory performance. In this paper, steerable parametric loudspeakers are proposed to relocate the sweet spot so that it follows the head movement of the listener. Although spatial aliasing effects are observed in the steerable parametric loudspeaker, they can be exploited to generate multiple sound beams simultaneously. A new case of grating lobe elimination, namely over-elimination, is studied to extend the controllable level difference between the two sound beams. Simulation results comparing equal and Chebyshev weights are also presented.
{"title":"Development of a steerable stereophonic parametric loudspeaker","authors":"Chuang Shi, H. Nomura, T. Kamakura, W. Gan","doi":"10.1109/APSIPA.2014.7041715","DOIUrl":"https://doi.org/10.1109/APSIPA.2014.7041715","journal":{"name":"Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific"},"publicationDate":"2014-12-01","platform":"Semanticscholar","paperid":"114799596"}
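The spatial aliasing mentioned in the abstract above produces grating lobes whenever the emitter spacing is large relative to the wavelength. As a rough illustration, the sketch below computes grating-lobe directions for a uniform linear array using the standard narrowband relation sin(theta_g) = sin(theta_0) + m * wavelength / spacing. Treating the loudspeaker as such an array, the 40 kHz carrier, the 10 mm spacing, and the function name `grating_lobe_angles` are assumptions, not details from the paper.

```python
import math


def grating_lobe_angles(spacing, wavelength, steer_deg):
    """Return grating-lobe directions (degrees) of a uniform linear array.

    spacing and wavelength share the same unit; steer_deg is the main-lobe
    steering angle measured from broadside. The main lobe (m = 0) is excluded.
    """
    s0 = math.sin(math.radians(steer_deg))
    ratio = wavelength / spacing
    m_max = int(2.0 / ratio) + 1  # |m * ratio| beyond 2 can never be visible
    angles = []
    for m in range(-m_max, m_max + 1):
        if m == 0:
            continue
        s = s0 + m * ratio
        if -1.0 <= s <= 1.0:  # only real angles correspond to radiated lobes
            angles.append(math.degrees(math.asin(s)))
    return sorted(angles)


# Hypothetical example: 40 kHz carrier in air (c = 343 m/s), 10 mm spacing.
wavelength = 343.0 / 40000.0
lobes = grating_lobe_angles(0.010, wavelength, 0.0)
```

With a spacing below half a wavelength the same call returns an empty list for broadside steering, which is the usual condition for avoiding spatial aliasing.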
Pub Date : 2014-12-01 DOI: 10.1109/APSIPA.2014.7041579
Misaki Nagae, T. Irino, R. Nisimura, Hideki Kawahara, R. Patterson
This paper describes a simulator for presenting normal hearing (NH) listeners with the experience of a hearing impaired (HI) listener. The simulator is based on the compressive gammachirp (cGC) filter, which is used to derive level-dependent filter shapes and the cochlear compression function from notched-noise masking data. The level dependence of the cGC is reversed to produce inverse compression, which is used to resynthesize sounds that cancel the compression applied by the auditory system of the NH listener. A frame-based analysis/synthesis procedure is newly introduced to improve processing speed for a graphical user interface (GUI) that allows users to control the degree of compression within the range of the audiogram of the HI person. The simulator is intended for speech-language-hearing therapists (ST) and patients' families.
{"title":"Hearing impairment simulator based on compressive gammachirp filter","authors":"Misaki Nagae, T. Irino, R. Nisimura, Hideki Kawahara, R. Patterson","doi":"10.1109/APSIPA.2014.7041579","DOIUrl":"https://doi.org/10.1109/APSIPA.2014.7041579","journal":{"name":"Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific"},"publicationDate":"2014-12-01","platform":"Semanticscholar","paperid":"126151718"}
Pub Date : 2014-12-01 DOI: 10.1109/APSIPA.2014.7041661
Jing-Ming Guo, Jia-Yu Chang, Yun-Fu Liu
Error diffusion is an efficient halftoning method that is mainly applied in printers. Its high image quality and processing efficiency make it a popular and competitive candidate in halftoning and multitoning applications. Multitoning is an extension of halftoning that adopts more than three tone levels to improve the similarity between an original image and the converted image. Yet the banding effect, which refers to areas rendered with only one tone level, disturbs visual perception and thus seriously degrades image quality. To cope with the banding effect, a tone replacement strategy is proposed in this study. As documented in the experimental results, close tone similarity to the original image and a good reconstructed dot distribution can be provided simultaneously. Compared with former banding-free methods, the apparent improvements suggest that the proposed method can be a very competitive candidate for multitoning applications.
{"title":"Banding effect removal for digital multitoning","authors":"Jing-Ming Guo, Jia-Yu Chang, Yun-Fu Liu","doi":"10.1109/APSIPA.2014.7041661","DOIUrl":"https://doi.org/10.1109/APSIPA.2014.7041661","journal":{"name":"Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific"},"publicationDate":"2014-12-01","platform":"Semanticscholar","paperid":"128106379"}
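To make the banding effect described in the abstract above concrete, here is a minimal sketch of plain multitone error diffusion. It uses the well-known Floyd-Steinberg weights (7/16, 3/16, 5/16, 1/16) and does not implement the paper's tone replacement strategy. The four tone levels and the function name `multitone_error_diffusion` are assumptions chosen for illustration.

```python
def multitone_error_diffusion(image, levels):
    """Quantise a grayscale image (list of rows) to the given tone levels,
    diffusing each pixel's quantisation error with Floyd-Steinberg weights."""
    height, width = len(image), len(image[0])
    work = [[float(v) for v in row] for row in image]
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = min(levels, key=lambda t: abs(t - old))  # nearest tone level
            out[y][x] = new
            err = old - new
            # Push the error onto unprocessed neighbours.
            if x + 1 < width:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += err * 1 / 16
    return out


LEVELS = (0, 85, 170, 255)  # hypothetical four-level output device

# A flat patch whose gray value coincides with a tone level: no error is
# ever produced, so the whole patch is rendered with a single tone (banding).
banded = multitone_error_diffusion([[85] * 4 for _ in range(4)], LEVELS)

# A flat patch between two tone levels is rendered as a mixture of tones.
mixed = multitone_error_diffusion([[120] * 4 for _ in range(4)], LEVELS)
```

In the first example every output pixel equals 85, which is exactly the single-tone region that the abstract identifies as the source of visible banding.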
Pub Date : 2014-12-01 DOI: 10.1109/APSIPA.2014.7041648
Mingliang Zhou, Hai-Miao Hu, Yongfei Zhang
In High Efficiency Video Coding (HEVC), the coding efficiency of intra-frames is lower than that of inter-frames, which causes flicker artifacts and perceptual fluctuation among CTUs in low-bitrate applications. Therefore, this paper proposes a region-based intra-frame rate-control scheme to improve the objective quality and to reduce PSNR fluctuation among CTUs. First, the CTUs in an intra-frame are classified into three regions according to their characteristics and complexity, and a region-based bit allocation is proposed to pre-determine the bits assigned to the different regions. Second, a rate-complexity-quality model is proposed for intra-frames to adjust the QPs and achieve smooth perceptual quality. The experimental results demonstrate that the proposed scheme achieves higher coding performance and more consistent visual quality than the scheme adopted by HM12.0.
{"title":"Region-based intra-frame rate-control scheme for High Efficiency Video Coding","authors":"Mingliang Zhou, Hai-Miao Hu, Yongfei Zhang","doi":"10.1109/APSIPA.2014.7041648","DOIUrl":"https://doi.org/10.1109/APSIPA.2014.7041648","journal":{"name":"Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific"},"publicationDate":"2014-12-01","platform":"Semanticscholar","paperid":"126406148"}