Pub Date: 2018-11-01 DOI: 10.23919/APSIPA.2018.8659467
Kazuya Kawai, H. Hontani, Tatsuya Yokota, M. Sakata, Y. Kimura
Positron emission tomography (PET) is an important imaging technique for visualizing a number of functions in the brain or human body. To reconstruct PET images from sinogram data, an inverse problem has to be solved by numerical optimization, for example with expectation-maximization (EM)-based methods. However, the standard EM method is sensitive to measurement noise in the sinogram data. In this paper, we propose a new method for simultaneous PET image reconstruction and parts extraction using constrained non-negative matrix factorization. Whereas many existing methods reconstruct each PET image independently, we reconstruct the time series of PET images simultaneously from the time series of sinograms using non-negative matrix factorization. Furthermore, we impose a smoothness constraint on the temporal features and an exclusive LASSO-based sparseness constraint on the spatial features for robust image reconstruction and physically meaningful feature extraction.
{"title":"Simultaneous PET Image Reconstruction and Feature Extraction Method using Non-negative, Smooth, and Sparse Matrix Factorization","authors":"Kazuya Kawai, H. Hontani, Tatsuya Yokota, M. Sakata, Y. Kimura","doi":"10.23919/APSIPA.2018.8659467","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659467","journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)"},"publicationDate":"2018-11-01"}
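As a rough illustration of the factorization idea in the abstract above, the following sketch factors a small non-negative matrix V (rows as pixels, columns as time frames) into spatial features W and temporal features H using the classic multiplicative updates for the squared error. This is plain, unconstrained NMF, not the authors' method: it omits the sinogram projection model, the temporal smoothness constraint, and the exclusive LASSO penalty. The function names, the toy data, and the rank are all assumptions made for the example.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frob_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def nmf(V, rank, iters=200, eps=1e-9, seed=0):
    """Unconstrained NMF via multiplicative updates (illustrative only)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive initialization keeps every factor entry non-negative.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frob_error(V, W, H)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
        errors.append(frob_error(V, W, H))
    return W, H, errors


# Toy data: 6 "pixels" x 5 "time frames" built from two non-negative parts.
spatial = [[1.0, 0.0], [0.8, 0.0], [0.5, 0.2], [0.0, 1.0], [0.0, 0.7], [0.1, 0.4]]
temporal = [[0.2, 0.6, 1.0, 0.7, 0.3], [1.0, 0.8, 0.5, 0.3, 0.2]]
V = matmul(spatial, temporal)

W, H, errors = nmf(V, rank=2)
print("initial error:", errors[0], "final error:", errors[-1])
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative throughout, which is what makes the extracted parts interpretable as additive components.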
Pub Date: 2018-11-01 DOI: 10.23919/APSIPA.2018.8659468
Michael Hentschel, Marc Delcroix, A. Ogawa, T. Nakatani
In recent years, many approaches have been proposed for domain adaptation of neural network language models. These methods fall into two categories. The first is model-based adaptation, which creates a domain-specific language model by re-training the weights of the network on in-domain data. This requires domain annotation in the training and test data. The second is feature-based adaptation, which uses topic features to perform mainly bias adaptation of the network input or output layers in an unsupervised manner. Recently, a scheme called learning hidden unit contributions was proposed for acoustic model adaptation. We propose applying this scheme to feature-based domain adaptation of recurrent neural network language models. We also investigate the combination of this approach with bias-based domain adaptation. For the experiments, we use a corpus based on TED talks and the CSJ lecture corpus to report perplexity and speech recognition results. Our proposed method consistently outperforms a non-adapted baseline, and the combined approach can improve on pure bias adaptation.
{"title":"Feature-Based Learning Hidden Unit Contributions for Domain Adaptation of RNN-LMs","authors":"Michael Hentschel, Marc Delcroix, A. Ogawa, T. Nakatani","doi":"10.23919/APSIPA.2018.8659468","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659468","journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)"},"publicationDate":"2018-11-01"}
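The scaling idea behind learning hidden unit contributions (LHUC) can be sketched in a few lines: each hidden unit's output is multiplied by a factor 2·sigmoid(a), which lies between 0 and 2 and equals 1 when a = 0. In the sketch below the per-unit value a is computed from a topic feature vector by a linear projection; that projection, the names `adapt_hidden` and `proj`, and the toy numbers are assumptions for illustration and are not taken from the paper.

```python
import math


def lhuc_scale(a):
    """LHUC amplitude 2 * sigmoid(a): bounded in (0, 2), equal to 1 at a = 0."""
    return 2.0 / (1.0 + math.exp(-a))


def adapt_hidden(hidden, topic_features, proj):
    """Rescale each hidden unit by a feature-dependent LHUC factor.

    `proj` holds one weight row per hidden unit (a hypothetical linear map
    from topic features to the per-unit adaptation parameter).
    """
    adapted = []
    for h, row in zip(hidden, proj):
        a = sum(w * f for w, f in zip(row, topic_features))
        adapted.append(lhuc_scale(a) * h)
    return adapted


hidden = [0.5, -1.0, 2.0]            # toy hidden-layer outputs
topic = [0.7, 0.2, 0.1]              # toy topic posterior for the current domain
proj = [[1.0, 0.0, 0.0],             # unit 0 is amplified for topic 0
        [0.0, 0.0, 0.0],             # unit 1 is left unchanged (scale 1)
        [-2.0, 0.0, 0.0]]            # unit 2 is attenuated for topic 0

print(adapt_hidden(hidden, topic, proj))
```

With an all-zero projection every scale is exactly 1, so the adapted network reduces to the unadapted one; this is a convenient starting point when the adaptation parameters are trained.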
Pub Date: 2018-11-01 DOI: 10.23919/APSIPA.2018.8659522
Wen-Ping Lai, Yong-Hsiang Wang, Kuan-Chun Chiu
In recent years, network functions virtualization (NFV) has been widely regarded as the driving force behind innovations in the 5G system, such as slicing precious system resources for differentiated service needs. In this paper, we propose a container-based design of a virtual evolved packet core (vEPC) slice and its light-weight version (LW-vEPC) based on the OpenAirInterface (OAI) software package. We have containerized, and thus virtualized, the EPC component functions into two separate containers: the control-plane (CP) container for the virtual home subscriber server (vHSS) and virtual mobility management entity (vMME), and the data-plane (DP) container for the virtual serving and packet data network gateway (vSPGW). Through a joint configuration design of virtual linking, binding and bridging, including appropriate source and destination network address translation (SNAT and DNAT), both intra-container and inter-container communications have been realized. An OAI-based joint test of the vEPC with a small-cell base station (ENB) has also been demonstrated via a downlink video streaming showcase from the Internet to a cellular phone. The DP container itself can also act as an LW-EPC slice near the mobile edge of the ENB to greatly reduce the latency for time-critical applications. The resource allocation methodology of multiple CPU cores for vEPC and LW-EPC slicing is being developed. This paper proposes a simple but powerful algorithm called specifically assigned cores (SAC) to achieve better utilization of CPU cores. Our preliminary results show that SAC outperforms the default scheme, namely randomly assigned cores (RAC), in terms of lower CPU load and less packet loss. The advantage of SAC over RAC grows with the traffic level.
{"title":"Containerized Design and Realization of Network Functions Virtualization for a Light-Weight Evolved Packet Core Using OpenAirInterface","authors":"Wen-Ping Lai, Yong-Hsiang Wang, Kuan-Chun Chiu","doi":"10.23919/APSIPA.2018.8659522","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659522","journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)"},"publicationDate":"2018-11-01"}
Pub Date: 2018-11-01 DOI: 10.23919/APSIPA.2018.8659748
M. Shiomi, Tsuyoshi Komatsubara, T. Kaczmarek, T. Kanda, H. Ishiguro
Good teachers recognize how each of their students differs from the others and adapt how they support them. We replicate such a capability to understand the individual specificities of children. Our approach observed the social signals of fifth graders based on their daily classroom behavior using a sensor network. We used depth cameras to track their positions and identified them with RGB cameras. We observed 84 children (three classes) and used these results to estimate the children's school-related characteristics: self-efficacy, performance-goal, and exam scores. The estimation yielded 73.0–74.7% accuracy for the target variables.
{"title":"Estimating Children's Characteristics by Observing their Classroom Activities","authors":"M. Shiomi, Tsuyoshi Komatsubara, T. Kaczmarek, T. Kanda, H. Ishiguro","doi":"10.23919/APSIPA.2018.8659748","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659748","journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)"},"publicationDate":"2018-11-01"}
Pub Date: 2018-11-01 DOI: 10.23919/APSIPA.2018.8659646
Le Yu, Guang-zheng Yu, Q. Meng
The speech transmission index (STI) is one of the most common objective speech intelligibility metrics. Its value ranges from 0, indicating unintelligible speech, to 1, indicating excellent speech intelligibility, and can be influenced by the signal-to-noise ratio (SNR), reverberation (or echo), and the driving environment. In addition, a passenger may rotate his or her head toward the driver when talking to the driver, so that speaker directivity also influences the STI. In the current work, we measured several groups of binaural impulse responses on a human subject under different orientations of a human speaker, and then calculated the corresponding STI values. Finally, we analyze the variation of the STI caused by the human speaker's orientation and offer some advice on the measurement and evaluation method for STI in vehicles.
{"title":"Effect of Human Speaker's Head Rotation on Speech Transmission Index in Vehicle Sound Environment","authors":"Le Yu, Guang-zheng Yu, Q. Meng","doi":"10.23919/APSIPA.2018.8659646","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659646","journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)"},"publicationDate":"2018-11-01"}
Pub Date: 2018-11-01 DOI: 10.23919/APSIPA.2018.8659688
Yong-woo Lee, T. Bui, Jitae Shin
Since most pedestrian detection methods focus on color images, detection accuracy is lower when the images are captured at night or in the dark. In this paper, we propose a pedestrian detection method based on a deep fusion network. We use a deconvolutional single shot multi-box detector (DSSD) fused at the halfway stage. We also apply feature correlation to the feature maps of the two image modalities to produce a new feature map. For the experiments, we use the KAIST dataset to train and test the proposed method. The experimental results show that the proposed method achieves a 22.46% lower miss rate than the KAIST pedestrian detection baseline. In addition, the proposed method shows a miss rate at least 4.28% lower than that of the conventional halfway fusion method.
{"title":"Pedestrian Detection based on Deep Fusion Network using Feature Correlation","authors":"Yong-woo Lee, T. Bui, Jitae Shin","doi":"10.23919/APSIPA.2018.8659688","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659688","journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)"},"publicationDate":"2018-11-01"}
Pub Date: 2018-11-01 DOI: 10.23919/APSIPA.2018.8659534
Xinyue Ling, S. Ibi, K. Miyamoto, S. Sampei, J. Terada, A. Otaka
This paper investigates uplink coordinated multipoint (CoMP) reception with quantize-and-forward (QF) relaying. In this relaying system, two relay nodes directly quantize the received symbols to integers based on a modulo-lattice operation and forward these integers in parallel to the destination via optical fiber. To reduce the traffic load on the optical fiber without sacrificing CoMP gain, we propose an optimization strategy, from the viewpoint of mutual information, that adaptively controls the quantization level at the relay nodes. Computer simulations demonstrate that the proposed optimization helps the system achieve high throughput and a low optical fiber traffic load at the same time.
{"title":"Optimization of Quantization Levels for Quantize-and-Forward Relaying with QAM Signaling","authors":"Xinyue Ling, S. Ibi, K. Miyamoto, S. Sampei, J. Terada, A. Otaka","doi":"10.23919/APSIPA.2018.8659534","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659534","journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)"},"publicationDate":"2018-11-01"}
Pub Date: 2018-11-01 DOI: 10.23919/APSIPA.2018.8659673
Satoru Ishibashi, S. Koshita, M. Abe, M. Kawamata
In this paper, we implement adaptive notch filters with constrained poles and zeros (CPZ-ANFs) on a fixed-point DSP. Since CPZ-ANFs are IIR filters with a narrow notch width, a signal can be amplified significantly in their feedback loops. Therefore, the direct-form II structure has a high probability of overflow in its internal state. When an overflow occurs in the internal state of a filter, the inaccurate values caused by the overflow are used repeatedly to calculate the filter output. As a result, the filter does not operate correctly, so such overflow must be prevented. To avoid the overflow, we use the direct-form I structure to implement the CPZ-ANFs. Experimental results show that our method allows the CPZ-ANFs to operate properly on the fixed-point DSP.
{"title":"DSP Implementation of Adaptive Notch Filters With Overflow Avoidance in Fixed-Point Arithmetic","authors":"Satoru Ishibashi, S. Koshita, M. Abe, M. Kawamata","doi":"10.23919/APSIPA.2018.8659673","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659673","journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)"},"publicationDate":"2018-11-01"}
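The overflow argument in the abstract above can be illustrated numerically. The sketch below assumes the commonly used second-order constrained notch form H(z) = (1 + a z^-1 + z^-2) / (1 + r a z^-1 + r^2 z^-2) with a fixed coefficient a = -2 cos(w0); the adaptation algorithm itself is left out, and the values of r, w0, and the input amplitude are arbitrary choices for the demonstration. It runs the same filter in direct-form I and direct-form II using floating point and records the peak of the direct-form II internal state, which a fixed-point format limited to [-1, 1) could not represent.

```python
import math


def notch_df1(x, a, r):
    """Direct-form I: the stored states are past inputs and past outputs."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = xn + a * x1 + x2 - r * a * y1 - r * r * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def notch_df2(x, a, r):
    """Direct-form II: the stored state w is the output of the all-pole part."""
    y, w_hist = [], []
    w1 = w2 = 0.0
    for xn in x:
        wn = xn - r * a * w1 - r * r * w2   # feedback (pole) section first
        yn = wn + a * w1 + w2               # feedforward (zero) section second
        w2, w1 = w1, wn
        y.append(yn)
        w_hist.append(wn)
    return y, w_hist


w0 = math.pi / 4            # notch frequency in radians per sample (assumed)
r = 0.99                    # pole radius; closer to 1 means a narrower notch
a = -2.0 * math.cos(w0)

# Sinusoid at the notch frequency, comfortably inside the range [-1, 1).
x = [0.5 * math.sin(w0 * n) for n in range(2000)]

y1 = notch_df1(x, a, r)
y2, w = notch_df2(x, a, r)

peak_in = max(abs(v) for v in x)
peak_out = max(abs(v) for v in y1)
peak_state = max(abs(v) for v in w)
print("peak input:", peak_in)
print("peak output (DF-I states are inputs and outputs):", peak_out)
print("peak DF-II internal state:", peak_state)
```

Both structures realize the same transfer function, so their outputs agree up to rounding; the difference lies only in which intermediate signal must be stored, and the all-pole state of direct-form II is the one that grows when the input sits near the notch frequency.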
Pub Date: 2018-11-01 DOI: 10.23919/APSIPA.2018.8659453
Yoshiko Kawabata, Toshihiko Matsuka
The present study investigates how mutual beliefs are achieved by examining the relationship between actual behaviors and utterances in task-oriented dialogues. According to a widely accepted model, mutual belief about a task is considered to be achieved when a listener accepts utterances about the task given by another agent and gives some sign of task completion to that agent. However, by analyzing the Japanese Map Task Dialogue Corpus (JMTDC), we found that the vast majority of conversations (94%) did not follow what the model suggests. We categorized those non-standard dialogues into six categories, namely, delayed acceptance, premature sign of completion, execution postponement, silent adjustment, unconfirmed, and indirection. We further analyzed those six categories to see how and when participants were able to achieve mutual belief in the dialogues.
{"title":"How do people construct mutual beliefs in task-oriented dialogues?","authors":"Yoshiko Kawabata, Toshihiko Matsuka","doi":"10.23919/APSIPA.2018.8659453","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659453","journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)"},"publicationDate":"2018-11-01"}
Pub Date: 2018-11-01 DOI: 10.23919/APSIPA.2018.8659729
Henglu Wei, Wei Zhou, Rui Bai, Zhemin Duan
In this paper, visual saliency is used to guide the coding tree unit (CTU) level bit allocation process in high efficiency video coding (HEVC) to improve visual quality. First, a saliency detection algorithm is proposed. With the detected saliency map, the distortion of each CTU is weighted by the corresponding saliency, so that distortion in salient areas counts more heavily. Then, an optimal bit allocation problem constrained by the picture-level target bits and minimum quality fluctuation is formulated. A numerical method is used to solve the bit allocation problem. Experimental results show that the quality gain in salient areas is up to 0.8658 dB and the gain in saliency-weighted PSNR is up to 1.0318 dB.
{"title":"A Rate Control Algorithm for HEVC Considering Visual Saliency","authors":"Henglu Wei, Wei Zhou, Rui Bai, Zhemin Duan","doi":"10.23919/APSIPA.2018.8659729","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659729","journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)"},"publicationDate":"2018-11-01"}