Investigating Co-Prime Microphone Arrays for Speech Direction of Arrival Estimation
Jiahong Zhao, C. Ritz
Pub Date: 2018-11-01. DOI: 10.23919/APSIPA.2018.8659626
Published in: 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
This paper investigates the application of the steered response power phase transform (SRP-PHAT) method to coprime microphone array (CPMA) recordings to estimate the direction of arrival (DOA) of speech sources. While existing CPMA approaches for acoustic applications are limited, especially under reverberant conditions, the proposed algorithm utilises SRP-PHAT to estimate the DOA of speech sources and then employs a histogram-based stochastic algorithm using steered response power (SRP) adjustment and kernel density evaluation (KDE) to improve DOA estimation accuracy. Experiments are conducted for up to three simultaneous speech sources in the far field, considering both anechoic and reverberant scenarios. Results suggest that the proposed approach achieves more accurate DOA estimates than a uniform linear array (ULA) with the same number of microphones under both anechoic and low-reverberation conditions, and that it needs significantly fewer microphones than an equivalent ULA while maintaining similar performance. Moreover, the operating frequency of the microphone array is greatly increased without changing the number of microphones, making it possible to accurately record higher-frequency components of source signals.
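As a rough illustration of the SRP-PHAT front end described above, the following minimal far-field sketch scans a grid of candidate angles for a linear array and returns the one maximizing the phase-transform-weighted steered response power. The array geometry, sampling rate, and 1-degree grid are illustrative assumptions, and the paper's histogram/KDE refinement stage is omitted:

```python
import numpy as np

def srp_phat_doa(signals, mic_pos, fs=16000, c=343.0, n_angles=181):
    """Far-field SRP-PHAT over a 1-degree grid of candidate DOAs.

    signals: (n_mics, n_samples) array; mic_pos: (n_mics,) positions in
    metres along a linear array.  Returns the angle (degrees) with the
    largest steered response power.
    """
    spec = np.fft.rfft(signals, axis=1)
    phat = spec / (np.abs(spec) + 1e-12)          # PHAT: keep phase, drop magnitude
    freqs = np.fft.rfftfreq(signals.shape[1], 1.0 / fs)
    angles = np.linspace(0.0, 180.0, n_angles)
    power = np.empty(n_angles)
    for i, theta in enumerate(np.deg2rad(angles)):
        delays = mic_pos * np.cos(theta) / c      # far-field delay per mic (s)
        steer = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
        power[i] = np.abs((phat * steer).sum(axis=0)).sum()
    return angles[np.argmax(power)]
```

With identical signals on all microphones (zero delay), the peak lands at broadside (90 degrees); delaying the channels consistently steers the peak toward endfire.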
Semi-Supervised NMF in the Chroma Domain Applied to Music Harmony Estimation
Takuya Takahashi, T. Hori, Christoph M. Wilk, S. Sagayama
Pub Date: 2018-11-01. DOI: 10.23919/APSIPA.2018.8659645
In this paper, we discuss non-negative matrix factorization (NMF) applied to chroma feature sequences to reduce chroma-specific noise in chord estimation from music signals using a hidden Markov model (HMM). Even for single pitch sounds, the raw 12-dimensional chroma vectors obtained from the music signal by summing and normalizing the spectrum by octaves often contain irrelevant components, such as non-octave overtones falling into different pitch classes, which cause inaccuracies in harmony estimation. NMF applied in the chroma domain is expected to suppress such overtone-induced components in the NMF activation matrix and thus "purify" the noisy chroma vectors. By reducing the dimensionality to 12, as opposed to applying NMF to the raw spectrum, we expect advantages in statistical robustness as well as computational cost for pitch-class estimation of single and multiple tones. We use the "purified" chroma vectors in combination with a harmony progression model based on an HMM, where the NMF activation distributions are modeled as observations associated with hidden harmonies whose transition probabilities have been obtained statistically. We attempt to improve harmony estimation accuracy by combining the suppression of irrelevant components with the HMM-based harmony model. In the experimental evaluation, we demonstrate the reduction of irrelevant components in raw chroma vectors computed from recordings of musical instruments. In addition, using music audio data with harmony annotations from the RWC database, we compare the harmony estimation accuracies of our method and of conventional chroma features.
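The chroma "purification" idea can be illustrated with a tiny semi-supervised NMF in which a 12 x 12 pitch-class template matrix is held fixed and only the activations are updated. The templates and the Euclidean multiplicative update below are generic illustrations, not the authors' exact formulation:

```python
import numpy as np

def purify_chroma(V, W, n_iter=200, eps=1e-9):
    """Semi-supervised NMF in the chroma domain: the 12 x 12 template
    matrix W stays fixed and only the activations H are updated
    (multiplicative updates for the Euclidean cost)."""
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

# Toy templates: each pitch class leaks 30% of its energy into the pitch
# class a perfect fifth above (a 3rd-harmonic overtone landing in a
# different pitch class).
W = np.eye(12) + 0.3 * np.roll(np.eye(12), 7, axis=0)
```

Feeding in a chroma vector for a pure C (plus its leaked fifth) yields activations concentrated on the true pitch class, with the overtone component suppressed.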
Privacy-Preserving SVM Computing in the Encrypted Domain
Takahiro Maekawa, Ayana Kawamura, Yuma Kinoshita, H. Kiya
Pub Date: 2018-11-01. DOI: 10.23919/APSIPA.2018.8659529
A privacy-preserving support vector machine (SVM) computing scheme is proposed in this paper. Cloud computing has been spreading into many fields. However, cloud computing has some serious issues for end users, such as unauthorized use and leakage of data, and privacy compromise. We focus on templates protected by a block scrambling-based encryption scheme, and consider some properties of the protected templates for secure SVM computing, where templates are features extracted from data. The proposed scheme enables us not only to protect templates, but also to achieve the same performance as unprotected templates under some useful kernel functions. Moreover, it can be carried out directly with well-known SVM algorithms, without preparing any algorithms specialized for secure SVM computing. In an experiment, the proposed scheme is applied to a face-based authentication algorithm with SVM classifiers to confirm its effectiveness.
Visual Saliency Detection Algorithm in Compressed HEVC Domain
Rui Bai, Wei Zhou, Guanwen Zhang, Henglu Wei
Pub Date: 2018-11-01. DOI: 10.23919/APSIPA.2018.8659565
Saliency detection has been widely used to predict human fixation. In this paper, a visual saliency detection algorithm in the compressed HEVC domain is proposed, consisting of three parts: static saliency detection, dynamic saliency detection, and competitive fusion. Firstly, a Gaussian model is used to filter out the background of the static features, which are extracted by down-sampling and the DCT. Secondly, the motion vectors are used to represent the dynamic feature, and the dynamic saliency is calculated by filtering out the background of the dynamic feature. Finally, a competitive fusion model is used to adaptively combine the static and dynamic saliency maps. Experimental results show that the proposed method is superior to classic state-of-the-art saliency detection methods, with an average AUC increase of 0.05 and an average KL divergence decrease of 0.17. The average detection time is 2.3 seconds per frame.
Measuring Infant's Length with an Image
Maolong Tang, Ming-Ting Sun, Leonardo Seda, J. Swanson, Zhengyou Zhang
Pub Date: 2018-11-01. DOI: 10.23919/APSIPA.2018.8659482
It is important to measure an infant's length regularly to estimate the growth velocity and make sure that the infant is growing normally. Traditionally, measuring an infant's length is performed with an infantometer. However, the infant struggles and cries during the measurement, and it often takes three people to position the infant's head, legs, and the boards of the infantometer. Thus, it is not practical for a parent to perform this measurement at home regularly. In this paper, we propose a new approach that allows the measurement of an infant's length from a cellphone picture without the need to position the infant. Our algorithm automatically calculates the 3D positions of the body parts and the total length of the infant with the help of round stickers, which can be placed on the infant's body in a few seconds before the picture is taken. This new technology would make frequent measurements of the infant's length, and thus tracking of the growth velocity, possible.
Microphone Position Realignment by Extrapolation of Virtual Microphone
R. Jinzai, K. Yamaoka, Mitsuo Matsumoto, Takeshi Yamada, S. Makino
Pub Date: 2018-11-01. DOI: 10.23919/APSIPA.2018.8659728
In this paper, microphone realignment by phase extrapolation using the virtual microphone technique is proposed for reproducing binaural signals with adequate interaural time differences (ITDs) for a listener. For a sound source in the horizontal plane, the ITD is a major cue for localizing a sound image. Since ITDs are not considered for headphone listening in conventional amplitude panning for multichannel recording, sound images are localized inside the head (lateralization). A microphone array can record signals with time differences corresponding to the directions of sound sources; however, since the microphones in such an array are closely spaced, these time differences are too small to serve as ITDs for localizing sound images. In this paper, phase extrapolation using the virtual microphone technique is applied to virtually realign a microphone in such an array to restore the ITD. In the experiments, two speech signals serve as sound sources located at the leftmost and rightmost positions as seen from two real microphones positioned 2.83 cm apart, and the phase of the signal of a virtually realigned microphone is extrapolated to eight times the phase difference between the two real microphones. The time differences between the signal of one of the real microphones and that of the realigned one are observed to be -500 μs for the source on the left and 500 μs for the source on the right. Furthermore, the interaural cross-correlations of the two signals suggest that sound images will be perceived on both the left and right of a listener. This method is expected to require no prior information on the number of sources or their directions of arrival, and to make adjusting for individual differences easy.
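The core realignment step can be sketched as follows: the inter-microphone phase difference is extrapolated by an integer factor (eight in the paper's experiments) in the frequency domain, which moves the virtual microphone correspondingly farther from the reference one. This is a single-frame, noise-free sketch; STFT windowing and real acoustics are omitted:

```python
import numpy as np

def virtual_mic(x1, x2, beta=8):
    """Extrapolate the phase difference between two closely spaced
    microphones by an integer factor beta, synthesizing a virtually
    realigned microphone beta times farther from the reference x1."""
    X1, X2 = np.fft.rfft(x1), np.fft.rfft(x2)
    phase_diff = np.angle(X2 * np.conj(X1))    # inter-mic phase per bin
    Xv = X1 * np.exp(1j * beta * phase_diff)   # scale the phase difference
    return np.fft.irfft(Xv, n=len(x1))

rng = np.random.default_rng(1)
x1 = rng.standard_normal(1024)
x2 = np.roll(x1, 3)                 # second mic: x1 delayed by 3 samples
xv = virtual_mic(x1, x2, beta=8)
# Circular cross-correlation peaks at lag 8 * 3 = 24 samples.
lag = int(np.argmax([np.dot(np.roll(x1, k), xv) for k in range(len(x1))]))
assert lag == 24
```

Because beta is an integer, the 2-pi wrapping of the per-bin phase difference cancels in the exponential, so a pure delay is extrapolated exactly.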
Feature-Based Learning Hidden Unit Contributions for Domain Adaptation of RNN-LMs
Michael Hentschel, Marc Delcroix, A. Ogawa, T. Nakatani
Pub Date: 2018-11-01. DOI: 10.23919/APSIPA.2018.8659468
In recent years, many approaches have been proposed for domain adaptation of neural network language models. These methods fall into two categories. The first is model-based adaptation, which creates a domain-specific language model by re-training the weights of the network on the in-domain data; this requires domain annotation in the training and test data. The second is feature-based adaptation, which uses topic features to perform mainly bias adaptation of network input or output layers in an unsupervised manner. Recently, a scheme called learning hidden unit contributions was proposed for acoustic model adaptation. We propose applying this scheme to feature-based domain adaptation of recurrent neural network language models. In addition, we investigate the combination of this approach with bias-based domain adaptation. For the experiments, we use a corpus based on TED talks and the CSJ lecture corpus, reporting perplexity and speech recognition results. Our proposed method consistently outperforms a non-adapted baseline, and the combined approach can improve on pure bias adaptation.
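The learning-hidden-unit-contributions operation itself is a per-unit re-scaling of hidden activations by an adaptation amplitude, conventionally 2 * sigmoid(a) so the scale lies in (0, 2); in the feature-based variant described above, the amplitudes would be predicted from topic features rather than learned per domain. A generic sketch, not the authors' exact network:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lhuc_scale(hidden, a):
    """Learning Hidden Unit Contributions: re-scale each hidden unit by
    an adaptation amplitude 2 * sigmoid(a) in (0, 2).

    hidden: (batch, units) activations; a: (units,) adaptation
    parameters (here, hypothetically the output of a layer driven by
    topic features)."""
    return 2.0 * sigmoid(a) * hidden

h = np.ones((1, 4))
# a = 0 gives a scale of exactly 1, leaving the units unchanged,
# while a strongly negative a effectively switches a unit off.
assert np.allclose(lhuc_scale(h, np.zeros(4)), h)
```

Only the small vector a is adapted per domain (or predicted per utterance), which is what makes the scheme cheap compared with re-training the network weights.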
DSP Implementation of Adaptive Notch Filters With Overflow Avoidance in Fixed-Point Arithmetic
Satoru Ishibashi, S. Koshita, M. Abe, M. Kawamata
Pub Date: 2018-11-01. DOI: 10.23919/APSIPA.2018.8659673
In this paper, we implement adaptive notch filters with constrained poles and zeros (CPZ-ANFs) on a fixed-point DSP. Since CPZ-ANFs are IIR filters with a narrow notch width, a signal can be amplified significantly in their feedback loops. Consequently, the direct-form II structure suffers from a high probability of overflow in its internal state. When an overflow occurs in the internal state of a filter, the inaccurate values caused by the overflow are used repeatedly to compute the filter's output signal; as a result, the filter does not operate correctly, so such overflow must be prevented. To avoid it, we use the direct-form I structure in the implementation of the CPZ-ANFs. Experimental results show that our method allows the CPZ-ANFs to operate properly on the fixed-point DSP.
How do people construct mutual beliefs in task-oriented dialogues?
Yoshiko Kawabata, Toshihiko Matsuka
Pub Date: 2018-11-01. DOI: 10.23919/APSIPA.2018.8659453
The present study investigates how mutual beliefs are achieved by examining the relationship between actual behaviors and utterances in task-oriented dialogues. According to a widely accepted model, mutual belief about a task is achieved when a listener accepts utterances about the task given by another agent and gives some sign of task completion to that agent. However, by analyzing the Japanese Map Task Dialogue Corpus (JMTDC), we found that the vast majority of conversations (94%) did not follow what the model suggests. We categorized those non-standard dialogues into six categories: delayed acceptance, premature sign of completion, execution postponement, silent adjustment, unconfirmed, and indirection. We further analyzed these six categories carefully to see how and when participants were able to achieve mutual belief in the dialogues.
A Rate Control Algorithm for HEVC Considering Visual Saliency
Henglu Wei, Wei Zhou, Rui Bai, Zhemin Duan
Pub Date: 2018-11-01. DOI: 10.23919/APSIPA.2018.8659729
In this paper, visual saliency is used to guide the coding tree unit (CTU) level bit allocation process in High Efficiency Video Coding (HEVC) to improve visual quality. First, a saliency detection algorithm is proposed. With the detected saliency map, the distortion of each CTU is weighted by the corresponding saliency, so that distortion in the salient areas counts more heavily. Then, the optimal bit allocation problem, constrained by the picture-level target bits and minimum quality fluctuation, is formulated and solved with a numerical method. Experimental results show that the quality gain in salient areas is up to 0.8658 dB, and the gain in saliency-weighted PSNR is up to 1.0318 dB.