
2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC): Latest Papers

A Prediction Model for End-of-Utterance Based on Prosodic Features and Phrase-Dependency in Spontaneous Japanese
Y. Ishimoto, Takehiro Teraoka, M. Enomoto
This study aims to identify cues for predicting the end of an utterance in spontaneous Japanese speech. In casual everyday conversation, participants must predict where a speaker's utterances end in order to take turns smoothly, with only small gaps or overlaps. Syntactic and prosodic factors are thought to project the end of an utterance, and participants use these factors to predict it. In this paper, we focus on the dependency structure among bunsetsu-phrases as a syntactic feature, and on the F0, intensity, and mora duration of bunsetsu-phrases as prosodic features. We investigated the relationship between the position of a bunsetsu-phrase in an utterance and these features. The results showed that no single feature is a decisive cue for determining the position of bunsetsu-phrases. Next, we constructed a Bayesian hierarchical model to estimate bunsetsu-phrase position from the syntactic and prosodic features. The model indicated that the usefulness of prosodic features varies across speakers. This suggests that a different combination of syntactic and prosodic features is relevant for predicting the ends of utterances for each speaker.
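A Bayesian hierarchical model lets each speaker have their own feature effect while borrowing strength from the whole group, which is how speaker-dependent usefulness of prosodic features can be captured. The sketch below is a minimal illustration only, assuming a normal-normal model with known variances (mu, tau, sigma are hypothetical inputs); it is not the model used in the paper.

```python
def pooled_speaker_means(speaker_obs, sigma, tau, mu):
    """Posterior means of per-speaker effects under a normal-normal model.

    Assumed (illustrative) model, with mu, tau, sigma known:
        theta_s ~ Normal(mu, tau^2)        # speaker-level effect
        y_si    ~ Normal(theta_s, sigma^2) # observations for speaker s
    """
    result = {}
    for speaker, ys in speaker_obs.items():
        n = len(ys)
        ybar = sum(ys) / n
        data_precision = n / sigma ** 2
        prior_precision = 1.0 / tau ** 2
        # Precision-weighted average of the speaker's sample mean and the group mean:
        # speakers with little data are pulled more strongly toward mu.
        result[speaker] = (data_precision * ybar + prior_precision * mu) / (
            data_precision + prior_precision
        )
    return result
```

With sigma = tau = 1 and mu = 0, a speaker with four observations averaging 1.0 gets an estimate of 0.8, while a speaker with a single observation of 3.0 is shrunk to 1.5.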
DOI: https://doi.org/10.23919/APSIPA.2018.8659535 · Published: 2018-11-01
Citations: 0
Low-Frequency Character Clustering for End-to-End ASR System
Hitoshi Ito, Aiko Hagiwara, Manon Ichiki, Takeshi S. Kobayakawa, T. Mishima, Shoei Sato, A. Kobayashi
We developed a label-design and label-restoration method for end-to-end automatic speech recognition (ASR) based on connectionist temporal classification (CTC). In an end-to-end system with thousands of output labels, such as words or characters, data sparsity makes it difficult to train a robust model. In our method, characters with little training data are estimated from language-model context rather than from acoustic features. The method has two steps. First, we train acoustic models using 70 class labels in place of the thousands of low-frequency labels. Second, the class labels are restored to the original labels using a weighted finite-state transducer (WFST) and an n-gram language model. We applied the method to a Japanese end-to-end ASR system with over 3,000 character labels. Experimental results show that our method reduced the word error rate by up to 15.5% relative to a conventional CTC-based method, and that it performs comparably to state-of-the-art hybrid DNN methods.
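The two steps can be sketched as follows: rare characters are collapsed into a small set of class labels, and a decoded label sequence is mapped back to characters by choosing, for each class label, the member that the language model prefers in context. This is a minimal sketch under stated assumptions: the round-robin class assignment, the add-one-smoothed character bigram model, and the plain Viterbi search are hypothetical stand-ins for the paper's clustering, n-gram model, and WFST decoding.

```python
import math
from collections import Counter

def build_label_map(char_counts, min_count, num_classes):
    """Map characters seen fewer than min_count times to class labels (hypothetical round-robin)."""
    rare = sorted(c for c, n in char_counts.items() if n < min_count)
    return {c: f"<C{i % num_classes}>" for i, c in enumerate(rare)}

def encode(text, to_class):
    """Training targets: frequent characters stay, rare ones become class labels."""
    return [to_class.get(c, c) for c in text]

def make_bigram_lm(corpus):
    """Add-one-smoothed character bigram log-probabilities."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for sent in corpus:
        toks = ["<s>"] + list(sent)
        vocab.update(toks)
        unigrams.update(toks[:-1])
        bigrams.update(zip(toks[:-1], toks[1:]))
    v = len(vocab)
    def logprob(prev, cur):
        return math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + v))
    return logprob

def restore(labels, to_class, bigram_logprob):
    """Viterbi search: replace each class label with its most likely member."""
    members = {}
    for ch, cls in to_class.items():
        members.setdefault(cls, []).append(ch)
    lattice = [members.get(lab, [lab]) for lab in labels]
    # best maps a candidate character to (log score, best path ending in it).
    best = {c: (bigram_logprob("<s>", c), [c]) for c in lattice[0]}
    for options in lattice[1:]:
        new_best = {}
        for c in options:
            score, path = max(
                ((s + bigram_logprob(prev, c), p) for prev, (s, p) in best.items()),
                key=lambda item: item[0],
            )
            new_best[c] = (score, path + [c])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]
```

With a toy corpus in which "x" always follows "a" and "y" always follows "b", the shared class label is restored to "x" after "a" and to "y" after "b".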
DOI: https://doi.org/10.23919/APSIPA.2018.8659735 · Published: 2018-11-01
Citations: 0
Optimizing the Performance of Halftoning-Based Block Truncation Coding
Zi-Xin Xu, Y. Chan, D. Lun
Block Truncation Coding (BTC) is an effective lossy image coding technique that offers both high efficiency and low complexity, especially when halftoning is used to shape the noise spectrum of its output. However, because BTC is block-based, blocking artifacts commonly appear in its output, and post-processing is usually applied to mitigate them. Recently, a halftoning-based BTC algorithm was proposed that solves this problem by eliminating the cause of the blocking artifacts. In this paper, we add an optimization step that optimizes the algorithm's performance with respect to a given objective measure. The same idea can be combined with other halftoning methods and used to optimize other measures, to suit different needs in different circumstances.
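For reference, classic BTC (Delp and Mitchell) codes each block as a one-bit map plus two reconstruction levels chosen so that the decoded block keeps the original mean and standard deviation. The sketch below shows only that baseline thresholding form; halftoning-based BTC replaces the mean threshold with a halftoned (for example, error-diffused) bitmap, which is not shown, and the optimization step of the paper is not reproduced here.

```python
import math

def btc_encode(block):
    """Classic BTC of one flattened block: returns (bitmap, low, high)."""
    n = len(block)
    mean = sum(block) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in block) / n)
    bitmap = [1 if x >= mean else 0 for x in block]
    q = sum(bitmap)  # number of pixels at or above the mean
    if q == n:  # flat block: every pixel equals the mean
        return bitmap, mean, mean
    # Levels chosen so the decoded block preserves the first two moments.
    low = mean - std * math.sqrt(q / (n - q))
    high = mean + std * math.sqrt((n - q) / q)
    return bitmap, low, high

def btc_decode(bitmap, low, high):
    """Reconstruct the block from its bitmap and two levels."""
    return [high if b else low for b in bitmap]
```

For the block [2, 4, 6, 8], the decoded values are 5 - sqrt(5) and 5 + sqrt(5), which keep the mean of 5 and the standard deviation of sqrt(5).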
DOI: https://doi.org/10.23919/APSIPA.2018.8659744 · Published: 2018-11-01
Citations: 0
Data Hiding in MP4 Video Container based on Subtitle Track
ChuanSheng Chan, Koksheik Wong, Imdad MaungMaung
This paper proposes a data hiding method for the MP4 container format. Specifically, the synchronization between the subtitle track and the audio-video tracks is exploited to hide data: the time scale is first scaled, and then pairs of sample durations are modified to carry the data. The method hides data reversibly when the payload is relatively small, and switches to an irreversible mode to offer a higher payload. Although the synchronization between the audio-video and subtitle tracks is manipulated, the resulting delay or advance in subtitle display is imperceptible. The file size of the processed MP4 file is also completely preserved. Subjective evaluations were carried out to verify the basic performance of the proposed method.
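One way the "scale, then modify duration pairs" idea can work reversibly is sketched below. This is a hypothetical scheme, not necessarily the paper's: scaling the timescale and every duration by 2 leaves playback timing unchanged and makes every duration even; a bit is then hidden by moving one tick from the second duration of a pair to the first, so the pair's total (and hence synchronization after the pair) is unchanged. Parsing and rewriting actual MP4 boxes is out of scope, and the irreversible high-payload mode is not shown.

```python
def scale_timing(timescale, durations, factor=2):
    """Multiply the timescale and all sample durations by `factor`.

    Playback times are unchanged; with factor=2 every duration becomes even,
    which frees its least significant bit.
    """
    return timescale * factor, [d * factor for d in durations]

def embed(durations, bits):
    """Hide one bit per pair (d1, d2) by moving `bit` ticks from d2 to d1.

    Assumes even durations (see scale_timing). The pair's sum is preserved,
    so the shift is at most one tick and does not accumulate.
    """
    if 2 * len(bits) > len(durations):
        raise ValueError("payload exceeds capacity of one bit per duration pair")
    out = list(durations)
    for k, bit in enumerate(bits):
        out[2 * k] += bit
        out[2 * k + 1] -= bit
    return out

def extract(durations, num_bits):
    """Recover the hidden bits and restore the (scaled) original durations."""
    bits, restored = [], list(durations)
    for k in range(num_bits):
        bit = restored[2 * k] % 2  # odd first duration means a 1 was embedded
        bits.append(bit)
        restored[2 * k] -= bit
        restored[2 * k + 1] += bit
    return bits, restored
```

With a timescale of 1000 scaled to 2000, each embedded bit shifts one subtitle boundary by at most 1/2000 s (0.5 ms).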
DOI: https://doi.org/10.23919/APSIPA.2018.8659643 · Published: 2018-11-01
Citations: 1
Ensemble Deep Learning Based Cooperative Spectrum Sensing with Stacking Fusion Center
Hang Liu, Xu Zhu, T. Fujii
In this paper, an ensemble learning (EL) framework is adopted for cooperative spectrum sensing (CSS) in a cognitive radio system based on orthogonal frequency division multiplexing (OFDM) signals. Each secondary user (SU) is treated as a base learner whose local spectrum sensing estimates the probability that the primary user (PU) is inactive or active. Convolutional neural networks (CNNs) with a simple architecture are used, given their strength in image recognition and the limited computational capability of each SU, and the cyclic spectral correlation feature is used as the input data. For supervised learning, a bagging strategy is used to build the training database. For the global decision, the fusion center uses stacked generalization to learn how to combine the SUs' preliminary classifications of the PU status. Our method shows significant advantages over conventional CSS methods in terms of detection probability and false-alarm probability.
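The stacking step at the fusion center can be illustrated with a small meta-learner that takes each SU's estimated probability of PU activity as input and learns how much to trust each SU. The sketch below assumes a logistic-regression meta-learner trained by batch gradient descent on simulated SU outputs (the simulation, in which SU 0 is reliable and SUs 1 and 2 are noise, is hypothetical); the CNN base learners and the bagging step are not shown. In proper stacking, the meta-learner is trained on held-out (out-of-fold) base-learner predictions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def simulate_su_outputs(n, seed=0):
    """Hypothetical data: SU 0 is reliable, SUs 1 and 2 are pure noise."""
    rng = random.Random(seed)
    samples, labels = [], []
    for _ in range(n):
        active = rng.random() < 0.5
        reliable = (0.9 if active else 0.1) + rng.uniform(-0.05, 0.05)
        samples.append([reliable, rng.random(), rng.random()])
        labels.append(1 if active else 0)
    return samples, labels

def train_meta_learner(su_outputs, labels, lr=0.5, epochs=1000):
    """Fit logistic-regression weights that combine per-SU P(PU active)."""
    m, n = len(su_outputs[0]), len(labels)
    w, b = [0.0] * m, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * m, 0.0
        for x, y in zip(su_outputs, labels):
            # Gradient of the log-loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(m):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def fuse(w, b, x, threshold=0.5):
    """Global decision: 1 = PU active, 0 = PU inactive."""
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold)
```

After training, the weight on the reliable SU should dominate, which is the behavior stacking is meant to provide over a fixed fusion rule such as majority voting.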
DOI: https://doi.org/10.23919/APSIPA.2018.8659774 · Published: 2018-11-01
Citations: 9
Multichannel NMF with Reduced Computational Complexity for Speech Recognition
T. Izumi, Takanobu Uramoto, Shingo Uenohara, K. Furuya, Ryo Aihara, Toshiyuki Hanazawa, Y. Okato
In this study, we propose a method that reduces the number of computational iterations of multichannel nonnegative matrix factorization (MNMF) for speech recognition. The method initializes MNMF with an estimated spatial correlation matrix, which reduces the number of iterations of the update algorithm. Here, the spatial correlation matrix is estimated by mask-based emphasis using the expectation-maximization (EM) algorithm. As a second method, we propose reducing computational complexity by decimating the updates of the spatial correlation matrix H. Experimental results indicate that our methods reduce the computational complexity of MNMF while maintaining the performance of conventional MNMF.
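The decimation idea (updating an expensive factor only every few iterations) can be illustrated on ordinary single-channel NMF. The sketch below is a plain substitute, not MNMF: it uses Lee-Seung multiplicative updates for the Euclidean cost, and the basis matrix W stands in for the paper's spatial correlation matrix H. The parameter `w_update_every` is hypothetical; each individual multiplicative update does not increase the reconstruction error, so skipping some of them trades convergence speed for computation.

```python
import random

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(r) for r in zip(*a)]

def frob_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((p - q) ** 2 for rv, rw in zip(v, wh) for p, q in zip(rv, rw))

def nmf(v, k, iters=200, w_update_every=1, seed=0, eps=1e-12):
    """Euclidean NMF with multiplicative updates; W is updated every n-th iteration only."""
    rng = random.Random(seed)
    f, t = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(f)]
    h = [[rng.random() + 0.1 for _ in range(t)] for _ in range(k)]
    for it in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[x * n / (d + eps) for x, n, d in zip(rh, rn, rd)]
             for rh, rn, rd in zip(h, num, den)]
        if it % w_update_every == 0:
            # W <- W * (V H^T) / (W H H^T), skipped on decimated iterations
            ht = transpose(h)
            num, den = matmul(v, ht), matmul(w, matmul(h, ht))
            w = [[x * n / (d + eps) for x, n, d in zip(rw, rn, rd)]
                 for rw, rn, rd in zip(w, num, den)]
    return w, h
```

With `w_update_every=5`, the W update (the costlier step when W is large) runs on only one iteration in five, while H is still updated every iteration.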
DOI: https://doi.org/10.23919/APSIPA.2018.8659493 · Published: 2018-11-01
Citations: 0
A Digital Modeling Technique for Distortion Effect Based on a Machine Learning Approach
Yuto Matsunaga, N. Aoki, Y. Dobashi, Tsuyoshi Yamamoto
This paper reports experimental results on modeling distortion-effect stomp boxes with a machine learning approach. Our technique models a distortion stomp box as a neural network consisting of a CNN and an LSTM. The CNN models the linear components that appear in the stomp box's pre- and post-filters, while the LSTM models the nonlinear component that appears in its distortion stage. All parameters are estimated by training on the input and output signals of the distortion stomp boxes. The experimental results indicate that the technique may be able to replicate distortion stomp boxes appropriately with a well-trained neural network.
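The split into linear pre- and post-filters around a nonlinear stage corresponds to the classic Wiener-Hammerstein (linear, nonlinear, linear) block structure often used for distortion circuits. The sketch below shows that structure with hand-picked assumptions: fixed FIR taps and a static tanh clipper with a hypothetical `gain`. In the paper, the corresponding parts are learned by a CNN and an LSTM rather than fixed.

```python
import math

def fir(x, taps):
    """Causal FIR filter: y[n] = sum_k taps[k] * x[n - k]."""
    return [sum(t * x[n - k] for k, t in enumerate(taps) if n - k >= 0)
            for n in range(len(x))]

def distortion(x, pre_taps, gain, post_taps):
    """Pre-filter -> static tanh clipper -> post-filter."""
    shaped = [math.tanh(gain * s) for s in fir(x, pre_taps)]
    return fir(shaped, post_taps)
```

Because tanh is bounded by 1 and the example post-filter taps [0.5, 0.5] sum to 1, the output of `distortion(x, [1.0], 10.0, [0.5, 0.5])` stays within [-1, 1] regardless of input level, mimicking hard clipping followed by tone smoothing.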
DOI: https://doi.org/10.23919/APSIPA.2018.8659547 · Published: 2018-11-01
Citations: 2
Sequential Generation of Singing F0 Contours from Musical Note Sequences Based on WaveNet
Yusuke Wada, Ryo Nishikimi, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii
This paper describes a method that generates a continuous F0 contour of a singing voice from a monophonic sequence of musical notes (a musical score) using a deep neural autoregressive model called WaveNet. Real F0 contours include complicated temporal and frequency fluctuations caused by singing expressions such as vibrato and portamento. Although explicit models such as hidden Markov models (HMMs) have often been used to represent F0 dynamics, their limited representational capability makes it difficult to generate realistic F0 contours. To overcome this limitation, WaveNet, originally devised to model raw waveforms in an unsupervised manner, was recently used to generate singing F0 contours from a musical score with lyrics in a supervised manner. Inspired by this work, we investigate whether WaveNet can generate singing F0 contours without lyric information. Our method conditions WaveNet on the pitch and contextual features of a musical score. As a loss function better suited to generating F0 contours, we adopt a modified cross-entropy loss weighted by the squared error between the target and output F0s on the log-frequency axis. Experimental results show that these techniques improve the quality of the generated F0 contours.
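A WaveNet-style model predicts a categorical distribution over quantized F0 values, so plain cross-entropy treats a one-semitone miss the same as an octave error. The abstract does not give the exact loss, so the sketch below is one hypothetical reading: F0 is quantized on a semitone grid (the 55 Hz floor and 60 bins are assumptions), and the cross-entropy of the target bin is scaled by 1 + alpha times the squared log-frequency distance between the predicted (argmax) bin and the target bin.

```python
import math

def hz_to_bin(f0_hz, f_min=55.0, bins_per_semitone=1, num_bins=60):
    """Quantize an F0 value onto a log-frequency (semitone) grid."""
    semitones = 12 * math.log2(f0_hz / f_min)
    return min(num_bins - 1, max(0, round(semitones * bins_per_semitone)))

def weighted_ce(probs, target_bin, alpha=1.0):
    """Cross-entropy scaled by the squared log-frequency error of the prediction.

    Hypothetical form: weight = 1 + alpha * (argmax bin - target bin)^2, so
    predictions far from the target on the log-frequency axis cost more.
    """
    pred_bin = max(range(len(probs)), key=probs.__getitem__)
    weight = 1.0 + alpha * (pred_bin - target_bin) ** 2
    return -weight * math.log(max(probs[target_bin], 1e-12))
```

Here 110 Hz maps to bin 12, one octave above the 55 Hz floor, and a prediction peaking three bins from the target is penalized five times more than one peaking a single bin away when the target probability is the same.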
DOI: https://doi.org/10.23919/APSIPA.2018.8659502 · Published: 2018-11-01
Citations: 7
Exploring redundancy of HRTFs for fast training DNN-based HRTF personalization
Tzu-Yu Chen, Po-Wen Hsiao, T. Chi
A deep neural network (DNN) is constructed to predict a user's head-related transfer function (HRTF) magnitude response for a specific direction and a specific ear. Using the CIPIC HRTF database (25 azimuth angles and 50 elevation angles for each ear), we trained 2,500 DNNs to predict the magnitude responses of all of a user's HRTFs. Because HRTF magnitude responses change smoothly across nearby directions, we propose reducing training time by using the final weights of the trained DNN for a nearby direction as the initial weights of the DNN currently being trained. Analysis of variance (ANOVA) showed that, in terms of log-spectral distortion (LSD), the proposed scheme produces HRTF magnitude responses equivalent to those of the standard scheme with random initial weights. Meanwhile, the proposed scheme reduces training time by more than 95%.
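The warm-start scheme needs a training order in which every direction (after the first) has an already-trained neighbor to copy weights from. The sketch below assumes a simple serpentine sweep over the 25 x 50 azimuth-elevation grid of one ear (the ordering is an assumption, not taken from the paper), plus the standard log-spectral distortion measure used for evaluation. The DNN training itself is not shown.

```python
import math

def serpentine_order(num_az, num_el):
    """Visit an azimuth x elevation grid so consecutive directions are neighbors."""
    order = []
    for a in range(num_az):
        els = range(num_el) if a % 2 == 0 else range(num_el - 1, -1, -1)
        order.extend((a, e) for e in els)
    return order

def training_schedule(num_az, num_el):
    """Pair each direction with the direction whose trained weights initialize it.

    None means random initialization (only for the first direction).
    """
    order = serpentine_order(num_az, num_el)
    return [(d, order[i - 1] if i > 0 else None) for i, d in enumerate(order)]

def lsd_db(h_ref, h_est):
    """Log-spectral distortion (dB) between two magnitude responses."""
    diffs = [(20 * math.log10(r / e)) ** 2 for r, e in zip(h_ref, h_est)]
    return math.sqrt(sum(diffs) / len(diffs))
```

For one ear this yields 1,250 training jobs (2,500 for both ears), each initialized from an adjacent grid point; an estimate that is off by a factor of 10 in magnitude at every frequency has an LSD of 20 dB.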
DOI: https://doi.org/10.23919/APSIPA.2018.8659704 · Published: 2018-11-01
Citations: 1
Journal Name Extraction from Japanese Scientific News Articles
M. Kikuchi, Mitsuo Yoshida, Kyoji Umemura
Japanese scientific news articles describe research results clearly, but they tend not to cite their sources, which makes it difficult for readers to find the details of the research. In this paper, we address the task of extracting journal names from Japanese scientific news articles. We hypothesize that journal names tend to occur in specific contexts. To examine this hypothesis, we construct a character-based method that extracts journal names using only the left and right context features of journal names. The results of journal name extraction suggest that the distributional hypothesis plays an important role in identifying journal names.
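A minimal illustration of extraction from left and right character contexts alone: contexts of a fixed width are collected around known journal names, and new candidates are read off as whatever lies between a learned left context and the nearest learned right context. The context width, the maximum name length, and the example sentences are hypothetical; the paper's actual features and scoring are not described in the abstract.

```python
from collections import Counter

def learn_contexts(examples, width=2):
    """Count left/right character contexts of fixed width around known journal names.

    examples: list of (sentence, journal_name) pairs.
    """
    left, right = Counter(), Counter()
    for text, name in examples:
        i = text.find(name)
        if i < 0:
            continue
        end = i + len(name)
        if i >= width:
            left[text[i - width:i]] += 1
        if end + width <= len(text):
            right[text[end:end + width]] += 1
    return left, right

def extract(text, left, right, max_len=20):
    """Return substrings lying between a learned left context and a learned right context."""
    found = []
    for lc in left:
        start = text.find(lc)
        while start >= 0:
            begin = start + len(lc)
            for rc in right:
                # Require at least one character between the two contexts.
                end = text.find(rc, begin + 1)
                if end >= 0 and end - begin <= max_len:
                    found.append(text[begin:end])
            start = text.find(lc, start + 1)
    return found
```

Trained on "英科学誌ネイチャーに掲載された" and "米科学誌サイエンスに掲載される", the learned contexts are "学誌" on the left and "に掲" on the right, and applying them to "独科学誌セルに掲載された研究" yields the candidate "セル".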
DOI: https://doi.org/10.23919/APSIPA.2018.8659765 · Published: 2018-11-01
Citations: 0