
Latest Publications: 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

Video Generation and Synthesis Network for Long-term Video Interpolation
Na-young Kim, Jung Kyung Lee, C. Yoo, Seunghyun Cho, Jewon Kang
In this paper, we propose a bidirectional synthesis video interpolation technique based on deep learning, using a forward video generation network, a backward video generation network, and a synthesis network. The forward generation network first extrapolates a video sequence from the past video frames, and the backward generation network then generates the same video sequence from the future video frames. Next, a synthesis network fuses the results of the two generation networks to create an intermediate video sequence. To jointly train the video generation and synthesis networks, we define a cost function that drives the visual quality and motion of the interpolated video as close as possible to those of the original video. Experimental results show that the proposed technique outperforms the state-of-the-art long-term video interpolation model based on deep learning.
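As a rough illustration of the three-network layout described above (not the authors' architecture; the layer sizes, tensor shapes, and L1 stand-in loss are placeholders), a minimal PyTorch sketch might look like this:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Toy frame generator: maps k context frames to k predicted gap frames."""
    def __init__(self, k=2, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(k * channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, k * channels, 3, padding=1),
        )

    def forward(self, frames):  # frames: (B, k*C, H, W), stacked along channels
        return self.net(frames)

class Synthesis(nn.Module):
    """Fuses the forward and backward predictions of the gap frames."""
    def __init__(self, k=2, channels=3):
        super().__init__()
        self.fuse = nn.Conv2d(2 * k * channels, k * channels, 3, padding=1)

    def forward(self, fwd, bwd):
        return self.fuse(torch.cat([fwd, bwd], dim=1))

past = torch.randn(1, 6, 64, 64)      # two past RGB frames, channel-stacked
future = torch.randn(1, 6, 64, 64)    # two future RGB frames
target = torch.randn(1, 6, 64, 64)    # ground-truth intermediate frames

fwd_net, bwd_net, syn_net = Generator(), Generator(), Synthesis()
interp = syn_net(fwd_net(past), bwd_net(future))

# Stand-in for the joint cost on visual quality/motion; all three networks
# receive gradients, i.e., they are trained jointly.
loss = nn.functional.l1_loss(interp, target)
loss.backward()
```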
Citations: 4
Speech Synthesis Using WaveNet Vocoder Based on Periodic/Aperiodic Decomposition
Takato Fujimoto, Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, K. Tokuda
This paper proposes speech synthesis using a WaveNet vocoder based on periodic/aperiodic decomposition. Human speech waveforms generally contain both quasiperiodic and aperiodic components, so it is important to model the periodic and aperiodic components accurately. In conventional statistical parametric speech synthesis, the two components are represented only as ratios of their energies. Statistical parametric speech synthesis based on explicit periodic/aperiodic decomposition has also been proposed; although its effectiveness has been shown, it cannot directly generate speech waveforms that account for both components. In the proposed approach, the separated periodic and aperiodic components are modeled by a single acoustic model based on deep neural networks, and speech waveforms reflecting both components are then generated directly by a single WaveNet vocoder. Experimental results show that the proposed approach outperforms the conventional approach in the naturalness of the synthesized speech.
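A minimal numpy sketch of the underlying decomposition idea (purely illustrative; the paper's DNN acoustic model and WaveNet vocoder are not reproduced here): a voiced frame is viewed as a harmonic, quasiperiodic part plus a noise-like aperiodic part, and the observed waveform is their sum.

```python
import numpy as np

sr, f0, n = 16000, 120.0, 1024                  # sample rate, pitch, frame length
t = np.arange(n) / sr

# Quasiperiodic (harmonic) component of a voiced frame
periodic = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 8))
# Aperiodic (noise-like) component, e.g., breathiness
aperiodic = 0.1 * np.random.randn(n)

speech = periodic + aperiodic                   # the observed waveform

# In the paper, the two separated components are modeled by one DNN acoustic
# model, and one WaveNet vocoder conditioned on its outputs generates the
# waveform containing both components directly.
```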
Citations: 4
Encryption and Data Insertion Technique using Region Division and Histogram Manipulation
Ryoma Ito, Koksheik Wong, Simying Ong, Kiyoshi Tanaka
A separable encryption and data insertion method is proposed in this paper. The input image is divided into two parts: the first part is manipulated to mask the perceptual semantics, while the second part is processed to hide data. The binary image, which is the data to be inserted, further divides the second part of the input image into two regions, called the 'zero' and 'one' regions. Pixels of the original image coinciding with the 'zero' region are darkened, while those coinciding with the 'one' region are brightened. The darkening and brightening are performed by using a histogram matching technique. The proposed joint method is separable, since the inserted binary image can be extracted directly from the masked image or from the reconstructed image. The proposed method is also commutative, because the same result is achieved regardless of the order in which the data is encrypted and inserted. Experiments were carried out to verify the basic performance of the proposed method.
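The insertion step can be sketched as follows, with the histogram matching simplified to a fixed brightness offset (the offset value and the non-blind extraction rule are assumptions for illustration, not the paper's procedure):

```python
import numpy as np

img = np.random.randint(0, 256, (64, 64)).astype(np.int16)   # data-hiding part
bits = np.random.randint(0, 2, (64, 64)).astype(bool)        # binary image to insert

stego = img.copy()
stego[~bits] -= 20                    # 'zero' region: darken
stego[bits] += 20                     # 'one' region: brighten
stego = np.clip(stego, 0, 255).astype(np.uint8)

# Non-blind extraction for illustration: sign of the brightness change
diff = stego.astype(np.int16) - img
recovered = diff > 0
print((recovered == bits).mean())     # close to 1.0; clipping at 0/255 may flip a few
```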
Citations: 1
Automatic Evaluation of Singing Quality without a Reference
Chitralekha Gupta, Haizhou Li, Ye Wang
Automatic singing quality evaluation methods currently rely on reference singing vocals or score information for comparison. However, singers may deviate from the reference vocal to personalize their singing while still sounding good. In this work, we present pitch histogram-based methods that automatically evaluate singing quality without any reference singing or score information. We validate the methods with the help of human ratings and compare them with baseline reference-free singing evaluation methods. We obtain an average Spearman's rank correlation of 0.716 with human judgments.
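A toy sketch of a pitch-histogram feature (the folding base, bin count, and 'peakiness' score are illustrative assumptions, not the paper's features): fold the F0 track onto one octave in cents, histogram it, and score how concentrated the histogram is, on the intuition that stable, on-key singing yields sharp peaks.

```python
import numpy as np
from scipy.stats import spearmanr

def pitch_histogram(f0_hz, bins=120):
    f0 = f0_hz[f0_hz > 0]                              # voiced frames only
    cents = (1200 * np.log2(f0 / 55.0)) % 1200         # fold onto one octave
    hist, _ = np.histogram(cents, bins=bins, range=(0, 1200), density=True)
    return hist

def peakiness(hist):
    return hist.max() / (hist.mean() + 1e-9)           # crude stability score

# Synthetic F0 tracks: a steady singer vs. a wobbly one (notes at 0, 3, 7 semitones)
steady = 220 * 2 ** (np.random.choice([0, 3, 7], 500) / 12 + 0.002 * np.random.randn(500))
wobbly = 220 * 2 ** (np.random.choice([0, 3, 7], 500) / 12 + 0.05 * np.random.randn(500))

scores = [peakiness(pitch_histogram(f)) for f in (steady, wobbly)]
human = [4.5, 2.0]                                     # hypothetical human ratings
print(scores, spearmanr(scores, human)[0])             # rank-correlate with humans
```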
Citations: 8
Model-Based Encoding Parameter Optimization for 3D Point Cloud Compression
Qi Liu, Hui Yuan, Junhui Hou, Hao Liu, R. Hamzaoui
Rate-distortion optimal 3D point cloud compression is very challenging due to the irregular structure of 3D point clouds. For a popular 3D point cloud codec that uses octrees for geometry compression and JPEG for color compression, we first find analytical models that describe the relationship between the encoding parameters and the bitrate and distortion, respectively. We then use our models to formulate the rate-distortion optimization problem as a constrained convex optimization problem and apply an interior point method to solve it. Experimental results for six 3D point clouds show that our technique gives similar results to exhaustive search at only about 1.57% of its computational cost.
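A hedged sketch of the model-based formulation (the analytical rate and distortion model forms and constants below are invented for illustration, and scipy's SLSQP solver stands in for the paper's interior point method): minimize distortion over the octree level and JPEG quality subject to a bitrate budget.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical fitted constants of the analytical rate/distortion models
a1, a2 = 2.0, 0.2          # rate model: R(l, q) = a1*l + a2*q
b1, b2 = 5.0, 0.03         # distortion model: D(l, q) = b1*exp(-l/2) + b2*(100/q)

def rate(x):
    l, q = x
    return a1 * l + a2 * q

def distortion(x):
    l, q = x
    return b1 * np.exp(-0.5 * l) + b2 * (100.0 / q)

budget = 30.0              # target bitrate
res = minimize(distortion, x0=[5.0, 50.0],
               bounds=[(1, 12), (1, 100)],        # octree level, JPEG quality
               constraints=[{"type": "ineq", "fun": lambda x: budget - rate(x)}])
print(res.x, rate(res.x), distortion(res.x))      # optimal (l, q) on the budget
```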
Citations: 9
Cascade and Lifting Structures in the Spectral Domain for Bipartite Graph Filter Banks
David B. H. Tay, Antonio Ortega, Aamir Anis
In classical multirate filter bank systems, the cascade (product) of simple polyphase matrices is an important technique for the theory, design, and implementation of filter banks. A particularly important class of cascades uses elementary matrices and leads to the well-known lifting scheme in wavelets. In this paper, the theory and principles of cascade and lifting structures for bipartite graph filter banks are developed. Accurate spectral characterizations of these structures using equivalent subgraphs are presented, and some features of the structures that arise in the graph case but not in the classical case are discussed.
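For reference, the classical lifting ladder that the paper generalizes to bipartite graphs can be written in a few lines; each predict/update step is an elementary, trivially invertible operation (Haar coefficients shown, with the even/odd split playing the role of the two bipartite sets):

```python
import numpy as np

x = np.random.randn(16)
even, odd = x[0::2].copy(), x[1::2].copy()   # the two polyphase components

d = odd - even            # predict step: detail = odd minus prediction from even
s = even + 0.5 * d        # update step: smoothed approximation

# Each elementary step is inverted by running the ladder backwards:
even_r = s - 0.5 * d
odd_r = d + even_r
x_r = np.empty_like(x)
x_r[0::2], x_r[1::2] = even_r, odd_r
print(np.allclose(x, x_r))                   # True: perfect reconstruction
```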
Citations: 4
Lapped Cuboid-based Perceptual Encryption for Motion JPEG Standard
Kosuke Shimizu, Taizo Suzuki, K. Kameyama
This paper proposes cuboid-based perceptual encryption (Cd-PE) and a variant of cube-based perceptual encryption (C-PE), named lapped cuboid-based perceptual encryption (LCd-PE), to enhance the security of Motion JPEG (MJPEG). Although C-PE provides a high level of security by dealing with several frames of the input video sequence simultaneously, keyless attackers may illegally decrypt the encrypted video sequence with conceivable attacks such as a cube-based jigsaw puzzle solver (CJPS) attack. LCd-PE subdivides a video sequence pre-encrypted with C-PE into small cuboids and encrypts it further so that attacks such as CJPS cannot be conducted. Experiments show that the compression performance of an encryption-then-compression (ETC) system with LCd-PE and MJPEG is almost equivalent to that of one using C-PE, yet achieves a higher level of security.
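A toy sketch of keyed cuboid scrambling (block sizes and the permutation-only scheme are illustrative; the actual Cd-PE/LCd-PE operations and the lapping across cube boundaries are not reproduced):

```python
import numpy as np

rng = np.random.default_rng(seed=42)                 # the secret key
video = np.arange(8 * 16 * 16).reshape(8, 16, 16)    # frames x height x width
t, h, w = 4, 8, 8                                    # cuboid dimensions

# Split the sequence into 2x2x2 = 8 cuboids and shuffle them with the key
blocks = (video.reshape(2, t, 2, h, 2, w)
               .transpose(0, 2, 4, 1, 3, 5)
               .reshape(-1, t, h, w))
perm = rng.permutation(len(blocks))
encrypted = blocks[perm]

# Decryption inverts the keyed permutation and reassembles the cuboids
dec = np.empty_like(encrypted)
dec[perm] = encrypted
restored = (dec.reshape(2, 2, 2, t, h, w)
               .transpose(0, 3, 1, 4, 2, 5)
               .reshape(8, 16, 16))
print(np.array_equal(video, restored))               # True
```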
Citations: 0
Progressive Neural Network-based Knowledge Transfer in Acoustic Models
Takafumi Moriya, Ryo Masumura, Taichi Asami, Yusuke Shinohara, Marc Delcroix, Y. Yamaguchi, Y. Aono
This paper presents a novel deep neural network architecture for transfer learning in acoustic models. A well-known approach to transfer learning is to use target-domain data to fine-tune a model pre-trained on the source domain, training the model so as to raise its performance in the target domain. However, this approach may not fully utilize the knowledge of the pre-trained model, because the pre-trained knowledge is forgotten as the model is updated on the target domain. To solve this problem, we propose a new architecture based on progressive neural networks (PNN) that transfers knowledge without forgetting and can thus fully exploit pre-trained knowledge. In addition, we introduce an enhanced PNN that uses feature augmentation to better leverage pre-trained knowledge. The proposed architecture is evaluated in experiments on three different recorded Japanese speech recognition tasks (one source-domain and two target-domain tasks). In a comparison with various transfer learning approaches, our proposal achieves the lowest error rate on the target tasks.
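A minimal PNN sketch (layer sizes and the single lateral connection are illustrative, not the paper's acoustic-model topology): the source column is frozen, so its knowledge cannot be forgotten, while the target column receives a lateral connection from the source's hidden activations.

```python
import torch
import torch.nn as nn

class ProgressiveNet(nn.Module):
    """Target column with a lateral connection from a frozen source column."""
    def __init__(self, src, dim=64, n_out=10):
        super().__init__()
        self.src = src
        for p in self.src.parameters():
            p.requires_grad = False        # source knowledge is never overwritten
        self.hidden = nn.Linear(dim, dim)  # target column
        self.lateral = nn.Linear(dim, dim) # lateral from the source hidden layer
        self.out = nn.Linear(dim, n_out)

    def forward(self, x):
        hs = torch.relu(self.src[0](x))                  # frozen source activations
        ht = torch.relu(self.hidden(x) + self.lateral(hs))
        return self.out(ht)

# Pre-trained source-domain model: one hidden layer plus an output layer
source = nn.Sequential(nn.Linear(64, 64), nn.Linear(64, 10))
model = ProgressiveNet(source)
logits = model(torch.randn(8, 64))        # (8, 10) target-domain predictions
```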
Citations: 9
Relevant Phonetic-aware Neural Acoustic Models using Native English and Japanese Speech for Japanese-English Automatic Speech Recognition
Ryo Masumura, Suguru Kabashima, Takafumi Moriya, Satoshi Kobashikawa, Y. Yamaguchi, Y. Aono
This paper proposes relevant phonetic-aware neural acoustic models that leverage native Japanese speech and native English speech to improve automatic speech recognition (ASR) of Japanese-English speech. Acoustic models specific to Japanese-English are needed for accurate transcription, since Japanese-English exhibits pronunciations that differ from those of native English speech. The major problem is that it is difficult to collect large amounts of Japanese-English speech for constructing acoustic models. Our motivation is therefore to efficiently leverage the large amounts of available native English and native Japanese speech, since Japanese-English is strongly influenced by both. Our idea is to utilize them indirectly to enhance the phonetic awareness of Japanese-English acoustic models: native English speech can be expected to improve the classification of English-like phonemes, while native Japanese speech can be expected to improve the classification of Japanese-like phonemes. In the proposed relevant phonetic-aware neural acoustic models, this idea is implemented by utilizing bottleneck features of native English and native Japanese neural acoustic models. Our experiments construct the models using 300 hours of Japanese-English speech, 1,500 hours of native Japanese speech, and 900 hours of native English speech. We demonstrate the effectiveness of our proposal using evaluation data sets that involve four levels of Japanese-English.
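A hedged sketch of the bottleneck-feature augmentation idea (all dimensions and network shapes are placeholders): frozen native-English and native-Japanese extractors produce bottleneck activations that are concatenated with the input features of the Japanese-English acoustic model.

```python
import torch
import torch.nn as nn

feat_dim, bn_dim, n_states = 40, 32, 2000   # placeholder dimensions

# Frozen bottleneck extractors from the native English / Japanese models
en_bn = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, bn_dim))
ja_bn = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, bn_dim))
for extractor in (en_bn, ja_bn):
    for p in extractor.parameters():
        p.requires_grad = False             # pre-trained extractors stay fixed

# Japanese-English acoustic model trained on the augmented features
am = nn.Sequential(nn.Linear(feat_dim + 2 * bn_dim, 512), nn.ReLU(),
                   nn.Linear(512, n_states))

x = torch.randn(8, feat_dim)                           # 8 frames of features
augmented = torch.cat([x, en_bn(x), ja_bn(x)], dim=-1) # phonetic-aware input
scores = am(augmented)                                 # (8, n_states)
```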
Citations: 1
Accurate OD Traffic Matrix Estimation Based on Resampling of Observed Flow Data
Simon Kase, M. Tsuru, M. Uchida
Observing the statistical characteristics of global flows, defined as series of packets between networks, is important for the management and operation of the Internet. However, because the Internet is a diverse and large-scale system run by multiple distributed authorities, it is not practical (and sometimes impossible) to directly measure the precise statistical characteristics of global flows. In this paper, we consider the problem of estimating the traffic rate of every unobservable global flow between a corresponding origin-destination (OD) pair (hereafter, “individual-flows”) from measured aggregated traffic rates of individual flows (hereafter, “aggregated-flows”), which can be easily measured at certain links (e.g., router interfaces) in a network. To solve the OD traffic matrix estimation problem, the prior method uses an inverse function mapping from the probability distributions of the traffic rates of aggregated-flows to those of individual-flows. However, because this inverse function method is executed recursively, the estimation accuracy is heavily affected by the initial values of the recursion and by variation in the measurement data. To address this issue and improve estimation accuracy, we propose a method based on resampling of the measurement data to obtain a set of candidate solutions for OD traffic matrix estimation. Performance evaluations using a real traffic trace demonstrate that the proposed method achieves better estimation accuracy than the prior method.
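A toy sketch of the resampling idea (the routing matrix, noise model, flow rates, and nonnegative least-squares solver below are synthetic stand-ins, not the paper's estimator): bootstrap the observed link-load samples and solve each resample for an OD estimate, yielding a set of solution candidates rather than a single recursive fit.

```python
import numpy as np
from scipy.optimize import nnls

A = np.array([[1, 1, 0],                  # link x individual-flow routing matrix
              [0, 1, 1],
              [1, 0, 1]], dtype=float)
x_true = np.array([5.0, 2.0, 7.0])        # unobservable individual-flow rates
obs = A @ x_true + 0.3 * np.random.randn(200, 3)   # observed aggregated-flow rates

rng = np.random.default_rng(0)
candidates = []
for _ in range(100):                      # resample observations, solve each draw
    idx = rng.integers(0, len(obs), len(obs))
    boot = obs[idx].mean(axis=0)
    candidates.append(nnls(A, boot)[0])   # nonnegative least-squares estimate

print(np.mean(candidates, axis=0), x_true)  # candidate-set mean vs. true rates
```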
Citations: 1