
2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC): Latest Papers

On the construction of more human-like chatbots: Affect and emotion analysis of movie dialogue data
Rafael E. Banchs
Affect and emotion are inherent properties of human-human communication and interaction. Recent research on chatbots and conversational agents aims to make human-machine interaction more human-like in both behavioral and attitudinal terms. This paper presents some baby steps in this direction by analyzing a large dialogue dataset in terms of tonal, affective and emotional bias, with the objective of providing a valuable resource for developing and training data-driven conversational agents with discriminative power across such dimensions. Preliminary results of the analysis show that only a relatively small, although not negligible, percentage of the dialogue turns present a clear orientation in any of the considered dimensions. Further research is needed to determine whether this proportion is enough to bias system responses so as to create different personality trends in conversational agents that humans can perceive when interacting with them.
DOI: 10.1109/APSIPA.2017.8282245 | Published: 2017-12-15
Citations: 13
Locomotion control of a serpentine crawling robot inspired by central pattern generators
Jiadong Wang, Wenjuan Ouyang, Wenchao Gao, Qinyuan Ren
Serpentine locomotion is highly coordinated and highly adaptive in cluttered environments. These outstanding and unique characteristics were acquired through millions of years of evolution. Endowing robots with such characteristics is highly desirable and is one of the ultimate aims of biomimetic research. To achieve this goal, we adopt a central pattern generator (CPG) inspired controller to generate serpentine locomotion in a crawling robot. According to biological studies, CPGs are sets of neuronal circuits responsible for producing the rhythmic motion employed in animal locomotion. The proposed locomotion generation approach uses a set of coupled Kuramoto oscillators to imitate the CPG in a nervous system. Moreover, to deal with dynamically changing environments, a feedback control strategy based on fuzzy logic is investigated. Finally, the proposed control approach is verified through experiments on a crawling robot prototype.
DOI: 10.1109/APSIPA.2017.8282067 | Published: 2017-12-15
Citations: 2
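The coupled-oscillator idea in the serpentine robot abstract above can be illustrated with a minimal sketch. It is not the authors' controller: the chain topology, the nearest-neighbour coupling, the fixed phase lag, the explicit Euler integration and the names `kuramoto_step` and `joint_angles` are all assumptions made for illustration, and the fuzzy-logic feedback is left out.

```python
import math

def kuramoto_step(phases, omega, coupling, phase_lag, dt):
    """Advance a chain of coupled Kuramoto oscillators by one Euler step.

    Hypothetical setup: every oscillator shares the frequency `omega` and is
    coupled only to its neighbours, which pull it toward a fixed phase lag.
    """
    n = len(phases)
    new_phases = []
    for i in range(n):
        dtheta = omega
        if i > 0:
            # pull toward phases[i - 1] - phases[i] == phase_lag
            dtheta += coupling * math.sin(phases[i - 1] - phases[i] - phase_lag)
        if i < n - 1:
            # pull toward phases[i] - phases[i + 1] == phase_lag
            dtheta += coupling * math.sin(phases[i + 1] - phases[i] + phase_lag)
        new_phases.append(phases[i] + dtheta * dt)
    return new_phases

def joint_angles(phases, amplitude):
    """Map oscillator phases to joint angle commands (radians)."""
    return [amplitude * math.sin(p) for p in phases]
```

Once the phase differences settle at `phase_lag`, the joint angles form a wave that travels along the body, which is the basic ingredient of a serpentine gait.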
Pose-invariant kinematic features for action recognition
M. Ramanathan, W. Yau, E. Teoh, N. Magnenat-Thalmann
Recognizing actions from videos is difficult due to factors such as dynamic backgrounds, occlusion and pose variations. To tackle the pose variation problem, we propose a simple method based on a novel set of pose-invariant kinematic features encoded in a human body-centric space. The proposed framework begins with detection of the neck point, which serves as the origin of the body-centric space. We propose a deep learning based classifier to detect the neck point based on the output of a fully connected network layer. With the help of the detected neck, a propagation mechanism is proposed to divide the foreground region into head, torso and leg grids. The motion observed in each of these body part grids is represented using a set of pose-invariant kinematic features. These features represent the motion of the foreground or body region with respect to the detected neck point's motion and are encoded based on view in a human body-centric space. Based on these features, pose-invariant action recognition can be achieved. Because a body-centric space is used, actions with non-upright human postures can also be handled easily. To test the method's effectiveness on non-upright postures, a new dataset is introduced with 8 non-upright actions performed by 35 subjects in 3 different views. Experiments have been conducted on benchmark datasets and the newly proposed non-upright action dataset to identify limitations and gain insights into the proposed framework.
DOI: 10.1109/APSIPA.2017.8282038 | Published: 2017-12-15
Citations: 0
Robust template matching using scale-adaptive deep convolutional features
Jonghee Kim, Jinsu Kim, Seokeon Choi, Muhammad Abul Hasan, Changick Kim
In this paper, we propose a robust and efficient template matching method based on deep convolutional features. The originality of the proposed method is that it is based on a scale-adaptive feature extraction approach. This approach is motivated by the observation that each layer in a CNN represents a different level of deep features of the actual image contents. To keep the features scalable, we extract deep feature vectors of the template and the input image adaptively from a layer of a CNN. Using this scalable and deep representation of the image contents, we attempt to solve template matching by measuring the similarity between the features of the template and the input image with an efficient similarity measure called normalized cross-correlation (NCC). Using NCC helps avoid the redundant computations on adjacent patches caused by the sliding window approach. As a result, the proposed method achieves state-of-the-art template matching performance at a significantly lower computational cost than the state-of-the-art methods in the literature.
DOI: 10.1109/APSIPA.2017.8282124 | Published: 2017-12-14
Citations: 11
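The similarity measure named in the template matching abstract above, normalized cross-correlation, can be sketched in a few lines. This is only an illustration under loud assumptions: it scores raw 2D arrays rather than CNN feature maps, it uses the zero-mean form of NCC, and it searches with a brute-force sliding window, which is exactly the redundancy the paper says it avoids. The names `ncc` and `match_template` are hypothetical.

```python
import math

def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equal-sized 2D lists."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # flat patch or template: treat as "no match"
    return sum(x * y for x, y in zip(da, db)) / denom

def match_template(image, template):
    """Return ((row, col), score) of the best-scoring window in `image`."""
    th, tw = len(template), len(template[0])
    best_score, best_pos = -2.0, None  # NCC scores lie in [-1, 1]
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score
```

Because the score is normalized, a window that differs from the template only by a brightness offset or a positive gain still scores close to 1.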
CNN-based bottleneck feature for noise robust query-by-example spoken term detection
Hyungjun Lim, Younggwan Kim, Yoonhoe Kim, Hoirin Kim
This paper addresses the problem of query-by-example spoken term detection (QbE-STD) in the presence of background noise, which is inevitable in real applications. To deal with this, we propose a convolutional neural network (CNN) based bottleneck feature representation for a keyword. A combined network, made by attaching a bottleneck layer on top of a CNN, is trained on the Wall Street Journal (WSJ) database. Dynamic time warping (DTW) based template matching is then performed to measure the distance between the enrollment and test feature matrices extracted from the bottleneck layer. The proposed method is evaluated in terms of equal error rate (EER) on the Aurora 4 database. A series of experimental results verifies that the proposed method performs significantly better than the baseline system in noisy environments, showing over 30% relative EER improvement on average.
DOI: 10.1109/APSIPA.2017.8282220 | Published: 2017-12-14
Citations: 9
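The DTW-based template matching step in the spoken term detection abstract above can be sketched as follows. The sketch assumes plain Euclidean frame distances, the basic three-way step pattern and no path-length normalization; the abstract does not state which variants the authors used, and the name `dtw_distance` is hypothetical. It requires Python 3.8 or later for `math.dist`.

```python
import math

def dtw_distance(query, test):
    """Accumulated DTW alignment cost between two feature sequences.

    Each sequence is a list of equal-length feature vectors, for example
    one bottleneck feature vector per frame.
    """
    n, m = len(query), len(test)
    inf = float("inf")
    # cost[i][j]: best cost of aligning the first i query frames
    # with the first j test frames
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(query[i - 1], test[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # test frame repeated
                                 cost[i][j - 1],      # query frame repeated
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

A test utterance that is a time-stretched copy of the query yields a cost of zero, which is why DTW suits keywords spoken at different speeds.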
A perception system for robot arms to convey objects to in-car passengers
Li Jun, Tee Keng Peng, C. Lawrence, Wan Kong Wah, Yau Wei Yun
Automatically delivering objects to in-car passengers has many potential applications. Such a system generally consists of two sub-systems: a perception system and an action system. The perception system locates the targets' positions and the action system delivers objects to the targets. In this paper, we propose a novel perception system with two major functions: estimating reaching points and discovering potential risks. The reaching points are the locations the robot arm needs to reach. They should be comfortable for passengers to reach and should keep a safe distance from the car body. To achieve this, all vehicle components that may cause a collision (side surfaces, side mirrors, etc.) need to be detected. Potential risks are usually caused by moving objects or by a change in door state (closed to open) during the operation, so both situations must be monitored. Our offline test shows that the accuracy of reaching point estimation reaches up to 94% and that the response time for detecting moving objects or door state changes is less than 1 millisecond.
DOI: 10.1109/APSIPA.2017.8282065 | Published: 2017-12-12
Citations: 1
Motion planning of a 6-Dofs robot arm for bandaging nursing task
Yi Feng, Zhifeng Huang, Yun Zhang
In this paper, motion planning of a 6-DOF robot arm for a bandaging nursing task is proposed. As society ages, elderly people with limited mobility are easily injured in daily life, and a robot that can handle simple medical care tasks can be a great help to them. In this task, the Denavit-Hartenberg (DH) method is used to establish the robot model, and the RPY coordinate transformation is used to plan the spiral trajectory. Collisions between the joints or links of the robot and the injured area are then detected. Based on the detection result, the end effector's posture is changed to avoid collision and thus avoid damage to the wound. A simulation experiment using this method is carried out in MATLAB, and collision-free trajectory motion is realized in the bandaging simulation.
DOI: 10.1109/APSIPA.2017.8282066 | Published: 2017-12-01
Citations: 3
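Two building blocks mentioned in the bandaging abstract above, the DH link model and a spiral path, can be sketched as follows. The paper works in MATLAB and its details are not given in the abstract, so this Python sketch rests on assumptions: the standard (distal) DH convention, a limb modeled as a cylinder along the z-axis, and position-only waypoints without the RPY orientation. The names `dh_transform` and `helix_waypoints` are hypothetical.

```python
import math

def dh_transform(theta, d, a, alpha):
    """4x4 homogeneous transform of one link in the standard DH convention:
    rotate theta about z, translate d along z, translate a along x,
    rotate alpha about x."""
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [
        [ct, -st * ca, st * sa, a * ct],
        [st, ct * ca, -ct * sa, a * st],
        [0.0, sa, ca, d],
        [0.0, 0.0, 0.0, 1.0],
    ]

def helix_waypoints(radius, pitch, turns, points_per_turn):
    """Positions on a helix around the z-axis (the assumed limb axis).

    `pitch` is the advance along z per full turn, i.e. the spacing
    between successive wraps of the bandage.
    """
    waypoints = []
    for k in range(turns * points_per_turn + 1):
        phi = 2.0 * math.pi * k / points_per_turn
        waypoints.append((radius * math.cos(phi),
                          radius * math.sin(phi),
                          pitch * k / points_per_turn))
    return waypoints
```

Multiplying the six link transforms gives the end effector pose for a set of joint angles; a planner would then look for joint angles whose pose follows the helix waypoints while keeping the links clear of the limb.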
Joint unsupervised adaptation of n-gram and RNN language models via LDA-based hybrid mixture modeling
Ryo Masumura, Taichi Asami, H. Masataki, Y. Aono
This paper reports an initial study of unsupervised adaptation that assumes simultaneous use of both n-gram and recurrent neural network (RNN) language models (LMs) in automatic speech recognition (ASR). It is known that combining n-gram and RNN LMs is a more effective approach to ASR than using either of them alone. However, while various unsupervised adaptation methods specific to either n-gram LMs or RNN LMs have been examined, methods that adapt both simultaneously have not been presented. To handle different LMs in a unified unsupervised adaptation framework, our key idea is to introduce mixture modeling for both n-gram LMs and RNN LMs. Mixture modeling can handle multiple LMs simultaneously, and unsupervised adaptation can be accomplished merely by adjusting the mixture weights using a recognition hypothesis of the input speech. This paper proposes joint unsupervised adaptation achieved by hybrid mixture modeling that uses both n-gram mixture models and RNN mixture models. We present latent Dirichlet allocation based hybrid mixture modeling for effective topic adaptation. Our experiments on lecture ASR tasks show the effectiveness of joint unsupervised adaptation. We also report performance when only one of the n-gram or RNN LMs is adapted.
DOI: 10.1109/APSIPA.2017.8282277 | Published: 2017-12-01
Citations: 0
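The idea in the language model abstract above, adapting a mixture of LMs by adjusting only its mixture weights against a recognition hypothesis, can be illustrated with a generic sketch. It assumes the weights are re-estimated with the standard EM update for linear interpolation; the abstract does not say which estimator the authors use, and the LDA-based part of their method is not reproduced here. The name `adapt_mixture_weights` and the input layout are hypothetical.

```python
def adapt_mixture_weights(component_probs, iterations=50):
    """Re-estimate linear interpolation weights with EM.

    component_probs[t][k] is the probability that component LM k assigns
    to the t-th word of the recognition hypothesis given its history.
    All probabilities are assumed to be strictly positive.
    """
    k = len(component_probs[0])
    weights = [1.0 / k] * k  # start from uniform weights
    for _ in range(iterations):
        totals = [0.0] * k
        for probs in component_probs:
            mix = sum(w * p for w, p in zip(weights, probs))
            for j in range(k):
                # E-step: responsibility of component j for this word
                totals[j] += weights[j] * probs[j] / mix
        # M-step: new weight is the average responsibility
        weights = [t / len(component_probs) for t in totals]
    return weights
```

Components that explain the hypothesis well gain weight, so the mixture shifts toward the topic of the input speech without retraining any component model.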
A free Kazakh speech database and a speech recognition baseline
Ying Shi, Askar Hamdullah, Zhiyuan Tang, Dong Wang, T. Zheng
Automatic speech recognition (ASR) has improved significantly for major languages such as English and Chinese, partly due to the emergence of deep neural networks (DNNs) and large amounts of training data. For minority languages, however, progress lags far behind the mainstream. A particular obstacle is that there are almost no large-scale speech databases for minority languages; the few that exist are held by institutes as private property, are far from open and standard, and are rarely free. Besides speech databases, phonetic and linguistic resources, including phone sets, lexicons and language models, are also scarce. In this paper, we publish a speech database in Kazakh, a major minority language in western China. Accompanying this database, a full set of phonetic and linguistic resources is also published, with which a full-fledged Kazakh ASR system can be constructed. We describe the recipe for constructing a baseline system and report our present results. The resources are free for research institutes and can be obtained on request. The publication is supported by the NSFC-funded M2ASR project, which aims to build multilingual ASR systems for minority languages in China.
DOI: 10.1109/APSIPA.2017.8282133 | Published: 2017-12-01
Citations: 5
Development of under-resourced Bahasa Indonesia speech corpus
E. Cahyaningtyas, D. Arifianto
Although Bahasa Indonesia is used by about 263 million people worldwide, it is classified as an under-resourced language. In this paper we outline the development of a Bahasa Indonesia speech corpus of casual sentences, comprising a speech database and its transcription. First, we selected casual Bahasa Indonesia sentences from movie and drama transcripts, forming 1029 declarative sentences and 500 question sentences. We hired six professional radio news readers to utter the sentences in a sound-proof booth, so as to avoid local dialects. Segmentation and labeling were then performed to create a transcription that includes the time label of each individual phoneme. To ensure the quality of the database, we manually inspected the waveform and the frequency content of the individual sentences using spectrograms. The results suggest that the speech corpus may be used for speech processing projects such as speech recognition and speech synthesis. In ongoing research, we are developing high-quality speech synthesis, namely speaker adaptation and speaker averaging.
DOI: 10.1109/APSIPA.2017.8282191 (https://doi.org/10.1109/APSIPA.2017.8282191), published 2017-12-01
Citations: 7