
Latest publications from the Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific

Selection of best match keyword using spoken term detection for spoken document indexing
Kentaro Domoto, T. Utsuro, N. Sawada, H. Nishizaki
This paper presents a novel keyword-selection-based spoken-document indexing framework that selects the best match keyword from query candidates using spoken term detection (STD) for spoken document retrieval. Our method first creates a keyword set containing keywords that are likely to appear in a spoken document. Next, STD is conducted with all the keywords as query terms, yielding a detection result: a set of keywords and their detection intervals in the spoken document. Keywords with competitive intervals are ranked by STD matching cost, and the best one with the longest duration among the competitive detections is selected. This is the final output of the STD process and serves as an index word for the spoken document. The proposed framework was evaluated on lecture speeches as spoken documents in an STD task. The results show that our framework is quite effective at preventing false-detection errors and at annotating spoken documents with keyword indices.
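The selection step in the abstract (rank competitive detections by matching cost, prefer the longest duration) can be sketched as follows; the `Detection` record and the overlap-grouping strategy are illustrative assumptions, not the authors' implementation:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    keyword: str
    start: float   # detection interval start, in seconds
    end: float     # detection interval end, in seconds
    cost: float    # STD matching cost (lower is better)

    @property
    def duration(self):
        return self.end - self.start

def overlaps(a, b):
    """Two detections are 'competitive' if their intervals overlap in time."""
    return a.start < b.end and b.start < a.end

def select_index_keywords(detections):
    """Group competitive (time-overlapping) detections, then keep one
    representative per group: lowest matching cost, ties broken toward
    the longest duration. The survivors serve as index words."""
    remaining = sorted(detections, key=lambda d: d.start)
    selected = []
    while remaining:
        seed = remaining.pop(0)
        group, rest = [seed], []
        for d in remaining:
            (group if any(overlaps(d, g) for g in group) else rest).append(d)
        remaining = rest
        best = min(group, key=lambda d: (d.cost, -d.duration))
        selected.append(best)
    return selected
```

For example, detections of "network" (1.0–1.8 s, cost 0.2) and "net" (1.0–1.4 s, cost 0.2) compete; the tie on cost is resolved toward the longer "network".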
DOI: 10.1109/APSIPA.2014.7041589
Citations: 0
Self-learning-based signal decomposition for multimedia applications: A review and comparative study
Li-Wei Kang, C. Yeh, Duan-Yu Chen, Chia-Tsung Lin
Decomposition of a signal (e.g., an image or video) into multiple semantic components has been an effective research topic for various image/video processing applications, such as image/video denoising, enhancement, and inpainting. In this paper, we present a survey of signal decomposition frameworks based on the use of sparsity and morphological diversity in signal mixtures, and of their applications in multimedia. First, we analyze existing MCA (morphological component analysis) based image decomposition frameworks with their applications and explore the potential limitations of these approaches for image denoising. Then, we discuss our recently proposed self-learning-based image decomposition framework and its applications to several image/video denoising tasks, including single-image rain-streak removal, denoising, deblocking, and joint super-resolution and deblocking of highly compressed images/videos. Exploiting the sparse representation and morphological diversity of image signals, the proposed framework first learns an over-complete dictionary from the high-frequency part of an input image for reconstruction purposes. An unsupervised or supervised clustering technique is then applied to the dictionary atoms to identify the morphological component corresponding to the noise pattern of interest (e.g., rain streaks, blocking artifacts, or Gaussian noise). Unlike prior learning-based approaches, our method requires neither training data collected in advance nor image priors. Our experimental results confirm the effectiveness and robustness of the proposed framework, which has been shown to outperform state-of-the-art approaches.
DOI: 10.1109/APSIPA.2014.7041778
Citations: 8
R-cube: A dialogue agent for restaurant recommendation and reservation
Seokhwan Kim, Rafael E. Banchs
This paper describes a hybrid dialogue system for restaurant recommendation and reservation. The proposed system combines rule-based and data-driven components in a flexible architecture aimed at reducing error propagation along the different steps of the dialogue management and processing pipeline. The system implements three basic subsystems, for restaurant recommendation, selection, and booking, which leverage the same system architecture and processing components. The specific system described here operates on a data collection of Singapore's F&B industry, but it can easily be adapted to any other city or location by simply replacing the underlying data collection.
DOI: 10.1109/APSIPA.2014.7041732
Citations: 18
Real-time depth map generation using hybrid multi-view cameras
Yunseok Song, Dong-Won Shin, Eunsang Ko, Yo-Sung Ho
In this paper, we present a hybrid multi-view camera system for real-time depth generation. We set up eight color cameras and three depth cameras. For simple test scenarios, we capture a single object in a blue-screen studio. The objective is depth map generation at the eight color viewpoints. Due to hardware limitations, the depth cameras produce low-resolution images, i.e., 176×144. Thus, we warp the depth data to the color camera views (1280×720) and then execute filtering. Joint bilateral filtering (JBF) is used to exploit range and spatial weights, considering color data as well. Simulation results exhibit depth generation at 13 frames per second (fps) when treating eight images as a single frame. When the proposed method is executed on one computer per depth camera, the speed becomes three times faster. Thus, we have successfully achieved real-time depth generation using a hybrid multi-view camera system.
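The JBF step combines a Gaussian spatial kernel with a range kernel computed from the registered color (guide) image. A minimal dense sketch, assuming a grayscale guide and illustrative sigma values (the paper's exact kernels and parameters are not specified here):

```python
import numpy as np

def joint_bilateral_filter(depth, guide, radius=2, sigma_s=2.0, sigma_r=16.0):
    """Refine a noisy, upsampled depth map: each output pixel is a weighted
    average of its neighborhood, with spatial weights from pixel distance and
    range weights from intensity differences in the guide (color) image."""
    h, w = depth.shape
    out = np.zeros((h, w), dtype=np.float64)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2.0 * sigma_s**2))  # fixed kernel
    dpad = np.pad(depth.astype(np.float64), radius, mode='edge')
    gpad = np.pad(guide.astype(np.float64), radius, mode='edge')
    win = 2 * radius + 1
    for y in range(h):
        for x in range(w):
            dwin = dpad[y:y + win, x:x + win]
            gwin = gpad[y:y + win, x:x + win]
            # range weights: similar guide intensity -> high weight
            rng = np.exp(-(gwin - gpad[y + radius, x + radius])**2
                         / (2.0 * sigma_r**2))
            wgt = spatial * rng
            out[y, x] = (wgt * dwin).sum() / wgt.sum()
    return out
```

The range term keeps depth edges aligned with color edges, which is why JBF suits depth refinement after warping; a real-time system would use a vectorized or GPU variant rather than this per-pixel loop.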
DOI: 10.1109/APSIPA.2014.7041683
Citations: 2
Robust anchorperson detection based on audio streams using a hybrid I-vector and DNN system
Yun-Fan Chang, Payton Lin, Shao-Hua Cheng, Kai-Hsuan Chan, Y. Zeng, Chia-Wei Liao, Wen-Tsung Chang, Y. Wang, Yu Tsao
Anchorperson segment detection enables efficient video content indexing for information retrieval. Anchorperson detection based on audio analysis has gained popularity due to its lower computational complexity and satisfactory performance. This paper presents a robust framework using a hybrid I-vector and deep neural network (DNN) system to perform anchorperson detection based on the audio streams of video content. The proposed system first applies I-vectors to extract speaker identity features from the audio data. With the extracted speaker identity features, a DNN classifier is then used to verify the claimed anchorperson identity. In addition, subspace feature normalization (SFN) is incorporated into the hybrid system for robust feature extraction, to compensate for audio mismatch issues caused by recording devices. An anchorperson verification experiment was conducted to evaluate the equal error rate (EER) of the proposed hybrid system. Experimental results demonstrate that the proposed system outperforms the state-of-the-art hybrid I-vector and support vector machine (SVM) system. Moreover, the proposed system was further enhanced by integrating SFN to effectively compensate for audio mismatch issues in anchorperson detection tasks.
DOI: 10.1109/APSIPA.2014.7041717
Citations: 4
Comparison the training methods of neural network for English and Thai character recognition
A. Saenthon, Natchanon Sukkhadamrongrak
Currently, optical character recognition (OCR) is applied in many fields, such as reading office letters and reading serial numbers on industrial parts. Most manufacturers focus on the processing time and accuracy of the inspection process. The learning method for optical character recognition uses a neural network to recognize fonts and correlate matching values. Neural networks offer many training techniques, each of which affects processing time and accuracy. Therefore, this paper compares training procedures for neural networks that recognize both Thai and English characters. The experimental results compare the error and processing time of each training technique.
DOI: 10.1109/APSIPA.2014.7041795
Citations: 1
Redefining self-similarity in natural images for denoising using graph signal gradient
Jiahao Pang, Gene Cheung, Wei Hu, O. Au
Image denoising is the most basic inverse imaging problem. As it is an under-determined problem, an appropriate definition of image priors to regularize the problem is crucial. Among recently proposed priors for image denoising are: i) the graph Laplacian regularizer, where a given pixel patch is assumed to be smooth in the graph-signal domain; and ii) the self-similarity prior, where image patches are assumed to recur throughout a natural image in non-local spatial regions. In our first contribution, we demonstrate that the graph Laplacian regularizer converges to a continuous-time functional counterpart, and that careful selection of its features can lead to a discriminant signal prior. In our second contribution, we redefine patch self-similarity in terms of patch gradients and argue that the new definition results in a more accurate estimate of the graph Laplacian matrix, and thus better image denoising performance. Experiments show that our algorithm, based on the graph Laplacian regularizer and gradient-based self-similarity, can outperform non-local means (NLM) denoising by up to 1.4 dB in PSNR.
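A sketch of the graph Laplacian construction underlying prior i): pixels of a patch are graph nodes, edge weights come from feature distances, and L = D - W so that x^T L x penalizes signals that vary across strong edges. The 4-connected topology and Gaussian intensity weights below are common textbook choices, not the paper's exact gradient-based features:

```python
import numpy as np

def patch_graph_laplacian(patch, sigma=10.0):
    """Build an unnormalized graph Laplacian L = D - W over the pixels of a
    2-D patch. Nodes are pixels (row-major); edges connect 4-neighbors with
    Gaussian weights on intensity difference (illustrative feature choice)."""
    h, w = patch.shape
    n = h * w
    W = np.zeros((n, n))
    for y in range(h):
        for x in range(w):
            i = y * w + x
            for dy, dx in ((0, 1), (1, 0)):  # right and down neighbors
                yy, xx = y + dy, x + dx
                if yy < h and xx < w:
                    j = yy * w + xx
                    wt = np.exp(-(patch[y, x] - patch[yy, xx]) ** 2
                                / (2.0 * sigma ** 2))
                    W[i, j] = W[j, i] = wt
    return np.diag(W.sum(axis=1)) - W

# The regularizer x^T L x is small when the patch x (flattened) is smooth
# with respect to the graph, which is the prior used for denoising.
```

A denoiser then minimizes a fidelity term plus this quadratic, e.g. ||y - x||^2 + mu * x^T L x, which has a closed-form linear-system solution.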
DOI: 10.1109/APSIPA.2014.7041627
Citations: 29
Recursive neural network paraphrase identification for example-based dialog retrieval
Lasguido Nio, S. Sakti, Graham Neubig, T. Toda, Satoshi Nakamura
An example-based dialog model often requires a large data collection to achieve good performance. However, when handling out-of-vocabulary (OOV) database queries, this approach is weak and handles interactions between the words in a sentence inadequately. In this work, we try to overcome these problems by utilizing recursive neural network paraphrase identification to improve the robustness of example-based dialog response retrieval. We model our dialog-pair database and the user's input query with distributed word representations, and employ recursive autoencoders and dynamic pooling to determine whether two sentences of arbitrary length have the same meaning. The distributed representations have the potential to improve handling of OOV cases, and the recursive structure can reduce confusion in example matching.
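Dynamic pooling is what lets sentences of arbitrary length be compared: the variable-size word-by-word similarity matrix of two sentences is pooled down to a fixed k×k grid before classification. A minimal sketch, assuming min-pooling over nearly equal chunks (the grid size and pooling operator here are assumptions, following the usual recursive-autoencoder paraphrase setup):

```python
import numpy as np

def dynamic_pool(sim, k=4):
    """Pool an (n, m) pairwise-similarity matrix down to a fixed (k, k)
    matrix: split rows and columns into k nearly equal index chunks and
    take the minimum within each of the k*k cells. Assumes n >= k, m >= k."""
    n, m = sim.shape
    rows = np.array_split(np.arange(n), k)
    cols = np.array_split(np.arange(m), k)
    out = np.empty((k, k))
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            out[i, j] = sim[np.ix_(r, c)].min()
    return out
```

The fixed-size output can then be fed to any classifier that decides whether the two sentences are paraphrases.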
DOI: 10.1109/APSIPA.2014.7041777
Citations: 0
Multi-agent ad hoc team partitioning by observing and modeling single-agent performance
Etkin Baris Ozgul, Somchaya Liemhetcharat, K. H. Low
Multi-agent research has focused on finding the optimal team for a task. Many approaches assume that the performance of the agents is known a priori. We are interested in ad hoc teams, where the agents' algorithms and performance are initially unknown. We focus on the task of modeling the performance of single agents through observation in training environments, and on using the learned models to partition a new environment for a multi-agent team. The goal is to minimize the number of agents used while maintaining a performance threshold for the multi-agent team. We contribute a novel model that learns an agent's performance through observations, and a partitioning algorithm that minimizes the team size. We evaluate our algorithms in simulation, and show the efficacy of our learned model and partitioning algorithm.
DOI: 10.1109/APSIPA.2014.7041644
Citations: 1
Spectral-temporal receptive fields and MFCC balanced feature extraction for noisy speech recognition
Jia-Ching Wang, Chang-Hong Lin, En-Ting Chen, P. Chang
This paper proposes a new set of acoustic features based on spectral-temporal receptive fields (STRFs). The STRF is an analysis method for studying the physiological model of the mammalian auditory system in the spectral-temporal domain. It has two different parts: the rate (in Hz), which represents the temporal response, and the scale (in cycles/octave), which represents the spectral response. With the obtained STRF, we propose an effective acoustic feature. First, the energy of each scale is calculated from the STRF. The logarithmic operation is then applied to the scale energies. Finally, the discrete cosine transform is applied to generate the proposed STRF feature. In our experiments, we combine the proposed STRF feature with conventional Mel-frequency cepstral coefficients (MFCCs) to verify its effectiveness. In a noise-free environment, the proposed feature increases the recognition rate by 17.48%. Moreover, the increase in the recognition rate ranges from 5% to 12% in noisy environments.
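The three-step feature construction (per-scale energy, then log, then DCT) can be sketched as follows; the squared-magnitude energy definition, the orthonormal DCT-II, and the number of retained coefficients are assumptions for illustration:

```python
import numpy as np

def dct2(x):
    """Orthonormal DCT-II of a 1-D vector; decorrelates the log scale
    energies, analogous to the final step of MFCC extraction."""
    n = len(x)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    basis[0] *= 1.0 / np.sqrt(2.0)
    return np.sqrt(2.0 / n) * basis @ x

def strf_scale_feature(strf, n_coeffs=8):
    """strf: 2-D array of STRF magnitudes, shape (n_scales, n_rates).
    Step 1: energy per scale (sum over rates of squared magnitude).
    Step 2: log compression.
    Step 3: DCT, keeping the first n_coeffs coefficients."""
    scale_energy = (np.abs(strf) ** 2).sum(axis=1)
    log_energy = np.log(scale_energy + 1e-12)  # epsilon guards log(0)
    return dct2(log_energy)[:n_coeffs]
```

The resulting coefficient vector would then be concatenated with the frame's MFCCs to form the combined feature the paper evaluates.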
{"title":"Spectral-temporal receptive fields and MFCC balanced feature extraction for noisy speech recognition","authors":"Jia-Ching Wang, Chang-Hong Lin, En-Ting Chen, P. Chang","doi":"10.1109/APSIPA.2014.7041624","DOIUrl":"https://doi.org/10.1109/APSIPA.2014.7041624","url":null,"abstract":"This paper aims to propose a new set of acoustic features based on spectral-temporal receptive fields (STRFs). The STRF is an analysis method for studying physiological model of the mammalian auditory system in spectral-temporal domain. It has two different parts: one is the rate (in Hz) which represents the temporal response and the other is the scale (in cycle/octave) which represents the spectral response. With the obtained STRF, we propose an effective acoustic feature. First, the energy of each scale is calculated from the STRF. The logarithmic operation is then imposed on the scale energies. Finally, the discrete Cosine transform is applied to generate the proposed STRF feature. In our experiments, we combine the proposed STRF feature with conventional Mel frequency cepstral coefficients (MFCCs) to verify its effectiveness. In a noise-free environment, the proposed feature can increase the recognition rate by 17.48%. Moreover, the increase in the recognition rate ranges from 5% to 12% in noisy environments.","PeriodicalId":231382,"journal":{"name":"Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114445257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
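The abstract above describes a three-step pipeline: per-scale energies from the STRF response, log compression, then a DCT. A minimal NumPy sketch of that pipeline follows, assuming the rate-scale response is already available as an array; the random input merely stands in for a real STRF analysis of speech, which the paper computes from an auditory model (not shown here), and the coefficient count is an illustrative choice.

```python
import numpy as np

def strf_feature(strf_response, n_coeffs=13):
    """strf_response: array of shape (n_scales, n_rates, n_frames)."""
    # 1) Energy of each scale: sum of squared magnitudes over rate and time.
    scale_energy = np.sum(np.abs(strf_response) ** 2, axis=(1, 2))
    # 2) Log compression (small floor avoids log(0)).
    log_energy = np.log(scale_energy + 1e-10)
    # 3) DCT-II over the scale axis to decorrelate; keep low-order coefficients.
    n = log_energy.size
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))
    return basis @ log_energy

rng = np.random.default_rng(0)
feat = strf_feature(rng.standard_normal((16, 8, 100)), n_coeffs=13)
print(feat.shape)  # → (13,)
```

In the paper these coefficients are concatenated with conventional MFCCs before recognition; the DCT here mirrors the final step of MFCC extraction, which is why the two feature sets combine naturally.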
期刊
Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific