
Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific: Latest Publications

Quality enhancement for feature matching on car black box videos
C. Simon, Man Hee Lee, I. Park
Video often fails to maintain consistent intensity and color tone from frame to frame, particularly when an imaging device such as a black box (dashboard) camera has to cope with rapidly changing illumination. Conventional automatic white balance algorithms cannot handle this well enough to maintain tone consistency, as observed in most commercial black box products. In this paper, a novel tone stabilization method is proposed to enhance the performance of subsequently applied algorithms, such as detecting and matching visual features across video frames. The proposed technique uses multiple anchor frames as references to smooth tone fluctuation between them. Experimental results show improved tone consistency as well as higher feature detection and matching accuracy on car black box videos whose tone varies over time.
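As a rough illustration of the anchor-frame idea, the sketch below interpolates a per-channel gain between anchor frames to suppress tone fluctuation; the fixed anchor spacing and the mean-based gain model are assumptions for illustration, not the authors' exact formulation.

```python
# A minimal sketch of anchor-based tone stabilization, assuming the video is a
# list of float RGB frames (H x W x 3 NumPy arrays with values in [0, 1]).
import numpy as np

def stabilize_tone(frames, anchor_step=30):
    """Linearly interpolate per-channel gains between anchor frames."""
    anchors = list(range(0, len(frames), anchor_step))
    if anchors[-1] != len(frames) - 1:
        anchors.append(len(frames) - 1)
    # The per-channel mean intensity of each anchor frame acts as its tone reference.
    anchor_means = {a: frames[a].reshape(-1, 3).mean(axis=0) for a in anchors}
    out = []
    for i, frame in enumerate(frames):
        # Find the surrounding anchors and the interpolation weight.
        left = max(a for a in anchors if a <= i)
        right = min(a for a in anchors if a >= i)
        w = 0.0 if left == right else (i - left) / (right - left)
        target = (1 - w) * anchor_means[left] + w * anchor_means[right]
        current = frame.reshape(-1, 3).mean(axis=0)
        gain = target / np.maximum(current, 1e-6)   # per-channel correction gain
        out.append(np.clip(frame * gain, 0.0, 1.0))
    return out
```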
Citations: 0
Improved robustness of biometric authentication system using features of utterance
Qian Shi, Y. Kajikawa
In this paper, we propose a novel biometric authentication system using motion vectors of the lips. We have previously proposed a biometric authentication system using multimodal features of utterance. However, since both the edges and the texture of the lips can easily be extracted from a still image, an imposter may be recognized as a registrant simply by presenting a still image of the registrant. The robustness of our biometric authentication system therefore needs to be enhanced, so we utilize lip motion vectors as an additional feature. The proposed authentication system combines physical traits (edges and texture) and a behavioral trait (motion vectors) in the lip region to improve security. Experimental results demonstrate that motion vectors in the lip region are effective in improving robustness against imposters and can increase the authentication rate.
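To illustrate what a lip motion-vector feature could look like, the sketch below computes dense optical flow over a lip region with OpenCV's Farneback method and summarises it per frame pair; the fixed ROI handling and the choice of Farneback flow are assumptions, not necessarily the extraction used in the paper.

```python
# A minimal sketch of lip-region motion-vector features from 8-bit grayscale frames.
import cv2
import numpy as np

def lip_motion_features(gray_frames, lip_roi):
    """gray_frames: list of grayscale frames; lip_roi: (y0, y1, x0, x1)."""
    y0, y1, x0, x1 = lip_roi
    feats = []
    for prev, curr in zip(gray_frames[:-1], gray_frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(
            prev[y0:y1, x0:x1], curr[y0:y1, x0:x1], None,
            pyr_scale=0.5, levels=3, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
        # Summarise the flow field by its mean horizontal/vertical motion and
        # magnitude, giving a low-dimensional behavioural feature per frame pair.
        mag = np.linalg.norm(flow, axis=2)
        feats.append([flow[..., 0].mean(), flow[..., 1].mean(), mag.mean()])
    return np.array(feats)
```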
Citations: 0
Enhancement of EMG-based Thai number words classification using frame-based time domain features with stacking filter
N. Srisuwan, Michael Wand, M. Janke, P. Phukpattaranont, Tanja Schultz, C. Limsakul
To overcome problems inherent in classical automatic speech recognition (e.g., ambient noise and loss of privacy), electromyography (EMG) signals from the speech production muscles are used in place of the acoustic speech signal. We investigate EMG-based speech recognition for the Thai language. In earlier work, we used five EMG channels from the facial and neck muscles to classify 11 Thai number words with a neural network classifier, employing 15 time-domain and frequency-domain features, and obtained average accuracy rates of 89.45% for audible speech and 78.55% for silent speech. This paper proposes to improve the accuracy of EMG-based Thai number word classification. Ten subjects uttered the 11 words in both audible and silent speech while five channels of the EMG signal were captured. Frame-based time-domain features with a stacking filter were used in the feature extraction stage, after which linear discriminant analysis (LDA) was applied to reduce the dimension of the feature vector, and a Hidden Markov Model (HMM) was employed in the classification stage. The results show that this combination of feature extraction, dimensionality reduction, and classification improves the average accuracy for audible speech by 3% absolute compared to the earlier work, reaching average classification rates of 92.45% and 75.73% for audible and silent speech, respectively.
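A minimal sketch of the front end described above: frame-based time-domain EMG features (mean absolute value, waveform length, zero crossings), a stacking filter over neighbouring frames, and LDA dimensionality reduction with scikit-learn. The frame length, the exact feature set, and the stacking width are illustrative assumptions; HMM training is not shown.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def td_features(signal, frame_len=256, hop=128):
    """Mean absolute value, waveform length and zero-crossing count per frame."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        mav = np.mean(np.abs(frame))
        wl = np.sum(np.abs(np.diff(frame)))
        zc = np.sum(np.diff(np.sign(frame)) != 0)
        feats.append([mav, wl, zc])
    return np.array(feats)

def stack_frames(feats, context=2):
    """Concatenate each frame with its +/- context neighbours (stacking filter)."""
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + len(feats)] for i in range(2 * context + 1)])

def reduce_dim(X_train, y_train, n_components=10):
    """Fit LDA on stacked features to shrink the vector before HMM training."""
    lda = LinearDiscriminantAnalysis(n_components=n_components)
    return lda, lda.fit_transform(X_train, y_train)
```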
Citations: 1
Combined per-user SLNR and SINR criterions for interference alignment in uplink coordinated multi-point joint reception
A. E. Rakhmania, P. Tsai, O. Setyawati
An interference alignment (IA) algorithm for uplink coordinated multi-point (CoMP) reception is proposed. For the design of the precoder at the transmitter, the unselfish per-user signal-to-leakage-and-noise ratio (SLNR) criterion is used, while the per-user signal-to-interference-and-noise ratio (SINR) criterion is adopted to determine the decoder at the receiver. The proposed algorithm does not rely on channel reciprocity and is therefore suitable when users have different transmission powers. Through an iterative procedure, we show that the per-user criterion, which keeps user data streams orthogonal, suppresses interference effectively and achieves a higher sum rate than conventional IA algorithms, such as the minimum-leakage and maximum per-stream SINR algorithms, in multi-user CoMP joint reception scenarios.
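For readers unfamiliar with the SLNR criterion, the sketch below computes the standard SLNR-maximising precoder as the dominant generalised eigenvector of the signal and leakage-plus-noise matrices; the channel shapes and noise model are illustrative assumptions, and the paper's iterative per-user decoder update is omitted.

```python
import numpy as np
from scipy.linalg import eigh

def slnr_precoder(H_desired, H_leakage_list, noise_var):
    """H_desired: desired channel (Nr x Nt); H_leakage_list: channels causing leakage."""
    Nt = H_desired.shape[1]
    signal = H_desired.conj().T @ H_desired
    leakage = sum(H.conj().T @ H for H in H_leakage_list) + noise_var * np.eye(Nt)
    # eigh solves the generalised Hermitian problem  signal v = w * leakage v;
    # the eigenvector belonging to the largest eigenvalue maximises the SLNR.
    eigvals, eigvecs = eigh(signal, leakage)
    w = eigvecs[:, -1]
    return w / np.linalg.norm(w)
```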
Citations: 2
Wiring control by RTL design for reconfigurable wave-pipelined circuits
Tomoaki Sato, S. Chivapreecha, P. Moungnoul
High-speed, low-power digital signal processing circuits with short development cycles are very important in mobile computing. Implementing them on an FPGA (Field Programmable Gate Array) is advantageous for shortening the development cycle; nevertheless, a reconfigurable device such as an FPGA designed with power awareness has not been developed. The authors have developed logic blocks for reconfigurable wave-pipelined circuits to achieve high-speed, low-power reconfigurable circuits. Wave pipelining is a circuit design technique for high-speed processing with low power consumption, and such blocks are very useful for reducing resource usage on the FPGA. However, the wiring control needed to connect them had not yet been achieved. In this paper, wiring control by RTL design is developed, and its operating speed is evaluated in a 0.18 um CMOS technology.
Citations: 4
Probabilistic growth model for dendrobium orchid
Korakoch Kongsombut, R. Chaisricharoen
The Dendrobium orchid has several plant states, each requiring a different pattern of cultivation. To deliver appropriate advice to orchid farmers, the status of their farms must be known, especially the composition of orchids in each state. To model and predict farm status from given initial data, a growth model is introduced in the form of a cumulative distribution function (CDF), which can easily be adapted to estimate the ratio of status changes from the number of plants in each state. The experiment involves around 120 orchid plants divided into four growing states, observed over more than one year. The proposed model is confirmed by the collected data, which strongly exhibits normal distribution behavior.
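A minimal sketch of a CDF-based growth model of this kind, assuming a normal CDF over time; the mean, standard deviation, and example numbers are hypothetical, not the paper's fitted values.

```python
from scipy.stats import norm

def fraction_advanced(t_days, mean_days, std_days):
    """Expected ratio of plants that have left the current state after t_days."""
    return norm.cdf(t_days, loc=mean_days, scale=std_days)

def expected_counts(n_plants, t_days, mean_days, std_days):
    """Split n_plants into (still in state, already advanced) at time t_days."""
    p = fraction_advanced(t_days, mean_days, std_days)
    return round(n_plants * (1 - p)), round(n_plants * p)

# Example with hypothetical parameters: 120 plants, transition centred at 90 days.
print(expected_counts(120, t_days=75, mean_days=90, std_days=20))  # approximately (93, 27)
```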
Citations: 1
Affected people's needs detection after the East Japan Great Earthquake — Time series analysis using LDA
T. Hashimoto, B. Chakraborty, S. Aramvith, T. Kuboyama, Y. Shirota
After the East Japan Great Earthquake on March 11, 2011, many affected people who lost their houses, jobs, and families fell into difficulties. Governmental agencies and NPOs supported them by offering relief supplies, food, evacuation centers, and temporary housing. Such support was most effective when these agencies and NPOs could detect the affected people's needs appropriately. This paper proposes a method to extract affected people's needs from social media after the earthquake and to analyze how those needs change over time. We target blogs that express the thoughts, requirements, and complaints of affected people, and adopt Latent Dirichlet Allocation (LDA), a popular topic extraction technique. We then compare the analysis results with the affected people's actual situation and real events to evaluate the effectiveness of our method, including its usefulness for decision making about providing appropriate support to affected people.
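A minimal sketch of time-sliced LDA topic extraction with scikit-learn (the paper does not specify its implementation); the per-period data layout, topic count, and vocabulary settings are illustrative assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def topics_per_period(posts_by_period, n_topics=10, n_top_words=8):
    """Fit one LDA model per time period and return the top words of each topic."""
    result = {}
    for period, posts in posts_by_period.items():
        vec = CountVectorizer(max_df=0.9, min_df=2)
        counts = vec.fit_transform(posts)
        lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
        lda.fit(counts)
        vocab = vec.get_feature_names_out()
        # Rank the vocabulary by weight within each topic.
        result[period] = [
            [vocab[i] for i in topic.argsort()[::-1][:n_top_words]]
            for topic in lda.components_
        ]
    return result
```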
Citations: 4
Enhanced local feature approach for overlapping sound event recognition
J. Dennis, T. H. Dat
In this paper, we propose a feature-based approach to address the challenging task of recognising overlapping sound events from single-channel audio. Our approach builds on our previous work on Local Spectrogram Features (LSFs), where we combined a local spectral representation of the spectrogram with a Generalised Hough Transform (GHT) voting system for recognition. Here we propose to take the output of the GHT and use it as a feature for classification, and demonstrate that such an approach can improve upon the previous knowledge-based scoring system. Experiments are carried out on a challenging set of five overlapping sound events, with the addition of non-stationary background noise and volume change. The results show that the proposed system achieves detection rates of 99% and 91% in clean and 0 dB noise conditions, respectively, a strong improvement over our previous work.
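A minimal sketch of the general idea of feeding Hough-voting scores to a classifier rather than picking the top-voted class directly; the LSF keypoint matching and GHT accumulation are omitted, and the SVM choice here is an assumption, not the paper's classifier.

```python
import numpy as np
from sklearn.svm import SVC

def vote_feature(vote_scores):
    """Normalise raw per-class GHT vote scores into a feature vector."""
    v = np.asarray(vote_scores, dtype=float)
    return v / (v.sum() + 1e-9)

# Usage sketch with hypothetical training data:
# X = np.array([vote_feature(v) for v in training_vote_scores])
# clf = SVC(kernel="rbf").fit(X, training_labels)
# predicted = clf.predict([vote_feature(test_vote_scores)])
```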
Citations: 1
An effective reduction of subproblems in design of CSD coefficient FIR filters
Takuya Imaizumi, K. Suyama
In this paper, effective methods for reducing the number of subproblems in the design of CSD (canonic signed digit) coefficient FIR (finite impulse response) filters using the branch-and-bound (BB) method are studied. The design problem can be formulated as a mixed integer programming problem and solved optimally with BB, but doing so requires solving a large number of subproblems, which leads to high computational cost. Recently, a method was proposed that reduces the number of subproblems by starting from an initial branch tree constructed from an approximate solution obtained with a heuristic. However, the achievable reduction depends on the heuristic applied. This paper studies effective methods for the reduction of subproblems, and several examples are shown to demonstrate their efficiency.
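As background on CSD coefficients, the sketch below converts an integer (quantised) coefficient to canonic signed digit form, i.e. digits {-1, 0, +1} with no two adjacent non-zero digits; the wordlength and example value are illustrative, and the branch-and-bound search itself is not shown.

```python
def to_csd(value):
    """Return CSD digits of an integer, least-significant digit first."""
    digits = []
    x = int(value)
    while x != 0:
        if x % 2:
            d = 2 - (x % 4)      # +1 if x = 1 (mod 4), -1 if x = 3 (mod 4)
            x -= d
        else:
            d = 0
        digits.append(d)
        x //= 2
    return digits or [0]

# Example: 0.359375 quantised with 7 fractional bits is 46 = 64 - 16 - 2,
# i.e. only three non-zero digits (three shift-and-add terms instead of a multiplier).
print(to_csd(46))   # [0, -1, 0, 0, -1, 0, 1]
```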
Citations: 0
Selection of best match keyword using spoken term detection for spoken document indexing
Kentaro Domoto, T. Utsuro, N. Sawada, H. Nishizaki
This paper presents a novel keyword-selection-based spoken document indexing framework that selects the best-matching keyword from query candidates using spoken term detection (STD) for spoken document retrieval. Our method first creates a keyword set containing keywords that are likely to occur in a spoken document. Next, STD is conducted with all the keywords as query terms, yielding a detection result consisting of each keyword and its detection intervals in the spoken document. For keywords with competing intervals, we rank them based on the STD matching cost and select the best one with the longest duration among the competing detections. This selection is the final output of the STD process and serves as an index word for the spoken document. The proposed framework was evaluated on lecture speech as spoken documents in an STD task. The results show that our framework is quite effective at preventing false detection errors and at annotating spoken documents with keyword indices.
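A minimal sketch of the selection step described above: detections with competing (overlapping) intervals are resolved by matching cost, preferring the longer duration; the detection record format and the exact tie-breaking rule are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    keyword: str
    start: float   # seconds
    end: float
    cost: float    # STD matching cost (lower is better)

def select_best(detections):
    """Group overlapping detections and keep one index word per group."""
    selected = []
    for det in sorted(detections, key=lambda d: d.start):
        if selected and det.start < selected[-1].end:
            # Competing interval: keep the lower-cost (then longer) detection.
            prev = selected[-1]
            better = min(prev, det,
                         key=lambda d: (d.cost, -(d.end - d.start)))
            selected[-1] = better
        else:
            selected.append(det)
    return selected
```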
Citations: 0