Latest publications: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Robust Binary Loss for Multi-Category Classification with Label Noise
Defu Liu, Guowu Yang, Jinzhao Wu, Jiayi Zhao, Fengmao Lv
Deep learning has achieved tremendous success in image classification. However, the corresponding performance leap relies heavily on large-scale accurate annotations, which are usually hard to collect in reality. It is essential to explore methods that can train deep models effectively under label noise. To address the problem, we propose to train deep models with robust binary loss functions. To be specific, we tackle the K-class classification task by using K binary classifiers. We can immediately use multi-category large-margin classification approaches, e.g., Pairwise-Comparison (PC) or One-Versus-All (OVA), to jointly train the binary classifiers for multi-category classification. Our method is robust to label noise if symmetric functions, e.g., the sigmoid loss or the ramp loss, are employed as the binary loss function in the framework of risk minimization. Learning theory reveals that our method is inherently tolerant to label noise in multi-category classification tasks. Extensive experiments over different datasets with different types of label noise clearly confirm the effectiveness of our method.
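The noise-tolerance argument hinges on a symmetric binary loss, one with l(z) + l(-z) equal to a constant. A minimal NumPy sketch of the One-Versus-All formulation with the sigmoid loss (toy dimensions; `ova_risk` and its signature are illustrative, not the authors' code):

```python
import numpy as np

def sigmoid_loss(z):
    # Symmetric binary loss: l(z) + l(-z) = 1, the property that
    # yields noise tolerance under risk minimization.
    return 1.0 / (1.0 + np.exp(z))

def ova_risk(scores, labels, num_classes):
    """One-Versus-All empirical risk with a symmetric binary loss.

    scores: (n, K) per-class margins g_k(x); labels: (n,) ints in [0, K).
    """
    n = scores.shape[0]
    y = -np.ones((n, num_classes))
    y[np.arange(n), labels] = 1.0   # +1 for the true class, -1 for the rest
    # Each of the K binary classifiers contributes one loss term per sample.
    return float(sigmoid_loss(y * scores).sum(axis=1).mean())
```

Minimizing this joint risk trains the K binary classifiers together, as in the PC/OVA framework described in the abstract.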
DOI: 10.1109/ICASSP39728.2021.9414493
Citations: 3
Privacy-Accuracy Trade-Off of Inference as Service
Yulu Jin, L. Lai
In this paper, we propose a general framework to provide a desirable trade-off between inference accuracy and privacy protection in the inference as service scenario. Instead of sending data directly to the server, the user will preprocess the data through a privacy-preserving mapping, which will increase privacy protection but reduce inference accuracy. To properly address the trade-off between privacy protection and inference accuracy, we formulate an optimization problem to find the optimal privacy-preserving mapping. Even though the problem is non-convex in general, we characterize nice structures of the problem and develop an iterative algorithm to find the desired privacy-preserving mapping.
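As an illustrative toy (not the paper's optimized mapping), additive Gaussian noise is one simple privacy-preserving preprocessing whose scale parameter traces the accuracy side of the trade-off:

```python
import numpy as np

rng = np.random.default_rng(0)

def accuracy_after_mapping(x, labels, noise_std):
    """Toy privacy-preserving mapping: additive Gaussian noise before release.

    Larger noise_std means more privacy but lower downstream inference
    accuracy; an illustrative stand-in for the optimized mapping in the paper.
    """
    z = x + rng.normal(0.0, noise_std, size=x.shape)  # released representation
    preds = (z > 0).astype(int)                       # simple threshold inference
    return float((preds == labels).mean())
```

Sweeping `noise_std` exposes the monotone accuracy loss that the paper's optimization problem balances against privacy gain.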
DOI: 10.1109/ICASSP39728.2021.9413438
Citations: 1
Deep Lung Auscultation Using Acoustic Biomarkers for Abnormal Respiratory Sound Event Detection
Upasana Tiwari, Swapnil Bhosale, Rupayan Chakraborty, S. Kopparapu
Lung auscultation is a non-invasive process of distinguishing normal respiratory sounds from abnormal ones by analyzing the airflow along the respiratory tract. With developments in Deep Learning (DL) techniques and wider access to anonymized medical data, automatic detection of specific sounds such as crackles and wheezes has been gaining popularity. In this paper, we propose to use two sets of diversified acoustic biomarkers: features extracted using the Discrete Wavelet Transform (DWT), and deep encoded features from an intermediate layer of a pre-trained Audio Event Detection (AED) model trained on sounds from daily activities. The first set of biomarkers highlights the time-frequency localization characteristics obtained from the DWT coefficients. The second set of deep encoded biomarkers captures a generalized, reliable representation, and thus compensates for the scarcity of training samples and the class imbalance in the dataset. The model trained using these features achieves a 15.05% increase in specificity over the baseline model that uses spectrogram features. Moreover, an ensemble of the DWT-feature and deep-encoded-feature models shows absolute improvements of 8.32%, 6.66% and 7.40% in sensitivity, specificity and ICBHI score, respectively, clearly outperforming the state of the art by a significant margin.
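A minimal sketch of the first biomarker set, assuming a plain Haar wavelet and per-level detail energies as the time-frequency features (the abstract does not specify the wavelet; the function name and feature choice here are illustrative):

```python
import numpy as np

def haar_dwt_energies(signal, levels=3):
    """Per-level detail-coefficient energies of a multi-level Haar DWT,
    a toy stand-in for the DWT biomarkers (wavelet choice is assumed)."""
    feats = []
    a = np.asarray(signal, dtype=float)
    for _ in range(levels):
        if len(a) % 2:                          # pad to even length
            a = np.append(a, a[-1])
        detail = (a[0::2] - a[1::2]) / np.sqrt(2.0)
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)  # approximation for next level
        feats.append(float(np.sum(detail ** 2)))
    return feats
```

A flat signal yields zero detail energy at every level, while transient events like crackles concentrate energy at the fine-scale levels.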
DOI: 10.1109/ICASSP39728.2021.9414845
Citations: 3
A Periodic Frame Learning Approach for Accurate Landmark Localization in M-Mode Echocardiography
Yinbing Tian, Shibiao Xu, Li Guo, Fu'ze Cong
Anatomical landmark localization has been a key challenge for medical image analysis. Existing research mostly adopts CNNs as the main architecture for landmark localization, but these are not well suited to image modalities with periodic structure. In this paper, we propose a novel two-stage frame-level detection and heatmap regression model for accurate landmark localization in M-mode echocardiography, which promotes better integration of global context information with local appearance. Specifically, a periodic frame detection module with an LSTM is designed to model periodic context and detect frames of systole and diastole in the original echocardiogram. Next, a CNN-based heatmap regression model is introduced to predict landmark locations in each systolic or diastolic local region. Experimental results show that the proposed model achieves an average distance error of 9.31, a 24% reduction compared to baseline models.
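The second stage must decode landmark coordinates from a regressed heatmap; a common decoding, sketched here under the assumption of a soft-argmax (the abstract does not specify the decoder), is the probability-weighted expected coordinate:

```python
import numpy as np

def soft_argmax_2d(heatmap):
    """Decode a landmark (row, col) as the softmax-weighted expected
    coordinate of a regressed heatmap (decoder choice is assumed)."""
    p = np.exp(heatmap - heatmap.max())
    p /= p.sum()                                   # normalize to a distribution
    rows, cols = np.mgrid[0:heatmap.shape[0], 0:heatmap.shape[1]]
    return float((p * rows).sum()), float((p * cols).sum())
```

Unlike a hard argmax, this decoding is differentiable, so the heatmap regressor can be trained end to end against coordinate-level losses.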
DOI: 10.1109/ICASSP39728.2021.9414375
Citations: 2
Improvements to Prosodic Alignment for Automatic Dubbing
Yogesh Virkar, Marcello Federico, Robert Enyedi, R. Barra-Chicote
Automatic dubbing is an extension of speech-to-speech translation such that the resulting target speech is carefully aligned in terms of duration, lip movements, timbre, emotion, prosody, etc. of the speaker in order to achieve audiovisual coherence. Dubbing quality strongly depends on isochrony, i.e., arranging the translation of the original speech to optimally match its sequence of phrases and pauses. To this end, we present improvements to the prosodic alignment component of our recently introduced dubbing architecture. We present empirical results for four dubbing directions – English to French, Italian, German and Spanish – on a publicly available collection of TED Talks. Compared to previous work, our enhanced prosodic alignment model significantly improves prosodic alignment accuracy and provides segmentation perceptibly better or on par with manually annotated reference segmentation.
DOI: 10.1109/ICASSP39728.2021.9414966
Citations: 16
Sparse Parameter Estimation for PMCW MIMO Radar Using Few-Bit ADCs
Chao-Yi Wu, Jian Li, T. Wong
In this work, we consider target parameter estimation of phase-modulated continuous-wave (PMCW) multiple-input multiple-output (MIMO) radars with few-bit analog-to-digital converters (ADCs). We formulate the estimation problem as a sparse signal recovery problem and modify the fast iterative shrinkage-thresholding algorithm (FISTA) to solve it. The ℓ2,1-norm is adopted to promote the sparsity in the range-Doppler-angle domain. Simulation results show that using few-bit ADCs can achieve comparable performance to many-bit ADCs when targets are widely separated. However, if targets are spaced closely, performance losses can occur when 1-bit ADCs are applied.
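A sketch of the recovery step, simplified to a plain ℓ1 penalty in place of the paper's ℓ2,1 mixed norm (function name and toy problem sizes are illustrative):

```python
import numpy as np

def fista_l1(A, y, lam=0.01, iters=500):
    """FISTA for min_x 0.5*||Ax - y||^2 + lam*||x||_1, an l1 simplification
    of the l2,1-regularized sparse recovery problem described above."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    z, t = x.copy(), 1.0
    for _ in range(iters):
        g = z - A.T @ (A @ z - y) / L      # gradient step at the momentum point
        x_new = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # shrinkage
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)  # momentum update
        x, t = x_new, t_new
    return x
```

In the radar setting, the columns of `A` would index a range-Doppler-angle grid, and the recovered support marks the estimated target parameters.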
DOI: 10.1109/ICASSP39728.2021.9414267
Citations: 1
DoA estimation of a hidden RF source exploiting simple backscatter radio tags
G. Vougioukas, A. Bletsas
Conventional direction of arrival (DoA) techniques employ multi-antenna receivers with increased complexity and cost. This work emulates a multi-antenna system using a single-antenna receiver, exploiting the beauty and simplicity of backscatter radio. More specifically, a number of simple backscatter radio tags offer copies of the hidden RF source, relayed in space and shifted in frequency, while requiring minimal time-synchronisation. The DoA of a hidden RF source was estimated with an error of less than 5 degrees, exploiting a small number of simple, ultra-low-cost backscattering tags.
DOI: 10.1109/ICASSP39728.2021.9414918
Citations: 2
A Closer Look at Audio-Visual Multi-Person Speech Recognition and Active Speaker Selection
Otavio Braga, O. Siohan
Audio-visual automatic speech recognition is a promising approach to robust ASR under noisy conditions. However, until recently it had been studied in isolation, assuming that the video of a single speaking face matches the audio, while selecting the active speaker at inference time when multiple people are on screen was set aside as a separate problem. As an alternative, recent work has proposed to address the two problems simultaneously with an attention mechanism, baking the speaker selection problem directly into a fully differentiable model. One interesting finding was that the attention indirectly learns the association between the audio and the speaking face even though this correspondence is never explicitly provided at training time. In the present work we further investigate this connection and examine the interplay between the two problems. With experiments involving over 50 thousand hours of public YouTube videos as training data, we first evaluate the accuracy of the attention layer on an active speaker selection task. Secondly, we show under closer scrutiny that an end-to-end model performs at least as well as a considerably larger two-step system that utilizes a hard decision boundary, under various noise conditions and numbers of parallel face tracks.
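The soft speaker selection can be pictured as dot-product attention of an audio query over per-track visual features; a toy NumPy sketch (dimensions and names are illustrative, not the paper's model):

```python
import numpy as np

def speaker_attention(audio_query, face_feats):
    """Soft-select among parallel face tracks: dot-product attention of an
    audio query over per-track visual features (toy dims, not the paper's)."""
    scores = face_feats @ audio_query                # (num_tracks,)
    w = np.exp(scores - scores.max())
    w /= w.sum()                                     # softmax over tracks
    return w, w @ face_feats                         # weights, attended feature
```

Because the selection is a softmax rather than a hard decision boundary, the whole pipeline stays differentiable, which is what lets the end-to-end model fold speaker selection into recognition.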
DOI: 10.1109/ICASSP39728.2021.9414160
Citations: 5
Radio Frequency Based Heart Rate Variability Monitoring
Fengyu Wang, Xiaolu Zeng, Chenshu Wu, Beibei Wang, K. Liu
Heart Rate Variability (HRV), which measures the fluctuation of heartbeat intervals, has been considered an important indicator for general health evaluation. In this paper, we present mmHRV, a contact-free HRV monitoring system using commercial millimeter-wave (mmWave) radio. We devise a heartbeat signal extractor, which optimizes the decomposition of the phase of the channel information modulated by chest movement, and thus estimates the heartbeat signal. The exact time of each heartbeat is estimated by finding the peaks of the heartbeat signal, from which the Inter-Beat Intervals (IBIs) are derived for evaluating the HRV metrics. Experimental results over 10 participants show that mmHRV measures HRV accurately, with a 3.68 ms average error in mean IBI (i.e., 99.49% accuracy).
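Once heartbeat peak times are available, standard HRV metrics follow directly from the IBIs; a minimal sketch (the metrics shown, mean IBI, SDNN and RMSSD, are standard choices and not necessarily the paper's exact set):

```python
import numpy as np

def hrv_metrics(peak_times_s):
    """Basic HRV metrics from heartbeat peak times (in seconds)."""
    ibi = np.diff(np.asarray(peak_times_s, dtype=float)) * 1000.0  # IBIs in ms
    sdnn = float(np.std(ibi, ddof=1))                   # overall variability
    rmssd = float(np.sqrt(np.mean(np.diff(ibi) ** 2)))  # short-term variability
    return {"mean_ibi_ms": float(ibi.mean()), "sdnn_ms": sdnn, "rmssd_ms": rmssd}
```

A perfectly regular heartbeat gives zero SDNN and RMSSD, so these metrics isolate exactly the interval fluctuation that HRV is defined to capture.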
DOI: 10.1109/ICASSP39728.2021.9413465
Citations: 1
Arrhythmia Classification with Heartbeat-Aware Transformer
Bin Wang, Chang Liu, Chuanyan Hu, Xudong Liu, Jun Cao
Electrocardiography (ECG) is a conventional method in arrhythmia diagnosis. In this paper, we propose a novel neural network model that treats the typical heartbeat classification task as a 'translation' problem. We introduce a Transformer structure into the model and add a heartbeat-aware attention mechanism to enhance the alignment between the encoded and decoded sequences. After training on an ECG database collected from 200k patients in over 2000 hospitals over more than 10 years, validation on an independent test dataset shows that this heartbeat-aware Transformer model outperforms the classic Transformer and other sequence-to-sequence methods. Finally, we show that visualizing the encoder-decoder attention weights provides more interpretable information about how a Transformer makes a diagnosis from raw ECG signals, which has guiding significance for clinical diagnosis.
DOI: 10.1109/ICASSP39728.2021.9413938
Citations: 6