
Latest publications: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

FINT: Field-Aware Interaction Neural Network for Click-Through Rate Prediction
Zhishan Zhao, Sen Yang, Guohui Liu, Dawei Feng, Kele Xu
As a critical component of online advertising and marketing, click-through rate (CTR) prediction has drawn considerable attention from both industry and academia. Recently, deep learning has become the mainstream methodological choice for CTR prediction. Despite sustained efforts, existing approaches still face several challenges. On the one hand, high-order interactions between features are under-explored. On the other hand, high-order interactions may neglect the semantic information carried by the low-order fields. In this paper, we propose a novel prediction method, named FINT, that employs a Field-aware INTeraction layer to explicitly capture high-order feature interactions while retaining low-order field information. To empirically investigate the effectiveness and robustness of FINT, we perform extensive experiments on three real-world datasets: KDD2012, Criteo, and Avazu. The results demonstrate that FINT significantly improves performance over existing methods without increasing the amount of computation required. Moreover, in A/B testing, the proposed method brought an approximately 2.72% increase in advertising revenue to iQIYI, a major online video app. To better promote research in the CTR field, we release our code and a reference implementation at: https://github.com/zhishan01/FINT.
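The field-aware interaction idea lends itself to a compact sketch. The exact FINT layer is defined in the paper and its released code; the NumPy sketch below only illustrates the general pattern, where the interaction weights `W`, the Hadamard mixing, and the residual wiring are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def fint_layer(V_prev, V0, W):
    """One field-aware interaction layer (illustrative sketch).

    V_prev: (m, d) field states from the previous layer
    V0:     (m, d) original field embeddings (low-order information)
    W:      (m, m) assumed learned field-interaction weights

    Each field state is combined, via an element-wise (Hadamard)
    product, with a weighted mix of the original embeddings; the
    residual term keeps the low-order field information intact.
    """
    mixed = W @ V0                   # (m, d) weighted combination of all fields
    return V_prev * mixed + V_prev   # Hadamard interaction + residual

rng = np.random.default_rng(0)
m, d = 4, 8                          # 4 fields, 8-dimensional embeddings
V0 = rng.normal(size=(m, d))
W = rng.normal(size=(m, m)) / m

V = V0
for _ in range(3):                   # stacking layers yields higher-order terms
    V = fint_layer(V, V0, W)
```

Stacking K such layers produces interaction terms of order up to K+1, while the residual path preserves each field's first-order embedding.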
DOI: 10.1109/ICASSP43922.2022.9747247 (published 2022-05-23)
Citations: 2
Unlimited Sampling with Local Averages
Dorian Florescu, A. Bhandari
Signal saturation, or clipping, is a fundamental bottleneck that limits the capability of analog-to-digital converters (ADCs). The problem arises when the input signal's dynamic range exceeds the ADC's dynamic range. To overcome this issue, an alternative acquisition protocol called the Unlimited Sensing Framework (USF) was recently proposed. This non-linear sensing scheme folds the signal (via a modulo non-linearity) before sampling; reconstruction then entails "unfolding" the high-dynamic-range input. Taking an end-to-end approach to the USF, a hardware validation called the US-ADC was recently presented. US-ADC experiments show that, in some scenarios, the samples are more accurately modelled as local averages than as ideal, pointwise measurements. In particular, this happens when the input signal frequency is much larger than the operational bandwidth of the US-ADC. Pushing such hardware limits using computational approaches motivates the study of modulo sampling and reconstruction via local averages. By incorporating a modulo-hysteresis model, both in theory and in hardware, we present a recovery algorithm with reconstruction guarantees. We also explore a practical method suited to low sampling rates. Our approach is validated via simulations and hardware experiments, bringing it a step closer to practice.
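The folding/unfolding idea can be made concrete. Below is a minimal sketch of the centered modulo non-linearity and a naive unfolding that compensates wrap-around jumps between consecutive samples. This is a simplified stand-in for the USF recovery machinery (the paper additionally handles local averages and hysteresis), and it is valid only when the input changes by less than λ between samples, i.e. under sufficient oversampling.

```python
import numpy as np

def fold(x, lam):
    """Centered modulo non-linearity: maps x into [-lam, lam)."""
    return np.mod(x + lam, 2.0 * lam) - lam

def unfold(y, lam):
    """Naive unfolding: any sample-to-sample jump larger than lam must
    come from a modulo wrap, so add back the corresponding multiple of
    2*lam. Assumes true increments of the input stay below lam."""
    d = np.diff(y)
    jumps = -2.0 * lam * np.round(d / (2.0 * lam))
    return y + np.concatenate(([0.0], np.cumsum(jumps)))

lam = 1.0
t = np.arange(200) / 200.0
x = 3.0 * np.sin(4.0 * np.pi * t)   # dynamic range 6*lam: would clip a plain ADC
y = fold(x, lam)                    # folded (modulo) samples, always in [-1, 1)
x_hat = unfold(y, lam)
```

In general the input is recovered only up to a constant multiple of 2λ; here the recovery is exact because the first sample already lies in [-λ, λ).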
DOI: 10.1109/ICASSP43922.2022.9747127 (published 2022-05-23)
Citations: 6
An Efficient Framework for Detection and Recognition of Numerical Traffic Signs
Zhishan Li, Mingmu Chen, Yifan He, Lei Xie, H. Su
Due to the variety of categories and the uneven distribution of available samples, automatic traffic sign detection and recognition remains a challenging task. For categories with little training data, existing deep learning methods cannot achieve the desired performance, and the overall detection quality is unsatisfactory as well. In this letter, we fully exploit the relationship between different traffic signs bearing digital characters and transform the category objects into multi-level classes to alleviate the uneven distribution of samples. We design a lightweight two-stage object detection framework with high real-time performance. A first-stage network obtains the category groups of traffic signs, and a second object detection network then identifies the digital characters of the detected signs. To make the first-stage prediction more accurate, we introduce a box fusion algorithm in the post-processing step and a refinement module to improve recognition performance. Experimental results show that our approach significantly outperforms the latest object detection networks and other traffic sign detectors. Even some traffic signs that appear only in the test set can be recognized accurately by our method.
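The abstract does not spell out the box fusion algorithm. As an illustration of the general idea only, here is a hypothetical IoU-based greedy fusion that replaces each cluster of overlapping candidate boxes with a score-weighted average box; the threshold and weighting scheme are assumptions, not the paper's method.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def fuse_boxes(boxes, scores, thr=0.5):
    """Greedy fusion: group candidates whose IoU with the current
    highest-scoring box exceeds thr, and replace each group by its
    score-weighted average box."""
    order = np.argsort(scores)[::-1]
    used = np.zeros(len(boxes), dtype=bool)
    fused = []
    for i in order:
        if used[i]:
            continue
        group = [j for j in order if not used[j] and iou(boxes[i], boxes[j]) >= thr]
        for j in group:
            used[j] = True
        w = scores[group] / scores[group].sum()
        fused.append(w @ boxes[group])
    return np.array(fused)

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
fused = fuse_boxes(boxes, scores)    # the two overlapping boxes merge into one
```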
DOI: 10.1109/ICASSP43922.2022.9747406 (published 2022-05-23)
Citations: 2
DeepGBASS: Deep Guided Boundary-Aware Semantic Segmentation
Qingfeng Liu, Hai Su, Mostafa El-Khamy, Kee-Bong Song
Image semantic segmentation is ubiquitous in scene-understanding applications, such as AI cameras, which require both high accuracy and efficiency. Deep learning has significantly advanced the state of the art in semantic segmentation. However, many recent semantic segmentation works consider only class accuracy and ignore the accuracy at the boundaries between semantic classes. To improve semantic boundary accuracy, we propose low-complexity Deep Guided Decoder (DGD) networks, trained with a novel Semantic Boundary-Aware Learning (SBAL) strategy. Ablation studies on Cityscapes and ADE20K-32 confirm the effectiveness of our approach with networks of different complexities. We show that our DeepGBASS approach improves the mIoU by up to an 11% relative gain and the mean boundary F1-score (mBF) by up to 39.4% when training MobileNetEdgeTPU DeepLab on the ADE20K-32 dataset.
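Both boundary-aware training and the mBF metric start from the set of pixels lying on a boundary between semantic classes. A minimal sketch of extracting that mask from a label map (a generic building block, not the paper's SBAL loss itself):

```python
import numpy as np

def boundary_mask(labels):
    """Mark pixels whose class differs from a 4-neighbour: these are the
    semantic boundaries a boundary-aware loss would up-weight."""
    m = np.zeros_like(labels, dtype=bool)
    m[:-1, :] |= labels[:-1, :] != labels[1:, :]   # down neighbour
    m[1:, :]  |= labels[1:, :]  != labels[:-1, :]  # up neighbour
    m[:, :-1] |= labels[:, :-1] != labels[:, 1:]   # right neighbour
    m[:, 1:]  |= labels[:, 1:]  != labels[:, :-1]  # left neighbour
    return m

labels = np.zeros((6, 6), dtype=int)
labels[:, 3:] = 1                   # two classes split down the middle
mask = boundary_mask(labels)        # True exactly on the two columns at the seam
```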
DOI: 10.1109/ICASSP43922.2022.9747892 (published 2022-05-23)
Citations: 0
Hierarchical Deep Learning Model with Inertial and Physiological Sensors Fusion for Wearable-Based Human Activity Recognition
Dae Yon Hwang, Pai Chet Ng, Yuanhao Yu, Yang Wang, P. Spachos, D. Hatzinakos, K. Plataniotis
This paper presents a human activity recognition (HAR) system based on wearable devices. While various approaches have been suggested for HAR, most of them focus either on 1) inertial sensors that capture physical movement, or 2) subject-dependent evaluations that are less practical in real-world settings. To this end, our work integrates sensing inputs from physiological sensors to compensate for the limitations of inertial sensors in capturing activities involving little physical movement. Physiological sensors capture physiological responses that reflect human behaviour during daily activities. To simulate a realistic application, three evaluation scenarios are considered: All-access, Cross-subject, and Cross-activity. Finally, we propose a Hierarchical Deep Learning (HDL) model that improves the accuracy and stability of HAR compared to conventional models. Our HDL model, fusing inertial and physiological sensing inputs, achieves 97.16%, 92.23%, and 90.18% average accuracy in the All-access, Cross-subject, and Cross-activity scenarios respectively, confirming the effectiveness of our approach.
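A hierarchical classifier of the kind described, coarse activity group first and then the activity within that group, can be sketched as follows. The nearest-centroid stand-in classifier, the static/dynamic grouping, and the synthetic fused features are all illustrative assumptions; the paper uses deep networks at each level.

```python
import numpy as np

class NearestCentroid:
    """Tiny stand-in classifier for the sketch (not the paper's model)."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict(self, X):
        d = ((X[:, None, :] - self.centroids_[None]) ** 2).sum(-1)
        return self.classes_[d.argmin(1)]

def hierarchical_predict(x, coarse_clf, fine_clfs):
    """Stage 1 picks an activity group, stage 2 picks the activity within
    that group; x is a fused inertial + physiological feature vector."""
    g = coarse_clf.predict(x[None])[0]
    return g, fine_clfs[g].predict(x[None])[0]

rng = np.random.default_rng(1)
# synthetic fused features: columns 0-1 inertial, column 2 physiological
X = rng.normal(size=(200, 3)) * 0.1
act = rng.integers(0, 4, size=200)      # 0:sit 1:stand 2:walk 3:run
X[:, 0] += (act >= 2) * 3.0             # movement energy separates the groups
X[:, 2] += act * 1.0                    # heart-rate-like cue separates activities
grp = (act >= 2).astype(int)            # 0: static group, 1: dynamic group

coarse = NearestCentroid().fit(X, grp)
fine = {g: NearestCentroid().fit(X[grp == g], act[grp == g]) for g in (0, 1)}
g, a = hierarchical_predict(np.array([3.0, 0.0, 3.0]), coarse, fine)
```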
DOI: 10.1109/icassp43922.2022.9747471 (published 2022-05-23)
Citations: 3
Improved Beamforming Encoding for Joint Radar and Communication
Tuomas Aittomäki, V. Koivunen
Integrated Sensing and Communication (ISAC) systems, capable of functioning as both radars and communication systems, have tremendous potential to provide significant performance gains and cost savings by sharing the same energy, spectral, and hardware resources. Managing interference in the frequency and spatial domains is a crucial task in ISAC. We consider a radar-centric scenario in which a radar system performs beamforming and transmits communication data on the side. We propose an improved method that provides good control of the transmit beampattern power, resulting in a lower communication error level. Furthermore, we propose a straightforward method for phase-coding information into radar signals.
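Controlling transmit beampattern power comes down to shaping P(θ) = a(θ)^H R a(θ), where a(θ) is the array steering vector and R the waveform covariance matrix. A minimal sketch of this standard computation for a half-wavelength ULA (a generic illustration, not the paper's encoding or optimization):

```python
import numpy as np

def steering(theta_deg, n, spacing=0.5):
    """ULA steering vector; element spacing given in wavelengths."""
    k = np.arange(n)
    return np.exp(2j * np.pi * spacing * k * np.sin(np.deg2rad(theta_deg)))

def beampattern(R, thetas, n):
    """Transmit beampattern P(theta) = a(theta)^H R a(theta)."""
    return np.array([np.real(steering(t, n).conj() @ R @ steering(t, n))
                     for t in thetas])

n = 10
w = steering(0.0, n) / np.sqrt(n)   # phased-array weights steered to broadside
R = np.outer(w, w.conj())           # rank-1 covariance: conventional beamformer
thetas = np.linspace(-90, 90, 181)
P = beampattern(R, thetas, n)       # mainlobe power n = 10 at broadside
```

Embedding communication symbols then amounts to perturbing R (e.g. sidelobe phases) while keeping P(θ) within the desired mask.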
DOI: 10.1109/icassp43922.2022.9747241 (published 2022-05-23)
Citations: 0
Low-Latency Human-Computer Auditory Interface Based on Real-Time Vision Analysis
Florian Scalvini, Camille Bordeau, Maxime Ambard, C. Migniot, Julien Dubois
This paper proposes a visuo-auditory substitution method to assist visually impaired people in scene understanding. Our approach focuses on localising persons in the user's vicinity in order to ease urban walking. Since real-time, low-latency operation is required in this context for the user's safety, we propose an embedded system. The processing is based on a lightweight convolutional neural network that performs efficient 2D person localisation. This measurement is enhanced with the corresponding person's depth information and then transcribed into a stereophonic signal via a head-related transfer function. A GPU-based implementation is presented that reaches real-time processing at 23 frames/s on a 640x480 video stream. We show experimentally that this method allows for accurate real-time audio-based localisation.
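The stereophonic rendering step can be illustrated with a drastically simplified stand-in for a measured HRTF: Woodworth's spherical-head model for the interaural time difference (ITD) plus an ad-hoc sine-law interaural level difference (ILD). All constants and the rendering scheme here are assumptions for illustration, not the paper's transfer function.

```python
import numpy as np

def binaural_cues(azimuth_deg, head_radius=0.0875, c=343.0):
    """Woodworth ITD (seconds) and a crude ILD (dB) for a source azimuth;
    positive azimuth means the source is to the right."""
    az = np.deg2rad(azimuth_deg)
    itd = (head_radius / c) * (az + np.sin(az))
    ild = 6.0 * np.sin(az)              # hypothetical 6 dB full-lateral ILD
    return itd, ild

def spatialize(mono, fs, azimuth_deg):
    """Render a mono cue into a (left, right) stereo pair by applying the
    ITD as an integer-sample delay and the ILD as a gain on each ear."""
    itd, ild = binaural_cues(azimuth_deg)
    delay = int(round(abs(itd) * fs))
    g = 10.0 ** (abs(ild) / 20.0)
    near = np.concatenate((mono * g, np.zeros(delay)))   # louder, earlier ear
    far = np.concatenate((np.zeros(delay), mono / g))    # quieter, delayed ear
    if azimuth_deg >= 0:
        return far, near                # (left, right): source on the right
    return near, far

L, R = spatialize(np.ones(100), 16000, 45.0)   # cue rendered 45 deg to the right
```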
DOI: 10.1109/icassp43922.2022.9747094 (published 2022-05-23)
Citations: 2
Dual Attention Pooling Network for Recording Device Classification Using Neutral and Whispered Speech
Abinay Reddy Naini, B. Singhal, P. Ghosh
In this work, we propose a method for classifying recording devices from the recorded speech signal. With the rapid proliferation of mobile and professional recording devices, determining the source device has many applications in forensics and in further improving various speech-based applications. This paper proposes dual- and single-attention-pooling-based convolutional neural networks (CNNs) for recording device classification using neutral and whispered speech. Experiments on five recording devices capturing simultaneous direct recordings of 88 speakers in both neutral and whispered speech, and on 21 mobile devices capturing simultaneous playback recordings, reveal that the proposed dual attention pooling based CNN outperforms the best baseline scheme. We show that whispered speech recordings yield better recording device classification performance than the corresponding neutral speech. We also demonstrate the importance of voiced/unvoiced speech and of different frequency bands in classifying the recording devices.
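Attention pooling collapses frame-level CNN features into a single utterance-level embedding by letting the network weight the frames. The sketch below, with two independent heads whose pooled outputs are concatenated, is one plausible reading of "dual" attention pooling and is not necessarily the paper's exact architecture.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_pool(H, w):
    """Attention pooling (sketch): score each frame feature h_t with a
    learned vector w, softmax the scores into frame weights, and return
    the weighted mean as the utterance-level embedding."""
    alpha = softmax(H @ w)          # (T,) non-negative frame weights, sum to 1
    return alpha @ H, alpha         # (d,) pooled embedding, plus the weights

rng = np.random.default_rng(0)
T, d = 50, 16
H = rng.normal(size=(T, d))         # frame-level CNN features
w1, w2 = rng.normal(size=d), rng.normal(size=d)
e1, a1 = attention_pool(H, w1)
e2, a2 = attention_pool(H, w2)
embedding = np.concatenate((e1, e2))  # "dual" pooling: two heads concatenated
```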
DOI: 10.1109/icassp43922.2022.9747700 (published 2022-05-23)
Citations: 1
Neural Network-Based Compression Framework for DOA Estimation Exploiting Distributed Array
S. Pavel, Yimin D. Zhang
A distributed array consisting of multiple subarrays is attractive for high-resolution direction-of-arrival (DOA) estimation when a single large-scale array is infeasible. To achieve effective distributed DOA estimation, the information observed at the subarrays must be transmitted to the fusion center, where the DOA estimation is performed. For noncoherent data fusion, the covariance matrices are used for subarray fusion. To address the complexity associated with large array sizes, we propose a compression framework consisting of multiple parallel encoders and a classifier. The parallel encoders at the distributed subarrays are trained to compress their respective covariance matrices. The compressed results are sent to the fusion center, where the signal DOAs are estimated by a classifier operating on the compressed covariance matrices.
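The subarray-side pipeline, forming a sample covariance and compressing it before transmission to the fusion center, can be sketched as below. A random linear map stands in for the trained neural encoder, and all dimensions (8 antennas, code length 6) are illustrative assumptions.

```python
import numpy as np

def sample_covariance(X):
    """X: (n_antennas, n_snapshots) complex snapshots -> (n, n) covariance."""
    return X @ X.conj().T / X.shape[1]

def encode(R, A):
    """Subarray-side encoder (sketch): a linear map A compresses the real
    and imaginary parts of the upper triangle of R (the covariance is
    Hermitian, so the upper triangle carries all the information). In the
    paper this role is played by a trained neural encoder."""
    iu = np.triu_indices(R.shape[0])
    z = R[iu]
    feat = np.concatenate((z.real, z.imag))
    return A @ feat

rng = np.random.default_rng(0)
n, snapshots, code_dim = 8, 200, 6
# one far-field source at ~20 degrees on a half-wavelength ULA
a = np.exp(1j * np.pi * np.arange(n) * np.sin(np.deg2rad(20.0)))
s = rng.normal(size=snapshots) + 1j * rng.normal(size=snapshots)
X = np.outer(a, s) + 0.1 * (rng.normal(size=(n, snapshots))
                            + 1j * rng.normal(size=(n, snapshots)))
R = sample_covariance(X)
A = rng.normal(size=(code_dim, 2 * (n * (n + 1) // 2)))  # stand-in for a trained encoder
code = encode(R, A)   # 72 real numbers compressed to 6, sent to the fusion center
```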
DOI: 10.1109/icassp43922.2022.9746724 (published 2022-05-23)
Citations: 3
Conjugate Augmented Spatial-Temporal Near-Field Sources Localization with Cross Array
Zhi-Min Jiang, Hua Chen, Wei Liu, Ye Tian, G. Wang
A new near-field source localization method is proposed for joint two-dimensional (2-D) direction-of-arrival (DOA) and range estimation based on a symmetric cross array. It first exploits the conjugate symmetry of the signal auto-correlation at different time delays to construct a conjugate augmented spatial-temporal cross-correlation matrix; it then decouples the extended steering vector, using properties of the Khatri-Rao product, to avoid the usual multi-dimensional (M-D) search; finally, three one-dimensional (1-D) MUSIC-type searches yield the estimates. The proposed method automatically pairs the multiple parameters associated with each source and also works in the underdetermined case.
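Each of the three 1-D searches is a MUSIC-type scan of a noise-subspace projection. As a self-contained illustration of that building block, here is the standard far-field 1-D version on a ULA; the paper's searches instead use decoupled near-field spatial-temporal steering vectors.

```python
import numpy as np

def music_spectrum(R, grid_deg, n, n_src, spacing=0.5):
    """1-D MUSIC-type search (far-field ULA sketch): project steering
    vectors on a grid onto the noise subspace of the covariance R, and
    return the pseudospectrum whose peaks mark the source directions."""
    vals, vecs = np.linalg.eigh(R)       # eigenvalues in ascending order
    En = vecs[:, : n - n_src]            # noise subspace: smallest eigenvalues
    P = []
    for t in grid_deg:
        a = np.exp(2j * np.pi * spacing * np.arange(n) * np.sin(np.deg2rad(t)))
        P.append(1.0 / np.linalg.norm(En.conj().T @ a) ** 2)
    return np.array(P)

rng = np.random.default_rng(2)
n, n_src = 8, 1
# one source at 20 degrees, 200 snapshots, light noise
a20 = np.exp(2j * np.pi * 0.5 * np.arange(n) * np.sin(np.deg2rad(20.0)))
s = rng.normal(size=200) + 1j * rng.normal(size=200)
X = np.outer(a20, s) + 0.05 * (rng.normal(size=(n, 200))
                               + 1j * rng.normal(size=(n, 200)))
R = X @ X.conj().T / 200
grid = np.arange(-90.0, 90.0, 0.25)
P = music_spectrum(R, grid, n, n_src)
theta_hat = grid[P.argmax()]             # peak lands at the source direction
```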
{"title":"Conjugate Augmented Spatial-Temporal Near-Field Sources Localization with Cross Array","authors":"Zhi-Min Jiang, Hua Chen, Wei Liu, Ye Tian, G. Wang","doi":"10.1109/icassp43922.2022.9746864","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746864","url":null,"abstract":"A new near-field source localization method is proposed for two-dimensional (2-D) direction-of-arrival (DOA) and range estimation based on a symmetrical cross array. It first employs the conjugate symmetry property of the signal auto-correlation at different time delays to construct a conjugate augmented spatial-temporal cross correlation matrix, then the extended steering vector is decoupled to avoid the usual multiple-dimensional (M-D) search based on the properties of the Khatri-Rao product, and finally three one-dimensional (1-D) MUSIC type searches are employed to obtain the results. The proposed method can realize automatic pairing of multiple parameters associated with each source and it also works in the underdetermined case.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126108413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
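The final stage of the method above is a set of 1-D MUSIC-type spectral searches. For reference, a minimal standard narrowband 1-D MUSIC search over a uniform linear array looks like this — note this is textbook far-field MUSIC, not the paper's conjugate augmented near-field construction, and the array size, spacing, SNR, and source angle are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 8            # sensors in a uniform linear array (assumed)
d = 0.5          # element spacing in wavelengths (assumed)
true_doa = 20.0  # source angle in degrees (assumed)
snapshots = 200

def steering(theta_deg):
    """ULA steering vector for a far-field narrowband source."""
    theta = np.deg2rad(theta_deg)
    return np.exp(2j * np.pi * d * np.arange(M) * np.sin(theta))

# simulate one source plus white complex Gaussian noise
s = (rng.standard_normal(snapshots) + 1j * rng.standard_normal(snapshots)) / np.sqrt(2)
X = np.outer(steering(true_doa), s)
X += 0.1 * (rng.standard_normal((M, snapshots)) + 1j * rng.standard_normal((M, snapshots)))

R = X @ X.conj().T / snapshots  # sample covariance
w, V = np.linalg.eigh(R)        # eigenvalues in ascending order
En = V[:, :-1]                  # noise subspace (one source assumed)

# 1-D spectral search: peak where the steering vector is orthogonal to En
grid = np.arange(-90.0, 90.5, 0.5)
spectrum = np.array([1.0 / np.linalg.norm(En.conj().T @ steering(g)) ** 2 for g in grid])
est = grid[np.argmax(spectrum)]
print(f"estimated DOA: {est:.1f} deg")
```

The paper's contribution is to set up three such 1-D searches (two angles plus range) on the decoupled steering vectors, rather than one expensive joint multi-dimensional search.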
Journal
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)