
Latest articles from IEEE Open Journal of Signal Processing

L3DAS23: Learning 3D Audio Sources for Audio-Visual Extended Reality
IF 2.9 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-03-12 DOI: 10.1109/OJSP.2024.3376297
Riccardo F. Gramaccioni;Christian Marinoni;Changan Chen;Aurelio Uncini;Danilo Comminiello
The primary goal of the L3DAS (Learning 3D Audio Sources) project is to stimulate and support collaborative research studies concerning machine learning techniques applied to 3D audio signal processing. To this end, the L3DAS23 Challenge, presented at IEEE ICASSP 2023, focuses on two spatial audio tasks of paramount interest for practical uses: 3D speech enhancement (3DSE) and 3D sound event localization and detection (3DSELD). Both tasks are evaluated within augmented reality applications. The aim of this paper is to describe the main results obtained from this challenge. We provide the L3DAS23 dataset, which comprises a collection of first-order Ambisonics recordings in reverberant simulated environments. We maintain some general characteristics of the previous L3DAS challenges, featuring a pair of first-order Ambisonics microphones to capture the audio signals and involving multiple-source and multiple-perspective Ambisonics recordings. However, in this new edition, we introduce audio-visual scenarios by including images that depict the frontal view of the environments as captured from the perspective of the microphones. This addition aims to enrich the challenge experience, giving participants tools for exploring a combination of audio and images for solving the 3DSE and 3DSELD tasks. In addition to a brand-new dataset, we provide updated baseline models designed to take advantage of audio-image pairs. To ensure accessibility and reproducibility, we also supply a supporting API for effortless replication of our results. Lastly, we present the results achieved by the participants of the L3DAS23 Challenge.
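To make the 3DSELD setting more concrete: a classic way to localize a source from first-order Ambisonics (B-format) signals is the pseudo-intensity vector, sketched below. The channel ordering (W, X, Y, Z), frame length, and function name are illustrative assumptions; this is not the challenge baseline, which should be obtained from the official L3DAS23 API.

```python
import numpy as np

def pseudo_intensity_doa(foa, frame=1024):
    """Per-frame direction-of-arrival estimates from first-order
    Ambisonics via the pseudo-intensity vector.
    foa: (4, n_samples) array, assumed channel order W, X, Y, Z."""
    w, x, y, z = foa
    n_frames = len(w) // frame
    doas = []
    for i in range(n_frames):
        s = slice(i * frame, (i + 1) * frame)
        # Time-averaged product of the omni channel with each
        # figure-of-eight channel approximates the acoustic intensity.
        v = np.array([np.mean(w[s] * x[s]),
                      np.mean(w[s] * y[s]),
                      np.mean(w[s] * z[s])])
        doas.append(v / (np.linalg.norm(v) + 1e-12))
    return np.stack(doas)  # (n_frames, 3) unit vectors pointing at the source
```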
Citations: 0
Auditory EEG Decoding Challenge for ICASSP 2023
IF 2.9 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-03-12 DOI: 10.1109/OJSP.2024.3376296
Mohammad Jalilpour Monesi;Lies Bollens;Bernd Accou;Jonas Vanthornhout;Hugo Van Hamme;Tom Francart
This paper describes the auditory EEG challenge, organized as one of the Signal Processing Grand Challenges at ICASSP 2023. The challenge provides EEG recordings of 85 subjects who listened to continuous speech, as audiobooks or podcasts, while their brain activity was recorded. EEG recordings of 71 subjects were provided as a training set so that challenge participants could train their models on a relatively large dataset. The remaining 14 subjects were used as held-out subjects in evaluating the challenge. The challenge consists of two tasks that relate electroencephalogram (EEG) signals to the presented speech stimulus. The first task, match-mismatch, aims to determine which of two speech segments induced a given EEG segment. In the second, regression task, the goal is to reconstruct the speech envelope from the EEG. For the match-mismatch task, the performance of different teams was close to the baseline model, and the models generalized well to unseen subjects. In contrast, for the regression task, the top teams significantly improved over the baseline models on the held-out stories test set while failing to generalize to unseen subjects.
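For readers new to the regression task, a standard starting point is a linear "backward" model: ridge regression from time-lagged EEG to the speech envelope, scored by Pearson correlation. The sketch below is a minimal stand-in, not the challenge baseline; the lag count and regularization strength are arbitrary assumptions.

```python
import numpy as np

def lagged(eeg, n_lags):
    """Stack time-lagged copies of EEG: (time, ch) -> (time, ch * n_lags)."""
    t, c = eeg.shape
    X = np.zeros((t, c * n_lags))
    for l in range(n_lags):
        X[l:, l * c:(l + 1) * c] = eeg[:t - l]
    return X

def fit_backward_model(eeg, envelope, n_lags=32, lam=1e2):
    """Closed-form ridge decoder: w = (X'X + lam*I)^-1 X'y."""
    X = lagged(eeg, n_lags)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ envelope)

def score(eeg, envelope, w, n_lags=32):
    """Pearson correlation between reconstructed and true envelopes."""
    pred = lagged(eeg, n_lags) @ w
    return np.corrcoef(pred, envelope)[0, 1]
```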
Citations: 0
Person Identification and Relapse Detection From Continuous Recordings of Biosignals Challenge: Overview and Results
IF 2.9 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-03-12 DOI: 10.1109/OJSP.2024.3376300
Athanasia Zlatintsi;Panagiotis P. Filntisis;Niki Efthymiou;Christos Garoufis;George Retsinas;Thomas Sounapoglou;Ilias Maglogiannis;Panayiotis Tsanakas;Nikolaos Smyrnis;Petros Maragos
This paper presents an overview of the e-Prevention: Person Identification and Relapse Detection Challenge, an open call for researchers at ICASSP-2023. The challenge aimed at the analysis and processing of long-term continuous recordings of biosignals from wearable sensors, namely accelerometers, gyroscopes and heart rate monitors embedded in smartwatches, as well as sleep information and daily step counts, in order to extract high-level representations of the wearer's activity and behavior, termed digital phenotypes. Specifically, with the goal of analyzing the ability of these digital phenotypes to quantify behavioral patterns, two tasks were evaluated in two distinct tracks: 1) identification of the wearer of the smartwatch, and 2) detection of psychotic relapses in patients in the psychotic spectrum. The long-term data used in this challenge were acquired during the course of the e-Prevention project (Zlatintsi et al., 2022), an innovative integrated system for medical support that facilitates effective monitoring and relapse prevention in patients with mental disorders. Two baseline systems, one for each task, were described, and the validation scores for both tasks were provided to the participants. Herein, we present an overview of the approaches and methods, the performance analysis, and the results of the top-5 ranked participating teams, which achieved accuracies between 91% and 95% in track 1 and mean PR- and ROC-AUC scores between 0.6051 and 0.6489 in track 2. Finally, we also make the datasets publicly available at https://robotics.ntua.gr/eprevention-sp-challenge/.
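The track-2 metrics are standard and easy to reproduce; the snippet below computes ROC-AUC and PR-AUC with scikit-learn on hypothetical per-day relapse scores. The labels, scores, and the final averaging are made-up illustrations, mirroring the mean PR-/ROC-AUC reporting above.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

# Hypothetical anomaly scores for daily segments of one patient:
# label 1 = relapse day, 0 = stable; higher score = more anomalous.
labels = np.array([0, 0, 0, 1, 1, 0, 0, 1])
scores = np.array([0.10, 0.20, 0.15, 0.70, 0.60, 0.30, 0.25, 0.55])

roc = roc_auc_score(labels, scores)
pr = average_precision_score(labels, scores)  # PR-AUC
print(f"ROC-AUC = {roc:.4f}, PR-AUC = {pr:.4f}, mean = {(roc + pr) / 2:.4f}")
```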
Citations: 0
ICASSP 2023 Speech Signal Improvement Challenge
IF 2.9 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-03-12 DOI: 10.1109/OJSP.2024.3376293
Ross Cutler;Ando Saabas;Babak Naderi;Nicolae-Cătălin Ristea;Sebastian Braun;Solomiya Branets
The ICASSP 2023 Speech Signal Improvement Challenge is intended to stimulate research in the area of improving speech signal quality in communication systems. Speech signal quality can be measured with SIG in ITU-T P.835 and is still a top issue in audio communication and conferencing systems. For example, in the ICASSP 2022 Deep Noise Suppression challenge, the improvement in background and overall quality was impressive, but the improvement in the speech signal was not statistically significant. To improve the speech signal, the following impairment areas must be addressed: coloration, discontinuity, loudness, reverberation, and noise. A training and test set was provided for the challenge, and the winners were determined using an extended crowdsourced implementation of ITU-T P.804's listening phase. The results show significant improvement was made across all measured dimensions of speech quality.
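As an illustration of how per-dimension listening scores are typically aggregated, the sketch below averages hypothetical 1-5 crowdsourced ratings into a mean opinion score with a normal-approximation confidence interval. The dimension names follow the impairment areas listed above; the ratings are randomly generated placeholders, not P.804 data.

```python
import numpy as np

rng = np.random.default_rng(0)
dims = ["coloration", "discontinuity", "loudness", "reverberation", "noise", "overall"]
# Hypothetical ratings: 40 crowdsourced votes per dimension on a 1-5 scale.
ratings = {d: rng.integers(1, 6, size=40) for d in dims}

for d in dims:
    r = ratings[d]
    mos = r.mean()
    ci95 = 1.96 * r.std(ddof=1) / np.sqrt(len(r))  # normal-approx 95% CI
    print(f"{d:>13}: MOS = {mos:.2f} +/- {ci95:.2f}")
```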
Citations: 0
Spatial Sigma-Delta Modulation for Coarsely Quantized Massive MIMO Downlink: Flexible Designs by Convex Optimization
Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-03-11 DOI: 10.1109/OJSP.2024.3375653
Wai-Yiu Keung;Wing-Kin Ma
This article considers the context of multiuser massive MIMO downlink precoding with low-resolution digital-to-analog converters (DACs) at the transmitter. This subject is motivated by the consideration that it is expensive to employ high-resolution DACs for practical massive MIMO implementations. The challenge with using low-resolution DACs is to overcome the detrimental quantization error effects. Recently, spatial Sigma-Delta ($\Sigma\Delta$) modulation has arisen as a viable way to put quantization errors under control. This approach takes insight from temporal $\Sigma\Delta$ modulation in classical DAC studies. Assuming a 1D uniform linear transmit antenna array, the principle is to shape the quantization errors in space such that the shaped quantization errors are pushed away from the user-serving angle sector. In previous studies, spatial $\Sigma\Delta$ modulation was performed by direct application of the basic first- and second-order modulators from the $\Sigma\Delta$ literature. In this paper, we develop a general $\Sigma\Delta$ modulator design framework for any given order, any given number of quantization levels, and any given angle sector. We formulate our design as a problem of maximizing the signal-to-quantization-and-noise ratios (SQNRs) experienced by the users. The formulated problem is convex and can be efficiently solved by available solvers. Our proposed framework offers the alternative option of focused quantization error suppression in accordance with channel state information. Our framework can also be extended to 2D planar transmit antenna arrays. We perform a numerical study under different operating conditions, and the numerical results suggest that, given a moderate number of quantization levels, say, 5 to 7 levels, our optimization-based $\Sigma\Delta$ modulation schemes can lead to bit error rate performance close to that of the unquantized counterpart.
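To see what spatial $\Sigma\Delta$ modulation does mechanically, here is a first-order, error-feedback sketch across a uniform linear array, which shapes the quantization error toward high spatial frequencies, i.e., away from a sector around broadside. The quantizer grid and broadside-serving assumption are illustrative; the paper's framework generalizes to higher orders, arbitrary level counts, and arbitrary angle sectors via convex optimization.

```python
import numpy as np

def first_order_spatial_sigma_delta(x, levels):
    """First-order spatial Sigma-Delta quantization over a ULA.
    x: complex per-antenna signals; levels: 1D real quantizer grid
    applied independently to real and imaginary parts."""
    def q(v):  # nearest-level scalar quantizer
        return (levels[np.argmin(np.abs(levels - v.real))]
                + 1j * levels[np.argmin(np.abs(levels - v.imag))])
    out = np.empty_like(x)
    err = 0j
    for n in range(len(x)):
        u = x[n] - err      # subtract the previous antenna's quantization error
        out[n] = q(u)
        err = out[n] - u    # error to be shaped onto the next antenna
    return out              # spatial spectrum: X(z) + (1 - z^{-1}) E(z)
```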
Citations: 0
Representation Synthesis by Probabilistic Many-Valued Logic Operation in Self-Supervised Learning
IF 2.9 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-03-10 DOI: 10.1109/OJSP.2024.3399663
Hiroki Nakamura;Masashi Okada;Tadahiro Taniguchi
In this paper, we propose a new self-supervised learning (SSL) method for representations that enable logic operations. Representation learning has been applied to various tasks, such as image generation and retrieval. The logical controllability of representations is important for these tasks. Although some methods have been shown to enable intuitive control of representations using natural language as the input, representation control via logic operations between representations has not been demonstrated. Some SSL methods using representation synthesis (e.g., elementwise mean and maximum operations) have been proposed, but the operations performed in these methods do not incorporate logic operations. In this work, we propose a logic-operable self-supervised representation learning method by replacing the existing representation synthesis with the OR operation on the probabilistic extension of many-valued logic. The representations comprise a set of feature-possession degrees, which are truth values indicating the presence or absence of each feature in the image, and realize the logic operations (e.g., OR and AND). Our method can generate a representation that has the features of both input representations or only those features common to both. Furthermore, the ambiguous presence of a feature is expressed by representing the feature-possession degree as a probability distribution over the truth values of the many-valued logic. We show that our method performs competitively in single- and multi-label classification tasks compared with prior SSL methods using representation synthesis. Moreover, experiments on image retrieval using MNIST and PascalVOC show that the representations learned by our method can be operated on by OR and AND operations.
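The logic operations themselves are simple once representations are read as vectors of feature-possession degrees in [0, 1]. A common probabilistic instantiation uses the algebraic sum for OR and the algebraic product for AND, as sketched below; the paper's exact operator may differ, so treat this as an illustration of the idea.

```python
import numpy as np

def prob_or(p, q):
    """Algebraic sum: degree that a feature is present in at least one input."""
    return p + q - p * q

def prob_and(p, q):
    """Algebraic product: degree that a feature is present in both inputs."""
    return p * q

# Hypothetical feature-possession degrees of two images.
a = np.array([0.9, 0.1, 0.5])
b = np.array([0.2, 0.8, 0.5])
print(prob_or(a, b))   # [0.92 0.82 0.75] -> features of either image
print(prob_and(a, b))  # [0.18 0.08 0.25] -> features common to both
```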
Citations: 0
PtychoDV: Vision Transformer-Based Deep Unrolling Network for Ptychographic Image Reconstruction
Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-03-08 DOI: 10.1109/OJSP.2024.3375276
Weijie Gan;Qiuchen Zhai;Michael T. McCann;Cristina Garcia Cardona;Ulugbek S. Kamilov;Brendt Wohlberg
Ptychography is an imaging technique that captures multiple overlapping snapshots of a sample, illuminated coherently by a moving localized probe. The image recovery from ptychographic data is generally achieved via an iterative algorithm that solves a nonlinear phase retrieval problem derived from measured diffraction patterns. However, these iterative approaches have high computational cost. In this paper, we introduce PtychoDV, a novel deep model-based network designed for efficient, high-quality ptychographic image reconstruction. PtychoDV comprises a vision transformer that generates an initial image from the set of raw measurements, taking into consideration their mutual correlations. This is followed by a deep unrolling network that refines the initial image using learnable convolutional priors and the ptychography measurement model. Experimental results on simulated data demonstrate that PtychoDV is capable of outperforming existing deep learning methods for this problem, and significantly reduces computational cost compared to iterative methodologies, while maintaining competitive performance.
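The deep-unrolling part of such pipelines follows a common skeleton: alternate a gradient step on the data-fit term with a learned refinement module for a fixed number of iterations. The sketch below is that generic skeleton with placeholder operators, not PtychoDV's actual forward model or trained priors.

```python
import numpy as np

def unrolled_reconstruction(y, forward, adjoint, priors, x0, step=0.5):
    """Generic unrolled reconstruction: K gradient steps on
    0.5 * ||forward(x) - y||^2, each followed by a learned prior module."""
    x = x0
    for prior in priors:
        grad = adjoint(forward(x) - y)  # gradient of the data-fit term
        x = prior(x - step * grad)      # learned refinement (placeholder)
    return x

# Toy usage: identity forward operator, shrinkage toward the mean as "prior".
y = np.random.default_rng(1).normal(size=64)
shrink = lambda v: 0.9 * v + 0.1 * v.mean()
x_hat = unrolled_reconstruction(y, lambda v: v, lambda r: r,
                                [shrink] * 5, x0=np.zeros_like(y))
```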
Citations: 0
Towards a Geometric Understanding of Spatiotemporal Graph Convolution Networks
IF 2.9 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-03-03 DOI: 10.1109/OJSP.2024.3396635
Pratyusha Das;Sarath Shekkizhar;Antonio Ortega
Spatiotemporal graph convolutional networks (STGCNs) have emerged as a desirable model for skeleton-based human action recognition. Despite achieving state-of-the-art performance, there is a limited understanding of the representations learned by these models, which hinders their application in critical and real-world settings. While layerwise analysis of CNN models has been studied in the literature, to the best of our knowledge, there exists no study on the layerwise explainability of the embeddings learned on spatiotemporal data using STGCNs. In this paper, we first propose to use a local Dataset Graph (DS-Graph) obtained from the feature representation of input data at each layer to develop an understanding of the layer-wise embedding geometry of the STGCN. To do so, we develop a window-based dynamic time warping (DTW) method to compute the distance between data sequences with varying temporal lengths. To validate our findings, we have developed a layer-specific Spatiotemporal Graph Gradient-weighted Class Activation Mapping (L-STG-GradCAM) technique tailored for spatiotemporal data. This approach enables us to visually analyze and interpret each layer within the STGCN network. We characterize the functions learned by each layer of the STGCN using the label smoothness of the representation and visualize them using our L-STG-GradCAM approach. Our proposed method is generic and can yield valuable insights for STGCN architectures in different applications. However, this paper focuses on the human activity recognition task as a representative application. Our experiments show that STGCN models learn representations that capture general human motion in their initial layers while discriminating different actions only in later layers. This justifies experimental observations showing that fine-tuning deeper layers works well for transfer between related tasks. We provide experimental evidence for different human activity datasets and advanced spatiotemporal graph networks to validate that the proposed method is general enough to analyze any STGCN model and can be useful for drawing insight into networks in various scenarios. We also show that noise at the input has a limited effect on label smoothness, which can help justify the robustness of STGCNs to noise.
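Label smoothness, the quantity used above to characterize each layer, can be illustrated on a simple k-nearest-neighbor dataset graph as the fraction of edges whose endpoints share a class label. The paper builds its DS-Graphs from feature representations with a windowed DTW distance over variable-length sequences; the Euclidean kNN below is a simplified stand-in.

```python
import numpy as np

def label_smoothness(features, labels, k=10):
    """Fraction of kNN-graph edges connecting same-label samples.
    features: (n, d) layer embeddings; labels: (n,) class ids.
    Higher values mean the layer's representation is smoother w.r.t. labels."""
    dist = np.linalg.norm(features[:, None] - features[None, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # exclude self-edges
    agree = 0
    for i in range(len(features)):
        nbrs = np.argsort(dist[i])[:k]
        agree += np.sum(labels[nbrs] == labels[i])
    return agree / (len(features) * k)
```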
Citations: 0
Attention-Based End-to-End Differentiable Particle Filter for Audio Speaker Tracking
Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-02-08 DOI: 10.1109/OJSP.2024.3363649
Jinzheng Zhao;Yong Xu;Xinyuan Qian;Haohe Liu;Mark D. Plumbley;Wenwu Wang
Particle filters (PFs) have been widely used in speaker tracking due to their capability in modeling a non-linear process or a non-Gaussian environment. However, particle filters are limited by several issues. For example, pre-defined handcrafted measurements are often used, which can limit model performance. In addition, the transition and update models are often preset, which makes PFs less flexible to adapt to different scenarios. To address these issues, we propose an end-to-end differentiable particle filter framework that employs multi-head attention to model long-range dependencies. The proposed model employs self-attention as the learned transition model and cross-attention as the learned update model. To our knowledge, this is the first proposal combining a particle filter and a transformer for speaker tracking, where the measurement extraction, transition, and update steps are integrated into an end-to-end architecture. Experimental results show that the proposed model achieves superior performance over the recurrent baseline models.
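A minimal PyTorch sketch of the stated design: self-attention over the particle set as the learned transition, cross-attention from particles to a measurement embedding as the learned update, and a linear head producing particle weights. The dimensions, weight head, and weighted-mean readout are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AttentionParticleFilter(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.transition = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.update = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.weight_head = nn.Linear(dim, 1)

    def forward(self, particles, measurement):
        # particles: (B, n_particles, dim); measurement: (B, n_obs, dim)
        moved, _ = self.transition(particles, particles, particles)    # transition
        refined, _ = self.update(moved, measurement, measurement)      # update
        w = torch.softmax(self.weight_head(refined).squeeze(-1), -1)   # weights
        state = (w.unsqueeze(-1) * refined).sum(dim=1)  # weighted state estimate
        return state, w
```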
Citations: 0
Efficient Channel-Temporal Attention for Boosting RF Fingerprinting
Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-02-06 DOI: 10.1109/OJSP.2024.3362695
Hanqing Gu;Lisheng Su;Yuxia Wang;Weifeng Zhang;Chuan Ran
In recent years, Deep Convolutional Neural Networks (DCNNs) have been widely used to solve the Radio Frequency (RF) fingerprinting task. DCNNs are capable of learning the proper convolution kernels driven by data and directly extracting RF fingerprints from raw In-phase/Quadrature (IQ) data; these fingerprints arise from variations or minor flaws in transmitters' circuits, enabling the identification of a specific transmitter. One of the main challenges in employing this sort of technology is how to optimize the model design so that it can automatically learn discriminative RF fingerprints and show robustness to changes in environmental factors. To this end, this paper proposes ECTAttention, an Efficient Channel-Temporal Attention block that can be used to enhance the feature learning capability of DCNNs. ECTAttention has two parallel branches. On the one hand, it automatically mines the correlation between channels through channel attention to discover and enhance important convolution kernels. On the other hand, it can recalibrate the feature map through temporal attention. ECTAttention has good flexibility and high efficiency, and can be combined with existing DCNNs to effectively enhance their feature learning ability while adding only a small amount of computational cost, so as to achieve high-precision RF fingerprinting. Our experimental results show that ResNet enhanced by ECTAttention can identify 10 USRP X310 SDRs with an accuracy of 97.5%, and achieves a recognition accuracy of 91.9% for 56 actual ADS-B signal sources in an unconstrained acquisition environment.
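A generic sketch of the channel-temporal idea in PyTorch: one branch gates channels via squeeze-and-excitation-style pooling, the other gates time steps via a small 1-D convolution, and the two recalibrations are combined. Layer sizes, the kernel width, and the additive combination are assumptions; this is not the exact ECTAttention block.

```python
import torch
import torch.nn as nn

class ChannelTemporalAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel_gate = nn.Sequential(      # squeeze-and-excitation style
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())
        self.temporal_gate = nn.Sequential(     # 1-D conv over channel-pooled signal
            nn.Conv1d(1, 1, kernel_size=7, padding=3), nn.Sigmoid())

    def forward(self, x):                       # x: (batch, channels, time)
        ca = self.channel_gate(x.mean(dim=2))   # (B, C) per-channel gates
        ta = self.temporal_gate(x.mean(dim=1, keepdim=True))  # (B, 1, T) gates
        return x * ca.unsqueeze(-1) + x * ta    # combine the two parallel branches
```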
Citations: 0