
IEEE Journal of Selected Topics in Signal Processing: Latest Articles

Transferability of coVariance Neural Networks
IF 8.7 | Zone 1 (Engineering & Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-03-28 | DOI: 10.1109/JSTSP.2024.3378887
Saurabh Sihag;Gonzalo Mateos;Corey McMillan;Alejandro Ribeiro
Graph convolutional networks (GCNs) leverage topology-driven graph convolutional operations to combine information across the graph for inference tasks. In our recent work, we studied GCNs that use covariance matrices as graphs, in the form of coVariance neural networks (VNNs), and showed that VNNs share similarities with traditional principal component analysis (PCA) while overcoming its instability. In this paper, we characterize the transferability of VNNs. The notion of transferability is motivated by the intuitive expectation that learning models should generalize with minimal effort to "compatible" datasets (i.e., datasets of different dimensionalities describing the same domain). VNNs inherit the scale-free data processing architecture of GCNs, and here we show that VNNs exhibit transferability of performance (without retraining) across datasets whose covariance matrices converge to a limit object. Multi-scale neuroimaging datasets enable the study of the brain at multiple scales and hence provide an ideal scenario for validating the transferability of VNNs. We first demonstrate the quantitative transferability of VNNs on a regression task of predicting chronological age from a multi-scale dataset of cortical thickness features. Further, to elucidate the advantages VNNs offer in neuroimaging data analysis, we deploy VNNs as regression models in a pipeline for "brain age" prediction from cortical thickness features. The discordance between brain age and chronological age (the "brain age gap") can reflect increased vulnerability or resilience toward neurological disease or cognitive impairment. The architecture of VNNs allows us to go beyond the coarse metric of brain age gap and associate anatomical interpretability with the elevated brain age gap in Alzheimer's disease (AD). We leverage the transferability of VNNs to cross-validate the anatomical interpretability that VNNs offer for the brain age gap across datasets of different dimensionalities.
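To make the covariance-filter idea concrete, here is a minimal pure-Python sketch. It assumes the definition from the authors' earlier VNN work as we understand it: a coVariance filter is a polynomial in the sample covariance matrix C, z = Σ_k h_k C^k x, followed by a pointwise nonlinearity. The helper names, the tanh nonlinearity, and the tap values are hypothetical. Because the filter is parameterized only by the taps h_k, the same taps apply unchanged to covariance matrices of any dimension, which is the structural property that transfer across datasets of different dimensionalities builds on.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def sample_covariance(samples: List[Vector]) -> Matrix:
    """Unbiased sample covariance of n samples with m features each."""
    n, m = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(m)]
    cov = [[0.0] * m for _ in range(m)]
    for s in samples:
        d = [s[j] - means[j] for j in range(m)]
        for i in range(m):
            for j in range(m):
                cov[i][j] += d[i] * d[j] / (n - 1)
    return cov


def matvec(a: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def covariance_filter(cov: Matrix, x: Vector, taps: Vector) -> Vector:
    """z = sum_k taps[k] * C^k x, using repeated products instead of forming C^k."""
    out = [0.0] * len(x)
    shifted = list(x)  # C^0 x
    for k, h in enumerate(taps):
        if k > 0:
            shifted = matvec(cov, shifted)  # C^k x from C^(k-1) x
        out = [o + h * s for o, s in zip(out, shifted)]
    return out


def vnn_layer(cov: Matrix, x: Vector, taps: Vector) -> Vector:
    """One hypothetical VNN perceptron: covariance filter followed by tanh."""
    return [math.tanh(v) for v in covariance_filter(cov, x, taps)]


if __name__ == "__main__":
    taps = [0.5, 0.1, 0.01]  # hypothetical learned taps, independent of dimension
    cov3 = sample_covariance([[1.0, 2.0, 0.5], [2.0, 1.0, 1.5], [0.0, 3.0, 1.0]])
    print(vnn_layer(cov3, [1.0, 0.0, -1.0], taps))
```

The filter never forms C^k explicitly; each additional tap costs one matrix-vector product with C.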
Vol. 18, No. 2, pp. 199-215
Citations: 0
Compression Ratio Learning and Semantic Communications for Video Imaging
IF 8.7 | Zone 1 (Engineering & Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-03-27 | DOI: 10.1109/JSTSP.2024.3405853
Bowen Zhang;Zhijin Qin;Geoffrey Ye Li
Improving data acquisition and transmission efficiency is crucial for mobile robots with limited power, memory, and bandwidth resources. For efficient data acquisition, a novel video compressed-sensing system with spatially-variant compression ratios is designed, which offers high imaging quality at low sampling rates. To improve data transmission efficiency, semantic communication is leveraged to reduce bandwidth requirements, providing high image recovery quality at low transmission rates. In particular, we focus on the trade-off between rate and quality. To address this challenge, we use neural networks to determine the optimal rate allocation policy for given quality requirements. Because the rate is non-differentiable, we train the networks by policy-gradient-based reinforcement learning. Numerical results show the superiority of the proposed methods over existing baselines.
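Because the chosen rate is a discrete action, its effect on quality cannot be back-propagated; policy-gradient methods instead differentiate the log-probability of the sampled action. The toy REINFORCE sketch below learns a softmax policy over a few candidate sampling ratios. The candidate ratios, the quality model, the reward (quality minus a weighted rate penalty), the running-mean baseline, and all names are hypothetical assumptions, not the authors' networks or reward.

```python
import math
import random
from typing import List

RATIOS = [0.05, 0.10, 0.25, 0.50]  # hypothetical candidate sampling ratios


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_quality(ratio: float) -> float:
    """Hypothetical quality model with diminishing returns in the sampling ratio."""
    return 1.0 - math.exp(-10.0 * ratio)


def reward(ratio: float, rate_weight: float) -> float:
    """Rate-quality trade-off: reward quality, penalize rate."""
    return toy_quality(ratio) - rate_weight * ratio


def reinforce(steps: int = 2000, lr: float = 0.1, rate_weight: float = 1.0,
              seed: int = 0) -> List[float]:
    """Train softmax logits with REINFORCE; return the final action probabilities."""
    rng = random.Random(seed)
    logits = [0.0] * len(RATIOS)
    baseline = 0.0  # running-mean baseline to reduce gradient variance
    for _ in range(steps):
        probs = softmax(logits)
        a = rng.choices(range(len(RATIOS)), weights=probs)[0]
        r = reward(RATIOS[a], rate_weight)
        advantage = r - baseline
        baseline += 0.05 * (r - baseline)
        # Gradient of log softmax(a) w.r.t. logit i is onehot(a)[i] - probs[i].
        for i in range(len(logits)):
            grad = (1.0 if i == a else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return softmax(logits)


if __name__ == "__main__":
    print([round(p, 3) for p in reinforce()])
```

With rate_weight=1.0 the toy reward peaks at the 0.25 ratio, so the policy should concentrate there; raising rate_weight to 2.0 moves the best ratio down to 0.10, which is the rate-quality trade-off in miniature.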
Vol. 18, No. 3, pp. 312-324 | Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10539255
Citations: 0
NPR: Nocturnal Place Recognition Using Nighttime Translation in Large-Scale Training Procedures
IF 8.7 | Zone 1 (Engineering & Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-03-20 | DOI: 10.1109/JSTSP.2024.3403247
Bingxi Liu;Yujie Fu;Feng Lu;Jinqiang Cui;Yihong Wu;Hong Zhang
Visual Place Recognition (VPR) is a critical task in intelligent robotics and computer vision. It involves retrieving, from an extensive collection of known images, the database images most similar to a query photo. In real-world applications, this task is challenged by the extreme illumination changes of nighttime query images. However, no large-scale VPR training set with day-night correspondence is available. To address this challenge, we propose a novel pipeline that divides general VPR into distinct day and night domains and then tackles Nocturnal Place Recognition (NPR). Specifically, we first build a day-night street scene dataset, named NightStreet, and use it to train an unpaired image-to-image translation model. We then use this model to process existing large-scale VPR datasets, generating night versions of them, and demonstrate how to combine these with two popular VPR pipelines. Finally, we introduce a divide-and-conquer VPR framework designed to address the degradation of NPR under daytime conditions. We provide comprehensive explanations at the theoretical, experimental, and application levels. Under our framework, the performance of previous methods, including the top-ranked method, can be significantly improved on two public datasets.
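At its core, VPR is nearest-neighbor retrieval over global image descriptors, and a divide-and-conquer scheme can be pictured as routing each query to a day or night branch before retrieval. The sketch below assumes precomputed descriptors per branch, a scalar brightness score as a hypothetical day/night router, and cosine similarity; none of these choices are taken from the paper, and all function names are illustrative.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: Vector, database: Dict[str, Vector], k: int = 1) -> List[Tuple[str, float]]:
    """Return the k database entries whose descriptors are most similar to the query."""
    scored = [(name, cosine(query, desc)) for name, desc in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def route(brightness: float, threshold: float = 0.3) -> str:
    """Hypothetical router: dark queries go to the night branch."""
    return "night" if brightness < threshold else "day"


def divide_and_conquer_vpr(query_desc: Dict[str, Vector], brightness: float,
                           databases: Dict[str, Dict[str, Vector]],
                           k: int = 1) -> Tuple[str, List[Tuple[str, float]]]:
    """Pick a branch, then retrieve with that branch's query descriptor and database."""
    branch = route(brightness)
    return branch, retrieve(query_desc[branch], databases[branch], k)
```

Here `query_desc` holds the query's descriptor as produced by each branch's (hypothetical) model, so the router only decides which descriptor space the search runs in.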
Vol. 18, No. 3, pp. 368-379
Citations: 0
DoDo: Double DOE Optical System for Multishot Spectral Imaging
IF 8.7 | Zone 1 (Engineering & Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-03-17 | DOI: 10.1109/JSTSP.2024.3402320
Sergio Urrea;Roman Jacome;M. Salman Asif;Henry Arguello;Hans Garcia
Snapshot Compressive Spectral Imaging Systems (SCSI) compress the scene by capturing 2D projections of the encoded underlying signal. A decoder, trained on pre-acquired datasets, reconstructs the spectral images. SCSI systems based on diffractive optical elements (DOEs) offer a small form factor, and the single DOE can be optimized end to end. Because the spectral image is highly compressed in a single-DOE SCSI system, the reconstruction quality can be insufficient for diverse spectral imaging applications. This work proposes a multishot spectral imaging system that employs double-phase encoding with a double-DOE architecture (DoDo) to improve spectral reconstruction performance. The first DOE is fixed and provides the benefits of diffractive optical systems; the second DOE provides the variable encoding of multishot architectures. The work presents a differentiable mathematical model of the multishot DoDo system and optimizes the parameters of the DoDo architecture end to end. The proposed system was tested in simulation and with a hardware prototype. To keep the system low-cost, the implementation uses a deformable mirror as the variable DOE. The proposed DoDo system improves the PSNR of the reconstructed spectral images by up to 4 dB compared with the single-DOE system.
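The 4 dB figure is a PSNR difference. Since PSNR = 10 log10(MAX^2 / MSE), a 4 dB gain at a fixed peak value corresponds to an MSE roughly 10^0.4 ≈ 2.5 times smaller. A minimal sketch for a flattened spectral cube, assuming values normalized to [0, 1] (the peak value and the flat layout are assumptions, not details from the paper):

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float], peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized flattened cubes."""
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same number of elements")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(peak ** 2 / mse)


def mse_reduction_factor(gain_db: float) -> float:
    """Factor by which MSE must shrink, at a fixed peak, to gain `gain_db` of PSNR."""
    return 10.0 ** (gain_db / 10.0)
```

For example, `psnr([0.0, 1.0], [0.1, 0.9])` has an MSE of 0.01 and therefore evaluates to 20 dB.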
Vol. 18, No. 4, pp. 704-713
Citations: 0
ViT-MDHGR: Cross-Day Reliability and Agility in Dynamic Hand Gesture Prediction via HD-sEMG Signal Decoding
IF 8.7 | Zone 1 (Engineering & Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-03-17 | DOI: 10.1109/JSTSP.2024.3402340
Qin Hu;Golara Ahmadi Azar;Alyson Fletcher;Sundeep Rangan;S. Farokh Atashzar
Surface electromyography (sEMG) and high-density sEMG (HD-sEMG) biosignals have been extensively investigated for myoelectric control of prosthetic devices, neurorobotics, and, more recently, human-computer interfaces, because they enable hand gesture recognition/prediction in a wearable, non-invasive manner. High intraday (same-day) performance has been reported. However, interday performance (with training and testing on separate days) is substantially degraded because conventional approaches generalize poorly over time, which hinders the application of such techniques in real-life practice. Few recent studies have examined the feasibility of multi-day hand gesture recognition, and they face a major challenge: the long sEMG epochs they require introduce delays in myoelectric control that make the corresponding neural interfaces impractical. This paper proposes a compact ViT-based network for multi-day dynamic hand gesture prediction. We address this challenge because the proposed model relies only on very short HD-sEMG signal windows (i.e., 50 ms, only one-sixth of the conventional window length for real-time myoelectric implementation), boosting agility and responsiveness. The proposed model predicts 11 dynamic gestures for 20 subjects with an average accuracy of over 71% on testing days 3-25 days after training. Moreover, when calibrated on only a small portion of data from the testing day, the model achieves over 92% accuracy while retraining less than 10% of its parameters, for computational efficiency.
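The window length fixes how many samples each prediction sees, and "one-sixth of the convention" implies a conventional window of about 6 × 50 = 300 ms. The sketch below splits a multichannel recording into fixed-length windows; the 2048 Hz sampling rate used in the example, the 50% overlap default, and the function names are assumptions for illustration, since the abstract does not state them.

```python
from typing import List

Sample = List[float]  # one time step across all HD-sEMG channels


def window_length(fs_hz: float, window_ms: float) -> int:
    """Number of samples in a window of `window_ms` milliseconds at `fs_hz`."""
    return int(round(fs_hz * window_ms / 1000.0))


def sliding_windows(signal: List[Sample], fs_hz: float, window_ms: float = 50.0,
                    overlap: float = 0.5) -> List[List[Sample]]:
    """Split a (time x channels) recording into overlapping fixed-length windows."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    size = window_length(fs_hz, window_ms)
    step = max(1, int(round(size * (1.0 - overlap))))
    return [signal[start:start + size]
            for start in range(0, len(signal) - size + 1, step)]
```

At an assumed 2048 Hz, a 50 ms window holds about 102 samples versus about 614 for a 300 ms window, which is where the latency saving comes from.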
Vol. 18, No. 3, pp. 419-430
Citations: 0
Single-shot 3D Reconstruction by Fusion of Fourier Transform Profilometry and Line Clustering
IF 8.7 | Zone 1 (Engineering & Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-03-13 | DOI: 10.1109/JSTSP.2024.3400010
ZhenZhou Wang
Owing to its better accuracy and resolution, Fourier transform profilometry (FTP) is more widely used than the line-clustering (LC)-based structured light (SL) 3D reconstruction technique. However, FTP suffers from a bottleneck: phase unwrapping errors are unavoidable at occlusions and large discontinuities. In this paper, we propose a composite pattern based on the red, green, and blue (RGB) channels of a color image that fuses FTP and LC for more robust single-shot reconstruction. The red channel contains the sinusoidal pattern for FTP, and the remaining channels contain the line patterns for LC. Therefore, the intervals between adjacent lines in the line pattern can be chosen as large as possible for robust clustering without affecting the accuracy of FTP. Based on the clustered lines, the phase wrap boundary errors caused by occlusions and large discontinuities are corrected. Finally, a one-dimensional phase-wrap-boundary-guided phase unwrapping approach is proposed to solve the bottleneck problem of spatial phase unwrapping in FTP. Experimental results showed that the proposed fusion method reconstructs complex shapes with occlusions and large discontinuities more robustly than FTP or LC-based SL alone.
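Two building blocks of FTP are easy to show in one dimension: extracting the wrapped phase of a fringe row by keeping only the positive fundamental lobe of its spectrum, and classic Itoh-style 1-D unwrapping, which adds multiples of 2π wherever consecutive samples jump by more than π. This is the textbook baseline, not the phase-wrap-boundary-guided method proposed in the paper; the carrier bin, the band half-width, and the naive O(N^2) DFT are simplifying assumptions.

```python
import cmath
import math
from typing import List


def dft(x: List[complex]) -> List[complex]:
    """Naive discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum: List[complex]) -> List[complex]:
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def ftp_wrapped_phase(row: List[float], carrier: int, half_width: int) -> List[float]:
    """Keep bins carrier +/- half_width, inverse-transform, and take the angle."""
    spectrum = dft([complex(v) for v in row])
    kept = [c if abs(k - carrier) <= half_width else 0j for k, c in enumerate(spectrum)]
    return [cmath.phase(z) for z in idft(kept)]


def unwrap_1d(wrapped: List[float]) -> List[float]:
    """Itoh's method: remove 2*pi jumps between consecutive samples."""
    out = [wrapped[0]]
    offset = 0.0
    for prev, cur in zip(wrapped, wrapped[1:]):
        delta = cur - prev
        if delta > math.pi:
            offset -= 2 * math.pi
        elif delta < -math.pi:
            offset += 2 * math.pi
        out.append(cur + offset)
    return out
```

Subtracting the carrier ramp 2π·carrier·t/N from the unwrapped phase leaves the object-induced phase. Itoh's method fails exactly where the true phase jumps by more than π between neighbors, which is the occlusion and discontinuity case the paper targets.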
Vol. 18, No. 3, pp. 325-335
Citations: 0
An Efficient RGB-D Indoor Scene-Parsing Solution via Lightweight Multiflow Intersection and Knowledge Distillation
IF 8.7 | Zone 1 (Engineering & Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-03-13 | DOI: 10.1109/JSTSP.2024.3400030
Wujie Zhou;Yuming Zhang;Weiqing Yan;Lv Ye
The rapid progress of convolutional neural networks (CNNs) has significantly improved indoor scene parsing, transforming the fields of robotics, autonomous navigation, augmented reality, and surveillance. Societal demand is now pushing these technologies toward integration into mobile smart-device applications. However, the processing capabilities of mobile devices cannot meet the full system requirements of CNNs, which poses a challenge for many deep-learning applications. One promising solution is to deploy lightweight student networks. These streamlined networks learn from their robust, cloud-based counterparts (teacher networks) through knowledge distillation (KD), which reduces the parameter count and optimizes student classification. Building on this idea, a lightweight multiflow intersection network (LMINet) is proposed and developed for red-green-blue-depth (RGB-D) indoor scene parsing. The proposed method relies on dual-frequency KD (FKD) and compression KD (CKD). A multiflow intersection module is introduced to efficiently integrate feature information from disparate layers. To maximize the performance of the lightweight LMINet student (LMINet-S) network, the FKD module employs a discrete cosine transform to capture feature information at different frequencies, whereas the CKD module compresses the features of diverse layers and distills their corresponding dimensions. Experiments on the NYUDv2 and SUN-RGBD datasets demonstrate that our LMINet teacher (LMINet-T) model, LMINet-S (without KD), and LMINet-S* (LMINet-S with KD) outperform state-of-the-art scene-parsing methods without increasing the parameter count (26.2M). Consequently, the technology moves closer to integration into mobile devices.
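The FKD idea of comparing teacher and student features in a frequency domain can be illustrated with an orthonormal 1-D DCT-II applied to flattened feature vectors, followed by a (optionally band-weighted) squared error on the coefficients. This is a hypothetical sketch: the paper's exact transform layout (for example, 2-D per channel), band weighting, and loss are not given in the abstract, and the function names are illustrative.

```python
import math
from typing import List, Optional


def dct2(x: List[float]) -> List[float]:
    """Orthonormal 1-D DCT-II."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n)))
    return out


def frequency_kd_loss(student: List[float], teacher: List[float],
                      band_weights: Optional[List[float]] = None) -> float:
    """Weighted mean squared error between DCT coefficients of student and teacher features."""
    if len(student) != len(teacher):
        raise ValueError("feature vectors must have equal length")
    s, t = dct2(student), dct2(teacher)
    w = band_weights or [1.0] * len(s)
    return sum(wk * (sk - tk) ** 2 for wk, sk, tk in zip(w, s, t)) / len(s)
```

Because the orthonormal DCT preserves energy, the unweighted loss equals the plain spatial MSE; the frequency view only changes training once different bands receive different weights.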
Vol. 18, No. 3, pp. 336-345
Citations: 0
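The LMINet abstract says its FKD module applies a discrete cosine transform to capture feature information at different frequencies. The sketch below shows one way a DCT-domain distillation loss could look. It is not the authors' implementation. The orthonormal DCT-II, the split into a low-frequency top-left block plus the remaining high-frequency coefficients, the `cutoff` fraction, the band weights `w_low`/`w_high` and all helper names are assumptions. Student and teacher features are also assumed to already share the same (C, H, W) shape.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis: row k holds the k-th cosine sampled at n points."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)  # rescale the DC row so that c @ c.T = I
    return c


def dct2(feat):
    """2-D DCT over the spatial axes of a (C, H, W) feature map."""
    _, h, w = feat.shape
    return dct_matrix(h) @ feat @ dct_matrix(w).T  # matmul broadcasts over channels


def frequency_kd_loss(student, teacher, cutoff=0.25, w_low=1.0, w_high=1.0):
    """Hypothetical two-band distillation loss computed in the DCT domain.

    Coefficients in the top-left (cutoff*H) x (cutoff*W) block count as low
    frequency and all others as high frequency (cutoff must be < 1 so the
    high band is non-empty). Inputs must have the same shape.
    """
    s, t = dct2(student), dct2(teacher)
    _, h, w = s.shape
    low = np.zeros((h, w), dtype=bool)
    low[: max(1, int(h * cutoff)), : max(1, int(w * cutoff))] = True
    sq_err = (s - t) ** 2
    return w_low * sq_err[:, low].mean() + w_high * sq_err[:, ~low].mean()
```

Because the transform is orthonormal, summing the squared DCT-domain error over all coefficients gives the same value as the spatial-domain squared error. The band split therefore only reweights where the student is penalized.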
Spatial-Temporal-Based Underdetermined Near-Field 3-D Localization Employing a Nonuniform Cross Array
IF 8.7 | CAS Zone 1, Engineering & Technology | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-03-13 | DOI: 10.1109/JSTSP.2024.3400046
Hua Chen;Zelong Yi;Zhiwei Jiang;Wei Liu;Ye Tian;Qing Wang;Gang Wang
In this paper, an underdetermined three-dimensional (3-D) near-field source localization method is proposed based on a two-dimensional (2-D) symmetric nonuniform cross array. First, using the symmetric coprime array along the x-axis, a fourth-order cumulant (FOC)-based matrix is constructed and then vectorized to form a single virtual snapshot. This snapshot is equivalent to the data received by a virtual array observing virtual far-field sources, and it provides more degrees of freedom (DOFs) than the original physical array. Meanwhile, multiple delay lags, referred to as pseudo-snapshots, are introduced to address the single-snapshot issue. Next, the received data of the uniform linear array along the y-axis are processed in the same way to form another virtual array, and a cross-correlation operation is then performed with the virtual array observations constructed from the coprime array. Finally, the 2-D angles of the near-field sources are jointly estimated using the recently proposed sparse and parametric approach (SPA) and the Vandermonde decomposition technique, which eliminates the need for parameter discretization. To estimate the range, the conjugate symmetry of the signal's autocorrelation function is used to construct second-order-statistics-based received data over all array elements, and the one-dimensional (1-D) MUSIC algorithm is then applied. Some properties of the proposed array are also analyzed. Simulation results, with the 3-D parameters automatically paired, show that the proposed method achieves better estimation performance than existing algorithms for the same number of sensor elements and works in underdetermined, mixed-source scenarios.
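The final range step described above runs 1-D MUSIC once the angles are known. The following is a generic illustration of that idea, not the authors' algorithm. It makes several assumptions:

- a symmetric uniform linear array with sensor indices n = -4, ..., 4 and spacing d = λ/4;
- a Fresnel-approximation steering vector with phase γn + φn²;
- a single source whose angle has already been estimated;
- an ordinary sample covariance in place of the paper's conjugate-symmetry construction.

The helper names, the 30 dB SNR and the range grid are hypothetical.

```python
import numpy as np


def near_field_steering(theta, r, n_idx, d=0.25, wavelength=1.0):
    """Fresnel-approximation steering vector for a symmetric ULA (assumed model)."""
    gamma = -2.0 * np.pi * d * np.sin(theta) / wavelength  # linear (angle) phase term
    phi = np.pi * d**2 * np.cos(theta) ** 2 / (wavelength * r)  # quadratic (range) term
    return np.exp(1j * (gamma * n_idx + phi * n_idx**2))


def music_range_spectrum(x, theta, r_grid, n_sources, n_idx):
    """1-D MUSIC pseudo-spectrum over range, with the angle theta held fixed."""
    n_sensors, n_snapshots = x.shape
    cov = x @ x.conj().T / n_snapshots  # sample covariance
    _, vecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    noise_sub = vecs[:, : n_sensors - n_sources]  # eigenvectors of the smallest eigenvalues
    spec = np.empty(len(r_grid))
    for i, r in enumerate(r_grid):
        proj = noise_sub.conj().T @ near_field_steering(theta, r, n_idx)
        spec[i] = 1.0 / np.sum(np.abs(proj) ** 2)
    return spec


# Hypothetical scenario: one source at 20 degrees and a range of 4 wavelengths.
rng = np.random.default_rng(1)
n_idx = np.arange(-4, 5)  # 9 sensors, aperture of 2 wavelengths with d = 0.25
m = len(n_idx)
theta0, r0 = np.deg2rad(20.0), 4.0
k = 500  # snapshots
s = (rng.standard_normal(k) + 1j * rng.standard_normal(k)) / np.sqrt(2)
noise_std = 10 ** (-30 / 20)  # 30 dB SNR per sensor
noise = noise_std * (rng.standard_normal((m, k)) + 1j * rng.standard_normal((m, k))) / np.sqrt(2)
x = np.outer(near_field_steering(theta0, r0, n_idx), s) + noise

r_grid = np.linspace(2.0, 8.0, 601)  # roughly the Fresnel region for this aperture
spec = music_range_spectrum(x, theta0, r_grid, n_sources=1, n_idx=n_idx)
r_est = r_grid[np.argmax(spec)]
print(f"estimated range: {r_est:.2f} wavelengths")
```

Fixing the angle reduces the search to one dimension, which is why a range-only MUSIC scan can be cheap once the angles are available.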
Volume 18, Issue 4, pages 561-571
Citations: 0
Energy-Efficient Connectivity-Aware Learning Over Time-Varying D2D Networks
IF 8.7 | CAS Zone 1, Engineering & Technology | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-03-11 | DOI: 10.1109/JSTSP.2024.3374591
Rohit Parasnis;Seyyedali Hosseinalipour;Yun-Wei Chu;Mung Chiang;Christopher G. Brinton
Semi-decentralized federated learning blends the conventional device-to-server (D2S) interaction structure of federated model training with localized device-to-device (D2D) communications. We study this architecture over edge networks with multiple D2D clusters modeled as time-varying, directed communication graphs. Our investigation yields two algorithms: (a) a connectivity-aware learning algorithm that controls the fundamental trade-off between the convergence rate of model training and the number of energy-intensive D2S transmissions required for global aggregation, and (b) a motion-planning algorithm that enhances the density and regularity of the cluster digraphs to further reduce the number of D2S transmissions in connectivity-aware learning. Specifically, in our semi-decentralized methodology, weighted-averaging-based D2D updates are injected into the federated averaging framework using column-stochastic weight matrices that encapsulate the connectivity within the clusters. To develop our algorithm, we show how the current expected optimality gap (i.e., the distance between the most recent global model computed by the server and the target optimal model) depends on the two largest singular values of the weighted adjacency matrices of the D2D clusters, and hence on their density and degree of digraph regularity. We then derive tight bounds on these singular values in terms of the node degrees of the D2D clusters and use the resulting expressions to design our connectivity-aware learning algorithm. Simulations on real-world datasets with time-varying D2D topologies generated by the Random Direction Mobility Model (RDMM) show that, compared with baselines, our connectivity-aware algorithm significantly reduces the total communication energy required to reach a target accuracy while reaching that accuracy in nearly the same number of iterations.
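Column-stochastic D2D averaging and its link to singular values can be illustrated with a small sketch. It assumes one common construction that may differ from the paper's exact weights. Each device keeps a share of its own model and splits it equally among itself and its out-neighbors, so every column of W sums to 1. The function names are hypothetical. For a directed cycle, which is a regular digraph, the largest singular value is exactly 1. For an irregular star-shaped digraph it exceeds 1, which hints at why digraph regularity enters the analysis.

```python
import numpy as np


def column_stochastic_weights(n, edges):
    """Weight matrix W for one D2D cluster from directed edges (sender, receiver).

    W[i, j] is the fraction of device j's model sent to device i. Each device
    splits its model equally among itself and its out-neighbors (assumed rule),
    so every column of W sums to 1.
    """
    adj = np.eye(n)  # self-loops: each device keeps part of its own model
    for sender, receiver in edges:
        adj[receiver, sender] = 1.0  # column = sender, row = receiver
    return adj / adj.sum(axis=0, keepdims=True)


def d2d_step(models, w):
    """One weighted-averaging D2D update; row i of `models` holds device i's parameters."""
    return w @ models


def top_two_singular_values(w):
    """Two largest singular values (np.linalg.svd returns them in descending order)."""
    return np.linalg.svd(w, compute_uv=False)[:2]


n = 5
cycle = [(j, (j + 1) % n) for j in range(n)]  # regular: in-degree = out-degree = 1
star = [(0, j) for j in range(1, n)]  # irregular: device 0 broadcasts to all others

w_cycle = column_stochastic_weights(n, cycle)
w_star = column_stochastic_weights(n, star)
print("cycle:", top_two_singular_values(w_cycle))
print("star: ", top_two_singular_values(w_star))

models = np.random.default_rng(0).standard_normal((n, 3))
# Column-stochastic mixing preserves the sum of the models across devices.
print(np.allclose(d2d_step(models, w_cycle).sum(axis=0), models.sum(axis=0)))
```

Since 1ᵀW = 1ᵀ for any column-stochastic W, the largest singular value can never drop below 1. It equals 1 only when W is also row-stochastic, as in the cycle.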
Volume 18, Issue 2, pages 242-258
Citations: 0
How Does Promoting the Minority Fraction Affect Generalization? A Theoretical Study of One-Hidden-Layer Neural Network on Group Imbalance
IF 8.7 | CAS Zone 1, Engineering & Technology | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-03-07 | DOI: 10.1109/JSTSP.2024.3374593
Hongkang Li;Shuai Zhang;Yihua Zhang;Meng Wang;Sijia Liu;Pin-Yu Chen
Group imbalance is a known problem in empirical risk minimization (ERM), where a high average accuracy is accompanied by low accuracy on a minority group. Despite algorithmic efforts to improve minority-group accuracy, a theoretical generalization analysis of ERM on individual groups has remained elusive. By formulating the group imbalance problem with a Gaussian mixture model, this paper quantifies the impact of individual groups on the sample complexity, the convergence rate, and the average and group-level test performance. Although our theoretical framework centers on binary classification with a one-hidden-layer neural network, to the best of our knowledge we provide the first theoretical analysis of the group-level generalization of ERM, in addition to the commonly studied average generalization performance. Sample insights from our theoretical results include that, when all group-level covariances are in the medium regime and all means are close to zero, learning performance is most desirable in the sense of small sample complexity, a fast training rate, and high average and group-level test accuracy. Moreover, we show that increasing the fraction of the minority group in the training data does not necessarily improve the generalization performance of the minority group. Our theoretical results are validated on both synthetic and empirical datasets, such as CelebA and CIFAR-10 for image classification.
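The abstract above distinguishes average accuracy from group-level accuracy. A small synthetic experiment makes these quantities concrete:

- data drawn from a two-group Gaussian mixture;
- a one-hidden-layer ReLU network trained by ERM (full-batch gradient descent on the logistic loss);
- test accuracy reported on average and per group.

Every specific choice here is hypothetical and for illustration only: the group means, noise levels, minority fraction, network width, step size and helper names. It does not reproduce the paper's model, assumptions, or results.

```python
import numpy as np


def make_groups(n, minority_frac, dim=10, seed=0):
    """Hypothetical two-group mixture: x = y * mu_g + sigma_g * z, y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    mus = np.stack([2.0 * np.eye(dim)[0], 2.0 * np.eye(dim)[1]])  # assumed group means
    sigmas = np.array([1.0, 1.0])  # assumed group noise levels
    g = (rng.random(n) < minority_frac).astype(int)  # 1 = minority group
    y = rng.choice([-1.0, 1.0], size=n)
    x = y[:, None] * mus[g] + sigmas[g][:, None] * rng.standard_normal((n, dim))
    return x, y, g


def train_one_hidden_layer(x, y, hidden=16, lr=0.1, steps=300, seed=0):
    """ERM with mean logistic loss on f(x) = v^T relu(W x), by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, dim = x.shape
    w = rng.standard_normal((hidden, dim)) / np.sqrt(dim)
    v = rng.standard_normal(hidden) / np.sqrt(hidden)
    for _ in range(steps):
        z = x @ w.T  # (n, hidden) pre-activations
        h = np.maximum(z, 0.0)  # ReLU
        margin = y * (h @ v)
        # Derivative of mean log(1 + exp(-margin)) w.r.t. the output: -y / (1 + e^margin) / n
        dout = -y * np.exp(-np.logaddexp(0.0, margin)) / n
        dv = h.T @ dout
        dz = np.outer(dout, v) * (z > 0)
        dw = dz.T @ x
        w -= lr * dw
        v -= lr * dv
    return w, v


def group_accuracies(w, v, x, y, g):
    """Average test accuracy plus accuracy on each group separately."""
    pred = np.where(np.maximum(x @ w.T, 0.0) @ v >= 0, 1.0, -1.0)
    correct = pred == y
    return {
        "average": correct.mean(),
        "majority": correct[g == 0].mean(),
        "minority": correct[g == 1].mean(),
    }


x_tr, y_tr, g_tr = make_groups(2000, minority_frac=0.1, seed=0)
x_te, y_te, g_te = make_groups(4000, minority_frac=0.5, seed=1)
w, v = train_one_hidden_layer(x_tr, y_tr)
acc = group_accuracies(w, v, x_te, y_te, g_te)
print(acc)
```

Changing `minority_frac` in the training call is one way to probe, on this toy model only, how the minority fraction affects the group-level numbers.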
Volume 18, Issue 2, pages 216-231
Citations: 0