
Latest publications in IEEE Open Journal of Signal Processing

Task Nuisance Filtration for Unsupervised Domain Adaptation
IF 2.9 · Q2 · ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-01-30 · DOI: 10.1109/OJSP.2025.3536850
David Uliel;Raja Giryes
In unsupervised domain adaptation (UDA), labeled data is available for one domain (the source domain), generated according to some distribution, and unlabeled data is available for a second domain (the target domain), generated from a possibly different distribution but sharing the same task. The goal is to learn a model that performs well on the target domain even though labels are available only for the source data. Many recent works attempt to align the source and target domains by matching their marginal distributions in a learned feature space. In this paper, we treat the domain difference as a nuisance and enable better adaptability between the domains by encouraging minimality of the target domain representation, disentanglement of the features, and a smoother feature space that clusters the target data better. To this end, we use information bottleneck theory and a classical technique from the blind source separation framework, namely ICA (independent component analysis). We show that these concepts can improve the performance of leading domain adaptation methods on various domain adaptation benchmarks.
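The abstract's two regularizers can be illustrated with a toy penalty combining an information-bottleneck-style minimality term (representation energy) with a decorrelation term standing in for ICA-style independence. This is a minimal sketch, not the paper's actual objective: the function name and the weights `lam_min`/`lam_ind` are hypothetical, and second-order decorrelation is only a crude proxy for ICA, which targets higher-order independence.

```python
import numpy as np

def nuisance_filtration_penalty(z, lam_min=0.1, lam_ind=1.0):
    """Toy regularizer: IB-style minimality + decorrelation (crude ICA proxy)."""
    # minimality: average representation energy (compression pressure)
    minimality = np.mean(np.sum(z ** 2, axis=1))
    # independence proxy: off-diagonal covariance energy; true ICA also
    # penalizes higher-order dependence, not just correlation
    zc = z - z.mean(axis=0)
    cov = zc.T @ zc / len(z)
    off_diag = cov - np.diag(np.diag(cov))
    independence = np.sum(off_diag ** 2)
    return lam_min * minimality + lam_ind * independence

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
z_entangled = np.hstack([base, base])       # both features carry the same factor
z_disentangled = rng.normal(size=(200, 2))  # independent features
```

With these inputs the entangled features incur a much larger penalty, which is the direction such a training signal would push the representation.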
Robustifying Routers Against Input Perturbations for Sparse Mixture-of-Experts Vision Transformers
IF 2.9 · Q2 · ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-01-30 · DOI: 10.1109/OJSP.2025.3536853
Masahiro Kada;Ryota Yoshihashi;Satoshi Ikehata;Rei Kawakami;Ikuro Sato
Mixture-of-experts models with a sparse expert selection rule have been gaining much attention recently because of their scalability without compromising inference time. However, unlike standard neural networks, sparse mixture-of-experts models inherently exhibit discontinuities in the output space, which may impede the acquisition of appropriate invariance to input perturbations, leading to degraded model performance on tasks such as classification. To address this issue, we propose Pairwise Router Consistency (PRC), which effectively penalizes the discontinuities occurring under natural deformations of input images. Combined with the supervised loss, the PRC loss empirically improves classification accuracy on the ImageNet-1K, CIFAR-10, and CIFAR-100 datasets compared to a baseline method. Notably, our method with 1-expert selection slightly outperforms the baseline method using 2-expert selection. We also confirmed that models trained with our method experience discontinuous changes less frequently under input perturbations.
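A pairwise consistency penalty of the kind described can be sketched as follows. This is a guess at the general shape rather than the paper's exact PRC loss: `prc_loss` is a hypothetical name, and the squared distance between routing distributions is an assumption (the paper may use a different divergence).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

def prc_loss(router_logits_a, router_logits_b):
    """Penalize disagreement between the router's expert distributions
    for two naturally deformed views of the same image."""
    p_a = softmax(np.asarray(router_logits_a))
    p_b = softmax(np.asarray(router_logits_b))
    return float(np.mean(np.sum((p_a - p_b) ** 2, axis=-1)))
```

Identical views incur zero loss; the penalty grows as the router sends the two views to different experts, discouraging routing discontinuities under small perturbations.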
SMITIN: Self-Monitored Inference-Time INtervention for Generative Music Transformers
IF 2.9 · Q2 · ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-01-28 · DOI: 10.1109/OJSP.2025.3534686
Junghyun Koo;Gordon Wichern;François G. Germain;Sameer Khurana;Jonathan Le Roux
We introduce Self-Monitored Inference-Time INtervention (SMITIN), an approach for controlling an autoregressive generative music transformer using classifier probes. These simple logistic regression probes are trained on the output of each attention head in the transformer using a small dataset of audio examples both exhibiting and missing a specific musical trait (e.g., the presence/absence of drums, or real/synthetic music). We then steer the attention heads in the probe direction, ensuring the generative model output captures the desired musical trait. Additionally, we monitor the probe output to avoid adding an excessive amount of intervention into the autoregressive generation, which could lead to temporally incoherent music. We validate our results objectively and subjectively for both audio continuation and text-to-music applications, demonstrating the ability to add controls to large generative models for which retraining or even fine-tuning is impractical for most musicians. Audio samples of the proposed intervention approach are available on our demo page.
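The probe-steer-monitor loop can be sketched for a single attention head. The names `smitin_step`, the step size `alpha`, and the confidence `target` are illustrative assumptions; the actual method operates on head outputs inside the transformer during autoregressive generation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def smitin_step(h, w, alpha=0.5, target=0.8):
    """Steer a head activation h along the logistic-probe direction w, but
    only while the monitored probe confidence stays below the target
    (self-monitoring avoids over-intervening, which could make the
    generated music temporally incoherent)."""
    if sigmoid(h @ w) < target:
        h = h + alpha * w / np.linalg.norm(w)
    return h

h = np.array([0.1, -0.2])
w = np.array([1.0, 2.0])  # probe weights for the desired musical trait
h_steered = smitin_step(h, w)
```

Each applied step moves the activation toward the probe's decision direction, so the monitored confidence is non-decreasing and the intervention switches off once the trait is judged present.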
Auditory EEG Decoding Challenge for ICASSP 2024
IF 2.9 · Q2 · ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-01-27 · DOI: 10.1109/OJSP.2025.3534122
Lies Bollens;Corentin Puffay;Bernd Accou;Jonas Vanthornhout;Hugo Van Hamme;Tom Francart
This paper describes the auditory EEG challenge, organized as one of the Signal Processing Grand Challenges at ICASSP 2024. The challenge provides electroencephalogram (EEG) recordings of 105 subjects who listened to continuous speech (audiobooks or podcasts) while their brain activity was recorded. The challenge consists of two tasks that relate EEG signals to the presented speech stimulus. The first task, called match-mismatch, is to determine which of five speech segments induced a given EEG segment. The second task, called regression, is to reconstruct the Mel spectrogram from the EEG. EEG recordings of 85 subjects were provided as a training set so that challenge participants could train their models on a relatively large dataset. The remaining 20 subjects were held out for the evaluation step of the challenge.
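Scoring for the match-mismatch task reduces to picking the best of the five candidates per trial. A sketch, where the helper name and the convention of higher-is-more-similar scores are assumptions (the challenge's official evaluation scripts may differ):

```python
import numpy as np

def match_mismatch_accuracy(scores, true_idx):
    """scores: (n_trials, 5) similarities between each EEG segment and the
    five candidate speech segments; the predicted match is the argmax."""
    pred = np.argmax(np.asarray(scores), axis=1)
    return float(np.mean(pred == np.asarray(true_idx)))
```

A model is thus evaluated by how often its most similar candidate is the speech segment that actually induced the EEG.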
Online Learning of Expanding Graphs
IF 2.9 · Q2 · ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-01-27 · DOI: 10.1109/OJSP.2025.3534692
Samuel Rey;Bishwadeep Das;Elvin Isufi
This paper addresses the problem of online network topology inference for expanding graphs from a stream of spatiotemporal signals. Online algorithms for dynamic graph learning are crucial in delay-sensitive applications or when changes in topology occur rapidly. While existing works focus on inferring the connectivity within a fixed set of nodes, in practice the graph can grow as new nodes join the network. This poses additional challenges, such as modeling temporal dynamics involving signals and graphs of different sizes. This growth also increases the computational complexity of the learning process, which may become prohibitive. To the best of our knowledge, this is the first work to tackle this setting. We propose a general online algorithm based on projected proximal gradient descent that accounts for the increasing graph size at each iteration. Recursively updating the sample covariance matrix is a key aspect of our approach. We introduce a strategy that enables different types of updates for nodes that just joined the network and for previously existing nodes. To provide further insights into the proposed method, we specialize it to Gaussian Markov random field settings, where we analyze the computational complexity and characterize the dynamic cumulative regret. Finally, we demonstrate the effectiveness of the proposed approach using both controlled experiments and real-world datasets from epidemic and financial networks.
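The recursive covariance update with graph expansion can be sketched as follows. Padding the matrix with zeros for a newly joined node (i.e., treating its past signal as zero) is an assumption about how a new node's missing history is handled, and `update_covariance` is an illustrative name, not the paper's API.

```python
import numpy as np

def update_covariance(C, n, x, new_node=False):
    """Recursively update the (non-centered) sample covariance with a new
    signal x; if the graph grew by one node, pad C with a zero row/column."""
    if new_node:
        C = np.pad(C, ((0, 1), (0, 1)))  # zero history for the new node
    C = (n * C + np.outer(x, x)) / (n + 1)
    return C, n + 1

C, n = np.zeros((2, 2)), 0
for x in [np.array([1.0, 2.0]), np.array([0.0, 1.0]), np.array([3.0, 1.0])]:
    C, n = update_covariance(C, n, x)
```

Each update costs O(d²) for a graph with d nodes, avoiding recomputation over the entire signal stream at every iteration.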
Non-Gaussian Process Dynamical Models
IF 2.9 · Q2 · ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-01-27 · DOI: 10.1109/OJSP.2025.3534690
Yaman Kındap;Simon Godsill
Probabilistic dynamical models used in tracking and prediction applications are typically assumed to be Gaussian-noise-driven motions, since well-known inference algorithms can be applied to such models. However, in many real-world examples deviations from Gaussianity are expected to appear, e.g., rapid changes in speed or direction, which cannot be reflected using processes with a smooth mean response. In this work, we introduce the non-Gaussian process (NGP) dynamical model, which allows straightforward modelling of heavy-tailed, non-Gaussian behaviours while retaining a tractable conditional Gaussian process (GP) structure through an infinite mixture of non-homogeneous GPs representation. We present two novel inference methodologies for these new models, based on the conditionally Gaussian formulation of NGPs, which are suitable for both MCMC and marginalised particle filtering algorithms. The results are demonstrated on synthetically generated data sets.
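The "conditionally Gaussian, marginally heavy-tailed" idea can be illustrated in one dimension: mixing a Gaussian over a random variance produces heavy tails (inverse-gamma mixing yields a Student-t marginal). This is only an analogy for the NGP construction, which mixes over entire non-homogeneous GP paths rather than scalar variances; the function name is hypothetical.

```python
import numpy as np

def sample_variance_mixture(n, df=3.0, seed=0):
    """Draw n samples that are Gaussian conditionally on a latent variance,
    but Student-t with df degrees of freedom (heavy-tailed) marginally."""
    rng = np.random.default_rng(seed)
    # V/df with V ~ chi-squared(df) is Gamma(df/2, scale=2/df); its
    # reciprocal is the inverse-gamma variance of the t construction
    var = 1.0 / rng.gamma(df / 2.0, 2.0 / df, size=n)
    return rng.normal(0.0, np.sqrt(var))

samples = sample_variance_mixture(10000)
```

Conditioned on the variances the samples are exactly Gaussian, which is what keeps inference tractable while the marginal behaviour captures abrupt, heavy-tailed changes.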
Efficient Moving Object Segmentation in LiDAR Point Clouds Using Minimal Number of Sweeps
IF 2.9 · Q2 · ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-01-20 · DOI: 10.1109/OJSP.2025.3532199
Zoltan Rozsa;Akos Madaras;Tamas Sziranyi
LiDAR point clouds are a rich source of information for autonomous vehicles and ADAS systems. However, they can be challenging to segment for moving objects because, among other things, finding correspondences between sparse point clouds of consecutive frames is difficult. Traditional methods rely on a (global or local) map of the environment, which can be demanding to acquire and maintain in real-world conditions and in the presence of the moving objects themselves. This paper proposes a novel approach that uses as few sweeps as possible to decrease the computational burden and achieve mapless moving object segmentation (MOS) in LiDAR point clouds. Our approach is based on a multimodal learning model with single-modal inference. The model is trained on a dataset of LiDAR point clouds and related camera images. The model learns to associate features from the two modalities, allowing it to predict dynamic objects even in the absence of a map and the camera modality. We propose using semantic information for multi-frame instance segmentation in order to enhance performance measures. We evaluate our approach on the SemanticKITTI and Apollo real-world autonomous driving datasets. Our results show that our approach can achieve state-of-the-art performance on moving object segmentation while utilizing only a few (even one) LiDAR frames.
LIMMITS'24: Multi-Speaker, Multi-Lingual INDIC TTS With Voice Cloning
IF 2.9 · Q2 · ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-01-20 · DOI: 10.1109/OJSP.2025.3531782
Sathvik Udupa;Jesuraja Bandekar;Abhayjeet Singh;Deekshitha G;Saurabh Kumar;Sandhya Badiger;Amala Nagireddi;Roopa R;Prasanta Kumar Ghosh;Hema A. Murthy;Pranaw Kumar;Keiichi Tokuda;Mark Hasegawa-Johnson;Philipp Olbrich
The Multi-speaker, Multi-lingual Indic Text-to-Speech (TTS) with voice cloning (LIMMITS'24) challenge was organized as part of the ICASSP 2024 Signal Processing Grand Challenge. LIMMITS'24 aims at the development of voice cloning for multi-speaker, multi-lingual Text-to-Speech (TTS) models. Towards this, 80 hours of TTS data have been released in each of the Bengali, Chhattisgarhi, English (Indian), and Kannada languages, in addition to the Telugu, Hindi, and Marathi data released during the LIMMITS'23 challenge. The challenge encourages the advancement of TTS in Indian languages as well as the development of multi-speaker voice cloning techniques for TTS. The three tracks of LIMMITS'24 have provided an opportunity for researchers and practitioners around the world to explore the state of the art in voice cloning research with TTS.
Posterior-Based Analysis of Spatio-Temporal Features for Sign Language Assessment
IF 2.9 · Q2 · ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-01-17 · DOI: 10.1109/OJSP.2025.3531781
Neha Tarigopula;Sandrine Tornay;Ozge Mercanoglu Sincan;Richard Bowden;Mathew Magimai.-Doss
Sign language conveys information through multiple channels composed of manual (handshape, hand movement) and non-manual (facial expression, mouthing, body posture) components. Sign language assessment involves giving granular feedback to a learner on the correctness of the manual and non-manual components, aiding the learner's progress. Existing methods rely on handcrafted skeleton-based features for hand movement within a KL-HMM framework to identify errors in manual components. However, modern deep learning models offer powerful spatio-temporal video representations for hand movement and facial expressions. Despite their success in classification tasks, these representations often struggle to attribute errors to specific sources, such as an incorrect handshape, improper movement, or an incorrect facial expression. To address this limitation, we leverage and analyze the spatio-temporal representations from Inflated 3D Convolutional Networks (I3D) and integrate them into the KL-HMM framework to assess sign language videos on both manual and non-manual components. By applying masking and cropping techniques, we isolate and evaluate distinct channels: hand movement and facial expressions using the I3D model, and handshape using a CNN-based model. Our approach outperforms traditional methods based on handcrafted features, as validated through experiments on the SMILE-DSGS dataset, demonstrating that it can enhance the effectiveness of sign language learning tools.
{"title":"Posterior-Based Analysis of Spatio-Temporal Features for Sign Language Assessment","authors":"Neha Tarigopula;Sandrine Tornay;Ozge Mercanoglu Sincan;Richard Bowden;Mathew Magimai.-Doss","doi":"10.1109/OJSP.2025.3531781","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3531781","url":null,"abstract":"Sign Language conveys information through multiple channels composed of manual (handshape, hand movement) and non-manual (facial expression, mouthing, body posture) components. Sign language assessment involves giving granular feedback to a learner, in terms of correctness of the manual and non-manual components, aiding the learner's progress. Existing methods rely on handcrafted skeleton-based features for hand movement within a KL-HMM framework to identify errors in manual components. However, modern deep learning models offer powerful spatio-temporal representations for videos to represent hand movement and facial expressions. Despite their success in classification tasks, these representations often struggle to attribute errors to specific sources, such as incorrect handshape, improper movement, or incorrect facial expressions. To address this limitation, we leverage and analyze the spatio-temporal representations from Inflated 3D Convolutional Networks (I3D) and integrate them into the KL-HMM framework to assess sign language videos on both manual and non-manual components. By applying masking and cropping techniques, we isolate and evaluate distinct channels of hand movement, and facial expressions using the I3D model and handshape using the CNN-based model. 
Our approach outperforms traditional methods based on handcrafted features, as validated through experiments on the SMILE-DSGS dataset, and therefore demonstrates that it can enhance the effectiveness of sign language learning tools.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"284-292"},"PeriodicalIF":2.9,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10845152","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143465818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
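The masking and cropping described in the abstract — isolating the facial-expression and hand-movement channels before feeding clips to the I3D model — can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes clips arrive as (T, H, W, C) numpy arrays and that a region box (here `face_box`) comes from an upstream detector; all function names and box values are hypothetical.

```python
import numpy as np

def crop_region(frames, box):
    """Crop a spatial region (e.g., the face) from every frame.

    frames: (T, H, W, C) uint8 video clip
    box:    (y0, y1, x0, x1) pixel bounds from an upstream detector
    """
    y0, y1, x0, x1 = box
    return frames[:, y0:y1, x0:x1, :]

def mask_region(frames, box):
    """Zero out a region so the remaining channel (e.g., hand movement
    without the face) is isolated before the clip is fed to I3D."""
    y0, y1, x0, x1 = box
    out = frames.copy()
    out[:, y0:y1, x0:x1, :] = 0
    return out

# Example: a 16-frame 224x224 RGB clip with a hypothetical face box
clip = np.random.randint(0, 256, (16, 224, 224, 3), dtype=np.uint8)
face_box = (0, 64, 80, 144)

face_clip = crop_region(clip, face_box)  # facial-expression channel
body_clip = mask_region(clip, face_box)  # hand-movement channel
print(face_clip.shape)  # (16, 64, 64, 3)
```

Each isolated clip can then be scored separately, which is what lets errors be attributed to a specific channel rather than to the video as a whole.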
Correction to “Energy Efficient Signal Detection Using SPRT and Ordered Transmissions in Wireless Sensor Networks” 修正“无线传感器网络中使用SPRT和有序传输的节能信号检测”
IF 2.9 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-01-17 DOI: 10.1109/OJSP.2024.3519916
Shailee Yagnik;Ramanarayanan Viswanathan;Lei Cao
In [1, p. 1124], a footnote is needed on (13) as shown below: \begin{equation*} \alpha^{\#} < \left(1 - c_1\right)\alpha + \left(1 - \left(1 - c_1\right)\alpha\right)\alpha \tag*{(13)$^{1}$} \end{equation*}
{"title":"Correction to “Energy Efficient Signal Detection Using SPRT and Ordered Transmissions in Wireless Sensor Networks”","authors":"Shailee Yagnik;Ramanarayanan Viswanathan;Lei Cao","doi":"10.1109/OJSP.2024.3519916","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3519916","url":null,"abstract":"In [1, p. 1124], a footnote is needed on (13) as shown below: \begin{equation*} \alpha^{\#} < \left(1 - c_1\right)\alpha + \left(1 - \left(1 - c_1\right)\alpha\right)\alpha \tag*{(13)$^{1}$} \end{equation*}","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"16-16"},"PeriodicalIF":2.9,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10845022","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
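The corrected bound in (13), $\alpha^{\#} < (1 - c_1)\alpha + (1 - (1 - c_1)\alpha)\alpha$, is straightforward to evaluate numerically. A minimal sketch — the values of $\alpha$ and $c_1$ below are arbitrary illustrations, not taken from the paper:

```python
def alpha_bound(alpha, c1):
    """Right-hand side of (13): (1 - c1)*alpha + (1 - (1 - c1)*alpha)*alpha."""
    first = (1 - c1) * alpha
    return first + (1 - first) * alpha

# e.g., alpha = 0.05, c1 = 0.2:
#   first term = 0.8 * 0.05 = 0.04
#   bound      = 0.04 + 0.96 * 0.05 = 0.088
b = alpha_bound(0.05, 0.2)
print(round(b, 6))  # 0.088
```

The bound collapses to the expected limits: it is 0 when $\alpha = 0$ and approaches $\alpha(2 - \alpha)$ as $c_1 \to 0$.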