IEEE Transactions on Neural Networks and Learning Systems: Latest Articles

Topology-Preserving Deep Hashing for Ultrafast Drone-Dominated Object Detection.
IF 8.9; Zone 1 (Computer Science); Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE); Pub Date: 2026-05-04; DOI: 10.1109/TNNLS.2026.3686846
Luming Zhang, Guifeng Wang, Zhiming Wang, Ling Shao

Drones (or unmanned aerial vehicles) have been extensively applied in many modern artificial intelligence systems over the past decade. In this work, we propose a novel deep hashing framework that can detect objects from drone-captured pictures extremely fast. Our method can intrinsically and flexibly encode various topological structures from each target object, based on which multiscale objects can be discovered in a view- and altitude-invariant way. Moreover, by leveraging $l_{F}$ and $l_{1}$ norms collaboratively, the calculated hash codes are robust to low-quality drone pictures and possibly contaminated semantic labels. More specifically, for each drone picture, we extract the visually/semantically salient object parts inside it. To characterize their topological structure, we construct a graphlet by linking the spatially adjacent object patches into a small graph. Subsequently, a binary matrix factorization (MF) is designed to hierarchically exploit the semantics of these graphlets, wherein three components are seamlessly incorporated: 1) deep binary hash code learning; 2) denoising of contaminated pictures/labels; and 3) adaptive data graph updating. Next, a manifold-regularized feature selector is adopted to obtain more discriminative deep hash codes. Finally, the selected hash codes corresponding to the graphlets within each drone photograph are utilized for ranking-based object discovery. Comprehensive experiments on DAC-SDC, MOHR, and our self-compiled dataset demonstrate the competitive speed and accuracy of our method.
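
The ranking step mentioned in the abstract can be illustrated with a minimal sketch. It assumes hash codes are stored as Python integers and compared by Hamming distance, which is the usual way binary codes are ranked; the abstract does not say which distance the authors use. The names `hamming_distance`, `rank_by_hamming`, and the 8-bit codes are hypothetical.

```python
def hamming_distance(a, b):
    """Number of differing bits between two hash codes stored as ints."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query, database):
    """Return (name, distance) pairs sorted from nearest to farthest."""
    scored = [(name, hamming_distance(query, code)) for name, code in database.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical 8-bit codes for three graphlets (illustration only, not real data).
codes = {"car": 0b10110010, "person": 0b01001101, "truck": 0b10110110}
ranking = rank_by_hamming(0b10110011, codes)
```

Because the comparison is an XOR followed by a bit count, ranking cost grows only with the number of stored codes, which is one reason hashing-based retrieval tends to be fast.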

Citations: 0
MoFTSS: Motion Generation With Frequency and Text State Space Models.
IF 8.9; Zone 1 (Computer Science); Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE); Pub Date: 2026-05-01; DOI: 10.1109/TNNLS.2026.3683909
Chengjian Li, Xiangbo Shu, Qiongjie Cui, Haifeng Xia, Yazhou Yao, Jinhui Tang

Text-driven diffusion models have achieved remarkable performance in human motion generation. However, these generative works struggle to generate high-quality motion consistent with textual descriptions. The primary reasons are: 1) insufficient fine-grained motion modeling, because motion representations are difficult to distinguish in latent diffusion; and 2) inconsistencies between motions and textual descriptions due to misalignment in the multimodal space. To overcome these limitations, this work proposes Motion generation with Frequency and Text State Space models (MoFTSS), comprising two main modules: a frequency state space model (FreqSSM) and a text state space model (TextSSM). Specifically, FreqSSM derives fine-grained representations by decomposing sequences into low-frequency and high-frequency components. This allows it to guide the generation of static poses (e.g., sitting, lying) and fine-grained motions (e.g., transitions, stumbling). For consistency between text and motion, TextSSM treats text features as a semantic modulation term within the SSM, enabling dynamic filtering of motion features consistent with textual semantics. Extensive experiments suggest that MoFTSS achieves superior performance on the text-to-motion generation task. Notably, it attains the lowest FID of 0.181 on the HumanML3D dataset, significantly lower than the 0.421 achieved by MLD.
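
The idea of splitting a sequence into low- and high-frequency parts can be sketched as follows. This is not FreqSSM: the abstract does not specify the decomposition, so a centered moving average is swapped in as a simple low-pass filter, and the residual is taken as the high-frequency part. The function name `split_frequencies` and the toy trajectory are assumptions.

```python
def split_frequencies(signal, window=3):
    """Split a 1-D sequence into a smooth part and a residual part.

    The smooth (low-frequency) part is a centered moving average; near the
    edges the window shrinks to the samples that exist. The residual
    (high-frequency) part is whatever the average leaves behind.
    """
    half = window // 2
    low = []
    for i in range(len(signal)):
        start, stop = max(0, i - half), min(len(signal), i + half + 1)
        low.append(sum(signal[start:stop]) / (stop - start))
    high = [x - l for x, l in zip(signal, low)]
    return low, high


# Toy joint trajectory with one abrupt spike (illustration only).
trajectory = [0.0, 0.0, 1.0, 0.0, 0.0]
low, high = split_frequencies(trajectory)
```

By construction the two parts sum back to the input, so nothing is lost; slow drift (a held pose) ends up mostly in `low`, while abrupt changes (a stumble) end up mostly in `high`.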

Citations: 0
Deep Reinforcement Learning-Based Optimization of Identical-Dual-Band Filters.
IF 8.9; Zone 1 (Computer Science); Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE); Pub Date: 2026-05-01; DOI: 10.1109/TNNLS.2026.3684954
Ehsan Adibnia

Designing identical dual-band optical filters remains a complex optimization challenge in photonics and optical communication systems. Conventional methods, which rely on iterative electromagnetic simulations or analytical approximations, often suffer from limited generalizability and high computational costs. In this work, we propose a deep reinforcement learning (RL) framework for the autonomous optimization of identical dual-band fiber Bragg grating (FBG) filters. A policy network based on a three-layer fully connected neural architecture is trained using a proximal policy optimization algorithm to minimize the full width at half maximum (FWHM) of both transmission bands while maintaining spectral symmetry and identical channel characteristics. The deep RL-based design achieves a 43% reduction in FWHM and a 49% reduction in grating length compared to baseline designs, without sacrificing reflectivity or channel uniformity. This study demonstrates the feasibility and effectiveness of deep RL as a powerful optimization tool for complex photonic systems, providing a scalable and data-efficient pathway toward next-generation optical device design.
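
The quantity being minimized, the full width at half maximum, can be computed from a sampled spectrum as in the sketch below. It assumes a single peak on a zero baseline and locates the half-maximum crossings by linear interpolation; the function name `fwhm` and the triangular test curve are hypothetical, and nothing here reproduces the paper's simulation or reward.

```python
def fwhm(x, y):
    """Full width at half maximum of a single-peaked curve sampled at (x, y).

    Assumes a zero baseline. Half-maximum crossings are found by linear
    interpolation between the neighbouring samples.
    """
    peak = max(range(len(y)), key=lambda i: y[i])
    half = y[peak] / 2.0

    def crossing(i, j):
        # x position where the straight line through samples i and j equals `half`.
        return x[i] + (half - y[i]) * (x[j] - x[i]) / (y[j] - y[i])

    left = peak
    while left > 0 and y[left - 1] > half:
        left -= 1
    right = peak
    while right < len(y) - 1 and y[right + 1] > half:
        right += 1
    if left == 0 or right == len(y) - 1:
        raise ValueError("curve does not fall to half maximum on both sides")
    return crossing(right, right + 1) - crossing(left - 1, left)


# Triangular peak of height 2 on the interval [0, 4] (illustration only).
width = fwhm([0.0, 1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 2.0, 1.0, 0.0])
```

A reinforcement-learning reward could be built from such a measurement on each band, though how the authors define their reward is not stated in the abstract.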

Citations: 0
XAI-Exit: Interpretability-Driven Dynamic Early Exits for Efficient and Transparent DNN Inference.
IF 8.9; Zone 1 (Computer Science); Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE); Pub Date: 2026-04-30; DOI: 10.1109/TNNLS.2026.3685408
Haseena Rahmath P, Ajith Abraham, Kuldeep Chaurasia

Deep neural networks (DNNs) excel across domains but face challenges in resource-constrained and critical settings due to high computational cost and limited transparency. Early exit DNNs reduce overhead via intermediate predictions; yet most approaches neglect interpretability, which is vital for trust in AI systems. This article presents XAI-Exit, an early exit framework that jointly optimizes efficiency and transparency. At its core, ExitDecisionNet (EDN), a lightweight RNN trained with a curriculum strategy on confidence, interpretability, and stability metrics, dynamically predicts the optimal exit, while a skip mechanism minimizes redundant computation. To ensure transparency, exit attribution maps (EAMs) aggregate feature attributions across exits, revealing the decision trajectory, and are complemented by standard XAI methods (integrated gradients (IGs), SmoothGrad, Grad-CAM++, and LRP). Experiments on MobileNetV3, ResNet18, and MSDNet with CIFAR-10, CIFAR-100, and ImageNet show that XAI-Exit improves efficiency without sacrificing accuracy, while uniquely ensuring interpretable exit decisions suitable for real-world deployment.
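
For readers unfamiliar with early exits, the baseline idea can be sketched with a fixed confidence threshold. This is deliberately not ExitDecisionNet: XAI-Exit learns the exit decision with an RNN, whereas the sketch below uses the simplest common rule, stopping at the first exit whose top softmax probability reaches a threshold. The function names, the threshold value, and the logits are hypothetical, and the logits are precomputed here only for illustration (a real network would evaluate exits lazily).

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(exit_logits, threshold=0.9):
    """Return (exit_index, predicted_class) for the first sufficiently confident exit.

    Falls back to the final exit when no earlier exit reaches `threshold`.
    """
    last = len(exit_logits) - 1
    for index, logits in enumerate(exit_logits):
        probs = softmax(logits)
        best = max(range(len(probs)), key=lambda k: probs[k])
        if probs[best] >= threshold or index == last:
            return index, best


# Three exits of a hypothetical 3-class network, ordered shallow to deep.
exit_index, label = early_exit_predict([[0.2, 0.1, 0.0], [4.0, 0.0, 0.0], [9.0, 0.0, 0.0]])
```

In this toy input the first exit is too uncertain, the second is confident enough, and the third is never consulted, which is where the compute saving comes from.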

Citations: 0
MID: A Self-Supervised Multimodal Iterative Denoising Framework.
IF 8.9; Zone 1 (Computer Science); Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE); Pub Date: 2026-04-30; DOI: 10.1109/TNNLS.2026.3683544
Chang Nie, Tianchen Deng, Zhe Liu, Hesheng Wang

Denoising is important in many vision, medical, and biological applications, yet real observations are often corrupted by complex nonlinear noise and clean targets are often unavailable. We present MID, a self-supervised iterative denoising framework across data modalities. MID treats an observation as an intermediate state along a controllable corruption process and learns from noisy data only through two networks: a step predictor that estimates the current corruption stage and a residual predictor that estimates the effective residual increment to be removed at that stage. For nonlinear corruption, MID uses a first-order local approximation to enable iterative restoration in a locally linear regime. The same formulation can be instantiated with modality-specific backbones for images, signals, point sets, and sequences. Experiments on diverse tasks in computer vision, biomedicine, and bioinformatics show that MID is robust, broadly applicable, and competitive with recent baselines.
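
The two-predictor loop described in the abstract can be sketched under strong simplifying assumptions. In MID both predictors are learned networks; below they are hand-written toy callables, and the step is estimated once up front rather than re-estimated at every iteration (the abstract does not say which the authors do). All names and numbers are hypothetical.

```python
def iterative_denoise(observation, predict_step, predict_residual):
    """Remove the predicted residual once per estimated corruption step.

    predict_step(state) -> int: how many corruption steps the state has undergone.
    predict_residual(state, step) -> list: increment to subtract at that step.
    """
    state = list(observation)
    step = predict_step(state)
    while step > 0:
        residual = predict_residual(state, step)
        state = [s - r for s, r in zip(state, residual)]
        step -= 1
    return state


# Toy corruption: each step adds 0.5 to every sample of a zero-mean clean signal.
def toy_step(state):
    return round(sum(state) / len(state) / 0.5)


def toy_residual(state, step):
    return [0.5] * len(state)


restored = iterative_denoise([2.5, 0.5], toy_step, toy_residual)
```

The toy corruption is linear, so the loop inverts it exactly; for nonlinear corruption the abstract describes a first-order local approximation, which this sketch does not attempt.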

Citations: 0
Multiscale Convolutional Stochastic Configuration Network Soft Sensor Modeling Method.
IF 8.9; Zone 1 (Computer Science); Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE); Pub Date: 2026-04-29; DOI: 10.1109/TNNLS.2026.3683436
Aijun Yan, Chunpeng Yang

To address the challenges of industrial process modeling caused by multiscale spatiotemporal coupling, a soft sensor method based on the multiscale convolutional stochastic configuration network (MSC-SCN) is proposed. This method introduces a multiscale convolutional strategy into the SCN framework and designs parallel multiscale feature extractors with incremental learning capability under a supervised learning mechanism. Subsequently, cross-scale feature fusion is employed to integrate the multiscale feature maps generated by the previously constructed feature extractors. The output weights of the SCN are then optimized by combining low-rank matrix approximation and regularization methods to improve the efficiency and stability of the inverse of the hidden layer matrix. Experimental comparisons with state-of-the-art methods on three industrial soft sensor tasks demonstrate that the proposed approach yields the best performance and demonstrates high adaptability to multiscale spatiotemporal coupling.
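
The abstract does not give the output-weight formula, so the following is only the textbook form that "low-rank approximation plus regularization" usually takes, stated as an assumption. Here $H$ is the hidden-layer output matrix, $Y$ the targets, $\lambda > 0$ a ridge parameter, and $r$ the retained rank; all four symbols are introduced for illustration.

```latex
\beta = \left(H^{\top} H + \lambda I\right)^{-1} H^{\top} Y,
\qquad
H \approx U_r \Sigma_r V_r^{\top}
\;\Longrightarrow\;
\beta \approx V_r \left(\Sigma_r^{2} + \lambda I\right)^{-1} \Sigma_r U_r^{\top} Y .
```

Since $\Sigma_r$ is diagonal, the inverse reduces to scaling each retained singular direction by $\sigma_i / (\sigma_i^{2} + \lambda)$, which avoids inverting the full matrix and stays bounded even when some $\sigma_i$ are close to zero.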

Citations: 0
End-to-End Image Compression With Segmentation Guided Dual Coding for Wind Turbines.
IF 8.9; Zone 1 (Computer Science); Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE); Pub Date: 2026-04-29; DOI: 10.1109/TNNLS.2026.3685207
Raul Perez-Gonzalo, Andreas Espersen, Soren Forchhammer, Antonio Agudo

Transferring large volumes of high-resolution images during wind turbine inspections introduces a bottleneck in assessing and detecting severe defects. Efficient coding must preserve high fidelity in blade regions while aggressively compressing the background. In this work, we propose an end-to-end deep learning framework that jointly performs segmentation and dual-mode (lossy and lossless) compression. The segmentation module accurately identifies the blade region, after which our region-of-interest (ROI) compressor encodes it at superior quality compared to the rest of the image. Unlike conventional ROI schemes that merely allocate more bits to salient areas, our framework integrates: 1) a robust segmentation network (BU-Netv2+P) with a CRF-regularized loss for precise blade localization; 2) a hyperprior-based autoencoder optimized for lossy compression; and 3) an extended bits-back coder with hierarchical models for fully lossless blade reconstruction. Furthermore, our ROI framework removes the sequential dependency in bits-back coding by reusing background-coded bits, enabling parallelized and efficient dual-mode compression. To the best of our knowledge, this is the first fully integrated learning-based ROI codec combining segmentation, lossy, and lossless compression, ensuring that subsequent defect detection is not compromised. Experiments on a large-scale wind turbine dataset demonstrate superior compression performance and efficiency, offering a practical solution for automated inspections.
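
The dual-mode idea, exact values inside the region of interest and coarse values elsewhere, can be shown with a toy sketch. It is a stand-in only: the paper uses a learned segmentation network, a hyperprior autoencoder, and a bits-back coder, none of which appear here. Instead, a given binary mask selects which pixels are kept exactly and which are snapped to the centre of a coarse quantization bin. The name `roi_quantize`, the bin size, and the pixel row are hypothetical.

```python
def roi_quantize(pixels, mask, background_step=32):
    """Keep masked pixels exactly; quantize the rest to coarse bin centres.

    pixels: list of integer intensities (e.g. 0..255).
    mask:   list of 0/1 flags, 1 marking the region of interest.
    """
    coded = []
    for value, keep in zip(pixels, mask):
        if keep:
            coded.append(value)  # lossless inside the region of interest
        else:
            bin_start = (value // background_step) * background_step
            coded.append(bin_start + background_step // 2)  # lossy background
    return coded


# One image row; the two middle pixels belong to the (hypothetical) blade region.
row = [12, 200, 201, 77]
mask = [0, 1, 1, 0]
coded = roi_quantize(row, mask)
```

With a bin size of 32 the background needs only 3 bits per pixel instead of 8, while the masked pixels remain bit-exact for later defect detection.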

Citations: 0
Human-Machine Co-Adaptation for Robot-Assisted Rehabilitation via Dual-Agent Multiple Model Reinforcement Learning (DAMMRL).
IF 8.9; Zone 1 (Computer Science); Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE); Pub Date: 2026-04-28; DOI: 10.1109/TNNLS.2026.3685832
Yang An, Yaqi Li, Hongwei Wang, Rob Duffield, Steven W Su

This study introduces a novel approach to robot-assisted ankle rehabilitation by proposing a dual-agent multiple model reinforcement learning (DAMMRL) framework, leveraging multiple model adaptive control (MMAC) and co-adaptive control strategies. In robot-assisted rehabilitation, one of the key challenges is modeling human behavior due to the complexity of human cognition and physiological systems. Traditional single-model approaches often fail to capture the dynamics of human-machine interactions. Our research employs a multiple model strategy, using simple submodels to approximate complex human responses during rehabilitation tasks, tailored to varying levels of patient incapacity. The proposed system's versatility is demonstrated in real experiments and simulated environments. Feasibility and potential were evaluated with 13 healthy subjects and nine patients with lower-limb motor disorders, yielding promising results that affirm the anticipated benefits of the approach. This study not only introduces a new paradigm for robot-assisted ankle rehabilitation but also opens the way for future research in adaptive, patient-centered therapeutic interventions.
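
The multiple-model strategy can be illustrated by its selection step: keep a bank of simple submodels and pick the one that best explains recent observations. The sketch assumes a sum-of-squared-errors criterion, a common choice in multiple model adaptive control, though the abstract does not state the authors' criterion. The linear submodels and the data are hypothetical toy values, not patient measurements.

```python
def select_submodel(submodels, inputs, observed):
    """Return the index of the submodel with the smallest sum of squared errors."""
    def squared_error(model):
        return sum((model(u) - y) ** 2 for u, y in zip(inputs, observed))

    errors = [squared_error(model) for model in submodels]
    return min(range(len(errors)), key=lambda i: errors[i])


# Hypothetical bank of linear response models: response = gain * input.
submodels = [lambda u: 0.5 * u, lambda u: 1.0 * u, lambda u: 2.0 * u]
inputs = [1.0, 2.0, 3.0]
observed = [1.1, 1.9, 3.2]  # toy responses, closest to gain 1.0
best = select_submodel(submodels, inputs, observed)
```

Each submodel stays simple, and the complexity of the human response is handled by switching between them as the observations change.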

Citations: 0
A Survey on Vision-Language-Action Models for Embodied AI.
IF 8.9 Zone 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-04-28 DOI: 10.1109/TNNLS.2025.3650584
Yueen Ma, Zixing Song, Yuzheng Zhuang, Jianye Hao, Irwin King

Embodied AI is widely recognized as a cornerstone of artificial general intelligence (AGI) because it involves controlling embodied agents to perform tasks in the physical world. Building on the success of large language models (LLMs) and vision-language models (VLMs), a new category of multimodal models, referred to as vision-language-action (VLA) models, has emerged to address language-conditioned robotic tasks in embodied AI by leveraging their distinct ability to generate actions. The recent proliferation of VLAs necessitates a comprehensive survey to capture the rapidly evolving landscape. To this end, we present the first survey on VLAs for embodied AI. This work provides a detailed taxonomy of VLAs, organized into three major lines of research. The first line focuses on individual components of VLAs. The second line is dedicated to developing VLA-based control policies adept at predicting low-level actions. The third line comprises high-level task planners capable of decomposing long-horizon tasks into a sequence of subtasks, thereby guiding VLAs to follow more general user instructions. Furthermore, we provide an extensive summary of relevant resources, including datasets, simulators, and benchmarks. Finally, we discuss the challenges facing VLAs and outline promising future directions in embodied AI. A curated repository associated with this survey is available at: https://github.com/yueen-ma/Awesome-VLA.

Citations: 0
Dual-Path Conditional Diffusion Model With Attribute Consistency for Zero-Shot Fault Diagnosis
IF 10.4 Zone 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-04-27 DOI: 10.1109/tnnls.2026.3683414
Wenjie Liao, Like Wu, Shihui Xu, Shigeru Fujimura
Citations: 0