
Applied Soft Computing: Latest Articles

PhysDiffWind: A physics-constrained retrieval-augmented diffusion framework for offshore wind speed forecasting
IF 6.6 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-13 | DOI: 10.1016/j.asoc.2026.114647
Wei Dong, Jinxing Che, Yan Mei, Lei Zhou, Yuhua Zhang
Offshore wind speed forecasting remains challenging due to strong variability, nonlinear dynamics, and the limited use of physical constraints. Existing models rarely incorporate key physical factors such as the pressure gradient and wind speed relationship, the effects of temperature and humidity on air density, and historical wind patterns, while underutilizing relevant past meteorological scenarios. To address these limitations, we propose PhysDiffWind, a physics-constrained retrieval-augmented diffusion framework for accurate probabilistic offshore wind forecasting. The framework integrates a multi-module cooperative architecture. Specifically, the retrieval-augmented module employs a multi-head temporal convolutional network to encode historical multivariate sequences and dynamically retrieves representative historical patterns from a database via similarity-based embedding matching. Meanwhile, the physics-and-context-aware side information module incorporates temporal features with environmental variables such as pressure gradient and air density to provide structured guidance for the diffusion process. Furthermore, the diffusion module adopts a transformer-based conditional architecture to implement a two-stage modeling approach: forward perturbation and reverse denoising, and generates physically consistent and probabilistically expressive wind speed distributions under the modulation of retrieval samples and physical side information. Extensive experiments on real-world offshore wind datasets show that PhysDiffWind reduces the mean squared error (MSE) by up to 55.1 % and the continuous ranked probability score (CRPS) by 49.3 % compared with state-of-the-art baselines. These results confirm the framework’s effectiveness in capturing nonlinear atmospheric dynamics and improving forecasting reliability for wind farm operations.
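The CRPS quoted in this abstract scores a whole predictive distribution against the observed wind speed. As a rough illustration only (this is not the authors' evaluation code), a sample-based estimate can be written with the standard identity CRPS ≈ E|X - y| - 0.5·E|X - X'|, where X and X' are independent draws from the forecast. The function name `crps_ensemble` and the toy wind speeds are assumptions made for this sketch.

```python
def crps_ensemble(samples, observation):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|.

    `samples` stands in for draws from a probabilistic forecast
    (for example, diffusion-model outputs); lower is better.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one forecast sample")
    # Average absolute error of the samples against the observation.
    term_obs = sum(abs(x - observation) for x in samples) / n
    # Average absolute difference between all sample pairs (spread term).
    term_spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term_obs - 0.5 * term_spread


if __name__ == "__main__":
    # Hypothetical forecast samples (m/s) and an observed wind speed.
    forecast = [7.2, 8.1, 7.8, 9.0, 8.4]
    print(crps_ensemble(forecast, 8.0))
```

With a single sample the estimate reduces to the absolute error, which is why CRPS is often described as a probabilistic generalisation of MAE.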
Applied Soft Computing, vol. 190, Article 114647
Citations: 0
3D-NoduleNet: A comprehensive framework for benign and malignant pulmonary nodule classification
IF 6.6 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-13 | DOI: 10.1016/j.asoc.2026.114645
Huihui Yu, Qun Dai, Yanfu Wu
Accurate pulmonary nodule classification is essential for lung cancer diagnosis but poses significant challenges due to class imbalance, high appearance heterogeneity, and subtle inter-class variations. To tackle these challenges, this paper proposes 3D-NoduleNet, a clinical-semantics-driven, unified framework that integrates detection and segmentation into a task-oriented pipeline for nodule classification. The framework begins with the Region Proposal and Segmentation Module (RPSM), which adopts a detection-guided segmentation strategy. The region proposals generated by the 3D Faster R-CNN are used to crop ROIs, which are subsequently segmented by the 3D BCDU-Net into aligned, morphologically consistent masks that explicitly instantiate two key clinical semantic cues: texture and morphology. Next, the Multi-view Feature Extraction Module (MFEM) employs a dual-path architecture under a refine-then-fuse scheme. Within each path, an internal cascade refines view-specific semantic features through multi-scale contextual aggregation and sequential attention; the outputs of the two paths are then fused to leverage complementary clinical cues. Finally, the Pulmonary Nodule Classification Module (PNCM) adopts a dual-supervision strategy: the binary cross-entropy loss provides primary instance-level supervision, while the auxiliary supervised contrastive loss regularizes the feature space by clustering same-class nodules and separating different-class ones, which enhances separability and improves benign-malignant classification. Through this coordinated semantic flow, the modules form a tightly integrated framework that systematically transforms clinical semantics from instantiation to discrimination. Experiments on LIDC-IDRI demonstrate the superior and well-balanced performance of 3D-NoduleNet over existing methods, with 92.50% sensitivity and a 90.63% F1-score alongside competitive accuracy and specificity. Ablation studies validate the effectiveness of key components and strategies, highlighting the framework's robustness and clinical applicability.
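The dual-supervision idea in this abstract (binary cross-entropy as the primary term, a supervised contrastive term as a regularizer) can be sketched in a few lines. The version below is a minimal pure-Python illustration, not the paper's implementation: the contrastive term follows the commonly used supervised contrastive formulation, and the temperature, the weight `contrastive_weight`, and all function names are assumptions chosen for the example. Embeddings are assumed to be L2-normalized.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean BCE over predicted malignancy probabilities."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def supervised_contrastive(embeddings, labels, temperature=0.1):
    """Pulls same-class embeddings together, pushes other classes apart."""
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a same-class partner contributes nothing
        logits = {j: dot(embeddings[i], embeddings[j]) / temperature
                  for j in range(n) if j != i}
        peak = max(logits.values())  # shift for numerical stability
        log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
        total -= sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def dual_supervision_loss(probs, embeddings, labels, contrastive_weight=0.5):
    """Primary BCE plus a weighted auxiliary contrastive term (weight is hypothetical)."""
    return (binary_cross_entropy(probs, labels)
            + contrastive_weight * supervised_contrastive(embeddings, labels))
```

When same-class embeddings already coincide, the contrastive term is close to zero and the total is dominated by the cross-entropy; when classes are interleaved in feature space, the auxiliary term grows and penalizes the representation.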
Applied Soft Computing, vol. 191, Article 114645
Citations: 0
BGFF-UNet: A lightweight network based on boundary-guided feature fusion mechanism for skin lesion segmentation
IF 6.6 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-13 | DOI: 10.1016/j.asoc.2026.114658
Jian Li, Hengxu Guan, Jiawei Wang, Jiawei Zhao, Xinglei Lin, Junrui Kang, Jie Luo, Huiling Chen
Accurate segmentation of skin lesion areas is crucial for the early detection of melanoma. However, there is still a significant gap in the effective application of existing models within computationally constrained environments. To this end, the Lightweight Boundary-Guided Feature Fusion Network (BGFF-UNet) is proposed. This model is designed to significantly reduce computational load and parameter count while ensuring competitive segmentation accuracy. BGFF-UNet's core contribution lies in its integration of boundary refinement and a lightweight architecture. Specifically, it employs a Curvature-Driven Adaptive Polygon Fitting with Keypoint Detection (CDAP-KD) to facilitate a unique boundary refinement mechanism for precise boundary prompt generation, as well as a novel Boundary-Guided Feature Fusion (BGFF) module. To ensure efficiency, the model also incorporates Grouped Multi-Axial Hadamard Product Attention (GHPA), Depthwise Separable Convolutions (DWSC) and Efficient Channel Attention (ECA) mechanisms. Extensive evaluations on the ISIC 2017 and ISIC 2018 datasets demonstrate that the accuracy of BGFF-UNet is competitive with that of the advanced lightweight model, LB-UNet. Notably, its parameter count is reduced by 37.75 % to 23.96 K, and its computational load is reduced by 40.97 % to 57.85 M FLOPs compared to LB-UNet. This research offers a promising solution for high-efficiency, high-accuracy melanoma segmentation in settings with limited resources.
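Depthwise separable convolutions are one of the efficiency mechanisms named in this abstract. To make the saving concrete, the sketch below compares weight counts for a standard convolution and its depthwise separable counterpart. The layer sizes are hypothetical and bias terms are ignored; nothing here is taken from the BGFF-UNet architecture itself.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard k x k convolution (no bias)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Depthwise k x k filter per input channel, then a 1 x 1 pointwise mix."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 3 x 3 kernel, 32 input channels, 64 output channels.
    k, c_in, c_out = 3, 32, 64
    full = standard_conv_params(k, c_in, c_out)
    light = depthwise_separable_params(k, c_in, c_out)
    print(full, light, round(full / light, 2))
```

For this example layer the separable form needs 2,336 weights instead of 18,432, which is the kind of per-layer reduction that makes sub-100K-parameter segmentation networks feasible.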
Applied Soft Computing, vol. 190, Article 114658
Citations: 0
Accurate solution of high-dimensional partial differential equations: Research on data-physics collaborative modeling and adaptive sampling
IF 6.6 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-13 | DOI: 10.1016/j.asoc.2025.114472
Haoran Du, Zhao Liu, Ping Zhu
Solving high-dimensional partial differential equations (PDEs) remains a fundamental challenge in modeling complex systems in science and engineering. Traditional methods often suffer from significant accuracy limitations due to the "curse of dimensionality." In this work, we propose a bidirectionally coupled data-physics collaborative modeling framework that integrates a Variational Autoencoder (VAE) with a Physics-Informed Neural Network (PINN), significantly improving the solution accuracy of high-dimensional PDEs. On the data-driven side, the VAE compresses high-dimensional data into a latent space to extract essential features; on the physics-constrained side, the PINN embeds physical priors via residual minimization, achieving deep integration of physical laws and data representations. Building upon this, we introduce a dynamically enhanced version, VAE-PINN-DS, which employs an adaptive sampling strategy to allocate computational resources to regions with high residual gradients, low latent-space density, and high uncertainty. Numerical experiments demonstrate the effectiveness of VAE-PINN-DS in solving the Poisson equation (5D–15D), a non-uniform elliptic equation (10D), and the 20-dimensional Korteweg-de Vries (KdV) equation. Compared with the baseline PINN and other mainstream methods such as Galerkin and Deep Ritz, VAE-PINN-DS reduces the relative L2 error by 1–3 orders of magnitude and exhibits strong robustness in ultra-high-dimensional scenarios. These results highlight that the synergy between data-driven dimensionality reduction and physics-constrained learning provides a scalable new paradigm for solving high-dimensional PDEs.
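The adaptive sampling step described in this abstract ranks regions by three signals: residual gradient, latent-space density, and uncertainty. One plausible way to combine them, shown purely as a sketch, is to min-max normalize each signal, invert the density so that sparse regions score higher, and keep the top-k candidates. The equal default weights, the normalization, and the name `adaptive_select` are assumptions; the abstract does not specify how the criteria are actually combined.

```python
def _normalize(values):
    """Min-max scale to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def adaptive_select(residual_grad, latent_density, uncertainty, k,
                    weights=(1.0, 1.0, 1.0)):
    """Return indices of the k candidate points most worth resampling.

    High residual gradient, LOW latent density, and high uncertainty
    all raise a candidate's score (weights are hypothetical).
    """
    w_r, w_d, w_u = weights
    r = _normalize(residual_grad)
    d = _normalize(latent_density)
    u = _normalize(uncertainty)
    scores = [w_r * ri + w_d * (1.0 - di) + w_u * ui
              for ri, di, ui in zip(r, d, u)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Four hypothetical collocation candidates.
    picked = adaptive_select(
        residual_grad=[0.1, 5.0, 0.2, 4.0],
        latent_density=[0.9, 0.1, 0.8, 0.2],
        uncertainty=[0.1, 0.9, 0.2, 0.8],
        k=2,
    )
    print(picked)
```

In a training loop, the selected indices would typically receive additional collocation points in the next iteration, concentrating compute where the PINN residual is changing fastest.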
Applied Soft Computing, vol. 191, Article 114472
Citations: 0
MSMT-Net: Multi-scale multi-task network for capsular edge recognition and instrument segmentation of cataract surgery
IF 6.6 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-13 | DOI: 10.1016/j.asoc.2026.114654
Shaofeng Han, Mingfeng Lu, Yurun Liu, Yajing Pei, Ke Ma
The continuous curvilinear capsulorhexis (CCC) is one of the most critical steps in cataract surgery, as its quality critically determines intraocular lens implantation success and final visual acuity. However, automatic recognition of the capsular edge poses three challenges: high capsule transparency, intraoperative eyeball motion, and visual artifacts. To address these, we propose MSMT-Net, a Multi-Scale, Multi-Task Network for simultaneous capsular edge detection and surgical instrument segmentation. The core architecture is a dual-branch encoder (CNN and Transformer) that jointly captures local textures for transparent edge resolution and global dependencies for motion robustness. A Hybrid Feature Fusion Block (HFFB) ensures robust cross-branch complementarity. To further refine the multi-task performance, the architecture employs several multi-scale contextual modules: the multi-layer context fusion block (MCFB), multi-receptive field fusion block (MRFB), and channel transformer block (CTB) handle segmentation, while the layer transformer block (LTB) is dedicated to ensuring high-continuity edge detection. Crucially, a novel compensation block is integrated for precise eye center estimation and robust reconstruction of the capsulorhexis trajectory, enabling quantitative quality assessment and robot guidance. Experiments on the self-established CapsDet dataset show that MSMT-Net outperforms state-of-the-art methods, particularly in transparent edge detection, achieving 78.24 % mean IoU and 86.40 % mean Dice for capsular edge and pupil recognition, and 92.83 % IoU and 94.09 % Dice for instrument segmentation. These results demonstrate the network’s robustness across tasks and its potential as a reliable foundation for quantitative evaluation of capsulorhexis quality and robot-assisted surgical guidance. The code is available at https://github.com/hanshaofeng8/MSMT-Net.
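The IoU and Dice figures in this abstract are overlap measures between a predicted mask and a ground-truth mask. For readers unfamiliar with them, the sketch below computes both for flat binary masks. Returning 1.0 when both masks are empty is a convention assumed here, and the function names are illustrative.

```python
def iou(pred, target):
    """Intersection over union for flat binary masks (lists of 0/1)."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    return 1.0 if union == 0 else inter / union  # empty-vs-empty treated as perfect


def dice(pred, target):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    size = sum(1 for p in pred if p) + sum(1 for t in target if t)
    return 1.0 if size == 0 else 2.0 * inter / size


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]
    reference = [1, 0, 1, 0]
    print(iou(predicted, reference), dice(predicted, reference))
```

For any single mask pair the two are related by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU, which matches the pattern in the reported numbers.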
Applied Soft Computing, vol. 190, Article 114654
Citations: 0
A cross-task segmentation network for broiler phenotypic measurement via loss-guided masking
IF 6.6 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-13 | DOI: 10.1016/j.asoc.2026.114596
Xupeng Kou, Hongcheng Xue, Xiaobei Zhao, Yakun Yang, Longhe Wang, Lin Li
In poultry farming, non-invasive and accurate measurement of broiler chickens' body-size phenotypic traits is crucial for high-quality breeding and intelligent farming. Existing deep learning-based segmentation methods offer solutions, but their segmentation accuracy for body-size phenotypes still leaves room for improvement. To address this, we propose SegXTR (X-shape Transformer-based Segmentation Network), a multi-task learning-based phenotypic segmentation network for broiler chickens. The network combines a masked image modeling task with the segmentation task, enhancing segmentation accuracy and feature representation capability through multi-task collaboration. Specifically, we design two independent encoders to process the original and masked images. The original-image encoder focuses on extracting segmentation features, while the masked-image encoder adopts an adaptive masking strategy that focuses on areas with high reconstruction loss to enhance contextual feature learning. To achieve complementary and consistent cross-task features, we fuse features at the encoder output using a linear attention mechanism. Extensive experiments on the BroilerZTA dataset demonstrate that our method achieves Dice similarity coefficients of 0.9326, 0.9273, and 0.6089 for the segmentation of three traits. This study provides a more accurate and feasible method for intelligent phenotypic segmentation in poultry farming. The code is available at https://github.com/Github-XKou/SegXTR.
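Loss-guided masking, as named in the title, can be read as choosing which image patches to hide based on how poorly they were reconstructed previously. The sketch below shows one plausible reading: mask the fraction of patches with the highest reconstruction loss. The function name, the `mask_ratio` parameter, and the tie-breaking by patch index are assumptions; the authors' repository should be consulted for the actual strategy.

```python
def loss_guided_mask(patch_losses, mask_ratio=0.5):
    """Return a boolean list: True where a patch should be masked.

    Patches with the highest reconstruction loss are masked first
    (one interpretation of "focusing on areas with high reconstruction loss").
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must lie in [0, 1]")
    n = len(patch_losses)
    n_masked = int(mask_ratio * n)
    # Sort patch indices by loss, largest first; ties keep the lower index first.
    order = sorted(range(n), key=lambda i: (-patch_losses[i], i))
    chosen = set(order[:n_masked])
    return [i in chosen for i in range(n)]


if __name__ == "__main__":
    # Hypothetical per-patch reconstruction losses from a previous step.
    print(loss_guided_mask([0.1, 0.9, 0.5, 0.3], mask_ratio=0.5))
```

Compared with uniform random masking, this concentrates the reconstruction task on the regions the model currently finds hardest, which is the stated motivation for the adaptive strategy.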
Applied Soft Computing, vol. 191, Article 114596
Citations: 0
The cooperative deployment optimization of multiple unmanned aerial vehicles considering building obstruction and complex threats 考虑建筑物障碍物和复杂威胁的多架无人机协同部署优化
IF 6.6; Zone 1 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2026-01-13; DOI: 10.1016/j.asoc.2026.114603
Xiaojie Jin, Zhihao Luo, Zengxin Chen, Jianmai Shi
Unmanned Aerial Vehicle (UAV)-enabled communication provides a promising solution for urban wireless coverage. However, physical buildings and complex threats in urban environments pose significant challenges to communication quality and deployment reliability. Motivated by this, we investigate a novel multi-UAV cooperative deployment problem considering building obstruction and complex threats (MUCDP-BT). A grid-based environmental model is developed to represent buildings and cone-shaped threat zones, thereby defining feasible deployment spaces. To account for building-induced signal obstruction, a binary channel model is used to evaluate received signal quality under Line-of-Sight (LoS) or Non-Line-of-Sight (NLoS) conditions. On this basis, an optimization framework is developed for multi-UAV cooperative deployment, jointly considering spatial feasibility, signal obstruction, and communication service constraints. A space-division enhanced biogeography-based optimization (SDE-BBO) algorithm is developed to solve the problem. Extensive experiments, including multi-scale random instances and a real-world case in Changsha, are conducted to validate the effectiveness of the proposed method.
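The binary channel idea in this abstract (a link is either LoS or NLoS depending on building obstruction) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration only: the height-map grid, the sampling-based visibility test, the function names, and the path-loss constants are hypothetical and are not taken from the paper.

```python
import math

def has_line_of_sight(heights, uav, user, samples=200):
    """Return True if the straight segment from uav to user clears every building.

    heights[row][col] is the building height of a grid cell (0 = open ground).
    uav and user are (x, y, z) points in cell units. Sampling the segment is an
    approximation chosen for brevity (hypothetical, not the paper's method).
    """
    (x0, y0, z0), (x1, y1, z1) = uav, user
    for i in range(1, samples):
        t = i / samples
        x = x0 + t * (x1 - x0)
        y = y0 + t * (y1 - y0)
        z = z0 + t * (z1 - z0)
        # The link is blocked if the ray passes below the roof of the cell it is in.
        if z < heights[int(y)][int(x)]:
            return False
    return True

def received_power_dbm(heights, uav, user, tx_dbm=20.0, nlos_penalty_db=20.0):
    """Binary channel: one path-loss value for LoS, a fixed extra loss for NLoS.

    All constants are placeholders (assumptions), not values from the paper.
    """
    distance = math.dist(uav, user)
    path_loss_db = 20.0 * math.log10(max(distance, 1.0)) + 40.0
    if not has_line_of_sight(heights, uav, user):
        path_loss_db += nlos_penalty_db
    return tx_dbm - path_loss_db

# A 5 x 5 grid with a single 10-unit-tall building in the centre cell.
grid = [[0.0] * 5 for _ in range(5)]
grid[2][2] = 10.0

clear_link = ((0.5, 0.5, 5.0), (4.5, 0.5, 0.0))    # runs along an empty row
blocked_link = ((0.5, 2.5, 5.0), (4.5, 2.5, 0.0))  # crosses the building

p_los = received_power_dbm(grid, *clear_link)
p_nlos = received_power_dbm(grid, *blocked_link)
```

Both links have the same length, so in this toy setup the two received powers differ only by the assumed NLoS penalty.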
Citations: 0
Infrared-based real-time leakage detection in earth–rock dams using an enhanced YOLO algorithm integrated with a biomimetic quadruped robot
IF 6.6; Zone 1 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2026-01-12; DOI: 10.1016/j.asoc.2026.114609
Zhongdi Rong, Shengyi Cong, Liang Tang, Shuang Tian, Siang Huat Goh, Lei Sheng, Yizhe Nie
Dam leakage poses a critical challenge to the stability and safety of earth–rock dams. However, traditional manual inspection methods are inefficient, limited in scope, and often fail in complex environments due to subjectivity, time consumption, and human error. To address these limitations, this study integrates, for the first time, an optimized You Only Look Once Leakage Enhanced (YOLO-LE) algorithm with infrared thermography and a biomimetic quadruped robot to enable real-time, autonomous leakage detection. The proposed YOLO-LE framework incorporates squeeze-and-excitation attention mechanisms, adaptive spatial feature fusion, and transfer learning to improve detection precision, robustness against environmental interference (e.g., humidity and vegetation), and recognition of small-scale leakage points. The proposed framework is verified by indoor and field experiments using an FLIR A50 infrared camera mounted on a quadruped robot, selected for its terrain adaptability. A custom dataset of 1000 thermal images, augmented with rotation and noise injection, is used for training and evaluation. The results show that the proposed YOLO-LE framework achieves an average precision (AP) of 0.873 and an F1-score of 0.865, outperforming the YOLOv5, single-shot multi-box detector (SSD), and Faster R-CNN models in AP by 7.2%, 6.9%, and 1.4%, respectively. In addition, comparative experiments under different environmental disturbances demonstrate the superior resilience of the proposed framework, with an inference speed of 51 frames per second, ensuring real-time monitoring capability. Finally, the results validate the feasibility of combining deep learning with robotic systems for dam safety, providing a scalable, accurate, and automated solution for monitoring critical infrastructure.
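For readers unfamiliar with the squeeze-and-excitation attention mentioned in this abstract, the generic mechanism can be sketched in a few lines of plain Python. This is the standard squeeze-and-excitation pattern (global average per channel, a small bottleneck, sigmoid gates, channel rescaling), not the YOLO-LE implementation; the function name, the toy feature map, and the weights below are hypothetical.

```python
import math

def squeeze_excite(feature_map, w_reduce, w_expand):
    """Rescale each channel of a feature map by a learned gate in (0, 1).

    feature_map: list of C channels, each an H x W list of lists.
    w_reduce:    (C // r) x C weight matrix for the bottleneck layer.
    w_expand:    C x (C // r) weight matrix that restores C gate logits.
    Biases are omitted to keep the sketch short.
    """
    # Squeeze: one global average per channel.
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation: bottleneck with ReLU, then expand and apply a sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w_reduce]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w_expand
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    return [[[v * g for v in row] for row in ch] for ch, g in zip(feature_map, gates)]

# Toy input: 2 channels of size 2 x 2, bottleneck width 1 (all values hypothetical).
fmap = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
w_reduce = [[1.0, -1.0]]
w_expand = [[2.0], [-2.0]]
reweighted = squeeze_excite(fmap, w_reduce, w_expand)
```

With these toy weights the first channel is emphasized and the second is suppressed, which is the behavior such a block is meant to learn from data.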
Citations: 0
A novel method for large-scale group decision-making with application to e-commerce software system evaluation
IF 6.6; Zone 1 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2026-01-12; DOI: 10.1016/j.asoc.2026.114595
Chuan Yue

BACKGROUND

Large-scale group decision-making (LSGDM) in big data environments faces challenges in robust data center construction, objective expert weighting, and efficient information fusion.

OBJECTIVE

This study aims to develop a novel LSGDM framework integrating a Golden Ratio-based data center and an inversion-based data quality metric to improve ranking stability and decision reliability.

METHODS

A GR-based data center was introduced to replace conventional mean/median centers, alongside an inversion-number-driven quality metric for expert weighting and a scalable aggregation technique for converting crisp data into intuitionistic fuzzy matrices. The framework was validated through dynamic experiments and sensitivity analysis.

RESULTS

The GR-based center outperformed mean/median centers in 95% of test scenarios. The inversion-based method achieved perfect ranking consistency (Kendall’s τ=1), showing a 50% improvement over entropy-based methods (τ=2/3), and maintained 100% ranking stability under parameter variations—20 times higher than entropy-based approaches.

CONCLUSION

The proposed framework offers a robust, quantitatively validated solution for LSGDM in data-intensive environments, with significant advantages in consistency and scalability.
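The reported Kendall's τ values can be related to inversion counts with a short worked sketch. The four-item rankings below are hypothetical and are not the paper's data. They only show that a single swapped pair among four alternatives gives τ = 2/3, and that reading the "50% improvement" as a relative gain of τ = 1 over τ = 2/3 is arithmetically consistent (the abstract does not state how the figure was computed, so this reading is an assumption).

```python
from itertools import combinations

def count_inversions(ranking, reference):
    """Number of item pairs that ranking orders differently from reference."""
    position = {item: i for i, item in enumerate(ranking)}
    sequence = [position[item] for item in reference]
    return sum(1 for a, b in combinations(sequence, 2) if a > b)

def kendall_tau(ranking, reference):
    """Kendall's tau for two tie-free rankings: 1 - 2 * inversions / total pairs."""
    n = len(reference)
    total_pairs = n * (n - 1) // 2
    return 1.0 - 2.0 * count_inversions(ranking, reference) / total_pairs

reference = ["A", "B", "C", "D"]        # hypothetical ground-truth order
identical = ["A", "B", "C", "D"]        # no inversions
one_swap = ["A", "C", "B", "D"]         # exactly one inverted pair (B, C)

tau_identical = kendall_tau(identical, reference)   # 1 - 0 = 1
tau_one_swap = kendall_tau(one_swap, reference)     # 1 - 2 * 1 / 6 = 2/3
relative_gain = (tau_identical - tau_one_swap) / tau_one_swap  # (1 - 2/3) / (2/3) = 0.5
```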
Citations: 0
Small aerial object detection through FPN-DETR integrated with a novel alignment proposal network
IF 6.6; Zone 1 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2026-01-12; DOI: 10.1016/j.asoc.2026.114620
Usman Ahmad, Tianlei Ma, Jing Liang, Ponnuthurai Nagaratnam Suganthan, Kunjie Yu, Faisal Mehmood, Farhad Banoori
In the domain of aerial imagery analysis, Small Aerial Object Detection (SAOD) presents significant challenges due to extensive scale variations, diverse orientations, and cluttered arrangements. Existing methods rely on anchor-based boxes or dense points, which involve complex manual steps such as anchor generation, transformation, and non-maximum suppression reasoning. This study proposes an innovative model that incorporates a Feature Pyramid Network into a Detection Transformer (FPN-DETR) to effectively address these challenges. The proposed approach exploits the strengths of detection Transformers in modeling long-range dependencies, enabling a more comprehensive understanding of the aerial scene context, extraction of rich and scale-invariant features, and enhancement of detection accuracy for small aerial objects across varied scales and orientations. Additionally, this study develops an Alignment Proposal Network (APN) with a novel loss function to further enhance FPN-DETR, resulting in the FPN-DETR+APN model. This network eliminates the time-consuming process of creating hand-crafted rotational anchors, which leads to computational complexity in existing conventional oriented detectors. Furthermore, FPN-DETR+APN generates oriented proposals that adeptly capture the diverse orientations of small objects in aerial scenes and provides improved positional priors for feature pooling, thereby enhancing cross-attention modulation in the Transformer decoder. The proposed method demonstrates superior performance in detecting small aerial objects, surpassing state-of-the-art detection frameworks in terms of precision, recall, and overall detection robustness. This study achieves 83.25% mAP on DOTA-v1.0, 63.07% mAP on DOTA-v2.0, 97.53% on HRSC2016, 95.98% on SSDD, 60.78% on VisDrone2021, and 94.80% on HRSID datasets.
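To make the notion of an oriented proposal concrete, the sketch below converts a rotated box into its four corner points. The (cx, cy, w, h, theta) parameterization and the function name are assumptions made for illustration; the abstract does not specify how FPN-DETR+APN encodes its proposals.

```python
import math

def oriented_box_corners(cx, cy, w, h, theta):
    """Corners of a w x h box centred at (cx, cy) and rotated by theta radians.

    Hypothetical helper: rotates the four half-extent offsets with a standard
    2D rotation matrix and translates them to the box centre.
    """
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [
        (cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
        for dx, dy in offsets
    ]

# With theta = 0 the box is axis-aligned; a quarter turn swaps its extents.
axis_aligned = oriented_box_corners(0.0, 0.0, 4.0, 2.0, 0.0)
quarter_turn = oriented_box_corners(0.0, 0.0, 4.0, 2.0, math.pi / 2)
```

An axis-aligned anchor is the special case theta = 0, which is one way to see why a single learned angle can stand in for a hand-crafted set of rotated anchors.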
Citations: 0