
Latest articles in Knowledge-Based Systems

SGC: A self-guided cascade multitask model for low-light object detection
IF 7.6 Region 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-04-08 Epub Date: 2026-02-05 DOI: 10.1016/j.knosys.2026.115462
Jiakun Jin, Junchao Zhang, Yidong Luo, Jiandong Tian
Outdoor scenes often suffer from insufficient and non-uniform illumination, leading to object detection (OD) failures. This issue has garnered research attention, with the mainstream solution being to improve the model's feature extraction capability through cascaded feature enhancement modules. However, such approaches increase the model's complexity, and the enhancement effect is highly dependent on the similarity between the training and testing data. Alternatively, some methods incorporate parallel low-light image enhancement (LLE) modules to guide the training of object detection models. Nevertheless, due to the lack of object detection datasets containing paired bright and low-light images, these methods often require manually selecting appropriate pre-trained LLE models for different scenes, making end-to-end training challenging. In this paper, we aim to build an end-to-end LLE&OD cascade multitask model that leverages the strengths of both approaches. We use a new data augmentation technique to synthesize low-light images from normal-light object detection datasets. To mutually train the cascade model, a new self-guided loss is designed. By deconstructing and reorganizing the multitask model, the self-guided loss effectively steers the model away from local optima for single tasks, enabling the model to achieve superior performance compared to many state-of-the-art methods on several publicly available night scene datasets, as well as on a daytime scene dataset. The source code of the proposed method will be available at https://github.com/225ceV/SGC.
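The abstract does not specify the augmentation, so as a rough illustration of how low-light training pairs can be synthesized from a normal-light detection dataset, here is a minimal NumPy sketch using random gamma darkening plus additive noise; the function name `synthesize_low_light`, the parameter ranges, and the gamma-plus-noise degradation model are assumptions for illustration, not the paper's actual technique.

```python
import numpy as np

def synthesize_low_light(image, gamma_range=(2.0, 3.5), noise_sigma=0.02, rng=None):
    """Darken a normal-light image (HxWx3 floats in [0, 1]) with a random gamma
    curve plus additive noise; bounding-box labels stay valid for detector training."""
    rng = rng or np.random.default_rng()
    gamma = rng.uniform(*gamma_range)                       # larger gamma -> darker
    dark = np.power(np.clip(image, 0.0, 1.0), gamma)
    dark += rng.normal(0.0, noise_sigma, size=dark.shape)   # sensor-like noise
    return np.clip(dark, 0.0, 1.0)

frame = np.random.default_rng(0).random((480, 640, 3))      # stand-in daytime frame
low_light = synthesize_low_light(frame)
```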
Citations: 0
A frequency and spatial domain interactive fusion network for underwater image enhancement
IF 7.6 Region 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-04-08 Epub Date: 2026-02-05 DOI: 10.1016/j.knosys.2026.115499
Guangfeng Li, Shiming Sun, Chengwen Zhang, Guangsen Jiao
Underwater images often suffer from degradation such as color distortion and detail blurring due to light absorption and scattering. Although numerous underwater image enhancement methods have been proposed to improve visual quality, most of them are limited to processing spatial domain features and neglect the inherent frequency domain information, thereby affecting the enhancement performance. To address this issue, we propose a Frequency and Spatial domain Interactive Fusion Network (FSIF-Net) for underwater image enhancement. Specifically, we first design a Frequency Domain Enhancement Module (FDEM). This module leverages the feature decomposition capability of the wavelet transform alongside the frequency domain modeling advantages of the Fast Fourier Transform (FFT), effectively enhancing global color while improving edge and texture information. Secondly, we design a Local Detail Enhancement Module (LDEM), which utilizes horizontal, vertical, and diagonal convolutional operations to enhance anisotropic image features and introduces a sliding window mechanism to improve local detail enhancement capability. Finally, to achieve complementary fusion of frequency and spatial domain features, we design a Dual-domain Interactive Fusion Module (DIFM). This module adaptively acquires more representative frequency and spatial features through a weight-reshaping gating mechanism, followed by comprehensive fusion of the dual-domain features across both channel and spatial dimensions. Extensive experiments demonstrate that the proposed FSIF-Net significantly enhances the visual quality of underwater images and outperforms state-of-the-art methods in both quantitative and qualitative evaluations.
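As a rough companion to the FDEM idea, the sketch below shows a generic FFT-based enhancement block in PyTorch: features are moved to the frequency domain with `torch.fft.rfft2`, filtered by a learned 1x1 convolution over the real and imaginary parts, and transformed back with a residual connection. `FFTEnhance` and its layout are illustrative assumptions; the paper's module additionally uses wavelet decomposition and is not specified at this level of detail.

```python
import torch
import torch.nn as nn

class FFTEnhance(nn.Module):
    """Toy frequency-domain block: filter the 2-D spectrum of a feature map with a
    learned 1x1 convolution over its real/imaginary parts, then transform back."""

    def __init__(self, channels):
        super().__init__()
        self.freq_conv = nn.Conv2d(channels * 2, channels * 2, kernel_size=1)

    def forward(self, x):                            # x: (B, C, H, W)
        spec = torch.fft.rfft2(x, norm="ortho")      # complex (B, C, H, W//2 + 1)
        feat = torch.cat([spec.real, spec.imag], dim=1)
        real, imag = self.freq_conv(feat).chunk(2, dim=1)
        filtered = torch.complex(real, imag)
        out = torch.fft.irfft2(filtered, s=x.shape[-2:], norm="ortho")
        return out + x                               # residual connection

y = FFTEnhance(16)(torch.randn(1, 16, 64, 64))       # same shape as the input
```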
Citations: 0
RGNN3D: A hybrid radiomic graph neural network for 3D MRI glioma grading
IF 7.6 Region 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-25 Epub Date: 2026-01-16 DOI: 10.1016/j.knosys.2026.115343
Md Aiyub Ali, Md Shakhawat Hossain, Taslima Ferdaus Shuva, Muhammad Ali Abdullah Almoyad, Nabil Anan Orka, Risala Tasin Khan, M. Shamim Kaiser, Md Tanvir Rahman, Mohammad Ali Moni
The diagnosis of glioma, a complex and often deadly brain tumor, involves extensive medical examinations. Still, accurately grading and classifying gliomas is difficult, as different areas within the same tumor can exhibit varying characteristics. The integration of radiomics, a clinically relevant feature extraction method, with machine learning (ML) is becoming increasingly popular in addressing this issue, but several research gaps persist. To this end, this study proposes a novel deep neural network, RGNN3D, that combines Graph Neural Networks with LSTM layers to precisely grade gliomas in 3D magnetic resonance imaging (MRI) data. To train our proposed model, we meticulously extracted 112 radiomic biomarkers. Utilizing the biomarkers, RGNN3D constructs a graph, channels essential information through its layers, and preserves only pertinent information via its integrated memory cells. The proposed framework attained an accuracy of 98.58%, aligning with the performance of previous state-of-the-art architectures and surpassing prior radiomic-based ML models. We further employed an explainable AI approach (LIME) to highlight the most significant features, assisting radiologists in making more informed decisions. In short, RGNN3D offers a reliable and robust computer-aided solution for potential clinical application in the automated identification of gliomas.
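The abstract only names the ingredients (a graph built over 112 radiomic biomarkers, graph propagation, and LSTM memory cells), so the following is a minimal plain-PyTorch sketch of a graph-convolution-plus-LSTM grader; `GraphLSTMGrader`, the dense adjacency formulation, and the layer sizes are assumptions for illustration rather than the RGNN3D architecture.

```python
import torch
import torch.nn as nn

class GraphLSTMGrader(nn.Module):
    """Toy hybrid: one dense graph-convolution step over radiomic feature nodes,
    an LSTM over the aggregated node sequence, and a grade classifier."""

    def __init__(self, in_dim=112, hidden=64, grades=2):
        super().__init__()
        self.gc = nn.Linear(in_dim, hidden)        # shared node transform
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, grades)

    def forward(self, x, adj):
        # x: (B, N, in_dim) node features; adj: (B, N, N) row-normalized adjacency.
        h = torch.relu(adj @ self.gc(x))           # neighbourhood aggregation
        _, (h_n, _) = self.lstm(h)                 # keep only the final memory state
        return self.head(h_n[-1])                  # (B, grades) logits

x = torch.randn(4, 10, 112)                                  # 112 radiomic features
adj = torch.softmax(torch.randn(4, 10, 10), dim=-1)          # toy normalized graph
logits = GraphLSTMGrader()(x, adj)
```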
Citations: 0
Multi-View fusion feature representation learning for drug-target interaction prediction
IF 7.6 Region 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-25 Epub Date: 2026-01-23 DOI: 10.1016/j.knosys.2026.115364
Hua Duan, Junyue Dong, Yufei Zhao, Shiduo Wang, Wenhao Wang
Prediction of Drug-Target Interactions (DTI) is crucial for drug discovery. Heterogeneous graph neural networks (HGNNs) provide an efficient computational approach by modeling complex biological networks, overcoming the high cost and time constraints associated with traditional experimental methods. However, existing HGNNs primarily rely on meta-path-based topological learning, often overlooking attribute similarities between nodes and inherent structural consistency. This single-perspective learning mechanism limits their ability to leverage multi-source heterogeneous information, resulting in poor generalization performance, particularly under sparse data scenarios. To address these issues, this paper proposes MV-HGNN, a multi-view fusion model. It learns comprehensive feature embeddings for drugs and proteins from three complementary perspectives: 1) A View-Specific Topology Embedding Module, which captures topology-driven representations through graph propagation and aggregation; 2) A Structure-Consensus-Aware Cross-Domain Alignment Module, which identifies latent structural consistency by mining original node features, thereby compensating for missing topological information in sparse networks; 3) A Latent Space Semantic Regularization Aggregation Module, which enhances generalization with scarce samples by pulling semantically similar nodes closer in the refined latent embedding space. The complementary features learned from these topological, structural, and semantic views are fused via an adaptive attention mechanism. The DTI prediction task is formulated as a classification problem on a constructed Drug-Protein Pair (DPP) graph. Experimental results demonstrate that MV-HGNN significantly outperforms existing baseline methods across multiple metrics.
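As a small illustration of fusing complementary view embeddings "via an adaptive attention mechanism", here is a generic attention-weighted fusion over three view embeddings in PyTorch; `AttentiveViewFusion` and its scoring network are assumed for the sketch and are not the MV-HGNN implementation.

```python
import torch
import torch.nn as nn

class AttentiveViewFusion(nn.Module):
    """Score each view embedding, softmax-normalize the scores across views, and
    return the attention-weighted sum as the fused representation."""

    def __init__(self, dim):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, 1))

    def forward(self, views):                          # views: (B, n_views, dim)
        w = torch.softmax(self.score(views), dim=1)    # (B, n_views, 1) weights
        return (w * views).sum(dim=1)                  # (B, dim) fused embedding

views = torch.randn(8, 3, 128)    # e.g. topology / structure / semantic embeddings
fused = AttentiveViewFusion(128)(views)
```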
Citations: 0
A novel hybrid machine and deep learning approach for brain tumor classification based patient survival time prediction
IF 7.6 Region 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-25 Epub Date: 2026-01-12 DOI: 10.1016/j.knosys.2026.115322
Puligurti Suvarna Raju, P. Chitra, Battula Tirumala Krishna
Brain tumors pose a significant health challenge, necessitating efficient and automated detection methods for timely and accurate diagnosis. Manual examination of MRI scans is often hindered by tumor heterogeneity, including variations in size, shape, and location, which can delay assessment and reduce accuracy. To address these challenges, this research proposes a hybrid deep learning framework, the Multi-layer Extreme Learning Machine-Progressive Adjacent-layer Coordination Symmetric Cascade Network (MELM-PACSCN), designed for precise segmentation and classification of pre-processed MRI scans into pituitary, meningioma, glioma tumors, and the whole tumor area. MELM-PACSCN integrates the rapid-learning capabilities of the Multi-layer Extreme Learning Machine (MELM) with the multi-scale feature fusion of PACSCN to accurately delineate complex tumor boundaries. The Deep Discrete Wavelet Auto Encoder Transform (DDWAET) extracts critical intensity, texture, morphological, and wavelet-based features from segmented tumor images. Additionally, the Tactical Unit Algorithm (TUA) enhances optimization and stability by enabling global search and efficient data management, improving model robustness. For survival prediction, the Multi-Kernel Condition-Aware Neural Network (MKCANN) classifies patients into short-term, mid-term, or long-term survival groups using adaptive patient-specific kernels. Experimental evaluations demonstrate that the proposed model achieves superior performance, with accuracy and Dice scores of 99.15%/99% (BraTS2018), 99.52%/99.97% (BraTS2019), 98.94%/99.78% (BraTS2020), and 99.72%/99.63% (BraTS2021), surpassing widely used architectures such as 3D-UNet and ResUNet+. These results indicate enhanced tumor boundary localization, improved tumor-type recognition, and accurate survival-group prediction. The framework offers significant potential to support clinical decision-making in neuro-oncology by providing precise, reliable, and efficient tumor analysis.
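Since the reported results are accuracy and Dice scores on the BraTS datasets, a short reference implementation of the Dice coefficient for binary segmentation masks may help make the metric concrete; the helper below is the standard definition and is not specific to MELM-PACSCN.

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks: 2*|A intersect B| / (|A| + |B|)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

# Two overlapping 20x20 squares share a 15x15 region, so the score is about 0.56.
a = np.zeros((64, 64), dtype=np.uint8); a[10:30, 10:30] = 1
b = np.zeros((64, 64), dtype=np.uint8); b[15:35, 15:35] = 1
print(dice_score(a, b))
```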
Citations: 0
Cross-domain time-frequency Mamba: A more effective model for long-term time series forecasting
IF 7.6 Region 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-25 Epub Date: 2026-01-21 DOI: 10.1016/j.knosys.2026.115341
Yuhang Duan, Lin Lin, Jinyuan Liu, Qing Zhang, Xin Fan
Long-term time series forecasting (LTSF) is crucial in domains such as smart energy systems and the industrial Internet of Things. Existing methods face intertwined challenges in LTSF. Single-domain modeling often fails to capture local fluctuations and global trends, resulting in incomplete temporal representations. While attention-based models effectively capture long-range dependencies, their quadratic computational complexity limits their efficiency and scalability. Moreover, cross-scale conflicts frequently occur in long-term forecasting. Short-term patterns may interfere with long-term trends, thereby degrading prediction accuracy. To address these issues, we propose cross-domain time-frequency Mamba (CDTF-Mamba), which synergistically models time series in both the time and frequency domains. CDTF-Mamba's time-domain pyramid Mamba component disentangles multiscale patterns, while the frequency-domain decomposition Mamba component stabilizes state evolution and mitigates nonstationarity. We perform extensive experiments on 13 widely used benchmark datasets. Experimental results demonstrate that CDTF-Mamba achieves superior accuracy while maintaining high efficiency and strong scalability compared with state-of-the-art methods.
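The frequency-domain decomposition component is not detailed in the abstract; a common, simple stand-in is an FFT-based split of each series into a low-frequency trend and a residual, sketched below in PyTorch. The `keep` cutoff and the trend/residual split are illustrative assumptions, not CDTF-Mamba's actual decomposition.

```python
import torch

def frequency_decompose(x, keep=8):
    """Split series of shape (B, L) into a smooth trend and a residual by keeping
    only the `keep` lowest-frequency FFT coefficients for the trend."""
    spec = torch.fft.rfft(x, dim=-1)
    low = torch.zeros_like(spec)
    low[..., :keep] = spec[..., :keep]                # retain low frequencies only
    trend = torch.fft.irfft(low, n=x.shape[-1], dim=-1)
    return trend, x - trend                           # residual holds local fluctuations

t = torch.linspace(0, 8 * torch.pi, 336)
x = (torch.sin(t) + 0.1 * torch.randn_like(t)).unsqueeze(0)   # one noisy series
trend, resid = frequency_decompose(x)
```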
Citations: 0
UECNet: A unified framework for exposure correction utilizing region-level prompts
IF 7.6 Region 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-25 Epub Date: 2026-01-20 DOI: 10.1016/j.knosys.2026.115365
Shucheng Xia, Kan Chang, Yuqing Li, Mingyang Ling, Xuxin Tai, Yehua Ling, Yujian Yuan, Zan Gao
In real-world scenarios, complex illumination often causes improper exposure in images. Most existing correction methods assume uniform exposure degradation across the entire image, leading to suboptimal performance when multiple exposure degradations coexist in a single image. To address this limitation, we propose UECNet, a Unified Exposure Correction Network guided by region-level prompts. Specifically, we first derive five degradation-specific text prompts through prompt tuning. These prompts are fed into our Exposure Prompts Generation (EPG) module, which generates spatially adaptive, region-level descriptors to characterize local exposure properties. To effectively integrate these region-specific descriptors into the exposure correction pipeline, we design a Prompt-guided Token Mixer (PTM) module. The PTM enables global interactive modeling between high-dimensional visual features and region-level prompts, thereby dynamically steering the correction process. UECNet is built by incorporating EPG and PTM into a U-shaped Transformer backbone. Furthermore, we introduce SICE-DE (SICE-based Diverse Exposure), a new benchmark dataset reorganized from the well-known SICE dataset, to facilitate effective training and comprehensive evaluation. SICE-DE covers six distinct exposure conditions, including challenging severe over/underexposure and non-uniform exposure. Extensive experiments demonstrate that the proposed UECNet consistently outperforms state-of-the-art methods on multiple exposure correction benchmarks. Our code and the SICE-DE dataset will be available at https://github.com/ShuchengXia/UECNet.
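To make the "prompt-guided token mixer" idea tangible, here is a minimal cross-attention sketch in PyTorch in which visual tokens attend to a handful of region-level prompt tokens; `PromptTokenMixer`, the use of `nn.MultiheadAttention`, and the five-prompt example are assumptions for illustration, not the PTM module itself.

```python
import torch
import torch.nn as nn

class PromptTokenMixer(nn.Module):
    """Toy prompt-guided mixer: visual tokens attend to a few region-level prompt
    tokens through cross-attention, followed by a residual connection and LayerNorm."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens, prompts):
        # tokens: (B, N, dim) visual tokens; prompts: (B, P, dim) region-level prompts.
        mixed, _ = self.attn(query=tokens, key=prompts, value=prompts)
        return self.norm(tokens + mixed)

tokens = torch.randn(2, 256, 64)        # flattened visual tokens
prompts = torch.randn(2, 5, 64)         # five exposure-condition prompts
out = PromptTokenMixer(64)(tokens, prompts)
```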
Citations: 0
PYRA: A high-level linter for data science software
IF 7.6 Region 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-25 Epub Date: 2026-01-25 DOI: 10.1016/j.knosys.2026.115412
Greta Dolcetti, Vincenzo Arceri, Antonella Mensi, Enea Zaffanella, Caterina Urban, Agostino Cortesi
Due to its interdisciplinary nature, the development of data science software is particularly prone to a wide range of potential mistakes that can easily and silently compromise the final results. Several tools have been proposed that can help the data scientist in identifying the most common, low-level programming issues. However, these tools often fall short in detecting higher-level, domain-specific issues typical of data science pipelines, where subtle errors may not trigger exceptions but can still lead to incorrect or misleading outcomes, or unexpected behaviors. In this paper, we present PYRA, a static analysis tool that aims at detecting code smells in data science workflows. PYRA builds upon the Abstract Interpretation framework to infer abstract datatypes, and exploits such information to flag 16 categories of potential code smells concerning misleading visualizations, challenges for reproducibility, as well as misleading, unreliable or unexpected results. Unlike traditional linters, which focus on syntactic or stylistic issues, PYRA reasons over a domain-specific type system to identify data science-specific problems – such as improper data preprocessing steps and procedures’ misapplications – that could silently propagate through a data-manipulation pipeline. Beyond static checking, we envision tools like PYRA becoming integral components of the development loop, with analysis reports guiding correction and helping assess the reliability of machine learning pipelines. We evaluate PYRA on a benchmark suite of real-world Jupyter notebooks, showing its effectiveness in detecting practical data science issues, thereby enhancing transparency, correctness, and reproducibility in data science software.
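PYRA itself is built on abstract interpretation over a domain-specific type system, which is well beyond a few lines; as a toy illustration of the general linting idea (statically flagging a data-science code smell), the sketch below uses Python's `ast` module to report calls such as `df.dropna()` whose returned value is silently discarded. The smell list and `find_discarded_results` are hypothetical examples, not PYRA checks.

```python
import ast

SMELL_FUNCS = {"dropna", "fillna", "sort_values", "reset_index"}

def find_discarded_results(source):
    """Flag expression statements that call a pandas-style method returning a new
    object (e.g. df.dropna()) but never use the result."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Expr)
                and isinstance(node.value, ast.Call)
                and isinstance(node.value.func, ast.Attribute)
                and node.value.func.attr in SMELL_FUNCS):
            hits.append((node.lineno, node.value.func.attr))
    return hits

code = "df.dropna()\nclean = df.fillna(0)\n"
print(find_discarded_results(code))   # [(1, 'dropna')] -- line 2 assigns, so it is fine
```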
Citations: 0
AdaptTrack: Perception field adaptation with contrastive attention for robust visual tracking
IF 7.6 Region 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-25 Epub Date: 2026-01-21 DOI: 10.1016/j.knosys.2026.115369
Yongjun Wang, Xiaohui Hao
While transformer-based methods have advanced visual object tracking, existing approaches often struggle with complex scenarios due to their reliance on fixed perception fields, limited discriminative capabilities, and insufficient predictive modeling. Current solutions utilizing attention mechanisms and feature learning techniques have made progress but face inherent limitations in adapting to dynamic scenes and maintaining robust target discrimination. We propose AdaptTrack, an innovative Transformer-based tracking framework that systematically addresses three critical limitations in existing approaches: suboptimal perception field adaptation for capturing target-specific information, insufficient target-background discrimination in cluttered environments, and inadequate predictive modeling during challenging scenarios. The framework introduces three key technical components: (1) an Adaptive Perception Field Guidance Network that dynamically optimizes feature extraction through scene-aware field configuration, (2) a Contrastive-Guided Contextual Attention mechanism that enhances discrimination through structured contrast learning, and (3) a Predictive State Transition Network that improves robustness via probabilistic state modeling. Through these innovations, our approach effectively addresses the limitations of current methods through dynamic field adaptation, explicit contrast modeling, and robust state prediction. Extensive evaluations demonstrate state-of-the-art performance on seven benchmarks (77.3% AO on GOT-10k, 73.3% AUC on LaSOT, 85.4% AUC on TrackingNet) while maintaining real-time efficiency at 32.6 FPS.
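The abstract describes "structured contrast learning" for target-background discrimination without giving the loss; a common generic choice is an InfoNCE-style contrast between target features, positive target-region samples, and background negatives, sketched below. The function and its temperature are assumptions, not AdaptTrack's Contrastive-Guided Contextual Attention.

```python
import torch
import torch.nn.functional as F

def target_background_contrast(target, positive, negatives, tau=0.07):
    """InfoNCE-style loss: pull target features toward a positive target-region sample
    and push them away from background negatives."""
    t = F.normalize(target, dim=-1)                        # (B, D)
    p = F.normalize(positive, dim=-1)                      # (B, D)
    n = F.normalize(negatives, dim=-1)                     # (B, K, D)
    l_pos = (t * p).sum(-1, keepdim=True) / tau            # similarity to the positive
    l_neg = torch.einsum("bd,bkd->bk", t, n) / tau         # similarities to negatives
    logits = torch.cat([l_pos, l_neg], dim=1)              # (B, 1 + K)
    labels = torch.zeros(t.size(0), dtype=torch.long)      # correct class is index 0
    return F.cross_entropy(logits, labels)

loss = target_background_contrast(torch.randn(4, 128), torch.randn(4, 128),
                                  torch.randn(4, 16, 128))
```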
Citations: 0
REAL-SORT: RElation-aware for real-time multiple object tracking
IF 7.6 Region 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-03-25 Epub Date: 2026-01-20 DOI: 10.1016/j.knosys.2026.115373
Xinling Zhang, Huijuan Zhao, Shuangjiang He, Li Yu
Recent advancements in multi-object tracking (MOT) have accelerated progress in autonomous driving and human-computer interaction. Tracking-by-detection approaches remain dominant due to their computational efficiency and streamlined architectures. However, this paradigm faces two critical challenges that trade off tracking accuracy with real-time efficiency: (i) spatial cues often exhibit inconsistent reliability across diverse scenarios, limiting their effectiveness; and (ii) false associations across consecutive frames frequently cause tracking failures, undermining long-term robustness. To address these issues, we propose the Relation-Aware Simple Online and Real-time Tracker (REAL-SORT), which effectively leverages both spatial and temporal relationships. Regarding spatial relations, we introduce two association strategies that incorporate occlusion-aware cross-scenario feature extraction and relative-position-based matching. On the temporal side, an ID Recovery Module (IRM) exploits multi-frame information to estimate ID loss probabilities, enabling robust trajectory recovery. Extensive experiments on DanceTrack, MOT17, and MOT20 benchmarks demonstrate that our method outperforms existing state-of-the-art trackers across HOTA, IDF1 and AssA metrics, particularly excelling in challenging scenarios. Furthermore, REAL-SORT exhibits strong generalizability, consistently improving performance when integrated into various leading tracking frameworks.
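For context on the tracking-by-detection association step that REAL-SORT's spatial strategies refine, the sketch below shows the baseline IoU-cost Hungarian matching (via `scipy.optimize.linear_sum_assignment`) that such trackers typically build on; the threshold and helper names are illustrative, and the paper's occlusion-aware and relative-position cues are not modeled here.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(tracks, detections, iou_thresh=0.3):
    """Match tracks to detections by minimising 1 - IoU with the Hungarian algorithm,
    rejecting pairs whose IoU falls below the threshold."""
    cost = np.array([[1.0 - iou(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)
    return [(int(r), int(c)) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= iou_thresh]

tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(21, 19, 31, 29), (1, 1, 11, 11)]
print(associate(tracks, dets))   # [(0, 1), (1, 0)]
```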
Citations: 0