
Knowledge-Based Systems: Latest Publications

SLATSCOG: A secure authentication framework via federated data generation and temporally-enhanced split learning
IF 7.6 | CAS Tier 1 (Computer Science) | Q1 (Computer Science, Artificial Intelligence) | Pub Date: 2026-01-15 | DOI: 10.1016/j.knosys.2026.115304
Jiayuan Chen , Tiantian Zhu , Zhengqiu Weng , Zhizhong Ma , Suyu Zhang
With the rapid proliferation of mobile devices, ensuring secure user identity authentication has become increasingly important. Mobile user authentication technologies that utilize motion-sensor-based implicit behavioral features hold great potential to strengthen verification security, improve efficiency, and enhance the overall user experience. However, real-world applications face the following challenges: limited training samples often lead to model overfitting and poor generalization; extracting key behavioral features from noisy data is difficult owing to sensor signal interference and complex temporal dependencies; and centralized training paradigms fail to protect privacy, placing sensitive user data at risk. To address these issues, we propose an innovative framework, SLATSCOG, which integrates a series of optimization strategies to enhance overall system performance across three key aspects: data generation, feature extraction, and privacy protection. (1) To address data scarcity, a federatively trained diffusion model generates high-fidelity, user-labeled accelerometer and gyroscope data, enriching datasets, preserving privacy, and enhancing model accuracy and generalization. (2) For better feature extraction, an advanced pipeline combines a physical signal generator (PSG), which extracts salient physical features, with the SLATE (Split Learning with Auxiliary Temporal Enhancement) architecture, which models temporal dependencies via split learning with auxiliary enhancement, enhancing pattern capture, reducing noise, and adapting to diverse interactions. (3) A hybrid decentralized framework balances privacy and performance: federated learning enables local diffusion-based data generation, whereas split learning trains the authentication model via modular computations, ensuring layered privacy protection. Experimental results from 1513 users indicated that SLATSCOG safeguards data while maintaining fast convergence and high accuracy, making it suitable for practical deployment.
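The split-learning arrangement described in point (3), where the device computes only the first layers and ships intermediate activations to a server that finishes the forward pass, can be sketched as follows. This is a minimal illustrative numpy sketch, not the paper's SLATE architecture; all layer shapes, dimensions, and the sigmoid scoring head are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a 6-axis motion-sensor window -> 16 hidden units -> score.
W_client = rng.normal(size=(6, 16)) * 0.1   # client-side (on-device) layer
W_server = rng.normal(size=(16, 1)) * 0.1   # server-side layer

def client_forward(x):
    """The device computes only the first layer and sends the activations
    ("smashed data"), never the raw accelerometer/gyroscope signal."""
    return np.tanh(x @ W_client)

def server_forward(h):
    """The server finishes the forward pass and emits an authentication score."""
    return 1.0 / (1.0 + np.exp(-(h @ W_server)))  # sigmoid -> (0, 1)

x = rng.normal(size=(1, 6))  # one window of accelerometer + gyroscope features
score = server_forward(client_forward(x))
```

Training proceeds analogously: the server backpropagates only as far as the cut layer and returns the cut-layer gradient to the client, so neither side sees the other's parameters or raw data.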
{"title":"SLATSCOG: A secure authentication framework via federated data generation and temporally-enhanced split learning","authors":"Jiayuan Chen ,&nbsp;Tiantian Zhu ,&nbsp;Zhengqiu Weng ,&nbsp;Zhizhong Ma ,&nbsp;Suyu Zhang","doi":"10.1016/j.knosys.2026.115304","DOIUrl":"10.1016/j.knosys.2026.115304","url":null,"abstract":"<div><div>With the rapid proliferation of mobile devices, ensuring secure user identity authentication has become increasingly important. Mobile user authentication technologies that utilize motion-sensor-based implicit behavioral features hold great potential to strengthen verification security, improve efficiency, and enhance the overall user experience. However, real-world applications face the following challenges: limited training samples often lead to model overfitting and poor generalization, extracting key behavioral features from noisy data is difficult owing to sensor signal interference and complex temporal dependencies, and centralized training paradigms fail to protect privacy, placing sensitive user data at risk. To address these issues, we propose an innovative framework, <span>SLATSCOG</span>, which integrates a series of optimization strategies to enhance overall system performance across three key aspects: data generation, feature extraction, and privacy protection. (1) To address data scarcity, a federated, trained diffusion model generates high-fidelity, user-labeled accelerometer and gyroscope data, enriching datasets, preserving privacy, and enhancing model accuracy and generalization. (2) For better feature extraction, an advanced pipeline combines a physical signal generator (PSG) to extract salient physical features with the SLATE (Split Learning with Auxiliary Temporal Enhancement) architecture (modeling temporal dependencies via split learning with auxiliary enhancement), enhancing pattern capture, reducing noise, and adapting to diverse interactions. 
(3) A hybrid decentralized framework balances privacy and performance: federated learning enables local diffusion-based data generation, whereas split learning trains the authentication model via modular computations, ensuring layered privacy protection. Experimental results from 1513 users indicated that <span>SLATSCOG</span> safeguards data while maintaining fast convergence and high accuracy, making it suitable for practical deployment.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"336 ","pages":"Article 115304"},"PeriodicalIF":7.6,"publicationDate":"2026-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146024298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Generative compositional zero-shot learning using learnable primitive disparity
IF 7.6 | CAS Tier 1 (Computer Science) | Q1 (Computer Science, Artificial Intelligence) | Pub Date: 2026-01-15 | DOI: 10.1016/j.knosys.2026.115278
Minho Kim , Byeongkeun Kang , Yeejin Lee
Compositional zero-shot learning aims to recognize both object and attribute categories from images, including novel attribute-object combinations that are not observed during training. A key challenge is correctly identifying unseen compositions without supervision while avoiding reliance on inconsistent associations between class names and visual content. This issue mainly arises from the use of fixed text embeddings that are directly tied to class labels. To overcome these challenges, we propose a novel framework that learns primitive disparities without depending on textual labels. Our method integrates an embedding-based strategy with a generative framework, an approach that has received limited attention in compositional learning. Specifically, primitive classes are identified by comparing visual and textual representations in a shared embedding space. To improve visual feature quality, we introduce a region-specific feature aggregation strategy that effectively captures attribute-related information. In addition, to mitigate data scarcity in zero-shot learning scenarios, we design a generative module that synthesizes unseen features using metric-learning-based triplets and feature disparity modeling with learnable class features. This module enables feature synthesis in a unified visual space, reducing dependence on text-driven knowledge commonly used in existing methods. The synthesized features are then used to jointly refine both visual and textual representations, leading to improved generalization performance. Extensive experiments on four widely used benchmark datasets demonstrate that our method outperforms state-of-the-art approaches. The code will be released upon publication.
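The step of identifying primitive (attribute and object) classes by comparing visual and textual representations in a shared embedding space can be sketched with cosine similarity. The dimensions and prototype matrices below are hypothetical stand-ins, not the paper's learned embeddings:

```python
import numpy as np

def cosine_sim(a, b):
    """Row-wise cosine similarity: rows of `a` are image embeddings,
    rows of `b` are class (primitive) embeddings in the same space."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

rng = np.random.default_rng(1)
img = rng.normal(size=(2, 8))          # 2 images in a shared 8-d embedding space
attr_protos = rng.normal(size=(3, 8))  # hypothetical attribute prototypes
obj_protos = rng.normal(size=(4, 8))   # hypothetical object prototypes

# Predict each primitive independently; the composition is the (attr, obj) pair.
attr_pred = cosine_sim(img, attr_protos).argmax(axis=1)
obj_pred = cosine_sim(img, obj_protos).argmax(axis=1)
```

Scoring attributes and objects separately is what lets the pair generalize to attribute-object compositions never seen together during training.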
{"title":"Generative compositional zero-Shot learning using learnable primitive disparity","authors":"Minho Kim ,&nbsp;Byeongkeun Kang ,&nbsp;Yeejin Lee","doi":"10.1016/j.knosys.2026.115278","DOIUrl":"10.1016/j.knosys.2026.115278","url":null,"abstract":"<div><div>Compositional zero-shot learning aims to recognize both object and attribute categories from images, including novel attribute-object combinations that are not observed during training. A key challenge is correctly identifying unseen compositions without supervision while avoiding reliance on inconsistent associations between class names and visual content. This issue mainly arises from the use of fixed text embeddings that are directly tied to class labels. To overcome these challenges, we propose a novel framework that learns primitive disparities without depending on textual labels. Our method integrates an embedding-based strategy with a generative framework, an approach that has received limited attention in compositional learning. Specifically, primitive classes are identified by comparing visual and textual representations in a shared embedding space. To improve visual feature quality, we introduce a region-specific feature aggregation strategy that effectively captures attribute-related information. In addition, to mitigate data scarcity in zero-shot learning scenarios, we design a generative module that synthesizes unseen features using metric-learning-based triplets and feature disparity modeling with learnable class features. This module enables feature synthesis in a unified visual space, reducing dependence on text-driven knowledge commonly used in existing methods. The synthesized features are then used to jointly refine both visual and textual representations, leading to improved generalization performance. Extensive experiments on four widely used benchmark datasets demonstrate that our method outperforms state-of-the-art approaches. 
The code will be released upon publication.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"336 ","pages":"Article 115278"},"PeriodicalIF":7.6,"publicationDate":"2026-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146024210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Data-interactive mamba driven SAR-optical fusion cloud removal
IF 7.6 | CAS Tier 1 (Computer Science) | Q1 (Computer Science, Artificial Intelligence) | Pub Date: 2026-01-15 | DOI: 10.1016/j.knosys.2026.115301
Fajing Liu , En Li , Yixiao Liu , Sijie Zhou , Yuanyuan Wu , Chao Ren
Synthetic aperture radar (SAR) imagery, synchronized with optical (OPT) data, enables reconstructing remote sensing images obscured by clouds and shadows owing to SAR's ability to penetrate clouds. However, existing SAR-optical fusion techniques for cloud removal are few and predominantly target global cloud removal, resulting in insufficient feature reconstruction in dense cloud areas and introducing artifacts in cloud-free regions. Moreover, current methods lack independent learning from individual modalities and fail to establish comprehensive associations for data recovery when either SAR or OPT signals are weak. To effectively solve the above problems, a novel model (MDFuse-CR) is proposed in this paper. Firstly, the data-interactive vision mamba (DI_ViM) based on the state space model is presented, then utilized with an invertible neural network (INN) to create a dual-branch complementary-individual feature extraction module. Secondly, a cloud-shadow adaptive loss function derived from the cloud-shadow detection algorithm is introduced, aimed at minimizing the impact of processing on cloud-free areas. Thirdly, a pre-training method is adopted to remove noise in SAR and OPT images. Finally, the approach's superiority in cloud-removal efficacy and inference speed, achieved without any transformer component, is substantiated. The experimental results on the public dataset SEN12MS-CR show that the proposed method exhibits superior performance. Compared with the second-best method, the average PSNR and SSIM values of MDFuse-CR increase by 1.4% and 2.8% respectively, while the average MAE and SAM values decrease by 5.4% and 1.8% respectively. The source code is available at https://github.com/Jing220/MDFuse-CR.
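The PSNR and MAE figures quoted above follow their standard definitions; a minimal numpy sketch, assuming images scaled to [0, 1], is:

```python
import numpy as np

def psnr(ref, test, peak=1.0):
    """Peak signal-to-noise ratio in dB for images in [0, peak]."""
    mse = np.mean((ref - test) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def mae(ref, test):
    """Mean absolute error between two images."""
    return float(np.mean(np.abs(ref - test)))

# Toy example: a uniform +0.1 error gives MSE = 0.01, so PSNR = 20 dB.
ref = np.zeros((4, 4))
noisy = ref + 0.1
print(round(psnr(ref, noisy), 2))  # 20.0
```

Higher PSNR/SSIM and lower MAE/SAM indicate a reconstruction closer to the cloud-free reference image.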
{"title":"Data-interactive mamba driven SAR-optical fusion cloud removal","authors":"Fajing Liu ,&nbsp;En Li ,&nbsp;Yixiao Liu ,&nbsp;Sijie Zhou ,&nbsp;Yuanyuan Wu ,&nbsp;Chao Ren","doi":"10.1016/j.knosys.2026.115301","DOIUrl":"10.1016/j.knosys.2026.115301","url":null,"abstract":"<div><div>Synthetic aperture radar (SAR) imagery, synchronized with optical (OPT) data, enables reconstructing remote sensing images obscured by clouds and shadows owing to their ability to penetrate clouds. However, existing SAR-optical fusion techniques for cloud removal are few and predominantly target global cloud removal, resulting in insufficient feature reconstruction in dense cloud areas and introducing artifacts in cloud-free regions. Moreover, current methods lack independent learning from individual modalities and fail to establish comprehensive associations for data recovery when either SAR or OPT signals are weak. To effectively solve the above problems, a novel model (MDFuse-CR) is proposed in this paper. Firstly, the data-interactive vision mamba (DI_ViM) based on the state space model is presented, then utilized with an invertible neural network (INN) to create a dual-branch complementary-individual feature extraction module. Secondly, a cloud-shadow adaptive loss function derived from the cloud-shadow detection algorithm is introduced, aimed at minimizing the impact of processing on cloud-free areas. Thirdly, a pre-training method is adopted to remove noise in SAR and OPT images. Finally, the superiority of the approach in terms of cloud removal efficacy and inference speed in the absence of the transformer is substantiated. The experimental results on public dataset SEN12MS-CR show that the proposed method exhibits superior performance. Compared with the second-best method, the average PSNR and SSIM values of MDFuse-CR increase by 1.4% and 2.8% respectively, while the average MAE and SAM values decrease by 5.4% and 1.8% respectively. 
The source code is available at <span><span>https://github.com/Jing220/MDFuse-CR</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"336 ","pages":"Article 115301"},"PeriodicalIF":7.6,"publicationDate":"2026-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146024208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Data privacy preserved student career prediction with deep learning and blockchain based mechanism
IF 7.6 | CAS Tier 1 (Computer Science) | Q1 (Computer Science, Artificial Intelligence) | Pub Date: 2026-01-15 | DOI: 10.1016/j.knosys.2026.115352
Mansi Aggarwal , Vaibhav Vyas
In recent years, student success has become a primary strategic objective for most higher education institutions. Due to increasing operational costs and budget cuts, educational institutions are paying more attention to meeting their student-enrollment plans without compromising the quality and rigour of education. Existing research advances in big data analytics and machine learning (ML) massively depend on student data to predict student information. However, these existing models still exhibit issues in classifying student data and predicting performance. This paper proposes deep learning with blockchain to improve the accuracy of predicting student career success while preserving data privacy. The proposed approach comprises three phases: pre-processing, feature selection and classification. Initially, in the pre-processing stage, the min-max normalization method is applied to normalize the data. Next, feature selection employs an advanced technique, improved Levy-flight-assisted fire hawk optimization (Imp-LeFoP), which selects the optimal subset of the features. The selected features are then passed into a classifier model, namely optimized hybrid densely connected convolutional VGG assisted car-studformer (Opt-Studformer). The hyperparameters of the classifier model are tuned by the dung beetle optimization mechanism (DBO). For student data privacy, a blockchain mechanism is integrated to store the data securely. In addition, the blockchain mechanism uses the improved proof-of-stake (Im-PoS) consensus algorithm for retrieval validation. The performance of career prediction is evaluated on two datasets: computer science student career prediction and career prediction. The two datasets achieve high accuracy rates of 98.97% and 98.84%, respectively.
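Min-max normalization, the first pre-processing step described above, rescales each feature column to [0, 1]; a minimal numpy sketch (the small epsilon guarding against constant columns is an added detail, not from the paper):

```python
import numpy as np

def min_max_normalize(X, eps=1e-12):
    """Scale each feature column of X to the range [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo + eps)  # eps avoids division by zero on constant columns

# Two features on very different scales become directly comparable.
X = np.array([[10.0, 200.0],
              [20.0, 400.0],
              [30.0, 600.0]])
Xn = min_max_normalize(X)
```

After this step both columns span the same range, so the subsequent feature-selection and classification stages are not dominated by whichever raw feature happens to have the largest magnitude.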
{"title":"Data privacy preserved student career prediction with deep learning and blockchain based mechanism","authors":"Mansi Aggarwal ,&nbsp;Vaibhav Vyas","doi":"10.1016/j.knosys.2026.115352","DOIUrl":"10.1016/j.knosys.2026.115352","url":null,"abstract":"<div><div>In recent days, student success has become a primary strategic objective for most higher education institutions. Due to increasing operational costs and budget cuts, educational institutions are paying more attention to satisfying student enrollment in their plans without compromising the quality and rigour of education. The existing research advancement in big data analytics and machine learning (ML) techniques massively depend on student data to predict student information. But these existing models address some issues in classifying the student data and predicting their performance. This paper proposes Deep learning with blockchain to improve the accuracy of predicting student career success with data privacy. The paper is divided into three phases: pre-processing, feature selection and classification. Initially, in the pre-processing stage, the min-max normalization method is performed to normalize the data. Next, the phase employs a feature selection method by using an advanced technique of improved Levy flight assisted fire hawk optimization (Imp-LeFoP). This method selects the optimal subset of the features. Further, the selected feature is taken as input and passed into a classifier model, namely optimized hybrid densely connected convolutional VGG assisted car-studformer (Opt-Studformer). The hyperparameter present in the classifier model is properly tuned by the dung beetle optimization mechanism (DBO). For student data privacy, the blockchain mechanism is integrated to store the data securely. In addition, the blockchain mechanism uses the improved proof-of-stake (Im-PoS) consensus algorithm for retrieval validation. 
The performance of career prediction is evaluated by using two datasets: computer science student career prediction and career prediction. Thus, both datasets achieve high accuracy rates of 98.97% and 98.84%, respectively.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"337 ","pages":"Article 115352"},"PeriodicalIF":7.6,"publicationDate":"2026-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146080715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
RMViM-Net: Residual multi-path vision mamba with graph interaction attention for medical image segmentation
IF 7.6 | CAS Tier 1 (Computer Science) | Q1 (Computer Science, Artificial Intelligence) | Pub Date: 2026-01-15 | DOI: 10.1016/j.knosys.2026.115326
Shen Jiang , Xiaoyan Kui , Xingzhuo Bao , Qinsong Li , Zhipeng Hu , Beiji Zou
Medical image segmentation plays an important role in computer-aided clinical diagnosis, but existing methods still face challenges in balancing long-range dependency modeling and fine-grained detail extraction. Convolutional neural networks (CNNs) are effective in capturing local features but lack global modeling capability. Transformer architectures can model long-range dependencies, yet their high computational complexity hinders their use in high-resolution medical image scenarios. Recently, the Vision Mamba architecture has gained attention for its linear computational complexity and ability to capture long-range dependencies, but it remains limited by fixed scanning patterns and insufficient feature interaction. To address these limitations, this study presents an improved vision Mamba-based network named RMViM-Net for efficient and accurate medical image segmentation. As a key component, a five-dimensional multi-path scanning (5D Multi-path Scan) module is developed, which constructs multi-directional scan paths in parallel across multiple subspaces to enhance directional spatial modeling capability. In addition, a residual-enhanced multi-path visual state modeling (RMViM) block is introduced to mitigate feature degradation in deep state spaces by incorporating learnable residual connections, thereby improving the modeling of lesion boundaries and fine structural details. Moreover, a graph interaction attention (GIA) mechanism is proposed to build topological connections among regional feature nodes, effectively improving dynamic feature aggregation across regions and semantic consistency in spatial context. Experiments conducted on five public medical image segmentation datasets, including ISIC 2017, ISIC 2018, CVC-ClinicDB, ACDC, and Synapse, demonstrate that RMViM-Net achieves superior segmentation performance and generalization ability compared to existing state-of-the-art methods.
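Segmentation quality on datasets like these is conventionally scored with overlap metrics such as the Dice similarity coefficient. The abstract does not name its exact metrics, so the sketch below is a standard illustrative definition rather than the paper's evaluation code:

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice similarity coefficient between two binary masks:
    2 * |pred & target| / (|pred| + |target|), in [0, 1]."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

# Toy masks: one overlapping pixel out of |pred|=2 and |target|=1 -> 2/3.
a = np.array([[1, 1], [0, 0]])
b = np.array([[1, 0], [0, 0]])
print(round(dice(a, b), 3))
```

Because Dice weights the overlap against both mask sizes, it penalizes missed lesion pixels and spurious predictions symmetrically, which is why it is favored for lesion-boundary evaluation.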
{"title":"RMViM-Net: Residual multi-path vision mamba with graph interaction attention for medical image segmentation","authors":"Shen Jiang ,&nbsp;Xiaoyan Kui ,&nbsp;Xingzhuo Bao ,&nbsp;Qinsong Li ,&nbsp;Zhipeng Hu ,&nbsp;Beiji Zou","doi":"10.1016/j.knosys.2026.115326","DOIUrl":"10.1016/j.knosys.2026.115326","url":null,"abstract":"<div><div>Medical image segmentation plays an important role in computer-aided clinical diagnosis, but existing methods still face challenges in balancing long-range dependency modeling and fine-grained detail extraction. Convolutional neural networks (CNNs) are effective in capturing local features but lack global modeling capability. Transformer architectures can model long-range dependencies, yet their high computational complexity hinders their use in high-resolution medical image scenarios. Recently, the Vision Mamba architecture has gained attention for its linear computational complexity and ability to capture long-range dependencies, but it remains limited by fixed scanning patterns and insufficient feature interaction. To address these limitations, this study presents an improved vision Mamba-based network named RMViM-Net for efficient and accurate medical image segmentation. As a key component, a five-dimensional multi-path scanning (5D Multi-path Scan) module is developed, which constructs multi-directional scan paths in parallel across multiple subspaces to enhance directional spatial modeling capability. In addition, a residual-enhanced multi-path visual state modeling (RMViM) block is introduced to mitigate feature degradation in deep state spaces by incorporating learnable residual connections, thereby improving the modeling of lesion boundaries and fine structural details. Moreover, a graph interaction attention (GIA) mechanism is proposed to build topological connections among regional feature nodes, effectively improving dynamic feature aggregation across regions and semantic consistency in spatial context. 
Experiments conducted on five public medical image segmentation datasets, including ISIC 2017, ISIC 2018, CVC-ClinicDB, ACDC, and Synapse, demonstrate that RMViM-Net achieves superior segmentation performance and generalization ability compared to existing state-of-the-art methods.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"336 ","pages":"Article 115326"},"PeriodicalIF":7.6,"publicationDate":"2026-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146024292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Reasoning text-to-image retrieval with large language models and digital twin representations
IF 7.6 | CAS Tier 1 (Computer Science) | Q1 (Computer Science, Artificial Intelligence) | Pub Date: 2026-01-15 | DOI: 10.1016/j.knosys.2026.115313
Zexu Lin , Dell Zhang , Yiqing Shen , Xuelong Li
Text-to-image retrieval aims to identify images from large-scale collections that are semantically relevant to given textual queries. Existing embedding similarity methods rely on global feature matching that captures surface-level similarities, limiting their ability to handle implicit queries that require reasoning about object attributes, spatial relationships, and scene semantics. While some recent approaches employ multi-stage processing pipelines to enhance cross-modal understanding, they still struggle with complex implicit reasoning tasks. Additionally, these methods typically return entire images without localizing specific objects that satisfy query constraints, making them inadequate for applications demanding fine-grained retrieval. To address these limitations, we define a new task called reasoning text-to-image retrieval, a novel task that goes beyond simple similarity matching. The goal is to retrieve not only the relevant images but also the specific objects within them that satisfy implicit, reasoning-based queries. To tackle this task, we propose a two-phase framework called DTIR (Digital Twin Image Retrieval). It bridges visual and text modalities for LLM reasoning by introducing intermediate digital twin (DT) representations. Specifically, DTIR first converts images into DT representations, which are textual descriptions that encode object semantics, attributes, and spatial relationships while preserving fine-grained visual context. Subsequently, an LLM-based agent performs reasoning and hierarchical retrieval to determine the target images as well as the objects within the images from DT representations. To evaluate reasoning-based retrieval capabilities, we construct a novel benchmark dataset RT2I, comprising 1260 query-image pairs that require reasoning. On RT2I, DTIR achieves a Recall@1 of 37.38%, a 61% relative improvement over the strongest baseline, and establishes new state-of-the-art results on 4 conventional benchmarks. 
Code and dataset are available at https://github.com/oneoflzx/DTIR.
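Recall@1, the headline metric quoted above, is the fraction of queries whose single top-ranked image is the ground-truth match. A minimal numpy sketch (the similarity matrix and indices below are illustrative, not from the RT2I benchmark):

```python
import numpy as np

def recall_at_k(scores, gt, k=1):
    """Fraction of queries whose ground-truth image index appears in the
    top-k ranked results. `scores` is a (num_queries, num_images)
    similarity matrix; `gt` gives the correct image index per query."""
    topk = np.argsort(-scores, axis=1)[:, :k]  # indices of the k highest scores
    hits = (topk == np.asarray(gt)[:, None]).any(axis=1)
    return float(hits.mean())

# Query 0's best match is image 0 (correct); query 1's is image 1 (wrong, gt=2).
scores = np.array([[0.9, 0.1, 0.3],
                   [0.2, 0.8, 0.5]])
print(recall_at_k(scores, gt=[0, 2], k=1))  # 0.5
```

Raising k relaxes the criterion: the same toy example reaches 1.0 at Recall@2, since image 2 is query 1's second-ranked result.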
{"title":"Reasoning text-to-image retrieval with large language models and digital twin representations","authors":"Zexu Lin ,&nbsp;Dell Zhang ,&nbsp;Yiqing Shen ,&nbsp;Xuelong Li","doi":"10.1016/j.knosys.2026.115313","DOIUrl":"10.1016/j.knosys.2026.115313","url":null,"abstract":"<div><div>Text-to-image retrieval aims to identify images from large-scale collections that are semantically relevant to given textual queries. Existing embedding similarity methods rely on global feature matching that captures surface-level similarities, limiting their ability to handle implicit queries that require reasoning about object attributes, spatial relationships, and scene semantics. While some recent approaches employ multi-stage processing pipelines to enhance cross-modal understanding, they still struggle with complex implicit reasoning tasks. Additionally, these methods typically return entire images without localizing specific objects that satisfy query constraints, making them inadequate for applications demanding fine-grained retrieval. To address these limitations, we define a new task called <strong>reasoning text-to-image retrieval</strong>, a novel task that goes beyond simple similarity matching. The goal is to retrieve not only the relevant images but also the specific objects within them that satisfy implicit, reasoning-based queries. To tackle this task, we propose a two-phase framework called DTIR (<strong>D</strong>igital <strong>T</strong>win <strong>I</strong>mage <strong>R</strong>etrieval). It bridges visual and text modalities for LLM reasoning by introducing intermediate digital twin (DT) representations. Specifically, DTIR first converts images into DT representations, which are textual descriptions that encode object semantics, attributes, and spatial relationships while preserving fine-grained visual context. 
Subsequently, an LLM-based agent performs reasoning and hierarchical retrieval to determine the target images as well as the objects within the images from DT representations. To evaluate reasoning-based retrieval capabilities, we construct a novel benchmark dataset RT2I, comprising 1260 query-image pairs that require reasoning. On RT2I, DTIR achieves a Recall@1 of 37.38%, a 61% relative improvement over the strongest baseline, and establishes new state-of-the-art results on 4 conventional benchmarks. Code and dataset are available at <span><span>https://github.com/oneoflzx/DTIR</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"336 ","pages":"Article 115313"},"PeriodicalIF":7.6,"publicationDate":"2026-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146024143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
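The two-phase idea in the abstract above can be illustrated with a toy sketch. All names here are hypothetical: the paper's phase 1 uses a vision model to emit textual DT descriptions, and phase 2 uses an LLM-based agent, not a hand-written predicate as below.

```python
from dataclasses import dataclass, field

@dataclass
class DTObject:
    name: str
    attributes: dict
    relations: list = field(default_factory=list)  # e.g. ("on", "table")

@dataclass
class DigitalTwin:  # phase 1 output: one structured record per image
    image_id: str
    objects: list

def retrieve(twins, predicate):
    """Phase 2, hierarchical retrieval: keep images that contain a matching
    object, and return those objects alongside the image id."""
    hits = []
    for twin in twins:
        matched = [o.name for o in twin.objects if predicate(o)]
        if matched:
            hits.append((twin.image_id, matched))
    return hits

twins = [
    DigitalTwin("img1", [DTObject("mug", {"color": "red", "on": "table"}),
                         DTObject("laptop", {"color": "grey"})]),
    DigitalTwin("img2", [DTObject("mug", {"color": "blue", "on": "shelf"})]),
]

# Implicit query "the red container on the table": the LLM's reasoning is
# stood in for by a predicate over the DT fields.
query = lambda o: o.name == "mug" and o.attributes.get("color") == "red"
print(retrieve(twins, query))  # [('img1', ['mug'])]
```

The point of the intermediate representation is that once images are text, object-level filtering and image-level ranking become ordinary reasoning over structured records.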
Citations: 0
Disentangled multimodal domain generalization network for zero-calibration vigilance estimation
IF 7.6 CAS Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-15 DOI: 10.1016/j.knosys.2026.115353
Kangning Wang, Wei Wei, Weibo Yi, Ying Gao, Huiguang He, Minpeng Xu, Shuang Qiu, Dong Ming
Vigilance estimation is critical for maintaining optimal performance in human-machine interaction tasks. Most existing vigilance estimation methods depend on extensive labeled calibration data to train an available model for the specific subject, which substantially limits their practical applicability. To address this challenge, we propose a disentangled multimodal domain generalization network (DMDGN) for zero-calibration vigilance estimation, thereby eliminating the need for individual calibration data. Specifically, a graph convolution network (GCN) and a long short-term memory (LSTM) network were employed to project electroencephalogram (EEG) and electrooculogram (EOG) data from different domains into respective feature spaces. A modality alignment-fusion mechanism was designed to align and integrate features across modalities, mitigating modality heterogeneity and facilitating the extraction and fusion of vigilance-related features intrinsic to each modality. Furthermore, a disentangled domain adversarial mechanism was developed to decouple domain-invariant and domain-specific features, explicitly suppressing the domain-specific information while efficiently extracting domain-invariant features to improve model generalizability. Experimental results demonstrate that the proposed DMDGN outperforms comparative methods, validating its effectiveness. These findings highlight the potential of the proposed approach for practical deployment of vigilance estimation in real-world scenarios.
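A minimal numpy sketch of the modality alignment-fusion step described above. The tanh projections stand in for the paper's GCN (EEG) and LSTM (EOG) encoders, and the standardization-based alignment is a simplifying assumption, not the paper's exact mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, w):
    return np.tanh(x @ w)  # per-modality encoder stand-in (GCN/LSTM in the paper)

def align_and_fuse(f_eeg, f_eog):
    # Match first- and second-order statistics per modality to reduce
    # modality heterogeneity, then fuse by concatenation.
    def standardize(f):
        return (f - f.mean(axis=0)) / (f.std(axis=0) + 1e-8)
    return np.concatenate([standardize(f_eeg), standardize(f_eog)], axis=1)

eeg = rng.normal(size=(32, 62))   # 32 windows, 62 EEG channels
eog = rng.normal(size=(32, 6))    # 6 EOG features per window
fused = align_and_fuse(encode(eeg, rng.normal(size=(62, 16))),
                       encode(eog, rng.normal(size=(6, 16))))
print(fused.shape)  # (32, 32)
```

After alignment both modalities contribute features on a common scale, which is what lets the downstream domain-adversarial branch operate on the fused representation.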
{"title":"Disentangled multimodal domain generalization network for zero-calibration vigilance estimation","authors":"Kangning Wang ,&nbsp;Wei Wei ,&nbsp;Weibo Yi ,&nbsp;Ying Gao ,&nbsp;Huiguang He ,&nbsp;Minpeng Xu ,&nbsp;Shuang Qiu ,&nbsp;Dong Ming","doi":"10.1016/j.knosys.2026.115353","DOIUrl":"10.1016/j.knosys.2026.115353","url":null,"abstract":"<div><div>Vigilance estimation is critical for maintaining optimal performance in human-machine interaction tasks. Most existing vigilance estimation methods depend on extensive labeled calibration data to train an available model for the specific subject, which substantially limits their practical applicability. To address this challenge, we propose a disentangled multimodal domain generalization network (DMDGN) for zero-calibration vigilance estimation, thereby eliminating the need for individual calibration data. Specifically, a graph convolution network (GCN) and a long short-term memory (LSTM) network were employed to project electroencephalogram (EEG) and electrooculogram (EOG) data from different domains into respective feature spaces. A modality alignment-fusion mechanism was designed to align and integrate features across modalities, mitigating modality heterogeneity and facilitating the extraction and fusion of vigilance-related features intrinsic to each modality. Furthermore, a disentangled domain adversarial mechanism was developed to decouple domain-invariant and domain-specific features, explicitly suppressing the domain-specific information while efficiently extracting domain-invariant features to improve model generalizability. Experimental results demonstrate that the proposed DMDGN outperforms comparative methods, validating its effectiveness. 
These findings highlight the potential of the proposed approach for practical deployment of vigilance estimation in real-world scenarios.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"336 ","pages":"Article 115353"},"PeriodicalIF":7.6,"publicationDate":"2026-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146024221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Accelerating deep neural networks through stability-aware initial training and density-guided asymptotic filter decay
IF 7.6 CAS Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-14 DOI: 10.1016/j.knosys.2026.115337
Zhuang Chen, Ye Su, Jinzhe Huang, Yichen Ye, Yiyuan Xie
Identifying the computationally redundant components in deep neural networks constitutes the cornerstone of network compression. Unlike magnitude-oriented approaches that rank filters based solely on their deviation from a global centroid, potentially skewing parameter distribution, we introduce a novel framework to fully exploit the latent feature correlations within the parameter space. Firstly, for untrained or incompletely trained networks, we introduce the concept of network stability to quantify architectural similarity across training epochs and adaptively provide explicit guidance for searching the optimal network architecture without requiring full training of the entire model. Consequently, we yield a seed model enriched with more reliable information than those initialized with random weights. Secondly, we analyze the characteristics of parameter distribution and construct the hypersphere for each filter. We then quantify the density of each filter within its hypersphere neighborhood and identify redundant filters based on the criterion that high-density filters demonstrate high “replaceability”, which can be recovered through fine-tuning techniques. Thirdly, to ensure stable and smooth knowledge transfer from the original to the pruned model, we adopt an asymptotic filter decay technique, concentrating informative weights toward retained filters gradually. We conduct extensive experiments on two real-world datasets to demonstrate the superiority of our proposed method over state-of-the-art algorithms. For instance, on CIFAR-10, our method reduces FLOPs by 62.8% on ResNet-110, with even a 0.13% accuracy improvement. On the larger ILSVRC-2012 dataset, we achieve a 53.5% reduction in FLOPs with only a 0.87% drop in Top-1 accuracy on ResNet-50.
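The density criterion above can be sketched in numpy: flatten each convolutional filter, count how many other filters fall inside its hypersphere neighborhood, and treat high-density filters as replaceable. The radius and toy weights are illustrative assumptions; the paper pairs this criterion with asymptotic filter decay rather than hard removal.

```python
import numpy as np

def filter_density(weights, radius):
    # weights: (n_filters, fan_in) flattened filters
    d = np.linalg.norm(weights[:, None, :] - weights[None, :, :], axis=-1)
    return (d < radius).sum(axis=1) - 1  # exclude the filter itself

def keep_indices(weights, radius, keep):
    # Low-density filters are the least replaceable, so they are retained.
    return np.argsort(filter_density(weights, radius))[:keep]

w = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
print(filter_density(w, radius=0.5))               # [2 2 2 0]
print(sorted(keep_indices(w, 0.5, keep=2).tolist()))  # [0, 3]
```

The three clustered filters each have two close neighbors and are mutually replaceable, while the isolated outlier has density 0 and survives pruning.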
{"title":"Accelerating deep neural networks through stability-aware initial training and density-guided asymptotic filter decay","authors":"Zhuang Chen ,&nbsp;Ye Su ,&nbsp;Jinzhe Huang ,&nbsp;Yichen Ye ,&nbsp;Yiyuan Xie","doi":"10.1016/j.knosys.2026.115337","DOIUrl":"10.1016/j.knosys.2026.115337","url":null,"abstract":"<div><div>Identifying the computationally redundant components in deep neural networks constitutes the cornerstone of network compression. Unlike magnitude-oriented approaches that rank filters based solely on their deviation from a global centroid, potentially skewing parameter distribution, we introduce a novel framework to fully exploit the latent feature correlations within the parameter space. Firstly, for untrained or incompletely trained networks, we introduce the concept of network stability to quantify architectural similarity across training epochs and adaptively provide explicit guidance for searching the optimal network architecture without requiring full training of the entire model. Consequently, we yield a seed model enriched with more reliable information than those initialized with random weights. Secondly, we analyze the characteristics of parameter distribution and construct the hypersphere for each filter. We then quantify the density of each filter within its hypersphere neighborhood and identify redundant filters based on the criterion that high-density filters demonstrate high “replaceability”, which can be recovered through fine-tuning techniques. Thirdly, to ensure stable and smooth knowledge transfer from the original to the pruned model, we adopt an asymptotic filter decay technique, concentrating informative weights toward retained filters gradually. We conduct extensive experiments on two real-world datasets to demonstrate the superiority of our proposed method over state-of-the-art algorithms. For instance, on CIFAR-10, our method reduces FLOPs by 62.8% on ResNet-110, with even a 0.13% accuracy improvement. 
On the larger ILSVRC-2012 dataset, we achieve a 53.5% reduction in FLOPs with only a 0.87% drop in Top-1 accuracy on ResNet-50.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"336 ","pages":"Article 115337"},"PeriodicalIF":7.6,"publicationDate":"2026-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145980893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
CLMDA: Cross-language vulnerability detection based on multimodal learning and domain adaptation
IF 7.6 CAS Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-14 DOI: 10.1016/j.knosys.2026.115336
Zhengbin Zou, Tao Jiang, Nan Zhang, Yizheng Wang, Tiancheng Xue, Jie Luan
Due to the semantic gap and distribution discrepancy between the source and target domains, cross-language software vulnerability detection remains a significant challenge. To address this issue, this paper proposes CLMDA, a cross-language vulnerability detection framework based on multimodal learning and domain adaptation. CLMDA integrates three complementary modalities: token sequences, dynamic execution paths, and code property graphs, to comprehensively capture the syntactic and semantic features of code. For each modality, specific feature extractors are designed: a pre-trained model is employed to process token sequences, a Transformer encoder is employed to represent dynamic path information, and a graph attention network is applied to model code property graphs. In order to alleviate the distribution discrepancy across languages, a class-balance-driven cross-domain adaptive learning strategy is designed in this study to enhance inter-modal alignment. In this study, C/C++ is adopted as the source domain, while Java serves as the target domain. Finally, multimodal features are fused for vulnerability prediction. Experimental results on benchmark datasets demonstrate that, compared with other state-of-the-art methods, CLMDA achieves an improvement in F1 score ranging from 14.26% to 55.32%, with the Matthews Correlation Coefficient exhibits a maximum increase of nearly 70%.
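A sketch of the class-balance idea behind the adaptation strategy above, plus late fusion of the three modality embeddings. The effective-number weighting here is a standard technique assumed for illustration; the abstract does not specify CLMDA's exact formula, and the embedding shapes are hypothetical.

```python
import numpy as np

def class_balanced_weights(labels, beta=0.999):
    counts = np.bincount(labels)
    effective = (1.0 - beta ** counts) / (1.0 - beta)
    w = 1.0 / effective
    return w / w.sum() * len(counts)  # normalize so weights average to 1

def late_fuse(token_emb, path_emb, graph_emb):
    # Token-sequence, dynamic-path and code-property-graph embeddings.
    return np.concatenate([token_emb, path_emb, graph_emb], axis=-1)

labels = np.array([0] * 90 + [1] * 10)  # vulnerable code is the rare class
weights = class_balanced_weights(labels)
fused = late_fuse(np.zeros((4, 8)), np.zeros((4, 8)), np.zeros((4, 8)))
print(weights[1] > weights[0], fused.shape)  # True (4, 24)
```

Up-weighting the minority (vulnerable) class keeps the cross-domain alignment from being dominated by the abundant benign samples.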
{"title":"CLMDA:Cross language vulnerability detection based on multimodal learning and domain adaptation","authors":"Zhengbin Zou,&nbsp;Tao Jiang,&nbsp;Nan Zhang,&nbsp;Yizheng Wang,&nbsp;Tiancheng Xue,&nbsp;Jie Luan","doi":"10.1016/j.knosys.2026.115336","DOIUrl":"10.1016/j.knosys.2026.115336","url":null,"abstract":"<div><div>Due to the semantic gap and distribution discrepancy between the source and target domains, cross-language software vulnerability detection remains a significant challenge. To address this issue, this paper proposes CLMDA, a cross-language vulnerability detection framework based on multimodal learning and domain adaptation. CLMDA integrates three complementary modalities: token sequences, dynamic execution paths, and code property graphs, to comprehensively capture the syntactic and semantic features of code. For each modality, specific feature extractors are designed: a pre-trained model is employed to process token sequences, a Transformer encoder is employed to represent dynamic path information, and a graph attention network is applied to model code property graphs. In order to alleviate the distribution discrepancy across languages, a class-balance-driven cross-domain adaptive learning strategy is designed in this study to enhance inter-modal alignment. In this study, C/C++ is adopted as the source domain, while Java serves as the target domain. Finally, multimodal features are fused for vulnerability prediction. 
Experimental results on benchmark datasets demonstrate that, compared with other state-of-the-art methods, CLMDA achieves an improvement in F1 score ranging from 14.26% to 55.32%, with the Matthews Correlation Coefficient exhibits a maximum increase of nearly 70%.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"336 ","pages":"Article 115336"},"PeriodicalIF":7.6,"publicationDate":"2026-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146024297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
M3-UNet: Multi-frequency, multi-scale and multi-task U-Net for intima-media complex segmentation
IF 7.6 CAS Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-14 DOI: 10.1016/j.knosys.2026.115298
Beibei Wang, Yizhuo Feng, Peng Wang, Zirui Wang, Zhenfeng Li, Pang Wu, Xianxiang Chen, Junxian Song, Hongbo Chang, Lidong Du, Zhen Fang
Cardiovascular diseases, driven by atherosclerosis, are the leading cause of global mortality. Reliable measurement of a key risk indicator, the carotid intima-media thickness (CIMT), necessitates precise segmentation of the intima-media complex (IMC) from B-mode ultrasound images. However, this task is confronted by two intertwined, fundamental challenges. At the feature level, the inherently low signal-to-noise ratio (SNR) of ultrasound images degrades feature representations, leaving extracted features with insufficient discriminative power to encode tissue boundaries and textures. At the structural level, substantial morphological variations of the IMC across different pathological states challenge the model’s geometric adaptability and structure-capturing capability. In essence, the performance bottleneck of existing methods originates from the coupling of these challenges, where unreliable underlying features severely inhibit the effective modeling of complex structures. To tackle this, we propose M3-UNet, a segmentation architecture employing a sequential decoupling strategy. Its backbone first counteracts representation degradation by refining features with Multi-frequency Octave Convolution (MFOC). Building on this, a Multi-scale Fusion Module (MSFM) enables robust modeling of complex, variable structures by effectively capturing multi-scale information. Based on these high-quality features, a multi-task framework centered on a Boundary Refinement Module (BRM) resolves pixel-level boundary ambiguities by enhancing border signals. Comprehensive experiments demonstrate that M3-UNet, empowered by its efficient core components, achieves state-of-the-art (SOTA) performance on two public IMC segmentation datasets. Furthermore, the model’s superior performance on the morphologically disparate arterial lumen segmentation validates its broad generalization capability.
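The frequency split behind octave-style convolution (the MFOC component above) can be sketched on a toy feature map: the low branch is a 2x average-pooled copy, the high branch is the residual detail. This decomposition is an illustrative assumption; the paper's module additionally convolves and exchanges information between the two branches on learned features.

```python
import numpy as np

def octave_split(x):
    h, w = x.shape
    low = x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))  # half resolution
    up = np.repeat(np.repeat(low, 2, axis=0), 2, axis=1)     # nearest upsample
    high = x - up                                            # fine detail
    return low, high

x = np.arange(16, dtype=float).reshape(4, 4)
low, high = octave_split(x)
print(low.shape, high.shape)  # (2, 2) (4, 4)
# Upsampled low + high reconstructs the input exactly, so no information
# is lost by processing the two frequency bands separately.
```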
{"title":"M3-UNet: Multi-frequency, multi-scale and multi-task U-Net for intima-media complex segmentation","authors":"Beibei Wang ,&nbsp;Yizhuo Feng ,&nbsp;Peng Wang ,&nbsp;Zirui Wang ,&nbsp;Zhenfeng Li ,&nbsp;Pang Wu ,&nbsp;Xianxiang Chen ,&nbsp;Junxian Song ,&nbsp;Hongbo Chang ,&nbsp;Lidong Du ,&nbsp;Zhen Fang","doi":"10.1016/j.knosys.2026.115298","DOIUrl":"10.1016/j.knosys.2026.115298","url":null,"abstract":"<div><div>Cardiovascular diseases, driven by atherosclerosis, are the leading cause of global mortality. Reliable measurement of a key risk indicator, the carotid intima-media thickness (CIMT), necessitates precise segmentation of the intima-media complex (IMC) from B-mode ultrasound images. However, this task is confronted by two intertwined, fundamental challenges. At the feature level, the inherently low signal-to-noise ratio (SNR) of ultrasound images degrades feature representations, leaving extracted features with insufficient discriminative power to encode tissue boundaries and textures. At the structural level, substantial morphological variations of the IMC across different pathological states challenge the model’s geometric adaptability and structure-capturing capability. In essence, the performance bottleneck of existing methods originates from the coupling of these challenges, where unreliable underlying features severely inhibit the effective modeling of complex structures. To tackle this, we propose M<sup>3</sup>-UNet, a segmentation architecture employing a sequential decoupling strategy. Its backbone first counteracts representation degradation by refining features with Multi-frequency Octave Convolution (MFOC). Building on this, a Multi-scale Fusion Module (MSFM) enables robust modeling of complex, variable structures by effectively capturing multi-scale information. 
Based on these high-quality features, a multi-task framework centered on a Boundary Refinement Module (BRM) resolves pixel-level boundary ambiguities by enhancing border signals. Comprehensive experiments demonstrate that M<sup>3</sup>-UNet, empowered by its efficient core components, achieves state-of-the-art (SOTA) performance on two public IMC segmentation datasets. Furthermore, the model’s superior performance on the morphologically disparate arterial lumen segmentation validates its broad generalization capability.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"336 ","pages":"Article 115298"},"PeriodicalIF":7.6,"publicationDate":"2026-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146024293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0