
Latest Publications in Knowledge-Based Systems

Diff-GDAformer: A diffusion-guided dynamic attention transformer for image inpainting
IF 7.6 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-31 | DOI: 10.1016/j.knosys.2026.115443
Hao Wu, Shuzhen Xu, Cuicui Lv, Yuanwei Bi, Zhizhong Liu, Shuo Wang
The diffusion model (DM) has shown great promise in image inpainting by modeling complex data distributions and generating high-quality reconstructions. However, current diffusion-based methods often face challenges such as excessive iterative steps and limited adaptability to both local and global features, resulting in high computational costs and suboptimal restoration quality. To address these issues, we propose Diff-GDAformer, a novel image inpainting framework that combines diffusion-based prior feature generation with a guided dynamic attention Transformer (GDAformer) for robust and efficient restoration. In our approach, the DM iteratively refines Gaussian noise in a compressed latent space to generate high-quality prior features, which guide the restoration process. These prior features are injected into GDAformer, which innovatively adopts a dynamic recursive local attention (DRLA) module. DRLA makes use of two complementary attention mechanisms: guided local self-attention (GL-SA) and guided recursive-generalized self-attention (GRG-SA). GL-SA enhances the model’s ability to capture fine-grained local details, while GRG-SA focuses on aggregating global contextual information efficiently. To bridge the gap between local and global features, we introduce the hybrid feature integration (HFI) module, which effectively fuses features from different attention layers, enabling a more comprehensive understanding of image contexts. A two-stage training strategy combines GDAformer training with DM optimization, ensuring that the extracted prior features are accurate and seamlessly integrated into the restoration pipeline. Extensive experiments demonstrate that Diff-GDAformer achieves state-of-the-art performance on standard benchmarks, delivering superior visual quality and computational efficiency compared to existing methods. Code is available at https://github.com/w1zzzzzWu/Diff-GDAformer.
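To make the attention design concrete, here is a minimal PyTorch sketch of the DRLA idea: one branch attends within local windows (in the spirit of GL-SA), another attends globally (in the spirit of GRG-SA), and a learned gate fuses the two, with the diffusion prior features added to the queries. All module names, sizes, and the gating scheme are illustrative assumptions, not the authors' released code (see the repository linked above).

```python
import torch
import torch.nn as nn

class DualAttentionBlock(nn.Module):
    """Toy stand-in for DRLA: windowed local attention + global attention,
    both guided by diffusion prior features, fused by a learned gate."""
    def __init__(self, dim=64, heads=4, window=8):
        super().__init__()
        self.window = window
        self.local_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, tokens, prior):
        # tokens: (B, N, C) image tokens; prior: (B, N, C) diffusion prior
        B, N, C = tokens.shape
        q = tokens + prior                         # prior-guided queries
        w = self.window                            # N must be divisible by w
        ql = q.reshape(B * N // w, w, C)           # split into local windows
        tl = tokens.reshape(B * N // w, w, C)
        local, _ = self.local_attn(ql, tl, tl)     # GL-SA-like branch
        local = local.reshape(B, N, C)
        glob, _ = self.global_attn(q, tokens, tokens)  # GRG-SA-like branch
        g = self.gate(torch.cat([local, glob], dim=-1))
        return g * local + (1 - g) * glob          # gated fusion

x, p = torch.randn(2, 64, 64), torch.randn(2, 64, 64)
print(DualAttentionBlock()(x, p).shape)            # torch.Size([2, 64, 64])
```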
Citations: 0
Recursive multi-modal retrieval for structured semantic trees in engineering documents
IF 7.6 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-29 | DOI: 10.1016/j.knosys.2026.115433
Fei Li, Xinyu Li, Jinsong Bao
In lifecycle-oriented manufacturing systems, numerous engineering documents with text, tables, and images are continuously produced. Retrieval-augmented generation (RAG) models enhance document retrieval efficiency and adapt to evolving domain knowledge. However, existing methods struggle to achieve accurate cross-modal semantic alignment and high-precision retrieval in engineering documents. To address these limitations, this paper proposes the recursive multi-modal retrieval for structured semantic trees (RMR-SST) method for engineering documents. First, layout analysis extracts multimodal elements and divides metadata into three hierarchical levels: minimal chunks, assembly chunks, and section chunks. Domain rules are then applied to compute inter-section semantic relationships and construct the structured semantic trees (SSTs) of engineering documents. Second, a context-aware multimodal semantic alignment strategy is proposed to embed multimodal metadata chunks and their semantic relationships into a unified vector space, enabling cross-modal semantic alignment of SSTs. Finally, a recursive abstractive multimodal metadata retrieval algorithm is designed to integrate multimodal information across documents at different abstraction levels and to generate multimodal retrieval results. Based on 872 ship-design engineering documents, multiple SSTs were constructed for evaluation. Experiments show that RMR-SST outperforms conventional RAG methods in multimodal retrieval and semantic alignment tasks, achieving a Hit@5 of 88.3% when integrated with the Qwen3-235B model.
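A minimal sketch of what recursive retrieval over a structured semantic tree could look like: each node stores an embedding of its chunk in a unified vector space, and retrieval descends from section chunks toward minimal chunks, expanding only the most similar children at each level. The tree layout, cosine scoring, and top-k choice are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass, field

import numpy as np

@dataclass
class Node:
    text: str
    embedding: np.ndarray            # unified-space embedding of the chunk
    children: list = field(default_factory=list)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def recursive_retrieve(node, query_emb, top_k=2, hits=None):
    """Descend the tree, expanding only the top_k most similar children."""
    if hits is None:
        hits = []
    if not node.children:                     # minimal chunk: a leaf result
        hits.append((cosine(node.embedding, query_emb), node.text))
        return hits
    ranked = sorted(node.children,
                    key=lambda c: cosine(c.embedding, query_emb),
                    reverse=True)
    for child in ranked[:top_k]:
        recursive_retrieve(child, query_emb, top_k, hits)
    return hits

rng = np.random.default_rng(0)
leaf = lambda t: Node(t, rng.normal(size=8))
root = Node("doc", rng.normal(size=8),
            [Node("sec1", rng.normal(size=8), [leaf("table: hull specs"),
                                               leaf("fig: deck layout")]),
             Node("sec2", rng.normal(size=8), [leaf("text: weld procedure")])])
print(sorted(recursive_retrieve(root, rng.normal(size=8)), reverse=True)[:3])
```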
Citations: 0
PoseDefCycleGAN: Identity-preserving face frontalization with deformable convolutions and pose-aware supervision
IF 7.6 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-27 | DOI: 10.1016/j.knosys.2026.115358
Shakeel Muhammad Ibrahim, Shujaat Khan, Young-Woong Ko, Jeong-Gun Lee
Face recognition systems have achieved impressive accuracy in controlled environments but continue to face challenges under extreme pose variations. To address this limitation, we propose a novel face frontalization framework, PoseDefCycleGAN, that combines the strengths of CycleGAN, deformable convolution, and pose-guided supervision. Our method leverages deformable convolution in the final layer of the generator to dynamically adapt the receptive field, enabling better reconstruction of complex facial geometries. Additionally, we incorporate a lightweight pose classification network to enforce pose-aware regularization, encouraging the generation of semantically consistent frontal images. The proposed model is trained using unpaired data and optimized with a combination of adversarial, cycle consistency, identity-preserving, and pose regularization losses. Extensive experiments on MultiPIE, AFW, and LFW datasets demonstrate that the method improves both visual fidelity and face recognition, particularly at extreme yaw angles: on MultiPIE we reduce FID to 15.90 (from 18.32 with CycleGAN) and achieve 98.9% rank-1 accuracy at ±90°; on LFW we obtain 90.20% accuracy with LPIPS = 0.3052. Quantitative evaluations further validate the contribution of deformable convolutions and pose supervision. Our work presents a robust solution for pose-invariant face recognition and establishes a strong benchmark for identity-preserving face frontalization. Model implementation is available on the author’s GitHub page https://github.com/Shak97/PoseDefCycleGAN.
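The two key ingredients are easy to sketch: a deformable convolution as the generator's final layer, whose sampling offsets are predicted from the features, plus a lightweight pose classifier whose cross-entropy on generated images serves as the pose-aware regularizer. Shapes, the offset predictor, and the loss wiring below are illustrative assumptions, not the released model (linked above).

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class FrontalHead(nn.Module):
    """Final generator layer: a 3x3 deformable conv whose sampling offsets
    are predicted from the incoming features."""
    def __init__(self, ch=64):
        super().__init__()
        self.offset = nn.Conv2d(ch, 2 * 3 * 3, 3, padding=1)  # (dx, dy) per tap
        self.deform = DeformConv2d(ch, 3, 3, padding=1)

    def forward(self, feat):
        return torch.tanh(self.deform(feat, self.offset(feat)))

# hypothetical lightweight pose classifier: frontal (0) vs. profile (1)
pose_clf = nn.Sequential(nn.Conv2d(3, 8, 3, stride=2), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                         nn.Linear(8, 2))

head = FrontalHead()
fake_frontal = head(torch.randn(4, 64, 32, 32))    # (4, 3, 32, 32)
frontal_label = torch.zeros(4, dtype=torch.long)
# pose-aware regularization: generated faces should classify as frontal
loss_pose = nn.functional.cross_entropy(pose_clf(fake_frontal), frontal_label)
print(fake_frontal.shape, float(loss_pose))
```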
Citations: 0
Motion data segmentation using robust subspace clustering with noise suppression
IF 7.6 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-26 | DOI: 10.1016/j.knosys.2026.115386
Qian Wang, Hong Song, Yungang Hao, Yunzhi Luo, Jingfan Fan, Jian Yang
Numerous applications regard motion segmentation as a fundamental and vital process. A plethora of motion segmentation techniques have been introduced, with subspace clustering-based methods standing out, particularly because of their unsupervised nature. However, these methods often face a challenge in effectively handling nonlinear data with hybrid noise. In the present study, we propose a novel robust subspace clustering methodology, specifically designed to address the complexities inherent in motion segmentation tasks. We term it Robust Subspace Clustering with Noise Suppression (RSCNS); it integrates hybrid noise reconstruction with a representation of data relationships. Specifically, we propose a hybrid noise modeling method that combines the correntropy and Cauchy functions to suppress noise and outlier pollution. To restore the corrupted data, we treat the motion trajectory feature data matrix as an approximately low-rank matrix and design a truncated weighted nuclear norm regularization constraint. Meanwhile, the block diagonal regularizer (BDR) is incorporated into our model to ensure that motion trajectory features from the same moving object are clustered together. Experimental evaluations conducted on various video datasets demonstrate that RSCNS can effectively handle motion segmentation tasks not only in visible-light video but also in invisible-light (infrared) video.
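Assembling the terms named in the abstract, the objective plausibly takes a form like the following; the symbols, couplings, and constraint structure are assumptions for illustration, not the paper's exact formulation (X: observed trajectory matrix, L: recovered low-rank data, Z: self-representation, E: noise).

```latex
% Hypothetical objective assembled from the abstract's terms; lambda_i,
% the correntropy kernel width sigma_c, the Cauchy scale gamma, the
% truncation rank r, and the weights w_i are all assumed.
\begin{aligned}
\min_{Z,\,L,\,E}\quad
  &\underbrace{\sum_{i,j}\Bigl(1-e^{-E_{ij}^{2}/2\sigma_c^{2}}\Bigr)
   +\lambda_{1}\sum_{i,j}\log\Bigl(1+E_{ij}^{2}/\gamma^{2}\Bigr)}
   _{\text{correntropy + Cauchy hybrid noise model}}\\[2pt]
  &+\lambda_{2}\underbrace{\sum_{i=r+1}^{\min(m,n)}w_{i}\,\sigma_{i}(L)}
   _{\text{truncated weighted nuclear norm}}
   +\lambda_{3}\underbrace{\lVert Z\rVert_{\overline{k}}}
   _{\text{block-diagonal regularizer (BDR)}}\\
\text{s.t.}\quad & X = L + E,\qquad L = LZ .
\end{aligned}
```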
Citations: 0
PYRA: A high-level linter for data science software
IF 7.6 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-25 | DOI: 10.1016/j.knosys.2026.115412
Greta Dolcetti, Vincenzo Arceri, Antonella Mensi, Enea Zaffanella, Caterina Urban, Agostino Cortesi
Due to its interdisciplinary nature, the development of data science software is particularly prone to a wide range of potential mistakes that can easily and silently compromise the final results. Several tools have been proposed to help data scientists identify the most common, low-level programming issues. However, these tools often fall short in detecting higher-level, domain-specific issues typical of data science pipelines, where subtle errors may not trigger exceptions but can still lead to incorrect or misleading outcomes, or unexpected behaviors. In this paper, we present PYRA, a static analysis tool that aims to detect code smells in data science workflows. PYRA builds upon the Abstract Interpretation framework to infer abstract datatypes, and exploits such information to flag 16 categories of potential code smells concerning misleading visualizations, challenges for reproducibility, as well as misleading, unreliable, or unexpected results. Unlike traditional linters, which focus on syntactic or stylistic issues, PYRA reasons over a domain-specific type system to identify data science-specific problems, such as improper data preprocessing steps and misapplied procedures, that could silently propagate through a data-manipulation pipeline. Beyond static checking, we envision tools like PYRA becoming integral components of the development loop, with analysis reports guiding correction and helping assess the reliability of machine learning pipelines. We evaluate PYRA on a benchmark suite of real-world Jupyter notebooks, showing its effectiveness in detecting practical data science issues, thereby enhancing transparency, correctness, and reproducibility in data science software.
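As a flavor of what such a high-level check might look like, here is a minimal AST-based rule that flags a classic data science smell: fitting a scaler or encoder on the full dataset before the train/test split (data leakage). The rule, message, and detection logic are illustrative assumptions about this class of tools, not PYRA's actual rule set or abstract-interpretation machinery.

```python
import ast

SMELL = "fit/fit_transform called before train_test_split (possible leakage)"

def call_name(call: ast.Call) -> str:
    # handles both obj.fit(...) and fit(...)
    return getattr(call.func, "attr", getattr(call.func, "id", ""))

def find_leakage(source: str):
    calls = [n for n in ast.walk(ast.parse(source)) if isinstance(n, ast.Call)]
    splits = [c.lineno for c in calls if call_name(c) == "train_test_split"]
    first_split = min(splits, default=float("inf"))
    return [(c.lineno, SMELL) for c in calls
            if call_name(c) in ("fit", "fit_transform")
            and c.lineno < first_split]

notebook_cell = """\
scaler = StandardScaler()
X = scaler.fit_transform(X)  # fitted on train + test together
X_tr, X_te, y_tr, y_te = train_test_split(X, y)
"""
for lineno, msg in find_leakage(notebook_cell):
    print(f"line {lineno}: {msg}")   # flags line 2
```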
Citations: 0
HDC-Net: A multimodal remote sensing semantic segmentation network with hierarchical dual-stream fusion and cross-token interaction
IF 7.6 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-25 | DOI: 10.1016/j.knosys.2026.115416
Zhengpeng Li, Yubo Zhang, Jun Hu, Kunyang Wu, Jiawei Miao, Jiansheng Wu, Xiaolin Zhang, Bin Yang, Zhiguo Xia
Semantic segmentation of multimodal remote sensing imagery aims to integrate complementary information from different sensors to achieve high-precision land-cover classification, representing a key direction in remote sensing image interpretation. However, most existing approaches adopt homogeneous fusion strategies, such as simple feature concatenation or uniform attention mechanisms, which fail to address the dynamic requirements across different representation levels. In the shallow layers, the lack of precise spatial alignment often leads to the loss of fine details and blurred boundaries, while in the deeper layers, these methods struggle to effectively disentangle cross-modal semantic relationships and model global dependencies. To overcome these limitations, this paper proposes a hierarchical dual-stream fusion and cross-token interaction network (HDC-Net) for multimodal remote sensing semantic segmentation. The network follows a layer-wise heterogeneous design. In the shallow encoding stage, an interactive and shared attention fusion (ISAF) module is introduced to achieve pixel-level spatial alignment and feature enhancement. In the deeper layers, a hierarchical cross-token interaction transformer (HCFormer) is developed for global semantic modeling and cross-modal relationship disentanglement. Additionally, a pyramidal fusion bridge (PFB) is designed to efficiently connect deep and shallow features. Finally, an information-fusion decoder integrates deep semantics, cross-modal bridging features, and shallow spatial details to produce high-fidelity segmentation maps. Extensive experiments on three public benchmark datasets, ISPRS Vaihingen, ISPRS Potsdam, and WHU-OPT-SAR, demonstrate the effectiveness, robustness, and generalization capability of the proposed approach. The implementation is available at https://github.com/lzp-lkd/HDC-Net.
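A minimal sketch of shallow dual-stream fusion in the spirit of the ISAF module: each modality attends to the other (interactive attention), and a shared channel gate re-weights the fused result. Layer sizes and the gating design are illustrative assumptions, not the released code (linked above).

```python
import torch
import torch.nn as nn

class DualStreamFusion(nn.Module):
    """Toy ISAF-like block: cross-attention in both directions plus a
    shared channel gate over the summed streams."""
    def __init__(self, dim=32, heads=4):
        super().__init__()
        self.rgb_from_sar = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.sar_from_rgb = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.shared_gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, rgb, sar):
        # rgb, sar: (B, C, H, W) shallow features from the two encoders
        B, C, H, W = rgb.shape
        r = rgb.flatten(2).transpose(1, 2)       # (B, HW, C)
        s = sar.flatten(2).transpose(1, 2)
        r2, _ = self.rgb_from_sar(r, s, s)       # RGB queries SAR context
        s2, _ = self.sar_from_rgb(s, r, r)       # SAR queries RGB context
        fused = r + r2 + s + s2
        # shared channel gate computed from the global token average
        fused = fused * self.shared_gate(fused.mean(1, keepdim=True))
        return fused.transpose(1, 2).reshape(B, C, H, W)

f = DualStreamFusion()
out = f(torch.randn(2, 32, 16, 16), torch.randn(2, 32, 16, 16))
print(out.shape)  # torch.Size([2, 32, 16, 16])
```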
Citations: 0
Affective computing in the era of large language models: A survey from the NLP perspective
IF 7.6 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-25 | DOI: 10.1016/j.knosys.2026.115411
Yiqun Zhang, Xiaocui Yang, Xingle Xu, Zeran Gao, Yijie Huang, Shiyi Mu, Shi Feng, Daling Wang, Yifei Zhang, Kaisong Song, Ge Yu
Affective Computing (AC) integrates computer science, psychology, and cognitive science to enable machines to recognize, interpret, and simulate human emotions across domains such as social media, finance, healthcare, and education. AC commonly centers on two task families: Affective Understanding (AU) and Affective Generation (AG). While fine-tuned pre-trained language models (PLMs) have achieved solid AU performance, they often generalize poorly across tasks and remain limited for AG, especially in producing diverse, emotionally appropriate responses. The advent of Large Language Models (LLMs) (e.g., ChatGPT and LLaMA) has catalyzed a paradigm shift by offering in-context learning, broader world knowledge, and stronger sequence generation. This survey presents a Natural Language Processing (NLP)-oriented overview of AC in the LLM era. We (i) consolidate traditional AC tasks and preliminary LLM-based studies; and (ii) review adaptation techniques that improve AU/AG, including Instruction Tuning (full and parameter-efficient methods), Prompt Engineering (zero/few-shot, chain-of-thought, and agent-based prompting), and Reinforcement Learning (RL). For the latter, we summarize RL from human preferences, verifiable/programmatic rewards, and model feedback, which provide preference- or rule-grounded optimization signals that can help steer AU/AG toward empathy, safety, and planning, achieving finer-grained or multi-objective control. To assess progress, we compile benchmarks and evaluation practices for both AU and AG. We also discuss open challenges, from ethics, data quality, and safety to robust evaluation and resource efficiency, and outline research directions. We hope this survey clarifies the landscape and offers practical guidance for building affect-aware, reliable, and responsible LLM systems.
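As one concrete example of the prompting techniques the survey reviews, here is a minimal zero-shot affective understanding setup: the task is phrased as an instruction and the model's completion is parsed into a label. The query_llm callable is a placeholder for any chat-completion client, and the prompt wording is an illustrative assumption rather than a recommendation from the survey.

```python
LABELS = ["joy", "sadness", "anger", "fear", "surprise", "neutral"]

PROMPT = """You are an emotion analyst.
Classify the emotion of the utterance into one of: {labels}.
Think step by step, then end your answer with the label only.

Utterance: "{utterance}"
Label:"""

def classify_emotion(utterance: str, query_llm) -> str:
    """query_llm: any callable that sends a prompt to an LLM and
    returns its text completion (e.g., a chat-completion client)."""
    reply = query_llm(PROMPT.format(labels=", ".join(LABELS),
                                    utterance=utterance))
    # keep the last token that matches a known label; default to neutral
    hits = [w.strip(".,!?").lower() for w in reply.split()
            if w.strip(".,!?").lower() in LABELS]
    return hits[-1] if hits else "neutral"

# usage with a stub standing in for a real model client
print(classify_emotion("I finally got the grant!", lambda p: "Label: joy"))
```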
Citations: 0
RAATrack: Reliable appearance aggregation for video-level multimodel tracking
IF 7.6 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-25 | DOI: 10.1016/j.knosys.2026.115414
Yingran Jin, Yun Gao, Qianyun Feng
Multimodal tracking has attracted widespread attention due to its ability to mitigate the inherent limitations of conventional RGB-based tracking. However, most existing multimodal trackers primarily focus on spatial feature fusion and enhancement across different modalities, or only exploit sparse temporal dependencies between video frames, making it difficult to systematically capture and utilize long-range temporal correlations and effectively model target dynamics and motion information. To address this issue, we propose a novel context-aware video-level multimodal tracking framework based on reliable appearance aggregation, named RAATrack. During tracking, RAATrack continuously aggregates reliable target appearance information and, leveraging the hidden state mechanism of Mamba, records and propagates rich contextual information across the entire video sequence, thereby enhancing tracking robustness. The core component of RAATrack is the appearance information aggregation (AIA) module, which consists of a cross-attention layer and a Mamba layer. The cross-attention layer periodically calibrates appearance information, while the Mamba layer continuously captures target appearance variations and establishes long-range temporal dependencies across video frames. Experiments conducted on five diverse multi-modal datasets (RGBT234, LasHeR, VisEvent, DepthTrack, and VOT-RGBD2022) demonstrate that RAATrack achieves state-of-the-art performance.
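A minimal sketch of the aggregation loop described above: cross-attention periodically calibrates a running target template against the current frame, while a recurrent hidden state carries context across the sequence. A GRUCell stands in here for the Mamba state-space layer, and all dimensions and update rules are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class AppearanceAggregator(nn.Module):
    """Toy AIA-like loop: cross-attention calibration + recurrent state."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.calibrate = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.state = nn.GRUCell(dim, dim)    # stand-in for a Mamba layer

    def forward(self, frames, template):
        # frames: (B, T, N, C) per-frame target tokens; template: (B, N, C)
        B, T, N, C = frames.shape
        h = template.mean(1)                 # (B, C) initial hidden state
        for t in range(T):
            # calibration: the template queries the current frame's tokens
            template, _ = self.calibrate(template, frames[:, t], frames[:, t])
            # fold a summary of the calibrated appearance into the state
            h = self.state(template.mean(1), h)
        return template, h                   # updated template + context

agg = AppearanceAggregator()
tpl, ctx = agg(torch.randn(2, 5, 16, 64), torch.randn(2, 16, 64))
print(tpl.shape, ctx.shape)  # torch.Size([2, 16, 64]) torch.Size([2, 64])
```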
Citations: 0
Similarity surrogate-assisted evolutionary neural architecture search based on graph neural network
IF 7.6 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-24 | DOI: 10.1016/j.knosys.2026.115410
Yu Xue, Keyu Liu, Yiyu Tan, Yong Zhang
Neural architecture search (NAS) serves as a paradigm for the automated design of neural network architectures and has significant potential to advance the field of deep learning. However, NAS often incurs substantial computational costs. Surrogate models have been proposed to replace time-intensive evaluations by predicting fitness values, but they suffer from limited accuracy or stability issues. To address these issues, this paper proposes SiGNAS, a surrogate-assisted evolutionary NAS method that predicts architecture performance by assessing the similarity between candidate architectures and the optimal benchmark. In addition, SiGNAS maps architectures into a latent feature space and adopts graph neural networks as the surrogate model to capture similarities. Furthermore, the traditional static transmission strategy is replaced with a novel graph convolutional network transmission and aggregation mechanism that enhances expressiveness through similarity-based feature clustering. Experimental results demonstrate that SiGNAS discovers a high-performing architecture with a test error of 2.52% on CIFAR-10 in the DARTS search space. Moreover, it achieves 94.22% accuracy with 1,000 queries on NAS-Bench-101 and identifies the best architecture with only 200 queries on NAS-Bench-201. These results demonstrate that SiGNAS has significant advantages over most existing NAS algorithms, effectively exploring the search space and accurately identifying high-performance neural network architectures.
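A minimal sketch of the similarity-surrogate idea: a tiny graph convolution embeds an architecture's operation DAG, and a candidate is scored by its embedding's similarity to the current best architecture instead of being trained. The two-layer GCN, operation encoding, and cosine scoring are illustrative assumptions, not the SiGNAS implementation.

```python
import torch
import torch.nn as nn

class TinyGCN(nn.Module):
    """Two rounds of row-normalized message passing over an op DAG,
    mean-pooled into a single architecture embedding."""
    def __init__(self, n_ops=5, dim=16):
        super().__init__()
        self.embed = nn.Embedding(n_ops, dim)   # one vector per op type
        self.w1, self.w2 = nn.Linear(dim, dim), nn.Linear(dim, dim)

    def forward(self, ops, adj):
        # ops: (N,) op-type ids; adj: (N, N) adjacency with self loops
        a = adj / adj.sum(1, keepdim=True).clamp(min=1)   # row-normalize
        x = torch.relu(self.w1(a @ self.embed(ops)))      # message passing
        x = torch.relu(self.w2(a @ x))
        return x.mean(0)                                  # graph embedding

gcn = TinyGCN()

def surrogate_score(candidate, best):
    # fitness proxy: similarity to the best architecture found so far
    return torch.cosine_similarity(gcn(*candidate), gcn(*best), dim=0)

adj = torch.eye(4) + torch.triu(torch.ones(4, 4), 1)     # toy 4-node DAG
best = (torch.tensor([0, 1, 2, 3]), adj)
cand = (torch.tensor([0, 2, 2, 4]), adj)
print(float(surrogate_score(cand, best)))   # higher = more promising
```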
Citations: 0
Entropy-driven topology mapping framework for robust Bayesian classification
IF 7.6 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-24 | DOI: 10.1016/j.knosys.2026.115378
Yang Liu, Qi Wu, Xu Zhang
Balancing predictive accuracy with model interpretability remains a fundamental challenge in artificial intelligence and machine learning. Bayesian network classifiers (BNCs) address this by leveraging directed acyclic graphs (DAGs) to explicitly represent probabilistic dependencies. Current BNCs fall into two categories: single-topology methods, which struggle with inter-class divergence and intra-class heterogeneity in modern datasets, and multi-topology ensemble methods, which often employ oversimplified ensemble strategies and rely on pairwise dependency metrics that neglect synergistic effects and global coherence. To overcome these limitations, this paper proposes the entropy-driven topology mapping Bayesian classifier (ETMBC). It introduces a novel two-stage learning strategy guided by dual entropy minimization: class-specific entropy captures general dependencies across class distributions to handle inter-class divergence, while instance-specific entropy models specialized dependencies within testing instances. For classification, we develop a novel topology mapping approach that dynamically integrates predictions from multiple DAGs by jointly optimizing Jensen-Shannon divergence for structural consistency evaluation and topological information gain for complementary dependency discovery, effectively addressing intra-class heterogeneity and dependency asymmetry without predefined rules. Extensive experimental evaluation on 50 publicly available datasets spanning various domains with distinct properties demonstrates that ETMBC consistently outperforms state-of-the-art single-topology and multi-topology learners in terms of zero-one loss, bias, variance, root-mean-square error, and statistical tests.
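A minimal sketch of similarity-weighted fusion in the spirit of the topology-mapping step: each topology's class posterior is weighted by its Jensen-Shannon agreement with a reference distribution (here, the ensemble mean), so structurally consistent predictions dominate. The exponential weighting and the choice of reference are illustrative assumptions, not ETMBC's exact mechanism.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p, q = p + eps, q + eps
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def fuse_topologies(posteriors):
    """posteriors: (n_topologies, n_classes) per-DAG class distributions."""
    posteriors = np.asarray(posteriors, dtype=float)
    reference = posteriors.mean(axis=0)
    # higher agreement with the reference -> larger weight
    w = np.array([np.exp(-js_divergence(p, reference)) for p in posteriors])
    w /= w.sum()
    fused = w @ posteriors
    return fused / fused.sum()

preds = [[0.7, 0.2, 0.1],      # general-dependency DAG
         [0.6, 0.3, 0.1],      # instance-specific DAG
         [0.1, 0.1, 0.8]]      # outlier topology, down-weighted
print(fuse_topologies(preds))
```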
Citations: 0