Knowledge-Based Systems latest articles, page 5

TreeC: A method to generate interpretable energy management systems using a metaheuristic algorithm

IF 7.2 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Knowledge-Based Systems

Pub Date : 2024-11-20 DOI: 10.1016/j.knosys.2024.112756

Julian Ruddick, Luis Ramirez Camargo, Muhammad Andy Putratama, Maarten Messagie, Thierry Coosemans

Energy management systems (EMSs) have traditionally been implemented with rule-based control (RBC) and model predictive control (MPC). Recent research, however, has explored reinforcement learning (RL) as a promising alternative. This paper introduces TreeC, a machine learning method that uses the covariance matrix adaptation evolution strategy (CMA-ES), a metaheuristic algorithm, to generate an interpretable EMS modelled as a decision tree. Unlike RBC and MPC approaches, TreeC learns the EMS decision strategy from historical data, adapting the control model to the energy grid being controlled. Because the decision strategy is represented as a decision tree, it is interpretable, unlike RL methods that often rely on black-box models such as neural networks. TreeC is evaluated against MPC with perfect forecasts and against RL EMSs in two case studies taken from the literature: an electric grid case and a household heating case. In the electric grid case, TreeC achieves an average energy loss and constraint violation score of 19.2, close to the scores of 14.4 and 16.2 achieved by the MPC and RL EMSs, respectively. All three methods control the electric grid well, especially compared with a random EMS, which obtains an average score of 12 875. In the household heating case, TreeC performs similarly to MPC on adjusted and averaged electricity cost and total discomfort (0.033 EUR/m² and 0.42 Kh for TreeC versus 0.037 EUR/m² and 2.91 Kh for MPC), while outperforming RL (0.266 EUR/m² and 24.41 Kh). TreeC demonstrates a performant and interpretable application of machine learning to EMSs.
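
A minimal sketch of the general idea, not TreeC itself: a depth-1 decision tree controls a toy battery (charge when the price is below a learned threshold, otherwise discharge), and a metaheuristic tunes that threshold by simulating the policy on historical prices. The battery model, the tariff, and all function names are hypothetical, and a simple (1+1) evolution strategy stands in for CMA-ES, which additionally adapts a full covariance matrix over many tree parameters.

```python
import random

def tree_policy(price, threshold):
    """Depth-1 decision tree: a single split on price with two leaf actions."""
    return 1.0 if price < threshold else -1.0  # +1 = charge, -1 = discharge

def simulate_cost(threshold, prices, capacity=4.0, power=1.0, load=1.0):
    """Cost of serving a constant load with a battery driven by the tree (toy model)."""
    soc, cost = 0.0, 0.0
    for price in prices:
        action = tree_policy(price, threshold)
        # Clip the battery flow so the state of charge stays within [0, capacity].
        flow = max(-soc, min(capacity - soc, action * power))
        soc += flow
        grid = max(0.0, load + flow)  # energy bought from the grid this step
        cost += grid * price
    return cost

def one_plus_one_es(prices, x0=0.0, sigma=0.1, iters=200, seed=0):
    """(1+1) evolution strategy: mutate the threshold, keep it if cost does not rise."""
    rng = random.Random(seed)
    best_x, best_f = x0, simulate_cost(x0, prices)
    for _ in range(iters):
        cand = best_x + sigma * rng.gauss(0.0, 1.0)
        f = simulate_cost(cand, prices)
        if f <= best_f:
            if f < best_f:
                sigma *= 1.5  # strict improvement: widen the search
            best_x, best_f = cand, f
        else:
            sigma *= 0.9  # failure: narrow the search
    return best_x, best_f

prices = [0.1, 0.3] * 24  # hypothetical alternating cheap/expensive tariff
threshold, cost = one_plus_one_es(prices)
```

Because a candidate is accepted only when its simulated cost does not increase, the returned threshold is never worse than the starting one; on this tariff a threshold of 0.2 halves the cost of the never-charge baseline (4.8 versus 9.6).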

Volume 309, Article 112756

Citations: 0

Diverse Semantic Image Synthesis with various conditioning modalities

IF 7.2 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Knowledge-Based Systems

Pub Date : 2024-11-19 DOI: 10.1016/j.knosys.2024.112727

Chaoyue Wu, Rui Li, Cheng Liu, Si Wu, Hau-San Wong

Semantic image synthesis aims to generate high-fidelity images from a segmentation mask; previous methods typically train a generator to associate a global random map with the conditioning mask. However, the lack of independent control over regional content limits their application. To address this issue, we propose an effective approach for Multi-modal conditioning-based Diverse Semantic Image Synthesis, referred to as McDSIS. The model incorporates multiple constituent generators that synthesize the content of each semantic region from independent random maps. The content of a region can be determined by a style code that is either associated with a random map or extracted from a reference image, or by an embedded textual description, through our proposed conditioning mechanisms. As a result, the generation process is spatially disentangled: diverse content can be synthesized independently in one semantic region while the content elsewhere is preserved. Owing to this flexible architecture, McDSIS not only outperforms state-of-the-art semantic image generation models but can also perform various visual tasks, such as face inpainting, swapping, and local editing.
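
A minimal sketch of the spatial-disentanglement property, assuming each semantic class has its own generator driven by its own latent code and that outputs are composed pixel by pixel through the segmentation mask. The toy generator (a linear function of pixel position) and all names are hypothetical placeholders, not McDSIS's networks; the sketch only shows that resampling one class's code changes pixels inside that region and nowhere else.

```python
import random

def toy_generator(class_id, code, i, j):
    """Hypothetical per-class generator: a latent code mapped to the pixel at (i, j)."""
    return code[0] * i + code[1] * j + code[2] + 10.0 * class_id

def compose(mask, codes):
    """Each pixel is produced by the generator of its own semantic class."""
    return [[toy_generator(c, codes[c], i, j) for j, c in enumerate(row)]
            for i, row in enumerate(mask)]

rng = random.Random(0)
mask = [[0, 0, 1],
        [0, 1, 1],
        [2, 2, 1]]  # toy segmentation mask with three semantic classes
codes = {c: [rng.gauss(0.0, 1.0) for _ in range(3)] for c in range(3)}
before = compose(mask, codes)

codes[1] = [rng.gauss(0.0, 1.0) for _ in range(3)]  # resample class 1 only
after = compose(mask, codes)

changed = {(i, j) for i in range(3) for j in range(3) if before[i][j] != after[i][j]}
```

Only the four class-1 pixels change; every other pixel keeps its value.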

Volume 309, Article 112727

Citations: 0

Deep spectral clustering by integrating local structure and prior information

IF 7.2 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Knowledge-Based Systems

Pub Date : 2024-11-19 DOI: 10.1016/j.knosys.2024.112743

Hua Meng, Yueyi Zhang, Zhiguo Long

Traditional spectral clustering (SC) is an effective clustering method that can handle data with complex structure. SC embeds the data into another feature space through time-consuming spectral embedding before clustering, and it must re-embed the entire dataset when unseen data arrive, i.e., it lacks out-of-sample-extension capability. SpectralNet (Shaham et al., 2018) was a pioneering attempt to resolve these two problems: it trains on random mini-batches to scale to large datasets and uses an orthogonal transformation layer to ensure orthogonality of the embeddings and remove redundancy in the features. However, the randomly selected points in each mini-batch may lie far from each other and fail to convey local structural information, and the orthogonal transformation ensures orthogonality only within each mini-batch rather than over the whole dataset. In this paper, we propose a novel approach to address these two problems. Batch augmentation improves the selection of data for each batch by using neighborhood information, helping the network capture local structural information. Core point guidance exploits the spectral embeddings of representative points as prior information, guiding the network to learn embeddings that better preserve the overall structure of the data. Empirical results show that our method resolves both problems of SpectralNet and achieves better clustering performance than SpectralNet and other state-of-the-art deep clustering algorithms, while generalizing the embedding to unseen data.
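
The batch-augmentation idea can be illustrated with a small sketch, assuming that augmentation means adding each randomly sampled point's k nearest neighbours to the mini-batch so that local structure appears within the batch. The function names, the brute-force Euclidean kNN, and the choice of k are illustrative assumptions, not the paper's exact procedure.

```python
import math
import random

def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i] (brute force, excluding i)."""
    dists = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    return [j for _, j in sorted(dists)[:k]]

def augment_batch(points, batch_size, k, seed=0):
    """Random mini-batch plus the k nearest neighbours of every sampled point."""
    rng = random.Random(seed)
    seeds = rng.sample(range(len(points)), batch_size)
    batch = set(seeds)
    for i in seeds:
        batch.update(knn_indices(points, i, k))
    return seeds, sorted(batch)

# Two well-separated toy clusters on a line.
points = [(float(x), 0.0) for x in range(10)] + [(x + 100.0, 0.0) for x in range(10)]
seeds, batch = augment_batch(points, batch_size=2, k=3)
```

Every neighbour added to the batch comes from the same cluster as the point that pulled it in, so the batch carries local structure that two purely random points would not.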

Volume 308, Article 112743

Citations: 0

EGFDA: Experience-guided Fine-grained Domain Adaptation for cross-domain pneumonia diagnosis

IF 7.2 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Knowledge-Based Systems

Pub Date : 2024-11-19 DOI: 10.1016/j.knosys.2024.112752

Haoran Zhao, Tao Ren, Wei Li, Danke Wu, Zhe Xu

Although recent advances in deep learning have enabled accurate pneumonia diagnosis, their heavy reliance on data annotation keeps them from performing as expected in clinical practice. Unsupervised domain adaptation (UDA) methods have been developed to address the scarcity of annotations. Nevertheless, the diverse manifestations of pneumonia pose challenges for current UDA methods, including spatial lesion-preference bias and discriminative class-preference bias. To overcome these problems, we propose an Experience-Guided Fine-grained Domain Adaptation (EGFDA) framework for automatic cross-domain pneumonia diagnosis. The framework consists of two main modules: (1) Gradient-aware Lesion Area Matching (GaLAM), which reduces the global domain gap while avoiding being misled by lesion-unrelated targets, and (2) Reweighing Smooth Certainty-aware Matching (RSCaM), which matches the class space through a smooth certainty-aware feature mapping to guide the model toward more precise class-discriminative features. Through the collaboration of GaLAM and RSCaM, EGFDA processes unlabeled samples in a way that mirrors physicians' diagnostic practice: it first locates the disease-related lesion area and then performs fine-grained discrimination. Comprehensive experiments on three tasks using six datasets demonstrate the superior performance of EGFDA. Furthermore, extensive ablation studies and visual analyses highlight the interpretability and generalization ability of the proposed method.
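
One generic way to realise certainty-aware class matching is to compute per-class prototypes (mean features) on the labelled source domain and on the unlabelled target domain, weighting each target sample by the confidence of its pseudo-label, and then penalise the distance between matching prototypes. The sketch below assumes exactly that; the weighting (maximum softmax probability), the squared Euclidean distance, and all function names are hypothetical and do not reproduce RSCaM.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_prototypes(features, labels, weights, num_classes):
    """Weighted mean feature per class; None for classes with no weight."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    totals = [0.0] * num_classes
    for f, y, w in zip(features, labels, weights):
        totals[y] += w
        for d in range(dim):
            sums[y][d] += w * f[d]
    return [[s / t for s in row] if t > 0 else None for row, t in zip(sums, totals)]

def class_matching_loss(src_feats, src_labels, tgt_feats, tgt_logits, num_classes):
    """Squared distance between source prototypes and certainty-weighted target prototypes."""
    probs = [softmax(z) for z in tgt_logits]
    pseudo = [max(range(num_classes), key=p.__getitem__) for p in probs]
    certainty = [max(p) for p in probs]  # confidence of each pseudo-label
    src = weighted_prototypes(src_feats, src_labels, [1.0] * len(src_feats), num_classes)
    tgt = weighted_prototypes(tgt_feats, pseudo, certainty, num_classes)
    loss = 0.0
    for ps, pt in zip(src, tgt):
        if ps is not None and pt is not None:  # skip classes missing in either domain
            loss += sum((a - b) ** 2 for a, b in zip(ps, pt))
    return loss

src_feats, src_labels = [[0.0, 0.0], [1.0, 1.0]], [0, 1]
tgt_logits = [[5.0, 0.0], [0.0, 5.0]]  # confident pseudo-labels 0 and 1
aligned = class_matching_loss(src_feats, src_labels, [[0.0, 0.0], [1.0, 1.0]], tgt_logits, 2)
shifted = class_matching_loss(src_feats, src_labels, [[0.0, 0.0], [2.0, 2.0]], tgt_logits, 2)
```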

Volume 307, Article 112752

Citations: 0

Contrastive Predictive Embedding for learning and inference in knowledge graph

IF 7.2 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Knowledge-Based Systems

Pub Date : 2024-11-19 DOI: 10.1016/j.knosys.2024.112730

Chen Liu, Zihan Wei, Lixin Zhou

Knowledge graph embedding (KGE) aims to capture rich semantic information about the entities and relationships in knowledge graphs (KGs), which is essential for knowledge graph completion (KGC) and various downstream tasks. Existing KGE models differentiate between entity and relationship embeddings by constructing indirect pretext tasks and scoring functions that discern different types of triplets. In contrast, this paper introduces a novel KGE method called Contrastive Predictive Embedding (CPE), which requires neither a scoring function nor negative sampling. Specifically, CPE directly predicts the embedding of the unknown entity from the known entity and relationship embeddings of a triplet and compares the prediction with the true embedding. The paper also proposes a dedicated optimization approach to improve the performance of various translation-based models. Experimental results on four benchmark KGs show that CPE improves the performance of the original KGE models while maintaining lower computational complexity. On the FB15k-237 dataset, CPE improves the MRR and Hit@k (k ∈ {1, 3, 10}) metrics of TransE by 1.55%, 3.37%, 4.58%, and 5.92%, respectively.
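
The prediction-then-compare idea and the reported metrics can be sketched together. Assuming a TransE-style predictor (predicted tail = h + r) and a contrastive objective that scores the prediction against every entity embedding with a cross-entropy loss (so no separate negative sampling step is needed), the code computes that loss and then the MRR and Hit@k values named in the abstract. The embeddings and helper names are hypothetical; this is not the authors' CPE objective.

```python
import math

def predict_tail(h, r):
    """TransE-style prediction of the tail embedding: h + r."""
    return [a + b for a, b in zip(h, r)]

def scores(pred, entities):
    """Similarity of the prediction to each entity: negative squared L2 distance."""
    return [-sum((p - e) ** 2 for p, e in zip(pred, ent)) for ent in entities]

def contrastive_loss(pred, entities, true_idx):
    """Cross-entropy of the true entity against all entities (InfoNCE-style)."""
    s = scores(pred, entities)
    m = max(s)
    log_z = m + math.log(sum(math.exp(v - m) for v in s))
    return log_z - s[true_idx]

def rank_of(pred, entities, true_idx):
    """1-based rank of the true entity; ties are resolved in its favour."""
    s = scores(pred, entities)
    return 1 + sum(1 for v in s if v > s[true_idx])

def mrr_and_hits(ranks, ks=(1, 3, 10)):
    """Mean reciprocal rank and the fraction of ranks <= k for each k."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = {k: sum(1 for r in ranks if r <= k) / len(ranks) for k in ks}
    return mrr, hits

entities = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
pred = predict_tail(entities[0], [1.0, 0.0])  # hypothetical head and relation
loss = contrastive_loss(pred, entities, true_idx=1)
mrr, hits = mrr_and_hits([1, 2, 4, 20])
```

The usual "filtered" evaluation protocol, which removes other known true triplets before ranking, is omitted here for brevity.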

Volume 307, Article 112730

Citations: 0

Adaptive Cross-Modal Experts Network with Uncertainty-Driven Fusion for Vision–Language Navigation

IF 7.2 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Knowledge-Based Systems

Pub Date : 2024-11-19 DOI: 10.1016/j.knosys.2024.112735

Jie Wu, Chunlei Wu, Xiuxuan Shen, Leiquan Wang

In Vision-and-Language Navigation (VLN), an agent navigates autonomously in real-world environments according to language instructions to reach specified destinations and accurately locate relevant targets. Although significant progress has been made in recent years, two major limitations remain: (1) existing methods lack flexibility and diversity in processing multimodal information and cannot adjust dynamically to different input features; (2) current fixed fusion strategies cannot adapt dynamically to varying data quality in open environments, and they neither fully leverage multi-scale features nor handle complex nonlinear relationships well. In this paper, we propose an adaptive cross-modal experts network (ACME) with uncertainty-driven fusion to address these issues. The adaptive cross-modal experts module dynamically selects the most suitable expert network based on the input features, increasing the diversity and flexibility of information processing. In addition, the uncertainty-driven fusion module balances coarse-grained and fine-grained information by calculating their confidences and dynamically adjusting the fusion weights. Comprehensive experiments on the R2R, SOON, and REVERIE datasets show that our approach significantly outperforms existing VLN approaches.
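
A minimal sketch of uncertainty-driven fusion, assuming each branch (coarse-grained and fine-grained) outputs a probability distribution over candidate navigation actions, that a branch's confidence is one minus the normalised entropy of its distribution, and that the fusion weights are the normalised confidences. These choices and the function names are assumptions for illustration; ACME's actual confidence estimate and fusion may differ.

```python
import math

def normalized_entropy(probs):
    """Entropy divided by log(n): 0 for a one-hot vector, 1 for a uniform one."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))

def uncertainty_fusion(coarse, fine, eps=1e-8):
    """Weight each branch by its confidence and mix the two distributions."""
    conf_c = 1.0 - normalized_entropy(coarse)
    conf_f = 1.0 - normalized_entropy(fine)
    w_c = (conf_c + eps) / (conf_c + conf_f + 2 * eps)
    w_f = 1.0 - w_c
    fused = [w_c * c + w_f * f for c, f in zip(coarse, fine)]
    return fused, (w_c, w_f)

coarse = [0.25, 0.25, 0.25, 0.25]  # maximally uncertain branch
fine = [0.85, 0.05, 0.05, 0.05]    # confident branch
fused, (w_c, w_f) = uncertainty_fusion(coarse, fine)
```

Nearly all of the weight goes to the confident branch; the small eps keeps the weights defined (0.5 each) when both branches are maximally uncertain.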

Volume 307, Article 112735

Citations: 0

A lightweight convolutional neural network for road surface classification under shadow interference

IF 7.2 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Knowledge-Based Systems

Pub Date : 2024-11-17 DOI: 10.1016/j.knosys.2024.112761

Ruichi Mao, Guangqiang Wu, Jian Wu, Xingyu Wang

The development of intelligent driving, especially the intelligent control of active suspension, relies heavily on predictive perception of upcoming road conditions. To achieve accurate real-time road surface classification under shadow interference, a lightweight convolutional neural network (CNN) trained with a novel data augmentation method is proposed, and an improved cycle-consistent adversarial network (CycleGAN) is developed to generate shadowed pavement data for this augmentation. The CycleGAN network structure is optimized using a texture self-supervised (TSS) mechanism and the learned perceptual image patch similarity (LPIPS) function, with label smoothing applied during training. The images produced by this data augmentation method closely resemble real-world images. Furthermore, Efficient-MBConv, a convolution block with fewer parameters and higher precision, is proposed. Finally, the Light-EfficientNet architecture, built on Efficient-MBConv, is developed and trained on the augmented dataset. Compared with EfficientNet-B0, Light-EfficientNet has 61.94% fewer parameters. On the test set with shadows, the Light-EfficientNet model trained with data augmentation improves average classification accuracy by 5.76% over the model trained without data augmentation. This approach effectively reduces the impact of shadows on road classification at low cost while significantly reducing the computational resources required by the CNN, providing real-time, accurate road surface information for controlling active suspension height and damping.
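
The abstract applies label smoothing while training the CycleGAN. One common formulation replaces a one-hot target with (1 - eps) times the one-hot vector plus eps / K spread uniformly over the K classes; for a binary real/fake target, K = 2. The sketch below assumes eps = 0.1 (the abstract does not state a value) and does not show where the smoothed targets enter the paper's CycleGAN losses.

```python
import math

def smooth_labels(target_index, num_classes, eps=0.1):
    """(1 - eps) * one_hot + eps / K: the standard label-smoothing target."""
    return [(1.0 - eps) * (1.0 if k == target_index else 0.0) + eps / num_classes
            for k in range(num_classes)]

def cross_entropy(target_dist, logits):
    """Cross-entropy between a target distribution and softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return -sum(t * (v - log_z) for t, v in zip(target_dist, logits))

targets = smooth_labels(0, 4)     # four-class example
binary = smooth_labels(0, 2)      # real/fake example
overconfident = cross_entropy(targets, [20.0, 0.0, 0.0, 0.0])
moderate = cross_entropy(targets, [4.0, 0.0, 0.0, 0.0])
```

With smoothed targets the loss is no longer minimised by pushing the true-class logit toward infinity, which is why the logit of 20.0 scores worse than 4.0 here.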

Volume 306, Article 112761

Citations: 0

QA-TSN: QuickAccurate Tongue Segmentation Net

IF 7.2 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Knowledge-Based Systems

Pub Date : 2024-11-15 DOI: 10.1016/j.knosys.2024.112648

Guangze Jia, Zhenchao Cui, Qingsong Fei

Tongue segmentation is an essential part of computer-aided tongue diagnosis. Because the tongue body and non-tongue regions such as the lips and face have similar color and texture, existing methods produce inaccurate and incomplete tongue segmentation results. Moreover, the small size of tongue datasets leads to under-fitting in CNN-based methods, which often produce poor segmentation. To solve these problems, we designed the quick accurate tongue segmentation net (QA-TSN) to segment the tongue body. To alleviate the small-sample problem, a tongue-style transfer generation net (T-STGN) was proposed to synthesize tongue images. T-STGN uses a novel encoder–decoder structure with two encoders and a global rendering block to refine the global characteristics of the synthetic tongue images. For real-time segmentation, QA-TSN includes a quicker tongue segmentation net (QTSN), which uses an encoder–decoder structure with modified partial convolution (MPConv) to speed up computation. To smooth the segmented tongue body, a novel tongue segmentation loss (TSL) was proposed: its tongue edge loss (TEL) term smooths the segmentation boundary of the tongue body, and its tongue area loss (TAL) term reduces fragmentation in the segmentation results. Experiments on tongue datasets achieved an IoU of 98.0307 and a Dice score of 99.0738 at a frame rate of 75.35, outperforming all other methods in the experiments. These results demonstrate the effectiveness of the proposed QA-TSN.
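
IoU and Dice are standard overlap measures between a predicted and a ground-truth binary mask (reported above on a 0 to 100 scale). A minimal sketch for flattened 0/1 masks; how the paper averages scores over images is not stated, so that step is left out, and scoring two empty masks as 1.0 is an assumed convention.

```python
def iou_and_dice(pred, truth):
    """IoU = |P ∩ T| / |P ∪ T| and Dice = 2|P ∩ T| / (|P| + |T|) for flat 0/1 masks."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    total = sum(pred) + sum(truth)
    iou = inter / union if union else 1.0   # both masks empty: perfect agreement
    dice = 2 * inter / total if total else 1.0
    return iou, dice

pred = [1, 1, 1, 0, 0]   # flattened predicted mask
truth = [0, 1, 1, 1, 0]  # flattened ground-truth mask
iou, dice = iou_and_dice(pred, truth)
```

For a single mask pair Dice = 2·IoU / (1 + IoU), but this relation need not hold exactly for scores averaged over many images.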

Citations: 0

A confidence-based knowledge integration framework for cross-domain table question answering

IF 7.2 CAS Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Knowledge-Based Systems

Pub Date : 2024-11-15 DOI: 10.1016/j.knosys.2024.112718

Yuankai Fan, Tonghui Ren, Can Huang, Beini Zheng, Yinan Jing, Zhenying He, Jinbao Li, Jianxin Li

Recent advances in TableQA leverage sequence-to-sequence (Seq2seq) deep learning models to answer natural language (NL) questions accurately. These models do so by translating the questions into SQL queries over information drawn from one or more tables. However, when a Seq2seq model spreads probability mass across several candidate outputs at a decoding step, its predictions become uncertain (low-confidence), which frequently yields translation errors. To tackle this problem, we present Ckif, a confidence-based knowledge integration framework that uses a two-stage deep-learning-based ranking technique to mitigate the low-confidence problem commonly associated with Seq2seq models for TableQA. The core idea of Ckif is a flexible framework that integrates seamlessly with any existing Seq2seq translation model to enhance its performance. Specifically, by inspecting the probability values at each decoding step, Ckif first masks out each low-confidence prediction in the output of an underlying Seq2seq model. Ckif then integrates prior knowledge of the query language to generalize the masked-out queries, enabling the generation of all possible queries and their corresponding NL expressions. Finally, a two-stage deep-learning ranking approach evaluates the semantic similarity of each NL expression to the given NL question, thereby determining the best-matching result. Ckif is evaluated extensively by applying it to five state-of-the-art Seq2seq models on a widely used public benchmark. The experimental results indicate that Ckif consistently enhances the performance of all the Seq2seq models, demonstrating its effectiveness in supporting TableQA.
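The first two steps (masking low-confidence decoding steps, then expanding the masked slots into candidate queries) can be illustrated roughly as follows. This is a sketch under stated assumptions, not Ckif itself: it assumes a fixed probability threshold, and it fills masked slots with the decoder's top-k alternatives as a simple stand-in for Ckif's use of query-language knowledge. The names `mask_low_confidence`, `expand_masks`, `threshold`, `k` and the example distributions are hypothetical, and the two-stage NL ranking step is not shown.

```python
from itertools import product
from typing import Dict, List

MASK = "<mask>"


def mask_low_confidence(steps: List[Dict[str, float]], threshold: float = 0.5) -> List[str]:
    """Greedy-decode each step; replace tokens whose top probability is below threshold with MASK."""
    decoded = []
    for dist in steps:
        token, prob = max(dist.items(), key=lambda kv: kv[1])
        decoded.append(token if prob >= threshold else MASK)
    return decoded


def expand_masks(masked: List[str], steps: List[Dict[str, float]], k: int = 2) -> List[str]:
    """Fill every masked slot with the k most probable tokens at that step and
    return all resulting candidate queries (Cartesian product over slots)."""
    options = []
    for token, dist in zip(masked, steps):
        if token == MASK:
            options.append(sorted(dist, key=dist.get, reverse=True)[:k])
        else:
            options.append([token])
    return [" ".join(combo) for combo in product(*options)]


# Hypothetical per-step output distributions from a Seq2seq decoder.
steps = [
    {"SELECT": 0.98, "FROM": 0.02},
    {"name": 0.45, "age": 0.40, "city": 0.15},  # uncertain step
    {"FROM": 0.97, "WHERE": 0.03},
    {"users": 0.90, "orders": 0.10},
]
masked = mask_low_confidence(steps)
print(masked)                       # ['SELECT', '<mask>', 'FROM', 'users']
print(expand_masks(masked, steps))  # ['SELECT name FROM users', 'SELECT age FROM users']
```

In this sketch the number of candidates grows as k^m for m masked slots; in Ckif, the candidates and their NL expressions would then be passed to the ranking stage to pick the best match for the question.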

Citations: 0

Enhancing accuracy of compressed Convolutional Neural Networks through a transfer teacher and reinforcement guided training curriculum

IF 7.2 CAS Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Knowledge-Based Systems

Pub Date : 2024-11-15 DOI: 10.1016/j.knosys.2024.112719

Anusha Jayasimhan, Pabitha P.

Model compression techniques, such as network pruning, quantization and knowledge distillation, are essential for deploying large Convolutional Neural Networks (CNNs) on resource-constrained devices. Nevertheless, these techniques frequently cause a loss of accuracy, which degrades performance in applications where precision is crucial. To mitigate this loss, a novel method that integrates Curriculum Learning (CL) with model compression is proposed. Curriculum learning is a machine learning training approach in which a model is trained progressively on increasingly difficult samples. Existing CL approaches rely primarily on manually designed schemes for scoring sample difficulty and for pacing the progression from easy to difficult examples during training. This leads to limitations such as inflexibility, a need for expert domain knowledge, and degraded performance. We therefore propose TRACE-CNN (Transfer-teacher and Reinforcement-guided Adaptive Curriculum for Enhancing Convolutional Neural Networks), a novel curriculum learning approach that addresses these limitations. Our semi-automated CL method uses a pre-trained transfer teacher model whose performance serves as a measure of the difficulty of each training example. Furthermore, instead of a fixed scheduler, we employ a reinforcement learning technique to schedule training according to sample difficulty. Experiments on two benchmark datasets demonstrate that our method, when integrated into a model compression pipeline, effectively reduces the accuracy loss usually associated with such compression techniques.
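The abstract does not detail TRACE-CNN's internals, so the following is only a sketch of one common way to realise its two ideas: scoring difficulty with a pre-trained teacher, and letting a learned scheduler decide which difficulty bucket to train on next. The difficulty definition (one minus the teacher's probability for the true label), the bucketing, the reward, and all names (`difficulty_scores`, `make_buckets`, `EpsilonGreedyScheduler`) are assumptions; the epsilon-greedy bandit is a deliberately simple stand-in for the paper's unspecified reinforcement learning technique.

```python
import random
from typing import List, Sequence


def difficulty_scores(teacher_probs: Sequence[Sequence[float]], labels: Sequence[int]) -> List[float]:
    """Difficulty of each example = 1 - probability the pre-trained teacher gives the true label."""
    return [1.0 - probs[y] for probs, y in zip(teacher_probs, labels)]


def make_buckets(scores: Sequence[float], n_buckets: int) -> List[List[int]]:
    """Sort example indices from easiest to hardest and split them into at most n_buckets groups."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    size = max(1, -(-len(order) // n_buckets))  # ceiling division
    return [order[i:i + size] for i in range(0, len(order), size)]


class EpsilonGreedyScheduler:
    """Bandit-style scheduler: each arm is a difficulty bucket, and the reward
    (e.g. the student's validation-accuracy gain) is supplied by the caller."""

    def __init__(self, n_arms: int, epsilon: float = 0.1, seed: int = 0):
        self.values = [0.0] * n_arms
        self.counts = [0] * n_arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def select(self) -> int:
        # Explore a random bucket with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def update(self, arm: int, reward: float) -> None:
        # Incremental mean of the rewards observed for this arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


teacher_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
labels = [0, 0, 1, 1]
buckets = make_buckets(difficulty_scores(teacher_probs, labels), n_buckets=2)
print(buckets)  # [[0, 3], [2, 1]]

scheduler = EpsilonGreedyScheduler(n_arms=len(buckets), epsilon=0.0)
scheduler.update(0, reward=0.2)  # hypothetical accuracy gains
scheduler.update(1, reward=0.5)
print(scheduler.select())  # 1
```

In a real training loop, the compressed student would be trained on the selected bucket for a step or epoch and the observed improvement fed back through `update`; this replaces a hand-tuned easy-to-hard pacing schedule with one that adapts to feedback.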

Citations: 0