Data & Knowledge Engineering: latest articles, page 8

Artificial intelligence in digital twins—A systematic literature review

IF 2.5 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering

Pub Date : 2024-04-03 DOI: 10.1016/j.datak.2024.102304

Tim Kreuzer, Panagiotis Papapetrou, Jelena Zdravkovic

Artificial intelligence and digital twins have become more popular in recent years and have seen usage across different application domains for various scenarios. This study reviews the literature at the intersection of the two fields, where digital twins integrate an artificial intelligence component. We follow a systematic literature review approach, analyzing a total of 149 related studies. In the assessed literature, a variety of problems are approached with an artificial intelligence-integrated digital twin, demonstrating its applicability across different fields. Our findings indicate that there is a lack of in-depth modeling approaches regarding the digital twin, while many articles focus on the implementation and testing of the artificial intelligence component. The majority of publications do not demonstrate a virtual-to-physical connection between the digital twin and the real-world system. Further, only a small portion of studies base their digital twin on real-time data from a physical system, implementing a physical-to-virtual connection.


Volume 151, Article 102304

Citations: 0

Leveraging an Isolation Forest to Anomaly Detection and Data Clustering

IF 2.5 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering

Pub Date : 2024-03-28 DOI: 10.1016/j.datak.2024.102302

Véronne Yepmo, Grégory Smits, Marie-Jeanne Lesot, Olivier Pivert

Understanding why some points in a data set are considered anomalies cannot be done without taking into account the structure of the regular points. Whereas many machine learning methods are dedicated either to the identification of anomalies or to the identification of the data's inner structure, a solution is introduced to address these two tasks using the same data model, a variant of an isolation forest. The initial algorithm to construct an isolation forest is revisited to preserve the data's inner structure without affecting the efficiency of the outlier detection. Experiments conducted on both synthetic and real-world data sets show that, in addition to improving the detection of abnormal data points, the proposed variant of isolation forest allows for a reconstruction of the subspaces of high density. Therefore, the former can serve as a basis for a unified approach to detect global and local anomalies, which is a necessary condition to then provide users with informative descriptions of the data.
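
For readers unfamiliar with the underlying technique, the following is a minimal sketch of the standard isolation forest idea, not the variant proposed in the paper: points are partitioned by random axis-aligned splits, and a point that is isolated after few splits is considered more anomalous. The function names, the toy data, and the parameter values are assumptions made for illustration, and the sketch omits the subsampling and the leaf-size normalization used by the original algorithm.

```python
import random

def build_tree(points, depth, max_depth, rng):
    """Recursively partition points with random axis-aligned splits."""
    if len(points) <= 1 or depth >= max_depth:
        return None  # leaf: nothing left to separate, or depth limit reached
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return None  # all values identical on this dimension
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return (dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))

def path_length(tree, point):
    """Number of splits traversed before the point reaches a leaf."""
    depth = 0
    while tree is not None:
        dim, split, left, right = tree
        tree = left if point[dim] < split else right
        depth += 1
    return depth

def build_forest(points, n_trees=50, max_depth=8, seed=0):
    """Build several random trees on the same data (no subsampling here)."""
    rng = random.Random(seed)
    return [build_tree(points, 0, max_depth, rng) for _ in range(n_trees)]

def mean_path_length(forest, point):
    """Average path length over all trees; shorter means more anomalous."""
    return sum(path_length(tree, point) for tree in forest) / len(forest)

# Hypothetical toy data: a dense cluster in the unit square plus one far point.
data_rng = random.Random(42)
cluster = [(data_rng.random(), data_rng.random()) for _ in range(64)]
outlier = (10.0, 10.0)
forest = build_forest(cluster + [outlier])
print(mean_path_length(forest, outlier), mean_path_length(forest, cluster[0]))
```

The far point should be separated from the cluster after very few splits, which is why its mean path length is expected to be shorter than that of a point inside the cluster.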


Citations: 0

The unresolved need for dependable guarantees on security, sovereignty, and trust in data ecosystems

IF 2.5 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering

Pub Date : 2024-03-19 DOI: 10.1016/j.datak.2024.102301

Johannes Lohmöller, Jan Pennekamp, Roman Matzutt, Carolin Victoria Schneider, Eduard Vlad, Christian Trautwein, Klaus Wehrle

Data ecosystems emerged as a new paradigm to facilitate the automated and massive exchange of data from heterogeneous information sources between different stakeholders. However, the corresponding benefits come with unforeseen risks as sensitive information is potentially exposed, questioning data ecosystem reliability. Consequently, data security is of utmost importance and, thus, a central requirement for successfully realizing data ecosystems. Academia has recognized this requirement, and current initiatives foster sovereign participation via a federated infrastructure where participants retain local control over what data they offer to whom. However, recent proposals place significant trust in remote infrastructure by implementing organizational security measures such as certification processes before the admission of a participant. At the same time, the data sensitivity incentivizes participants to bypass the organizational security measures to maximize their benefit. This issue significantly weakens security, sovereignty, and trust guarantees and highlights that organizational security measures are insufficient in this context. In this paper, we argue that data ecosystems must be extended with technical means to (re)establish dependable guarantees. We underpin this need with three representative use cases for data ecosystems, which cover personal, economic, and governmental data, and systematically map the lack of dependable guarantees in related work. To this end, we identify three enablers of dependable guarantees, namely trusted remote policy enforcement, verifiable data tracking, and integration of resource-constrained participants. These enablers are critical for securely implementing data ecosystems in data-sensitive contexts.


Volume 151, Article 102301

Citations: 0

Insights into commonalities of a sample: A visualization framework to explore unusual subset-dataset relationships

IF 2.5 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering

Pub Date : 2024-03-12 DOI: 10.1016/j.datak.2024.102299

Nikolas Stege, Michael H. Breitner

Domain experts are driven by business needs, while data analysts develop and use various algorithms, methods, and tools, but often without domain knowledge. A major challenge for companies and organizations is to integrate data analytics in business processes and workflows. We derive an interactive process and visualization framework to enable value-creating collaboration in inter- and cross-disciplinary teams. Domain experts and data analysts are both empowered to analyze and discuss results and come to well-founded insights and implications. Inspired by a typical auditing problem, we develop and apply a visualization framework to single out unusual data in general subsets for potential further investigation. Our framework is applicable to unusual data detected either manually by domain experts or by algorithms applied by data analysts. Application examples show typical interaction, collaboration, visualization, and decision support.


Volume 151, Article 102299

Citations: 0

Time-aware structure matching for temporal knowledge graph alignment

IF 2.5 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering

Pub Date : 2024-03-11 DOI: 10.1016/j.datak.2024.102300

Wei Jia, Ruizhe Ma, Li Yan, Weinan Niu, Zongmin Ma

Entity alignment, aiming at identifying equivalent entity pairs across multiple knowledge graphs (KGs), serves as a vital step for knowledge fusion. As the majority of KGs undergo continuous evolution, existing solutions utilize graph neural networks (GNNs) to tackle entity alignment within temporal knowledge graphs (TKGs). However, this prevailing method often overlooks the consequential impact of relation embedding generation on entity embeddings through inherent structures. In this paper, we propose a novel model named Time-aware Structure Matching based on GNNs (TSM-GNN) that encompasses the learning of both topological and inherent structures. Our key innovation lies in a unique method for generating relation embeddings, which can enhance entity embeddings via inherent structure. Specifically, we utilize the translation property of knowledge graphs to obtain the entity embedding that is mapped into a time-aware vector space. Subsequently, we employ GNNs to learn global entity representation. To better capture the useful information from neighboring relations and entities, we introduce a time-aware attention mechanism that assigns different importance weights to different time-aware inherent structures. Experimental results on three real-world datasets demonstrate that TSM-GNN outperforms several state-of-the-art approaches for entity alignment between TKGs.
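
The translation property mentioned in this abstract is commonly stated as head + relation ≈ tail in the embedding space, as in TransE-style models. The sketch below illustrates only that general idea; the optional time offset vector, the function name, and the toy vectors are assumptions for illustration and should not be read as the TSM-GNN formulation.

```python
import math

def translation_distance(head, relation, tail, time=None):
    """L2 norm of head + relation (+ time) - tail; smaller means more plausible."""
    if time is None:
        time = [0.0] * len(head)  # no temporal shift by default
    return math.sqrt(sum((h + r + s - t) ** 2
                         for h, r, s, t in zip(head, relation, time, tail)))

# Hypothetical 2-dimensional embeddings.
head = [1.0, 0.0]
relation = [0.0, 1.0]
good_tail = [1.0, 1.0]   # exactly head + relation
bad_tail = [0.0, 0.0]

print(translation_distance(head, relation, good_tail))
print(translation_distance(head, relation, bad_tail))
```

A triple whose tail embedding matches head + relation gets distance zero, while a mismatched tail gets a larger distance; a time vector simply shifts the expected tail position.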


Volume 151, Article 102300

Citations: 0

A knowledge-sharing platform for space resources

IF 2.5 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering

Pub Date : 2024-02-29 DOI: 10.1016/j.datak.2024.102286

Marcos Da Silveira, Louis Deladiennee, Emmanuel Scolan, Cedric Pruski

The ever-increasing interest of academia, industry, and government institutions in space resource information highlights the difficulty of finding, accessing, integrating, and reusing this information. Although information is regularly published on the internet, it is disseminated on many different websites and in different formats, including scientific publications, patents, news, and reports. We are currently developing a knowledge management and sharing platform for space resources. This tool, which relies on the combined use of knowledge graphs and ontologies, formalises the domain knowledge contained in the above-mentioned documents and makes it more readily available to the community. In this article, we describe the concepts and techniques of knowledge extraction and management adopted during the design and implementation of the platform.


Citations: 0

Knowledge graph-based image classification

IF 2.5 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering

Pub Date : 2024-02-28 DOI: 10.1016/j.datak.2024.102285

Franck Anaël Mbiaya, Christel Vrain, Frédéric Ros, Thi-Bich-Hanh Dao, Yves Lucas

This paper introduces a deep learning method for image classification that leverages knowledge formalized as a graph created from information represented by attribute/value pairs. The proposed method investigates a loss function that adaptively combines the classical cross-entropy commonly used in deep learning with a novel penalty function. The novel loss function is derived from the representation of nodes after embedding the knowledge graph and incorporates the proximity between class and image nodes. Its formulation enables the model to focus on identifying the boundary between the classes that are most challenging to distinguish. Experimental results on several image databases demonstrate improved performance compared to state-of-the-art methods, including classical deep learning algorithms and recent algorithms that incorporate knowledge represented by a graph.


Citations: 0

Improving the identification of relevant variants in genome information systems: A methodological approach with a case study on early onset Alzheimer's disease

IF 2.5 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering

Pub Date : 2024-02-09 DOI: 10.1016/j.datak.2024.102284

Mireia Costa, Ana León, Óscar Pastor

Alzheimer's disease is the most common type of dementia in the elderly. Nevertheless, there is an early onset form that is difficult to diagnose precisely. As the genetic component is the most critical factor in developing this disease, identifying relevant genetic variants is key to obtaining a more reliable and straightforward diagnosis. The information about these variants is stored in a large number of data sources, which must be carefully analyzed to select only the information with sufficient quality to be used in a clinical setting. This selection has become complex due to the increasing amount of available genomic information. The SILE method was designed to systematize the identification of relevant variants for a disease in this challenging context. However, several problems in how SILE identifies relevant variants were discovered when applying the method to the early onset form of Alzheimer's disease. More specifically, the method failed to address specific features of this disease such as its low incidence and familial component. This paper proposes an improvement of the identification process defined by the SILE method to make it applicable to a broader spectrum of diseases. Details of how the proposed solution has been applied are also reported. As a result of this improvement, a set of 29 variants has been identified (25 variants Accepted with Limited Evidence and 4 Accepted with Moderate Evidence). This constitutes a valuable result that facilitates and reinforces the genetic diagnosis of the disease.


Volume 151, Article 102284

Citations: 0

Fuzzy-Ontology based knowledge driven disease risk level prediction with optimization assisted ensemble classifier

IF 2.5 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering

Pub Date : 2024-02-04 DOI: 10.1016/j.datak.2024.102278

Huma Parveen, Syed Wajahat Abbas Rizvi, Raja Sarath Kumar Boddu

Modern medical analysis is a complex procedure, requiring precise patient data, scientific knowledge obtained over numerous years, and a theoretical understanding of related medical literature. To improve accuracy and reduce the time needed for diagnosis, clinical decision support systems (DSS) were introduced, which incorporate data mining schemes to enhance the accuracy of disease diagnosis. This work proposes a new disease-predicting model that involves 3 stages. Initially, “improved stemming and tokenization” are carried out in the pre-processing stage. Then, the “Fuzzy ontology, improved mutual information (MI), and correlation features” are extracted. Then, prediction is carried out via ensemble classifiers that include “improved Fuzzy logic, Long Short Term Memory (LSTM), Deep Convolution Neural Network (DCNN), and Bidirectional Gated Recurrent Unit (Bi-GRU)”. The outcomes from improved fuzzy logic, LSTM, and DCNN are further classified via Bi-GRU, which produces the final results. Specifically, Bi-GRU weights are optimally tuned using Deer Hunting Update Explored Arithmetic Optimization (DHUEAO). Finally, the efficiency of the proposed work is evaluated with respect to a variety of metrics.


Volume 151, Article 102278

Citations: 0

Fusion learning of preference and bias from ratings and reviews for item recommendation

IF 2.5 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering

Pub Date : 2024-02-03 DOI: 10.1016/j.datak.2024.102283

Junrui Liu, Tong Li, Zhen Yang, Di Wu, Huan Liu

Recommendation methods improve rating prediction performance by learning the selection bias phenomenon: users tend to rate items they like. These methods model selection bias by calculating the propensities of ratings, but inaccurate propensities could introduce more noise, fail to model selection bias, and reduce prediction performance. We argue that learning interaction features can effectively model selection bias and improve model performance, as interaction features explain the reason for the trend. Reviews can be used to model interaction features because they have a strong intrinsic correlation with user interests and item interactions. In this study, we propose a preference- and bias-oriented fusion learning model (PBFL) that models the interaction features based on reviews and user preferences to make rating predictions. Our proposal both embeds traditional user preferences in reviews, interactions, and ratings and considers word distribution bias and review quoting to model interaction features. Six real-world datasets are used to demonstrate effectiveness and performance. PBFL achieves an average improvement of 4.46% in root-mean-square error (RMSE) and 3.86% in mean absolute error (MAE) over the best baseline.
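
The two error metrics named in this abstract have standard definitions, sketched below together with one common way of expressing a relative improvement over a baseline. The function names and the sample ratings are assumptions for illustration, and whether the paper computes its reported percentages in exactly this way is not stated in the abstract.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual ratings."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)

def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    n = len(actual)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / n

def relative_improvement(baseline_error, model_error):
    """Percentage by which the model's error is lower than the baseline's."""
    return 100.0 * (baseline_error - model_error) / baseline_error

# Hypothetical ratings on a 1-5 scale.
actual = [1.0, 5.0]
predicted = [3.0, 5.0]
print(rmse(predicted, actual), mae(predicted, actual))
```

Because RMSE squares each error before averaging, it penalizes large individual errors more heavily than MAE does, which is why the two are usually reported together.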


Volume 150, Article 102283

Citations: 0