Data & Knowledge Engineering最新文献_第3页

Developing A Decision Support System for Healthcare Practices: A Design Science Research Approach 为医疗实践开发决策支持系统：设计科学研究方法

IF 2.7 3区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering

Pub Date : 2024-07-17 DOI: 10.1016/j.datak.2024.102344

Arun Sen , Atish P. Sinha , Cong Zhang

We propose a new approach for designing a decision support system (DSS) for the transformation of healthcare practices. Practice transformation helps practices transition from their current state to patient-centered medical home (PCMH) model of care. Our approach employs activity theory to derive the elements of practice transformation by designing and integrating two ontologies: a domain ontology and a task ontology. By incorporating both goal-oriented and task-oriented aspects of the practice transformation process and specifying how they interact, our integrated design model for the DSS provides prescriptive knowledge on assessing the current status of a practice with respect to PCMH recognition and navigating efficiently through a complex solution space. This knowledge, which is at a moderate level of abstraction and expressed in a language that practitioners understand, contributes to the literature by providing a formulation for a nascent design theory. We implement the integrated design model as a DSS prototype; results of validation tests conducted on the prototype indicate that it is superior to the existing PCMH readiness tracking tool with respect to effectiveness, usability, efficiency, and sustainability.

我们提出了一种设计决策支持系统（DSS）的新方法，用于医疗实践的转型。实践转型有助于医疗实践从当前状态过渡到以患者为中心的医疗之家（PCMH）护理模式。我们的方法采用活动理论，通过设计和整合两个本体：领域本体和任务本体，推导出实践转型的要素。通过整合实践转型过程中的目标导向和任务导向两个方面，并明确它们之间的互动方式，我们的 DSS 集成设计模型提供了有关评估实践在 PCMH 识别方面的现状以及在复杂的解决方案空间中有效导航的规范性知识。这些知识的抽象程度适中，并以从业人员能够理解的语言表达，为新生的设计理论提供了一种表述方式，从而为文献做出了贡献。我们将综合设计模型作为一个 DSS 原型来实施；对该原型进行的验证测试结果表明，它在有效性、可用性、效率和可持续性方面都优于现有的 PCMH 准备情况跟踪工具。

{"title":"Developing A Decision Support System for Healthcare Practices: A Design Science Research Approach","authors":"Arun Sen , Atish P. Sinha , Cong Zhang","doi":"10.1016/j.datak.2024.102344","DOIUrl":"10.1016/j.datak.2024.102344","url":null,"abstract":"<div><p>We propose a new approach for designing a decision support system (DSS) for the transformation of healthcare practices. Practice transformation helps practices transition from their current state to patient-centered medical home (PCMH) model of care. Our approach employs activity theory to derive the elements of practice transformation by designing and integrating two ontologies: a domain ontology and a task ontology. By incorporating both goal-oriented and task-oriented aspects of the practice transformation process and specifying how they interact, our integrated design model for the DSS provides prescriptive knowledge on assessing the current status of a practice with respect to PCMH recognition and navigating efficiently through a complex solution space. This knowledge, which is at a moderate level of abstraction and expressed in a language that practitioners understand, contributes to the literature by providing a formulation for a nascent design theory. We implement the integrated design model as a DSS prototype; results of validation tests conducted on the prototype indicate that it is superior to the existing PCMH readiness tracking tool with respect to effectiveness, usability, efficiency, and sustainability.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"154 ","pages":"Article 102344"},"PeriodicalIF":2.7,"publicationDate":"2024-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141840536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Increasing the precision of public transit user activity location detection from smart card data analysis via spatial–temporal DBSCAN 通过时空 DBSCAN 提高智能卡数据分析中公交用户活动位置检测的精度

IF 2.7 3区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering

Pub Date : 2024-07-15 DOI: 10.1016/j.datak.2024.102343

Fehmi Can Ozer , Hediye Tuydes-Yaman , Gulcin Dalkic-Melek

Smart Card (SC) systems have been increasingly adopted by public transit (PT) agencies all over the world, which facilitates not only fare collection but also PT service analyses and evaluations. Spatial clustering is one of the most important methods to investigate this big data in terms of activity locations, travel patterns, user behaviours, etc. Besides spatio-temporal analysis of the clusters provide further precision for detection of PT traveller activity locations and durations. This study focuses on investigation and comparison of the effectiveness of two density-based clustering algorithms, DBSCAN, and ST-DBSCAN. The numeric results are obtained using SC data (public bus system) from the metropolitan city of Konya, Turkey, and clustering algorithms are applied to a sample of this smart card data, and activity clusters are detected for the users. The results of the study suggested that ST-DBSCAN constitutes more compact clusters in both time and space for transportation researchers who want to accurately detect passengers’ individual activity regions using SC data.

世界各地的公共交通（PT）机构越来越多地采用智能卡（SC）系统，这不仅方便了收费，也方便了对公共交通服务的分析和评估。空间聚类是研究活动地点、出行模式、用户行为等大数据的最重要方法之一。此外，对聚类的时空分析还能进一步精确检测公共交通乘客的活动地点和持续时间。本研究重点调查和比较了两种基于密度的聚类算法--DBSCAN 和 ST-DBSCAN--的有效性。研究使用土耳其科尼亚市的 SC 数据（公共汽车系统）得出了数值结果，并将聚类算法应用于该智能卡数据样本，检测出用户的活动聚类。研究结果表明，ST-DBSCAN 在时间和空间上都能构成更紧凑的聚类，适用于希望利用 SC 数据准确检测乘客个人活动区域的交通研究人员。

{"title":"Increasing the precision of public transit user activity location detection from smart card data analysis via spatial–temporal DBSCAN","authors":"Fehmi Can Ozer , Hediye Tuydes-Yaman , Gulcin Dalkic-Melek","doi":"10.1016/j.datak.2024.102343","DOIUrl":"10.1016/j.datak.2024.102343","url":null,"abstract":"<div><p>Smart Card (SC) systems have been increasingly adopted by public transit (PT) agencies all over the world, which facilitates not only fare collection but also PT service analyses and evaluations. Spatial clustering is one of the most important methods to investigate this big data in terms of activity locations, travel patterns, user behaviours, etc. Besides spatio-temporal analysis of the clusters provide further precision for detection of PT traveller activity locations and durations. This study focuses on investigation and comparison of the effectiveness of two density-based clustering algorithms, DBSCAN, and ST-DBSCAN. The numeric results are obtained using SC data (public bus system) from the metropolitan city of Konya, Turkey, and clustering algorithms are applied to a sample of this smart card data, and activity clusters are detected for the users. The results of the study suggested that ST-DBSCAN constitutes more compact clusters in both time and space for transportation researchers who want to accurately detect passengers’ individual activity regions using SC data.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"153 ","pages":"Article 102343"},"PeriodicalIF":2.7,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141716049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Evaluating quality of ontology-driven conceptual models abstractions 评估本体驱动概念模型抽象的质量

IF 2.7 3区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering

Pub Date : 2024-07-14 DOI: 10.1016/j.datak.2024.102342

Elena Romanenko , Diego Calvanese , Giancarlo Guizzardi

The complexity of an (ontology-driven) conceptual model highly correlates with the complexity of the domain and software for which it is designed. With that in mind, an algorithm for producing ontology-driven conceptual model abstractions was previously proposed. In this paper, we empirically evaluate the quality of the abstractions produced by it. First, we have implemented and tested the last version of the algorithm over a FAIR catalog of models represented in the ontology-driven conceptual modeling language OntoUML. Second, we performed three user studies to evaluate the usefulness of the resulting abstractions as perceived by modelers. This paper reports on the findings of these experiments and reflects on how they can be exploited to improve the existing algorithm.

本体驱动）概念模型的复杂性与设计该模型的领域和软件的复杂性密切相关。有鉴于此，我们之前提出了一种生成本体驱动概念模型抽象的算法。在本文中，我们对该算法生成的抽象的质量进行了实证评估。首先，我们在用本体驱动的概念模型语言 OntoUML 表示的模型 FAIR 目录上实现并测试了该算法的最后一个版本。其次，我们进行了三项用户研究，以评估建模者所感知的抽象结果的有用性。本文报告了这些实验的结果，并对如何利用这些结果改进现有算法进行了思考。

引用次数: 0

An interactive approach to semantic enrichment with geospatial data 利用地理空间数据丰富语义的互动方法

IF 2.7 3区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering

Pub Date : 2024-07-04 DOI: 10.1016/j.datak.2024.102341

Flavio De Paoli , Michele Ciavotta , Roberto Avogadro , Emil Hristov , Milena Borukova , Dessislava Petrova-Antonova , Iva Krasteva

The ubiquitous availability of datasets has spurred the utilization of Artificial Intelligence methods and models to extract valuable insights, unearth hidden patterns, and predict future trends. However, the current process of data collection and linking heavily relies on expert knowledge and domain-specific understanding, which engenders substantial costs in terms of both time and financial resources. Therefore, streamlining the data acquisition, harmonization, and enrichment procedures to deliver high-fidelity datasets readily usable for analytics is paramount. This paper explores the capabilities of SemTUI, a comprehensive framework designed to support the enrichment of tabular data by leveraging semantics and user interaction. Utilizing SemTUI, an iterative and interactive approach is proposed to enhance the flexibility, usability and efficiency of geospatial data enrichment. The approach is evaluated through a pilot case study focused on urban planning, with a particular emphasis on geocoding. Using a real-world scenario involving the analysis of kindergarten accessibility within walking distance, the study demonstrates the proficiency of SemTUI in generating precise and semantically enriched location data. The incorporation of human feedback in the enrichment process successfully enhances the quality of the resulting dataset, highlighting SemTUI’s potential for broader applications in geospatial analysis and its usability for users with limited expertise in manipulating geospatial data.

无处不在的数据集促使人们利用人工智能方法和模型来提取有价值的见解、发掘隐藏的模式并预测未来趋势。然而，目前的数据收集和链接过程严重依赖于专家知识和对特定领域的理解，这在时间和财政资源方面都产生了巨大的成本。因此，最重要的是简化数据采集、协调和丰富程序，以提供可随时用于分析的高保真数据集。本文探讨了 SemTUI 的功能，这是一个综合框架，旨在通过利用语义和用户交互来支持表格数据的丰富。利用 SemTUI，提出了一种迭代和交互式方法，以提高地理空间数据丰富的灵活性、可用性和效率。该方法通过一项以城市规划为重点的试点案例研究进行了评估，特别强调了地理编码。该研究使用了一个涉及分析步行距离内幼儿园可达性的真实场景，展示了 SemTUI 在生成精确且语义丰富的位置数据方面的能力。在丰富过程中加入人工反馈，成功地提高了所生成数据集的质量，凸显了 SemTUI 在地理空间分析领域更广泛应用的潜力，以及它对于在地理空间数据操作方面专业知识有限的用户的可用性。

{"title":"An interactive approach to semantic enrichment with geospatial data","authors":"Flavio De Paoli , Michele Ciavotta , Roberto Avogadro , Emil Hristov , Milena Borukova , Dessislava Petrova-Antonova , Iva Krasteva","doi":"10.1016/j.datak.2024.102341","DOIUrl":"10.1016/j.datak.2024.102341","url":null,"abstract":"<div><p>The ubiquitous availability of datasets has spurred the utilization of Artificial Intelligence methods and models to extract valuable insights, unearth hidden patterns, and predict future trends. However, the current process of data collection and linking heavily relies on expert knowledge and domain-specific understanding, which engenders substantial costs in terms of both time and financial resources. Therefore, streamlining the data acquisition, harmonization, and enrichment procedures to deliver high-fidelity datasets readily usable for analytics is paramount. This paper explores the capabilities of <em>SemTUI</em>, a comprehensive framework designed to support the enrichment of tabular data by leveraging semantics and user interaction. Utilizing SemTUI, an iterative and interactive approach is proposed to enhance the flexibility, usability and efficiency of geospatial data enrichment. The approach is evaluated through a pilot case study focused on urban planning, with a particular emphasis on geocoding. Using a real-world scenario involving the analysis of kindergarten accessibility within walking distance, the study demonstrates the proficiency of SemTUI in generating precise and semantically enriched location data. The incorporation of human feedback in the enrichment process successfully enhances the quality of the resulting dataset, highlighting SemTUI’s potential for broader applications in geospatial analysis and its usability for users with limited expertise in manipulating geospatial data.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"153 ","pages":"Article 102341"},"PeriodicalIF":2.7,"publicationDate":"2024-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0169023X2400065X/pdfft?md5=969535621599adcaa2ec5e5d12e392b3&pid=1-s2.0-S0169023X2400065X-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141698385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Explanation, semantics, and ontology 解释、语义和本体论

IF 2.7 3区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering

Pub Date : 2024-06-25 DOI: 10.1016/j.datak.2024.102325

Giancarlo Guizzardi , Nicola Guarino

The terms ‘semantics’ and ‘ontology’ are increasingly appearing together with ‘explanation’, not only in the scientific literature, but also in everyday social interactions, in particular, within organizations. Ontologies have been shown to play a key role in supporting the semantic interoperability of data and knowledge representation structures used by information systems. With the proliferation of applications of Artificial Intelligence (AI) in different settings and the increasing need to guarantee their explainability (but also their interoperability) in critical contexts, the term ‘explanation’ has also become part of the scientific and technical jargon of modern information systems engineering. However, all of these terms are also significantly overloaded. In this paper, we address several interpretations of these notions, with an emphasis on their strong connection. Specifically, we discuss a notion of explanation termed ontological unpacking, which aims at explaining symbolic domain descriptions (e.g., conceptual models, knowledge graphs, logical specifications) by revealing their ontological commitment in terms of their so-called truthmakers, i.e., the entities in one’s ontology that are responsible for the truth of a description. To illustrate this methodology, we employ an ontological theory of relations to explain a symbolic model encoded in the de facto standard modeling language UML. We also discuss the essential role played by ontology-driven conceptual models (resulting from this form of explanation processes) in supporting semantic interoperability tasks. Furthermore, we revisit a proposal for quality criteria for explanations from philosophy of science to assess our approach. Finally, we discuss the relation between ontological unpacking and other forms of explanation in philosophy and science, as well as in the subarea of Artificial Intelligence known as Explainable AI (XAI).

语义 "和 "本体 "这两个术语越来越多地与 "解释 "放在一起，不仅出现在科学文献中，也出现在日常社会交往中，特别是在组织内部。事实证明，本体在支持信息系统所使用的数据和知识表示结构的语义互操作性方面发挥着关键作用。随着人工智能（AI）在不同环境中的广泛应用，以及在关键环境中保证其可解释性（以及互操作性）的需求日益增加，"解释 "一词也已成为现代信息系统工程科学和技术术语的一部分。然而，所有这些术语也都严重超载。在本文中，我们将讨论对这些概念的几种解释，并强调它们之间的紧密联系。具体来说，我们讨论了一种解释概念，称为"......"，其目的是通过揭示所谓的"......"（即本体中对描述的真实性负责的实体）来解释符号领域描述（如概念模型、知识图谱、逻辑规范）。为了说明这种方法，我们采用本体论关系理论来解释用标准建模语言 UML 编码的符号模型。我们还讨论了本体驱动的概念模型（由这种形式的解释过程产生）在支持语义互操作性任务中发挥的重要作用。此外，我们重温了科学哲学中关于解释质量标准的建议，以评估我们的方法。最后，我们讨论了本体论解包与哲学和科学中的其他解释形式之间的关系，以及在人工智能子领域 "可解释人工智能"（XAI）中的关系。

{"title":"Explanation, semantics, and ontology","authors":"Giancarlo Guizzardi , Nicola Guarino","doi":"10.1016/j.datak.2024.102325","DOIUrl":"10.1016/j.datak.2024.102325","url":null,"abstract":"<div><p>The terms ‘semantics’ and ‘ontology’ are increasingly appearing together with ‘explanation’, not only in the scientific literature, but also in everyday social interactions, in particular, within organizations. Ontologies have been shown to play a key role in supporting the semantic interoperability of data and knowledge representation structures used by information systems. With the proliferation of applications of Artificial Intelligence (AI) in different settings and the increasing need to guarantee their explainability (but also their interoperability) in critical contexts, the term ‘explanation’ has also become part of the scientific and technical jargon of modern information systems engineering. However, all of these terms are also significantly overloaded. In this paper, we address several interpretations of these notions, with an emphasis on their strong connection. Specifically, we discuss a notion of explanation termed <em>ontological unpacking</em>, which aims at explaining symbolic domain descriptions (e.g., conceptual models, knowledge graphs, logical specifications) by revealing their <em>ontological commitment</em> in terms of their so-called <em>truthmakers</em>, i.e., the entities in one’s ontology that are responsible for the truth of a description. To illustrate this methodology, we employ an ontological theory of relations to explain a symbolic model encoded in the <em>de facto</em> standard modeling language UML. We also discuss the essential role played by ontology-driven conceptual models (resulting from this form of explanation processes) in supporting semantic interoperability tasks. Furthermore, we revisit a proposal for quality criteria for explanations from philosophy of science to assess our approach. Finally, we discuss the relation between ontological unpacking and other forms of explanation in philosophy and science, as well as in the subarea of Artificial Intelligence known as Explainable AI (XAI).</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"153 ","pages":"Article 102325"},"PeriodicalIF":2.7,"publicationDate":"2024-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0169023X24000491/pdfft?md5=79cddbdaff8702c03d78a624d5f422a3&pid=1-s2.0-S0169023X24000491-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141943335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Topological querying of music scores 乐谱拓扑查询

IF 2.7 3区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering

Pub Date : 2024-06-20 DOI: 10.1016/j.datak.2024.102340

Philippe Rigaux, Virginie Thion

For centuries, sheet music scores have been the traditional way to preserve and disseminate Western music works. Nowadays, their content can be encoded in digital formats, making possible to store music score data in digital score libraries (DSL). To supply intelligent services (extracting and analysing relevant information from data), the new generation of DSL has to rely on digital representations of the score content as structured objects apt at being manipulated by high-level operators. In the present paper, we propose the Muster model, a graph-based data model for representing the music content of a digital score, and we discuss the querying of such data through graph pattern queries. We then present a proof-of-concept of this approach, which allows storing graph-based representations of music scores in the Neo4j database, and performing musical pattern searches through graph pattern queries with the Cypher query language. A benchmark study, using (real) datasets stemming from the Neuma Digital Score Library, complements this implementation.

几个世纪以来，乐谱一直是保存和传播西方音乐作品的传统方式。如今，乐谱的内容可以用数字格式进行编码，从而可以在数字乐谱图书馆（DSL）中存储乐谱数据。为了提供智能服务（从数据中提取和分析相关信息），新一代数字乐谱图书馆必须依靠乐谱内容的数字表示法，将其作为适合高级操作员操作的结构化对象。在本文中，我们提出了 Muster 模型（一种基于图的数据模型，用于表示数字乐谱的音乐内容），并讨论了如何通过图模式查询此类数据。然后，我们介绍了这种方法的概念验证，它允许在 Neo4j 数据库中存储基于图的乐谱表示，并通过 Cypher 查询语言的图模式查询执行音乐模式搜索。利用 Neuma 数字乐谱库中的（真实）数据集进行的基准研究对该实现方法进行了补充。

{"title":"Topological querying of music scores","authors":"Philippe Rigaux, Virginie Thion","doi":"10.1016/j.datak.2024.102340","DOIUrl":"https://doi.org/10.1016/j.datak.2024.102340","url":null,"abstract":"<div><p>For centuries, <em>sheet music scores</em> have been the traditional way to preserve and disseminate Western music works. Nowadays, their content can be encoded in digital formats, making possible to store music score data in digital score libraries (DSL). To supply intelligent services (extracting and analysing relevant information from data), the new generation of DSL has to rely on digital representations of the score content as structured objects apt at being manipulated by high-level operators. In the present paper, we propose the <em>Muster</em> model, a graph-based data model for representing the music content of a digital score, and we discuss the querying of such data through graph pattern queries. We then present a proof-of-concept of this approach, which allows storing graph-based representations of music scores in the Neo4j database, and performing musical pattern searches through graph pattern queries with the Cypher query language. A benchmark study, using (real) datasets stemming from the <span>Neuma</span> Digital Score Library, complements this implementation.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"153 ","pages":"Article 102340"},"PeriodicalIF":2.7,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0169023X24000648/pdfft?md5=422689552133a28488b6610063f13879&pid=1-s2.0-S0169023X24000648-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141484752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Entity type inference based on path walking and inter-types relationships 基于路径行走和类型间关系的实体类型推断

IF 2.7 3区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering

Pub Date : 2024-06-19 DOI: 10.1016/j.datak.2024.102337

Yi Gan , Zhihui Su , Gaoyong Lu , Pengju Zhang , Aixiang Cui , Jiawei Jiang , Duanbing Chen

As a crucial task for knowledge graphs (KGs), knowledge graph entity type inference (KGET) has garnered increasing attention in recent years. However, recent methods overlook the long-distance information pertaining to entities and the inter-types relationships. The neglect of long-distance information results in the omission of crucial entity relationships and neighbors, consequently leading to the loss of path information associated with missing types. To address this, a path-walking strategy is utilized to identify two-hop triplet paths of the crucial entity for encoding long-distance entity information. Moreover, the absence of inter-types relationships can lead to the loss of the neighborhood information of types, such as co-occurrence information. To ensure a comprehensive understanding of inter-types relationships, we consider interactions not only with the types of single entity but also with different types of entities. Finally, in order to comprehensively represent entities for missing types, considering both the dimensions of path information and neighborhood information, we propose an entity type inference model based on path walking and inter-types relationships, denoted as “ET-PT”. This model effectively extracts comprehensive entity information, thereby obtaining the most complete semantic representation of entities. The experimental results on publicly available datasets demonstrate that the proposed method outperforms state-of-the-art approaches.

作为知识图谱（KG）的一项重要任务，知识图谱实体类型推断（KGET）近年来受到越来越多的关注。然而，最近的方法忽略了与实体和类型间关系相关的远距离信息。对远距离信息的忽视会导致关键实体关系和邻接关系的遗漏，从而导致与缺失类型相关的路径信息的丢失。为了解决这个问题，我们采用了路径行走策略来识别关键实体的两跳三重路径，以编码长距离实体信息。此外，类型间关系的缺失会导致类型邻域信息的丢失，如共现信息。为了确保对类型间关系的全面理解，我们不仅要考虑与单一实体类型的交互，还要考虑与不同类型实体的交互。最后，为了全面表示缺失类型的实体，同时考虑路径信息和邻域信息两个维度，我们提出了一种基于路径行走和类型间关系的实体类型推断模型，称为 "ET-PT"。该模型能有效提取全面的实体信息，从而获得最完整的实体语义表征。在公开数据集上的实验结果表明，所提出的方法优于最先进的方法。

{"title":"Entity type inference based on path walking and inter-types relationships","authors":"Yi Gan , Zhihui Su , Gaoyong Lu , Pengju Zhang , Aixiang Cui , Jiawei Jiang , Duanbing Chen","doi":"10.1016/j.datak.2024.102337","DOIUrl":"https://doi.org/10.1016/j.datak.2024.102337","url":null,"abstract":"<div><p>As a crucial task for knowledge graphs (KGs), knowledge graph entity type inference (KGET) has garnered increasing attention in recent years. However, recent methods overlook the long-distance information pertaining to entities and the inter-types relationships. The neglect of long-distance information results in the omission of crucial entity relationships and neighbors, consequently leading to the loss of path information associated with missing types. To address this, a path-walking strategy is utilized to identify two-hop triplet paths of the crucial entity for encoding long-distance entity information. Moreover, the absence of inter-types relationships can lead to the loss of the neighborhood information of types, such as co-occurrence information. To ensure a comprehensive understanding of inter-types relationships, we consider interactions not only with the types of single entity but also with different types of entities. Finally, in order to comprehensively represent entities for missing types, considering both the dimensions of path information and neighborhood information, we propose an entity type inference model based on path walking and inter-types relationships, denoted as “ET-PT”. This model effectively extracts comprehensive entity information, thereby obtaining the most complete semantic representation of entities. The experimental results on publicly available datasets demonstrate that the proposed method outperforms state-of-the-art approaches.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"153 ","pages":"Article 102337"},"PeriodicalIF":2.7,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0169023X24000612/pdfft?md5=3856b1f399f41f93c93401f8aea9503b&pid=1-s2.0-S0169023X24000612-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141484693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

An explainable machine learning approach for automated medical decision support of heart disease 用于心脏病自动医疗决策支持的可解释机器学习方法

IF 2.7 3区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering

Pub Date : 2024-06-19 DOI: 10.1016/j.datak.2024.102339

Francisco Mesquita, Gonçalo Marques

Coronary Heart Disease (CHD) is the dominant cause of mortality around the world. Every year, it causes about 3.9 million deaths in Europe and 1.8 million in the European Union (EU). It is responsible for 45 % and 37 % of all deaths in Europe and the European Union, respectively. Using machine learning (ML) to predict heart diseases is one of the most promising research topics, as it can improve healthcare and consequently increase the longevity of people's lives. However, although the ability to interpret the results of the predictive model is essential, most of the related studies do not propose explainable methods. To address this problem, this paper presents a classification method that not only exhibits reliable performance but is also interpretable, ensuring transparency in its decision-making process. SHapley Additive exPlanations, known as the SHAP method was chosen for model interpretability. This approach presents a comparison between different classifiers and parameter tuning techniques, providing all the details necessary to replicate the experiment and help future researchers working in the field. The proposed model achieves similar performance to those proposed in the literature, and its predictions are fully interpretable.

冠心病（CHD）是全球最主要的死亡原因。每年，冠心病在欧洲造成约 390 万人死亡，在欧盟造成 180 万人死亡。在欧洲和欧盟的所有死亡病例中，心脏病分别占 45% 和 37%。利用机器学习（ML）预测心脏病是最有前途的研究课题之一，因为它可以改善医疗保健，从而延长人们的寿命。然而，尽管解释预测模型结果的能力至关重要，但大多数相关研究并未提出可解释的方法。为了解决这个问题，本文提出了一种分类方法，它不仅性能可靠，而且可以解释，确保了决策过程的透明度。出于模型可解释性的考虑，本文选择了 SHapley Additive exPlanations 方法，即 SHAP 方法。这种方法对不同的分类器和参数调整技术进行了比较，提供了复制实验所需的所有细节，有助于未来从事该领域工作的研究人员。所提出的模型与文献中提出的模型性能相似，其预测结果也完全可以解释。

{"title":"An explainable machine learning approach for automated medical decision support of heart disease","authors":"Francisco Mesquita, Gonçalo Marques","doi":"10.1016/j.datak.2024.102339","DOIUrl":"https://doi.org/10.1016/j.datak.2024.102339","url":null,"abstract":"<div><p>Coronary Heart Disease (CHD) is the dominant cause of mortality around the world. Every year, it causes about 3.9 million deaths in Europe and 1.8 million in the European Union (EU). It is responsible for 45 % and 37 % of all deaths in Europe and the European Union, respectively. Using machine learning (ML) to predict heart diseases is one of the most promising research topics, as it can improve healthcare and consequently increase the longevity of people's lives. However, although the ability to interpret the results of the predictive model is essential, most of the related studies do not propose explainable methods. To address this problem, this paper presents a classification method that not only exhibits reliable performance but is also interpretable, ensuring transparency in its decision-making process. SHapley Additive exPlanations, known as the SHAP method was chosen for model interpretability. This approach presents a comparison between different classifiers and parameter tuning techniques, providing all the details necessary to replicate the experiment and help future researchers working in the field. The proposed model achieves similar performance to those proposed in the literature, and its predictions are fully interpretable.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"153 ","pages":"Article 102339"},"PeriodicalIF":2.7,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0169023X24000636/pdfft?md5=9bdfa8117c5ce50d0508986a80981671&pid=1-s2.0-S0169023X24000636-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141592969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A comprehensive methodology to construct standardised datasets for Science and Technology Parks 构建科技园区标准化数据集的综合方法

IF 2.7 3区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering

Pub Date : 2024-06-18 DOI: 10.1016/j.datak.2024.102338

Olga Francés, Javi Fernández, José Abreu-Salas, Yoan Gutiérrez, Manuel Palomar

This work presents a standardised approach to create datasets for Science and Technology Parks (STPs), facilitating future analysis of STP characteristics, trends and performance. STPs are the most representative examples of innovation ecosystems. The ETL (extraction-transformation-load) structure was adapted to a global field study of STPs. A selection stage and quality check were incorporated, and the methodology was applied to Spanish STPs. This study applies diverse techniques such as expert labelling and information extraction which uses language technologies. A novel methodology for building quality and standardised STP datasets was designed and applied to a Spanish STP case study with 49 STPs. An updatable dataset and a list of the main features impacting STPs are presented. Twenty-one (n = 21) core features were refined and selected, with fifteen of them (71.4 %) being robust enough for developing further quality analysis. The methodology presented integrates different sources with heterogeneous information that is often decentralised, disaggregated and in different formats: excel files, and unstructured information in HTML or PDF format. The existence of this updatable dataset and the defined methodology will enable powerful AI tools to be applied that focus on more sophisticated analysis, such as taxonomy, monitoring, and predictive and prescriptive analytics in the innovation ecosystems field.

这项工作提出了一种为科技园（STPs）创建数据集的标准化方法，有助于今后对科技园的特点、趋势和绩效进行分析。科技园是创新生态系统中最具代表性的例子。ETL（提取-转换-加载）结构适用于科技园的全球实地研究。其中包括选择阶段和质量检查，该方法适用于西班牙的科技园。这项研究应用了多种技术，如专家标签和使用语言技术的信息提取。设计了一种建立高质量和标准化 STP 数据集的新方法，并将其应用于包含 49 个 STP 的西班牙 STP 案例研究。文中介绍了可更新的数据集和影响科技园的主要特征列表。对 21 个核心特征进行了提炼和筛选，其中 15 个（71.4%）足以进行进一步的质量分析。所介绍的方法整合了不同来源的异构信息，这些信息通常是分散的、分类的和不同格式的：这些信息通常是分散的、分类的和不同格式的：excel 文件和 HTML 或 PDF 格式的非结构化信息。有了这个可更新的数据集和确定的方法，就可以应用功能强大的人工智能工具，进行更复杂的分析，如创新生态系统领域的分类、监测、预测和规范分析。

{"title":"A comprehensive methodology to construct standardised datasets for Science and Technology Parks","authors":"Olga Francés, Javi Fernández, José Abreu-Salas, Yoan Gutiérrez, Manuel Palomar","doi":"10.1016/j.datak.2024.102338","DOIUrl":"https://doi.org/10.1016/j.datak.2024.102338","url":null,"abstract":"<div><p>This work presents a standardised approach to create datasets for Science and Technology Parks (STPs), facilitating future analysis of STP characteristics, trends and performance. STPs are the most representative examples of innovation ecosystems. The ETL (extraction-transformation-load) structure was adapted to a global field study of STPs. A selection stage and quality check were incorporated, and the methodology was applied to Spanish STPs. This study applies diverse techniques such as expert labelling and information extraction which uses language technologies. A novel methodology for building quality and standardised STP datasets was designed and applied to a Spanish STP case study with 49 STPs. An updatable dataset and a list of the main features impacting STPs are presented. Twenty-one (<em>n</em> = 21) core features were refined and selected, with fifteen of them (71.4 %) being robust enough for developing further quality analysis. The methodology presented integrates different sources with heterogeneous information that is often decentralised, disaggregated and in different formats: excel files, and unstructured information in HTML or PDF format. The existence of this updatable dataset and the defined methodology will enable powerful AI tools to be applied that focus on more sophisticated analysis, such as taxonomy, monitoring, and predictive and prescriptive analytics in the innovation ecosystems field.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"153 ","pages":"Article 102338"},"PeriodicalIF":2.7,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141542542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Providing healthcare shopping advice through knowledge-based virtual agents 通过基于知识的虚拟代理提供医疗保健购物建议

IF 2.7 3区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering

Pub Date : 2024-06-14 DOI: 10.1016/j.datak.2024.102336

Claire Deventer, Pietro Zidda

Knowledge-based virtual shopping agents, that advise their users about which products to buy, are well used in technical markets such as healthcare e-commerce. To ensure the proper adoption of this technology, it is important to consider aspects of users’ psychology early in the software design process. When traditional adoption models such as UTAUT-2 work well for many technologies, they overlook important specificities of the healthcare e-commerce domain and of knowledge-based virtual agents technology. Drawing upon health information technology and virtual agent literature, we propose a complementary adoption model incorporating new predictors and moderators reflecting these domains’ specificities. The model is tested using 903 observations gathered through an online survey conducted in collaboration with a major actor in the herbal medicine market. Our model can serve as a basis for many phases of the knowledge-based agents software development. We propose actionable recommendations for practitioners and ideas for further research.

基于知识的虚拟购物代理可以向用户提供购买产品的建议，在医疗保健电子商务等技术市场得到了广泛应用。为确保这项技术的正确应用，在软件设计过程中尽早考虑用户心理方面的问题非常重要。传统的采用模型（如 UTAUT-2 模型）对许多技术都很有效，但它们忽略了医疗保健电子商务领域和基于知识的虚拟代理技术的重要特性。借鉴医疗信息技术和虚拟代理文献，我们提出了一个互补的采用模型，其中包含了反映这些领域特殊性的新的预测因子和调节因子。我们通过与中草药市场的主要参与者合作开展在线调查，收集了 903 项观察结果，并对模型进行了测试。我们的模型可作为基于知识的代理软件开发许多阶段的基础。我们为从业人员提出了可行的建议和进一步研究的想法。

引用次数: 0