
Data & Knowledge Engineering: Latest Articles

Increasing the precision of public transit user activity location detection from smart card data analysis via spatial–temporal DBSCAN
IF 2.7 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-15 | DOI: 10.1016/j.datak.2024.102343

Smart Card (SC) systems have been increasingly adopted by public transit (PT) agencies all over the world, facilitating not only fare collection but also PT service analyses and evaluations. Spatial clustering is one of the most important methods for investigating this big data in terms of activity locations, travel patterns, user behaviours, etc. In addition, spatio-temporal analysis of the clusters provides further precision in detecting PT traveller activity locations and durations. This study investigates and compares the effectiveness of two density-based clustering algorithms, DBSCAN and ST-DBSCAN. The numeric results are obtained using SC data (public bus system) from the metropolitan city of Konya, Turkey: the clustering algorithms are applied to a sample of this smart card data, and activity clusters are detected for the users. The results suggest that ST-DBSCAN forms clusters that are more compact in both time and space, which benefits transportation researchers who want to accurately detect passengers’ individual activity regions using SC data.
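The contrast between the two algorithms can be illustrated with a minimal sketch. It assumes a simplified ST-DBSCAN that differs from DBSCAN only in requiring neighbours to be close in both space and time; the full algorithm and the study's actual parameters are not given in the abstract. The function name `st_dbscan`, the sample `boardings`, and the units (km, hours) are hypothetical.

```python
from math import hypot

def st_dbscan(points, eps_space, eps_time, min_pts):
    """Cluster (x, y, t) points; returns one label per point, -1 marks noise.

    Simplified sketch: a neighbour must lie within eps_space in the plane
    AND within eps_time on the time axis. With eps_time = infinity this
    reduces to plain spatial DBSCAN.
    """
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbours(i):
        xi, yi, ti = points[i]
        return [j for j, (x, y, t) in enumerate(points)
                if hypot(x - xi, y - yi) <= eps_space and abs(t - ti) <= eps_time]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)  # includes i itself
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:  # j is a core point, keep expanding
                queue.extend(nb)
    return labels

# Hypothetical boardings of one user: (x_km, y_km, hour_of_day).
boardings = [
    (0.0, 0.0, 8.0), (0.1, 0.0, 8.2), (0.0, 0.1, 8.1),     # morning, near home
    (0.0, 0.0, 18.0), (0.1, 0.1, 18.3), (0.0, 0.1, 18.1),  # evening, same area
    (5.0, 5.0, 12.0),                                      # isolated trip
]

print(st_dbscan(boardings, 0.5, 1.0, 3))           # [0, 0, 0, 1, 1, 1, -1]
print(st_dbscan(boardings, 0.5, float("inf"), 3))  # [0, 0, 0, 0, 0, 0, -1]
```

With the temporal radius in place, the morning and evening boardings at the same location separate into two activity clusters; the purely spatial run merges them into one.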

Citations: 0
Evaluating quality of ontology-driven conceptual models abstractions
IF 2.7 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-14 | DOI: 10.1016/j.datak.2024.102342

The complexity of an (ontology-driven) conceptual model highly correlates with the complexity of the domain and software for which it is designed. With that in mind, an algorithm for producing ontology-driven conceptual model abstractions was previously proposed. In this paper, we empirically evaluate the quality of the abstractions it produces. First, we implemented and tested the latest version of the algorithm over a FAIR catalog of models represented in the ontology-driven conceptual modeling language OntoUML. Second, we performed three user studies to evaluate the usefulness of the resulting abstractions as perceived by modelers. This paper reports on the findings of these experiments and reflects on how they can be exploited to improve the existing algorithm.

Citations: 0
An interactive approach to semantic enrichment with geospatial data
IF 2.7 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-04 | DOI: 10.1016/j.datak.2024.102341

The ubiquitous availability of datasets has spurred the utilization of Artificial Intelligence methods and models to extract valuable insights, unearth hidden patterns, and predict future trends. However, the current process of data collection and linking heavily relies on expert knowledge and domain-specific understanding, which engenders substantial costs in terms of both time and financial resources. Therefore, streamlining the data acquisition, harmonization, and enrichment procedures to deliver high-fidelity datasets readily usable for analytics is paramount. This paper explores the capabilities of SemTUI, a comprehensive framework designed to support the enrichment of tabular data by leveraging semantics and user interaction. Utilizing SemTUI, an iterative and interactive approach is proposed to enhance the flexibility, usability and efficiency of geospatial data enrichment. The approach is evaluated through a pilot case study focused on urban planning, with a particular emphasis on geocoding. Using a real-world scenario involving the analysis of kindergarten accessibility within walking distance, the study demonstrates the proficiency of SemTUI in generating precise and semantically enriched location data. The incorporation of human feedback in the enrichment process successfully enhances the quality of the resulting dataset, highlighting SemTUI’s potential for broader applications in geospatial analysis and its usability for users with limited expertise in manipulating geospatial data.

Citations: 0
Explanation, semantics, and ontology
IF 2.7 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-06-25 | DOI: 10.1016/j.datak.2024.102325

The terms ‘semantics’ and ‘ontology’ are increasingly appearing together with ‘explanation’, not only in the scientific literature, but also in everyday social interactions, in particular, within organizations. Ontologies have been shown to play a key role in supporting the semantic interoperability of data and knowledge representation structures used by information systems. With the proliferation of applications of Artificial Intelligence (AI) in different settings and the increasing need to guarantee their explainability (but also their interoperability) in critical contexts, the term ‘explanation’ has also become part of the scientific and technical jargon of modern information systems engineering. However, all of these terms are also significantly overloaded. In this paper, we address several interpretations of these notions, with an emphasis on their strong connection. Specifically, we discuss a notion of explanation termed ontological unpacking, which aims at explaining symbolic domain descriptions (e.g., conceptual models, knowledge graphs, logical specifications) by revealing their ontological commitment in terms of their so-called truthmakers, i.e., the entities in one’s ontology that are responsible for the truth of a description. To illustrate this methodology, we employ an ontological theory of relations to explain a symbolic model encoded in the de facto standard modeling language UML. We also discuss the essential role played by ontology-driven conceptual models (resulting from this form of explanation processes) in supporting semantic interoperability tasks. Furthermore, we revisit a proposal for quality criteria for explanations from philosophy of science to assess our approach. Finally, we discuss the relation between ontological unpacking and other forms of explanation in philosophy and science, as well as in the subarea of Artificial Intelligence known as Explainable AI (XAI).

Citations: 0
Topological querying of music scores
IF 2.7 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-06-20 | DOI: 10.1016/j.datak.2024.102340
Philippe Rigaux, Virginie Thion

For centuries, sheet music scores have been the traditional way to preserve and disseminate Western music works. Nowadays, their content can be encoded in digital formats, making it possible to store music score data in digital score libraries (DSL). To supply intelligent services (extracting and analysing relevant information from data), the new generation of DSL has to rely on digital representations of the score content as structured objects suited to manipulation by high-level operators. In the present paper, we propose the Muster model, a graph-based data model for representing the music content of a digital score, and we discuss the querying of such data through graph pattern queries. We then present a proof of concept of this approach, which allows storing graph-based representations of music scores in the Neo4j database and performing musical pattern searches through graph pattern queries with the Cypher query language. A benchmark study, using (real) datasets stemming from the Neuma Digital Score Library, complements this implementation.

Citations: 0
Entity type inference based on path walking and inter-types relationships
IF 2.7 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-06-19 | DOI: 10.1016/j.datak.2024.102337
Yi Gan, Zhihui Su, Gaoyong Lu, Pengju Zhang, Aixiang Cui, Jiawei Jiang, Duanbing Chen

As a crucial task for knowledge graphs (KGs), knowledge graph entity type inference (KGET) has garnered increasing attention in recent years. However, recent methods overlook the long-distance information pertaining to entities and the inter-types relationships. Neglecting long-distance information omits crucial entity relationships and neighbors, and consequently loses the path information associated with missing types. To address this, a path-walking strategy is used to identify two-hop triplet paths of the crucial entity for encoding long-distance entity information. Moreover, the absence of inter-types relationships can lead to the loss of the neighborhood information of types, such as co-occurrence information. To ensure a comprehensive understanding of inter-types relationships, we consider interactions not only among the types of a single entity but also among the types of different entities. Finally, in order to comprehensively represent entities for missing types along the dimensions of both path information and neighborhood information, we propose an entity type inference model based on path walking and inter-types relationships, denoted as “ET-PT”. This model effectively extracts comprehensive entity information, thereby obtaining a more complete semantic representation of entities. The experimental results on publicly available datasets demonstrate that the proposed method outperforms state-of-the-art approaches.
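The notion of a two-hop triplet path can be sketched as follows. This is only an illustration of enumerating directed two-hop paths from one entity; the abstract does not describe how ET-PT selects or encodes paths, so the function `two_hop_paths`, the edge-direction convention, and the toy triples are all assumptions.

```python
from collections import defaultdict

def two_hop_paths(triples, entity):
    """Return every path (entity, r1, mid, r2, end) that follows edge direction."""
    out_edges = defaultdict(list)
    for head, relation, tail in triples:
        out_edges[head].append((relation, tail))

    paths = []
    for r1, mid in out_edges.get(entity, []):
        for r2, end in out_edges.get(mid, []):
            if end != entity:  # skip trivial round trips back to the start
                paths.append((entity, r1, mid, r2, end))
    return paths

# Hypothetical toy knowledge graph as (head, relation, tail) triples.
triples = [
    ("alice", "works_at", "acme"),
    ("acme", "based_in", "paris"),
    ("alice", "born_in", "paris"),
    ("paris", "capital_of", "france"),
]

for path in two_hop_paths(triples, "alice"):
    print(path)
# ('alice', 'works_at', 'acme', 'based_in', 'paris')
# ('alice', 'born_in', 'paris', 'capital_of', 'france')
```

Each path reaches an entity that is not a direct neighbour via the same relation, which is the kind of long-distance context that one-hop neighbourhood methods would miss.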

Citations: 0
An explainable machine learning approach for automated medical decision support of heart disease
IF 2.7 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-06-19 | DOI: 10.1016/j.datak.2024.102339
Francisco Mesquita, Gonçalo Marques

Coronary Heart Disease (CHD) is the dominant cause of mortality around the world. Every year, it causes about 3.9 million deaths in Europe and 1.8 million in the European Union (EU). It is responsible for 45% and 37% of all deaths in Europe and the European Union, respectively. Using machine learning (ML) to predict heart diseases is one of the most promising research topics, as it can improve healthcare and consequently increase people's longevity. However, although the ability to interpret the results of the predictive model is essential, most of the related studies do not propose explainable methods. To address this problem, this paper presents a classification method that not only exhibits reliable performance but is also interpretable, ensuring transparency in its decision-making process. SHapley Additive exPlanations, known as the SHAP method, was chosen for model interpretability. This approach presents a comparison between different classifiers and parameter tuning techniques, providing all the details necessary to replicate the experiment and help future researchers working in the field. The proposed model achieves similar performance to those proposed in the literature, and its predictions are fully interpretable.
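The attribution idea behind SHAP can be sketched by computing exact Shapley values for a tiny model through subset enumeration. This is not the SHAP library and not the paper's pipeline: the function `shapley_values`, the baseline-replacement convention for "absent" features, and the toy risk score with its two features are assumptions made for illustration, and enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, instance, baseline):
    """Exact Shapley value of each feature for one prediction.

    A feature that is "absent" from a coalition takes its baseline value.
    Cost grows as 2^n in the number of features.
    """
    n = len(instance)

    def value(coalition):
        x = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(x)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical toy risk score over (age_decades, smoker), with an interaction term.
def toy_risk(x):
    age_decades, smoker = x
    return 0.1 * age_decades + 0.3 * smoker + 0.05 * age_decades * smoker

patient = [6.0, 1.0]    # hypothetical instance to explain
reference = [4.0, 0.0]  # hypothetical baseline instance

phi = shapley_values(toy_risk, patient, reference)
# The attributions add up to the gap between the two predictions.
assert abs(sum(phi) - (toy_risk(patient) - toy_risk(reference))) < 1e-9
```

The closing assertion checks the efficiency property: the per-feature contributions sum to the difference between the explained prediction and the baseline prediction, which is what makes such attributions readable as a decomposition of a single decision.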

Citations: 0
A comprehensive methodology to construct standardised datasets for Science and Technology Parks
IF 2.7 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-06-18 | DOI: 10.1016/j.datak.2024.102338
Olga Francés, Javi Fernández, José Abreu-Salas, Yoan Gutiérrez, Manuel Palomar

This work presents a standardised approach to create datasets for Science and Technology Parks (STPs), facilitating future analysis of STP characteristics, trends and performance. STPs are the most representative examples of innovation ecosystems. The ETL (extraction-transformation-load) structure was adapted to a global field study of STPs. A selection stage and quality check were incorporated, and the methodology was applied to Spanish STPs. This study applies diverse techniques such as expert labelling and information extraction based on language technologies. A novel methodology for building quality and standardised STP datasets was designed and applied to a Spanish STP case study with 49 STPs. An updatable dataset and a list of the main features impacting STPs are presented. Twenty-one (n = 21) core features were refined and selected, with fifteen of them (71.4%) being robust enough for developing further quality analysis. The methodology presented integrates different sources with heterogeneous information that is often decentralised, disaggregated and in different formats: Excel files and unstructured information in HTML or PDF format. The existence of this updatable dataset and the defined methodology will enable powerful AI tools to be applied that focus on more sophisticated analysis, such as taxonomy, monitoring, and predictive and prescriptive analytics in the innovation ecosystems field.

Citations: 0
Providing healthcare shopping advice through knowledge-based virtual agents
IF 2.7 Zone 3 Computer Science Q2 Decision Sciences Pub Date: 2024-06-14 DOI: 10.1016/j.datak.2024.102336
Claire Deventer, Pietro Zidda

Knowledge-based virtual shopping agents, which advise users on which products to buy, are widely used in technical markets such as healthcare e-commerce. To ensure the proper adoption of this technology, it is important to consider aspects of users’ psychology early in the software design process. While traditional adoption models such as UTAUT-2 work well for many technologies, they overlook important specificities of the healthcare e-commerce domain and of knowledge-based virtual agent technology. Drawing upon the health information technology and virtual agent literature, we propose a complementary adoption model incorporating new predictors and moderators that reflect these domains’ specificities. The model is tested using 903 observations gathered through an online survey conducted in collaboration with a major actor in the herbal medicine market. Our model can serve as a basis for many phases of knowledge-based agent software development. We propose actionable recommendations for practitioners and ideas for further research.

Citations: 0
CGT: A Clause Graph Transformer Structure for aspect-based sentiment analysis
IF 2.5 Zone 3 Computer Science Q2 Decision Sciences Pub Date: 2024-06-13 DOI: 10.1016/j.datak.2024.102332
Zelong Su, Bin Gao, Xiaoou Pan, Zhengjun Liu, Yu Ji, Shutian Liu

In natural language processing (NLP), aspect-based sentiment analysis plays a pivotal role. Recently, there has been a growing emphasis on techniques that leverage Graph Convolutional Neural Networks (GCNs). However, current approaches face several challenges: (1) Due to the inherent transitivity of GCNs, training inevitably entails the acquisition of irrelevant semantic information. (2) Existing methodologies depend heavily on the dependency tree and neglect the contextual structure of the sentence. (3) Most methods fail to account for the interactions between different aspects. In this study, we propose a Clause Graph Transformer Structure (CGT) to alleviate these limitations. Specifically, CGT comprises three modules. The preprocessing module extracts aspect clauses from each sentence by bi-directionally traversing the constituent tree, reducing reliance on syntax trees and extracting semantic information from the perspective of clauses. Additionally, we assert that a word’s vector direction signifies its underlying attitude in the semantic space, a feature often overlooked in recent research. Without requiring additional parameters, we introduce the Clause Attention encoder (CA-encoder) into the clause module to effectively capture the directed cross-correlation coefficient between the clause and the target aspect. To enhance the representation of the target component, we propose capturing the connections between various aspects. In the inter-aspect module, we design a Balanced Attention encoder (BA-encoder) that forms an aspect sequence by navigating the extracted phrase tree. To effectively capture the emotion of implicit components, we introduce a Top-K Attention Graph Convolutional Network (KA-GCN). Our proposed method achieves state-of-the-art (SOTA) performance in experiments conducted on four widely used datasets. Furthermore, our model demonstrates a significant improvement in robustness on datasets subjected to disturbances.
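
Two ideas mentioned in the abstract, a parameter-free score based on vector direction and attention restricted to the top K scores, can be illustrated with a small sketch. This is not the paper's CA-encoder or KA-GCN: cosine similarity is used here as a stand-in for a direction-based score, the top-k softmax is a generic formulation, and the toy vectors and function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between u and v: depends on direction, not length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_attention(scores, k):
    """Softmax over the k largest scores; every other position gets weight 0."""
    if k < 1:
        raise ValueError("k must be at least 1")
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in keep)  # subtract the max for numerical stability
    exp = {i: math.exp(scores[i] - peak) for i in keep}
    total = sum(exp.values())
    return [exp[i] / total if i in exp else 0.0 for i in range(len(scores))]


# Toy 2-D embeddings (invented): one aspect vector and four clause words.
aspect = [1.0, 0.0]
clause_words = [[2.0, 0.0], [0.0, 3.0], [-1.0, 0.0], [1.0, 1.0]]

scores = [cosine(w, aspect) for w in clause_words]  # no learned parameters
weights = top_k_attention(scores, k=2)
```

With `k=2`, only the two words whose direction is closest to the aspect vector receive non-zero weight. The orthogonal and opposite-direction words are masked out regardless of their vector length.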

Citations: 0