Data & Knowledge Engineering: Latest Articles, Page 6

Unraveling the foundations and the evolution of conceptual modeling—Intellectual structure, current themes, and trajectories

IF 2.7 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering

Pub Date : 2024-09-04 DOI: 10.1016/j.datak.2024.102351

Jacky Akoka , Isabelle Comyn-Wattiau , Nicolas Prat , Veda C. Storey

The field of conceptual modeling has now been in existence for over five decades. To understand how this field has evolved and should continue to evolve, it is useful to examine the contributions made over time and the themes that have emerged. In this research, we apply bibliometric analysis to a corpus of over 4700 research papers spanning from 1976 to 2023. We successively apply co-citation, bibliographic coupling, and main path analysis. Co-citation and citation networks are produced that surface the intellectual structure of the field, the main themes, and the relationships among major and influential research papers over time. We identify four areas in the intellectual structure of the field: conceptual modeling and databases; grammars and guidelines for conceptual modeling; requirements engineering and information systems design methodologies; and ontology constructs for conceptual modeling. Between 2017 and 2023, we distinguish nine research themes, including domain-specific conceptual modeling and applications, ontologies and applications, genomics, and datastores and multi-model data. The main path analysis identifies several trajectories among the major and most influential papers. This leads to insights into the lineage of key, influential papers in conceptual modeling research. The primordial nature of the main paths identified encompasses two important aspects. The first revolves around refining and complementing the entity-relationship model. The second identifies the contribution of ontologies for conceptual modeling to make the models more robust. Based on the findings from this bibliometric analysis, we propose several directions for future conceptual modeling research.
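
The two pairwise measures named in the abstract can be illustrated on a toy corpus. The sketch below is not the authors' pipeline; the paper identifiers and reference lists are hypothetical. Co-citation counts how often two cited papers appear in the same reference list, while bibliographic coupling counts the references two citing papers share.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each citing paper maps to the set of papers it references.
references = {
    "P1": {"A", "B", "C"},
    "P2": {"A", "B"},
    "P3": {"B", "C", "D"},
}


def co_citation_counts(refs):
    """Count how often each pair of cited papers appears in the same reference list."""
    counts = Counter()
    for cited in refs.values():
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


def bibliographic_coupling(refs):
    """Count the references shared by each pair of citing papers."""
    strength = {}
    for p, q in combinations(sorted(refs), 2):
        shared = len(refs[p] & refs[q])
        if shared:
            strength[(p, q)] = shared
    return strength


cocitation = co_citation_counts(references)
coupling = bibliographic_coupling(references)
print(cocitation.most_common(2))
print(coupling)
```

Co-citation links older, cited work and so tends to expose the intellectual base of a field; coupling links the citing papers themselves, which is why it suits the analysis of recent themes.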

Citations: 0

Data engineering and modeling for artificial intelligence

IF 2.7 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering

Pub Date : 2024-09-01 DOI: 10.1016/j.datak.2024.102346

Carlos Ordonez, Wojciech Macyna, Ladjel Bellatreche

Citations: 0

Capturing and Analysing Employee Behaviour: An Honest Day’s Work Record

IF 2.7 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering

Pub Date : 2024-08-31 DOI: 10.1016/j.datak.2024.102350

Iris Beerepoot, Tea Šinik, Hajo A. Reijers

For a range of reasons, organisations collect data on the work behaviour of their employees. However, each data collection technique displays its own unique mix of intrusiveness, information richness, and risks. For the sake of understanding the differences between data collection techniques, we conducted a multiple-case study in a multinational professional services organisation, tracking six participants throughout a workday using non-participant observation, screen recording, and timesheet techniques. This led to 136 hours of data. Our findings show that relying on one data collection technique alone cannot provide a comprehensive and accurate account of activities that are screen-based, offline, or overtime. The collected data also provided an opportunity to investigate the use of process mining for analysing employee behaviour, specifically with respect to the completeness of the collected data. Our study underlines the importance of judiciously selecting data collection techniques, as well as using a sufficiently broad data set to generate reliable insights into employee behaviour.

Citations: 0

Discovering outlying attributes of outliers in data streams

IF 2.7 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering

Pub Date : 2024-08-30 DOI: 10.1016/j.datak.2024.102349

Egawati Panjei , Le Gruenwald

Data streams, continuous sequences of timestamped data points, necessitate real-time monitoring due to their time-sensitive nature. In various data stream applications, such as network security and credit card transaction monitoring, real-time detection of outliers is crucial, as these outliers often signify potential threats. Equally important is the real-time explanation of outliers, enabling users to glean insights and thereby shorten their investigation time. The investigation time for outliers is closely tied to their number of attributes, making it essential to provide explanations that detail which attributes are responsible for the abnormality of a data point, referred to as outlying attributes. However, the unbounded volume of data and concept drift of data streams pose challenges for discovering the outlying attributes of outliers in real time. In response, in this paper we propose EXOS, an algorithm designed for discovering the outlying attributes of multi-dimensional outliers in data streams. EXOS leverages cross-correlations among data streams, accommodates varying data stream schemas and arrival rates, and effectively addresses challenges related to the unbounded volume of data and concept drift. The algorithm is model-agnostic for point outlier detection and provides real-time explanations based on the local context of the outlier, derived from time-based tumbling windows. The paper provides a complexity analysis of EXOS and an experimental analysis comparing EXOS with existing algorithms. The evaluation includes an assessment of performance on both real-world and synthetic datasets in terms of average precision, recall, F1-score, and explanation time. The evaluation results show that, on average, EXOS achieves a 45.6% better F1-score and a 7.3 times lower explanation time than existing outlying attribute algorithms.
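
To make the idea of a window-local explanation concrete, here is a minimal sketch. It is not EXOS: it swaps in a simple per-attribute z-score as the explanation rule, and the function names, threshold, and sample data are all assumptions made for illustration. Points are grouped into time-based tumbling windows, and an outlier is explained by the attributes that deviate most from the other points in its window.

```python
from collections import defaultdict
from statistics import mean, pstdev


def tumbling_windows(points, width):
    """Group (timestamp, values) points into non-overlapping windows of `width` time units."""
    windows = defaultdict(list)
    for ts, values in points:
        windows[int(ts // width)].append(values)
    return dict(windows)


def outlying_attributes(outlier, window, threshold=3.0):
    """Return indices of attributes whose z-score against the window exceeds `threshold`.

    Stand-in rule for illustration only; the paper's algorithm is different.
    """
    flagged = []
    for i, value in enumerate(outlier):
        column = [row[i] for row in window]
        mu, sigma = mean(column), pstdev(column)
        if sigma > 0 and abs(value - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical stream of (timestamp, (amount, latency)) points.
stream = [
    (0, (10.0, 1.0)),
    (2, (11.0, 1.1)),
    (4, (9.0, 0.9)),
    (6, (10.0, 1.0)),
    (12, (10.5, 1.0)),  # falls into the next window
]
windows = tumbling_windows(stream, width=10)

# An outlier reported by some detector at timestamp 8, explained against window 0.
explanation = outlying_attributes((50.0, 1.0), windows[0])
print(explanation)
```

Because each window is discarded once it closes, memory stays bounded regardless of stream length, and statistics are always computed on recent data, which limits the effect of concept drift.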

Citations: 0

A self-adaptive density-based clustering algorithm for varying densities datasets with strong disturbance factor

IF 2.7 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering

Pub Date : 2024-08-07 DOI: 10.1016/j.datak.2024.102345

Zihao Cai, Zhaodong Gu, Kejing He

Clustering is a fundamental task in data mining, aiming to group similar objects together based on their features or attributes. With the rapid increase in data analysis volume and the growing complexity of high-dimensional data distribution, clustering has become increasingly important in numerous applications, including image analysis, text mining, and anomaly detection. DBSCAN is a powerful tool for clustering analysis and is widely used in density-based clustering algorithms. However, DBSCAN and its variants encounter challenges when confronted with datasets exhibiting clusters of varying densities in intricate high-dimensional spaces affected by significant disturbance factors. A typical example is multi-density clustering connected by a few data points with strong internal correlations, a scenario commonly encountered in the analysis of crowd mobility. To address these challenges, we propose a Self-adaptive Density-Based Clustering Algorithm for Varying Densities Datasets with Strong Disturbance Factor (SADBSCAN). This algorithm comprises a data block splitter, a local clustering module, a global clustering module, and a data block merger to obtain adaptive clustering results. We conduct extensive experiments on both artificial and real-world datasets to evaluate the effectiveness of SADBSCAN. The experimental results indicate that SADBSCAN significantly outperforms several strong baselines across different metrics, demonstrating the high adaptability and scalability of our algorithm.
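
The difficulty with varying densities can be reproduced with plain DBSCAN on a few points. The sketch below is a textbook DBSCAN written for illustration, not SADBSCAN, and the coordinates are hypothetical. A small `eps` finds the dense cluster but labels the sparse one as noise; a large `eps` recovers the sparse cluster but absorbs a stray point into the dense one.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one cluster label per point (NOISE for noise)."""
    labels = [None] * len(points)

    def neighbors(i):
        # A point counts as its own neighbor, as in the usual definition.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# Hypothetical data: a dense cluster, one stray point, and a sparse cluster.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),  # dense
    (1.2, 0.0),                                      # stray point
    (5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (6.0, 6.0),  # sparse
]

tight = dbscan(points, eps=0.2, min_pts=3)
loose = dbscan(points, eps=1.5, min_pts=3)
print(tight)
print(loose)
```

No single global `eps` gives the intended result here (two clusters plus one noise point), which is the kind of limitation that motivates adaptive, block-wise parameter selection.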

Citations: 0

Developing A Decision Support System for Healthcare Practices: A Design Science Research Approach

IF 2.7 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering

Pub Date : 2024-07-17 DOI: 10.1016/j.datak.2024.102344

Arun Sen , Atish P. Sinha , Cong Zhang

We propose a new approach for designing a decision support system (DSS) for the transformation of healthcare practices. Practice transformation helps practices transition from their current state to the patient-centered medical home (PCMH) model of care. Our approach employs activity theory to derive the elements of practice transformation by designing and integrating two ontologies: a domain ontology and a task ontology. By incorporating both goal-oriented and task-oriented aspects of the practice transformation process and specifying how they interact, our integrated design model for the DSS provides prescriptive knowledge on assessing the current status of a practice with respect to PCMH recognition and navigating efficiently through a complex solution space. This knowledge, which is at a moderate level of abstraction and expressed in a language that practitioners understand, contributes to the literature by providing a formulation for a nascent design theory. We implement the integrated design model as a DSS prototype; results of validation tests conducted on the prototype indicate that it is superior to the existing PCMH readiness tracking tool with respect to effectiveness, usability, efficiency, and sustainability.

Citations: 0

Increasing the precision of public transit user activity location detection from smart card data analysis via spatial–temporal DBSCAN

IF 2.7 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering

Pub Date : 2024-07-15 DOI: 10.1016/j.datak.2024.102343

Fehmi Can Ozer , Hediye Tuydes-Yaman , Gulcin Dalkic-Melek

Smart Card (SC) systems have been increasingly adopted by public transit (PT) agencies all over the world, which facilitates not only fare collection but also PT service analyses and evaluations. Spatial clustering is one of the most important methods to investigate this big data in terms of activity locations, travel patterns, user behaviours, etc. In addition, spatio-temporal analysis of the clusters provides further precision in detecting PT traveller activity locations and durations. This study investigates and compares the effectiveness of two density-based clustering algorithms, DBSCAN and ST-DBSCAN. Numeric results are obtained by applying both algorithms to a sample of SC data from the public bus system of the metropolitan city of Konya, Turkey, and detecting activity clusters for the users. The results suggest that ST-DBSCAN forms clusters that are more compact in both time and space, which benefits transportation researchers who want to accurately detect passengers’ individual activity regions using SC data.
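
The core difference between the two algorithms lies in how a neighborhood is defined. The sketch below is a simplified illustration, not the study's implementation; the record layout, thresholds, and tap data are hypothetical. DBSCAN uses a single spatial radius, whereas ST-DBSCAN additionally requires neighbors to be close in time.

```python
from math import hypot


def spatial_neighbors(records, i, eps_space):
    """Indices of records within `eps_space` metres of record i (DBSCAN-style neighborhood)."""
    xi, yi, _ = records[i]
    return [j for j, (x, y, _) in enumerate(records)
            if hypot(x - xi, y - yi) <= eps_space]


def st_neighbors(records, i, eps_space, eps_time):
    """Spatial neighbors of record i that are also within `eps_time` minutes of it."""
    ti = records[i][2]
    return [j for j in spatial_neighbors(records, i, eps_space)
            if abs(records[j][2] - ti) <= eps_time]


# Hypothetical smart card taps: (x in metres, y in metres, minutes after midnight).
taps = [
    (0.0, 0.0, 480),     # 08:00 at a stop
    (30.0, 10.0, 485),   # 08:05 nearby
    (10.0, 20.0, 1020),  # 17:00 at the same place
    (2000.0, 0.0, 490),  # 08:10 far away
]

space_only = spatial_neighbors(taps, 0, eps_space=100)
space_time = st_neighbors(taps, 0, eps_space=100, eps_time=60)
print(space_only)
print(space_time)
```

With the spatial radius alone, the morning and evening taps at the same place fall into one neighborhood; the temporal threshold separates them, which is consistent with the more compact clusters reported for ST-DBSCAN.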

Citations: 0

Evaluating quality of ontology-driven conceptual models abstractions

IF 2.7 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering

Pub Date : 2024-07-14 DOI: 10.1016/j.datak.2024.102342

Elena Romanenko , Diego Calvanese , Giancarlo Guizzardi

The complexity of an (ontology-driven) conceptual model highly correlates with the complexity of the domain and software for which it is designed. With that in mind, an algorithm for producing ontology-driven conceptual model abstractions was previously proposed. In this paper, we empirically evaluate the quality of the abstractions produced by it. First, we have implemented and tested the last version of the algorithm over a FAIR catalog of models represented in the ontology-driven conceptual modeling language OntoUML. Second, we performed three user studies to evaluate the usefulness of the resulting abstractions as perceived by modelers. This paper reports on the findings of these experiments and reflects on how they can be exploited to improve the existing algorithm.

Citations: 0

An interactive approach to semantic enrichment with geospatial data

IF 2.7 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering

Pub Date : 2024-07-04 DOI: 10.1016/j.datak.2024.102341

Flavio De Paoli , Michele Ciavotta , Roberto Avogadro , Emil Hristov , Milena Borukova , Dessislava Petrova-Antonova , Iva Krasteva

The ubiquitous availability of datasets has spurred the utilization of Artificial Intelligence methods and models to extract valuable insights, unearth hidden patterns, and predict future trends. However, the current process of data collection and linking heavily relies on expert knowledge and domain-specific understanding, which engenders substantial costs in terms of both time and financial resources. Therefore, streamlining the data acquisition, harmonization, and enrichment procedures to deliver high-fidelity datasets readily usable for analytics is paramount. This paper explores the capabilities of SemTUI, a comprehensive framework designed to support the enrichment of tabular data by leveraging semantics and user interaction. Utilizing SemTUI, an iterative and interactive approach is proposed to enhance the flexibility, usability and efficiency of geospatial data enrichment. The approach is evaluated through a pilot case study focused on urban planning, with a particular emphasis on geocoding. Using a real-world scenario involving the analysis of kindergarten accessibility within walking distance, the study demonstrates the proficiency of SemTUI in generating precise and semantically enriched location data. The incorporation of human feedback in the enrichment process successfully enhances the quality of the resulting dataset, highlighting SemTUI’s potential for broader applications in geospatial analysis and its usability for users with limited expertise in manipulating geospatial data.
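
Once addresses have been geocoded to coordinates, a first approximation of the walking-distance check in the case study can be sketched as follows. This is not part of SemTUI; the function names, the 800 m limit, and the coordinates are assumptions, and straight-line (great-circle) distance only approximates distance along a real street network.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (latitude, longitude) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def within_walking_distance(home, places, limit_m=800):
    """Names of places whose straight-line distance from `home` is at most `limit_m`."""
    return [name for name, (lat, lon) in places.items()
            if haversine_m(home[0], home[1], lat, lon) <= limit_m]


# Hypothetical geocoded locations.
home = (45.0, 9.0)
kindergartens = {
    "K1": (45.005, 9.0),  # roughly 556 m north
    "K2": (45.02, 9.0),   # roughly 2.2 km north
}

reachable = within_walking_distance(home, kindergartens)
print(reachable)
```

A production analysis would replace the straight-line distance with routing over the pedestrian network, but the filter structure stays the same.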

Citations: 0

Explanation, semantics, and ontology

IF 2.7 Zone 3 (Computer Science) Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering

Pub Date : 2024-06-25 DOI: 10.1016/j.datak.2024.102325

Giancarlo Guizzardi , Nicola Guarino

The terms ‘semantics’ and ‘ontology’ are increasingly appearing together with ‘explanation’, not only in the scientific literature, but also in everyday social interactions, in particular, within organizations. Ontologies have been shown to play a key role in supporting the semantic interoperability of data and knowledge representation structures used by information systems. With the proliferation of applications of Artificial Intelligence (AI) in different settings and the increasing need to guarantee their explainability (but also their interoperability) in critical contexts, the term ‘explanation’ has also become part of the scientific and technical jargon of modern information systems engineering. However, all of these terms are also significantly overloaded. In this paper, we address several interpretations of these notions, with an emphasis on their strong connection. Specifically, we discuss a notion of explanation termed ontological unpacking, which aims at explaining symbolic domain descriptions (e.g., conceptual models, knowledge graphs, logical specifications) by revealing their ontological commitment in terms of their so-called truthmakers, i.e., the entities in one’s ontology that are responsible for the truth of a description. To illustrate this methodology, we employ an ontological theory of relations to explain a symbolic model encoded in the de facto standard modeling language UML. We also discuss the essential role played by ontology-driven conceptual models (resulting from this form of explanation processes) in supporting semantic interoperability tasks. Furthermore, we revisit a proposal for quality criteria for explanations from philosophy of science to assess our approach. Finally, we discuss the relation between ontological unpacking and other forms of explanation in philosophy and science, as well as in the subarea of Artificial Intelligence known as Explainable AI (XAI).

Data & Knowledge Engineering, Volume 153, Article 102325.

Citations: 0