
Data & Knowledge Engineering: Latest Articles

A conceptual framework for the government big data ecosystem (‘datagov.eco’)
IF 2.7 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-09-05 | DOI: 10.1016/j.datak.2024.102348
Syed Iftikhar Hussain Shah, Vassilios Peristeras, Ioannis Magnisalis

The public sector, private firms, and civil society constantly create data of high volume, velocity, and veracity from diverse sources. This kind of data is known as big data. As in other industries, public administrations regard big data as the “new oil” and employ data-centric policies to transform data into knowledge and to stimulate good governance, innovative digital services, transparency, and citizens' engagement in public policy. More and more public organizations understand the value created by exploiting internal and external data sources, delivering new capabilities, and fostering collaboration inside and outside of public administrations. Despite the broad interest in this ecosystem, we still lack a detailed and systematic view of it. In this paper, we attempt to describe the emerging Government Big Data Ecosystem as a socio-technical network of people, organizations, processes, technology, infrastructure, standards & policies, procedures, and resources. This ecosystem supports data functions such as data collection, integration, analysis, storage, sharing, use, protection, and archiving. Through these functions, value is created by promoting evidence-based policymaking, modern public services delivery, data-driven administration and open government, and boosting the data economy. Through a Design Science Research methodology, we propose a conceptual framework, which we call ‘datagov.eco’. We believe our ‘datagov.eco’ framework will provide insights and support to different stakeholder profiles, including administrators, consultants, data engineers, and data scientists.

Data & Knowledge Engineering, Volume 154, Article 102348
Citations: 0
Unraveling the foundations and the evolution of conceptual modeling—Intellectual structure, current themes, and trajectories
IF 2.7 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-09-04 | DOI: 10.1016/j.datak.2024.102351
Jacky Akoka, Isabelle Comyn-Wattiau, Nicolas Prat, Veda C. Storey
The field of conceptual modeling has now been in existence for over five decades. To understand how this field has evolved and should continue to evolve, it is useful to examine the contributions made over time and the themes that have emerged. In this research, we apply bibliometric analysis to a corpus of over 4700 research papers spanning from 1976 to 2023. We successively apply co-citation, bibliographic coupling, and main path analysis. Co-citation and citation networks are produced that surface the intellectual structure of the field, the main themes, and the relationships among major and influential research papers over time. We identify four areas in the intellectual structure of the field: conceptual modeling and databases; grammars and guidelines for conceptual modeling; requirements engineering and information systems design methodologies; and ontology constructs for conceptual modeling. Between 2017 and 2023, we distinguish nine research themes, including domain-specific conceptual modeling and applications, ontologies and applications, genomics, and datastores and multi-model data. The main path analysis identifies several trajectories among the major and most influential papers. This leads to insights into the lineage of key, influential papers in conceptual modeling research. The primordial nature of the main paths identified encompasses two important aspects. The first revolves around refining and complementing the entity-relationship model. The second identifies the contribution of ontologies for conceptual modeling to make the models more robust. Based on the findings from this bibliometric analysis, we propose several directions for future conceptual modeling research.
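The two pairwise measures named in this abstract, co-citation and bibliographic coupling, can be illustrated with a minimal Python sketch. The toy corpus, the paper labels, and the function names are hypothetical and are not taken from the paper; main path analysis is not covered here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each citing paper maps to the set of works it references.
references = {
    "P1": {"A", "B", "C"},
    "P2": {"A", "B"},
    "P3": {"B", "C", "D"},
}

def co_citation_counts(refs):
    """Count how often two cited works appear together in one reference list."""
    counts = Counter()
    for cited in refs.values():
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts

def bibliographic_coupling(refs):
    """Count the references shared by each pair of citing papers."""
    strength = {}
    for p, q in combinations(sorted(refs), 2):
        strength[(p, q)] = len(refs[p] & refs[q])
    return strength

co_citations = co_citation_counts(references)
coupling = bibliographic_coupling(references)
print(co_citations[("A", "B")])   # A and B are cited together by P1 and P2
print(coupling[("P2", "P3")])     # P2 and P3 share only reference B
```

Co-citation links older works through the papers that cite them together, while bibliographic coupling links citing papers through their shared references, which is why the two views can surface different structure in the same corpus.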
Data & Knowledge Engineering, Volume 154, Article 102351
Citations: 0
Data engineering and modeling for artificial intelligence
IF 2.7 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-09-01 | DOI: 10.1016/j.datak.2024.102346
Carlos Ordonez, Wojciech Macyna, Ladjel Bellatreche
Data & Knowledge Engineering, Volume 153, Article 102346
Citations: 0
Capturing and Analysing Employee Behaviour: An Honest Day’s Work Record
IF 2.7 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-31 | DOI: 10.1016/j.datak.2024.102350
Iris Beerepoot, Tea Šinik, Hajo A. Reijers

For a range of reasons, organisations collect data on the work behaviour of their employees. However, each data collection technique displays its own unique mix of intrusiveness, information richness, and risks. For the sake of understanding the differences between data collection techniques, we conducted a multiple-case study in a multinational professional services organisation, tracking six participants throughout a workday using non-participant observation, screen recording, and timesheet techniques. This led to 136 hours of data. Our findings show that relying on one data collection technique alone cannot provide a comprehensive and accurate account of activities that are screen-based, offline, or overtime. The collected data also provided an opportunity to investigate the use of process mining for analysing employee behaviour, specifically with respect to the completeness of the collected data. Our study underlines the importance of judiciously selecting data collection techniques, as well as using a sufficiently broad data set to generate reliable insights into employee behaviour.

Data & Knowledge Engineering, Volume 154, Article 102350
PDF: https://www.sciencedirect.com/science/article/pii/S0169023X24000740/pdfft?md5=0803a6136e27919fd8c8a868fa63e889&pid=1-s2.0-S0169023X24000740-main.pdf
Citations: 0
Discovering outlying attributes of outliers in data streams
IF 2.7 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-30 | DOI: 10.1016/j.datak.2024.102349
Egawati Panjei, Le Gruenwald

Data streams, continuous sequences of timestamped data points, necessitate real-time monitoring due to their time-sensitive nature. In various data stream applications, such as network security and credit card transaction monitoring, real-time detection of outliers is crucial, as these outliers often signify potential threats. Equally important is the real-time explanation of outliers, enabling users to glean insights and thereby shorten their investigation time. The investigation time for outliers is closely tied to their number of attributes, making it essential to provide explanations that detail which attributes are responsible for the abnormality of a data point, referred to as outlying attributes. However, the unbounded volume of data and concept drift of data streams pose challenges for discovering the outlying attributes of outliers in real time. In response, in this paper we propose EXOS, an algorithm designed for discovering the outlying attributes of multi-dimensional outliers in data streams. EXOS leverages cross-correlations among data streams, accommodates varying data stream schemas and arrival rates, and effectively addresses challenges related to the unbounded volume of data and concept drift. The algorithm is model-agnostic for point outlier detection and provides real-time explanations based on the local context of the outlier, derived from time-based tumbling windows. The paper provides a complexity analysis of EXOS and an experimental analysis comparing EXOS with existing algorithms. The evaluation includes an assessment of performance on both real-world and synthetic datasets in terms of average precision, recall, F1-score, and explanation time. The evaluation results show that, on average, EXOS achieves a 45.6% better F1 Score and is 7.3 times lower in explanation time compared to existing outlying attribute algorithms.
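To make two terms from this abstract concrete, time-based tumbling windows and outlying attributes, here is a minimal Python sketch. It is not the EXOS algorithm: it swaps in a plain per-attribute z-score against the other points in the same window, and the function names, the threshold, and the toy stream are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev

def tumbling_windows(points, width):
    """Group (timestamp, values) points into non-overlapping windows of `width` time units."""
    windows = defaultdict(list)
    for ts, values in points:
        windows[int(ts // width)].append(values)
    return dict(windows)

def outlying_attributes(outlier, context, threshold=2.0):
    """Return indices of attributes whose z-score against `context` exceeds `threshold`.

    Illustrative stand-in only: a simple z-score, not the scoring used by EXOS.
    """
    flagged = []
    for i, value in enumerate(outlier):
        column = [row[i] for row in context]
        sigma = pstdev(column)
        if sigma == 0:
            continue  # attribute is constant in this window, so no z-score is defined
        if abs(value - mean(column)) / sigma > threshold:
            flagged.append(i)
    return flagged

# Hypothetical stream of (timestamp, (attribute_0, attribute_1)) points.
stream = [
    (0, (10.0, 1.0)), (2, (11.0, 1.1)), (4, (9.0, 0.9)),
    (6, (10.0, 1.0)), (8, (10.5, 9.0)), (12, (10.0, 1.0)),
]
windows = tumbling_windows(stream, width=10)
outlier = (10.5, 9.0)  # assumed to have been flagged by some upstream detector
context = [row for row in windows[0] if row != outlier]
flagged = outlying_attributes(outlier, context)
print(flagged)  # only the second attribute deviates strongly from its window
```

Because each window is processed and then discarded, memory stays bounded on an unbounded stream, and statistics are always computed from recent data, which is one simple way to tolerate concept drift.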

Data & Knowledge Engineering, Volume 154, Article 102349
Citations: 0
A self-adaptive density-based clustering algorithm for varying densities datasets with strong disturbance factor
IF 2.7 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-07 | DOI: 10.1016/j.datak.2024.102345
Zihao Cai, Zhaodong Gu, Kejing He

Clustering is a fundamental task in data mining, aiming to group similar objects together based on their features or attributes. With the rapid increase in data analysis volume and the growing complexity of high-dimensional data distribution, clustering has become increasingly important in numerous applications, including image analysis, text mining, and anomaly detection. DBSCAN is a powerful tool for clustering analysis and is widely used in density-based clustering algorithms. However, DBSCAN and its variants encounter challenges when confronted with datasets exhibiting clusters of varying densities in intricate high-dimensional spaces affected by significant disturbance factors. A typical example is multi-density clustering connected by a few data points with strong internal correlations, a scenario commonly encountered in the analysis of crowd mobility. To address these challenges, we propose a Self-adaptive Density-Based Clustering Algorithm for Varying Densities Datasets with Strong Disturbance Factor (SADBSCAN). This algorithm comprises a data block splitter, a local clustering module, a global clustering module, and a data block merger to obtain adaptive clustering results. We conduct extensive experiments on both artificial and real-world datasets to evaluate the effectiveness of SADBSCAN. The experimental results indicate that SADBSCAN significantly outperforms several strong baselines across different metrics, demonstrating the high adaptability and scalability of our algorithm.

Data & Knowledge Engineering, Volume 153, Article 102345
Citations: 0
Developing A Decision Support System for Healthcare Practices: A Design Science Research Approach
IF 2.7 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-17 | DOI: 10.1016/j.datak.2024.102344
Arun Sen, Atish P. Sinha, Cong Zhang

We propose a new approach for designing a decision support system (DSS) for the transformation of healthcare practices. Practice transformation helps practices transition from their current state to the patient-centered medical home (PCMH) model of care. Our approach employs activity theory to derive the elements of practice transformation by designing and integrating two ontologies: a domain ontology and a task ontology. By incorporating both goal-oriented and task-oriented aspects of the practice transformation process and specifying how they interact, our integrated design model for the DSS provides prescriptive knowledge on assessing the current status of a practice with respect to PCMH recognition and navigating efficiently through a complex solution space. This knowledge, which is at a moderate level of abstraction and expressed in a language that practitioners understand, contributes to the literature by providing a formulation for a nascent design theory. We implement the integrated design model as a DSS prototype; results of validation tests conducted on the prototype indicate that it is superior to the existing PCMH readiness tracking tool with respect to effectiveness, usability, efficiency, and sustainability.

Data & Knowledge Engineering, Volume 154, Article 102344
Citations: 0
Increasing the precision of public transit user activity location detection from smart card data analysis via spatial–temporal DBSCAN
IF 2.7 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-15 | DOI: 10.1016/j.datak.2024.102343
Fehmi Can Ozer, Hediye Tuydes-Yaman, Gulcin Dalkic-Melek

Smart Card (SC) systems have been increasingly adopted by public transit (PT) agencies all over the world, which facilitates not only fare collection but also PT service analyses and evaluations. Spatial clustering is one of the most important methods to investigate this big data in terms of activity locations, travel patterns, user behaviours, etc. In addition, spatio-temporal analysis of the clusters provides further precision for the detection of PT traveller activity locations and durations. This study investigates and compares the effectiveness of two density-based clustering algorithms, DBSCAN and ST-DBSCAN. The numeric results are obtained using SC data (public bus system) from the metropolitan city of Konya, Turkey: the clustering algorithms are applied to a sample of this smart card data, and activity clusters are detected for the users. The results of the study suggest that ST-DBSCAN forms more compact clusters in both time and space for transportation researchers who want to accurately detect passengers’ individual activity regions using SC data.
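The difference between a purely spatial and a spatial-temporal neighbourhood can be sketched in a few lines of Python. This is a simplified illustration and not the implementation used in the study: it runs a standard DBSCAN expansion in which two points are neighbours only if they are close in both space and time, and it omits the extra attribute-difference threshold of the original ST-DBSCAN formulation. The function names, parameter values, and toy coordinates are assumptions.

```python
from math import hypot

def st_neighbors(points, i, eps_space, eps_time):
    """Indices of points within eps_space (distance) and eps_time (time gap) of point i."""
    x, y, t = points[i]
    return [
        j for j, (px, py, pt) in enumerate(points)
        if hypot(px - x, py - y) <= eps_space and abs(pt - t) <= eps_time
    ]

def st_dbscan(points, eps_space, eps_time, min_pts):
    """Label each (x, y, t) point with a cluster id, or -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = st_neighbors(points, i, eps_space, eps_time)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = st_neighbors(points, j, eps_space, eps_time)
            if len(reach) >= min_pts:
                queue.extend(reach)  # j is a core point, so keep expanding through it
    return labels

# Hypothetical card taps as (x_km, y_km, hour): same area in the morning and evening, plus one stray tap.
taps = [
    (0.0, 0.0, 8.0), (0.1, 0.0, 8.2), (0.0, 0.1, 8.1),
    (0.0, 0.0, 18.0), (0.1, 0.1, 18.3), (0.05, 0.0, 18.1),
    (5.0, 5.0, 12.0),
]
tight = st_dbscan(taps, eps_space=0.5, eps_time=1.0, min_pts=3)
loose = st_dbscan(taps, eps_space=0.5, eps_time=24.0, min_pts=3)
print(tight)  # morning and evening visits form separate clusters
print(loose)  # with a day-long time threshold they merge, as in spatial-only DBSCAN
```

Setting `eps_time` large enough to cover the whole day reduces the sketch to ordinary spatial DBSCAN, so the two runs above show how the temporal threshold splits one location into distinct activity episodes.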

Citations: 0
Evaluating quality of ontology-driven conceptual models abstractions
IF 2.7 Zone 3 (Computer Science) Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-07-14 DOI: 10.1016/j.datak.2024.102342
Elena Romanenko, Diego Calvanese, Giancarlo Guizzardi

The complexity of an (ontology-driven) conceptual model correlates strongly with the complexity of the domain and software for which it is designed. With that in mind, an algorithm for producing abstractions of ontology-driven conceptual models was previously proposed. In this paper, we empirically evaluate the quality of the abstractions it produces. First, we implemented and tested the latest version of the algorithm over a FAIR catalog of models represented in the ontology-driven conceptual modeling language OntoUML. Second, we performed three user studies to evaluate the usefulness of the resulting abstractions as perceived by modelers. This paper reports the findings of these experiments and reflects on how they can be used to improve the existing algorithm.

Citations: 0
An interactive approach to semantic enrichment with geospatial data
IF 2.7 Zone 3 (Computer Science) Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-07-04 DOI: 10.1016/j.datak.2024.102341
Flavio De Paoli, Michele Ciavotta, Roberto Avogadro, Emil Hristov, Milena Borukova, Dessislava Petrova-Antonova, Iva Krasteva

The ubiquitous availability of datasets has spurred the use of Artificial Intelligence methods and models to extract valuable insights, unearth hidden patterns, and predict future trends. However, the current process of data collection and linking relies heavily on expert knowledge and domain-specific understanding, which incurs substantial costs in both time and financial resources. Therefore, streamlining the data acquisition, harmonization, and enrichment procedures to deliver high-fidelity datasets readily usable for analytics is paramount. This paper explores the capabilities of SemTUI, a comprehensive framework designed to support the enrichment of tabular data by leveraging semantics and user interaction. Building on SemTUI, an iterative and interactive approach is proposed to enhance the flexibility, usability, and efficiency of geospatial data enrichment. The approach is evaluated through a pilot case study focused on urban planning, with a particular emphasis on geocoding. Using a real-world scenario involving the analysis of kindergarten accessibility within walking distance, the study demonstrates the proficiency of SemTUI in generating precise and semantically enriched location data. The incorporation of human feedback in the enrichment process successfully enhances the quality of the resulting dataset, highlighting SemTUI’s potential for broader applications in geospatial analysis and its usability for users with limited expertise in manipulating geospatial data.

Citations: 0