Developing A Decision Support System for Healthcare Practices: A Design Science Research Approach
Pub Date: 2024-07-17 | DOI: 10.1016/j.datak.2024.102344
Arun Sen, Atish P. Sinha, Cong Zhang
We propose a new approach for designing a decision support system (DSS) for the transformation of healthcare practices. Practice transformation helps practices transition from their current state to the patient-centered medical home (PCMH) model of care. Our approach employs activity theory to derive the elements of practice transformation by designing and integrating two ontologies: a domain ontology and a task ontology. By incorporating both goal-oriented and task-oriented aspects of the practice transformation process and specifying how they interact, our integrated design model for the DSS provides prescriptive knowledge on assessing the current status of a practice with respect to PCMH recognition and navigating efficiently through a complex solution space. This knowledge, which is at a moderate level of abstraction and expressed in a language that practitioners understand, contributes to the literature by providing a formulation for a nascent design theory. We implement the integrated design model as a DSS prototype; validation tests conducted on the prototype indicate that it is superior to the existing PCMH readiness tracking tool with respect to effectiveness, usability, efficiency, and sustainability.
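The abstract gives no implementation details; purely as a sketch of how a domain ontology and a task ontology might be bridged in code, the fragment below uses the owlready2 library. Every name in it (PracticeElement, TransformationTask, addresses) is invented for the example, not taken from the paper.

```python
from owlready2 import Thing, ObjectProperty, get_ontology

# Hypothetical illustration: a domain concept (what a practice consists of)
# and a task concept (what transformation work is done), bridged by an
# object property so goal- and task-oriented views can interact.
onto = get_ontology("http://example.org/pcmh-transformation.owl")

with onto:
    class PracticeElement(Thing):        # domain-ontology side
        pass

    class TransformationTask(Thing):     # task-ontology side
        pass

    class addresses(ObjectProperty):     # bridge between the two ontologies
        domain = [TransformationTask]
        range = [PracticeElement]

# A task instance linked to the domain element it targets (invented data)
scheduling = PracticeElement("SameDayAppointments")
task = TransformationTask("IntroduceOpenScheduling")
task.addresses = [scheduling]
```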
{"title":"Developing A Decision Support System for Healthcare Practices: A Design Science Research Approach","authors":"Arun Sen , Atish P. Sinha , Cong Zhang","doi":"10.1016/j.datak.2024.102344","DOIUrl":"10.1016/j.datak.2024.102344","url":null,"abstract":"<div><p>We propose a new approach for designing a decision support system (DSS) for the transformation of healthcare practices. Practice transformation helps practices transition from their current state to patient-centered medical home (PCMH) model of care. Our approach employs activity theory to derive the elements of practice transformation by designing and integrating two ontologies: a domain ontology and a task ontology. By incorporating both goal-oriented and task-oriented aspects of the practice transformation process and specifying how they interact, our integrated design model for the DSS provides prescriptive knowledge on assessing the current status of a practice with respect to PCMH recognition and navigating efficiently through a complex solution space. This knowledge, which is at a moderate level of abstraction and expressed in a language that practitioners understand, contributes to the literature by providing a formulation for a nascent design theory. We implement the integrated design model as a DSS prototype; results of validation tests conducted on the prototype indicate that it is superior to the existing PCMH readiness tracking tool with respect to effectiveness, usability, efficiency, and sustainability.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"154 ","pages":"Article 102344"},"PeriodicalIF":2.7,"publicationDate":"2024-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141840536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Increasing the precision of public transit user activity location detection from smart card data analysis via spatial–temporal DBSCAN
Pub Date: 2024-07-15 | DOI: 10.1016/j.datak.2024.102343
Fehmi Can Ozer, Hediye Tuydes-Yaman, Gulcin Dalkic-Melek
Smart card (SC) systems have been increasingly adopted by public transit (PT) agencies all over the world, facilitating not only fare collection but also PT service analyses and evaluations. Spatial clustering is one of the most important methods for investigating this big data in terms of activity locations, travel patterns, user behaviours, etc. Moreover, spatio-temporal analysis of the clusters provides further precision in detecting PT traveller activity locations and durations. This study investigates and compares the effectiveness of two density-based clustering algorithms, DBSCAN and ST-DBSCAN. The numeric results are obtained using SC data from the public bus system of the metropolitan city of Konya, Turkey: the clustering algorithms are applied to a sample of this smart card data, and activity clusters are detected for the users. The results suggest that ST-DBSCAN produces more compact clusters in both time and space, benefiting transportation researchers who want to accurately detect passengers' individual activity regions from SC data.
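The abstract contains no code; as a minimal sketch of the idea that separates ST-DBSCAN from DBSCAN (neighbors must be close in both space and time), the fragment below encodes the spatio-temporal neighborhood as a precomputed distance matrix for scikit-learn's DBSCAN. All coordinates, timestamps, and thresholds are invented for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import DBSCAN

# Toy smart-card taps: (x, y) in meters, t in minutes (invented values).
xy = np.array([[0, 0], [5, 4], [3, 2], [500, 500], [502, 498], [4, 3]], float)
t = np.array([0, 2, 4, 100, 103, 400], float)

eps_space, eps_time, min_samples = 10.0, 15.0, 2

# ST-DBSCAN neighborhood: two taps are neighbors only if they are close
# in BOTH space and time. Pairs that are temporally too far apart get a
# large sentinel distance, pushing them beyond eps_space.
d_space = cdist(xy, xy)
d_time = np.abs(t[:, None] - t[None, :])
d_st = np.where(d_time <= eps_time, d_space, 1e12)

labels_st = DBSCAN(eps=eps_space, min_samples=min_samples,
                   metric="precomputed").fit_predict(d_st)

# Plain spatial DBSCAN ignores time entirely.
labels_sp = DBSCAN(eps=eps_space, min_samples=min_samples).fit_predict(xy)

print("ST-DBSCAN:", labels_st)
print("DBSCAN:   ", labels_sp)
```

With plain DBSCAN the temporally isolated sixth tap is absorbed into the first spatial cluster; the spatio-temporal variant leaves it as noise, which is the kind of extra precision the study measures.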
{"title":"Increasing the precision of public transit user activity location detection from smart card data analysis via spatial–temporal DBSCAN","authors":"Fehmi Can Ozer , Hediye Tuydes-Yaman , Gulcin Dalkic-Melek","doi":"10.1016/j.datak.2024.102343","DOIUrl":"10.1016/j.datak.2024.102343","url":null,"abstract":"<div><p>Smart Card (SC) systems have been increasingly adopted by public transit (PT) agencies all over the world, which facilitates not only fare collection but also PT service analyses and evaluations. Spatial clustering is one of the most important methods to investigate this big data in terms of activity locations, travel patterns, user behaviours, etc. Besides spatio-temporal analysis of the clusters provide further precision for detection of PT traveller activity locations and durations. This study focuses on investigation and comparison of the effectiveness of two density-based clustering algorithms, DBSCAN, and ST-DBSCAN. The numeric results are obtained using SC data (public bus system) from the metropolitan city of Konya, Turkey, and clustering algorithms are applied to a sample of this smart card data, and activity clusters are detected for the users. The results of the study suggested that ST-DBSCAN constitutes more compact clusters in both time and space for transportation researchers who want to accurately detect passengers’ individual activity regions using SC data.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"153 ","pages":"Article 102343"},"PeriodicalIF":2.7,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141716049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluating quality of ontology-driven conceptual models abstractions
Pub Date: 2024-07-14 | DOI: 10.1016/j.datak.2024.102342
Elena Romanenko, Diego Calvanese, Giancarlo Guizzardi
The complexity of an (ontology-driven) conceptual model highly correlates with the complexity of the domain and software for which it is designed. With that in mind, an algorithm for producing ontology-driven conceptual model abstractions was previously proposed. In this paper, we empirically evaluate the quality of the abstractions it produces. First, we implemented and tested the latest version of the algorithm over a FAIR catalog of models represented in the ontology-driven conceptual modeling language OntoUML. Second, we performed three user studies to evaluate the usefulness of the resulting abstractions as perceived by modelers. This paper reports on the findings of these experiments and reflects on how they can be exploited to improve the existing algorithm.
{"title":"Evaluating quality of ontology-driven conceptual models abstractions","authors":"Elena Romanenko , Diego Calvanese , Giancarlo Guizzardi","doi":"10.1016/j.datak.2024.102342","DOIUrl":"10.1016/j.datak.2024.102342","url":null,"abstract":"<div><p>The complexity of an (ontology-driven) conceptual model highly correlates with the complexity of the domain and software for which it is designed. With that in mind, an algorithm for producing ontology-driven conceptual model abstractions was previously proposed. In this paper, we empirically evaluate the quality of the abstractions produced by it. First, we have implemented and tested the last version of the algorithm over a FAIR catalog of models represented in the ontology-driven conceptual modeling language OntoUML. Second, we performed three user studies to evaluate the usefulness of the resulting abstractions as perceived by modelers. This paper reports on the findings of these experiments and reflects on how they can be exploited to improve the existing algorithm.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"153 ","pages":"Article 102342"},"PeriodicalIF":2.7,"publicationDate":"2024-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0169023X24000661/pdfft?md5=3da15f24c92422d6dac0dc27c996166b&pid=1-s2.0-S0169023X24000661-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141705730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An interactive approach to semantic enrichment with geospatial data
Pub Date: 2024-07-04 | DOI: 10.1016/j.datak.2024.102341
Flavio De Paoli, Michele Ciavotta, Roberto Avogadro, Emil Hristov, Milena Borukova, Dessislava Petrova-Antonova, Iva Krasteva
The ubiquitous availability of datasets has spurred the utilization of Artificial Intelligence methods and models to extract valuable insights, unearth hidden patterns, and predict future trends. However, the current process of data collection and linking heavily relies on expert knowledge and domain-specific understanding, which engenders substantial costs in terms of both time and financial resources. Therefore, streamlining the data acquisition, harmonization, and enrichment procedures to deliver high-fidelity datasets readily usable for analytics is paramount. This paper explores the capabilities of SemTUI, a comprehensive framework designed to support the enrichment of tabular data by leveraging semantics and user interaction. Utilizing SemTUI, an iterative and interactive approach is proposed to enhance the flexibility, usability and efficiency of geospatial data enrichment. The approach is evaluated through a pilot case study focused on urban planning, with a particular emphasis on geocoding. Using a real-world scenario involving the analysis of kindergarten accessibility within walking distance, the study demonstrates the proficiency of SemTUI in generating precise and semantically enriched location data. The incorporation of human feedback in the enrichment process successfully enhances the quality of the resulting dataset, highlighting SemTUI's potential for broader applications in geospatial analysis and its usability for users with limited expertise in manipulating geospatial data.
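SemTUI itself is not reproduced here; purely as a stand-in for the geocoding step the case study relies on, this sketch uses the geopy wrapper around the public Nominatim service. The address and the user-agent string are invented for the example.

```python
from geopy.geocoders import Nominatim

# Minimal geocoding step: turn an address into coordinates that can then
# be semantically enriched and checked for walking-distance accessibility.
geolocator = Nominatim(user_agent="semtui-demo")  # hypothetical agent name
location = geolocator.geocode("bul. Tsarigradsko shose 125, Sofia, Bulgaria")

if location is not None:
    print(location.latitude, location.longitude)
```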
{"title":"An interactive approach to semantic enrichment with geospatial data","authors":"Flavio De Paoli , Michele Ciavotta , Roberto Avogadro , Emil Hristov , Milena Borukova , Dessislava Petrova-Antonova , Iva Krasteva","doi":"10.1016/j.datak.2024.102341","DOIUrl":"10.1016/j.datak.2024.102341","url":null,"abstract":"<div><p>The ubiquitous availability of datasets has spurred the utilization of Artificial Intelligence methods and models to extract valuable insights, unearth hidden patterns, and predict future trends. However, the current process of data collection and linking heavily relies on expert knowledge and domain-specific understanding, which engenders substantial costs in terms of both time and financial resources. Therefore, streamlining the data acquisition, harmonization, and enrichment procedures to deliver high-fidelity datasets readily usable for analytics is paramount. This paper explores the capabilities of <em>SemTUI</em>, a comprehensive framework designed to support the enrichment of tabular data by leveraging semantics and user interaction. Utilizing SemTUI, an iterative and interactive approach is proposed to enhance the flexibility, usability and efficiency of geospatial data enrichment. The approach is evaluated through a pilot case study focused on urban planning, with a particular emphasis on geocoding. Using a real-world scenario involving the analysis of kindergarten accessibility within walking distance, the study demonstrates the proficiency of SemTUI in generating precise and semantically enriched location data. The incorporation of human feedback in the enrichment process successfully enhances the quality of the resulting dataset, highlighting SemTUI’s potential for broader applications in geospatial analysis and its usability for users with limited expertise in manipulating geospatial data.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"153 ","pages":"Article 102341"},"PeriodicalIF":2.7,"publicationDate":"2024-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0169023X2400065X/pdfft?md5=969535621599adcaa2ec5e5d12e392b3&pid=1-s2.0-S0169023X2400065X-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141698385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Explanation, semantics, and ontology
Pub Date: 2024-06-25 | DOI: 10.1016/j.datak.2024.102325
Giancarlo Guizzardi, Nicola Guarino
The terms ‘semantics’ and ‘ontology’ are increasingly appearing together with ‘explanation’, not only in the scientific literature, but also in everyday social interactions, in particular, within organizations. Ontologies have been shown to play a key role in supporting the semantic interoperability of data and knowledge representation structures used by information systems. With the proliferation of applications of Artificial Intelligence (AI) in different settings and the increasing need to guarantee their explainability (but also their interoperability) in critical contexts, the term ‘explanation’ has also become part of the scientific and technical jargon of modern information systems engineering. However, all of these terms are also significantly overloaded. In this paper, we address several interpretations of these notions, with an emphasis on their strong connection. Specifically, we discuss a notion of explanation termed ontological unpacking, which aims at explaining symbolic domain descriptions (e.g., conceptual models, knowledge graphs, logical specifications) by revealing their ontological commitment in terms of their so-called truthmakers, i.e., the entities in one’s ontology that are responsible for the truth of a description. To illustrate this methodology, we employ an ontological theory of relations to explain a symbolic model encoded in the de facto standard modeling language UML. We also discuss the essential role played by ontology-driven conceptual models (resulting from this form of explanation processes) in supporting semantic interoperability tasks. Furthermore, we revisit a proposal for quality criteria for explanations from philosophy of science to assess our approach. Finally, we discuss the relation between ontological unpacking and other forms of explanation in philosophy and science, as well as in the subarea of Artificial Intelligence known as Explainable AI (XAI).
{"title":"Explanation, semantics, and ontology","authors":"Giancarlo Guizzardi , Nicola Guarino","doi":"10.1016/j.datak.2024.102325","DOIUrl":"10.1016/j.datak.2024.102325","url":null,"abstract":"<div><p>The terms ‘semantics’ and ‘ontology’ are increasingly appearing together with ‘explanation’, not only in the scientific literature, but also in everyday social interactions, in particular, within organizations. Ontologies have been shown to play a key role in supporting the semantic interoperability of data and knowledge representation structures used by information systems. With the proliferation of applications of Artificial Intelligence (AI) in different settings and the increasing need to guarantee their explainability (but also their interoperability) in critical contexts, the term ‘explanation’ has also become part of the scientific and technical jargon of modern information systems engineering. However, all of these terms are also significantly overloaded. In this paper, we address several interpretations of these notions, with an emphasis on their strong connection. Specifically, we discuss a notion of explanation termed <em>ontological unpacking</em>, which aims at explaining symbolic domain descriptions (e.g., conceptual models, knowledge graphs, logical specifications) by revealing their <em>ontological commitment</em> in terms of their so-called <em>truthmakers</em>, i.e., the entities in one’s ontology that are responsible for the truth of a description. To illustrate this methodology, we employ an ontological theory of relations to explain a symbolic model encoded in the <em>de facto</em> standard modeling language UML. We also discuss the essential role played by ontology-driven conceptual models (resulting from this form of explanation processes) in supporting semantic interoperability tasks. Furthermore, we revisit a proposal for quality criteria for explanations from philosophy of science to assess our approach. Finally, we discuss the relation between ontological unpacking and other forms of explanation in philosophy and science, as well as in the subarea of Artificial Intelligence known as Explainable AI (XAI).</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"153 ","pages":"Article 102325"},"PeriodicalIF":2.7,"publicationDate":"2024-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0169023X24000491/pdfft?md5=79cddbdaff8702c03d78a624d5f422a3&pid=1-s2.0-S0169023X24000491-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141943335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Topological querying of music scores
Pub Date: 2024-06-20 | DOI: 10.1016/j.datak.2024.102340
Philippe Rigaux, Virginie Thion
For centuries, sheet music scores have been the traditional way to preserve and disseminate Western music works. Nowadays, their content can be encoded in digital formats, making it possible to store music score data in digital score libraries (DSLs). To supply intelligent services (extracting and analysing relevant information from the data), the new generation of DSLs has to rely on digital representations of the score content as structured objects that can be manipulated by high-level operators. In the present paper, we propose the Muster model, a graph-based data model for representing the music content of a digital score, and we discuss the querying of such data through graph pattern queries. We then present a proof of concept of this approach, which stores graph-based representations of music scores in the Neo4j database and performs musical pattern searches through graph pattern queries in the Cypher query language. A benchmark study, using real datasets from the Neuma Digital Score Library, complements this implementation.
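Neither the Muster schema nor the paper's queries appear in the abstract, so the following is only a hedged illustration of the general approach: note events as nodes chained by successor edges, with a melodic motif expressed as a Cypher graph pattern and run through the official neo4j Python driver. The label Event, relationship NEXT, property pitch, and the connection credentials are all assumptions, not the paper's actual model.

```python
from neo4j import GraphDatabase

# Hypothetical schema: (:Event {pitch}) nodes chained by [:NEXT] edges.
# The Cypher pattern finds an ascending three-note motif C4-D4-E4.
QUERY = """
MATCH (a:Event {pitch: 'C4'})-[:NEXT]->(b:Event {pitch: 'D4'})
      -[:NEXT]->(c:Event {pitch: 'E4'})
RETURN a, b, c
"""

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))
with driver.session() as session:
    for record in session.run(QUERY):
        print(record["a"], record["b"], record["c"])
driver.close()
```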
{"title":"Topological querying of music scores","authors":"Philippe Rigaux, Virginie Thion","doi":"10.1016/j.datak.2024.102340","DOIUrl":"https://doi.org/10.1016/j.datak.2024.102340","url":null,"abstract":"<div><p>For centuries, <em>sheet music scores</em> have been the traditional way to preserve and disseminate Western music works. Nowadays, their content can be encoded in digital formats, making possible to store music score data in digital score libraries (DSL). To supply intelligent services (extracting and analysing relevant information from data), the new generation of DSL has to rely on digital representations of the score content as structured objects apt at being manipulated by high-level operators. In the present paper, we propose the <em>Muster</em> model, a graph-based data model for representing the music content of a digital score, and we discuss the querying of such data through graph pattern queries. We then present a proof-of-concept of this approach, which allows storing graph-based representations of music scores in the Neo4j database, and performing musical pattern searches through graph pattern queries with the Cypher query language. A benchmark study, using (real) datasets stemming from the <span>Neuma</span> Digital Score Library, complements this implementation.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"153 ","pages":"Article 102340"},"PeriodicalIF":2.7,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0169023X24000648/pdfft?md5=422689552133a28488b6610063f13879&pid=1-s2.0-S0169023X24000648-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141484752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Entity type inference based on path walking and inter-types relationships
Pub Date: 2024-06-19 | DOI: 10.1016/j.datak.2024.102337
Yi Gan, Zhihui Su, Gaoyong Lu, Pengju Zhang, Aixiang Cui, Jiawei Jiang, Duanbing Chen
As a crucial task for knowledge graphs (KGs), knowledge graph entity type inference (KGET) has garnered increasing attention in recent years. However, recent methods overlook long-distance information pertaining to entities and the inter-types relationships. Neglecting long-distance information omits crucial entity relationships and neighbors, and consequently loses the path information associated with missing types. To address this, a path-walking strategy is utilized to identify two-hop triplet paths of the target entity for encoding long-distance entity information. Moreover, the absence of inter-types relationships can lead to the loss of the neighborhood information of types, such as co-occurrence information. To ensure a comprehensive understanding of inter-types relationships, we consider interactions not only with the types of a single entity but also with the types of different entities. Finally, to comprehensively represent entities for missing types, considering both path information and neighborhood information, we propose an entity type inference model based on path walking and inter-types relationships, denoted "ET-PT". This model effectively extracts comprehensive entity information, thereby obtaining the most complete semantic representation of entities. Experimental results on publicly available datasets demonstrate that the proposed method outperforms state-of-the-art approaches.
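ET-PT itself is not specified in the abstract; the fragment below only illustrates the path-walking step it describes: enumerating two-hop triplet paths around a target entity so that long-distance context can be encoded. The toy triples and names are made up.

```python
from collections import defaultdict

# Toy knowledge graph as (head, relation, tail) triples (invented).
triples = [
    ("einstein", "born_in", "ulm"),
    ("ulm", "located_in", "germany"),
    ("einstein", "won", "nobel_prize"),
    ("nobel_prize", "awarded_in", "stockholm"),
]

out_edges = defaultdict(list)
for h, r, t in triples:
    out_edges[h].append((r, t))

def two_hop_paths(entity):
    """Enumerate two-hop triplet paths starting from `entity`, i.e. the
    long-distance context an ET-PT-style encoder would consume."""
    paths = []
    for r1, mid in out_edges[entity]:
        for r2, tail in out_edges[mid]:
            paths.append((entity, r1, mid, r2, tail))
    return paths

print(two_hop_paths("einstein"))
# [('einstein', 'born_in', 'ulm', 'located_in', 'germany'),
#  ('einstein', 'won', 'nobel_prize', 'awarded_in', 'stockholm')]
```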
{"title":"Entity type inference based on path walking and inter-types relationships","authors":"Yi Gan , Zhihui Su , Gaoyong Lu , Pengju Zhang , Aixiang Cui , Jiawei Jiang , Duanbing Chen","doi":"10.1016/j.datak.2024.102337","DOIUrl":"https://doi.org/10.1016/j.datak.2024.102337","url":null,"abstract":"<div><p>As a crucial task for knowledge graphs (KGs), knowledge graph entity type inference (KGET) has garnered increasing attention in recent years. However, recent methods overlook the long-distance information pertaining to entities and the inter-types relationships. The neglect of long-distance information results in the omission of crucial entity relationships and neighbors, consequently leading to the loss of path information associated with missing types. To address this, a path-walking strategy is utilized to identify two-hop triplet paths of the crucial entity for encoding long-distance entity information. Moreover, the absence of inter-types relationships can lead to the loss of the neighborhood information of types, such as co-occurrence information. To ensure a comprehensive understanding of inter-types relationships, we consider interactions not only with the types of single entity but also with different types of entities. Finally, in order to comprehensively represent entities for missing types, considering both the dimensions of path information and neighborhood information, we propose an entity type inference model based on path walking and inter-types relationships, denoted as “ET-PT”. This model effectively extracts comprehensive entity information, thereby obtaining the most complete semantic representation of entities. The experimental results on publicly available datasets demonstrate that the proposed method outperforms state-of-the-art approaches.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"153 ","pages":"Article 102337"},"PeriodicalIF":2.7,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0169023X24000612/pdfft?md5=3856b1f399f41f93c93401f8aea9503b&pid=1-s2.0-S0169023X24000612-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141484693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An explainable machine learning approach for automated medical decision support of heart disease
Pub Date: 2024-06-19 | DOI: 10.1016/j.datak.2024.102339
Francisco Mesquita, Gonçalo Marques
Coronary heart disease (CHD) is the dominant cause of mortality around the world. Every year, it causes about 3.9 million deaths in Europe and 1.8 million in the European Union (EU), accounting for 45 % and 37 % of all deaths, respectively. Using machine learning (ML) to predict heart disease is one of the most promising research topics, as it can improve healthcare and consequently increase people's longevity. However, although the ability to interpret the results of a predictive model is essential, most related studies do not propose explainable methods. To address this problem, this paper presents a classification method that not only exhibits reliable performance but is also interpretable, ensuring transparency in its decision-making process. SHapley Additive exPlanations (SHAP) was chosen for model interpretability. This approach presents a comparison between different classifiers and parameter tuning techniques, providing all the details necessary to replicate the experiment and help future researchers working in the field. The proposed model achieves performance similar to that of models proposed in the literature, and its predictions are fully interpretable.
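The paper's exact pipeline is not given in the abstract; as a generic sketch of the approach (train a tabular classifier, then explain its predictions with SHAP), the code below uses scikit-learn and the shap package on made-up data.

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

# Made-up stand-in for tabular CHD data: 200 patients, 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# TreeExplainer yields per-feature contributions for each prediction,
# the kind of decision transparency the paper argues clinical tools need.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
print(np.shape(shap_values))
```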
{"title":"An explainable machine learning approach for automated medical decision support of heart disease","authors":"Francisco Mesquita, Gonçalo Marques","doi":"10.1016/j.datak.2024.102339","DOIUrl":"https://doi.org/10.1016/j.datak.2024.102339","url":null,"abstract":"<div><p>Coronary Heart Disease (CHD) is the dominant cause of mortality around the world. Every year, it causes about 3.9 million deaths in Europe and 1.8 million in the European Union (EU). It is responsible for 45 % and 37 % of all deaths in Europe and the European Union, respectively. Using machine learning (ML) to predict heart diseases is one of the most promising research topics, as it can improve healthcare and consequently increase the longevity of people's lives. However, although the ability to interpret the results of the predictive model is essential, most of the related studies do not propose explainable methods. To address this problem, this paper presents a classification method that not only exhibits reliable performance but is also interpretable, ensuring transparency in its decision-making process. SHapley Additive exPlanations, known as the SHAP method was chosen for model interpretability. This approach presents a comparison between different classifiers and parameter tuning techniques, providing all the details necessary to replicate the experiment and help future researchers working in the field. The proposed model achieves similar performance to those proposed in the literature, and its predictions are fully interpretable.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"153 ","pages":"Article 102339"},"PeriodicalIF":2.7,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0169023X24000636/pdfft?md5=9bdfa8117c5ce50d0508986a80981671&pid=1-s2.0-S0169023X24000636-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141592969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A comprehensive methodology to construct standardised datasets for Science and Technology Parks
Pub Date: 2024-06-18 | DOI: 10.1016/j.datak.2024.102338
Olga Francés, Javi Fernández, José Abreu-Salas, Yoan Gutiérrez, Manuel Palomar
This work presents a standardised approach to creating datasets for Science and Technology Parks (STPs), facilitating future analysis of STP characteristics, trends and performance. STPs are the most representative examples of innovation ecosystems. The ETL (extraction-transformation-load) structure was adapted to a global field study of STPs; a selection stage and quality check were incorporated, and the methodology was applied to Spanish STPs. The study applies diverse techniques such as expert labelling and information extraction using language technologies. A novel methodology for building quality, standardised STP datasets was designed and applied to a Spanish case study with 49 STPs. An updatable dataset and a list of the main features impacting STPs are presented. Twenty-one (n = 21) core features were refined and selected, fifteen of which (71.4 %) are robust enough for further quality analysis. The methodology integrates different sources with heterogeneous information that is often decentralised, disaggregated and in different formats: Excel files, and unstructured information in HTML or PDF format. The existence of this updatable dataset and the defined methodology will enable powerful AI tools to be applied for more sophisticated analysis, such as taxonomy, monitoring, and predictive and prescriptive analytics in the innovation ecosystems field.
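The full methodology lives in the paper; the sketch below only mirrors its extract-transform-load skeleton for heterogeneous STP sources using pandas. File names, the URL, and the core-feature columns are invented.

```python
import pandas as pd

# Extract: heterogeneous, decentralised sources (invented file names).
excel_part = pd.read_excel("stp_directory.xlsx")            # structured source
scraped_part = pd.read_html("https://example.org/stps")[0]  # HTML table

# Transform: harmonise column names and keep the agreed core features.
CORE_FEATURES = ["name", "region", "tenant_count", "founding_year"]
frames = []
for df in (excel_part, scraped_part):
    df = df.rename(columns=str.lower)
    frames.append(df.reindex(columns=CORE_FEATURES))

# Load: a single standardised, updatable dataset ready for analysis.
dataset = pd.concat(frames, ignore_index=True).drop_duplicates("name")
dataset.to_csv("spanish_stps.csv", index=False)
```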
{"title":"A comprehensive methodology to construct standardised datasets for Science and Technology Parks","authors":"Olga Francés, Javi Fernández, José Abreu-Salas, Yoan Gutiérrez, Manuel Palomar","doi":"10.1016/j.datak.2024.102338","DOIUrl":"https://doi.org/10.1016/j.datak.2024.102338","url":null,"abstract":"<div><p>This work presents a standardised approach to create datasets for Science and Technology Parks (STPs), facilitating future analysis of STP characteristics, trends and performance. STPs are the most representative examples of innovation ecosystems. The ETL (extraction-transformation-load) structure was adapted to a global field study of STPs. A selection stage and quality check were incorporated, and the methodology was applied to Spanish STPs. This study applies diverse techniques such as expert labelling and information extraction which uses language technologies. A novel methodology for building quality and standardised STP datasets was designed and applied to a Spanish STP case study with 49 STPs. An updatable dataset and a list of the main features impacting STPs are presented. Twenty-one (<em>n</em> = 21) core features were refined and selected, with fifteen of them (71.4 %) being robust enough for developing further quality analysis. The methodology presented integrates different sources with heterogeneous information that is often decentralised, disaggregated and in different formats: excel files, and unstructured information in HTML or PDF format. The existence of this updatable dataset and the defined methodology will enable powerful AI tools to be applied that focus on more sophisticated analysis, such as taxonomy, monitoring, and predictive and prescriptive analytics in the innovation ecosystems field.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"153 ","pages":"Article 102338"},"PeriodicalIF":2.7,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141542542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Providing healthcare shopping advice through knowledge-based virtual agents
Pub Date: 2024-06-14 | DOI: 10.1016/j.datak.2024.102336
Claire Deventer, Pietro Zidda
Knowledge-based virtual shopping agents, which advise their users about which products to buy, are widely used in technical markets such as healthcare e-commerce. To ensure the proper adoption of this technology, it is important to consider aspects of users' psychology early in the software design process. While traditional adoption models such as UTAUT-2 work well for many technologies, they overlook important specificities of the healthcare e-commerce domain and of knowledge-based virtual agent technology. Drawing upon the health information technology and virtual agent literature, we propose a complementary adoption model incorporating new predictors and moderators that reflect these domains' specificities. The model is tested using 903 observations gathered through an online survey conducted in collaboration with a major actor in the herbal medicine market. Our model can serve as a basis for many phases of knowledge-based agent software development. We propose actionable recommendations for practitioners and ideas for further research.
{"title":"Providing healthcare shopping advice through knowledge-based virtual agents","authors":"Claire Deventer, Pietro Zidda","doi":"10.1016/j.datak.2024.102336","DOIUrl":"10.1016/j.datak.2024.102336","url":null,"abstract":"<div><p>Knowledge-based virtual shopping agents, that advise their users about which products to buy, are well used in technical markets such as healthcare e-commerce. To ensure the proper adoption of this technology, it is important to consider aspects of users’ psychology early in the software design process. When traditional adoption models such as UTAUT-2 work well for many technologies, they overlook important specificities of the healthcare e-commerce domain and of knowledge-based virtual agents technology. Drawing upon health information technology and virtual agent literature, we propose a complementary adoption model incorporating new predictors and moderators reflecting these domains’ specificities. The model is tested using 903 observations gathered through an online survey conducted in collaboration with a major actor in the herbal medicine market. Our model can serve as a basis for many phases of the knowledge-based agents software development. We propose actionable recommendations for practitioners and ideas for further research.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"153 ","pages":"Article 102336"},"PeriodicalIF":2.7,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141412665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}