Artificial intelligence in digital twins—A systematic literature review
Pub Date: 2024-04-03 DOI: 10.1016/j.datak.2024.102304
Tim Kreuzer, Panagiotis Papapetrou, Jelena Zdravkovic
Artificial intelligence and digital twins have become more popular in recent years and have seen usage across different application domains for various scenarios. This study reviews the literature at the intersection of the two fields, where digital twins integrate an artificial intelligence component. We follow a systematic literature review approach, analyzing a total of 149 related studies. In the assessed literature, a variety of problems are approached with an artificial intelligence-integrated digital twin, demonstrating its applicability across different fields. Our findings indicate that there is a lack of in-depth modeling approaches regarding the digital twin, while many articles focus on the implementation and testing of the artificial intelligence component. The majority of publications do not demonstrate a virtual-to-physical connection between the digital twin and the real-world system. Further, only a small portion of studies base their digital twin on real-time data from a physical system, implementing a physical-to-virtual connection.
Data & Knowledge Engineering, Volume 151, Article 102304
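The review's distinction between a physical-to-virtual connection (real-time data flowing into the twin) and a virtual-to-physical connection (the twin acting back on the system) can be illustrated with a minimal sketch. All names here are hypothetical, and the moving-average "AI component" is only a stand-in for the learned models surveyed in the paper:

```python
from collections import deque

class DigitalTwin:
    """Minimal digital-twin sketch: mirrors a physical asset's state from
    streamed sensor readings and runs a toy AI component over them."""

    def __init__(self, window=5):
        self.history = deque(maxlen=window)  # recent physical-to-virtual data

    def ingest(self, reading):
        # Physical-to-virtual connection: update the virtual state in real time.
        self.history.append(reading)

    def predict_next(self):
        # Stand-in AI component: a moving-average forecast of the next reading.
        if not self.history:
            raise ValueError("no sensor data ingested yet")
        return sum(self.history) / len(self.history)

    def control_action(self, threshold):
        # Virtual-to-physical connection: derive an actuation decision.
        return "throttle_down" if self.predict_next() > threshold else "hold"

twin = DigitalTwin(window=3)
for temp in (70.0, 72.0, 74.0):
    twin.ingest(temp)
print(twin.predict_next())      # 72.0
print(twin.control_action(71))  # throttle_down
```

The review's finding is precisely that most surveyed systems implement only the `ingest` side, and many not even that from live data.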
Understanding why some points in a data set are considered anomalies cannot be done without taking into account the structure of the regular points. Whereas many machine learning methods are dedicated either to the identification of anomalies or to the identification of the data's inner structure, a solution is introduced here that addresses both tasks with the same data model, a variant of an isolation forest. The initial algorithm for constructing an isolation forest is revisited to preserve the data's inner structure without affecting the efficiency of the outlier detection. Experiments conducted on both synthetic and real-world data sets show that, in addition to improving the detection of abnormal data points, the proposed variant of the isolation forest allows for a reconstruction of the subspaces of high density. It can therefore serve as a basis for a unified approach to detecting global and local anomalies, a necessary condition for providing users with informative descriptions of the data.
Leveraging an Isolation Forest to Anomaly Detection and Data Clustering
Véronne Yepmo, Grégory Smits, Marie-Jeanne Lesot, Olivier Pivert
Pub Date: 2024-03-28 DOI: 10.1016/j.datak.2024.102302
Data & Knowledge Engineering, Volume 151, Article 102302
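The variant builds on the standard isolation forest, whose core idea is that anomalies become isolated in fewer random splits than regular points. A plain re-implementation of that classic scoring scheme (not the authors' structure-preserving variant) can be sketched as:

```python
import math
import random

def isolation_tree_depth(point, data, depth=0, max_depth=10):
    # Recursively partition data with random axis-aligned splits; the depth
    # at which `point` ends up alone approximates its isolation path length.
    if depth >= max_depth or len(data) <= 1:
        return depth
    d = random.randrange(len(point))
    lo = min(row[d] for row in data)
    hi = max(row[d] for row in data)
    if lo == hi:
        return depth
    split = random.uniform(lo, hi)
    # Keep only the rows that fall on the same side of the split as `point`.
    side = [row for row in data if (row[d] < split) == (point[d] < split)]
    return isolation_tree_depth(point, side, depth + 1, max_depth)

def anomaly_score(point, data, trees=100):
    # Average path length over many random trees; shorter paths = more anomalous.
    avg = sum(isolation_tree_depth(point, data) for _ in range(trees)) / trees
    # Normalization from the original isolation forest paper: 2^(-avg / c(n)),
    # with c(n) = 2 H(n-1) - 2(n-1)/n the expected path length.
    n = len(data)
    c = 2 * (math.log(n - 1) + 0.5772156649) - 2 * (n - 1) / n
    return 2 ** (-avg / c)
```

Scores close to 1 indicate likely anomalies; the paper's contribution is to revisit the split construction so the resulting trees also recover high-density subspaces.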
The unresolved need for dependable guarantees on security, sovereignty, and trust in data ecosystems
Pub Date: 2024-03-19 DOI: 10.1016/j.datak.2024.102301
Johannes Lohmöller , Jan Pennekamp , Roman Matzutt , Carolin Victoria Schneider , Eduard Vlad , Christian Trautwein , Klaus Wehrle
Data ecosystems emerged as a new paradigm to facilitate the automated and massive exchange of data from heterogeneous information sources between different stakeholders. However, the corresponding benefits come with unforeseen risks as sensitive information is potentially exposed, questioning data ecosystem reliability. Consequently, data security is of utmost importance and, thus, a central requirement for successfully realizing data ecosystems. Academia has recognized this requirement, and current initiatives foster sovereign participation via a federated infrastructure where participants retain local control over what data they offer to whom. However, recent proposals place significant trust in remote infrastructure by implementing organizational security measures such as certification processes before the admission of a participant. At the same time, the data sensitivity incentivizes participants to bypass the organizational security measures to maximize their benefit. This issue significantly weakens security, sovereignty, and trust guarantees and highlights that organizational security measures are insufficient in this context. In this paper, we argue that data ecosystems must be extended with technical means to (re)establish dependable guarantees. We underpin this need with three representative use cases for data ecosystems, which cover personal, economic, and governmental data, and systematically map the lack of dependable guarantees in related work. To this end, we identify three enablers of dependable guarantees, namely trusted remote policy enforcement, verifiable data tracking, and integration of resource-constrained participants. These enablers are critical for securely implementing data ecosystems in data-sensitive contexts.
Data & Knowledge Engineering, Volume 151, Article 102301
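Of the three enablers, verifiable data tracking is the easiest to make concrete. One minimal sketch (an illustration, not the paper's design) is an append-only hash chain over data-exchange records, so that any later modification of a logged exchange is detectable by all participants:

```python
import hashlib
import json

def chain_append(chain, record):
    """Append a data-exchange record, linking it to the previous entry's hash."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    payload = json.dumps(record, sort_keys=True)
    digest = hashlib.sha256((prev + payload).encode()).hexdigest()
    chain.append({"record": record, "hash": digest})
    return chain

def chain_verify(chain):
    """Recompute every link; returns False if any record was tampered with."""
    prev = "0" * 64
    for entry in chain:
        payload = json.dumps(entry["record"], sort_keys=True)
        if hashlib.sha256((prev + payload).encode()).hexdigest() != entry["hash"]:
            return False
        prev = entry["hash"]
    return True

log = []
chain_append(log, {"from": "hospital_a", "to": "analyst_b", "dataset": "liver_panel"})
chain_append(log, {"from": "analyst_b", "to": "archive", "dataset": "liver_panel"})
print(chain_verify(log))                 # True
log[0]["record"]["to"] = "someone_else"  # tampering breaks the chain
print(chain_verify(log))                 # False
```

A production design would additionally need signatures and trusted execution, which is exactly the gap the paper argues organizational measures alone cannot close.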
Insights into commonalities of a sample: A visualization framework to explore unusual subset-dataset relationships
Pub Date: 2024-03-12 DOI: 10.1016/j.datak.2024.102299
Nikolas Stege , Michael H. Breitner
Domain experts are driven by business needs, while data analysts develop and use various algorithms, methods, and tools, often without domain knowledge. A major challenge for companies and organizations is integrating data analytics into business processes and workflows. We deduce an interactive process and visualization framework to enable value-creating collaboration in inter- and cross-disciplinary teams. Domain experts and data analysts are both empowered to analyze and discuss results and arrive at well-founded insights and implications. Inspired by a typical auditing problem, we develop and apply a visualization framework to single out unusual data in general subsets for potential further investigation. Our framework is applicable to unusual data detected either manually by domain experts or by algorithms applied by data analysts. Application examples show typical interaction, collaboration, visualization, and decision support.
Data & Knowledge Engineering, Volume 151, Article 102299
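One simple way to "single out unusual data" in a subset, which experts and analysts could then discuss together, is a z-score screen of subset values against the full dataset. This is an illustrative stand-in for the kind of algorithmic detection the framework visualizes, not the framework itself:

```python
import statistics

def unusual_rows(dataset, subset, z_threshold=2.0):
    """Flag subset values whose z-score against the full dataset exceeds the threshold."""
    mean = statistics.fmean(dataset)
    stdev = statistics.stdev(dataset)
    return [x for x in subset if abs(x - mean) / stdev > z_threshold]

# Hypothetical auditing example: invoice amounts, with one outlier in the sample.
invoices = [100, 102, 98, 101, 99, 103, 97, 100, 250]
audit_sample = [101, 250, 99]
print(unusual_rows(invoices, audit_sample))  # [250]
```

The framework's point is that such flags are a starting point for joint interpretation, not a verdict.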
Time-aware structure matching for temporal knowledge graph alignment
Pub Date: 2024-03-11 DOI: 10.1016/j.datak.2024.102300
Wei Jia , Ruizhe Ma , Li Yan , Weinan Niu , Zongmin Ma
Entity alignment, which aims at identifying equivalent entity pairs across multiple knowledge graphs (KGs), serves as a vital step for knowledge fusion. As the majority of KGs undergo continuous evolution, existing solutions utilize graph neural networks (GNNs) to tackle entity alignment within temporal knowledge graphs (TKGs). However, these prevailing methods often overlook the consequential impact of relation-embedding generation on entity embeddings through inherent structures. In this paper, we propose a novel model named Time-aware Structure Matching based on GNNs (TSM-GNN) that encompasses the learning of both topological and inherent structures. Our key innovation lies in a unique method for generating relation embeddings, which can enhance entity embeddings via inherent structure. Specifically, we utilize the translation property of knowledge graphs to obtain entity embeddings mapped into a time-aware vector space. Subsequently, we employ GNNs to learn global entity representations. To better capture useful information from neighboring relations and entities, we introduce a time-aware attention mechanism that assigns different importance weights to different time-aware inherent structures. Experimental results on three real-world datasets demonstrate that TSM-GNN outperforms several state-of-the-art approaches for entity alignment between TKGs.
Data & Knowledge Engineering, Volume 151, Article 102300
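The "translation property" the authors exploit is the TransE idea that head + relation ≈ tail. A time-aware flavor can be sketched by adding a timestamp embedding to the translation, scoring how plausible a dated triple is. This sketch only illustrates the translation scoring; TSM-GNN's actual architecture is GNN-based with attention:

```python
import math
import random

def time_aware_score(head, relation, time_vec, tail):
    # L2 distance of (head + relation + time_vec) from tail; lower = more plausible.
    return math.sqrt(sum((h + r + ts - t) ** 2
                         for h, r, ts, t in zip(head, relation, time_vec, tail)))

random.seed(7)
dim = 4
h = [random.gauss(0, 1) for _ in range(dim)]
r = [random.gauss(0, 1) for _ in range(dim)]
ts = [random.gauss(0, 1) for _ in range(dim)]
# A tail embedding that nearly satisfies the time-aware translation...
true_tail = [h[i] + r[i] + ts[i] + random.gauss(0, 0.01) for i in range(dim)]
# ...versus an unrelated random tail.
random_tail = [random.gauss(0, 1) for _ in range(dim)]
print(time_aware_score(h, r, ts, true_tail) < time_aware_score(h, r, ts, random_tail))  # True
```

For alignment, entities from two TKGs whose embeddings score consistently against shared relations and timestamps become candidate equivalent pairs.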
A knowledge-sharing platform for space resources
Pub Date: 2024-02-29 DOI: 10.1016/j.datak.2024.102286
Marcos Da Silveira, Louis Deladiennee, Emmanuel Scolan, Cedric Pruski
The ever-increasing interest of academia, industry, and government institutions in space resource information highlights the difficulty of finding, accessing, integrating, and reusing this information. Although information is regularly published on the internet, it is disseminated on many different websites and in different formats, including scientific publications, patents, news, and reports. We are currently developing a knowledge management and sharing platform for space resources. This tool, which relies on the combined use of knowledge graphs and ontologies, formalises the domain knowledge contained in the above-mentioned documents and makes it more readily available to the community. In this article, we describe the concepts and techniques of knowledge extraction and management adopted during the design and implementation of the platform.
Data & Knowledge Engineering, Volume 151, Article 102286
Knowledge graph-based image classification
Pub Date: 2024-02-28 DOI: 10.1016/j.datak.2024.102285
Franck Anaël Mbiaya , Christel Vrain , Frédéric Ros , Thi-Bich-Hanh Dao , Yves Lucas
This paper introduces a deep learning method for image classification that leverages knowledge formalized as a graph built from attribute/value pairs. The proposed method investigates a loss function that adaptively combines the classical cross-entropy commonly used in deep learning with a novel penalty function. The penalty is derived from the node representations obtained by embedding the knowledge graph and incorporates the proximity between class and image nodes. Its formulation enables the model to focus on identifying the boundary between the classes that are most difficult to distinguish. Experimental results on several image databases demonstrate improved performance compared to state-of-the-art methods, including classical deep learning algorithms and recent algorithms that incorporate knowledge represented by a graph.
Data & Knowledge Engineering, Volume 151, Article 102285
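The general shape of such a combined objective, cross-entropy plus a penalty on the distance between an image node and its class node in the embedding space, can be sketched as follows. The hinge-style penalty and the weighting `lam` are illustrative choices, not the paper's exact formulation:

```python
import math

def cross_entropy(probs, true_idx):
    # Standard negative log-likelihood of the true class.
    return -math.log(probs[true_idx])

def graph_penalty(image_emb, class_embs, true_idx):
    """Penalize the image node being closer to a wrong class node than to
    its true class node in the knowledge-graph embedding space."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    d_true = dist(image_emb, class_embs[true_idx])
    d_other = min(dist(image_emb, c) for i, c in enumerate(class_embs) if i != true_idx)
    return max(0.0, d_true - d_other)  # hinge: zero when the true class is nearest

def combined_loss(probs, image_emb, class_embs, true_idx, lam=0.5):
    return cross_entropy(probs, true_idx) + lam * graph_penalty(image_emb, class_embs, true_idx)

probs = [0.7, 0.2, 0.1]
class_embs = [[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]]
well_placed = [0.1, 0.1]  # image node near its true class (class 0)
misplaced = [2.5, 0.0]    # image node closer to class 1
print(combined_loss(probs, well_placed, class_embs, 0) < combined_loss(probs, misplaced, class_embs, 0))  # True
```

The penalty only activates near confusable class boundaries, which matches the paper's stated focus on the hardest-to-distinguish classes.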
Improving the identification of relevant variants in genome information systems: A methodological approach with a case study on early onset Alzheimer's disease
Pub Date: 2024-02-09 DOI: 10.1016/j.datak.2024.102284
Mireia Costa, Ana León, Óscar Pastor
Alzheimer's disease is the most common type of dementia in the elderly. Nevertheless, there is an early onset form that is difficult to diagnose precisely. As the genetic component is the most critical factor in developing this disease, identifying relevant genetic variants is key to obtaining a more reliable and straightforward diagnosis. The information about these variants is stored in an extensive number of data sources, which must be carefully analyzed to select only the information of sufficient quality to be used in a clinical setting. This selection has become complex due to the growing volume of available genomic information. The SILE method was designed to systematize the identification of relevant variants for a disease in this challenging context. However, several problems with how SILE identifies relevant variants were discovered when applying the method to the early onset form of Alzheimer's disease. More specifically, the method failed to address specific features of this disease, such as its low incidence and familial component. This paper proposes an improvement of the identification process defined by the SILE method to make it applicable to a broader spectrum of diseases. Details of how the proposed solution has been applied are also reported. As a result of this improvement, a set of 29 variants has been identified (25 Accepted with Limited Evidence and 4 Accepted with Moderate Evidence). This constitutes a valuable result that facilitates and reinforces the genetic diagnosis of the disease.
Data & Knowledge Engineering, Volume 151, Article 102284
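The outcome labels above ("Accepted with Limited/Moderate Evidence") suggest a rule-based evidence triage. The toy classifier below illustrates that style of decision procedure only; the inputs, rules, and thresholds are invented for illustration and are not SILE's actual criteria:

```python
def classify_variant(sources_agreeing, has_conflicting_report, curated_review):
    """Toy evidence triage inspired by the paper's outcome labels.
    All rules here are hypothetical, not SILE's published criteria."""
    if has_conflicting_report:
        return "Rejected"
    if curated_review and sources_agreeing >= 2:
        return "Accepted with Moderate Evidence"
    if sources_agreeing >= 1:
        return "Accepted with Limited Evidence"
    return "Insufficient Evidence"

print(classify_variant(3, False, True))   # Accepted with Moderate Evidence
print(classify_variant(1, False, False))  # Accepted with Limited Evidence
```

The paper's contribution is precisely to refine such rules so they remain usable for low-incidence, familial diseases.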
Fuzzy-Ontology based knowledge driven disease risk level prediction with optimization assisted ensemble classifier
Pub Date: 2024-02-04 DOI: 10.1016/j.datak.2024.102278
Huma Parveen , Syed Wajahat Abbas Rizvi , Raja Sarath Kumar Boddu
Modern medical diagnosis is a complex procedure, requiring precise patient data, scientific knowledge accumulated over many years, and a theoretical understanding of the related medical literature. To improve accuracy and reduce the time to diagnosis, clinical decision support systems (DSS) were introduced, which incorporate data mining schemes to enhance disease-diagnosing accuracy. This work proposes a new disease-prediction model that involves three stages. Initially, improved stemming and tokenization are carried out in the pre-processing stage. Then, fuzzy-ontology, improved mutual information (MI), and correlation features are extracted. Prediction is then carried out via an ensemble of classifiers comprising improved fuzzy logic, Long Short-Term Memory (LSTM), a Deep Convolutional Neural Network (DCNN), and a Bidirectional Gated Recurrent Unit (Bi-GRU). The outcomes from the improved fuzzy logic, LSTM, and DCNN are further classified via the Bi-GRU, which produces the final results. Specifically, the Bi-GRU weights are optimally tuned using Deer Hunting Update Explored Arithmetic Optimization (DHUEAO). Finally, the efficiency of the proposed work is evaluated with respect to a variety of metrics.
Data & Knowledge Engineering, Volume 151, Article 102278
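The ensemble wiring, three base classifiers whose outputs a final model re-classifies, is a form of stacking. A minimal stand-in, using a fixed weighted meta-step in place of the paper's DHUEAO-tuned Bi-GRU, looks like this (all probability values are hypothetical):

```python
def stack_predict(base_outputs, meta_weights):
    """Combine per-class probability vectors from several base models with a
    weighted meta-step (a stand-in for a tuned meta-classifier)."""
    n_classes = len(base_outputs[0])
    combined = [sum(w * out[c] for w, out in zip(meta_weights, base_outputs))
                for c in range(n_classes)]
    total = sum(combined)
    probs = [v / total for v in combined]  # renormalize to a distribution
    return probs.index(max(probs)), probs

# Hypothetical disease-risk probabilities (low, medium, high) from three base models.
fuzzy_out = [0.2, 0.5, 0.3]
lstm_out = [0.1, 0.6, 0.3]
dcnn_out = [0.3, 0.4, 0.3]
label, probs = stack_predict([fuzzy_out, lstm_out, dcnn_out], meta_weights=[0.3, 0.4, 0.3])
print(label)  # 1  (medium risk)
```

In the paper, the meta-step is itself a learned Bi-GRU whose weights are tuned by the DHUEAO optimizer rather than fixed by hand.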
Fusion learning of preference and bias from ratings and reviews for item recommendation
Pub Date: 2024-02-03 DOI: 10.1016/j.datak.2024.102283
Junrui Liu , Tong Li , Zhen Yang , Di Wu , Huan Liu
Recommendation methods improve rating-prediction performance by learning the selection-bias phenomenon: users tend to rate items they like. These methods model selection bias by calculating the propensities of ratings, but inaccurate propensities can introduce more noise, fail to model selection bias, and reduce prediction performance. We argue that learning interaction features can effectively model selection bias and improve model performance, as interaction features explain the reason for the trend. Reviews can be used to model interaction features because they have a strong intrinsic correlation with user interests and item interactions. In this study, we propose a preference- and bias-oriented fusion learning model (PBFL) that models interaction features based on reviews and user preferences to make rating predictions. Our proposal both embeds traditional user preferences in reviews, interactions, and ratings, and considers word-distribution bias and review quoting to model interaction features. Six real-world datasets are used to demonstrate effectiveness and performance. PBFL achieves an average improvement of 4.46% in root-mean-square error (RMSE) and 3.86% in mean absolute error (MAE) over the best baseline.
Data & Knowledge Engineering, Volume 150, Article 102283
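The reported gains (4.46% RMSE, 3.86% MAE) are relative error reductions against the best baseline. The metrics and that comparison can be computed as follows; the rating values below are made up to show the calculation:

```python
import math

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def relative_improvement(baseline_err, model_err):
    # Positive when the model's error is lower than the baseline's.
    return 100.0 * (baseline_err - model_err) / baseline_err

ratings = [4.0, 3.0, 5.0, 2.0]          # hypothetical held-out ratings
baseline_preds = [3.5, 3.5, 4.0, 3.0]   # hypothetical baseline predictions
model_preds = [3.8, 3.2, 4.6, 2.4]      # hypothetical model predictions
print(round(relative_improvement(rmse(ratings, baseline_preds),
                                 rmse(ratings, model_preds)), 1))  # 60.0
```

Note that lower is better for both metrics, so a positive percentage means the model beat the baseline.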