Large knowledge graphs such as DBpedia, Wikidata, and YAGO are rich sources of structured information, widely used in domains like information retrieval and recommendation. However, their potential for supporting knowledge workers remains underexploited. As in other complex networks where specialized metrics have emerged (e.g., bibliometrics), knowledge graphs offer promising opportunities for domain-specific analysis, particularly in areas lacking established quantitative tools. In this paper, we present Rankingdom, a web application that lets knowledge workers analyze any entity from Wikidata despite its heterogeneity and sheer volume. For this purpose, we introduce complementary indicators to position or evaluate any Wikidata entity within its domain, addressing analysis tasks that are challenging for traditional methods. We propose an on-demand analysis architecture that distributes computation to clients while centralizing results for frugality, relying on a client-side SPARQL parallelization engine (SParaQL). We demonstrate the effectiveness of SParaQL through performance tests on DBpedia, Wikidata, and YAGO with two analytical queries, as well as via a real-world deployment including caching of over 10,000 entities and a user study.
{"title":"Rankingdom: A cooperative architecture for the on-demand analysis of Wikidata","authors":"Hassan Abdallah , Béatrice Markhoff , Manon Ovide , Louise Parkin , Arnaud Soulet","doi":"10.1016/j.datak.2026.102564","DOIUrl":"10.1016/j.datak.2026.102564","url":null,"abstract":"<div><div>Large knowledge graphs such as DBpedia, Wikidata, and YAGO are rich sources of structured information, widely used in domains like information retrieval and recommendation. However, their potential for supporting knowledge workers remains underexploited. As in other complex networks where specialized metrics have emerged (e.g., bibliometrics), knowledge graphs offer promising opportunities for domain-specific analysis — particularly in areas lacking established quantitative tools. In this paper, we present <span><math><mtext>Rankingdom</mtext></math></span>, a web application for knowledge workers to analyze any entity from Wikidata despite its heterogeneity and sheer volume. For this purpose, we introduce complementary indicators to position or evaluate any Wikidata entity within its domain, addressing analysis tasks that are challenging for traditional methods. We propose an on-demand analysis architecture that distributes computation to clients while centralizing results for frugality, by benefiting from a client-side SPARQL parallelization engine (<span>SParaQL</span>). We demonstrate the effectiveness of <span>SParaQL</span> through performance tests on DBpedia, Wikidata, and YAGO with two analytical queries, as well as via a real-world deployment including caching of over 10,000 entities and a user study.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"164 ","pages":"Article 102564"},"PeriodicalIF":2.7,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146175398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-07-01. Epub Date: 2026-02-05. DOI: 10.1016/j.datak.2026.102575
Mohammad Hossein Moslemi, Amir Mousavi, Behshid Behkamal, Mostafa Milani
Entity matching (EM) is a fundamental task in data integration and analytics, essential for identifying records that refer to the same real-world entity across diverse sources. In practice, datasets often differ widely in structure, format, schema, and semantics, creating substantial challenges for EM. We refer to this setting as Heterogeneous EM (HEM).
This survey offers a unified perspective on HEM by introducing a taxonomy, grounded in prior work, that distinguishes two primary categories, representation and semantic heterogeneity, and their subtypes. The taxonomy provides a systematic lens for understanding how variations in data form and meaning shape the complexity of matching tasks. We then connect this framework to the FAIR principles (Findability, Accessibility, Interoperability, and Reusability), demonstrating how they both reveal the challenges of HEM and suggest strategies for mitigating them.
Building on this foundation, we critically review recent EM methods, examining their ability to address different heterogeneity types, and conduct targeted experiments on state-of-the-art models to evaluate their robustness and adaptability under semantic heterogeneity. Our analysis uncovers persistent limitations in current approaches and points to promising directions for future research, including multimodal matching, human-in-the-loop workflows, deeper integration with large language models and knowledge graphs, and fairness-aware evaluation in heterogeneous settings.
{"title":"Heterogeneity in entity matching: A survey and experimental analysis","authors":"Mohammad Hossein Moslemi , Amir Mousavi , Behshid Behkamal , Mostafa Milani","doi":"10.1016/j.datak.2026.102575","DOIUrl":"10.1016/j.datak.2026.102575","url":null,"abstract":"<div><div>Entity matching (EM) is a fundamental task in data integration and analytics, essential for identifying records that refer to the same real-world entity across diverse sources. In practice, datasets often differ widely in structure, format, schema, and semantics, creating substantial challenges for EM. We refer to this setting as <em>Heterogeneous EM (HEM)</em>.</div><div>This survey offers a unified perspective on HEM by introducing a taxonomy, grounded in prior work, that distinguishes two primary categories–<em>representation</em> and <em>semantic heterogeneity</em>–and their subtypes. The taxonomy provides a systematic lens for understanding how variations in data form and meaning shape the complexity of matching tasks. We then connect this framework to the <em>FAIR principles</em>–<em>Findability</em>, <em>Accessibility</em>, <em>Interoperability</em>, and <em>Reusability</em>–demonstrating how they both reveal the challenges of HEM and suggest strategies for mitigating them.</div><div>Building on this foundation, we critically review recent EM methods, examining their ability to address different heterogeneity types, and conduct targeted experiments on state-of-the-art models to evaluate their robustness and adaptability under semantic heterogeneity. Our analysis uncovers persistent limitations in current approaches and points to promising directions for future research, including multimodal matching, human-in-the-loop workflows, deeper integration with large language models and knowledge graphs, and fairness-aware evaluation in heterogeneous settings.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"164 ","pages":"Article 102575"},"PeriodicalIF":2.7,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146175392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-07-01. Epub Date: 2026-02-10. DOI: 10.1016/j.datak.2026.102577
Md Arif Rahman, Syed Jalaluddin Hashmi, Kethsiya Gnanajothy, Young-Koo Lee
In modern data-intensive applications, optimizing procedural SQL queries in imperative programs is challenging because of inefficient access to intermediate results that are stored in temporary memory. Both traditional and machine learning-based optimization techniques often fail to create indexes on intermediate results because they focus on previously stored base tables and declarative queries. However, indexing intermediate results is a potential solution for further reducing the cost of processing dependent queries. To the best of our knowledge, no existing work has considered reducing the processing cost of dependent queries that rely on intermediate results. Motivated by this issue, we introduce the intermediate index, a temporary index created on intermediate results within the scope of a single program execution. Leveraging the intermediate index, we propose a procedural SQL query optimization technique called AutoCox that identifies and evaluates the benefits of indexes using a novel what-if analysis method. AutoCox dynamically determines the need for indexing based on the producer–consumer relationships, cardinality, and selectivity of intermediate results, accounting for both index creation overhead and runtime reuse. AutoCox ensures that intermediate indexes hold up-to-date data, are created only when beneficial, and are automatically dropped afterward. Experimental results show that our approach significantly outperforms existing methods, achieving a 67% reduction in the cost of dependent queries and 261× speedups of the imperative program on a workload. This indicates that our approach effectively bridges a critical gap in the optimization of procedural SQL queries in relational big data processing environments.
{"title":"When temporary results meet intermediate index: An optimization technique of procedural SQL query processing","authors":"Md Arif Rahman , Syed Jalaluddin Hashmi , Kethsiya Gnanajothy , Young-Koo Lee","doi":"10.1016/j.datak.2026.102577","DOIUrl":"10.1016/j.datak.2026.102577","url":null,"abstract":"<div><div>In modern data-intensive applications, optimizing procedural SQL queries in imperative programs is challenging because of inefficient access to intermediate results that are stored in temporary memory. Both the traditional and machine learning-based optimization techniques often fail to create indexes on intermediate results because they focus on previously stored base tables and declarative queries. However, indexing on intermediate results can be a potential solution to further reducing the cost of processing dependent queries. To the best of our knowledge, no existing work has considered reducing the processing cost of dependent queries that rely on intermediate results. Inspired by this issue, we introduce the intermediate index, a temporary index created on intermediate results within the scope of a single program execution. Leveraging the intermediate index, we propose a procedural SQL query optimization technique called <em>AutoCox</em> that identifies and evaluates the benefits of indexes using a novel <em>what-if</em> analysis method. <em>AutoCox</em> dynamically determines the need for indexing based on the producer–consumer relationships, cardinality, and selectivity of intermediate results, accounting for both index creation overhead and runtime reuse. <em>AutoCox</em> ensures that intermediate indexes hold up-to-date data, which are created only when beneficial and automatically dropped afterward. Experimental results show that our approach significantly outperforms existing methods, achieving a magnitude of 67% cost reduction of dependent queries and 261<span><math><mo>×</mo></math></span> speedups of the imperative program using a workload. It indicates that we effectively bridge a critical gap in the optimization of procedural SQL queries in relational big data processing environments.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"164 ","pages":"Article 102577"},"PeriodicalIF":2.7,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146175360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-05-01. Epub Date: 2026-01-08. DOI: 10.1016/j.datak.2026.102556
Mingjun Xin, Ze He, Zhijun Xiao
The sequential recommendation task is an important research direction in recommendation systems. Previous sequential recommendation research mainly focuses on the user–item interaction sequence and mines collaborative information from it. Although these studies have achieved certain results, they tend to pay less attention to other rich information, such as item descriptions, item labels, and user reviews. In fact, this rich information can aid in learning the embedding representation of items and in modeling user preferences. To tackle this issue, we propose a BERT model and momentum contrastive learning based sequential recommendation method named BertMoSRec. The BERT block uses the BERT model to learn the embedding representation of items in combination with item descriptions, item labels, and user reviews, and then uses an embedding smoothing task to obtain an isotropic semantic representation. In the momentum contrastive learning block, we use a variety of data augmentation methods and maintain a large negative sample queue, which is used for contrastive learning over user–item interaction sequences: it learns the embedding representation of the sequences, captures user preference information, and reduces the requirements for computing resources. Extensive experiments on multiple subsets of the Amazon dataset demonstrate the effectiveness of our proposed method.
{"title":"A BERT model and momentum contrastive learning based sequential recommendation method and its implementation","authors":"Mingjun Xin, Ze He, Zhijun Xiao","doi":"10.1016/j.datak.2026.102556","DOIUrl":"10.1016/j.datak.2026.102556","url":null,"abstract":"<div><div>The sequential recommendation task is an important research direction in the recommendation system. Previous sequential recommendation researches mainly focus on the user–item interaction sequence and mine collaborative information from it. Although these studies have achieved certain results, existing studies tend to pay less attention to other rich information, such as item description, item label, user review, etc. In fact, this rich information can aid in learning the embedding representation of items and modeling user preferences. To tackle this issue, we propose A BERT model and momentum contrastive learning based sequential recommendation method named <strong>BertMoSRec</strong>. The BERT block uses the BERT model to learn the embedding representation of items in combination with item description, item label and user review, and then uses an embedding smoothing task to obtain the isotropic semantic representation. In the momentum contrastive learning block, we use a variety of data augmentation methods to maintain a large negative sample queue, which is used to compare and learn the user item interaction sequence, learn the embedding representation of the sequences, capture user preference information, and reduce the requirements for computing resources. Extensive experiments on multiple subsets of the Amazon dataset demonstrate the effectiveness of our proposed method.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"163 ","pages":"Article 102556"},"PeriodicalIF":2.7,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145980852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-05-01. Epub Date: 2026-01-20. DOI: 10.1016/j.datak.2026.102559
Nicolas Voisine, Lou-Anne Quellet, Marc Boullé, Fabrice Clérot, Anais Collin
Multi-table data is common in organizations, and its analysis is crucial for applications such as fraud detection, service improvement, and customer relationship management. Processing this type of data requires flattening, which transforms the multi-table structure into a single flat table by creating aggregates from the original variables. Several propositionalization tools aim to automate this process, but as data complexity increases due to the number of tables and relationships, the effectiveness of flattening decreases. To enhance the quality of propositionalization, it is essential to develop automated preprocessing systems that optimize the construction of aggregates by focusing on the most informative variables.
The objective of this article is to propose a method for selecting secondary variables and to demonstrate that this approach effectively filters out non-informative variables using a univariate analysis. Finally, we show, on a set of academic datasets, that reducing the number of secondary variables to only those that are truly informative can improve classification performance.
{"title":"Selection of secondary features from multi-table data for classification","authors":"Nicolas Voisine , Lou-Anne Quellet , Marc Boullé , Fabrice Clérot , Anais Collin","doi":"10.1016/j.datak.2026.102559","DOIUrl":"10.1016/j.datak.2026.102559","url":null,"abstract":"<div><div>Multi-table data is common in organizations, and its analysis is crucial for applications such as fraud detection, service improvement, and customer relationship management. Processing this type of data requires flattening, which transforms the multi-table structure into a single flat table by creating aggregates from the original variables. Several propositionalization tools aim to automate this process, but as data complexity increases due to the number of tables and relationships, the effectiveness of flattening decreases. To enhance the quality of propositionalization, it is essential to develop automated preprocessing systems that optimize the construction of aggregates by focusing on the most informative variables.</div><div>The objective of this article is to propose a method for selecting secondary variables and to demonstrate that this approach effectively filters out non-informative variables using a univariate analysis. Finally, we will show, using a set of academic datasets, that reducing the number of secondary variables to only those that are truly informative can improve classification performance.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"163 ","pages":"Article 102559"},"PeriodicalIF":2.7,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146090308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-05-01. Epub Date: 2026-01-08. DOI: 10.1016/j.datak.2026.102557
V. Backiyalakshmi, B. Umadevi
The banking sector plays a significant role in the economic growth of every nation, and most people hold accounts at several banks to transfer money at any time. The proliferation of online banking has brought a concerning rise in fraudulent transactions, posing a persistent challenge for fraud detection, which covers a range of fraudulent activities including insurance, credit card, and accounting fraud. Despite the numerous benefits of online transactions, financial fraud and unauthorized transactions pose significant risks. Researchers have developed various techniques in recent years to improve detection performance, yet handling massive amounts of heterogeneous client data to detect abnormal activities remains time-consuming. To resolve these issues, a new deep learning based approach is designed in this work. First, the data are gathered from benchmark databases and passed to a feature extraction phase, in which Principal Component Analysis (PCA), statistical features, and T-distributed Stochastic Neighbor Embedding (t-SNE) are used to extract informative features from the collected data, reducing noise and irrelevant information and thereby improving training speed. The extracted features are then combined, and optimal weighted fused features are determined using the Modified Random Value Reptile Search Algorithm (MRV-RSA), which further improves training speed and overall detection performance. The optimal weighted fused features are fed to the detection phase, which uses a Dilated Convolution Long Short Term Memory (ConvLSTM) with Multi-scale Dense Attention (DCL-MDA) technique that can handle massive, complex datasets without generalization problems and returns the detection result within a limited time. The efficiency of the model is validated with different metrics and contrasted against other traditional models; the suggested system exceeds the desired value for identifying fraudulent users, enhancing the security level in the banking sector. In the evaluation, the implemented framework attained an accuracy of 93.86% on Dataset 1 and 97.15% on Dataset 2, demonstrating its superior performance and its ability to detect fraud accurately at an earlier stage.
{"title":"MRV-RSA: Developed Modified Random Value Reptile Search Algorithm and Deep Learning based Fraud Detection Model in Banking Sector","authors":"V. Backiyalakshmi , B. Umadevi","doi":"10.1016/j.datak.2026.102557","DOIUrl":"10.1016/j.datak.2026.102557","url":null,"abstract":"<div><div>The banking sector is significant in economic growth in each nation. Also, each and every person has a separate account in diverse banks for effectively transmitting the money at any time. The proliferation of online banking has brought about a concerning rise in fraudulent transactions, posing a persistent challenge for fraud detection. This contains a collection of fraudulent activities, as well as insurance, credit card, and accounting fraud. Despite the numerous benefits of online transactions, the prevalence of financial fraud and unauthorized transactions poses significant risks. Several researchers have constantly developed various techniques in the past few years to improve detection performance. Yet, it takes more duration for handling massive amounts of various client data sizes to detect abnormal activities. With the aim of resolving these issues, a deep learning based new approach is designed in this research work. Initially, the prescribed data are gathered from the benchmark database, then the gathered data is given to the phase of feature extraction. In this phase, the Principal Component Analysis (PCA), statistical features, and T-distributed Stochastic Neighbor Embedding (t-SNE) mechanisms are utilized to effectively extract the informative features from the collected data. It can optimally minimize the noise and irrelevant information to enhance the training speed. Then, the extracted features are combined and the optimal weighted fused features are determined by utilizing the Modified Random Value Reptile Search Algorithm (MRV-RSA) optimization algorithm. It can effectively improve the training speed and overall performance enabling better detection. Also, the optimal weighted fused features are given to the detection phase using the Dilated Convolution Long Short Term Memory (ConvLSTM) with Multi-scale Dense Attention (DCL-MDA) technique. It can handle massive complex datasets without incurring generalization problems. Further, the classified detected result is provided with a limited duration. Therefore, the efficiency of the model is validated by using the different metrics and contrasted over other traditional models. Hence, the suggested system overwhelms the desired value for finding the fraudulent user to enhance the security level in the banking sector. From the evaluation process, the implemented framework has attained a reliable accuracy rate of 93.86% in Dataset 1 and 97.15% in Dataset 2 to prove its superior performance. This performance enhancement in the developed model could accurately detect fraud at an earlier stage.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"163 ","pages":"Article 102557"},"PeriodicalIF":2.7,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146090258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-05-01. Epub Date: 2026-01-29. DOI: 10.1016/j.datak.2026.102561
Nicolas Gutehrlé, Panggih Kusuma Ningrum, Iana Atanassova
Scientific uncertainty is inherent to the research process and to the production of new knowledge. In this paper, we present a large-scale analysis of how scientific uncertainty is expressed in research articles. To perform this study, we analyze the Const-L dataset, which consists of 31,849 research articles across 16 disciplines published over more than two decades. To identify and categorize uncertainty expressions, we employ the UnScientify annotation system, a linguistically informed, rule-based approach. We examine the distribution of uncertainty across disciplines, over time, and within the structure of articles, and we analyze its contexts and objects. The results show that the Social Sciences and Humanities (SSH) tend to have a higher frequency of uncertainty expressions than other fields. Overall, uncertainty tends to decrease over time, though this trend varies across disciplines. Moreover, correlations can be observed between the uncertainty expressions and both article structure and length. Finally, our findings provide new insights into scientific communication, by indicating distinctive disciplinary patterns in the ways uncertainty is expressed, as well as shared and field-specific research objects associated with uncertainty.
{"title":"A large-scale multi-disciplinary analysis of uncertainty in research articles","authors":"Nicolas Gutehrlé , Panggih Kusuma Ningrum , Iana Atanassova","doi":"10.1016/j.datak.2026.102561","DOIUrl":"10.1016/j.datak.2026.102561","url":null,"abstract":"<div><div>Scientific uncertainty is inherent to the research process and to the production of new knowledge. In this paper, we present a large-scale analysis of how scientific uncertainty is expressed in research articles. To perform this study, we analyze the Const-L dataset, which consists in 31,849 research articles across 16 disciplines published over more than two decades. To identify and categorize uncertainty expressions, we employ the UnScientify annotation system, a linguistically informed, rule-based approach. We examine the distribution of uncertainty across disciplines, over time, and within the structure of articles, and we analyze its contexts and objects. The results show that the Social Sciences and Humanities (SSH) tend to have a higher frequency of uncertainty expressions than other fields. Overall, uncertainty tends to decrease over time, though this trend varies across disciplines. Moreover, correlations can be observed between the uncertainty expressions and both article structure and length. Finally, our findings provide new insights into scientific communication, by indicating distinctive disciplinary patterns in the ways uncertainty is expressed, as well as shared and field-specific research objects associated with uncertainty.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"163 ","pages":"Article 102561"},"PeriodicalIF":2.7,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146090257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pattern sampling has emerged as a promising approach for information discovery in large databases, allowing analysts to focus on a manageable subset of patterns. In this approach, patterns are randomly drawn based on an interestingness measure, such as frequency or hyper-volume. This paper presents the first sampling approach designed to handle interval patterns in numerical databases. This approach, named Fips, samples interval patterns proportionally to their frequency. It uses a multi-step sampling procedure and addresses a key challenge in numerical data: accurately determining the number of interval patterns that cover each object. We extend this work with HFips, which samples interval patterns proportionally to both their frequency and hyper-volume. These methods efficiently tackle the well-known long-tail phenomenon in pattern sampling. We formally prove that Fips and HFips sample interval patterns in proportion to their frequency and the product of hyper-volume and frequency, respectively. Through experiments on several databases, we demonstrate the quality of the obtained patterns and their robustness against the long-tail phenomenon.
{"title":"Efficiently sampling interval patterns from numerical databases","authors":"Djawad Bekkoucha , Lamine Diop , Abdelkader Ouali , Bruno Crémilleux , Patrice Boizumault","doi":"10.1016/j.datak.2026.102566","DOIUrl":"10.1016/j.datak.2026.102566","url":null,"abstract":"<div><div>Pattern sampling has emerged as a promising approach for information discovery in large databases, allowing analysts to focus on a manageable subset of patterns. In this approach, patterns are randomly drawn based on an interestingness measure, such as frequency or hyper-volume. This paper presents the first sampling approach designed to handle interval patterns in numerical databases. This approach, named <span>Fips</span>, samples interval patterns proportionally to their frequency. It uses a multi-step sampling procedure and addresses a key challenge in numerical data: accurately determining the number of interval patterns that cover each object. We extend this work with <span>HFips</span>, which samples interval patterns proportionally to both their frequency and hyper-volume. These methods efficiently tackle the well-known long-tail phenomenon in pattern sampling. We formally prove that <span>Fips</span> and <span>HFips</span> sample interval patterns in proportion to their frequency and the product of hyper-volume and frequency, respectively. Through experiments on several databases, we demonstrate the quality of the obtained patterns and their robustness against the long-tail phenomenon.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"163 ","pages":"Article 102566"},"PeriodicalIF":2.7,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146090317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-05-01. Epub Date: 2026-01-22. DOI: 10.1016/j.datak.2026.102558
Antonis Bikakis, Aissatou Diallo, Luke Dickens, Anthony Hunter, Rob Miller
The ability to substitute one resource or tool for another is a common and important human ability. For example, in cooking, we often lack an ingredient for a recipe and we solve this problem by finding a substitute ingredient. There are various ways that we may reason about this. Often we need to draw on commonsense reasoning to find a substitute. For instance, we can think of the properties of the missing item and try to find similar items with similar properties. Despite the importance of substitution in human intelligence, there is a lack of a theoretical understanding of this faculty. To address this shortcoming, we propose a commonsense reasoning framework for conceptualizing and harnessing substitution. In order to ground our proposal, we focus on cooking, though we believe the proposal can be straightforwardly adapted to other applications that require a formalization of substitution. Our approach is to produce a general framework based on distance measures for determining similarity (e.g. between ingredients, or between processing steps), and on identifying inconsistencies between the logical representation of recipes and integrity constraints, which we use to flag the need for mitigation (e.g. after substituting one kind of pasta for another in a recipe, we may identify an inconsistency in the cooking time, which is resolved by updating the cooking time).
{"title":"A commonsense reasoning framework for substitution in cooking","authors":"Antonis Bikakis , Aissatou Diallo , Luke Dickens , Anthony Hunter , Rob Miller","doi":"10.1016/j.datak.2026.102558","DOIUrl":"10.1016/j.datak.2026.102558","url":null,"abstract":"<div><div>The ability to substitute some resource or tool for another is a common and important human ability. For example, in cooking, we often lack an ingredient for a recipe and we solve this problem by finding a substitute ingredient. There are various ways that we may reason about this. Often we need to draw on commonsense reasoning to find a substitute. For instance, we can think of the properties of the missing item, and try to find similar items with similar properties. Despite the importance of substitution in human intelligence, there is a lack of a theoretical understanding of the faculty. To address this shortcoming, we propose a commonsense reasoning framework for conceptualizing and harnessing substitution. In order to ground our proposal, we focus on cooking. Though we believe the proposal can be straightforwardly adapted to other applications that require formalization of substitution. Our approach is to produce a general framework based on distance measures for determining similarity (e.g. between ingredients, or between processing steps), and on identifying inconsistencies between the logical representation of recipes and integrity constraints that we use to flag the need for mitigation (e.g. after substituting one kind of pasta for another in a recipe, we may identify an inconsistency in the cooking time, and this is resolved by updating the cooking time).</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"163 ","pages":"Article 102558"},"PeriodicalIF":2.7,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146090306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-05-01. Epub Date: 2026-01-07. DOI: 10.1016/j.datak.2026.102554
Konstantinos Bougiatiotis, Georgios Paliouras
Multi-relational networks capture intricate relationships in data and have diverse applications across fields such as biomedical, financial, and social sciences. As networks derived from increasingly large datasets become more common, identifying efficient methods for representing and analyzing them becomes crucial. This work extends the Prime Adjacency Matrices (PAMs) framework, which employs prime numbers to uniquely represent the distinct relations within a network. This enables a compact representation of a complete multi-relational graph using a single adjacency matrix, which, in turn, facilitates quick computation of multi-hop adjacency matrices. In this work, we enhance the framework by introducing a lossless algorithm for calculating the multi-hop matrices and propose the Bag of Paths (BoP) representation, a versatile feature extraction methodology for various graph analytics tasks at the node, edge, and graph level. We demonstrate the efficiency of the framework across various tasks and datasets, showing that simple BoP-based models perform comparably to or better than commonly used neural models while improving speed by orders of magnitude.
{"title":"From primes to paths: Enabling fast multi-relational graph analysis","authors":"Konstantinos Bougiatiotis , Georgios Paliouras","doi":"10.1016/j.datak.2026.102554","DOIUrl":"10.1016/j.datak.2026.102554","url":null,"abstract":"<div><div>Multi-relational networks capture intricate relationships in data and have diverse applications across fields such as biomedical, financial, and social sciences. As networks derived from increasingly large datasets become more common, identifying efficient methods for representing and analyzing them becomes crucial. This work extends the Prime Adjacency Matrices (PAMs) framework, which employs prime numbers to represent distinct relations within a network uniquely. This enables a compact representation of a complete multi-relational graph using a single adjacency matrix, which, in turn, facilitates quick computation of multi-hop adjacency matrices. In this work, we enhance the framework by introducing a lossless algorithm for calculating the multi-hop matrices and propose the Bag of Paths (BoP) representation, a versatile feature extraction methodology for various graph analytics tasks, at the node, edge, and graph level. We demonstrate the efficiency of the framework across various tasks and datasets, showing that simple BoP-based models perform comparably to or better than commonly used neural models while improving speed by orders of magnitude.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"163 ","pages":"Article 102554"},"PeriodicalIF":2.7,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145941279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}