Ensemble model with combined feature set for Big data classification in IoT scenario
Pub Date: 2025-04-17 | DOI: 10.1016/j.datak.2025.102447 | Data & Knowledge Engineering 159, Article 102447
Harivardhagini S (Professor) , Pranavanand S (Associate Professor) , Raghuram A (Professor)
The Internet of Things (IoT) consists of sensor nodes that are wirelessly connected to the internet and to other systems. These nodes generate large volumes of data, and such Big data complicates the classification process. Many Big data classification strategies are in use, but the main issues remain the management of secure information and the computational time. The goal of this paper is to propose a novel classification system for Big data in IoT networks that operates in four main phases. In particular, healthcare data is considered as the Big data setting for the classification problem. Healthcare Big data is transforming the industry and is becoming central to patient-centric care, and different data sources are aggregated in this Big data healthcare ecosystem. The first stage is data acquisition, which takes place via IoT sensors. The second stage is improved DSig normalization for preprocessing the input data. The third stage is MapReduce framework-based feature extraction for handling the Big data; it extracts features such as raw data, mutual information, information gain, and improved Renyi entropy. Finally, the fourth stage is an ensemble disease classification model that combines a Recurrent Neural Network, a Neural Network, and an Improved Support Vector Machine to predict normal and abnormal conditions. The proposed work is implemented in Python, and the results are assessed in terms of specificity, sensitivity, precision, and other metrics. The proposed ensemble model achieves a superior precision of 0.9573 at a training rate of 90 % compared with traditional models.
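As a rough illustration of two of the named features and of ensemble voting, the sketch below computes mutual information between features and labels (for a discrete target this coincides with information gain) and combines three off-the-shelf scikit-learn classifiers by soft voting. It is not the authors' pipeline: the DSig normalization, MapReduce feature extraction, improved Renyi entropy, and the RNN/NN/Improved-SVM ensemble are not reproduced, and all data and parameters are assumptions.

```python
# Illustrative sketch only: mutual-information features and a three-model
# soft-voting ensemble, loosely mirroring the pipeline the abstract describes
# (feature extraction + ensemble disease classification). The synthetic data
# and every parameter below are assumptions.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import VotingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))           # stand-in for preprocessed sensor features
y = (X[:, 0] + X[:, 3] > 0).astype(int)  # stand-in for normal/abnormal labels

# Mutual information between each feature and the label
# (information gain for a discrete target is the same quantity).
mi = mutual_info_classif(X, y, random_state=0)
print("per-feature mutual information:", np.round(mi, 3))

# Simple soft-voting ensemble; the paper combines RNN, NN, and an improved SVM,
# approximated here by off-the-shelf classifiers purely for illustration.
ensemble = VotingClassifier(
    estimators=[
        ("nn", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
        ("lr", LogisticRegression(max_iter=500)),
    ],
    voting="soft",
)
ensemble.fit(X, y)
print("training accuracy:", ensemble.score(X, y))
```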
{"title":"Ensemble model with combined feature set for Big data classification in IoT scenario","authors":"Harivardhagini S (Professor) , Pranavanand S (Associate Professor) , Raghuram A (Professor)","doi":"10.1016/j.datak.2025.102447","DOIUrl":"10.1016/j.datak.2025.102447","url":null,"abstract":"<div><div>Sensor nodes that are wirelessly connected to the internet and several systems make up the Internet of Things system. Large volumes of data are often stored in big data, which complicates the classification process. There are many Big data classification strategies in use, but the main issues are the management of secure information as well as computational time. This paper's goal is to suggest a novel classification system for big data in Internet of Things networks that operates in four main phases. Particularly, the healthcare data is considered as the Big data perspective to solve the classification problem. Since the healthcare Big data is the revolutionary tool in this industry, it is becoming the most vital point of patient-centric care. Different data sources are aggregated in this Big data healthcare ecosystem. The first stage is data acquisition which takes place via Internet of Things through sensors. The second stage is improved DSig normalization for input data preprocessing. The third stage is MapReduce framework-based feature extraction for handling the Big data. This extract features like raw data, mutual information, information gain, and improved Renyi entropy. Finally, the fourth stage is an ensemble disease classification model by the combination of Recurrent Neural Network, Neural Network, and Improved Support Vector Machine for predicting normal and abnormal diseases. The suggested work is implemented by the Python tool, and the effectiveness, specificity, sensitivity, precision, and other factors of the results are assessed. The proposed ensemble model achieves superior precision of 0.9573 for the training rate of 90 % when compared to the traditional models.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"159 ","pages":"Article 102447"},"PeriodicalIF":2.7,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144084758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Releasing differentially private event logs using generative models
Pub Date: 2025-04-15 | DOI: 10.1016/j.datak.2025.102450 | Data & Knowledge Engineering 159, Article 102450
Frederik Wangelik, Majid Rafiei, Mahsa Pourbafrani, Wil M.P. van der Aalst
In recent years, industry has seen increasingly widespread use of process mining and automated event data analysis. Consequently, addressing privacy concerns related to sensitive and private information within the event data used by process mining algorithms has become increasingly important. State-of-the-art research mainly focuses on providing quantifiable privacy guarantees, e.g., via differential privacy, for the trace variants used by the main process mining techniques, e.g., process discovery. However, privacy preservation techniques designed for the release of trace variants are still insufficient to meet all the demands of industry-scale utilization. Moreover, ensuring privacy guarantees in situations characterized by a high occurrence of infrequent trace variants remains challenging. In this paper, we introduce two novel approaches for releasing differentially private trace variants based on trained generative models. With TraVaG, we leverage Generative Adversarial Networks (GANs) to sample from a privatized implicit variant distribution. Our second method employs Denoising Diffusion Probabilistic Models that reconstruct artificial trace variants from noise via trained Markov chains. Both methods offer industry-scale benefits and elevate the degree of privacy assurances, particularly in scenarios with a substantial prevalence of infrequent variants. They also overcome the shortcomings of conventional privacy preservation techniques, such as bounding the length of variants and introducing fake variants. Experimental results on real-life event data demonstrate that our approaches surpass state-of-the-art techniques in terms of privacy guarantees and utility preservation.
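For context, the sketch below shows the conventional baseline that generative approaches such as TraVaG move beyond: releasing trace-variant counts perturbed with Laplace noise calibrated to a privacy budget epsilon. It only illustrates what a differentially private variant release means; the GAN sampling and diffusion-based reconstruction are not reproduced, and the event log, sensitivity assumption, and epsilon value are illustrative.

```python
# Minimal sketch (not the paper's method): conventional differentially private
# release of trace-variant counts via the Laplace mechanism.
import numpy as np

def laplace_variant_release(variant_counts: dict, epsilon: float, seed: int = 0) -> dict:
    """Add Laplace(1/epsilon) noise to each trace-variant count, assuming
    sensitivity 1 (one case contributes to exactly one variant)."""
    rng = np.random.default_rng(seed)
    noisy = {}
    for variant, count in variant_counts.items():
        perturbed = count + rng.laplace(loc=0.0, scale=1.0 / epsilon)
        noisy[variant] = max(0, int(round(perturbed)))  # clamp to non-negative integers
    return noisy

# Hypothetical event log: trace variants (activity sequences) and their frequencies.
log = {
    ("register", "check", "approve"): 120,
    ("register", "check", "reject"): 35,
    ("register", "approve"): 3,  # infrequent variant, the hardest case to protect
}
print(laplace_variant_release(log, epsilon=0.5))
```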
{"title":"Releasing differentially private event logs using generative models","authors":"Frederik Wangelik, Majid Rafiei, Mahsa Pourbafrani, Wil M.P. van der Aalst","doi":"10.1016/j.datak.2025.102450","DOIUrl":"10.1016/j.datak.2025.102450","url":null,"abstract":"<div><div>In recent years, the industry has been witnessing an extended usage of process mining and automated event data analysis. Consequently, there is a rising significance in addressing privacy apprehensions related to the inclusion of sensitive and private information within event data utilized by process mining algorithms. State-of-the-art research mainly focuses on providing quantifiable privacy guarantees, e.g., via differential privacy, for trace variants that are used by the main process mining techniques, e.g., process discovery. However, privacy preservation techniques designed for the release of trace variants are still insufficient to meet all the demands of industry-scale utilization. Moreover, ensuring privacy guarantees in situations characterized by a high occurrence of infrequent trace variants remains a challenging endeavor. In this paper, we introduce two novel approaches for releasing differentially private trace variants based on trained generative models. With TraVaG, we leverage <em>Generative Adversarial Networks</em> (GANs) to sample from a privatized implicit variant distribution. Our second method employs <em>Denoising Diffusion Probabilistic Models</em> that reconstruct artificial trace variants from noise via trained Markov chains. Both methods offer industry-scale benefits and elevate the degree of privacy assurances, particularly in scenarios featuring a substantial prevalence of infrequent variants. Also, they overcome the shortcomings of conventional privacy preservation techniques, such as bounding the length of variants and introducing fake variants. Experimental results on real-life event data demonstrate that our approaches surpass state-of-the-art techniques in terms of privacy guarantees and utility preservation.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"159 ","pages":"Article 102450"},"PeriodicalIF":2.7,"publicationDate":"2025-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143848466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A conceptual model for attributions in event-centric knowledge graphs
Pub Date: 2025-04-15 | DOI: 10.1016/j.datak.2025.102449 | Data & Knowledge Engineering 159, Article 102449
Florian Plötzky , Katarina Britz , Wolf-Tilo Balke
The use of narratives as a means of fusing information from knowledge graphs (KGs) into a coherent line of argumentation has been the subject of recent investigation. Narratives are especially useful in event-centric knowledge graphs in that they provide a means to connect different real-world events and categorize them by well-known narrations. However, specifically for controversial events, a problem in information fusion arises, namely, multiple viewpoints regarding the validity of certain event aspects, e.g., regarding the role a participant takes in an event, may exist. Expressing those viewpoints in KGs is challenging because disputed information provided by different viewpoints may introduce inconsistencies. Hence, most KGs only feature a single view on the contained information, hampering the effectiveness of narrative information access. This paper is an extension of our original work and introduces attributions, i.e., parameterized predicates that allow for the representation of facts that are only valid in a specific viewpoint. For this, we develop a conceptual model that allows for the representation of viewpoint-dependent information. As an extension, we enhance the model by a conception of viewpoint-compatibility. Based on this, we deepen our original deliberations on the model’s effects on information fusion and provide additional grounding in the literature.
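The snippet below is a purely illustrative data-structure sketch, not the authors' conceptual model: it shows the general idea of a fact parameterized by the viewpoint that asserts it, so that two conflicting role assignments can coexist without making the graph inconsistent and can be filtered per viewpoint.

```python
# Illustrative sketch only: viewpoint-dependent (attributed) facts as a simple
# data structure. All identifiers below are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class AttributedFact:
    subject: str      # e.g. an event participant
    predicate: str    # e.g. the role taken in the event
    obj: str
    viewpoint: str    # the source or viewpoint asserting the fact

facts = [
    AttributedFact("participant:X", "hasRole", "mediator",  viewpoint="source:A"),
    AttributedFact("participant:X", "hasRole", "aggressor", viewpoint="source:B"),
]

def roles_according_to(viewpoint: str):
    """Return only the role statements endorsed by the given viewpoint."""
    return [f for f in facts if f.viewpoint == viewpoint and f.predicate == "hasRole"]

print(roles_according_to("source:A"))
```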
{"title":"A conceptual model for attributions in event-centric knowledge graphs","authors":"Florian Plötzky , Katarina Britz , Wolf-Tilo Balke","doi":"10.1016/j.datak.2025.102449","DOIUrl":"10.1016/j.datak.2025.102449","url":null,"abstract":"<div><div>The use of narratives as a means of fusing information from knowledge graphs (KGs) into a coherent line of argumentation has been the subject of recent investigation. Narratives are especially useful in event-centric knowledge graphs in that they provide a means to connect different real-world events and categorize them by well-known narrations. However, specifically for controversial events, a problem in information fusion arises, namely, multiple <em>viewpoints</em> regarding the validity of certain event aspects, e.g., regarding the role a participant takes in an event, may exist. Expressing those viewpoints in KGs is challenging because disputed information provided by different viewpoints may introduce <em>inconsistencies</em>. Hence, most KGs only feature a single view on the contained information, hampering the effectiveness of narrative information access. This paper is an extension of our original work and introduces <em>attributions</em>, i.e., parameterized predicates that allow for the representation of facts that are only valid in a specific viewpoint. For this, we develop a conceptual model that allows for the representation of viewpoint-dependent information. As an extension, we enhance the model by a conception of viewpoint-compatibility. Based on this, we deepen our original deliberations on the model’s effects on information fusion and provide additional grounding in the literature.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"159 ","pages":"Article 102449"},"PeriodicalIF":2.7,"publicationDate":"2025-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143855958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Collaboration with GenAI in engineering research design
Pub Date: 2025-04-10 | DOI: 10.1016/j.datak.2025.102445 | Data & Knowledge Engineering 159, Article 102445
Fazel Naghdy
Over the past five years, the rapid development and use of generative artificial intelligence (GenAI) and large language models (LLMs) have ushered in a new era of study, teaching, and learning in many domains. This paper addresses the role that GenAIs can play in engineering research. Related previous works report on the potential of GenAIs in the literature review process, but this potential is not demonstrated through case studies and practical examples, and these works also do not address how GenAIs can assist with all the steps traditionally taken to design research. This study examines the effectiveness of collaboration with GenAIs at various stages of research design and explores whether such collaboration can result in more focused and comprehensive outcomes. A generalised approach for collaboration with AI tools in research design is proposed. A case study on developing a research design for the concept of “shared machine-human driving” is used to show the validity of the articulated concepts and demonstrates both the pros and cons of collaboration with GenAIs. The results generated at each stage are rigorously validated and thoroughly examined to ensure they remain free from inaccuracies or hallucinations and align with the original research objectives; when necessary, the results are manually adjusted and refined to uphold their integrity and accuracy. The findings produced by the various GenAI models used in this study highlight the key attributes of generative artificial intelligence, namely speed, efficiency, and scope. However, they also underscore the critical importance of researcher oversight, as unexamined inferences and interpretations can render the results irrelevant or meaningless.
{"title":"Collaboration with GenAI in engineering research design","authors":"Fazel Naghdy","doi":"10.1016/j.datak.2025.102445","DOIUrl":"10.1016/j.datak.2025.102445","url":null,"abstract":"<div><div>Over the past five years, the fast development and use of generative artificial intelligence (GenAI) and large language models (LLMs) has ushered in a new era of study, teaching, and learning in many domains. The role that GenAIs can play in engineering research is addressed. The related previous works report on the potential of GenAIs in the literature review process. However, such potential is not demonstrated by case studies and practical examples. The previous works also do not address how GenAIs can assist with all the steps traditionally taken to design research. This study examines the effectiveness of collaboration with GenAIs at various stages of research design. It explores whether collaboration with GenAIs can result in more focused and comprehensive outcomes. A generalised approach for collaboration with AI tools in research design is proposed. A case study to develop a research design on the concept of “shared machine-human driving” is deployed to show the validity of the articulated concepts. The case study demonstrates both the pros and cons of collaboration with GenAIs. The results generated at each stage are rigorously validated and thoroughly examined to ensure they remain free from inaccuracies or hallucinations and align with the original research objectives. When necessary, the results are manually adjusted and refined to uphold their integrity and accuracy. The findings produced by the various GenAI models utilized in this study highlight the key attributes of generative artificial intelligence, namely speed, efficiency, and scope. However, they also underscore the critical importance of researcher oversight, as unexamined inferences and interpretations can render the results irrelevant or meaningless.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"159 ","pages":"Article 102445"},"PeriodicalIF":2.7,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143942791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Derived multi-objective function for latency sensitive-based cloud object storage system using hybrid heuristic algorithm
Pub Date: 2025-04-10 | DOI: 10.1016/j.datak.2025.102448 | Data & Knowledge Engineering 159, Article 102448
N Nataraj , RV Nataraj
A Cloud Object Storage System (COSS) stores and retrieves vast numbers of unstructured data items called objects and acts as a core cloud service for contemporary web-based applications. When data is shared among different parties, privacy preservation becomes challenging. Research Problem: A high volume of requests is served every day, which leads to latency issues. In a cloud storage system, the adoption of a holistic approach helps the user identify sensitive information and analyze unwanted files and data. Moreover, evolving Internet of Things (IoT) applications are latency-sensitive and do not function well with the ideas and platforms available today. Overall Purpose of the Study: Therefore, a novel latency-aware COSS is implemented with the aid of multi-objective functionalities to allocate and reallocate data efficiently and sustain the storage process in the cloud environment. Design of the Study: This goal is accomplished by implementing a hybrid meta-heuristic approach that integrates the Mother Optimization Algorithm (MOA) with the Dolphin Swarm Optimization (DSO) algorithm. The resulting hybrid optimization algorithm is called the Hybrid Dolphin Swarm-based Mother Optimization Algorithm (HDS-MOA). During data allocation, HDS-MOA evaluates an objective function built from constraints such as throughput, latency, resource usage, and the number of active servers. During data reallocation, the HDS-MOA algorithm additionally considers multi-objective constraints such as cost, makespan, and energy. Diverse experimental tests are conducted to prove its effectiveness by comparing it with other existing methods for storing data efficiently across cloud networks. Major findings of results: In configuration 3, the proposed HDS-MOA achieves latency improvements of 31.11 %, 55.71 %, 55.71 %, and 68.21 % over OSSperf, queuing theory, the scheduling technique, and Monte Carlo-PSO, respectively. Overview of Interpretations and Conclusions: The developed HDS-MOA ensures better performance by preserving data in optimal locations with appropriate access times and low latency, which is essential for cloud object storage and enhances the overall user experience by speeding up data retrieval. Limitations of this Study with Solutions: The proposed algorithm still needs improvement in balancing multiple objectives such as performance, cost, and fault tolerance to perform operations optimally in real time, which would make the system more efficient and responsive to dynamic variations in demand.
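As an illustration of what a multi-objective allocation fitness might look like, the sketch below combines the four named allocation constraints in a weighted sum and selects the candidate allocation with the best score. The actual HDS-MOA formulation, weights, and search procedure are not given in the abstract; everything here is an assumption for illustration.

```python
# Illustrative sketch only: a weighted-sum fitness of the kind a hybrid
# metaheuristic could minimize during data allocation. Weights, normalization,
# and candidate values are assumptions, not the paper's formulation.
def allocation_fitness(throughput, latency, resource_usage, active_servers,
                       weights=(0.25, 0.25, 0.25, 0.25)):
    w1, w2, w3, w4 = weights
    # Throughput should be maximized, so it enters with a negative sign;
    # the remaining terms are costs to be minimized.
    return -w1 * throughput + w2 * latency + w3 * resource_usage + w4 * active_servers

# Two hypothetical candidate allocations (normalized metrics in [0, 1]).
candidates = {
    "allocation_A": dict(throughput=0.82, latency=0.10, resource_usage=0.55, active_servers=0.40),
    "allocation_B": dict(throughput=0.75, latency=0.05, resource_usage=0.50, active_servers=0.35),
}
best = min(candidates, key=lambda name: allocation_fitness(**candidates[name]))
print("selected:", best)
```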
{"title":"Derived multi-objective function for latency sensitive-based cloud object storage system using hybrid heuristic algorithm","authors":"N Nataraj , RV Nataraj","doi":"10.1016/j.datak.2025.102448","DOIUrl":"10.1016/j.datak.2025.102448","url":null,"abstract":"<div><div>Cloud Object Storage System (COSS) is capable of storing and retrieving a ton of unstructured data items called objects which act as a core cloud service for contemporary web-based applications. While sharing the data among different parties, privacy preservation becomes challenging. <em>Research Problem:</em> From day-to-day activities, a high volume of requests are served daily thus, it leads to cause the latency issues. In a cloud storage system, the adaption of a holistic approach helps the user to identify sensitive information and analyze the unwanted files/data. With evolving of Internet of Things (IoT) applications are latency-sensitive, which does not function well with these new ideas and platforms that are available today. <em>Overall Purpose of the Study:</em> Therefore, a novel latency-aware COSS is implemented with the aid of multi-objective functionalities to allocate and reallocate data efficiently in order to sustain the storage process in the cloud environment. <em>Design of the Study:</em> This goal is accomplished by implementing a hybrid meta-heuristic approach with the integration of the Mother Optimization Algorithm (MOA) with Dolphin Swarm Optimization (DSO) algorithm. The implemented hybrid optimization algorithm is called the Hybrid Dolphin Swarm-based Mother Optimization Algorithm (HDS-MOA). The HDS-MOA considers the objective function by considering constraints like throughput, latency, resource usage, and active servers during the data allocation process. While considering data reallocation process, the developed HDS-MOA algorithm is also performed by considering the multi-objective constraints like cost, makespan, and energy. The diverse experimental test is conducted to prove its effectiveness by comparing it with other existing methods for storing data efficiently across cloud networks. <em>Major findings of results:</em> In the configuration 3, the proposed HDS-MOA attains 31.11 %, 55.71 %, 55.71 %, and 68.21 % enhanced than the OSSperf, queuing theory, scheduling technique, and Monte Carlo-PSO based on the latency analysis. <em>Overview of Interpretations and Conclusions:</em> The developed HDS-MOA assured the better performance on the data is preserved in the optimal locations having appropriate access time and less latency that is highly essential for the cloud object storage. This supports to enhance the overall user experience by boosting the data retrieval. 
<em>Limitations of this Study with Solutions:</em> The ability of the proposed algorithm needs to enhance on balancing the multiple objectives such as performance, cost, and fault tolerance for optimally performing the operations in real-time that makes the system to be more efficient as well as responsive in the dynamic variations in the demand.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"159 ","pages":"Article 102448"},"PeriodicalIF":2.7,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143859469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ECS-KG: An event-centric semantic knowledge graph for event-related news articles
Pub Date: 2025-04-08 | DOI: 10.1016/j.datak.2025.102451 | Data & Knowledge Engineering 159, Article 102451
MVPT Lakshika, HA Caldera, TNK De Zoysa
Recent advances in deep learning techniques and contextual understanding render Knowledge Graphs (KGs) valuable tools for enhancing accessibility and news comprehension. Conventional and news-specific KGs frequently lack the specificity for efficient news-related tasks, leading to limited relevance and static data representation. To fill the gap, this study proposes an Event-Centric Semantic Knowledge Graph (ECS-KG) model that combines deep learning approaches with contextual embeddings to improve the procedural and dynamic knowledge representation observed in news articles. The ECS-KG incorporates several information extraction techniques, a temporal Graph Neural Network (GNN), and a Graph Attention Network (GAT), yielding significant improvements in news representation. Experiments on several gold-standard datasets, including CNN/Daily Mail, TB-Dense, and ACE 2005, revealed that the proposed model outperformed the most advanced models. By integrating temporal reasoning and semantic insights, ECS-KG not only enhances user understanding of news significance but also meets the evolving demands of news consumers. This model advances the field of event-centric semantic KGs and provides valuable resources for applications in news information processing.
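As background for the attention mechanism mentioned above, the sketch below performs a single-head, GAT-style neighbor weighting over a toy graph in plain NumPy. It is a generic illustration of graph attention, not the ECS-KG architecture; the toy graph, dimensions, and initialization are assumptions.

```python
# Illustrative single-head graph-attention aggregation in plain NumPy.
import numpy as np

rng = np.random.default_rng(0)
num_nodes, dim = 4, 8
H = rng.normal(size=(num_nodes, dim))        # node (event) embeddings
W = rng.normal(size=(dim, dim)) * 0.1        # shared linear transform
a = rng.normal(size=(2 * dim,)) * 0.1        # attention vector
adj = np.array([[1, 1, 0, 1],
                [1, 1, 1, 0],
                [0, 1, 1, 1],
                [1, 0, 1, 1]])               # adjacency including self-loops

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

Z = H @ W
# Attention logits e_ij = LeakyReLU(a^T [z_i || z_j]) for connected pairs only.
logits = np.full((num_nodes, num_nodes), -np.inf)
for i in range(num_nodes):
    for j in range(num_nodes):
        if adj[i, j]:
            logits[i, j] = leaky_relu(a @ np.concatenate([Z[i], Z[j]]))

# Softmax over each node's neighborhood, then attention-weighted aggregation.
alpha = np.exp(logits - logits.max(axis=1, keepdims=True))
alpha = alpha / alpha.sum(axis=1, keepdims=True)
H_new = alpha @ Z
print("updated node embeddings:", H_new.shape)
```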
{"title":"ECS-KG: An event-centric semantic knowledge graph for event-related news articles","authors":"MVPT Lakshika, HA Caldera, TNK De Zoysa","doi":"10.1016/j.datak.2025.102451","DOIUrl":"10.1016/j.datak.2025.102451","url":null,"abstract":"<div><div>Recent advances in deep learning techniques and contextual understanding render Knowledge Graphs (KGs) valuable tools for enhancing accessibility and news comprehension. Conventional and news-specific KGs frequently lack the specificity for efficient news-related tasks, leading to limited relevance and static data representation. To fill the gap, this study proposes an Event-Centric Semantic Knowledge Graph (ECS-KG) model that combines deep learning approaches with contextual embeddings to improve the procedural and dynamic knowledge representation observed in news articles. The ECS-KG incorporates several information extraction techniques, a temporal Graph Neural Network (GNN), and a Graph Attention Network (GAT), yielding significant improvements in news representation. Several gold-standard datasets, comprising CNN/Daily Mail, TB-Dense, and ACE 2005, revealed that the proposed model outperformed the most advanced models. By integrating temporal reasoning and semantic insights, ECS-KG not only enhances user understanding of news significance but also meets the evolving demands of news consumers. This model advances the field of event-centric semantic KGs and provides valuable resources for applications in news information processing.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"159 ","pages":"Article 102451"},"PeriodicalIF":2.7,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143828580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Overcoming the hurdle of legal expertise: A reusable model for smartwatch privacy policies
Pub Date: 2025-04-01 | DOI: 10.1016/j.datak.2025.102443 | Data & Knowledge Engineering 159, Article 102443
Constantin Buschhaus , Arvid Butting , Judith Michael , Verena Nitsch , Sebastian Pütz , Bernhard Rumpe , Carolin Stellmacher , Sabine Theis
Regulations for privacy protection aim to protect individuals from the unauthorized storage, processing, and transfer of their personal data but oftentimes fail in providing helpful support for understanding these regulations. To better communicate privacy policies for smartwatches, we need an in-depth understanding of their concepts and provide better ways to enable developers to integrate them when engineering systems. Up to now, no conceptual model exists covering privacy statements from different smartwatch manufacturers that is reusable for developers. This paper introduces such a conceptual model for privacy policies of smartwatches and shows its use in a model-driven software engineering approach to create a platform for data visualization of wearable privacy policies from different smartwatch manufacturers. We have analyzed the privacy policies of various manufacturers and extracted the relevant concepts. Moreover, we have checked the model with lawyers for its correctness, instantiated it with concrete data, and used it in a model-driven software engineering approach to create a platform for data visualization. This reusable privacy policy model can enable developers to easily represent privacy policies in their systems. This provides a foundation for more structured and understandable privacy policies which, in the long run, can increase the data sovereignty of application users.
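As a purely illustrative sketch of what a machine-readable privacy-policy model might contain, the snippet below defines a few plausible concepts (data category, purpose, recipients, retention). The authors' lawyer-validated conceptual model is not reproduced here; all class and field names are assumptions.

```python
# Illustrative sketch only: a handful of concepts one might expect in a
# smartwatch privacy-policy model. Not the authors' conceptual model.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ProcessingStatement:
    data_category: str          # e.g. "heart rate", "location"
    purpose: str                # e.g. "health analytics"
    recipients: List[str]       # parties the data is shared with
    retention_months: int       # how long the data is kept

@dataclass
class PrivacyPolicy:
    manufacturer: str
    statements: List[ProcessingStatement] = field(default_factory=list)

policy = PrivacyPolicy(
    manufacturer="ExampleWatch Inc.",  # hypothetical manufacturer
    statements=[
        ProcessingStatement("heart rate", "health analytics", ["cloud provider"], 24),
        ProcessingStatement("location", "activity tracking", ["manufacturer"], 6),
    ],
)
print(len(policy.statements), "processing statements modeled")
```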
{"title":"Overcoming the hurdle of legal expertise: A reusable model for smartwatch privacy policies","authors":"Constantin Buschhaus , Arvid Butting , Judith Michael , Verena Nitsch , Sebastian Pütz , Bernhard Rumpe , Carolin Stellmacher , Sabine Theis","doi":"10.1016/j.datak.2025.102443","DOIUrl":"10.1016/j.datak.2025.102443","url":null,"abstract":"<div><div>Regulations for privacy protection aim to protect individuals from the unauthorized storage, processing, and transfer of their personal data but oftentimes fail in providing helpful support for understanding these regulations. To better communicate privacy policies for smartwatches, we need an in-depth understanding of their concepts and provide better ways to enable developers to integrate them when engineering systems. Up to now, no conceptual model exists covering privacy statements from different smartwatch manufacturers that is reusable for developers. This paper introduces such a conceptual model for privacy policies of smartwatches and shows its use in a model-driven software engineering approach to create a platform for data visualization of wearable privacy policies from different smartwatch manufacturers. We have analyzed the privacy policies of various manufacturers and extracted the relevant concepts. Moreover, we have checked the model with lawyers for its correctness, instantiated it with concrete data, and used it in a model-driven software engineering approach to create a platform for data visualization. This reusable privacy policy model can enable developers to easily represent privacy policies in their systems. This provides a foundation for more structured and understandable privacy policies which, in the long run, can increase the data sovereignty of application users.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"159 ","pages":"Article 102443"},"PeriodicalIF":2.7,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143817727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Editorial preface to the special issue on research challenges in information science (RCIS’2023)
Pub Date: 2025-03-31 | DOI: 10.1016/j.datak.2025.102446 | Data & Knowledge Engineering 158, Article 102446
Selmin Nurcan, Andreas L. Opdahl
{"title":"Editorial preface to the special issue on research challenges in information science (RCIS’2023)","authors":"Selmin Nurcan, Andreas L. Opdahl","doi":"10.1016/j.datak.2025.102446","DOIUrl":"10.1016/j.datak.2025.102446","url":null,"abstract":"","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"158 ","pages":"Article 102446"},"PeriodicalIF":2.7,"publicationDate":"2025-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143911765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Customized long short-term memory architecture for multi-document summarization with improved text feature set
Pub Date: 2025-03-25 | DOI: 10.1016/j.datak.2025.102440 | Data & Knowledge Engineering 159, Article 102440
Satya Deo , Debajyoty Banik , Prasant Kumar Pattnaik
One of the most crucial concerns in the domain of Natural Language Processing (NLP) is Multi-Document Summarization (MDS), and in recent decades the focus on this task has risen massively. Hence, it is vital for the NLP community to provide effective and reliable MDS methods. Current deep learning-based MDS techniques rely on the extraordinary capacity of neural networks to extract distinctive features. Motivated by this fact, we introduce a novel MDS technique, named Customized Long Short-Term Memory-based Multi-Document Summarization using IBi-GRU (CLSTM-MDS+IBi-GRU), which works as follows. First, the input data is converted into tokens by the BERT (Bidirectional Encoder Representations from Transformers) tokenizer. Features such as Term Frequency-Inverse Document Frequency (TF-IDF), Bag of Words (BoW), thematic features, and an improved aspect-term-based feature are then extracted. Finally, summarization is performed by concatenating a Customized Long Short-Term Memory (CLSTM) with a pre-eminent layer. Introducing this layer into the LSTM module, together with the Bi-GRU-based Inception module (IBi-GRU), which captures long-range dependencies through parallel convolution, yields accurate, high-quality summaries. The outcomes of this work demonstrate the superiority of our CLSTM-MDS in the Multi-Document Summarization task.
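As a small illustration of two of the listed features, the sketch below derives Bag-of-Words and TF-IDF representations for sentences pooled from several documents using scikit-learn. The BERT tokenization, thematic and improved aspect-term features, and the CLSTM/IBi-GRU summarizer itself are not reproduced; the sentences are hypothetical.

```python
# Illustrative sketch only: BoW and TF-IDF features over pooled sentences.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

sentences = [
    "The storm disrupted rail traffic across the region.",
    "Rail services resumed after the storm passed.",
    "Officials reviewed the storm response on Monday.",
]  # hypothetical sentences drawn from several documents

bow = CountVectorizer().fit_transform(sentences)      # Bag-of-Words counts
tfidf = TfidfVectorizer().fit_transform(sentences)    # TF-IDF weights
print("BoW shape:", bow.shape, "TF-IDF shape:", tfidf.shape)
```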
{"title":"Customized long short-term memory architecture for multi-document summarization with improved text feature set","authors":"Satya Deo , Debajyoty Banik , Prasant Kumar Pattnaik","doi":"10.1016/j.datak.2025.102440","DOIUrl":"10.1016/j.datak.2025.102440","url":null,"abstract":"<div><div>One <strong>a</strong>mong the most crucial concerns in the domain of Natural Language Processing (NLP) is the Multi-Document Summarization (MDS) and in recent decades, the focus on this issue has risen massively. Hence, it is vital for the NLP community to provide effective and reliable MDS methods. Current deep learning-dependent MDS techniques rely on the extraordinary capacity of neural networks, in order to extract distinctive features. Motivated by this fact, we introduce a novel MDS technique, named as Customized Long Short-Term Memory-based Multi-Document Summarization using IBi-GRU <strong>(</strong>CLSTM-MDS+IBi-GRU), which includes the following working processes. Firstly, the input data gets converted into tokens by the Bi-directional Transformer (BERT) tokenizer. The features, such as Term Frequency- Inverse Document Frequency (TF-IDF), Bag of Words (BoW), thematic features and an improved aspect term-based feature are then extracted afterwards. Finally, the summarization process takes place by utilizing the concatenation of Customized Long Short-Term Memory (CLSTM) with a pre-eminent layer. Accurate and high-quality summary is provided via introducing this layer in the LSTM module and the Bi-GRU-based Inception module (IBi-GRU), which can capture long range dependences through parallel convolution. The outcomes of this work prove the superiority of our CLSTM-MDS in the Multi-Document Summarization task.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"159 ","pages":"Article 102440"},"PeriodicalIF":2.7,"publicationDate":"2025-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143800459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Application of digital shadows on different levels in the automation pyramid
Pub Date: 2025-03-24 | DOI: 10.1016/j.datak.2025.102442 | Data & Knowledge Engineering 158, Article 102442
Malte Heithoff , Christian Hopmann , Thilo Köbel , Judith Michael , Bernhard Rumpe , Patrick Sapel
The concept of digital shadows helps to move from handling large amounts of heterogeneous data in production to handling task- and context-dependent aggregated data sets that support a specific purpose. Current research lacks investigations of the characteristics digital shadows may have when applied to different levels of the automation pyramid. Within this paper, we describe the application of the digital shadow concept to two use cases in injection molding, namely geometry-dependent process configuration and optimal production planning of jobs on an injection molding machine. In detail, we describe the creation process of digital shadows, the data needed for the specific purpose, and the relevant models. Based on their usage, we describe specifics of their characteristics and discuss commonalities and differences. These aspects can be taken into account when creating digital shadows for further use cases.
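As a minimal illustration of the idea that a digital shadow is a purpose-driven, aggregated view on production data, the sketch below models a shadow as a small data structure bundling its purpose, automation level, data sources, and associated models. It is not the authors' metamodel; all names and fields are assumptions.

```python
# Illustrative sketch only: a digital shadow as a task-specific bundle of
# purpose, data sources, and models. Every name and field is an assumption.
from dataclasses import dataclass, field
from typing import List

@dataclass
class DigitalShadow:
    purpose: str                 # e.g. "geometry-dependent process configuration"
    automation_level: str        # level of the automation pyramid it serves
    data_sources: List[str]      # aggregated, context-dependent data sets
    models: List[str] = field(default_factory=list)  # models used to interpret the data

shadow = DigitalShadow(
    purpose="optimal production planning of injection molding jobs",
    automation_level="production planning",
    data_sources=["machine states", "order backlog", "cycle times"],
    models=["scheduling model"],
)
print(shadow.purpose)
```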
{"title":"Application of digital shadows on different levels in the automation pyramid","authors":"Malte Heithoff , Christian Hopmann , Thilo Köbel , Judith Michael , Bernhard Rumpe , Patrick Sapel","doi":"10.1016/j.datak.2025.102442","DOIUrl":"10.1016/j.datak.2025.102442","url":null,"abstract":"<div><div>The concept of digital shadows helps to move from handling large amounts of heterogeneous data in production to the handling of task- and context-dependent aggregated data sets supporting a specific purpose. Current research lacks further investigations of characteristics digital shadows may have when they are applied to different levels of the automation pyramid. Within this paper, we describe the application of the digital shadow concept for two use cases in injection molding, namely geometry-dependent process configuration, and optimal production planning of jobs on an injection molding machine. In detail, we describe the creation process of digital shadows, relevant data needs for the specific purpose, as well as relevant models. Based on their usage, we describe specifics of their characteristics and discuss commonalities and differences. These aspects can be taken into account when creating digital shadows for further use cases.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"158 ","pages":"Article 102442"},"PeriodicalIF":2.7,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143748056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}