
Data & Knowledge Engineering: Latest Articles

Dirigo: A method to extract event logs for object-centric processes
IF 2.7; Zone 3, Computer Science; Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-07-05; DOI: 10.1016/j.datak.2025.102485
Jia Wei, Chun Ouyang, Ying Wang, Lei Huang
Real-world processes involve multiple object types with intricate interrelationships. Traditional event logs (in XES format), which record process execution centred around the case notion, are restricted to a single-object perspective, making it difficult to capture the behaviour of multiple objects and their interactions. To address this limitation, object-centric event logs (OCEL) have been introduced to capture both the objects involved in a process and their interactions with events. The object-centric event data (OCED) metamodel extends the OCEL format by further capturing dynamic object attributes and object-to-object relations. Recently, OCEL 2.0 has been proposed based on the OCED metamodel. Current approaches to generating OCEL logs require specific input data sources, and the resulting log data often fail to fully conform to OCEL 2.0. Moreover, the generated OCEL logs vary across representational formats, and their quality remains unevaluated. To address these challenges, a set of quality criteria for evaluating OCEL log representations is established. Guided by these criteria, Dirigo is proposed: a method for extracting event logs that not only conform to OCEL 2.0 but also extend it by capturing the temporal aspect of dynamic object-to-object relations. Object-Role Modelling (ORM), a conceptual data modelling technique, is employed to describe the artifact produced at each step of Dirigo. To validate its applicability, Dirigo is applied to a real-life use case. The quality of the log representation of the extracted event log is compared with that of existing OCEL logs using the established quality criteria.
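The contrast between a case-centric log and an object-centric log can be illustrated with a minimal Python sketch. This is neither the OCEL 2.0 schema nor the Dirigo method: the class names (`Event`, `ObjectRelation`), their fields, the `flatten` helper, and the sample order/item data are all hypothetical, chosen only to show events that reference several objects and an object-to-object relation with a validity interval.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Event:
    event_id: str
    activity: str
    timestamp: datetime
    # object id -> object type; one event may involve several objects
    objects: Dict[str, str]


@dataclass
class ObjectRelation:
    # Hypothetical object-to-object relation with a validity interval.
    source_id: str
    target_id: str
    qualifier: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None: the relation still holds

    def holds_at(self, moment: datetime) -> bool:
        """Return True if the relation is valid at the given moment."""
        ended = self.valid_to is not None and moment >= self.valid_to
        return self.valid_from <= moment and not ended


def flatten(events: List[Event], case_type: str) -> Dict[str, List[str]]:
    """Project the log onto one object type, as a case-centric log would."""
    traces: Dict[str, List[str]] = {}
    for event in sorted(events, key=lambda e: e.timestamp):
        for object_id, object_type in event.objects.items():
            if object_type == case_type:
                traces.setdefault(object_id, []).append(event.activity)
    return traces


events = [
    Event("e1", "place order", datetime(2025, 1, 1, 9),
          {"o1": "order", "i1": "item", "i2": "item"}),
    Event("e2", "pick item", datetime(2025, 1, 1, 10), {"i1": "item"}),
    Event("e3", "ship order", datetime(2025, 1, 2, 8),
          {"o1": "order", "i1": "item"}),
]
contains = ObjectRelation("o1", "i2", "contains",
                          datetime(2025, 1, 1, 9), datetime(2025, 1, 1, 12))
```

Flattening on `"item"` repeats the `place order` event in the traces of both `i1` and `i2`, while flattening on `"order"` loses `pick item` entirely. That is the single-object restriction the abstract describes. The `valid_to` field is one possible way to record when a relation stops holding.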
Data & Knowledge Engineering, Volume 160, Article 102485
Citations: 0
Editorial introduction for special issue on research challenges and practices in conceptual modeling – ER 2023
IF 2.7; Zone 3, Computer Science; Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-07-03; DOI: 10.1016/j.datak.2025.102487
João Paulo A Almeida, José Borbinha, Giancarlo Guizzardi, Sebastian Link, Jelena Zdravkovic
Data & Knowledge Engineering, Volume 160, Article 102487
Citations: 0
PSA-GAT: Integrating position-syntax and cross-aspect graph attention networks for aspect-based sentiment analysis
IF 2.7; Zone 3, Computer Science; Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-06-26; DOI: 10.1016/j.datak.2025.102477
Ning Zhou, Linfu Sun, Min Han, Songlin He
Aspect-based sentiment analysis (ABSA) is widely applied to user review data on web platforms to identify sentiment polarity toward specific aspects of the reviews. However, individual reviews often contain multiple conditions as well as coordinating and conflicting elements or relationships, which significantly increases the complexity of the task. In recent years, graph neural networks that exploit semantic–syntactic information have been widely used to address such tasks. However, these methods overlook the positional influence of words and, because of the word associations mined from the syntax tree and semantic composition tree, may feed irrelevant or even interfering noise into ABSA. To alleviate the effect of this noise and strengthen the context for multi-aspect representation in ABSA, we propose a new framework, PSA-GAT, that mines information on position importance, syntactic–semantic dependencies and cross-aspect correlations. The structural features of the multi-aspect sentiment set are learned using several variants of graph neural networks. Experimental results on four real-world datasets demonstrate the effectiveness of PSA-GAT compared with state-of-the-art methods. The code is available at https://github.com/zhouning6000/PSA_GAT.
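The idea of position importance can be sketched with a linear distance decay, a weighting commonly used in position-aware ABSA models. The abstract does not give PSA-GAT's actual formula, so the function below is an assumption for illustration only; `position_weights`, its parameters, and the sample sentence are hypothetical.

```python
from typing import List


def position_weights(num_tokens: int, aspect_start: int, aspect_end: int) -> List[float]:
    """Linear position weights; the aspect span is [aspect_start, aspect_end)."""
    if not 0 <= aspect_start < aspect_end <= num_tokens:
        raise ValueError("aspect span must lie inside the sentence")
    weights = []
    for index in range(num_tokens):
        if index < aspect_start:
            distance = aspect_start - index        # token is left of the aspect
        elif index >= aspect_end:
            distance = index - aspect_end + 1      # token is right of the aspect
        else:
            distance = 0                           # token is part of the aspect
        weights.append(1.0 - distance / num_tokens)
    return weights


tokens = ["the", "battery", "life", "is", "great", "but", "the", "screen", "is", "dim"]
weights_for_battery = position_weights(len(tokens), 1, 3)  # aspect: "battery life"
weights_for_screen = position_weights(len(tokens), 7, 8)   # aspect: "screen"
```

In this toy sentence, the opinion word "dim" receives a much larger weight for the aspect "screen" than for "battery life", which is how position information can reduce cross-aspect noise in a multi-aspect review.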
Data & Knowledge Engineering, Volume 160, Article 102477
Citations: 0
Domain knowledge in artificial intelligence: Using conceptual modeling to increase machine learning accuracy and explainability
IF 2.7; Zone 3, Computer Science; Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-06-23; DOI: 10.1016/j.datak.2025.102482
Veda C. Storey, Jeffrey Parsons, Arturo Castellanos Bueso, Monica Chiarini Tremblay, Roman Lukyanenko, Alfred Castillo, Wolfgang Maaß
Machine learning enables the extraction of useful information from large, diverse datasets. However, despite many successful applications, machine learning continues to suffer from performance and transparency issues. These challenges can be partially attributed to the limited use of domain knowledge by machine learning models. This research proposes using the domain knowledge represented in conceptual models to improve the preparation of the data used to train machine learning models. We develop and demonstrate a method, called Conceptual Modeling for Machine Learning (CMML), which comprises guidelines for data preparation in machine learning and is based on conceptual modeling constructs and principles. To assess CMML, we first applied it to two real-world problems and evaluated its impact on model performance. We then solicited an assessment by data scientists of the applicability of the method. These results demonstrate the value of CMML for improving machine learning outcomes.
Data & Knowledge Engineering, Volume 160, Article 102482
Citations: 0
Large language models for conceptual modeling: Assessment and application potential
IF 2.7; Zone 3, Computer Science; Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-06-21; DOI: 10.1016/j.datak.2025.102480
Veda C. Storey, Oscar Pastor, Giancarlo Guizzardi, Stephen W. Liddle, Wolfgang Maaß, Jeffrey Parsons, Jolita Ralyté, Maribel Yasmina Santos
Large Language Models (LLMs) are being rapidly adopted for many activities in organizations, business, and education. Their applications include capabilities to generate text, code, and models. This raises questions about their potential role in the conceptual modeling part of information systems development. This paper reports on a panel presented at the 43rd International Conference on Conceptual Modeling, where researchers discussed the current and potential role of LLMs in conceptual modeling. The panelists discussed applications and interest levels and expressed both optimism and caution about the adoption of LLMs. They suggested a need for much continued research by the conceptual modeling community on LLM development and the role of LLMs in research and teaching.
Data & Knowledge Engineering, Volume 160, Article 102480
Citations: 0
Explainable artificial intelligence for natural language processing: A survey
IF 2.7; Zone 3, Computer Science; Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-06-13; DOI: 10.1016/j.datak.2025.102470
Md. Mehedi Hassan, Anindya Nag, Riya Biswas, Md Shahin Ali, Sadika Zaman, Anupam Kumar Bairagi, Chetna Kaushal
Recently, artificial intelligence has gained a lot of momentum and is predicted to surpass expectations across a range of industries. However, explainability is a major challenge due to sub-symbolic techniques like Deep Neural Networks and Ensembles, which were absent during the boom of AI. The practical application of AI in numerous application areas is greatly undermined by this lack of explainability. To counter the lack of perception of AI-based systems, Explainable AI (XAI) aims to increase transparency and human comprehension of black-box AI models. The explainability problem has been approached using a variety of XAI strategies; however, given the complexity of the search space, it can be difficult for ML developers and data scientists to construct XAI applications and choose the optimal XAI algorithms. To aid developers, this paper presents the frameworks, surveys, operations, and explainability methodologies currently available for producing reasoning for predictions from Natural Language Processing models. Additionally, a thorough analysis of current work in explainable NLP and AI is undertaken, providing researchers worldwide with exploration, insight, and idea development opportunities. Finally, the authors highlight gaps in the literature and offer ideas for future research in this area.
Data & Knowledge Engineering, Volume 160, Article 102470
Citations: 0
Unveiling cancellation dynamics: A two-stage model for predictive analytics
IF 2.7; Zone 3, Computer Science; Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-06-06; DOI: 10.1016/j.datak.2025.102467
Soumyadeep Kundu, Soumya Roy, Archit Shukla, Arqum Mateen
Booking cancellations have an adverse impact on the performance of firms in the hospitality industry. Most studies in this domain have considered the question of whether a booking will be cancelled (if). While this is useful, given the nature of the industry it is also important to understand the timing of cancellation (when). Addressing the inter-temporal aspect of the question would help hotels devise appropriate strategies to accommodate cancellations. In our study, we propose a novel two-stage model that predicts both the likelihood (if) and the timing (when) of cancellation, using various statistical and machine learning techniques. We find that significant predictors include the average daily rate (an indicator of the average rental revenue earned per occupied room per day), month of arrival, day of arrival, and lead time. Our insights can help hotels design bespoke cancellation policies and offer personalised services and interventions to guests.
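The two-stage structure (first "if", then "when") can be sketched as a small composition of two predictors. The abstract does not name the specific models used, so the sketch below only shows the control flow: `TwoStagePredictor`, the 0.5 threshold, the feature names, and the toy stand-in functions with invented coefficients are all assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Booking = Dict[str, float]


@dataclass
class TwoStagePredictor:
    # Stage 1: probability that the booking is cancelled ("if").
    cancel_probability: Callable[[Booking], float]
    # Stage 2: predicted number of days before arrival at which it is cancelled ("when").
    days_before_arrival: Callable[[Booking], float]
    threshold: float = 0.5

    def predict(self, booking: Booking) -> Optional[float]:
        """Return the predicted cancellation timing, or None if no cancellation is predicted."""
        if self.cancel_probability(booking) < self.threshold:
            return None
        return self.days_before_arrival(booking)


# Toy stand-ins for fitted models; the coefficients are invented for illustration.
def toy_cancel_probability(booking: Booking) -> float:
    return min(1.0, booking["lead_time"] / 200.0)


def toy_days_before_arrival(booking: Booking) -> float:
    return 0.5 * booking["lead_time"]


predictor = TwoStagePredictor(toy_cancel_probability, toy_days_before_arrival)
short_lead = predictor.predict({"lead_time": 30.0, "average_daily_rate": 90.0})
long_lead = predictor.predict({"lead_time": 150.0, "average_daily_rate": 120.0})
```

The second stage is consulted only for bookings that the first stage flags, so the timing model can be fitted on cancelled bookings alone. In practice each callable would wrap a fitted statistical or machine learning model.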
Data & Knowledge Engineering, Volume 160, Article 102467
Citations: 0
Multi-feature classification for fake news detection using multiscale and atrous convolution-based adaptive temporal convolution network
IF 2.7; Zone 3, Computer Science; Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-06-03; DOI: 10.1016/j.datak.2025.102469
Rashmi Rane, R. Subhashini
With the exponential growth of social media platforms, Facebook, Twitter, YouTube and Instagram have become main sources of news and information about anything, anywhere. Fake information uploaded by particular people can spread quickly and affect how people use these media. In this research work, a novel deep learning-based framework is proposed to detect fake news effectively and thereby enhance the trust of social media users. First, the required text data is gathered from benchmark resources and passed to the preprocessing stage. The preprocessed data is then fed into the feature extraction phase, where Bidirectional Encoder Representations from Transformers (BERT), Recurrent Neural Network (RNN), and Convolutional Neural Network (CNN) mechanisms are used to extract meaningful information from the data and improve accuracy. This phase generates three feature sets (BERT, temporal, and spatial), which are passed to the detection phase. There, the Multiscale and Atrous Convolution-based Adaptive Temporal Convolution Network (MAC-ATCN) identifies and categorizes the false information to support more reliable outcomes and decision-making. Additionally, the Modified Osprey Optimization Algorithm (MOOA) is employed to fine-tune the parameters and prevent overfitting when dealing with larger data. Varying the hyperparameters during training also helps address imbalanced-dataset issues. Finally, the overall detection performance is validated with various performance measures and compared with existing works. The developed method achieved an accuracy of 93.74% on dataset 1 and 92.82% on dataset 2. Effectively identifying fake news on social media can help users make timely, informed decisions, which helps prevent the spread of misinformation and protects individuals from harmful consequences.
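Atrous (dilated) convolution, the building block named in MAC-ATCN, can be shown in a few lines of plain Python. This is a generic sketch of the operation, not the paper's network: `dilated_conv1d`, `multiscale_features`, and the dilation rates (1, 2, 4) are hypothetical choices for illustration.

```python
from typing import Dict, List, Sequence


def dilated_conv1d(signal: Sequence[float], kernel: Sequence[float],
                   dilation: int = 1) -> List[float]:
    """Valid (no padding) 1-D atrous convolution, in cross-correlation form."""
    if dilation < 1:
        raise ValueError("dilation must be at least 1")
    span = (len(kernel) - 1) * dilation + 1  # receptive field of one output value
    outputs = []
    for start in range(len(signal) - span + 1):
        total = 0.0
        for tap, weight in enumerate(kernel):
            # Kernel taps are spaced `dilation` positions apart in the input.
            total += weight * signal[start + tap * dilation]
        outputs.append(total)
    return outputs


def multiscale_features(signal: Sequence[float], kernel: Sequence[float],
                        dilations: Sequence[int] = (1, 2, 4)) -> Dict[int, List[float]]:
    """Apply the same kernel at several dilation rates (a multiscale view)."""
    return {rate: dilated_conv1d(signal, kernel, rate) for rate in dilations}


features = multiscale_features([1, 2, 3, 4, 5], [1, 1])
```

With a two-tap kernel, dilation 1 combines adjacent positions, dilation 2 combines positions two apart, and dilation 4 spans the whole five-element input. The receptive field grows without adding kernel weights, which is the usual motivation for atrous convolution in temporal networks.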
Data & Knowledge Engineering, Volume 160, Article 102469
Citations: 0
Automatic query expansion for enhancing document retrieval system in healthcare application using GAN based embedding and hyper-tuned DAEBERT algorithm 基于GAN嵌入和超调DAEBERT算法的医疗保健文档检索系统自动查询扩展
IF 2.7 3区 计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-06-01 DOI: 10.1016/j.datak.2025.102468
Deepak Vishwakarma, Suresh Kumar
Query expansion is a useful technique for improving the dependability and performance of document retrieval systems. Search engines frequently employ query expansion strategies to improve Information Retrieval (IR) performance and clarify users' information needs. Although several methods exist for automatically expanding queries, the list of returned documents can be lengthy and contain a lot of irrelevant information, particularly when searching the Web. As the volume of medical documents grows, automatic query expansion may struggle with efficiency and real-time application. Thus, a Hyper-Tuned Dual Attention Enhanced Bi-directional Encoder Representation from Transformers (HT-DAEBERT) model with an automatic ranking-based query expansion system is developed to enhance medical document retrieval. Initially, the user's query from the medical corpus is collected and augmented using a Generative Adversarial Network (GAN) approach. The augmented text is then pre-processed to improve the quality of the original text through tokenization, acronym expansion, stemming, stop word removal, hyperlink removal, and spelling correction. Next, keywords are extracted from the pre-processed text using the Proximity-based Keyword Extraction (PKE) technique. The words are then converted into vector form using the HT-DAEBERT model. In DAEBERT, key parameters such as dropout rate and weight decay are selected using the Election Optimization Algorithm (EOA). Finally, a ranking-based query expansion approach is employed to enhance the document retrieval system. The proposed method achieves an accuracy of 97.60%, a Hit Rate of 98.30%, a PPV of 93.40%, an F1-Score of 95.79%, and an NPV of 97.50%. This approach improves the accuracy and relevance of document retrieval in healthcare, potentially leading to better patient care and enhanced clinical outcomes.
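The metrics reported in the abstract above can be checked against one another with a short calculation. The sketch below is not taken from the paper. It assumes that "Hit Rate" means recall and that PPV is precision, then computes the F1-Score as their harmonic mean. The function name `f1_score` is illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


ppv = 0.9340       # PPV (precision) as reported in the abstract
hit_rate = 0.9830  # Hit Rate as reported; ASSUMED here to mean recall

f1 = f1_score(ppv, hit_rate)
print(f"F1 = {f1:.4f}")  # prints "F1 = 0.9579"
```

Under that assumption the result agrees with the reported F1-Score of 95.79%, which suggests the three figures are mutually consistent.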
Citations: 0
ROSI: A hybrid solution for omni-channel feature integration in E-commerce
IF 2.7 Zone 3 Computer Science Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-05-31 DOI: 10.1016/j.datak.2025.102465
Luyi Ma, Shengwei Tang, Anjana Ganesh, Jiao Chen, Aashika Padmanabhan, Malay Patel, Jianpeng Xu, Jason Cho, Evren Korpeoglu, Sushant Kumar, Kannan Achan
Efficient integration of customer behavior data across multiple channels, including online and in-store interactions, is essential for developing recommendation systems that enhance customer experiences and maintain a competitive edge in e-commerce. However, the integration process faces several challenges, including data synchronization and discrepancies in data schemas. In this study, we introduce a hybrid data pipeline, ROSI (Retail Online-Store Integration), designed to integrate real-time streaming data from online platforms with batch data from in-store interactions. ROSI employs scalable, fault-tolerant streaming systems for online data and periodic batch processing for offline data, ensuring effective synchronization despite variations in data volume, update frequency, and schema. Our approach incorporates in-memory storage, sliding time windows, and feature registries to support applications such as machine learning model training and real-time inference in recommendation systems. Experimental results on real-world retail data demonstrate that ROSI is highly robust, with a reduced growth rate of overall latency when data size increases linearly. Additionally, sequential recommendation systems built on the integrated dataset show a 6.25% improvement in ranking metrics. Overall, the proposed hybrid pipeline facilitates more personalized, omnichannel customer experiences while enhancing operational efficiency.
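The abstract above mentions in-memory storage and sliding time windows over data from two channels. The following is a minimal sketch of that general idea and not the ROSI implementation. The `Event` and `SlidingWindowCounter` names, the field layout, and the 60-second window are all hypothetical. The sketch also assumes each source is already sorted by timestamp, so the two can be merged with `heapq.merge`.

```python
import heapq
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    customer_id: str
    channel: str      # "online" (streaming) or "store" (batch); hypothetical labels
    timestamp: float  # seconds, on a clock shared by both sources


class SlidingWindowCounter:
    """Keeps recent events in memory and counts them per (customer, channel)."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._events = deque()  # oldest event on the left

    def add(self, event: Event) -> None:
        # Events must be added in non-decreasing timestamp order.
        self._events.append(event)
        self._evict(event.timestamp)

    def _evict(self, now: float) -> None:
        # Drop events that are window_seconds old or older.
        cutoff = now - self.window_seconds
        while self._events and self._events[0].timestamp <= cutoff:
            self._events.popleft()

    def counts(self, now: float) -> dict:
        self._evict(now)
        result = {}
        for e in self._events:
            key = (e.customer_id, e.channel)
            result[key] = result.get(key, 0) + 1
        return result


# Illustrative data only: one time-sorted list per channel.
online = [Event("c1", "online", 0.0), Event("c1", "online", 50.0),
          Event("c2", "online", 90.0)]
store = [Event("c1", "store", 30.0), Event("c2", "store", 100.0)]

window = SlidingWindowCounter(window_seconds=60.0)
for ev in heapq.merge(online, store, key=lambda e: e.timestamp):
    window.add(ev)

features = window.counts(now=100.0)
print(features)
```

With a 60-second window evaluated at time 100, the events at times 0 and 30 have expired, so each remaining (customer, channel) pair has a count of 1. A production pipeline would also need to handle late or out-of-order batch data, which this sketch deliberately leaves out.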
Citations: 0