
Data & Knowledge Engineering: Latest Publications

Source-Free Domain Adaptation with complex distribution considerations for time series data
IF 2.7 | CAS Tier 3 (Computer Science) | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-08-21 | DOI: 10.1016/j.datak.2025.102501
Jing Shang, Zunming Chen, Zhiwen Xiao, Zhihui Wu, Yifei Zhang, Jibing Wang
Source-Free Domain Adaptation (SFDA) aims to adapt a pre-trained model from a labeled source domain to an unlabeled target domain without accessing source domain data, thereby protecting source domain privacy. Although SFDA has recently been applied to time series data, the inherently complex distribution characteristics of such data, including temporal variability and distributional diversity, remain underexplored. Time series data exhibit significant dynamic variability influenced by collection environments, leading to discrepancies between sequences. Additionally, multidimensional time series data exhibit distributional diversity across dimensions. These complex characteristics increase the learning difficulty for source models and widen the adaptation gap between the source and target domains. To address these challenges, this paper proposes a novel SFDA method for time series data, named Adaptive Latent Subdomain feature extraction and joint Prediction (ALSP). The method divides the source domain, which has a complex distribution, into multiple latent subdomains with relatively simple distributions, thereby effectively capturing the features of different subdistributions. It extracts latent domain-specific and domain-invariant features to identify subdomain-specific characteristics. Furthermore, it combines domain-specific classifiers and a domain-invariant classifier to enhance model performance through multi-classifier joint prediction. During target domain adaptation, ALSP reduces domain dependence by extracting invariant features, thereby narrowing the distributional gap between the source and target domains. Simultaneously, it leverages prior knowledge from the source domain distribution to support the hypothesis space and dynamically adapt to the target domain. Experiments on three real-world datasets demonstrate that ALSP achieves superior performance in cross-domain time series classification tasks, significantly outperforming existing methods.
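ALSP learns its latent subdomains jointly with the feature extractor, so the following is only a loose illustration of the underlying idea: partitioning a complex source distribution into simpler subdistributions by clustering crude per-sequence statistics with a minimal k-means. The feature choice and the clustering step are our assumptions, not the paper's method.

```python
import random
from statistics import mean

def extract_features(series):
    # Crude per-sequence statistics: overall level and mean absolute step.
    steps = [abs(b - a) for a, b in zip(series, series[1:])]
    return (mean(series), mean(steps))

def kmeans(points, k, iters=50, seed=0):
    # Minimal k-means partitioning the source sequences into k "subdomains".
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        # Recompute each center as the mean of its cluster; keep old center if empty.
        centers = [tuple(mean(dim) for dim in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers, clusters
```

With, say, five smooth low-level sequences and five volatile high-level ones, the two recovered clusters correspond to the two simpler subdistributions that a subdomain-specific model could then fit separately.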
Data & Knowledge Engineering, Vol. 161, Article 102501.
Citations: 0
Elevating human-machine collaboration in NLP for enhanced content creation and decision support
IF 2.7 | CAS Tier 3 (Computer Science) | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-08-21 | DOI: 10.1016/j.datak.2025.102505
Priyanka V. Deshmukh, Aniket K. Shahade
Human-machine collaboration in Natural Language Processing (NLP) is revolutionizing content creation and decision support by seamlessly combining the strengths of both parties for enhanced efficiency and quality. The lack of seamless integration between human creativity and machine efficiency in NLP hinders optimal content creation and decision support. The objective of this study is to explore and promote the integration of human-machine collaboration in NLP to enhance both content creation and decision support processes. Data acquisition for NLP involves defining the task and target audience, identifying relevant data sources such as text documents and web data, and incorporating human expertise for data curation through validation and annotation. Machine processing techniques such as tokenization, stemming/lemmatization, and stop-word removal are combined with human input for tasks such as data annotation and error correction to improve data quality and relevance for NLP applications. The combination of automated processing and human feedback leads to more precise and dependable results. Techniques such as sentiment analysis, topic modelling, and entity recognition are used to extract valuable insights from the data and enhance collaboration between humans and machines. These techniques help to streamline the NLP process and ensure that the system provides accurate and relevant information to users. The analysis of NLP models in machine processing involves training the models to perform specific tasks, such as summarization, sentiment analysis, information extraction, trend identification, and creative content generation. The results show that social media leads with 90% usage, pivotal for audience engagement, while blogs, at 78%, highlight their depth in content creation; the implementation used Python software. These trained models are then used to improve decision-making processes, generate creative content, and enhance the accuracy of search results. The future scope involves leveraging advanced NLP techniques to deepen the collaboration between humans and machines for more effective content creation and decision support.
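The preprocessing steps the abstract lists (tokenization, stemming/lemmatization, stop-word removal) can be sketched with the standard library alone. The stop-word list and the suffix-stripping "stemmer" below are deliberately naive placeholders, not a production pipeline such as NLTK's Porter stemmer:

```python
import re

STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "for"}
SUFFIXES = ("ing", "ed", "es", "s")  # naive suffix stripping, illustration only

def tokenize(text):
    # Lowercase and keep alphabetic runs as tokens.
    return re.findall(r"[a-z]+", text.lower())

def stem(token):
    # Strip the first matching suffix, keeping at least a 3-letter stem.
    for suf in SUFFIXES:
        if token.endswith(suf) and len(token) - len(suf) >= 3:
            return token[: -len(suf)]
    return token

def preprocess(text):
    # Tokenize, drop stop words, then stem what remains.
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]

print(preprocess("The models are processing documents"))
# → ['model', 'process', 'document']
```

In the human-in-the-loop setting the abstract describes, the output of such a machine step would then be reviewed and corrected by annotators before model training.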
Data & Knowledge Engineering, Vol. 161, Article 102505.
Citations: 0
ELEVATE-ID: Extending Large Language Models for End-to-End Entity Linking Evaluation in Indonesian
IF 2.7 | CAS Tier 3 (Computer Science) | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-08-19 | DOI: 10.1016/j.datak.2025.102504
Ria Hari Gusmita, Asep Fajar Firmansyah, Hamada M. Zahera, Axel-Cyrille Ngonga Ngomo
Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of natural language processing tasks. However, their effectiveness in low-resource languages remains underexplored, particularly in complex tasks such as end-to-end Entity Linking (EL), which requires both mention detection and disambiguation against a knowledge base (KB). In earlier work, we introduced IndEL — the first end-to-end EL benchmark dataset for the Indonesian language — covering both a general domain (news) and a specific domain (religious text from the Indonesian translation of the Quran), and evaluated four traditional end-to-end EL systems on this dataset. In this study, we propose ELEVATE-ID, a comprehensive evaluation framework for assessing LLM performance on end-to-end EL in Indonesian. The framework evaluates LLMs under both zero-shot and fine-tuned conditions, using multilingual and Indonesian monolingual models, with Wikidata as the target KB. Our experiments include performance benchmarking, generalization analysis across domains, and systematic error analysis. Results show that GPT-4 and GPT-3.5 achieve the highest accuracy in zero-shot and fine-tuned settings, respectively. However, even fine-tuned GPT-3.5 underperforms compared to DBpedia Spotlight — the weakest of the traditional model baselines — in the general domain. Interestingly, GPT-3.5 outperforms Babelfy in the specific domain. Generalization analysis indicates that fine-tuned GPT-3.5 adapts more effectively to cross-domain and mixed-domain scenarios. Error analysis uncovers persistent challenges that hinder LLM performance: difficulties with non-complete mentions, acronym disambiguation, and full-name recognition in formal contexts. These issues point to limitations in mention boundary detection and contextual grounding. 
Indonesian-pretrained LLMs, Komodo and Merak, reveal core weaknesses (template leakage and entity hallucination, respectively), underscoring architectural and training limitations in low-resource end-to-end EL.
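At its core, the benchmarking described above reduces to scoring predicted mention-to-KB links against gold links. A minimal strict-match accuracy is one plausible stand-in for such a metric (the exact ELEVATE-ID protocol may differ, and the mentions and identifiers below are illustrative, not from IndEL):

```python
def el_accuracy(gold, predicted):
    """Fraction of gold (mention -> KB id) links recovered exactly.

    gold/predicted: dicts mapping a mention string to a KB identifier
    (e.g. a Wikidata QID). A missing or wrong prediction counts as a miss.
    """
    if not gold:
        return 0.0
    hits = sum(1 for mention, kb_id in gold.items()
               if predicted.get(mention) == kb_id)
    return hits / len(gold)
```

For example, with three gold links and a system that gets two of them right, the score is 2/3; comparing such per-domain scores is what the cross-domain generalization analysis in the abstract amounts to.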
Data & Knowledge Engineering, Vol. 161, Article 102504.
Citations: 0
Time-Aware Complex Question Answering over Temporal Knowledge Graph
IF 2.7 | CAS Tier 3 (Computer Science) | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-08-18 | DOI: 10.1016/j.datak.2025.102503
Luyi Bai, Tongyue Zhang, Guangchen Feng
Knowledge Graph Question Answering (KGQA) is a crucial topic in Knowledge Graphs (KGs), with the objective of retrieving the corresponding facts from KGs to answer given questions. In practical applications, facts in KGs usually have time constraints; thus, question answering on Temporal Knowledge Graphs (TKGs) has attracted extensive attention. Existing Temporal Knowledge Graph Question Answering (TKGQA) methods focus on dealing with complex questions involving multiple facts, and mainly face two challenges. First, these methods only consider matching questions with facts in TKGs to identify the answer, ignoring the temporal order between different facts, which makes it challenging to solve questions involving temporal order. Second, they usually focus on the representation of the question text while neglecting the rich semantic information within the questions, which limits question understanding. To address the above challenges, this research proposes a model named Time-Aware Complex Question Answering (TA-CQA). Specifically, we extend the Temporal Knowledge Graph Embedding (TKGE) model by incorporating temporal order information into the embedding vectors, ensuring that the model can distinguish the temporal order of different facts. To enhance the semantic representation of the question, we integrate question information using an attention mechanism and a learnable encoder. Different from previous TKGQA methods, we propose a time relevance measurement to further enhance the accuracy of answer prediction by better capturing the correlation between question information and time information. Multiple sets of experiments on CronQuestions and TimeQuestions demonstrate our model’s superior performance across all question types. In particular, for complex questions involving multiple facts, the hit@1 values increase by 3.2% and 3.5%, respectively.
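As a rough sketch of the idea of folding temporal information into embedding vectors, one can add a sinusoidal time code to a TransE-style relation embedding before scoring a (head, relation, tail, timestamp) fact. This is an illustrative simplification under our own assumptions, not TA-CQA's actual temporal-order encoding:

```python
import math

def time_encoding(t, dim=4, max_t=10000):
    # Sinusoidal code for a timestamp t: alternating sin/cos at geometric
    # frequencies, so different timestamps map to distinguishable vectors.
    return [math.sin(t / max_t ** (i / dim)) if i % 2 == 0
            else math.cos(t / max_t ** ((i - 1) / dim))
            for i in range(dim)]

def score(head, relation, tail, t, dim=4):
    # TransE-style plausibility: a fact scores highest (0) when
    # head + time-shifted relation lands exactly on tail.
    r_t = [r + c for r, c in zip(relation, time_encoding(t, dim))]
    return -sum(abs(h + r - tl) for h, r, tl in zip(head, r_t, tail))
```

Because the time code shifts the relation vector, the same (head, relation, tail) triple receives different scores at different timestamps, which is the minimal property needed to distinguish temporally ordered facts.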
Data & Knowledge Engineering, Vol. 161, Article 102503.
Citations: 0
Conceptual modeling of user perspectives — From data warehouses to alliance-driven data ecosystems
IF 2.7 | CAS Tier 3 (Computer Science) | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-08-14 | DOI: 10.1016/j.datak.2025.102502
Sandra Geisler, Christoph Quix, István Koren, Matthias Jarke
The increasing complexity of modern information systems has highlighted the need for advanced conceptual modeling techniques that incorporate multi-perspective and view-based approaches. This paper explores the role of multi-perspective modeling and view modeling in designing distributed, heterogeneous systems while addressing diverse user requirements and ensuring semantic consistency. These methods enable the representation of multiple viewpoints, traceability, and dynamic integration across different levels of abstraction. Key advancements in schema mapping, view maintenance, and semantic metadata management are examined, illustrating how they support query optimization, data quality, and interoperability. We discuss how data management architectures, such as data ecosystems, data warehouses, and data lakes, leverage these innovations to enable flexible and sustainable data sharing. By integrating user-centric and goal-oriented modeling frameworks, the alignment of technical design with organizational and social requirements is emphasized. Future challenges include the need for enhanced reasoning capabilities and collaborative tools to manage the growing complexity of interconnected systems while maintaining adaptability and trust.
Data & Knowledge Engineering, Vol. 161, Article 102502.
Citations: 0
Corrigendum to “Unraveling the foundations and the evolution of conceptual modeling—Intellectual structure, current themes, and trajectories” [Knowledge and Data Engineering 154, 2024, 102351]
IF 2.7 | CAS Tier 3 (Computer Science) | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-08-11 | DOI: 10.1016/j.datak.2025.102498
Jacky Akoka, Isabelle Comyn-Wattiau, Nicolas Prat, Veda C. Storey
Data & Knowledge Engineering, Vol. 160, Article 102498.
Citations: 0
Conceptual modeling: A large language model assistant for characterizing research contributions
IF 2.7 | CAS Tier 3 (Computer Science) | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-08-11 | DOI: 10.1016/j.datak.2025.102497
Stephen W. Liddle, Heinrich C. Mayr, Oscar Pastor, Veda C. Storey, Bernhard Thalheim
The body of conceptual modeling research publications is vast and diverse, making it challenging for a single researcher or research group to fully comprehend the field’s overall development. Although some approaches have been proposed to help organize these research contributions, it is still unrealistic to expect human experts to manually comprehend and characterize all of this research. However, as generative AI tools based on large language models, such as ChatGPT, become increasingly sophisticated, it may be possible to replace or augment tedious, manual work with semi-automated approaches. In this research, we present a customized version of ChatGPT that is tuned to the task of characterizing conceptual modeling research. Experiments with this AI tool demonstrate that it is feasible to create a usable knowledge survey for the continually evolving body of conceptual modeling research contributions.
Data & Knowledge Engineering, Vol. 161, Article 102497.
Citations: 0
Semantic-aware query answering with Large Language Models
IF 2.7 Tier 3 (Computer Science) Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-08-05 DOI: 10.1016/j.datak.2025.102494
Paolo Atzeni, Teodoro Baldazzi, Luigi Bellomarini, Eleonora Laurenza, Emanuel Sallinger
In the modern data-driven world, answering queries over heterogeneous and semantically inconsistent data remains a significant challenge. Modern datasets originate from diverse sources, such as relational databases, semi-structured repositories, and unstructured documents, leading to substantial variability in schemas, terminologies, and data formats. Traditional systems, constrained by rigid syntactic matching and strict data binding, struggle to capture critical semantic connections and schema ambiguities, failing to meet the growing demand among data scientists for advanced forms of flexibility and context-awareness in query answering. In parallel, the advent of Large Language Models (LLMs) has introduced new capabilities in natural language interpretation, making them highly promising for addressing such challenges. However, LLMs alone lack the systematic rigor and explainability required for robust query processing and decision-making in high-stakes domains. In this paper, we propose Soft Query Answering (Soft QA), a novel hybrid approach that integrates LLMs as an intermediate semantic layer within the query processing pipeline. Soft QA enhances query answering adaptability and flexibility by injecting semantic understanding through context-aware, schema-informed prompts, and leverages LLMs to semantically link entities, resolve ambiguities, and deliver accurate query results in complex settings. We demonstrate its practical effectiveness through real-world examples, highlighting its ability to resolve semantic mismatches and improve query outcomes without requiring extensive data cleaning or restructuring.
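As a minimal illustration of the schema-informed prompting idea (not the Soft QA system itself — the schema, the prompt wording, and the `ask_llm` callable below are invented for the example), an intermediate semantic layer could ask the LLM to bind an ambiguous user term to a concrete column before the query is compiled, then validate the answer against the schema:

```python
# Illustrative sketch: resolve a user's ambiguous attribute name against a
# relational schema via a schema-informed LLM prompt. All names are examples.

SCHEMA = {
    "customers": ["cust_id", "full_name", "residence_city"],
    "orders": ["order_id", "cust_id", "ship_city", "total"],
}

def build_binding_prompt(user_term: str) -> str:
    """Prompt asking the model which table.column best matches user_term."""
    schema_desc = "\n".join(
        f"table {t}: columns {', '.join(cols)}" for t, cols in SCHEMA.items()
    )
    return (
        f"Schema:\n{schema_desc}\n"
        f"Which single table.column best matches the user term "
        f"'{user_term}'? Answer as table.column only.\n"
    )

def bind_term(user_term: str, ask_llm) -> str:
    answer = ask_llm(build_binding_prompt(user_term)).strip()
    table, _, column = answer.partition(".")
    # Guard against hallucinated bindings: accept only columns in the schema.
    if table in SCHEMA and column in SCHEMA[table]:
        return answer
    raise ValueError(f"model returned unknown column: {answer!r}")
```

With a model that answers `customers.residence_city` for the term "customer location", `bind_term` validates the binding and hands a concrete column to the downstream query compiler; the validation step is what keeps the LLM a semantic helper rather than an unchecked oracle.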
Data & Knowledge Engineering, vol. 161, Article 102494.
Citations: 0
Corrigendum to “Domain knowledge in artificial intelligence: Using conceptual modeling to increase machine learning accuracy and explainability” [Knowledge and Data Engineering Volume 160, November 2025, 102482]
IF 2.7 Tier 3 (Computer Science) Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-08-05 DOI: 10.1016/j.datak.2025.102500
Veda C. Storey, Jeffrey Parsons, Arturo Castellanos Bueso, Monica Chiarini Tremblay, Roman Lukyanenko, Alfred Castillo, Wolfgang Maaß
Data & Knowledge Engineering, vol. 160, Article 102500.
Citations: 0
Corrigendum to “Large Language Models for Conceptual Modeling: Assessment and Application Potential” [Knowledge and Data Engineering Volume 160, November 2025, 102480]
IF 2.7 Tier 3 (Computer Science) Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-08-05 DOI: 10.1016/j.datak.2025.102499
Veda C. Storey, Jeffrey Parsons, Arturo Castellanos Bueso, Monica Chiarini Tremblay, Roman Lukyanenko, Alfred Castillo, Wolfgang Maaß
Data & Knowledge Engineering, vol. 160, Article 102499.
Citations: 0
Data & Knowledge Engineering