Logic Augmented Generation
Pub Date: 2025-01-18 | DOI: 10.1016/j.websem.2024.100859 | Journal of Web Semantics, Vol. 85, Article 100859
Aldo Gangemi, Andrea Giovanni Nuzzolese
Semantic Knowledge Graphs (SKGs) face challenges with scalability, flexibility, contextual understanding, and handling unstructured or ambiguous information. However, they offer formal, structured knowledge that enables highly interpretable and reliable results through reasoning and querying. Large Language Models (LLMs) may overcome those limitations, which makes them suitable for open-ended tasks and unstructured environments. Nevertheless, LLMs are difficult to interpret and often unreliable. To get the best out of both LLMs and SKGs, we envision Logic Augmented Generation (LAG), which combines the benefits of the two worlds. LAG uses LLMs as Reactive Continuous Knowledge Graphs that can generate potentially infinite relations and tacit knowledge on demand. LAG uses SKGs to inject a discrete heuristic dimension with clear logical and factual boundaries. We exemplify LAG in two tasks of collective intelligence, i.e., medical diagnostics and climate projections. Understanding the properties and limitations of LAG, which are still mostly unknown, is of utmost importance for enabling a variety of tasks involving tacit knowledge while providing interpretable and effective results.
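As a rough illustration of the division of labour described above, the toy sketch below lets a stand-in generator play the continuous layer and checks each proposed triple against a small discrete layer of entity types, predicate signatures, and asserted facts. Everything here is an assumption for illustration: `fake_llm` replaces a real model, and `TYPES`, `SIGNATURES`, `FACTS`, and `classify` are hypothetical names, not part of the LAG proposal.

```python
from typing import Callable, Dict, Iterable, Tuple

Triple = Tuple[str, str, str]

# Hypothetical discrete layer (SKG): entity types, predicate signatures
# (domain, range), and asserted facts. All names are illustrative.
TYPES = {"aspirin": "Drug", "headache": "Symptom", "migraine": "Disease"}
SIGNATURES = {"treats": ("Drug", "Symptom"), "hasSymptom": ("Disease", "Symptom")}
FACTS = {("migraine", "hasSymptom", "headache")}


def fake_llm(prompt: str) -> list:
    """Stand-in for an LLM proposing candidate relations on demand."""
    return [
        ("aspirin", "treats", "headache"),       # well typed, not yet asserted
        ("headache", "treats", "aspirin"),       # violates domain/range
        ("migraine", "hasSymptom", "headache"),  # already in the SKG
    ]


def classify(triple: Triple) -> str:
    """Label a generated triple using the SKG's logical and factual boundaries."""
    s, p, o = triple
    if triple in FACTS:
        return "asserted"
    # Unknown predicates (None) or mismatched types fall outside the boundaries.
    if SIGNATURES.get(p) != (TYPES.get(s), TYPES.get(o)):
        return "rejected"
    return "hypothesis"


def logic_augmented(prompt: str, generate: Callable[[str], Iterable[Triple]]) -> Dict[Triple, str]:
    """Generate candidates with the continuous layer, then filter with the discrete one."""
    return {t: classify(t) for t in generate(prompt)}


print(logic_augmented("What relieves a migraine?", fake_llm))
```

Well-typed but unasserted triples are kept as hypotheses rather than promoted to facts, which is one plausible reading of how discrete boundaries could temper open-ended generation.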
Knowledge Graphs as a source of trust for LLM-powered enterprise question answering
Pub Date: 2025-01-15 | DOI: 10.1016/j.websem.2024.100858 | Journal of Web Semantics, Vol. 85, Article 100858
Juan Sequeda, Dean Allemang, Bryon Jacob
Generative AI provides an innovative and exciting way to manage knowledge and data at any scale: for small projects, at the enterprise level, and even at World Wide Web scale. It is tempting to think that Generative AI has made other knowledge-based technologies obsolete, and that anything we wanted to do with knowledge-based systems, Knowledge Graphs, or even expert systems can instead be done with Generative AI. Our position runs counter to that conclusion.
Our practical experience implementing enterprise question answering systems with Generative AI has shown that Knowledge Graphs support this infrastructure in multiple ways: they provide a formal framework to evaluate the validity of a query generated by an LLM, serve as a foundation for explaining results, and offer access to governed and trusted data. In this position paper, we share our experience, present industry needs, and outline opportunities for future research contributions.
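One of the roles listed above, evaluating the validity of an LLM-generated query, can be approximated by checking that the query only mentions terms the enterprise ontology defines. The sketch below assumes a hypothetical namespace `http://example.org/ins#` and a hypothetical helper `unknown_terms`; it uses regular expressions rather than a real SPARQL parser, so it is only a first-pass filter and not the authors' system.

```python
import re
from typing import Set

# Hypothetical enterprise ontology: the only classes and properties that a
# generated query is allowed to mention.
ONTOLOGY_TERMS = {
    "http://example.org/ins#Policy",
    "http://example.org/ins#Claim",
    "http://example.org/ins#hasClaim",
    "http://example.org/ins#amount",
}
# Built-in vocabulary that is always allowed (here only rdf:type).
BUILTIN_TERMS = {"http://www.w3.org/1999/02/22-rdf-syntax-ns#type"}
# Prefixes the validator knows how to expand.
KNOWN_PREFIXES = {
    "ins": "http://example.org/ins#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

PREFIX_DECL = re.compile(r"(?im)^\s*PREFIX\s+\S+\s*<[^>]*>\s*$")
FULL_IRI = re.compile(r"<([^>\s]+)>")
PREFIXED_NAME = re.compile(r"\b([A-Za-z][\w-]*):([A-Za-z_][\w-]*)")


def unknown_terms(query: str) -> Set[str]:
    """Return the terms used in `query` that the ontology does not define."""
    body = PREFIX_DECL.sub("", query)  # ignore the PREFIX declarations themselves
    used = set(FULL_IRI.findall(body))
    for prefix, local in PREFIXED_NAME.findall(body):
        if prefix in KNOWN_PREFIXES:
            used.add(KNOWN_PREFIXES[prefix] + local)
        else:
            used.add(f"{prefix}:{local}")  # unknown prefix: cannot be validated
    return used - ONTOLOGY_TERMS - BUILTIN_TERMS


generated = """PREFIX ins: <http://example.org/ins#>
SELECT ?p WHERE { ?p ins:hasCustomer ?c }"""
print(unknown_terms(generated))  # {'http://example.org/ins#hasCustomer'}
```

An empty result means every term is grounded in the ontology; a non-empty one flags a likely hallucinated property before the query ever reaches the data.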
The ESW of Wikidata: Exploratory search workflows on Knowledge Graphs
Pub Date: 2025-01-04 | DOI: 10.1016/j.websem.2024.100860 | Journal of Web Semantics, Vol. 85, Article 100860
Matteo Lissandrini, Gianmarco Prando, Gianmaria Silvello
Exploratory search on Knowledge Graphs (KGs) arises when a user needs to understand and extract insights from an unfamiliar KG. In these exploratory sessions, users issue a series of queries to identify the portions of the KG that can answer their questions, with each query's answer informing the formulation of the next query. Despite the widespread adoption of KGs, the needs of current KG exploration use cases are not well understood. This work presents the "Exploratory Search Workflows" (ESW) collection, which focuses on real-world exploration sessions over an open-domain KG, Wikidata, conducted by 57 M.Sc. Computer Engineering students across two editions of an advanced Graph Database course. The resource includes 234 real exploratory workflows, each containing an average of 45 SPARQL queries, together with reference workflows that serve as gold-standard solutions to the proposed tasks. The ESW collection is also available as an RDF graph, accessible via a public SPARQL endpoint. It enables the analysis of real user sessions and of query evolution and complexity, and it serves as the first query benchmark for KG management systems for exploratory search.
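A minimal sketch of the kind of session analysis the collection is meant to support, assuming a hypothetical two-workflow log (the real collection holds 234 workflows) and using token-level Jaccard similarity between consecutive queries as one crude proxy for query evolution. The Wikidata identifiers are real (P31 instance of, Q5 human, P569 date of birth, Q42 Douglas Adams, P69 educated at), but the queries, the helper names, and the metric are illustrative choices, not the ESW methodology.

```python
import re
from statistics import mean

# Hypothetical miniature session log: each workflow is the ordered list of
# SPARQL queries one user issued while exploring Wikidata.
workflows = {
    "w1": [
        "SELECT ?x WHERE { ?x wdt:P31 wd:Q5 } LIMIT 10",
        "SELECT ?x ?b WHERE { ?x wdt:P31 wd:Q5 ; wdt:P569 ?b } LIMIT 10",
    ],
    "w2": [
        "SELECT ?p WHERE { wd:Q42 ?p ?o }",
        "SELECT ?p (COUNT(*) AS ?n) WHERE { wd:Q42 ?p ?o } GROUP BY ?p",
        "SELECT ?o WHERE { wd:Q42 wdt:P69 ?o }",
    ],
}


def tokens(query):
    """Crude lexical view of a query: keywords, variables, prefixed names."""
    return set(re.findall(r"[\w:?]+", query))


def jaccard(a, b):
    """Overlap of two token sets; two empty sets count as identical."""
    return len(a & b) / len(a | b) if a | b else 1.0


def evolution(queries):
    """Similarity of each query to the one issued just before it."""
    return [jaccard(tokens(prev), tokens(cur)) for prev, cur in zip(queries, queries[1:])]


print(mean(len(qs) for qs in workflows.values()))  # 2.5
for name, qs in workflows.items():
    print(name, [round(s, 2) for s in evolution(qs)])
```

High similarity suggests a refinement of the previous query, while a sharp drop suggests the user switched to a different part of the graph; a structural measure over parsed queries would likely be more informative than raw tokens.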
Knowledge graph based entity selection framework for ad-hoc retrieval
Pub Date: 2025-01-01 | DOI: 10.1016/j.websem.2024.100848 | Journal of Web Semantics, Vol. 84, Article 100848
Pankaj Singh, Plaban Kumar Bhowmick
Recent entity-based retrieval models utilizing knowledge bases have shown significant improvement in ad-hoc retrieval. However, a lack of coherence between candidate entities can lead to query intent drift at retrieval time. To address this issue, we present an entity selection algorithm that uses a graph clustering framework to discover the semantics between entities and to surround the query with highly coherent entities gathered from different resources, including knowledge bases and pseudo-relevance feedback documents. Our contributions are: (1) an entity acquisition strategy to systematically acquire coherent entities for query expansion; (2) a graph representation of entities that captures the coherence between them, where nodes correspond to entities and edges represent semantic relatedness; and (3) two entity ranking approaches that select candidate entities based on their coherence with the query entities and with other coherent entities. Experiments on five TREC collections (ClueWeb09B, ClueWeb12B, Robust04, GOV2, and MS-Marco) under the document retrieval task were conducted to verify the proposed algorithm's performance. The reported results indicate that the proposed methodology outperforms existing state-of-the-art retrieval approaches in terms of MAP, NDCG, and P@20. The code and relevant data are available at https://github.com/pankajkashyap65/KnowledgeGraph.
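The coherence idea in contributions (2) and (3) can be sketched on a small weighted entity graph. This is a simplified stand-in, not the paper's algorithm: it scores each candidate by its average relatedness to the query entities and to the other candidates, mixed by a hypothetical weight `alpha`, and it does not reproduce the graph clustering or either of the two ranking approaches. The entities and relatedness weights are invented for illustration.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected weighted graph: entity -> {neighbour: semantic relatedness}."""
    g = defaultdict(dict)
    for u, v, w in edges:
        g[u][v] = w
        g[v][u] = w
    return g


def coherence(g, entity, anchors):
    """Average relatedness of `entity` to the anchor entities (0 if unlinked)."""
    anchors = [a for a in anchors if a != entity]
    if not anchors:
        return 0.0
    return sum(g[entity].get(a, 0.0) for a in anchors) / len(anchors)


def select_entities(g, query_entities, candidates, k=2, alpha=0.5):
    """Rank candidates by coherence with the query and with each other; keep top-k."""
    scored = []
    for c in candidates:
        to_query = coherence(g, c, query_entities)
        to_peers = coherence(g, c, candidates)
        scored.append((alpha * to_query + (1 - alpha) * to_peers, c))
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest score first, ties by name
    return [c for _, c in scored[:k]]


edges = [
    ("Jaguar_Cars", "Land_Rover", 0.9),
    ("Jaguar_Cars", "Tata_Motors", 0.8),
    ("Land_Rover", "Tata_Motors", 0.7),
    ("Jaguar_Cars", "Jaguar_(animal)", 0.1),
]
g = build_graph(edges)
print(select_entities(g, ["Jaguar_Cars"], ["Land_Rover", "Tata_Motors", "Jaguar_(animal)"]))
# ['Land_Rover', 'Tata_Motors']
```

The off-topic sense `Jaguar_(animal)` ranks last because it is weakly related to both the query entity and the other candidates, which is the kind of intent drift the entity selection step is meant to prevent.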
Leveraging Knowledge Graphs for AI System Auditing and Transparency
Pub Date: 2025-01-01 | DOI: 10.1016/j.websem.2024.100849 | Journal of Web Semantics, Vol. 84, Article 100849
Laura Waltersdorfer, Marta Sabou
Auditing complex Artificial Intelligence (AI) systems is gaining importance in light of new regulations, and it is particularly challenging in terms of system complexity, knowledge integration, and differing transparency needs. Current AI auditing tools, however, lack semantic context, which makes it difficult for auditors to effectively collect and integrate audit data, as well as to analyse and query it. In this position paper, we explore how Knowledge Graphs (KGs) can address these challenges by offering a structured and integrative approach to collecting and transforming audit traces. This work discusses the current limitations of both AI auditing processes and tools. Furthermore, we examine how KGs can play a transformative role in overcoming these obstacles to improve the auditability and transparency of AI systems.
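To make "collecting and transforming audit traces" more concrete, the sketch below flattens heterogeneous event records into subject, predicate, object triples and answers a lineage question with wildcard pattern matching. The events, field names, and helpers are hypothetical; the keys only loosely echo W3C PROV-O terms, and a production system would more likely load real RDF into a triple store and query it with SPARQL.

```python
# Hypothetical audit events emitted by different components of an AI system.
# Keys loosely echo W3C PROV-O terms (used, wasGeneratedBy, wasAssociatedWith).
events = [
    {"id": "run1", "type": "TrainingRun", "used": "dataset_v2", "wasAssociatedWith": "alice"},
    {"id": "model_v3", "type": "Model", "wasGeneratedBy": "run1"},
    {"id": "eval1", "type": "Evaluation", "used": "model_v3", "wasAssociatedWith": "bob"},
]


def to_triples(records):
    """Flatten each event into (subject, predicate, object) triples."""
    return {(r["id"], k, v) for r in records for k, v in r.items() if k != "id"}


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return {
        t for t in triples
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    }


kg = to_triples(events)
# Audit question: which datasets lie behind the models that bob evaluated?
evals = {s for s, _, _ in match(kg, p="wasAssociatedWith", o="bob")}
models = {o for e in evals for _, _, o in match(kg, s=e, p="used")}
runs = {o for m in models for _, _, o in match(kg, s=m, p="wasGeneratedBy")}
datasets = {o for r in runs for _, _, o in match(kg, s=r, p="used")}
print(datasets)  # {'dataset_v2'}
```

Once traces from separate tools share one graph, a question that spans evaluation, training, and data provenance becomes a short chain of joins instead of a manual cross-referencing exercise.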
Process Knowledge Graphs (PKG): Towards unpacking and repacking AI applications
Pub Date: 2025-01-01 | DOI: 10.1016/j.websem.2024.100846 | Journal of Web Semantics, Vol. 84, Article 100846
Enrico Daga
In recent years, a new generation of systems has emerged that applies recent advances in generative Artificial Intelligence (AI) in combination with traditional technologies. Specifically, generative AI is being delegated natural language or vision understanding tasks within complex hybrid architectures that also include databases, procedural code, and interfaces. Process Knowledge Graphs (PKGs) have a long-standing tradition within symbolic AI research. On the one hand, PKGs can play an important role in describing complex, hybrid applications, thus opening the way to addressing fundamental challenges such as explaining and documenting such systems (unpacking). On the other hand, by organising complex processes into simpler building blocks, PKGs can potentially increase accuracy and control over such systems (repacking). In this position paper, we discuss the opportunities and challenges of PKGs and their potential role in a more robust and principled design of AI applications.
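A small sketch of the unpacking and repacking idea, assuming a hypothetical four-step process in which steps are implemented by an LLM, a database, or procedural code. The same declarative description is used both to derive an execution order (via Python's standard `graphlib.TopologicalSorter`, available since Python 3.9) and to produce a plain-language account of the system; the step names and the `kind` and `after` fields are illustrative, not a PKG vocabulary from the paper.

```python
from graphlib import TopologicalSorter

# Hypothetical process description of a hybrid AI application: each step names
# the kind of component implementing it and the steps it depends on.
process = {
    "extract_question": {"kind": "llm", "after": []},
    "lookup_records": {"kind": "database", "after": ["extract_question"]},
    "compute_total": {"kind": "code", "after": ["lookup_records"]},
    "draft_answer": {"kind": "llm", "after": ["extract_question", "compute_total"]},
}


def execution_plan(proc):
    """Repacking: order the building blocks so each runs after its prerequisites."""
    sorter = TopologicalSorter({step: spec["after"] for step, spec in proc.items()})
    return list(sorter.static_order())


def explain(proc):
    """Unpacking: render the same description as plain statements."""
    return [
        f"{step}: {proc[step]['kind']} step, after {', '.join(proc[step]['after']) or 'nothing'}"
        for step in execution_plan(proc)
    ]


for line in explain(process):
    print(line)
```

Keeping the process as data rather than as hard-wired calls is what lets one artifact serve both purposes: the orchestrator reads it to run the steps, and an auditor reads it to understand which parts rely on generative components.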
Procedural knowledge management in Industry 5.0: Challenges and opportunities for knowledge graphs
Pub Date: 2025-01-01 | DOI: 10.1016/j.websem.2024.100850 | Journal of Web Semantics, Vol. 84, Article 100850
Irene Celino, Valentina Anita Carriero, Antonia Azzini, Ilaria Baroni, Mario Scrocca
With digital transformation, industrial companies today face the challenge of changing and innovating their business by leveraging digital technologies and tools to support their processes and operations. One of their main challenges is managing company knowledge, especially when it is tacit and held by industry workers. In this paper, we illustrate how knowledge graphs can be the turning point that allows industry workers to digitize and exploit knowledge about the “what”, the “how”, and the “why” of their everyday activities.
In particular, we focus on the “how” by illustrating the challenges of procedural knowledge management, i.e., the knowledge about the processes and workflows that employees need to follow, and comply with, to correctly execute their tasks, in order to improve efficiency and effectiveness, reduce risks and human errors, and optimize operations. We also explain the relationship, in this context, between knowledge graphs and sub-symbolic AI approaches.
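Procedural knowledge of the "how" kind can be partly captured as ordering constraints between steps, which then allows simple compliance checks on what a worker actually did. The maintenance procedure, step names, and the `violations` helper below are hypothetical, and a real procedural knowledge graph would typically encode richer structure (roles, conditions, alternative paths); this is only a sketch of the checking idea.

```python
# Hypothetical procedure: required steps and "must happen before" constraints
# describing how a maintenance task has to be executed.
PROCEDURE_STEPS = {"lockout", "open_panel", "replace_fuse", "close_panel", "restore_power"}
PRECEDES = [
    ("lockout", "open_panel"),
    ("open_panel", "replace_fuse"),
    ("replace_fuse", "close_panel"),
    ("close_panel", "restore_power"),
]


def violations(log):
    """Return human-readable compliance issues found in an executed step sequence."""
    issues = [f"missing step: {s}" for s in sorted(PROCEDURE_STEPS - set(log))]
    # If a step was repeated, its last occurrence is the one compared.
    position = {step: i for i, step in enumerate(log)}
    for before, after in PRECEDES:
        if before in position and after in position and position[before] > position[after]:
            issues.append(f"{before} must precede {after}")
    return issues


print(violations(["open_panel", "lockout", "replace_fuse", "close_panel"]))
# ['missing step: restore_power', 'lockout must precede open_panel']
```

Reporting issues as readable messages, rather than a pass/fail flag, is a small step towards the error-reduction goal mentioned above, since the worker sees which rule was broken.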
The KnowWhereGraph ontology
Pub Date: 2025-01-01 | DOI: 10.1016/j.websem.2024.100842 | Journal of Web Semantics, Vol. 84, Article 100842
Cogan Shimizu, Shirly Stephen, Adrita Barua, Ling Cai, Antrea Christou, Kitty Currier, Abhilekha Dalal, Colby K. Fisher, Pascal Hitzler, Krzysztof Janowicz, Wenwen Li, Zilong Liu, Mohammad Saeid Mahdavinejad, Gengchen Mai, Dean Rehberger, Mark Schildhauer, Meilin Shi, Sanaz Saki Norouzi, Yuanyuan Tian, Sizhe Wang, Rui Zhu
KnowWhereGraph is one of the largest fully publicly available geospatial knowledge graphs. It includes data from 30 layers covering natural hazards (e.g., hurricanes, wildfires), climate variables (e.g., air temperature, precipitation), soil properties, crop and land-cover types, demographics, human health, and various place and region identifiers, among other themes. Through the graph, these data have been leveraged by a variety of applications to address challenges in food security and agricultural supply chains; sustainability related to soil conservation practices and farm labor; and the delivery of emergency humanitarian aid following a disaster. In this paper, we introduce the ontology that acts as the schema for KnowWhereGraph. This broad overview provides insight into the requirements and design specifications for the graph and its schema, including the development methodology (modular ontology modeling) and the resources used to implement, materialize, and deploy KnowWhereGraph with its end-user interfaces and public SPARQL query endpoint.
Indistinguishability in controlled query evaluation over prioritized description logic ontologies
Pub Date: 2025-01-01 | DOI: 10.1016/j.websem.2024.100841 | Journal of Web Semantics, Vol. 84, Article 100841
Gianluca Cima, Domenico Lembo, Lorenzo Marconi, Riccardo Rosati, Domenico Fabio Savo
In this paper we study Controlled Query Evaluation (CQE), a declarative approach to privacy-preserving query answering over databases, knowledge bases, and ontologies. CQE is based on the notion of a censor, which defines the answers to each query posed to the data/knowledge base. We investigate both the semantic and computational properties of CQE in the context of OWL ontologies, specifically in the description logic DL-Lite_R, which underpins the OWL 2 QL profile. In our analysis, we focus on semantics of CQE based on censors (called optimal GA censors) that enjoy the so-called indistinguishability property, analyzing the trade-off between maximizing the amount of data disclosed by query answers and minimizing the computational cost of privacy-preserving query answering. We first study the data complexity of skeptical entailment of unions of conjunctive queries under all the optimal GA censors, showing that query answering in this setting is intractable. To overcome this computational issue, we then define a different semantics for CQE centered on the intersection of all the optimal GA censors. We show that query answering over OWL 2 QL ontologies under this new intersection-based semantics is tractable and first-order rewritable, i.e., amenable to implementation through SQL query rewriting techniques and standard relational database systems; on the other hand, this approach is limited in the amount of data it discloses. To improve this aspect, we add preferences between ontology predicates to the CQE framework, and identify a semantics under which query answering over OWL 2 QL ontologies retains the same computational properties as the intersection-based approach without preferences.
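The gap between skeptical entailment and the intersection-based semantics can be seen on a toy, propositional analogue: no ontology axioms, ground facts only, and a policy given as sets of facts that must not be disclosed together. Here the maximal safe subsets of the facts play the role of optimal GA censors. This simplification is an assumption of the sketch: it ignores DL-Lite_R reasoning, the indistinguishability property, and preferences, and its brute-force enumeration is exponential, so it illustrates the definitions rather than the paper's algorithms.

```python
from itertools import combinations

# Hypothetical toy data: ground facts and a policy of "denials", i.e. sets of
# facts that must never be disclosed together.
FACTS = {"works_for(ann,acme)", "salary(ann,high)", "city(ann,rome)"}
DENIALS = [frozenset({"works_for(ann,acme)", "salary(ann,high)"})]


def is_safe(disclosed, denials):
    """True if no forbidden combination is fully contained in `disclosed`."""
    return not any(d <= disclosed for d in denials)


def optimal_censors(facts, denials):
    """Enumerate the maximal safe subsets of `facts` (exponential; toy only)."""
    ordered = sorted(facts)
    maximal = []
    for size in range(len(ordered), -1, -1):  # largest subsets first
        for combo in combinations(ordered, size):
            candidate = frozenset(combo)
            if is_safe(candidate, denials) and not any(candidate < m for m in maximal):
                maximal.append(candidate)
    return maximal


def skeptical(censors, query):
    """A disjunctive query (set of alternative facts) holds if every censor
    discloses at least one of its alternatives."""
    return all(query & c for c in censors)


def intersection_based(censors, query):
    """Answer using only the facts that every censor discloses."""
    common = frozenset.intersection(*censors)
    return bool(query & common)


censors = optimal_censors(FACTS, DENIALS)
either = {"works_for(ann,acme)", "salary(ann,high)"}
print(skeptical(censors, either), intersection_based(censors, either))  # True False
```

Each censor discloses one of the two sensitive facts, so the disjunction is skeptically entailed, while the intersection contains only `city(ann,rome)`: the intersection-based semantics discloses less, which mirrors the limitation in the amount of data disclosed noted in the abstract.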
Enhancing foundation models for scientific discovery via multimodal knowledge graph representations
Pub Date: 2025-01-01 | DOI: 10.1016/j.websem.2024.100845 | Journal of Web Semantics, Vol. 84, Article 100845
Vanessa Lopez, Lam Hoang, Marcos Martinez-Galindo, Raúl Fernández-Díaz, Marco Luca Sbodio, Rodrigo Ordonez-Hurtado, Mykhaylo Zayats, Natasha Mulligan, Joao Bettencourt-Silva
Foundation Models (FMs) hold transformative potential to accelerate scientific discovery, yet reaching their full capacity in complex, highly multimodal domains such as genomics, drug discovery, and materials science requires deeper consideration of the contextual nature of scientific knowledge. We revisit the synergy between FMs and Multimodal Knowledge Graph (MKG) representation and learning, exploring their potential to enhance predictive and generative tasks in biomedical contexts such as drug discovery. We seek to exploit MKGs to improve generative AI models’ ability to capture intricate domain-specific relations and to facilitate multimodal fusion. This integration promises to accelerate discovery workflows by providing more meaningful, multimodal knowledge-enhanced representations and contextual evidence. Despite this potential, challenges and opportunities remain, including: fusing multiple sequential, structural, and knowledge modalities and models while leveraging the strengths of each; developing scalable architectures for multi-task, multi-dataset learning; creating end-to-end workflows that enhance the trustworthiness of biomedical FMs using knowledge from heterogeneous datasets and scientific literature; the domain data bottleneck and the lack of a unified representation between natural language and chemical representations; and benchmarking, specifically transfer learning to tasks with limited data (e.g., unseen molecules and proteins, rare diseases). Finally, fostering openness and collaboration is key to accelerating scientific breakthroughs.