Semantic Web: Latest Articles, Page 5

Interpretable ontology extension in chemistry

IF 3 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Semantic Web

Pub Date : 2023-05-18 DOI: 10.3233/sw-233183

Martin Glauer, A. Memariani, F. Neuhaus, T. Mossakowski, Janna Hastings

Reference ontologies provide a shared vocabulary and knowledge resource for their domain. Manual construction and annotation enable them to maintain high quality, allowing them to be widely accepted across their community. However, the manual ontology development process does not scale for large domains. We present a new methodology for automatic ontology extension for domains in which the ontology classes have associated graph-structured annotations, and apply it to the ChEBI ontology, a prominent reference ontology for life sciences chemistry. We train Transformer-based deep learning models on the leaf node structures from the ChEBI ontology and the classes to which they belong. The models are then able to automatically classify previously unseen chemical structures, resulting in automated ontology extension. The proposed models achieved overall F1 scores of 0.80 and above, improvements of at least 6 percentage points over our previous results on the same dataset. In addition, the models are interpretable: we illustrate that visualizing the model’s attention weights can help to explain the results by providing insight into how the model made its decisions. We also analyse the performance on molecules that were not part of the ontology and evaluate the logical correctness of the resulting extension.
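
This summary reports overall F1 scores but does not say how they are averaged over ChEBI classes. Each molecule can belong to several classes, so this is a multi-label setting. The minimal sketch below is an illustration, not the authors' evaluation code. It computes both micro- and macro-averaged F1 from per-molecule sets of gold and predicted classes. The class names are placeholders.

```python
from collections import Counter


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when all counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_f1(gold: list[set[str]], pred: list[set[str]]) -> tuple[float, float]:
    """Return (micro_f1, macro_f1) for per-molecule sets of ontology classes."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        for cls in g & p:
            tp[cls] += 1  # class predicted and present in the gold annotation
        for cls in p - g:
            fp[cls] += 1  # class predicted but absent from the gold annotation
        for cls in g - p:
            fn[cls] += 1  # gold class the model missed
    classes = set(tp) | set(fp) | set(fn)
    micro = f1_from_counts(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = (
        sum(f1_from_counts(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
        if classes
        else 0.0
    )
    return micro, macro


# Two hypothetical molecules with gold and predicted ChEBI-style classes.
gold = [{"carboxylic acid", "organic acid"}, {"organic acid"}]
pred = [{"carboxylic acid"}, {"organic acid", "alcohol"}]
print(multilabel_f1(gold, pred))
```

Micro-averaging pools all decisions, so frequent classes dominate the score. Macro-averaging weights every class equally, so rare classes pull the score down.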

Citations: 6

A semantic meta-model for data integration and exploitation in precision agriculture and livestock farming

IF 3 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Semantic Web

Pub Date : 2023-05-16 DOI: 10.3233/sw-233156

Dimitris Zeginis, E. Kalampokis, Raúl Palma, R. Atkinson, K. Tarabanis

In the domains of agriculture and livestock farming, large amounts of data are produced by numerous heterogeneous sources, including sensor data, weather/climate data, statistical and government data, drone/satellite imagery, video, and maps. This plethora of data can be used in precision agriculture and precision livestock farming to provide predictive insights into farming operations, drive real-time operational decisions, redesign business processes, and support policy-making. The predictive power of the data can be further boosted if data from diverse sources are integrated and processed together, yielding insights that would otherwise remain unexplored. However, exploiting and integrating the data used in precision agriculture is not straightforward, since the data: i) cannot be easily discovered across the numerous heterogeneous sources, and ii) use different structural and naming conventions, which hinders their interoperability. The aims of this paper are to: i) study the characteristics of data used in precision agriculture and livestock farming; ii) study the user requirements related to data modeling and processing in nine real cases from the agriculture, livestock farming, and aquaculture domains; and iii) propose a semantic meta-model based on W3C standards (DCAT, PROV-O, and the QB vocabulary) that enables the definition of metadata facilitating the discovery, exploration, integration, and access of data in the domain.
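
The meta-model itself is not reproduced in this abstract. As a rough sketch of how the three W3C vocabularies it names can work together in one description, the code below emits N-Triples for a hypothetical sensor dataset:

- DCAT covers discovery and access.
- PROV-O covers provenance.
- QB (RDF Data Cube) covers the statistical structure.

Only the vocabulary namespaces and terms are real. All `example.org` IRIs and the dataset are invented, and this is not the paper's meta-model.

```python
# Real W3C and DCMI vocabulary namespaces.
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
PROV = "http://www.w3.org/ns/prov#"
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value: str) -> str:
    return f"<{value}>"


def literal(value: str) -> str:
    """Plain N-Triples string literal with backslashes and quotes escaped."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def describe_dataset(base: str, title: str, activity: str, dsd: str, access_url: str) -> list[str]:
    """Describe one dataset with DCAT (discovery/access), PROV-O (provenance), QB (structure)."""
    ds, dist = iri(f"{base}/dataset"), iri(f"{base}/distribution")
    triples = [
        (ds, iri(RDF_TYPE), iri(DCAT + "Dataset")),
        (ds, iri(RDF_TYPE), iri(QB + "DataSet")),
        (ds, iri(DCT + "title"), literal(title)),
        (ds, iri(DCAT + "distribution"), dist),
        (dist, iri(RDF_TYPE), iri(DCAT + "Distribution")),
        (dist, iri(DCAT + "accessURL"), iri(access_url)),
        (ds, iri(PROV + "wasGeneratedBy"), iri(activity)),
        (iri(activity), iri(RDF_TYPE), iri(PROV + "Activity")),
        (ds, iri(QB + "structure"), iri(dsd)),  # points to a qb:DataStructureDefinition
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


lines = describe_dataset(
    base="http://example.org/farm1",  # hypothetical
    title="Soil moisture readings",
    activity="http://example.org/farm1/sensing-campaign",
    dsd="http://example.org/farm1/soil-dsd",
    access_url="http://example.org/farm1/soil.csv",
)
print("\n".join(lines))
```

Typing the same resource as both `dcat:Dataset` and `qb:DataSet` lets one IRI be found through a catalog and also be interpreted as a statistical cube.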

Citations: 0

Special issue on Semantic Web Meets Health Data Management

IF 3 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Semantic Web

Pub Date : 2023-05-05 DOI: 10.3233/sw-239000

K. Stefanidis, H. Kondylakis, P. Rao

data framework: approach and study”

Citations: 0

Publishing public transport data on the Web with the Linked Connections framework

IF 3 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Semantic Web

Pub Date : 2023-04-21 DOI: 10.3233/sw-223116

J. Rojas, Harm Delva, Pieter Colpaert, R. Verborgh

Publishing transport data on the Web for consumption by others poses several challenges for data publishers. In addition to planned schedules, access to live schedule updates (e.g., delays or cancellations) and historical data is fundamental to enable reliable applications and to support machine learning use cases. However, publishing such dynamic data further increases the computational burden on data publishers, so historical data and live schedule updates are often unavailable for most public transport networks. In this paper, we apply and extend the current Linked Connections approach for static data so that it also supports cost-efficient publishing of live and historical public transport data on the Web. Our contributions include (i) a reference specification and system architecture to support cost-efficient publishing of dynamic public transport schedules and historical data; (ii) empirical evaluations of route planning query performance as a function of data fragment size, of publishing costs, and a comparison with a traditional route planning engine such as OpenTripPlanner; and (iii) an analysis of potential correlations between query performance and particular public transport network characteristics such as size, average degree, density, clustering coefficient, and average connection duration. Results confirm that fragment size influences route planning query performance, and that performance converges on an optimal fragment size for each network. Size (number of stops), density, and connection duration also correlate with route planning query performance. Our approach proves to be more cost-efficient and, in some cases, outperforms OpenTripPlanner when supporting the earliest arrival time route planning use case. Moreover, the server-side cost of publishing live and historical schedules remains in the same order of magnitude as that of publishing planned schedules only. Yet, further optimizations are needed for larger networks (>1000 stops) to be useful in practice. Additional dataset fragmentation strategies (e.g., geospatial) may be studied to design more scalable and performant Web APIs that adapt to particular use cases, not limited to the public transport domain.
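
In the Linked Connections model, a timetable is published as pages of connections (departure/arrival pairs) ordered by departure time, and clients do the route planning themselves. The sketch below is a basic Connection Scan earliest-arrival search of the kind such an interface is designed for. It rests on simplifying assumptions:

- There are no transfer times, footpaths, or trip tracking.
- Times are minutes since midnight.
- `fragments` and `earliest_arrival` are hypothetical names, not a published API.

The search reads fragments page by page and stops as soon as no later connection can improve the result. Counting pages read makes the fragment-size trade-off from the abstract concrete.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Connection:
    dep_stop: str
    arr_stop: str
    dep_time: int  # minutes since midnight
    arr_time: int  # minutes since midnight, assumed >= dep_time


def fragments(connections: Iterable[Connection], size: int) -> Iterator[list[Connection]]:
    """Sort connections by departure time and split them into pages of `size`."""
    ordered = sorted(connections, key=lambda c: c.dep_time)
    for start in range(0, len(ordered), size):
        yield ordered[start:start + size]


def earliest_arrival(
    pages: Iterable[list[Connection]], source: str, target: str, start_time: int
) -> tuple[Optional[int], int]:
    """Connection Scan earliest arrival; returns (arrival time or None, pages read)."""
    inf = float("inf")
    arrival = {source: start_time}
    pages_read = 0
    for page in pages:
        pages_read += 1  # each page would be one HTTP request in practice
        for c in page:
            # Connections are time-ordered, so nothing departing at or after the
            # current best arrival at the target can improve it: stop fetching.
            if c.dep_time >= arrival.get(target, inf):
                return arrival[target], pages_read
            # Take c if its departure stop is reachable in time and it arrives earlier.
            if arrival.get(c.dep_stop, inf) <= c.dep_time and c.arr_time < arrival.get(c.arr_stop, inf):
                arrival[c.arr_stop] = c.arr_time
    return arrival.get(target), pages_read


timetable = [
    Connection("A", "B", 600, 610),
    Connection("B", "C", 615, 630),
    Connection("A", "C", 605, 640),
    Connection("C", "D", 700, 710),
]
print(earliest_arrival(fragments(timetable, 2), "A", "C", 590))  # (630, 2)
print(earliest_arrival(fragments(timetable, 4), "A", "C", 590))  # (630, 1)
```

Larger fragments mean fewer requests but more bytes per request. Smaller fragments waste less data but add round trips. This is consistent with the abstract's finding of a per-network optimum.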

Pages: 659-693

Citations: 0

ciTIzen-centric DAta pLatform (TIDAL): Sharing distributed personal data in a privacy-preserving manner for health research

IF 3 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Semantic Web

Pub Date : 2023-04-21 DOI: 10.3233/sw-223220

Chang Sun, Marc Gallofré Ocaña, J. V. Soest, M. Dumontier

Developing personal data sharing tools and standards that conform to data protection regulations is essential to empower citizens to control their health data and share it with authorized parties for any purpose they approve. Such purposes include, among others, primary use in healthcare and secondary use in research to improve human health and well-being. Ensuring that citizens are able to make fine-grained decisions about how their personal health data can be used and shared will significantly encourage them to participate in more health-related research. In this paper, we propose a ciTIzen-centric DatA pLatform (TIDAL) that gives individuals ownership of their own data and connects them with researchers, so that they can donate the use of their personal data for research while remaining in control of the entire data life cycle, including data access, storage, and analysis. We recognize that most existing technologies focus on one particular aspect, such as personal data storage, struggle to execute data analysis over a large number of participants, or face challenges of low data quality and insufficient data interoperability. To address these challenges, the TIDAL platform integrates a set of components for requesting subsets of RDF (Resource Description Framework) data stored in personal data vaults based on SOcial LInked Data (Solid) technology and analyzing them in a privacy-preserving manner. We demonstrate the feasibility and efficiency of the TIDAL platform by conducting a set of simulation experiments using three different pod providers (Inrupt, Solidcommunity, and a self-hosted server). On each pod provider, we evaluated the performance of TIDAL by querying and analyzing personal health data with varying numbers of participants and configurations. The reasonable total time consumption, together with a linear correlation between time consumption and the number of pods and variables on all pod providers, shows the feasibility and potential of implementing and using the TIDAL platform in practice. TIDAL enables individuals to access their personal data in a fine-grained manner and to make their own decisions about their data. Researchers are able to reach out to individuals and send them digital consent forms directly for the use of personal data in health-related research. TIDAL can play an important role in connecting citizens, researchers, and data organizations to increase the trust that citizens place in the processing of personal data.
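
The abstract says TIDAL analyzes pod data "in a privacy-preserving manner" but does not specify the technique. The sketch below illustrates one common option, additive secret sharing for a secure sum. This is an assumption, not necessarily what TIDAL uses. Each consenting pod splits its value into random shares, one per aggregator, so no single aggregator sees an individual value, yet the shares add up to the exact total. The `Pod` class is a hypothetical stand-in for a Solid pod and uses no Solid API.

```python
import random
from dataclasses import dataclass

PRIME = 2**61 - 1  # modulus; assumes non-negative values whose total stays below it


@dataclass
class Pod:
    """Hypothetical stand-in for a Solid pod (no real Solid API is used)."""
    owner: str
    values: dict[str, int]  # e.g. {"age": 42}, flattened from the owner's RDF data
    consented: set[str]  # variables the owner agreed to share for this study


def split(value: int, n_parties: int, rng: random.Random) -> list[int]:
    """Additive secret sharing: random-looking shares that sum to value mod PRIME."""
    shares = [rng.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(pods: list[Pod], variable: str, n_aggregators: int = 3, seed: int = 0) -> tuple[int, int]:
    """Return (sum, count) of `variable` over consenting pods.

    Each aggregator only ever receives one share per pod, so none of them
    learns an individual value, provided the aggregators do not collude.
    """
    rng = random.Random(seed)  # seeded for a reproducible demo; use `secrets` in practice
    partial = [0] * n_aggregators  # running total held by each aggregator
    count = 0
    for pod in pods:
        if variable not in pod.consented or variable not in pod.values:
            continue  # no consent or no data: the pod contributes nothing
        for i, share in enumerate(split(pod.values[variable], n_aggregators, rng)):
            partial[i] = (partial[i] + share) % PRIME
        count += 1
    return sum(partial) % PRIME, count


pods = [
    Pod("alice", {"age": 30, "bmi": 22}, {"age", "bmi"}),
    Pod("bob", {"age": 50, "bmi": 28}, {"age"}),  # did not consent to share BMI
    Pod("carol", {"age": 40}, {"age", "bmi"}),  # has no BMI recorded
]
total, n = secure_sum(pods, "age")
print(total / n)  # mean age: 40.0
```

Consent is enforced before any share is produced, which mirrors the abstract's emphasis on individuals deciding how their own data is used.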

Pages: 977-996

Citations: 3

Knowledge graphs for enhancing transparency in health data ecosystems

IF 3 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Semantic Web

Pub Date : 2023-04-21 DOI: 10.3233/sw-223294

F. Aisopos, S. Jozashoori, E. Niazmand, Disha Purohit, Ariam Rivas, Ahmad Sakor, Enrique Iglesias, D. Vogiatzis, Ernestina Menasalvas Ruiz, A. R. González, Guillermo Vigueras, Daniel Gómez-Bravo, M. Torrente, Roberto Hernández López, M. P. Pulla, Athanasios Dalianis, A. Triantafillou, G. Paliouras, M. Vidal

Tailoring personalized treatments demands the analysis of a patient’s characteristics, which may be scattered over a wide variety of sources. These features include family history, life habits, comorbidities, and potential treatment side effects. Moreover, analyzing the services a patient visited most often before a new diagnosis, as well as the types of tests requested, may uncover patterns that contribute to earlier disease detection and more effective treatment. Building on knowledge-driven ecosystems, we devise DE4LungCancer, a health data ecosystem of data sources for lung cancer. In this data ecosystem, knowledge extracted from heterogeneous sources, e.g., clinical records, scientific publications, and pharmacological data, is integrated into knowledge graphs. Ontologies describe the meaning of the combined data, and mapping rules enable the declarative definition of the transformation and integration processes. DE4LungCancer is assessed with respect to the methods followed for data quality assessment and curation. Lastly, the role of controlled vocabularies and ontologies in health data management is discussed, as well as their impact on transparent knowledge extraction and analytics. This paper presents the lessons learned in developing DE4LungCancer. It demonstrates the level of transparency supported by the proposed knowledge-driven ecosystem in the context of the lung cancer pilots of the EU H2020-funded project BigMedilytic, the ERA PerMed-funded project P4-LUCAT, and the EU H2020 projects CLARIFY and iASiS.
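
Declarative mapping rules separate *what* a source means from the code that transforms it. In knowledge-graph pipelines such rules are commonly written in a language such as RML, though the abstract does not name one. The toy sketch below is not RML. A data-independent rule, a plain dictionary with hypothetical `example.org` IRIs and column names, states how each CSV row becomes RDF triples, and a small generic engine applies it.

```python
import csv
import io

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping rule: how to build the subject IRI, which class each row
# instantiates, and which CSV column feeds which predicate.
PATIENT_MAPPING = {
    "subject_template": "http://example.org/patient/{patient_id}",
    "class": "http://example.org/onto#Patient",
    "properties": {
        "http://example.org/onto#hasDiagnosis": "diagnosis",
        "http://example.org/onto#hasSmokingStatus": "smoking",
    },
}


def apply_mapping(rows, mapping) -> list[str]:
    """Generic engine: turn each row into N-Triples exactly as the rule says."""
    triples = []
    for row in rows:
        subject = mapping["subject_template"].format(**row)
        triples.append(f"<{subject}> <{RDF_TYPE}> <{mapping['class']}> .")
        for predicate, column in mapping["properties"].items():
            value = (row.get(column) or "").strip()
            if value:  # skip empty cells instead of emitting empty literals
                escaped = value.replace("\\", "\\\\").replace('"', '\\"')
                triples.append(f'<{subject}> <{predicate}> "{escaped}" .')
    return triples


source = io.StringIO(
    "patient_id,diagnosis,smoking\n"
    "17,adenocarcinoma,former\n"
    "18,squamous cell carcinoma,\n"
)
triples = apply_mapping(csv.DictReader(source), PATIENT_MAPPING)
print("\n".join(triples))
```

Integrating a new source only needs a new rule, not new code. This is the property that makes the transformation auditable and supports the transparency argument above.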

Pages: 943-976

Citations: 3

Context-aware query derivation for IoT data streams with DIVIDE enabling privacy by design

IF 3 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Semantic Web

Pub Date : 2023-04-05 DOI: 10.3233/sw-223281

Mathias De Brouwer, Bram Steenwinckel, Ziye Fang, Marija Stojchevska, P. Bonte, Filip De Turck, Sofie Van Hoecke, F. Ongenae

Integrating Internet of Things (IoT) sensor data from heterogeneous sources with domain knowledge and context information in real time is a challenging task in IoT healthcare data management applications, one that can be solved with semantics. Existing IoT platforms often have issues with preserving the privacy of patient data. Moreover, configuring and managing context-aware stream processing queries in semantic IoT platforms requires much manual, labor-intensive effort. Generic queries can deal with context changes but often lead to performance issues caused by the need for expressive real-time semantic reasoning. In addition, query window parameters are part of the manual configuration and cannot be made context-dependent. To tackle these problems, this paper presents DIVIDE, a component for a semantic IoT platform that adaptively derives and manages the queries of the platform’s stream processing components in a context-aware and scalable manner, and that enables privacy by design. Because semantic reasoning is performed to derive the queries when context changes are observed, their real-time evaluation does not require any reasoning. The results of an evaluation on a homecare monitoring use case demonstrate that activity detection queries derived with DIVIDE can be evaluated in less than 3.7 seconds on average and can therefore run successfully on low-end IoT devices.
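
The key idea described above is to move reasoning out of the real-time loop. Reasoning runs only when the context changes, and it outputs concrete queries that the stream processor evaluates without reasoning. The toy sketch below mimics that split under heavy simplifications:

- The class hierarchy, sensor names, thresholds, and window sizes are all invented.
- The "reasoning" is just a transitive subclass lookup.
- The real component works with semantic queries and reasoners, not Python objects.

```python
from dataclasses import dataclass
from statistics import mean

# Invented domain knowledge: subclass links and a monitoring rule per class.
SUBCLASS_OF = {"COPD": "RespiratoryDisease", "Asthma": "RespiratoryDisease"}
RULES = {"RespiratoryDisease": ("humidity", 70.0, 300)}  # sensor type, threshold, window (s)


@dataclass(frozen=True)
class DerivedQuery:
    sensor_id: str
    threshold: float
    window_s: int


def superclasses(cls: str) -> list[str]:
    """Transitive subclass closure: the only 'reasoning' in this toy."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


def derive_queries(diagnoses: list[str], room_sensors: dict[str, str]) -> list[DerivedQuery]:
    """Run once per context change (new diagnosis, patient moves room, ...)."""
    queries = []
    for diagnosis in diagnoses:
        for cls in superclasses(diagnosis):
            if cls in RULES:
                sensor_type, threshold, window_s = RULES[cls]
                if sensor_type in room_sensors:
                    queries.append(DerivedQuery(room_sensors[sensor_type], threshold, window_s))
    return list(dict.fromkeys(queries))  # drop duplicates, keep order


def evaluate(query: DerivedQuery, readings: list[tuple[int, str, float]], now: int) -> bool:
    """Real-time step: a plain windowed filter with no reasoning involved."""
    window = [
        value
        for t, sensor, value in readings
        if sensor == query.sensor_id and now - query.window_s < t <= now
    ]
    return bool(window) and mean(window) > query.threshold


queries = derive_queries(["COPD"], {"humidity": "sensor/kitchen-h1", "motion": "sensor/kitchen-m1"})
readings = [(100, "sensor/kitchen-h1", 68.0), (200, "sensor/kitchen-h1", 75.0), (250, "sensor/kitchen-m1", 1.0)]
print(evaluate(queries[0], readings, now=300))  # mean humidity 71.5 > 70.0 -> True
```

Only the sensors relevant to the derived query are read at run time. This hints at how deriving narrow queries can also support privacy by design.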

Pages: 893-941

Citations: 3

Terminology and ontology development for semantic annotation: A use case on sepsis and adverse events

IF 3 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Semantic Web

Pub Date : 2023-03-15 DOI: 10.3233/sw-223226

Melissa Y. Yan, L. Gustad, L. Høvik, Ø. Nytrø

Annotations enrich text corpora and provide necessary labels for natural language processing studies. To reason about and infer the underlying implicit knowledge captured by labels, an ontology is needed to provide a semantically annotated corpus with structured domain knowledge. Using a corpus of adverse event documents annotated for sepsis-related signs and symptoms as a use case, this paper details how a terminology and a corresponding ontology were developed. The Annotated Adverse Event NOte TErminology (AAENOTE) represents annotated documents and assists annotators in annotating text. In contrast, the complementary Catheter Infection Indications Ontology (CIIO) is intended for clinician use and captures the domain knowledge needed to reason about and infer implicit information from data. The approach taken makes ontology development understandable and accessible to domain experts without formal ontology training.
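
The sketch below illustrates the kind of implicit knowledge an ontology adds on top of annotation labels. It assumes a tiny, invented is-a hierarchy; these are not actual AAENOTE or CIIO terms. For an annotated document, it infers every broader concept that the document's labels entail. A real ontology would be written in OWL and queried with a reasoner; plain Python keeps the example self-contained.

```python
from collections import deque

# Invented is-a hierarchy; a term may have several parents.
IS_A = {
    "fever": ["vital sign abnormality", "sepsis indicator"],
    "tachycardia": ["vital sign abnormality", "sepsis indicator"],
    "redness at insertion site": ["local infection sign"],
    "local infection sign": ["catheter infection indication"],
    "sepsis indicator": ["catheter infection indication"],
}


def entailed_concepts(labels: set[str]) -> set[str]:
    """Return the annotation labels plus every ancestor reachable via is-a links."""
    seen = set(labels)
    queue = deque(labels)
    while queue:  # breadth-first walk up the hierarchy
        term = queue.popleft()
        for parent in IS_A.get(term, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


doc_labels = {"fever", "redness at insertion site"}
inferred = entailed_concepts(doc_labels) - doc_labels
print(sorted(inferred))
# ['catheter infection indication', 'local infection sign', 'sepsis indicator', 'vital sign abnormality']
```

Annotators only label what the text literally says. The hierarchy then lets a clinician query for broader concepts, such as catheter infection indications, without those labels ever appearing in the corpus.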

Citations: 1

Empowering machine learning models with contextual knowledge for enhancing the detection of eating disorders in social media posts

IF 3 · Zone 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Semantic Web

Pub Date : 2023-03-13 DOI: 10.3233/sw-223269

J. Benítez-Andrades, María Teresa García-Ordás, Mayra Russo, Ahmad Sakor, Luis Daniel Fernandes Rotger, M. Vidal

Social networks have become information dissemination channels, where announcements are posted frequently; they also serve as frameworks for debates in various areas (e.g., scientific, political, and social). In particular, in the health area, social networks represent a channel to communicate and disseminate the success of novel treatments; they also allow ordinary people to express their concerns about a disease or disorder. The Artificial Intelligence (AI) community has developed analytical methods to uncover and predict patterns from posts that enable it to explain news about a particular topic, e.g., mental disorders expressed as eating disorders or depression. Although posts can be rich in the ideas or concerns they express, they are short texts, which prevents AI models from accurately encoding their contextual knowledge. We propose a hybrid approach where knowledge encoded in community-maintained knowledge graphs (e.g., Wikidata) is combined with deep learning to categorize social media posts using existing classification models. The proposed approach resorts to state-of-the-art named entity recognizers and linkers (e.g., Falcon 2.0) to extract entities from short posts and link them to concepts in knowledge graphs. Then, knowledge graph embeddings (KGEs) are utilized to compute latent representations of the extracted entities, which result in vector representations of the posts that encode these entities' contextual knowledge extracted from the knowledge graphs. These KGEs are combined with contextualized word embeddings (e.g., BERT) to generate a context-based representation of the posts that empowers prediction models. We apply our proposed approach in the health domain to detect whether a publication is related to an eating disorder (e.g., anorexia or bulimia) and to uncover concepts within the discourse that could help healthcare providers diagnose this type of mental disorder. We evaluate our approach on a dataset of 2,000 tweets about eating disorders. Our experimental results suggest that combining contextual knowledge encoded in word embeddings with that built from knowledge graphs increases the reliability of the predictive models. The ambition is that the proposed method can support health domain experts in discovering patterns that may forecast a mental disorder, enhancing early detection and more precise diagnosis towards personalized medicine.
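
The representation step described in this abstract (entity embeddings from a knowledge graph combined with a contextual text embedding) can be sketched as follows. This is a minimal illustration under stated assumptions: the post already has a text vector (for example, one produced by a BERT-style encoder) and a list of KGE vectors for its linked entities. Mean-pooling the entity vectors and concatenating them with the text vector is an assumed fusion strategy, not necessarily the one used in the paper, and the function names are hypothetical.

```python
from typing import Sequence


def mean_pool(vectors: Sequence[Sequence[float]], dim: int) -> list[float]:
    """Average equal-length vectors; return zeros if the list is empty
    (e.g. when no entity in the post could be linked to the knowledge graph)."""
    if not vectors:
        return [0.0] * dim
    for v in vectors:
        if len(v) != dim:
            raise ValueError(f"expected dimension {dim}, got {len(v)}")
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def post_representation(
    text_vec: Sequence[float],
    entity_kges: Sequence[Sequence[float]],
    kge_dim: int,
) -> list[float]:
    """Concatenate the contextual text embedding with the pooled entity KGEs.

    The resulting fixed-length vector can be passed to any existing classifier.
    """
    return list(text_vec) + mean_pool(entity_kges, kge_dim)


if __name__ == "__main__":
    # Toy numbers only: a 3-d "text" vector and two 2-d entity embeddings.
    text = [0.1, -0.2, 0.4]
    entities = [[1.0, 0.0], [0.0, 1.0]]
    print(post_representation(text, entities, kge_dim=2))
```

Returning a zero vector for posts with no linked entities keeps every post's representation the same length, which most downstream classifiers require.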

Citations: 0

Towards a formal ontology of engineering functions, behaviours, and capabilities

IF 3 · Zone 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Semantic Web

Pub Date : 2023-03-09 DOI: 10.3233/sw-223188

Francesco Compagno, S. Borgo

In both applied ontology and engineering, functionality is a well-researched topic, since it is through teleological causal reasoning that domain experts build mental models of engineering systems, which gives rise to functions. These mental models are important throughout the whole lifecycle of any product, being used from the design phase up to diagnosis activities. Though a vast amount of work on modeling functions has already been carried out, the literature has not settled on a shared and well-defined approach, owing to the variety of concepts involved and of the modeling tasks that functional descriptions should satisfy. This paper lays the groundwork and takes some crucial steps towards a rich ontological description of functions and related concepts, such as behaviour, capability, and capacity. A conceptual analysis of these notions is carried out using the top-level ontology DOLCE as a framework, and the ensuing logical theory is formally described in first-order logic and OWL, showing how ontological concepts can model major aspects of engineering products in applications. In particular, it is shown how functions can be distinguished from the implementation methods that realize them, how one can differentiate between the capabilities and capacities of a product, and how these are related to engineering functions.
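
As a rough, hypothetical illustration of one distinction this abstract names (a function versus the implementation methods that realize it), the sketch below models a function only by the effect it should bring about, and treats methods as separate objects that may or may not realize it. This is not the paper's DOLCE-based axiomatization, which is stated in first-order logic and OWL; the classes, fields, and example names are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Function:
    """Hypothetical: a function characterized only by its intended effect."""
    name: str
    required_effect: str


@dataclass(frozen=True)
class Method:
    """Hypothetical: a way of achieving effects, described independently of any function."""
    name: str
    effects: frozenset[str]


def realizes(method: Method, function: Function) -> bool:
    """A method realizes a function if it produces the function's required effect."""
    return function.required_effect in method.effects


def methods_for(function: Function, methods: list[Method]) -> list[str]:
    """Names of all candidate methods that realize the given function, in input order."""
    return [m.name for m in methods if realizes(m, function)]


if __name__ == "__main__":
    cool = Function("cool component", "heat removed")
    candidates = [
        Method("passive fins", frozenset({"heat removed"})),
        Method("forced-air fan", frozenset({"heat removed", "noise"})),
        Method("electric heater", frozenset({"heat added"})),
    ]
    print(methods_for(cool, candidates))
```

Keeping `Function` free of any reference to `Method` is the point of the sketch: the same function can be matched against alternative methods without redefining it.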

Citations: 0