
Journal of Biomedical Semantics: Latest Publications

An annotated corpus of clinical trial publications supporting schema-based relational information extraction
IF 1.9, Tier 3 (Engineering & Technology), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY. Pub Date: 2022-05-23. DOI: 10.1186/s13326-022-00271-7
Olivia Sanchez-Graillet, Christian Witte, Frank Grimm, P. Cimiano
{"title":"An annotated corpus of clinical trial publications supporting schema-based relational information extraction","authors":"Olivia Sanchez-Graillet, Christian Witte, Frank Grimm, P. Cimiano","doi":"10.1186/s13326-022-00271-7","DOIUrl":"https://doi.org/10.1186/s13326-022-00271-7","url":null,"abstract":"","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":" ","pages":""},"PeriodicalIF":1.9,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43016385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
SemClinBr - a multi-institutional and multi-specialty semantically annotated corpus for Portuguese clinical NLP tasks.
IF 1.9, Tier 3 (Engineering & Technology), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY. Pub Date: 2022-05-08. DOI: 10.1186/s13326-022-00269-1
Lucas Emanuel Silva E Oliveira, Ana Carolina Peters, Adalniza Moura Pucca da Silva, Caroline Pilatti Gebeluca, Yohan Bonescki Gumiel, Lilian Mie Mukai Cintho, Deborah Ribeiro Carvalho, Sadid Al Hasan, Claudia Maria Cabral Moro

Background: The high volume of research focusing on extracting patient information from electronic health records (EHRs) has led to an increase in the demand for annotated corpora, which are a precious resource for both the development and evaluation of natural language processing (NLP) algorithms. The absence of a multipurpose clinical corpus outside the scope of the English language, especially in Brazilian Portuguese, is glaring and severely impacts scientific progress in the biomedical NLP field.

Methods: In this study, a semantically annotated corpus was developed using clinical text from multiple medical specialties, document types, and institutions. In addition, we present (1) a survey listing common aspects, differences, and lessons learned from previous research, (2) a fine-grained annotation schema that can be replicated to guide other annotation initiatives, (3) a web-based annotation tool focusing on an annotation suggestion feature, and (4) both intrinsic and extrinsic evaluation of the annotations.

Results: This study resulted in SemClinBr, a corpus that has 1000 clinical notes, labeled with 65,117 entities and 11,263 relations. In addition, both negation cues and medical abbreviation dictionaries were generated from the annotations. The average annotator agreement score varied from 0.71 (applying strict match) to 0.92 (considering a relaxed match) while accepting partial overlaps and hierarchically related semantic types. The extrinsic evaluation, when applying the corpus to two downstream NLP tasks, demonstrated the reliability and usefulness of annotations, with the systems achieving results that were consistent with the agreement scores.

Conclusion: The SemClinBr corpus and other resources produced in this work can support clinical NLP studies, providing a common development and evaluation resource for the research community, boosting the utilization of EHRs in both clinical practice and biomedical research. To the best of our knowledge, SemClinBr is the first available Portuguese clinical corpus.
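The gap between the strict (0.71) and relaxed (0.92) agreement scores reported above comes down to how entity matches are counted. The following Python sketch is purely illustrative and is not the authors' evaluation code: the toy spans, semantic types, and the pairwise F1-style score are assumptions, and the SemClinBr evaluation additionally accepts hierarchically related semantic types.

```python
# Illustrative sketch (not the authors' code): pairwise agreement between two
# annotators' entity annotations, contrasting a strict match (identical span and
# semantic type) with a relaxed match (overlapping spans, same type).

def spans_overlap(a, b):
    """True if two (start, end) character spans overlap."""
    return a[0] < b[1] and b[0] < a[1]

def agreement_f1(ann_a, ann_b, strict=True):
    """Entity-level F1 between two annotation sets of (start, end, type) tuples."""
    matched = set()
    tp = 0
    for ent_a in ann_a:
        for j, ent_b in enumerate(ann_b):
            if j in matched:
                continue
            same_type = ent_a[2] == ent_b[2]
            if strict:
                hit = ent_a[:2] == ent_b[:2] and same_type
            else:
                hit = spans_overlap(ent_a[:2], ent_b[:2]) and same_type
            if hit:
                tp += 1
                matched.add(j)
                break
    precision = tp / len(ann_a) if ann_a else 0.0
    recall = tp / len(ann_b) if ann_b else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

# Toy annotations: (start, end, semantic_type)
annotator_1 = [(0, 12, "Disorder"), (20, 31, "Procedure")]
annotator_2 = [(0, 12, "Disorder"), (18, 31, "Procedure")]
print(agreement_f1(annotator_1, annotator_2, strict=True))   # 0.5
print(agreement_f1(annotator_1, annotator_2, strict=False))  # 1.0
```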

Citations: 14
Applying the FAIR principles to data in a hospital: challenges and opportunities in a pandemic
IF 1.9, Tier 3 (Engineering & Technology), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY. Pub Date: 2022-04-25. DOI: 10.1186/s13326-022-00263-7
Queralt-Rosinach, Núria, Kaliyaperumal, Rajaram, Bernabé, César H., Long, Qinqin, Joosten, Simone A., van der Wijk, Henk Jan, Flikkenschild, Erik L.A., Burger, Kees, Jacobsen, Annika, Mons, Barend, Roos, Marco
The COVID-19 pandemic has challenged healthcare systems and research worldwide. Data is collected all over the world and needs to be integrated and made available to other researchers quickly. However, the various heterogeneous information systems that are used in hospitals can result in fragmentation of health data over multiple data ‘silos’ that are not interoperable for analysis. Consequently, clinical observations in hospitalised patients are not prepared to be reused efficiently and timely. There is a need to adapt the research data management in hospitals to make COVID-19 observational patient data machine actionable, i.e. more Findable, Accessible, Interoperable and Reusable (FAIR) for humans and machines. We therefore applied the FAIR principles in the hospital to make patient data more FAIR. In this paper, we present our FAIR approach to transform COVID-19 observational patient data collected in the hospital into machine actionable digital objects to answer medical doctors’ research questions. With this objective, we conducted a coordinated FAIRification among stakeholders based on ontological models for data and metadata, and a FAIR based architecture that complements the existing data management. We applied FAIR Data Points for metadata exposure, turning investigational parameters into a FAIR dataset. We demonstrated that this dataset is machine actionable by means of three different computational activities: federated query of patient data along open existing knowledge sources across the world through the Semantic Web, implementing Web APIs for data query interoperability, and building applications on top of these FAIR patient data for FAIR data analytics in the hospital. Our work demonstrates that a FAIR research data management plan based on ontological models for data and metadata, open Science, Semantic Web technologies, and FAIR Data Points is providing data infrastructure in the hospital for machine actionable FAIR Digital Objects. This FAIR data is prepared to be reused for federated analysis, linkable to other FAIR data such as Linked Open Data, and reusable to develop software applications on top of them for hypothesis generation and knowledge discovery.
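To make the federated-query idea concrete, the sketch below issues a SPARQL query from Python with the SPARQLWrapper library. The endpoint URL, graph structure, and the specific SIO classes and properties are placeholders chosen for illustration; they are not the actual endpoints or models deployed in the study.

```python
# Minimal sketch of the kind of query described above, using SPARQLWrapper.
# Endpoint URL, graph layout, and property choices are hypothetical placeholders.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://example-hospital.org/sparql")  # placeholder endpoint
endpoint.setQuery("""
PREFIX sio: <http://semanticscience.org/resource/>
SELECT ?patient ?value
WHERE {
  ?patient a sio:SIO_000498 .             # 'person' (illustrative class choice)
  ?measurement sio:SIO_000628 ?patient ;  # 'refers to' (illustrative)
               sio:SIO_000300 ?value .    # 'has value' (illustrative)
}
LIMIT 10
""")
endpoint.setReturnFormat(JSON)
results = endpoint.query().convert()
for row in results["results"]["bindings"]:
    print(row["patient"]["value"], row["value"]["value"])
```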
Citations: 11
Defining health data elements under the HL7 development framework for metadata management
IF 1.9, Tier 3 (Engineering & Technology), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY. Pub Date: 2022-03-18. DOI: 10.1186/s13326-022-00265-5
Yang, Zhe, Jiang, Kun, Lou, Miaomiao, Gong, Yang, Zhang, Lili, Liu, Jing, Bao, Xinyu, Liu, Danhong, Yang, Peng
Health data from different specialties or domains generally have diverse formats and meanings, which can cause semantic communication barriers when these data are exchanged among heterogeneous systems. As such, this study is intended to develop a national health concept data model (HCDM) and a corresponding system to facilitate healthcare data standardization and centralized metadata management. Based on 55 data sets (4640 data items) from 7 health business domains in China, a bottom-up approach was employed to build the structure and metadata for HCDM by referencing HL7 RIM. According to ISO/IEC 11179, a top-down approach was used to develop and standardize the data elements. HCDM adopted a three-level architecture of class, attribute and data type, and consisted of 6 classes and 15 sub-classes. Each class had a set of descriptive attributes and every attribute was assigned a data type. 100 initial data elements (DEs) were extracted from HCDM and 144 general DEs were derived from corresponding initial DEs. Domain DEs were transformed by specializing general DEs using 12 controlled vocabularies which were developed from HL7 vocabularies and actual health demands. A model-based system was successfully established to evaluate and manage the NHDD. HCDM provided a unified metadata reference for multi-source data standardization and management. This approach of defining health data elements was a feasible solution in healthcare information standardization to enable healthcare interoperability in China.
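The class / attribute / data-type layering and the specialisation of general data elements into domain data elements can be illustrated with a small sketch. The structure below follows the spirit of ISO/IEC 11179, but the class names, attributes, and value domain are invented and do not reproduce the actual HCDM or NHDD definitions.

```python
# Illustrative sketch of the class / attribute / data-type layering described above.
# Class names, attributes, and permissible values are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class DataElement:
    """An ISO/IEC 11179-style data element: object class + property + value domain."""
    object_class: str          # e.g. a model class such as "Observation"
    property_name: str         # the attribute being described
    data_type: str             # e.g. "CD" (coded), "ST" (string), "TS" (timestamp)
    permissible_values: list = field(default_factory=list)

# A hypothetical general DE specialised into a domain DE via a controlled vocabulary
general_de = DataElement("Observation", "bloodTypeCode", "CD")
domain_de = DataElement(
    "Observation", "bloodTypeCode", "CD",
    permissible_values=["A", "B", "AB", "O"],
)
print(domain_de)
```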
Citations: 2
Semantic modelling of common data elements for rare disease registries, and a prototype workflow for their deployment over registry data
IF 1.9, Tier 3 (Engineering & Technology), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY. Pub Date: 2022-03-15. DOI: 10.1186/s13326-022-00264-6
Kaliyaperumal, Rajaram, Wilkinson, Mark D., Moreno, Pablo Alarcón, Benis, Nirupama, Cornet, Ronald, dos Santos Vieira, Bruna, Dumontier, Michel, Bernabé, César Henrique, Jacobsen, Annika, Le Cornec, Clémence M. A., Godoy, Mario Prieto, Queralt-Rosinach, Núria, Schultze Kool, Leo J., Swertz, Morris A., van Damme, Philip, van der Velde, K. Joeri, Lalout, Nawel, Zhang, Shuxin, Roos, Marco
The European Platform on Rare Disease Registration (EU RD Platform) aims to address the fragmentation of European rare disease (RD) patient data, scattered among hundreds of independent and non-coordinating registries, by establishing standards for integration and interoperability. The first practical output of this effort was a set of 16 Common Data Elements (CDEs) that should be implemented by all RD registries. Interoperability, however, requires decisions beyond data elements - including data models, formats, and semantics. Within the European Joint Programme on Rare Diseases (EJP RD), we aim to further the goals of the EU RD Platform by generating reusable RD semantic model templates that follow the FAIR Data Principles. Through a team-based iterative approach, we created semantically grounded models to represent each of the CDEs, using the SemanticScience Integrated Ontology as the core framework for representing the entities and their relationships. Within that framework, we mapped the concepts represented in the CDEs, and their possible values, into domain ontologies such as the Orphanet Rare Disease Ontology, Human Phenotype Ontology and National Cancer Institute Thesaurus. Finally, we created an exemplar, reusable ETL pipeline that we will be deploying over these non-coordinating data repositories to assist them in creating model-compliant FAIR data without requiring site-specific coding nor expertise in Linked Data or FAIR. Within the EJP RD project, we determined that creating reusable, expert-designed templates reduced or eliminated the requirement for our participating biomedical domain experts and rare disease data hosts to understand OWL semantics. This enabled them to publish highly expressive FAIR data using tools and approaches that were already familiar to them.
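As a rough illustration of what a model-compliant output of such an ETL step might look like, the sketch below emits a few RDF triples with rdflib, linking a registry patient to a rare-disease diagnosis. The SIO properties, the ORDO class, and the registry namespace are illustrative assumptions rather than the actual EJP RD templates.

```python
# Minimal sketch (not the EJP RD templates themselves) of expressing one CDE value,
# a rare-disease diagnosis, as RDF. Property and class choices are illustrative.
from rdflib import Graph, Literal, Namespace, RDF

SIO = Namespace("http://semanticscience.org/resource/")
ORDO = Namespace("http://www.orpha.net/ORDO/")
EX = Namespace("https://example.org/registry/")  # placeholder registry namespace

g = Graph()
patient = EX["patient/123"]
diagnosis = EX["diagnosis/123-1"]

g.add((patient, RDF.type, SIO.SIO_000498))                  # person (illustrative)
g.add((diagnosis, SIO.SIO_000628, patient))                 # refers to (illustrative)
g.add((diagnosis, RDF.type, ORDO.Orphanet_558))             # example ORDO class
g.add((diagnosis, SIO.SIO_000300, Literal("2022-03-15")))   # has value: date of diagnosis

print(g.serialize(format="turtle"))
```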
Citations: 16
Transfer language space with similar domain adaptation: a case study with hepatocellular carcinoma.
IF 1.9, Tier 3 (Engineering & Technology), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY. Pub Date: 2022-02-23. DOI: 10.1186/s13326-022-00262-8
Amara Tariq, Omar Kallas, Patricia Balthazar, Scott Jeffery Lee, Terry Desser, Daniel Rubin, Judy Wawira Gichoya, Imon Banerjee

Background: Transfer learning is a common practice in image classification with deep learning, where the available data is often too limited to train a complex model with millions of parameters. However, transferring language models requires special attention since cross-domain vocabularies (e.g. between two different modalities such as MR and US) do not always overlap in the way that pixel intensity ranges largely do for images.

Method: We present a concept of similar domain adaptation where we transfer inter-institutional language models (context-dependent and context-independent) between two different modalities (ultrasound and MRI) to capture liver abnormalities.

Results: We use MR and US screening exam reports for hepatocellular carcinoma as the use case and apply the transfer language space strategy to automatically label imaging exams with and without a structured template, achieving an average F1-score above 0.9.

Conclusion: We conclude that transfer learning combined with fine-tuning of the discriminative model is often more effective for performing shared targeted tasks than training a language space from scratch.
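The fine-tuning step referred to in the conclusion can be sketched with the Hugging Face transformers API. This is not the authors' pipeline: the checkpoint name, toy reports, and labels are placeholders, and the paper transfers in-house, inter-institutional language models between MR and US reports rather than a generic public checkpoint.

```python
# Hedged sketch of fine-tuning a pretrained encoder to label imaging reports.
# Checkpoint, reports, and labels are placeholders, not the study's actual data.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")   # placeholder checkpoint
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2                            # e.g. lesion present / absent
)

reports = [
    "LIVER: 2.1 cm arterially enhancing observation with washout.",
    "LIVER: no focal lesion identified.",
]
labels = torch.tensor([1, 0])  # toy labels

inputs = tokenizer(reports, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**inputs, labels=labels)   # cross-entropy loss computed internally
outputs.loss.backward()
optimizer.step()
print(float(outputs.loss))
```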

Citations: 0
A multipurpose TNM stage ontology for cancer registries.
IF 1.9, Tier 3 (Engineering & Technology), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY. Pub Date: 2022-02-22. DOI: 10.1186/s13326-022-00260-w
Nicholas Charles Nicholson, Francesco Giusti, Manola Bettio, Raquel Negrao Carvalho, Nadya Dimitrova, Tadeusz Dyba, Manuela Flego, Luciana Neamtiu, Giorgia Randi, Carmen Martos

Background: Population-based cancer registries are a critical reference source for the surveillance and control of cancer. Cancer registries work extensively with the internationally recognised TNM classification system used to stage solid tumours, but the system is complex and compounded by the different TNM editions in concurrent use. TNM ontologies exist, but the design requirements of the clinical and cancer-registry domains differ. Two TNM ontologies developed specifically for cancer registries were designed for different purposes and have limitations that restrict wider application. A unified ontology is proposed to serve the various cancer registry TNM-related tasks and reduce the multiplication effects of different ontologies serving specific tasks. The ontology covers the full set of TNM edition 7 rules required by cancer registries and is designed on a modular basis to allow extension to other TNM editions.

Results: A unified ontology was developed building on the experience and design of the existing ontologies. It follows a modular approach that allows plugging in components specific to any particular TNM edition. A Java front-end was developed to interface with the ontology via the Web Ontology Language application programme interface and enables batch validation or classification of cancer registry records. The programme also provides a means of automated error correction in some instances. Initial tests verified the design concept by correctly inferring TNM stage and successfully handling the TNM-related validation checks on a number of cancer case records, with a performance similar to that of an existing ontology dedicated to the task.

Conclusions: The unified ontology provides a multi-purpose tool for TNM-related tasks in a cancer registry and is scalable for different editions of TNM. It offers a convenient way of quickly checking validity of cancer case stage information and for batch processing of multi-record data via a dedicated front-end programme. The ontology is adaptable to many uses, either as a standalone TNM module or as a component in applications of wider focus. It provides a first step towards a single, unified TNM ontology for cancer registries.
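The abstract describes a Java front-end built on the OWL API; the sketch below illustrates the same load-assert-reason pattern in Python with owlready2 instead, as a hedged approximation. The ontology file, class names (CancerCase, T2, N0, M0), and property names are hypothetical and are not taken from the published ontology.

```python
# Hedged sketch: load a TNM ontology, assert a case's T/N/M facts, and let a
# reasoner infer the stage group. All ontology entities below are hypothetical.
from owlready2 import get_ontology, sync_reasoner

onto = get_ontology("file://tnm_edition7.owl").load()   # placeholder ontology file

with onto:
    case = onto.CancerCase("case_001")                  # hypothetical class
    case.hasT = [onto.T2]                                # hypothetical T/N/M individuals
    case.hasN = [onto.N0]
    case.hasM = [onto.M0]
    sync_reasoner()                                      # run HermiT to classify the case

# After reasoning, the case should be a member of the inferred stage-group class
print(case.is_a)   # e.g. [tnm.StageGroupII] under the hypothetical model
```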

Citations: 1
Extending electronic medical records vector models with knowledge graphs to improve hospitalization prediction.
IF 1.9, Tier 3 (Engineering & Technology), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY. Pub Date: 2022-02-22. DOI: 10.1186/s13326-022-00261-9
Raphaël Gazzotti, Catherine Faron, Fabien Gandon, Virginie Lacroix-Hugues, David Darmon

Background: Artificial intelligence methods applied to electronic medical records (EMRs) hold the potential to help physicians save time by sharpening their analysis and decisions, thereby improving the health of patients. On the one hand, machine learning algorithms have proven their effectiveness in extracting information and exploiting knowledge extracted from data. On the other hand, knowledge graphs capture human knowledge by relying on conceptual schemas and formalization and supporting reasoning. Leveraging knowledge graphs that are legion in the medical field, it is possible to pre-process and enrich data representation used by machine learning algorithms. Medical data standardization is an opportunity to jointly exploit the richness of knowledge graphs and the capabilities of machine learning algorithms.

Methods: We propose to address the problem of hospitalization prediction for patients with an approach that enriches vector representation of EMRs with information extracted from different knowledge graphs before learning and predicting. In addition, we performed an automatic selection of features resulting from knowledge graphs to distinguish noisy ones from those that can benefit the decision making. We report the results of our experiments on the PRIMEGE PACA database that contains more than 600,000 consultations carried out by 17 general practitioners (GPs).

Results: A statistical evaluation shows that our proposed approach improves hospitalization prediction. More precisely, injecting features extracted from cross-domain knowledge graphs in the vector representation of EMRs given as input to the prediction algorithm significantly increases the F1 score of the prediction.

Conclusions: By injecting knowledge from recognized reference sources into the representation of EMRs, it is possible to significantly improve the prediction of medical events. Future work would be to evaluate the impact of a feature selection step coupled with a combination of features extracted from several knowledge graphs. A possible avenue is to study more hierarchical levels and properties related to concepts, as well as to integrate more semantic annotators to exploit unstructured data.
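A much-simplified sketch of the general idea, not the PRIMEGE PACA pipeline, is shown below: text features from EMR free text are concatenated with binary features derived from a knowledge graph, a feature-selection step filters noisy features, and a classifier predicts hospitalization. All data, concepts, and labels are toy placeholders.

```python
# Illustrative sketch: concatenate text features with knowledge-graph-derived
# features, select features, and fit a hospitalization classifier. Toy data only.
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

notes = ["chest pain radiating to left arm", "routine check-up, no complaints"]
# One column per knowledge-graph concept linked to the note (e.g. an ancestor of an
# annotated entity in a cross-domain knowledge graph) -- here two invented concepts.
kg_features = csr_matrix(np.array([[1, 0], [0, 1]]))
y = np.array([1, 0])  # 1 = hospitalized within the follow-up window (toy labels)

text_features = TfidfVectorizer().fit_transform(notes)
X = hstack([text_features, kg_features])

clf = make_pipeline(SelectKBest(chi2, k=5), LogisticRegression())
clf.fit(X, y)
print(clf.predict(X))
```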

Citations: 1
Text mining-based measurement of precision of polysomnographic reports as basis for intervention.
IF 1.9, Tier 3 (Engineering & Technology), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY. Pub Date: 2022-01-31. DOI: 10.1186/s13326-022-00259-3
Florent Baty, Jemima Hegermann, Tiziana Locatelli, Claudio Rüegg, Christian Gysin, Frank Rassouli, Martin Brutsche

Background: Text mining can be applied to automate knowledge extraction from unstructured data included in medical reports and generate quality indicators applicable for medical documentation. The primary objective of this study was to apply text mining methodology for the analysis of polysomnographic medical reports in order to quantify sources of variation - here the diagnostic precision vs. the inter-rater variability - in the work-up of sleep-disordered breathing. The secondary objective was to assess the impact of a text block standardization on the diagnostic precision of polysomnography reports in an independent test set.

Results: Polysomnography reports of 243 laboratory-based overnight sleep investigations scored by 9 trained sleep specialists of the Sleep Center St. Gallen were analyzed using a text-mining methodology. Patterns in the usage of discriminating terms allowed for the characterization of type and severity of disease and of inter-rater homogeneity. The variation introduced by inter-rater (technician/physician) heterogeneity was found to be twice as high as the variation introduced by effective diagnostic information. A simple text block standardization could significantly reduce the inter-rater variability by 44%, enhance the predictive value and ultimately improve the diagnostic accuracy of polysomnography reports.

Conclusions: Text mining was successfully used to assess and optimize the quality, as well as the precision and homogeneity of medical reporting of diagnostic procedures - here exemplified with sleep studies. Text mining methodology could lay the ground for objective and systematic qualitative assessment of medical reports.
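One conceivable way to quantify rater-driven versus diagnosis-driven variation in report wording is sketched below, using TF-IDF vectors and average pairwise cosine distances within groups. This is an assumption-laden illustration: the reports, raters, and diagnosis labels are invented, and the study's actual text-mining methodology may differ.

```python
# Illustrative sketch only: compare variability attributable to raters with
# variability attributable to the diagnosis itself. All data below is invented.
from itertools import combinations
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_distances

reports = [
    "severe obstructive sleep apnea with frequent desaturations",
    "obstructive sleep apnea, severe, marked oxygen desaturation",
    "mild sleep apnea, few respiratory events",
    "mild obstructive events, largely normal study",
]
raters = ["A", "B", "A", "B"]
diagnoses = ["severe_osa", "severe_osa", "mild_osa", "mild_osa"]

X = TfidfVectorizer().fit_transform(reports)
D = cosine_distances(X)

def mean_within_group_distance(labels):
    """Average pairwise distance between reports sharing the same label."""
    pairs = [
        D[i, j]
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    ]
    return sum(pairs) / len(pairs)

print("within-rater variation:    ", mean_within_group_distance(raters))
print("within-diagnosis variation:", mean_within_group_distance(diagnoses))
```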

Citations: 1
Ontology-based identification and prioritization of candidate drugs for epilepsy from literature.
IF 1.9, Tier 3 (Engineering & Technology), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY. Pub Date: 2022-01-24. DOI: 10.1186/s13326-021-00258-w
Bernd Müller, Leyla Jael Castro, Dietrich Rebholz-Schuhmann

Background: Drug repurposing can improve the return of investment as it finds new uses for existing drugs. Literature-based analyses exploit factual knowledge on drugs and diseases, e.g. from databases, and combine it with information from scholarly publications. Here we report the use of the Open Discovery Process on scientific literature to identify non-explicit ties between a disease, namely epilepsy, and known drugs, making full use of available epilepsy-specific ontologies.

Results: We identified characteristics of epilepsy-specific ontologies to create subsets of documents from the literature; from these subsets we generated ranked lists of co-occurring neurological drug names with varying specificity. From these ranked lists, we observed a high intersection regarding reference lists of pharmaceutical compounds recommended for the treatment of epilepsy. Furthermore, we performed a drug set enrichment analysis, i.e. a novel scoring function using an adaptive tuning parameter and comparing top-k ranked lists taking into account the varying length and the current position in the list. We also provide an overview of the pharmaceutical space in the context of epilepsy, including a final combined ranked list of more than 70 drug names.

Conclusions: Biomedical ontologies are a rich resource that can be combined with text mining for the identification of drug names for drug repurposing in the domain of epilepsy. The ranking of the drug names related to epilepsy provides benefits to patients and to researchers as it enables a quick evaluation of statistical evidence hidden in the scientific literature, useful to validate approaches in the drug discovery process.
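The co-occurrence ranking and top-k comparison described above can be illustrated with a toy example. The documents, drug lexicon, reference list, and the naive overlap score below are placeholders; the paper's drug set enrichment analysis uses an adaptive scoring function over full ranked lists rather than this simple intersection.

```python
# Toy sketch of the co-occurrence ranking idea: rank drugs by how many
# epilepsy-related documents mention them, then compare the top-k against a
# reference list. All documents and lists are invented placeholders.
from collections import Counter

documents = [
    "patients with refractory epilepsy responded to levetiracetam",
    "valproate and lamotrigine remain first-line options in epilepsy",
    "levetiracetam was well tolerated in focal epilepsy",
]
drug_lexicon = ["levetiracetam", "valproate", "lamotrigine", "carbamazepine"]
reference_list = ["levetiracetam", "valproate", "carbamazepine"]  # e.g. a guideline list

# Rank drugs by the number of documents mentioning them
counts = Counter(
    drug for doc in documents for drug in drug_lexicon if drug in doc.lower()
)
ranked = [drug for drug, _ in counts.most_common()]
print(ranked)  # ['levetiracetam', 'valproate', 'lamotrigine']

# Naive top-k overlap with the reference list
k = 3
overlap = len(set(ranked[:k]) & set(reference_list[:k])) / k
print(f"top-{k} overlap: {overlap:.2f}")  # 0.67
```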

Citations: 2