Journal of Biomedical Semantics: Latest Articles

Plural and Quantified Protagonists in Free Indirect Discourse and Protagonist Projection
IF 1.9 | Zone 3 (Engineering and Technology) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2023-04-03 | DOI: 10.1093/jos/ffad004
Márta Abrusán
In this paper I observe a number of new plural and (apparently) quantified examples of free indirect discourse (FID) and protagonist projection (PP). I analyse them within major current theoretical approaches, proposing extensions to these approaches where needed. In order to derive the wide range of readings observed with plural protagonists, I show how we can exploit existing mechanisms for the interpretation of plural anaphora and plural predication. The upshot is that the interpretation of plural examples of perspective shift relies on a remarkable concert of covert semantic and pragmatic operations.
Citations: 0
Are There Pluralities of Worlds?
IF 1.9 | Zone 3 (Engineering and Technology) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2023-04-03 | DOI: 10.1093/jos/ffad002
V. Schmitt
Indicative conditionals and configurations with neg-raising predicates have been brought up as potential candidates for constructions involving world pluralities. I argue against this hypothesis, showing that cumulativity and quantifiers targeting a plurality’s part structure cannot access the presumed world pluralities. I furthermore argue that this makes worlds special in the sense that the same tests provide evidence for pluralities in various other semantic domains.
Citations: 0
Copredication and Meaning Transfer
IF 1.9 | Zone 3 (Engineering and Technology) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2023-04-01 | DOI: 10.1093/jos/ffad001
David Liebesman, Ofra Magidor
Copredication occurs when a sentence receives a true reading despite prima facie ascribing categorically incompatible properties to a single entity. For example, ‘The red book is by Tolstoy’ can have a true reading even though it seems that being red is only a property of physical copies, while being by Tolstoy is only a property of informational texts.

A tempting strategy for resolving this tension is to claim that at least one of the predicates has a non-standard interpretation, with the salient proposal involving reinterpretation via meaning transfer. For example, in ‘The red book is by Tolstoy’, one could hold that the predicate ‘by Tolstoy’ is reinterpreted (or, on the more specific proposal, transferred) to ascribe a property that physical copies can uncontroversially instantiate, such as ‘expresses an informational text by Tolstoy’. On this view, the truth of the copredicational sentence is no longer mysterious. Furthermore, such a reinterpretation view can give a straightforward account of a range of puzzling copredicational sentences involving counting and individuation.

Despite these substantial virtues, we will argue that reinterpretation approaches to copredication are untenable. In §1 we introduce reinterpretation views of copredication and contrast them with key alternatives. In §2 we argue against a general reinterpretation theory of copredication on which every copredicational sentence contains at least one reinterpreted predicate. We also raise additional problems for the more specific proposal of implementing reinterpretation via meaning transfer. In §3 we argue against more limited appeals to reinterpretation on which only some copredicational sentences contain reinterpretation. In §4 we criticize a series of arguments in favour of reinterpretation theories. The upshot is that reinterpretation theories of copredication, and in particular meaning transfer-based accounts, should be rejected.
Citations: 0
The Environmental Conditions, Treatments, and Exposures Ontology (ECTO): connecting toxicology and exposure to human health and beyond.
IF 1.6 | Zone 3 (Engineering and Technology) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2023-02-24 | DOI: 10.1186/s13326-023-00283-x
Lauren E Chan, Anne E Thessen, William D Duncan, Nicolas Matentzoglu, Charles Schmitt, Cynthia J Grondin, Nicole Vasilevsky, Julie A McMurry, Peter N Robinson, Christopher J Mungall, Melissa A Haendel

Background: Evaluating the impact of environmental exposures on organism health is a key goal of modern biomedicine and is critically important in an age of greater pollution and chemicals in our environment. Environmental health utilizes many different research methods and generates a variety of data types. However, to date, no comprehensive database represents the full spectrum of environmental health data. Due to a lack of interoperability between databases, tools for integrating these resources are needed. In this manuscript we present the Environmental Conditions, Treatments, and Exposures Ontology (ECTO), a species-agnostic ontology focused on exposure events that occur as a result of natural and experimental processes, such as diet, work, or research activities. ECTO is intended for use in harmonizing environmental health data resources to support cross-study integration and inference for mechanism discovery.

Methods and findings: ECTO is an ontology designed for describing organismal exposures such as toxicological research, environmental variables, dietary features, and patient-reported data from surveys. ECTO utilizes the base model established within the Exposure Ontology (ExO). ECTO is developed using a combination of manual curation and Dead Simple OWL Design Patterns (DOSDP); it contains over 2700 environmental exposure terms and incorporates chemical and environmental ontologies. ECTO is an Open Biological and Biomedical Ontology (OBO) Foundry ontology that is designed for interoperability, reuse, and axiomatization with other ontologies. ECTO terms have been utilized in axioms within the Mondo Disease Ontology to represent diseases caused or influenced by environmental factors, as well as for survey encoding for the Personalized Environment and Genes Study (PEGS).
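ECTO is described as being built partly from Dead Simple OWL Design Patterns. As a rough illustration of the pattern idea only, the Python sketch below fills one hypothetical single-variable template ("exposure to {stressor}") with different fillers, producing a label, a text definition and a Manchester-syntax-like logical definition from the same template. The pattern contents, the class name 'exposure event' and the relation name 'has exposure stressor' are assumptions for illustration; this is not ECTO's actual pattern file and not the API of any DOSDP tool.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class ExposureTerm:
    label: str
    definition: str
    logical_definition: str  # Manchester-syntax-like text, illustrative only


# Hypothetical single-variable pattern in the spirit of a DOSDP template.
# The class and relation names are assumptions, not ECTO's real pattern.
EXPOSURE_PATTERN: Dict[str, str] = {
    "name": "exposure to {stressor}",
    "def": "An exposure event in which the exposure stressor is {stressor}.",
    "equivalentTo": "'exposure event' and ('has exposure stressor' some '{stressor}')",
}


def fill_pattern(pattern: Dict[str, str], stressor: str) -> ExposureTerm:
    """Substitute one filler value into every template string of the pattern."""
    return ExposureTerm(
        label=pattern["name"].format(stressor=stressor),
        definition=pattern["def"].format(stressor=stressor),
        logical_definition=pattern["equivalentTo"].format(stressor=stressor),
    )


def generate_terms(pattern: Dict[str, str], stressors: Iterable[str]) -> List[ExposureTerm]:
    """Generate one term per distinct filler, preserving first-seen order."""
    seen = set()
    terms = []
    for stressor in stressors:
        if stressor not in seen:
            seen.add(stressor)
            terms.append(fill_pattern(pattern, stressor))
    return terms


for term in generate_terms(EXPOSURE_PATTERN, ["arsenic", "ozone", "arsenic"]):
    print(term.label, "|", term.logical_definition)
```

Keeping the label, definition and axiom in a single template is what keeps pattern-generated terms mutually consistent; real DOSDP patterns are written in YAML and processed by dedicated tooling rather than by ad hoc code like this.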

Conclusions: We constructed ECTO to meet Open Biological and Biomedical Ontology (OBO) Foundry principles to increase translation opportunities between environmental health and other areas of biology. ECTO has a growing community of contributors consisting of toxicologists, public health epidemiologists, and health care providers to provide the necessary expertise for areas that have been identified previously as gaps.

Citations: 0
Focused NPIs in Statements and Questions
IF 1.9 | Zone 3 (Engineering and Technology) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2023-02-16 | DOI: 10.1093/jos/ffac014
Sunwoo Jeong, F. Roelofsen
Negative Polarity Items (NPIs) with emphatic prosody, such as ANY or EVER, and minimizers such as ‘lift a finger’ or ‘sleep a wink’ are known to generate particular contextual inferences that are absent in the case of non-emphatic NPIs such as unstressed ‘any’ or ‘ever’. It remains an open question, however, what the exact status of these inferences is and how they come about. In this paper, we analyze these cases as NPIs bearing focus, and examine the interaction between focus semantics and the lexical semantics of NPIs across statements and questions. In the process, we refine and expand the empirical landscape by demonstrating that focused NPIs give rise to a variety of apparently heterogeneous contextual inferences, including domain widening in statements and inferences of negative bias in questions. These inferences are further shown to be modulated in subtle ways depending on the specific clause type in which the NPI occurs (e.g., polar questions vs. wh-questions) and the type of emphatic NPI involved (e.g., ANY vs. ‘lift a finger’). Building on these empirical observations, we propose a unified account of NPIs which posits a single core semantic operator, ‘even’, across both focused and unfocused NPIs. What plays a central role in our account is the additive component of ‘even’, which we formulate in such a way that it applies uniformly across statements and questions. This additive component of ‘even’, intuitively paraphrased as the implication that all salient focus alternatives of the operator’s prejacent must be settled in the speaker’s doxastic state, is selectively activated depending on the presence of focus alternatives, and is shown to be able to derive all the observed contextual inferences stemming from focused NPIs, both in statements and in questions.
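One way to write the informal paraphrase of the additive component of ‘even’ as a formula is sketched below. It assumes p for the prejacent, ALT(p) for the set of its salient focus alternatives, and Dox_s for the speaker's doxastic state taken as a set of worlds; a proposition q counts as settled when the speaker's beliefs entail either q or its negation. This notation is an illustrative rendering of the paraphrase, not the authors' own formalization.

```latex
\[
\mathrm{ADD}_{\mathit{even}}(p) \;\equiv\;
\forall q \in \mathrm{ALT}(p) \setminus \{p\}:\;
\bigl(\mathrm{Dox}_s \subseteq q\bigr) \;\lor\; \bigl(\mathrm{Dox}_s \cap q = \varnothing\bigr)
\]
```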
Citations: 2
MedLexSp - a medical lexicon for Spanish medical natural language processing.
IF 1.9 | Zone 3 (Engineering and Technology) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2023-02-02 | DOI: 10.1186/s13326-022-00281-5
Leonardo Campillos-Llanos

Background: Medical lexicons enable the natural language processing (NLP) of health texts. Lexicons gather terms and concepts from thesauri and ontologies, and linguistic data for part-of-speech (PoS) tagging, lemmatization or natural language generation. To date, there is no such type of resource for Spanish.

Construction and content: This article describes a unified medical lexicon for Medical Natural Language Processing in Spanish. MedLexSp includes terms and inflected word forms with PoS information and Unified Medical Language System (UMLS) semantic types, semantic groups and Concept Unique Identifiers (CUIs). To create it, we used NLP techniques and domain corpora (e.g. MedlinePlus). We also collected terms from the Dictionary of Medical Terms of the Spanish Royal Academy of Medicine, the Medical Subject Headings (MeSH), the Systematized Nomenclature of Medicine - Clinical Terms (SNOMED-CT), the Medical Dictionary for Regulatory Activities Terminology (MedDRA), the International Classification of Diseases, version 10 (ICD-10), the Anatomical Therapeutic Chemical Classification, the National Cancer Institute (NCI) Dictionary, the Online Mendelian Inheritance in Man (OMIM) and OrphaData. Terms related to COVID-19 were assembled by applying a similarity-based approach with word embeddings trained on a large corpus. MedLexSp includes 100 887 lemmas, 302 543 inflected forms (conjugated verbs, and number/gender variants), and 42 958 UMLS CUIs. We report two use cases of MedLexSp. First, the lexicon was applied to pre-annotate a corpus of 1200 texts related to clinical trials. Second, it was used for PoS tagging and lemmatization of texts about clinical cases. MedLexSp improved the scores for PoS tagging and lemmatization compared to the default spaCy and Stanza Python libraries.
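The COVID-19 terms are said to come from a similarity-based approach over word embeddings. A minimal sketch of that general idea, assuming cosine similarity, a fixed threshold of 0.8 and toy three-dimensional vectors (all invented for illustration; these are not the MedLexSp embeddings or code): every non-seed vocabulary item is scored by its highest similarity to any seed term, and items at or above the threshold are kept as candidates.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_terms(
    embeddings: Dict[str, Vector], seeds: List[str], threshold: float = 0.8
) -> List[Tuple[str, float]]:
    """Return (word, score) for non-seed words whose highest similarity to any
    seed reaches the threshold, sorted from most to least similar."""
    seed_vectors = [embeddings[s] for s in seeds if s in embeddings]
    candidates = []
    for word, vector in embeddings.items():
        if word in seeds:
            continue
        # default=0.0 covers the case where no seed has an embedding
        best = max((cosine(vector, sv) for sv in seed_vectors), default=0.0)
        if best >= threshold:
            candidates.append((word, best))
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors, invented for illustration only.
toy_embeddings = {
    "covid-19": [0.9, 0.1, 0.0],
    "sars-cov-2": [0.85, 0.15, 0.05],
    "coronavirus": [0.8, 0.2, 0.1],
    "fractura": [0.0, 0.1, 0.95],
}
for word, score in expand_terms(toy_embeddings, ["covid-19"]):
    print(f"{word}\t{score:.3f}")
```

The threshold trades recall against precision; in a pipeline like this, candidates above it would typically still be reviewed before being added to a lexicon.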

Conclusions: The lexicon is distributed as a delimiter-separated value file; an XML file following the Lexical Markup Framework; a lemmatizer module for the spaCy and Stanza libraries; and complementary Lexical Record (LR) files. The embeddings and code to extract COVID-19 terms, and the spaCy and Stanza lemmatizers enriched with medical terms, are provided in a public repository.
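Because the lexicon ships as a delimiter-separated file alongside a lemmatizer module, the core lookup idea can be shown in plain Python. The sketch below parses rows in a hypothetical pipe-separated layout (inflected form | lemma | PoS) and lemmatizes tokens by dictionary lookup, falling back to the lower-cased token when a form is unknown. The column layout, sample rows and function names are assumptions for illustration; the real file format and the spaCy/Stanza integration are documented in the MedLexSp repository.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical sample in an assumed "form|lemma|pos" layout; the real
# MedLexSp file may use different columns and additional fields.
SAMPLE = """form|lemma|pos
tumores|tumor|NOUN
tumor|tumor|NOUN
inflamadas|inflamado|ADJ
sangraba|sangrar|VERB
"""


def load_lexicon(text: str, delimiter: str = "|") -> Dict[str, Tuple[str, str]]:
    """Map each lower-cased inflected form to its (lemma, PoS) pair."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return {row["form"].lower(): (row["lemma"], row["pos"]) for row in reader}


def lemmatize(tokens: List[str], lexicon: Dict[str, Tuple[str, str]]) -> List[str]:
    """Look each token up in the lexicon; unknown tokens fall back to lower case."""
    return [lexicon.get(tok.lower(), (tok.lower(), ""))[0] for tok in tokens]


lexicon = load_lexicon(SAMPLE)
print(lemmatize(["Tumores", "inflamadas", "ayer"], lexicon))
```

A real integration would also use the PoS tag to choose between lemmas when one form maps to several; this sketch keeps a single entry per form.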

Citations: 1
Classifying literature mentions of biological pathogens as experimentally studied using natural language processing.
IF 1.9 | Zone 3 (Engineering and Technology) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2023-01-31 | DOI: 10.1186/s13326-023-00282-y
Antonio Jose Jimeno Yepes, Karin Verspoor
Background: Information pertaining to mechanisms, management and treatment of disease-causing pathogens, including viruses and bacteria, is readily available from research publications indexed in MEDLINE. However, identifying the literature that specifically characterises these pathogens and their properties based on experimental research, which is important for understanding the molecular basis of diseases caused by these agents, requires sifting through a large number of articles to exclude incidental mentions of the pathogens, or references to pathogens in other non-experimental contexts such as public health.

Objective: In this work, we lay the foundations for the development of automatic methods for characterising mentions of pathogens in scientific literature, focusing on the task of identifying research that involves the experimental study of a pathogen. There are no manually annotated pathogen corpora available for this purpose, yet such resources are necessary to support the development of machine learning-based models. We therefore aim to fill this gap by producing a large data set automatically from MEDLINE under some simplifying assumptions about the task definition, and by using it to explore automatic methods that specifically support the detection of experimentally studied pathogen mentions in research publications.

Methods: We automatically developed a pathogen mention characterisation literature data set, READBiomed-Pathogens, using NCBI resources, and we make it available. Resources such as the NCBI Taxonomy, MeSH and GenBank can be used effectively to identify relevant literature about experimentally researched pathogens; more specifically, MeSH is used to link experimentally researched pathogens to MEDLINE citations, including their titles and abstracts. We experiment with several machine learning-based natural language processing (NLP) algorithms that use this data set as training data, to model the task of detecting papers that specifically describe the experimental study of a pathogen.

Results: We show that our data set READBiomed-Pathogens can be used to explore natural language processing configurations for experimental pathogen mention characterisation. READBiomed-Pathogens includes citations related to organisms including bacteria, viruses, and a small number of toxins and other disease-causing agents.

Conclusions: We studied the characterisation of experimentally studied pathogens in scientific literature, developing several natural language processing methods supported by an automatically developed data set. As a core contribution of the work, we presented a methodology to automatically construct a data set for pathogen identification using existing biomedical resources. The data set and the annotation code are made publicly available. Performance of the pathogen mention identification and characterisation algorithms was additionally evaluated on a small manually annotated data set, showing that the data set we generated allows characterising pathogens of interest.

Trial registration: Not applicable.
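The data set is built by using MeSH indexing to link MEDLINE citations to pathogens and is then used as training data for classifiers. The sketch below illustrates that two-step idea under strong simplifying assumptions: a citation gets a weak positive label if any of its MeSH headings is in a hypothetical set of pathogen headings, and a small multinomial Naive Bayes classifier over title-and-abstract words is trained on those labels. Naive Bayes is chosen only for brevity; the abstract does not name the algorithms used, and the headings, citations and function names are invented for illustration.

```python
import math
import re
from collections import Counter
from typing import List, Set

# Hypothetical pathogen headings used for distant labeling (illustrative only).
PATHOGEN_HEADINGS: Set[str] = {"Staphylococcus aureus", "Influenza A virus"}


def distant_label(mesh_headings: List[str]) -> int:
    """Weak label: 1 if the citation is indexed with any pathogen heading, else 0."""
    return int(any(h in PATHOGEN_HEADINGS for h in mesh_headings))


def tokenize(text: str) -> List[str]:
    """Lower-case and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, docs: List[str], labels: List[int]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        self.log_priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        return self

    def log_score(self, doc: str, c: int) -> float:
        """Log prior plus smoothed log likelihood; out-of-vocabulary words are ignored."""
        denom = self.totals[c] + len(self.vocab)
        return self.log_priors[c] + sum(
            math.log((self.counts[c][w] + 1) / denom)
            for w in tokenize(doc)
            if w in self.vocab
        )

    def predict(self, doc: str) -> int:
        return max(self.classes, key=lambda c: self.log_score(doc, c))


# Invented citations: (title/abstract text, MeSH headings).
citations = [
    ("Growth of Staphylococcus aureus strains in culture media", ["Staphylococcus aureus"]),
    ("Influenza A virus replication measured in cell culture", ["Influenza A virus"]),
    ("Vaccination coverage survey in a public health programme", ["Vaccination"]),
    ("Hospital policy and public health communication", ["Public Health"]),
]
docs = [text for text, _ in citations]
labels = [distant_label(headings) for _, headings in citations]
model = NaiveBayes().fit(docs, labels)
print(labels, model.predict("Staphylococcus aureus culture experiment"))
```

Labels derived from indexing in this way are only a proxy for manual annotation, which is why a small manually annotated set, like the one mentioned in the abstract, is useful for checking how well such a classifier actually performs.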
背景:有关致病病原体(包括病毒和细菌)的机制、管理和治疗的信息可以从MEDLINE上的研究出版物中轻易获得。然而,在实验研究的基础上确定具体表征这些病原体及其特性的文献,这对于理解这些病原体引起的疾病的分子基础很重要,需要筛选大量文章,以排除偶然提及病原体的情况,或在公共卫生等其他非实验环境中提及病原体。目的:在这项工作中,我们为开发科学文献中病原体提及的自动表征方法奠定了基础,重点是识别涉及在实验背景下对病原体进行实验研究的研究。目前还没有可用于此目的的手动注释病原体语料库,而这些资源对于支持基于机器学习的模型的开发是必要的。因此,我们的目标是填补这一空白,在任务定义的一些简化假设下,从MEDLINE自动生成一个大型数据集,并使用它来探索专门支持检测研究出版物中提及的实验研究病原体的自动方法。方法:我们使用我们提供的NCBI资源自动开发了一个病原体提及表征文献数据集——READBiomed病原体。NCBI分类法、MeSH和GenBank等资源可以有效地用于识别有关实验研究病原体的相关文献,更具体地说,使用MeSH链接到MEDLINE引文,包括实验研究病原体标题和摘要。我们实验了几种基于机器学习的自然语言处理(NLP)算法,利用这些数据集作为训练数据,对检测专门描述病原体实验研究的论文的任务进行建模。结果:我们表明,我们的数据集READBiomed病原体可用于探索实验病原体提及表征的自然语言处理配置。READBiomed病原体包括与生物体相关的引文,包括细菌、病毒、少量毒素和其他致病因子。结论:我们研究了科学文献中实验研究病原体的特征,开发了几种由自动开发的数据集支持的自然语言处理方法。作为这项工作的核心贡献,我们提出了一种利用现有生物医学资源自动构建病原体识别数据集的方法。数据集和注释代码是公开的。病原体提及识别和表征算法的性能在一个小的手动注释数据集上进行了额外评估,表明我们生成的数据集允许表征感兴趣的病原体。试用注册:不适用。
Journal of Biomedical Semantics, vol. 14(1), p. 1, published 2023-01-31. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9889128/pdf/
Citations: 2
We are not ready yet: limitations of state-of-the-art disease named entity recognizers.
IF 1.9 | CAS Zone 3 (Engineering & Technology) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2022-10-27 | DOI: 10.1186/s13326-022-00280-6
Lisa Kühnel, Juliane Fluck

Background: Intense research has been carried out in the area of biomedical natural language processing. Since the breakthrough of transfer learning-based methods, BERT models have been used in a variety of biomedical and clinical applications. On the available data sets, these models show excellent results, in some cases exceeding inter-annotator agreement. However, biomedical named entity recognition applied to COVID-19 preprints shows a drop in performance compared with the results on test data. This raises the question of how well trained models can predict on completely new data, i.e., how well they generalize.

Results: Using disease named entity recognition as an example, we investigate the robustness of different machine learning-based methods, including transfer learning, and show that current state-of-the-art methods work well on a given training set and its corresponding test set but show a significant lack of generalization when applied to new data.

Conclusions: We argue that larger annotated data sets are needed for training and testing. We therefore foresee the curation of further data sets and, in addition, the investigation of continual learning processes for machine learning-based models.
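A generalization gap like the one described above is typically quantified by scoring the same model on its original test split and on new data. The sketch below assumes strict entity-level matching (exact boundaries and label), one common convention; the abstract does not state which metric variant the authors used, and all spans are hypothetical.

```python
def entity_prf(gold, predicted):
    """Strict entity-level precision, recall and F1.

    gold, predicted: sets of (doc_id, start, end, label) tuples; a predicted
    entity counts as correct only if boundaries and label match exactly.
    """
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical annotations: one model scored on its in-domain test set
# and on a new corpus (e.g. preprints) it was never tuned on.
in_domain_gold = {("d1", 0, 8, "Disease"), ("d1", 20, 35, "Disease"), ("d2", 5, 12, "Disease")}
in_domain_pred = {("d1", 0, 8, "Disease"), ("d1", 20, 35, "Disease"), ("d2", 5, 12, "Disease")}
new_gold = {("p1", 3, 11, "Disease"), ("p1", 40, 48, "Disease"),
            ("p2", 0, 7, "Disease"), ("p2", 15, 30, "Disease")}
new_pred = {("p1", 3, 11, "Disease"), ("p2", 0, 9, "Disease")}  # second span has wrong boundaries

_, _, f1_in = entity_prf(in_domain_gold, in_domain_pred)
_, _, f1_new = entity_prf(new_gold, new_pred)
print(f"in-domain F1 = {f1_in:.2f}, new-data F1 = {f1_new:.2f}, drop = {f1_in - f1_new:.2f}")
# in-domain F1 = 1.00, new-data F1 = 0.33, drop = 0.67
```

Strict matching penalizes boundary errors as heavily as missed entities; lenient (overlap-based) matching would report a smaller drop on the same predictions.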

Journal of Biomedical Semantics, p. 26, published 2022-10-27. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9612606/pdf/
Citations: 5
A comprehensive update on CIDO: the community-based coronavirus infectious disease ontology.
IF 2 | CAS Zone 3 (Engineering & Technology) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2022-10-21 | DOI: 10.1186/s13326-022-00279-z
Yongqun He, Hong Yu, Anthony Huffman, Asiyah Yu Lin, Darren A Natale, John Beverley, Ling Zheng, Yehoshua Perl, Zhigang Wang, Yingtong Liu, Edison Ong, Yang Wang, Philip Huang, Long Tran, Jinyang Du, Zalan Shah, Easheta Shah, Roshan Desai, Hsin-Hui Huang, Yujia Tian, Eric Merrell, William D Duncan, Sivaram Arabandi, Lynn M Schriml, Jie Zheng, Anna Maria Masci, Liwei Wang, Hongfang Liu, Fatima Zohra Smaili, Robert Hoehndorf, Zoë May Pendlington, Paola Roncaglia, Xianwei Ye, Jiangan Xie, Yi-Wei Tang, Xiaolin Yang, Suyuan Peng, Luxia Zhang, Luonan Chen, Junguk Hur, Gilbert S Omenn, Brian Athey, Barry Smith

Background: The current COVID-19 pandemic and the previous SARS/MERS outbreaks of 2003 and 2012 have resulted in a series of major global public health crises. We argue that, in the interest of developing effective and safe vaccines and drugs and of better understanding coronaviruses and associated disease mechanisms, it is necessary to integrate the large and exponentially growing body of heterogeneous coronavirus data. Ontologies play an important role in standards-based knowledge and data representation, integration, sharing, and analysis. Accordingly, we initiated the development of the community-based Coronavirus Infectious Disease Ontology (CIDO) in early 2020.

Results: As an Open Biomedical Ontology (OBO) library ontology, CIDO is open source and interoperable with other existing OBO ontologies. CIDO is aligned with the Basic Formal Ontology and the Viral Infectious Disease Ontology. CIDO has imported terms from over 30 OBO ontologies. For example, CIDO imports all SARS-CoV-2 protein terms from the Protein Ontology, COVID-19-related phenotype terms from the Human Phenotype Ontology, and over 100 COVID-19 vaccine terms (covering both authorized vaccines and vaccines in clinical trials) from the Vaccine Ontology. CIDO systematically represents SARS-CoV-2 variants and over 300 amino acid substitutions in them, along with over 300 diagnostic kits and methods. CIDO also describes hundreds of host-coronavirus protein-protein interactions (PPIs) and the drugs that target the proteins in these PPIs. CIDO has been used to model COVID-19-related phenomena in areas such as epidemiology. The scope of CIDO was evaluated by visual analysis supported by a summarization network method. CIDO has been used in various applications such as term standardization, inference, natural language processing (NLP) and clinical data integration. We have applied the amino acid variant knowledge in CIDO to analyze differences between the SARS-CoV-2 Delta and Omicron variants. CIDO's integrated knowledge of host-coronavirus PPIs and drug targets has also been used to support drug repurposing for COVID-19 treatment.

Conclusion: CIDO represents entities and relations in the domain of coronavirus diseases with a special focus on COVID-19. It supports shared knowledge representation, data and metadata standardization and integration, and has been used in a range of applications.
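The Delta/Omicron analysis mentioned in the Results compares amino acid substitutions between variants. The sketch below does not query CIDO (no endpoint, term IRIs or query interface are assumed); it only illustrates the kind of comparison involved: parsing substitutions written in standard notation such as N501Y and grouping them by position. The two input lists are short, hand-typed subsets of widely reported spike substitutions, not CIDO content.

```python
import re

# Single-residue substitution in standard notation: reference residue,
# 1-based position, substituted residue (e.g. "N501Y").
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse(sub):
    """Return (position, reference, alternate) for a substitution string."""
    m = SUBSTITUTION.match(sub)
    if m is None:
        raise ValueError(f"not a single amino acid substitution: {sub!r}")
    ref, pos, alt = m.groups()
    return int(pos), ref, alt


def compare(variant_a, variant_b):
    """Classify each mutated position as shared, differing, or unique to one variant.

    Assumes at most one substitution per position in each input list.
    """
    a = {parse(s)[0]: s for s in variant_a}
    b = {parse(s)[0]: s for s in variant_b}
    result = {"shared": [], "different_residue": [], "only_a": [], "only_b": []}
    for pos in sorted(a.keys() | b.keys()):
        if pos in a and pos in b:
            if a[pos] == b[pos]:
                result["shared"].append(a[pos])
            else:
                result["different_residue"].append((a[pos], b[pos]))
        elif pos in a:
            result["only_a"].append(a[pos])
        else:
            result["only_b"].append(b[pos])
    return result


# Illustrative subsets of spike substitutions, typed in by hand (not from CIDO):
# Delta (B.1.617.2) vs Omicron (BA.1).
delta = ["L452R", "T478K", "D614G", "P681R"]
omicron = ["K417N", "T478K", "E484A", "N501Y", "D614G", "P681H"]

for key, value in compare(delta, omicron).items():
    print(key, value)
```

Grouping by position rather than by the full substitution string is what exposes cases such as P681R versus P681H, where both variants are mutated at the same site but to different residues.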

Journal of Biomedical Semantics, vol. 13(1), p. 25, published 2022-10-21. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9585694/pdf/
Citations: 0
Alignment of vaccine codes using an ontology of vaccine descriptions.
IF 1.9 | CAS Zone 3 (Engineering & Technology) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2022-10-18 | DOI: 10.1186/s13326-022-00278-0
Benedikt Fh Becker, Jan A Kors, Erik M van Mulligen, Miriam Cjm Sturkenboom

Background: Vaccine information in European electronic health record (EHR) databases is represented using various clinical and database-specific coding systems and drug vocabularies. This lack of harmonization poses a challenge to reusing EHR data in collaborative benefit-risk studies of vaccines.

Methods: We designed an ontology of the properties commonly used in vaccine descriptions, called the Ontology of Vaccine Descriptions (VaccO), together with a dictionary for analyzing multilingual vaccine descriptions. We implemented five algorithms for aligning vaccine coding systems, i.e., identifying corresponding codes in different coding systems, based on an analysis of the code descriptors. The algorithms were evaluated by comparing their results with manually created alignments in two reference sets that include clinical and database-specific coding systems with multilingual code descriptors.

Results: The best-performing algorithm represented code descriptors as logical statements about entities in the VaccO ontology and used an ontology reasoner to infer common properties and identify corresponding vaccine codes. The evaluation demonstrated excellent performance of the approach (F-scores of 0.91 and 0.96).

Conclusion: The VaccO ontology allows the identification, representation, and comparison of heterogeneous descriptions of vaccines. The automatic alignment of vaccine coding systems can accelerate the readiness of EHR databases in collaborative vaccine studies.
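The best-performing algorithm above describes each code with logical statements over VaccO and lets an ontology reasoner infer which codes correspond. As a deliberately simplified sketch, the Python below replaces reasoner inference with plain equality of (property, value) sets, which is a much weaker technique, and then scores the predicted pairs with the F-score used in the evaluation. All codes, properties and the reference alignment are hypothetical.

```python
from itertools import product

# Hypothetical codes from two coding systems, each already mapped to a set of
# (property, value) pairs in the spirit of an ontology of vaccine descriptions.
system_a = {
    "A1": frozenset({("target", "measles"), ("target", "mumps"), ("target", "rubella")}),
    "A2": frozenset({("target", "influenza"), ("valency", "quadrivalent")}),
    "A3": frozenset({("target", "hepatitis B")}),
}
system_b = {
    "B7": frozenset({("target", "rubella"), ("target", "measles"), ("target", "mumps")}),
    "B8": frozenset({("target", "influenza"), ("valency", "trivalent")}),
    "B9": frozenset({("target", "hepatitis B")}),
}


def align(codes_a, codes_b):
    """Pair codes with identical property sets (a crude stand-in for reasoner inference)."""
    return {
        (code_a, code_b)
        for (code_a, props_a), (code_b, props_b) in product(codes_a.items(), codes_b.items())
        if props_a == props_b
    }


def f_score(predicted, reference):
    """Balanced F-score of predicted code pairs against a reference alignment."""
    tp = len(predicted & reference)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(reference)
    return 2 * precision * recall / (precision + recall)


predicted = align(system_a, system_b)
reference = {("A1", "B7"), ("A3", "B9")}  # hypothetical manual alignment
print(sorted(predicted), f_score(predicted, reference))
# [('A1', 'B7'), ('A3', 'B9')] 1.0
```

A reasoner can additionally derive properties that are only implied by the ontology's axioms, which simple set equality cannot do.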

Journal of Biomedical Semantics, p. 24, published 2022-10-18. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9580193/pdf/
Citations: 0