A Lexical-based Formal Concept Analysis Method to Identify Missing Concepts in the NCI Thesaurus
Authors: Fengbo Zheng, Licong Cui
Published in: Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, 2020 (publication date 2020-12-01; Epub 2021-01-13)
DOI: https://doi.org/10.1109/bibm49941.2020.9313186
Biomedical terminologies are increasingly used in modern biomedical research and applications to facilitate data management and ensure semantic interoperability. As terminologies evolve, new concepts are regularly added in response to evolving domain knowledge and emerging applications. Most existing concept enrichment methods suggest new concepts by directly importing knowledge from external sources. In this paper, we introduce a lexical method based on formal concept analysis (FCA) that identifies potentially missing concepts in a given terminology by leveraging its intrinsic knowledge: concept names. We first construct the FCA formal context from the lexical features of concepts. We then perform multistage intersection to formalize new concepts and detect potentially missing ones. We applied our method to the Disease or Disorder sub-hierarchy of the National Cancer Institute (NCI) Thesaurus (version 19.08d) and identified a total of 8,983 potentially missing concepts. As a preliminary evaluation, we checked whether these potentially missing concepts were included in any external source terminology in the Unified Medical Language System (UMLS). The results showed that 592 out of 8,937 potentially missing concepts were found in the UMLS.
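The pipeline described in the abstract (formal context, multistage intersection, external lookup) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the lexical features are taken to be the lowercase words of each concept name, the function names `build_context`, `multistage_intersection`, and `found_in_external` are hypothetical, the concept names are toy examples, and the external check is a plain in-memory comparison standing in for a real UMLS lookup.

```python
def build_context(names):
    """Formal context: map each concept name to its attribute set.

    Assumption: the lexical features are simply the lowercase words
    of the concept name.
    """
    return {name: frozenset(name.lower().split()) for name in names}


def multistage_intersection(context):
    """Intersect attribute sets stage by stage until nothing new appears.

    Returns the non-empty attribute sets produced by intersection that do
    not belong to any existing concept (candidate missing concepts).
    """
    existing = set(context.values())
    known = set(existing)
    frontier = set(existing)
    while frontier:
        new = set()
        for a in frontier:
            for b in known:
                common = a & b
                if common and common not in known:
                    new.add(common)
        known |= new
        frontier = new  # the next stage only needs the newly formed sets
    return known - existing


def found_in_external(candidates, external_names):
    """Hypothetical stand-in for the UMLS check: keep candidates whose word
    set equals the word set of some name in an external list."""
    external = {frozenset(n.lower().split()) for n in external_names}
    return {c for c in candidates if c in external}


# Toy concept names, chosen only to illustrate the mechanics.
names = [
    "Lung Carcinoma",
    "Lung Adenocarcinoma",
    "Metastatic Lung Carcinoma",
    "Metastatic Lung Adenocarcinoma",
    "Metastatic Breast Carcinoma",
]
candidates = multistage_intersection(build_context(names))
validated = found_in_external(candidates, ["Metastatic Carcinoma", "Carcinoma"])

for words in sorted(candidates, key=sorted):
    print(" ".join(sorted(words)), "(found externally)" if words in validated else "")
```

In this toy run, intersecting "Metastatic Lung Carcinoma" with "Metastatic Breast Carcinoma" yields the word set {metastatic, carcinoma}, which no existing concept has, so it is reported as a candidate. A sketch this simple also reports bare word sets such as {lung}, which is one reason a validation step like the external lookup is useful.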