Learning Effective Distributed Representation of Complex Biomedical Concepts

Khai Nguyen, R. Ichise
Published in: 2018 IEEE 18th International Conference on Bioinformatics and Bioengineering (BIBE), 2018-10-01
DOI: 10.1109/BIBE.2018.00073 (https://doi.org/10.1109/BIBE.2018.00073)
Citations: 1

Abstract

Word embedding is the state-of-the-art representation for capturing the semantic information of terms. It benefits a wide range of natural language processing and related applications, not only in general fields of artificial intelligence but also in bioinformatics. Although recent efforts to represent medical concepts with word embeddings have produced remarkable analyses, many essential problems remain unsolved. Examples include the representation of complex concepts (i.e., those formed by multiple tokens), the use of a large corpus to maximize the number of trainable concepts, and downstream analyses on a biomedical dataset. Our study focused on training effective representations for biomedical concepts, including complex ones. We used an efficient technique to index all possible concepts of the Unified Medical Language System (UMLS) thesaurus in a huge corpus of 15.4 billion tokens. In this way, we obtained vector representations for more than 650,000 concepts, the largest such resource reported to date. Furthermore, evaluations of the trained vectors on a retrieval task show superior performance compared to recent studies.
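The abstract does not describe the indexing technique itself, so the following is only a minimal sketch of one common way to make multi-token concepts trainable: a greedy longest-match pass that replaces each concept mention in the tokenized corpus with a single concept token before an embedding model is trained. The function names, the toy concept table, and the concept IDs are all hypothetical; they are not real UMLS identifiers and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical toy thesaurus: concept ID -> surface forms (not real UMLS CUIs).
CONCEPTS: Dict[str, List[str]] = {
    "C_HEART_ATTACK": ["heart attack", "myocardial infarction"],
    "C_ASPIRIN": ["aspirin"],
}

Index = Dict[Tuple[str, ...], str]


def build_index(concepts: Dict[str, List[str]]) -> Tuple[Index, int]:
    """Map each lowercased, tokenized surface form to its concept ID.

    Also returns the length (in tokens) of the longest surface form,
    which bounds the window size needed during matching.
    """
    index: Index = {}
    max_len = 1
    for concept_id, names in concepts.items():
        for name in names:
            key = tuple(name.lower().split())
            if key:
                index[key] = concept_id
                max_len = max(max_len, len(key))
    return index, max_len


def tag_concepts(tokens: List[str], index: Index, max_len: int) -> List[str]:
    """Replace concept mentions with concept IDs, preferring the longest match."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate window first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in index:
                out.append(index[key])
                i += n
                matched = True
                break
        if not matched:
            out.append(tokens[i])
            i += 1
    return out


index, max_len = build_index(CONCEPTS)
sentence = "Aspirin lowers the risk of myocardial infarction".split()
tagged = tag_concepts(sentence, index, max_len)
print(tagged)
```

After this pass, a multi-token mention such as "myocardial infarction" appears in the corpus as one token, so any standard word embedding trainer would learn a single vector for it. At the scale the abstract reports (15.4 billion tokens), a trie or similar structure would likely be needed in place of the per-position window lookup used here.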