Automatic Labeling for Gene-Disease Associations through Distant Supervision

Fei Teng, Meng Bai, Tian-Jie Li
2019 IEEE 14th International Conference on Intelligent Systems and Knowledge Engineering (ISKE), November 2019
DOI: 10.1109/ISKE47853.2019.9170268
Citations: 1

Abstract

Associating genes with diseases is a fundamental challenge in human health, with applications in understanding disease properties and developing precision medicine. Over the past decades, the number of biomedical articles has grown explosively, and these articles contain a great number of gene-disease associations (GDAs). Association extraction requires a highly accurate annotated corpus, but manual labeling is time-consuming and labor-intensive. This paper proposes a distant supervision-based method to automatically label a corpus for GDA extraction. Compared with the manually annotated gold corpus, the automatically labeled corpus is much larger and of better quality. Training on it improves the performance of state-of-the-art extraction models, which reach an AUC of 0.96 and an F1 score of 90%. To the best of our knowledge, this is the first study on automatically labeling GDAs in the field of precision medicine. Using the new corpora, we extracted GDAs from 115,261 PubMed abstracts on 29 types of lung cancer and discovered 296 new genes/proteins related to lung cancer. These findings indicate new directions for drug design.
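The abstract does not spell out the authors' labeling rules. As a rough illustration only, the core distant supervision assumption (a sentence that mentions a gene and a disease already paired in an existing knowledge base is taken as a positive example, and other co-mentioned pairs as negatives) can be sketched as follows. The `Sentence` and `LabeledPair` types, the `distant_label` function, the knowledge-base format, the upstream entity-recognition step, and the placeholder names `GENE_A`, `GENE_B`, and `DISEASE_X` are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Sentence:
    """A sentence with entity mentions, assumed to come from an upstream NER step (hypothetical)."""
    text: str
    genes: frozenset[str]
    diseases: frozenset[str]


@dataclass(frozen=True)
class LabeledPair:
    """One gene-disease candidate pair and its distantly supervised label."""
    text: str
    gene: str
    disease: str
    label: int  # 1 = pair found in the knowledge base, 0 = not found


def distant_label(sentences: Iterable[Sentence],
                  known_gdas: set[tuple[str, str]]) -> list[LabeledPair]:
    """Label every gene-disease co-mention by looking it up in a knowledge base.

    Illustrative sketch of the generic distant supervision heuristic, not the
    paper's method: co-occurrence plus a knowledge-base hit yields a positive label.
    """
    labeled: list[LabeledPair] = []
    for s in sentences:
        # Enumerate all gene x disease pairs mentioned in the same sentence;
        # sorting keeps the output order deterministic.
        for gene in sorted(s.genes):
            for disease in sorted(s.diseases):
                label = 1 if (gene, disease) in known_gdas else 0
                labeled.append(LabeledPair(s.text, gene, disease, label))
    return labeled


if __name__ == "__main__":
    # Toy knowledge base and sentence with placeholder entity names.
    kb = {("GENE_A", "DISEASE_X")}
    sent = Sentence(
        "GENE_A, but not GENE_B, is overexpressed in DISEASE_X.",
        frozenset({"GENE_A", "GENE_B"}),
        frozenset({"DISEASE_X"}),
    )
    for pair in distant_label([sent], kb):
        print(pair.gene, pair.disease, pair.label)
```

This rule is known to be noisy: a co-mention does not guarantee that the sentence actually expresses the association, and true associations missing from the knowledge base become false negatives. Practical pipelines usually add filtering or denoising on top of it; the abstract does not say which, if any, the authors use.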