Multiple sampling schemes and deep learning improve active learning performance in drug-drug interaction information retrieval analysis from the literature.

IF 1.6 3区 工程技术 Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Journal of Biomedical Semantics Pub Date : 2023-05-30 DOI:10.1186/s13326-023-00287-7
Weixin Xie, Kunjie Fan, Shijun Zhang, Lang Li
{"title":"Multiple sampling schemes and deep learning improve active learning performance in drug-drug interaction information retrieval analysis from the literature.","authors":"Weixin Xie, Kunjie Fan, Shijun Zhang, Lang Li","doi":"10.1186/s13326-023-00287-7","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Drug-drug interaction (DDI) information retrieval (IR) is an important natural language process (NLP) task from the PubMed literature. For the first time, active learning (AL) is studied in DDI IR analysis. DDI IR analysis from PubMed abstracts faces the challenges of relatively small positive DDI samples among overwhelmingly large negative samples. Random negative sampling and positive sampling are purposely designed to improve the efficiency of AL analysis. The consistency of random negative sampling and positive sampling is shown in the paper.</p><p><strong>Results: </strong>PubMed abstracts are divided into two pools. Screened pool contains all abstracts that pass the DDI keywords query in PubMed, while unscreened pool includes all the other abstracts. At a prespecified recall rate of 0.95, DDI IR analysis precision is evaluated and compared. In screened pool IR analysis using supporting vector machine (SVM), similarity sampling plus uncertainty sampling improves the precision over uncertainty sampling, from 0.89 to 0.92 respectively. In the unscreened pool IR analysis, the integrated random negative sampling, positive sampling, and similarity sampling improve the precision over uncertainty sampling along, from 0.72 to 0.81 respectively. When we change the SVM to a deep learning method, all sampling schemes consistently improve DDI AL analysis in both screened pool and unscreened pool. Deep learning has significant improvement of precision over SVM, 0.96 vs. 0.92 in screened pool, and 0.90 vs. 0.81 in the unscreened pool, respectively.</p><p><strong>Conclusions: </strong>By integrating various sampling schemes and deep learning algorithms into AL, the DDI IR analysis from literature is significantly improved. The random negative sampling and positive sampling are highly effective methods in improving AL analysis where the positive and negative samples are extremely imbalanced.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"14 1","pages":"5"},"PeriodicalIF":1.6000,"publicationDate":"2023-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10228061/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Biomedical Semantics","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1186/s13326-023-00287-7","RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

Background: Drug-drug interaction (DDI) information retrieval (IR) is an important natural language process (NLP) task from the PubMed literature. For the first time, active learning (AL) is studied in DDI IR analysis. DDI IR analysis from PubMed abstracts faces the challenges of relatively small positive DDI samples among overwhelmingly large negative samples. Random negative sampling and positive sampling are purposely designed to improve the efficiency of AL analysis. The consistency of random negative sampling and positive sampling is shown in the paper.

Results: PubMed abstracts are divided into two pools. Screened pool contains all abstracts that pass the DDI keywords query in PubMed, while unscreened pool includes all the other abstracts. At a prespecified recall rate of 0.95, DDI IR analysis precision is evaluated and compared. In screened pool IR analysis using supporting vector machine (SVM), similarity sampling plus uncertainty sampling improves the precision over uncertainty sampling, from 0.89 to 0.92 respectively. In the unscreened pool IR analysis, the integrated random negative sampling, positive sampling, and similarity sampling improve the precision over uncertainty sampling along, from 0.72 to 0.81 respectively. When we change the SVM to a deep learning method, all sampling schemes consistently improve DDI AL analysis in both screened pool and unscreened pool. Deep learning has significant improvement of precision over SVM, 0.96 vs. 0.92 in screened pool, and 0.90 vs. 0.81 in the unscreened pool, respectively.

Conclusions: By integrating various sampling schemes and deep learning algorithms into AL, the DDI IR analysis from literature is significantly improved. The random negative sampling and positive sampling are highly effective methods in improving AL analysis where the positive and negative samples are extremely imbalanced.

Abstract Image

Abstract Image

Abstract Image

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
多重采样方案和深度学习提高了文献中药物相互作用信息检索分析的主动学习性能。
背景:药物相互作用(DDI)信息检索(IR)是从 PubMed 文献中提取的一项重要的自然语言处理(NLP)任务。在 DDI IR 分析中首次研究了主动学习(AL)。从 PubMed 摘要中进行 DDI IR 分析面临的挑战是,在大量的阴性样本中,DDI 阳性样本相对较少。为了提高 AL 分析的效率,特意设计了随机阴性采样和阳性采样。文中展示了随机阴性取样和阳性取样的一致性:PubMed 摘要分为两个池。筛选池包含所有通过 PubMed DDI 关键词查询的摘要,而未筛选池包含所有其他摘要。在预设召回率为 0.95 的条件下,对 DDI IR 分析的精确度进行评估和比较。在使用支持向量机(SVM)进行的筛选池 IR 分析中,相似性采样加不确定性采样比不确定性采样提高了精确度,分别从 0.89 提高到 0.92。在非筛选池红外分析中,综合随机负采样、正采样和相似性采样比不确定性采样的精度分别从 0.72 提高到 0.81。当我们将 SVM 改为深度学习方法时,所有采样方案在筛选池和非筛选池中都能持续改进 DDI AL 分析。深度学习比 SVM 的精确度有明显提高,在筛选池中分别为 0.96 对 0.92,在未筛选池中分别为 0.90 对 0.81:通过将各种采样方案和深度学习算法整合到 AL 中,文献中的 DDI IR 分析得到了显著改善。在正负样本极不平衡的情况下,随机负向采样和正向采样是改进 AL 分析的高效方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Journal of Biomedical Semantics
Journal of Biomedical Semantics MATHEMATICAL & COMPUTATIONAL BIOLOGY-
CiteScore
4.20
自引率
5.30%
发文量
28
审稿时长
30 weeks
期刊介绍: Journal of Biomedical Semantics addresses issues of semantic enrichment and semantic processing in the biomedical domain. The scope of the journal covers two main areas: Infrastructure for biomedical semantics: focusing on semantic resources and repositories, meta-data management and resource description, knowledge representation and semantic frameworks, the Biomedical Semantic Web, and semantic interoperability. Semantic mining, annotation, and analysis: focusing on approaches and applications of semantic resources; and tools for investigation, reasoning, prediction, and discoveries in biomedicine.
期刊最新文献
Dynamic Retrieval Augmented Generation of Ontologies using Artificial Intelligence (DRAGON-AI). MeSH2Matrix: combining MeSH keywords and machine learning for biomedical relation classification based on PubMed. Annotation of epilepsy clinic letters for natural language processing An extensible and unifying approach to retrospective clinical data modeling: the BrainTeaser Ontology. Concretizing plan specifications as realizables within the OBO foundry.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1