Semi-automated literature mining to identify putative biomarkers of disease from multiple biofluids.

Journal of Clinical Bioinformatics. Pub Date: 2014-10-23. eCollection Date: 2014-01-01. DOI: 10.1186/2043-9113-4-13
Rick Jordan, Shyam Visweswaran, Vanathi Gopalakrishnan
Citations: 6

Abstract

Background: Computational methods for mining the biomedical literature can augment manual keyword searches when looking for disease-specific biomarkers in biofluids. In this work, we develop and apply a semi-automated literature mining method to abstracts obtained from PubMed in order to discover putative biomarkers of breast and lung cancers in specific biofluids.

Methodology: A positive set of abstracts was defined by the terms 'breast cancer' and 'lung cancer' in conjunction with 14 separate 'biofluids' (bile, blood, breastmilk, cerebrospinal fluid, mucus, plasma, saliva, semen, serum, synovial fluid, stool, sweat, tears, and urine), while a negative set of abstracts was defined by the terms '(biofluid) NOT breast cancer' or '(biofluid) NOT lung cancer.' More than 5.3 million total abstracts were obtained from PubMed and examined for biomarker-disease-biofluid associations (34,296 positive and 2,653,396 negative for breast cancer; 28,355 positive and 2,595,034 negative for lung cancer). Biological entities such as genes and proteins were tagged using ABNER, and processed using Python scripts to produce a list of putative biomarkers. Z-scores were calculated, ranked, and used to determine significance of putative biomarkers found. Manual verification of relevant abstracts was performed to assess our method's performance.
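The query construction and z-score ranking described above can be sketched as follows. This is a minimal illustration, not the authors' scripts: the abstract does not give the z-score formula, so a pooled two-proportion z-test is assumed here, and the function names (`build_queries`, `two_proportion_z`, `rank_candidates`) and the marker counts in the usage note are hypothetical.

```python
import math

# The 14 biofluids listed in the Methodology paragraph.
BIOFLUIDS = [
    "bile", "blood", "breastmilk", "cerebrospinal fluid", "mucus", "plasma",
    "saliva", "semen", "serum", "synovial fluid", "stool", "sweat", "tears",
    "urine",
]


def build_queries(disease, biofluids=BIOFLUIDS):
    """Return {biofluid: (positive_query, negative_query)} search strings.

    Positive: disease AND biofluid. Negative: biofluid NOT disease.
    The exact quoting is an assumption; the source only names the terms.
    """
    queries = {}
    for fluid in biofluids:
        positive = f'"{disease}" AND "{fluid}"'
        negative = f'"{fluid}" NOT "{disease}"'
        queries[fluid] = (positive, negative)
    return queries


def two_proportion_z(pos_hits, pos_total, neg_hits, neg_total):
    """Pooled two-proportion z-score (assumed statistic, not from the source).

    Compares the fraction of positive-set abstracts mentioning an entity
    with the fraction of negative-set abstracts mentioning it.
    """
    p_pos = pos_hits / pos_total
    p_neg = neg_hits / neg_total
    pooled = (pos_hits + neg_hits) / (pos_total + neg_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / pos_total + 1 / neg_total))
    if se == 0:
        return 0.0  # entity never (or always) mentioned: no evidence either way
    return (p_pos - p_neg) / se


def rank_candidates(counts, pos_total, neg_total):
    """Sort {entity: (pos_hits, neg_hits)} by z-score, highest first."""
    scored = [
        (name, two_proportion_z(pos, pos_total, neg, neg_total))
        for name, (pos, neg) in counts.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For example, with the breast cancer set sizes reported above (34,296 positive and 2,653,396 negative abstracts) and made-up mention counts, `rank_candidates({"MARKER_A": (120, 40), "MARKER_B": (300, 25000)}, 34296, 2653396)` places `MARKER_A` first, because it is strongly enriched in the positive set while `MARKER_B` is mentioned at a similar rate in both.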

Results: Biofluid-specific markers were identified from the literature, assigned relevance scores based on frequency of occurrence, and validated against known biomarker lists and databases for lung and breast cancer [NCBI's On-line Mendelian Inheritance in Man (OMIM), Cancer Gene annotation server for cancer genomics (CAGE), NCBI's Genes & Disease, NCI's Early Detection Research Network (EDRN), and others]. The specificity of each marker for a given biofluid was calculated, and the performance of our semi-automated literature mining method was assessed for breast and lung cancer.
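One way to compute a per-biofluid specificity and to check candidates against a known biomarker list is sketched below. The abstract does not define either calculation, so both are assumptions: specificity is taken as the share of a marker's mentions that fall in a given biofluid, and validation is taken as precision and recall against a reference set. All names and inputs are hypothetical.

```python
from collections import Counter


def biofluid_specificity(mentions):
    """Return {marker: {biofluid: fraction of that marker's mentions}}.

    `mentions` is an iterable of (marker, biofluid) pairs, one pair per
    positive-set abstract in which the marker was tagged.
    """
    per_marker = {}
    for marker, fluid in mentions:
        per_marker.setdefault(marker, Counter())[fluid] += 1
    result = {}
    for marker, counts in per_marker.items():
        total = sum(counts.values())
        result[marker] = {fluid: n / total for fluid, n in counts.items()}
    return result


def validate_against_known(candidates, known):
    """Return (precision, recall) of a candidate list against a known list.

    Names are compared case-insensitively; real marker names would also
    need synonym handling, which is omitted here.
    """
    cand = {c.lower() for c in candidates}
    ref = {k.lower() for k in known}
    hits = cand & ref
    precision = len(hits) / len(cand) if cand else 0.0
    recall = len(hits) / len(ref) if ref else 0.0
    return precision, recall
```

A marker whose fraction for one biofluid is close to 1.0 would be reported mostly in that fluid, while a marker spread evenly across fluids would have low values everywhere.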

Conclusions: We developed a semi-automated process for determining a list of putative biomarkers for breast and lung cancer. New knowledge is presented in the form of biomarker lists; ranked, newly discovered biomarker-disease-biofluid relationships; and biomarker specificity across biofluids.
