Comparison of false-discovery rates of various decoy databases.

IF 2.1 | Zone 3 (Biology) | Q3 (Biochemical Research Methods) | Proteome Science 19(1):11 | Pub Date: 2021-09-18 | DOI: 10.1186/s12953-021-00179-7
Sangjeong Lee, Heejin Park, Hyunwoo Kim
Citations: 4

Abstract

Background: The target-decoy strategy effectively estimates the false-discovery rate (FDR) by creating a decoy database of the same size as the target database. Decoy databases are created by various methods, such as the reverse, pseudo-reverse, shuffle, pseudo-shuffle, and de Bruijn methods. The FDR is sometimes over- or underestimated depending on which decoy database is used, because the ratio of redundant peptides differs among target databases; that is, the numbers of unique (non-redundant) peptides in the target and decoy databases differ.
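
To make the construction methods concrete, the following is a minimal Python sketch of four of them; it is not the authors' implementation. It assumes a simplified trypsin rule (cleave after every K or R, no proline exception, no missed cleavages) and assumes that the pseudo variants transform each tryptic peptide while leaving its C-terminal K or R in place. All function names are hypothetical, and the de Bruijn method is omitted.

```python
import random


def digest(protein):
    """Simplified tryptic digest (assumption): cut after every K or R."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        if residue in "KR":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return peptides


def _transform_keep_cterm(peptide, transform):
    """Apply transform to a peptide but keep a C-terminal K/R in place."""
    if peptide[-1] in "KR":
        return transform(peptide[:-1]) + peptide[-1]
    return transform(peptide)


def _shuffled(sequence, rng):
    residues = list(sequence)
    rng.shuffle(residues)
    return "".join(residues)


def reverse_decoy(protein):
    """Reverse the whole protein sequence."""
    return protein[::-1]


def pseudo_reverse_decoy(protein):
    """Reverse each tryptic peptide, keeping its cleavage residue fixed."""
    return "".join(
        _transform_keep_cterm(p, lambda s: s[::-1]) for p in digest(protein)
    )


def shuffle_decoy(protein, rng):
    """Randomly permute the whole protein sequence."""
    return _shuffled(protein, rng)


def pseudo_shuffle_decoy(protein, rng):
    """Shuffle each tryptic peptide, keeping its cleavage residue fixed."""
    return "".join(
        _transform_keep_cterm(p, lambda s: _shuffled(s, rng))
        for p in digest(protein)
    )


def unique_peptides(proteins):
    """Set of distinct peptides obtained by digesting every protein."""
    return {peptide for protein in proteins for peptide in digest(protein)}
```

In this sketch, pseudo-reverse is deterministic, so identical target peptides map to identical decoy peptides and the number of unique peptides is preserved. The shuffle variants permute each occurrence independently, so repeated target peptides generally yield different decoy peptides, which is one way the unique-peptide counts of target and decoy databases can diverge. Comparing `len(unique_peptides(targets))` with the same count on the generated decoys shows the effect on a given database.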

Results: We used two protein databases (the UniProt Saccharomyces cerevisiae protein database and the UniProt human protein database) to compare the FDRs of various decoy databases. When the ratio of redundant peptides in the target database is low, the FDR is not overestimated by any decoy construction method. However, if the ratio of redundant peptides in the target database is high, the FDR is overestimated when the (pseudo) shuffle decoy database is used. Additionally, the human and S. cerevisiae six-frame translation databases, which are large databases, showed outcomes similar to those from the UniProt human protein database.

Conclusion: The FDR must be estimated using the correction factor proposed by Elias and Gygi or that by Kim et al. when (pseudo) shuffle decoy databases are used.
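
The exact correction factors of Elias and Gygi and of Kim et al. are not given in this abstract. As a rough illustration only, the sketch below assumes a correction of the form FDR ≈ c · D / T, where D and T are the numbers of decoy and target hits above a score threshold and c is a hypothetical ratio of unique target peptides to unique decoy peptides. The function name and its parameters are assumptions made for this example.

```python
def estimate_fdr(n_target_hits, n_decoy_hits,
                 n_unique_target=None, n_unique_decoy=None):
    """Target-decoy FDR estimate with an optional size correction.

    Without the unique-peptide counts this is the plain ratio D / T.
    With them, decoy hits are rescaled by n_unique_target / n_unique_decoy,
    an assumed stand-in for the published correction factors.
    """
    if n_target_hits <= 0:
        raise ValueError("n_target_hits must be positive")
    factor = 1.0
    if n_unique_target is not None and n_unique_decoy is not None:
        if n_unique_decoy <= 0:
            raise ValueError("n_unique_decoy must be positive")
        factor = n_unique_target / n_unique_decoy
    # Clamp to 1.0 because an FDR above 100% is not meaningful.
    return min(1.0, factor * n_decoy_hits / n_target_hits)
```

Under this assumed form, a decoy database with more unique peptides than the target gives c < 1, which lowers the estimate and offsets the kind of overestimation reported for (pseudo) shuffle decoys.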
