Comparison of false-discovery rates of various decoy databases.

Proteome Science (IF 1.6; Region 3, Biology; JCR Q3, Biochemical Research Methods) Pub Date: 2021-09-18 DOI: 10.1186/s12953-021-00179-7

Sangjeong Lee, Heejin Park, Hyunwoo Kim

Proteome Science, vol. 19, no. 1, p. 11. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8449453/pdf/

Citations: 4

Abstract

Background: The target-decoy strategy effectively estimates the false-discovery rate (FDR) by creating a decoy database of the same size as the target database. Decoy databases are created by various methods, such as the reverse, pseudo-reverse, shuffle, pseudo-shuffle, and de Bruijn methods. The FDR is sometimes over- or underestimated depending on which decoy database is used, because the ratio of redundant peptides differs between target databases; that is, the numbers of unique (non-redundant) peptides in the target and decoy databases differ.
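The decoy construction methods named above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a simplified tryptic digestion (cleave after every K or R, no proline rule, no missed cleavages), it takes "pseudo" to mean that the operation is applied within each peptide while the C-terminal K/R stays in place, and it omits the de Bruijn method. All function names are hypothetical.

```python
import random
import re


def tryptic_peptides(protein):
    """Split after every K or R (simplified: no proline rule, no missed cleavages)."""
    return re.findall(r"[^KR]*[KR]|[^KR]+", protein)


def _split_tail(peptide):
    """Separate a C-terminal K/R (the cleavage residue) from the rest of the peptide."""
    if peptide[-1] in "KR":
        return peptide[:-1], peptide[-1]
    return peptide, ""


def reverse_decoy(protein):
    """Reverse the whole protein sequence."""
    return protein[::-1]


def pseudo_reverse_decoy(protein):
    """Reverse each peptide but keep its C-terminal K/R in place (assumed definition)."""
    parts = []
    for pep in tryptic_peptides(protein):
        body, tail = _split_tail(pep)
        parts.append(body[::-1] + tail)
    return "".join(parts)


def shuffle_decoy(protein, rng):
    """Randomly permute all residues of the protein."""
    residues = list(protein)
    rng.shuffle(residues)
    return "".join(residues)


def pseudo_shuffle_decoy(protein, rng):
    """Shuffle residues within each peptide, keeping cleavage sites fixed (assumed definition)."""
    parts = []
    for pep in tryptic_peptides(protein):
        body, tail = _split_tail(pep)
        residues = list(body)
        rng.shuffle(residues)
        parts.append("".join(residues) + tail)
    return "".join(parts)


target = "MKTAYIAKQR"                  # toy sequence, not from any real database
print(reverse_decoy(target))           # RQKAIYATKM
print(pseudo_reverse_decoy(target))    # MKAIYATKQR
```

The reverse and pseudo-reverse constructions are deterministic, so identical target sequences always map to identical decoy sequences. The shuffle variants depend on the random generator passed in as `rng`, which matters for the redundancy effect described in the Results.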

Results: We used two protein databases (the UniProt Saccharomyces cerevisiae protein database and the UniProt human protein database) to compare the FDRs of various decoy databases. When the ratio of redundant peptides in the target database is low, the FDR is not overestimated by any decoy construction method. However, if the ratio of redundant peptides in the target database is high, the FDR is overestimated when the (pseudo) shuffle decoy database is used. Additionally, the human and S. cerevisiae six-frame translation databases, which are large, showed outcomes similar to those from the UniProt human protein database.
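One plausible reading of this result is that a deterministic transform maps repeated target peptides to repeated decoy peptides, whereas independent shuffling turns each copy into a different decoy peptide, so the decoy database ends up with more unique peptides than the target. The toy sketch below illustrates that mechanism only; the sequences are invented, the digestion is simplified (cleave after every K or R), and `redundancy_ratio` is a hypothetical helper defined here as one minus the fraction of unique peptides.

```python
import random
import re


def digest(protein):
    """Simplified tryptic digestion: cleave after every K or R."""
    return re.findall(r"[^KR]*[KR]|[^KR]+", protein)


def split_tail(peptide):
    """Separate a C-terminal K/R from the rest of the peptide."""
    if peptide[-1] in "KR":
        return peptide[:-1], peptide[-1]
    return peptide, ""


def pseudo_reverse(peptide):
    """Deterministic: the same input peptide always gives the same decoy peptide."""
    body, tail = split_tail(peptide)
    return body[::-1] + tail


def pseudo_shuffle(peptide, rng):
    """Random: repeated copies of one peptide are shuffled independently."""
    body, tail = split_tail(peptide)
    residues = list(body)
    rng.shuffle(residues)
    return "".join(residues) + tail


def redundancy_ratio(peptides):
    """Fraction of peptide entries that repeat an earlier entry (hypothetical metric)."""
    return 1 - len(set(peptides)) / len(peptides)


# Toy target database: the peptide ACDEFGHIK is shared by all three proteins.
targets = ["ACDEFGHIKLMNPQR", "ACDEFGHIKSTVWYR", "ACDEFGHIKGGGSSR"]
target_peptides = [pep for protein in targets for pep in digest(protein)]

rng = random.Random(0)
reversed_peptides = [pseudo_reverse(p) for p in target_peptides]
shuffled_peptides = [pseudo_shuffle(p, rng) for p in target_peptides]

print("unique target peptides:        ", len(set(target_peptides)))    # 4
print("unique pseudo-reverse peptides:", len(set(reversed_peptides)))  # 4
print("unique pseudo-shuffle peptides:", len(set(shuffled_peptides)))
```

In this toy case the pseudo-reverse decoy always has exactly as many unique peptides as the target, while the pseudo-shuffle decoy has at least as many and usually more. A larger decoy search space yields more decoy hits, which would push the estimated FDR upward.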

Conclusion: The FDR must be estimated using the correction factor proposed by Elias and Gygi or that by Kim et al. when (pseudo) shuffle decoy databases are used.
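The abstract does not give the correction formulas, so the following is only a generic sketch of how a multiplicative correction factor could enter a target-decoy FDR estimate. It assumes the common estimate of decoy hits divided by target hits above a score threshold, and it assumes, purely for illustration, a factor equal to the ratio of unique target peptides to unique decoy peptides. The actual factors proposed by Elias and Gygi and by Kim et al. may be defined differently; `estimate_fdr` and `size_ratio_correction` are hypothetical names.

```python
def size_ratio_correction(n_unique_target, n_unique_decoy):
    """Assumed correction: ratio of unique target peptides to unique decoy peptides."""
    if n_unique_decoy <= 0:
        raise ValueError("decoy database must contain at least one peptide")
    return n_unique_target / n_unique_decoy


def estimate_fdr(n_target_hits, n_decoy_hits, correction=1.0):
    """Estimate FDR as correction * decoy hits / target hits.

    correction=1.0 corresponds to the uncorrected estimate, which assumes
    that the target and decoy search spaces have the same size.
    """
    if n_target_hits <= 0:
        raise ValueError("need at least one target hit")
    return correction * n_decoy_hits / n_target_hits


# Invented numbers: the decoy database has 1.5x as many unique peptides as the target.
factor = size_ratio_correction(n_unique_target=4, n_unique_decoy=6)
uncorrected = estimate_fdr(n_target_hits=1000, n_decoy_hits=15)
corrected = estimate_fdr(n_target_hits=1000, n_decoy_hits=15, correction=factor)
print(f"uncorrected FDR: {uncorrected:.4f}")
print(f"corrected FDR:   {corrected:.4f}")
```

With a decoy database that is larger than the target in unique peptides, the factor is below one and scales the estimate down, which is the direction of correction the conclusion calls for when (pseudo) shuffle decoys are used.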

Source journal

Proteome Science (Biology: Biochemical Research Methods)

CiteScore: 2.90

Self-citation rate: 0.00%

Review time: 4.5 months

Journal description: Proteome Science is an open access journal publishing research in the area of systems studies. Proteome Science considers manuscripts based on all aspects of functional and structural proteomics, genomics, metabolomics, systems analysis and metabiome analysis. It encourages the submission of studies that use large-scale or systems analysis of biomolecules in a cellular, organismal and/or environmental context. Studies that describe novel biological or clinical insights as well as methods-focused studies that describe novel methods for the large-scale study of any and all biomolecules in cells and tissues, such as mass spectrometry, protein and nucleic acid microarrays, genomics, next-generation sequencing and computational algorithms and methods are all within the scope of Proteome Science, as are electron topography, structural methods, proteogenomics, chemical proteomics, stem cell proteomics, organelle proteomics, plant and microbial proteomics. In spite of its name, Proteome Science considers all aspects of large-scale and systems studies because ultimately any mechanism that results in genomic and metabolomic changes will affect or be affected by the proteome. To reflect this intrinsic relationship of biological systems, Proteome Science will consider all such articles.