An incremental algorithm for Z-value computations
J.-C. Aude, A. Louis
Computers & Chemistry, 26(5), 402-410 (July 2002). doi:10.1016/S0097-8485(02)00003-7
The Z-value (Comput. Chem. 23 (1999) 333) is an extension of the Z-score classically used to compare sets of biological sequences. It has been used successfully both in complete-genome studies and in the analysis of large sets of proteins. The Z-value is computed with a Monte Carlo approach that estimates the statistical significance of a Smith-Waterman alignment score. Comet et al. (Comput. Chem. 23 (1999) 333) showed that, unlike the raw alignment score, the Z-value largely reduces the bias due to sequence length and composition. They also described an estimator of the deviation of Z-values, which we extend in this paper to optimize Z-value computation. The incremental algorithm described here combines two properties that are usually incompatible: (i) it improves the accuracy of the computed Z-values, and (ii) it reduces the time complexity. It is called incremental because it iteratively adds random sequences to the Monte Carlo process when needed. We present results from the all-against-all comparison of the proteins of Saccharomyces cerevisiae and Escherichia coli.
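To make the Monte Carlo idea concrete, here is a minimal sketch in Python. The abstract does not give the paper's deviation estimator or stopping rule, so several details are assumptions. It uses the common formulation Z = (S - mean) / sd, where S is the Smith-Waterman score of sequences a and b and mean and sd are taken over the scores of a against shuffled copies of b. The toy scoring (match +2, mismatch -1, linear gap -2) stands in for a real substitution matrix with affine gaps. The "incremental" part is a hypothetical stand-in as well: shuffles are added in batches, a delta-method standard error of z is computed under a normality assumption, and sampling stops once z is clearly above or below an illustrative threshold. The names `smith_waterman`, `incremental_z_value`, `ZResult`, `threshold`, `k`, `batch` and `max_shuffles` are all hypothetical.

```python
import math
import random
import statistics
from dataclasses import dataclass


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (Smith-Waterman, linear gap penalty, toy scoring)."""
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: cell scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


@dataclass
class ZResult:
    z: float          # Z-value estimate
    se: float         # approximate standard error of z (assumed estimator)
    n_shuffles: int   # number of shuffled sequences actually aligned


def incremental_z_value(a: str, b: str, threshold: float = 8.0, k: float = 2.0,
                        batch: int = 20, max_shuffles: int = 200, seed: int = 0) -> ZResult:
    """Hypothetical incremental Monte Carlo Z-value: shuffle b in batches and stop
    as soon as z is clearly above or clearly below `threshold`."""
    if batch < 2:
        raise ValueError("batch must be >= 2 to estimate a standard deviation")
    rng = random.Random(seed)
    s_real = smith_waterman(a, b)
    scores = []
    while True:
        for _ in range(batch):
            letters = list(b)
            rng.shuffle(letters)  # preserves the length and composition of b
            scores.append(smith_waterman(a, "".join(letters)))
        n = len(scores)
        mean = statistics.fmean(scores)
        sd = statistics.stdev(scores)
        if sd > 0:
            z = (s_real - mean) / sd
            # Delta-method approximation, assuming normally distributed shuffled
            # scores: Var(z) ~ (1 + z^2 / 2) / n.
            se = math.sqrt((1 + z * z / 2) / n)
            # Enough samples once z +/- k*se lies entirely on one side of threshold.
            if z - k * se > threshold or z + k * se < threshold:
                return ZResult(z, se, n)
        else:
            # All shuffled scores identical: z is undefined, so keep sampling.
            z = 0.0 if s_real == mean else math.copysign(math.inf, s_real - mean)
            se = math.inf
        if n >= max_shuffles:
            return ZResult(z, se, n)
```

Shuffling b rather than drawing uniformly random sequences keeps length and composition fixed, which is what lets the Z-value correct for those biases. With the batch loop, pairs whose z is far from the threshold stop after the first batch, while borderline pairs draw more shuffles (up to `max_shuffles`). This matches the abstract's description of adding random sequences when needed, though the actual criterion used in the paper may differ.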