Using a Euclid distance discriminant method to find protein coding genes in the yeast genome
Chun-Ting Zhang, Ju Wang, Ren Zhang
Computers & Chemistry, vol. 26, no. 3, pp. 195-206 (February 2002). DOI: 10.1016/S0097-8485(01)00107-3
The Euclid distance discriminant method is used to find protein coding genes in the yeast genome, based on the single nucleotide frequencies at the three codon positions of the ORFs. The method is extremely simple and may be extended to find genes in prokaryotic genomes or in eukaryotic genomes with relatively few introns. Six-fold cross-validation tests show that the accuracy of the algorithm exceeds 93%. Applying the algorithm, the total number of protein coding genes in the yeast genome is found to be at most 5579, about 3.8–7.0% fewer than the currently accepted figure of 5800–6000. The base compositions at the three codon positions are analyzed in detail using a graphic method. The result shows that the codons preferred by yeast genes are of the RḠW type, where R, Ḡ and W denote a purine (A or G), any base other than G, and A or T, respectively, whereas the ‘codons’ in intergenic sequences are of the form NNN, where N denotes any base. This difference forms the basis of the algorithm for distinguishing coding from non-coding ORFs in the yeast genome. The putative non-coding ORFs are listed individually by name.
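The abstract describes the classifier only at a high level. The Python sketch below shows one plausible reading of it, under several assumptions:

- Each ORF is represented by 12 numbers: the frequencies of A, C, G and T at each of codon positions 1, 2 and 3.
- The "Euclid distance discriminant" is taken to be a plain nearest-centroid rule. An ORF is called coding if its vector is closer to the mean coding vector than to the mean non-coding vector. This may differ from the paper's exact decision rule or any threshold it applies.
- Folds for cross-validation are assigned by index modulo k.

All function names (`codon_position_frequencies`, `centroid`, `classify`, `cross_validate`, `rgw_fraction`) and the toy sequences are hypothetical and are not taken from the paper.

```python
import math

BASES = "ACGT"


def codon_position_frequencies(orf: str) -> list[float]:
    """12-D feature vector: frequencies of A, C, G, T at codon positions 1, 2, 3.

    Assumes `orf` is in frame from its first base; a trailing partial codon is ignored.
    """
    seq = orf.upper()
    n_codons = len(seq) // 3
    if n_codons == 0:
        raise ValueError("ORF must contain at least one complete codon")
    features = []
    for pos in range(3):
        column = seq[pos:3 * n_codons:3]  # every base at this codon position
        features.extend(column.count(b) / n_codons for b in BASES)
    return features


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of equally long feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def classify(orf: str, coding_mean: list[float], noncoding_mean: list[float]) -> str:
    """Assumed nearest-centroid rule with Euclidean distance (ties go to 'coding')."""
    x = codon_position_frequencies(orf)
    if math.dist(x, coding_mean) <= math.dist(x, noncoding_mean):
        return "coding"
    return "non-coding"


def cross_validate(coding: list[str], noncoding: list[str], k: int = 6) -> float:
    """k-fold cross-validated accuracy; fold f holds the sequences whose index % k == f."""
    if k < 2 or len(coding) < 2 or len(noncoding) < 2:
        raise ValueError("need k >= 2 and at least two sequences per class")
    correct = total = 0
    for fold in range(k):
        # Train: class means computed without the held-out fold.
        train_c = [codon_position_frequencies(s) for i, s in enumerate(coding) if i % k != fold]
        train_n = [codon_position_frequencies(s) for i, s in enumerate(noncoding) if i % k != fold]
        mu_c, mu_n = centroid(train_c), centroid(train_n)
        # Test: classify every held-out sequence against the two means.
        tests = [(s, "coding") for i, s in enumerate(coding) if i % k == fold]
        tests += [(s, "non-coding") for i, s in enumerate(noncoding) if i % k == fold]
        for seq, label in tests:
            correct += classify(seq, mu_c, mu_n) == label
            total += 1
    return correct / total


def rgw_fraction(orf: str) -> float:
    """Fraction of codons of the RḠW type: purine, then not G, then A or T."""
    seq = orf.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    if not codons:
        raise ValueError("ORF must contain at least one complete codon")
    hits = sum(c[0] in "AG" and c[1] != "G" and c[2] in "AT" for c in codons)
    return hits / len(codons)


if __name__ == "__main__":
    # Made-up toy ORFs: RḠW-rich "coding" vs. G/C-rich "non-coding" examples.
    coding = ["AAAGATAAT", "GATGAAAAA"]
    noncoding = ["CGCTGGCGC", "TGGCGCCGG"]
    print(cross_validate(coding, noncoding, k=2))  # 1.0
    print(rgw_fraction("AAACGC"))  # 0.5
```

With real data, the two training sets would be annotated yeast genes and sequences taken to be non-coding. The toy sequences only show that RḠW-rich ORFs fall near the coding centroid. Using all 12 frequencies, rather than the RḠW fraction alone, lets the distance account for every position-specific base bias at once.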