Akihiko Kishimoto, Yukio Ono, K. Murakawa, T. Ishibashi, A. Wakamatsu, K. Kanehori, N. Nomura, T. Isogai, M. Yohda, S. Sugano
Classification and characterization of human full-length cDNA clones that are difficult to sequence
Chem-Bio Informatics Journal, pp. 1-13, published 2008-01-01
DOI: 10.1273/CBIJ.8.1 (https://doi.org/10.1273/CBIJ.8.1)
Abstract
In the Full-length Human cDNA Sequencing Project, 30,160 cDNAs were sequenced. Of these, our group sequenced 3,588 cDNAs, mainly by the primer walking method. Both strands were sequenced under the criterion of a Phrap score above 30, and the sequences achieved an average Phrap score of 76, which corresponds to an average expected sequence accuracy of 99.9999975%. Despite this extremely high sequence reliability, we had difficulty sequencing 52 cDNAs, which we term undecipherable cDNAs. cDNAs containing long repeats were considered a possible source of the sequencing difficulty; the maximum repeat length sequenced by the primer walking method, without using the random method, was 530 bp, and 81% of the long repeat sequences were located in ORFs. In single repeat regions, the insertion/deletion rates were much higher than in ordinary regions. The fraction of SINE/Alu repeats in the cDNAs was 5.4%, half of the fraction in the human genome. The fraction of SINE/Alu in the undecipherable cDNAs was up to 10%, the same level as in the human genome.
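
The step from a Phrap score of 76 to an accuracy of 99.9999975% can be checked with a short calculation. The sketch below assumes the standard Phred-scale definition, in which a quality score Q corresponds to an error probability of 10^(-Q/10); the abstract does not state the formula, and the function names are illustrative rather than taken from the paper or from Phrap itself.

```python
def phrap_score_to_error_probability(score: float) -> float:
    """Per-base error probability for a Phred-scaled quality score.

    Assumes the standard definition Q = -10 * log10(P_error).
    """
    return 10 ** (-score / 10)


def phrap_score_to_accuracy_percent(score: float) -> float:
    """Expected per-base accuracy, in percent, for a given score."""
    return 100 * (1 - phrap_score_to_error_probability(score))


if __name__ == "__main__":
    # 30 is the acceptance criterion and 76 the reported average score.
    for q in (30, 76):
        p = phrap_score_to_error_probability(q)
        acc = phrap_score_to_accuracy_percent(q)
        print(f"score {q}: error probability {p:.3g}, accuracy {acc:.7f}%")
```

Under this assumption, a score of 30 corresponds to roughly one expected error per 1,000 bases, and a score of 76 to roughly 2.5 expected errors per 100 million bases, which agrees with the accuracy figure quoted in the abstract.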