Elmira Katanchi Kheiavi, A. Ahmadikhah, A. M. Mosammam
Journal of Data Mining in Genomics & Proteomics, published 2016-05-30. DOI: 10.4172/2153-0602.1000199 (https://doi.org/10.4172/2153-0602.1000199)
Citations: 3
Genome Mining of Rice (Oryza sativa subsp. indica) for Detection and Characterization of Long Palindromic Sequences
Because the rice genome has been sequenced in its entirety, searching for specific features at genome-wide scale is of high importance for studying genome evolution and its subsequent applications. Palindromic sequences are important DNA motifs involved in the regulation of different cellular processes and are a potential source of genetic instability. A genome mining approach was applied to detect and characterize the long palindromic sequences in the rice genome. All palindromes, defined as identical inverted repeats separated by spacer DNA, were analyzed and sorted according to their frequency, size, GC content, compact index, and other properties. The results showed that the overall palindrome frequency in the rice genome is high (nearly 51,000 palindromes), that these palindromes together cover 41.4% of the rice nuclear genome, and that chromosome 1 carries the highest number of palindromes and chromosome 12 the lowest. Palindrome number explains rice chromosome expansion well (R² > 92%). The average GC content of the palindromic sequences is 42.1%, indicating that they are AT-rich and hence of low complexity. The results also showed that the compact index of palindromes differs among chromosomes, from 43.2 per cM in chromosome 8 (the highest) to 34.5 per cM in chromosome 3 (the lowest). Co-location analysis showed that more than 20% of rice genes overlap with palindromic regions, concentrated mainly on the chromosomal arms. Based on these results, it can be concluded that the rice genome is rich in long palindromic sequences that triggered most of the variation during evolution. In general, both parts of the palindromic sequences, the stems and the loops, are AT-rich, indicating that these regions are located in the low-complexity segments of the rice chromosomes.
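The abstract defines a palindrome as a pair of identical inverted repeats (the stems) separated by spacer DNA (the loop) and reports GC content as one of the measured properties. The following is only a minimal sketch of that definition, not the authors' pipeline: the function names, the fixed stem length, and the spacer bounds are hypothetical choices made for illustration, and the brute-force scan is not meant to scale to whole chromosomes.

```python
# Illustrative sketch only: names and parameters are assumptions,
# not taken from the paper.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Return the percentage of G and C bases in seq (0.0 for empty input)."""
    if not seq:
        return 0.0
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def find_spaced_palindromes(seq, stem_len, min_spacer=0, max_spacer=10):
    """Find exact inverted repeats with a spacer between the two arms.

    A hit is reported as (start, stem_len, spacer_len) when the arm that
    begins at `start` is followed, after `spacer_len` bases, by its own
    reverse complement.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2 * stem_len + 1):
        left_arm = seq[start:start + stem_len]
        target = reverse_complement(left_arm)
        for spacer in range(min_spacer, max_spacer + 1):
            right_start = start + stem_len + spacer
            right_end = right_start + stem_len
            if right_end > len(seq):
                break  # longer spacers would run past the sequence end
            if seq[right_start:right_end] == target:
                hits.append((start, stem_len, spacer))
    return hits


if __name__ == "__main__":
    # Toy sequence: stem AAGC, loop TTT, then GCTT (reverse complement of AAGC).
    toy = "AAGCTTTGCTT"
    for start, stem, spacer in find_spaced_palindromes(toy, stem_len=4):
        end = start + 2 * stem + spacer
        print(start, stem, spacer, round(gc_content(toy[start:end]), 1))
```

In this toy case the stem `AAGC` and its reverse complement `GCTT` flank a three-base loop, so the whole 11-base region is reported as a single hit. A genome-scale search would need an indexed or seed-based method and a tolerance policy for mismatches, neither of which is specified in the abstract.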