Organizational Heterogeneity of the Human Genome: Significant Variation of Recombination Rate of 100 kbp Sequences within GC Ranges
S. Frenkel, V. Kirzhner, Z. Frenkel, A. Korol
2016 Second International Symposium on Stochastic Models in Reliability Engineering, Life Science and Operations Management (SMRLO)
DOI: 10.1109/SMRLO.2016.72 (https://doi.org/10.1109/SMRLO.2016.72)
Published: 2016-02-15
Abstract
The association between the nucleotide composition of genome sequences and their functional characteristics is well known; among the most studied characteristics correlated with GC content are gene density, gene expression, and recombination rate. Previously, we found that genomic regions with similar nucleotide composition may differ considerably in sequence organization, and we hypothesized that organizationally different regions may also exhibit functional and evolutionary heterogeneity. Here we examine this hypothesis by classifying 100 kbp segments of the human genome into 14 compositionally homogeneous groups according to their GC content and then differentiating the segments within each group by organization pattern (OP) using oligonucleotide (k-mer) counting, referred to as Compositional Spectra (CS) analysis. We identified 141 groups of segments that differ in their CS organization and found that the resulting compositionally similar OP groups (OPGs) differ significantly in recombination rate. This conclusion was robust to the choice of window size (confirmed by independent analyses of 50 kb and 200 kb segments). We further tested the contribution of specific k-mers to the clustering of 100 kbp segments into OPGs with contrasting levels of recombination rate. Eight k-mers that showed the highest importance for this clustering allowed correct classification of at least 76% of segments in all 14 OPG pairs. Moreover, these k-mers proved similar to five previously described patterns related to recombination hotspots, including the best-known 13 bp recombination motif, CCNCCNTNNCCNC.
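The quantities named in the abstract (GC content, k-mer counts per segment, and the 13 bp motif) can be illustrated with a minimal Python sketch. It is not the authors' Compositional Spectra implementation, whose details the abstract does not give. It assumes plain exact k-mer counting and treats `N` in the motif as the IUPAC code for any base. The function names `gc_content`, `kmer_spectrum`, and `motif_positions` are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter
from itertools import product


def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def kmer_spectrum(seq, k):
    """Relative frequency of every exact k-mer over the alphabet ACGT.

    Assumption: exact matching only. Windows that contain any other
    symbol (for example N) are excluded from the total.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words)
    return {w: (counts[w] / total if total else 0.0) for w in words}


# The motif quoted in the abstract. Each N is read as "any base"
# (IUPAC convention) and translated into a regex character class.
MOTIF = "CCNCCNTNNCCNC"
# A lookahead is used so that overlapping occurrences are also found.
MOTIF_RE = re.compile("(?=(" + MOTIF.replace("N", "[ACGT]") + "))")


def motif_positions(seq):
    """0-based start positions of all (possibly overlapping) motif hits."""
    return [m.start() for m in MOTIF_RE.finditer(seq.upper())]
```

In such a sketch, each 100 kbp segment would be reduced to one GC value and one frequency vector of length 4^k. How segments are then binned by GC content and clustered into OPGs is not specified in the abstract and is left out here.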