W H Adrian Tsui, Spencer C Ding, Peiyong Jiang, Y M Dennis Lo
The discovery of circulating fetal and tumor cell-free DNA (cfDNA) molecules in plasma has opened up tremendous opportunities in noninvasive diagnostics, such as the detection of fetal chromosomal aneuploidies and cancers, and in posttransplantation monitoring. The advent of high-throughput sequencing technologies makes it possible to scrutinize the characteristics of cfDNA molecules, opening up the fields of cfDNA genetics, epigenetics, transcriptomics, and fragmentomics and providing a plethora of biomarkers. Machine learning (ML) and/or artificial intelligence (AI) technologies, which are known for their ability to integrate high-dimensional features, have recently been applied to the field of liquid biopsy. In this review, we highlight various AI and ML approaches in cfDNA-based diagnostics. We first introduce the biology of cfDNA and basic concepts of ML and AI technologies. We then discuss selected examples of ML- or AI-based applications in noninvasive prenatal testing and cancer liquid biopsy, including the deduction of fetal DNA fraction, plasma DNA tissue mapping, and cancer detection and localization. Finally, we offer perspectives on the future direction of using ML and AI technologies to leverage cfDNA fragmentation patterns for methylomic and transcriptional investigations.
"Artificial intelligence and machine learning in cell-free-DNA-based diagnostics." Genome Research 35(1):1-19, published 2025-01-22. doi:10.1101/gr.278413.123. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11789496/pdf/
Yasine Malki, Guannan Kang, W K Jacky Lam, Qing Zhou, Suk Hang Cheng, Peter P H Cheung, Jinyue Bai, Ming Lok Chan, Chui Ting Lee, Wenlei Peng, Yiqiong Zhang, Wanxia Gai, Winsome W S Wong, Mary-Jane L Ma, Wenshuo Li, Xinzhou Xu, Zhuoran Gao, Irene O L Tse, Huimin Shang, L Y Lois Choy, Peiyong Jiang, K C Allen Chan, Y M Dennis Lo
The concentration of circulating cell-free DNA (cfDNA) in plasma is an important determinant of the robustness of liquid biopsies. However, the biological mechanisms that lead to inter-individual differences in cfDNA concentrations remain unexplored. The concentration of plasma cfDNA is governed by an interplay between its release and clearance. We hypothesized that cfDNA clearance by nucleases might be one mechanism that contributes to inter-individual variations in cfDNA concentrations. We performed fragmentomic analysis of the plasma cfDNA from 862 healthy individuals, with cfDNA concentrations ranging from 1.61 to 41.01 ng/mL. With increasing cfDNA concentration, we observed an increased frequency of large DNA fragments (231-600 bp), a decreased frequency of shorter DNA fragments (20-160 bp), and an increased frequency of G-end motifs. End motif deconvolution analysis revealed a decreased contribution of DNASE1L3 and DFFB in subjects with higher cfDNA concentrations. The five subjects with the highest plasma DNA concentrations (top 0.58%) had aberrantly decreased levels of DNASE1L3 protein in plasma. The cfDNA concentration could be inferred from the fragmentomic profile through machine learning, and the inferred values correlated well with the measured cfDNA concentrations. Such an approach could also infer the fractional DNA concentration contributed by particular tissue types, such as the fetal and tumor fractions. This work shows that individuals with different cfDNA concentrations have characteristic fragmentomic patterns in their cfDNA pool and that nuclease-mediated clearance of DNA is a key parameter affecting cfDNA concentration. Understanding these mechanisms has enabled enhanced measurement of cfDNA species of clinical interest, including circulating fetal and tumor DNA.
"Analysis of a cell-free DNA-based cancer screening cohort links fragmentomic profiles, nuclease levels, and plasma DNA concentrations." Genome Research, pp. 31-42, published 2025-01-22. doi:10.1101/gr.279667.124. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11789642/pdf/
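The Malki et al. abstract above names concrete fragmentomic features: the share of short (20-160 bp) and large (231-600 bp) fragments and the frequency of G-end motifs. The sketch below shows how such per-sample features could be computed and how one of them could be regressed against measured concentration. It assumes that "G-end motif" means a 5' end motif whose first base is G, and it uses a plain single-feature least-squares fit as a stand-in. It is not the authors' feature set or machine-learning model, and all function names are hypothetical.

```python
# Hypothetical fragmentomic features for one plasma sample, using the size bins
# and G-end motif idea from the abstract; not the authors' feature set or model.

SHORT_BIN = (20, 160)   # bp, inclusive
LONG_BIN = (231, 600)   # bp, inclusive


def size_fractions(lengths):
    """Return (short_fraction, long_fraction) for a list of fragment lengths in bp."""
    if not lengths:
        raise ValueError("no fragments")
    n = len(lengths)
    short = sum(SHORT_BIN[0] <= x <= SHORT_BIN[1] for x in lengths)
    long_ = sum(LONG_BIN[0] <= x <= LONG_BIN[1] for x in lengths)
    return short / n, long_ / n


def g_end_fraction(end_motifs):
    """Fraction of 5' end motifs (e.g. 4-mers) whose first base is G (assumed definition)."""
    if not end_motifs:
        raise ValueError("no end motifs")
    return sum(m.upper().startswith("G") for m in end_motifs) / len(end_motifs)


def fit_ols(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature has no variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope
```

In use, a training cohort would supply one large-fragment fraction and one measured concentration per sample. Calling `a, b = fit_ols(fractions, concentrations)` then predicts a new sample as `a + b * new_fraction`. A real model would combine many such features and validate on held-out samples.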
Following amputation, zebrafish regenerate their injured caudal fin through lineage-restricted reprogramming. Although previous studies have charted various genetic and epigenetic dimensions of this process, the intricate gene regulatory programs shared by, or unique to, different regenerating cell types remain underinvestigated. Here, we mapped the regulatory landscape of fin regeneration by applying paired snRNA-seq and snATAC-seq to uninjured and regenerating fins. This map delineates the regulatory dynamics of the predominant cell populations at multiple stages of regeneration. We observe a marked increase in the accessibility of chromatin regions associated with regenerative and developmental processes at 1 day post-amputation (dpa), followed by a gradual closure across major cell types at later stages. This pattern is distinct from the transcriptomic dynamics, which are characterized by several waves of gene upregulation and downregulation. We identified, and validated in vivo, cell-type-specific and position-specific regeneration-responsive enhancers, and we constructed regulatory networks by cell type and stage. Our single-cell-resolution transcriptomic and chromatin accessibility map across regenerative stages provides new insights into the regulatory mechanisms of regeneration and serves as a valuable resource for the community.
"Common and specific gene regulatory programs in zebrafish caudal fin regeneration at single-cell resolution." Yujie Chen, Yiran Hou, Qinglin Zeng, Irene Wang, Meiru Shang, Kwangdeok Shin, Christopher Hemauer, Xiaoyun Xing, Junsu Kang, Guoyan Zhao, Ting Wang. Genome Research, published 2025-01-14. doi:10.1101/gr.279372.124
Mikkel Dahl-Jessen, Thorkild Terkelsen, Rasmus O Bak, Uffe Birk Jensen
Structural variations (SVs) play important roles in genetic diversity, evolution, and carcinogenesis and are, as such, important for human health. However, it remains unclear how the spatial proximity of double-strand breaks (DSBs) affects the formation of SVs. To investigate whether spatial proximity between two DSBs affects DNA repair, we used data from 3C-based experiments (Hi-C and ChIA-PET) and ChIP-seq to identify highly interacting loci on six different chromosomes. The target regions correlate with the borders of megabase-sized topologically associated domains (TADs). We used CRISPR-Cas9 nuclease and pairs of single guide RNAs (sgRNAs) against these targets to generate DSBs in both K562 cells and H9 human embryonic stem cells (hESCs). Droplet Digital PCR (ddPCR) was used to quantify the resulting recombination events, and high-throughput sequencing was used to analyze the chimeric junctions formed between the two DSBs. In K562 cells, deletions and inversions formed significantly more frequently when the two DSBs were in spatial proximity than when they were not. Additionally, our results suggest that DSB proximity may affect the ligation of chimeric deletion junctions. Taken together, spatial proximity between DSBs is a significant predictor of the frequency of large-scale deletions and inversions induced by CRISPR-Cas9 in K562 cells. This finding has implications for understanding SVs in the human genome and for the future application of CRISPR-Cas9 in gene editing and in modelling rare SVs.
"Characterization of the role of spatial proximity of DNA double-strand breaks in the formation of CRISPR-Cas9-induced large structural variations." Genome Research, published 2025-01-13. doi:10.1101/gr.278575.123
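The Dahl-Jessen et al. abstract above quantifies recombination events with ddPCR. ddPCR concentrations are conventionally derived from the fraction of positive droplets using a Poisson correction, λ = -ln(1 - p). The sketch below applies that correction and expresses a junction assay (for example, a deletion junction) relative to a reference-locus assay. The abstract does not state how the authors normalized their counts. The ratio-to-reference approach and the function names are assumptions for illustration.

```python
import math


def copies_per_droplet(positive, total):
    """Poisson-corrected mean target copies per droplet: lambda = -ln(1 - p)."""
    if total <= 0 or not 0 <= positive <= total:
        raise ValueError("need 0 <= positive <= total and total > 0")
    if positive == total:
        raise ValueError("all droplets positive: the assay is saturated")
    return -math.log(1 - positive / total)


def event_frequency(target_pos, target_total, ref_pos, ref_total):
    """Junction copies per reference-locus copy (hypothetical normalization).

    Assumes both assays were read on the same sample input, so droplet volume
    and input amount cancel in the ratio.
    """
    ref = copies_per_droplet(ref_pos, ref_total)
    if ref == 0:
        raise ValueError("reference assay has no positive droplets")
    return copies_per_droplet(target_pos, target_total) / ref
```

The correction matters once many droplets are positive. At 50% positive droplets, the mean occupancy is ln 2 ≈ 0.69 copies per droplet rather than 0.5, because some droplets hold more than one copy.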
Li-Ting Chen, Myrthe Jager, Dàmi Rebergen, Geertruid J. Brink, Tom van den Ende, Willem Vanderlinden, Pauline Kolbeck, Marc Pagès-Gallego, Ymke van der Pol, Nicolle Besselink, Norbert Moldovan, Nizar Hami, Wigard P. Kloosterman, Hanneke van Laarhoven, Florent Mouliere, Ronald Zweemer, Jan Lipfert, Sarah Derks, Alessio Marcozzi, Jeroen de Ridder
Shallow genome-wide cell-free DNA (cfDNA) sequencing holds great promise for non-invasive cancer monitoring by providing reliable copy number alteration (CNA) and fragmentomic profiles. Single nucleotide variations (SNVs), however, are much harder to identify at low sequencing depth because of sequencing errors. Here we present Nanopore Rolling Circle Amplification (RCA)-enhanced Consensus Sequencing (NanoRCS), which leverages RCA and consensus calling on genome-wide long-read nanopore sequencing to enable simultaneous multimodal tumor fraction estimation through SNVs, CNAs, and fragmentomics. The efficacy of NanoRCS was tested on 18 cancer patient samples and seven healthy controls, demonstrating its ability to reliably detect tumor fractions as low as 0.24%. In vitro experiments confirmed that SNV measurements are essential for detecting tumor fractions below 3%. NanoRCS enables cost-effective and rapid processing, which aligns well with clinical needs, particularly in settings where quick and accurate cancer monitoring is essential for personalized treatment strategies.
"Nanopore-based consensus sequencing enables accurate multimodal tumor cell-free DNA profiling." Genome Research, published 2025-01-13. doi:10.1101/gr.279144.124
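In the NanoRCS approach described above, rolling circle amplification yields several tandem copies of each cfDNA molecule in one long read. Consensus calling across those copies suppresses random sequencing errors. The sketch below shows the core idea as a per-position majority vote. It assumes the tandem copies have already been split and aligned to equal length, with '-' marking gaps. Real nanopore consensus pipelines need a proper multiple or partial-order alignment step first. The function and its `min_agreement` threshold are hypothetical, not NanoRCS's implementation.

```python
from collections import Counter


def consensus(copies, min_agreement=0.5):
    """Per-position majority vote across pre-aligned tandem copies of one insert.

    A column whose most common symbol has a share not strictly above
    min_agreement is reported as 'N'; majority gaps ('-') are dropped.
    """
    if not copies:
        raise ValueError("no copies")
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("copies must be pre-aligned to equal length")
    out = []
    for column in zip(*copies):
        symbol, count = Counter(column).most_common(1)[0]
        out.append(symbol if count / len(column) > min_agreement else "N")
    return "".join(s for s in out if s != "-")
```

With three copies, a single-copy error is outvoted: `consensus(["ACGT", "ACGT", "ACTT"])` returns `"ACGT"`. An even split is masked as `N` rather than guessed.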
With the increasing availability of high-quality genome assemblies, pangenome graphs have emerged as a new paradigm in genomics for identifying, encoding, and presenting genomic variation at both the population and species levels. However, it remains challenging to dissect and interpret pangenome graphs through biologically informative visualization. To facilitate the exploration and understanding of pangenome graphs and enable novel biological insights, we present a web-based interactive Visualization and interpretation framework for linear-Reference-projected Pangenome Graphs (VRPG). VRPG provides efficient and intuitive support for exploring and annotating pangenome graphs along a linear-genome-based coordinate system (e.g., that of a primary linear reference genome). Moreover, VRPG offers many unique features, such as in-graph path highlighting for the input assemblies that constitute the graph, copy number characterization for graph-embedded nodes, and graph-based mapping of query sequences, all of which are valuable for researchers working with pangenome graphs. Additionally, VRPG enables side-by-side visualization of the graph-based pangenome representation and conventional feature annotations based on the primary linear reference genome, thereby bridging the graph and linear genomic contexts. To further demonstrate its functionality and scalability, we applied VRPG, via a dedicated web portal, to state-of-the-art yeast and human reference pangenome graphs derived from hundreds of high-quality genome assemblies and examined their local genome diversity in the graph context.
"Interactive visualization and interpretation of pangenome graphs by linear-reference-based coordinate projection and annotation integration." Zepu Miao, Jia-Xing Yue. Genome Research, published 2025-01-13. doi:10.1101/gr.279461.124
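The VRPG abstract above describes projecting a pangenome graph onto a linear reference coordinate system. The sketch below shows one simple way such a projection could work. Nodes on the reference path receive cumulative coordinates. An off-reference node in a simple bubble (an SNV or indel) is placed between the end of its upstream reference neighbour and the start of its downstream one. This is not VRPG's algorithm. It ignores node orientation (strand), which real GFA graphs carry, and it does not handle nested or long-range structures. All names are hypothetical.

```python
def reference_coordinates(ref_path, node_len):
    """Map each node on the reference path to a 0-based, half-open (start, end) interval."""
    coords, pos = {}, 0
    for node in ref_path:
        coords[node] = (pos, pos + node_len[node])
        pos += node_len[node]
    return coords


def project_node(node, edges, coords):
    """Project a node onto the reference span between its flanking reference nodes.

    edges: directed (from_node, to_node) pairs. Only direct reference neighbours
    are used, so this handles simple bubbles only. Returns None when the node
    has no reference neighbour on one side or the flanks are inconsistent.
    """
    if node in coords:
        return coords[node]
    edges = list(edges)
    preds = [a for a, b in edges if b == node and a in coords]
    succs = [b for a, b in edges if a == node and b in coords]
    if not preds or not succs:
        return None
    start = max(coords[p][1] for p in preds)
    end = min(coords[s][0] for s in succs)
    return (start, end) if start <= end else None
```

For an SNV bubble, the alternative node projects onto the same 1-bp interval as the reference base it replaces. An inserted node projects to a zero-length interval (start equal to end) at its insertion point.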
Hongke Peng, Jafar S. Jabbari, Luyi Tian, Changqing Wang, Yupei You, Chong Chyn Chua, Natasha S. Anstee, Noorul Amin, Andrew H. Wei, Nadia Davidson, Andrew W. Roberts, David Huang, Matthew E Ritchie, Rachel Thijssen
Single-cell long-read sequencing has transformed our understanding of isoform usage and of mutation heterogeneity between cells. Although it enables unbiased, in-depth analysis, its low sequencing throughput often results in insufficient read coverage, thereby limiting our ability to call mutations in specific genes. Here, we developed single-cell Rapid Capture Hybridization sequencing (scRaCH-seq), a method that showed high specificity and efficiency in capturing targeted transcripts for long-read sequencing, allowing in-depth analysis of mutation status and transcript usage for genes of interest. The method involves creating a probe panel for transcript capture, using barcoded primers for pooling, and sequencing efficiently on Oxford Nanopore Technologies platforms. scRaCH-seq is applicable to stored and indexed single-cell cDNA, which allows the analysis to be combined with existing short-read RNA-seq datasets. In our investigation of BTK and SF3B1 in samples from patients with chronic lymphocytic leukaemia (CLL), we detected SF3B1 isoforms and mutations with high sensitivity. Integration with short-read scRNA-seq data revealed significant gene expression differences in SF3B1-mutated CLL cells, although the mutation did not affect the cells' sensitivity to the anti-cancer drug venetoclax. scRaCH-seq's ability to study long-read transcripts of multiple genes makes it a powerful tool for single-cell genomics.
"Single-cell Rapid Capture Hybridization sequencing to reliably detect isoform usage and coding mutations in targeted genes." Genome Research, published 2025-01-10. doi:10.1101/gr.279322.124
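Combining scRaCH-seq long reads with an existing short-read dataset, as described above, requires assigning each long read to a cell barcode. One plausible whitelist is the set of cell barcodes already called in the short-read data. The sketch below shows generic whitelist matching that tolerates one mismatch and rejects ambiguous hits. It is an illustrative assumption, not the demultiplexing tool used with scRaCH-seq. Nanopore errors are often insertions and deletions, which Hamming distance does not model, so practical tools typically use edit distance.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(observed, whitelist, max_dist=1):
    """Return the single whitelisted barcode within max_dist of observed, else None.

    Ambiguous matches (two or more candidates) are rejected rather than guessed.
    """
    if observed in whitelist:
        return observed
    hits = [bc for bc in whitelist
            if len(bc) == len(observed) and hamming(bc, observed) <= max_dist]
    return hits[0] if len(hits) == 1 else None
```

Rejecting ambiguous reads trades a little yield for purity. This is usually the safer choice when cell identity is used to link mutations to expression profiles.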
One of the major challenges in genomic data sharing is protecting participants' privacy in collaborative studies and when genomic data are outsourced for analysis tasks, e.g., genotype imputation services and federated collaborative genomic analysis. Although numerous cryptographic methods have been developed, they may not yet be practical for population-scale tasks in terms of computational requirements, they rely on high-level expertise in security, and they require each algorithm to be implemented from scratch. In this study, we focus on the outsourcing of genotype imputation, a fundamental task that uses population-level reference panels, and we develop protocols that use "proxy panels" to protect genotype panels while the imputation task is outsourced to servers. The proxy panels are generated through a series of protection mechanisms, such as haplotype sampling, allele hashing, and coordinate anonymization, that protect the underlying sensitive panel's genetic variant coordinates, genetic maps, and chromosome-wide haplotypes. Although the resulting proxy panels are almost entirely distinct from the sensitive panels, they are valid panels that can be used as input to imputation methods such as Beagle. We demonstrate that proxy-based imputation protects against well-known attacks with only a minor decrease in imputation accuracy for variants across a wide range of allele frequencies.
"Proxy panels enable privacy-aware outsourcing of genotype imputation." Degui Zhi, Xiaoqian Jiang, Arif O. Harmanci. Genome Research, published 2025-01-10. doi:10.1101/gr.278934.124
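The proxy-panel abstract above lists allele hashing and coordinate anonymization among its protection mechanisms but does not specify them. The sketch below shows one hypothetical way to realize the two ideas with a secret key held by the data owner:

- Each site gets a keyed permutation of A/C/G/T, so REF and ALT remain distinct and reversible for the key holder.
- Positions are remapped monotonically, since imputation tools generally expect markers in increasing position order.

This is not the authors' scheme. It omits haplotype sampling and genetic-map handling. Python's `random.Random` is not a cryptographically secure generator, so this is for illustration only.

```python
import hashlib
import hmac
import random

BASES = "ACGT"


def _keyed_rng(key: bytes, label: str) -> random.Random:
    """Deterministic RNG seeded from HMAC-SHA256(key, label); NOT cryptographically secure."""
    digest = hmac.new(key, label.encode(), hashlib.sha256).digest()
    return random.Random(int.from_bytes(digest, "big"))


def allele_permutation(key: bytes, chrom: str, pos: int) -> dict:
    """Keyed per-site bijection on A/C/G/T, so REF and ALT stay distinct after relabeling."""
    shuffled = list(BASES)
    _keyed_rng(key, f"allele:{chrom}:{pos}").shuffle(shuffled)
    return dict(zip(BASES, shuffled))


def hash_allele(allele: str, perm: dict) -> str:
    """Relabel each base of an allele with the site's permutation."""
    return "".join(perm[b] for b in allele.upper())


def anonymize_positions(key: bytes, chrom: str, positions, max_gap: int = 1000) -> dict:
    """Order-preserving remap: sorted unique positions get keyed random positive gaps."""
    rng = _keyed_rng(key, f"coord:{chrom}")
    mapping, new_pos = {}, 0
    for p in sorted(set(positions)):
        new_pos += rng.randint(1, max_gap)
        mapping[p] = new_pos
    return mapping
```

Because both transforms are deterministic under the key, the data owner can apply them to the reference panel and to the query genotypes consistently. The owner can then invert them on the imputed output returned by the server.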
Shruthi Rengarajan, Jason Derks, Daniel W. Bellott, Nikolai Slavov, David C. Page
The Y-linked gene DDX3Y and its X-linked homolog DDX3X survived the evolution of the human sex chromosomes from ordinary autosomes. DDX3X encodes a multifunctional RNA helicase, with mutations causing developmental disorders and cancers. We find that, among X-linked genes with surviving Y homologs, DDX3X is extraordinarily dosage sensitive. Studying cells of individuals with sex chromosome aneuploidy, we observe that when the number of Y Chromosomes increases, DDX3X transcript levels fall; conversely, when the number of X Chromosomes increases, DDX3Y transcript levels fall. In 46,XY cells, CRISPRi knockdown of either DDX3X or DDX3Y causes transcript levels of the homologous gene to rise. In 46,XX cells, chemical inhibition of DDX3X protein activity elicits an increase in DDX3X transcript levels. Thus, perturbation of either DDX3X or DDX3Y expression is buffered: by negative cross-regulation of DDX3X and DDX3Y in 46,XY cells and by negative auto-regulation of DDX3X in 46,XX cells. DDX3X–DDX3Y cross-regulation is mediated through mRNA destabilization—as shown by metabolic labeling of newly transcribed RNA—and buffers total levels of DDX3X and DDX3Y protein in human cells. We infer that post-transcriptional auto-regulation of the ancestral (autosomal) DDX3X gene transmuted into auto- and cross-regulation of DDX3X and DDX3Y as these sex-linked genes evolved from ordinary alleles of their autosomal precursor.
{"title":"Post-transcriptional cross- and auto-regulation buffer expression of the human RNA helicases DDX3X and DDX3Y","authors":"Shruthi Rengarajan, Jason Derks, Daniel W. Bellott, Nikolai Slavov, David C. Page","doi":"10.1101/gr.279707.124","DOIUrl":"https://doi.org/10.1101/gr.279707.124","journal":{"name":"Genome research"},"publicationDate":"2025-01-10","publicationTypes":"Journal Article"}
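A deliberately simplified steady-state model shows why mutual mRNA destabilization buffers total DDX3X plus DDX3Y dosage. This is our own illustration, not a model from the paper. Assume each transcript is made at rate s_i and decays at rate d0 + kc*(m_X + m_Y). In other words, transcripts decay faster when more DDX3 protein is present, and protein is taken as proportional to total mRNA. Summing the two equations at steady state gives a quadratic in the total T = m_X + m_Y. The parameter names and values are hypothetical.

```python
import math

# Toy model (illustration only): both mRNAs decay faster as total DDX3 protein,
# assumed proportional to total mRNA, rises.
#   dm_x/dt = s_x - (d0 + kc * (m_x + m_y)) * m_x
#   dm_y/dt = s_y - (d0 + kc * (m_x + m_y)) * m_y


def steady_state(s_x: float, s_y: float, d0: float = 1.0, kc: float = 1.0):
    """Return steady-state (m_x, m_y).

    Adding the two equations at steady state gives
    kc * T**2 + d0 * T - (s_x + s_y) = 0 for the total T = m_x + m_y;
    we take the positive root, then split T in proportion to synthesis.
    """
    total_synthesis = s_x + s_y
    total = (-d0 + math.sqrt(d0**2 + 4 * kc * total_synthesis)) / (2 * kc)
    decay = d0 + kc * total
    return s_x / decay, s_y / decay


# "46,XY-like": both genes expressed, then DDX3Y synthesis knocked down.
baseline = steady_state(10, 10)   # (2.0, 2.0)
knockdown = steady_state(10, 2)   # (2.5, 0.5): DDX3X mRNA rises

# "46,XX-like": DDX3X only; lowering kc stands in for inhibiting the protein.
xx_baseline = steady_state(20, 0)            # (4.0, 0.0)
xx_inhibited = steady_state(20, 0, kc=0.5)   # DDX3X mRNA rises above 4.0
```

Cutting DDX3Y synthesis from 10 to 2 raises DDX3X mRNA from 2.0 to 2.5. Total mRNA falls by 25% even though total synthesis falls by 40%, which is the kind of buffering the abstract describes. With s_y = 0 as a stand-in for 46,XX cells, lowering kc (a crude proxy for inhibiting DDX3X protein activity) raises DDX3X mRNA, in the same direction as the reported auto-regulation.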
Yang Liu, Joao Botelho, Jaime Iranzo
Prokaryotes have evolved a wide repertoire of defense systems to prevent invasion by mobile genetic elements (MGE). However, because MGE are vehicles for the exchange of beneficial accessory genes, defense systems could consequently impede rapid adaptation in microbial populations. Here, we study how defense systems impact horizontal gene transfer (HGT) in the short and long terms. By combining comparative genomics and phylogeny-aware statistical methods, we quantified the association between the presence of 7 widespread defense systems and the abundance of MGE in the genomes of 196 bacterial and 1 archaeal species. We also calculated the differences in the rates of gene gain and loss between lineages that possess and lack each defense system. Our results show that the impact of defense systems on HGT is highly taxon- and system-dependent, and in most cases not statistically significant. Timescale analysis reveals that defense systems must persist in a lineage for a relatively long time to exert an appreciable negative impact on HGT. In contrast, for shorter evolutionary timescales, frequent co-acquisition of MGE and defense systems results in a net positive association of the latter with HGT. Given the high turnover rates experienced by defense systems, we propose that the inhibitory effect of most defense systems on HGT is masked by their strong linkage with MGE. These findings help explain the contradictory conclusions of previous research by pointing to mobility and within-host retention times as key factors that determine the impact of defense systems on genome plasticity.
{"title":"Timescale and genetic linkage explain the variable impact of defense systems on horizontal gene transfer","authors":"Yang Liu, Joao Botelho, Jaime Iranzo","doi":"10.1101/gr.279300.124","DOIUrl":"https://doi.org/10.1101/gr.279300.124","journal":{"name":"Genome research"},"publicationDate":"2025-01-10","publicationTypes":"Journal Article"}
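The abstract compares rates of gene gain and loss between lineages that possess and lack each defense system. The sketch below shows one simple way to express such a comparison: pool events per unit branch length in each group. It assumes gain and loss events have already been reconstructed on each branch of a species tree. The branch record layout and function names are hypothetical, and this pooled ratio is not the authors' phylogeny-aware statistical method.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    """One branch of a species tree (hypothetical record layout)."""
    length: float       # branch length, in whatever units the tree uses
    gains: int          # reconstructed gene-gain events on this branch
    losses: int         # reconstructed gene-loss events on this branch
    has_defense: bool   # defense system inferred present along this branch


def pooled_rates(branches, with_defense: bool):
    """Return (gain_rate, loss_rate) as events per unit branch length."""
    group = [b for b in branches if b.has_defense == with_defense]
    total_length = sum(b.length for b in group)
    if total_length <= 0:
        raise ValueError("group has no branch length to normalize by")
    gain_rate = sum(b.gains for b in group) / total_length
    loss_rate = sum(b.losses for b in group) / total_length
    return gain_rate, loss_rate


def rate_differences(branches):
    """(gain, loss) rate differences: with the system minus without it."""
    gain_with, loss_with = pooled_rates(branches, True)
    gain_without, loss_without = pooled_rates(branches, False)
    return gain_with - gain_without, loss_with - loss_without


branches = [
    Branch(2.0, 4, 2, True),
    Branch(2.0, 2, 2, True),
    Branch(1.0, 5, 1, False),
    Branch(3.0, 7, 3, False),
]
print(rate_differences(branches))  # (-1.5, 0.0)
```

On these toy numbers, branches carrying the system gain genes at 1.5 events per unit length versus 3.0 without it, a difference of -1.5. A pooled ratio like this ignores two things. The first is phylogenetic non-independence, which the authors' phylogeny-aware methods address. The second is how long each system has persisted in a lineage, which the abstract identifies as key to whether an inhibitory effect appears at all.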