Genome Research: latest articles

Variation in the fitness impact of translationally optimal codons among animals
IF 7, Zone 2 (Biology), Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY, Pub Date: 2025-02-10, DOI: 10.1101/gr.279837.124
Florian Bénitière, Tristan Lefébure, Laurent Duret
Early studies in invertebrate model organisms (fruit flies, nematodes) showed that their synonymous codon usage is under selective pressure to optimize translation efficiency in highly expressed genes (a process called translational selection). In contrast, mammals show little evidence of selection for translationally optimal codons. To understand this difference, we examined the use of synonymous codons in 223 metazoan species, covering a wide range of animal clades. For each species, we predicted the set of optimal codons based on the pool of tRNA genes present in its genome, and we analyzed how the frequency of optimal codons correlates with gene expression to quantify the intensity of translational selection (S). We observed that few metazoans show clear signs of translational selection. As predicted by the nearly neutral theory, the highest values of S are observed in species with large effective population sizes (Ne). Overall, however, Ne appears to be a poor predictor of the intensity of translational selection, suggesting important differences in the fitness effect of synonymous codon usage across taxa. We propose that the few animal taxa that are clearly affected by translational selection correspond to organisms with strong constraints for a very rapid growth rate.
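As a rough illustration of the "frequency of optimal codons" quantity mentioned in the abstract, the sketch below computes the share of codons in a coding sequence that belong to a given optimal set. The function name `frequency_of_optimal_codons`, the example sequence, and the example optimal set are hypothetical; the study predicts optimal codons from each genome's tRNA gene pool and estimates S from the relationship with expression, neither of which is reproduced here. For simplicity this version counts all codons rather than only those with synonymous alternatives.

```python
def frequency_of_optimal_codons(cds, optimal_codons):
    """Return the fraction of in-frame codons that are in optimal_codons.

    Simplification (assumption): every codon is counted, including those
    for amino acids that have no synonymous alternative.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        return 0.0
    return sum(codon in optimal_codons for codon in codons) / len(codons)


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
example_cds = "CTGCTGAAAGCC"          # codons: CTG, CTG, AAA, GCC
example_optimal = {"CTG", "GCC"}      # illustrative set, not from the paper
fop = frequency_of_optimal_codons(example_cds, example_optimal)
print(fop)
```

Computing this value per gene and comparing it with expression levels across genes is the kind of relationship the abstract describes; the actual estimator used by the authors may differ.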
Citations: 0
Nucleosome-binding by TP53, TP63, and TP73 is determined by the composition, accessibility, and helical orientation of their binding sites
IF 7, Zone 2 (Biology), Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY, Pub Date: 2025-02-10, DOI: 10.1101/gr.279541.124
Patrick Wilson, Xinyang Yu, Christopher R Handelmann, Michael J Buck
The TP53 family of transcription factors plays key roles in driving development and combating cancer by regulating gene expression. TP53, TP63, and TP73 - the three members of the TP53 family - regulate gene expression by binding to their DNA binding sites, many of which are situated within nucleosomes. To thoroughly examine the nucleosome-binding abilities of the TP53 family, we used Pioneer-seq, a technique that assesses a transcription factor's binding affinity to its DNA binding sites at all possible positions within the nucleosome core particle. Using Pioneer-seq, we analyzed the binding affinity of TP53, TP63, and TP73 to 10 TP53-family binding sites across the nucleosome core particle. We found that the affinity of TP53, TP63, and TP73 for nucleosomes was primarily determined by the positioning of TP53-family binding sites within nucleosomes; TP53-family members bind strongly to the more accessible edges of nucleosomes but weakly to the less accessible centers of nucleosomes. Our results further show that the DNA-helical orientation of TP53-family binding sites within nucleosomal DNA impacts the nucleosome-binding affinity of TP53-family members, with binding site composition impacting each TP53-family member's affinity only when the binding site location was accessible. Taken together, our results show that the accessibility, composition, and helical orientation of TP53-family binding sites collectively determine the nucleosome-binding affinities of TP53, TP63, and TP73. These findings help explain the rules underlying TP53-family-nucleosome binding and thus provide requisite insight into how we may better control gene-expression changes involved in development and tumor suppression.
Citations: 0
Kernel-bounded clustering for spatial transcriptomics enables scalable discovery of complex spatial domains
IF 7, Zone 2 (Biology), Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY, Pub Date: 2025-02-05, DOI: 10.1101/gr.278983.124
Hang Zhang, Yi Zhang, Kai Ming Ting, Jie Zhang, Qiuran Zhao
Spatial transcriptomics are a collection of technologies that have enabled characterization of gene expression profiles and spatial information in tissue samples. Existing methods for clustering spatial transcriptomics data have primarily focused on data transformation techniques to represent the data suitably for subsequent clustering analysis, often using an existing clustering algorithm. These methods have limitations in handling complex data characteristics with varying densities, sizes, and shapes (in the transformed space on which clustering is performed), and they have high computational complexity, resulting in unsatisfactory clustering outcomes and slow execution time even with GPUs. Rather than focusing on data transformation techniques, we propose a new clustering algorithm called kernel-bounded clustering (KBC). It has two unique features: (1) It is the first clustering algorithm that employs a distributional kernel to recruit members of a cluster, enabling clusters of varying densities, sizes, and shapes to be discovered, and (2) it is a linear-time clustering algorithm that significantly enhances the speed of clustering analysis, enabling researchers to effectively handle large-scale spatial transcriptomics data sets. We show that (1) KBC works well with a simple data transformation technique called the Weisfeiler–Lehman scheme, and (2) a combination of KBC and the Weisfeiler–Lehman scheme produces good clustering outcomes, and it is faster and easier-to-use than many methods that employ existing clustering algorithms and data transformation techniques.
Citations: 0
The additional diagnostic yield of long-read sequencing in undiagnosed rare diseases
IF 7, Zone 2 (Biology), Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY, Pub Date: 2025-02-03, DOI: 10.1101/gr.279970.124
Giulia F. Del Gobbo, Kym M. Boycott
Long-read sequencing (LRS) is a promising technology positioned to study the significant proportion of rare diseases (RDs) that remain undiagnosed as it addresses many of the limitations of short-read sequencing, detecting and clarifying additional disease-associated variants that may be missed by the current standard diagnostic workflow for RDs. Some key areas where additional diagnostic yields may be realized include: (1) detection and resolution of structural variants (SVs); (2) detection and characterization of tandem repeat expansions; (3) coverage of regions of high sequence similarity; (4) variant phasing; (5) the use of de novo genome assemblies for reference-based or graph genome variant detection; and (6) epigenetic and transcriptomic evaluations. Examples from over 50 studies support that the main areas of added diagnostic yield currently lie in SV detection and characterization, repeat expansion assessment, and phasing (with or without DNA methylation information). Several emerging studies applying LRS in cohorts of undiagnosed RDs also demonstrate that LRS can boost diagnostic yields following negative standard-of-care clinical testing and provide an added yield of 7%–17% following negative short-read genome sequencing. With this evidence of improved diagnostic yield, we discuss the incorporation of LRS into the diagnostic care pathway for undiagnosed RDs, including current challenges and considerations, with the ultimate goal of ending the diagnostic odyssey for countless individuals with RDs.
Citations: 0
k-mer approaches for biodiversity genomics
IF 7, Zone 2 (Biology), Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY, Pub Date: 2025-01-31, DOI: 10.1101/gr.279452.124
Katharine M. Jenike, Lucía Campos-Domínguez, Marilou Boddé, José Cerca, Christina N. Hodson, Michael C. Schatz, Kamil S. Jaron
The wide array of currently available genomes displays a wonderful diversity in size, composition, and structure and is quickly expanding thanks to several global biodiversity genomics initiatives. However, sequencing of genomes, even with the latest technologies, can still be challenging for both technical (e.g., small physical size, contaminated samples, or access to appropriate sequencing platforms) and biological reasons (e.g., germline-restricted DNA, variable ploidy levels, sex chromosomes, or very large genomes). In recent years, k-mer-based techniques have become popular to overcome some of these challenges. They are based on the simple process of dividing the analyzed sequences (e.g., raw reads or genomes) into a set of subsequences of length k, called k-mers, and then analyzing the frequency or sequences of those k-mers. Analyses based on k-mers allow for a rapid and intuitive assessment of complex sequencing data sets. Here, we provide a comprehensive review of the theoretical properties and practical applications of k-mers in biodiversity genomics with a special focus on genome modeling.
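The k-mer decomposition described in this abstract can be sketched in a few lines. The example below is a minimal illustration, not the tooling the review covers: the function names `count_kmers` and `kmer_spectrum` and the toy sequence are assumptions, and real analyses typically use dedicated k-mer counters and canonical (strand-collapsed) k-mers, which this sketch omits.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping substring of length k in the sequence."""
    sequence = sequence.upper()
    # range() is empty when the sequence is shorter than k, giving an empty Counter.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_spectrum(kmer_counts):
    """Map each multiplicity to the number of distinct k-mers seen that many times."""
    return Counter(kmer_counts.values())


# Toy input: ATG occurs twice, the other four 3-mers once each.
counts = count_kmers("ATGATGCA", 3)
spectrum = kmer_spectrum(counts)
print(counts.most_common(1))
print(dict(spectrum))
```

The second function produces the frequency histogram (how many distinct k-mers occur once, twice, and so on), which is the kind of summary that k-mer-based genome modeling starts from.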
Citations: 0
Enhancing nanopore adaptive sampling for PromethION using readfish at scale
IF 7, Zone 2 (Biology), Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY, Pub Date: 2025-01-30, DOI: 10.1101/gr.279329.124
Rory Munro, Alex Payne, Nadine Holmes, Chris Moore, Inswasti Cahyani, Matt Loose
A unique feature of Oxford Nanopore Technologies sequencers, adaptive sampling, allows precise DNA molecule selection from sequencing libraries. Here we present enhancements to our tool, readfish, enabling all features for the industrial scale PromethION sequencer, including standard and "barcode-aware" adaptive sampling. We demonstrate effective coverage enrichment and assessment of multiple human genomes for copy number and structural variation on a single PromethION flow cell.
Citations: 0
Rapid and accurate demultiplexing of direct RNA nanopore sequencing datasets with SeqTagger
IF 7, Zone 2 (Biology), Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY, Pub Date: 2025-01-29, DOI: 10.1101/gr.279290.124
Leszek P Pryszcz, Gregor Diensthuber, Laia Llovera, Rebeca Medina, Anna Delgado-Tejedor, Luca Cozzuto, Julia Ponomarenko, Eva Maria Novoa
Nanopore direct RNA sequencing (DRS) enables direct measurement of RNA molecules, including their native RNA modifications, without prior conversion to cDNA. However, commercial methods for molecular barcoding of multiple DRS samples are lacking, and community-driven efforts, such as DeePlexiCon, are not compatible with newer RNA chemistry flowcells and the latest-generation GPU cards. To overcome these limitations, we introduce SeqTagger, a rapid and robust method that can demultiplex direct RNA sequencing datasets with 99% precision and 95% recall. We demonstrate the applicability of SeqTagger in both RNA002/R9.4 and RNA004/RNA chemistries and show its robust performance both for long and short RNA libraries, including custom libraries that do not contain standard poly(A) tails, such as Nano-tRNAseq libraries. Finally, we demonstrate that increasing the multiplexing up to 96 barcodes yields highly accurate demultiplexing models. SeqTagger can be executed in a standalone manner or through the MasterOfPores NextFlow workflow. The availability of an efficient and simple multiplexing strategy improves the cost-effectiveness of this technology and facilitates the analysis of low-input biological samples.
Citations: 0
Artificial intelligence and machine learning in cell-free-DNA-based diagnostics.
IF 6.2, Zone 2 (Biology), Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY, Pub Date: 2025-01-22, DOI: 10.1101/gr.278413.123
W H Adrian Tsui, Spencer C Ding, Peiyong Jiang, Y M Dennis Lo

The discovery of circulating fetal and tumor cell-free DNA (cfDNA) molecules in plasma has opened up tremendous opportunities in noninvasive diagnostics such as the detection of fetal chromosomal aneuploidies and cancers and in posttransplantation monitoring. The advent of high-throughput sequencing technologies makes it possible to scrutinize the characteristics of cfDNA molecules, opening up the fields of cfDNA genetics, epigenetics, transcriptomics, and fragmentomics, providing a plethora of biomarkers. Machine learning (ML) and/or artificial intelligence (AI) technologies that are known for their ability to integrate high-dimensional features have recently been applied to the field of liquid biopsy. In this review, we highlight various AI and ML approaches in cfDNA-based diagnostics. We first introduce the biology of cell-free DNA and basic concepts of ML and AI technologies. We then discuss selected examples of ML- or AI-based applications in noninvasive prenatal testing and cancer liquid biopsy. These applications include the deduction of fetal DNA fraction, plasma DNA tissue mapping, and cancer detection and localization. Finally, we offer perspectives on the future direction of using ML and AI technologies to leverage cfDNA fragmentation patterns in terms of methylomic and transcriptional investigations.

Citations: 0
Analysis of a cell-free DNA-based cancer screening cohort links fragmentomic profiles, nuclease levels, and plasma DNA concentrations.
IF 6.2, Zone 2 (Biology), Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY, Pub Date: 2025-01-22, DOI: 10.1101/gr.279667.124
Yasine Malki, Guannan Kang, W K Jacky Lam, Qing Zhou, Suk Hang Cheng, Peter P H Cheung, Jinyue Bai, Ming Lok Chan, Chui Ting Lee, Wenlei Peng, Yiqiong Zhang, Wanxia Gai, Winsome W S Wong, Mary-Jane L Ma, Wenshuo Li, Xinzhou Xu, Zhuoran Gao, Irene O L Tse, Huimin Shang, L Y Lois Choy, Peiyong Jiang, K C Allen Chan, Y M Dennis Lo

The concentration of circulating cell-free DNA (cfDNA) in plasma is an important determinant of the robustness of liquid biopsies. However, biological mechanisms that lead to inter-individual differences in cfDNA concentrations remain unexplored. The concentration of plasma cfDNA is governed by an interplay between its release and clearance. We hypothesized that cfDNA clearance by nucleases might be one mechanism that contributes toward inter-individual variations in cfDNA concentrations. We performed fragmentomic analysis of the plasma cfDNA from 862 healthy individuals, with a cfDNA concentration range of 1.61-41.01 ng/mL. We observed an increase in large DNA fragments (231-600 bp), a decreased frequency of shorter DNA fragments (20-160 bp), and an increased frequency of G-end motifs with increasing cfDNA concentrations. End motif deconvolution analysis revealed a decreased contribution of DNASE1L3 and DFFB in subjects with higher cfDNA concentration. The five subjects with the highest plasma DNA concentration (top 0.58%) had aberrantly decreased levels of DNASE1L3 protein in plasma. The cfDNA concentration could be inferred from the fragmentomic profile through machine learning and was well correlated with the measured cfDNA concentration. Such an approach could infer the fractional DNA concentration from particular tissue types, such as the fetal and tumor fraction. This work shows that individuals with different cfDNA concentrations are associated with characteristic fragmentomic patterns of the cfDNA pool and that nuclease-mediated clearance of DNA is a key parameter that affects cfDNA concentration. Understanding these mechanisms has facilitated the enhanced measurement of cfDNA species of clinical interest, including circulating fetal and tumor DNA.
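The size-band frequencies mentioned in this abstract (20-160 bp and 231-600 bp) are simple proportions of fragments falling in each range. The sketch below shows that calculation on a made-up list of fragment lengths; the function name `size_fraction` and the example values are hypothetical and are not data from the study, and the authors' actual fragmentomic pipeline is not described in the abstract.

```python
def size_fraction(lengths, lower, upper):
    """Return the fraction of fragment lengths within [lower, upper] bp, inclusive."""
    if not lengths:
        return 0.0
    return sum(lower <= n <= upper for n in lengths) / len(lengths)


# Made-up fragment lengths in bp, for illustration only.
fragment_lengths = [145, 166, 167, 150, 320, 98, 410, 170]

short_fraction = size_fraction(fragment_lengths, 20, 160)   # 145, 150, 98
long_fraction = size_fraction(fragment_lengths, 231, 600)   # 320, 410
print(short_fraction, long_fraction)
```

Per-sample fractions like these are one kind of feature that could feed a model relating fragment profiles to cfDNA concentration.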

Genome research, pp. 31-42. Published 2025-01-22. IF 6.2. DOI: 10.1101/gr.279667.124. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11789642/pdf/
Citations: 0
Common and specific gene regulatory programs in zebrafish caudal fin regeneration at single-cell resolution
IF 7 Zone 2 Biology Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY Pub Date: 2025-01-14 DOI: 10.1101/gr.279372.124
Yujie Chen, Yiran Hou, Qinglin Zeng, Irene Wang, Meiru Shang, Kwangdeok Shin, Christopher Hemauer, Xiaoyun Xing, Junsu Kang, Guoyan Zhao, Ting Wang
Following amputation, zebrafish regenerate their injured caudal fin through lineage-restricted reprogramming. Although previous studies have charted various genetic and epigenetic dimensions of this process, the intricate gene regulatory programs shared by, or unique to, different regenerating cell types remain underinvestigated. Here, we mapped the regulatory landscape of fin regeneration by applying paired snRNA-seq and snATAC-seq on uninjured and regenerating fins. This map delineates the regulatory dynamics of predominant cell populations at multiple stages of regeneration. We observe a marked increase in the accessibility of chromatin regions associated with regenerative and developmental processes at 1 dpa, followed by a gradual closure across major cell types at later stages. This pattern is distinct from that of transcriptomic dynamics, which is characterized by several waves of gene upregulation and downregulation. We identified and in vivo validated cell-type-specific and position-specific regeneration-responsive enhancers and constructed regulatory networks by cell type and stage. Our single-cell resolution transcriptomic and chromatin accessibility map across regenerative stages provides new insights into regeneration regulatory mechanisms and serves as a valuable resource for the community.
Genome research, volume 36 1 (as listed). Published 2025-01-14. IF 7.0. DOI: https://doi.org/10.1101/gr.279372.124
Citations: 0