Pub Date: 2026-02-07 DOI: 10.1186/s13059-026-03947-w
Xiaopu Zhang, Idil Yet, Sergio Villicaña, Juan Castillo-Fernandez, Massimo Mangino, Jouke Jan Hottenga, Pei-Chien Tsai, Josine L Min, Mario Falchi, Andrew Wong, Dorret I Boomsma, Ken K Ong, Jenny van Dongen, Jordana T Bell
Background: Genetic variants that are associated with phenotypic variability, or variance quantitative trait loci (vQTLs), have been detected for multiple human traits. Gene-environment interactions can lead to differential phenotypic variability across genotype groups; therefore, genetic variants that interact with environmental exposures can manifest as vQTLs. Although changes in DNA methylation variability have been observed in several diseases, vQTLs for methylation levels (vmeQTLs) have not yet been explored in depth.
Results: We optimize the value of monozygotic twin studies to identify and replicate vmeQTLs for blood DNA methylation variance at 358 CpGs in 988 adult monozygotic twin pairs from two European twin registries. Over a third of vmeQTLs capture identical vmeQTL-environmental factor interactions in both datasets, and the majority of interactions are observed with blood cell counts. Correspondingly, over 60% of CpGs affected by genotype-monocyte and genotype-T cell interactions replicate as CpGs affected by genetic effects in the relevant cell type in an independent dataset. Most vmeQTLs also replicate in 1,348 UK non-twin adults and show longitudinal stability in a sample subset. Integrating gene expression and phenotype association results identifies multiple vmeQTLs that capture GxE effects relevant to human health. Examples include vmeQTLs interacting with blood cell type to influence DNA methylation in FAM65A, NAPRT, and CSGALNACT1 underlying immune disease susceptibility and progression.
Conclusions: Our findings identify novel genetic effects on human DNA methylation variability within a unique monozygotic twin study design. The results show the potential of vmeQTLs to identify gene-environment interactions and provide novel insights into complex traits.
Title: Genetic impacts on within-pair DNA methylation variance in monozygotic twins capture gene-environment interactions and cell-type effects. Journal: Genome Biology.
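The background argument, that a variant interacting with an environmental exposure shows up as a difference in phenotypic variance between genotype groups, can be illustrated with a toy calculation. This is a minimal sketch and not the authors' twin-based method: the function name `variance_by_genotype`, the interaction effect of 0.5 per allele copy, and the data are all illustrative assumptions.

```python
from statistics import pvariance


def variance_by_genotype(values, genotypes):
    """Group phenotype values by genotype and return each group's population variance."""
    groups = {}
    for value, genotype in zip(values, genotypes):
        groups.setdefault(genotype, []).append(value)
    return {g: pvariance(v) for g, v in sorted(groups.items())}


# Toy data: every genotype group (0, 1 or 2 allele copies) experiences
# the same four environmental values (hypothetical units).
environment = [-1.0, -0.5, 0.5, 1.0]
genotypes = [g for g in (0, 1, 2) for _ in environment]

# Phenotype = baseline + interaction term. The environment has a larger
# effect as the allele count rises (assumed effect: 0.5 per copy).
phenotype = [1.0 + 0.5 * g * e for g in (0, 1, 2) for e in environment]

# Variance grows with allele count even though the group means are equal.
print(variance_by_genotype(phenotype, genotypes))
```

In this toy setting the three groups share the same mean, so a standard mean-effect test would see nothing, while the spread increases with allele count. That pattern is what a variance-based scan is designed to pick up.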
Pub Date: 2026-02-07 DOI: 10.1186/s13059-026-03984-5
Somia Saidi, Mathieu Blaison, María Del Pilar Rodríguez-Ordóñez, Johann Confais, Hadi Quesneville
Background: The role of transposable elements (TEs) in host adaptation has gained interest in recent years. Individuals of the same species undergo independent TE insertions, providing genetic variability within populations, upon which natural selection can act to foster adaptation to environmental conditions.
Results: As de novo assembled genomes are becoming increasingly affordable, helping to overcome the bias introduced by relying on a single reference genome, there is a growing need for suitable pangenomic tools to explore the genomic diversity within a species. We developed a new pipeline called panREPET that identifies TE insertions shared by groups of individuals. Unlike other pangenomic tools, panREPET operates independently of a reference genome and provides the precise sequence and genomic coordinates of each TE copy for each genome.
Conclusions: We showcase the potential of this tool by identifying TE insertions shared among 42 Brachypodium distachyon genomes and by comparing our results with those of existing tools to demonstrate its advantages. Using panREPET, we were able to date two major TE bursts corresponding to major climate events: 22 kya during the Last Glacial Maximum and 10 kya during the Holocene, showing a potential link between environmental stress and TE activity.
Title: A reference-free pipeline for detecting shared transposable elements from pan-genomes to retrace their dynamics in a species. Journal: Genome Biology.
Pub Date: 2026-02-07 DOI: 10.1186/s13059-026-03966-7
Yongxin Ji, Jiaojiao Guan, Herui Liao, Jiayu Shang, Yanni Sun
Plasmids play a pivotal role in the emergence of multidrug-resistant and pathogenic bacteria, posing significant clinical challenges. However, the rapidly growing number of unannotated plasmids necessitates comprehensive characterization of their diverse properties. Here, we present PlasRAG, a tool that integrates multi-faceted property characterization of query plasmids and plasmid DNA retrieval based on textual queries. PlasRAG employs a bidirectional multi-modal information retrieval model that aligns DNA sequences with textual data, effectively overcoming the limitations of traditional approaches. Rigorous experiments demonstrate that PlasRAG delivers robust performance and enhanced analytical capabilities, underscoring the effectiveness of its architectural design.
Title: PlasRAG: comprehensive plasmid characterization and retrieval through sequence-text alignment. Journal: Genome Biology.
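The general idea of retrieving plasmid sequences from a textual query, once both are embedded in a shared vector space, can be sketched with a cosine-similarity ranking. This is an illustration of the retrieval principle only: the embedding vectors below are hand-made, the names `cosine_similarity`, `rank_by_similarity` and `plasmid_A`/`B`/`C` are hypothetical, and nothing here reflects PlasRAG's actual model or interface.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, candidates):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical 3-dimensional embeddings; a trained model would produce
# much higher-dimensional vectors for DNA sequences and for text.
plasmid_embeddings = {
    "plasmid_A": [0.9, 0.1, 0.0],
    "plasmid_B": [0.1, 0.8, 0.1],
    "plasmid_C": [0.0, 0.2, 0.9],
}
text_query_embedding = [0.2, 0.9, 0.0]

ranking = rank_by_similarity(text_query_embedding, plasmid_embeddings)
print([name for name, _ in ranking])
```

Because the ranking only depends on vectors, the same function could be used in the opposite direction, with a sequence embedding as the query and text embeddings as candidates.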
Pub Date: 2026-01-30 DOI: 10.1186/s13059-026-03959-6
Cai-Jin Chen, Xiao-Xu Pang, Ya-Mei Ding, Wei-Ping Zhang, Yang Yang, Jie Liu, Anush Nersesyan, Bo-Wen Zhang, Susanne S Renner, Da-Yong Zhang, Wei-Ning Bai
Background: The inference of population structure in domestication studies is prone to biases whenever sampling is unbalanced and effective population sizes (Ne) differ across populations. Such biases can lead to the misclassification of large ancestral populations as admixed, particularly under single-origin domestication scenarios.
Results: We propose a novel parameterization strategy for the STRUCTURE software, combining the F model and alternative ancestry prior (along with a smaller initial ALPHA value), and simulations demonstrate that the strategy mitigates unbalanced sampling and unequal population size biases. We apply our strategy to the domestication history of the common walnut (Juglans regia), using whole-genome resequencing data from 298 individuals from across its range. The results support an origin of J. regia in South Asia, where walnut populations are characterized by high genetic diversity, extensive private allele content, low mutation load, and demographic stability. Building on this demographic framework, we further identify genomic regions under recent positive selection and candidate domestication genes involved in shell structure, pollen development, and lipid transport.
Conclusions: Our results clarify the long-standing debate on the geographic origin of walnut domestication and demonstrate that an optimized, model-aware use of STRUCTURE can substantially improve population-genetic inference in domestication studies and other systems characterized by complex demography.
Title: Resolving sampling and population-size biases in domestication genomics supports a South Asian origin of walnuts. Journal: Genome Biology.
Pub Date: 2026-01-30 DOI: 10.1186/s13059-026-03935-0
Han Yuan, Johannes Linder, David R Kelley
Background: DNA sequence deep learning models can accurately predict epigenetic and transcriptional profiles, enabling analysis of gene regulation and genetic variant effects. While large-scale models like Enformer and Borzoi are trained on abundant data, they cannot cover all cell states and assays, necessitating the training of new models to analyze gene regulation in novel contexts. However, training models from scratch for new datasets is computationally expensive.
Results: In this study, we systematically develop and evaluate a transfer learning framework based on parameter-efficient fine-tuning for supervised regulatory sequence models. Using the state-of-the-art model Borzoi, our framework enables accurate model transfer while significantly reducing runtime and memory requirements. Across bulk and single cell RNA-seq datasets, the transferred models effectively predict held-out gene expression changes, identify regulatory drivers in perturbation conditions, and predict cell-type-specific variant effects. We further demonstrate that transferring Borzoi to relevant cell types facilitates mechanistic interpretation of fine-mapped GWAS variants.
Conclusions: Our framework offers a scalable and practical solution for extending large sequence models to novel biological contexts, enabling mechanistic insight into gene regulation and variant effects.
Title: Parameter-efficient fine-tuning enables scalable transfer of regulatory sequence models to novel contexts. Journal: Genome Biology.
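The abstract does not state which parameter-efficient technique is used, so the following arithmetic assumes one common option, a low-rank adapter, purely for illustration. It compares the number of trainable weights when a full weight matrix is updated with the number needed for two thin factor matrices. The layer dimensions and rank are hypothetical and are not taken from Borzoi.

```python
def full_finetune_params(d_out, d_in):
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def low_rank_adapter_params(d_out, d_in, rank):
    """Trainable weights for a rank-r update B @ A, with B of shape
    d_out x r and A of shape r x d_in (the base matrix stays frozen)."""
    return rank * (d_out + d_in)


# Hypothetical layer size and adapter rank, chosen only for the example.
d_out, d_in, rank = 1024, 1024, 8

full = full_finetune_params(d_out, d_in)
adapter = low_rank_adapter_params(d_out, d_in, rank)

print(full, adapter, adapter / full)
```

Under these assumed sizes the adapter trains under 2% of the weights of the full layer, which is the kind of saving that makes transfer to many new contexts cheaper in memory and runtime.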
Pub Date: 2026-01-30 DOI: 10.1186/s13059-025-03908-9
Thomas Defard, Alice Blondel, Sebastien Bellow, Anthony Coleon, Guilherme Dias de Melo, Florian Mueller, Thomas Walter
Imaging-based spatial transcriptomics enables high-resolution spatial mapping of RNA species. A key challenge in imaging-based spatial transcriptomics is accurate cell segmentation to assign each RNA molecule to the right cell. Here, we present RNA2seg, a novel segmentation algorithm trained on over 4 million cells from MERFISH and CosMx datasets across seven organs using a teacher-student training scheme. RNA2seg integrates RNA point clouds and all available membrane and nuclear stainings. Validation on manually annotated data shows superior performance including in zero-shot and few-shot settings.
Title: RNA2seg: a generalist model for cell segmentation in image-based spatial transcriptomics. Journal: Genome Biology.
Pub Date: 2026-01-30 DOI: 10.1186/s13059-025-03902-1
Qing Chen, Tianchang Yang, Yingping Hou, Kai Wang, Tingting Li, Longteng Wang, Cuiyun Dou, Panpan Cheng, Minglei Shi, Wei Li
Background: Neutrophil differentiation is a well-orchestrated process that involves coordinated changes in the chromatin accessibility, transcription factor (TF) binding, 3D genome-structure and transcription. However, despite significant advances in understanding the later stages of neutrophil maturation, the initial molecular events that trigger and drive the commitment to neutrophil lineage remain poorly characterized, especially the functional roles of master TFs in orchestrating the earliest stages of lineage specification.
Results: Here, we examine changes in the genome topology, transcriptome, and chromatin accessibility during the neutrophil differentiation process. We demonstrate striking changes in 3D genome structure and chromatin accessibility as early as 4 h after all-trans-retinoic acid treatment, which accommodate and regulate gene expression to guide neutrophil lineage differentiation. Analysis of early transcriptional changes confirmed CEBPA as a key TF. To further elucidate the relationships among CEBPA binding, chromatin accessibility, 3D genomic organization, and gene expression, we perform CEBPA HiCut, a technology that simultaneously profiles TF-DNA binding sites and TF-mediated 3D genome interactions. Our first TF-mediated HiCut experiment in neutrophils revealed the synergistic relationship and sequential regulation cascade between this core TF and chromatin accessibility, 3D genomic organization, and gene expression.
Conclusions: Our work systematically investigates coordinated chromatin accessibility, TF binding, 3D genome structure and transcriptional changes during neutrophil differentiation, especially at initialization. Our study highlights the sequential interplay between the initial changes of chromatin state, 3D genome organization and pioneer TFs such as CEBPA.
Title: An integrated multi-omics and network analysis of neutrophil differentiation from initial- to late-stage. Journal: Genome Biology.
Pub Date: 2026-01-30 DOI: 10.1186/s13059-026-03970-x
Yuqiu Wang, Wenxuan Zuo, Jiawei Huang, Fengzhu Sun, Yuxuan Du
Background: Metagenomics combined with High-throughput Chromosome Conformation Capture (Hi-C) provides a powerful approach to study microbial communities by linking genomic content with spatial interactions. Hi-C complements shotgun sequencing by revealing taxonomic composition, functional interactions, and genomic organization within a single sample. However, aligning Hi-C reads to metagenomic contigs is challenging due to variable insert sizes of Hi-C paired-end reads, multi-species complexity, and gaps in assemblies. Although several benchmark studies have evaluated general alignment tools and Hi-C data alignment, none have specifically focused on metagenomic Hi-C data.
Results: We evaluated seven alignment strategies commonly used in Hi-C analyses: BWA MEM -5SP, BWA MEM default, BWA aln default, Bowtie2 default, Bowtie2 --very-sensitive-local, Minimap2 default, and Chromap Hi-C default. We benchmarked these tools on one synthetic dataset and seven real-world environments. Performance was assessed based on the number of inter-contig Hi-C read pairs and their impact on downstream tasks, such as binning quality.
Conclusions: We show that BWA MEM -5SP generally outperformed all other tools across most environments in terms of inter-contig read pairs and binning quality, followed by BWA MEM default. Chromap and Minimap2, while less effective in these metrics, demonstrated the highest computational efficiency.
Title: Benchmarking alignment strategies for Hi-C reads in metagenomic Hi-C data. Journal: Genome Biology.
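The benchmark's headline metric, the number of inter-contig Hi-C read pairs, can be made concrete with a short counting sketch. It assumes each aligned pair has already been reduced to the contig names of its two mates, with `None` marking an unmapped mate. That input format, the function name `count_inter_contig_pairs`, and the contig names are assumptions for illustration, not the study's pipeline.

```python
def count_inter_contig_pairs(pairs):
    """Count read pairs whose two mates map to different contigs.

    `pairs` is an iterable of (contig_of_mate1, contig_of_mate2);
    a value of None means that mate did not align.
    """
    inter = 0
    for contig1, contig2 in pairs:
        if contig1 is None or contig2 is None:
            continue  # pairs with an unmapped mate carry no linkage signal
        if contig1 != contig2:
            inter += 1
    return inter


# Hypothetical alignment summary for four read pairs.
aligned_pairs = [
    ("contig_1", "contig_1"),  # intra-contig pair
    ("contig_1", "contig_2"),  # inter-contig pair
    ("contig_3", None),        # one mate unmapped
    ("contig_2", "contig_4"),  # inter-contig pair
]

print(count_inter_contig_pairs(aligned_pairs))
```

Only pairs that link two different contigs contribute information for grouping contigs into genomes, which is why an aligner that recovers more of them can improve downstream binning.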
Pub Date: 2026-01-29 DOI: 10.1186/s13059-026-03956-9
Alessandra Bigi, Fabrizio Chiti
Phase separation is an important process in biology associated with the formation of membraneless organelles but possibly also related to the emergence of solid inclusions. TDP-43 is a widely studied paradigmatic case, as it forms neuronal cytoplasmic inclusions in neurodegenerative diseases and is an essential component of many membraneless organelles. Here, we review the physicochemical fundamentals of liquid-liquid phase separation (LLPS) of TDP-43 and its fragments in vitro, showing that full-length TDP-43 requires RNA or chaperones to form stable liquid droplets. We describe TDP-43-containing membraneless organelles and the debate on whether these assemblies represent reservoirs for pathological solid inclusion formation.
Title: Understanding liquid-liquid phase separation through TDP-43: fundamental principles, subcellular compartmentalisation, and role of solid inclusion formation. Journal: Genome Biology.
Background: Phenotypic diversity arises from the process of development and is shaped by genomic variation in plants. However, the genetic basis of growth dynamics remains poorly understood in maize.
Results: Here, we analyze 679 maize inbred lines derived from a synthetic CUBIC population with approximately 2.8 million SNPs, leveraging high-throughput phenotyping to capture 1,002,240 RGB images across 18 growth stages. We quantify 67 image-based traits (i-traits), revealing distinct dynamic patterns throughout development. Genome-wide association studies identify 857 quantitative trait loci (QTLs) influencing growth variation, with 88.6% classified as period-specific dynamic QTLs exhibiting modest effects, and 11.4% as conservative QTLs with sustained effects. Notably, 1.5% of cryptic pleiotropic QTLs spanning different growth stages suggest genetic relocations during development. These QTLs enhance heritability estimates for mature traits by an average of 6.2%. We further characterize the novel functions of key genes linked with these QTLs, including BRD1, which has pleiotropic effects on plant height and convex hull perimeter, and ZmGalOx1, which has broad-spectrum effects on plant architecture. Developmental rewiring of epistatic networks shapes maize growth, underscoring the importance of temporal genetic regulation. Trajectory modeling of i-traits across periods decodes the growth variation patterns, supporting ontogenic-hypothesis-driven predictive breeding strategies.
Conclusion: The findings elucidate the genetic architecture underlying growth dynamics from a spatial-temporal perspective, offering novel insights for maize improvement.
Genetic dynamics drive maize growth and breeding.
Genome Biology. Pub Date : 2026-01-29 DOI: 10.1186/s13059-026-03957-8
Chengxiu Wu, Zedong Geng, Weikun Li, Junli Ye, Xiaoyuan Hao, Jieting Xu, Minliang Jin, Xiaoyu Wu, Yuanhao Du, Yunyu Chen, Cheng Ma, Yu Gao, Yuyue Chen, Tianjin Xie, Songtao Gui, Yuanyuan Chen, Jingyun Luo, Yupeng Liu, Wenyu Yang, Jianbing Yan, Wanneng Yang, Yingjie Xiao