Abhisek Chakraborty, Shruti Mahajan, Vishnu Prasoodanan P K, Akhilesh Shailendra Khamkar, Vineet K Sharma
Bagworms are commonly known for the well-organized case, or bag, that they construct around themselves from silk and plant materials. To understand the genetic basis of these unique characteristics in bagworms, we performed multiomics analyses of a bagworm species, Eumeta crameri. The genome and transcriptome sequencing of E. crameri were used to construct the nuclear genome with a size of 668.2 Mb, N50 value of 6.6 Mb, and 13,554 coding genes, which was further assembled into 31 pseudochromosomes. The mitochondrial genome had a size of 15.6 Kb. We established the phylogenetic position of E. crameri with respect to 54 other insect species. The comparative analyses of E. crameri with other Lepidopterans revealed the adaptive evolution of genes related to primary metabolic pathways, defense, molting and metamorphosis, and silk formation in the bagworm species. We also showed the ultrafine nature of the E. crameri silk fibres. Further, we performed gut microbiome sequencing for E. crameri and constructed a gut microbial gene catalogue, which revealed the unique composition of the gut microbiome and its significance for host metabolism and defense. Together, the results provide multifaceted insights into the biological processes that support the well-organized holometabolous metamorphosis inside the bags of E. crameri.
{"title":"Life inside a bag: multiomics insights into the bagworm species Eumeta crameri.","authors":"Abhisek Chakraborty, Shruti Mahajan, Vishnu Prasoodanan P K, Akhilesh Shailendra Khamkar, Vineet K Sharma","doi":"10.1093/dnares/dsaf029","journal":{"name":"DNA Research"},"PeriodicalIF":2.9,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12666389/pdf/"}
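The N50 value quoted in the abstract above is a standard contiguity statistic: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `n50` and the toy lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (arbitrary units), not data from the paper.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # prints 70
```

In the toy case the total is 300, and the two longest sequences (80 + 70 = 150) already cover half of it, so the N50 is 70.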
In molecular evolution analyses, genomic DNA sequence information is usually represented in the form of 4 bases (ATGC). However, research since the turn of the century has revealed the importance of epigenetic genome modifications, such as DNA base methylation, which can now be decoded using advanced sequencing technologies. Here we provide an integrated framework for analyzing the molecular evolution of nucleotide substitution, methylation, and demethylation using an expanded nucleotide code that incorporates different types of methylated bases. As a first attempt, we analysed substitution rates between bases, both unmethylated and methylated. As model methylomes, we chose those of Helicobacter pylori, a unicellular bacterium with the largest known repertoire of sequence-specific DNA methyltransferases. We found that the demethylation rates are remarkably high, while the methylation rates are comparable with the substitution rates between unmethylated bases. We found that the ribosomal proteins known for sequence conservation showed high methylation and demethylation frequencies, whereas the genes for DNA methyltransferases themselves showed low methylation and demethylation frequencies compared to base substitution. This study represents the first step toward molecular evolutionary epigenomics, which, we expect, would contribute to understanding epigenome evolution.
{"title":"Towards molecular evolutionary epigenomics with an expanded nucleotide code involving methylated bases.","authors":"Shinya Yoshida, Ikuo Uchiyama, Masaki Fukuyo, Mototsugu Kato, Desirazu N Rao, Mutsuko Konno, Shin-Ichi Fujiwara, Takeshi Azuma, Ichizo Kobayashi, Hirohisa Kishino","doi":"10.1093/dnares/dsaf025","journal":{"name":"DNA Research"},"PeriodicalIF":2.9,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12666383/pdf/"}
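The idea of an expanded nucleotide code in the abstract above can be illustrated with a toy alphabet in which methylated bases get their own symbols, so that substitution, methylation, and demethylation are all counted as changes between aligned sites. This is only a sketch under stated assumptions: the lowercase symbols, the function names, and the ancestor/descendant framing are hypothetical and are not the notation or the statistical model used by the authors, and the output is raw counts rather than estimated rates.

```python
from collections import Counter

# Hypothetical one-letter symbols for methylated bases (an assumption,
# not the paper's notation): a = m6A, c = m5C, d = m4C.
METHYLATED_TO_BASE = {"a": "A", "c": "C", "d": "C"}


def base_of(symbol):
    """Map an expanded-code symbol to its underlying unmethylated base."""
    return METHYLATED_TO_BASE.get(symbol, symbol)


def classify(old, new):
    """Classify the change between two expanded-code symbols at one site."""
    if old == new:
        return "unchanged"
    if base_of(old) != base_of(new):
        # Any change of the underlying base counts as a substitution here,
        # regardless of methylation state.
        return "substitution"
    old_methylated = old in METHYLATED_TO_BASE
    new_methylated = new in METHYLATED_TO_BASE
    if new_methylated and not old_methylated:
        return "methylation"
    if old_methylated and not new_methylated:
        return "demethylation"
    return "methylation_type_change"  # e.g. m5C <-> m4C


def count_changes(ancestor, descendant):
    """Count change classes between two aligned, equal-length sequences."""
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must be aligned to the same length")
    return Counter(classify(x, y) for x, y in zip(ancestor, descendant))


# Hypothetical aligned toy sequences, not data from the study.
print(count_changes("AaCcGT", "aACdTT"))
```

For the toy pair, the six sites yield one methylation, one demethylation, one methylation-type change, one substitution, and two unchanged sites. A real analysis would fit a substitution model over a phylogeny rather than count differences between two sequences.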
Flowering cherries (genus Cerasus) are iconic trees in Japan, celebrated for their cultural and ecological significance. Despite their prominence, high-quality genomic resources for wild Cerasus species have been limited. Here, we report chromosome-level genome assemblies of two representative Japanese cherries: Cerasus itosakura, a progenitor of the widely cultivated C. ×yedoensis "Somei-yoshino," and Cerasus jamasakura, a traditionally popular wild species endemic to Japan. Using deep PacBio long-read and Illumina short-read sequencing, combined with reference-guided scaffolding based on the near-complete C. speciosa genome, we generated assemblies of 259.1 Mbp (C. itosakura) and 312.6 Mbp (C. jamasakura), both with >98% BUSCO completeness. Consistent with their natural histories, C. itosakura showed low heterozygosity, while C. jamasakura displayed high genomic diversity. Comparative genomic analyses revealed structural variations, including large chromosomal inversions. Notably, the availability of both the previously published C. speciosa genome and our new C. itosakura genome enabled the reconstruction of proxy haplotypes for both parental lineages of "Somei-yoshino." Comparison with the phased genome of "Somei-yoshino" revealed genomic discrepancies, suggesting that the cultivar may have arisen from genetically distinct or admixed individuals; the discrepancies may also reflect intraspecific diversity. Our results offer genomic foundations for evolutionary and breeding studies in Cerasus and Prunus.
{"title":"Chromosome-scale genomes of two wild flowering cherries (Cerasus itosakura and Cerasus jamasakura) provide insights into structural evolution in Cerasus.","authors":"Kazumichi Fujiwara, Atsushi Toyoda, Toshio Katsuki, Yutaka Sato, Bhim B Biswa, Takushi Kishida, Momi Tsuruta, Yasukazu Nakamura, Takako Mochizuki, Noriko Kimura, Shoko Kawamoto, Tazro Ohta, Ken-Ichi Nonomura, Hironori Niki, Hiroyuki Yano, Kinji Umehara, Chikahiko Suzuki, Tsuyoshi Koide","doi":"10.1093/dnares/dsaf031","journal":{"name":"DNA Research"},"PeriodicalIF":2.9,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12666387/pdf/"}
Martin Abraham Puthumana, Manohar S Bisht, Mitali Singh, Vineet K Sharma
The Red-vented Bulbul (Pycnonotus cafer) of the Pycnonotidae family is one of the most invasive tropical passerine bird species. To explore the genomic basis of its invasiveness, we sequenced the genome and transcriptome of P. cafer and assembled a 1.03 Gb genome with an N50 of 3.04 Mb, 97.2% BUSCO completeness, and 15,533 protein-coding genes. Our study constructed the mitogenome and 18S rRNA marker gene of P. cafer for the first time. Further, we investigated the demographic history of the species and identified recent genetic bottlenecks that it experienced. We established the phylogenetic position of P. cafer and examined gene family evolution along with orthologous gene clustering to provide clues to its invasive characteristics. Our study thus serves as a significant resource for future studies in invasion genomics and for the possible management of this bird species in alien ranges.
{"title":"Genome assembly and insights into globally invasive Red-vented Bulbul (Pycnonotus cafer).","authors":"Martin Abraham Puthumana, Manohar S Bisht, Mitali Singh, Vineet K Sharma","doi":"10.1093/dnares/dsaf027","journal":{"name":"DNA Research"},"PeriodicalIF":2.9,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12666379/pdf/"}
Root exudates shape root-associated microbial communities that differ from those in soil. Notably, specific microorganisms colonize the root surface (rhizoplane) and strongly associate with plants. Although retrieving microbial genomes from soil and root-associated environments remains challenging, single amplified genomes (SAGs) and metagenome-assembled genomes (MAGs) are essential for studying these microbiomes. This study compared SAGs and MAGs constructed from short-read metagenomes of the same soil samples to clarify their advantages and limitations in soil and root-associated microbiomes, and to deepen insights into microbial dynamics in the rhizoplane. We demonstrated that SAGs are better suited than MAGs for expanding the microbial tree of life in soil and rhizoplane environments, due to their greater gene content, broader taxonomic coverage, and higher sequence resolution of quality genomes. Metagenomic analysis provided sufficient coverage in the rhizoplane but was limited in soil. Additionally, integrating SAGs with metagenomic reads enabled strain-level analysis of microbial dynamics in the rhizoplane. Furthermore, SAGs provided insights into plasmid-host associations and dynamics, which MAGs failed to capture. Our study highlights the effectiveness of single-cell genomics in expanding microbial genome catalogues in soil and rhizosphere environments. Integrating high-resolution SAGs with comprehensive rhizoplane metagenomes offers a robust approach to elucidating microbial dynamics around plant roots.
{"title":"Strain-level dissection of complex rhizoplane and soil bacterial communities using single-cell genomics and metagenomics.","authors":"Masako Kifushi, Yohei Nishikawa, Masahito Hosokawa, Toyoaki Anai, Haruko Takeyama","doi":"10.1093/dnares/dsaf032","journal":{"name":"DNA Research"},"PeriodicalIF":2.9,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12666376/pdf/"}
Oil palm (Elaeis guineensis Jacq.) is a globally important crop, and its genetic improvements benefit from comprehensive genome sequencing. Here, we report the whole-genome sequencing and annotation of two key genetic resources: the wild (Eg-DCM) and ancestral (Eg-DBG) Dura accessions, using a combination of short- and long-read sequencing technologies. De novo assembly followed by polishing, proximity ligation, and reference-guided scaffolding yielded high-quality assemblies with ungapped lengths of 1.71 Gb (Eg-DBG) and 1.48 Gb (Eg-DCM). Eg-DCM and Eg-DBG genomes exhibited high completeness, with over 97% of Benchmarking Universal Single-Copy Orthologs (BUSCOs) recovered across the Eukaryota, Viridiplantae, and Embryophyta datasets. Repetitive elements, particularly retrotransposons, dominated both genomes, accounting for 46.10% of Eg-DBG and 43.85% of Eg-DCM. Gene prediction initially identified 61,256 (Eg-DBG) and 53,985 (Eg-DCM) genes, which were refined into high-confidence gene sets of 39,263 and 35,298, respectively. Additionally, 1,760 and 1,684 putative resistance (R) genes were identified in Eg-DCM and Eg-DBG, with similar class distributions. The five major R gene classes comprise KIN, RLK, RLP, CNL, and CK. With further research, the assembled whole-genome sequences and the annotated genes of Eg-DBG and Eg-DCM offer valuable insights into the untapped genomic information of undomesticated accessions, with implications for future breeding and crop improvement efforts of oil palm.
{"title":"Whole-genome sequencing of wild and ancestral Dura provides insight into the untapped genomic information of undomesticated oil palm (Elaeis guineensis Jacq.).","authors":"Redi Aditama, Heri Adriwan Siregar, Zulfikar Achmad Tanjung, Diny Dinarti, Sintho Wahyuning Ardie, Willy Bayuardi Suwarno, Edy Suprianto, Condro Utomo, Tony Liwang, Sudarsono Sudarsono","doi":"10.1093/dnares/dsaf033","journal":{"name":"DNA Research"},"PeriodicalIF":2.9,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12730872/pdf/"}
Fei Ge, Yingwei Guo, Lei Xu, Wai Yee Low, Haoran Ma, Qian Li, Zezhao Wang, Bo Zhu, Lingyang Xu, Xue Gao, Lupei Zhang, Huijiang Gao, Junya Li, Yan Chen
Genomic research is currently undergoing a paradigm shift from reliance on a single reference sequence to the use of breed-specific genomes. Chinese indicine cattle (Bos taurus indicus), characterized by their notable tick resistance and heat tolerance, display more extensive genetic diversity than taurine cattle. Here, we generated a chromosome-level genome assembly of Chinese indicine cattle, achieving a contiguity N50 of 90.92 Mb and an overall size of 2.91 Gb, utilizing PacBio HiFi sequencing complemented by Hi-C sequencing technology. The assembly is characterized by near-complete chromosomes, telomeres, and fewer gaps. Utilizing this high-quality assembly, we explored the phylogenetic relationships and speciation time. The gene family and selection signature analyses identified candidate genes and biosynthetic pathways potentially contributing to disease immunity and thermotolerance in indicine cattle. Altogether, this study enriches the bovine pangenome repository and advances our understanding of the complex evolutionary patterns and distinctive adaptation traits of Chinese indicine cattle.
{"title":"The chromosome-level genome of Chinese indicine cattle breed provides insights into bovine adaptation and immunity.","authors":"Fei Ge, Yingwei Guo, Lei Xu, Wai Yee Low, Haoran Ma, Qian Li, Zezhao Wang, Bo Zhu, Lingyang Xu, Xue Gao, Lupei Zhang, Huijiang Gao, Junya Li, Yan Chen","doi":"10.1093/dnares/dsaf034","journal":{"name":"DNA Research"},"PeriodicalIF":2.9,"publicationDate":"2025-11-28","publicationTypes":"Journal Article"}
Understanding the molecular causes of complex diseases remains one of the most pressing challenges in biomedicine. Despite large-scale genome-wide association studies mapping thousands of risk loci, identifying which genetic variants truly drive disease remains difficult. Traditional statistical genetics has laid a strong foundation for variant discovery, but it often struggles to capture nonlinear interactions and cannot fully integrate the breadth of the interconnected multi-omics data. In recent years, deep learning approaches have shown promise in bridging these gaps: modelling high-order genetic interactions, uncovering latent biological structure, and enabling multi-layered data integration. However, most current deep learning models for genomics remain exploratory in nature, and issues such as susceptibility to overfitting, difficulties in interpretability, and the general lack of standardized evaluation frameworks have limited their widespread adoption for genomics research. In this review, we explore how traditional statistical and deep learning methods can be applied to uncover causal mechanisms in complex disease. We critically compare these two frameworks for their advantages and limitations in detecting genetic associations and prioritizing causal associations. Towards the end, we propose a future direction centred around hybrid models that blend the scalability of deep learning with the inferential power of statistical genetics. Our goal is to guide researchers in developing next-generation computational tools to uncover the molecular basis of complex diseases and accelerate the translation of genetic findings into effective treatments.
{"title":"Can classical statistics and deep learning converge on explainable, causally driven target discovery?","authors":"Liyin Chen","doi":"10.1093/dnares/dsaf024","journal":{"name":"DNA Research"},"PeriodicalIF":2.9,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12628793/pdf/"}
Tian-Wen Xiao, Xin-Feng Wang, Zheng-Feng Wang, Hai-Fei Yan
Sauvagesia rhodoleuca is an endangered species endemic to southern China. Due to human activities, only 6 fragmented populations remain in Guangdong and Guangxi. Despite considerable conservation efforts, its demographic history and evolution remain poorly understood, particularly from a genomic perspective. To address this, we assembled a chromosome-scale genome of S. rhodoleuca using Nanopore long-read sequencing, DNA short-read sequencing, RNA-seq, and Hi-C data. A total of 290.37 Mb of assembled sequences, accounting for 99.76% of the genome, were successfully anchored to 19 pseudo-chromosomes, achieving a BUSCO completeness of 98.40% and a long terminal repeat assembly index of 17.28. Genome annotation identified 26,758 protein-coding genes and 369 tRNA genes. Demographic analysis revealed a sharp decline in the effective population size of S. rhodoleuca beginning approximately 1 million years ago. Whole-genome duplication (WGD) analysis revealed that S. rhodoleuca experienced a whole-genome triplication (WGT) followed by a more recent WGD after diverging from the Rhizophoraceae. Genes retained from WGT and WGD events played key roles in the development and survival of S. rhodoleuca, as indicated by Gene Ontology analysis. The high-quality genome of S. rhodoleuca provides insights into its genomic characteristics and evolutionary history, offering a valuable resource for conservation and genetic management.
{"title":"Chromosome-scale genome assembly of Sauvagesia rhodoleuca (Ochnaceae) provides insights into its genome evolution and demographic history.","authors":"Tian-Wen Xiao, Xin-Feng Wang, Zheng-Feng Wang, Hai-Fei Yan","doi":"10.1093/dnares/dsaf022","journal":{"name":"DNA Research"},"PeriodicalIF":2.9,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12448743/pdf/"}
Zhi-Jian Zhou, Yang Xiao, Jie Fang, Yong-Xiu Yao, Chen-Hui Yang, Laurent Dacheux, Dong-Sheng Luo, Ye Qiu, Xing-Yi Ge
Bats (Chiroptera) are a taxonomic group of immense biological and ecological importance. They are primary reservoirs and carriers of various zoonotic viruses. Endogenous retroviruses (ERVs) originate from ancient retroviruses invading the host, and ERV-derived sequences can function as regulatory elements that influence gene expression and contribute to both physiological and pathological processes. However, ERVs and ERV-like elements (ERVLEs) carried by bats have not been fully characterized. In this study, we systematically explored the ERVs in 61 bat species and identified 10,352 bat-ERV and 5,884 bat-ERVLE sequences, which covered 3 major virus genera and included 7 groups related to human ERVs in the subfamily Orthoretrovirinae. In particular, a relatively intact endogenous deltaretrovirus sequence was identified in Molossus molossus. Additionally, 358 bat-ERVs and 33 bat-ERVLEs were identified as recombinants. The integration time of bat-ERVs was estimated to be concentrated in the last 10 to 40 million years, indicating their role in shaping the bat genome during the long-term co-evolution of virus and host. Furthermore, carnivorous bats tended to have more relatively complete and younger ERVs than herbivorous bats. According to bat transcriptomes, 1,385 bat-ERVs and 197 bat-ERVLEs had transcriptional potential in 20 different tissues of 25 bats, implying that bat-ERVs harboured actively expressed genes with potential functions. In summary, we comprehensively characterized bat-ERVs in terms of their evolution, types, and potential functions, providing foundational data and a new perspective for further research on bat-ERVs.
{"title":"Diversity, evolution, and transcription of endogenous retroviruses in Chiroptera genomes.","authors":"Zhi-Jian Zhou, Yang Xiao, Jie Fang, Yong-Xiu Yao, Chen-Hui Yang, Laurent Dacheux, Dong-Sheng Luo, Ye Qiu, Xing-Yi Ge","doi":"10.1093/dnares/dsaf021","DOIUrl":"10.1093/dnares/dsaf021","url":null,"PeriodicalId":51014,"journal":{"name":"DNA Research","volume":" ","pages":""},"PeriodicalIF":2.9,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12402889/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144977606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}