Pub Date: 2024-10-31; DOI: 10.1186/s12864-024-10931-w
Harpreet Kaur, Laura M Shannon, Deborah A Samac
Background: The concept of pangenomics and the importance of structural variants are gaining recognition within the plant genomics community. Advances in sequencing and computational technology have made it feasible to sequence the entire genomes of numerous individuals of a single species at a reasonable cost. Pangenomes have been constructed for many major diploid crops, including rice, maize, soybean, sorghum, pearl millet, peas, sunflower, grapes, and mustards. However, pangenomes for polyploid species remain relatively scarce and are available for only a few crops, including wheat, cotton, rapeseed, and potatoes.
Main body: In this review, we explore the various methods used in crop pangenome development and discuss the challenges and implications of these techniques based on insights from published pangenome studies. We offer a systematic guide and discuss the tools available for constructing a pangenome and conducting downstream analyses. Alfalfa, a highly heterozygous, cross-pollinated, autotetraploid forage crop, is used as an example to discuss the concerns and challenges posed by polyploid crop species. We conducted a comparative analysis of linear and graph-based methods by constructing an alfalfa graph pangenome from three publicly available genome assemblies. To illustrate the intricacies captured by pangenome graphs for a complex crop genome, we aligned five different gene sequences against the three graph-based pangenomes. The comparison of the three graph pangenome methods reveals notable differences in the genomic variation captured by each pipeline.
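Graph pangenomes like those compared here are commonly exchanged in GFA format, where S lines carry segment sequences and P lines list the oriented segments that each assembly follows through the graph. The sketch below is a minimal illustration, not any of the pipelines used in the study: it assumes a GFA1-style text with only S and P lines and no overlaps, and the toy graph (segments 1 to 4, paths hapA to hapC) is hypothetical. Spelling a path back into linear sequence is the basic operation behind extracting sequence for a region of interest.

```python
# Minimal sketch: recover each assembly's sequence from a toy GFA1-style graph.
# Assumptions: only S (segment) and P (path) lines, no overlaps; all data are hypothetical.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a nucleotide string."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_gfa(text: str):
    """Return (segments, paths): segment id -> sequence, path name -> [(id, strand)]."""
    segments, paths = {}, {}
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":
            segments[fields[1]] = fields[2]
        elif fields[0] == "P":
            # Steps look like "1+,2-,4+": a segment id followed by its orientation.
            paths[fields[1]] = [(step[:-1], step[-1]) for step in fields[2].split(",")]
    return segments, paths


def spell_path(segments, steps) -> str:
    """Concatenate segments along a path, reverse-complementing '-' steps."""
    return "".join(
        segments[sid] if strand == "+" else revcomp(segments[sid]) for sid, strand in steps
    )


# Toy graph: segments 2 and 3 form a bubble (a two-base variant) between segments 1 and 4.
TOY_GFA = "\n".join([
    "S\t1\tACGT",
    "S\t2\tGG",
    "S\t3\tTT",
    "S\t4\tCAT",
    "P\thapA\t1+,2+,4+\t*",
    "P\thapB\t1+,3+,4+\t*",
    "P\thapC\t1+,4-\t*",
])

if __name__ == "__main__":
    segs, paths = parse_gfa(TOY_GFA)
    for name, steps in paths.items():
        print(name, spell_path(segs, steps))
```

Here hapA and hapB differ only at the bubble, while hapC skips it and traverses segment 4 in reverse orientation, so its spelled sequence ends in the reverse complement of that segment.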
Conclusion: Pangenome resources are proving invaluable by offering insights into core and dispensable genes, novel gene discovery, and genome-wide patterns of variation. Developing user-friendly online portals for linear pangenome visualization has made these resources accessible to the broader scientific and breeding community. However, challenges remain with graph-based pangenomes, including compatibility with other tools, extraction of sequences for regions of interest, and visualization of the genetic variation captured in pangenome graphs. These issues necessitate further refinement of tools and pipelines to effectively address the complexities of polyploid, highly heterozygous, and cross-pollinated species.
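Core and dispensable genes are usually defined from a presence/absence matrix of gene families across the assemblies in a pangenome. The sketch below assumes gene families have already been clustered and uses one simple convention: core families occur in every genome, private families in exactly one, and dispensable families in more than one but not all. Conventions vary between studies (some count private genes as dispensable or use a soft-core threshold), and the genome and family names here are hypothetical.

```python
from collections import Counter


def classify_gene_families(presence):
    """Label gene families by how many genomes carry them.

    presence: dict mapping genome name -> set of gene-family IDs (hypothetical input).
    Returns dict: family ID -> "core" (in every genome), "private" (in exactly one),
    or "dispensable" (in more than one genome but not all).
    """
    n_genomes = len(presence)
    counts = Counter()
    for families in presence.values():
        counts.update(families)  # each family counted once per genome
    labels = {}
    for family, n in counts.items():
        if n == n_genomes:
            labels[family] = "core"
        elif n == 1:
            labels[family] = "private"
        else:
            labels[family] = "dispensable"
    return labels


# Hypothetical presence/absence data for three assemblies.
example = {
    "asm1": {"famA", "famB", "famC"},
    "asm2": {"famA", "famB", "famD"},
    "asm3": {"famA", "famC", "famE"},
}
print(sorted(classify_gene_families(example).items()))
```

With these toy sets, famA is core, famB and famC are dispensable, and famD and famE are private.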
Title: A stepwise guide for pangenome development in crop plants: an alfalfa (Medicago sativa) case study. BMC Genomics 25(1):1022. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11526573/pdf/
Background: Bacterial small regulatory RNAs (sRNAs) play a crucial role in cell metabolism and could serve as potential new drug targets in the treatment of pathogen-induced disease. However, experimental methods for identifying sRNAs still require a large investment of human and material resources.
Methods: In this study, we propose a novel sRNA prediction model, sRNAdeep, based on DistilBERT feature extraction and TextCNN. Bacterial sRNA and non-sRNA sequences were treated as sentences and fed into a composite deep learning model to evaluate classification performance.
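Treating sequences as sentences requires turning a nucleotide string into word-like tokens before a BERT-style encoder can process it. The abstract does not describe sRNAdeep's tokenizer, so the sketch below assumes a common convention for nucleotide language models: overlapping k-mers (k = 3 here, a hypothetical choice), with U mapped to T so RNA and DNA records are handled alike. Passing the result to an actual subword tokenizer and encoder is not shown.

```python
def kmer_sentence(seq: str, k: int = 3, stride: int = 1) -> str:
    """Split a nucleotide sequence into a space-separated 'sentence' of k-mers.

    Assumption: overlapping k-mers serve as 'words'; this is an illustrative
    convention, not sRNAdeep's documented preprocessing.
    """
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive integers")
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA input the same way
    if len(seq) < k:
        return ""
    return " ".join(seq[i:i + k] for i in range(0, len(seq) - k + 1, stride))


print(kmer_sentence("AUGGC"))            # ATG TGG GGC
print(kmer_sentence("AUGGC", stride=2))  # ATG GGC
```

A stride of 1 keeps every overlapping k-mer, which preserves local context at the cost of longer token sequences; larger strides shorten the input.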
Results: By filtering sRNAs from the BSRD database, we obtained a validation dataset comprising 2438 positive and 4730 negative samples. Benchmark experiments showed that sRNAdeep outperformed previous sRNA prediction tools across various evaluation metrics. Applying our tool to the Mycobacterium tuberculosis (MTB) genome, we identified 21 sRNAs within intergenic and intron regions. A set of 272 target genes regulated by these sRNAs was also identified in MTB. The proteins encoded by two genes (lysX and icd1) are implicated in drug response, with significant active sites related to the drug resistance mechanisms of MTB.
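With 2438 positive and 4730 negative samples, the dataset is imbalanced (about 34% positives), so accuracy alone can look respectable for a classifier that never predicts an sRNA. The sketch below computes standard confusion-matrix metrics, including the Matthews correlation coefficient (MCC), which is more informative than accuracy under class imbalance. The abstract does not list the exact metrics used in the benchmark, and the confusion-matrix counts in the example are hypothetical.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Common binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on true sRNAs
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on non-sRNAs
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0.0 by convention when undefined
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "mcc": mcc,
    }


# A classifier that labels every sequence "non-sRNA" on a 2438/4730 split:
# accuracy of about 0.66, but zero sensitivity and an MCC of 0.
baseline = binary_metrics(tp=0, fp=0, tn=4730, fn=2438)
print(round(baseline["accuracy"], 3), baseline["sensitivity"], baseline["mcc"])
```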
Conclusion: Our newly developed sRNAdeep can help researchers identify bacterial sRNAs more precisely and is freely available from https://github.com/pyajagod/sRNAdeep.git.
Title: sRNAdeep: a novel tool for bacterial sRNA prediction based on DistilBERT encoding mode and deep learning algorithms. Authors: Weiye Qian, Jiawei Sun, Tianyi Liu, Zhiyuan Yang, Stephen Kwok-Wing Tsui. Pub Date: 2024-10-31; DOI: 10.1186/s12864-024-10951-6. BMC Genomics 25(1):1021. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11526673/pdf/
Pub Date: 2024-10-31; DOI: 10.1186/s12864-024-10938-3
Tuo Yin, Rong Xu, Ling Zhu, Xiuyao Yang, Mengjie Zhang, Xulin Li, Yinqiang Zi, Ke Wen, Ke Zhao, Hanbing Cai, Xiaozhen Liu, Hanyao Zhang
Background: The phenylalanine ammonia-lyase (PAL) gene, a well-studied plant defense gene, is crucial for growth, development, and stress resistance. The PAL gene family has been studied in many plants. Citrus is among the most vital cash crops worldwide. However, the PAL gene family has not been comprehensively studied in most Citrus species, and the biological functions and specific underlying mechanisms are unclear.
Results: We identified 41 PAL genes from nine Citrus species and revealed different patterns of evolution among the PAL genes in different Citrus species. Gene duplication was found to be a vital mechanism for the expansion of the PAL gene family in citrus. In addition, there was a strong correlation between the ability of PAL genes to respond to stress and their evolutionary duration in citrus. PAL genes with shorter evolutionary times were involved in responses to a wider range of stresses, and these PAL genes with broad-spectrum resistance were all single-copy genes. By further integrating the lignin and flavonoid synthesis pathways in citrus, we observed that PAL genes contribute to the synthesis of lignin and flavonoids, which enhance the physical defense and ROS scavenging ability of citrus plants, thereby helping them withstand stress.
Conclusions: This study provides a comprehensive framework of the PAL gene family in citrus, and we propose a hypothetical model for the stress resistance mechanism in citrus. This study provides a foundation for further investigations into the biological functions of PAL genes in the growth, development, and response to various stresses in citrus.
Title: Comparative analysis of the PAL gene family in nine citruses provides new insights into the stress resistance mechanism of Citrus species. BMC Genomics 25(1):1020. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11526608/pdf/
Pub Date: 2024-10-30; DOI: 10.1186/s12864-024-10871-5
Li-Qiong Jiang, Bryan T Drew, Watchara Arthan, Guo-Ying Yu, Hong Wu, Yue Zhao, Hua Peng, Chun-Lei Xiang
Background: Arundinelleae is a small tribe within the Poaceae (grass family) with a wide distribution across Asia, the Americas, and Africa. Several species of Arundinelleae are used as natural forage, feed, and raw materials for paper. The tribe is taxonomically difficult because of a paucity of clear diagnostic morphological characters. Little genetic or genomic research has been conducted on this group, and as a result the phylogenetic relationships and species boundaries within Arundinelleae are poorly understood.
Results: We compared and analyzed 11 plastomes of Arundinelleae, of which seven plastomes were newly sequenced. The plastomes range from 139,629 base pairs (bp) (Garnotia tenella) to 140,943 bp (Arundinella barbinodis), with a standard four-part structure. The average GC content was 38.39%, but varied in different regions of the plastome. In all, 110 genes were annotated, comprising 76 protein-coding genes, 30 tRNA genes, and four rRNA genes. Furthermore, 539 simple sequence repeats, 519 long repeats, and 10 hyper-variable regions were identified from the 11 plastomes of Arundinelleae. A phylogenetic reconstruction of Panicoideae based on 98 plastomes demonstrated the monophyly of Arundinella and Garnotia, but the circumscription of Arundinelleae remains unresolved.
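The GC-content and simple sequence repeat (SSR) results above rest on straightforward sequence scans. The abstract does not state which tools or thresholds were used, so the sketch below is a simplified illustration under assumed settings: GC content is computed over unambiguous bases, and perfect SSRs with 1 to 6 bp motifs are found with a regular expression using hypothetical minimum repeat numbers. Dedicated SSR tools also handle compound repeats and the circular junction of a plastome, which this sketch ignores.

```python
import re

# Assumed minimum repeat numbers per motif length (illustrative only;
# the study's SSR-detection settings are not given in the abstract).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def is_primitive(unit: str) -> bool:
    """False if the motif is itself a repeat of a shorter motif (e.g. 'ATAT' or 'AA')."""
    n = len(unit)
    return not any(n % d == 0 and unit[:d] * (n // d) == unit for d in range(1, n))


def find_ssrs(seq: str, min_repeats=MIN_REPEATS):
    """Return (start, motif, copies) for perfect tandem repeats meeting the thresholds."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least (min_rep - 1) more copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):  # skip e.g. 'AA' runs already reported as 'A'
                hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return hits


toy = "GG" + "A" * 12 + "C" + "AT" * 6 + "GC"  # hypothetical fragment
print(round(gc_content(toy), 2))
print(find_ssrs(toy))
```

On the toy fragment this reports a 12-copy A run starting at position 2 and a 6-copy AT repeat starting at position 15.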
Conclusion: Complete chloroplast genome sequences can improve phylogenetic resolution relative to single-marker approaches, particularly within taxonomically challenging groups. All phylogenetic analyses strongly support the monophyly of Arundinella and Garnotia, respectively, but the monophyly of Arundinelleae was not well supported. The intergeneric phylogenetic relationships within Arundinelleae require clarification, and more data are needed to resolve generic boundaries and evaluate the monophyly of Arundinelleae. A comprehensive taxonomic revision of the tribe is necessary. In addition, the identified hyper-variable regions could serve as molecular markers for clarifying phylogenetic relationships and potentially as barcoding markers for species identification in the future.
Title: Comparative plastome analysis of Arundinelleae (Poaceae, Panicoideae), with implications for phylogenetic relationships and plastome evolution. BMC Genomics 25(1):1016. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11523875/pdf/
Pub Date: 2024-10-30; DOI: 10.1186/s12864-024-10915-w
Yingtian Guo, Chengyan Deng, Guizhi Feng, Dan Liu
Phytochrome-interacting factors (PIFs) are a subgroup of transcription factors within the basic helix-loop-helix (bHLH) family and play a crucial role in integrating various environmental signals to regulate plant growth and development. Despite the significance of PIFs in these processes, a comprehensive genome-wide analysis of PIFs in conifers has yet to be conducted. In this investigation, three PtPIF genes were identified in Chinese pine and classified into three subgroups; conserved-motif analysis indicated the presence of the APA/APB motif and the bHLH domain in the PtPIF1 and PtPIF3 proteins. Phylogenetic analysis revealed that the PtPIF1 and PtPIF3 proteins belong to the PIF7/8 and PIF3 groups, respectively, and are relatively conserved among gymnosperms. Additionally, a class of PIFs lacking the APA/APB motif was identified in conifers, suggesting that its function may differ from that of traditional PIFs. The cis-elements of the PtPIF genes were systematically examined, and analysis of PtPIF gene expression across various tissues and under different light, temperature, and plant hormone conditions demonstrated similar expression profiles for PtPIF1 and PtPIF3. Investigations into protein-protein interactions and co-expression networks suggested the involvement of PtPIFs and PtPHYA/Bs in circadian rhythms and hormone signal transduction. Further analysis of transcriptome data and experimental validation indicated an interaction between PtPIF3 and PtPHYB1, potentially linked to diurnal rhythms. Notably, the study revealed that PtPIF3 may be involved in gibberellic acid (GA) signaling through its interaction with PtDELLAs, suggesting a potential role for PtPIF3 in mediating both light and GA responses. Overall, this research provides a foundation for future studies investigating the functions of PIFs in conifer growth and development.
Title: Genome-wide analysis of phytochrome-interacting factor (PIF) families and their potential roles in light and gibberellin signaling in Chinese pine. BMC Genomics 25(1):1017. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11523891/pdf/
Pub Date: 2024-10-30; DOI: 10.1186/s12864-024-10937-4
Ridwaan Nazeer Milase, Johnson Lin, Nontobeko E Mvubu, Nokulunga Hlengwa
Bacillus tropicus is a recently identified subspecies of the Bacillus cereus group of bacteria that has been shown to possess genes associated with antimicrobial resistance (AMR) and has been identified as the causative agent of anthrax-like disease in Chinese soft-shelled turtles. In addition, B. tropicus has demonstrated great potential in the fields of bioremediation and bioconversion. This article describes the comparative genomics of Bacillus phage vB_Btc-RBClinn15 (referred to as RBClin15), which infects the recently identified B. tropicus AOA-CPS1. RBClin15 is a temperate phage with a putative parABS partitioning system and an arbitrium system, which are presumed to enable extrachromosomal genome maintenance and to regulate the lysis/lysogeny switch, respectively. The temperate phage RBClin15 had been sequenced but was erroneously deposited as a plasmid in the NCBI GenBank database. A BLASTn search of GenBank using the whole genome sequence of RBClin15 revealed seven other putative temperate phages that had also been deposited as plasmids. Comparative genomic analyses show that RBClin15 shares 87 to 92% average nucleotide identity (ANI) with these seven temperate phages. Altogether, RBClin15 and the seven putative temperate phages share common genome arrangements and < 29% protein homologs with the closest phages, including 0105phi7-2. Phylogenomic and proteome-based phylogenetic trees showed that RBClin15 and the seven temperate phages form a branch separate from the closest phage, 0105phi7-2. In addition, the intergenomic similarity between RBClin15 and its closely related phages ranged from 0.3 to 47.7%. Collectively, based on the phylogenetic and comparative genomic analyses, we propose three new species encompassing RBClin15 and the seven temperate phages, placed in the newly proposed genus Theosmithvirus under Caudoviricetes.
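Phage taxonomy commonly relies on genome-wide nucleotide similarity with fixed demarcation cut-offs; ICTV guidance for bacterial viruses uses roughly 95% for species and roughly 70% for genera. The abstract does not name the tools used, so the sketch below assumes a BLASTN-based formulation in which the identical aligned bases found in both search directions are divided by the combined genome length. Under these assumed cut-offs, the reported 87 to 92% ANI values fall below the species threshold, and the 0.3 to 47.7% similarities fall below the genus threshold, which is consistent with the proposed new species and genus. The input counts in the example are hypothetical.

```python
def intergenomic_similarity(id_ab: int, id_ba: int, len_a: int, len_b: int) -> float:
    """Percent similarity from identical aligned bases found in both search directions.

    id_ab: identical bases when genome A is aligned against genome B (id_ba: B against A).
    Assumed formula: 100 * (id_ab + id_ba) / (len_a + len_b).
    """
    if len_a <= 0 or len_b <= 0:
        raise ValueError("genome lengths must be positive")
    return 100.0 * (id_ab + id_ba) / (len_a + len_b)


def rank_call(similarity: float, species_cutoff: float = 95.0, genus_cutoff: float = 70.0) -> str:
    """Assign a relationship using assumed species/genus demarcation cut-offs (percent)."""
    if similarity >= species_cutoff:
        return "same species"
    if similarity >= genus_cutoff:
        return "same genus, different species"
    return "different genera"


# Hypothetical counts for two ~40 kb phage genomes.
sim = intergenomic_similarity(id_ab=20_000, id_ba=19_000, len_a=40_000, len_b=42_000)
print(round(sim, 1), rank_call(sim))
```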
Title: Reclassification of the first Bacillus tropicus phage calls for reclassification of other Bacillus temperate phages previously designated as plasmids. BMC Genomics 25(1):1018. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11526630/pdf/
Pub Date: 2024-10-30; DOI: 10.1186/s12864-024-10901-2
Zachary T Calamari, Andrew Song, Emily Cohen, Muspika Akter, Rishi Das Roy, Outi Hallikas, Mona M Christensen, Pengyang Li, Pauline Marangoni, Jukka Jernvall, Ophir D Klein
Background: Continuously growing teeth are an important innovation in mammalian evolution, yet genetic regulation of continuous growth by stem cells remains incompletely understood. Dental stem cells responsible for tooth crown growth are lost at the onset of tooth root formation. Genetic signaling that initiates this loss is difficult to study with the ever-growing incisor and rooted molars of mice, the most common mammalian dental model species, because signals for root formation overlap with signals that pattern tooth size and shape (i.e., cusp patterns). Bank and prairie voles (Cricetidae, Rodentia, Glires) have evolved rooted and unrooted molars while retaining similar size and shape, providing alternative models for studying roots.
Results: We assembled a de novo genome of Myodes glareolus, a vole with high-crowned, rooted molars, and performed genomic and transcriptomic analyses in a broad phylogenetic context of Glires (rodents and lagomorphs) to assess differential selection and evolution in tooth-forming genes. Bulk transcriptomic comparisons of embryonic molar development between bank voles and mice demonstrated overall conservation of gene expression levels, with species-specific differences corresponding to the accelerated and more extensive patterning of the vole molar. We leveraged convergent evolution of unrooted molars across the clade to examine changes that may underlie the evolution of unrooted molars. We identified 15 dental genes with changing synteny relationships and six dental genes undergoing positive selection across Glires, two of which, Dspp and Aqp1, were undergoing positive selection in species with unrooted molars. Decreased expression of both genes in prairie voles with unrooted molars compared to bank voles supports the presence of positive selection and may underlie differences in root formation.
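Positive selection in codon models is usually inferred when the ratio ω = dN/dS exceeds 1, by comparing a null model that constrains ω ≤ 1 with an alternative that allows a class of codons or branches with ω > 1. The abstract does not specify the test used; the sketch below shows a generic likelihood-ratio test under the assumption of one degree of freedom, with hypothetical log-likelihoods of the kind a codon-model program reports. Which null distribution applies (plain χ² with one degree of freedom, a 50:50 mixture used for some branch-site tests, or more degrees of freedom for other model pairs) depends on the models compared.

```python
import math


def codon_lrt(lnl_null: float, lnl_alt: float, mixture_null: bool = False):
    """Likelihood-ratio test for positive selection between nested codon models.

    lnl_null: log-likelihood of the model with omega (dN/dS) constrained to <= 1.
    lnl_alt:  log-likelihood of the model that allows a class with omega > 1.
    Returns (statistic, p_value) using chi-square with 1 degree of freedom;
    with mixture_null=True the p-value is halved (50:50 mixture of 0 and chi-square_1).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    if mixture_null and stat > 0:
        p_value /= 2.0
    return stat, p_value


# Hypothetical log-likelihoods for one gene.
stat, p = codon_lrt(lnl_null=-5234.7, lnl_alt=-5230.1)
print(round(stat, 2), round(p, 4))
```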
Conclusions: Our results support ongoing evolution of dental genes across Glires and identify candidate genes for mechanistic studies of root formation. Comparative research using the bank vole as a model species can reveal the complex evolutionary background of convergent evolution for ever-growing molars.
Title: Bank vole genomics links determinate and indeterminate growth of teeth. BMC Genomics 25(1): 1000. Published 2024-10-30. DOI: 10.1186/s12864-024-10901-2. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11523675/pdf/
Pub Date : 2024-10-30 DOI: 10.1186/s12864-024-10954-3
Jianan Sui, Jiazi Chen, Yuehui Chen, Naoki Iwamori, Jin Sun
The Golgi apparatus is a crucial component of the endomembrane system in eukaryotic cells, playing a central role in protein biosynthesis. Dysfunction of the Golgi apparatus has been linked to neurodegenerative diseases, so accurate identification of sub-Golgi protein types is essential for developing effective treatments for such diseases. Because experimental methods for identifying sub-Golgi protein types are expensive and time-consuming, various computational identification tools have been developed. However, most of these methods rely solely on features of neighboring residues in the protein sequence and neglect the protein's spatial structure. To identify sub-Golgi proteins more accurately, we developed a model called GASIDN. GASIDN extracts multi-dimensional features by applying a 1D convolution module to protein sequences and a graph learning module to contact maps constructed from AlphaFold2-predicted structures. Protein sequences are initially encoded with SeqVec, a deep representation learning model. GASIDN achieved accuracies of 98.4% in independent testing and 96.4% in ten-fold cross-validation, outperforming most previous predictors. To the best of our knowledge, this is the first method that uses multi-scale feature fusion to identify and locate sub-Golgi proteins. To assess the generalizability and scalability of the model, we also applied it to identifying proteins from other organelles, including plant vacuoles and peroxisomes. These experiments gave promising results, indicating the effectiveness and versatility of the model. The source code and datasets can be accessed at https://github.com/SJNNNN/GASIDN .
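The abstract describes a graph learning module applied to residue contact maps derived from AlphaFold2 structures. The sketch below illustrates only that general idea and is not GASIDN's code: it assumes a contact is any pair of C-alpha atoms within 8 Å (a common convention, not necessarily GASIDN's), and it replaces a trained graph layer with a single unweighted mean-aggregation step. The function names `contact_map` and `mean_aggregate` are hypothetical; the actual implementation is in the GASIDN repository.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_map(ca_coords: Sequence[Coord], threshold: float = 8.0) -> List[List[int]]:
    """Binary residue contact map: 1 if two C-alpha atoms are within `threshold` angstroms.

    The 8 angstrom cutoff is an assumed convention, not taken from the GASIDN paper.
    """
    n = len(ca_coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= threshold:
                cmap[i][j] = cmap[j][i] = 1  # contacts are symmetric
    return cmap


def mean_aggregate(features: List[List[float]], cmap: List[List[int]]) -> List[List[float]]:
    """One unweighted message-passing step: each residue averages its own and its contacts' features.

    A trained graph layer would also apply learned weights and a nonlinearity;
    both are omitted here to keep the sketch dependency-free.
    """
    out = []
    for i, row in enumerate(cmap):
        group = [i] + [j for j, contact in enumerate(row) if contact]
        dim = len(features[i])
        out.append([sum(features[j][k] for j in group) / len(group) for k in range(dim)])
    return out


# Toy example: four residues on a straight line, 3.8 angstroms apart (typical C-alpha spacing).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
cmap = contact_map(coords)
updated = mean_aggregate([[1.0], [0.0], [0.0], [0.0]], cmap)
print(cmap)     # [[0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 1], [0, 1, 1, 0]]
print(updated)  # [[0.3333333333333333], [0.25], [0.25], [0.0]]
```

Residues 0 and 3 are 11.4 Å apart and therefore not in contact, even though both touch residues 1 and 2; this is how a contact map encodes spatial neighborhoods that a purely sequence-local window would miss.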
Title: GASIDN: identification of sub-Golgi proteins with multi-scale feature fusion. BMC Genomics 25(1): 1019. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11526662/pdf/
Pub Date : 2024-10-29 DOI: 10.1186/s12864-024-10926-7
Reyhaneh Nouri, Vladimir Mashanov, April Harris, Gari New, William Taylor, Daniel Janies, Robert W Reid, Denis Jacob Machado
Collagenous connective tissue, found throughout the bodies of metazoans, plays a crucial role in maintaining structural integrity. This versatile tissue has potential for numerous biomedical applications, including the development of innovative collagen-based biomaterials. Inspiration for such advances can be drawn from echinoderms, a group of marine invertebrates that includes sea stars, sea cucumbers, brittle stars, sea urchins, and sea lilies. Through their nervous system, these organisms can reversibly control the pliability of their connective tissue components (i.e., tendons and ligaments), which are composed of mutable collagenous tissue (MCT). The variable tensile properties of MCT allow echinoderms to perform unique functions, including postural maintenance, reduction of muscular energy use, autotomy to escape predators, and asexual reproduction by fission. Changes in the tensile strength of MCT structures are controlled by specialized neurosecretory cells called juxtaligamental cells, which release substances that either soften or stiffen the MCT. So far, only a few of these substances have been purified and characterized, and the genetic basis of MCT biology remains unknown. We therefore conducted this study to identify MCT-related genes in echinoderms as a first step towards understanding the molecular mechanisms that control MCT. Our ultimate goal is to use this knowledge to enable new biomaterial applications. We used RNA-Seq to identify and annotate differentially expressed genes in the MCT structures of the brittle star Ophiomastix wendtii, and we present a list of 16 putative MCT modulator genes, which will be validated and characterized in forthcoming functional analyses.
Title: Unveiling putative modulators of mutable collagenous tissue in the brittle star Ophiomastix wendtii: an RNA-Seq analysis. BMC Genomics 25(1): 1013. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11520437/pdf/
Pub Date : 2024-10-29 DOI: 10.1186/s12864-024-10862-6
João Luís Reis-Cunha, Daniel Charlton Jeffares
Background: Trypanosomatid parasites are a group of protozoans that cause devastating diseases that disproportionately affect developing countries. These protozoans have evolved several mechanisms of adaptation to survival in the mammalian host, such as extensive expansion of multigene families involved in host-parasite interaction, adaptation to invade and modulate host cells, and the presence of aneuploidy and polyploidy. Two mechanisms can produce "complex" isolates, in which more than two haplotypes are present in a single sample: multiplicity of infection (MOI) and polyploidy. We have developed and validated a method to identify multiclonal infections and polyploidy from whole genome sequencing reads, based on fluctuations in allelic read depth at heterozygous positions. It can easily be applied in studies ranging from a single sequenced sample to large population surveys.
Results: The method estimates the complexity index (CI) of an isolate and compares real samples with simulated clonal infections at the individual and population levels, excluding regions with chromosome copy number (somy) or gene copy number variation. It was validated primarily with simulated MOI in Leishmania and with known polyploid isolates of Trypanosoma cruzi. The approach was then used to assess the complexity of infection using genome-wide SNP data from 497 trypanosomatid samples from four clades (L. donovani/L. infantum, L. braziliensis, T. cruzi, and T. brucei), providing an overview of multiclonal infection and polyploidy in these cultured parasites. We show that our method robustly detects complex infections in samples with at least 25x coverage and 100 heterozygous SNPs, and in which 5-10% of the reads correspond to the secondary clone. We find that only a small proportion (≤ 7%) of cultured trypanosomatid isolates are complex.
Conclusions: The method can accurately identify polyploid isolates and can identify multiclonal infections when genome read coverage is sufficient. We packaged the method as a single R script that requires only a standard variant call format (VCF) file to run ( https://github.com/jaumlrc/Complex-Infections ). Our analyses indicate that multiclonality and polyploidy occur in all clades, but infrequently in cultured trypanosomatids. We caution that our estimates are lower bounds because of the limitations of current laboratory and bioinformatic methods.
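The abstract describes detecting complex isolates from fluctuations in allelic read depth at heterozygous positions, compared against simulated clonal infections. The sketch below is a simplified illustration of that idea only; it is not the authors' complexity index (CI) or their R script. It assumes sites come from disomic regions (the authors exclude somy and copy-number variable regions), summarizes a sample by the mean deviation of the alternative-allele fraction from 0.5, and divides that by the same statistic averaged over binomial simulations of a clonal diploid at the same read depths. The 25x and 100-SNP defaults mirror the detection limits reported above; the function names and the ratio statistic are hypothetical.

```python
import random
from typing import List, Optional, Tuple

# (alt_read_count, total_read_depth) at one heterozygous SNP
Site = Tuple[int, int]


def mean_allele_deviation(sites: List[Site]) -> float:
    """Average distance of the alt-allele fraction from the 0.5 expected in a clonal diploid."""
    return sum(abs(alt / depth - 0.5) for alt, depth in sites) / len(sites)


def simulate_clonal(sites: List[Site], rng: random.Random) -> List[Site]:
    """Redraw each site's alt count as Binomial(depth, 0.5): a clonal diploid with the same depths."""
    return [(sum(rng.random() < 0.5 for _ in range(depth)), depth) for _, depth in sites]


def complexity_ratio(sites: List[Site], min_depth: int = 25, min_sites: int = 100,
                     n_sims: int = 20, seed: int = 1) -> Optional[float]:
    """Hypothetical statistic: observed deviation / mean deviation of simulated clonal samples.

    Values well above 1 suggest more than two haplotypes (polyploidy or a mixed infection).
    Returns None when too few well-covered heterozygous sites remain to judge.
    """
    kept = [(alt, depth) for alt, depth in sites if depth >= min_depth]
    if len(kept) < min_sites:
        return None
    rng = random.Random(seed)  # fixed seed keeps the simulated baseline reproducible
    simulated = [mean_allele_deviation(simulate_clonal(kept, rng)) for _ in range(n_sims)]
    expected = sum(simulated) / n_sims
    return mean_allele_deviation(kept) / expected


balanced = [(50, 100)] * 150  # every site at 50% alt reads, as in a clonal diploid
skewed = [(25, 100)] * 150    # alt allele on 1 of 4 haplotypes, as in a tetraploid
print(complexity_ratio(balanced))  # 0.0
print(complexity_ratio(skewed))    # well above 1
```

Comparing against simulations at the observed depths, rather than against a fixed cutoff, accounts for the fact that binomial sampling noise in allele fractions shrinks as coverage grows, which is one reason low-coverage samples are hard to classify.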
Title: Detecting complex infections in trypanosomatids using whole genome sequencing. BMC Genomics 25(1): 1011. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11520695/pdf/