Pub Date : 2024-12-31 DOI: 10.1186/s12864-024-11178-1
Xia Zhou, Jilong Li, Lei Chen, Minjie Guo, Renmin Liang, Yinghua Pan
Background: Rice is one of the most important staple crops, and its genetic improvement plays a crucial role in agricultural production and food security. Although extensive research has used single nucleotide polymorphism (SNP) data to explore the genetic basis of important agronomic traits in rice improvement, reports on the role of other types of variation, such as insertions and deletions (INDELs), are still limited.
Results: In this study, we extracted INDELs from resequencing data of 148 improved rice varieties. We identified 938,585 INDELs and found that the number of variants decreased as variant length increased, with 89.0% of INDELs being 2-10 bp long. Chromosome 1 carried the most INDELs and chromosome 10 the fewest. INDELs were unevenly distributed across the genome, forming a total of 33 hotspot regions. In total, 47.0% of INDELs were located within 2 kb upstream or downstream of genes. Using phenotypic data for five agronomic traits (heading date, flag leaf length, flag leaf width, panicle number, and plant height) together with the INDEL data, we performed a genome-wide association study (GWAS) and identified 6,331 significant loci involving 157 cloned genes. Haplotype analysis of candidate genes revealed INDELs affecting important functional genes, such as OsMED25 and OsRRMh, which are related to heading date, and MOC2, which is related to plant height.
Conclusions: Our work analyzed the variation patterns of INDELs in rice improvement and identified INDELs associated with agronomic traits. These results will provide valuable genetic and material resources for the genetic improvement of rice.
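The length summary reported in the Results (the share of INDELs that are 2-10 bp long) can be illustrated with a minimal sketch. It assumes VCF-style REF/ALT allele pairs as input; the function names and the toy alleles are hypothetical and are not taken from the study's pipeline or data.

```python
from collections import Counter

def indel_length(ref: str, alt: str) -> int:
    """Signed length: positive for insertions, negative for deletions, 0 otherwise."""
    return len(alt) - len(ref)

def length_summary(variants):
    """Count INDELs by absolute length and return the fraction that is 2-10 bp.

    variants: iterable of (ref, alt) allele pairs in VCF style.
    Pairs of equal length (e.g. SNPs) are skipped.
    """
    sizes = Counter()
    for ref, alt in variants:
        n = abs(indel_length(ref, alt))
        if n > 0:
            sizes[n] += 1
    total = sum(sizes.values())
    in_range = sum(count for n, count in sizes.items() if 2 <= n <= 10)
    frac = in_range / total if total else 0.0
    return sizes, frac

# Toy alleles, invented for illustration only (not data from the paper).
toy = [("A", "ATG"), ("ACGT", "A"), ("A", "AT"), ("G", "C")]
sizes, frac = length_summary(toy)
```

On real data the same counting would run over every record of the INDEL call set; the per-length counts in `sizes` are what a length-distribution plot would be drawn from.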
Title: The genomic pattern of insertion/deletion variations during rice improvement. Journal: BMC Genomics 25(1):1263, published 2024-12-31. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11686897/pdf/
Pub Date : 2024-12-30 DOI: 10.1186/s12864-024-11115-2
Ocky K Radjasa, Ray Steven, Yosua Natanael, Husna Nugrahapraja, Septhy K Radjasa, Tati Kristianti, Maelita R Moeis, Joko P Trinugroho, Haekal B Suharya, Alfito O Rachmatsyah, Ari Dwijayanti, Mutiara R Putri, Charlie E de Fretes, Zen L Siallagan, Muhammad Fadli, Rafidha D A Opier, Jandinta D Farahyah, Viana Rahmawati, Meirifa Rizanti, Zalfa Humaira, Ary S Prihatmanto, Nugroho D Hananto, R Dwi Susanto, Agus Chahyadi, Elfahmi, Neil Priharto, Kamarisima, Fenny M Dwivany
Background: The marine environment boasts distinctive physical, chemical, and biological characteristics. While numerous studies have delved into the microbial ecology and biological potential of the marine environment, exploration of genetically encoded, deep-sea sourced secondary metabolites remains scarce. This study endeavors to investigate marine bioproducts derived from deep-sea water samples at a depth of 1,000 m in the Java Trench, Indonesia, utilizing both culture-dependent and whole-genome sequencing methods.
Results: Our efforts led to the successful isolation and cultivation of the bacterium Priestia flexa JT4 from the water samples, followed by comprehensive genome sequencing. The resulting high-quality draft genome, approximately 4 Mb in size, harbored 5,185 coding sequences (CDSs). Notably, 61.97% of these CDSs were inadequately characterized and may represent novel CDSs. This study is the first to identify an "open-type" (α < 1) pangenome within the genus Priestia. Moreover, our analysis uncovered eight biosynthetic gene clusters (BGCs) using the common genome mining pipeline antiSMASH. Two non-ribosomal peptide synthetase (NRPS) BGCs among these clusters showed the potential to generate novel biological compounds. Notably, the terpene BGC in P. flexa JT4 was confirmed to be able to produce lycopene, a compound in substantial industrial demand. The presence of lycopene in P. flexa JT4 cells was verified using ultra-performance liquid chromatography-tandem mass spectrometry (UPLC-MS/MS) in multiple reaction modes.
Conclusions: This study highlights the bioprospecting opportunity to explore novel bioproducts and lycopene compounds from P. flexa JT4. It marks the pioneering exploration of deep-sea bacterium bioprospecting in Indonesia, seeking to unveil novel bioproducts and lycopene compounds through a genome mining approach.
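The "open-type" (α < 1) criterion mentioned in the Results can be made concrete with a small sketch. It assumes that α is the exponent of the power-law fit commonly used in pangenome analyses, in which the number of new genes found when the N-th genome is added follows n = κ·N^(−α); the abstract does not state which tool or fit the authors used. The function name and the synthetic counts are hypothetical.

```python
import math

def fit_alpha(genomes_sampled, new_genes):
    """Least-squares fit of log(n) = log(kappa) - alpha * log(N).

    Returns (alpha, kappa). Under the assumed convention, alpha < 1
    indicates an open pangenome.
    """
    xs = [math.log(n) for n in genomes_sampled]
    ys = [math.log(g) for g in new_genes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    kappa = math.exp(mean_y - slope * mean_x)
    return -slope, kappa

# Synthetic counts generated from a known power law (alpha = 0.7, kappa = 500).
# They are illustrative only and are not results from the paper.
N = [2, 3, 4, 5, 6, 8, 10]
synthetic = [500.0 * n ** -0.7 for n in N]
alpha, kappa = fit_alpha(N, synthetic)
is_open = alpha < 1
```

With real data, the new-gene counts would come from repeated random orderings of the genomes in the pangenome, averaged per N before fitting.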
Title: From the depths of the Java Trench: genomic analysis of Priestia flexa JT4 reveals bioprospecting and lycopene production potential. Journal: BMC Genomics 25(1):1259, published 2024-12-30. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11687134/pdf/
Pub Date : 2024-12-30 DOI: 10.1186/s12864-024-11179-0
Xiaoyu Guo, Zhenming Wu, Shu Zhang, Jin Zhao
Background: Discontinuous transcription allows coronaviruses to efficiently replicate and transmit within host cells, enhancing their adaptability and survival. Assembling viral transcripts is crucial for virology research and the development of antiviral strategies. However, traditional transcript assembly methods primarily designed for variable alternative splicing events in eukaryotes are not suitable for the viral transcript assembly problem. The current algorithms designed for assembling viral transcripts often struggle with low accuracy in determining the transcript boundaries. There is an urgent need to develop a highly accurate viral transcript assembly algorithm.
Results: In this work, we propose Cov-trans, a reference-based transcript assembler specifically tailored to the discontinuous transcription of coronaviruses. Cov-trans first identifies canonical transcripts based on the discontinuous transcription mechanism, start and stop codons, and read alignment information. Subsequently, it formulates the assembly of non-canonical transcripts as a path extraction problem and uses mixed integer linear programming to recover these non-canonical transcripts.
Conclusion: Experimental results show that Cov-trans outperforms other assemblers in both accuracy and recall, with a notable strength in accurately identifying the boundaries of transcripts. Cov-trans is freely available at https://github.com/computer-Bioinfo/Cov-trans.git.
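To give a feel for what "path extraction" means in this setting, here is a minimal sketch that finds the best-supported path through a small acyclic junction graph. It deliberately swaps in a simple memoized search for the mixed integer linear programming that Cov-trans uses, so it is not the authors' method. The graph layout, node names, and read-support weights are all hypothetical.

```python
def heaviest_path(edges, source, sink):
    """Return (score, path) for the highest-weight path from source to sink.

    edges: dict mapping node -> list of (next_node, weight).
    The graph must be acyclic. Weights stand in for junction read support.
    Nodes that cannot reach the sink are ignored.
    """
    memo = {}

    def visit(node):
        if node == sink:
            return 0, [sink]
        if node in memo:
            return memo[node]
        best = (float("-inf"), [])
        for nxt, weight in edges.get(node, []):
            score, path = visit(nxt)
            # An empty path means nxt is a dead end that never reaches the sink.
            if path and score + weight > best[0]:
                best = (score + weight, [node] + path)
        memo[node] = best
        return best

    return visit(source)

# Toy graph: a leader segment joined to one of two body segments.
toy_graph = {
    "leader": [("body1", 40), ("body2", 5)],
    "body1": [("end", 38)],
    "body2": [("end", 6)],
}
score, path = heaviest_path(toy_graph, "leader", "end")
```

A single heaviest path recovers only one transcript; an optimization formulation such as the one described in the abstract can select several paths at once under shared constraints, which this sketch does not attempt.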
Title: Cov-trans: an efficient algorithm for discontinuous transcript assembly in coronaviruses. Journal: BMC Genomics 25(1):1257, published 2024-12-30. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11684287/pdf/
Pub Date : 2024-12-30 DOI: 10.1186/s12864-024-11176-3
Fuyuan Qiu, Qingbo Ai, Jun Li, Hua Wu
Background: Nuptial pads, a typical sexually dimorphic trait in anurans, are located on the first digit of the male forelimb in Rana chensinensis and exhibit morphological changes synchronized with breeding cycles. However, the genetic mechanisms underlying their formation and seasonal changes remain poorly understood.
Results: To identify genes and biological processes associated with the development and seasonal variation of nuptial pads, we conducted a comprehensive transcriptome analysis of nuptial pads and hind toe skin from both sexes at different breeding periods in R. chensinensis. We identified numerous genes in nuptial pads that were differentially expressed between sexes and across seasons. Notably, genes including KRT, TRY, HPDB, AKR1C1, and AKR1C3 were identified as potential key regulators of keratinization and color variation in nuptial pads. We further examined gene co-expression modules closely linked to nuptial pad development. These modules contained genes involved in signal transduction, substance transport, cytoskeletal structure, energy metabolism, and protein modification, suggesting that the development of nuptial pads is a complex, multifaceted regulatory process. Furthermore, genes in modules associated with pad development during the breeding season were primarily involved in apoptosis, steroid hormone synthesis, autophagy, and cytochrome P450 pathways, suggesting a pivotal role for these pathways in pad formation. Additionally, key regulators of the cell cycle, such as FOXO4, PIK3C2A, and GSPT2, were implicated in nuptial pad development through their modulation of cell differentiation and proliferation.
Conclusions: Our study provides a valuable reference for investigating the molecular basis of sexual dimorphism in R. chensinensis and other amphibian species more broadly.
Title: Transcriptome analysis reveals the genetic basis underlying the formation and seasonal changes of nuptial pads in Rana chensinensis. Journal: BMC Genomics 25(1):1254, published 2024-12-30. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11684044/pdf/
Pub Date : 2024-12-30 DOI: 10.1186/s12864-024-11170-9
Haixia Shi, Zunqiang Yan, Hong Du, Bo Zhang, Shuangbao Gun
Background: The Hezuo (HZ) pig, a famous indigenous breed in China, is characterized by precocious puberty compared with foreign-introduced pig breeds. Sexual maturation is a complex physiological process, and in recent years, circular RNAs (circRNAs), a new class of noncoding RNAs with endogenous regulatory functions, have been shown to play important roles in regulating sexual maturation. However, the dynamic expression and regulatory mechanisms of circRNAs during sexual maturation in HZ pigs remain unclear. In this study, we performed RNA sequencing and bioinformatics analysis to reveal circRNA expression patterns in the testes of HZ boars at 30 days (sexual immaturity; Ha) and 120 days (sexual maturity; Hb), with Landrace (LC) boars of the same ages (La and Lb) as controls. Subsequently, an abundant circRNA, circ_005678 (circPAN3), transcribed from the PAN3 gene, was functionally investigated by RT-qPCR, Western blot, CCK-8, and flow cytometry.
Results: We identified 31,134 circRNAs in 12 samples, of which 2,562, 2,401, 749, and 831 were differentially expressed (DE) in the Ha-vs-Hb, La-vs-Lb, Ha-vs-La, and Hb-vs-Lb comparisons, respectively. Functional enrichment analyses indicated that the source genes of the DE circRNAs were involved mainly in testicular development and spermatogenesis. Furthermore, we constructed a circRNA-miRNA-mRNA interaction network and functionally analyzed the target genes. GO functional annotation of the target genes suggested that they were mainly involved in biological processes such as gland development, cell proliferation, and reproduction. KEGG pathway analysis further revealed that these genes were enriched mainly in signaling pathways involved in testicular development and spermatogenesis, including the PI3K-Akt and MAPK signaling pathways. Cellular assays revealed that circPAN3 promoted proliferation and inhibited apoptosis in immature Sertoli cells, whereas circPAN3 knockdown produced the opposite changes.
Conclusions: This study revealed the dynamic expression profiles and regulatory mechanisms of circRNAs during sexual maturation in HZ pigs. Further functional studies demonstrated that circPAN3 promoted the proliferation and inhibited the apoptosis of immature Sertoli cells, suggesting that circPAN3 may be closely related to the characteristics of precocious puberty in HZ boars. These findings provide a new perspective for exploring the regulatory mechanism of circRNAs in precocious puberty in HZ pigs.
Title: CircRNA profiling reveals the regulatory role of circPAN3 in Hezuo boars Sertoli cell growth. Journal: BMC Genomics 25(1):1258, published 2024-12-30. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11686915/pdf/
Background: Myogenic factor 6 (Myf6) plays an important role in muscle growth and differentiation. In aquatic animals and livestock, Myf6 contributes to improving meat quality and strengthening the accumulation of muscle flavor substances. However, studies on Myf6 gene polymorphisms in crustaceans have not been reported.
Results: In the current study, we characterized the Myf6 gene of Portunus trituberculatus to better understand its biological function. The full-length cDNA of Myf6 was 4,101 bp, with a 915 bp open reading frame encoding 304 amino acids. In addition, Myf6 contained a conserved bHLH domain. Homology analysis showed that Myf6 shared the highest identity with that of Penaeus vannamei. Expression pattern analysis of Myf6 in fast- and slow-growing groups revealed that expression was significantly higher in the slow-growing group than in the fast-growing group (P < 0.05). qPCR revealed that Myf6 was expressed in various tissues, with the highest level in muscle. Nineteen single nucleotide polymorphisms (SNPs) of Myf6 were identified, and five of them were significantly associated with growth-related traits of P. trituberculatus (P < 0.05), including full carapace width, carapace length, body height, and body weight. Individuals with the AG and GG genotypes of g.1187834 A>G showed growth-related traits superior to those of the AA genotype. For the combined genotypes of g.1187324 C>T and g.1187834 A>G, the average body weight of diplotype D5 (CT-GG) was higher than that of diplotypes D1 (CC-AA), D2 (CC-AG), and D3 (CC-GG) in a cultivated population. A haploblock was formed by three significant SNPs (g.1187834 A>G, g.1188616 A>G, and g.1189024 C>A) and contained four haplotypes (AAA, AAC, AGC, and GGC), among which the GGC haplotype showed growth traits (full carapace width and body weight) superior to those of the AAA haplotype.
Conclusions: To our knowledge, this is the first report on Myf6 in crustaceans. The results of this study contribute to elucidating the multiple functions of the Myf6 gene in crustaceans and to exploring its potential as a candidate gene in selective breeding programs for P. trituberculatus.
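The relationship between the reported open reading frame length (915 bp) and protein length (304 amino acids) follows from simple codon arithmetic, sketched below. The sketch assumes the 915 bp figure includes the stop codon, which is the usual convention but is not stated in the abstract; the function name is hypothetical.

```python
def protein_length(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of the given length in bp.

    Each codon is 3 bp; the stop codon, if counted in the ORF length,
    does not encode an amino acid.
    """
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons

# 915 bp -> 305 codons -> 304 amino acids plus one stop codon.
myf6_aa = protein_length(915)
```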
Title: Characterization of Myf6 and association with growth traits in swimming crab (Portunus trituberculatus). Authors: Baohua Duan, Weibiao Liu, Chen Zhang, Tongxu Kang, Haifu Wan, Shumei Mu, Yueqiang Guan, Zejian Li, Yang Tian, Yuqin Ren, Xianjiang Kang. DOI: 10.1186/s12864-024-11181-6. Journal: BMC Genomics 25(1):1256, published 2024-12-30. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11684091/pdf/
Background: Previous genome-wide association studies (GWAS) have established associations between genetic variants and pulmonary function across various ethnic groups, whereas such associations have rarely been reported in Chinese adults. Therefore, we conducted a GWAS to explore relationships between genetic variants and pulmonary function among middle-aged Chinese dizygotic twins and further validated the top variants using data from the UK Biobank (UKB).
Methods: In the discovery phase, 139 dizygotic twin pairs were drawn from the Qingdao Twin Registry. Pulmonary function was assessed using three parameters: forced expiratory volume in the first second (FEV1), forced vital capacity (FVC), and the FEV1/FVC ratio. GWAS was performed using GEMMA, gene-based analysis was conducted with VEGAS2, and pathway enrichment analysis was performed using PASCAL. In the validation phase, single-nucleotide polymorphisms (SNPs) with suggestive significance were examined by linear regression under an additive effect model among 1,573 participants of Chinese ethnicity from the UKB.
Results: The median age of the twin pairs was 49 years. Three SNPs (rs80345886, rs117883876, and 75139439) related to FEV1 reached genome-wide significance. Moreover, 222, 150, and 73 SNPs exceeded the suggestive significance level (p < 1 × 10⁻⁵) for FEV1, FVC, and FEV1/FVC, respectively. Among them, 16 SNPs were located in TBC1D16 for FEV1, 25 in GPR126 for FVC, and 2 in CCDC110 for FEV1/FVC; these three genes were also identified by gene-based analysis. Moreover, 12 novel SNPs related to pulmonary function reached the nominal significance level (p < 0.05) in the UKB validation, with some located in the TBC1D16, TAFA5, and MTHFD1L genes.
Conclusion: Our GWAS results in Chinese dizygotic twins provide new evidence on the genetic regulation of pulmonary function. Twelve novel susceptibility loci may be important for pulmonary function.
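The validation step described in the Methods (linear regression under an additive effect model) can be sketched as follows. In an additive model, each genotype is coded as the number of copies of the effect allele (0, 1, or 2), and the regression slope estimates the per-allele change in the trait. This is a minimal single-SNP sketch without covariates or p-values; the abstract does not list the covariates used, and the function name and toy values are hypothetical.

```python
def additive_effect(dosages, phenotypes):
    """Ordinary least squares of phenotype on allele dosage (0, 1, or 2).

    Returns (beta, intercept), where beta is the estimated per-allele effect.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("genotype is monomorphic; effect cannot be estimated")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    return beta, intercept

# Toy dosages and FEV1 values in litres, invented for illustration only.
dosage = [0, 0, 1, 1, 2, 2]
fev1 = [3.0, 3.2, 2.8, 3.0, 2.6, 2.8]
beta, intercept = additive_effect(dosage, fev1)
```

In this toy example each additional copy of the effect allele lowers FEV1 by about 0.2 L. A real analysis would add covariates to the model and test whether beta differs from zero before comparing the p-value with the thresholds quoted in the Results.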
Tong Wang, Weijing Wang, Chunsheng Xu, Xiaocao Tian, Dongfeng Zhang. Genome-wide analysis in northern Chinese twins identifies twelve new susceptibility loci for pulmonary function. BMC Genomics 25(1):1255. Published 2024-12-30. DOI: 10.1186/s12864-024-11165-6. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11684132/pdf/
Pub Date: 2024-12-28; DOI: 10.1186/s12864-024-11168-3
Li Tan, Li Mengshan, Fu Yu, Li Yelin, Zhu Jihong, Guan Lixin
Long non-coding RNAs (lncRNAs) play crucial roles in numerous biological processes and are involved in complex human diseases through interactions with proteins. Accurate identification of lncRNA-protein interactions (LPI) can help elucidate the functional mechanisms of lncRNAs and provide scientific insights into the molecular mechanisms underlying related diseases. While many sequence-based methods have been developed to predict LPIs, efficiently extracting and effectively integrating potential feature information that reflects functional attributes from lncRNA and protein sequences remains a significant challenge. This paper proposes a Dinucleotide-Codon Fusion Feature encoding (DNCFF) and constructs an LPI prediction model based on deep learning, termed LPI-DNCFF. The Dual Nucleotide Visual Fusion Feature encoding (DNVFF) incorporates positional information of single nucleotides with subsequent nucleotide connections, while Codon Fusion Feature encoding (CFF) considers the specificity, molecular weight, and physicochemical properties of each amino acid. These encoding methods encapsulate rich and intuitive sequence information in limited encoding dimensions. The model comprehensively predicts LPIs by integrating global, local, and structural features, and inputs them into BiLSTM and attention layers to form a hybrid deep learning model. Experimental results demonstrate that LPI-DNCFF effectively predicts LPIs. The BiLSTM layer and attention mechanism can learn long-term dependencies and identify weighted key features, enhancing model performance. Compared to one-hot encoding, DNCFF more efficiently and thoroughly extracts potential sequence features. Compared to other existing methods, LPI-DNCFF achieved the best performance on the RPI1847 and ATH948 datasets, with MCC values of approximately 97.84% and 84.58%, respectively, outperforming the state-of-the-art method by about 1.44% and 3.48%.
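The performance figures above are reported as Matthews correlation coefficient (MCC) values. For readers unfamiliar with the metric, here is a minimal sketch of how MCC is computed from a binary confusion matrix. The function name and the example counts are hypothetical and are not taken from the paper. MCC ranges from -1 to 1, so a value quoted as 97.84% presumably corresponds to 0.9784 on that scale.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (the usual convention
    for an undefined denominator).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Illustrative counts only; not results from any dataset in the paper.
    print(matthews_corrcoef(tp=90, tn=85, fp=15, fn=10))
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, which is one reason it is often preferred when interacting and non-interacting pairs are unevenly represented.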
Predicting lncRNA-protein interactions using a hybrid deep learning model with dinucleotide-codon fusion feature encoding. BMC Genomics 25(1):1253. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11682639/pdf/
Pub Date: 2024-12-27; DOI: 10.1186/s12864-024-11167-4
Isabel Barranco, Carmen Almiñana, Ana Parra, Pablo Martínez-Diaz, Xiomara Lucas, Stefan Bauersachs, Jordi Roca
Background: Extracellular vesicles (EVs) are essential for cell-to-cell communication because they transport functionally active molecules, including proteins, RNA, and lipids, from secretory cells to nearby or distant target cells. Seminal plasma contains a large number of EVs (sEVs) that are phenotypically heterogeneous. The aim of the present study was to identify the RNA species contained in two subsets of porcine sEVs of different sizes, namely small sEVs (S-sEVs) and large sEVs (L-sEVs). The two subsets of sEVs were isolated from 54 seminal plasma samples by a method combining serial centrifugations, size exclusion chromatography, and ultrafiltration. The sEVs were characterized using an orthogonal approach. Analysis of RNA content and quantification were performed using RNA-seq analysis.
Results: The two subsets of sEVs had different size distributions (P < 0.001). They also showed differences in concentration, morphology, and specific protein markers (P < 0.05). A total of 735 RNAs were identified and quantified, which included: (1) mRNAs, rRNAs, snoRNAs, snRNAs, tRNAs, and other ncRNAs (termed "all RNAs"), (2) miRNAs, and (3) piRNAs. The distribution pattern of these RNA classes differed between S-sEVs and L-sEVs (P < 0.05). More than half of the "all RNAs", miRNAs, and piRNAs were found to be differentially abundant between S- and L-sEVs (FDR < 0.1%). Among the differentially abundant RNAs, "all RNAs" were more abundant in L- than in S-sEVs, whereas most of the miRNAs were more abundant in S- than in L-sEVs. Differentially abundant piRNAs were equally distributed between S- and L-sEVs. Some of the "all RNAs" and miRNAs found to be differentially abundant between S- and L-sEVs were associated with sperm quality and functionality and with male fertility success.
Conclusions: Small and large sEVs isolated from porcine seminal plasma show quantitative differences in RNA content. These differences suggest that each sEV subtype exerts different functional activities in the targeted cells, namely spermatozoa and functional cells of the female reproductive tract.
RNA profiles differ between small and large extracellular vesicle subsets isolated from porcine seminal plasma. BMC Genomics 25(1):1250. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11673705/pdf/
Pub Date: 2024-12-27; DOI: 10.1186/s12864-024-11173-6
Yu Deng, Jianhua Jia, Mengyue Yi
Background: The subcellular localization of mRNA plays a crucial role in gene expression regulation and various cellular processes. However, existing wet lab techniques like RNA-FISH are usually time-consuming, labor-intensive, and limited to specific tissue types. To address this, researchers have developed several computational methods to predict mRNA subcellular localization. These methods, however, face the problem of class imbalance in multi-label classification, which causes models to favor majority classes and overlook minority classes during training. Additionally, traditional feature extraction methods incur high computational costs, yield incomplete features, and may lose critical information. Deep learning methods, on the other hand, face challenges related to hardware performance and training time when handling complex sequences, and may suffer from the curse of dimensionality and overfitting. Therefore, there is an urgent need for more efficient and accurate prediction models.
Results: To address these issues, we propose a multi-label classifier, EDCLoc, for predicting mRNA subcellular localization. EDCLoc reduces training pressure through a stepwise pooling strategy and applies grouped convolution blocks of varying sizes at different levels, combined with residual connections, to achieve efficient feature extraction and gradient propagation. The model employs global max pooling at the end to further reduce feature dimensions and highlight key features. To tackle class imbalance, we improved the focal loss function to enhance the model's focus on minority classes. Evaluation results show that EDCLoc outperforms existing methods in most subcellular regions. Additionally, the position weight matrix extracted by multi-scale CNN filters can match known RNA-binding protein motifs, demonstrating EDCLoc's effectiveness in capturing key sequence features.
Conclusions: EDCLoc outperforms existing prediction tools in most subcellular regions and effectively mitigates class imbalance issues in multi-label classification. These advantages make EDCLoc a reliable choice for multi-label mRNA subcellular localization. The dataset and source code used in this study are available at https://github.com/DellCode233/EDCLoc.
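The abstract above says the authors improved the focal loss to counter class imbalance but does not give the modified formula. As background only, the sketch below implements the standard binary focal loss applied independently to each label, which is the usual starting point for multi-label problems. The function names and the default values `gamma=2.0` and `alpha=0.25` are assumptions for illustration, and this is not EDCLoc's improved variant.

```python
import math


def binary_focal_loss(prob, label, gamma=2.0, alpha=0.25, eps=1e-12):
    """Standard focal loss for one label: -alpha_t * (1 - p_t)^gamma * log(p_t).

    prob  : predicted probability that the label is present (0..1)
    label : ground truth, 1 if present, 0 otherwise
    """
    # p_t is the probability the model assigned to the true outcome.
    p_t = prob if label == 1 else 1.0 - prob
    alpha_t = alpha if label == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)  # avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def multilabel_focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Mean of per-label focal losses for one sample (hypothetical helper)."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    total = sum(binary_focal_loss(p, y, gamma, alpha) for p, y in zip(probs, labels))
    return total / len(probs)


if __name__ == "__main__":
    # One mRNA with four made-up localization labels.
    predicted = [0.9, 0.2, 0.6, 0.05]
    truth = [1, 0, 1, 0]
    print(multilabel_focal_loss(predicted, truth))
```

The factor (1 - p_t)^gamma shrinks the contribution of examples the model already classifies confidently, so training effort shifts toward hard examples, which tend to come from minority classes. With `gamma=0` the expression reduces to an alpha-weighted binary cross-entropy.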
EDCLoc: a prediction model for mRNA subcellular localization using improved focal loss to address multi-label class imbalance. BMC Genomics 25(1):1252. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11674359/pdf/