Background: One of the most promising approaches for early and more precise disease prediction and diagnosis is the inclusion of proteomics data augmented with clinical data. Clinical proteomics data is often characterized by its high dimensionality and extremely limited sample size, posing a significant challenge when employing machine learning techniques for extracting only the most relevant information. Although there is a wide array of statistical techniques and numerous analysis pipelines employed in proteomics data analysis, it is unclear which of these methods produce the most efficient, reproducible, and clinically meaningful results.
Results: In this study, we compared 9 unique analysis schemes composed of different machine learning and dimensionality reduction methods for the analysis of simulated proteomics data consisting of 1317 proteins measured in 26 subjects (i.e., 13 controls and 13 cases). In scenarios where the sample size is extremely small (i.e., n < 30), all schemes resulted in exceptionally high performance metrics, indicating potential overfitting. While performance metrics did not exhibit significant differences across schemes, the sets of proteins selected as discriminatory between groups demonstrated a substantial level of heterogeneity. However, despite heterogeneity in the selected proteins, their associated biological pathways and genetic diseases exhibited similarities. A sensitivity analysis conducted using varying sample sizes indicated that, within a scheme, the stability of the set of selected biomarkers improves with larger sample sizes.
Conclusions: When the aim of the study is to identify a statistical model that best distinguishes between cohort groups using proteomics data and to uncover the biological pathways and disorders common among the selected proteins, the majority of widely used analysis pipelines perform similarly. However, if the main objective is to pinpoint a set of selected proteins that wield significant influence in discriminating cohort groups and utilize them for subsequent investigations, meticulous consideration is necessary when opting for statistical models, due to the possibility of heterogeneity in the sets of selected proteins.
Title: Addressing statistical challenges in the analysis of proteomics data with extremely small sample size: a simulation study. Authors: Kyung Hyun Lee, Shervin Assassi, Chandra Mohan, Claudia Pedroza. Journal: BMC Genomics 25(1):1086. Pub Date: 2024-11-14. DOI: 10.1186/s12864-024-11018-2. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11566501/pdf/
Pub Date: 2024-11-13 DOI: 10.1186/s12864-024-10991-y
Nicholas J Eagles, Svitlana V Bach, Madhavi Tippani, Prashanthi Ravichandran, Yufeng Du, Ryan A Miller, Thomas M Hyde, Stephanie C Page, Keri Martinowich, Leonardo Collado-Torres
Background: Visium is a widely used spatially-resolved transcriptomics assay available from 10x Genomics. Standard Visium capture areas (6.5 mm by 6.5 mm) limit the survey of larger tissue structures, but combining overlapping images and associated gene expression data allows for more complex study designs. Current software can handle nested or partial image overlaps, but is designed for merging up to two capture areas, and cannot account for some technical scenarios related to capture area alignment.
Results: We generated Visium data from a postmortem human tissue sample such that two capture areas were partially overlapping and a third one was adjacent. We developed the R/Bioconductor package visiumStitched, which facilitates stitching the images together with Fiji (ImageJ), and constructing SpatialExperiment R objects with the stitched images and gene expression data. visiumStitched constructs an artificial hexagonal array grid which allows seamless downstream analyses such as spatially-aware clustering without discarding data from overlapping spots. Data stitched with visiumStitched can then be interactively visualized with spatialLIBD.
Conclusions: visiumStitched provides a simple but flexible framework to handle various multi-capture area study design scenarios. Specifically, it resolves a data processing step without disrupting analysis workflows and without discarding data from overlapping spots. visiumStitched relies on affine transformations by Fiji, which have limitations and are less accurate when aligning against an atlas or in other situations. visiumStitched provides an easy-to-use solution that expands the possibilities for multi-capture area study designs.
Title: Integrating gene expression and imaging data across Visium capture areas with visiumStitched. Journal: BMC Genomics 25(1):1077. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11559125/pdf/
Pub Date: 2024-11-13 DOI: 10.1186/s12864-024-11021-7
Ana Claudia de Freitas, Henrique G Reolon, Natalya G Abduch, Fernando Baldi, Rafael M O Silva, Daniela Lourenco, Breno O Fragomeni, Claudia C P Paz, Nedenia B Stafuzza
Background: Heat stress has deleterious effects on physiological and performance traits in livestock. Within this context, using tropically adapted cattle breeds in pure herds or terminal crossbreeding schemes to explore heterosis is attractive for increasing animal production in warmer climate regions. This study aimed to identify biological processes, pathways, and potential biomarkers related to thermotolerance in Caracu, a tropically adapted beef cattle breed, by proteomic analysis of blood plasma. To achieve this goal, 61 bulls had their thermotolerance evaluated through a heat tolerance index. A subset of 14 extreme animals, including the seven most thermotolerant (HIGH group) and the seven least thermotolerant (LOW group), had their blood plasma samples used for proteomic analysis by liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS). The differentially regulated proteins detected between HIGH and LOW groups were used to perform functional enrichment analysis and a protein-protein interaction network analysis.
Results: A total of 217 proteins were detected only in the HIGH thermotolerant group and 51 only in the LOW thermotolerant group. In addition, 81 and 87 proteins had significantly higher and lower abundances in the HIGH group, respectively. Among the proteins with the highest absolute log-fold change values, we highlighted those encoded by the DUSP5, IGFALS, ROCK2, RTN4, IRAG1, and NNT genes based on their functions. The functional enrichment analysis detected several biological processes, molecular functions, and pathways related to cellular responses to stress, the immune system, the complement system, and hemostasis in both the HIGH and LOW groups, in addition to terms and pathways related to lipids and calcium only in the HIGH group. The protein-protein interaction (PPI) network analysis revealed as important nodes many proteins with roles in response to stress, hemostasis, the immune system, inflammation, and homeostasis. Additionally, the proteins with high absolute log-fold change values and the proteins detected as essential nodes by PPI analysis highlighted herein are potential biomarkers for thermotolerance, such as ADRA1A, APOA1, APOB, APOC3, C4BPA, CAT, CFB, CFH, CLU, CXADR, DNAJB1, DNAJC13, DUSP5, FGA, FGB, FGG, HBA, HBB, HP, HSPD1, IGFALS, IRAG1, KNG1, NNT, OSGIN1, PROC, PROS1, ROCK2, RTN4, RYR1, TGFB2, VLDLR, VTN, and VWF.
Conclusions: Identifying potential biomarkers, molecular mechanisms and pathways that act in response to heat stress in tropically adapted beef cattle contributes to developing strategies to improve performance and welfare traits in livestock under tropical climates.
Title: Proteomic identification of potential biomarkers for heat tolerance in Caracu beef cattle using high and low thermotolerant groups. Journal: BMC Genomics 25(1):1079. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11562314/pdf/
Background: Phloem protein 2 (PP2), a dimeric lectin, is known for its involvement in plant responses to biotic and abiotic stresses. However, research on PP2 proteins in Moso bamboo is lacking.
Results: In this study, a comprehensive genome-wide analysis of the PP2-like gene family was conducted in Moso bamboo (Phyllostachys edulis), which has significant economic and ecological value. Using HMMER3 search and InterPro domain analysis, 23 PP2-like genes (PhePP2-1 to PhePP2-23) were identified in the P. edulis genome. These genes were distributed across 12 chromosomal scaffolds, with proteins ranging from 216 to 556 amino acids in length. Phylogenetic analysis, including 163 PP2 proteins from eight plant species, revealed six distinct groups, with Group III and Group V being the largest. Gene structure and motif analyses indicated conserved domains across the PhePP2 proteins. In addition, cis-element analysis of the promoter regions highlighted their potential regulatory roles in hormone, stress, and light responses. Expression pattern analysis using RNA-seq data showed differential expression of PhePP2 genes under drought, salt, salicylic acid, and abscisic acid treatments, indicating their involvement in stress response pathways. Furthermore, qPCR validation in different tissues and organs of Moso bamboo confirmed the expression profiles of the selected PhePP2 genes.
Conclusions: This study provides a comprehensive understanding of the functional roles of PP2-like genes in Moso bamboo and insights into their potential applications in enhancing stress tolerance and growth in plants.
Title: PP2 gene family in Phyllostachys edulis: identification, characterization, and expression profiles. Authors: Liumeng Zheng, Huifang Zheng, Xianzhe Zheng, Yanling Duan, Xiaobo Yu. Journal: BMC Genomics 25(1):1081. Pub Date: 2024-11-13. DOI: 10.1186/s12864-024-11007-5. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11562636/pdf/
Pub Date: 2024-11-13 DOI: 10.1186/s12864-024-10997-6
Karthick Raja Arulprakasam, Janelle Wing Shan Toh, Herman Foo, Mani R Kumar, An-Nikol Kutevska, Emilia Emmanuelle Davey, Marek Mutwil, Guillaume Thibault
In the rapidly expanding domain of scientific research, tracking and synthesizing information from the growing volume of publications pose significant challenges. To address this, we introduce a novel high-throughput pipeline that employs ChatGPT to systematically extract and analyze connectivity information from the full texts and abstracts of 24,237 and 150,538 research publications concerning Caenorhabditis elegans and Drosophila melanogaster, respectively. This approach has identified 200,219 and 1,194,587 interactions within the C. elegans and Drosophila biomaps, respectively. Utilizing Cytoscape Web, we have developed searchable online biomaps that link relevant keywords to their corresponding PubMed IDs, thus providing seamless access to an extensive knowledge network encompassing C. elegans and Drosophila. Our work highlights the transformative potential of integrating artificial intelligence with bioinformatics to deepen our understanding of complex biological systems. By revealing the intricate web of relationships among key entities in C. elegans and Drosophila, we offer insights that promise to propel advancements in genetics, developmental biology, neuroscience, longevity, and beyond. We also provide details on and discuss significant nodes within both biomaps, including the insulin/IGF-1 signaling (IIS) and the Notch pathways. Our methodology sets a robust foundation for future research aimed at unravelling complex biological networks across diverse organisms. The two databases are available at worm.bio-map.com and drosophila.bio-map.com.
Title: Harnessing full-text publications for deep insights into C. elegans and Drosophila biomaps. Journal: BMC Genomics 25(1):1080. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11562368/pdf/
Pub Date: 2024-11-13 DOI: 10.1186/s12864-024-10977-w
Taylor Tushar, Thai Binh Pham, Kiona Parker, Marc Crepeau, Gregory C Lanzaro, Anthony A James, Rebeca Carballar-Lejarazú
Background: Novel technologies are needed to combat anopheline vectors of malaria parasites as the reductions in worldwide disease incidence have stalled in recent years. Gene drive-based approaches utilizing Cas9/guide RNA (gRNA) systems are being developed to suppress anopheline populations or modify them by increasing their refractoriness to the parasites. These systems rely on the successful cleavage of a chromosomal DNA target site followed by homology-directed repair (HDR) in germline cells to bias inheritance of the drive system. An optimal drive system should be highly efficient for HDR-mediated gene conversion with minimal error rates. A gene-drive system, AgNosCd-1, with these attributes has been developed in the Anopheles gambiae G3 strain and serves as a framework for further development of population modification strains. To validate AgNosCd-1 as a versatile platform, it must perform well in a variety of genetic backgrounds.
Results: We introduced or introgressed AgNosCd-1 into different genetic backgrounds, three in geographically diverse Anopheles gambiae strains, and one each in an An. coluzzii and An. arabiensis strain. The overall drive inheritance, determined by presence of a dominant marker gene in the F2 hybrids, far exceeded Mendelian inheritance ratios in all genetic backgrounds that produced viable progeny. Haldane's rule was confirmed for AgNosCd-1 introgression into the An. arabiensis Dongola strain, and sterility of the F1 hybrid males prevented production of F2 hybrid offspring. Back-crosses of F1 hybrid females were not performed, to keep the experimental design consistent across all the genetic backgrounds and to avoid maternally-generated mutant alleles that might confound the drive dynamics. DNA sequencing of the target site in F1 and F2 mosquitoes with exceptional phenotypes revealed drive system-generated mutations resulting from non-homologous end joining (NHEJ) events, which formed at rates similar to AgNosCd-1 in the G3 genetic background and were generated via the same maternal-effect mechanism.
Conclusions: These findings support the conclusion that the AgNosCd-1 drive system is robust and has high drive inheritance and gene conversion efficiency accompanied by low NHEJ mutation rates in diverse An. gambiae s.l. laboratory strains.
Title: Cas9/guide RNA-based gene-drive dynamics following introduction and introgression into diverse anopheline mosquito genetic backgrounds. Journal: BMC Genomics 25(1):1078. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11558816/pdf/
Mangiferin, a C-glucosyl xanthone, is a biologically active glycoside naturally synthesized in mango. Glycosyltransferases can catalyze the biosynthesis of mangiferin. In this study, we identified 221 members of the UGT glycosyltransferase family in mango. The 221 MiUGT genes were grouped into 13 subfamilies through phylogenetic tree analysis with Arabidopsis, Chinese bayberry, and mango. The UGT family members were unevenly distributed across 17 chromosomes, and tandem duplication dominated the expansion of the family. Purifying selection primarily influenced the evolution of the mango UGT family members. In addition, cis-element analysis of the mango UGT gene family revealed the presence of MYB binding sites, which are involved in flavonoid biosynthesis, further supporting the role of UGT family members in the synthesis of flavonoids. To verify these results, we analyzed the expression of UGT family members in mango leaves, stems, and fruit peel at different developmental stages. The RNA-seq and qRT-PCR results showed significant differences in the expression patterns of MiUGT genes across tissues and developmental stages. We also identified stage-specific expression of MiUGT genes during fruit development. These results lay a theoretical foundation for research on the relationship between members of the mango UGT family and the synthesis of flavonoids and mangiferin.
{"title":"Transcriptome and genome-wide analysis of the mango glycosyltransferase family involved in mangiferin biosynthesis.","authors":"Yibo Bai, Xinran Huang, Rundong Yao, Muhammad Mubashar Zafar, Waqas Shafqat Chattha, Fei Qiao, Hanqing Cong","doi":"10.1186/s12864-024-10998-5","DOIUrl":"10.1186/s12864-024-10998-5","url":null,"abstract":"<p><p>Mangiferin, a C-glucosyl xanthone, is a biologically active glycoside naturally synthesized in mango. Glycosyltransferase can catalyze the biosynthesis of mangiferin. In this study, we identified 221 members of the UGT glycosyltransferase family in mango. The 221 MiUGT genes were grouped into 13 subfamilies through phylogenetic tree analysis with Arabidopsis, Chinese bayberry, and mango. All UGT family members in mango were unevenly distributed on 17 chromosomes and found that tandem duplication dominated the expansion of UGT family members in mango. Purification selection primarily influenced the evolution of the mango UGT family members. In addition, cis-element analysis of the mango UGT gene family revealed the presence of MYB binding sites, which are involved in flavonoid biosynthesis; which further supports the role of UGT family members in the synthesis of flavonoids. To verify these results, we analyzed the expression of UGT family members in mango leaves, stems, and different developmental stages of fruit peel. The RNA-seq and qRT-PCR results showed significant differences in the expression patterns of MiUGT genes in various tissues and developmental stages of mango. We identified MiUGT gene-specific expression at different stages of fruit development. 
These results lay a theoretical foundation for research on the relationship between members of the mango UGT family and the synthesis of flavonoids, mangiferin.</p>","PeriodicalId":9030,"journal":{"name":"BMC Genomics","volume":"25 1","pages":"1074"},"PeriodicalIF":3.5,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11555977/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142614007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: As the inflorescence of wheat, the spike largely determines grain productivity through its architecture. Dissecting the genetic basis of wheat spike morphology can contribute to the design of an ideal spike morphology to improve grain production.
Results: The present study characterizes dense spike1 (ds1), an EMS-induced mutant derived from Nongda3753 that exhibits a dense spike and reduced plant height. Through bulked segregant analysis sequencing (BSA-Seq) of two segregating populations, ds1 was mapped to the short arm of chromosome 7B. Further genotypic and phenotypic analyses of the residual heterozygous lines from F3 to F6 of Yong3002×ds1 revealed that a deletion spanning 0-135 Mb of chromosome 7B was associated with the dense spike phenotype. Read-count analysis of the two bulks in BSA-Seq, along with cytological analysis of ds1, ND3753, NIL-ds1 and NIL-Y3002, confirmed a partial unidirectional translocation of 5AL (543-713 Mb) to 7BS (0-135 Mb) in ds1. This translocation led to an increase in both the copy number and the expression of the Q gene, which is one of the reasons for the dense spike phenotype observed in ds1.
Conclusion: A partial unidirectional translocation from 5AL to 7BS was identified in the EMS-induced mutant ds1, which exhibits a dense spike phenotype. This research illustrates the effect of a single chromosomal structural variation on wheat spike morphology and provides new materials carrying several chromosomal structural variations for future wheat breeding.
{"title":"Partial unidirectional translocation from 5AL to 7BS leads to dense spike in an EMS-induced wheat mutant.","authors":"Xiaoyu Zhang, Yongfa Wang, Yongming Chen, Yazhou Li, Kai Guo, Jin Xu, Panfeng Guan, Tianyu Lan, Mingming Xin, Zhaorong Hu, Weilong Guo, Yingyin Yao, Zhongfu Ni, Qixin Sun, Ming Hao, Huiru Peng","doi":"10.1186/s12864-024-11000-y","DOIUrl":"10.1186/s12864-024-11000-y","url":null,"abstract":"<p><strong>Background: </strong>As the inflorescence of wheat, spike architecture largely determines grain productivity. Dissecting the genetic basis for the spike morphology of wheat can contribute to the designation of ideal spike morphology to improve grain production.</p><p><strong>Results: </strong>The present study characterizes a dense spike1 (ds1) mutant, derived from Nongda3753, induced by EMS treatment, which exhibits a dense spike and reduced plant height. Through bulked segregant analysis sequencing (BSA-Seq) of two segregating populations, ds1 was mapped to the short arm of chromosome 7B. Further genotypic and phenotypic analyses of the residual heterozygous lines from F<sub>3</sub> to F<sub>6</sub> of Yong3002×ds1 revealed that there was a 0-135 Mb deletion in chromosome 7B associated with the dense spike phenotype. The reads count analysis of the two bulks in BSA-Seq, along with the cytological analysis of ds1, ND3753, NIL-ds1 and NIL-Y3002, confirmed that the partial unidirectional translocation of 5AL (543-713 Mb) to 7BS (0-135 Mb) exists in ds1. This translocation led to an increase in both copy number and expression of the Q gene, which is one of the reasons for the dense spike phenotype observed in ds1.</p><p><strong>Conclusion: </strong>Partial unidirectional translocation from 5AL to 7BS was identified in the EMS-induced mutant ds1, which exhibits dense spike phenotype. 
This research illustrates the effect of one chromosome structure variation on wheat spike morphology, and provides new materials with several chromosome structure variations for future wheat breeding.</p>","PeriodicalId":9030,"journal":{"name":"BMC Genomics","volume":"25 1","pages":"1073"},"PeriodicalIF":3.5,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11555835/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142613813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-11-12 DOI: 10.1186/s12864-024-11008-4
Siyuan Zhan, Rui Jiang, Zongqi An, Yang Zhang, Tao Zhong, Linjie Wang, Jiazhong Guo, Jiaxue Cao, Li Li, Hongping Zhang
Background: Circular RNAs (circRNAs) function as essential regulatory elements with pivotal roles in various biological processes. However, their expression profiles and functional regulation during the differentiation of goat myoblasts have not been thoroughly explored. This study analyzes circRNA expression profiles during the proliferation phase (cultured in growth medium, GM) and differentiation phase (cultured in differentiation medium, DM1/DM5) of skeletal muscle satellite cells (MuSCs) in goats.
Results: A total of 2,094 circRNAs were identified, among which 84 were differentially expressed as determined by pairwise comparisons across the three groups. The expression levels of six randomly selected circRNAs were validated using reverse transcription PCR (RT-PCR) and quantitative RT-PCR (qRT-PCR), and their back-splicing junction sites were confirmed. Enrichment analysis of the host genes associated with differentially expressed circRNAs (DEcircRNAs) indicated significant involvement in biological processes such as muscle contraction, muscle hypertrophy, and muscle tissue development. Additionally, these host genes were implicated in key signaling pathways, including the Hippo, TGF-beta, and MAPK pathways. Subsequently, using Cytoscape, we constructed a circRNA-miRNA interaction network encompassing 21 circRNAs and 47 miRNAs to elucidate the complex regulatory mechanisms underlying goat muscle development. Functional assays demonstrated that circTGFβ2 enhances myogenic differentiation in goats, potentially through a miRNA sponge mechanism.
Conclusion: We identified the genome-wide expression profiles of circRNAs in goat MuSCs during both the proliferation and differentiation phases, and established that circTGFβ2 plays a role in the regulation of myogenesis. This study offers a significant resource for further exploration of the biological functions and mechanisms of circRNAs in goat myogenesis.
{"title":"CircRNA profiling of skeletal muscle satellite cells in goats reveals circTGFβ2 promotes myoblast differentiation.","authors":"Siyuan Zhan, Rui Jiang, Zongqi An, Yang Zhang, Tao Zhong, Linjie Wang, Jiazhong Guo, Jiaxue Cao, Li Li, Hongping Zhang","doi":"10.1186/s12864-024-11008-4","DOIUrl":"10.1186/s12864-024-11008-4","url":null,"abstract":"<p><strong>Background: </strong>Circular RNAs (circRNAs) function as essential regulatory elements with pivotal roles in various biological processes. However, their expression profiles and functional regulation during the differentiation of goat myoblasts have not been thoroughly explored. This study conducts an analysis of circRNA expression profiles during the proliferation phase (cultured in growth medium, GM) and differentiation phase (cultured in differentiation medium, DM1/DM5) of skeletal muscle satellite cells (MuSCs) in goats.</p><p><strong>Results: </strong>A total of 2,094 circRNAs were identified, among which 84 were differentially expressed as determined by pairwise comparisons across three distinct groups. Validation of the expression levels of six randomly selected circRNAs was performed using reverse transcription PCR (RT-PCR) and quantitative RT-PCR (qRT-PCR), with confirmation of their back-splicing junction sites. Enrichment analysis of the host genes associated with differentially expressed circRNAs (DEcircRNAs) indicated significant involvement in biological processes such as muscle contraction, muscle hypertrophy, and muscle tissue development. Additionally, these host genes were implicated in key signaling pathways, including Hippo, TGF-beta, and MAPK pathways. Subsequently, employing Cytoscape, we developed a circRNA-miRNA interaction network to elucidate the complex regulatory mechanisms underlying goat muscle development, encompassing 21 circRNAs and 47 miRNAs. 
Functional assays demonstrated that circTGFβ2 enhances myogenic differentiation in goats, potentially through a miRNA sponge mechanism.</p><p><strong>Conclusion: </strong>In conclusion, we identified the genome-wide expression profiles of circRNAs in goat MuSCs during both proliferation and differentiation phases, and established that circTGFβ2 plays a role in the regulation of myogenesis. This study offers a significant resource for the advanced exploration of the biological functions and mechanisms of circRNAs in the myogenesis of goats.</p>","PeriodicalId":9030,"journal":{"name":"BMC Genomics","volume":"25 1","pages":"1075"},"PeriodicalIF":3.5,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11555921/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142614287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}