Spatial transcriptomics (ST) is a rapidly evolving technology that is attracting growing attention in bioinformatics research. Spatial domains are regions in which gene expression and histology are spatially consistent, and detecting them helps researchers understand the organization and functional distribution of tissues. Spatial domain recognition is a fundamental step in interpreting ST data and remains a major challenge in ST analysis. Developing more accurate, efficient, and general spatial domain recognition methods has therefore become an important and urgent research direction. This article reviews the current status and progress of spatial domain recognition research, examines the advantages and limitations of existing methods, and offers suggestions and directions for future tool development.
{"title":"A comprehensive review of approaches for spatial domain recognition of spatial transcriptomes.","authors":"Ziyi Wang, Aoyun Geng, Hao Duan, Feifei Cui, Quan Zou, Zilong Zhang","doi":"10.1093/bfgp/elae040","DOIUrl":"10.1093/bfgp/elae040","url":null,"abstract":"<p><p>In current bioinformatics research, spatial transcriptomics (ST) as a rapidly evolving technology is gradually receiving widespread attention from researchers. Spatial domains are regions where gene expression and histology are consistent in space, and detecting spatial domains can better understand the organization and functional distribution of tissues. Spatial domain recognition is a fundamental step in the process of ST data interpretation, which is also a major challenge in ST analysis. Therefore, developing more accurate, efficient, and general spatial domain recognition methods has become an important and urgent research direction. This article aims to review the current status and progress of spatial domain recognition research, explore the advantages and limitations of existing methods, and provide suggestions and directions for future tool development.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":" ","pages":"702-712"},"PeriodicalIF":2.5,"publicationDate":"2024-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142481471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Acute myeloid leukemia (AML) is one of the leading leukemic malignancies in adults. The heterogeneity of the disease makes diagnosis and treatment extremely difficult. With the advent of next-generation sequencing (NGS) technologies, researchers have focused on molecular-level exploration to identify biomarkers and drug targets, with the goal of developing novel therapies that improve prognosis and survival outcomes for AML patients. However, the huge amount of data from NGS platforms calls for a comprehensive AML platform that streamlines literature mining and saves time. To this end, we developed AMLdb, an interactive multi-omics platform that allows users to query, visualize, retrieve, and analyse AML-related multi-omics data. AMLdb contains 86 datasets for gene expression profiles, 15 datasets for methylation profiles, CRISPR-Cas9 knockout screens of 26 AML cell lines, sensitivity of 26 AML cell lines to 288 drugs, mutations in 41 unique genes in 23 AML cell lines, and information on 41 experimentally validated biomarkers. In this study, we report five genes, i.e. CBFB, ENO1, IMPDH2, SEPHS2, and MYH9, identified via our analysis using AMLdb. ENO1 is a uniquely identified gene that requires further investigation as a novel potential target, while the other reported genes have previously been confirmed as targets through experimental studies. We believe that these findings show that AMLdb can be an invaluable resource for accelerating the development of effective therapies for AML and for assisting the research community in advancing its understanding of AML pathogenesis. AMLdb is freely available at https://project.iith.ac.in/cgntlab/amldb.
{"title":"AMLdb: a comprehensive multi-omics platform to identify biomarkers and drug targets for acute myeloid leukemia.","authors":"Keerthana Vinod Kumar, Ambuj Kumar, Kavita Kundal, Avik Sengupta, Kunjulakshmi R, Subashani Singh, Bhanu Teja Korra, Simran Sharma, Vandana Suresh, Mayilaadumveettil Nishana, Rahul Kumar","doi":"10.1093/bfgp/elae024","DOIUrl":"10.1093/bfgp/elae024","url":null,"abstract":"<p><p>Acute myeloid leukemia (AML) is one of the leading leukemic malignancies in adults. The heterogeneity of the disease makes the diagnosis and treatment extremely difficult. With the advent of next-generation sequencing (NGS) technologies, exploration at the molecular level for the identification of biomarkers and drug targets has been the focus for the researchers to come up with novel therapies for better prognosis and survival outcomes of AML patients. However, the huge amount of data from NGS platforms requires a comprehensive AML platform to streamline literature mining efforts and save time. To facilitate this, we developed AMLdb, an interactive multi-omics platform that allows users to query, visualize, retrieve, and analyse AML related multi-omics data. AMLdb contains 86 datasets for gene expression profiles, 15 datasets for methylation profiles, CRISPR-Cas9 knockout screens of 26 AML cell lines, sensitivity of 26 AML cell lines to 288 drugs, mutations in 41 unique genes in 23 AML cell lines, and information on 41 experimentally validated biomarkers. In this study, we have reported five genes, i.e. CBFB, ENO1, IMPDH2, SEPHS2, and MYH9 identified via our analysis using AMLdb. ENO1 is uniquely identified gene which requires further investigation as a novel potential target while other reported genes have been previously confirmed as targets through experimental studies. 
We believe that these findings utilizing AMLdb can make it an invaluable resource to accelerate the development of effective therapies for AML and assisting the research community in advancing their understanding of AML pathogenesis. AMLdb is freely available at https://project.iith.ac.in/cgntlab/amldb.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":" ","pages":"798-805"},"PeriodicalIF":2.5,"publicationDate":"2024-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141307484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ferroptosis, a commonly observed type of programmed cell death caused by abnormal metabolic and biochemical mechanisms, is frequently triggered by cellular stress. The occurrence of ferroptosis is predominantly linked to pathophysiological conditions due to the substantial impact of various metabolic pathways, including fatty acid metabolism and iron regulation, on cellular reactions to lipid peroxidation and ferroptosis. This mode of cell death serves as a fundamental factor in the development of numerous diseases, thereby presenting a range of therapeutic targets. Single-cell sequencing technology provides insights into the cellular and molecular characteristics of individual cells, as opposed to bulk sequencing, which provides data in a more generalized manner. Single-cell sequencing has found extensive application in the field of cancer research. This paper reviews the progress made in ferroptosis-associated cancer research using single-cell sequencing, including ferroptosis-associated pathways, immune checkpoints, biomarkers, and the identification of cell clusters associated with ferroptosis in tumors. In general, the utilization of single-cell sequencing technology has the potential to contribute significantly to the investigation of the mechanistic regulatory pathways linked to ferroptosis. Moreover, it can shed light on the intricate connection between ferroptosis and cancer. This technology holds great promise in advancing tumor-wide diagnosis, targeted therapy, and prognosis prediction.
{"title":"Advances in integrating single-cell sequencing data to unravel the mechanism of ferroptosis in cancer.","authors":"Zhaolan Du, Yi Shi, Jianjun Tan","doi":"10.1093/bfgp/elae025","DOIUrl":"10.1093/bfgp/elae025","url":null,"abstract":"<p><p>Ferroptosis, a commonly observed type of programmed cell death caused by abnormal metabolic and biochemical mechanisms, is frequently triggered by cellular stress. The occurrence of ferroptosis is predominantly linked to pathophysiological conditions due to the substantial impact of various metabolic pathways, including fatty acid metabolism and iron regulation, on cellular reactions to lipid peroxidation and ferroptosis. This mode of cell death serves as a fundamental factor in the development of numerous diseases, thereby presenting a range of therapeutic targets. Single-cell sequencing technology provides insights into the cellular and molecular characteristics of individual cells, as opposed to bulk sequencing, which provides data in a more generalized manner. Single-cell sequencing has found extensive application in the field of cancer research. This paper reviews the progress made in ferroptosis-associated cancer research using single-cell sequencing, including ferroptosis-associated pathways, immune checkpoints, biomarkers, and the identification of cell clusters associated with ferroptosis in tumors. In general, the utilization of single-cell sequencing technology has the potential to contribute significantly to the investigation of the mechanistic regulatory pathways linked to ferroptosis. Moreover, it can shed light on the intricate connection between ferroptosis and cancer. 
This technology holds great promise in advancing tumor-wide diagnosis, targeted therapy, and prognosis prediction.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":" ","pages":"713-725"},"PeriodicalIF":2.5,"publicationDate":"2024-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141319002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kristina Santucci, Yuning Cheng, Si-Mei Xu, Michael Janitz
Long-read sequencing technologies can capture entire RNA transcripts in a single sequencing read, reducing the ambiguity in constructing and quantifying transcript models in comparison to more common and earlier methods, such as short-read sequencing. Recent improvements in the accuracy of long-read sequencing technologies have expanded the scope for novel splice isoform detection and have also enabled a far more accurate reconstruction of complex splicing patterns and transcriptomes. Additionally, the incorporation and advancements of machine learning and deep learning algorithms in bioinformatic software have significantly improved the reliability of long-read sequencing transcriptomic studies. However, there is a lack of consensus on what bioinformatic tools and pipelines produce the most precise and consistent results. Thus, this review aims to discuss and compare the performance of available methods for novel isoform discovery with long-read sequencing technologies, with 25 tools being presented. Furthermore, this review intends to demonstrate the need for developing standard analytical pipelines, tools, and transcript model conventions for novel isoform discovery and transcriptomic studies.
{"title":"Enhancing novel isoform discovery: leveraging nanopore long-read sequencing and machine learning approaches.","authors":"Kristina Santucci, Yuning Cheng, Si-Mei Xu, Michael Janitz","doi":"10.1093/bfgp/elae031","DOIUrl":"10.1093/bfgp/elae031","url":null,"abstract":"<p><p>Long-read sequencing technologies can capture entire RNA transcripts in a single sequencing read, reducing the ambiguity in constructing and quantifying transcript models in comparison to more common and earlier methods, such as short-read sequencing. Recent improvements in the accuracy of long-read sequencing technologies have expanded the scope for novel splice isoform detection and have also enabled a far more accurate reconstruction of complex splicing patterns and transcriptomes. Additionally, the incorporation and advancements of machine learning and deep learning algorithms in bioinformatic software have significantly improved the reliability of long-read sequencing transcriptomic studies. However, there is a lack of consensus on what bioinformatic tools and pipelines produce the most precise and consistent results. Thus, this review aims to discuss and compare the performance of available methods for novel isoform discovery with long-read sequencing technologies, with 25 tools being presented. 
Furthermore, this review intends to demonstrate the need for developing standard analytical pipelines, tools, and transcript model conventions for novel isoform discovery and transcriptomic studies.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":" ","pages":"683-694"},"PeriodicalIF":2.5,"publicationDate":"2024-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142001414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
It is projected that 10 million deaths could be attributed to drug-resistant bacterial infections in 2050. Identifying new-generation antibiotics is one effective way to address this concern. Antimicrobial peptides (AMPs), a class of innate immune effectors, have received significant attention for their capacity to eliminate drug-resistant pathogens, including viruses, bacteria, and fungi. Recent years have witnessed widespread application of computational methods, especially machine learning (ML) and deep learning (DL), to AMP discovery. However, existing methods only use features such as the compositional, physicochemical, and structural properties of peptides, which cannot fully capture the sequence information in AMPs. Here, we present SAMP, an ensemble random projection (RP) based computational model that leverages a new type of feature called proportionalized split amino acid composition (PSAAC) in addition to conventional sequence-based features for AMP prediction. With this new feature set, SAMP captures residue patterns such as sorting signals at both the N-terminal and the C-terminal, while also retaining the sequence order information from the middle peptide fragments. Benchmarking tests on different balanced and imbalanced datasets demonstrate that SAMP consistently outperforms existing state-of-the-art methods, such as iAMPpred and AMPScanner V2, in terms of accuracy, Matthews correlation coefficient (MCC), G-measure, and F1-score. In addition, by leveraging an ensemble RP architecture, SAMP scales to large-scale AMP identification and further improves performance compared with models without RP. To facilitate the use of SAMP, we have developed a Python package that is freely available at https://github.com/wan-mlab/SAMP.
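The abstract does not define PSAAC in detail, so the following is only a minimal sketch of one plausible reading: the peptide is split into N-terminal, middle, and C-terminal fragments whose lengths are proportions of the full sequence, and an amino acid composition vector is computed for each fragment. The function names, the 25% split fractions, and the example peptide are assumptions made for illustration and are not taken from the SAMP package.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(fragment):
    """Return the fraction of each standard amino acid in a fragment."""
    counts = Counter(fragment)
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def split_composition(seq, n_frac=0.25, c_frac=0.25):
    """Hypothetical PSAAC-like feature: composition of three fragments.

    n_frac and c_frac are assumed proportions of the sequence length
    assigned to the N-terminal and C-terminal fragments.
    """
    seq = seq.upper()
    n_len = max(1, round(len(seq) * n_frac))
    c_len = max(1, round(len(seq) * c_frac))
    n_term = seq[:n_len]
    c_term = seq[len(seq) - c_len:]
    middle = seq[n_len:len(seq) - c_len]
    # Concatenate the three 20-dimensional vectors into one 60-dimensional vector.
    return composition(n_term) + composition(middle) + composition(c_term)


# Illustrative 23-residue peptide (chosen only as an example input).
features = split_composition("GIGKFLHSAKKFGKAFVGEIMNS")
print(len(features))
```

Keeping the terminal fragments separate is what lets a downstream classifier see terminal residue patterns that a whole-sequence composition would average away.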
{"title":"SAMP: Identifying antimicrobial peptides by an ensemble learning model based on proportionalized split amino acid composition.","authors":"Junxi Feng, Mengtao Sun, Cong Liu, Weiwei Zhang, Changmou Xu, Jieqiong Wang, Guangshun Wang, Shibiao Wan","doi":"10.1093/bfgp/elae046","DOIUrl":"10.1093/bfgp/elae046","url":null,"abstract":"<p><p>It is projected that 10 million deaths could be attributed to drug-resistant bacteria infections in 2050. To address this concern, identifying new-generation antibiotics is an effective way. Antimicrobial peptides (AMPs), a class of innate immune effectors, have received significant attention for their capacity to eliminate drug-resistant pathogens, including viruses, bacteria, and fungi. Recent years have witnessed widespread applications of computational methods especially machine learning (ML) and deep learning (DL) for discovering AMPs. However, existing methods only use features including compositional, physiochemical, and structural properties of peptides, which cannot fully capture sequence information from AMPs. Here, we present SAMP, an ensemble random projection (RP) based computational model that leverages a new type of feature called proportionalized split amino acid composition (PSAAC) in addition to conventional sequence-based features for AMP prediction. With this new feature set, SAMP captures the residue patterns like sorting signals at both the N-terminal and the C-terminal, while also retaining the sequence order information from the middle peptide fragments. Benchmarking tests on different balanced and imbalanced datasets demonstrate that SAMP consistently outperforms existing state-of-the-art methods, such as iAMPpred and AMPScanner V2, in terms of accuracy, Matthews correlation coefficient (MCC), G-measure, and F1-score. 
In addition, by leveraging an ensemble RP architecture, SAMP is scalable to processing large-scale AMP identification with further performance improvement, compared to those models without RP. To facilitate the use of SAMP, we have developed a Python package that is freely available at https://github.com/wan-mlab/SAMP.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":" ","pages":"879-890"},"PeriodicalIF":2.5,"publicationDate":"2024-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11631067/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142689781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Matheus Sanita Lima, Douglas Silva Domingues, Alexandre Rossi Paschoal, David Roy Smith
Forty years ago, organelle genomes were assumed to be streamlined and, perhaps, unexciting remnants of their prokaryotic past. However, the field of organelle genomics has exposed an unparalleled diversity in genome architecture (i.e. genome size, structure, and content). The transcription of these eccentric genomes can be just as elaborate: organelle genomes are pervasively transcribed into a plethora of RNA types. However, while organelle protein-coding genes are known to produce polycistronic transcripts that undergo heavy posttranscriptional processing, the nature of organelle noncoding transcriptomes is still poorly resolved. Here, we review how wet-lab experiments and second-generation sequencing data (i.e. short reads) have been useful for determining certain types of organelle RNAs, particularly noncoding RNAs. We then explain how third-generation (long-read) RNA-Seq data represent the new frontier in organelle transcriptomics. We show that public repositories (e.g. NCBI SRA) already contain enough data for inter-phyla comparative studies and argue that organelle biologists can benefit from such data. We discuss the prospects of using publicly available sequencing data for organelle-focused studies and examine the challenges of such an approach. We highlight that the lack of a comprehensive database dedicated to organelle genomics/transcriptomics is a major impediment to the development of a field with implications in basic and applied science.
{"title":"Long-read RNA sequencing can probe organelle genome pervasive transcription.","authors":"Matheus Sanita Lima, Douglas Silva Domingues, Alexandre Rossi Paschoal, David Roy Smith","doi":"10.1093/bfgp/elae026","DOIUrl":"10.1093/bfgp/elae026","url":null,"abstract":"<p><p>40 years ago, organelle genomes were assumed to be streamlined and, perhaps, unexciting remnants of their prokaryotic past. However, the field of organelle genomics has exposed an unparallel diversity in genome architecture (i.e. genome size, structure, and content). The transcription of these eccentric genomes can be just as elaborate - organelle genomes are pervasively transcribed into a plethora of RNA types. However, while organelle protein-coding genes are known to produce polycistronic transcripts that undergo heavy posttranscriptional processing, the nature of organelle noncoding transcriptomes is still poorly resolved. Here, we review how wet-lab experiments and second-generation sequencing data (i.e. short reads) have been useful to determine certain types of organelle RNAs, particularly noncoding RNAs. We then explain how third-generation (long-read) RNA-Seq data represent the new frontier in organelle transcriptomics. We show that public repositories (e.g. NCBI SRA) already contain enough data for inter-phyla comparative studies and argue that organelle biologists can benefit from such data. We discuss the prospects of using publicly available sequencing data for organelle-focused studies and examine the challenges of such an approach. 
We highlight that the lack of a comprehensive database dedicated to organelle genomics/transcriptomics is a major impediment to the development of a field with implications in basic and applied science.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":" ","pages":"695-701"},"PeriodicalIF":2.5,"publicationDate":"2024-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141332590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Issa Keerthi, Vishnu Shukla, Sudhamani Kalluru, Lal Ahamed Mohammad, P Lavanya Kumari, Eswarayya Ramireddy, Lakshminarayana R Vemireddy
Rapidly identifying candidate genes underlying major QTLs is crucial for improving rice (Oryza sativa L.). In this study, we developed a workflow to rapidly prioritize candidate genes underpinning 99 major QTLs governing yield component traits. This workflow integrates multiomics databases, including sequence variation, gene expression, gene ontology, co-expression analysis, and protein-protein interaction. We predicted 206 candidate genes for 99 reported QTLs governing ten economically important yield-contributing traits using this approach. Among these, transcription factors belonging to families of MADS-box, WRKY, helix-loop-helix, TCP, MYB, GRAS, auxin response factor, and nuclear transcription factor Y subunit were promising. Validation of key prioritized candidate genes in contrasting rice genotypes for sequence variation and differential expression identified Leucine-Rich Repeat family protein (LOC_Os03g28270) and cytochrome P450 (LOC_Os02g57290) as candidate genes for the major QTLs GL1 and pl2.1, which govern grain length and panicle length, respectively. In conclusion, this study demonstrates that our workflow can significantly narrow down a large number of annotated genes in a QTL to a very small number of the most probable candidates, achieving approximately a 21-fold reduction. These candidate genes have potential implications for enhancing rice yield.
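As a rough illustration of how evidence from several omics layers might be combined to shortlist genes within a QTL interval, the sketch below counts how many layers support each gene and keeps those above a threshold. It is not the authors' workflow: the scoring rule, the threshold, the gene identifiers, and the toy numbers are all hypothetical.

```python
# Evidence layers named in the abstract; the equal-weight scoring is an assumption.
EVIDENCE_LAYERS = (
    "sequence_variation",
    "expression",
    "gene_ontology",
    "coexpression",
    "ppi",
)


def prioritize(genes, min_layers=4):
    """Rank genes by the number of supporting evidence layers.

    genes maps a gene ID to a dict of layer name -> bool.
    Genes supported by fewer than min_layers layers are dropped.
    """
    scored = []
    for gene_id, evidence in genes.items():
        score = sum(1 for layer in EVIDENCE_LAYERS if evidence.get(layer, False))
        if score >= min_layers:
            scored.append((gene_id, score))
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


def fold_reduction(n_annotated, n_candidates):
    """Ratio of annotated genes in an interval to retained candidates."""
    if n_candidates == 0:
        raise ValueError("no candidates retained")
    return n_annotated / n_candidates


# Toy QTL interval with placeholder gene IDs (not real loci).
toy_qtl = {
    "gene_A": dict.fromkeys(EVIDENCE_LAYERS, True),
    "gene_B": {"sequence_variation": True, "expression": True,
               "gene_ontology": True, "coexpression": True},
    "gene_C": {"expression": True},
}
shortlist = prioritize(toy_qtl)
print(shortlist)
# Toy numbers only: 42 annotated genes narrowed to 2 candidates.
print(fold_reduction(42, 2))
```

A real workflow would likely weight the layers differently and use continuous scores; the equal-weight count here is only the simplest possible stand-in.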
{"title":"Prioritization of candidate genes for major QTLs governing yield traits employing integrated multi-omics approach in rice (Oryza sativa L.).","authors":"Issa Keerthi, Vishnu Shukla, Sudhamani Kalluru, Lal Ahamed Mohammad, P Lavanya Kumari, Eswarayya Ramireddy, Lakshminarayana R Vemireddy","doi":"10.1093/bfgp/elae035","DOIUrl":"10.1093/bfgp/elae035","url":null,"abstract":"<p><p>Rapidly identifying candidate genes underlying major QTLs is crucial for improving rice (Oryza sativa L.). In this study, we developed a workflow to rapidly prioritize candidate genes underpinning 99 major QTLs governing yield component traits. This workflow integrates multiomics databases, including sequence variation, gene expression, gene ontology, co-expression analysis, and protein-protein interaction. We predicted 206 candidate genes for 99 reported QTLs governing ten economically important yield-contributing traits using this approach. Among these, transcription factors belonging to families of MADS-box, WRKY, helix-loop-helix, TCP, MYB, GRAS, auxin response factor, and nuclear transcription factor Y subunit were promising. Validation of key prioritized candidate genes in contrasting rice genotypes for sequence variation and differential expression identified Leucine-Rich Repeat family protein (LOC_Os03g28270) and cytochrome P450 (LOC_Os02g57290) as candidate genes for the major QTLs GL1 and pl2.1, which govern grain length and panicle length, respectively. In conclusion, this study demonstrates that our workflow can significantly narrow down a large number of annotated genes in a QTL to a very small number of the most probable candidates, achieving approximately a 21-fold reduction. 
These candidate genes have potential implications for enhancing rice yield.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":" ","pages":"843-857"},"PeriodicalIF":2.5,"publicationDate":"2024-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142127426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Next-generation sequencing and other sequencing approaches have made significant progress in DNA analysis. However, nonsequencing methods retain indispensable advantages: they are fast, cost-effective, widely applicable, and straightforward. Among the nonsequencing methods, the genome profiling method merits review because of its high potential. This article first reviews its basic properties, highlights the key concept of species identification dots (spiddos), and then summarizes its various applications.
{"title":"Discoveries by the genome profiling, symbolic powers of non-next generation sequencing methods.","authors":"Koichi Nishigaki","doi":"10.1093/bfgp/elae047","DOIUrl":"10.1093/bfgp/elae047","url":null,"abstract":"<p><p>Next-generation sequencing and other sequencing approaches have made significant progress in DNA analysis. However, there are indispensable advantages in the nonsequencing methods. They have their justifications such as being speedy, cost-effective, multi-applicable, and straightforward. Among the nonsequencing methods, the genome profiling method is worthy of reviewing because of its high potential. This article first reviews its basic properties, highlights the key concept of species identification dots (spiddos), and then summarizes its various applications.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":" ","pages":"775-797"},"PeriodicalIF":2.5,"publicationDate":"2024-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142741507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yuanfeng Xu, Fan Yu, Wenrong Feng, Jia Wei, Shengyan Su, Jianlin Li, Guoan Hua, Wenjing Li, Yongkai Tang
Public databases now house an extensive repository of transcriptome data, and the volume continues to grow at an accelerating pace. Using these data effectively is a shared interest within the scientific community. In this study, we introduce a novel strategy that harnesses SNPs and InDels identified from transcriptome data, combined with sample metadata from databases, to screen effectively for molecular markers correlated with traits. We used 228 transcriptome datasets of Eriocheir sinensis from the NCBI database and employed the Genome Analysis Toolkit software to identify 96 388 SNPs and 20 645 InDels. Using genome-wide association study analysis together with the sex information recorded in the databases, we identified 3456 sex-biased SNPs and 639 sex-biased InDels. KOG and KEGG annotation of the sex-biased SNPs and InDels revealed that the affected genes were primarily involved in the metabolic processes of E. sinensis. Combining SnpEff annotation with PCR experimental validation, we found that a highly sex-biased SNP located in the Kelch domain containing 4 (Klhdc4) gene, CHR67-6415071, alters the splicing sites of Klhdc4, generating two splice variants, Klhdc4_a and Klhdc4_b. Additionally, Klhdc4 exhibited robust expression across the ovaries, testes, and accessory glands. The sex-biased SNPs and InDels identified in this study are conducive to the development of unisexual cultivation methods for E. sinensis, and the alternative splicing event caused by the sex-biased SNP in Klhdc4 may serve as a potential mechanism for sex regulation in E. sinensis. The analysis strategy employed in this study represents a new direction for the rational exploitation and utilization of transcriptome data in public databases.
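To make the idea of a sex-biased SNP concrete, the sketch below tests whether allele counts at a single site differ between female and male samples using a two-sided Fisher's exact test on a 2x2 table. This is a simplified stand-in chosen for illustration, not the association model used in the study, and the allele counts are invented toy values.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Here rows are sexes (female, male) and columns are alleles (ref, alt).
    The p-value sums the hypergeometric probabilities of all tables with the
    same margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Toy allele counts (hypothetical): females carry only ref, males only alt.
p_biased = fisher_exact_two_sided(10, 0, 0, 10)
# Toy allele counts with no difference between sexes.
p_neutral = fisher_exact_two_sided(5, 5, 5, 5)
print(p_biased, p_neutral)
```

When this kind of test is repeated across tens of thousands of variants, the resulting p-values need a multiple-testing correction before any site is called sex-biased.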
{"title":"Genetic variation mining of the Chinese mitten crab (Eriocheir sinensis) based on transcriptome data from public databases.","authors":"Yuanfeng Xu, Fan Yu, Wenrong Feng, Jia Wei, Shengyan Su, Jianlin Li, Guoan Hua, Wenjing Li, Yongkai Tang","doi":"10.1093/bfgp/elae030","DOIUrl":"10.1093/bfgp/elae030","url":null,"abstract":"<p><p>At present, public databases house an extensive repository of transcriptome data, with the volume continuing to grow at an accelerated pace. Utilizing these data effectively is a shared interest within the scientific community. In this study, we introduced a novel strategy that harnesses SNPs and InDels identified from transcriptome data, combined with sample metadata from databases, to effectively screen for molecular markers correlated with traits. We utilized 228 transcriptome datasets of Eriocheir sinensis from the NCBI database and employed the Genome Analysis Toolkit software to identify 96 388 SNPs and 20 645 InDels. Employing the genome-wide association study analysis, in conjunction with the gender information from databases, we identified 3456 sex-biased SNPs and 639 sex-biased InDels. The KOG and KEGG annotations of the sex-biased SNPs and InDels revealed that these genes were primarily involved in the metabolic processes of E. sinensis. Combined with SnpEff annotation and PCR experimental validation, a highly sex-biased SNP located in the Kelch domain containing 4 (Klhdc4) gene, CHR67-6415071, was found to alter the splicing sites of Klhdc4, generating two splice variants, Klhdc4_a and Klhdc4_b. Additionally, Klhdc4 exhibited robust expression across the ovaries, testes, and accessory glands. The sex-biased SNPs and InDels identified in this study are conducive to the development of unisexual cultivation methods for E. sinensis, and the alternative splicing event caused by the sex-biased SNP in Klhdc4 may serve as a potential mechanism for sex regulation in E. sinensis. The analysis strategy employed in this study represents a new direction for the rational exploitation and utilization of transcriptome data in public databases.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":" ","pages":"816-827"},"PeriodicalIF":2.5,"publicationDate":"2024-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141565172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gene regulatory networks (GRNs) help explain gene function, the development of cancer, and the impact of key genes on disease. This study proposes an ensemble method that combines 13 base classification methods with a flexible neural tree (FNT) to improve GRN identification accuracy. The base classifiers are ridge classification, stochastic gradient descent, Gaussian process classification, Bernoulli Naive Bayes, adaptive boosting, gradient boosting decision tree, hist gradient boosting classification, eXtreme gradient boosting (XGBoost), multilayer perceptron, light gradient boosting machine, random forest, support vector machine, and k-nearest neighbor; their outputs form the input variable set of the FNT model. A hybrid evolutionary algorithm based on a gene programming variant and particle swarm optimization is developed to search for the optimal FNT model. Experiments on three simulation datasets and three real single-cell RNA-seq datasets show that the proposed ensemble method outperforms the 13 supervised algorithms, seven unsupervised algorithms (ARACNE, CLR, GENIE3, MRNET, PCACMI, GENECI, and EPCACMI), and four single cell-specific methods (SCODE, BiRGRN, LEAP, and BiGBoost) in terms of the area under the receiver operating characteristic curve, the area under the precision-recall curve, and F1.
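The stacking idea in this abstract, where base classifier outputs become the inputs of a second-level model, can be sketched as follows. This is an illustration under stated assumptions and not the authors' method. Two decision stumps stand in for the 13 base classifiers, and logistic regression fitted by gradient descent is swapped in for the FNT and its hybrid evolutionary search. The two features are hypothetical association scores for a candidate regulator-target pair. All names and data are invented for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(xs, ys, j):
    """Base classifier: threshold feature j at the cut with the best training accuracy."""
    values = sorted({x[j] for x in xs})
    thresholds = [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]

    def acc(t):
        return sum((x[j] > t) == bool(y) for x, y in zip(xs, ys)) / len(ys)

    best = max(thresholds, key=acc)
    return lambda x: 1 if x[j] > best else 0

def fit_logistic(rows, ys, lr=1.0, epochs=1000):
    """Meta-model: logistic regression trained by batch gradient descent."""
    w, bias, n = [0.0] * len(rows[0]), 0.0, len(ys)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for r, y in zip(rows, ys):
            err = sigmoid(bias + sum(wi * ri for wi, ri in zip(w, r))) - y
            gb += err
            for i, ri in enumerate(r):
                gw[i] += err * ri
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        bias -= lr * gb / n
    return w, bias

def accuracy(predict, xs, ys):
    return sum(predict(x) == y for x, y in zip(xs, ys)) / len(ys)

# Hypothetical toy data: an edge is true only when both association scores are high
grid = [0.2, 0.5, 0.8]
X = [(s1, s2) for s1 in grid for s2 in grid]
y = [1 if s1 > 0.35 and s2 > 0.35 else 0 for s1, s2 in X]

# Level 1: one base classifier per feature
base_models = [fit_stump(X, y, j) for j in range(2)]
# Level 2: the base outputs are the input variables of the meta-model
meta_rows = [[m(x) for m in base_models] for x in X]
w, bias = fit_logistic(meta_rows, y)

def ensemble(x):
    votes = [m(x) for m in base_models]
    return 1 if sigmoid(bias + sum(wi * v for wi, v in zip(w, votes))) > 0.5 else 0

base_acc = [accuracy(m, X, y) for m in base_models]
ensemble_acc = accuracy(ensemble, X, y)
print(base_acc, ensemble_acc)
```

Each stump sees only one score and so misclassifies pairs where the other score is low, while the meta-model learns that both votes are required. For brevity the sketch fits and evaluates on the same nine points; a practical stacking setup would train the meta-model on out-of-fold base predictions to avoid leakage.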
{"title":"Gene regulatory network inference based on novel ensemble method.","authors":"Bin Yang, Jing Li, Xiang Li, Sanrong Liu","doi":"10.1093/bfgp/elae036","DOIUrl":"10.1093/bfgp/elae036","url":null,"abstract":"<p><p>Gene regulatory networks (GRNs) contribute toward understanding the function of genes and the development of cancer or the impact of key genes on diseases. Hence, this study proposes an ensemble method based on 13 basic classification methods and a flexible neural tree (FNT) to improve GRN identification accuracy. The primary classification methods contain ridge classification, stochastic gradient descent, Gaussian process classification, Bernoulli Naive Bayes, adaptive boosting, gradient boosting decision tree, hist gradient boosting classification, eXtreme gradient boosting (XGBoost), multilayer perceptron, light gradient boosting machine, random forest, support vector machine, and k-nearest neighbor algorithm, which are regarded as the input variable set of FNT model. Additionally, a hybrid evolutionary algorithm based on a gene programming variant and particle swarm optimization is developed to search for the optimal FNT model. Experiments on three simulation datasets and three real single-cell RNA-seq datasets demonstrate that the proposed ensemble feature outperforms 13 supervised algorithms, seven unsupervised algorithms (ARACNE, CLR, GENIE3, MRNET, PCACMI, GENECI, and EPCACMI) and four single cell-specific methods (SCODE, BiRGRN, LEAP, and BiGBoost) based on the area under the receiver operating characteristic curve, area under the precision-recall curve, and F1 metrics.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":" ","pages":"866-878"},"PeriodicalIF":2.5,"publicationDate":"2024-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142332842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}