
Briefings in Bioinformatics: latest articles

MicroHDF: predicting host phenotypes with metagenomic data using a deep forest-based framework.
IF 6.8 | CAS Zone 2, Biology | Q1 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2024-09-23 | DOI: 10.1093/bib/bbae530
Kai Shi, Qiaohui Liu, Qingrong Ji, Qisheng He, Xing-Ming Zhao

The gut microbiota plays a vital role in human health, and significant effort has been made to predict human phenotypes, especially diseases, using machine learning (ML) methods with the microbiota as a promising indicator or predictor. However, when host phenotypes are predicted from metagenomic data, accuracy is affected by many factors, e.g. small sample size, class imbalance, and high-dimensional features. To address these challenges, we propose MicroHDF, an interpretable deep learning framework for predicting host phenotypes, in which cascaded layers of deep forest units are designed to handle sample class imbalance and high-dimensional features. The experimental results show that MicroHDF is competitive with existing state-of-the-art methods on 13 publicly available datasets covering six different diseases. In particular, it performs best for inflammatory bowel disease (IBD) and liver cirrhosis, with areas under the receiver operating characteristic curve of 0.9182 ± 0.0098 and 0.9469 ± 0.0076, respectively. MicroHDF also shows better performance and robustness in cross-study validation. Furthermore, MicroHDF is applied to two high-risk diseases, IBD and autism spectrum disorder, as case studies to identify potential biomarkers. In conclusion, our method provides effective and reliable prediction of host phenotypes and discovers informative features with biological insights.
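
The AUC figures above are reported as mean ± spread over repeated evaluations. A minimal Python sketch of how such a figure is typically produced, assuming the ± is the sample standard deviation over cross-validation runs (the abstract does not say which spread statistic is used). AUC is computed with the Mann-Whitney formulation; the labels and scores are made-up toy values, not MicroHDF output, and this does not reproduce the deep forest model itself.

```python
from statistics import mean, stdev

def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative
    (Mann-Whitney U formulation); ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def summarize_runs(runs):
    """Format per-run AUCs as 'mean ± sample standard deviation'."""
    aucs = [roc_auc(labels, scores) for labels, scores in runs]
    return f"{mean(aucs):.4f} ± {stdev(aucs):.4f}"

# Three hypothetical cross-validation runs (toy data).
runs = [
    ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),  # perfectly ranked -> AUC 1.0
    ([1, 1, 0, 0], [0.9, 0.2, 0.3, 0.1]),  # one positive below one negative -> 0.75
    ([1, 0, 1, 0], [0.7, 0.7, 0.6, 0.1]),  # one tie and one loss -> 0.625
]
print(summarize_runs(runs))
```

The quadratic pairwise loop keeps the definition explicit; for large test sets a rank-based computation gives the same value faster.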

Briefings in Bioinformatics, 25(6). PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11500453/pdf/
Citations: 0
Unlocking cross-modal interplay of single-cell joint profiling with CellMATE.
IF 6.8 | CAS Zone 2, Biology | Q1 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2024-09-23 | DOI: 10.1093/bib/bbae582
Qi Wang, Bolei Zhang, Yue Guo, Luyu Gong, Erguang Li, Jingping Yang

A key advantage of single-cell multimodal joint profiling is the interplay between modalities, which is essential for deciphering cell fate. However, while current analytical methods can leverage the additive benefits, they fall short of exploring the synergistic insights of joint profiling, thereby diminishing its advantage. Here, we introduce CellMATE, a Multi-head Adversarial Training-based Early-integration approach developed specifically for multimodal joint profiling. CellMATE captures both the additive and the synergistic benefits inherent in joint profiling by automatically learning multimodal distributions, and simultaneously represents all features in a unified latent space. In extensive evaluations across diverse joint profiling scenarios, CellMATE demonstrated its superiority in ensuring the utility of cross-modal properties, uncovering cellular heterogeneity and plasticity, and delineating differentiation trajectories. CellMATE uniquely unlocks the full potential of joint profiling to elucidate the dynamic nature of cells during critical processes such as differentiation, development, and disease.
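
"Early integration" means fusing modalities at the feature level, before any model is trained. A minimal Python sketch of that idea only, assuming per-feature z-score standardization within each modality followed by concatenation of each cell's vectors (the RNA/ATAC naming and the data are hypothetical). CellMATE's multi-head adversarial training, which learns the unified latent space from such input, is not reproduced here.

```python
from statistics import mean, pstdev

def zscore_columns(matrix):
    """Standardize each feature (column) to mean 0 and population SD 1.
    Constant columns are mapped to 0.0."""
    stats = [(mean(col), pstdev(col)) for col in zip(*matrix)]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, stats)]
        for row in matrix
    ]

def early_integrate(*modalities):
    """Concatenate per-cell feature vectors from several modalities.

    Each modality is a list of rows (one row per cell, same cell order in every
    modality). Modalities are standardized separately first, so a modality with
    large raw values does not dominate the joint vector by scale alone."""
    n_cells = len(modalities[0])
    if any(len(m) != n_cells for m in modalities):
        raise ValueError("all modalities must profile the same cells in the same order")
    scaled = [zscore_columns(m) for m in modalities]
    return [sum((m[i] for m in scaled), []) for i in range(n_cells)]

# Hypothetical toy data: 3 cells, 2 RNA features and 1 ATAC feature.
rna = [[10.0, 0.0], [20.0, 5.0], [30.0, 10.0]]
atac = [[1.0], [1.0], [4.0]]
joint = early_integrate(rna, atac)
print(joint)
```

In a real pipeline the joint matrix would then be fed to an encoder; concatenation is the simplest early-integration choice, not the only one.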

Briefings in Bioinformatics, 25(6).
Citations: 0
Molecular group and correlation guided structural learning for multi-phenotype prediction.
IF 6.8 | CAS Zone 2, Biology | Q1 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2024-09-23 | DOI: 10.1093/bib/bbae585
Xueping Zhou, Manqi Cai, Molin Yue, Juan C Celedón, Jiebiao Wang, Ying Ding, Wei Chen, Yanming Li

We propose a supervised learning bioinformatics tool, Biological gRoup guIded muLtivariate muLtiple lIneAr regression with peNalizaTion (Brilliant), designed for feature selection and outcome prediction in genomic data with multi-phenotypic responses. Under a penalized multi-response linear regression model, Brilliant incorporates genome and/or phenotype grouping structures, as well as phenotype correlation structures, into feature selection, effect estimation, and outcome prediction. Extensive simulations demonstrate its superior performance compared with competing methods. We applied Brilliant to two omics studies. In the first, we identified novel association signals between multivariate gene expression and high-dimensional DNA methylation profiles, providing biological insights into baseline CpG-to-gene regulation patterns in a Puerto Rican childhood asthma cohort. The second study focused on cell-type deconvolution using high-dimensional gene expression profiles. Using Brilliant, we improved the accuracy of cell-type fraction prediction and identified novel cell-type signature genes.
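
Grouping structures are usually enforced in penalized regression through group-lasso-type terms, which shrink or remove whole groups of coefficients at once. A minimal Python sketch of the group soft-thresholding (proximal) step used by proximal-gradient solvers for a plain group-lasso penalty λ Σ_g ||β_g||₂. This is an assumption chosen to show the general mechanism; Brilliant's actual penalty also encodes phenotype correlation structure and is not reproduced here. The CpG and gene names are hypothetical.

```python
import math

def group_soft_threshold(coefs, groups, lam):
    """Proximal operator of lam * sum_g ||beta_g||_2.

    coefs:  dict feature -> coefficient (e.g. after a gradient step)
    groups: dict group name -> list of features in that group
    Each group is scaled by max(0, 1 - lam / ||beta_g||); a group whose
    Euclidean norm is at most lam becomes exactly zero, so the whole group is
    deselected together. Features outside every group are left unchanged."""
    out = dict(coefs)
    for members in groups.values():
        norm = math.sqrt(sum(coefs[f] ** 2 for f in members))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for f in members:
            out[f] = scale * coefs[f]
    return out

# Hypothetical example: CpG coefficients grouped by the gene they map to.
coefs = {"cpg1": 3.0, "cpg2": 4.0, "cpg3": 0.3, "cpg4": -0.4}
groups = {"geneA": ["cpg1", "cpg2"], "geneB": ["cpg3", "cpg4"]}
print(group_soft_threshold(coefs, groups, lam=1.0))
```

Here geneA (norm 5) is shrunk by a factor of 0.8, while geneB (norm 0.5, below λ = 1) is removed entirely, which is how group penalties turn prior grouping into joint feature selection.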

Briefings in Bioinformatics, 25(6). PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11562839/pdf/
Citations: 0
Adversarial regularized autoencoder graph neural network for microbe-disease associations prediction.
IF 6.8 | CAS Zone 2, Biology | Q1 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2024-09-23 | DOI: 10.1093/bib/bbae584
Limuxuan He, Quan Zou, Qi Dai, Shuang Cheng, Yansu Wang

Background: Microorganisms inhabit various regions of the human body and significantly contribute to numerous diseases. Predicting the associations between microbes and diseases is crucial for understanding pathogenic mechanisms and informing prevention and treatment strategies. Biological experiments to determine these associations are time-consuming and costly. Therefore, integrating deep learning with biological networks can efficiently identify potential microbe-disease associations on a large scale.

Methods: We propose an adversarial regularized autoencoder graph neural network algorithm, named Stacked Adversarial Regularization for Microbe-Disease Associations Prediction (SARMDA), for predicting associations between microbes and diseases. First, we integrate topological structural similarity and functional similarity metrics of microbes and diseases to construct a heterogeneous network. Then, utilizing an autoencoder based on GraphSAGE, we learn both the topological and attribute representations of nodes within the constructed network. Finally, we introduce an adversarial regularized autoencoder graph neural network embedding model to address the inherent limitations of traditional GraphSAGE autoencoders in capturing global information.

Results: Under five-fold cross-validation on microbe-disease pairs, SARMDA was compared with eight advanced methods on the Human Microbe-Disease Association Database (HMDAD) and Disbiome databases. On HMDAD, SARMDA achieved a best area under the ROC curve (AUC) of 0.9891 ± 0.0057 and a best area under the precision-recall curve (AUPR) of 0.9902 ± 0.0128. On the Disbiome dataset, the AUC was 0.9328 ± 0.0072 and the best AUPR was 0.9233 ± 0.0089, outperforming the other eight microbe-disease association (MDA) prediction methods. Furthermore, the effectiveness of our model was demonstrated through a detailed analysis of asthma and inflammatory bowel disease cases.
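
The Methods paragraph builds node representations with a GraphSAGE-based autoencoder. A minimal Python sketch of one GraphSAGE layer with the mean aggregator (concatenate a node's own features with the mean of its neighbours' features, apply a linear map and ReLU, then L2-normalize), plus an inner-product decoder that scores a microbe-disease pair, as is common in graph autoencoders. The toy graph, node names, and fixed weight matrix are hypothetical; in practice the weights are learned, and SARMDA's adversarial regularizer and similarity-based network construction are not shown.

```python
import math

def matvec(weight, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]

def sage_mean_layer(features, neighbors, weight):
    """One GraphSAGE layer with the mean aggregator.

    features:  node -> feature vector of length d
    neighbors: node -> list of adjacent nodes
    weight:    matrix with 2*d columns, applied to [h_v || mean of neighbour h_u]
    Outputs pass through ReLU and are L2-normalized."""
    d = len(next(iter(features.values())))
    out = {}
    for v, h_v in features.items():
        nbrs = neighbors.get(v, [])
        if nbrs:
            agg = [sum(features[u][k] for u in nbrs) / len(nbrs) for k in range(d)]
        else:
            agg = [0.0] * d  # isolated node: no neighbourhood signal
        z = [max(0.0, s) for s in matvec(weight, h_v + agg)]  # ReLU
        norm = math.sqrt(sum(s * s for s in z))
        out[v] = [s / norm for s in z] if norm > 0 else z
    return out

def pair_score(emb, a, b):
    """Inner-product decoder: sigmoid(z_a . z_b) as an association score."""
    dot = sum(x * y for x, y in zip(emb[a], emb[b]))
    return 1.0 / (1.0 + math.exp(-dot))

# Hypothetical toy heterogeneous graph: microbes m1, m2 linked to disease d1.
features = {"m1": [1.0, 0.0], "m2": [0.0, 1.0], "d1": [1.0, 1.0]}
neighbors = {"m1": ["d1"], "m2": ["d1"], "d1": ["m1", "m2"]}
weight = [[1.0, 0.0, 1.0, 0.0],
          [0.0, 1.0, 0.0, 1.0]]  # adds self and neighbour-mean value per dimension
emb = sage_mean_layer(features, neighbors, weight)
print(emb["d1"], pair_score(emb, "m1", "d1"))
```

Because each node only aggregates its direct neighbourhood, stacking such layers is what extends the receptive field; the abstract's motivation for the adversarial component is that local aggregation alone misses global structure.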

Briefings in Bioinformatics, 25(6). PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11554402/pdf/
Citations: 0
Long-range alternative splicing contributes to neoantigen specificity in glioblastoma.
IF 6.8 | CAS Zone 2, Biology | Q1 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2024-09-23 | DOI: 10.1093/bib/bbae503
Mingjun Ji, Qing Yu, Xin-Zhuang Yang, Xianhong Yu, Jiaxin Wang, Chunfu Xiao, Ni A An, Chuanhui Han, Chuan-Yun Li, Wanqiu Ding

Recent advances in neoantigen research have accelerated the development of immunotherapies for cancers such as glioblastoma (GBM). Neoantigens resulting from genomic mutations and dysregulated alternative splicing have been studied in GBM. However, these studies have focused primarily on annotated alternatively-spliced transcripts, leaving non-annotated transcripts largely unexplored. Circular ribonucleic acids (circRNAs), which are abnormally regulated in tumors, are correlated with the presence of non-annotated linear transcripts with exon-skipping events. However, the extent to which these linear transcripts truly exist, and their functions in cancer immunotherapy, remain unknown. Here, we found that circRNA biogenesis and alternative splicing co-occur ubiquitously across various tumor types, producing large numbers of long-range alternatively-spliced transcripts (LRs). By comparing tumor and healthy tissues, we found tumor-specific LRs to be more abundant in GBM than in normal tissues and other tumor types. This may be attributable to the upregulation in GBM of the protein Quaking, which is reported to promote circRNA biogenesis. In total, we identified 1057 specific and recurrent LRs in GBM. Through in silico translation prediction and mass spectrometry (MS)-based immunopeptidome analysis, 16 major histocompatibility complex class I-associated peptides were identified as potential immunotherapy targets in GBM. This study revealed that long-range alternatively-spliced transcripts specifically upregulated in GBM may serve as recurrent, immunogenic tumor-specific antigens.
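
The in silico translation step can be made concrete. A minimal Python sketch, assuming the junction sequence is already in frame: translate it with the standard genetic code and keep only k-mers that overlap the novel splice junction, since peptides lying entirely within one exon would also be produced by the canonical transcript. The sequence and junction position are hypothetical, and real pipelines additionally handle reading frames, HLA binding prediction, and MS validation, none of which is shown.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(nt):
    """Translate an in-frame DNA sequence, stopping at the first stop codon."""
    nt = nt.upper()
    protein = []
    for i in range(0, len(nt) - 2, 3):
        aa = CODON_TABLE[nt[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

def junction_peptides(nt, junction_nt, k=9):
    """Return k-mer peptides that span a splice junction.

    junction_nt is the 0-based index of the first nucleotide of the downstream
    exon (must be >= 1). A peptide spans the junction if it contains both the last
    residue encoded (at least partly) by the upstream exon and the first residue
    encoded (at least partly) by the downstream exon; a chimeric codon counts as both."""
    protein = translate(nt)
    last_up = (junction_nt - 1) // 3
    first_down = junction_nt // 3
    return [
        protein[s:s + k]
        for s in range(len(protein) - k + 1)
        if s <= last_up and s + k - 1 >= first_down
    ]

# Hypothetical junction: upstream exon ends with ATGGCT, downstream exon starts AAACGTTGG.
print(junction_peptides("ATGGCTAAACGTTGG", junction_nt=6, k=3))  # -> ['MAK', 'AKR']
```

The default k = 9 reflects that MHC class I ligands are typically 8 to 11 residues long; scanning several values of k is a straightforward extension.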

Briefings in Bioinformatics, 25(6). PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11472750/pdf/
Citations: 0
HIP: a method for high-dimensional multi-view data integration and prediction accounting for subgroup heterogeneity.
IF 6.8 | CAS Zone 2, Biology | Q1 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2024-09-23 | DOI: 10.1093/bib/bbae470
Jessica Butts, Leif Verace, Christine Wendt, Russel P Bowler, Craig P Hersh, Qi Long, Lynn Eberly, Sandra E Safo

Epidemiologic and genetic studies of many complex diseases suggest subgroup disparities (e.g. by sex or race) in disease course and patient outcomes. We consider this from the standpoint of integrative analysis, in which information from different views (e.g. genomics, proteomics, clinical data) is combined. Existing integrative analysis methods ignore subgroup heterogeneity, while stacking the views and accounting for subgroup heterogeneity does not model the associations among the views. We propose Heterogeneity in Integration and Prediction (HIP), a statistical approach for joint association and prediction that leverages the strengths of each view to identify molecular signatures that are shared across subgroups and specific to a subgroup. We apply HIP to proteomics and gene expression data pertaining to chronic obstructive pulmonary disease (COPD) to identify proteins and genes, shared by and unique to males and females, that contribute to variation in COPD as measured by airway wall thickness. Our COPD analysis identified proteins, genes, and pathways that are common to, and specific to, males and females; some of these are implicated in COPD, while others could lead to new insights into sex differences in COPD mechanisms. HIP accounts for subgroup heterogeneity in multi-view data, ranks variables by importance, is applicable to univariate or multivariate continuous outcomes, and incorporates covariate adjustment. With efficient algorithms implemented in PyTorch, this method has many potential scientific applications and could enhance multiomics research on health disparities. HIP is available at https://github.com/lasandrall/HIP, with a video tutorial at https://youtu.be/O6E2OLmeMDo and, for users with limited programming experience, a Shiny application at https://multi-viewlearn.shinyapps.io/HIP_ShinyApp/.
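
HIP ranks variables by importance and separates shared from subgroup-specific signatures. A minimal Python sketch of one common way to post-process fitted loadings into such lists, assuming a variable's importance is the Euclidean norm of its loading row and that "selected" means a norm above a small tolerance. Both rules are illustrative assumptions, not HIP's published PyTorch implementation (see https://github.com/lasandrall/HIP for that), and the variable names and numbers are hypothetical.

```python
import math

def importance(loadings):
    """Euclidean norm of each variable's loading row: variable -> score."""
    return {v: math.sqrt(sum(w * w for w in row)) for v, row in loadings.items()}

def shared_and_specific(loadings_by_group, tol=1e-8):
    """Split selected variables into shared and subgroup-specific lists.

    loadings_by_group: subgroup -> {variable -> loading row}
    A variable is selected in a subgroup when its loading norm exceeds tol.
    Shared = selected in every subgroup; specific[g] = selected only in g.
    Each list is ranked by the largest importance the variable reaches in any subgroup."""
    scores = {g: importance(rows) for g, rows in loadings_by_group.items()}
    selected = {g: {v for v, s in sc.items() if s > tol} for g, sc in scores.items()}

    def ranked(variables):
        best = {v: max(sc.get(v, 0.0) for sc in scores.values()) for v in variables}
        return sorted(variables, key=lambda v: (-best[v], v))

    shared = ranked(set.intersection(*selected.values()))
    specific = {}
    for g, sel in selected.items():
        others = set().union(*(s for h, s in selected.items() if h != g))
        specific[g] = ranked(sel - others)
    return shared, specific

# Hypothetical loadings (two latent components) for two subgroups.
loadings = {
    "male":   {"protein_A": [0.8, 0.1], "gene_B": [0.0, 0.0], "gene_C": [0.3, 0.4]},
    "female": {"protein_A": [0.5, 0.0], "gene_B": [0.2, 0.0], "gene_C": [0.0, 0.0]},
}
shared, specific = shared_and_specific(loadings)
print(shared, specific)
```

A zero loading row is what a sparsity-inducing penalty produces for a deselected variable, which is why a norm threshold is a natural selection rule in this kind of sketch.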

Briefings in Bioinformatics, 25(6). PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11440091/pdf/
Citations: 0
miniSNV: accurate and fast single nucleotide variant calling from nanopore sequencing data.
IF 6.8 | CAS Zone 2, Biology | Q1 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2024-09-23 | DOI: 10.1093/bib/bbae473
Miao Cui, Yadong Liu, Xian Yu, Hongzhe Guo, Tao Jiang, Yadong Wang, Bo Liu

Nanopore sequencing technology offers longer read lengths and can potentially address the limitations of short-read sequencing, including in long-range haplotype phasing and accurate variant calling. However, state-of-the-art approaches still leave room for improvement in single nucleotide variant (SNV) identification performance and computing resource usage. In this work, we introduce miniSNV, a lightweight SNV calling algorithm that simultaneously achieves high performance and yield. miniSNV uses known common variants in populations as a variation background and leverages read pileup, read-based phasing, and consensus generation to identify and genotype SNVs from Oxford Nanopore Technologies (ONT) long reads. Benchmarks on real and simulated ONT data under various error profiles demonstrate that miniSNV has superior sensitivity and comparable accuracy in SNV detection, and that it runs faster, scales better, and uses less memory than most state-of-the-art variant callers. miniSNV is available from https://github.com/CuiMiao-HIT/miniSNV.
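
Read pileup is the basic step behind SNV identification and genotyping. A minimal Python sketch of pileup-based genotyping at one reference position, assuming simple allele-fraction cut-offs (0.2 for heterozygous, 0.8 for homozygous alternate) and a minimum depth of 5. These thresholds are hypothetical illustration values; miniSNV's actual model, which also uses known population variants, read-based phasing, and consensus generation, is not reproduced here.

```python
from collections import Counter

VALID_BASES = {"A", "C", "G", "T"}

def call_snv(ref_base, pileup_bases, min_depth=5, het_af=0.2, hom_af=0.8):
    """Genotype one reference position from the bases observed in overlapping reads.

    Returns (alt_base, allele_fraction, genotype) with genotype in
    {"0/0", "0/1", "1/1"}, or None when depth is below min_depth.
    Non-ACGT symbols (e.g. deletions, N) are ignored."""
    bases = [b.upper() for b in pileup_bases if b.upper() in VALID_BASES]
    depth = len(bases)
    if depth < min_depth:
        return None
    alt_counts = Counter(b for b in bases if b != ref_base.upper())
    if not alt_counts:
        return (None, 0.0, "0/0")
    alt, alt_count = alt_counts.most_common(1)[0]  # most frequent non-reference base
    af = alt_count / depth
    if af >= hom_af:
        genotype = "1/1"
    elif af >= het_af:
        genotype = "0/1"
    else:
        genotype = "0/0"
    return (alt, af, genotype)

# Hypothetical pileups at a position whose reference base is A.
print(call_snv("A", "AAAAAGGGGG"))  # balanced ref/alt support
print(call_snv("A", "GGGGGGGGGA"))  # mostly alternate
```

Fixed allele-fraction cut-offs are fragile with nanopore error profiles, which is the kind of gap that phasing and consensus steps such as those described above are meant to close.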

Briefings in Bioinformatics, 25(6). PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11428505/pdf/
Citations: 0
A two-task predictor for discovering phase separation proteins and their undergoing mechanism.
IF 6.8 CAS Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae528
Yetong Zhou, Shengming Zhou, Yue Bi, Quan Zou, Cangzhi Jia

Liquid-liquid phase separation (LLPS) is one of the mechanisms mediating the compartmentalization of macromolecules (proteins and nucleic acids) in cells, forming biomolecular condensates or membraneless organelles. Consequently, the systematic identification of potential LLPS proteins is crucial for understanding the phase separation process and its biological mechanisms. A two-task predictor, Opt_PredLLPS, was developed to discover potential phase separation proteins and further evaluate their mechanism. The first task model of Opt_PredLLPS combines a convolutional neural network (CNN) and a bidirectional long short-term memory (BiLSTM) network through a fully connected layer; the CNN takes evolutionary information features as input, and the BiLSTM takes multimodal features as input. If a protein is predicted to be an LLPS protein, it is passed to the second task model, which predicts whether the protein must interact with partners to undergo LLPS. The second task model uses the XGBoost classification algorithm with 37 physicochemical properties chosen through a three-step feature selection. The effectiveness of the model was validated on multiple benchmark datasets, and in silico saturation mutagenesis was used to identify regions that play a key role in phase separation. These findings may assist future research on the LLPS mechanism and the discovery of potential phase separation proteins.
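The two-stage cascade in this abstract (only proteins called LLPS by the first model reach the second) and the in silico saturation mutagenesis scan can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not Opt_PredLLPS itself: the abstract gives no interface, so `llps_score`, `partner_score`, `score`, the 0.5 threshold and the toy scorer are hypothetical stand-ins for the trained CNN-BiLSTM and XGBoost models.

```python
from typing import Callable, Dict, List, Optional, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def two_task_predict(
    seq: str,
    llps_score: Callable[[str], float],     # hypothetical task-1 model: P(LLPS)
    partner_score: Callable[[str], float],  # hypothetical task-2 model: P(partner-dependent)
    threshold: float = 0.5,                 # assumed decision threshold
) -> Dict[str, Optional[bool]]:
    """Cascade: task 2 is only evaluated for proteins that task 1 calls LLPS."""
    is_llps = llps_score(seq) >= threshold
    needs_partner = (partner_score(seq) >= threshold) if is_llps else None
    return {"llps": is_llps, "partner_dependent": needs_partner}


def saturation_mutagenesis(
    seq: str, score: Callable[[str], float]
) -> List[Tuple[int, str, str, float]]:
    """Score every single-residue substitution; return (pos, wt, mut, delta)."""
    base = score(seq)
    results = []
    for i, wt in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa == wt:
                continue  # skip the wild-type residue itself
            mutant = seq[:i] + aa + seq[i + 1:]
            results.append((i, wt, aa, score(mutant) - base))
    return results


def position_sensitivity(
    results: List[Tuple[int, str, str, float]], length: int
) -> List[float]:
    """Mean absolute score change per position; high values flag key regions."""
    totals = [0.0] * length
    counts = [0] * length
    for pos, _wt, _mut, delta in results:
        totals[pos] += abs(delta)
        counts[pos] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]
```

For example, with a toy scorer `toy = lambda s: (s.count("Y") + s.count("G")) / len(s)`, `position_sensitivity(saturation_mutagenesis("GYA", toy), 3)` ranks positions 0 and 1 above position 2, because most substitutions at the G and Y sites change the toy score while most at the A site do not.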

Citations: 0
AptaDiff: de novo design and optimization of aptamers based on diffusion models.
IF 6.8 CAS Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae517
Zhen Wang, Ziqi Liu, Wei Zhang, Yanjun Li, Yizhen Feng, Shaokang Lv, Han Diao, Zhaofeng Luo, Pengju Yan, Min He, Xiaolin Li

Aptamers are single-stranded nucleic acid ligands with high affinity and specificity for target molecules. Traditionally, they are identified from large DNA/RNA libraries using in vitro methods such as Systematic Evolution of Ligands by Exponential Enrichment (SELEX). However, these libraries capture only a small fraction of the theoretical sequence space, and the pool of aptamer candidates is constrained by the sequencing capacity of the experiment. To address this, we propose AptaDiff, the first in silico aptamer design and optimization method based on a diffusion model. AptaDiff can generate aptamers beyond the constraints of high-throughput sequencing data by leveraging motif-dependent latent embeddings from a variational autoencoder, and it can optimize aptamers through affinity-guided generation driven by Bayesian optimization. Comparative evaluations on four high-throughput screening datasets targeting distinct proteins showed that AptaDiff outperforms existing aptamer generation methods in terms of quality and fidelity. Moreover, surface plasmon resonance experiments were conducted to validate the binding affinity of aptamers generated through Bayesian optimization for two target proteins. The results showed significant increases of 87.9% and 60.2% in RU (response unit) values, along with 3.6-fold and 2.4-fold decreases in KD for the respective target proteins. Notably, the optimized aptamers showed higher binding affinity than the top experimental candidates selected through SELEX, underscoring the promise of AptaDiff for accelerating the discovery of superior aptamers.
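The abstract does not describe AptaDiff's noise schedule or network. As a generic illustration only, a standard DDPM-style forward (noising) process applied to a latent vector, such as a VAE embedding of a sequence, can be sketched as below. The linear β schedule (1e-4 to 0.02 over 1,000 steps) is the common DDPM default and an assumption here, as are the function names.

```python
import math
import random
from typing import List, Optional


def linear_alpha_bars(
    T: int, beta_start: float = 1e-4, beta_end: float = 0.02
) -> List[float]:
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / max(T - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(
    z0: List[float],
    t: int,
    alpha_bars: List[float],
    noise: Optional[List[float]] = None,
) -> List[float]:
    """Forward diffusion: z_t = sqrt(alpha_bar_t) * z0 + sqrt(1 - alpha_bar_t) * eps."""
    ab = alpha_bars[t]
    if noise is None:
        noise = [random.gauss(0.0, 1.0) for _ in z0]  # eps ~ N(0, I)
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(z0, noise)]
```

In a typical latent diffusion setup, a denoising network is trained to predict `noise` from `z_t` and `t`; generation then runs the learned reverse process from pure Gaussian noise and decodes the final latent back to a sequence. Neither the network nor the decoder is shown here.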

Citations: 0
Current approaches and outstanding challenges of functional annotation of metabolites: a comprehensive review.
IF 6.8 CAS Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae498
Quang-Huy Nguyen, Ha Nguyen, Edwin C Oh, Tin Nguyen

Metabolite profiling is a powerful approach for the clinical diagnosis of complex diseases, ranging from cardiometabolic diseases, cancer, and cognitive disorders to respiratory pathologies and other conditions involving dysregulated metabolism. Because systems-level interpretation is important, many methods have been developed to identify biologically significant pathways from metabolomics data. In this review, we first describe a complete metabolomics workflow (sample preparation, data acquisition, pre-processing, downstream analysis, etc.). We then comprehensively review 24 approaches capable of performing functional analysis, including those that combine metabolomics data with other data types to investigate disease-relevant changes across multiple omics layers. We discuss their availability, implementation, capability for pre-processing and quality control, supported omics types, embedded databases, pathway analysis methodologies, and integration techniques. We also rate and evaluate each software package, focusing on its key technique, accessibility, documentation, and user-friendliness. Using our guidelines, life scientists can easily choose a suitable method based on method rating, available data, input format, and method category. More importantly, we highlight outstanding challenges and potential solutions for future research. To further assist users in running the reviewed methods, we provide wrappers for the software packages at https://github.com/tinnlab/metabolite-pathway-review-docker.
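Over-representation analysis (ORA) is one widely used family of pathway analysis methods: it asks whether a pathway contains more significant metabolites than expected by chance, using a one-sided hypergeometric test. The sketch below is a generic, minimal illustration, not the implementation of any tool covered by the review; the function names and toy inputs are hypothetical.

```python
from math import comb
from typing import Dict, Iterable


def ora_pvalue(N: int, K: int, n: int, k: int) -> float:
    """Upper-tail hypergeometric probability P(X >= k).

    N: metabolites in the background (e.g., all measured metabolites)
    K: background metabolites annotated to the pathway
    n: metabolites flagged as significant
    k: significant metabolites that fall in the pathway
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    upper = min(K, n)
    # math.comb returns 0 when the second argument exceeds the first
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def ora(
    background: Iterable[str],
    significant: Iterable[str],
    pathways: Dict[str, Iterable[str]],
) -> Dict[str, float]:
    """Return {pathway: p-value} for pathways containing at least one significant metabolite."""
    bg = set(background)
    sig = set(significant) & bg  # restrict everything to the background set
    out = {}
    for name, members in pathways.items():
        in_bg = set(members) & bg
        k = len(in_bg & sig)
        if k:
            out[name] = ora_pvalue(len(bg), len(in_bg), len(sig), k)
    return out
```

In practice the resulting p-values are adjusted for multiple testing across pathways (for example with the Benjamini-Hochberg procedure), and the choice of background set strongly affects the results.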

Citations: 0
Copyright © 2023 Book学术 All rights reserved.