
Briefings in Bioinformatics: Latest Articles

A comprehensive benchmark study of methods for identifying significantly perturbed subnetworks in cancer.
IF 6.8 | Zone 2, Biology | Q1 Biochemical Research Methods | Pub Date: 2024-11-22 | DOI: 10.1093/bib/bbae692
Le Yang, Runpu Chen, Steve Goodison, Yijun Sun

Network-based methods utilize protein-protein interaction information to identify significantly perturbed subnetworks in cancer and to propose key molecular pathways. Numerous methods have been developed, but to date, a rigorous benchmark analysis to compare the performance of existing approaches is lacking. In this paper, we proposed a novel benchmarking framework using synthetic data and conducted a comprehensive analysis to investigate the ability of existing methods to detect target genes and subnetworks and to control false positives, and how they perform in the presence of topological biases at both gene and subnetwork levels. Our analysis revealed insights into algorithmic performance that were previously unattainable. Based on the results of the benchmark study, we presented a practical guide for users on how to select appropriate detection methods and protein-protein interaction networks for cancer pathway identification, and provided suggestions for future algorithm development.

Citations: 0
CDCM: a correlation-dependent connectivity map approach to rapidly screen drugs during outbreaks of infectious diseases.
IF 6.8 | Zone 2, Biology | Q1 Biochemical Research Methods | Pub Date: 2024-11-22 | DOI: 10.1093/bib/bbae659
Junlei Liao, Hongyang Yi, Hao Wang, Sumei Yang, Duanmei Jiang, Xin Huang, Mingxia Zhang, Jiayin Shen, Hongzhou Lu, Yuanling Niu

In the context of the global damage caused by coronavirus disease 2019 (COVID-19) and the emergence of the monkeypox virus (MPXV) outbreak as a public health emergency of international concern, research into methods that can rapidly test potential therapeutics during an outbreak of a new infectious disease is urgently needed. Computational drug discovery is an effective way to solve such problems. The existence of various large open databases has mitigated the time and resource consumption of traditional drug development and improved the speed of drug discovery. However, the diversity of cell lines used in various databases remains limited, and previous drug discovery methods are ineffective for cross-cell prediction. In this study, we propose a correlation-dependent connectivity map (CDCM) to achieve cross-cell predictions of drug similarity. The CDCM mainly identifies drug-drug or disease-drug relationships from the perspective of gene networks by exploring the correlation changes between genes and identifying similarities in the effects of drugs or diseases on gene expression. We validated the CDCM on multiple datasets and found that it performed well for drug identification across cell lines. A comparison with the Connectivity Map revealed that our method was more stable and performed better across different cell lines. In the application of the CDCM to COVID-19 and MPXV data, the predictions of potential therapeutic compounds for COVID-19 were consistent with several previous studies, and most of the predicted drugs were found to be experimentally effective against MPXV. This result confirms the practical value of the CDCM. With the ability to predict across cell lines, the CDCM outperforms the Connectivity Map, and it has wider application prospects and a reduced cost of use.

Citations: 0
UPicker: a semi-supervised particle picking transformer method for cryo-EM micrographs.
IF 6.8 | Zone 2, Biology | Q1 Biochemical Research Methods | Pub Date: 2024-11-22 | DOI: 10.1093/bib/bbae636
Chi Zhang, Yiran Cheng, Kaiwen Feng, Fa Zhang, Renmin Han, Jieqing Feng

Automatic single particle picking is a critical step in the data processing pipeline of cryo-electron microscopy structure reconstruction. In recent years, several deep learning-based algorithms have been developed, demonstrating their potential to solve this challenge. However, current methods depend heavily on manually labeled training data, which is labor-intensive to produce and prone to bias, especially for high-noise and low-contrast micrographs, resulting in suboptimal precision and recall. To address these problems, we propose UPicker, a semi-supervised transformer-based particle-picking method with a two-stage training process: unsupervised pretraining and supervised fine-tuning. During the unsupervised pretraining, an Adaptive Laplacian of Gaussian region proposal generator obtains pseudo-labels from unlabeled data for initial feature learning. For the supervised fine-tuning, UPicker needs only a small amount of labeled data to achieve high accuracy in particle picking. To further enhance model performance, UPicker employs a contrastive denoising training strategy to reduce redundant detections and accelerate convergence, along with a hybrid data augmentation strategy to deal with limited labeled data. Comprehensive experiments on both simulated and experimental datasets demonstrate that UPicker outperforms state-of-the-art particle-picking methods in terms of accuracy and robustness while requiring less labeled data than other transformer-based models. Furthermore, ablation studies demonstrate the effectiveness and necessity of each component of UPicker. The source code and data are available at https://github.com/JachyLikeCoding/UPicker.
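For readers unfamiliar with Laplacian of Gaussian (LoG) filtering, the following is a minimal sketch of the plain, non-adaptive filter applied to a synthetic image. It is not UPicker's Adaptive LoG proposal generator, whose details the abstract does not give; the function names, the kernel scale and the toy image are all assumptions made for illustration.

```python
import math


def log_kernel(sigma, radius):
    """Build a (2*radius+1)^2 Laplacian-of-Gaussian kernel, up to a constant scale factor."""
    kernel = []
    for y in range(-radius, radius + 1):
        row = []
        for x in range(-radius, radius + 1):
            r2 = x * x + y * y
            row.append((r2 - 2 * sigma ** 2) / sigma ** 4 * math.exp(-r2 / (2 * sigma ** 2)))
        kernel.append(row)
    return kernel


def blob_response(image, kernel):
    """Filter the image with the kernel (zero padding) and negate, so bright blobs give positive peaks."""
    h, w = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    y, x = i + dy, j + dx
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[dy + r][dx + r]
            out[i][j] = -acc
    return out


def pick_peak(response):
    """Return the (row, col) of the strongest response."""
    _, i, j = max((v, i, j) for i, row in enumerate(response) for j, v in enumerate(row))
    return i, j


# Toy "micrograph": one bright Gaussian blob centred at row 10, column 12.
image = [[math.exp(-((y - 10) ** 2 + (x - 12) ** 2) / (2 * 2.0 ** 2)) for x in range(21)]
         for y in range(21)]
response = blob_response(image, log_kernel(sigma=2.0, radius=6))
print(pick_peak(response))  # (10, 12)
```

In a real micrograph one would keep every local maximum above a threshold as a candidate region rather than a single global peak, and the filter scale would need to match the particle size.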

Citations: 0
BICEP: Bayesian inference for rare genomic variant causality evaluation in pedigrees.
IF 6.8 | Zone 2, Biology | Q1 Biochemical Research Methods | Pub Date: 2024-11-22 | DOI: 10.1093/bib/bbae624
Cathal Ormond, Niamh M Ryan, Mathieu Cap, William Byerley, Aiden Corvin, Elizabeth A Heron

Next-generation sequencing is widely applied to the investigation of pedigree data for gene discovery. However, identifying plausible disease-causing variants within a robust statistical framework is challenging. Here, we introduce BICEP: a Bayesian inference tool for rare variant causality evaluation in pedigree-based cohorts. BICEP calculates the posterior odds that a genomic variant is causal for a phenotype based on the variant cosegregation as well as a priori evidence such as deleteriousness and functional consequence. BICEP can correctly identify causal variants for phenotypes with both Mendelian and complex genetic architectures, outperforming existing methodologies. Additionally, BICEP can correctly down-weight common variants that are unlikely to be involved in phenotypic liability in the context of a pedigree, even if they have reasonable cosegregation patterns. The output metrics from BICEP allow for the quantitative comparison of variant causality within and across pedigrees, which is not possible with existing approaches.
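The posterior odds mentioned above follow the standard odds form of Bayes' theorem. As a hedged reading of the abstract (the symbols are assumptions, not the paper's notation), let C denote the event that a variant is causal and D the observed cosegregation data in the pedigree:

```latex
\underbrace{\frac{P(C \mid D)}{P(\bar{C} \mid D)}}_{\text{posterior odds}}
  \;=\;
\underbrace{\frac{P(D \mid C)}{P(D \mid \bar{C})}}_{\text{Bayes factor (cosegregation)}}
  \;\times\;
\underbrace{\frac{P(C)}{P(\bar{C})}}_{\text{prior odds (a priori evidence)}}
```

Under this reading, a common variant with a low prior odds of causality can still receive low posterior odds even when its cosegregation pattern yields a moderate Bayes factor.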

Citations: 0
Cell-type deconvolution for bulk RNA-seq data using single-cell reference: a comparative analysis and recommendation guideline.
IF 6.8 | Zone 2, Biology | Q1 Biochemical Research Methods | Pub Date: 2024-11-22 | DOI: 10.1093/bib/bbaf031
Xintian Xu, Rui Li, Ouyang Mo, Kai Liu, Justin Li, Pei Hao

The accurate estimation of cell type proportions in tissues is crucial for various downstream analyses. With the increasing availability of single-cell sequencing data, numerous deconvolution methods that use single-cell RNA sequencing data as a reference have been developed. However, a unified understanding of how these deconvolution approaches perform in practical applications is still lacking. To address this, we systematically assessed the accuracy and robustness of nine deconvolution methods that use single-cell RNA sequencing data as a reference, evaluating them on real bulk data with cell proportions verified through flow cytometry, as well as simulated bulk data generated from five single-cell RNA sequencing datasets. Our study highlights the influence of several factors, including reference dataset construction strategies, dataset size, cell type subdivision, and cell type inconsistency, on the accuracy and robustness of deconvolution results. We also propose a set of recommended guidelines for software users in diverse scenarios.
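Reference-based deconvolution is commonly framed as a constrained linear mixing problem. The formulation below is a generic sketch of that framing, not the model of any particular method among the nine assessed; the symbols are assumptions introduced here. With b the bulk expression vector over G genes, S the G-by-K signature matrix built from the single-cell reference, and p the vector of K cell type proportions:

```latex
\hat{\mathbf{p}} \;=\; \arg\min_{\mathbf{p}} \;\lVert \mathbf{b} - S\,\mathbf{p} \rVert_2^2
\quad \text{subject to} \quad p_k \ge 0 \;\; (k = 1,\dots,K), \qquad \sum_{k=1}^{K} p_k = 1
```

The factors named in the abstract enter through S: how the reference is constructed, how many cells it contains, and how finely cell types are subdivided all change its columns, while a cell type present in the bulk sample but absent from S violates the model outright.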

Citations: 0
Challenges and opportunities of developing bioinformatics platforms in Africa: the case of BurkinaBioinfo at Joseph Ki-Zerbo University, Burkina Faso.
IF 6.8 | Zone 2, Biology | Q1 Biochemical Research Methods | Pub Date: 2024-11-22 | DOI: 10.1093/bib/bbaf040
Ezechiel B Tibiri, Palwende R Boua, Issiaka Soulama, Christine Dubreuil-Tranchant, Ndomassi Tando, Charlotte Tollenaere, Christophe Brugidou, Romaric K Nanema, Fidèle Tiendrebeogo

Bioinformatics, an interdisciplinary field combining biology and computer science, enables meaningful information to be extracted from complex biological data. The exponential growth of biological data, driven by high-throughput omics technologies and advanced sequencing methods, requires robust computational resources. Worldwide, bioinformatics skills and computational clusters are essential for managing and analysing large-scale biological datasets across health, agriculture, and environmental science, which are crucial for the African continent. In Burkina Faso, the establishment of bioinformatics infrastructure has been a gradual process. Initial training initiatives between 2015 and 2016, including bioinformatics courses and the establishment of the BurkinaBioinfo (BBi) platform, marked significant progress. Over 250 scientists have been trained at diverse levels in bioinformatics, and 105 user accounts have been created for high-performance computing access. Operational since 2019, this platform has significantly facilitated training programs for scientists and system administrators in West Africa, covering data production, introductory bioinformatics, phylogenetic analysis, and metagenomics. Financial and technical support from various sources has facilitated the rapid development of the platform to meet the growing need for bioinformatics analysis, particularly in conjunction with local 'wet labs'. Establishing a bioinformatics cluster in Burkina Faso involved identifying the needs of researchers, selecting appropriate hardware and installing the necessary bioinformatics tools. At present, the main challenges for the BBi platform include ongoing staff training in bioinformatics skills and high-level IT infrastructure management in the face of growing infrastructure demands. Despite these challenges, the establishment of a bioinformatics platform in Burkina Faso offers significant opportunities for scientific research and economic development in the country.

Citations: 0
TransBic: bucket trend-preserving biclustering for finding local and interpretable expression patterns.
IF 6.8 | Zone 2, Biology | Q1 Biochemical Research Methods | Pub Date: 2024-11-22 | DOI: 10.1093/bib/bbaf050
Jing Li, Qinglin Mei, Chaoxia Yang, Naibo Zhu, Guojun Li

Biclustering has emerged as a promising approach for analyzing high-dimensional expression data, offering unique advantages in uncovering localized co-expression patterns that traditional clustering methods often miss and thus facilitating advancements in complex disease research and other biomedical applications. However, state-of-the-art methods identify distinct patterns at the expense of losing information about specific patterns, some of which have been used to define cancer subtypes or reflect the progression of a disease or cellular processes. Additionally, these methods exhibit poor effectiveness in noisy environments. To address these limitations, we propose the bucket trend-preserving (BTP) pattern, a novel generalization of existing patterns. We have also developed an algorithm, TransBic, to extract significant biclusters of BTP-patterns. Specifically, TransBic transforms the problem into identifying common multipartite acyclic tournament subdigraphs shared by distinct subsets of acyclic tournament digraphs derived from a given expression matrix. Compared with prominent tools, TransBic demonstrates superior performance in identifying biclusters of all non-row-constant patterns, especially under noise and data fluctuations. Furthermore, TransBic successfully identifies the most disease-related pathways for type 2 diabetes (T2D), colorectal cancer, hepatocellular carcinoma, and breast cancer, outperforming other tools in this regard. Unlike previous generalizations, BTP-patterns capture specific up-regulation and down-regulation dynamics. Through targeted analysis of BTP-patterns in T2D expression data, TransBic uncovers biological processes affected by disease risk factors, extending the application of trend-preserving biclustering in expression data analysis.

Citations: 0
Polygenic prediction for underrepresented populations through transfer learning by utilizing genetic similarity shared with European populations.
IF 6.8 | Zone 2, Biology | Q1 Biochemical Research Methods | Pub Date: 2024-11-22 | DOI: 10.1093/bib/bbaf048
Yiyang Zhu, Wenying Chen, Kexuan Zhu, Yuxin Liu, Shuiping Huang, Ping Zeng

Because current genome-wide association studies are primarily conducted in individuals of European ancestry and information disparities exist among different populations, the polygenic score derived from Europeans thus exhibits poor transferability. Borrowing the idea of transfer learning, which enables the utilization of knowledge acquired from auxiliary samples to enhance learning capability in target samples, we propose transPGS, a novel polygenic score method, for genetic prediction in underrepresented populations by leveraging genetic similarity shared between the European and non-European populations while explaining the trans-ethnic difference in linkage disequilibrium (LD) and effect sizes. We demonstrate the usefulness and robustness of transPGS in elevated prediction accuracy via individual-level and summary-level simulations and apply it to seven continuous phenotypes and three diseases in the African, Chinese, and East Asian populations of the UK Biobank and Genetic Epidemiology Research Study on Adult Health and Aging cohorts. We further reveal that distinct LD and minor allele frequency patterns across ancestral groups are responsible for the dissatisfactory portability of PGS.

Citations: 0
Do protein language models learn phylogeny?
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-11-22 DOI: 10.1093/bib/bbaf047
Sanjana Tule, Gabriel Foley, Mikael Bodén

Deep machine learning demonstrates a capacity to uncover evolutionary relationships directly from protein sequences, in effect internalising notions inherent to classical phylogenetic tree inference. We connect these two paradigms by assessing the capacity of protein-based language models (pLMs) to discern phylogenetic relationships without being explicitly trained to do so. We evaluate ESM2, ProtTrans, and MSA-Transformer relative to classical phylogenetic methods, while also considering sequence insertions and deletions (indels) across 114 Pfam datasets. The largest ESM2 model tends to outperform other pLMs (including the multimodal ESM3) by recovering phylogenetic relationships among homologous protein sequences in both low- and high-gap settings. pLMs agree with conventional phylogenetic methods in general, but more so for protein families with fewer implied indels, highlighting indels as a key factor differentiating classical phylogenetics from pLMs. We find that pLMs preferentially capture broader as opposed to finer evolutionary relationships within a specific protein family, where ESM2 has a sweet spot for highly divergent sequences, at remote distance. Less than 10% of neurons are sufficient to broadly recapitulate classical phylogenetic distances; when used in isolation, the difference between the paradigms is further diminished. We show these neurons are polysemantic, shared among different homologous families but never fully overlapping. We highlight the potential of ESM2 as a complementary tool for phylogenetic analysis, especially when extending to remote homologs that are difficult to align and imply complex histories of insertions and deletions. Implementations of analyses are available at https://github.com/santule/pLMEvo.

Citations: 0
Exploring the potential of large language model-based chatbots in challenges of ribosome profiling data analysis: a review.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-11-22 DOI: 10.1093/bib/bbae641
Zheyu Ding, Rong Wei, Jianing Xia, Yonghao Mu, Jiahuan Wang, Yingying Lin

Ribosome profiling (Ribo-seq) provides transcriptome-wide insights into protein synthesis dynamics, yet its analysis poses challenges, particularly for nonbioinformatics researchers. Large language model-based chatbots offer promising solutions by leveraging natural language processing. This review explores their convergence, highlighting opportunities for synergy. We discuss challenges in Ribo-seq analysis and how chatbots mitigate them, facilitating scientific discovery. Through case studies, we illustrate chatbots' potential contributions, including data analysis and result interpretation. Despite the absence of applied examples, existing software underscores the value of chatbots and the large language model. We anticipate their pivotal role in future Ribo-seq analysis, overcoming limitations. Challenges such as model bias and data privacy require attention, but emerging trends offer promise. The integration of large language models and Ribo-seq analysis holds immense potential for advancing translational regulation and gene expression understanding.

Citations: 0