Frontiers in Bioinformatics: latest articles

Protein cleaver: an interactive web interface for in silico prediction and systematic annotation of protein digestion-derived peptides.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-09-04 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1576317
Grigorios Koulouras, Yingrong Xu

Proteolytic digestion is an essential step in mass spectrometry-based proteomics: it converts proteins into peptides and is therefore crucial for protein identification and quantification. In a typical proteomics experiment, digestion reagents are selected without prior evaluation of how well suited they are to detecting the proteins or peptides of interest, partly due to the lack of comprehensive and user-friendly predictive tools. In this work, we introduce Protein Cleaver, a web-based application that systematically assesses regions of proteins that are likely or unlikely to be identified, along with extensive sequence and structure annotation and visualization features. We showcase practical examples of Protein Cleaver's usability in drug discovery and highlight proteins that are typically difficult to detect using the most common proteolytic enzymes. We evaluate trypsin and chymotrypsin for identifying G-protein-coupled receptors and find that chymotrypsin produces significantly more identifiable peptides than trypsin. We perform a bulk digestion analysis and assess 36 proteolytic enzymes for their ability to detect most cysteine-containing peptides in the human proteome. We anticipate Protein Cleaver to be a valuable auxiliary tool for proteomics scientists.
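
To make the idea of in silico digestion concrete, the following is a minimal Python sketch, not Protein Cleaver's own implementation. It assumes two simplified textbook cleavage rules (trypsin cuts after K or R unless the next residue is P; chymotrypsin cuts after F, W, or Y unless the next residue is P). The names `CLEAVE_AFTER`, `digest`, and `cysteine_peptides` are hypothetical and chosen only for illustration.

```python
# Simplified cleavage rules (an assumption for illustration only):
# each enzyme cuts C-terminal to the listed residues unless followed by proline.
CLEAVE_AFTER = {
    "trypsin": set("KR"),
    "chymotrypsin": set("FWY"),
}


def digest(sequence, enzyme="trypsin", min_len=1):
    """Return the peptides of a full digestion with no missed cleavages."""
    residues = CLEAVE_AFTER[enzyme]
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in residues and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Keep the C-terminal remainder that ends without a cleavage site.
        peptides.append(sequence[start:])
    return [p for p in peptides if len(p) >= min_len]


def cysteine_peptides(peptides):
    """Keep only peptides that contain at least one cysteine."""
    return [p for p in peptides if "C" in p]


print(digest("AKRPGKW"))                        # ['AK', 'RPGK', 'W']
print(digest("AFYPW", enzyme="chymotrypsin"))   # ['AF', 'YPW']
```

Comparing the peptide lists produced by different enzymes for the same sequence, for instance by length or by cysteine content, is the kind of question the abstract describes; a real tool would also model missed cleavages and detectability.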

Citations: 0
Adaptive sampling methods facilitate the determination of reliable dataset sizes for evidence-based modeling.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-09-04 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1528515
Tim Breitenbach, Thomas Dandekar

How can we be sure that there is sufficient data for our model, such that the predictions remain reliable on unseen data and the conclusions drawn from the fitted model would not vary significantly when using a different sample of the same size? We answer these and related questions through a systematic approach that examines the data size and the corresponding gains in accuracy. Assuming the sample data are drawn from a data pool with no data drift, the law of large numbers ensures that a model converges to its ground truth accuracy. Our approach provides a heuristic method for investigating the speed of convergence with respect to the size of the data sample. This relationship is estimated using sampling methods, which introduces variation in the convergence speed results across different runs. To stabilize the results (so that conclusions do not depend on the run) and to extract the most reliable information about convergence speed encoded in the available data, the presented method automatically determines a sufficient number of repetitions to reduce sampling deviations below a predefined threshold, thereby ensuring the reliability of conclusions about the required amount of data.
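
The repetition rule described above can be sketched as follows. This is a hedged illustration, not the authors' algorithm: it assumes the "sampling deviation" is measured as the standard error of the mean score across repeated random subsamples, and the function name `repeat_until_stable` and its parameters (`tol`, `min_reps`, `max_reps`) are hypothetical.

```python
import random
import statistics


def repeat_until_stable(pool, size, score, tol=0.01,
                        min_reps=5, max_reps=1000, seed=0):
    """Resample `size` items from `pool` until the standard error of the
    mean score falls below `tol` (or `max_reps` is reached).

    Returns (mean score, number of repetitions used).
    """
    rng = random.Random(seed)
    scores = []
    while len(scores) < max_reps:
        sample = rng.sample(pool, size)
        scores.append(score(sample))
        if len(scores) >= min_reps:
            sem = statistics.stdev(scores) / len(scores) ** 0.5
            if sem < tol:
                break
    return statistics.mean(scores), len(scores)


# Toy stand-in for "model accuracy": the fraction of ones in the sample.
pool = [0, 1] * 500
mean_score, reps = repeat_until_stable(pool, size=100, score=statistics.mean)
print(mean_score, reps)
```

Running this for several values of `size` and plotting the stabilized mean against the sample size gives an empirical view of how quickly the score converges; in practice `score` would fit and evaluate a model rather than average the sample.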

Citations: 0
A novel linear indexing method for strings under all internal nodes in a suffix tree.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-09-04 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1577324
Anas Al-Okaily, Abdelghani Tbakhi

Suffix trees are fundamental data structures in stringology and have wide applications across various domains. In this work, we propose two linear-time algorithms for indexing strings under each internal node in a suffix tree while preserving the ability to track similarities and redundancies across different internal nodes. This is achieved through a novel tree structure derived from the suffix tree, along with new indexing concepts. The resulting indexes offer practical solutions in several areas, including DNA sequence analysis and approximate pattern matching.

Citations: 0
Editorial: Networks and graphs in biological data: current methods, opportunities and challenges.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-09-02 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1685992
Derek L Thompson, Hsiang-Yun Wu, Christopher W Bartlett, William C Ray
Citations: 0
Germline mutation profiling of breast cancer patients using a non-BRCA sequencing panel.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-09-02 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1620025
Sonar Soni Panigoro, Rafika Indah Paramita, Fadilah Fadilah, Septelia Inawati Wanandi, Aisyah Fitriannisa Prawiningrum, Linda Erlina, Wahyu Dian Utari, Ajeng Megawati Fajrin
Citations: 0
COCαDA - a fast and scalable algorithm for interatomic contact detection in proteins using Cα distance matrices.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-09-01 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1630078
Rafael Pereira Lemos, Diego Mariano, Sabrina De Azevedo Silveira, Raquel C de Melo-Minardi

Protein interatomic contacts, defined by spatial proximity and physicochemical complementarity at atomic resolution, are fundamental to characterizing molecular interactions and bonding. Methods for calculating contacts are generally categorized as cutoff-dependent, which rely on Euclidean distances, or cutoff-independent, which utilize Delaunay and Voronoi tessellations. While cutoff-dependent methods are recognized for their simplicity, completeness, and reliability, traditional implementations remain computationally expensive, posing significant scalability challenges in the current Big Data era of bioinformatics. Here, we introduce COCαDA (COntact search pruning by Cα Distance Analysis), a Python-based command-line tool for improving search pruning in large-scale interatomic protein contact analysis using alpha-carbon (Cα) distance matrices. COCαDA detects intra- and inter-chain contacts, and classifies them into seven different types: hydrogen and disulfide bonds; hydrophobic effects; attractive, repulsive, and salt-bridge interactions; and aromatic stackings. To evaluate our tool, we compared it with three traditional approaches in the literature: all-against-all atom distance calculation ("brute-force"), static Cα distance cutoff (SC), and Biopython's NeighborSearch class (NS). COCαDA demonstrated superior performance compared to the other methods, achieving on average 6x faster computation times than advanced data structures like k-d trees from NS, in addition to being simpler to implement and fully customizable. The presented tool facilitates exploratory and large-scale analyses of interatomic contacts in proteins in a simple and efficient manner, also enabling the integration of results with other tools and pipelines. The COCαDA tool is freely available at https://github.com/LBS-UFMG/COCaDA.
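
The general idea of pruning by Cα distance can be illustrated with a short Python sketch. This is not the COCαDA code: it shows the simpler static-cutoff variant, the input layout is a hypothetical one chosen for the example, and the thresholds `ca_cutoff` and `atom_cutoff` are placeholder values rather than those used by the tool.

```python
from itertools import combinations
from math import dist


def contacts_with_ca_pruning(residues, ca_cutoff=12.0, atom_cutoff=4.0):
    """Find atom pairs within `atom_cutoff`, skipping residue pairs whose
    alpha carbons are farther apart than `ca_cutoff`.

    `residues` is a list of dicts shaped like
    {"id": "A1", "atoms": {"CA": (x, y, z), "CB": (x, y, z), ...}}.
    """
    found = []
    for res_a, res_b in combinations(residues, 2):
        # Cheap residue-level test: one distance instead of all atom pairs.
        if dist(res_a["atoms"]["CA"], res_b["atoms"]["CA"]) > ca_cutoff:
            continue
        for name_a, xyz_a in res_a["atoms"].items():
            for name_b, xyz_b in res_b["atoms"].items():
                if dist(xyz_a, xyz_b) <= atom_cutoff:
                    found.append((res_a["id"], name_a, res_b["id"], name_b))
    return found


residues = [
    {"id": "A1", "atoms": {"CA": (0.0, 0.0, 0.0), "CB": (1.5, 0.0, 0.0)}},
    {"id": "A2", "atoms": {"CA": (5.0, 0.0, 0.0), "CB": (3.5, 0.0, 0.0)}},
    {"id": "A3", "atoms": {"CA": (50.0, 0.0, 0.0)}},
]
print(len(contacts_with_ca_pruning(residues)))  # 3
```

The pruning is lossless only if `ca_cutoff` is large enough to cover the longest side chains plus the atom-level cutoff; choosing that bound per residue pair instead of globally is one way such a filter can be tightened.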

Citations: 0
Advancing bioinformatics capacity through Nextflow and nf-core: lessons from an early- to mid-career researchers-focused program at The Kids Research Institute Australia.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-08-29 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1610015
Patricia Agudelo-Romero, Talya Conradie, Jose Antonio Caparros-Martin, David Jimmy Martino, Anthony Kicic, Stephen Michael Stick, Christopher Hakkaart, Abhinav Sharma

The increasing adoption of high-throughput "omics" technologies has heightened the demand for standardized, scalable, and reproducible bioinformatics workflows. Nextflow and nf-core provide a robust framework for researchers, particularly early- and mid-career researchers (EMCRs), to navigate complex data analysis. At The Kids Research Institute Australia, we implemented a structured approach to bioinformatics capacity building using these tools. This perspective presents nine practical rules derived from lessons learnt, which facilitated the successful adoption of Nextflow and nf-core, addressing implementation challenges, knowledge gaps, resource allocation, and community support. Our experience serves as a guide for institutions aiming to establish sustainable bioinformatics capabilities and empower EMCRs.

Citations: 0
Identifying novel therapeutic targets for non-alcoholic fatty liver disease using bioinformatics approaches: from drug repositioning to traditional Chinese medicine.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-08-26 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1613985
Jingmin Zhang, Tianwei Meng, Weiqi Gao, Xinghua Li, Juan Xu

Background: Non-alcoholic fatty liver disease (NAFLD) is a prevalent condition with limited effective treatments, necessitating novel therapeutic strategies. Bioinformatics offers a promising approach to identify new targets by analyzing gene expression and drug interactions.

Objective: This study aims to identify novel therapeutic targets for NAFLD through bioinformatics, focusing on drug repositioning and traditional Chinese medicine (TCM) components.

Methods: Three NAFLD-related gene expression datasets (GSE260666, GSE126848, GSE135251) were analyzed to identify differentially expressed genes. Protein-protein interaction networks were constructed using STRING and visualized with Cytoscape. Pathway enrichment analysis was performed, and drug-gene interactions were explored using the DGIdb database. TCM components were screened via the HERB database, with molecular docking conducted to assess binding affinities.

Results: Key hub genes (CXCL2, CDKN1A, TNFRSF12A, HGFAC) were identified, with significant enrichment in cell proliferation and PI3K-Akt signaling pathways. Cyclosporine emerged as a potential repurposed drug, while TCM components (curcumin, resveratrol, berberine) showed strong binding affinities to NAFLD targets.

Conclusion: Cyclosporine and TCM compounds are promising candidates for NAFLD treatment, warranting further experimental validation to confirm their therapeutic potential.

Citations: 0
Using reinforcement learning in genome assembly: in-depth analysis of a Q-learning assembler.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-08-20 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1633623
Kleber Padovani, Rafael Cabral Borges, Roberto Xavier, André Carlos Carvalho, Anna Reali, Annie Chateau, Ronnie Alves

Genome assembly remains an unsolved problem, and de novo strategies (i.e., those run without a reference) are relevant but computationally complex tasks in genomics. Although de novo assemblers have been previously successfully applied in genomic projects, there is still no "best assembler", and the choice and setup of assemblers still rely on bioinformatics experts. Thus, as with other computationally complex problems, machine learning has emerged as an alternative (or complementary) way to develop accurate, fast and autonomous assemblers. Reinforcement learning has proven promising for solving complex activities without supervision, such as games, and there is a pressing need to understand the limits of this approach to "real-life" problems, such as the DNA fragment assembly problem. In this study, we analyze the boundaries of applying machine learning via reinforcement learning (RL) for genome assembly. We expand upon the previous approach found in the literature to solve this problem by carefully exploring the learning aspects of the proposed intelligent agent, which uses the Q-learning algorithm. We improved the reward system and optimized the exploration of the state space based on pruning and in collaboration with evolutionary computing (>300% improvement). We tested the new approaches on 23 environments. Our results suggest the unsatisfactory performance of the approaches, both in terms of assembly quality and execution time, providing strong evidence for the poor scalability of the studied reinforcement learning approaches to the genome assembly problem. Finally, we discuss the existing proposal, complemented by attempts at improvement that also proved insufficient. In doing so, we contribute to the scientific community by offering a clear mapping of the limitations and challenges that should be taken into account in future attempts to apply reinforcement learning to genome assembly.
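
For readers unfamiliar with the Q-learning algorithm mentioned above, the standard tabular update can be sketched in Python as follows. This is a generic illustration, not the assembler studied in the paper: the state encoding (a tuple of reads placed so far), the use of suffix-prefix overlap length as the reward, and the names `overlap` and `q_update` are assumptions made for the example.

```python
from collections import defaultdict


def overlap(a, b):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.5, gamma=0.9):
    """Standard tabular Q-learning update:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)
state, action = ("ACGT",), "GTAA"          # reads placed so far, next read
reward = overlap("ACGT", "GTAA")            # 2
print(q_update(q, state, action, reward, state + (action,), []))  # 1.0
```

Because each state here is an ordered set of placed reads, the number of states grows factorially with the number of reads, which gives some intuition for the scalability problems the abstract reports.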

Citations: 0
Novel deep learning for multi-class classification of Alzheimer's in disability using MRI datasets.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-08-20 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1567219
Sumaiya Binte Shahid, Maleeha Kaikaus, Md Hasanul Kabir, Mohammad Abu Yousuf, A K M Azad, A S Al-Moisheer, Naif Alotaibi, Salem A Alyami, Touhid Bhuiyan, Mohammad Ali Moni

Introduction: Alzheimer's disease (AD) is one of the most common neurodegenerative disabilities and often leads to memory loss, confusion, difficulty with language and trouble with motor coordination. Although several machine learning (ML) and deep learning (DL) algorithms have been used to identify AD from MRI scans, precise classification of AD categories remains challenging because neighbouring categories share common features.

Methods: This study proposes transfer learning-based methods for extracting features from MRI scans for multi-class classification of different AD categories. Four transfer learning-based feature extractors, namely ResNet152V2, VGG16, InceptionV3, and MobileNet, were employed on two publicly available datasets (ADNI and OASIS) and a Merged dataset combining ADNI and OASIS, each having four categories: Moderate Demented (MoD), Mild Demented (MD), Very Mild Demented (VMD), and Non Demented (ND).

Results: The results suggest that the modified ResNet152V2 is the optimal feature extractor among the four transfer learning methods. Next, using the modified ResNet152V2 as a feature extractor, a convolutional neural network based model, 'IncepRes', is proposed that fuses the Inception and ResNet architectures for multi-class classification of AD categories. The proposed model achieved a standard accuracy of 96.96%, 98.35% and 97.13% on the ADNI, OASIS, and Merged datasets, respectively, outperforming other competing DL structures.

Discussion: We hope that the proposed framework can automate the precise classification of the various AD categories and thereby support prompt management and treatment of the cognitive and functional impairments associated with AD.
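As a small illustration of the accuracy figures in the Results above, the following Python sketch computes overall accuracy and per-class recall for the four categories. It assumes that "standard accuracy" means the fraction of scans whose predicted category matches the true one; the labels below are made up for illustration and are not taken from ADNI or OASIS, and the helper names are hypothetical.

```python
from collections import Counter

CLASSES = ["MoD", "MD", "VMD", "ND"]  # category abbreviations from the abstract


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """For each class present in y_true, the share of its scans predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in CLASSES if totals[c]}


# Hypothetical labels: the two errors are between the neighbouring ND and VMD classes.
y_true = ["ND", "ND", "ND", "ND", "VMD", "VMD", "MD", "MoD"]
y_pred = ["ND", "ND", "ND", "VMD", "VMD", "ND", "MD", "MoD"]

print(accuracy(y_true, y_pred))          # 0.75
print(per_class_recall(y_true, y_pred))  # {'MoD': 1.0, 'MD': 1.0, 'VMD': 0.5, 'ND': 0.75}
```

Per-class recall is shown alongside accuracy because a single overall number can hide confusion between neighbouring categories, which is the difficulty the Introduction points to.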

Citations: 0