BMC Bioinformatics: Latest Articles

SKiM-GPT: combining biomedical literature-based discovery with large language model hypothesis evaluation.
IF 3.3 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-12-17 DOI: 10.1186/s12859-025-06350-7
Jack Freeman, Robert J Millikin, Leo Xu, Ishaan Sharma, Bethany Moore, Cannon Lock, Kevin Shine George, Aviral Bal, Chitrasen Mohanty, Ron Stewart
Citations: 0
Pairwise ratio transformation of gene expression data leads to improved checkpoint response prediction in lung cancer patients.
IF 3.3 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-12-12 DOI: 10.1186/s12859-025-06332-9
Jacob Pfeil, Liqian Ma, Hin Ching Lo, Tolga Turan, R Tyler McLaughlin, Xu Shi, Severiano Villarruel, Stephen Wilson, Xi Zhao, Josue Samayoa, Kyle Halliwill
Citations: 0
SimGBS: a rapid method for simulating large-scale genotyping-by-sequencing data.
IF 3.3 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-12-09 DOI: 10.1186/s12859-025-06343-6
Jie Kang, Melanie K Hess, Ken G Dodds, Rudiger Brauning, John C McEwan, Barry J Foote, Judy F Foote, Agnieszka Konkolewska, Shannon M Clarke, Andrew S Hess
Citations: 0
Subset selection based fusion for biomedical information retrieval tasks.
IF 3.3 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-12-09 DOI: 10.1186/s12859-025-06313-y
Jiahui Sun, Shengli Wu, Xiangjun Shen, Chris Nugent, Hu Lu

To improve the effectiveness and efficiency of biomedical information retrieval, we propose three ranking-based methods for selecting an optimal subset of retrieval systems for data fusion: SFS (Sequential Forward Search), D&P (Diversity & Performance), and P&D (Performance & Diversity). These methods were applied in combination with the Reciprocal Rank Fusion (RRF) technique. Experiments were conducted on four medical datasets from TREC, using between 62 and 125 candidate retrieval systems and selecting up to 15 for fusion. The proposed subset selection methods significantly improved retrieval performance: fusing the selected systems using RRF yielded improvements ranging from 10% to over 60% compared to the best individual retrieval system across the datasets. They also outperform the state-of-the-art technology by a large margin. In summary, our subset selection approach offers a practical and cost-efficient solution for biomedical information retrieval, achieving substantial performance gains while reducing computational overhead.
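
For readers unfamiliar with Reciprocal Rank Fusion, the following is a minimal sketch of the fusion step only, not the authors' code. It assumes each retrieval system returns a ranked list of document IDs; the constant `k=60` is a commonly used default rather than a value stated in the abstract, and the run data are hypothetical. The subset selection methods (SFS, D&P, P&D) are not reproduced here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document receives 1 / (k + rank) from every list it appears in.
    k=60 is an assumed conventional default, not taken from the abstract.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical output of three selected retrieval systems.
runs = [
    ["d1", "d2", "d3"],
    ["d2", "d1", "d4"],
    ["d2", "d3", "d1"],
]
fused = reciprocal_rank_fusion(runs)
print(fused)
```

Because RRF uses only ranks, the selected systems do not need comparable score scales, which is one reason it pairs naturally with a subset of heterogeneous retrieval systems.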

Citations: 0
Density-reducing Jaccard estimators for sketch-based long read applications.
IF 3.3 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-12-09 DOI: 10.1186/s12859-025-06333-8
Tazin Rahman, Ananth Kalyanaraman

Sequence sketching, a class of techniques aimed at generating compact representations of longer sequences, has become widely used in numerous long read applications, including assembly and mapping. Instead of comparing full sequences, sketches allow us to sample from a subspace of k-mers and use those samples for comparison, saving both time and memory in the end application. One of the important metrics that determines the performance of a sketch is the sketch density, which refers to the fraction of the sampled k-mers retained by the sketch. While a lower density is preferable for space considerations, it could also impact the sensitivity of the mapping process. In this work, we address the problem of reducing sketch density while preserving accuracy in the context of long-read mapping. We present an efficient algorithm called MHsketch that uses Jaccard estimators to reduce sketch density in mapping applications. Starting from an initial ground set of k-mers generated through a sketching method of choice, the approach applies MinHashing to derive a smaller sketch and uses that for mapping. In addition to reducing density, this approach is also easily parallelizable. To demonstrate the efficacy of our method, we modified a recently developed long read mapping tool (JEM-mapper) to adopt different sketching schemes, including Syncmer and Strobemer, and incorporated MHsketch to evaluate the effectiveness of downsampling. Experimental evaluation demonstrates the ability of our approach to significantly reduce density and reap performance benefits from it. In particular, our experiments reveal that MHsketch (syncmers) achieves high-quality mapping while reducing time-to-solution (speedups from [Formula: see text] to [Formula: see text]) and drastically reducing memory usage ([Formula: see text] savings) compared to state-of-the-art tools. Availability: https://github.com/TazinRahman1105050/MHsketch.
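
The idea of deriving a smaller sketch with MinHashing and estimating Jaccard similarity from it can be illustrated with a generic bottom-s MinHash. This is a sketch under stated assumptions, not the MHsketch implementation: the ground set here is simply all k-mers (the paper starts from syncmers or strobemers), and the helper names, hash function, and example sequences are hypothetical.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all k-mers in seq (assumed ground set for this sketch)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash64(kmer):
    """Deterministic 64-bit hash of a k-mer."""
    return int.from_bytes(
        hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big"
    )


def bottom_s_sketch(items, s):
    """Keep only the s smallest hash values of the ground set."""
    return sorted(hash64(x) for x in items)[:s]


def jaccard_estimate(sketch_a, sketch_b, s):
    """Bottom-s estimator: among the s smallest hashes of the merged
    sketches, the fraction that occurs in both sketches."""
    merged = sorted(set(sketch_a) | set(sketch_b))[:s]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for v in merged if v in shared) / len(merged)


a = kmers("ACGTACGTGGA", 3)
b = kmers("ACGTACGTTTA", 3)
s = 4  # smaller s means lower density but a noisier estimate
est = jaccard_estimate(bottom_s_sketch(a, s), bottom_s_sketch(b, s), s)
exact = len(a & b) / len(a | b)
print(est, exact)
```

The trade-off described in the abstract shows up directly in `s`: shrinking it reduces the number of retained hashes (density) at the cost of a higher-variance similarity estimate.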

Citations: 0
X-cross/over: a web tool for graph-based estimation of meiotic crossover events in plants.
IF 3.3 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-12-08 DOI: 10.1186/s12859-025-06334-7
Szabolcs Makai, Diána Makai, Erika Chonata-Jiménez, Ildikó Karsai, Péter Mikó, Adél Sepsi, András Cseh

Background: Crossovers are essential for genome stability and genetic diversity, yet in plants they occur infrequently, typically restricted to only one to three per chromosome pair. Genotyping approaches such as SNP arrays or genotyping-by-sequencing (GBS) enable high-resolution detection of crossover frequency, a critical step for elucidating the mechanisms that regulate meiotic recombination and for exploiting it in plant breeding. Despite their widespread use and the availability of highly reproducible marker sets, user-friendly tools for reliable recombination analysis remain scarce.

Results: Here we present X-cross/over, a web-based platform that applies a graph-theoretical algorithm to estimate crossover frequencies from SNP datasets in HapMap format. The platform was evaluated using publicly available barley backcross inbred populations and newly developed wheat doubled haploid lines. Across both datasets, X-cross/over detected crossover events with high accuracy and sensitivity, yielding results consistent with published genotyping and cytological analyses. Importantly, the tool produces outcomes comparable to expert analyses while remaining accessible to users without bioinformatics expertise.

Conclusions: X-cross/over provides a consistent and transparent framework for detecting crossover sites and quantifying their frequency. Implemented in a platform-independent environment, the application is freely available at https://insilicolabdesk.atk.kinin.hu, making it a versatile resource for exploring the genetic and epigenetic regulation of meiotic recombination across plant species.
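
To make the notion of counting crossovers from marker data concrete, here is a deliberately naive baseline, which is not the graph-theoretical algorithm used by X-cross/over (the abstract does not describe it). It assumes one line's genotype calls along a single chromosome, ordered by position and encoded as parental alleles `A` and `B` with `N` for missing calls; this encoding and the function name are hypothetical.

```python
def count_genotype_switches(calls, missing="N"):
    """Count parental-allele switches along one chromosome.

    calls: string of per-marker calls ordered by position, e.g. "AAABBB".
    Missing calls are skipped, so a switch is counted between the nearest
    informative markers on either side of a gap.
    """
    informative = [c for c in calls if c != missing]
    return sum(1 for prev, cur in zip(informative, informative[1:]) if prev != cur)


print(count_genotype_switches("AAAABBBB"))
print(count_genotype_switches("AANABBNNBA"))
```

A baseline like this treats every isolated miscall as two crossovers, which is one reason dedicated tools apply more robust models before reporting crossover frequencies.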

Citations: 0
MARTS-DB: a database of mechanisms and reactions of terpene synthases.
IF 3.3 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-12-08 DOI: 10.1186/s12859-025-06341-8
Martin Engst, Martin Brokeš, Tereza Čalounová, Raman Samusevich, Roman Bushuiev, Anton Bushuiev, Ratthachat Chatpatanasiri, Adéla Tajovská, Safa Mert Akmeşe, Milana Perković, Matouš Soldát, Josef Sivic, Tomáš Pluskal

Background: Terpene synthases (TPSs) are enzymes that catalyze some of the most complex reactions in nature: the cyclizations of terpenes, which form the carbon backbones of the largest group of natural products, the terpenoids. On average, more than half of the carbon atoms in a terpene scaffold undergo a change in connectivity or configuration during these enzymatic cascades. Understanding TPS reaction mechanisms remains challenging, often requiring intricate computational modeling and isotopic labelling studies. Moreover, the relationship between TPS sequence and catalytic function is difficult to decipher, while data-driven approaches remain limited due to a lack of comprehensive, high-quality data sources.

Main: We introduce the Mechanisms And Reactions of Terpene Synthases DataBase (MARTS-DB), a manually curated, structured, and searchable database that integrates TPS enzymes, the terpenes they produce, and their detailed reaction mechanisms. MARTS-DB includes over 2850 reactions catalyzed by 1432 annotated enzymes from across all domains of life, with reaction mechanisms mapped as stepwise cascades for more than 500 terpenes. Accessible at https://www.marts-db.org, the database provides advanced search functionality and supports full dataset downloads in machine-readable formats. It also encourages community contributions to promote continuous growth.

Conclusion: User-friendly and comprehensive, MARTS-DB enables the systematic exploration of TPS catalysis, opening new avenues for computational analysis and machine learning, as recently demonstrated in the prediction of novel TPSs.

Citations: 0
SVhet: towards accurate detection of germline heterozygous deletions using short reads.
IF 3.3 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-12-07 DOI: 10.1186/s12859-025-06342-7
Chun Hing She, Sophelia Hoi-Shan Chan, Wanling Yang

Background: Accurate structural variant (SV) detection from short-read sequencing data remains challenged by false positives, particularly for heterozygous deletions, where allelic support is reduced and coverage-based detection methods are ambiguous. Existing SV genotyping and filtering approaches suffer from significant recall reductions, dependencies on additional pre-computed resources, or restriction to depth-based signals that overlook read-level evidence.

Results: Here we present SVhet, a novel computational framework that leverages the heterozygosity patterns detected from different types of read evidence to identify false heterozygous deletions. Comprehensive benchmarking using 31 Human Genome Structural Variation Consortium Phase 3 samples demonstrated SVhet's ability to further reduce false positives while maintaining baseline recall. A hybrid approach combining duphold and SVhet achieved up to a 60% reduction in false positive counts while preserving recall. SVhet is also computationally efficient, processing a whole-genome structural variant callset in under 5 min using 4 CPU cores. SVhet is available under a permissive MIT license via https://github.com/snakesch/SVhet.

Conclusion: SVhet provides an accurate and efficient solution for evaluating heterozygous deletions derived from short-read sequencing data. SVhet can be used as a standalone tool or in conjunction with other filtering tools such as duphold. Importantly, it does not require additional variant sets and can operate with minimal compute. Altogether, SVhet adds to the current effort to achieve accurate structural variant detection using short reads.

Citations: 0
GeDi: simplifying gene set distances for enhanced omics interpretation in R/Bioconductor.
IF 3.3 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-12-07 DOI: 10.1186/s12859-025-06335-6
Annekathrin Silvia Nedwed, Arsenij Ustjanzew, Najla Abassi, Leon Dammer, Alicia Schulze, Sara Salome Helbich, Michael Delacher, Konstantin Strauch, Federico Marini
Citations: 0
A DSSM network for inferring and prioritizing cell-type-specific regulons using single-cell RNA-seq data.
IF 3.3 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-12-07 DOI: 10.1186/s12859-025-06329-4
Yaxin Fan, Yichao Mei, Shengbao Bao, Jianyong Wang, Junxiang Gao

Background: Transcription factors and their target genes form regulatory modules known as regulons, which exhibit significant specificity across various cell types. Integrating single-cell transcriptome data, transcription factor motif data, and ChIP-seq data to identify cell-type-specific regulons and examine their activities remains a challenging task.

Results: In response, this study presents a Deep Structured Semantic Model for inferring and prioritizing cell-type-specific Regulons (DSSMReg). This approach utilizes single-cell transcriptome and transcription factor motif data to map transcription factors and target genes into a low-dimensional semantic space, resulting in the generation of feature vectors. The model then computes the cosine similarity between transcription factors and target genes to evaluate their regulatory strength and subsequently infers cell-type-specific regulons based on this assessment. Moreover, DSSMReg employs the AUCell algorithm to rank the importance of regulons for each cell type.

Conclusions: We compared DSSMReg against five representative gene regulatory inference algorithms using scRNA-seq data from five cell lines, with DSSMReg achieving the highest evaluation metrics for both AUROC and AUPRC. Furthermore, we applied DSSMReg to infer cell-type-specific regulons from scRNA-seq data of triple-negative breast cancer and human bone marrow hematopoietic stem cells. Our results indicated that regulons with high AUCell scores possess significant biological relevance. The source code of DSSMReg is freely available at https://github.com/YaxinF/DSSMReg.
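
The scoring step described in the Results, cosine similarity between transcription factor and target gene vectors in a shared low-dimensional space, can be sketched as follows. This is an illustration only, not DSSMReg's code: the two-dimensional vectors and gene names are made up, and in the actual model the embeddings would be learned by the network rather than written by hand.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_targets(tf_vec, gene_vecs):
    """Rank candidate target genes by similarity to one transcription factor."""
    scored = [(gene, cosine(tf_vec, vec)) for gene, vec in gene_vecs.items()]
    return sorted(scored, key=lambda pair: -pair[1])


# Hypothetical embeddings for one transcription factor and three genes.
tf_embedding = [1.0, 0.0]
gene_embeddings = {
    "geneA": [2.0, 0.0],
    "geneB": [1.0, 1.0],
    "geneC": [0.0, 3.0],
}
ranking = rank_targets(tf_embedding, gene_embeddings)
print(ranking)
```

Cosine similarity ignores vector length, so a gene is scored by the direction of its embedding relative to the transcription factor rather than by its magnitude.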

Citations: 0