BMC Bioinformatics: Latest Articles

LCSkPOA: enabling banded semi-global partial order alignments via efficient and accurate backbone generation through extended LCSk++.
IF 3.3 | Zone 3 (Biology) | Q2, Biochemical Research Methods | Pub Date: 2025-11-25 | DOI: 10.1186/s12859-025-06293-z
Minindu Weerakoon, Christopher T Saunders, Haynes Heaton

Background: Most multiple sequence alignment and string-graph alignment algorithms focus on global alignment, but many applications exist for semi-global and local string-graph alignment. Long reads require enormous amounts of memory and runtime to fill out large dynamic programming tables. Effective algorithms exist for finding the backbone of an alignment and thus defining a band, such as the longest common subsequence with k-mer matches (LCSk++), but they do not work with graphs. This study introduces an adaptation of the LCSk++ algorithm tailored for graph structures, particularly Partial Order Alignment (POA) graphs. POA graphs, which are directed acyclic graphs, represent multiple sequence alignments and effectively capture the relationships between sequences. State-of-the-art methods such as ABPOA and SPOA improve upon POA: ABPOA incorporates banding, whereas SPOA does not. However, neither uses parallel processing, although both leverage SIMD for faster matrix calculations. Our approach addresses these limitations by extending the LCSk++ algorithm to handle the complexities of graph-based alignment while incorporating SIMD, banding, and parallel processing for enhanced efficiency.

Results: Our extended LCSk++ algorithm integrates dynamic programming and graph traversal techniques to detect conserved regions within POA graphs, termed the LCSk++ backbone. This backbone enables precise banding of the POA matrix for all alignment modes (global, semi-global, and local). Unlike ABPOA, which only allows banded global alignment, our approach enables broader flexibility and significantly improves consensus sequence construction. While supporting more alignment modes than ABPOA, it also outperforms SPOA's global alignment, with substantial memory savings (up to 98%) and significant run-time reductions (up to 25x), particularly for long sequences (> 30,000 bp). Our method maintains high alignment accuracy and proves effective across various string lengths and datasets, including synthetic and PacBio HiFi reads. Parallel processing further enhances runtime efficiency, achieving up to 150x speed improvements on conventional PCs.

Conclusion: The extended LCSk++ algorithm for graph structures offers a substantial advancement in sequence alignment technology. It effectively reduces memory consumption and optimizes run times without compromising alignment quality, thus providing a robust solution for all alignment modes (global, local, and semi-global) in POA. This method enhances the utility of POA in critical applications such as multiple sequence alignment for phylogeny construction and graph-based reference alignment.
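
To make the backbone-then-band idea concrete, the following is a minimal sketch on two plain strings, not on POA graphs and not the authors' implementation. It assumes the simpler LCSk formulation (a longest chain of non-overlapping exact k-mer matches) rather than the full LCSk++ extension; the function names `lcsk_backbone` and `band_from_backbone` and the `width` parameter are hypothetical.

```python
from collections import defaultdict


def lcsk_backbone(a, b, k):
    """Longest chain of non-overlapping exact k-mer matches between a and b.

    Returns a list of (i, j) start positions, increasing in both strings.
    Quadratic in the number of matches; fine for illustration only.
    """
    # Index every k-mer of b by its start position.
    index = defaultdict(list)
    for j in range(len(b) - k + 1):
        index[b[j:j + k]].append(j)

    # All exact k-mer matches, sorted by position in a, then in b.
    matches = sorted(
        (i, j)
        for i in range(len(a) - k + 1)
        for j in index.get(a[i:i + k], [])
    )
    if not matches:
        return []

    best = [1] * len(matches)   # chain length ending at each match
    prev = [-1] * len(matches)  # back-pointer for traceback
    for q, (qi, qj) in enumerate(matches):
        for p in range(q):
            pi, pj = matches[p]
            # p may precede q only if the two k-mers do not overlap.
            if pi + k <= qi and pj + k <= qj and best[p] + 1 > best[q]:
                best[q] = best[p] + 1
                prev[q] = p

    end = max(range(len(matches)), key=best.__getitem__)
    chain = []
    while end != -1:
        chain.append(matches[end])
        end = prev[end]
    return chain[::-1]


def band_from_backbone(chain, k, width):
    """Map each backbone row of `a` to an inclusive (lo, hi) column range in `b`.

    The upper bound is not clamped to len(b); a caller would do that.
    """
    band = {}
    for i, j in chain:
        for d in range(k):
            band[i + d] = (max(0, j + d - width), j + d + width)
    return band
```

A dynamic programming fill would then visit only the cells inside `band` (interpolating between backbone rows), which is where the memory and runtime savings of banding come from.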

Journal: BMC Bioinformatics, vol. 26(1), p. 284 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12649094/pdf/
Citations: 0
CLEP-GAN: an innovative approach to subject-independent ECG reconstruction from PPG signals.
IF 3.3 | Zone 3 (Biology) | Q2, Biochemical Research Methods | Pub Date: 2025-11-25 | DOI: 10.1186/s12859-025-06276-0
Xiaoyan Li, Shixin Xu, Faisal Habib, Neda Aminnejad, Arvind Gupta, Huaxiong Huang

Background: Reconstructing ECG signals from PPG measurements is a critical task for non-invasive cardiac monitoring. While several public ECG-PPG datasets exist, they lack the diversity found in image datasets, and the data collection process often introduces noise, making ECG reconstruction from PPG signals challenging even for advanced machine learning models.

Results: We propose a novel ODE-based method for generating synthetic ECG-PPG pairs to enhance training diversity. Building on this, we introduce CLEP-GAN, a subject-independent PPG-to-ECG reconstruction framework that integrates contrastive learning, adversarial learning, and attention gating. CLEP-GAN achieves performance that matches or surpasses current state-of-the-art methods, particularly in reconstructing ECG signals from unseen subjects. Evaluation on real-world datasets (BIDMC and CapnoBase) confirms its effectiveness. Additionally, our analysis shows that demographic factors such as sex and age significantly impact reconstruction accuracy, emphasizing the importance of incorporating demographic diversity during model training and data augmentation.

Conclusions: Our method produces synthetic ECG-PPG pairs with RR interval distributions closely aligned with their real counterparts and shows strong potential to simulate diverse rhythms such as regular sinus rhythm (RSR), sinus arrhythmia (SA), and atrial fibrillation (AFib). Furthermore, CLEP-GAN demonstrates robust performance on both synthetic and real datasets, achieving near-perfect reconstruction in synthetic settings and competitive results on real data. These findings highlight CLEP-GAN's promise for reliable, non-invasive ECG monitoring in clinical applications.

Journal: BMC Bioinformatics, p. 306 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12751419/pdf/
Citations: 0
Gradient boosting with knockoff filters: a biostatistical approach to variable selection.
IF 3.3 | Zone 3 (Biology) | Q2, Biochemical Research Methods | Pub Date: 2025-11-25 | DOI: 10.1186/s12859-025-06215-z
Amr Mohamed, Kevin H Lee

As data complexity and volume increase rapidly, efficient statistical methods for identifying significant variables become crucial. Variable selection plays a vital role in establishing relationships between predictors and response variables. The challenge lies in achieving this goal while controlling the False Discovery Rate (FDR) and maintaining statistical power. The knockoff filter, a recent approach, generates inexpensive knockoff variables that mimic the correlation structure of the original variables, serving as negative controls for inference. In this study, we extend the use of knockoffs to Light Gradient Boosting Machine (LightGBM), a fast and accurate machine learning technique. SHapley Additive exPlanations (SHAP) values are employed to interpret the black-box nature of machine learning. Through extensive experimentation, our proposed method outperforms traditional approaches, accurately identifying important variables for each class. It offers improved speed and efficiency across multiple datasets. To validate our approach, an extensive simulation study is conducted. The integration of knockoffs into LightGBM enhances performance and interpretability, contributing to the advancement of variable selection methods. Our research addresses the challenges of variable selection in the era of big data, providing a valuable tool for identifying relevant variables in statistical modeling and machine learning applications.
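
The FDR-controlling step of a knockoff filter can be sketched as follows. This is an illustration of the standard knockoff+ threshold, not the authors' code. It assumes each variable j already has a statistic W_j that is large and positive when the original variable looks more important than its knockoff (for example, a difference of importance scores; how the paper derives W_j from LightGBM and SHAP is not stated in the abstract). The function names are hypothetical.

```python
def knockoff_plus_threshold(w, q):
    """Smallest t with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q.

    Returns infinity when no candidate threshold meets the target FDR q.
    """
    # Candidate thresholds are the distinct non-zero magnitudes of W.
    for t in sorted({abs(x) for x in w if x != 0}):
        negatives = sum(1 for x in w if x <= -t)  # estimated false discoveries
        positives = sum(1 for x in w if x >= t)   # selected variables
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")


def select_variables(w, q):
    """Indices of variables whose statistic reaches the knockoff+ threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]
```

The ratio in the loop is the estimated false discovery proportion at threshold t: negative statistics stand in for false positives because knockoffs act as negative controls.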

Journal: BMC Bioinformatics, p. 13 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12801829/pdf/
Citations: 0
OMetaNet: an efficient hybrid deep learning model based on multimodal data fusion and contrastive learning for predicting 2'-O-methylation sites in human RNA.
IF 3.3 | Zone 3 (Biology) | Q2, Biochemical Research Methods | Pub Date: 2025-11-24 | DOI: 10.1186/s12859-025-06324-9
Peng Shen, Yiyu Lin, Sen Yang, Ziding Zhang

Background: Accurately identifying RNA 2'-O-methylation (2OM) sites is a crucial step in gaining an in-depth understanding of RNA regulatory mechanisms. Although there are currently multiple prediction tools available, they still suffer from limited prediction accuracy and an inability to fully capture the associations between sequences and sites.

Results: This study constructs a novel low-redundancy dataset and proposes the KN-PairMatrix encoding scheme, addressing the research gap in sequence-site association analysis. On this foundation, we developed the deep learning framework OMetaNet, which integrates residual and downsampling-optimized CNN modules, a Mamba network, and a proprietary cross-modal interactive fusion module. The framework incorporates a contrastive learning-driven adaptive hybrid loss function. Employing a progressive feature disentanglement strategy, it enhances the learning capability for 2OM site-specific patterns. Independent evaluation results demonstrate that OMetaNet significantly outperforms existing methods in predicting 2OM sites across all four nucleotide types.

Conclusions: We proposed a novel computational model, OMetaNet. Its unique design structure may potentially reshape the paradigm of transcriptome analysis, open up new directions for extracting modification site information, and show significant potential in biomarker research and cross-species generalization studies.

Journal: BMC Bioinformatics, p. 304 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12752101/pdf/
Citations: 0
Varaps: a python package for estimating SARS-CoV-2 lineages proportions from pooled sequencing data (ANRS0160).
IF 3.3 | Zone 3 (Biology) | Q2, Biochemical Research Methods | Pub Date: 2025-11-23 | DOI: 10.1186/s12859-025-06299-7
El Hacene Djaout, Nicolas Cluzel, Vincent Marechal, Gregory Nuel, Marie Courbariaux
Journal: BMC Bioinformatics, p. 302 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12751500/pdf/
Citations: 0
EGCPPIS: learning hierarchical equivariant graph representations with contrastive integration for protein-protein interaction site identification.
IF 3.3 | Zone 3 (Biology) | Q2, Biochemical Research Methods | Pub Date: 2025-11-23 | DOI: 10.1186/s12859-025-06328-5
Guicong Sun, Yongxian Fan, Yangfeng Zhu, Mengxin Zheng

Background: Protein-protein interactions regulate the dynamic operation of intracellular molecular networks, serving as the molecular basis for revealing protein functions and disease mechanisms. Recently, several computational methods for predicting protein-protein interaction sites (PPIs) have been presented as alternatives to costly and labor-intensive traditional experiments. However, existing methods generally ignore the inherent hierarchical structure of protein chains. Furthermore, the equivariance of graph structure during spatial transformations is often neglected when applying graph neural networks to modeling. Therefore, accurately identifying PPIs remains a challenging task.

Results: In this work, we propose an end-to-end GNN-based computational method, EGCPPIS, for efficiently identifying protein-protein interaction sites. First, we construct a hierarchical graph representation of the protein chain, comprising a residue-level graph and an atom-level graph. Next, EGCPPIS uses an E(n) Equivariant Graph Neural Network (EGNN) module to learn residue-level embeddings with equivariant features. After extracting atom-level embeddings with a GraphSAGE module, we introduce a contrastive learning strategy to integrate the hierarchical graph features. This strategy enables us to learn consistent embeddings between residue-level and atom-level representations. Finally, the fused embeddings are weighted using an improved gated multi-head attention mechanism.

Conclusion: Comprehensive evaluation results on multiple datasets demonstrate that EGCPPIS significantly outperforms state-of-the-art methods. Extensive comparative experiments and case studies further confirm that EGCPPIS can reveal the decision-making patterns in PPIs prediction, facilitating the discovery of potential PPIs. The original datasets and code of EGCPPIS are available at https://github.com/GuicongSun/EGCPPIS.
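
As a rough illustration of how a contrastive objective can pull two views of the same item together, here is a generic InfoNCE-style loss in plain Python. It is a sketch under assumptions, not the loss used in EGCPPIS: the abstract does not specify the loss form, and the function name `info_nce`, the cosine similarity, and the `temperature` default are all choices made here. Row i of `view_a` and row i of `view_b` are assumed to describe the same residue (for example, a residue-level embedding and a pooled atom-level embedding).

```python
import math


def _normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss: row i of view_a should match row i of view_b.

    All other rows of view_b act as negatives for row i.
    """
    a = [_normalize(v) for v in view_a]
    b = [_normalize(v) for v in view_b]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(p * q for p, q in zip(ai, bj)) / temperature for bj in b]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)
```

The loss is small when matching rows are the most similar pair in each row and grows when a non-matching row is closer, which is the sense in which the two levels of representation are made consistent.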

Journal: BMC Bioinformatics, p. 303 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12751822/pdf/
Citations: 0
SynergyImage: image-based model for drug combinations synergy score prediction.
IF 3.3 | Zone 3 (Biology) | Q2, Biochemical Research Methods | Pub Date: 2025-11-21 | DOI: 10.1186/s12859-025-06314-x
Maryam Mehrabani, Amir Lakizadeh, Alireza Fotuhi Siahpirani, Ali Masoudi-Nejad
Journal: BMC Bioinformatics, vol. 26(1), p. 283 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12639979/pdf/
Citations: 0
MCLCBA: multi-view contrastive learning network for RNA methylation site prediction.
IF 3.3 | Zone 3 (Biology) | Q2, Biochemical Research Methods | Pub Date: 2025-11-19 | DOI: 10.1186/s12859-025-06306-x
Honglei Wang, Xuesong Zhang, Yanjing Sun, Zhaoyang Liu, Lin Zhang

Background: RNA methylation (RM) regulates gene expression, RNA stability, and protein translation. Accurate prediction of RM sites is essential for understanding their biological functions. However, existing wet-lab detection techniques face challenges including operational complexity and high costs. Deep learning (DL) methods have been applied to this task, but existing methods show performance degradation with smaller training datasets. For instance, the Bidirectional Gated Recurrent Unit (BGRU) demonstrates substantial performance degradation. The Convolutional Neural Network (CNN) can extract local pattern features but learns overly specific patterns with sample-limited data, resulting in poor feature generalization. Bidirectional Long Short-Term Memory (BiLSTM) excels at modeling long-range dependencies but cannot sufficiently learn gating mechanism parameters to capture effective sequence representations with limited samples. The Transformer processes sequences in parallel and captures global dependencies through self-attention, but its quadratic computational complexity and large parameter count make it prone to overfitting on small datasets. Current DL methods show reduced performance when training data is limited.

Results: This study proposes a Multi-view Contrastive Learning with CNN-BiLSTM-Attention (MCLCBA) framework for RM modification site prediction. The multi-view approach comprises a primary view and auxiliary view, where the primary view utilizes DNA Bidirectional Encoder Representations from Transformers (DNABERT) to extract sequence contextual features, and the auxiliary view employs Chaos Game Representation (CGR) to extract structural features. Feature extraction includes four components: data augmentation, multi-view encoders, projection heads, and contrastive loss functions. By implementing dual differential data augmentation strategies and constructing multi-view network architectures for feature processing and fusion, the model learns discriminative feature representations invariant to data augmentation through maximizing positive sample similarity while minimizing negative sample similarity. This effectively addresses sample-limited feature learning scenarios. Experimental results on the sample-limited m7G dataset demonstrate that MCLCBA achieves AUROC and AUPRC of 85.64% and 86.94%, respectively, improving upon existing methods by 5-6% in both metrics.
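
The contrastive objective described above (pull matched views together, push mismatched views apart) can be illustrated with an InfoNCE-style loss. This is only a minimal sketch: the abstract does not specify the exact loss, so the cosine similarity, the `temperature` value, and the function names `cosine` and `info_nce` are assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(primary, auxiliary, temperature=0.5):
    """Mean InfoNCE loss over a batch (hypothetical helper).

    primary[i] and auxiliary[i] are embeddings of the same sample from
    two views (the positive pair); auxiliary[j] for j != i serve as
    negatives for primary[i].
    """
    n = len(primary)
    total = 0.0
    for i in range(n):
        logits = [cosine(primary[i], auxiliary[j]) / temperature
                  for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(
            sum(math.exp(value - peak) for value in logits))
        # Negative log-probability assigned to the positive pair.
        total += log_denominator - logits[i]
    return total / n


aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # matched views yield the lower loss
```

Lowering the loss requires the positive pair to score higher than every negative in the batch, which is the "maximize positive similarity, minimize negative similarity" behaviour the abstract describes.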

Conclusions: Through multi-view contrastive learning, MCLCBA provides an approach for predicting RM sites in sample-limited scenarios.

Citations: 0
Denoising single-cell RNA-seq data with a deep learning-embedded statistical framework.
IF 3.3 Zone 3, Biology Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-11-19 DOI: 10.1186/s12859-025-06296-w
Qinhuan Luo, Yongzhen Yu, Tianying Wang

Background: Single-cell RNA sequencing (scRNA-seq) provides extensive opportunities to explore cellular heterogeneity but is often limited by substantial technical noise and variability. The prevalence of zero counts, arising from both biological variation and technical dropout events, poses significant challenges for downstream analyses. Existing imputation methods face inherent trade-offs: statistical approaches maintain interpretability but exhibit limited capacity for capturing complex, non-linear gene expression relationships, whereas deep learning methods demonstrate superior flexibility but are prone to overfitting and lack mechanistic interpretability, particularly in settings with limited sample sizes.

Methods: We present ZILLNB (Zero-Inflated Latent factors Learning-based Negative Binomial), a novel computational framework that integrates zero-inflated negative binomial (ZINB) regression with deep generative modeling. ZILLNB employs an ensemble architecture combining Information Variational Autoencoder (InfoVAE) and Generative Adversarial Network (GAN) to learn latent representations at cellular and gene levels. These latent factors serve as dynamic covariates within a ZINB regression framework, with parameters iteratively optimized through an Expectation-Maximization algorithm. This approach enables systematic decomposition of technical variability from intrinsic biological heterogeneity.
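
For readers unfamiliar with the zero-inflated negative binomial model named above, its probability mass function can be sketched as follows. This is the standard textbook mean/dispersion parameterization, not code from ZILLNB; the names `nb_log_pmf` and `zinb_pmf` and the parameters `mu`, `theta`, and `pi` are illustrative assumptions, and in the paper's framework such parameters would depend on the learned latent covariates rather than being fixed constants.

```python
import math


def nb_log_pmf(y, mu, theta):
    """Log-pmf of a negative binomial with mean mu > 0 and dispersion theta > 0."""
    return (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
            + theta * math.log(theta / (theta + mu))
            + y * math.log(mu / (theta + mu)))


def zinb_pmf(y, mu, theta, pi):
    """Zero-inflated NB: with probability pi the count is a structural zero,
    otherwise it is drawn from NB(mu, theta)."""
    nb = math.exp(nb_log_pmf(y, mu, theta))
    if y == 0:
        # Zeros come from both the dropout component and the NB itself.
        return pi + (1.0 - pi) * nb
    return (1.0 - pi) * nb


# Illustrative values only: mean 2, dispersion 1, 30% structural zeros.
print(zinb_pmf(0, mu=2.0, theta=1.0, pi=0.3))
```

The split at `y == 0` is what lets the model attribute some zeros to technical dropout (`pi`) and the rest to genuinely low expression under the negative binomial.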

Results: Comparative evaluations across multiple scRNA-seq datasets demonstrate ZILLNB's superior performance. In cell type classification tasks using mouse cortex and human PBMC datasets, ZILLNB achieved the highest Adjusted Rand Index (ARI) and Adjusted Mutual Information (AMI) among tested methods, with improvements ranging from 0.05 to 0.2 over VIPER, scImpute, DCA, DeepImpute, SAVER, scMultiGAN, and ALRA. For differential expression analysis validated against matched bulk RNA-seq data, ZILLNB improved the area under the Receiver Operating Characteristic curve (AUC-ROC) and the area under the Precision-Recall curve (AUC-PR) by 0.05 to 0.3 compared to standard and other imputation methods, with consistently lower false discovery rates. Application to idiopathic pulmonary fibrosis (IPF) datasets revealed distinct fibroblast subpopulations undergoing fibroblast-to-myofibroblast transition, validated through marker gene expression and pathway enrichment analyses.

Conclusion: ZILLNB provides a principled framework for addressing technical artifacts in scRNA-seq data while preserving biological variation. The integration of statistical modeling with deep learning enables robust performance across diverse analytical tasks, including cell type identification, differential expression analysis, and rare cell population discovery.

Citations: 0
AllergenAI: a deep learning model predicting allergenicity based on protein sequence.
IF 3.3 Zone 3, Biology Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-11-18 DOI: 10.1186/s12859-025-06302-1
Jiajia Liu, Surendra S Negi, Chengyuan Yang, Xiaobo Zhou, Catherine H Schein, Werner Braun, Pora Kim
Citations: 0