Briefings in Bioinformatics: latest articles
COFFEE: consensus single cell-type specific inference for gene regulatory networks.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae457
Musaddiq K Lodi, Anna Chernikov, Preetam Ghosh

The inference of gene regulatory networks (GRNs) is crucial to understanding the regulatory mechanisms that govern biological processes. GRNs may be represented as edges in a graph, and hence they have been inferred computationally from scRNA-seq data. A wisdom-of-crowds approach that integrates edges from several GRNs to create one composite GRN has demonstrated improved performance when compared with individual algorithm implementations on bulk RNA-seq and microarray data. In an effort to extend this approach to scRNA-seq data, we present COFFEE (COnsensus single cell-type speciFic inFerence for gEnE regulatory networks), a Borda voting-based consensus algorithm that integrates information from 10 established GRN inference methods. We conclude that COFFEE has improved performance across synthetic, curated, and experimental datasets when compared with baseline methods. Additionally, we show that a modified version of COFFEE can be leveraged to improve performance on newer cell-type specific GRN inference methods. Overall, our results demonstrate that consensus-based methods with pertinent modifications continue to be valuable for GRN inference at the single-cell level. While COFFEE is benchmarked on 10 algorithms, it is a flexible strategy that can incorporate any set of GRN inference algorithms according to user preference. A Python implementation of COFFEE may be found on GitHub: https://github.com/lodimk2/coffee.
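
To illustrate what a Borda voting-based consensus over edge rankings can look like, here is a minimal Python sketch. It is not the COFFEE implementation: the function name `borda_consensus`, the toy edge lists, and the scoring convention (the top edge earns as many points as there are candidate edges, unranked edges earn zero) are all assumptions made for illustration.

```python
from collections import defaultdict


def borda_consensus(rankings):
    """Combine ranked edge lists with a plain Borda count.

    rankings: list of lists; each inner list holds (regulator, target)
    edges ordered from most to least confident by one inference method.
    Returns all edges sorted by descending total Borda score.
    """
    # Every edge reported by at least one method is a candidate.
    all_edges = {edge for ranking in rankings for edge in ranking}
    n = len(all_edges)
    scores = defaultdict(int)
    for ranking in rankings:
        for position, edge in enumerate(ranking):
            # Top edge earns n points, the next n - 1, and so on.
            # Edges a method does not report earn 0 (an assumption).
            scores[edge] += n - position
    # Sort by score (descending), then by edge name so ties are deterministic.
    return sorted(all_edges, key=lambda e: (-scores[e], e))


# Hypothetical outputs of three GRN inference methods.
method_a = [("TF1", "G1"), ("TF2", "G2"), ("TF1", "G3")]
method_b = [("TF2", "G2"), ("TF1", "G1"), ("TF3", "G1")]
method_c = [("TF1", "G1"), ("TF3", "G1"), ("TF2", "G2")]

consensus = borda_consensus([method_a, method_b, method_c])
print(consensus)
```

Because the function accepts any number of rankings, adding or dropping a method only changes the input list, which mirrors the flexibility the abstract describes.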

Citations: 0
Deep learning model for protein multi-label subcellular localization and function prediction based on multi-task collaborative training.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae568
Peihao Bai, Guanghui Li, Jiawei Luo, Cheng Liang

The functional study of proteins is a critical task in modern biology, playing a pivotal role in understanding the mechanisms of pathogenesis, developing new drugs, and discovering novel drug targets. However, existing computational models for subcellular localization face significant challenges, such as reliance on known Gene Ontology (GO) annotation databases or overlooking the relationship between GO annotations and subcellular localization. To address these issues, we propose DeepMTC, an end-to-end deep learning-based multi-task collaborative training model. DeepMTC integrates the interrelationship between subcellular localization and the functional annotation of proteins, leveraging multi-task collaborative training to eliminate dependence on known GO databases. This strategy gives DeepMTC a distinct advantage in predicting newly discovered proteins without prior functional annotations. First, DeepMTC leverages a high-accuracy pre-trained language model to obtain the 3D structure and sequence features of proteins. Additionally, it employs a graph transformer module to encode protein sequence features, addressing the problem of long-range dependencies in graph neural networks. Finally, DeepMTC uses a functional cross-attention mechanism to efficiently combine upstream learned functional features to perform the subcellular localization task. The experimental results demonstrate that DeepMTC outperforms state-of-the-art models in both protein function prediction and subcellular localization. Moreover, interpretability experiments revealed that DeepMTC can accurately identify the key residues and functional domains of proteins, confirming its superior performance. The code and dataset of DeepMTC are freely available at https://github.com/ghli16/DeepMTC.

Citations: 0
Model ensembling as a tool to form interpretable multi-omic predictors of cancer pharmacosensitivity.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae567
Sébastien De Landtsheer, Apurva Badkas, Dagmar Kulms, Thomas Sauter

Stratification of patients diagnosed with cancer has become a major goal in personalized oncology. One important aspect is the accurate prediction of the response to various drugs. It is expected that the molecular characteristics of the cancer cells contain enough information to retrieve specific signatures, allowing for accurate predictions based solely on these multi-omic data. Ideally, these predictions should be explainable to clinicians so that they can be integrated into patient care. We propose a machine-learning framework based on ensemble learning to integrate multi-omic data and predict sensitivity to an array of commonly used and experimental compounds, including chemotoxic compounds and targeted kinase inhibitors. We trained a set of classifiers on the different parts of our dataset to produce omic-specific signatures, then trained a random forest classifier on these signatures to predict drug responsiveness. We used the Cancer Cell Line Encyclopedia dataset, comprising multi-omic and drug sensitivity measurements for hundreds of cell lines, to build the predictive models, and validated the results using nested cross-validation. Our results show good performance for several compounds (area under the receiver operating characteristic curve >79%) across the most frequent cancer types. Furthermore, the simplicity of our approach allows us to examine which omic layers have a greater importance in the models and to identify new putative markers of drug responsiveness. We propose several models based on small subsets of transcriptional markers with the potential to become useful tools in personalized oncology, paving the way for clinicians to use the molecular characteristics of the tumors to predict sensitivity to therapeutic compounds.

Citations: 0
HMPA: a pioneering framework for the noncanonical peptidome from discovery to functional insights.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae510
Xinwan Su, Chengyu Shi, Fangzhou Liu, Manman Tan, Ying Wang, Linyu Zhu, Yu Chen, Meng Yu, Xinyi Wang, Jian Liu, Yang Liu, Weiqiang Lin, Zhaoyuan Fang, Qiang Sun, Tianhua Zhou, Aifu Lin

Advancements in peptidomics have uncovered numerous small open reading frames with coding potential and revealed that some of these micropeptides are closely related to human cancer. However, systematic analysis and integration from sequence to structure and function remain largely undeveloped. Here, as a solution, we built a workflow for the collection and analysis of proteomic data, transcriptomic data, and clinical outcomes for cancer-associated micropeptides using publicly available datasets from large cohorts. We initially identified 19 586 novel micropeptides by reanalyzing proteomic profile data from 3753 samples across 8 cancer types. Further quantitative analysis of these micropeptides, along with associated clinical data, identified 3065 that were dysregulated in cancer, with 370 of them showing a strong association with prognosis. Moreover, we employed a deep learning framework to construct a micropeptide-protein interaction network for further bioinformatics analysis, revealing that micropeptides are involved in multiple biological processes as bioactive molecules. Taken together, our atlas provides a benchmark for high-throughput prediction and functional exploration of micropeptides, providing new insights into their biological mechanisms in cancer. The HMPA is freely available at http://hmpa.zju.edu.cn.

Citations: 0
Digital PCR threshold robustness analysis and optimization using dipcensR.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae507
Matthijs Vynck, Wim Trypsteen, Olivier Thas, Jo Vandesompele, Ward De Spiegelaere

Digital polymerase chain reaction (dPCR) is a best-in-class molecular biology technique for the accurate and precise quantification of nucleic acids. The recent maturation of dPCR technology allows the quantification of up to thousands of targeted nucleic acids per instrument per day. A key step in the dPCR data analysis workflow is the classification of partitions into two classes based on their partition intensities: partitions either containing or lacking target nucleic acids of interest. Much effort has been invested in the design and tailoring of automated dPCR partition classification procedures, and such procedures will be increasingly important as the technology ventures into high-throughput applications. However, automated partition classification is not fail-safe, and evaluation of its accuracy is highly advised. This accuracy evaluation is a manual endeavor and is becoming a bottleneck for high-throughput dPCR applications. Here, we introduce dipcensR, the first data-analysis procedure that automates the assessment of any linear partition classifier's partition classification accuracy, offering potentially substantial efficiency gains. dipcensR is based on a robustness evaluation of said partition classification and flags classifications with low robustness as needing review. Additionally, dipcensR's robustness analysis underpins (optional) automatic optimization of partition classification to achieve maximal robustness. A freely available R implementation supports dipcensR's use.
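
The idea of checking how robust a partition classification is can be sketched in Python as follows. This is not dipcensR's procedure (which is distributed as an R implementation); it is a simplified illustration that assumes a single intensity threshold, nudges it up and down by a fixed fraction, and flags the classification when too many partitions switch class. The function name and the `shift` and `tolerance` parameters are hypothetical.

```python
def threshold_robustness(intensities, threshold, shift=0.1, tolerance=0.01):
    """Return (changed_fraction, needs_review) for a single-threshold classifier.

    A partition is called positive when its intensity exceeds the threshold.
    The threshold is moved down and up by `shift * threshold`; the larger
    change in the positive count, as a fraction of all partitions, is the
    robustness measure used here (an assumption made for illustration).
    """
    def positives(t):
        return sum(1 for x in intensities if x > t)

    delta = shift * threshold
    baseline = positives(threshold)
    changed = max(
        abs(positives(threshold - delta) - baseline),
        abs(positives(threshold + delta) - baseline),
    )
    fraction = changed / len(intensities)
    # Flag the classification for manual review when it is not robust.
    return fraction, fraction > tolerance


# Toy data: two clean clusters versus a run with intermediate "rain" partitions.
well_separated = [100, 110, 105, 95, 900, 910, 905, 890]
rainy = [100, 110, 480, 520, 540, 900, 910, 905]

print(threshold_robustness(well_separated, 500))
print(threshold_robustness(rainy, 500))
```

In this toy setup the well-separated run is insensitive to the threshold shift, while the run with intermediate intensities changes class counts and would be flagged for review.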

Citations: 0
siRNADiscovery: a graph neural network for siRNA efficacy prediction via deep RNA sequence analysis.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae563
Rongzhuo Long, Ziyu Guo, Da Han, Boxiang Liu, Xudong Yuan, Guangyong Chen, Pheng-Ann Heng, Liang Zhang

The clinical adoption of small interfering RNAs (siRNAs) has prompted the development of various computational strategies for siRNA design, from traditional data analysis to advanced machine learning techniques. However, previous studies have inadequately considered the full complexity of the siRNA silencing mechanism, neglecting critical elements such as siRNA positioning on mRNA, RNA base-pairing probabilities, and RNA-AGO2 interactions, thereby limiting the insight and accuracy of existing models. Here, we introduce siRNADiscovery, a Graph Neural Network (GNN) framework that leverages both non-empirical and empirical rule-based features of siRNA and mRNA to effectively capture the complex dynamics of gene silencing. On multiple internal datasets, siRNADiscovery achieves state-of-the-art performance. Significantly, siRNADiscovery also outperforms existing methodologies in in vitro studies and on an externally validated dataset. Additionally, we develop a new data-splitting methodology that addresses the data leakage issue, a frequently overlooked problem in previous studies, ensuring the robustness and stability of our model under various experimental settings. Through rigorous testing, siRNADiscovery has demonstrated remarkable predictive accuracy and robustness, making significant contributions to the field of gene silencing. Furthermore, our approach to redefining data-splitting standards aims to set new benchmarks for future research in the domain of predictive biological modeling for siRNA.

Citations: 0
PathMethy: an interpretable AI framework for cancer origin tracing based on DNA methylation.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae497
Jiajing Xie, Yuhang Song, Hailong Zheng, Shijie Luo, Ying Chen, Chen Zhang, Rongshan Yu, Mengsha Tong

Despite advanced diagnostics, 3%-5% of cases remain classified as cancer of unknown primary (CUP). DNA methylation, an important epigenetic feature, is essential for determining the origin of metastatic tumors. We presented PathMethy, a novel Transformer model integrated with functional categories and crosstalk of pathways, to accurately trace the origin of tumors in CUP samples based on DNA methylation. PathMethy outperformed seven competing methods in F1-score across nine cancer datasets and accurately predicted the molecular subtypes within nine primary tumor types. It not only excelled at tracing the origins of both primary and metastatic tumors but also demonstrated a high degree of agreement with previously diagnosed sites in cases of CUP. PathMethy provided biological insights by highlighting key pathways, functional categories, and their interactions. Using functional categories of pathways, we gained a global understanding of biological processes. For broader access, a user-friendly web server for researchers and clinicians is available at https://cup.pathmethy.com.

Citations: 0
Mediation analysis in longitudinal study with high-dimensional methylation mediators.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae496
Yidan Cui, Qingmin Lin, Xin Yuan, Fan Jiang, Shiyang Ma, Zhangsheng Yu

Mediation analysis has been widely used to identify potential pathways connecting exposures and outcomes. However, analytical methods for high-dimensional mediation analysis in longitudinal data are still lacking. To address this gap, we propose a novel approach that combines variable selection with indirect effect (IE) assessment, based on both the linear mixed-effect model and the generalized estimating equation. First, we employ sure independence screening to reduce the dimension of candidate mediators. Next, we apply the Sobel test with the Bonferroni correction for IE hypothesis testing. In extensive simulation studies, the proposed procedure achieves a higher $F_1$ score (0.8056 and 0.9983 at sample sizes of 150 and 500, respectively) than the linear method (0.7779 and 0.9642 at the same sample sizes), along with more accurate parameter estimation and a significantly lower false discovery rate. Moreover, we apply our methodology to explore the mediation mechanisms involving over 730 000 DNA methylation sites with potential effects between paternal body mass index (BMI) and the growing BMI of offspring in the Shanghai sleeping birth cohort data, leading to the identification of two previously undiscovered mediating CpG sites.
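The two-stage idea in this abstract (screen, then test each retained mediator's indirect effect) can be sketched as follows. This is a simplified cross-sectional illustration, not the authors' implementation: it swaps ordinary least squares in for the linear mixed-effect model and generalized estimating equation. It also assumes that screening ranks mediators by absolute marginal correlation with the outcome and keeps d = n / log(n) of them. The function names `ols` and `screen_and_test` are hypothetical.

```python
import math

import numpy as np


def ols(X, y):
    """OLS coefficients and standard errors; X must already contain an intercept column."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)
    return beta, np.sqrt(sigma2 * np.diag(xtx_inv))


def screen_and_test(x, M, y, alpha=0.05):
    """Return indices of mediators whose Bonferroni-adjusted Sobel p-value is <= alpha."""
    n, p = M.shape
    d = min(p, int(n / math.log(n)))  # screening size: an assumed, commonly used choice

    # Screening step: rank mediators by absolute marginal correlation with the outcome.
    Mc = M - M.mean(axis=0)
    yc = y - y.mean()
    corr = np.abs(Mc.T @ yc) / (np.linalg.norm(Mc, axis=0) * np.linalg.norm(yc))
    kept = np.argsort(corr)[::-1][:d]

    ones = np.ones(n)
    selected = []
    for j in kept:
        # Path a: exposure -> mediator.
        beta_a, se_a = ols(np.column_stack([ones, x]), M[:, j])
        # Path b: mediator -> outcome, adjusting for the exposure.
        beta_b, se_b = ols(np.column_stack([ones, x, M[:, j]]), y)
        a, sa, b, sb = beta_a[1], se_a[1], beta_b[2], se_b[2]
        # Sobel statistic for the indirect effect a * b.
        z = a * b / math.sqrt(a**2 * sb**2 + b**2 * sa**2)
        p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        if p_value * d <= alpha:  # Bonferroni over the d screened mediators
            selected.append(int(j))
    return sorted(selected)
```

Screening first keeps the Bonferroni penalty proportional to d rather than to the full number of candidate sites. This matters when the candidates number in the hundreds of thousands.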

Citations: 0
CDHu40: a novel marker gene set of neuroendocrine prostate cancer.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae471
Sheng Liu, Hye Seung Nam, Ziyu Zeng, Xuehong Deng, Elnaz Pashaei, Yong Zang, Lei Yang, Chenglong Li, Jiaoti Huang, Michael K Wendt, Xin Lu, Rong Huang, Jun Wan

Prostate cancer (PCa) is the most prevalent cancer affecting American men. Castration-resistant prostate cancer (CRPC) can emerge during hormone therapy for PCa, manifesting with elevated serum prostate-specific antigen levels, continued disease progression, and/or metastasis to new sites, resulting in a poor prognosis. A subset of CRPC patients shows a neuroendocrine (NE) phenotype, signifying reduced or no reliance on androgen receptor signaling and a particularly unfavorable prognosis. In this study, we combined computational approaches based on both gene expression profiles and protein-protein interaction networks. We identified 500 potential marker genes, which are significantly enriched in cell cycle and neuronal processes. The top 40 candidates, collectively named CDHu40, demonstrated superior performance in distinguishing NE PCa (NEPC) from non-NEPC samples based on gene expression profiles. CDHu40 outperformed most of the other published marker sets, excelling particularly at the prognostic level. Notably, some marker genes in CDHu40 that are absent from the other marker sets, such as DDC, FOLH1, BEX1, MAST1, and CACNA1A, have been reported in the literature to be associated with NEPC. Importantly, elevated CDHu40 scores derived from our predictive model correlated robustly with unfavorable survival outcomes, indicating the potential of the CDHu40 score as an indicator for predicting the survival prognosis of patients with the NE phenotype. Motif enrichment analysis on the top candidates suggests that REST and E2F6 may serve as key regulators in NEPC progression.

Citations: 0
Development and experimental validation of computational methods for human antibody affinity enhancement.
IF 6.8 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae488
Junxin Li, Linbu Liao, Chao Zhang, Kaifang Huang, Pengfei Zhang, John Z H Zhang, Xiaochun Wan, Haiping Zhang

High affinity is crucial for the efficacy and specificity of antibodies. Because they rely on high-throughput screening, biological experiments for antibody affinity maturation are time-consuming and have a low success rate. Precise computer-assisted antibody design promises to accelerate this process, but effective computational methods capable of pinpointing beneficial mutations within the complementarity-determining region (CDR) of antibodies are still lacking. Moreover, random mutations often cause problems with antibody expression and immunogenicity. In this study, to enhance the affinity of a human antibody against avian influenza virus, a CDR library was constructed and evolutionary information was acquired through sequence alignment to restrict the mutation positions and types. Concurrently, a statistical potential method based on amino acid interactions between antibodies and antigens was developed to identify candidate affinity-enhanced antibodies, which were then subjected to molecular dynamics simulations. Experimental validation confirmed that, among 10 designs, one point mutation enhanced affinity 2.5-fold, yielding an antibody affinity of 2 nM. A predictive model for antibody-antigen interactions based on the binding interface was also developed, achieving an area under the curve (AUC) of 0.83 and a precision of 0.89 on the test set. Lastly, a novel approach involving combinations of affinity-enhancing mutations and an iterative mutation optimization scheme similar to the Monte Carlo method were proposed. This study presents computational methods that rapidly and accurately enhance antibody affinity while addressing issues related to antibody expression and immunogenicity.
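The abstract mentions an iterative mutation optimization scheme "similar to the Monte Carlo method" but gives no details. The sketch below is therefore a generic Metropolis search over single-point mutations rather than the authors' scheme. It assumes that lower scores are better, that `allowed` plays the role of the evolutionary restriction on positions and residue types, and that `score` is a caller-supplied placeholder for something like a statistical potential. All names are hypothetical.

```python
import math
import random


def metropolis_optimize(seq, allowed, score, steps=2000, temperature=1.0, seed=0):
    """Minimise score(seq) with single-point mutations and Metropolis acceptance.

    allowed maps each mutable position to the residues permitted there;
    positions that are not in allowed are never changed.
    """
    rng = random.Random(seed)
    current, current_score = seq, score(seq)
    best, best_score = current, current_score
    positions = sorted(allowed)
    for _ in range(steps):
        # Propose one substitution at a permitted position.
        pos = rng.choice(positions)
        residue = rng.choice(allowed[pos])
        candidate = current[:pos] + residue + current[pos + 1:]
        delta = score(candidate) - current_score
        # Always accept improvements; accept worse moves with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = candidate, current_score + delta
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score
```

Accepting some unfavorable moves lets the search escape local minima and reach combinations of mutations that are only beneficial together. Lowering `temperature` makes the search greedier.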

Citations: 0