
Frontiers in Bioinformatics: latest articles

A breast cancer-specific combinational QSAR model development using machine learning and deep learning approaches.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-01-15 eCollection Date: 2023-01-01 DOI: 10.3389/fbinf.2023.1328262
Anush Karampuri, Shyam Perugu

Breast cancer is the most prevalent and heterogeneous form of cancer affecting women worldwide. Various therapeutic strategies are in practice, chosen according to the extent of disease spread, such as surgery, chemotherapy, radiotherapy, and immunotherapy. Combinational therapy is another strategy that has proven effective in controlling cancer progression. It pairs an anchor drug, a well-established primary therapeutic agent with known efficacy for specific targets, with a library drug, a supplementary drug that enhances the efficacy of the anchor drug and broadens the therapeutic approach. Our work focused on harnessing regression-based machine learning (ML) and deep learning (DL) algorithms to develop a structure-activity relationship between the molecular descriptors of drug pairs and their combined biological activity through a quantitative structure-activity relationship (QSAR) model. Eleven widely used ML and DL algorithms were used to develop QSAR models. A total of 52 breast cancer cell lines, 25 anchor drugs, and 51 library drugs were considered in developing the QSAR model. Deep neural networks (DNNs) achieved a coefficient of determination (R²) of 0.94 with a root mean square error (RMSE) of 0.255, making them the most effective of the 11 algorithms for developing a structure-activity relationship with strong generalization capabilities. In conclusion, applying combinational therapy alongside ML and DL techniques represents a promising approach to combating breast cancer.
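The two metrics quoted in this abstract, R² and RMSE, can be computed from paired observed and predicted values as in the minimal sketch below. The activity values and the function name `r2_and_rmse` are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def r2_and_rmse(y_true, y_pred):
    """Return (R^2, RMSE) for paired observed and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    return r2, rmse


# Hypothetical observed and predicted activities (illustration only).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2, rmse = r2_and_rmse(observed, predicted)
print(f"R2 = {r2:.3f}, RMSE = {rmse:.3f}")
```

R² is unitless, while RMSE is expressed in the units of the response variable, so an RMSE such as 0.255 is only interpretable relative to the scale of the activity measure used.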

Citations: 0
BPAGS: a web application for bacteriocin prediction via feature evaluation using alternating decision tree, genetic algorithm, and linear support vector classifier
Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-01-10 DOI: 10.3389/fbinf.2023.1284705
Suraiya Akhter, John H. Miller
The use of bacteriocins has emerged as a promising strategy in the development of new drugs to combat antibiotic resistance, given their ability to kill bacteria with both broad and narrow natural spectra. Hence, there is a compelling need for a precise and efficient computational model that can accurately predict novel bacteriocins. Machine learning can learn patterns and features from bacteriocin sequences that are difficult to capture using sequence matching-based methods, which makes it a potentially superior choice for accurate prediction. In this study, we created a web application for predicting bacteriocins using a machine learning approach. The feature sets employed in the application were chosen using feature evaluation methods based on alternating decision tree (ADTree), genetic algorithm (GA), and linear support vector classifier (linear SVC). Initially, potential features were extracted from the physicochemical, structural, and sequence-profile attributes of both bacteriocin and non-bacteriocin protein sequences. We assessed the candidate features first using the Pearson correlation coefficient, followed by separate evaluations with ADTree, GA, and linear SVC to eliminate unnecessary features. Finally, we constructed random forest (RF), support vector machine (SVM), decision tree (DT), logistic regression (LR), k-nearest neighbors (KNN), and Gaussian naïve Bayes (GNB) models using the reduced feature sets. The overall top-performing model was an SVM with ADTree-reduced features, which achieved an accuracy of 99.11% and an AUC of 0.9984 on the testing dataset. We also assessed the predictive capabilities of our best-performing models for each reduced feature set relative to our previously developed software solution, a sequence alignment-based tool, and a deep-learning approach. A web application, titled BPAGS (Bacteriocin Prediction based on ADTree, GA, and linear SVC), was developed to incorporate the predictive models built using ADTree, GA, and linear SVC-based feature sets. Currently, the web-based tool provides classification results with associated probability values and has options to add new samples to the training data to improve the predictive efficacy. BPAGS is freely accessible at https://shiny.tricities.wsu.edu/bacteriocin-prediction/.
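The first screening step mentioned in this abstract, assessing candidate features with the Pearson correlation coefficient, might look like the sketch below. The abstract does not say how the coefficient was applied, so this assumes one common use: dropping a feature when it is highly correlated with a feature already kept. The 0.9 threshold, the feature names, and the values are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (non-zero variance).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def drop_correlated(features, threshold=0.9):
    """Greedily keep features whose |r| with every kept feature is <= threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical per-sequence feature values (illustration only).
features = {
    "hydrophobicity": [0.1, 0.4, 0.35, 0.8, 0.6],
    "hydrophobicity_scaled": [1.0, 4.0, 3.5, 8.0, 6.0],  # redundant copy of the first
    "net_charge": [2.0, -1.0, 3.0, 0.0, 1.0],
}
selected = drop_correlated(features)
print(selected)
```

The greedy pass depends on feature order: the first member of a correlated pair is the one retained. The ADTree, GA, and linear SVC evaluations described in the abstract would then operate on the surviving features.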
Citations: 0
No-boundary thinking for artificial intelligence in bioinformatics and education
Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-01-08 DOI: 10.3389/fbinf.2023.1332902
Prajay Patel, Nisha Pillai, Inimary T. Toby
No-boundary thinking enables the scientific community to reflect thoughtfully, discover new opportunities, create innovative solutions, and break through barriers that might otherwise have constrained its progress. The concept encourages thinking that is not confined by traditional rules, limitations, or established norms, and a mindset that is not limited by previous work, leading to fresh perspectives and innovative outcomes. So, where do we see the field of artificial intelligence (AI) in bioinformatics going in the next 30 years? That was the theme of a “No-Boundary Thinking” session held as part of the Mid-South Computational Bioinformatics Society’s (MCBIOS) 19th annual meeting in Irving, Texas. The session addressed various areas of AI in an open discussion and raised perspectives on how popular tools like ChatGPT can be integrated into bioinformatics, how to communicate with scientists in different fields to properly utilize the potential of these algorithms, and how to continue educational outreach to foster interest in data science and informatics among the next generation of scientists.
Citations: 0
Protein-lipid interactions and protein anchoring modulate the modes of association of the globular domain of the Prion protein and Doppel protein to model membrane patches
Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-01-05 DOI: 10.3389/fbinf.2023.1321287
Patricia Soto, Davis T. Thalhuber, Frank Luceri, Jamie Janos, Mason R. Borgman, Noah M. Greenwood, Sofia Acosta, Hunter Stoffel
The Prion protein is the molecular hallmark of the incurable prion diseases affecting mammals, including humans. The protein-only hypothesis states that the misfolding, accumulation, and deposition of the Prion protein play a critical role in toxicity. The cellular Prion protein (PrPC) anchors to the extracellular leaflet of the plasma membrane and prefers cholesterol- and sphingomyelin-rich membrane domains. Conformational conversion of the Prion protein into the pathological isoform happens on the cell surface. In vitro and in vivo experiments indicate that Prion protein misfolding, aggregation, and toxicity are sensitive to the lipid composition of plasma membranes and vesicles. A picture of the underlying biophysical driving forces that explain the effect of Prion protein-lipid interactions in physiological conditions is needed to develop a structural model of Prion protein conformational conversion. To this end, we use molecular dynamics simulations that mimic the interactions between the globular domain of PrPC and the model membrane patches to which it is anchored. We also simulate the Doppel protein anchored to such membrane patches. The Doppel protein is the closest in the phylogenetic tree to PrPC, localizes in an extracellular milieu similar to that of PrPC, and exhibits a similar topology to PrPC even though the amino acid sequence is only 25% identical. Our simulations show that specific protein-lipid interactions and conformational constraints imposed by GPI anchoring together favor specific binding sites in globular PrPC but not in Doppel. Interestingly, the binding sites we found in PrPC correspond to prion protein loops that are critical in aggregation and the prion disease transmission barrier (β2-α2 loop) and in initial spontaneous misfolding (α2-α3 loop). We also found that the membrane rearranges locally to accommodate protein residues inserted in the membrane surface as a response to protein binding.
Citations: 0
RxNorm for drug name normalization: a case study of prescription opioids in the FDA adverse events reporting system
Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-01-05 DOI: 10.3389/fbinf.2023.1328613
Huyen Le, Ru Chen, Stephen Harris, Hong Fang, Beverly Lyn-Cook, H. Hong, W. Ge, Paul Rogers, Weida Tong, Wen Zou
Numerous studies have been conducted on the US Food and Drug Administration (FDA) Adverse Events Reporting System (FAERS) database to assess post-marketing reporting rates for drug safety review and risk assessment. However, the drug names in the adverse event (AE) reports from FAERS are heterogeneous because the information submitted mandatorily by pharmaceutical companies and voluntarily by patients, healthcare professionals, and the public lacks uniformity. Studies that use FAERS and other spontaneous AE reporting databases without drug name normalization may miss AE reports filed under non-standard drug names, and the accuracy of their results might be affected. In this study, we demonstrated the applicability of RxNorm, developed by the National Library of Medicine, for drug name normalization in FAERS. Using prescription opioids as a case study, we used the RxNorm application program interface (API) to map all FDA-approved prescription opioids described in FAERS AE reports to their equivalent RxNorm Concept Unique Identifiers (RxCUIs) and RxNorm names. The different names of the opioids were then extracted, and their usage frequencies were calculated in a collection of more than 14.9 million AE reports for 13 FDA-approved prescription opioid classes, reported over 17 years. The results showed that a significant number of different names were consistently used for opioids in FAERS reports, with 2,086 different names (out of 7,892) used at least three times and 842 different names used at least ten times for each of the 92 RxNorm names of FDA-approved opioids. Our method of using RxNorm API mapping was confirmed to be efficient and accurate and capable of significantly reducing the heterogeneity of prescription opioid names in the AE reports in FAERS; it is also expected to apply broadly to sets of drug names from any database where drug names are diverse and unnormalized. It is expected to be able to automatically standardize and link different representations of the same drugs to build an intact and high-quality database for diverse research, particularly postmarketing data analysis in pharmacovigilance initiatives.
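The counting step described in this abstract (map each reported name to a normalized name, then tally how often each variant occurs) can be sketched as follows. The lookup table `NAME_TO_NORMALIZED` is a hypothetical stand-in for the result of the RxNorm API mapping, which is not reproduced here; the report list is invented, and the thresholds mirror the "at least three times" style of filter quoted above.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for the mapping a name-normalization service would return.
NAME_TO_NORMALIZED = {
    "oxycontin": "oxycodone",
    "oxycodone hcl": "oxycodone",
    "oxycodone": "oxycodone",
    "ms contin": "morphine",
}


def clean(raw_name):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(raw_name.lower().split())


def variant_counts(raw_names):
    """Count how often each cleaned variant is used, grouped by normalized name."""
    counts = defaultdict(Counter)
    for raw in raw_names:
        key = clean(raw)
        normalized = NAME_TO_NORMALIZED.get(key)
        if normalized is not None:  # unmapped names are skipped in this sketch
            counts[normalized][key] += 1
    return counts


def frequent_variants(counts, min_uses):
    """For each normalized name, list the variants used at least min_uses times."""
    return {
        normalized: sorted(v for v, c in variants.items() if c >= min_uses)
        for normalized, variants in counts.items()
    }


# Invented drug-name fields from AE reports (illustration only).
reports = ["OxyContin", "oxycontin ", "OXYCODONE HCL", "Oxycodone",
           "OxyContin", "MS Contin", "unknown drug"]
counts = variant_counts(reports)
frequent = frequent_variants(counts, min_uses=3)
print(frequent)
```

In a real pipeline the unmapped names would be worth logging rather than silently skipped, since they are exactly the reports that a search on standard names would miss.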
Citations: 0
Editorial: Expert opinions in protein bioinformatics: 2022
Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-01-05 DOI: 10.3389/fbinf.2023.1338560
Daisuke Kihara
Citations: 0
Genomic risk prediction of cardiovascular diseases among type 2 diabetes patients in the UK Biobank
Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-01-04 DOI: 10.3389/fbinf.2023.1320748
Yixuan Ye, Jiaqi Hu, Fuyuan Pang, Can Cui, Hongyu Zhao
Background: Polygenic risk score (PRS) has proved useful in predicting the risk of cardiovascular diseases (CVD) based on the genotypes of an individual, but most analyses have focused on disease onset in the general population. The usefulness of PRS to predict CVD risk among type 2 diabetes (T2D) patients remains unclear. Methods: We built a meta-PRSCVD upon the candidate PRSs developed from state-of-the-art PRS methods for three CVD subtypes of significant importance: coronary artery disease (CAD), ischemic stroke (IS), and heart failure (HF). To evaluate the prediction performance of the meta-PRSCVD, we restricted our analysis to 21,092 white British T2D patients in the UK Biobank, among whom 4,015 had CVD events. Results: The meta-PRSCVD was significantly associated with CVD risk, with a hazard ratio per standard deviation increase of 1.28 (95% CI: 1.23–1.33). The meta-PRSCVD alone predicted CVD incidence with an area under the receiver operating characteristic curve (AUC) of 0.57 (95% CI: 0.54–0.59). When the analysis was restricted to early-onset patients (onset age ≤ 55), the AUC increased to 0.61 (95% CI: 0.56–0.67). Conclusion: Our results highlight the potential role of genomic screening for secondary prevention of CVD among T2D patients, especially among early-onset patients.
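To make the reported AUC values concrete, the sketch below computes AUC directly from its probabilistic definition: the chance that a randomly chosen patient with an event has a higher score than a randomly chosen patient without one, with ties counted as one half. The scores and labels are invented for illustration and the function name `auc` is hypothetical; this is not the study's evaluation code.

```python
def auc(scores, labels):
    """AUC as P(score of a case > score of a non-case), ties counted as 0.5."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Invented risk scores and event indicators (1 = CVD event), illustration only.
scores = [0.9, 0.4, 0.7, 0.2, 0.5, 0.1]
labels = [1, 1, 0, 0, 1, 0]
value = auc(scores, labels)
print(f"AUC = {value:.3f}")
```

Under this definition an AUC of 0.5 corresponds to a score that ranks cases and non-cases no better than chance, which gives a reference point for reading the 0.57 and 0.61 reported above. The pairwise loop is quadratic in sample size, so a rank-based formula would be preferable for a cohort of tens of thousands.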
Citations: 0
Identifying epigenetic aging moderators using the epigenetic pacemaker
Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-01-03 DOI: 10.3389/fbinf.2023.1308680
Colin Farrell, Chanyue Hu, Kalsuda Lapborisuth, Kyle Pu, S. Snir, Matteo Pellegrini
Epigenetic clocks are DNA methylation-based chronological age prediction models that are commonly employed to study age-related biology. The difference between the predicted and observed age is often interpreted as a form of biological age acceleration, and many studies have measured the impact of environmental and disease-associated factors on epigenetic age. Most epigenetic clocks are fit using approaches that minimize the error between the predicted and observed chronological age, and as a result, they may not accurately model the impact of factors that moderate the relationship between the actual and epigenetic age. Here, we compare epigenetic clocks that are constructed using penalized regression methods to an evolutionary framework of epigenetic aging with the epigenetic pacemaker (EPM), which directly models DNA methylation as a function of a time-dependent epigenetic state. In simulations, we show that the value of the epigenetic state is impacted by factors such as age, sex, and cell-type composition. Next, in a dataset aggregated from previous studies, we show that the epigenetic state is also moderated by sex and the cell type. Finally, we demonstrate that the epigenetic state is also moderated by toxins in a study on polybrominated biphenyl exposure. Thus, we find that the pacemaker provides a robust framework for the study of factors that impact epigenetic age acceleration and that the effect of these factors may be obscured in traditional clocks based on linear regression models.
Citations: 0
Improving fluorescence lifetime imaging microscopy phasor accuracy using convolutional neural networks
Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2023-12-22 DOI: 10.3389/fbinf.2023.1335413
Varun Mannam, Jacob P. Brandt, Cody J. Smith, Xiaotong Yuan, S. Howard
Introduction: Although a powerful biological imaging technique, fluorescence lifetime imaging microscopy (FLIM) faces challenges such as a slow acquisition rate, a low signal-to-noise ratio (SNR), and high cost and complexity. To address the fundamental problem of low SNR in FLIM images, we demonstrate how to use pre-trained convolutional neural networks (CNNs) to reduce noise in FLIM measurements. Methods: Our approach uses pre-learned models that have been previously validated on large datasets with different distributions than the training datasets, such as sample structures, noise distributions, and microscopy modalities in fluorescence microscopy, to eliminate the need to train a neural network from scratch or to acquire a large training dataset to denoise FLIM data. In addition, we use the pre-trained networks in the inference stage, where the computation time is in milliseconds and accuracy is better than that of traditional denoising methods. To separate different fluorophores in lifetime images, the denoised images are then run through an unsupervised machine learning technique named "K-means clustering". Results and Discussion: The results of the experiments carried out on in vivo mouse kidney tissue, fluorescently labeled bovine pulmonary artery endothelial (BPAE) fixed cells, and fluorescently labeled mouse kidney fixed samples show that our demonstrated method can effectively remove noise from FLIM images and improve segmentation accuracy. Additionally, the performance of our method on out-of-distribution, highly scattering in vivo plant samples shows that it can also improve SNR in challenging imaging conditions. Our proposed method provides a fast and accurate way to segment fluorescence lifetime images captured using any FLIM system. It is especially effective for separating fluorophores in noisy FLIM images, which are common in in vivo imaging where averaging is not applicable. Our approach significantly improves the identification of vital biologically relevant structures in biomedical imaging applications.
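The clustering step named in the abstract can be sketched generically. The example below runs plain K-means (Lloyd's algorithm) on a handful of two-dimensional points standing in for per-pixel phasor coordinates. The function name `kmeans`, the toy coordinates, and the choice of initial centroids are assumptions made for illustration; the abstract does not say how the authors configured or initialized their clustering.

```python
import math


def kmeans(points, centroids, n_iter=20):
    """Plain Lloyd's algorithm on 2-D points with caller-supplied
    initial centroids. Returns (assignment, centroids)."""
    centroids = [tuple(c) for c in centroids]
    assignment = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return assignment, centroids


# Hypothetical phasor coordinates for six pixels, two fluorophore groups.
points = [(0.80, 0.40), (0.82, 0.38), (0.78, 0.41),
          (0.30, 0.45), (0.28, 0.46), (0.33, 0.44)]
pixel_labels, centers = kmeans(points, centroids=[points[0], points[3]])
print(pixel_labels)
```

Noise spreads each fluorophore's points into a wider cloud, which is why denoising before clustering can be expected to help the separation.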
Citations: 0
Attention network for predicting T-cell receptor-peptide binding can associate attention with interpretable protein structural properties.
Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2023-12-18 eCollection Date: 2023-01-01 DOI: 10.3389/fbinf.2023.1274599
Kyohei Koyama, Kosuke Hashimoto, Chioko Nagao, Kenji Mizuguchi

Understanding how a T-cell receptor (TCR) recognizes its specific ligand peptide is crucial for gaining an insight into biological functions and disease mechanisms. Despite its importance, experimentally determining TCR-peptide-major histocompatibility complex (TCR-pMHC) interactions is expensive and time-consuming. To address this challenge, computational methods have been proposed, but they are typically evaluated by internal retrospective validation only, and few researchers have incorporated and tested an attention layer from language models into structural information. Therefore, in this study, we developed a machine learning model based on a modified version of Transformer, a source-target attention neural network, to predict the TCR-pMHC interaction solely from the amino acid sequences of the TCR complementarity-determining region (CDR) 3 and the peptide. This model achieved competitive performance on a benchmark dataset of the TCR-pMHC interaction, as well as on a truly new external dataset. Additionally, by analyzing the results of binding predictions, we associated the neural network weights with protein structural properties. By classifying the residues into large- and small-attention groups, we identified statistically significant properties associated with the largely attended residues such as hydrogen bonds within CDR3. The dataset that we created and the ability of our model to provide an interpretable prediction of TCR-peptide binding should increase our knowledge about molecular recognition and pave the way for designing new therapeutics.
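As a rough illustration of what a source-target (cross) attention layer computes, the sketch below derives scaled dot-product attention weights between two sequences of residue embeddings, so that each row shows how strongly one residue of the first sequence attends to each residue of the second. The function names, the tiny 2-dimensional embeddings, and the pairing of CDR3 residues as queries with peptide residues as keys are all assumptions; the abstract does not describe the authors' modified Transformer at this level of detail.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_weights(queries, keys):
    """Scaled dot-product attention weights: one row per query vector,
    one column per key vector, each row summing to 1."""
    d = len(keys[0])
    return [
        softmax([
            sum(q_i * k_i for q_i, k_i in zip(q, k)) / math.sqrt(d)
            for k in keys
        ])
        for q in queries
    ]


# Hypothetical 2-D embeddings: three CDR3 residues, two peptide residues.
cdr3 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
peptide = [[1.0, 0.0], [0.0, 1.0]]
weights = cross_attention_weights(cdr3, peptide)
for row in weights:
    print([round(w, 3) for w in row])
```

Residues can then be grouped by the size of the weight they receive, which is the kind of large- versus small-attention split the abstract describes.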

Citations: 0