Qisheng Pan, Georgina Becerra Parra, Yoochan Myung, Stephanie Portelli, Thanh Binh Nguyen, David B. Ascher
AlzDiscovery: A computational tool to identify Alzheimer's disease‐causing missense mutations using protein structure information
Protein Science. Journal Article. Published 2024-09-14. DOI: 10.1002/pro.5147 (https://doi.org/10.1002/pro.5147)
Impact factor: 4.5 (JCR Q1, Biochemistry & Molecular Biology)
Citations: 0
Abstract
Alzheimer's disease (AD) is one of the most common forms of dementia and neurodegenerative disease, characterized by the formation of neuritic plaques and neurofibrillary tangles. Many different proteins participate in this complex pathogenic mechanism, and missense mutations can alter the folding and function of these proteins, significantly increasing the risk of AD. However, many methods for identifying AD‐causing variants do not consider the effect of mutations on the three‐dimensional environment of the protein. Here, we present a machine learning‐based analysis that distinguishes AD‐causing mutations from their benign counterparts in 21 AD‐related proteins, leveraging both sequence‐ and structure‐based features. Using computational tools to estimate the effect of mutations on protein stability, we first observed that pathogenic mutations in familial AD‐related proteins were biased toward significantly destabilizing effects. Building on this insight, we built a generic predictive model and improved its performance by tuning the sample weights during training. Our final model achieved an area under the receiver operating characteristic curve of up to 0.95 on the blind test and 0.70 in an independent clinical validation, outperforming all state‐of‐the‐art methods. Feature interpretation indicated that the hydrophobic environment and polar interaction contacts were crucial in determining the pathogenic phenotype of missense mutations. Finally, we present a user‐friendly web server, AlzDiscovery, that lets researchers browse the predicted phenotypes of all possible missense mutations in these 21 AD‐related proteins. Our study will be a valuable resource for AD screening and the development of personalized treatment.
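The abstract mentions two generic ingredients, sample weights tuned during training and the area under the ROC curve as the evaluation metric. The sketch below illustrates both in plain Python. It is not the authors' pipeline: the abstract does not state how the weights were chosen, so inverse class frequency weighting is an assumption here, and the function names, labels, and scores are hypothetical, invented only for illustration.

```python
from collections import Counter


def balanced_sample_weights(labels):
    """Weight each sample inversely to its class frequency.

    Assumed scheme for illustration; the source does not specify one.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return [n_samples / (n_classes * counts[y]) for y in labels]


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs in which the
    positive sample gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented): 1 = pathogenic, 0 = benign.
labels = [1, 0, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1]

weights = balanced_sample_weights(labels)  # minority class gets larger weights
auc = roc_auc(labels, scores)
print(weights, auc)
```

In practice such weights would be passed to the training routine of whichever classifier is used, so that the rarer class contributes as much to the loss as the more common one. The pairwise AUC shown here is quadratic in the number of samples, which is fine for small sets but slow for large ones.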
About the journal:
Protein Science, the flagship journal of The Protein Society, is a publication that focuses on advancing fundamental knowledge in the field of protein molecules. The journal welcomes original reports and review articles that contribute to our understanding of protein function, structure, folding, design, and evolution.
Additionally, Protein Science encourages papers that explore the applications of protein science in various areas such as therapeutics, protein-based biomaterials, bionanotechnology, synthetic biology, and bioelectronics.
The journal accepts manuscript submissions in any suitable format for review; conversion to journal-style format is required only upon acceptance for publication.
Protein Science is indexed and abstracted in numerous databases, including the Agricultural & Environmental Science Database (ProQuest), Biological Science Database (ProQuest), CAS: Chemical Abstracts Service (ACS), Embase (Elsevier), Health & Medical Collection (ProQuest), Health Research Premium Collection (ProQuest), Materials Science & Engineering Database (ProQuest), MEDLINE/PubMed (NLM), Natural Science Collection (ProQuest), and SciTech Premium Collection (ProQuest).