{"title":"PairK: Pairwise k-mer alignment for quantifying protein motif conservation in disordered regions.","authors":"Jackson C Halpin, Amy E Keating","doi":"10.1002/pro.70004","DOIUrl":null,"url":null,"abstract":"<p><p>Protein-protein interactions are often mediated by a modular peptide recognition domain binding to a short linear motif (SLiM) in the disordered region of another protein. To understand the features of SLiMs that are important for binding and to identify motif instances that are important for biological function, it is useful to examine the evolutionary conservation of motifs across homologous proteins. However, the intrinsically disordered regions (IDRs) in which SLiMs reside evolve rapidly. Consequently, multiple sequence alignment (MSA) of IDRs often misaligns SLiMs and underestimates their conservation. We present PairK (pairwise k-mer alignment), an MSA-free method to align and quantify the relative local conservation of subsequences within an IDR. Lacking a ground truth for conservation, we tested PairK on the task of distinguishing biologically important motif instances from background motifs, under the assumption that biologically important motifs are more conserved. The method outperforms both standard MSA-based conservation scores and a modern LLM-based conservation score predictor. PairK can quantify conservation over wider phylogenetic distances than MSAs, indicating that some SLiMs are more conserved than MSA-based metrics imply. PairK is available as an open-source python package at https://github.com/jacksonh1/pairk. It is designed to be easily adapted for use with other SLiM tools and for diverse applications.</p>","PeriodicalId":20761,"journal":{"name":"Protein Science","volume":"34 1","pages":"e70004"},"PeriodicalIF":4.5000,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11669117/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Protein Science","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1002/pro.70004","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Protein-protein interactions are often mediated by a modular peptide recognition domain binding to a short linear motif (SLiM) in the disordered region of another protein. To understand the features of SLiMs that are important for binding and to identify motif instances that are important for biological function, it is useful to examine the evolutionary conservation of motifs across homologous proteins. However, the intrinsically disordered regions (IDRs) in which SLiMs reside evolve rapidly. Consequently, multiple sequence alignment (MSA) of IDRs often misaligns SLiMs and underestimates their conservation. We present PairK (pairwise k-mer alignment), an MSA-free method to align and quantify the relative local conservation of subsequences within an IDR. Lacking a ground truth for conservation, we tested PairK on the task of distinguishing biologically important motif instances from background motifs, under the assumption that biologically important motifs are more conserved. The method outperforms both standard MSA-based conservation scores and a modern LLM-based conservation score predictor. PairK can quantify conservation over wider phylogenetic distances than MSAs, indicating that some SLiMs are more conserved than MSA-based metrics imply. PairK is available as an open-source python package at https://github.com/jacksonh1/pairk. It is designed to be easily adapted for use with other SLiM tools and for diverse applications.
期刊介绍:
Protein Science, the flagship journal of The Protein Society, is a publication that focuses on advancing fundamental knowledge in the field of protein molecules. The journal welcomes original reports and review articles that contribute to our understanding of protein function, structure, folding, design, and evolution.
Additionally, Protein Science encourages papers that explore the applications of protein science in various areas such as therapeutics, protein-based biomaterials, bionanotechnology, synthetic biology, and bioelectronics.
The journal accepts manuscript submissions in any suitable format for review, with the requirement of converting the manuscript to journal-style format only upon acceptance for publication.
Protein Science is indexed and abstracted in numerous databases, including the Agricultural & Environmental Science Database (ProQuest), Biological Science Database (ProQuest), CAS: Chemical Abstracts Service (ACS), Embase (Elsevier), Health & Medical Collection (ProQuest), Health Research Premium Collection (ProQuest), Materials Science & Engineering Database (ProQuest), MEDLINE/PubMed (NLM), Natural Science Collection (ProQuest), and SciTech Premium Collection (ProQuest).