{"title":"MDVarP: modifier ~ disease-causing variant pairs predictor.","authors":"Hong Sun, Yunqin Chen, Liangxiao Ma","doi":"10.1186/s13040-024-00392-y","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Modifiers significantly impact disease phenotypes by modulating the effects of disease-causing variants, resulting in varying disease manifestations among individuals. However, identifying genetic interactions between modifier and disease-causing variants is challenging.</p><p><strong>Results: </strong>We developed MDVarP, an ensemble model comprising 1000 random forest predictors, to identify modifier ~ disease-causing variant combinations. MDVarP achieves high accuracy and precision, as verified using an independent dataset with published evidence of genetic interactions. We identified 25 novel modifier ~ disease-causing variant combinations and obtained supporting evidence for these associations. MDVarP outputs a class label (\"Associated-pair\" or \"Nonrelevant-pair\") and two prediction scores indicating the probability of a true association.</p><p><strong>Conclusions: </strong>MDVarP prioritizes variant pairs associated with phenotypic modulations, enabling more effective mapping of functional contributions from disease-causing and modifier variants. This framework interprets genetic interactions underlying phenotypic variations in human diseases, with potential applications in personalized medicine and disease prevention.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"17 1","pages":"39"},"PeriodicalIF":4.0000,"publicationDate":"2024-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11460193/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biodata Mining","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s13040-024-00392-y","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Background: Modifiers significantly impact disease phenotypes by modulating the effects of disease-causing variants, resulting in varying disease manifestations among individuals. However, identifying genetic interactions between modifier and disease-causing variants is challenging.
Results: We developed MDVarP, an ensemble model comprising 1000 random forest predictors, to identify modifier ~ disease-causing variant combinations. MDVarP achieves high accuracy and precision, as verified using an independent dataset with published evidence of genetic interactions. We identified 25 novel modifier ~ disease-causing variant combinations and obtained supporting evidence for these associations. MDVarP outputs a class label ("Associated-pair" or "Nonrelevant-pair") and two prediction scores indicating the probability of a true association.
Conclusions: MDVarP prioritizes variant pairs associated with phenotypic modulations, enabling more effective mapping of functional contributions from disease-causing and modifier variants. This framework interprets genetic interactions underlying phenotypic variations in human diseases, with potential applications in personalized medicine and disease prevention.
期刊介绍:
BioData Mining is an open access, open peer-reviewed journal encompassing research on all aspects of data mining applied to high-dimensional biological and biomedical data, focusing on computational aspects of knowledge discovery from large-scale genetic, transcriptomic, genomic, proteomic, and metabolomic data.
Topical areas include, but are not limited to:
-Development, evaluation, and application of novel data mining and machine learning algorithms.
-Adaptation, evaluation, and application of traditional data mining and machine learning algorithms.
-Open-source software for the application of data mining and machine learning algorithms.
-Design, development and integration of databases, software and web services for the storage, management, retrieval, and analysis of data from large scale studies.
-Pre-processing, post-processing, modeling, and interpretation of data mining and machine learning results for biological interpretation and knowledge discovery.