Predicting functional outcome in ischemic stroke patients using genetic, environmental, and clinical factors: a machine learning analysis of population-based prospective cohort study.
Authors: Siding Chen, Zhe Xu, Jinfeng Yin, Hongqiu Gu, Yanfeng Shi, Cang Guo, Xia Meng, Hao Li, Xinying Huang, Yong Jiang, Yongjun Wang
Journal: Briefings in Bioinformatics
Publication date: 2024-09-23
Publication type: Journal Article
DOI: 10.1093/bib/bbae487 (https://doi.org/10.1093/bib/bbae487)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11471838/pdf/
Impact factor: 6.8
JCR: Q1 (Biochemical Research Methods)
Region: 2 (Biology)
Citations: 0
Abstract
Ischemic stroke (IS) is a leading cause of adult disability and can severely compromise patients' quality of life. Accurately predicting functional outcome after IS is crucial for precise risk stratification and effective therapeutic intervention. We developed a predictive model integrating genetic, environmental, and clinical factors using data from 7819 IS patients in the Third China National Stroke Registry. The dataset was randomly divided 80:20 into development and internal validation cohorts. Model performance was evaluated in the internal validation cohort, using the area under the receiver operating characteristic curve (AUC) for discrimination and the Brier score with calibration curves for calibration. We conducted a genome-wide association study (GWAS) in the development cohort and identified rs11109607 (ANKS1B) as the variant most significantly associated with IS functional outcome. We applied principal component analysis to reduce the dimensionality of the top 100 significant variants identified by the GWAS and incorporated them as genetic factors in the predictive model. To capture nonlinear relationships, we used a machine learning algorithm to build predictive models of functional outcome in IS patients. The best-performing model was XGBoost, which outperformed logistic regression (AUC 0.818 versus 0.756, P < .05) and significantly improved reclassification. By combining genetic, environmental, and clinical factors to predict IS functional outcome in East Asian populations, this study offers new insight into IS functional outcome.
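The evaluation setup named in the abstract (a random 80:20 split, AUC for discrimination, Brier score for calibration) can be sketched in plain Python. This is a minimal illustration and not the authors' code: the function names `split_80_20`, `auc`, and `brier_score`, the fixed seed, and the toy labels and probabilities are all assumptions made for the example, and the study's actual rounding of the split is not stated in the abstract.

```python
import random


def split_80_20(records, seed=0):
    """Shuffle a copy of `records` and split it 80:20.

    Returns (development, validation). The seed and the rounding rule
    are assumptions for this sketch.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.8 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def auc(y_true, y_score):
    """Area under the ROC curve, computed as the probability that a
    randomly chosen positive case is scored above a randomly chosen
    negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the
    observed 0/1 outcome; lower values indicate better calibration."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    labels = [0, 0, 1, 1]
    probs = [0.1, 0.4, 0.35, 0.8]
    print("AUC:", auc(labels, probs))
    print("Brier score:", brier_score(labels, probs))

    development, validation = split_80_20(range(7819))
    print(len(development), len(validation))
```

The pairwise AUC is quadratic in the number of cases, which is fine for a sketch; on a cohort of this size a rank-based implementation from an established library would be the more practical choice.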
About the journal:
Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data.
The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.