Supervised Machine Learning Classifies Inflammatory Bowel Disease Patients by Subtype Using Whole Exome Sequencing Data.

Journal of Crohns & Colitis | Pub Date: 2023-11-08 | DOI: 10.1093/ecco-jcc/jjad084 | IF 8.3 | Zone 2 (Medicine) | JCR Q1 (Gastroenterology & Hepatology)
Imogen S Stafford, James J Ashton, Enrico Mossotto, Guo Cheng, Robert Mark Beattie, Sarah Ennis
Pages: 1672-1680 | Publication type: Journal Article | Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10637043/pdf/ | https://doi.org/10.1093/ecco-jcc/jjad084
Citations: 0

Supervised Machine Learning Classifies Inflammatory Bowel Disease Patients by Subtype Using Whole Exome Sequencing Data.

Background: Inflammatory bowel disease [IBD] is a chronic inflammatory disorder with two main subtypes: Crohn's disease [CD] and ulcerative colitis [UC]. Prompt subtype diagnosis enables the correct treatment to be administered. Using genomic data, we aimed to assess machine learning [ML] to classify patients according to IBD subtype.

Methods: Whole exome sequencing [WES] from paediatric/adult IBD patients was processed using an in-house bioinformatics pipeline. These data were condensed into the per-gene, per-individual genomic burden score, GenePy. Data were split into training and testing datasets [80/20]. Feature selection with a linear support vector classifier, and hyperparameter tuning with Bayesian Optimisation, were performed [training data]. The supervised ML method random forest was utilised to classify patients as CD or UC, using three panels: 1] all available genes; 2] autoimmune genes; 3] 'IBD' genes. ML results were assessed using area under the receiver operating characteristics curve [AUROC], sensitivity, and specificity on the testing dataset.
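The three evaluation measures named in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the label coding (1 = CD, 0 = UC), the 0.5 decision threshold, and the toy scores are all assumptions made for illustration. AUROC is computed here as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0  # threshold is an assumption
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1 and predicted == 0:
            fn += 1
        elif y == 0 and predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy data only (hypothetical): 1 = CD, 0 = UC; scores are made-up probabilities of CD.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]

toy_auroc = auroc(toy_labels, toy_scores)
toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_scores)
```

The pairwise form is quadratic in the number of cases, which is acceptable for a test set of this size; a rank-based computation gives the same value more efficiently.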

Results: A total of 906 patients were included in analysis [600 CD, 306 UC]. Training data included 488 patients, balanced according to the minority class of UC. The autoimmune gene panel generated the best performing ML model [AUROC = 0.68], outperforming an IBD gene panel [AUROC = 0.61]. NOD2 was the top gene for discriminating CD and UC, regardless of the gene panel used. Lack of variation in genes with high GenePy scores in CD patients was the best classifier of a diagnosis of UC.
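The abstract does not say how the training set was balanced. One reading that is consistent with the reported 488 training patients is a per-class 80/20 split followed by downsampling CD to the UC count. The sketch below works through that arithmetic; the rounding-down rule, the function names, and the random downsampling are assumptions, not the authors' procedure.

```python
import random


def balanced_training_size(n_cd, n_uc, train_percent=80):
    """Per-class training count after an 80/20 split, then balance to the smaller class."""
    train_cd = n_cd * train_percent // 100  # integer floor; rounding rule is an assumption
    train_uc = n_uc * train_percent // 100
    per_class = min(train_cd, train_uc)
    return per_class, 2 * per_class


def downsample(patient_ids, k, seed=0):
    """Randomly keep k patients from the majority class (hypothetical helper)."""
    return random.Random(seed).sample(patient_ids, k)


# Cohort sizes reported in the Results: 600 CD, 306 UC.
per_class, total = balanced_training_size(600, 306)

# Hypothetical IDs for the 480 CD patients that an 80% split would leave in training.
cd_training_ids = list(range(480))
cd_kept = downsample(cd_training_ids, per_class)
```

Under these assumptions, 80% of 306 UC patients rounds down to 244, and matching 244 CD patients gives 488, the training-set size stated above.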

Discussion: We demonstrate promising classification of patients by subtype using random forest and WES data. Focusing on specific subgroups of patients, with larger datasets, may result in better classification.

Source journal
Journal of Crohns & Colitis (Medicine: Gastroenterology & Hepatology)
CiteScore: 15.50
Self-citation rate: 7.50%
Articles published: 1048
Review time: 1 month
About the journal: Journal of Crohns and Colitis is concerned with the dissemination of knowledge on clinical, basic science and innovative methods related to inflammatory bowel diseases. The journal publishes original articles, review papers, editorials, leading articles, viewpoints, case reports, innovative methods and letters to the editor.
Latest articles from this journal
Peripheral Blood DNA Methylation Signatures and Response to Tofacitinib in Moderate-to-severe Ulcerative Colitis.
Whole Blood DNA Methylation Changes Are Associated with Anti-TNF Drug Concentration in Patients with Crohn's Disease.
6-Mercaptopurine in ulcerative colitis: the potential of upfront dosing with allopurinol.
Mitigating the Risk of Tofacitinib-induced Adverse Events in the Elderly Population with Ulcerative Colitis.
Temporary Faecal Diversion for Refractory Perianal and/or Distal Colonic Crohn's Disease in the Biologic Era: An Updated Systematic Review with Meta-analysis.