Small patient datasets reveal genetic drivers of non-small cell lung cancer subtypes using machine learning for hypothesis generation

Exploration of medicine | JCR Q4 (Biochemistry, Genetics and Molecular Biology) | Pub Date: 2023-07-26 | DOI: 10.37349/emed.2023.00153
Moses Cook, Bessi Qorri, Amruth Baskar, Jalal Ziauddin, L. Pani, Shashibushan Yenkanchi, J. Geraci

Abstract

Aim: Many small but valuable datasets in medicine remain underutilized. Because the complex disorders found in oncology are heterogeneous, systems that can discover patient subpopulations while elucidating etiologies are of great value, as they can indicate leads for innovative drug discovery and development.

Methods: Two small non-small cell lung cancer (NSCLC) datasets (GSE18842 and GSE10245), consisting of 58 samples of adenocarcinoma (ADC) and 45 samples of squamous cell carcinoma (SCC), were used in a machine intelligence framework to identify genetic biomarkers differentiating these two subtypes. Using a set of standard machine learning (ML) methods, subpopulations of ADC and SCC were uncovered while simultaneously extracting the genes that, in combination, were significantly involved in defining those subpopulations. A previously described interactive hypothesis-generating method designed to work with ML methods was employed to provide an alternative way of extracting the most important combination of variables to construct a new dataset.

Results: Several genes were uncovered that had previously been implicated by other methods. The framework accurately discovered known subpopulations, such as genetic drivers associated with differing levels of aggressiveness within the SCC and ADC subtypes. Furthermore, phosphatidylinositol glycan anchor biosynthesis, class X (PIGX) was a novel gene implicated in this study that warrants further investigation due to its role in breast cancer proliferation.

Conclusions: The study highlights the ability to learn from small datasets and recovers well-established properties of NSCLC. This showcases the utility of ML techniques for revealing potential genes of interest, even from small datasets, and for shedding light on novel driving factors behind patient subpopulations.
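The abstract does not name the specific ML methods used, so the following is only an illustrative sketch of the general idea of ranking genes by how well they separate ADC from SCC samples. It swaps in a plain Welch t-statistic as a stand-in, and everything in it is an assumption: the expression values are synthetic, the gene names (`GENE_0` ... `GENE_19`) are hypothetical, and two genes are deliberately planted with a shifted mean in the SCC group. Only the group sizes (58 ADC, 45 SCC) come from the abstract.

```python
import random
import statistics


def welch_t(a, b):
    """Welch's t-statistic for two independent samples with unequal variances."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    std_err = (var_a / len(a) + var_b / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / std_err


def make_synthetic(n_adc=58, n_scc=45, n_genes=20,
                   planted=("GENE_3", "GENE_7"), shift=3.0, seed=0):
    """Build synthetic expression values (NOT real GEO data).

    Genes listed in `planted` get a mean shift in the SCC group;
    all other genes are pure noise in both groups.
    """
    rng = random.Random(seed)
    genes = [f"GENE_{i}" for i in range(n_genes)]  # hypothetical gene names
    adc = {g: [rng.gauss(0.0, 1.0) for _ in range(n_adc)] for g in genes}
    scc = {g: [rng.gauss(shift if g in planted else 0.0, 1.0)
               for _ in range(n_scc)] for g in genes}
    return genes, adc, scc


def rank_genes(genes, adc, scc):
    """Rank genes by absolute Welch t-statistic, most discriminative first."""
    scores = {g: abs(welch_t(adc[g], scc[g])) for g in genes}
    return sorted(genes, key=scores.get, reverse=True), scores


genes, adc, scc = make_synthetic()
ranking, scores = rank_genes(genes, adc, scc)
top_two = set(ranking[:2])  # the two planted genes should rank highest
```

A univariate score like this only captures genes that discriminate on their own. The abstract emphasizes genes acting in combination and subpopulations within each subtype, which would call for multivariate models and clustering rather than this per-gene ranking.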