Smoking Classification Using Novel Plasma Cytokines by implementing Machine Learning and Statistical Methods.

Seema Singh Saharan, Pankaj Nagar, Kate Townsend Creasy, Eveline O Stock, James Feng, Mary J Malloy, John P Kane
Proceedings. International Conference on Computational Science and Computational Intelligence, vol. 2023, pp. 686-694, December 2023 (Epub 2024-07-19). DOI: 10.1109/csci62032.2023.00118. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11500790/pdf/

Abstract

Smoking is a major cause of premature and preventable death. Tobacco exposure damages many organs and contributes to multiple diseases, including chronic obstructive pulmonary disease (COPD), cardiovascular disease, cancer, and diabetes. Cytokines are inflammatory biomarkers that are mechanistically associated with smoking. Machine learning algorithms allow the contributions of individual cytokines to tobacco-related diseases to be assessed quantitatively, and mapping cytokines to disease can help guide treatment. We applied k-Nearest Neighbors (k-NN) and Random Forest machine learning algorithms to 63 plasma cytokines to classify smoking status. To improve model performance, we employed k-fold cross-validation and hyperparameter tuning. The ability of each model to separate the two classes was evaluated using the Area Under the Receiver Operating Characteristic curve (AUROC). The cytokines that contributed most to the classification are identified and presented, and a two-sample independent t-test was used to determine whether the AUROC scores of k-NN and Random Forest differ significantly. The k-NN algorithm achieved reasonably good classification performance, with an AUROC of 0.87 and a 95% CI of (0.823, 0.917). Random Forest outperformed k-NN, with a perfect AUROC of 1 and a 95% CI of (1, 1). Among the ten most important cytokines for each algorithm, four were common to both: LIF, IL22, G-CSF/CSF-3, and TRAIL. The AUROC scores of k-NN and Random Forest differ significantly (p-value = 5.105e-16). Discovering biomarkers such as cytokines and translating them from molecular investigation into clinical practice can facilitate precision-medicine-based therapeutic interventions.
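A minimal sketch of the k-NN side of the pipeline described above: stratified k-fold cross-validation with a grid search over the number of neighbors, scored by AUROC. It uses only the Python standard library and synthetic data; the data generator, the grid of k values, the fold count, and all function names are hypothetical choices for illustration, not the authors' code or settings. AUROC is computed as the Mann-Whitney probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, which equals the area under the ROC curve.

```python
import math
import random
from statistics import mean


def auroc(labels, scores):
    """AUROC via the Mann-Whitney statistic: the probability that a random
    positive outscores a random negative, counting ties as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def knn_score(train_X, train_y, x, k):
    """Fraction of the k nearest (Euclidean) training points labeled 1."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return sum(train_y[i] for i in nearest) / k


def stratified_folds(y, n_folds, seed=0):
    """Deal each class round-robin into folds so every fold keeps both classes."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in (0, 1):
        idx = [i for i, label in enumerate(y) if label == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % n_folds].append(i)
    return folds


def cv_auroc(X, y, k, n_folds=5):
    """Mean test-fold AUROC of k-NN under stratified k-fold cross-validation."""
    fold_aurocs = []
    for test_idx in stratified_folds(y, n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        tr_X = [X[i] for i in train_idx]
        tr_y = [y[i] for i in train_idx]
        preds = [knn_score(tr_X, tr_y, X[i], k) for i in test_idx]
        fold_aurocs.append(auroc([y[i] for i in test_idx], preds))
    return mean(fold_aurocs)


# Hypothetical synthetic data: 5 features stand in for the 63 cytokines;
# class 1 ("smoker") has every feature mean shifted by 1.0.
rng = random.Random(42)
X, y = [], []
for i in range(80):
    label = i % 2
    X.append([rng.gauss(float(label), 1.0) for _ in range(5)])
    y.append(label)

# Hyperparameter tuning: grid search over the number of neighbors k.
grid = [1, 3, 5, 7, 9, 15]
results = {k: cv_auroc(X, y, k) for k in grid}
best_k = max(results, key=results.get)
print(f"best k = {best_k}, mean CV AUROC = {results[best_k]:.3f}")
```

Cytokine concentrations differ widely in scale, so in practice features would usually be standardized within each training fold before computing distances; the synthetic features here already share a scale. Choosing k and reporting the score from the same folds is optimistic, and nested cross-validation avoids that bias. In scikit-learn the same steps map onto `KNeighborsClassifier`, `GridSearchCV`, and `roc_auc_score`, and `RandomForestClassifier.feature_importances_` is one way to rank features such as the top-ten cytokines.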
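The abstract reports a 95% CI for each model's AUROC and a two-sample independent t-test comparing the two models, but does not state here how either was computed. One common approach, sketched below under that assumption, treats per-fold cross-validation AUROC scores as samples, builds a t-based interval around their mean, and compares the two sets with Student's pooled-variance t-test. To stay within the standard library, the Student t distribution is evaluated through the regularized incomplete beta function; the fold scores are invented for illustration, and all function names are hypothetical (in practice `scipy.stats.ttest_ind` and `scipy.stats.t.ppf` cover the same ground).

```python
import math
from statistics import mean, stdev


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd terms of the continued fraction.
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_front) * _betacf(a, b, x) / a
    return 1.0 - math.exp(log_front) * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided p-value of Student's t with df degrees of freedom."""
    return betainc_reg(df / 2.0, 0.5, df / (df + t * t))


def t_critical(df, alpha=0.05):
    """Critical value t* with P(|T| > t*) = alpha, found by bisection."""
    lo, hi = 0.0, 1e3
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if t_two_sided_p(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def mean_ci(scores, alpha=0.05):
    """t-based confidence interval for the mean of per-fold scores."""
    m = mean(scores)
    half = t_critical(len(scores) - 1, alpha) * stdev(scores) / math.sqrt(len(scores))
    return m - half, m + half


def ttest_ind(a, b):
    """Student's two-sample t-test with pooled variance; returns (t, p)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1.0 / na + 1.0 / nb))
    return t, t_two_sided_p(t, df)


# Made-up per-fold AUROC scores for illustration only (not the study's data).
knn_folds = [0.78, 0.84, 0.81, 0.79, 0.83, 0.80, 0.82, 0.77, 0.85, 0.81]
rf_folds = [1.0] * 10

print("k-NN 95% CI:", mean_ci(knn_folds))
print("RF 95% CI:", mean_ci(rf_folds))  # (1.0, 1.0): zero spread across folds
t_stat, p_value = ttest_ind(knn_folds, rf_folds)
print(f"t = {t_stat:.2f}, p = {p_value:.3g}")
```

When every fold scores 1.0 the standard deviation is zero, so the interval collapses to (1, 1), the same form as the Random Forest interval in the abstract. Because cross-validation folds share training data, their scores are not fully independent, and a plain t-test on them tends to overstate significance.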
