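Improving platelet-RNA-based diagnostics: a comparative analysis of machine learning models for cancer detection and multiclass classification.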

Molecular Oncology (IF 6.6; Zone 2, Medicine; JCR Q1, Biochemistry, Genetics and Molecular Biology). Pub Date: 2024-11-01. Epub Date: 2024-06-17. DOI: 10.1002/1878-0261.13689
Maksym A Jopek, Krzysztof Pastuszak, Michał Sieczczyński, Sebastian Cygert, Anna J Żaczek, Matthew T Rondina, Anna Supernat
Molecular Oncology, pp. 2743-2754. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11547247/pdf/
Citations: 0

Abstract


Liquid biopsy demonstrates excellent potential in patient management by providing a minimally invasive and cost-effective approach to detecting and monitoring cancer, even at its early stages. Due to the complexity of liquid biopsy data, machine-learning techniques are increasingly gaining attention in sample analysis, especially for multidimensional data such as RNA expression profiles. Yet, there is no agreement in the community on which methods are the most effective or how to process the data. To circumvent this, we performed a large-scale study using various machine-learning techniques. First, we took a closer look at existing datasets and filtered out some patients to assert data collection quality. The final data collection included platelet RNA samples acquired from 1397 cancer patients (17 types of cancer) and 354 asymptomatic, presumed healthy, donors. Then, we assessed an array of different machine-learning models and techniques (e.g., feature selection of RNA transcripts) in pan-cancer detection and multiclass classification. Our results show that simple logistic regression performs the best, reaching a 68% cancer detection rate at a 99% specificity level, and multiclass classification accuracy of 79.38% when distinguishing between five cancer types. In summary, by revisiting classical machine-learning models, we have exceeded the previously used method by 5% and 9.65% in cancer detection and multiclass classification, respectively. To ease further research, we open-source our code and data processing pipelines (https://gitlab.com/jopekmaksym/improving-platelet-rna-based-diagnostics), which we hope will serve the community as a strong baseline.
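
The headline metric above, a cancer detection rate at 99% specificity, is the sensitivity measured after fixing a decision threshold so that at least 99% of controls are called negative. The following is a minimal sketch of that calculation in plain Python. It is not taken from the authors' pipeline (which is in the GitLab repository linked above); the function name, the strict "score above threshold" rule, and the handling of ties are assumptions made for illustration.

```python
import math


def sensitivity_at_specificity(scores, labels, target_specificity=0.99):
    """Pick a score threshold from the controls, then measure sensitivity.

    scores: predicted probability of cancer for each sample.
    labels: 1 for cancer, 0 for asymptomatic control.
    A sample is called positive when its score is strictly above the threshold.
    Returns (threshold, sensitivity, specificity).
    """
    if not 0.0 <= target_specificity <= 1.0:
        raise ValueError("target_specificity must lie in [0, 1]")

    controls = sorted(s for s, y in zip(scores, labels) if y == 0)
    cancers = [s for s, y in zip(scores, labels) if y == 1]
    if not controls or not cancers:
        raise ValueError("need at least one control and one cancer sample")

    # Number of controls that must be called negative to reach the target.
    # The small epsilon guards against floating-point round-up in the product.
    needed = math.ceil(target_specificity * len(controls) - 1e-9)

    # The needed-th smallest control score becomes the threshold, so every
    # control at or below it counts as a true negative.
    threshold = controls[needed - 1] if needed > 0 else float("-inf")

    true_negatives = sum(s <= threshold for s in controls)
    true_positives = sum(s > threshold for s in cancers)
    return threshold, true_positives / len(cancers), true_negatives / len(controls)
```

As a sense of scale, if all 354 controls were used to set the threshold, a 99% target would require ceil(0.99 × 354) = 351 of them to be called negative, leaving room for at most three false positives. In practice the threshold would be chosen on held-out data rather than on the samples used to fit the model.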

Source journal: Molecular Oncology (Biochemistry, Genetics and Molecular Biology; Molecular Medicine)
CiteScore: 11.80
Self-citation rate: 1.50%
Articles published: 203
Review time: 10 weeks
About the journal: Molecular Oncology highlights new discoveries, approaches, and technical developments in basic, clinical, and discovery-driven translational cancer research. It publishes research articles, reviews (by invitation only), and timely science policy articles. The journal is now fully Open Access, with all articles published over the past 10 years freely available.