A machine learning-based method for feature reduction of methylation data for the classification of cancer tissue origin

Impact Factor: 2.4; Zone 3 (Medicine); JCR Q3 (Oncology); International Journal of Clinical Oncology; Pub Date: 2024-09-18; DOI: 10.1007/s10147-024-02617-w
Marco A. De Velasco, Kazuko Sakai, Seiichiro Mitani, Yurie Kura, Shuji Minamoto, Takahiro Haeno, Hidetoshi Hayashi, Kazuto Nishio

Abstract

Background

Genome DNA methylation profiling is a promising yet costly method for cancer classification, involving substantial data. We developed an ensemble learning model to identify cancer types using methylation profiles from a limited number of CpG sites.

Methods

Analyzing methylation data from 890 samples across 10 cancer types from the TCGA database, we utilized ANOVA and Gain Ratio to select the most significant CpG sites, then employed Gradient Boosting to reduce these to just 100 sites.
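
The two filter scores named in the Methods can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the toy beta values, the CpG names (`cg_a`, `cg_b`), the class labels, and the binarization of methylation values at 0.5 for the gain ratio are all assumptions made for illustration. The subsequent gradient-boosting reduction to 100 sites is not shown.

```python
import math
from collections import Counter

def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    ss_between = sum(len(g) * (means[y] - grand_mean) ** 2 for y, g in groups.items())
    ss_within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    if ss_within == 0:
        return math.inf
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def entropy(items):
    """Shannon entropy (bits) of a list of discrete items."""
    n = len(items)
    return -sum(c / n * math.log2(c / n) for c in Counter(items).values())

def gain_ratio(values, labels, threshold=0.5):
    """Gain ratio of a feature binarized at `threshold` (assumed cut-off)."""
    bins = [v >= threshold for v in values]
    n = len(labels)
    conditional = 0.0
    for b in set(bins):
        subset = [y for x, y in zip(bins, labels) if x == b]
        conditional += len(subset) / n * entropy(subset)
    info_gain = entropy(labels) - conditional
    split_info = entropy(bins)
    return info_gain / split_info if split_info > 0 else 0.0

def top_k(features, labels, score, k):
    """Names of the k features with the highest score."""
    ranked = sorted(features, key=lambda name: score(features[name], labels), reverse=True)
    return ranked[:k]

# Toy data (hypothetical): six samples, two classes, two CpG sites.
labels = ["BRCA"] * 3 + ["LUAD"] * 3
features = {
    "cg_a": [0.9, 0.8, 0.85, 0.1, 0.2, 0.15],  # differs strongly between classes
    "cg_b": [0.5, 0.4, 0.6, 0.5, 0.4, 0.6],    # same distribution in both classes
}

for name, values in features.items():
    print(name, anova_f(values, labels), gain_ratio(values, labels))
print(top_k(features, labels, anova_f, 1))
```

In this sketch, a site whose methylation differs between classes scores high on both measures, while a site with identical distributions in both classes scores near zero. A real pipeline would typically rely on an established library for these scores and for the gradient-boosting ranking.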

Results

This approach maintained high accuracy across multiple machine learning models, with classification accuracy rates between 87.7% and 93.5% for methods including Extreme Gradient Boosting, CatBoost, and Random Forest. This method effectively minimizes the number of features needed without losing performance, helping to classify primary organs and uncover subgroups within specific cancers like breast and lung.

Conclusions

Using a gradient boosting feature selector shows potential for streamlining methylation-based cancer classification.

Source journal
CiteScore: 6.80
Self-citation rate: 3.00%
Articles published: 175
Review time: 2 months
Journal description: The International Journal of Clinical Oncology (IJCO) welcomes original research papers on all aspects of clinical oncology that report the results of novel and timely investigations. Reports on clinical trials are encouraged. Experimental studies will also be accepted if they have obvious relevance to clinical oncology. Membership in the Japan Society of Clinical Oncology is not a prerequisite for submission to the journal. Papers are received on the understanding that their contents have not been published in whole or in part elsewhere; that they are subject to peer review by at least two referees and the Editors, and to editorial revision of the language and contents; and that the Editors are responsible for their acceptance, rejection, and order of publication.