CMOB: Large-Scale Cancer Multi-Omics Benchmark with Open Datasets, Tasks, and Baselines
Authors: Ziwei Yang, Rikuto Kotoge, Zheng Chen, Xihao Piao, Yasuko Matsubara, Yasushi Sakurai
Venue: arXiv - QuanBio - Genomics (arXiv:2409.02143), published 2024-09-02
Machine learning has shown great potential in the field of cancer multi-omics
studies, offering substantial opportunities for advancing precision medicine.
However, the challenges associated with dataset curation and task formulation
pose significant hurdles, especially for researchers lacking a biomedical
background. Here, we introduce CMOB, the first large-scale cancer
multi-omics benchmark that integrates the TCGA platform, making data resources
accessible and usable for machine learning researchers without significant
preparation or expertise. To date, CMOB includes a collection of 20 cancer
multi-omics datasets covering 32 cancers, accompanied by a systematic data
processing pipeline. CMOB provides well-processed dataset versions to support
20 meaningful tasks in four studies, with a collection of benchmarks. We also
integrate CMOB with two complementary resources and various biological tools to
explore broader research avenues. All resources are openly accessible, with
user-friendly and compatible integration scripts that enable non-experts to
easily incorporate this complementary information for various tasks. We conduct
extensive experiments on selected datasets to offer recommendations on suitable
machine learning baselines for specific applications. Through CMOB, we aim to
facilitate algorithmic advances and hasten the development, validation, and
clinical translation of machine-learning models for personalized cancer
treatments. CMOB is available on GitHub
(https://github.com/chenzRG/Cancer-Multi-Omics-Benchmark).