USC-DCT: A Collection of Diverse Classification Tasks

IF 2 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Data Pub Date : 2023-10-12 DOI:10.3390/data8100153

Adam M. Jones, Gozde Sahin, Zachary W. Murdock, Yunhao Ge, Ao Xu, Yuecheng Li, Di Wu, Shuo Ni, Po-Hsuan Huang, Kiran Lekkala, Laurent Itti

{"title":"USC-DCT: A Collection of Diverse Classification Tasks","authors":"Adam M. Jones, Gozde Sahin, Zachary W. Murdock, Yunhao Ge, Ao Xu, Yuecheng Li, Di Wu, Shuo Ni, Po-Hsuan Huang, Kiran Lekkala, Laurent Itti","doi":"10.3390/data8100153","DOIUrl":null,"url":null,"abstract":"Machine learning is a crucial tool for both academic and real-world applications. Classification problems are often used as the preferred showcase in this space, which has led to a wide variety of datasets being collected and utilized for a myriad of applications. Unfortunately, there is very little standardization in how these datasets are collected, processed, and disseminated. As new learning paradigms like lifelong or meta-learning become more popular, the demand for merging tasks for at-scale evaluation of algorithms has also increased. This paper provides a methodology for processing and cleaning datasets that can be applied to existing or new classification tasks as well as implements these practices in a collection of diverse classification tasks called USC-DCT. Constructed using 107 classification tasks collected from the internet, this collection provides a transparent and standardized pipeline that can be useful for many different applications and frameworks. While there are currently 107 tasks, USC-DCT is designed to enable future growth. Additional discussion provides explanations of applications in machine learning paradigms such as transfer, lifelong, or meta-learning, how revisions to the collection will be handled, and further tips for curating and using classification tasks at this scale.","PeriodicalId":36824,"journal":{"name":"Data","volume":"120 1","pages":"0"},"PeriodicalIF":2.0000,"publicationDate":"2023-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/data8100153","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

Abstract

Machine learning is a crucial tool for both academic and real-world applications. Classification problems are often used as the preferred showcase in this space, which has led to a wide variety of datasets being collected and utilized for a myriad of applications. Unfortunately, there is very little standardization in how these datasets are collected, processed, and disseminated. As new learning paradigms like lifelong or meta-learning become more popular, the demand for merging tasks for at-scale evaluation of algorithms has also increased. This paper provides a methodology for processing and cleaning datasets that can be applied to existing or new classification tasks as well as implements these practices in a collection of diverse classification tasks called USC-DCT. Constructed using 107 classification tasks collected from the internet, this collection provides a transparent and standardized pipeline that can be useful for many different applications and frameworks. While there are currently 107 tasks, USC-DCT is designed to enable future growth. Additional discussion provides explanations of applications in machine learning paradigms such as transfer, lifelong, or meta-learning, how revisions to the collection will be handled, and further tips for curating and using classification tasks at this scale.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

USC-DCT:不同分类任务的集合

机器学习对于学术和现实世界的应用都是至关重要的工具。在这个领域中，分类问题经常被用作首选的展示，这导致了各种各样的数据集被收集并用于无数的应用程序。不幸的是，在如何收集、处理和传播这些数据集方面几乎没有标准化。随着终身学习或元学习等新的学习范式越来越流行，对大规模算法评估合并任务的需求也在增加。本文提供了一种处理和清理数据集的方法，该方法可以应用于现有或新的分类任务，并在称为USC-DCT的各种分类任务集合中实现这些实践。使用从internet收集的107个分类任务构建，该集合提供了一个透明和标准化的管道，可用于许多不同的应用程序和框架。虽然目前有107项任务，但USC-DCT旨在实现未来的增长。额外的讨论解释了机器学习范例中的应用，如迁移、终身学习或元学习，如何处理集合的修订，以及在这种规模下管理和使用分类任务的进一步提示。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊