不平衡数据二元分类自动方法

IF 3.1 4区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Knowledge and Information Systems Pub Date : 2024-01-12 DOI:10.1007/s10115-023-02046-7

Pedro Marques Vieira, Fátima Rodrigues

{"title":"不平衡数据二元分类自动方法","authors":"Pedro Marques Vieira, Fátima Rodrigues","doi":"10.1007/s10115-023-02046-7","DOIUrl":null,"url":null,"abstract":"<p>Imbalanced data are present in various business sectors and must be handled with the proper resampling methods and classification algorithms. To handle imbalanced data, there are numerous resampling and learning method combinations; nonetheless, their effective use necessitates specialised knowledge. In this paper, several approaches, ranging from more accessible to more advanced in the domain of data resampling techniques, will be considered to handle imbalanced data. The application developed delivers recommendations of the most suitable combinations of techniques for a specific dataset by extracting and comparing dataset meta-feature values recorded in a knowledge base. It facilitates effortless classification and automates part of the machine learning pipeline with comparable or better results than state-of-the-art solutions and with a much smaller execution time.\n</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"16 1","pages":""},"PeriodicalIF":3.1000,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"An automated approach for binary classification on imbalanced data\",\"authors\":\"Pedro Marques Vieira, Fátima Rodrigues\",\"doi\":\"10.1007/s10115-023-02046-7\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Imbalanced data are present in various business sectors and must be handled with the proper resampling methods and classification algorithms. To handle imbalanced data, there are numerous resampling and learning method combinations; nonetheless, their effective use necessitates specialised knowledge. In this paper, several approaches, ranging from more accessible to more advanced in the domain of data resampling techniques, will be considered to handle imbalanced data. The application developed delivers recommendations of the most suitable combinations of techniques for a specific dataset by extracting and comparing dataset meta-feature values recorded in a knowledge base. It facilitates effortless classification and automates part of the machine learning pipeline with comparable or better results than state-of-the-art solutions and with a much smaller execution time.\\n</p>\",\"PeriodicalId\":54749,\"journal\":{\"name\":\"Knowledge and Information Systems\",\"volume\":\"16 1\",\"pages\":\"\"},\"PeriodicalIF\":3.1000,\"publicationDate\":\"2024-01-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Knowledge and Information Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s10115-023-02046-7\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Knowledge and Information Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10115-023-02046-7","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

不平衡数据存在于各个业务领域，必须使用适当的重采样方法和分类算法来处理。要处理不平衡数据，有许多重采样和学习方法组合；然而，有效使用这些方法需要专业知识。本文将介绍几种处理不平衡数据的方法，这些方法在数据重采样技术领域既有较易掌握的，也有较先进的。所开发的应用程序通过提取和比较知识库中记录的数据集元特征值，为特定数据集提供最合适的技术组合建议。它可以轻松实现分类，并自动执行机器学习管道的部分工作，其结果可媲美或优于最先进的解决方案，而且执行时间更短。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

摘要图片

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

An automated approach for binary classification on imbalanced data

Imbalanced data are present in various business sectors and must be handled with the proper resampling methods and classification algorithms. To handle imbalanced data, there are numerous resampling and learning method combinations; nonetheless, their effective use necessitates specialised knowledge. In this paper, several approaches, ranging from more accessible to more advanced in the domain of data resampling techniques, will be considered to handle imbalanced data. The application developed delivers recommendations of the most suitable combinations of techniques for a specific dataset by extracting and comparing dataset meta-feature values recorded in a knowledge base. It facilitates effortless classification and automates part of the machine learning pipeline with comparable or better results than state-of-the-art solutions and with a much smaller execution time.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Knowledge and Information Systems 工程技术-计算机：人工智能

CiteScore

5.70

自引率

7.40%

发文量

152

审稿时长

7.2 months

期刊介绍： Knowledge and Information Systems (KAIS) provides an international forum for researchers and professionals to share their knowledge and report new advances on all topics related to knowledge systems and advanced information systems. This monthly peer-reviewed archival journal publishes state-of-the-art research reports on emerging topics in KAIS, reviews of important techniques in related areas, and application papers of interest to a general readership.

期刊最新文献

Automated biomedical hypothesis generation with time-aware hypergraph contrastive learning. Restless reachability problems in temporal graphs. Dynamic evolution of causal relationships among cryptocurrencies: an analysis via Bayesian networks Deep multi-semantic fuzzy K-means with adaptive weight adjustment Class incremental named entity recognition without forgetting