A survey on big data classification
Authors: Keerthana G, Sherly Puspha Annabel L
Journal: Data & Knowledge Engineering, Volume 156, Article 102408
Published: 2025-01-11
DOI: 10.1016/j.datak.2025.102408
URL: https://www.sciencedirect.com/science/article/pii/S0169023X25000035
Impact factor: 2.7 (JCR Q3, Computer Science, Artificial Intelligence)
Citation count: 0
Abstract
Big data refers to vast volumes of structured and unstructured data that are too large or complex for traditional data-processing methods to handle efficiently. Its importance lies in its ability to provide actionable insights and drive decision-making across industries such as healthcare, finance, marketing, and government by enabling more accurate predictions and personalized services. However, traditional classification approaches often struggle with big data's complexity: they fail to manage high dimensionality, handle non-linearity, or process data in real time. Effective big data classification requires robust computing infrastructure, scalable storage solutions, and advanced algorithms. This survey assesses 50 research papers on big data classification and identifies the difficulties that current techniques face in processing and classifying data efficiently without substantial computational resources. The analysis covers a variety of scenarios and key points. The surveyed techniques are grouped into rule-based, deep learning-based, optimization-based, machine learning-based, and other categories. The analysis also considers the technique category, tools used, publication year, software tools, and performance metrics of each work. Finally, the research gaps and technical problems of the existing techniques are presented to motivate the development of an efficient model for big data classification.
About the journal:
Data & Knowledge Engineering (DKE) stimulates the exchange of ideas and interaction between these two related fields of interest. DKE reaches a world-wide audience of researchers, designers, managers and users. The major aim of the journal is to identify, investigate and analyze the underlying principles in the design and effective use of these systems.