Comprehensive Analysis of Various Big Data Classification Techniques: A Challenging Overview

J. Inf. Knowl. Manag. | Pub Date: 2022-11-02 | DOI: 10.1142/s0219649222500836

H. B. Abdalla, B. Abuhaija


Citations: 0

Abstract




Data on the internet grows every day, and automatically mining essential information from enormous amounts of data has become a challenging task for any organisation that holds a huge dataset. In recent years, big data has become a prominent technology in the domain of Information Technology (IT); it deals with largely unstructured data and addresses the computational complexity that classical database systems face. Such data arrives quickly, is large in volume, and typically derives from multiple independent sources. Three main challenges arise: data access, semantics and domain knowledge for different big data applications, and the complexities raised by big data volumes. One major difficulty is the classification of big data. This paper presents well-defined methodologies employed for big data classification. It reviews 50 research papers on big data classification methods and groups the methodologies into six categories: K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Fuzzy-based methods, Bayesian-based methods, Random Forest, and Decision Tree. A detailed analysis and discussion then considers the classification techniques, the datasets utilised, the evaluation metrics, the semantic similarity measures, and the publication year. Finally, research gaps and open issues in several traditional big data classification techniques are explained, to help researchers extend their work towards effective big data management.

