Performance Comparison of Big Data Processing Utilizing SciDB and Apache Accumulo Databases
Authors: Mohammad Abu Mhana, Alá F. Khalifeh, S. Alouneh
Venue: 2022 Seventh International Conference on Fog and Mobile Edge Computing (FMEC)
Publication date: 2022-12-12
DOI: 10.1109/FMEC57183.2022.10062513 (https://doi.org/10.1109/FMEC57183.2022.10062513)
Citation count: 0
Abstract
Big data involves processing massive, complex data sets that carry a tremendous amount of information. Researchers have therefore created several methods, models, and databases to handle such data. Among them is the Apache Accumulo database, which is considered an in-storage database that relies on the Hadoop processing framework to analyze and process the data. Another big data database widely used in the research community is SciDB, short for scientific database. SciDB uses a PostgreSQL connection to establish a reliable link with the database. In this paper, we analyze and evaluate the performance of these two database systems, which specialize in handling big data and storing it for further processing and analysis. Performance is analyzed in terms of several metrics: CPU utilization, data storing/retrieval delay, disk utilization, and the number of data ingestions per second. The setup and integration of the two databases are also investigated. Our performance evaluation revealed the advantages and disadvantages of each database structure: Apache Accumulo performed better than SciDB in terms of average ingestion execution time, number of ingestions per second, and CPU utilization, whereas SciDB had lower disk utilization than Apache Accumulo.
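The abstract does not say how the ingestion metrics were computed, so the following is only a minimal sketch of one plausible way to derive average ingestion execution time and ingestions per second from per-call timings. The names `measure_ingestion`, `IngestionStats`, and the `ingest` callable are hypothetical and do not come from the paper; a real benchmark would pass in a function that writes to Accumulo or SciDB through that database's own client.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass
class IngestionStats:
    count: int            # number of ingestion calls performed
    total_seconds: float  # summed wall-clock time spent inside ingest()
    avg_seconds: float    # average ingestion execution time
    per_second: float     # ingestions per second


def measure_ingestion(
    ingest: Callable[[Any], None],
    records: Iterable[Any],
    clock: Callable[[], float] = time.perf_counter,
) -> IngestionStats:
    """Time each call to a (hypothetical) ingest function and summarize it.

    `clock` is injectable so the measurement can be checked with a fake timer.
    """
    count = 0
    total = 0.0
    for record in records:
        start = clock()
        ingest(record)            # assumption: one call stores one record
        total += clock() - start
        count += 1
    avg = total / count if count else 0.0
    rate = count / total if total > 0 else 0.0
    return IngestionStats(count, total, avg, rate)


if __name__ == "__main__":
    # Stand-in target: an in-memory list instead of a real database client.
    store: list = []
    stats = measure_ingestion(store.append, range(1000))
    print(stats)
```

CPU and disk utilization are not covered here; they would normally be sampled from the operating system while the ingestion loop runs.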