Performance evaluation of Bloom filter size in map-side and reduce-side Bloom joins
Authors: A. Al-Badarneh, Hassan M. Najadat, Salah Rababah
Venue: 2017 8th International Conference on Information and Communication Systems (ICICS)
Publication date: 2017-04-01
DOI: https://doi.org/10.1109/IACS.2017.7921965
MapReduce (MR) is an efficient programming model for processing big data. However, MR has limitations when performing the join operation. Recent research has sought to alleviate this problem with techniques such as the Bloom join. The idea of the Bloom join is to construct a Bloom filter that removes redundant records before the join is performed. The size of the constructed filter is critical and must be chosen carefully. In this paper, we evaluate the effect of Bloom filter size on the performance of two Bloom join algorithms: the Map-side Bloom join and the Reduce-side Bloom join. In our methodology, we constructed multiple Bloom filters of different sizes for two static input datasets. Our experimental results show that neither a small nor a large filter is always the best choice for good performance; the filter size should be chosen based on the size of the input datasets. The results also show that tuning the Bloom filter size has a major effect on join performance. Furthermore, when memory is a concern, the results suggest choosing a small Bloom filter for both algorithms, provided that it still produces a negligible false positive rate. On the other hand, small to medium Bloom filter sizes give a shorter elapsed time in the Reduce-side join than in the Map-side join, while large sizes give a longer elapsed time.
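To make the role of the filter size concrete, the following is a minimal single-process sketch in Python, not the MapReduce implementation evaluated in the paper. It builds a Bloom filter on the smaller dataset, drops records of the larger dataset whose keys are definitely absent, and then joins the survivors. The names `BloomFilter`, `expected_false_positive_rate`, and `bloom_join`, the SHA-256 double-hashing scheme, and the synthetic datasets are all assumptions made for illustration. The false positive estimate is the standard approximation (1 - e^(-kn/m))^k for m bits, k hash functions, and n inserted keys.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter over string keys (illustrative, not from the paper)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive all bit positions from two halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd, so never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def expected_false_positive_rate(num_bits: int, num_hashes: int, num_keys: int) -> float:
    """Standard approximation of the false positive rate: (1 - exp(-k*n/m))**k."""
    return (1.0 - math.exp(-num_hashes * num_keys / num_bits)) ** num_hashes


def bloom_join(small, large, num_bits: int, num_hashes: int):
    """Join two lists of (key, value) pairs on key.

    A Bloom filter built on `small` pre-filters `large`. Returns the joined
    rows and the number of `large` records that passed the filter.
    """
    bloom = BloomFilter(num_bits, num_hashes)
    index = {}
    for key, value in small:
        bloom.add(key)
        index.setdefault(key, []).append(value)
    # Keys rejected by the filter are definitely not in `small`.
    survivors = [(key, value) for key, value in large if key in bloom]
    # False positives pass the filter but find no match in the index.
    joined = [
        (key, left, right) for key, right in survivors for left in index.get(key, [])
    ]
    return joined, len(survivors)


if __name__ == "__main__":
    # Synthetic data (hypothetical): 1,000 build keys, 10,000 probe records.
    small = [(f"user{i}", f"name{i}") for i in range(1000)]
    large = [(f"user{i}", f"order{i}") for i in range(0, 20000, 2)]
    for num_bits in (1_000, 10_000, 100_000):
        joined, survivors = bloom_join(small, large, num_bits, num_hashes=3)
        rate = expected_false_positive_rate(num_bits, 3, len(small))
        print(f"bits={num_bits} survivors={survivors} "
              f"joined={len(joined)} expected_fp={rate:.4f}")
```

In this sketch the join result is the same for every filter size, because a Bloom filter never produces false negatives. Only the number of non-matching records that slip through changes, which is the trade-off between filter memory and wasted join work that the abstract describes.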