Pub Date: 2019-07-01 DOI: 10.1109/BigDataCongress.2019.00027
F. Guerra, Paolo Sottovia, Matteo Paganelli, M. Vincini
The application of big data integration techniques in real scenarios must address practical issues related to the scalability of the process and the heterogeneity of data sources. In this paper, we describe the pipeline developed in the context of the Re-Search Alps project, funded by the EU Commission through the INEA Agency in the CEF Telecom framework, which aims to create an open dataset describing research centers located in the Alpine area.
{"title":"Big Data Integration of Heterogeneous Data Sources: The Re-Search Alps Case Study","authors":"F. Guerra, Paolo Sottovia, Matteo Paganelli, M. Vincini","doi":"10.1109/BigDataCongress.2019.00027","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2019.00027","url":null,"abstract":"The application of big data integration techniques in real scenarios needs to address practical issues related to the scalability of the process and the heterogeneity of data sources. In this paper, we describe the pipeline that has been developed in the context of the Re-search Alps project, a project funded by the EU Commission through the INEA Agency in the CEF Telecom framework, that aims at creating an open dataset describing research centers located in the Alpine area.","PeriodicalId":335850,"journal":{"name":"2019 IEEE International Congress on Big Data (BigDataCongress)","volume":"127 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125370257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-07-01 DOI: 10.1109/BigDataCongress.2019.00015
P. Bonatti, S. Kirrane
The new European General Data Protection Regulation (GDPR) places stringent restrictions on the processing of personally identifiable data. The GDPR affects not only European companies: the regulation applies to all organizations that track or provide services to European citizens. Free exploratory data analysis is permitted only on anonymous data, at the cost of some legal risks. We argue that, for the other kinds of personal data processing, the most flexible and safe legal basis is explicit consent. We illustrate the approach to consent management and GDPR compliance being developed by the European H2020 project SPECIAL, and highlight some related big data aspects.
{"title":"Big Data and Analytics in the Age of the GDPR","authors":"P. Bonatti, S. Kirrane","doi":"10.1109/BigDataCongress.2019.00015","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2019.00015","url":null,"abstract":"The new European General Data Protection Regulation places stringent restrictions on the processing of personally identifiable data. The GDPR does not only affect European companies, as the regulation applies to all the organizations that track or provide services to European citizens. Free exploratory data analysis is permitted only on anonymous data, at the cost of some legal risks. We argue that for the other kinds of personal data processing, the most flexible and safe legal basis is explicit consent. We illustrate the approach to consent management and compliance with the GDPR being developed by the European H2020 project SPECIAL, and highlight some related big data aspects.","PeriodicalId":335850,"journal":{"name":"2019 IEEE International Congress on Big Data (BigDataCongress)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115175192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-07-01 DOI: 10.1109/BigDataCongress.2019.00029
S. Migliorini, E. Quintarelli, D. Carra, A. Belussi
Recommendation algorithms have been investigated and employed by many important companies in recent years. Some scenarios, such as a system that suggests points of interest to tourists, are well suited to sequences of recommendations for (groups of) users. We envision that sequence recommendations can be useful whenever a group of users has a limited time interval to spend together, since they reduce the time wasted in selecting the best next activity. In this paper, we investigate the role played by the context, i.e., the situation the group is currently experiencing, in the design of a system that recommends sequences of activities. We model the problem as a multi-objective optimization, where the satisfaction of the group and the available time interval are two of the functions to be optimized. In particular, the dynamic evolution of the group can be considered the key contextual feature for producing better suggestions.
{"title":"Sequences of Recommendations for Dynamic Groups: What Is the Role of Context?","authors":"S. Migliorini, E. Quintarelli, D. Carra, A. Belussi","doi":"10.1109/BigDataCongress.2019.00029","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2019.00029","url":null,"abstract":"Recommendation algorithms have been investigated and employed by many important companies in the past years: some scenarios, such as the one where a system suggests the points of interest to tourists, well adapt to sequence of recommendations to (groups of) users. We envision that sequence recommendations can be useful whenever the group of users has a limited time interval to spend together, since they reduce the time wasted in selecting the best next activity. In this paper, we investigate the role played by the context, i.e. the situation the group is currently experiencing, in the design of a system that recommends sequences of activities. We model the problem as a multi-objective optimization, where the satisfaction of the group and the available time interval are two of the functions to be optimized. In particular, the dynamic evolution of the group can be considered as the key contextual feature to produce better suggestions.","PeriodicalId":335850,"journal":{"name":"2019 IEEE International Congress on Big Data (BigDataCongress)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130177688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
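The multi-objective view mentioned in the abstract above can be illustrated with a small Pareto filter. This is only a minimal sketch under stated assumptions, not the authors' method: the `Candidate` type, the numeric satisfaction scores, the durations, and the treatment of the available time interval as both a hard budget and an objective to minimize are all hypothetical choices made for illustration.

```python
from __future__ import annotations

from typing import NamedTuple


class Candidate(NamedTuple):
    """A hypothetical candidate sequence of activities for a group."""
    activities: tuple[str, ...]
    satisfaction: float  # assumed group satisfaction score, higher is better
    duration: float      # assumed total time in minutes, lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both objectives and strictly better on one."""
    no_worse = a.satisfaction >= b.satisfaction and a.duration <= b.duration
    better = a.satisfaction > b.satisfaction or a.duration < b.duration
    return no_worse and better


def pareto_front(candidates: list[Candidate], time_budget: float) -> list[Candidate]:
    """Keep the sequences that fit the time budget and are not dominated by another."""
    feasible = [c for c in candidates if c.duration <= time_budget]
    return [c for c in feasible if not any(dominates(o, c) for o in feasible)]


candidates = [
    Candidate(("museum", "cafe"), 0.8, 120),
    Candidate(("museum",), 0.6, 90),
    Candidate(("park", "cafe"), 0.6, 100),           # same satisfaction as "museum", but longer
    Candidate(("museum", "cafe", "tower"), 0.9, 200),  # exceeds the 180-minute budget
]
front = pareto_front(candidates, time_budget=180)
```

Here `front` retains the two sequences that trade satisfaction against time; a context-aware system would still need a further rule (for example, one driven by how the group evolves) to pick among them.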
Pub Date: 2019-07-01 DOI: 10.1109/bigdatacongress.2019.00011
{"title":"Message from the IEEE Big Data Congress 2019 Chairs","authors":"","doi":"10.1109/bigdatacongress.2019.00011","DOIUrl":"https://doi.org/10.1109/bigdatacongress.2019.00011","url":null,"abstract":"","PeriodicalId":335850,"journal":{"name":"2019 IEEE International Congress on Big Data (BigDataCongress)","volume":"64 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131687130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-07-01 DOI: 10.1109/BigDataCongress.2019.00035
Mahmoud Ismail, August Bonds, Salman Niazi, Seif Haridi, J. Dowling
Distributed hierarchical file systems typically decouple the storage of the file system's metadata from the data (file system blocks) to enable the scalability of the file system. This decoupling, however, requires the introduction of a periodic synchronization protocol to ensure the consistency of the file system's metadata and its blocks. Apache HDFS and HopsFS implement a protocol, called block reporting, where each data server periodically sends ground truth information about all its file system blocks to the metadata servers, allowing the metadata to be synchronized with the actual state of the data blocks in the file system. The network and processing overhead of the existing block reporting protocol, however, increases with cluster size, ultimately limiting cluster scalability. In this paper, we introduce a new block reporting protocol for HopsFS that reduces the protocol bandwidth and processing overhead by up to three orders of magnitude compared to the existing HDFS/HopsFS protocol. Our new protocol removes a major bottleneck that prevented HopsFS clusters from scaling to tens of thousands of servers.
{"title":"Scalable Block Reporting for HopsFS","authors":"Mahmoud Ismail, August Bonds, Salman Niazi, Seif Haridi, J. Dowling","doi":"10.1109/BigDataCongress.2019.00035","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2019.00035","url":null,"abstract":"Distributed hierarchical file systems typically decouple the storage of the file system's metadata from the data (file system blocks) to enable the scalability of the file system. This decoupling, however, requires the introduction of a periodic synchronization protocol to ensure the consistency of the file system's metadata and its blocks. Apache HDFS and HopsFS implement a protocol, called block reporting, where each data server periodically sends ground truth information about all its file system blocks to the metadata servers, allowing the metadata to be synchronized with the actual state of the data blocks in the file system. The network and processing overhead of the existing block reporting protocol, however, increases with cluster size, ultimately limiting cluster scalability. In this paper, we introduce a new block reporting protocol for HopsFS that reduces the protocol bandwidth and processing overhead by up to three orders of magnitude, compared to HDFS/HopsFS' existing protocol. Our new protocol removes a major bottleneck that prevented HopsFS clusters scaling to tens of thousands of servers.","PeriodicalId":335850,"journal":{"name":"2019 IEEE International Congress on Big Data (BigDataCongress)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128740313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
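The full block report described in the abstract above can be sketched as a set reconciliation between what a data server says it stores and what the metadata believes. This is a simplified illustration under stated assumptions, not the HDFS or HopsFS implementation and not the new protocol the paper proposes: the `MetadataServer` class, its in-memory `replicas` map, and the server names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MetadataServer:
    # Hypothetical in-memory view: block id -> data servers believed to hold a replica.
    replicas: dict[int, set[str]] = field(default_factory=dict)

    def apply_full_report(self, server: str, reported: set[int]) -> tuple[set[int], set[int]]:
        """Reconcile one data server's full block report with the metadata.

        Returns (added, removed): block ids newly recorded for `server`, and
        block ids the metadata attributed to `server` that it did not report.
        """
        known = {b for b, holders in self.replicas.items() if server in holders}
        added = reported - known
        removed = known - reported
        for block in added:
            self.replicas.setdefault(block, set()).add(server)
        for block in removed:
            self.replicas[block].discard(server)
        return added, removed


meta = MetadataServer({1: {"dn1"}, 2: {"dn1", "dn2"}, 3: {"dn2"}})
added, removed = meta.apply_full_report("dn1", {2, 4})
```

In this sketch every report carries, and the metadata side scans, the server's entire block set, so the work per report grows with the number of blocks stored; that growth is the kind of overhead the abstract identifies as limiting cluster scalability.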
Pub Date: 2019-07-01 DOI: 10.1109/bigdatacongress.2019.00001
{"title":"Title Page i","authors":"","doi":"10.1109/bigdatacongress.2019.00001","DOIUrl":"https://doi.org/10.1109/bigdatacongress.2019.00001","url":null,"abstract":"","PeriodicalId":335850,"journal":{"name":"2019 IEEE International Congress on Big Data (BigDataCongress)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130110857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Message from the IEEE SERVICES 2019 Symposia Chairs","authors":"M. Goul, Rong N. Chang, L. Brunie","doi":"10.1109/edge.2019.00009","DOIUrl":"https://doi.org/10.1109/edge.2019.00009","url":null,"abstract":"","PeriodicalId":335850,"journal":{"name":"2019 IEEE International Congress on Big Data (BigDataCongress)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126030362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Message from the IEEE SERVICES 2019 Program Chair-in-Chief and Vice Program Chair-in-Chief","authors":"","doi":"10.1109/edge.2019.00008","DOIUrl":"https://doi.org/10.1109/edge.2019.00008","url":null,"abstract":"","PeriodicalId":335850,"journal":{"name":"2019 IEEE International Congress on Big Data (BigDataCongress)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127035583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-04-16 DOI: 10.1109/BigDataCongress.2019.00026
Hui Gao, Karolina K. Dluzniak, Hong Xia, W. Jie, Yanping Chen, Wei Xing, Xin Wang, Zhongmin Wang
As the number and variety of services increase, locating services that satisfy users' needs is becoming difficult and time-consuming. Service clustering is an effective method to prune the query space, narrow the search space, and improve the accuracy of locating services that satisfy users' needs. At present, web service clustering methods adopt single or traditional clustering algorithms, whose accuracy and stability are poor. In this paper, we propose SWOC, a service clustering method based on the wisdom of crowds. First, SWOC calculates document similarity. Second, we implement a mapping algorithm that reduces the correlation of web services and improves the accuracy of the method. Then, we apply different numbers of clusters using different individual clustering methods, which increases the number of partitions and thereby enhances the robustness of SWOC. Lastly, the diversity algorithm evaluates and selects the partitions to extract interesting information for the final aggregation, using the weight of each individual result. Experiments on a real web service dataset crawled from ProgrammableWeb demonstrate the accuracy, recall, F-value, and stability of the proposed method.
{"title":"A Service Clustering Method Based on Wisdom of Crowds","authors":"Hui Gao, Karolina K. Dluzniak, Hong Xia, W. Jie, Yanping Chen, Wei Xing, Xin Wang, Zhongmin Wang","doi":"10.1109/BigDataCongress.2019.00026","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2019.00026","url":null,"abstract":"As the number and variety of services increase, it is becoming difficult and time-consuming to locate services that satisfy users' need. Service clustering is efficacious method to prune the query space, to narrow the searching space, and improve the accuracy of locating services that satisfied users' needs. At present, clustering method of web services adopted single or traditional clustering algorithms. However, accuracy and stability of single or traditional clustering algorithms is poor. In the paper, we proposed SWOC a service clustering method based on wisdom of crowd. Firstly, by using SWOC we calculated document similarity. Secondly, we implemented a mapping algorithm that reduces the correlation of web services and improve accuracy of method. And then, we applyed different number of clusters using different individual clustering methods that increase the number of partitions so as to enhance the robustness of SWOC. Lastly, the diversity algorithm evaluates and selects the partitions to extract interesting information for the final aggregation with the weight of each individual result. Experiments were performed on the real web service dataset crawled from ProgrammableWeb which prove the accuracy, recall, F-value and stability of proposed method.","PeriodicalId":335850,"journal":{"name":"2019 IEEE International Congress on Big Data (BigDataCongress)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122321280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-07-01 DOI: 10.1109/services.2018.00006
{"title":"Message from the IEEE SERVICES 2019 Steering Committee Chair","authors":"","doi":"10.1109/services.2018.00006","DOIUrl":"https://doi.org/10.1109/services.2018.00006","url":null,"abstract":"","PeriodicalId":335850,"journal":{"name":"2019 IEEE International Congress on Big Data (BigDataCongress)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132303033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}