LPOD: A Local Path Based Optimized Scheduling Algorithm for Deadline-Constrained Big Data Workflows in the Cloud
2019 IEEE International Congress on Big Data (BigDataCongress)
Pub Date: 2019-07-08 DOI: 10.1109/BigDataCongress.2019.00018
Changxin Bai, Shiyong Lu, Ishtiaq Ahmed, D. Che, Aravind Mohan
List-based scheduling algorithms have proven to be an optimistic strategy that generates feasible solutions for the workflow scheduling problem with a shorter response time. Data-intensive and computation-intensive workflow applications differ in the ratio between data transfer time and task execution time. Workflow scheduling algorithms in a cloud-based environment should adequately consider the characteristics of the underlying cloud platform, such as the on-demand resource provisioning strategy, the practically unlimited compute capacity, the boot times of virtual machines, the homogeneous network, and the pay-as-you-go pricing model, to produce an optimal scheduling solution within the deadline constraint of a given workflow. In this paper, a path-based scheduling algorithm, named LPOD, is proposed to find the best workflow schedule with minimum monetary cost in a cloud computing environment. A series of case studies has been conducted using synthetic workflows based on DATAVIEW, a popular open-source big data workflow management system. The experimental results show that the proposed algorithm is efficient and can generate better workflow schedules than state-of-the-art algorithms such as IC-PCP and SGX-E2C2D.
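
The abstract above names three cloud characteristics that drive monetary cost: VM boot time, pay-as-you-go billing, and a deadline. The sketch below is not LPOD itself. It only illustrates, under assumed numbers, how one might pick the cheapest VM type for a single path of sequential tasks. The `VMType` fields, the hourly billing unit, and all prices and speeds are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VMType:
    # All fields are illustrative assumptions, not values from the paper.
    name: str
    speedup: float          # relative speed; 1.0 is the baseline machine
    price_per_unit: float   # price charged per started billing unit
    boot_seconds: float     # time before the VM can run its first task


def path_cost(task_seconds, vm, billing_seconds=3600):
    """Return (finish_time, cost) for running the tasks in sequence on one fresh VM."""
    run_time = sum(t / vm.speedup for t in task_seconds)
    finish = vm.boot_seconds + run_time
    # Pay-as-you-go: every started billing unit is charged in full.
    billed_units = math.ceil(finish / billing_seconds)
    return finish, billed_units * vm.price_per_unit


def cheapest_vm_for_path(task_seconds, vm_types, deadline_seconds):
    """Return (cost, finish_time, vm_name) of the cheapest feasible VM, or None."""
    feasible = []
    for vm in vm_types:
        finish, cost = path_cost(task_seconds, vm)
        if finish <= deadline_seconds:
            feasible.append((cost, finish, vm.name))
    return min(feasible) if feasible else None


# Hypothetical path of three tasks, with baseline run times in seconds.
tasks = [1800, 2400, 1200]
vms = [
    VMType("small", 1.0, 0.10, 60),
    VMType("medium", 2.0, 0.25, 60),
    VMType("large", 4.0, 0.60, 60),
]

tight = cheapest_vm_for_path(tasks, vms, deadline_seconds=3600)
loose = cheapest_vm_for_path(tasks, vms, deadline_seconds=7200)
impossible = cheapest_vm_for_path(tasks, vms, deadline_seconds=1000)
print(tight, loose, impossible)
```

With the tight deadline the slow VM is infeasible and the medium VM wins. With the loose deadline the slow VM becomes the cheapest choice even though it is billed for two units.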

Reducing Feature Embedding Data for Discovering Relations in Big Text Data
2019 IEEE International Congress on Big Data (BigDataCongress)
Pub Date: 2019-07-08 DOI: 10.1109/BigDataCongress.2019.00038
Haojie Huang, R. Wong
Relation extraction is a critical task in building a knowledge base from unstructured text documents. Most work in automatic relation extraction has applied deep learning techniques such as Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) to large text corpora. However, these techniques require a large amount of human-labelled data, which is labour-intensive to produce, and they can hardly be applied to a new document domain without human supervision. This paper proposes a novel framework to extract relations from multi-domain texts effectively. In particular, we construct the framework in three phases: preprocessing, feature embedding, and relation extraction. We show that a small proportion of training data is sufficient to train our relation extraction framework and achieve good accuracy on relation extraction tasks.

Mining Semantic Information in Rumor Detection via a Deep Visual Perception Based Recurrent Neural Networks
2019 IEEE International Congress on Big Data (BigDataCongress)
Pub Date: 2019-07-08 DOI: 10.1109/BigDataCongress.2019.00016
Feng Xing, Caili Guo
Rumor detection has become a major concern for the public and governments with the proliferation of social media in information dissemination. However, most existing methods only extract hand-crafted features, which are far from adequate for interpreting the semantics latent in texts. For social events, there also exist rich social contextual information and high-level interactions among significant features, which provide cues for interpreting semantics. In this paper, we propose a novel attention learning framework via a deep visual perception based recurrent neural network (ViP-RNN), considering both high-level feature interactions and contextual information. In particular, the proposed model uses an RNN to capture the long-distance temporal dependencies of the contextual information of relevant posts, and it composes low-level lexical features into high-level semantic interactions hierarchically through the visual perception of a convolutional neural network (CNN). To incorporate the information learned by the RNN and the CNN, we combine convolutional and recurrent layers into one model, so that the model can capture a discriminative semantic representation of social events more efficiently by using a visual perception attention vector, i.e., the outputs of the CNN, to align long-distance temporal dependencies. We conduct experiments on real datasets collected from social media websites, which demonstrate the effectiveness of our approach and the merits of model integration.
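
The abstract above describes using the CNN output as an attention vector over the RNN's time steps. The sketch below is not the ViP-RNN architecture. It shows plain dot-product attention, one common way such an alignment can be computed, and the tiny hand-written vectors stand in for real RNN hidden states and a real CNN output.

```python
import math


def attend(hidden_states, query):
    """Dot-product attention: weight each time step by its similarity to the query."""
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden_states]
    top = max(scores)                                  # subtract the max for a stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    # Context vector: attention-weighted sum of the hidden states.
    context = [
        sum(w * state[i] for w, state in zip(weights, hidden_states))
        for i in range(dim)
    ]
    return weights, context


# Hypothetical stand-ins: three time steps (posts) with 2-dimensional hidden states.
rnn_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cnn_vector = [2.0, 0.0]   # assumed CNN output used as the attention query

weights, context = attend(rnn_states, cnn_vector)
print(weights, context)
```

Time steps whose hidden state points in the same direction as the query receive the largest weights, so the context vector is dominated by the posts the CNN features agree with.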

Neural Network Based Transaction Classification System for Chinese Transaction Behavior Analysis
2019 IEEE International Congress on Big Data (BigDataCongress)
Pub Date: 2019-07-01 DOI: 10.1109/BigDataCongress.2019.00021
Jianyang Yu, Yuanyuan Qiao, Nanfei Shu, Kewu Sun, Shenshen Zhou, Jie Yang
With the rapid development of the Chinese economy, it is important to examine economic activities in China. Each transaction is recorded on an invoice. The invoice contains the transaction content, the classification of the transaction behavior (in accordance with the Tax Classification and Coding for Commodities and Services issued by the state), the transaction price, and so on. Our work uses real, large-scale invoice data collected from Zhejiang Province and conducts a multi-dimensional analysis of Chinese transaction behavior based on a transaction behavior classification model. First, we propose a compositional CNN-RNN model with an attention mechanism to recommend the corresponding categories for transaction behavior collected from tax invoices. It maps the transaction behavior recorded in the invoice to a transaction code in the Tax Classification and Coding for Commodities and Services issued by the state. Preliminary experiments show that the top-one accuracy of classifying transaction behavior reaches 75%. Then, we focus on the quantity distribution of invoice data and conclude that major categories with a larger number of invoice records are more diversified in their subcategories. After that, we study the price distribution of various transaction behaviors to discover the differences in price distribution between industries. Prices in the major categories of goods are concentrated in the middle or lower price ranges. The price distribution of an industry lets us analyze the regional industrial structure, which is useful for studying a region's economy from an industry perspective.
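
The abstract above reports a top-one accuracy of 75% for a model that recommends categories. As a reminder of what that metric measures, the sketch below computes top-k accuracy from ranked recommendations. The category labels and rankings are made up for illustration and are not tax codes or results from the paper.

```python
def top_k_accuracy(ranked_predictions, true_labels, k=1):
    """Fraction of samples whose true label appears in the first k recommendations."""
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Hypothetical ranked recommendations for four invoices (best guess first).
ranked = [
    ["A", "B", "C"],
    ["B", "A", "C"],
    ["C", "A", "B"],
    ["A", "C", "B"],
]
truth = ["A", "A", "C", "B"]

top1 = top_k_accuracy(ranked, truth, k=1)
top2 = top_k_accuracy(ranked, truth, k=2)
top3 = top_k_accuracy(ranked, truth, k=3)
print(top1, top2, top3)
```

Top-one accuracy counts only the first recommendation, so it is the strictest of the three numbers. A recommender that shows several candidate codes to a user is often also judged on top-k for k greater than one.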

AIOps for a Cloud Object Storage Service
2019 IEEE International Congress on Big Data (BigDataCongress)
Pub Date: 2019-07-01 DOI: 10.1109/BigDataCongress.2019.00036
A. Levin, Shelly Garion, E. K. Kolodner, D. Lorenz, K. Barabash, Mike Kugler, Niall McShane
With the growing reliance on the ubiquitous availability of IT systems and services, these systems become more global, larger in scale, and more complex to operate. To maintain business viability, IT service providers must put in place reliable and cost-efficient operations support. Artificial Intelligence for IT Operations (AIOps) is a promising technology for alleviating the operational complexity of IT systems and services. AIOps platforms use big data, machine learning, and other advanced analytics technologies to enhance IT operations with proactive, actionable, dynamic insight. In this paper we share our experience applying the AIOps approach to a production cloud object storage service to get actionable insights into the system's behavior and health. We describe a real-life, production, cloud-scale service and its operational data, present the AIOps platform we have created, and show how it has helped us resolve operational pain points.

An Approach to Cross-Lingual Sentiment Lexicon Construction
2019 IEEE International Congress on Big Data (BigDataCongress)
Pub Date: 2019-07-01 DOI: 10.1109/BigDataCongress.2019.00030
Chia-Hsuan Chang, Ming-Lun Wu, San-Yih Hwang
Lexicon-based sentiment analysis is a popular and practical approach to sentiment analysis. However, sentiment lexicons, which may be abundant in some languages such as English, are scarce in many other languages. Cross-lingual lexicon learning aims to extend lexicons for a language with fewer resources using the lexicons available in other languages. In this paper, we propose an approach that builds a skip-gram variant to map word spaces across languages so as to construct lexicons for the language with fewer resources. We show in our preliminary experiment that our approach can generate lexicons that are similar to those crafted by human experts.
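
Once word spaces are mapped across languages, as the abstract above describes, a lexicon can be transferred by looking at neighbours in the shared space. The sketch below does not implement the paper's skip-gram variant. It assumes the mapping has already been done and shows only a nearest-neighbour polarity transfer with a similarity threshold. The words, vectors, and the `min_similarity` value are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def induce_lexicon(source_vectors, source_lexicon, target_vectors, min_similarity=0.8):
    """Give each target word the polarity of its nearest source lexicon word.

    Words whose best match falls below min_similarity are left out.
    """
    induced = {}
    for word, vec in target_vectors.items():
        best_word, best_sim = None, -1.0
        for src, src_vec in source_vectors.items():
            if src not in source_lexicon:
                continue
            sim = cosine(vec, src_vec)
            if sim > best_sim:
                best_word, best_sim = src, sim
        if best_word is not None and best_sim >= min_similarity:
            induced[word] = source_lexicon[best_word]
    return induced


# Hypothetical 2-dimensional vectors, assumed to live in one shared space already.
source_vectors = {"good": [1.0, 0.1], "bad": [-1.0, 0.2]}
source_lexicon = {"good": 1, "bad": -1}
target_vectors = {"bueno": [0.9, 0.2], "malo": [-0.8, 0.1], "mesa": [0.0, 1.0]}

lexicon = induce_lexicon(source_vectors, source_lexicon, target_vectors)
print(lexicon)
```

The threshold keeps neutral words such as the toy "mesa" out of the induced lexicon, because their nearest sentiment word is not similar enough to justify a label.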

IEEE BigData Congress 2019 Program Committee
2019 IEEE International Congress on Big Data (BigDataCongress)
Pub Date: 2019-07-01 DOI: 10.1109/bigdatacongress.2019.00012
K. Aberer, Frederico Alvares De Oliveira
Karl Aberer, Ecole Polytechnique Fédérale de Lausanne
Frederico Alvares De Oliveira, ASCOLA Research Group, Mines Nantes, INRIA, LINA
Mohsen Amini Salehi, University of Louisiana Lafayette
Ahamed Awad, Cairo University
Jaume Bacardit, Newcastle University
Payam Barnaghi, University of Surrey
Rodrigo Barros, PUCRS
Arun Balaji Buduru, Indraprastha Institute of Information Technology–Delhi (IIIT-D)
Rodrigo N. Calheiros, Western Sydney University
Alberto Cano, Virginia Commonwealth University
Xin Cao, The University of New South Wales
Miguel Cárdenas Montes, CIEMAT
Bogdan Cautis, ENST Paris–UMR CNRS 5141
Eugenio Cesario, ICAR-CNRS
Subarna Chatterjee, INRIA
Lisi Chen, Hong Kong Baptist University
Peng Chen
Byron Choi, Hong Kong Baptist University
Félix Cuadrado, Queen Mary University of London
Edward Curry, Insight Centre for Data Analytics, NUI Galway
Dilma Da Silva, Texas A&M University
Hong-Ning Dai, Macau University of Science & Technology
Dong Dai, UNC Charlotte
Patrizio Dazzi, ISTI-CNRS
Sheng Di, ANL
Mario José Diván, Engineering School (UNLPam) & Divsar
Youcef Djenouri, LRIA_USTHB
Matthieu Dorier, Argonne National Laboratory
Schahram Dustdar, Vienna University of Technology
Liyue Fan, SUNY Albany
George H.L. Fletcher, Eindhoven University of Technology
Matthew Forshaw, Newcastle University
Gangadharan G.R., IBM
Mohamed Gaber, Birmingham City University
Mikel Galar, Universidad Pública de Navarra
Antonio Gómez-Iglesias, Intel
Jose M. Granado-Criado, University of Extremadura
Le Gruenwald, The University of Oklahoma
Jarek Gryz, York University
Yanfei Guo, Argonne National Laboratory
Mohamed Hamlich, FSTM
Jin-Kao Hao, University of Angers
Takahiro Hara, Osaka University
Jiong He, Advanced Digital Sciences Centre
Qiang He, Swinburne University of Technology
Francisco Herrera, University of Granada
Jan Hidders, Vrije Universiteit Brussels
Liting Hu, Florida International University

CarPredictor: Forecasting the Number of Free Floating Car Sharing Vehicles within Restricted Urban Areas
2019 IEEE International Congress on Big Data (BigDataCongress)
Pub Date: 2019-07-01 DOI: 10.1109/BigDataCongress.2019.00022
Luca Cagliero, S. Chiusano, Elena Daraio, P. Garza
Free floating car sharing is a popular rental model for cars in shared use. In urban environments, it has become particularly attractive for users who make short trips or who use a car only occasionally. Since cars are not uniformly distributed across city areas, monitoring the number of cars available within restricted urban areas is crucial for both shaping service provision and improving the user experience. To address these issues, the application of machine learning techniques to car mobility data has become increasingly appealing. This paper focuses on forecasting the number of cars available in a restricted urban area in the short term (e.g., in the next 2 hours). It applies regression techniques to train multivariate models from heterogeneous data, including the occupancy levels of the target and neighboring areas, weather, and temporal information (e.g., season, holidays, daily time slots). To contextualize occupancy level predictions according to the target time and location, we generate models tailored to specific area profiles, defined by the prevalent category of Points-of-Interest in the area. Furthermore, to avoid bias due to the presence of uncorrelated features, we perform feature selection prior to regression model learning. As a case study, the prediction system is applied to data acquired from a real car sharing system. The results show promising system performance and leave room for insightful extensions.
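
The abstract above mentions feature selection before regression to keep uncorrelated features out of the model. The paper does not say which selection method it uses. The sketch below assumes a simple Pearson-correlation filter, and the feature names, values, and threshold are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(features, target, min_abs_corr=0.3):
    """Keep the features whose absolute correlation with the target meets the threshold."""
    return [
        name
        for name, values in features.items()
        if abs(pearson(values, target)) >= min_abs_corr
    ]


# Hypothetical observations for one urban area over five time slots.
cars_in_2h = [1, 2, 3, 4, 5]            # forecasting target
features = {
    "cars_now": [2, 4, 6, 8, 10],       # rises with the target
    "rain": [5, 4, 3, 2, 1],            # falls as the target rises
    "noise": [3, 1, 3, 1, 3],           # unrelated to the target
}

selected = select_features(features, cars_in_2h)
print(selected)
```

Only the selected columns would then be passed to the regression step. A filter like this looks at each feature in isolation, so it can miss features that are informative only in combination.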

dpSmart: A Flexible Group Based Recommendation Framework for Digital Repository Systems
2019 IEEE International Congress on Big Data (BigDataCongress)
Pub Date: 2019-07-01 DOI: 10.1109/BigDataCongress.2019.00028
Boyuan Guan, Liting Hu, Pinchao Liu, Hailu Xu, Z. Fu, Qingyang Wang
Digital Repository Systems are used in most modern digital library platforms. Even so, they often suffer from problems such as low discoverability, poor usability, and high drop-off visit rates. Because of these problems, the majority of the content in digital library platforms may never be exposed to end users, while at the same time users are desperately looking for something that the platforms may not return. Recommendation systems for digital libraries have been proposed to solve these problems. However, most recommendation systems have been implemented by directly adopting one specific type of recommender, such as Collaborative Filtering (CF), Content-Based Filtering (CBF), Stereotyping, or hybrid recommenders. As such, they either (1) cannot accommodate the variation among user groups, (2) require too much labor, or (3) incur high computational cost. In this paper, we design and implement a new recommendation system framework for Digital Repository Systems, named dpSmart, which allows multiple recommenders to work collaboratively on the same platform. In the proposed system, a user-group based recommendation strategy is applied to accommodate the requirements of different types of users. A user recognition model is built, which avoids the intensive labor of the stereotyping recommender. We implement the system prototype as a sub-system of the FIU library site (http://dpanther.fiu.edu) and evaluate it in January and February 2019. Compared to the same months of 2018, page views increased from 8,502 to 10,916 in January and from 10,942 to 12,314 in February, demonstrating the effectiveness of our proposed system.
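
The page view figures quoted in the abstract above are easier to compare as relative increases. The short calculation below uses only the four numbers given in the abstract. The function name is ours.

```python
def percent_increase(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


january = percent_increase(8502, 10916)
february = percent_increase(10942, 12314)

print(f"January:  +{january:.1f}%")
print(f"February: +{february:.1f}%")
```

This works out to roughly a 28.4% increase for January and a 12.5% increase for February.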

Cluster-Based Join for Geographically Distributed Big RDF Data
2019 IEEE International Congress on Big Data (BigDataCongress)
Pub Date: 2019-07-01 DOI: 10.1109/BigDataCongress.2019.00037
Fan Yang, Adina Crainiceanu, Zhiyuan Chen, Don Needham
Federated RDF systems allow users to retrieve data from multiple independent sources without needing to have all the data in the same triple store. The performance of these systems can be poor for large and geographically distributed RDF data, where network transfer costs are high. This paper introduces CBTP, a novel join algorithm that takes advantage of network topology to decrease the cost of processing SPARQL queries in a geographically distributed environment. Federation members are grouped into clusters based on the network communication cost between the members, and the bulk of the join processing is pushed to the clusters. We use an overlap list to efficiently compute join results from triples in different clusters. We implement our algorithms in the OpenRDF Sesame federated framework and use Apache Rya triple store instances as federation members. Experimental evaluation results show the advantages of our approach over existing techniques.
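
The abstract above groups federation members into clusters by network communication cost but does not say how the grouping is computed. The sketch below assumes one simple rule: two members share a cluster whenever a chain of links at or below a cost threshold connects them. The member names, costs, and threshold are hypothetical, and this is not the CBTP algorithm.

```python
def cluster_members(members, cost, max_cost):
    """Group members into connected components over links with cost <= max_cost."""
    parent = {m: m for m in members}

    def find(m):
        # Union-find lookup with path halving.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for (a, b), c in cost.items():
        if c <= max_cost:
            parent[find(a)] = find(b)

    clusters = {}
    for m in members:
        clusters.setdefault(find(m), []).append(m)
    # Sort for a deterministic result.
    return sorted(sorted(group) for group in clusters.values())


# Hypothetical federation members and pairwise communication costs (e.g. latency in ms).
members = ["us-east", "us-west", "eu-1", "eu-2"]
cost = {
    ("us-east", "us-west"): 40,
    ("eu-1", "eu-2"): 15,
    ("us-east", "eu-1"): 120,
    ("us-west", "eu-2"): 150,
}

clusters = cluster_members(members, cost, max_cost=50)
print(clusters)
```

With clusters like these, joins between members of the same cluster stay on cheap links, and only the smaller intermediate results need to cross the expensive links between clusters.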