2019 IEEE International Congress on Big Data (BigDataCongress): Latest Publications

LPOD: A Local Path Based Optimized Scheduling Algorithm for Deadline-Constrained Big Data Workflows in the Cloud

2019 IEEE International Congress on Big Data (BigDataCongress)

Pub Date : 2019-07-08 DOI: 10.1109/BigDataCongress.2019.00018

Changxin Bai, Shiyong Lu, Ishtiaq Ahmed, D. Che, Aravind Mohan

List-based scheduling algorithms have been proven to be an optimistic strategy that generates feasible solutions to the workflow scheduling problem with a shorter response time. Data-intensive and computation-intensive workflow applications have different characteristics in terms of the ratio between data transfer time and task execution time. Workflow scheduling algorithms in a cloud-based environment should adequately consider the characteristics of the underlying cloud platform, such as the on-demand resource provisioning strategy, the practically unlimited compute capacity, the booting times of virtual machines, the homogeneous network, and the pay-as-you-go price model, to produce an optimal scheduling solution within the deadline constraint of a given workflow. In this paper, a path-based scheduling algorithm, named LPOD, is proposed to find the best workflow schedule with minimum monetary cost in a cloud computing environment. A series of case studies has been carefully conducted using synthetic workflows based on DATAVIEW, which is a popular open-source big data workflow management system. The experimental results show that the proposed algorithm is efficient and can generate better workflow schedules than state-of-the-art algorithms such as IC-PCP and SGX-E2C2D.
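The cost side of the problem described above can be illustrated with a minimal sketch. This is not the LPOD algorithm; it only shows how a candidate schedule might be priced under a pay-as-you-go model and checked against a deadline. The `VmLease` fields, the one-hour billing interval, and the choice to bill boot time are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class VmLease:
    """One leased VM in a candidate schedule (hypothetical fields)."""
    price_per_interval: float  # price charged per billing interval
    boot_time: float           # seconds before the VM can run tasks
    busy_time: float           # seconds spent on task execution and data transfer


def lease_cost(lease: VmLease, billing_interval: float = 3600.0) -> float:
    """Pay-as-you-go cost: every started billing interval is charged in full.

    Assumption: boot time counts toward the leased duration.
    """
    leased_seconds = lease.boot_time + lease.busy_time
    intervals = math.ceil(leased_seconds / billing_interval)
    return intervals * lease.price_per_interval


def schedule_cost(leases, billing_interval: float = 3600.0) -> float:
    """Total monetary cost of a schedule is the sum over all leased VMs."""
    return sum(lease_cost(lease, billing_interval) for lease in leases)


def meets_deadline(finish_times, deadline: float) -> bool:
    """A schedule is feasible when its last task finishes by the deadline."""
    return max(finish_times) <= deadline
```

Under these assumptions, a lease that runs 60 seconds past an interval boundary costs a full extra interval, which is why the ratio between data transfer time and execution time matters when packing tasks onto VMs.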

Citations: 6

Reducing Feature Embedding Data for Discovering Relations in Big Text Data

2019 IEEE International Congress on Big Data (BigDataCongress)

Pub Date : 2019-07-08 DOI: 10.1109/BigDataCongress.2019.00038

Haojie Huang, R. Wong

Relation extraction is a critical task in building a knowledge base from unstructured text documents. Most work on automatic relation extraction applies deep learning techniques such as Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks to large text corpora. However, these techniques require a large amount of human-labelled data, which is labour-intensive to produce, and they can hardly be applied to a new document domain without human supervision. This paper proposes a novel framework to extract relations in multi-domain texts effectively. In particular, we construct the framework in three phases: preprocessing, feature embedding, and relation extraction. We show that a small proportion of training data is sufficient to train our relation extraction framework and achieve good accuracy on relation extraction tasks.

Citations: 1

Mining Semantic Information in Rumor Detection via a Deep Visual Perception Based Recurrent Neural Networks

2019 IEEE International Congress on Big Data (BigDataCongress)

Pub Date : 2019-07-08 DOI: 10.1109/BigDataCongress.2019.00016

Feng Xing, Caili Guo

Rumor detection has become a major concern for the public and governments with the proliferation of social media in information dissemination. However, most existing methods only extract hand-crafted features, which are far from adequate for interpreting the semantics latent in texts. For social events, there also exist rich social contextual information and high-level interactions among significant features, which provide cues for interpreting semantics. In this paper, we propose a novel attention learning framework via a deep visual perception based recurrent neural network (ViP-RNN), considering both high-level feature interactions and contextual information. In particular, the proposed model is based on an RNN that captures the long-distance temporal dependencies of the contextual information of relevant posts, and it composes low-level lexical features into high-level semantic interactions hierarchically through the visual perception of a convolutional neural network (CNN). To incorporate the information learned by the RNN and the CNN, we combine convolutional and recurrent layers into one model so that the model can capture a discriminative semantic representation of social events more efficiently by using a visual perception attention vector, i.e., the outputs of the CNN, to align long-distance temporal dependencies. We conduct experiments on real datasets collected from social media websites, which demonstrate the effectiveness of our approach and the merits of model integration.

Citations: 5

Neural Network Based Transaction Classification System for Chinese Transaction Behavior Analysis

2019 IEEE International Congress on Big Data (BigDataCongress)

Pub Date : 2019-07-01 DOI: 10.1109/BigDataCongress.2019.00021

Jianyang Yu, Yuanyuan Qiao, Nanfei Shu, Kewu Sun, Shenshen Zhou, Jie Yang

With the rapid development of the Chinese economy, it is important to examine economic activities in China. Each transaction is recorded by an invoice, which contains the transaction content, the classification of the transaction behavior (in accordance with the Tax Classification and Coding for Commodities and Services issued by the state), the transaction price, etc. Our work uses real, large-scale invoice data collected from Zhejiang Province and conducts a multi-dimensional analysis of Chinese transaction behavior based on a transaction behavior classification model. First, we propose a compositional CNN-RNN model with an attention mechanism to recommend the corresponding categories for transaction behavior collected from tax invoices. It maps the transaction behavior recorded in the invoice to a transaction code in the Tax Classification and Coding for Commodities and Services. Preliminary experiments show that the top-one accuracy of classifying transaction behavior reaches 75%. Then, we focus on the quantity distribution of invoice data and conclude that major categories with a larger number of invoice records are more diversified in their subdivided categories. After that, we study the price distribution of various transaction behaviors to discover differences in price distribution between industries. Prices in the major categories of goods are concentrated in the middle or lower price ranges. The regional industrial structure can be analyzed through the price distribution of each industry, which makes it meaningful to study the economy of a region from the perspective of industry.
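The top-one accuracy figure quoted above is a standard metric for recommenders that return a ranked list of categories. A minimal sketch of how it can be computed follows; the function name, the tax-code strings, and the sample data are hypothetical and are not taken from the paper.

```python
def top_k_accuracy(ranked_predictions, true_labels, k=1):
    """Fraction of samples whose true label is among the first k ranked predictions.

    ranked_predictions: one list per sample, best candidate first.
    true_labels: the correct category code for each sample.
    """
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Hypothetical category codes for four invoices, best guess first.
predictions = [
    ["A01", "B02"],
    ["B02", "A01"],
    ["C03", "A01"],
    ["A01", "C03"],
]
labels = ["A01", "B02", "A01", "A01"]

top1 = top_k_accuracy(predictions, labels, k=1)
top2 = top_k_accuracy(predictions, labels, k=2)
```

In this toy data the third invoice is only matched by its second-ranked candidate, so top-one accuracy is lower than top-two accuracy.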

Citations: 5

AIOps for a Cloud Object Storage Service

2019 IEEE International Congress on Big Data (BigDataCongress)

Pub Date : 2019-07-01 DOI: 10.1109/BigDataCongress.2019.00036

A. Levin, Shelly Garion, E. K. Kolodner, D. Lorenz, K. Barabash, Mike Kugler, Niall McShane

With the growing reliance on the ubiquitous availability of IT systems and services, these systems become more global, larger in scale, and more complex to operate. To maintain business viability, IT service providers must put in place reliable and cost-efficient operations support. Artificial Intelligence for IT Operations (AIOps) is a promising technology for alleviating the operational complexity of IT systems and services. AIOps platforms utilize big data, machine learning, and other advanced analytics technologies to enhance IT operations with proactive, actionable, dynamic insight. In this paper we share our experience applying the AIOps approach to a production cloud object storage service to get actionable insights into the system's behavior and health. We describe a real-life production cloud-scale service and its operational data, present the AIOps platform we have created, and show how it has helped us resolve operational pain points.

Citations: 17

An Approach to Cross-Lingual Sentiment Lexicon Construction

2019 IEEE International Congress on Big Data (BigDataCongress)

Pub Date : 2019-07-01 DOI: 10.1109/BigDataCongress.2019.00030

Chia-Hsuan Chang, Ming-Lun Wu, San-Yih Hwang

Lexicon-based sentiment analysis is a popular and practical approach to sentiment analysis. However, sentiment lexicons, which may be abundant in some languages such as English, are scarce in many other languages. Cross-lingual lexicon learning aims to extend the lexicons of a language with fewer resources by using the lexicons available in other languages. In this paper, we propose an approach that builds a skip-gram variant to map word spaces across languages so as to construct lexicons for the language with fewer resources. We show in our preliminary experiment that our approach can generate lexicons that are similar to those crafted by human experts.

Citations: 4

IEEE BigData Congress 2019 Program Committee

2019 IEEE International Congress on Big Data (BigDataCongress)

Pub Date : 2019-07-01 DOI: 10.1109/bigdatacongress.2019.00012

K. Aberer, Frederico Alvares De Oliveira

Karl Aberer, Ecole Polytechnique Fédérale de Lausanne
Frederico Alvares De Oliveira, ASCOLA Research Group, Mines Nantes, INRIA, LINA
Mohsen Amini Salehi, University of Louisiana Lafayette
Ahamed Awad, Cairo University
Jaume Bacardit, Newcastle University
Payam Barnaghi, University of Surrey
Rodrigo Barros, PUCRS
Arun Balaji Buduru, Indraprastha Institute of Information Technology–Delhi (IIIT-D)
Rodrigo N. Calheiros, Western Sydney University
Alberto Cano, Virginia Commonwealth University
Xin Cao, The University of New South Wales
Miguel Cárdenas Montes, CIEMAT
Bogdan Cautis, ENST Paris–UMR CNRS 5141
Eugenio Cesario, ICAR-CNRS
Subarna Chatterjee, INRIA
Lisi Chen, Hong Kong Baptist University
Peng Chen
Byron Choi, Hong Kong Baptist University
Félix Cuadrado, Queen Mary University of London
Edward Curry, Insight Centre for Data Analytics, NUI Galway
Dilma Da Silva, Texas A&M University
Hong-Ning Dai, Macau University of Science & Technology
Dong Dai, UNC Charlotte
Patrizio Dazzi, ISTI-CNRS
Sheng Di, ANL
Mario José Diván, Engineering School (UNLPam) & Divsar
Youcef Djenouri, LRIA_USTHB
Matthieu Dorier, Argonne National Laboratory
Schahram Dustdar, Vienna University of Technology
Liyue Fan, SUNY Albany
George H.L. Fletcher, Eindhoven University of Technology
Matthew Forshaw, Newcastle University
Gangadharan G.R., IBM
Mohamed Gaber, Birmingham City University
Mikel Galar, Universidad Pública de Navarra
Antonio Gómez-Iglesias, Intel
Jose M. Granado-Criado, University of Extremadura
Le Gruenwald, The University of Oklahoma
Jarek Gryz, York University
Yanfei Guo, Argonne National Laboratory
Mohamed Hamlich, FSTM
Jin-Kao Hao, University of Angers
Takahiro Hara, Osaka University
Jiong He, Advanced Digital Sciences Centre
Qiang He, Swinburne University of Technology
Francisco Herrera, University of Granada
Jan Hidders, Vrije Universiteit Brussels
Liting Hu, Florida International University

Citations: 0

CarPredictor: Forecasting the Number of Free Floating Car Sharing Vehicles within Restricted Urban Areas

2019 IEEE International Congress on Big Data (BigDataCongress)

Pub Date : 2019-07-01 DOI: 10.1109/BigDataCongress.2019.00022

Luca Cagliero, S. Chiusano, Elena Daraio, P. Garza

Free floating car sharing is a popular rental model for cars in shared use. In urban environments, it has become particularly attractive for users who make short trips or who make occasional use of the car. Since cars are not uniformly distributed across city areas, monitoring the number of cars available within restricted urban areas is crucial for both shaping service provision and improving the user experience. To address these issues, the application of machine learning techniques to analyze car mobility data has become more and more appealing. This paper focuses on forecasting the number of cars available in a restricted urban area in the short term (e.g., in the next 2 hours). It applies regression techniques to train multivariate models from heterogeneous data, including the occupancy levels of the target and neighboring areas, weather, and temporal information (e.g., season, holidays, daily time slots). To contextualize occupancy level predictions according to the target time and location, we generate models tailored to specific profiles of areas according to the prevalent category of Points-of-Interest in the area. Furthermore, to avoid bias due to the presence of uncorrelated features, we perform feature selection prior to regression model learning. As a case study, the prediction system is applied to data acquired from a real car sharing system. The results show promising system performance and leave room for insightful extensions.
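The feature selection step mentioned above can be sketched as a simple correlation filter applied before regression. The abstract does not say which selection method the authors use, so the Pearson-correlation threshold below is an assumption chosen for illustration, and the feature names and values are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, min_abs_corr=0.3):
    """Keep only the features whose |correlation| with the target reaches the threshold."""
    return [
        name
        for name, values in columns.items()
        if abs(pearson(values, target)) >= min_abs_corr
    ]


# Hypothetical training data: future occupancy of the target area per time slot.
target_occupancy = [2, 4, 6, 8, 10]
features = {
    "neighbor_occupancy": [1, 2, 3, 4, 5],  # moves with the target
    "unrelated_signal": [3, 1, 3, 1, 3],    # uncorrelated with the target
}
selected = select_features(features, target_occupancy)
```

Only the selected columns would then be passed to the regression model, so uncorrelated inputs cannot bias the fit.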

Citations: 4

dpSmart: A Flexible Group Based Recommendation Framework for Digital Repository Systems

2019 IEEE International Congress on Big Data (BigDataCongress)

Pub Date : 2019-07-01 DOI: 10.1109/BigDataCongress.2019.00028

Boyuan Guan, Liting Hu, Pinchao Liu, Hailu Xu, Z. Fu, Qingyang Wang

Digital Repository Systems have been used in most modern digital library platforms. Even so, Digital Repository Systems often suffer from problems such as low discoverability, poor usability, and high drop-off visit rates. With these problems, the majority of the content in digital library platforms may not be exposed to end users, while at the same time users are desperately looking for content that the platforms may not return. Recommendation systems for digital libraries were proposed to solve these problems. However, most recommendation systems have been implemented by directly adopting one specific type of recommender, such as Collaborative Filtering (CF), Content-Based Filtering (CBF), Stereotyping, or hybrid recommenders. As such, they either (1) cannot accommodate the variation among user groups, (2) require too much labor, or (3) incur high computational complexity. In this paper, we design and implement a new recommendation system framework for Digital Repository Systems, named dpSmart, which allows multiple recommenders to work collaboratively on the same platform. In the proposed system, a user-group based recommendation strategy is applied to accommodate the requirements of different types of users. A user recognition model is built, which avoids the intensive labor of the stereotyping recommender. We implemented the system prototype as a sub-system of the FIU library site (http://dpanther.fiu.edu) and evaluated it in January 2019 and February 2019. During this time, page views increased from 8,502 to 10,916 and from 10,942 to 12,314, respectively, compared to 2018, demonstrating the effectiveness of our proposed system.
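The page-view figures above can be turned into relative growth with a short calculation. The sketch assumes that the two pairs of numbers compare January 2018 with January 2019 and February 2018 with February 2019, which is one reading of the abstract; the helper name is made up.

```python
def percent_increase(before, after):
    """Relative growth from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (2018 page views, 2019 page views), assuming a same-month comparison.
page_views = {
    "January": (8502, 10916),
    "February": (10942, 12314),
}

growth = {
    month: round(percent_increase(views_2018, views_2019), 1)
    for month, (views_2018, views_2019) in page_views.items()
}

for month, pct in growth.items():
    print(f"{month}: {pct}% more page views")
```

Under that reading, page views grew by roughly 28.4% in January and 12.5% in February.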

Citations: 5

Cluster-Based Join for Geographically Distributed Big RDF Data

2019 IEEE International Congress on Big Data (BigDataCongress)

Pub Date : 2019-07-01 DOI: 10.1109/BigDataCongress.2019.00037

Fan Yang, Adina Crainiceanu, Zhiyuan Chen, Don Needham

Federated RDF systems allow users to retrieve data from multiple independent sources without needing to keep all the data in a single triple store. The performance of these systems can be poor for large, geographically distributed RDF data, where network transfer costs are high. This paper introduces CBTP, a novel join algorithm that takes advantage of network topology to decrease the cost of processing SPARQL queries in a geographically distributed environment. Federation members are grouped into clusters based on the network communication cost between them, and the bulk of the join processing is pushed to the clusters. We use an overlap list to efficiently compute join results from triples in different clusters. We implement our algorithms in the OpenRDF Sesame federated framework and use Apache Rya triple store instances as federation members. Experimental evaluation results show the advantages of our approach over existing techniques.
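The grouping step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how CBTP forms its clusters, so the rule used here (single-linkage grouping of members whose pairwise communication cost falls at or below a threshold, implemented with union-find) is an assumption chosen for illustration, not the authors' method. The function name, member names, costs, and threshold are all hypothetical.

```python
from typing import Dict, List, Tuple


def cluster_members(
    members: List[str],
    cost: Dict[Tuple[str, str], float],
    threshold: float,
) -> List[List[str]]:
    """Group members whose pairwise communication cost is <= threshold.

    Single-linkage grouping via union-find. This is an assumed stand-in,
    not CBTP's actual clustering rule, which the abstract does not specify.
    """
    parent = {m: m for m in members}

    def find(m: str) -> str:
        # Walk up to the cluster representative, shortening the path as we go.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    # Merge the clusters of every pair that is cheap enough to talk to each other.
    for (a, b), c in cost.items():
        if c <= threshold:
            parent[find(a)] = find(b)

    clusters: Dict[str, List[str]] = {}
    for m in members:
        clusters.setdefault(find(m), []).append(m)
    return sorted(sorted(group) for group in clusters.values())


# Hypothetical federation members and communication costs in milliseconds.
members = ["us-east", "us-west", "eu-west", "eu-central"]
cost = {
    ("us-east", "us-west"): 60.0,
    ("eu-west", "eu-central"): 20.0,
    ("us-east", "eu-west"): 90.0,
    ("us-west", "eu-central"): 150.0,
}
clusters = cluster_members(members, cost, threshold=70.0)
print(clusters)
```

With a grouping like this, most of the join work could run among members that are cheap to reach from one another, leaving only cross-cluster results to be combined. How CBTP performs that combination with its overlap list is not described in the abstract and is not modeled here.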

Citations: 2
