2020 IEEE International Conference on Smart Data Services (SMDS): Latest Papers, Page 2

Stargazer: A Deep Learning Approach for Estimating the Performance of Edge-Based Clustering Applications

2020 IEEE International Conference on Smart Data Services (SMDS)

Pub Date : 2020-10-01 DOI: 10.1109/SMDS49396.2020.00009

Breno Dantas Cruz, A. Paul, Z. Song, E. Tilevich

As a solution to the sensor data deluge, edge computing processes sensor data by means of local devices. Many of these devices are resource-scarce in terms of available processing capabilities and battery power. To achieve the required design trade-offs of edge applications, developers must be able to understand the performance and resource utilization of data processing algorithms. An increasing number of edge-based applications use machine learning (ML) as their key functionality. However, the performance and resource utilization of ML algorithms remain poorly understood, thus hindering the system design of edge-based ML applications. In addition, developers often cannot access real-world edge-based test beds during the design phase. To address this problem, we present an approach for estimating the performance of edge-based ML applications, with a particular application to clustering. To that end, we first comprehensively evaluate the performance and resource utilization of widely used clustering algorithms deployed in a representative edge environment. Second, we identify which properties of these algorithms are correlated with their performance and resource utilization. Finally, we apply our findings to create Stargazer, a deep neural network that, given a clustering algorithm's computational load and input data size, estimates how this algorithm would perform and utilize resources in an edge-based application. Our tool provides viable decision-making support for addressing the multifaceted design challenges of edge-based ML applications.


Citations: 5

M2NN: Rare Event Inference through Multi-variate Multi-scale Attention

2020 IEEE International Conference on Smart Data Services (SMDS)

Pub Date : 2020-10-01 DOI: 10.1109/SMDS49396.2020.00014

Manjusha Ravindranath, K. Candan, M. Sapino

With the increasing availability of sensory data, inferring the existence of relevant events in the observations is becoming a critical task for smart data service delivery in applications that rely on such data sources. Yet, existing solutions tend to fail when the events being inferred are rare, for instance when one attempts to infer seizure events in electroencephalogram (EEG) data. In this paper, we note that multi-variate time series often carry robust localized multi-variate temporal features that could, at least in theory, help identify these events; however, the lack of sufficient data to train for these events makes it impossible for neural architectures to identify and make use of these features. To tackle this challenge, we propose an LSTM-based neural architecture, M2NN, with an attention mechanism that leverages robust multi-variate temporal features that are extracted a priori and fed into the NN as side information. In particular, multi-variate temporal features are extracted by simultaneously considering, at multiple scales, temporal characteristics of the time series along with external knowledge, including variate relationships that are known a priori. We then show that a single-layer LSTM with dual-layer attention that leverages these multi-scale, multi-variate features provides significant gains in rare seizure detection on EEG data. In addition, in order to illustrate the broader applicability (and reproducibility) of M2NN, we also evaluate it in other publicly available rare event detection tasks, such as anomaly detection in manufacturing. We further show that the proposed M2NN technique is beneficial in tackling more traditional inference problems, such as travel-time prediction, where rare accident events can cause congestion.


Citations: 3

Conflict-Free Replicated Relations for Multi-Synchronous Database Management at Edge

2020 IEEE International Conference on Smart Data Services (SMDS)

Pub Date : 2020-10-01 DOI: 10.1109/SMDS49396.2020.00021

Weihai Yu, C. Ignat

In a cloud-edge environment, edge devices may not always be connected to the network. Still, applications may need to access the data on edge devices even when they are not connected. With support for multi-synchronous access, data on an edge device are kept synchronous with the data in the cloud as long as the device is online. When the device is offline, the application can still access the data on the device, asynchronously with concurrent data updates either in the cloud or on other edge devices. Conflict-free Replicated Data Types (CRDTs) emerged as a technology for multi-synchronous data access. CRDTs guarantee that when all sites have applied the same set of updates, the replicated data converge. However, CRDTs have not been successfully applied to relational databases (RDBs) for multi-synchronous access. In this paper, we present Conflict-free Replicated Relations (CRRs) that apply CRDTs to RDBs for support of multi-synchronous data access. With CRR, existing RDB applications, with very little modification, can be enhanced with multi-synchronous access. We also present a prototype implementation of CRR with some preliminary performance results.
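The convergence guarantee described in the abstract can be illustrated with a generic state-based CRDT. The sketch below is a textbook grow-only counter, not the CRR design from the paper; the class name `GCounter` and the replica names `edge` and `cloud` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """State-based grow-only counter: one slot per replica, merged by max."""
    replica_id: str
    counts: dict = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # A replica only ever advances its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self) -> int:
        return sum(self.counts.values())


# Two hypothetical sites update while disconnected from each other.
edge = GCounter("edge")
cloud = GCounter("cloud")
edge.increment(3)
cloud.increment(2)

# After exchanging state in both directions, the replicas hold the same state.
edge.merge(cloud)
cloud.merge(edge)
```

Because the merge is idempotent, re-sending the same state after a reconnection does not change the result, which is what makes this style of replication tolerant of offline periods.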


Citations: 6

BC-Sketch: A Simple Reversible Sketch for Detecting Network Anomalies

2020 IEEE International Conference on Smart Data Services (SMDS)

Pub Date : 2020-10-01 DOI: 10.1109/SMDS49396.2020.00012

Feng Wang, Yongning Tang, Lixin Gao, Guang Cheng

As 5G/IoT networks constantly grow and evolve, proliferating network traffic brings an unprecedented challenge to detecting and identifying flow anomalies, such as heavy hitters, heavy changes, and superspreaders. Many flow data analytics approaches have been proposed to tackle the problem. Sketch-based approaches are the most commonly used flow analytics service, in which a compressed data structure is used to keep a summary of the original data and estimate traffic statistics such as flow size for all traffic flows. However, those approaches either induce information loss due to sampling or incur computational and space overheads for key recovery. In this paper, we propose a new lightweight traffic analytics service, called BC-sketch, for faster and more accurate detection of heavy keys using a very small number of counters. BC-sketch provides a reversible sketch using an extensible data structure designed to accommodate different sketch-based solutions. BC-sketch can be efficiently provisioned as a traffic analytics service in resource-constrained IoT devices, or integrated into various virtual network environments as a virtual service to detect heavy hitters, superspreaders, and heavy changes. To demonstrate its effectiveness, we use BC-sketch to detect heavy hitters, superspreaders, and heavy changes. Both theoretical analysis and experimental evaluations show that BC-sketch can provide higher precision for identifying those traffic anomalies with low memory and computational overheads.
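For readers unfamiliar with sketch-based flow analytics, the following is a minimal sketch of a standard Count-Min sketch, which is a different and simpler structure than BC-sketch. The class name, the default `width` and `depth`, the SHA-256-based row hashing, and the example flow keys are all assumptions made for illustration. Note that this structure is not reversible: heavy keys can only be found by querying candidate keys, which is the key-recovery overhead the abstract refers to.

```python
import hashlib


class CountMinSketch:
    """Fixed-size counter matrix that over-estimates per-key counts."""

    def __init__(self, width: int = 256, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, key: str, row: int) -> int:
        # Derive one hash per row by prefixing the key with the row number.
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, key: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.rows[row][self._index(key, row)] += count

    def estimate(self, key: str) -> int:
        # Collisions only inflate counters, so the minimum over rows
        # is an upper bound on the true count that is never too low.
        return min(self.rows[row][self._index(key, row)] for row in range(self.depth))


# Hypothetical per-flow packet counts keyed by source address.
traffic = {"10.0.0.1": 1000, "10.0.0.2": 3, "10.0.0.3": 7}
sketch = CountMinSketch()
for flow, packets in traffic.items():
    sketch.add(flow, packets)

# Heavy-hitter check over known candidate keys (the sketch cannot list keys itself).
threshold = 500
heavy = [flow for flow in traffic if sketch.estimate(flow) >= threshold]
```

The memory footprint is `width * depth` counters regardless of how many distinct flows are observed, which is why sketches suit resource-constrained devices.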


Citations: 0

CNN Approaches to Classify Multivariate Time Series Using Class-specific Features

2020 IEEE International Conference on Smart Data Services (SMDS)

Pub Date : 2020-10-01 DOI: 10.1109/SMDS49396.2020.00008

Yifan Hao, H. Cao, Erick Draayer

Many smart data services (e.g., smart energy, smart homes) collect and utilize time series data (e.g., energy production and consumption, human body movement) to conduct data analysis. Among such analysis tasks, classification is a widely utilized technique to provide data-driven solutions. Most existing classification methods extract a single set of features from the data and use this feature set for classification across multiple classes. This often ignores the reality that different and class-specific subsets of the initial feature set may better facilitate classification. In this paper, we propose two convolutional neural network (CNN) models using class-specific variables to solve the multi-class classification problem over multivariate time series (MTS) data. A new loss function is introduced for training the CNN models. We compare our proposed methods with 13 baseline approaches using 14 real datasets. The extensive experimental results show that our new approaches can not only outperform other methods on classification accuracy, but also successfully identify important class-specific variables.


Citations: 1

SMDS 2020 Organizing Committee

2020 IEEE International Conference on Smart Data Services (SMDS)

Pub Date : 2020-10-01 DOI: 10.1109/smds49396.2020.00025

Mohand-Saïd Hacid, Zhaohui Wu, Laurence T. Yang, Hongbing Wang, Nabil El Ioini, Kenneth Fletcher, M. Gergatsoulis, Daniel Grosu, Jin-Kao Hao, M. Sapino, Huasong Shan, Kurt Tutschku, Sudharshan S. Vazhkudai, M. A. Vega-Rodríguez, S. Ventura, Jian Wang, Haohua Wang, Jianwu Wang, Shangguang Wang

Program Committee: Amani Abu Jabal, Purdue University; Jacky Akoka, CEDRIC-CNAM & IMT-TEM; Mohsen Amini Salehi, University of Louisiana Lafayette; Rui Araujo, University of Coimbra; Claudio Ardagna, Universita' degli Studi di Milano; Mohan Baruwal Chhetri, CSIRO; Paolo Bellavista, University of Bologna; Nik Bessis, Edge Hill University; Frank Blaauw, University of Groningen; Luca Cagliero, Politecnico di Torino; Jian Cao, Shanghai Jiao Tong University; Chia-Hui Chang, National Central University; Feng Chen, Louisiana State University; Tao Chen, Loughborough University; Yong Chen, Tianjin University; Shizhan Chen, Tianjin University; Lisi Chen, Hong Kong Baptist University; Bo Cheng, Beijing University of Posts & Telecommunications; Lizhen Cui, Shandong University; Edward Curry, NUI Galway; Harshad Deshmukh, Google; Sheng Di, ANL; Zhijun Ding, Tongji University; Weilong Ding, North China University of Technology; Mario Jose Divan, UNLPam; Schahram Dustdar, Vienna University of Technology; Nabil El Ioini; Kenneth Fletcher, University of Massachusetts Boston; Matthew Forshaw, Newcastle University; Mohamed Gaber, Birmingham City University; Mikel Galar, Universidad Pública de Navarra; Mengmeng Ge, Deakin University


Citations: 0

S3QLRDF: Property Table Partitioning Scheme for Distributed SPARQL Querying of large-scale RDF data

2020 IEEE International Conference on Smart Data Services (SMDS)

Pub Date : 2020-10-01 DOI: 10.1109/SMDS49396.2020.00023

Mahmudul Hassan, S. Bansal

The proliferation of the semantic web in the form of Resource Description Framework (RDF) demands efficient, scalable, and distributed storage along with a highly available and fault-tolerant parallel processing strategy. More precisely, the rapid growth of RDF data raises the need for an efficient partitioning strategy over distributed data management systems to improve SPARQL query performance, regardless of the query's pattern shape, with minimized pre-processing time. In this context, we propose a new relational partitioning scheme for RDF data called Property Table Partitioning (PTP), which further partitions an existing Property Table into multiple tables based on distinct properties (each comprising all subjects with non-null values for those distinct properties) in order to minimize input data and join operations. In this paper, we introduce a distributed RDF data management system called S3QLRDF, which is built on top of Spark and utilizes SQL to execute SPARQL queries over the PTP schema. We perform an extensive experimental evaluation with respect to preprocessing costs and query performance, using Lehigh University Benchmark (LUBM) and Waterloo SPARQL Diversity Test Suite (WatDiv) datasets with up to 1.4 billion triples. Our results demonstrate that S3QLRDF outperforms state-of-the-art distributed RDF management systems.
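The partitioning idea can be sketched in plain Python. This is a simplified reading of the abstract, not the S3QLRDF implementation: it uses in-memory dictionaries instead of Spark tables, assumes each subject has at most one value per property, and the function names and sample triples are hypothetical.

```python
from collections import defaultdict

# Hypothetical RDF triples as (subject, property, object).
triples = [
    ("alice", "name", "Alice"),
    ("alice", "advisor", "bob"),
    ("bob", "name", "Bob"),
    ("bob", "teaches", "course1"),
]


def build_property_table(triples):
    """Wide table: one row per subject, one column per property."""
    table = defaultdict(dict)
    for subject, prop, obj in triples:
        table[subject][prop] = obj
    return dict(table)


def partition_by_property(property_table):
    """Split the wide table into one table per distinct property,
    keeping only subjects whose value for that property is non-null."""
    partitions = defaultdict(dict)
    for subject, row in property_table.items():
        for prop, value in row.items():
            if value is not None:
                partitions[prop][subject] = value
    return dict(partitions)


property_table = build_property_table(triples)
partitions = partition_by_property(property_table)

# A query that touches only "advisor" reads just that partition.
advisors = partitions["advisor"]
```

Under these assumptions, a query over one property scans only the subjects that actually have it, which is the sense in which the scheme reduces input data.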


Citations: 3

Real-time System for Short- and Long-Term Prediction of Vehicle Flow

2020 IEEE International Conference on Smart Data Services (SMDS)

Pub Date : 2020-10-01 DOI: 10.1109/SMDS49396.2020.00019

S. Bilotta, P. Nesi, I. Paoli

Traffic management and sustainable mobility are becoming central topics for intelligent transportation systems (ITS). Thanks to today's technologies, it is possible to collect real-time data to monitor the traffic situation in specific areas. An important challenge in ITS is the ability to predict road traffic variables. Short-term prediction of traffic is a complex nonlinear task that has been the subject of many research efforts in the past few decades. Access to precise traffic flow data is mandatory for a large number of applications that have to guarantee a high level of service, such as traffic flow reconstruction, which in turn is used to perform what-if analysis, conditioned routing, etc. They have to be reliable and precise for dispatching rescue teams and fire brigades. This paper proposes a solution for short- and long-term traffic flow prediction by using and comparing a number of machine learning approaches. The solution has been developed in the context of the Sii-Mobility national smart city mobility and transport project, and it is in use in other EC projects and solutions such as Snap4City PCP EC and TRAFAIR CEF, as well as in REPLICATE H2020 SCC1 and the control room in the Florence area.


Citations: 2

A Latent Feelings-aware RNN Model for User Churn Prediction with only Behaviour data

2020 IEEE International Conference on Smart Data Services (SMDS)

Pub Date : 2020-10-01 DOI: 10.1109/SMDS49396.2020.00011

Meng Xi, Zhiling Luo, Naibo Wang, Jianrong Tao, Ying Li, Jianwei Yin

User churn prediction is a cutting-edge research area in the web service industry. It is key to managing users in the virtual world and provides feedback for improving the corresponding web service. At present, most related work designs a questionnaire to collect data on users' characteristics and feelings and then develops a general model by finding relevance. However, such methods require considerable time and manpower, and most web services can only obtain logs of users' behaviours and have no access to users' feature data. Therefore, it is a big challenge to conduct user churn prediction with only behaviour data and to infer users' latent feelings from their action data in order to improve the accuracy of churn prediction. In this paper, a novel Latent Feelings-aware RNN model, namely LaFee, is proposed to solve the user churn prediction problem using only behaviour data. The latent feelings, proven to be satisfaction and aspiration, can be estimated through the intermediate variable of the trained LaFee. We also designed experiments on a real dataset, and the results show that our methods outperform the baselines.

Citations: 2

EdgeInfer: Robust Truth Inference under Data Poisoning Attack

2020 IEEE International Conference on Smart Data Services (SMDS)

Pub Date : 2020-10-01 DOI: 10.1109/SMDS49396.2020.00013

Farnaz Tahmasebian, Li Xiong, Mani Sotoodeh, V. Sunderam

As crowdsourcing becomes more widely used for annotating data by a large group of users, attackers have strong incentives to manipulate the system. Deriving the true answers of tasks in crowdsourcing systems from user-provided data is susceptible to data poisoning attacks, whereby malicious users may intentionally or strategically report incorrect information to mislead the system into inferring the wrong truth for a set of tasks. Recent work has proposed several attacks on crowdsourcing systems and shown that existing truth inference methods may be vulnerable to such attacks. In this paper, we propose solutions to enhance the robustness of existing truth inference methods. Our solutions are based on 1) detecting and augmenting the answers for boundary tasks, in which users could not reach a strong consensus and which are hence subject to potential manipulation, and 2) enhancing the inference method with a stronger prior. We empirically evaluate these defense mechanisms by designing attack scenarios that aim to decrease the accuracy of the system. Experiments show that our method is effective and significantly improves the robustness of the system under attack.
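
The two ideas named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes plain majority voting as the truth inference method, a vote-share margin as the consensus measure, and a Beta prior for binary tasks. The function names, the `min_margin` threshold, and the prior parameters are all hypothetical choices made for illustration.

```python
from collections import Counter


def majority_vote(answers):
    """Return (label, margin) for one task; assumes a non-empty answer list.

    margin is the vote share of the top label minus that of the runner-up,
    so it ranges from 0 (tie) to 1 (unanimous).
    """
    counts = Counter(answers).most_common()
    top_label, top_count = counts[0]
    second_count = counts[1][1] if len(counts) > 1 else 0
    margin = (top_count - second_count) / len(answers)
    return top_label, margin


def find_boundary_tasks(task_answers, min_margin=0.34):
    """Flag tasks whose consensus margin is below min_margin (assumed threshold)."""
    boundary = []
    for task_id, answers in task_answers.items():
        _, margin = majority_vote(answers)
        if margin < min_margin:
            boundary.append(task_id)
    return sorted(boundary)


def smoothed_positive_rate(answers, prior_positive=0.5, prior_strength=4.0):
    """Posterior mean of P(label == 1) under a Beta prior, for binary tasks.

    A larger prior_strength means a few (possibly poisoned) votes move the
    estimate less; this stands in for the "stronger prior" idea.
    """
    positives = sum(1 for a in answers if a == 1)
    alpha = prior_positive * prior_strength + positives
    beta = (1 - prior_positive) * prior_strength + (len(answers) - positives)
    return alpha / (alpha + beta)


if __name__ == "__main__":
    # Hypothetical answers: task id -> labels reported by different users.
    task_answers = {"t1": [1, 1, 1, 1, 0], "t2": [1, 0, 1, 0, 1], "t3": [0, 0, 0]}
    print(find_boundary_tasks(task_answers))
    print(smoothed_positive_rate(task_answers["t2"]))
```

In this sketch, tasks flagged as boundary tasks are the ones where a handful of malicious votes could flip the inferred answer, so they are the natural candidates for collecting or augmenting additional answers.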

Citations: 3