Pub Date: 2020-10-01 | DOI: 10.1109/SMDS49396.2020.00009
Stargazer: A Deep Learning Approach for Estimating the Performance of Edge-Based Clustering Applications
Breno Dantas Cruz, A. Paul, Z. Song, E. Tilevich
2020 IEEE International Conference on Smart Data Services (SMDS)
As a solution to the sensor data deluge, edge computing processes sensor data on local devices. Many of these devices are resource-scarce in terms of available processing capability and battery power. To achieve the required design trade-offs of edge applications, developers must be able to understand the performance and resource utilization of data processing algorithms. An increasing number of edge-based applications use machine learning (ML) as their key functionality. However, the performance and resource utilization of ML algorithms remain poorly understood, which hinders the system design of edge-based ML applications. In addition, developers often cannot access real-world edge-based test beds during the design phase. To address this problem, we present an approach for estimating the performance of edge-based ML applications, with a particular application to clustering. To that end, we first comprehensively evaluate the performance and resource utilization of widely used clustering algorithms deployed in a representative edge environment. Second, we identify which properties of these algorithms are correlated with their performance and resource utilization. Finally, we apply our findings to create Stargazer, a Deep Neural Network that, given a clustering algorithm's computational load and input data size, estimates how this algorithm would perform and utilize resources in an edge-based application. Our tool provides viable decision-making support for addressing the multifaceted design challenges of edge-based ML applications.
Pub Date: 2020-10-01 | DOI: 10.1109/SMDS49396.2020.00014
M2NN: Rare Event Inference through Multi-variate Multi-scale Attention
Manjusha Ravindranath, K. Candan, M. Sapino
With the increasing availability of sensory data, inferring the existence of relevant events in the observations is becoming a critical task for smart data service delivery in applications that rely on such data sources. Yet, existing solutions tend to fail when the events being inferred are rare, for instance when one attempts to infer seizure events in electroencephalogram (EEG) data. In this paper, we note that multi-variate time series often carry robust localized multi-variate temporal features that could, at least in theory, help identify these events; however, the lack of sufficient training data for these events makes it impossible for neural architectures to identify and make use of these features. To tackle this challenge, we propose an LSTM-based neural architecture, M2NN, with an attention mechanism that leverages robust multi-variate temporal features that are extracted a priori and fed into the NN as side information. In particular, multi-variate temporal features are extracted by simultaneously considering, at multiple scales, temporal characteristics of the time series along with external knowledge, including variate relationships that are known a priori. We then show that a single-layer LSTM with dual-layer attention that leverages these multi-scale, multi-variate features provides significant gains in rare seizure detection on EEG data. In addition, to illustrate the broader applicability (and reproducibility) of M2NN, we also evaluate it on other publicly available rare event detection tasks, such as anomaly detection in manufacturing. We further show that the proposed M2NN technique is beneficial for more traditional inference problems, such as travel-time prediction, where rare accident events can cause congestion.
Pub Date: 2020-10-01 | DOI: 10.1109/SMDS49396.2020.00021
Conflict-Free Replicated Relations for Multi-Synchronous Database Management at Edge
Weihai Yu, C. Ignat
In a cloud-edge environment, edge devices may not always be connected to the network. Still, applications may need to access the data on edge devices even when they are not connected. With support for multi-synchronous access, data on an edge device are kept synchronous with the data in the cloud as long as the device is online. When the device is offline, the application can still access the data on the device, asynchronously with concurrent data updates either in the cloud or on other edge devices. Conflict-free Replicated Data Types (CRDTs) emerged as a technology for multi-synchronous data access. CRDTs guarantee that when all sites have applied the same set of updates, the replicated data converge. However, CRDTs have not been successfully applied to relational databases (RDBs) for multi-synchronous access. In this paper, we present Conflict-free Replicated Relations (CRRs) that apply CRDTs to RDBs to support multi-synchronous data access. With CRR, existing RDB applications, with very little modification, can be enhanced with multi-synchronous access. We also present a prototype implementation of CRR with some preliminary performance results.
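The convergence guarantee stated above can be illustrated with a generic state-based CRDT. The Python sketch below is a last-writer-wins (LWW) map keyed by a row identifier; it is not the CRR design, whose internals the abstract does not describe, and the class name `LWWMap`, the Lamport-style clock, and tie-breaking by replica id are our own assumptions. Because `merge` keeps the maximum under a total order on (timestamp, replica id), it is commutative, associative, and idempotent, so replicas that exchange states in any order end up with the same contents.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple

# Each key (think: a row's primary key) maps to (timestamp, replica_id, value).
# Ties on timestamp are broken by replica_id so every replica picks the same winner.
Entry = Tuple[int, str, Any]


@dataclass
class LWWMap:
    """Hypothetical state-based LWW map, used only to illustrate CRDT convergence."""
    replica_id: str
    clock: int = 0
    entries: Dict[Any, Entry] = field(default_factory=dict)

    def put(self, key: Any, value: Any) -> None:
        # A local update touches only local state, so it also works while offline.
        self.clock += 1
        self.entries[key] = (self.clock, self.replica_id, value)

    def merge(self, other: "LWWMap") -> None:
        # Join with another replica's state: keep the entry with the larger (ts, id).
        for key, theirs in other.entries.items():
            mine = self.entries.get(key)
            if mine is None or theirs[:2] > mine[:2]:
                self.entries[key] = theirs
        # Advance the local clock past everything seen so later local writes win.
        self.clock = max(self.clock, other.clock)

    def value(self) -> Dict[Any, Any]:
        # The visible "table": key -> winning value.
        return {k: v for k, (_, _, v) in self.entries.items()}
```

For example, if a cloud replica and an offline edge replica both update the same row, merging in either direction afterwards yields the same table; the write with the larger (timestamp, replica id) pair wins. A real relational CRDT would also need to handle deletions and integrity constraints, which this sketch ignores.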
Pub Date: 2020-10-01 | DOI: 10.1109/SMDS49396.2020.00012
BC-Sketch: A Simple Reversible Sketch for Detecting Network Anomalies
Feng Wang, Yongning Tang, Lixin Gao, Guang Cheng
As 5G/IoT networks constantly grow and evolve, proliferating network traffic brings an unprecedented challenge to detecting and identifying flow anomalies, such as heavy hitters, heavy changes, and superspreaders. Many flow data analytics methods have been proposed to tackle the problem. Sketch-based approaches are the most commonly used flow analytics services: a compressed data structure keeps a summary of the original data and is used to estimate traffic statistics, such as flow size, for all traffic flows. However, those approaches either induce information loss due to sampling or incur computational and space overheads for key recovery. In this paper, we propose a new lightweight traffic analytics service, called BC-sketch, for faster and more accurate detection of heavy keys using a very small number of counters. BC-sketch provides a reversible sketch using an extensible data structure designed to accommodate different sketch-based solutions. BC-sketch can be efficiently provisioned as a traffic analytics service on resource-constrained IoT devices, or integrated into various virtual network environments as a virtual service to detect heavy hitters, superspreaders, and heavy changes. To demonstrate its effectiveness, we use BC-sketch to detect heavy hitters, superspreaders, and heavy changes. Both theoretical analysis and experimental evaluations show that BC-sketch can provide higher precision for identifying those traffic anomalies with low memory and computational overheads.
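To make "a compressed data structure used to estimate flow size" concrete, the Python sketch below implements a plain Count-Min sketch. It is not BC-sketch, whose structure the abstract does not detail; the salted `hashlib.blake2b` hashing, the default width and depth, and the helper `heavy_hitters` are illustrative assumptions. It also exposes the key-recovery problem the abstract mentions: the counters alone cannot list which flows are heavy, so the helper must keep candidate keys on the side, which is the kind of overhead a reversible sketch aims to avoid.

```python
import hashlib
from typing import Dict, Hashable, Iterable, List, Set, Tuple


class CountMinSketch:
    """A depth x width array of counters; estimates may overcount but never undercount."""

    def __init__(self, width: int = 1024, depth: int = 4) -> None:
        self.width, self.depth = width, depth
        self.rows: List[List[int]] = [[0] * width for _ in range(depth)]

    def _index(self, key: Hashable, row: int) -> int:
        # Derive one hash function per row by salting the key with the row number.
        data = f"{row}:{key}".encode()
        digest = hashlib.blake2b(data, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, key: Hashable, count: int = 1) -> None:
        for r in range(self.depth):
            self.rows[r][self._index(key, r)] += count

    def estimate(self, key: Hashable) -> int:
        # Collisions only ever add to a counter, so the smallest counter is the tightest bound.
        return min(self.rows[r][self._index(key, r)] for r in range(self.depth))


def heavy_hitters(stream: Iterable[Tuple[Hashable, int]], threshold: int,
                  width: int = 1024, depth: int = 4) -> Dict[Hashable, int]:
    """Hypothetical helper: flows whose estimated total size reaches `threshold`."""
    cms = CountMinSketch(width, depth)
    candidates: Set[Hashable] = set()  # side storage: the sketch cannot enumerate keys itself
    for flow_key, size in stream:
        cms.add(flow_key, size)
        if cms.estimate(flow_key) >= threshold:
            candidates.add(flow_key)
    # Re-estimate at the end so each reported value reflects the whole stream.
    return {k: cms.estimate(k) for k in candidates}
```

Heavy changes can be approached in a similar spirit by building one sketch per time window and comparing per-key estimates across windows; superspreader detection additionally needs per-source distinct counting, which a plain Count-Min sketch does not provide.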
Pub Date: 2020-10-01 | DOI: 10.1109/SMDS49396.2020.00008
CNN Approaches to Classify Multivariate Time Series Using Class-specific Features
Yifan Hao, H. Cao, Erick Draayer
Many smart data services (e.g., smart energy, smart homes) collect and utilize time series data (e.g., energy production and consumption, human body movement) to conduct data analysis. Among such analysis tasks, classification is a widely utilized technique to provide data-driven solutions. Most existing classification methods extract a single set of features from the data and use this feature set for classification across multiple classes. This often ignores the reality that different and class-specific subsets of the initial feature set may better facilitate classification. In this paper, we propose two convolutional neural network (CNN) models using class-specific variables to solve the multi-class classification problem over multivariate time series (MTS) data. A new loss function is introduced for training the CNN models. We compare our proposed methods with 13 baseline approaches using 14 real datasets. The extensive experimental results show that our new approaches can not only outperform other methods on classification accuracy, but also successfully identify important class-specific variables.
Pub Date: 2020-10-01 | DOI: 10.1109/smds49396.2020.00025
SMDS 2020 Organizing Committee
Mohand-Saïd Hacid, Zhaohui Wu, Laurence T. Yang, Hongbing Wang, Nabil El Ioini, Kenneth Fletcher, M. Gergatsoulis, Daniel Grosu, Jin-Kao Hao, M. Sapino, Huasong Shan, Kurt Tutschku, Sudharshan S. Vazhkudai, M. A. Vega-Rodríguez, S. Ventura, Jian Wang, Haohua Wang, Jianwu Wang, Shangguang Wang
Program Committee
Amani Abu Jabal, Purdue University
Jacky Akoka, CEDRIC-CNAM & IMT-TEM
Mohsen Amini Salehi, University of Louisiana Lafayette
Rui Araujo, University of Coimbra
Claudio Ardagna, Università degli Studi di Milano
Mohan Baruwal Chhetri, CSIRO
Paolo Bellavista, University of Bologna
Nik Bessis, Edge Hill University
Frank Blaauw, University of Groningen
Luca Cagliero, Politecnico di Torino
Jian Cao, Shanghai Jiao Tong University
Chia-Hui Chang, National Central University
Feng Chen, Louisiana State University
Tao Chen, Loughborough University
Yong Chen, Tianjin University
Shizhan Chen, Tianjin University
Lisi Chen, Hong Kong Baptist University
Bo Cheng, Beijing University of Posts & Telecommunications
Lizhen Cui, Shandong University
Edward Curry, NUI Galway
Harshad Deshmukh, Google
Sheng Di, ANL
Zhijun Ding, Tongji University
Weilong Ding, North China University of Technology
Mario Jose Divan, UNLPam
Schahram Dustdar, Vienna University of Technology
Nabil El Ioini
Kenneth Fletcher, University of Massachusetts Boston
Matthew Forshaw, Newcastle University
Mohamed Gaber, Birmingham City University
Mikel Galar, Universidad Pública de Navarra
Mengmeng Ge, Deakin University
Pub Date: 2020-10-01 | DOI: 10.1109/SMDS49396.2020.00023
S3QLRDF: Property Table Partitioning Scheme for Distributed SPARQL Querying of large-scale RDF data
Mahmudul Hassan, S. Bansal
The proliferation of the semantic web in the form of the Resource Description Framework (RDF) demands efficient, scalable, distributed storage along with a highly available and fault-tolerant parallel processing strategy. More precisely, the rapid growth of RDF data raises the need for an efficient partitioning strategy over distributed data management systems that improves SPARQL query performance, regardless of query pattern shape, with minimal pre-processing time. In this context, we propose a new relational partitioning scheme for RDF data called Property Table Partitioning (PTP), which further partitions the existing Property Table into multiple tables based on distinct properties (each comprising all subjects with non-null values for those properties) in order to minimize input data and join operations. In this paper, we introduce a distributed RDF data management system called S3QLRDF, which is built on top of Spark and uses SQL to execute SPARQL queries over the PTP schema. We perform an extensive experimental evaluation of preprocessing costs and query performance using the Lehigh University Benchmark (LUBM) and Waterloo SPARQL Diversity Test Suite (WatDiv) datasets with up to 1.4 billion triples. Our results demonstrate that S3QLRDF outperforms state-of-the-art distributed RDF management systems.
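The partitioning idea can be illustrated in memory. The Python sketch below assumes single-valued properties and reads the abstract as follows: the partition for property p holds the full property-table rows of every subject whose p value is non-null, so a star-shaped query can scan the smallest relevant partition without self-joins. This reading, and the function names `property_table`, `partition_by_property`, and `star_query`, are our own assumptions; S3QLRDF itself runs on Spark and executes SPARQL through SQL rather than over Python dictionaries.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Row = Dict[str, Optional[str]]


def property_table(triples: List[Triple]) -> Tuple[List[str], Dict[str, Row]]:
    """One row per subject, one column per property (assumes single-valued properties)."""
    props = sorted({p for _, p, _ in triples})
    table: Dict[str, Row] = defaultdict(lambda: {p: None for p in props})
    for s, p, o in triples:
        table[s][p] = o
    return props, dict(table)


def partition_by_property(props: List[str], table: Dict[str, Row]) -> Dict[str, Dict[str, Row]]:
    """For each property p, keep only the subjects whose p column is non-null."""
    return {p: {s: row for s, row in table.items() if row[p] is not None} for p in props}


def star_query(partitions: Dict[str, Dict[str, Row]], query_props: List[str]) -> List[tuple]:
    """Answer `?s p1 ?o1 . ?s p2 ?o2 ...` without self-joins.

    Every row already carries all of a subject's properties, so scanning the
    smallest partition among the query's properties minimizes input rows.
    """
    scan = min(query_props, key=lambda p: len(partitions[p]))
    return sorted(
        (s, *[row[p] for p in query_props])
        for s, row in partitions[scan].items()
        if all(row[p] is not None for p in query_props)
    )
```

With a toy graph in which three subjects have a `name` and two also have an `advisor`, a query for subjects having both properties scans only the two-row `advisor` partition. Multi-valued properties, which real RDF data often contains, would need extra tables or array-valued columns that this sketch omits.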
Pub Date: 2020-10-01 | DOI: 10.1109/SMDS49396.2020.00019
Real-time System for Short- and Long-Term Prediction of Vehicle Flow
S. Bilotta, P. Nesi, I. Paoli
Nowadays, traffic management and sustainable mobility are becoming central topics for intelligent transportation systems (ITS). Thanks to today's technologies, it is possible to collect real-time data to monitor the traffic situation in specific areas. An important challenge in ITS is the ability to predict road traffic variables. Short-term prediction of traffic variables is a complex nonlinear task that has been the subject of many research efforts over the past few decades. Access to precise traffic flow data is mandatory for a large number of applications that must guarantee a high level of service, such as traffic flow reconstruction, which in turn is used to perform what-if analysis, conditioned routing, etc. Such applications must be reliable and precise when used to dispatch rescue teams and fire brigades. This paper proposes a solution for short- and long-term traffic flow prediction that uses and compares a number of machine learning approaches. The solution has been developed in the context of the Sii-Mobility national smart city mobility and transport project, and it is in use in other EC projects and solutions, such as Snap4City PCP EC and TRAFAIR CEF, as well as in REPLICATE H2020 SCC1 and the control room in the Florence area.
User Churn Prediction is a cutting-edge research area in the web service industry, it is the key for managing the user in the virtual world and provide feedback information for improving the corresponding web service. At present, most of the relevant work is to design a questionnaire to collect data of users' characteristics and feelings and then develop a general model by finding relevance. However, that kind of methods requires quite a time and manpower, and most web services can only obtain logs of users' behaviours and have no access to users' feature data. Therefore, it is a big challenge to conduct user churn prediction with only behavior data and get users' latent feelings from their action data in order to improve the accuracy of churn prediction. In this paper, a novel Latent Feelings-aware RNN model, namely LaFee, has been proposed to solve the user churn prediction problem by using only behaviour data. The latent feelings, proven to be satisfaction and aspiration, can be estimated through the intermediate variable of the trained LaFee. We also designed experiments on a real dataset and the results show that our methods outperform the baselines.
Title: A Latent Feelings-aware RNN Model for User Churn Prediction with only Behaviour data
Authors: Meng Xi, Zhiling Luo, Naibo Wang, Jianrong Tao, Ying Li, Jianwei Yin
Venue: 2020 IEEE International Conference on Smart Data Services (SMDS), published 2020-10-01. DOI: 10.1109/SMDS49396.2020.00011 (https://doi.org/10.1109/SMDS49396.2020.00011)
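To make the idea of "an RNN over behaviour logs" concrete, the following is a minimal sketch of a generic vanilla (Elman) RNN forward pass that maps a sequence of per-day behaviour features to a churn probability. It is not LaFee's architecture: the feature choice (logins, session hours), the function name `rnn_churn_score`, and all weights are hypothetical placeholders, and the weights are untrained. The sketch records the hidden state after every step because the abstract says the latent feelings are read from an intermediate variable of the trained model; this toy does not estimate satisfaction or aspiration.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def rnn_churn_score(
    seq: Sequence[Sequence[float]],  # one feature vector per time step
    w_x: Matrix,                     # hidden_size x input_size
    w_h: Matrix,                     # hidden_size x hidden_size
    b: Sequence[float],              # hidden_size
    w_out: Sequence[float],          # hidden_size
    b_out: float,
) -> Tuple[float, List[List[float]]]:
    """Elman RNN forward pass: h_t = tanh(W_x x_t + W_h h_{t-1} + b).

    Returns sigmoid(w_out . h_T + b_out) as the churn probability, plus the
    hidden state after every step (the intermediate variables).
    """
    hidden = len(b)
    h = [0.0] * hidden
    states: List[List[float]] = []
    for x in seq:
        # The right-hand side is evaluated with the previous h before rebinding.
        h = [
            math.tanh(
                sum(w_x[i][j] * x[j] for j in range(len(x)))
                + sum(w_h[i][k] * h[k] for k in range(hidden))
                + b[i]
            )
            for i in range(hidden)
        ]
        states.append(h)
    logit = sum(w_out[i] * h[i] for i in range(hidden)) + b_out
    return sigmoid(logit), states


if __name__ == "__main__":
    # Hypothetical daily features: [logins, session hours], declining over time.
    week = [[5.0, 2.0], [4.0, 1.5], [2.0, 0.5], [0.0, 0.0]]
    # Placeholder (untrained) weights with hidden_size = 2.
    w_x = [[0.3, 0.5], [-0.2, 0.1]]
    w_h = [[0.1, 0.0], [0.0, 0.1]]
    b = [0.0, 0.0]
    w_out = [-1.0, 0.5]
    prob, states = rnn_churn_score(week, w_x, w_h, b, w_out, b_out=0.0)
    print(f"churn probability: {prob:.3f}")
    print(f"hidden states recorded: {len(states)}")
```

In a real system the weights would be learned from labelled churn outcomes (for example with a binary cross-entropy loss), and a gated cell such as a GRU or LSTM is the more common choice for long behaviour sequences.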
Pub Date : 2020-10-01 DOI: 10.1109/SMDS49396.2020.00013
Farnaz Tahmasebian, Li Xiong, Mani Sotoodeh, V. Sunderam
As crowdsourcing becomes more widely used for annotating data from large groups of users, attackers have strong incentives to manipulate the system. Inferring the true answers to tasks from user-provided data in crowdsourcing systems is susceptible to data poisoning attacks, in which malicious users may intentionally or strategically report incorrect information to mislead the system into inferring the wrong truth for a set of tasks. Recent work has proposed several attacks on crowdsourcing systems and shown that existing truth inference methods may be vulnerable to them. In this paper, we propose solutions to enhance the robustness of existing truth inference methods. Our solutions are based on 1) detecting and augmenting the answers for boundary tasks, on which users could not reach a strong consensus and which are hence subject to potential manipulation, and 2) enhancing the inference method with a stronger prior. We empirically evaluate these defense mechanisms by designing attack scenarios that aim to decrease the accuracy of the system. Experiments show that our method is effective and significantly improves the robustness of the system under attack.
Title: EdgeInfer: Robust Truth Inference under Data Poisoning Attack
Venue: 2020 IEEE International Conference on Smart Data Services (SMDS), published 2020-10-01. DOI: 10.1109/SMDS49396.2020.00013 (https://doi.org/10.1109/SMDS49396.2020.00013)
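The EdgeInfer abstract describes flagging "boundary tasks" on which users could not reach a strong consensus and augmenting their answers. The following is a minimal sketch of that idea under stated assumptions: it uses plain majority voting as the truth inference baseline and defines a boundary task as one whose top two label shares differ by less than a hypothetical `margin` threshold. The function names, the margin value, and the "trusted extra answers" used for augmentation are all illustrative assumptions rather than the paper's actual method, and the stronger-prior component is not shown.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical input format: task_id -> list of (worker_id, label) answers.
Answers = Dict[str, List[Tuple[str, str]]]


def majority_vote(answers: Answers) -> Dict[str, str]:
    """Baseline truth inference: the most frequent label wins each task."""
    inferred: Dict[str, str] = {}
    for task, votes in answers.items():
        if not votes:
            continue  # nothing to infer for an unanswered task
        counts = Counter(label for _, label in votes)
        # Ties are broken in favour of the label seen first.
        inferred[task] = counts.most_common(1)[0][0]
    return inferred


def boundary_tasks(answers: Answers, margin: float = 0.25) -> List[str]:
    """Flag tasks whose top two label shares differ by less than `margin`.

    `margin` is an illustrative threshold, not a value taken from the paper.
    """
    flagged: List[str] = []
    for task, votes in answers.items():
        if not votes:
            flagged.append(task)  # no answers means no consensus at all
            continue
        ranked = Counter(label for _, label in votes).most_common()
        total = len(votes)
        top = ranked[0][1] / total
        second = ranked[1][1] / total if len(ranked) > 1 else 0.0
        if top - second < margin:
            flagged.append(task)
    return flagged


def augment(answers: Answers, extra: Answers, tasks: List[str]) -> Answers:
    """Return a copy of `answers` with extra answers appended for `tasks` only."""
    merged = {task: list(votes) for task, votes in answers.items()}
    for task in tasks:
        merged.setdefault(task, []).extend(extra.get(task, []))
    return merged


if __name__ == "__main__":
    # Toy data: on t2, workers w4 and w5 are assumed to be attackers voting "cat".
    answers: Answers = {
        "t1": [("w1", "cat"), ("w2", "cat"), ("w3", "cat"), ("w4", "dog")],
        "t2": [("w1", "cat"), ("w2", "dog"), ("w3", "dog"), ("w4", "cat"), ("w5", "cat")],
    }
    print(majority_vote(answers))        # t2 is inferred as "cat"
    flagged = boundary_tasks(answers)    # only t2 has a weak consensus
    trusted = {"t2": [("g1", "dog"), ("g2", "dog")]}  # hypothetical extra answers
    print(majority_vote(augment(answers, trusted, flagged)))
```

The design choice worth noting is that augmentation is applied only to flagged tasks: strong-consensus tasks such as t1 keep their original answers, so the extra annotation budget goes where a small number of poisoned votes can flip the inferred truth.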