2014 IEEE 34th International Conference on Distributed Computing Systems: Latest Publications

Learning from the Past: Intelligent On-Line Weather Monitoring Based on Matrix Completion

2014 IEEE 34th International Conference on Distributed Computing Systems

Pub Date : 2014-06-30 DOI: 10.1109/ICDCS.2014.26

Kun Xie, Lele Wang, Xin Wang, Jigang Wen, Gaogang Xie

Matrix completion has emerged very recently and provides a new avenue for low-cost data gathering in wireless sensor networks (WSNs). Existing schemes often assume that the data matrix has a known and fixed low rank, which is unlikely to hold in a practical monitoring system such as weather data gathering. Weather data vary in both the temporal and spatial domains over time. By analyzing a large set of weather data collected from 196 sensors in ZhuZhou, China, we reveal that weather data have the features of low rank, temporal stability, and relative rank stability. Taking advantage of these features, we propose an on-line data gathering scheme based on matrix completion theory, named MC-Weather, to adaptively sample different locations according to environmental and weather conditions. To better schedule the sampling process while satisfying the required reconstruction accuracy, we propose several novel techniques, including three sample learning principles, an adaptive sampling algorithm based on matrix completion, and a uniform time slot and cross sample model. With these techniques, our MC-Weather scheme can collect the sensory data at the required accuracy while greatly reducing the cost of sensing, communication, and computation. We perform extensive simulations based on real weather data sets, and the simulation results validate the efficiency and efficacy of the proposed scheme.
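
As a rough illustration of the matrix completion idea this abstract builds on, the sketch below fills in the missing entries of a small sensor-by-time matrix. It is not the MC-Weather algorithm: it assumes the matrix is exactly rank 1 and uses plain alternating least squares as a stand-in solver, whereas the abstract stresses that the rank is neither known nor fixed. The function name `complete_rank1` and the toy readings are hypothetical.

```python
def complete_rank1(observed, n_rows, n_cols, iters=100):
    """Estimate a rank-1 matrix u * v^T from a partial set of entries.

    observed maps (row, col) -> value for the sampled entries only.
    Returns the full estimated matrix as a list of lists.
    """
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(iters):
        # Fix v, solve the least-squares problem for each row factor.
        for row in range(n_rows):
            num = sum(val * v[c] for (r, c), val in observed.items() if r == row)
            den = sum(v[c] ** 2 for (r, c) in observed if r == row)
            if den > 0:
                u[row] = num / den
        # Fix u, solve the least-squares problem for each column factor.
        for col in range(n_cols):
            num = sum(val * u[r] for (r, c), val in observed.items() if c == col)
            den = sum(u[r] ** 2 for (r, c) in observed if c == col)
            if den > 0:
                v[col] = num / den
    return [[u[r] * v[c] for c in range(n_cols)] for r in range(n_rows)]


# Toy example (assumed data): 4 sensors x 3 time slots, 3 readings not sampled.
truth = [[a * b for b in (1.0, 2.0, 3.0)] for a in (1.0, 2.0, 3.0, 4.0)]
missing = {(0, 2), (2, 0), (3, 1)}
observed = {
    (i, j): truth[i][j]
    for i in range(4)
    for j in range(3)
    if (i, j) not in missing
}
estimate = complete_rank1(observed, 4, 3)
```

In this toy case every sensor and every time slot has at least one sampled reading, which is what allows the skipped readings to be recovered; an adaptive scheme would additionally decide which entries to sample next.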



Citations: 27

Will They Blend?: Exploring Big Data Computation Atop Traditional HPC NAS Storage

2014 IEEE 34th International Conference on Distributed Computing Systems

Pub Date : 2014-06-30 DOI: 10.1109/ICDCS.2014.60

E. Wilson, M. Kandemir, Garth A. Gibson

The Apache Hadoop framework has rung in a new era in how data-rich organizations can process, store, and analyze large amounts of data. This has resulted in increased potential for an infrastructure exodus from the traditional solution of commercial database ad-hoc analytics on network-attached storage (NAS). While many data-rich organizations can afford to either move entirely to Hadoop for their Big Data analytics, or to maintain their existing traditional infrastructures and acquire a new set of infrastructure solely for Hadoop jobs, most supercomputing centers do not enjoy either of those possibilities. Too much of the existing scientific code is tailored to work on massively parallel file systems unlike the Hadoop Distributed File System (HDFS), and their datasets are too large to reasonably maintain and/or ferry between two distinct storage systems. Nevertheless, as scientists search for easier-to-program frameworks with a lower time-to-science to post-process their huge datasets after execution, there is increasing pressure to enable use of MapReduce within these traditional High Performance Computing (HPC) architectures. Therefore, in this work we explore potential means to enable use of the easy-to-program Hadoop MapReduce framework without requiring a complete infrastructure overhaul from existing HPC NAS solutions. We demonstrate that retaining function-dedicated resources like NAS is not only possible, but can even be effected efficiently with MapReduce. In our exploration, we unearth subtle pitfalls resultant from this mash-up of new-era Big Data computation on conventional HPC storage and share the clever architectural configurations that allow us to avoid them. Last, we design and present a novel Hadoop File System, the Reliable Array of Independent NAS File System (RainFS), and experimentally demonstrate its improvements in performance and reliability over the previous architectures we have investigated.



Citations: 10

Optimal Energy Cost for Strongly Stable Multi-hop Green Cellular Networks

2014 IEEE 34th International Conference on Distributed Computing Systems

Pub Date : 2014-06-30 DOI: 10.1109/ICDCS.2014.15

Weixian Liao, Ming Li, Sergio Salinas, Pan Li, M. Pan

With the ever-increasing user adoption of mobile devices like smart phones and tablets, cellular service providers' energy consumption and cost are growing fast and have received tremendous attention. How to effectively reduce the energy cost of cellular networks and achieve green communications while satisfying cellular users' rocketing traffic demands has become an urgent and challenging problem. In this paper, we investigate the minimization of the long-term time-averaged expected energy cost of a cellular service provider while guaranteeing the strong stability of the network. We first formulate an offline optimization problem with a joint consideration of flow routing, link scheduling, and energy (e.g., renewable energy resource and energy storage unit) constraints. Since the formulated problem is a time-coupling stochastic Mixed-Integer Non-Linear Programming (MINLP) problem, it is prohibitively expensive to solve. We then reformulate the problem by employing Lyapunov optimization theory. A decomposition-based algorithm is developed to solve the problem and is proven to guarantee the strong stability of the network. Both the lower and upper bounds on the optimal result of the original problem are derived and proven. Simulation results demonstrate that the obtained lower and upper bounds are very tight, and that the proposed scheme results in noticeable energy cost savings.
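
To give a feel for the Lyapunov optimization step mentioned in this abstract, here is a minimal drift-plus-penalty sketch for a single traffic queue. It is a simplification under stated assumptions, not the paper's decomposition algorithm: there is no routing, link scheduling, or energy storage, serving one unit of traffic is assumed to cost one unit of energy at the current price, and the names `serve_decision`, `simulate`, and the parameter `v` are hypothetical.

```python
def serve_decision(backlog, price, v, max_rate):
    """Per-slot drift-plus-penalty rule for one queue.

    Minimises v * price * x - backlog * x over 0 <= x <= max_rate.
    The objective is linear in x, so the optimum sits at an endpoint.
    """
    return max_rate if backlog > v * price else 0.0


def simulate(prices, arrivals, v, max_rate):
    """Run the rule over a price/arrival trace.

    Returns (time-averaged energy cost, peak queue backlog).
    """
    backlog, cost, peak = 0.0, 0.0, 0.0
    for price, arriving in zip(prices, arrivals):
        # Never serve more traffic than is actually queued.
        served = min(serve_decision(backlog, price, v, max_rate), backlog)
        cost += price * served
        backlog = backlog - served + arriving
        peak = max(peak, backlog)
    return cost / len(prices), peak


# Assumed trace: energy price alternates between 1 and 4, 3 units arrive per slot.
prices = [1.0, 4.0] * 500
arrivals = [3.0] * 1000
avg_cost, peak_backlog = simulate(prices, arrivals, v=10.0, max_rate=5.0)
```

A larger `v` makes the rule wait longer for cheap slots, which lowers the average cost at the price of a larger backlog. In this trace the backlog can never exceed `v * max(prices) + max(arrivals)`, which is the kind of bounded-queue behaviour that strong stability refers to.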



Citations: 4

OpenSample: A Low-Latency, Sampling-Based Measurement Platform for Commodity SDN

2014 IEEE 34th International Conference on Distributed Computing Systems

Pub Date : 2014-06-30 DOI: 10.1109/ICDCS.2014.31

Junho Suh, T. Kwon, C. Dixon, Wes Felter, J. Carter

In this paper we propose, implement and evaluate OpenSample: a low-latency, sampling-based network measurement platform targeted at building faster control loops for software-defined networks. OpenSample leverages sFlow packet sampling to provide near-real-time measurements of both network load and individual flows. While OpenSample is useful in any context, it is particularly useful in an SDN environment where a network controller can quickly take action based on the data it provides. Using sampling for network monitoring allows OpenSample to have a 100 millisecond control loop rather than the 1-5 second control loop of prior polling-based approaches. We implement OpenSample in the Floodlight OpenFlow controller and evaluate it both in simulation and on a testbed composed of commodity switches. When used to inform traffic engineering, OpenSample provides up to a 150% throughput improvement over both static equal-cost multi-path routing and a polling-based solution with a one-second control loop.
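
The basic arithmetic behind sampling-based measurement can be sketched as follows. This is only an assumed, simplified estimator (scale the sampled bytes by the sampling ratio and divide by the window length), not OpenSample's actual implementation; the function name `estimate_flow_rates`, the tuple layout of a sample, and the 1-in-100 ratio are all hypothetical.

```python
from collections import defaultdict


def estimate_flow_rates(samples, sampling_ratio, window_s):
    """Estimate per-flow throughput from 1-in-N packet samples.

    samples: iterable of (flow_id, packet_size_bytes) seen in the window.
    sampling_ratio: N, meaning roughly one packet in N is sampled.
    window_s: length of the measurement window in seconds.
    Returns {flow_id: estimated bits per second}.
    """
    sampled_bytes = defaultdict(int)
    for flow_id, size in samples:
        sampled_bytes[flow_id] += size
    # Each sampled packet stands in for about sampling_ratio packets.
    return {
        flow_id: total * sampling_ratio * 8 / window_s
        for flow_id, total in sampled_bytes.items()
    }


# Assumed input: three samples collected during one 100 ms control-loop window.
window_samples = [("flow-a", 1500), ("flow-a", 1500), ("flow-b", 500)]
rates = estimate_flow_rates(window_samples, sampling_ratio=100, window_s=0.1)
```

Because the estimate is available as soon as a window of samples has arrived, the controller does not have to wait for a counter-polling round trip; the trade-off is that flows with few sampled packets get noisy estimates.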


Citations: 159

Enabling Privacy-Preserving Image-Centric Social Discovery

2014 IEEE 34th International Conference on Distributed Computing Systems

Pub Date : 2014-06-30 DOI: 10.1109/ICDCS.2014.28

Xingliang Yuan, Xinyu Wang, Cong Wang, A. Squicciarini, K. Ren

The increasing popularity of images at social media sites is posing new opportunities for social discovery applications, i.e., suggesting new friends and discovering new social groups with similar interests via exploring images. To effectively handle the explosive growth of images involved in social discovery, one common trend for many emerging social media sites is to leverage the commercial public cloud as their robust backend data center. While extremely convenient, directly exposing content-rich images and the related social discovery results to the public cloud also raises new acute privacy concerns. In light of this observation, in this paper we propose a privacy-preserving social discovery service architecture based on encrypted images. As the core of such social discovery is to compare and quantify similar images, we first adopt the effective Bag-of-Words model to extract the "visual similarity content" of users' images into image profile vectors, and then model the problem as similarity retrieval of encrypted high-dimensional image profiles. To support fast and scalable similarity search over hundreds of thousands of encrypted images, we propose a secure and efficient indexing structure. The resulting design enables social media sites to obtain secure, practical, and accurate social discovery from the public cloud, without disclosing the encrypted image content. We formally prove the security and discuss further extensions on user image update and the compatibility with existing image-sharing social functionalities. Extensive experiments on a large Flickr image dataset demonstrate the practical performance of the proposed design. Our qualitative social discovery results show consistency with human perception.
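
The Bag-of-Words profile step described in this abstract can be illustrated in plaintext as below. This sketch deliberately leaves out everything that makes the paper's design privacy-preserving (encryption and the secure index), and it assumes cosine similarity as the comparison measure, which the abstract does not specify. The names `image_profile` and `cosine_similarity`, and the visual-word IDs, are hypothetical.

```python
import math
from collections import Counter


def image_profile(visual_words, vocab_size):
    """Build a Bag-of-Words histogram over a fixed visual vocabulary.

    visual_words: list of integer visual-word IDs found in a user's images.
    """
    counts = Counter(visual_words)
    return [counts.get(word, 0) for word in range(vocab_size)]


def cosine_similarity(a, b):
    """Cosine of the angle between two profile vectors (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Assumed toy data: a 4-word vocabulary and three users.
alice = image_profile([0, 0, 2], vocab_size=4)
bob = image_profile([0, 2, 2], vocab_size=4)
carol = image_profile([1, 3, 3], vocab_size=4)

alice_bob = cosine_similarity(alice, bob)
alice_carol = cosine_similarity(alice, carol)
```

Here Alice and Bob share visual words and score high, while Alice and Carol share none and score zero; a discovery service would suggest the highest-scoring users as potential friends.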



Citations: 47

Exploring the Use of Diverse Replicas for Big Location Tracking Data

2014 IEEE 34th International Conference on Distributed Computing Systems

Pub Date : 2014-06-30 DOI: 10.1109/ICDCS.2014.17

Ye Ding, Haoyu Tan, Wuman Luo, L. Ni

The value of large amounts of location tracking data has received wide attention in many applications including human behavior analysis, urban transportation planning, and various location-based services (LBS). Nowadays, both scientific and industrial communities are encouraged to collect as much location tracking data as possible, which brings about two issues: 1) it is challenging to process queries on big location tracking data efficiently, and 2) it is expensive to store several exact data replicas for fault tolerance. So far, several dedicated storage systems have been proposed to address these issues. However, they do not work well when the query ranges vary widely. In this paper, we present the design of a storage system using a diverse replica scheme, which improves query processing efficiency while reducing the cost of storage space. To the best of our knowledge, we are the first to investigate data storage and processing in the context of big location tracking data. Specifically, we conduct in-depth theoretical and empirical analysis of the trade-offs between different spatio-temporal partitioning schemes as well as data encoding schemes. Then we propose an effective approach to select an appropriate set of diverse replicas, which is optimized for the expected query loads while conforming to the given storage space budget. The experiment results confirm that using diverse replicas can significantly improve the overall query performance. The results also demonstrate that the proposed algorithms for the replica selection problem are both effective and efficient.



Citations: 7

Impact Analysis of Topology Poisoning Attacks on Economic Operation of the Smart Power Grid

2014 IEEE 34th International Conference on Distributed Computing Systems

Pub Date : 2014-06-30 DOI: 10.1109/ICDCS.2014.72

M. Rahman, E. Al-Shaer, R. Kavasseri

The Optimal Power Flow (OPF) routine used in energy control centers allocates individual generator outputs by minimizing the overall cost of generation subject to system-level operating constraints. The OPF relies on the outputs of two other modules, namely the topology processor and the state estimator. The topology processor maps the grid topology based on statuses received from the switches and circuit breakers across the system. The state estimator computes the system state, i.e., voltage magnitudes with phase angles, transmission line flows, and system loads, based on real-time meter measurements. However, topology statuses and meter measurements are vulnerable to false data injection attacks. Recent research has shown that such cyber attacks can be launched against state estimation, where adversaries can corrupt the states but still remain undetected. In this paper, we show how stealthy topology poisoning attacks can compromise the integrity of OPF, and thus undermine economic operation. We describe a framework based on formal verification to systematically analyze the impact of such attacks on OPF. The proposed framework is illustrated with an example. We also evaluate the scalability of the framework with respect to time and memory requirements.



Citations: 31

Compiler Driven Automatic Kernel Context Migration for Heterogeneous Computing

2014 IEEE 34th International Conference on Distributed Computing Systems

Pub Date : 2014-06-30 DOI: 10.1109/ICDCS.2014.47

Ramy Gad, Tim Süß, A. Brinkmann

Computer systems provide different heterogeneous resources (e.g., GPUs, DSPs and FPGAs) that accelerate applications and can reduce energy consumption. Usually, these resources have an isolated memory and require target-specific code to be written. There exist tools that can automatically generate target-specific code for program parts, so-called kernels. The data objects required for a target kernel execution need to be moved to the target resource memory. It is the programmer's responsibility to serialize the data objects used in the kernel and to copy them to or from the resource's memory. Typically, the programmer writes their own serializing function or uses existing serialization libraries. Unfortunately, both approaches require code modifications, and the programmer needs knowledge of the data structure format used. There is a need for a tool that is able to automatically extract the original kernel data objects, serialize them, and migrate them to a target resource without requiring intervention from the programmer. In this paper, we present a tool collection, ConSerner, that automatically identifies, gathers, and serializes the context of a kernel and migrates it to a target resource's memory, where a target-specific kernel is executed with this data. This is all done transparently to the programmer. Complex data structures can be used without requiring the programmer to modify the program code. Predefined data structures in external libraries (e.g., the STL's vector) can also be used as long as the source code of these libraries is available.


Citations: 7

Columbus: Configuration Discovery for Clouds

2014 IEEE 34th International Conference on Distributed Computing Systems

Pub Date : 2014-06-30 DOI: 10.1109/ICDCS.2014.41

R. Balani, Deepak Jeswani, Dipyaman Banerjee, Akshat Verma

Low-cost, accurate, and scalable software configuration discovery is the key to simplifying many cloud management tasks. However, the lack of standardization across software configuration techniques has prevented the development of a fully automated and application-independent configuration discovery solution. In this work, we present Columbus, an application-agnostic system that automatically discovers environmental configuration parameters, or Points of Variability (PoVs), in clustered applications with high accuracy. Columbus builds on the insight that even though configuration mechanisms and files vary across software, PoVs are encoded using a few common patterns. It uses a novel rule framework to annotate file content with PoVs and a Bayesian network to estimate the confidence of annotated PoVs. Our experiments confirm that Columbus can accurately discover configuration for a diverse set of enterprise and cloud applications. It has subsequently been integrated into three real-world systems that analyze this information for discovery of distributed application dependencies, enterprise IT migration, and virtual application configuration.
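
The two-stage idea in this abstract (pattern rules that annotate candidate PoVs, followed by a probabilistic confidence estimate) can be sketched as follows. This is only an illustration under stated assumptions: the two regular-expression rules, the function names, and the likelihood ratios are hypothetical, and a naive-Bayes-style odds update stands in for the Bayesian network, whose structure the abstract does not describe.

```python
import re

# Hypothetical annotation rules: each pattern captures the candidate value.
RULES = {
    "ip_address": re.compile(r"\b((?:\d{1,3}\.){3}\d{1,3})\b"),
    "port": re.compile(r"port\s*[=:]\s*(\d{1,5})\b", re.IGNORECASE),
}


def annotate(text):
    """Return (line_number, rule_name, value) for every rule match."""
    found = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for name, pattern in RULES.items():
            for match in pattern.finditer(line):
                found.append((line_no, name, match.group(1)))
    return found


def posterior(prior, likelihood_ratios):
    """Combine independent pieces of evidence by multiplying odds.

    Each ratio is P(evidence | is a PoV) / P(evidence | is not a PoV).
    """
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)
```

For example, `annotate("listen_port = 8080\nserver = 10.0.0.12\n")` flags the port on line 1 and the address on line 2, and `posterior(0.5, [3.0])` raises an even prior to 0.75 after one piece of supporting evidence.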

Citations: 2

Enforcing Location and Time-Based Access Control on Cloud-Stored Data

2014 IEEE 34th International Conference on Distributed Computing Systems

Pub Date : 2014-06-30 DOI: 10.1109/ICDCS.2014.71

Elli Androulaki, Claudio Soriente, Luka Malisa, Srdjan Capkun

Recent incidents of data breaches in the cloud suggest that users should not trust the cloud provider to enforce access control on their data. We focus on reducing the trust placed in the cloud in scenarios where granting access to data considers not only user identities (as in conventional access policies) but also contextual information such as the user's location and time of access. Previous work in this context assumes a fully trusted cloud that is also capable of locating users. We introduce LoTAC, a novel framework that seamlessly integrates the operation of a cloud provider and a localization infrastructure to enforce location- and time-based access control on cloud-stored data. In LoTAC, the two entities operate independently and are trusted only to offer their basic services: the cloud provider is used and trusted only to reliably store data, and the localization infrastructure is used and trusted only to accurately locate users. Furthermore, neither the cloud provider nor the localization infrastructure can access the data, even if they collude. LoTAC protocols require no changes to the cloud provider and minimal changes to the localization infrastructure. We evaluate our protocols using a cellular network as the localization infrastructure and show that they incur low communication and computation costs and scale well with a large number of users and policies.
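
As a rough illustration of what a location- and time-based policy decides, the sketch below evaluates one such policy in plain code. It covers only the policy semantics (who, where, when) and none of LoTAC's protocols or cryptography, which the abstract does not detail; the `Policy` fields, the circular region, and the function names are assumptions made for this example.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timezone

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: allowed users, a circular region, a time window."""
    allowed_users: frozenset
    center_lat: float
    center_lon: float
    radius_m: float
    not_before: datetime
    not_after: datetime


def permits(policy, user_id, lat, lon, when):
    """True only if identity, time of access, and location all satisfy the policy."""
    return (
        user_id in policy.allowed_users
        and policy.not_before <= when <= policy.not_after
        and haversine_m(lat, lon, policy.center_lat, policy.center_lon) <= policy.radius_m
    )
```

In the setting the abstract describes, no single party would evaluate this predicate with full trust: the location input would come from the localization infrastructure, while the cloud provider only stores the data.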

Citations: 26