Due to the small memory footprint and fast startup times offered by container virtualization, made ever more popular by the Docker platform, containers are seeing rapid adoption as a foundational capability for building PaaS and SaaS clouds. For such container clouds, which are fundamentally different from VM clouds, various cloud management services need to be revisited. In this paper, we present Voyager, a just-in-time live container migration service designed in accordance with the Open Container Initiative (OCI) principles. Voyager is a novel filesystem-agnostic and vendor-agnostic migration service that provides consistent full-system migration. Voyager combines CRIU-based memory migration with the data federation capabilities of union mounts to minimize migration downtime. With a union view of data between the source and target hosts, Voyager containers can resume operation instantly on the target host, while performing disk state transfer lazily in the background.
{"title":"Voyager: Complete Container State Migration","authors":"S. Nadgowda, Sahil Suneja, Nilton Bila, C. Isci","doi":"10.1109/ICDCS.2017.91","DOIUrl":"https://doi.org/10.1109/ICDCS.2017.91","journal":{"name":"2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)"},"publicationDate":"2017-06-05"}
Superseding HTTP/1.1, the dominant web protocol, HTTP/2 promises to make web applications faster and safer by introducing many new features, such as multiplexing, header compression, request priority, and server push. Although a few recent studies have examined the adoption of HTTP/2 and evaluated its impact, little is known about whether popular HTTP/2 servers have correctly implemented the new features and how deployed servers use them. To fill this gap, we conduct the first systematic investigation by inspecting six popular implementations of HTTP/2 servers (i.e., Nginx, Apache, H2O, Lightspeed, nghttpd and Tengine) and measuring the top 1 million Alexa web sites. In particular, we propose new methods and develop a tool named H2Scope to assess the new features in those servers. The results of the large-scale measurement on HTTP/2 web sites reveal new observations and insights. This study sheds light on the current status of HTTP/2 and on directions for future research.
{"title":"Are HTTP/2 Servers Ready Yet?","authors":"Muhui Jiang, Xiapu Luo, TungNgai Miu, Shengtuo Hu, Weixiong Rao","doi":"10.1109/ICDCS.2017.279","DOIUrl":"https://doi.org/10.1109/ICDCS.2017.279","journal":{"name":"2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)"},"publicationDate":"2017-06-05"}
Ai-Chun Pang, W. Chung, Te-Chuan Chiu, Junshan Zhang
Fog computing is emerging as a promising solution to meet the increasing demand for ultra-low latency services in wireless networks. Taking a forward-looking perspective, we propose a Fog-Radio Access Network (F-RAN) model, which utilizes the existing infrastructure, e.g., small cells and macro base stations, to achieve ultra-low latency through joint computing across multiple F-RAN nodes and near-range communications at the edge. We treat the low-latency design as an optimization problem that characterizes the tradeoff between communication and computing across multiple F-RAN nodes. Since this problem is NP-hard, we propose a latency-driven cooperative task computing algorithm based on a one-for-all concept, which simultaneously selects the F-RAN nodes to serve and allocates heterogeneous resources appropriately for multi-user services. Considering the limited heterogeneous resources shared among all users, we advocate the one-for-all strategy, in which every user takes the others' situations into consideration, and seek a "win-win" solution. The numerical results show that low-latency services can be achieved by F-RAN via latency-driven cooperative task computing.
{"title":"Latency-Driven Cooperative Task Computing in Multi-user Fog-Radio Access Networks","authors":"Ai-Chun Pang, W. Chung, Te-Chuan Chiu, Junshan Zhang","doi":"10.1109/ICDCS.2017.83","DOIUrl":"https://doi.org/10.1109/ICDCS.2017.83","journal":{"name":"2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)"},"publicationDate":"2017-06-05"}
Gang Li, Fan Yang, Guoxing Chen, Qiang Zhai, Xinfeng Li, Jin Teng, Junda Zhu, D. Xuan, Biao Chen, Wei Zhao
Visual (V) surveillance systems are extensively deployed and are becoming the largest source of big data. Electronic (E) data also plays an important role in surveillance, and its volume is growing explosively with the ubiquity of mobile devices. One of the major problems in surveillance is to determine the identities of human objects across different surveillance scenes. The traditional approach of processing big V and E datasets separately does not serve this purpose well, because V data and E data are each imperfect on their own for information gathering and retrieval. Matching human objects in the two datasets can combine the strengths of both for efficient large-scale surveillance. Yet such matching across two heterogeneous big datasets is challenging. In this paper, we propose an efficient set of parallel algorithms, called EV-Matching, to bridge big E and V data. We match E and V data based on their spatiotemporal correlation. The EV-Matching algorithms are implemented on Apache Spark to further accelerate the whole procedure. We conduct extensive experiments on a large synthetic dataset under different settings. Results demonstrate the feasibility and efficiency of our proposed algorithms.
{"title":"EV-Matching: Bridging Large Visual Data and Electronic Data for Efficient Surveillance","authors":"Gang Li, Fan Yang, Guoxing Chen, Qiang Zhai, Xinfeng Li, Jin Teng, Junda Zhu, D. Xuan, Biao Chen, Wei Zhao","doi":"10.1109/ICDCS.2017.89","DOIUrl":"https://doi.org/10.1109/ICDCS.2017.89","journal":{"name":"2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)"},"publicationDate":"2017-06-05"}
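The EV-Matching abstract above describes matching E and V data by spatiotemporal correlation. A toy sketch of that idea follows. It is not the paper's algorithm: the function name `match_ev`, the trace format (sets of `(time_slot, zone)` pairs), and the "largest overlap wins" rule are all assumptions made for illustration, and the sketch is sequential rather than parallel.

```python
def match_ev(e_traces, v_traces):
    """Map each electronic ID to the visual ID that shares the most
    (time_slot, zone) observations with it (hypothetical matching rule)."""
    matches = {}
    for e_id, e_obs in e_traces.items():
        best_id, best_overlap = None, 0
        for v_id, v_obs in v_traces.items():
            # Count observations where both traces place the subject
            # in the same zone during the same time slot.
            overlap = len(e_obs & v_obs)
            if overlap > best_overlap:
                best_id, best_overlap = v_id, overlap
        matches[e_id] = best_id  # None if no observation is shared
    return matches


# Hypothetical traces: device IDs from E data, track IDs from V data.
e_traces = {
    "phone-1": {(0, "A"), (1, "B"), (2, "C")},
    "phone-2": {(0, "D"), (1, "D"), (2, "C")},
}
v_traces = {
    "track-x": {(0, "D"), (1, "D"), (2, "C")},
    "track-y": {(0, "A"), (1, "B"), (2, "C")},
}
print(match_ev(e_traces, v_traces))
```

The nested loop compares every E trace with every V trace, which is the kind of quadratic work that a parallel implementation on a framework such as Apache Spark would presumably distribute.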
Si Chen, K. Ren, Sixu Piao, Cong Wang, Qian Wang, J. Weng, Lu Su, Aziz Mohaisen
Voice, as a convenient and efficient way of delivering information, has a significant advantage over conventional keyboard-based input methods, especially on small mobile devices such as smartphones and smartwatches. However, the human voice is often exposed to the public, which allows an attacker to quickly collect sound samples of targeted victims and then launch voice impersonation attacks to spoof voice-based applications. In this paper, we propose the design and implementation of a robust software-only voice impersonation defense system, which is tailored for mobile platforms and can be easily integrated with existing off-the-shelf smart devices. In our system, we use the magnetic field emitted by loudspeakers as the essential characteristic for detecting machine-based voice impersonation attacks. Furthermore, we use a state-of-the-art automatic speaker verification system to defend against human imitation attacks. Finally, our evaluation results show that our system simultaneously achieves high accuracy (100%) and low equal error rates (EERs) (0%) in detecting machine-based voice impersonation attacks on smartphones.
{"title":"You Can Hear But You Cannot Steal: Defending Against Voice Impersonation Attacks on Smartphones","authors":"Si Chen, K. Ren, Sixu Piao, Cong Wang, Qian Wang, J. Weng, Lu Su, Aziz Mohaisen","doi":"10.1109/ICDCS.2017.133","DOIUrl":"https://doi.org/10.1109/ICDCS.2017.133","journal":{"name":"2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)"},"publicationDate":"2017-06-05"}
Distribution of intangible information goods has experienced tremendous growth in recent years, which has facilitated a blossoming of information goods economics. As big data develops, more and more information goods markets are emerging for data trading. Current data pricing policies use many metrics to measure the value of data goods, such as the data generation date, data volume, and data integrity. However, it is very challenging to identify the amount of information in data and its distribution, and the corresponding data pricing has rarely been discussed. In this paper, we propose a new data pricing metric, the data information entropy, which helps to set a reasonable price in data trading. We first demonstrate a data information measurement method based on information entropy, and then propose a pricing function based on the result of the measurement. To comprehensively understand the new data pricing metric and facilitate its application in data trading, we verify the rationality of the data information measurement method and give three concrete pricing functions. This is the first look at information entropy-based data pricing, which can inspire research on the pricing mechanism of data goods and further promote the development of the data products business.
{"title":"A First Look at Information Entropy-Based Data Pricing","authors":"Xijun Li, Jianguo Yao, Xue Liu, Haibing Guan","doi":"10.1109/ICDCS.2017.45","DOIUrl":"https://doi.org/10.1109/ICDCS.2017.45","journal":{"name":"2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)"},"publicationDate":"2017-06-05"}
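The data pricing abstract above measures the information in a dataset with information entropy and derives a price from that measurement. A minimal sketch of the measurement step is given below, using the standard Shannon entropy of the empirical value distribution. The abstract does not state the paper's three pricing functions, so `hypothetical_price` (a price linear in total bits) and its `unit_price` parameter are placeholders invented here for illustration.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits per item, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    # H = sum over distinct values of p * log2(1 / p), with p = count / total.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def hypothetical_price(values, unit_price=1.0):
    """Illustrative pricing only: price proportional to total information in bits."""
    return unit_price * shannon_entropy(values) * len(values)


uniform = ["a", "b", "c", "d"]   # four equally likely values: 2 bits per item
constant = ["a", "a", "a", "a"]  # one repeated value: no information
print(shannon_entropy(uniform), shannon_entropy(constant))
```

Under this placeholder pricing, two datasets of equal volume can receive different prices: the repetitive one carries no information and is priced at zero, which is the distinction that volume-based metrics cannot express.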
Memcached is a widely used in-memory caching solution in large-scale search scenarios. The most pivotal performance metric in Memcached is latency, which is affected by various factors including the workload pattern, the service rate, the unbalanced load distribution and the cache miss ratio. To quantify the impact of each factor on latency, we establish a theoretical model of the Memcached system. Specifically, we model the unbalanced load distribution among Memcached servers with a set of probabilities, capture the bursty and concurrent key arrivals at Memcached servers in the form of batching blocks, and add a cache miss processing stage. Based on this model, we derive algebraic estimates of latency in Memcached. The latency estimation is validated by extensive experiments. Moreover, we obtain a quantitative understanding of how much latency improvement can be achieved by optimizing each factor, and provide several useful recommendations for optimizing latency in Memcached.
{"title":"Modeling and Analyzing Latency in the Memcached system","authors":"Wenxue Cheng, Fengyuan Ren, Wanchun Jiang, Tong Zhang","doi":"10.1109/ICDCS.2017.122","DOIUrl":"https://doi.org/10.1109/ICDCS.2017.122","journal":{"name":"2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)"},"publicationDate":"2017-06-05"}
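The Memcached abstract above names four factors that drive latency: workload pattern, service rate, unbalanced load distribution, and cache miss ratio. The sketch below shows how three of them can interact in a deliberately simplified model. It is not the paper's model: each server is treated as an independent M/M/1 queue (a substitute for the batch-arrival model the abstract describes), and the function `mean_latency` and all of its parameters are hypothetical.

```python
def mean_latency(total_rate, load_probs, service_rate, miss_ratio, miss_penalty):
    """Expected per-key latency under a simplified M/M/1-per-server model.

    total_rate   -- key arrival rate over the whole cluster (keys/s)
    load_probs   -- probability that a key is routed to each server
    service_rate -- keys/s that one server can process
    miss_ratio   -- fraction of keys that miss the cache
    miss_penalty -- extra seconds spent handling one miss
    """
    if abs(sum(load_probs) - 1.0) > 1e-9:
        raise ValueError("load_probs must sum to 1")
    latency = 0.0
    for p in load_probs:
        arrival = total_rate * p
        if arrival >= service_rate:
            raise ValueError("server overloaded: arrival rate >= service rate")
        # M/M/1 mean time in system is 1 / (mu - lambda); weight it by the
        # share of keys that this server receives.
        latency += p * (1.0 / (service_rate - arrival))
    # Every missed key pays the miss-processing penalty on top of queueing.
    return latency + miss_ratio * miss_penalty


balanced = mean_latency(100.0, [0.5, 0.5], 100.0, 0.0, 0.0)
skewed = mean_latency(100.0, [0.8, 0.2], 100.0, 0.0, 0.0)
print(balanced, skewed)
```

Even in this reduced form, shifting load from an even split to an 80/20 split raises the mean latency at the same total arrival rate, which illustrates why load imbalance appears as a separate factor in the abstract.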
In this paper, we seek to address anonymous communications in delay tolerant networks (DTNs). While many different approaches exist for the Internet and ad hoc networks, to the best of our knowledge, only variants of onion-based routing have been tailored for DTNs. Since each type of anonymous routing protocol has its advantages and drawbacks, there is no single anonymous routing protocol for DTNs that can adapt to different levels of security requirements. We first design a set of anonymous routing protocols for DTNs, called anonymous epidemic and zone-based anonymous routing, based on the original anonymous routing protocols for ad hoc networks. Then, we propose a framework of anonymous routing (FAR) for DTNs, which subsumes all the aforementioned protocols. By tuning its parameters, the proposed FAR is able to outperform onion-based, anonymous epidemic, and zone-based routing.
{"title":"Anonymous Routing to Maximize Delivery Rates in DTNs","authors":"Kazuya Sakai, Min-Te Sun, Wei-Shinn Ku, Jie Wu","doi":"10.1109/ICDCS.2017.44","DOIUrl":"https://doi.org/10.1109/ICDCS.2017.44","journal":{"name":"2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)"},"publicationDate":"2017-06-05"}
The domain of the Internet of Things (IoT) is rapidly expanding beyond research and becoming a major industrial market, with stakeholders such as major manufacturers of chips and connected entities (i.e., things) and fast-growing operators of wide-area networks. Importantly, this emerging domain is driven by applications that leverage an IoT infrastructure to provide users with innovative, high-value services. IoT infrastructures range from small scale (e.g., homes and personal health) to large scale (e.g., cities and transportation systems). In this paper, we argue that there is a continuum between orchestrating connected entities in the small and in the large. We propose a unified approach to application development, which covers this spectrum. To do so, we examine the requirements for orchestrating connected entities and address them with domain-specific design concepts. We then show how to map these design concepts into dedicated programming patterns and runtime mechanisms. Our work revolves around domain-specific concepts and notations, integrated into a tool-based design methodology dedicated to developing IoT applications. We have applied our work across a spectrum of infrastructure sizes, ranging from an automated pilot in avionics, to an assisted living platform for the homes of seniors, to a parking management system in a smart city.
{"title":"Internet of Things: From Small- to Large-Scale Orchestration","authors":"C. Consel, Milan Kabác","doi":"10.1109/ICDCS.2017.314","DOIUrl":"https://doi.org/10.1109/ICDCS.2017.314","journal":{"name":"2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)"},"publicationDate":"2017-06-05"}
Deployed in various distributed storage systems, erasure coding has demonstrated its advantages of low storage overhead and high failure tolerance. Typically, in an erasure-coded distributed storage system, systematic maximum distance separable (MDS) codes are chosen, since they achieve the optimal storage overhead while allowing data to be read directly without decoding operations. However, the data parallelism of existing MDS codes is limited, because data can be read in parallel without decoding only from some specific servers. In this paper, we propose Carousel codes, designed to allow data to be read from an arbitrary number of servers in parallel without decoding, while preserving the optimal storage overhead of MDS codes. Furthermore, Carousel codes can achieve the optimal network traffic to reconstruct an unavailable block. We have implemented a prototype of Carousel codes on Apache Hadoop. Our experimental results demonstrate that Carousel codes can make MapReduce jobs finish in almost 50% less time and reduce data access latency significantly, with comparable throughput in the encoding and decoding operations and no additional sacrifice of failure tolerance or of the network overhead to reconstruct unavailable data.
{"title":"On Data Parallelism of Erasure Coding in Distributed Storage Systems","authors":"Jun Li, Baochun Li","doi":"10.1109/ICDCS.2017.191","DOIUrl":"https://doi.org/10.1109/ICDCS.2017.191","journal":{"name":"2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)"},"publicationDate":"2017-06-05"}
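The erasure coding abstract above relies on two properties of systematic MDS codes: original data can be read without decoding, and a lost block can be rebuilt from the survivors. The sketch below illustrates both with the simplest systematic MDS code, a (k+1, k) single-parity XOR code. It is not an implementation of Carousel codes, and the function names are chosen here for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Systematic (k+1, k) code: keep the k data blocks as-is, append one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def reconstruct(stored, missing_index):
    """Rebuild the block at `missing_index` by XOR-ing the k surviving blocks."""
    survivors = [block for i, block in enumerate(stored) if i != missing_index]
    return xor_blocks(survivors)


data = [b"abcd", b"efgh", b"ijkl"]  # k = 3 data blocks, one per server
stored = encode(data)               # 4 blocks: 3 data + 1 parity

# Systematic property: the first k servers hold the original data verbatim,
# so reading from them needs no decoding.
print(stored[:3] == data)
# Any single lost block, data or parity, can be rebuilt from the other k.
print(reconstruct(stored, 1))
```

In this layout only the k data servers can serve original bytes directly, while the parity server holds none. That restriction is the limited data parallelism the abstract attributes to existing systematic MDS codes.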