As consumers are increasingly engaged in social networking and E-commerce activities, businesses grow to rely on Big Data analytics for intelligence, and traditional IT infrastructures continue to migrate to the cloud and edge, these trends cause distributed data storage demand to rise at an unprecedented speed. Erasure coding has seen itself quickly emerged as a promising technique to reduce storage cost while providing similar reliability as replicated systems, widely adopted by companies like Facebook, Microsoft and Google. However, it also brings new challenges in characterizing and optimizing the access latency when erasure codes are used in distributed storage. The aim of this monograph is to provide a review of recent progress (both theoretical and practical) on systems that employ erasure codes for distributed storage. In this monograph, we will first identify the key challenges and taxonomy of the research problems and then give an overview of different approaches that have been developed to quantify and model latency of erasure-coded storage. This includes recent work leveraging MDS-Reservation, Fork-Join, Probabilistic, and Delayed-Relaunch scheduling policies, as well as their applications to characterize access latency (e.g., mean, tail, asymptotic latency) of erasure-coded distributed storage systems. We will also extend the problem to the case when users are streaming videos from erasure-coded distributed storage systems. Next, we bridge the gap between theory and practice, and discuss lessons learned from prototype implementation. In particular, we will discuss exemplary implementations of erasure-coded storage, illuminate key design degrees of freedom and tradeoffs, and summarize remaining challenges in real-world storage systems such as in content delivery and caching. Open problems for future research are discussed at the end of each chapter.
{"title":"Modeling and Optimization of Latency in Erasure-coded Storage Systems","authors":"V. Aggarwal, Tian Lan","doi":"10.1561/0100000108","DOIUrl":"https://doi.org/10.1561/0100000108","url":null,"abstract":"As consumers are increasingly engaged in social networking and E-commerce activities, businesses grow to rely on Big Data analytics for intelligence, and traditional IT infrastructures continue to migrate to the cloud and edge, these trends cause distributed data storage demand to rise at an unprecedented speed. Erasure coding has seen itself quickly emerged as a promising technique to reduce storage cost while providing similar reliability as replicated systems, widely adopted by companies like Facebook, Microsoft and Google. However, it also brings new challenges in characterizing and optimizing the access latency when erasure codes are used in distributed storage. The aim of this monograph is to provide a review of recent progress (both theoretical and practical) on systems that employ erasure codes for distributed storage. \u0000In this monograph, we will first identify the key challenges and taxonomy of the research problems and then give an overview of different approaches that have been developed to quantify and model latency of erasure-coded storage. This includes recent work leveraging MDS-Reservation, Fork-Join, Probabilistic, and Delayed-Relaunch scheduling policies, as well as their applications to characterize access latency (e.g., mean, tail, asymptotic latency) of erasure-coded distributed storage systems. We will also extend the problem to the case when users are streaming videos from erasure-coded distributed storage systems. Next, we bridge the gap between theory and practice, and discuss lessons learned from prototype implementation. In particular, we will discuss exemplary implementations of erasure-coded storage, illuminate key design degrees of freedom and tradeoffs, and summarize remaining challenges in real-world storage systems such as in content delivery and caching. Open problems for future research are discussed at the end of each chapter.","PeriodicalId":45236,"journal":{"name":"Foundations and Trends in Communications and Information Theory","volume":"11 1","pages":"380-525"},"PeriodicalIF":2.4,"publicationDate":"2020-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88639804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Scarlett, A. G. Fàbregas, A. Somekh-Baruch, Alfonso Martinez
Shannon's channel coding theorem characterizes the maximal rate of information that can be reliably transmitted over a communication channel when optimal encoding and decoding strategies are used. In many scenarios, however, practical considerations such as channel uncertainty and implementation constraints rule out the use of an optimal decoder. The mismatched decoding problem addresses such scenarios by considering the case that the decoder cannot be optimized, but is instead fixed as part of the problem statement. This problem is not only of direct interest in its own right, but also has close connections with other long-standing theoretical problems in information theory. In this monograph, we survey both classical literature and recent developments on the mismatched decoding problem, with an emphasis on achievable random-coding rates for memoryless channels. We present two widely-considered achievable rates known as the generalized mutual information (GMI) and the LM rate, and overview their derivations and properties. In addition, we survey several improved rates via multi-user coding techniques, as well as recent developments and challenges in establishing upper bounds on the mismatch capacity, and an analogous mismatched encoding problem in rate-distortion theory. Throughout the monograph, we highlight a variety of applications and connections with other prominent information theory problems.
{"title":"Information-Theoretic Foundations of Mismatched Decoding","authors":"J. Scarlett, A. G. Fàbregas, A. Somekh-Baruch, Alfonso Martinez","doi":"10.1561/0100000101","DOIUrl":"https://doi.org/10.1561/0100000101","url":null,"abstract":"Shannon's channel coding theorem characterizes the maximal rate of information that can be reliably transmitted over a communication channel when optimal encoding and decoding strategies are used. In many scenarios, however, practical considerations such as channel uncertainty and implementation constraints rule out the use of an optimal decoder. The mismatched decoding problem addresses such scenarios by considering the case that the decoder cannot be optimized, but is instead fixed as part of the problem statement. This problem is not only of direct interest in its own right, but also has close connections with other long-standing theoretical problems in information theory. In this monograph, we survey both classical literature and recent developments on the mismatched decoding problem, with an emphasis on achievable random-coding rates for memoryless channels. We present two widely-considered achievable rates known as the generalized mutual information (GMI) and the LM rate, and overview their derivations and properties. In addition, we survey several improved rates via multi-user coding techniques, as well as recent developments and challenges in establishing upper bounds on the mismatch capacity, and an analogous mismatched encoding problem in rate-distortion theory. Throughout the monograph, we highlight a variety of applications and connections with other prominent information theory problems.","PeriodicalId":45236,"journal":{"name":"Foundations and Trends in Communications and Information Theory","volume":"49 1","pages":"149-401"},"PeriodicalIF":2.4,"publicationDate":"2020-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91030153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Theoretical Foundations of Adversarial Binary Detection","authors":"M. Barni, B. Tondi","doi":"10.1561/0100000102","DOIUrl":"https://doi.org/10.1561/0100000102","url":null,"abstract":"","PeriodicalId":45236,"journal":{"name":"Foundations and Trends in Communications and Information Theory","volume":"61 1","pages":"1-172"},"PeriodicalIF":2.4,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91348574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Storage resources and caching techniques permeate almost every area of communication networks today. In the near future, caching is set to play an important role in storage-assisted Internet architectures, information-centric networks, and wireless systems, reducing operating and capital expenditures and improving the services offered to users. In light of the remarkable data traffic growth and the increasing number of rich-media applications, the impact of caching is expected to become even more profound than it is today. Therefore, it is crucial to design these systems in an optimal fashion, ensuring the maximum possible performance and economic benefits from their deployment. To this end, this article presents a collection of detailed models and algorithms, which are synthesized to build a powerful analytical framework for caching optimization.
{"title":"Cache Optimization Models and Algorithms","authors":"G. Paschos, G. Iosifidis, G. Caire","doi":"10.1561/0100000104","DOIUrl":"https://doi.org/10.1561/0100000104","url":null,"abstract":"Storage resources and caching techniques permeate almost every area of communication networks today. In the near future, caching is set to play an important role in storage-assisted Internet architectures, information-centric networks, and wireless systems, reducing operating and capital expenditures and improving the services offered to users. In light of the remarkable data traffic growth and the increasing number of rich-media applications, the impact of caching is expected to become even more profound than it is today. Therefore, it is crucial to design these systems in an optimal fashion, ensuring the maximum possible performance and economic benefits from their deployment. To this end, this article presents a collection of detailed models and algorithms, which are synthesized to build a powerful analytical framework for caching optimization.","PeriodicalId":45236,"journal":{"name":"Foundations and Trends in Communications and Information Theory","volume":"7 1","pages":"156-343"},"PeriodicalIF":2.4,"publicationDate":"2019-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73371440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lattice-Reduction-Aided and Integer-Forcing Equalization: Structures, Criteria, Factorization, and Coding","authors":"R. Fischer, S. Stern, J. Huber","doi":"10.1561/0100000100","DOIUrl":"https://doi.org/10.1561/0100000100","url":null,"abstract":"","PeriodicalId":45236,"journal":{"name":"Foundations and Trends in Communications and Information Theory","volume":"9 1","pages":"1-155"},"PeriodicalIF":2.4,"publicationDate":"2019-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84288167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Developing computationally-efficient codes that approach the Shannon-theoretic limits for communication and compression has long been one of the major goals of information and coding theory. There have been significant advances towards this goal in the last couple of decades, with the emergence of turbo codes, sparse-graph codes, and polar codes. These codes are designed primarily for discrete-alphabet channels and sources. For Gaussian channels and sources, where the alphabet is inherently continuous, Sparse Superposition Codes or Sparse Regression Codes (SPARCs) are a promising class of codes for achieving the Shannon limits. This survey provides a unified and comprehensive overview of sparse regression codes, covering theory, algorithms, and practical implementation aspects. The first part of the monograph focuses on SPARCs for AWGN channel coding, and the second part on SPARCs for lossy compression (with squared error distortion criterion). In the third part, SPARCs are used to construct codes for Gaussian multi-terminal channel and source coding models such as broadcast channels, multiple-access channels, and source and channel coding with side information. The survey concludes with a discussion of open problems and directions for future work.
{"title":"Sparse Regression Codes","authors":"R. Venkataramanan, S. Tatikonda, A. Barron","doi":"10.1561/0100000092","DOIUrl":"https://doi.org/10.1561/0100000092","url":null,"abstract":"Developing computationally-efficient codes that approach the Shannon-theoretic limits for communication and compression has long been one of the major goals of information and coding theory. There have been significant advances towards this goal in the last couple of decades, with the emergence of turbo codes, sparse-graph codes, and polar codes. These codes are designed primarily for discrete-alphabet channels and sources. For Gaussian channels and sources, where the alphabet is inherently continuous, Sparse Superposition Codes or Sparse Regression Codes (SPARCs) are a promising class of codes for achieving the Shannon limits. \u0000This survey provides a unified and comprehensive overview of sparse regression codes, covering theory, algorithms, and practical implementation aspects. The first part of the monograph focuses on SPARCs for AWGN channel coding, and the second part on SPARCs for lossy compression (with squared error distortion criterion). In the third part, SPARCs are used to construct codes for Gaussian multi-terminal channel and source coding models such as broadcast channels, multiple-access channels, and source and channel coding with side information. The survey concludes with a discussion of open problems and directions for future work.","PeriodicalId":45236,"journal":{"name":"Foundations and Trends in Communications and Information Theory","volume":"11 1","pages":"1-195"},"PeriodicalIF":2.4,"publicationDate":"2019-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74307354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The group testing problem concerns discovering a small number of defective items within a large population by performing tests on pools of items. A test is positive if the pool contains at least one defective, and negative if it contains no defectives. This is a sparse inference problem with a combinatorial flavour, with applications in medical testing, biology, telecommunications, information technology, data science, and more. In this monograph, we survey recent developments in the group testing problem from an information-theoretic perspective. We cover several related developments: efficient algorithms with practical storage and computation requirements, achievability bounds for optimal decoding methods, and algorithm-independent converse bounds. We assess the theoretical guarantees not only in terms of scaling laws, but also in terms of the constant factors, leading to the notion of the {em rate} of group testing, indicating the amount of information learned per test. Considering both noiseless and noisy settings, we identify several regimes where existing algorithms are provably optimal or near-optimal, as well as regimes where there remains greater potential for improvement. In addition, we survey results concerning a number of variations on the standard group testing problem, including partial recovery criteria, adaptive algorithms with a limited number of stages, constrained test designs, and sublinear-time algorithms.
{"title":"Group testing: an information theory perspective","authors":"Matthew Aldridge, O. Johnson, J. Scarlett","doi":"10.1561/0100000099","DOIUrl":"https://doi.org/10.1561/0100000099","url":null,"abstract":"The group testing problem concerns discovering a small number of defective items within a large population by performing tests on pools of items. A test is positive if the pool contains at least one defective, and negative if it contains no defectives. This is a sparse inference problem with a combinatorial flavour, with applications in medical testing, biology, telecommunications, information technology, data science, and more. In this monograph, we survey recent developments in the group testing problem from an information-theoretic perspective. We cover several related developments: efficient algorithms with practical storage and computation requirements, achievability bounds for optimal decoding methods, and algorithm-independent converse bounds. We assess the theoretical guarantees not only in terms of scaling laws, but also in terms of the constant factors, leading to the notion of the {em rate} of group testing, indicating the amount of information learned per test. Considering both noiseless and noisy settings, we identify several regimes where existing algorithms are provably optimal or near-optimal, as well as regimes where there remains greater potential for improvement. In addition, we survey results concerning a number of variations on the standard group testing problem, including partial recovery criteria, adaptive algorithms with a limited number of stages, constrained test designs, and sublinear-time algorithms.","PeriodicalId":45236,"journal":{"name":"Foundations and Trends in Communications and Information Theory","volume":"15 1","pages":"196-392"},"PeriodicalIF":2.4,"publicationDate":"2019-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88025684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lattice-Reduction-Aided and\u0000Integer-Forcing Equalization","authors":"R. Fischer, S. Stern, J. Huber","doi":"10.1561/9781680836455","DOIUrl":"https://doi.org/10.1561/9781680836455","url":null,"abstract":"","PeriodicalId":45236,"journal":{"name":"Foundations and Trends in Communications and Information Theory","volume":"16 1","pages":"1-155"},"PeriodicalIF":2.4,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"67082025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}