This paper presents the design and implementation of the first in-band full duplex WiFi radios that can simultaneously transmit and receive on the same channel using standard WiFi 802.11ac PHYs, achieving close to the theoretical doubling of throughput in all practical deployment scenarios. Our design uses a single antenna for simultaneous TX/RX (i.e., the same resources as a standard half duplex system). We also propose novel analog and digital cancellation techniques that cancel the self-interference to the receiver noise floor, ensuring no degradation to the received signal. We prototype our design by building our own analog circuit boards and integrating them with a fully WiFi-PHY-compatible software radio implementation. We show experimentally that our design works robustly in noisy indoor environments and provides close to the expected theoretical doubling of throughput in practice.
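The digital half of the cancellation idea can be illustrated with a least-squares sketch (all signals and the 3-tap self-interference channel below are synthetic stand-ins, not the paper's circuit design): estimate the self-interference channel from the known transmitted samples, then subtract its reconstructed contribution from the received signal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: known TX baseband samples, an unknown 3-tap
# self-interference (SI) channel, and a weak signal of interest near the floor.
tx = rng.standard_normal(1000)
h_si = np.array([0.8, 0.3, 0.1])                 # hypothetical SI channel taps
signal_of_interest = 0.01 * rng.standard_normal(1000)
rx = np.convolve(tx, h_si)[:1000] + signal_of_interest

# Least-squares SI channel estimate from the known TX samples.
L = len(h_si)
X = np.zeros((len(tx), L))
for k in range(L):
    X[k:, k] = tx[:len(tx) - k]                  # delayed copies of TX
h_hat, *_ = np.linalg.lstsq(X, rx, rcond=None)

# Digital cancellation: subtract the reconstructed SI. The residual is
# approximately the signal of interest, i.e., SI is pushed toward the floor.
residual = rx - X @ h_hat
```

In the actual system, analog cancellation must first knock the SI down far enough that the receiver ADC is not saturated; a digital stage like this can only clean up what the ADC can represent.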
"Full duplex radios" — Dinesh Bharadia, Emily McMilin, S. Katti. Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM, August 12, 2013. DOI: 10.1145/2486001.2486033.
Kanak Biscuitwala, W. Bult, Mathias Lécuyer, T. J. Purtell, Madeline K. B. Ross, A. Chaintreau, Chris Haseman, M. Lam, Susan E. McGregor
Kanak Biscuitwala (kanak@cs.stanford.edu), Willem Bult (wbult@stanford.edu), Mathias Lécuyer† (ml3302@columbia.edu), T. J. Purtell (tpurtell@cs.stanford.edu), Madeline K. B. Ross‡ (mkr2132@columbia.edu), Augustin Chaintreau† (augustin@cs.columbia.edu), Chris Haseman§ (haseman@tumblr.com), Monica S. Lam (lam@cs.stanford.edu), Susan E. McGregor‡ (sem2196@columbia.edu). Computer Science Department, Stanford University; †Computer Science Department, Columbia University; ‡Graduate School of Journalism, Columbia University; §Tumblr, Inc.
"Dispatch: secure, resilient mobile reporting" — Kanak Biscuitwala, W. Bult, Mathias Lécuyer, T. J. Purtell, Madeline K. B. Ross, A. Chaintreau, Chris Haseman, M. Lam, Susan E. McGregor. Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM, August 12, 2013. DOI: 10.1145/2486001.2491697.
We design a content-centric privacy scheme for Information-Centric Networking (ICN). We enhance ICN's ability to support data confidentiality by introducing attribute-based encryption into ICN and making it specific to the data attributes. Our approach is unusual in that it preserves ICN's goal to decouple publishers and subscribers for greater data accessibility, scalable multiparty communication and efficient data distribution. Inspired by application-layer publish-subscribe, we enable fine-grained access control with more expressive policies. Moreover, we propose an attribute-based routing scheme that offers interest confidentiality. A prototype system is implemented based on CCNx, a popular open source version of ICN, to showcase privacy preservation in Smart Neighborhood and Smart City applications.
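The access-control side of attribute-based encryption can be illustrated without any cryptography: in CP-ABE-style schemes a ciphertext carries a policy over attributes and a key carries attributes, and decryption succeeds only when the attributes satisfy the policy. A toy policy evaluator follows (the policy shape and attribute names are invented for illustration; real ABE enforces this check cryptographically rather than in code):

```python
# Toy policy evaluator (illustration only). A policy is either an attribute
# string, or a tuple whose head is "and"/"or" and whose tail is sub-policies.
def satisfies(policy, attrs):
    if isinstance(policy, str):
        return policy in attrs
    op, *children = policy
    checks = (satisfies(c, attrs) for c in children)
    return all(checks) if op == "and" else any(checks)

# Hypothetical Smart Neighborhood policy: data readable by residents or the
# utility company of a given area.
policy = ("and", "area:elm-st", ("or", "role:resident", "role:utility"))
```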
"Toward content-centric privacy in ICN: attribute-based encryption and routing" — Mihaela Ion, Jianqing Zhang, E. Schooler. Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM, August 12, 2013. DOI: 10.1145/2486001.2491717.
The growing prevalence of broadband Internet access around the world has made understanding the performance and reliability of broadband access networks extremely important. To better understand the performance anomalies that arise in broadband access networks, we have deployed hundreds of routers in home broadband access networks around the world and are studying the performance of these networks. One of the performance pathologies we have observed is correlated, sudden latency increases that occur simultaneously to multiple destinations. In this work, we provide a preliminary glimpse into these sudden latency increases and attempt to understand their causes. Although we do not isolate root causes in this study, observing the sets of destinations that experience correlated latency increases can provide important clues about the locations in the network that may be inducing these pathologies. We present an algorithm to better identify the network locations that are likely responsible. We then analyze one month of latency data from our home router deployment to determine where in the network latency issues arise, and how those pathologies differ across regions, ISPs, and countries. Our preliminary analysis suggests that most latency pathologies involve a single destination, and that a relatively small percentage are likely in the last mile, suggesting that peering within the network may be a more likely culprit for these pathologies than access link problems.
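The grouping intuition can be sketched roughly (the 2x-median spike threshold and the data below are hypothetical, not the paper's algorithm): a spike seen at only one destination implicates that path, while simultaneous spikes at several destinations implicate shared infrastructure closer to the home or the ISP.

```python
import statistics
from collections import defaultdict

def correlated_spikes(samples, threshold=2.0):
    """samples: {destination: [latency_ms at each epoch]}. Flag epochs where a
    destination's latency exceeds threshold x its own median, then keep only
    epochs where two or more destinations spike together."""
    spikes = defaultdict(set)
    for dest, series in samples.items():
        base = statistics.median(series)
        for t, v in enumerate(series):
            if v > threshold * base:
                spikes[t].add(dest)
    return {t: dests for t, dests in spikes.items() if len(dests) > 1}
```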
"Characterizing correlated latency anomalies in broadband access networks" — Swati Roy, N. Feamster. Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM, August 12, 2013. DOI: 10.1145/2486001.2491734.
Yuanfang Chen, N. Crespi, Lin Lv, Mingchu Li, A. M. Ortiz, Lei Shu
Most indoor localization algorithms are based on Received Signal Strength (RSS), in which RSS signatures of an area of interest are annotated with the real locations where they were recorded. However, according to our experiments, RSS signatures are not suitable as unique annotations (like fingerprints) of recorded locations. In this study, we investigate the characteristics of RSS (e.g., how RSS values change over time and between consecutive positions). On this basis, we design LuPI (Locating using Prior Information), which exploits these characteristics: as users move, LuPI uses the sensors integrated in smartphones to construct the RSS variation space (akin to a radio map) of a floor plan as prior information. LuPI is easy and rapid to deploy since little human intervention is needed; calibration of the "radio map" is crowd-sourced, automatic, and scheduled. Experimental results show that LuPI achieves location accuracy comparable to previous approaches, even without the statistical information of a site survey.
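The fingerprint-matching baseline that such systems build on can be stated in a few lines (the radio-map coordinates and RSS values below are invented): match a measured RSS vector against the annotated map and return the closest signature's location.

```python
import math

# Hypothetical radio map: location -> RSS vector (dBm) seen from three APs.
radio_map = {
    (0, 0): [-40, -70, -80],
    (0, 5): [-55, -60, -75],
    (5, 5): [-75, -50, -55],
}

def locate(rss):
    """Nearest-neighbor fingerprint matching: return the radio-map location
    whose RSS signature is closest (Euclidean distance) to the measurement."""
    return min(radio_map, key=lambda loc: math.dist(radio_map[loc], rss))
```

The paper's point is precisely that these signatures are not stable unique annotations, which is why LuPI supplements them with crowd-sourced prior information instead of a one-time site survey.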
"Locating using prior information: wireless indoor localization algorithm" — Yuanfang Chen, N. Crespi, Lin Lv, Mingchu Li, A. M. Ortiz, Lei Shu. Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM, August 12, 2013. DOI: 10.1145/2486001.2491688.
Yingying Chen, Ratul Mahajan, B. Sridharan, Zhi-Li Zhang
Using a large Web search service as a case study, we highlight the challenges that modern Web services face in understanding and diagnosing the response time experienced by users. We show that search response time (SRT) varies widely over time and also exhibits counter-intuitive behavior. It is actually higher during off-peak hours, when the query load is lower, than during peak hours. To resolve this paradox and explain SRT variations in general, we develop an analysis framework that separates systemic variations due to periodic changes in service usage and anomalous variations due to unanticipated events such as failures and denial-of-service attacks. We find that systemic SRT variations are primarily caused by systemic changes in aggregate network characteristics, nature of user queries, and browser types. For instance, one reason for higher SRTs during off-peak hours is that during those hours a greater fraction of queries come from slower, mainly-residential networks. We also develop a technique that, by factoring out the impact of such variations, robustly detects and diagnoses performance anomalies in SRT. Deployment experience shows that our technique detects three times more true (operator-verified) anomalies than existing techniques.
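The decomposition can be caricatured as "subtract a per-hour baseline, flag large residuals" (a crude stand-in for the paper's framework; the robust-median baseline and 3-sigma threshold are arbitrary choices for illustration):

```python
import statistics

def detect_anomalies(srt, k=3.0):
    """srt: list of (hour_of_day, response_ms) samples. Remove the systemic
    hour-of-day variation (median per hour), then flag samples whose residual
    exceeds k standard deviations. Returns the indices of flagged samples."""
    by_hour = {}
    for h, v in srt:
        by_hour.setdefault(h, []).append(v)
    baseline = {h: statistics.median(vs) for h, vs in by_hour.items()}
    residuals = [v - baseline[h] for h, v in srt]
    sigma = statistics.pstdev(residuals) or 1.0
    return [i for i, r in enumerate(residuals) if abs(r) > k * sigma]
```

The systemic term absorbs the off-peak/peak paradox (slower residential networks dominating off-peak hours), so only genuinely unanticipated deviations survive as anomalies.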
"A provider-side view of web search response time" — Yingying Chen, Ratul Mahajan, B. Sridharan, Zhi-Li Zhang. Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM, August 12, 2013. DOI: 10.1145/2486001.2486035.
Tobias Flach, Nandita Dukkipati, A. Terzis, B. Raghavan, N. Cardwell, Yuchung Cheng, Ankur Jain, Shuai Hao, Ethan Katz-Bassett, R. Govindan
To serve users quickly, Web service providers build infrastructure closer to clients and use multi-stage transport connections. Although these changes reduce client-perceived round-trip times, TCP's current mechanisms fundamentally limit latency improvements. We performed a measurement study of a large Web service provider and found that, while connections with no loss complete close to the ideal latency of one round-trip time, TCP's timeout-driven recovery causes transfers with loss to take five times longer on average. In this paper, we present the design of novel loss recovery mechanisms for TCP that judiciously use redundant transmissions to minimize timeout-driven recovery. Proactive, Reactive, and Corrective are three qualitatively-different, easily-deployable mechanisms that (1) proactively recover from losses, (2) recover from them as quickly as possible, and (3) reconstruct packets to mask loss. Crucially, the mechanisms are compatible both with middleboxes and with TCP's existing congestion control and loss recovery. Our large-scale experiments on Google's production network that serves billions of flows demonstrate a 23% decrease in the mean and 47% in 99th percentile latency over today's TCP.
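Of the three mechanisms, Corrective masks loss by sending redundancy alongside data. The reconstruction idea can be sketched as plain XOR parity over equal-length payloads (a simplification for illustration, not the paper's exact encoding): XOR all packets of a window into one extra packet, and any single lost packet equals the XOR of everything that did arrive.

```python
def xor_parity(packets):
    """XOR of equal-length payloads, sent alongside the window so a single
    lost packet can be rebuilt without waiting for a timeout-driven
    retransmission."""
    parity = bytes(len(packets[0]))
    for p in packets:
        parity = bytes(a ^ b for a, b in zip(parity, p))
    return parity

def recover_lost(received, parity):
    """Given all-but-one packets of a window plus its parity, return the
    missing packet (XOR cancels every packet that arrived)."""
    return xor_parity(received + [parity])
```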
"Reducing web latency: the virtue of gentle aggression" — Tobias Flach, Nandita Dukkipati, A. Terzis, B. Raghavan, N. Cardwell, Yuchung Cheng, Ankur Jain, Shuai Hao, Ethan Katz-Bassett, R. Govindan. Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM, August 12, 2013. DOI: 10.1145/2486001.2486014.
The goal of this paper is to improve wireless AP caching by leveraging in-network caching. We observe that by treating routers as an in-network storage extension, we can relieve the storage limitation of APs. The unique challenge is that APs and routers cannot have a full collaboration, which makes the problem different from traditional cooperative caching problems. We study how APs can optimize caching decisions by using in-network caching information without controlling routers.
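One plausible reading of the AP's decision, assuming it can observe (but not control) which items nearby routers hold, can be sketched as follows (a hypothetical greedy policy, not the paper's algorithm): spend the AP's scarce cache slots on popular items that are not already cached in-network, since those can be fetched from routers anyway.

```python
def choose_ap_cache(popularity, router_cached, capacity):
    """Greedy sketch: rank items by popularity and fill the AP cache with the
    top items NOT already held by in-network router caches."""
    ranked = sorted(popularity, key=popularity.get, reverse=True)
    candidates = [item for item in ranked if item not in router_cached]
    return candidates[:capacity]
```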
"In-network caching assisted wireless AP storage management: challenges and algorithms" — Zhongxing Ming, Mingwei Xu, Dan Wang. Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM, August 12, 2013. DOI: 10.1145/2486001.2491706.
Natural and human factors cause Internet outages---from big events like Hurricane Sandy in 2012 and the Egyptian Internet shutdown in January 2011 to small outages every day that go unpublicized. We describe Trinocular, an outage detection system that uses active probing to understand the reliability of edge networks. Trinocular is principled: it derives a simple model of the Internet that captures the information pertinent to outages, populates that model with long-term data, and learns current network state through ICMP probes. It is parsimonious, using Bayesian inference to determine how many probes are needed. On average, each Trinocular instance sends fewer than 20 probes per hour to each /24 network block under study, increasing Internet "background radiation" by less than 0.7%. Trinocular is also predictable and precise: we provide known precision in outage timing and duration. Probing in rounds of 11 minutes, we detect 100% of outages one round or longer, and estimate outage duration within one-half round. Since we require little traffic, a single machine can track 3.4M /24 IPv4 blocks, all of the Internet currently suitable for analysis. We show that our approach is significantly more accurate than the best current methods, with about one-third fewer false conclusions and about 30% greater coverage at constant accuracy. We validate our approach using controlled experiments, use Trinocular to analyze two days of Internet outages observed from three sites, and re-analyze three years of existing data to develop trends for the Internet.
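The Bayesian core can be sketched with made-up response probabilities (stand-ins for the per-block history Trinocular learns from long-term data): update the belief that a block is up after each probe reply or timeout, and stop probing once the belief crosses a threshold, which is where the parsimony comes from.

```python
def update_belief(belief_up, got_reply, p_reply_up=0.8, p_reply_down=0.01):
    """One Bayesian update of P(block is up) from a single ICMP probe outcome.
    p_reply_up / p_reply_down are hypothetical reply probabilities given the
    block is up or down."""
    if got_reply:
        num = p_reply_up * belief_up
        den = num + p_reply_down * (1.0 - belief_up)
    else:
        num = (1.0 - p_reply_up) * belief_up
        den = num + (1.0 - p_reply_down) * (1.0 - belief_up)
    return num / den

def probe_until_confident(belief, outcomes, lo=0.1, hi=0.9):
    """Parsimony in miniature: consume probe outcomes only until the belief is
    conclusive. Returns (final belief, probes used)."""
    used = 0
    for got_reply in outcomes:
        belief = update_belief(belief, got_reply)
        used += 1
        if belief < lo or belief > hi:
            break
    return belief, used
```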
"Trinocular: understanding internet reliability through adaptive probing" — Lin Quan, J. Heidemann, Y. Pradkin. Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM, August 12, 2013. DOI: 10.1145/2486001.2486017.
Many applications do not constrain the destinations of their network transfers. New opportunities emerge when such transfers contribute a large amount of network bytes. By choosing the endpoints to avoid congested links, completion times of these transfers as well as that of others without similar flexibility can be improved. In this paper, we focus on leveraging the flexibility in replica placement during writes to cluster file systems (CFSes), which account for almost half of all cross-rack traffic in data-intensive clusters. The replicas of a CFS write can be placed in any subset of machines as long as they are in multiple fault domains and ensure a balanced use of storage throughout the cluster. We study CFS interactions with the cluster network, analyze optimizations for replica placement, and propose Sinbad -- a system that identifies imbalance and adapts replica destinations to navigate around congested links. Experiments on EC2 and trace-driven simulations show that block writes complete 1.3X (respectively, 1.58X) faster as the network becomes more balanced. As a collateral benefit, end-to-end completion times of data-intensive jobs improve as well. Sinbad does so with little impact on the long-term storage balance.
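A greedy caricature of network-aware placement (the congestion/storage weighting below is invented for illustration, not Sinbad's actual policy): rank racks mostly by current downlink congestion, with a small storage term so long-term storage stays balanced, and pick distinct racks so the replicas land in distinct fault domains.

```python
def place_replicas(rack_link_load, rack_storage_used, n=3, alpha=0.9):
    """Pick n destination racks for a CFS write, scoring each rack by a
    weighted mix of link load (primary) and storage use (tie-breaker).
    Lower score is better; distinct racks double as fault domains."""
    score = {r: alpha * rack_link_load[r] + (1 - alpha) * rack_storage_used[r]
             for r in rack_link_load}
    return sorted(score, key=score.get)[:n]
```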
"Leveraging endpoint flexibility in data-intensive clusters" — Mosharaf Chowdhury, Srikanth Kandula, I. Stoica. Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM, August 12, 2013. DOI: 10.1145/2486001.2486021.