
Measurement and Modeling of Computer Systems: Latest Publications

The efficacy of error mitigation techniques for DRAM retention failures: a comparative experimental study
Pub Date : 2014-06-16 DOI: 10.1145/2591971.2592000
S. Khan, Donghyuk Lee, Yoongu Kim, Alaa R. Alameldeen, C. Wilkerson, O. Mutlu
As DRAM cells continue to shrink, they become more susceptible to retention failures. DRAM cells that permanently exhibit short retention times are fairly easy to identify and repair through the use of memory tests and row and column redundancy. However, the retention time of many cells may vary over time due to a property called Variable Retention Time (VRT). Since these cells intermittently transition between failing and non-failing states, they are particularly difficult to identify through memory tests alone. In addition, the high-temperature packaging process may aggravate this problem, as the susceptibility of cells to VRT increases after the assembly of DRAM chips. A promising alternative to manufacture-time testing is to detect and mitigate retention failures after the system has become operational. Such a system would require mechanisms to detect and mitigate retention failures in the field, but would be responsive to retention failures introduced after system assembly and could dramatically reduce the cost of testing, enabling much longer tests than are practical with manufacturer testing equipment. In this paper, we analyze the efficacy of three common error mitigation techniques (memory tests, guardbands, and error correcting codes (ECC)) in real DRAM chips exhibiting both intermittent and permanent retention failures. Our analysis allows us to quantify the efficacy of recent system-level error mitigation mechanisms that build upon these techniques. We revisit prior works in the context of the experimental data we present, showing that our measured results significantly impact these works' conclusions. We find that mitigation techniques that rely on run-time testing alone [38, 27, 50, 26] are unable to ensure reliable operation even after many months of testing. Techniques that incorporate ECC [4, 52], however, can ensure reliable DRAM operation after only a few hours of testing. For example, VS-ECC [4], which couples testing with variable strength codes to allocate the strongest codes to the most error-prone memory regions, can ensure reliable operation for 10 years after only 19 minutes of testing. We conclude that the viability of these mitigation techniques depends on efficient online profiling of DRAM performed without disrupting system operation.
Citations: 176
Collecting, organizing, and sharing pins in pinterest: interest-driven or social-driven?
Pub Date : 2014-06-16 DOI: 10.1145/2591971.2591996
Jinyoung Han, Daejin Choi, Byung-Gon Chun, T. Kwon, Hyunchul Kim, Yanghee Choi
Pinterest, a popular social curating service where people collect, organize, and share content (pins in Pinterest), has gained great attention in recent years. Despite the increasing interest in Pinterest, little research has paid attention to how people collect, manage, and share pins in Pinterest. In this paper, to shed light on such issues, we study the following questions. How do people collect and manage pins by their tastes in Pinterest? What factors mainly drive people to share their pins in Pinterest? How do the characteristics of users (e.g., gender, popularity, country) or properties of pins (e.g., category, topic) play roles in propagating pins in Pinterest? To answer these questions, we have conducted a measurement study on patterns of pin curating and sharing in Pinterest. By keeping track of all the newly posted and shared pins in each category (e.g., animal, kids, women's fashion) from June 5 to July 18, 2013, we built 350K pin propagation trees for 3M users. With the dataset, we investigate: (1) how users collect and curate pins, (2) how users share their pins and why, and (3) how users are related by shared pins of interest. Our key finding is that pin propagation in Pinterest is mostly driven by a pin's properties, such as its topic, not by a user's characteristics, such as her number of followers. We further show that users in the same community in the interest graph (i.e., representing the relations among users) of Pinterest share pins (i) in the same category with 94% probability and (ii) of the same URL where pins come from with 89% probability. Finally, we explore the implications of our findings for predicting how pins are shared in Pinterest.
Citations: 42
Concave switching in single and multihop networks
Pub Date : 2014-04-10 DOI: 10.1145/2591971.2591987
N. Walton
Switched queueing networks model wireless networks, input queued switches and numerous other networked communications systems. For single-hop networks, we consider an (α,g)-switch policy, which combines the MaxWeight policies with bandwidth sharing networks, a further well-studied model of Internet congestion. We prove the maximum stability property for this class of randomized policies. Thus these policies have the same first order behavior as the MaxWeight policies. However, for multihop networks some of these generalized policies address a number of critical weaknesses of the MaxWeight/BackPressure policies. For multihop networks with fixed routing, we consider the Proportional Scheduler (or (1,log)-policy). In this setting, the BackPressure policy is maximum stable, but must maintain a queue for every route-destination, which typically grows rapidly with a network's size. However, this proportionally fair policy only needs to maintain a queue for each outgoing link, which is typically bounded in number. As is common with Internet routing, by maintaining per-link queueing each node only needs to know the next hop for each packet and not its entire route. Further, in contrast to BackPressure, the Proportional Scheduler does not compare downstream queue lengths to determine weights; only local link information is required. This leads to greater potential for decomposed implementations of the policy. Through a reduction argument and an entropy argument, we demonstrate that, whilst maintaining substantially less queueing overhead, the Proportional Scheduler achieves maximum throughput stability.
Citations: 28
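The concave switching abstract above takes the MaxWeight policy as its reference point. As a rough illustration of that baseline only (not the (α,g)-switch policy or the Proportional Scheduler), the sketch below picks, for a small input-queued switch, the input-to-output matching with the largest total backlog. The function name, the `queues` layout, and the example numbers are assumptions made for illustration; the brute-force search over permutations is only practical for very small switches.

```python
from itertools import permutations


def maxweight_schedule(queues):
    """Return (perm, weight) for the matching with the largest total backlog.

    queues[i][j] is the backlog at input i destined for output j (hypothetical layout).
    perm[i] is the output connected to input i in the chosen matching.
    """
    n = len(queues)
    best_perm, best_weight = None, -1
    # Enumerate every one-to-one matching of inputs to outputs (O(n!) work).
    for perm in permutations(range(n)):
        weight = sum(queues[i][perm[i]] for i in range(n))
        if weight > best_weight:
            best_perm, best_weight = perm, weight
    return best_perm, best_weight


# Illustrative backlog matrix for a 3x3 switch.
queues = [
    [3, 0, 1],
    [0, 5, 2],
    [4, 1, 0],
]
perm, weight = maxweight_schedule(queues)
print(perm, weight)
```

A practical implementation would replace the enumeration with a maximum-weight bipartite matching algorithm; the exhaustive loop is kept here so the selection rule stays visible.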
The multi-shop ski rental problem
Pub Date : 2014-04-09 DOI: 10.1145/2591971.2591984
Lin Ai, X. Wu, Lingxiao Huang, Longbo Huang, Pingzhong Tang, J. Li
We consider the multi-shop ski rental problem. This problem generalizes the classic ski rental problem to a multi-shop setting, in which each shop has different prices for renting and purchasing a pair of skis, and a consumer has to make decisions on when and where to buy. We are interested in the optimal online (competitive-ratio minimizing) mixed strategy from the consumer's perspective. For our problem in its basic form, we obtain exciting closed-form solutions and a linear time algorithm for computing them. We further demonstrate the generality of our approach by investigating three extensions of our basic problem, namely ones that consider costs incurred by entering a shop or switching to another shop. Our solutions to these problems suggest that the consumer must assign positive probability in exactly one shop at any buying time. Our results apply to many real-world applications, ranging from cost management in IaaS cloud to scheduling in distributed computing.
Citations: 16
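For readers unfamiliar with the classic problem that the ski rental abstract above generalizes, here is a minimal sketch of the single-shop case with a deterministic break-even strategy. It is not the multi-shop mixed strategy from the paper. It assumes a rental price of 1 per day and an integer purchase price; the function names are hypothetical.

```python
def break_even_cost(days, buy_price):
    """Cost of renting for the first buy_price - 1 days, then buying on day buy_price."""
    if days < buy_price:
        return days  # the season ended before the purchase day
    return (buy_price - 1) + buy_price


def offline_optimal_cost(days, buy_price):
    """Cost when the season length is known in advance: rent throughout or buy on day one."""
    return min(days, buy_price)


def worst_case_ratio(buy_price, horizon):
    """Largest online/offline cost ratio over season lengths 1..horizon."""
    return max(
        break_even_cost(d, buy_price) / offline_optimal_cost(d, buy_price)
        for d in range(1, horizon + 1)
    )


ratio = worst_case_ratio(buy_price=10, horizon=100)
print(ratio)
```

With a purchase price of B, the worst case occurs once the season reaches B days, where the strategy pays 2B - 1 against an offline cost of B, giving a ratio of 2 - 1/B.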
The power of online learning in stochastic network optimization
Pub Date : 2014-04-06 DOI: 10.1145/2591971.2591990
Longbo Huang, Xin Liu, Xiaohong Hao
In this paper, we investigate the power of online learning in stochastic network optimization with unknown system statistics a priori. We are interested in understanding how information and learning can be efficiently incorporated into system control techniques, and what are the fundamental benefits of doing so. We propose two Online Learning-Aided Control techniques, OLAC and OLAC2, that explicitly utilize the past system information in current system control via a learning procedure called dual learning. We prove strong performance guarantees of the proposed algorithms: OLAC and OLAC2 achieve the near-optimal [O(ε), O([log(1/ε)]^2)] utility-delay tradeoff and OLAC2 possesses an O(ε^(-2/3)) convergence time. Simulation results also confirm the superior performance of the proposed algorithms in practice. To the best of our knowledge, OLAC and OLAC2 are the first algorithms that simultaneously possess explicit near-optimal delay guarantee and sub-linear convergence time, and our attempt is the first to explicitly incorporate online learning into stochastic network optimization and to demonstrate its power in both theory and practice.
Citations: 46
Privacy tradeoffs in predictive analytics
Pub Date : 2014-03-31 DOI: 10.1145/2591971.2592011
Stratis Ioannidis, A. Montanari, Udi Weinsberg, Smriti Bhagat, N. Fawaz, N. Taft
Online services routinely mine user data to predict user preferences, make recommendations, and place targeted ads. Recent research has demonstrated that several private user attributes (such as political affiliation, sexual orientation, and gender) can be inferred from such data. Can a privacy-conscious user benefit from personalization while simultaneously protecting her private attributes? We study this question in the context of a rating prediction service based on matrix factorization. We construct a protocol of interactions between the service and users that has remarkable optimality properties: it is privacy-preserving, in that no inference algorithm can succeed in inferring a user's private attribute with a probability better than random guessing; it has maximal accuracy, in that no other privacy-preserving protocol improves rating prediction; and, finally, it involves a minimal disclosure, as the prediction accuracy strictly decreases when the service reveals less information. We extensively evaluate our protocol using several rating datasets, demonstrating that it successfully blocks the inference of gender, age and political affiliation, while incurring less than 5% decrease in the accuracy of rating prediction.
Citations: 13
Exact analysis of TTL cache networks: the case of caching policies driven by stopping times
Pub Date : 2014-02-24 DOI: 10.1145/2591971.2592038
Daniel S. Berger, Philipp Gland, Sahil Singla, F. Ciucu
TTL caching models have recently regained significant research interest, largely due to their ability to fit popular caching policies such as LRU. In this extended abstract we briefly describe our recent work on two exact methods to analyze TTL cache networks. The first method generalizes existing results for line networks under renewal requests to the broad class of caching policies whereby evictions are driven by stopping times. The obtained results are further generalized, using the second method, to feedforward networks with Markov arrival processes (MAP) requests. MAPs are particularly suitable for non-line networks because they are closed not only under superposition and splitting, as known, but also under input-output caching operations as proven herein for phase-type TTL distributions. The crucial benefit of the two closure properties is that they jointly enable the first exact analysis of feedforward networks of TTL caches in great generality.
Citations: 46
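To make the TTL caching model in the abstract above more concrete, the following sketch simulates a single-object TTL cache under a deterministic timer, with an optional reset of the timer on every hit. This is an assumed, simplified setting chosen for illustration; it does not reproduce the paper's analysis of cache networks or of general stopping-time policies. The class name, parameters, and request trace are hypothetical.

```python
class TTLCache:
    """Single-object TTL cache: a miss starts a timer and the object is evicted when it expires."""

    def __init__(self, ttl, reset_on_hit=False):
        self.ttl = ttl
        self.reset_on_hit = reset_on_hit
        self.expires_at = None  # None means the object is not cached yet

    def request(self, now):
        """Process a request at time `now`; return True on a hit, False on a miss."""
        hit = self.expires_at is not None and now < self.expires_at
        # A miss always (re)inserts the object; a hit restarts the timer only if configured.
        if not hit or self.reset_on_hit:
            self.expires_at = now + self.ttl
        return hit


def hit_ratio(cache, request_times):
    """Fraction of requests (given in increasing time order) that hit the cache."""
    hits = sum(cache.request(t) for t in request_times)
    return hits / len(request_times)


times = [0, 1, 2, 3, 4, 10]  # illustrative request trace
fixed = hit_ratio(TTLCache(ttl=2.5), times)
resetting = hit_ratio(TTLCache(ttl=2.5, reset_on_hit=True), times)
print(fixed, resetting)
```

On this trace the resetting variant keeps the object cached through the burst of closely spaced requests, while the non-resetting variant evicts it once the original timer runs out.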
Jointly clustering rows and columns of binary matrices: algorithms and trade-offs
Pub Date : 2013-10-01 DOI: 10.1145/2591971.2592005
Jiaming Xu, Rui Wu, Kai Zhu, B. Hajek, R. Srikant, Lei Ying
In standard clustering problems, data points are represented by vectors, and by stacking them together, one forms a data matrix with row or column cluster structure. In this paper, we consider a class of binary matrices, arising in many applications, which exhibit both row and column cluster structure, and our goal is to exactly recover the underlying row and column clusters by observing only a small fraction of noisy entries. We first derive a lower bound on the minimum number of observations needed for exact cluster recovery. Then, we study three algorithms with different running times and compare the number of observations each needs for successful cluster recovery. Our analytical results show smooth time-data trade-offs: one can gradually reduce the computational complexity as more observations become available.
Citations: 42
Defragmenting the cloud using demand-based resource allocation
Pub Date : 2013-06-17 DOI: 10.1145/2465529.2465763
Ganesha Shanmuganathan, Ajay Gulati, P. Varman
Current public cloud offerings sell capacity in the form of pre-defined virtual machine (VM) configurations to their tenants. Typically this means that tenants must purchase individual VM configurations based on the peak demands of the applications, or be restricted to only scale-out applications that can share a pool of VMs. This diminishes the value proposition of moving to a public cloud as compared to server consolidation in a private virtualized datacenter, where one gets the benefits of statistical multiplexing between VMs belonging to the same or different applications. Ideally one would like to enable a cloud tenant to buy capacity in bulk and benefit from statistical multiplexing among its workloads. This requires the purchased capacity to be dynamically and transparently allocated among the tenant's VMs that may be running on different servers, even across datacenters. In this paper, we propose two novel algorithms called BPX and DBS that are able to provide the cloud customer with the abstraction of buying bulk capacity. These algorithms dynamically allocate the bulk capacity purchased by a customer between its VMs based on their individual demands and user-set importance. Our algorithms are highly scalable and are designed to work in a large-scale distributed environment. We implemented a prototype of BPX as part of VMware's management software and showed that BPX is able to closely mimic the behavior of a centralized allocator in a distributed manner.
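The idea of dividing purchased bulk capacity among VMs according to their demands and user-set importance can be illustrated with a small centralized sketch. This is not BPX or DBS, whose details the abstract does not give; it is a plain weighted max-min ("water-filling") allocator, and the function name `allocate` and its `capacity`, `demands`, and `weights` parameters are hypothetical names chosen for illustration.

```python
def allocate(capacity, demands, weights):
    """Split capacity among VMs in proportion to weight, never exceeding demand.

    Illustrative weighted max-min allocation; an assumption for this sketch,
    not the algorithm from the paper. Weights must be positive.
    """
    alloc = {vm: 0.0 for vm in demands}
    active = {vm for vm in demands if demands[vm] > 0}
    remaining = float(capacity)
    while active and remaining > 1e-12:
        total_w = sum(weights[vm] for vm in active)
        # VMs whose demand fits inside their weighted share are fully satisfied.
        satisfied = {vm for vm in active
                     if demands[vm] <= remaining * weights[vm] / total_w}
        if not satisfied:
            # Everyone left wants more than its share: hand out shares and stop.
            for vm in active:
                alloc[vm] = remaining * weights[vm] / total_w
            break
        for vm in satisfied:
            alloc[vm] = float(demands[vm])
            remaining -= demands[vm]
        # Capacity freed by satisfied VMs is redistributed in the next round.
        active -= satisfied
    return alloc


if __name__ == "__main__":
    print(allocate(10, {"a": 2, "b": 8, "c": 8}, {"a": 1, "b": 1, "c": 1}))
```

In the usage example, the lightly loaded VM receives only what it asks for, and the capacity it leaves unused is shared by the other two, which is the statistical-multiplexing benefit the abstract describes.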
Citations: 25
Tolerating path heterogeneity in multipath TCP with bounded receive buffers
Pub Date : 2013-06-17 DOI: 10.1145/2465529.2465750
Ming Li, Andrey Lukyanenko, S. Tarkoma, Yong Cui, Antti Ylä-Jääski
With bounded receive buffers, the aggregate bandwidth of multipath transmission degrades significantly in the presence of path heterogeneity. The performance could even be worse than that of single-path TCP, undermining the advantage gained by using multipath transmission. Furthermore, multipath transmission also suffers from delay and jitter even with large receive buffers. In order to tolerate path heterogeneity when the receive buffer is bounded, we propose a new multipath TCP protocol, namely SC-MPTCP, by integrating linear systematic coding into MPTCP. In SC-MPTCP, we make use of coded packets as redundancy to counter expensive retransmissions. The redundancy is provisioned as both proactive and reactive data. Specifically, to send a generation of packets, SC-MPTCP transmits proactive redundancy first and then delivers the original packets, instead of encoding all sent-out packets as all the existing coding solutions have done. The proactive redundancy is continuously updated according to the estimated aggregate retransmission ratio. To guard against the proactive redundancy being underestimated, a pre-blocking warning mechanism is used to retrieve reactive redundancy from the sender. We use the NS-3 network simulator to evaluate the performance of SC-MPTCP with and without the coupled congestion control option. The results show that with bounded receive buffers, MPTCP achieves less than 20% of the optimal goodput with diverse packet losses, whereas SC-MPTCP approaches the optimal performance with significantly smaller receive buffers. With the help of systematic coding, SC-MPTCP reduces the average buffer delay of MPTCP by at least 80% in different test scenarios. We also demonstrate that the use of systematic coding could significantly reduce the arithmetic complexity compared with the use of non-systematic coding.
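To make the notion of systematic coding concrete, here is a minimal sketch in which the original packets are sent unmodified alongside one coded packet. It assumes the simplest possible linear code, a single XOR parity packet over equal-length payloads, which is not necessarily the code SC-MPTCP uses; the function names `encode` and `decode` and the packet indexing are hypothetical. The parity packet is placed first only to mirror the abstract's "redundancy first, then originals" ordering.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length payloads."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(generation):
    """Return [parity, original_1, ..., original_k].

    Systematic: the originals are sent as-is, so a receiver that loses
    nothing does no decoding work at all.
    """
    parity = reduce(xor_bytes, generation)
    return [parity] + list(generation)


def decode(received, k):
    """Rebuild the k originals from a dict {index: payload}.

    Index 0 is the parity packet, indices 1..k are the originals.
    A single parity packet can repair at most one lost original.
    """
    missing = [i for i in range(1, k + 1) if i not in received]
    if not missing:
        return [received[i] for i in range(1, k + 1)]
    if len(missing) > 1 or 0 not in received:
        raise ValueError("too many losses for a single parity packet")
    lost = missing[0]
    # XOR of the parity and all surviving originals equals the lost packet.
    survivors = (received[i] for i in range(0, k + 1) if i != lost)
    received = dict(received)
    received[lost] = reduce(xor_bytes, survivors)
    return [received[i] for i in range(1, k + 1)]


if __name__ == "__main__":
    generation = [b"abcd", b"efgh", b"ijkl"]
    sent = dict(enumerate(encode(generation)))
    del sent[2]  # simulate losing the second original packet
    print(decode(sent, k=3) == generation)
```

With a non-systematic code every packet would need decoding at the receiver even when nothing is lost, which is one way to read the abstract's point about reduced arithmetic complexity.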
Citations: 48