
Proceedings of the 16th ACM International Conference on Computing Frontiers: Latest Publications

Scaling up performance of fat nodes for HPC
Pub Date : 2019-04-30 DOI: 10.1145/3310273.3325137
Alejandro Rico
Future computing systems will integrate an increasing number of compute elements in processors. Such systems must be designed to efficiently scale up and to provide effective synchronization semantics, fast data movement and resource management. At the same time, it is paramount to understand application characteristics to dimension hardware components and interfaces, while adapting the codes to better exploit performance through those features without wasting area or power. This talk will cover multiple technologies targeted to scale up performance of large processors and research insights around synchronization, coherence, bandwidth and resource management, developed during the co-design effort with HPC codes for future systems.
Citations: 0
Spatially fine-grained air quality prediction based on DBU-LSTM
Pub Date : 2019-04-30 DOI: 10.1145/3310273.3322829
Liang Ge, Aoli Zhou, Hang Li, Junling Liu
This paper proposes a general approach to predicting spatially fine-grained air quality. The model is based on a deep bidirectional and unidirectional long short-term memory (DBU-LSTM) neural network, which can capture bidirectional temporal dependencies and spatial correlations in time-series data. Urban heterogeneous data such as points of interest (POI) and the road network are used to evaluate the similarities between urban regions. A tensor decomposition method is used to fill in missing historical air-quality data from monitoring stations. We evaluate our approach on real data sources obtained in Beijing, and the experimental results show its advantages over baseline methods.
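The abstract does not detail the tensor decomposition step; its matrix special case can be sketched as iterative truncated-SVD completion, where observed sensor readings are kept fixed and missing entries are repeatedly re-estimated from a low-rank approximation (the `rank` and iteration count here are assumptions for illustration):

```python
import numpy as np

def lowrank_impute(X, rank=2, iters=100):
    """Fill NaN entries of X by iterating: project the current guess onto
    the set of rank-`rank` matrices, then restore the observed entries."""
    mask = ~np.isnan(X)
    filled = np.where(mask, X, np.nanmean(X))   # init missing with global mean
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # truncated SVD
        filled = np.where(mask, X, approx)      # keep observed, update missing
    return filled
```

For a station-by-hour matrix that is approximately low rank (nearby stations behave similarly over time), the missing readings converge to values consistent with that shared structure.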
Citations: 5
Approximate loop unrolling
Pub Date : 2019-04-30 DOI: 10.1145/3310273.3323841
M. Rodriguez-Cancio, B. Combemale, B. Baudry
We introduce Approximate Unrolling, a compiler loop optimization that reduces execution time and energy consumption, exploiting code regions that can endure some approximation and still produce acceptable results. Specifically, this work focuses on counted loops that map a function over the elements of an array. Approximate Unrolling transforms loops similarly to Loop Unrolling. However, unlike its exact counterpart, our optimization does not unroll loops by adding exact copies of the loop's body. Instead, it adds code that interpolates the results of previous iterations.
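A minimal sketch of the idea, assuming a factor-2 unroll and linear extrapolation from the two previous exact results (the paper's actual interpolation policy and its compiler-level implementation differ; this only illustrates the transformation on a counted map loop):

```python
def map_approx(f, xs):
    """Map f over xs, but evaluate f only on even iterations; each odd
    iteration is predicted from the two previous exact results instead
    of being computed, trading accuracy for fewer calls to f."""
    n = len(xs)
    ys = [0.0] * n
    for i in range(0, n, 2):
        ys[i] = f(xs[i])                                   # exact iteration
        if i + 1 < n:
            if i >= 2:
                ys[i + 1] = ys[i] + (ys[i] - ys[i - 2]) / 2.0  # interpolate
            else:
                ys[i + 1] = f(xs[i + 1])                   # warm-up: stay exact
    return ys
```

For smooth `f` over evenly spaced inputs the predicted iterations are close to exact (exact for affine `f`), which is the kind of "acceptable result" the optimization targets.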
Citations: 4
CacheGuard: a security-enhanced directory architecture against continuous attacks
Pub Date : 2019-04-30 DOI: 10.1145/3310273.3323051
Kai Wang, Fengkai Yuan, Rui Hou, Jingqiang Lin, Z. Ji, Dan Meng
Modern processor cores share the last-level cache and directory to improve resource utilization. Unfortunately, such sharing makes the cache vulnerable to cross-core cache side channel attacks. Recent studies show that information leakage through cross-core cache side channel attacks is a serious threat in computing domains ranging from cloud servers and mobile phones to embedded devices. However, previous solutions have limitations: losing performance, lacking golden standards, requiring software support, or being easily bypassed. In this paper, we observe that most cross-core cache side channel attacks cause sensitive data to appear in a ping-pong pattern in continuous attack scenarios, where attackers need to launch numerous attacks in a short period of time. This paper proposes CacheGuard to defend against continuous attacks. CacheGuard extends the directory architecture to capture the ping-pong patterns. Once the ping-pong pattern of a cache line is captured, CacheGuard can secure the line with two pattern-oriented counteractions, Preload and Lock. The experimental evaluation demonstrates that CacheGuard can block continuous attacks, and that it induces negligible performance degradation and hardware overhead.
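The ping-pong capture can be illustrated with a toy detector that watches which core last touched each line; the `window` and `threshold` parameters and the software-level bookkeeping are hypothetical stand-ins for the paper's hardware directory extension:

```python
from collections import defaultdict, deque

class PingPongDetector:
    """Flag a cache line whose recent accesses alternate between exactly
    two cores (A,B,A,B,...), the pattern repeated Flush+Reload-style
    probing produces on a victim's sensitive line."""
    def __init__(self, window=8, threshold=6):
        self.window = window          # how many recent accesses to keep
        self.threshold = threshold    # alternations needed to flag
        self.history = defaultdict(lambda: deque(maxlen=window))

    def access(self, line, core):
        h = self.history[line]
        h.append(core)
        if len(h) < self.window or len(set(h)) != 2:
            return False              # not enough data, or not a 2-core pattern
        recent = list(h)
        flips = sum(1 for a, b in zip(recent, recent[1:]) if a != b)
        return flips >= self.threshold
```

Once a line is flagged, a real design would apply a counteraction such as the paper's Preload or Lock instead of merely reporting it.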
Citations: 3
User-centered context-aware CPU/GPU power management for interactive applications on smartphones
Pub Date : 2019-04-30 DOI: 10.1145/3310273.3322825
Syuan-Yi Lin, C. King
CPU/GPU frequency scheduling on smartphones that maintains users' quality of experience (QoE) while reducing power consumption has been studied extensively. Most previous work focused on power-hungry applications such as video streaming or 3D games. However, the majority of people are light-to-medium users, running applications such as social networking and web browsing. For such interactive applications, it is difficult to reduce power consumption because their behavior depends on the user's interactions and is hard to characterize. In this paper, we tackle this challenging problem by considering the influence of user contexts on interaction behavior. A context-aware CPU/GPU frequency scheduling governor is proposed that allocates CPU/GPU frequencies just sufficient to meet the workload at different stages of user interaction. Evaluations show that the proposed governor can reduce power consumption by up to 25% compared to the default governor while keeping users satisfied with the QoE.
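A toy sketch of stage-based frequency allocation; the stage names, MHz values, and scaling rule are invented for illustration and are not the paper's governor:

```python
# Hypothetical interaction stages with per-stage frequency ceilings.
STAGE_FREQ = {
    "idle":      {"cpu_mhz": 300,  "gpu_mhz": 150},
    "scroll":    {"cpu_mhz": 1200, "gpu_mhz": 450},
    "page_load": {"cpu_mhz": 1800, "gpu_mhz": 600},
}

def pick_frequency(stage, load, headroom=0.1):
    """Return (cpu_mhz, gpu_mhz) caps just sufficient for the observed
    normalised load (0.0-1.0) in the current interaction stage; a small
    headroom guards against QoE drops from under-provisioning."""
    base = STAGE_FREQ[stage]
    scale = min(1.0, load + headroom)
    return (round(base["cpu_mhz"] * scale), round(base["gpu_mhz"] * scale))
```

The point of the stage table is that the same measured load justifies different frequencies depending on what the user is doing, which is where the context awareness comes in.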
Citations: 4
Performance statistics and learning based detection of exploitative speculative attacks
Pub Date : 2019-04-30 DOI: 10.1145/3310273.3322832
Swastika Dutta, S. Sinha
Most modern processors perform out-of-order speculative execution to maximise system performance. Spectre and Meltdown exploit these optimisations, executing certain instructions that leak the victim's confidential information. All variants of this class of attacks necessarily exploit branch prediction or speculative execution. Using this insight, we develop a two-step strategy to effectively detect these attacks using performance counter statistics, a correlation coefficient model, a deep neural network and the fast Fourier transform. Our approach is expected to provide reliable, fast and highly accurate results with no perceivable loss in system performance and no system overhead.
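The correlation-coefficient step can be illustrated with a plain Pearson computation over two performance-counter traces; which counters are paired and how the coefficient is thresholded are not specified by the abstract, so this only shows the primitive:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length
    performance-counter traces (e.g. cache misses vs. branch events)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

An attack loop that repeatedly mistrains the branch predictor and probes the cache tends to make the two traces move together, pushing the coefficient toward 1 and flagging the process for the second (learning-based) step.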
Citations: 2
Accelerating parallel graph computing with speculation
Pub Date : 2019-04-30 DOI: 10.1145/3310273.3323049
Shuo Ji, Yinliang Zhao, Qing Yi
Nowadays, distributed graph computing is widely used to process large amounts of data on the internet. Communication overhead is a critical factor in determining the overall efficiency of graph algorithms. Through speculative prediction of the content of communications, we develop an optimization technique that significantly reduces the amount of communication needed for a class of graph algorithms. We evaluated our optimization technique using five graph algorithms (single-source shortest path, connected components, PageRank, diameter, and random walk) on Amazon EC2 clusters using different graph datasets. Our optimized implementations reduced communication overhead by 21--93% for these algorithms, while keeping error rates under 5%.
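One way such speculative elision can work (a sketch, not the paper's exact protocol): if a vertex's new value is within a tolerance of the last value communicated, the sender stays silent and the receiver keeps using its predicted (unchanged) value, which bounds the error introduced:

```python
def exchange(updates, last_sent, eps=1e-3):
    """Return only the vertex updates that must actually be sent.
    Values within eps of the last communicated value are elided; the
    receiver speculates that those vertices are unchanged."""
    to_send = {}
    for node, value in updates.items():
        if abs(value - last_sent.get(node, 0.0)) > eps:
            to_send[node] = value
            last_sent[node] = value   # both sides now agree on this value
    return to_send
```

In an iterative algorithm like PageRank, most vertices converge early, so later supersteps send only the few still-changing values, which is where the large communication savings come from.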
Citations: 1
Analysing the tor web with high performance graph algorithms
Pub Date : 2019-04-30 DOI: 10.1145/3310273.3323918
M. Bernaschi, Alessandro Celestini, Stefano Guarino, F. Lombardi, Enrico Mastrostefano
The exploration and analysis of Web graphs has flourished in the recent past, producing a large number of relevant and interesting research results. However, the unique characteristics of the Tor network demand specific algorithms to explore and analyze it. Tor is an anonymity network that allows offering and accessing various Internet resources while guaranteeing a high degree of provider and user anonymity. So far the attention of the research community has focused on assessing the security of the Tor infrastructure. Most research work on the Tor network aimed at discovering protocol vulnerabilities to de-anonymize users and services, while little or no information is available about the topology of the Tor Web graph or the relationship between pages' content and topological structure. With our work we aim at addressing this lack of information. We describe the topology of the Tor Web graph, measuring both global and local properties by means of well-known metrics that, due to the size of the network, require high-performance algorithms. We consider three different snapshots obtained by extensively crawling Tor three times over a 5-month time frame. Finally, we present a correlation analysis of pages' semantics and topology, discussing novel insights about the Tor Web organization and its content. Our findings show that the Tor graph presents some of the characteristics of social and surface Web graphs, along with a few unique peculiarities.
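As an example of the global properties involved, weakly connected component sizes can be computed with a BFS sweep; this toy version fits in memory, whereas the graphs in the paper require the high-performance implementations it describes:

```python
from collections import defaultdict, deque

def wcc_sizes(edges):
    """Sizes of the weakly connected components of a directed graph,
    given as (u, v) edge pairs; direction is ignored."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, sizes = set(), []
    for start in adj:
        if start in seen:
            continue
        queue, size = deque([start]), 0
        seen.add(start)
        while queue:                      # plain BFS over one component
            u = queue.popleft()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        sizes.append(size)
    return sorted(sizes, reverse=True)
```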
Citations: 0
An adaptive concurrent priority queue for NUMA architectures
Pub Date : 2019-04-30 DOI: 10.1145/3310273.3323164
F. Strati, Christina Giannoula, Dimitrios Siakavaras, G. Goumas, N. Koziris
Designing scalable concurrent priority queues for contemporary NUMA servers is challenging. Several NUMA-unaware implementations can scale up to a high number of threads by exploiting the potential parallelism of insert operations. In contrast, in deleteMin-dominated workloads, threads compete to access the same memory locations, i.e., the first item in the priority queue. In such cases, NUMA-aware implementations are typically used, since they reduce the coherence traffic between the nodes of a NUMA system. In this work, we propose an adaptive priority queue, called SmartPQ, that tunes itself by automatically switching between NUMA-unaware and NUMA-aware algorithmic modes to provide the highest available performance under all workloads. SmartPQ is built on top of NUMA Node Delegation (Nuddle), a low-overhead technique for constructing NUMA-aware data structures using any arbitrary NUMA-unaware implementation as its backbone. Moreover, SmartPQ employs machine learning to decide when to switch between its two algorithmic modes. As our evaluation reveals, it achieves the highest available performance with an 88% success rate and dynamically adapts between the NUMA-aware and NUMA-unaware modes without overhead, while performing up to 1.83 times better than SprayList, the state-of-the-art NUMA-unaware priority queue.
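The delegation idea behind Nuddle can be sketched as follows: client threads never touch the underlying (NUMA-unaware) heap directly but post operations to a server thread that owns it, so the heap's memory stays local to the server's node. This is an illustrative single-server Python sketch; the real design runs one server per NUMA node and batches requests:

```python
import heapq
import queue
import threading

class DelegatedPQ:
    """Priority queue accessed only through a dedicated server thread."""
    def __init__(self):
        self.heap = []
        self.ops = queue.Queue()
        threading.Thread(target=self._serve, daemon=True).start()

    def _serve(self):
        # The server thread is the sole owner of self.heap: no locks,
        # no cross-node coherence traffic on the heap's cache lines.
        while True:
            op, arg, reply = self.ops.get()
            if op == "insert":
                heapq.heappush(self.heap, arg)
                reply.put(None)
            else:  # deleteMin
                reply.put(heapq.heappop(self.heap) if self.heap else None)

    def insert(self, key):
        reply = queue.Queue()
        self.ops.put(("insert", key, reply))
        reply.get()                       # wait for acknowledgement

    def delete_min(self):
        reply = queue.Queue()
        self.ops.put(("deleteMin", None, reply))
        return reply.get()
```

Because the backbone here is just `heapq`, any sequential or NUMA-unaware concurrent structure could be slotted in unchanged, which is the property Nuddle exploits.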
Citations: 6
Designing a secure DRAM+NVM hybrid memory module
Pub Date : 2019-04-30 DOI: 10.1145/3310273.3323069
Xu Wang, I. Koren
Non-Volatile Memory (NVM) such as PCM has emerged as a potential alternative for main memory due to its high density and low leakage power. However, an NVM main-memory system faces three challenges when compared to Dynamic Random Access Memory (DRAM) - long latency, poor write endurance and data security. To address these three challenges, we propose a secure DRAM+NVM hybrid memory module. The hybrid module integrates a DRAM cache and a security unit (SU). DRAM cache can improve the performance of an NVM memory module and reduce the number of direct writes to the NVM. Our results show that a 256MB 2-way DRAM cache with a 1024B cache line performs well in an 8GB NVM main memory module. The SU is embedded in the onboard controller and includes an AES-GCM engine and an NVM vault. The AES-GCM engine implements encryption and authentication with low overhead. The NVM vault is used to store MAC tags and counter values for each DRAM cache line. According to our results, the proposed secure hybrid memory module improves the performance by 32% compared to an NVM-only memory module, and is only 6.8% slower than a DRAM only memory module.
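The vault's per-line authentication can be illustrated with a MAC that binds a cache line's data to its address and a per-line write counter (the counter defeats replay of stale ciphertext). The paper's SU uses AES-GCM; stdlib HMAC-SHA256 stands in here, and the key and field widths are placeholders:

```python
import hashlib
import hmac

KEY = b"\x00" * 16  # placeholder device key held by the security unit

def seal(line_addr, counter, data):
    """MAC tag for one cache line, bound to its address and freshness
    counter, as would be stored alongside the counter in the NVM vault."""
    msg = line_addr.to_bytes(8, "little") + counter.to_bytes(8, "little") + data
    return hmac.new(KEY, msg, hashlib.sha256).digest()

def verify(line_addr, counter, data, tag):
    """Check a line on read-back; fails on tampered data or a replayed
    (stale-counter) line."""
    return hmac.compare_digest(seal(line_addr, counter, data), tag)
```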
Citations: 0