
ACM International Conference on Computing Frontiers: Latest Papers

Scaling analytics applications with OpenCL for loosely coupled heterogeneous clusters
Pub Date : 2013-05-14 DOI: 10.1145/2482767.2482812
T. Suganuma, R. Krishnamurthy, Moriyoshi Ohara, T. Nakatani
OpenCL is an open standard for heterogeneous parallel programming, exploiting multi-core CPUs, GPUs, or other accelerators as parallel computing resources. Recent work has extended the OpenCL parallel programming model to distributed heterogeneous clusters. For such loosely coupled acceleration architectures, designing OpenCL programs for maximum performance is quite different from designing them for conventional tightly coupled acceleration platforms. This paper describes our experiences in OpenCL programming to extract scalable performance in a distributed heterogeneous cluster environment. We picked two real-world analytics workloads, Two-Step Cluster and Linear Regression, that pose different challenges to efficient OpenCL implementations. We obtained scalable performance with this architecture by carefully managing the amount of data and computation in the kernel program design and by addressing network latency problems through optimizations.
Citations: 2
Reasoning and prediction on opportunistic networks to improve data dissemination
Pub Date : 2013-05-14 DOI: 10.1145/2482767.2482782
C. O. Rolim, C. Geyer
Opportunistic networks exploit social behavior to build connectivity opportunities. This paradigm uses pair-wise contact to share and forward content without any prior knowledge of pre-existing infrastructure. In this context, optimizing data dissemination among nodes is paramount. This paper presents the early stages of our research, which focuses on reasoning and prediction issues to improve data dissemination on opportunistic networks. We intend to explore contextual and social aspects with machine learning techniques in the design of a reasoning and prediction engine for this purpose.
Citations: 0
Bridging the programming gap between persistent and volatile memory using WrAP
Pub Date : 2013-05-14 DOI: 10.1145/2482767.2482806
Ellis R. Giles, K. Doshi, P. Varman
Advances in memory technology are promising the availability of byte-addressable persistent memory as an integral component of future computing platforms. This change has significant implications for software that has traditionally made a sharp distinction between durable and volatile storage. In this paper we describe a software-hardware architecture, WrAP, for persistent memory that provides atomicity and durability while simultaneously ensuring that fast paths through the cache, DRAM, and persistent memory layers are not slowed down by burdensome buffering or double-copying requirements. Trace-driven simulation of transactional data structures indicates the potential for significant performance gains using the WrAP approach.
Citations: 35
Computationally unifying urban masterplanning
Pub Date : 2013-05-14 DOI: 10.1145/2482767.2482808
David Birch
Architectural design, particularly in large-scale masterplanning projects, has yet to fully undergo the computational revolution experienced by other design-led industries such as automotive and aerospace. These industries use computational frameworks to undertake automated design analysis and design space exploration. However, within the Architectural, Engineering and Construction (AEC) industries we find no such computational platforms. This precludes the rapid analysis needed for the quantitative design iteration that sustainable design requires. This is a current computing frontier. This paper considers computational solutions to the challenges preventing such advances, with the aim of improving architectural design performance for a more sustainable future. We present a practical discussion of the computational challenges and opportunities in this industry and present a computational framework, "HierSynth", with a data model designed to the needs of this industry. We report the results and lessons learned from applying this framework to a major commercial urban masterplanning project. This framework was used to automate and augment existing practice and to undertake previously infeasible, designer-led design space exploration. During the case study an order of magnitude more analysis cycles were undertaken than the literature suggests is normal, each occurring in hours, not days.
Citations: 6
Mapping applications for high performance on multithreaded, NUMA systems
Pub Date : 2013-05-14 DOI: 10.1145/2482767.2482777
Guojing Cong, H. Wen
The communication latency and available resources for a group of logical processors are determined by their relative position in the hierarchy of chips, cores, and threads on modern shared-memory systems. Multithreaded applications exhibit different performance behavior depending on the mapping of software threads to logical processors. We observe that the execution time under one mapping can be 5.4 times that under another. Applications with irregular access patterns show the worst performance under the default OS mapping. Mapping alone does not reduce remote accesses on NUMA machines when the logical processors span multiple chips. We present new data replication and distribution optimizations for two irregular applications. We further show that locality optimization simultaneously reduces remote accesses and improves cache performance, and that it achieves better performance than prior NUMA-specific techniques.
Citations: 3
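To make the idea of mapping software threads onto the chip/core/thread hierarchy concrete, here is a minimal sketch of two common placement policies. It is not taken from the paper: the function names, the `(chip, core, smt_thread)` slot representation, and the topology sizes are all hypothetical, chosen only to show how two mappings of the same thread count can land on very different hardware.

```python
from itertools import product

Slot = tuple[int, int, int]  # (chip, core, smt_thread); hypothetical representation


def compact_mapping(n_threads: int, chips: int, cores: int, smt: int) -> list[Slot]:
    """Fill the SMT threads of one core, then the cores of one chip, then the next chip."""
    slots = list(product(range(chips), range(cores), range(smt)))
    return slots[:n_threads]


def scatter_mapping(n_threads: int, chips: int, cores: int, smt: int) -> list[Slot]:
    """Spread threads across chips first, then across cores, and use SMT siblings last."""
    slots = [
        (chip, core, thread)
        for thread in range(smt)
        for core in range(cores)
        for chip in range(chips)
    ]
    return slots[:n_threads]


if __name__ == "__main__":
    # Assumed topology: 2 chips, 2 cores per chip, 2 SMT threads per core.
    print("compact:", compact_mapping(4, chips=2, cores=2, smt=2))
    print("scatter:", scatter_mapping(4, chips=2, cores=2, smt=2))
```

With four threads on this assumed topology, the compact policy keeps everything on chip 0 and shares cores between SMT siblings, while the scatter policy gives each thread its own core but spans both chips. On Linux, a placement like this could then be applied with `os.sched_setaffinity`, given a translation from slots to the logical CPU numbers the operating system reports.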
GPU acceleration of regular expression matching for large datasets: exploring the implementation space
Pub Date : 2013-05-14 DOI: 10.1145/2482767.2482791
Xiaodong Yu, M. Becchi
Regular expression matching is a central task in several networking (and search) applications and has been accelerated on a variety of parallel architectures, including general purpose multi-core processors, network processors, field programmable gate arrays, and ASIC- and TCAM-based systems. All of these solutions are based on finite automata (either in deterministic or non-deterministic form) and mostly focus on effective memory representations for such automata. More recently, a handful of proposals have exploited the parallelism intrinsic in regular expression matching (i.e., coarse-grained packet-level parallelism and fine-grained data structure parallelism) to propose efficient regex-matching designs for GPUs. However, most GPU solutions aim at achieving good performance on small datasets, which are far less complex and problematic than those used in real-world applications. In this work, we provide a more comprehensive study of regular expression matching on GPUs. To this end, we consider datasets of practical size and complexity and explore advantages and limitations of different automata representations and of various GPU implementation techniques. Our goal is not to show optimal speedup on specific datasets, but to highlight advantages and disadvantages of the GPU hardware in supporting state-of-the-art automata representations and encoding schemes, approaches that have been broadly adopted on other parallel memory-based platforms.
Citations: 61
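The abstract above refers to finite-automata representations and to coarse-grained packet-level parallelism. As a rough illustration only, the sketch below runs a hand-built, table-driven DFA over several independent inputs. The pattern (`ab+c` found anywhere in the input), the dense 256-column table layout, and all function names are assumptions made for this example; they are not the paper's representations, and plain Python loops stand in for what would be one GPU thread per packet.

```python
ALPHABET = 256  # one table column per possible byte value
ACCEPT = 3      # accepting state of the hand-built DFA


def build_table() -> list[list[int]]:
    """Dense transition table for finding 'ab+c' anywhere in the input.

    States: 0 = nothing useful seen, 1 = seen 'a', 2 = seen 'a' then one or
    more 'b', 3 = match found. Any transition not set below falls back to 0.
    """
    table = [[0] * ALPHABET for _ in range(4)]
    for state in range(4):
        table[state][ord("a")] = 1  # an 'a' can always start a new attempt
    table[1][ord("b")] = 2
    table[2][ord("b")] = 2
    table[2][ord("c")] = ACCEPT
    return table


def matches(table: list[list[int]], data: bytes) -> bool:
    """Walk the DFA: exactly one table lookup per input byte."""
    state = 0
    for byte in data:
        state = table[state][byte]
        if state == ACCEPT:
            return True
    return False


def match_all(table: list[list[int]], packets: list[bytes]) -> list[bool]:
    """Each packet is matched independently (packet-level parallelism)."""
    return [matches(table, packet) for packet in packets]


if __name__ == "__main__":
    print(match_all(build_table(), [b"xxabbbc", b"abac", b"aabc", b"ac"]))
```

A dense table gives one memory access per byte but grows with the number of states times the alphabet size, which is the kind of memory-representation trade-off the abstract alludes to when datasets become large.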
DCNSim: a unified and cross-layer computer architecture simulation framework for data center network research
Pub Date : 2013-05-14 DOI: 10.1145/2482767.2482792
Nongda Hu, Long Li, Binzhang Fu, Tao Li, Xiufeng Sui, Lixin Zhang
Within today's large-scale data centers, inter-node communication is often the major bottleneck. This fact has recently spurred a boom in data center network (DCN) research. Since building a real data center is cost prohibitive, most DCN studies rely on simulations. Unfortunately, state-of-the-art network simulators have limited support for real-world applications, which prevents researchers from first-hand investigation. To address this issue, we developed a unified and cross-layer simulation framework, DCNSim. By leveraging two widely deployed simulators, DCNSim introduces computer architecture solutions into DCN research. With DCNSim, one can run packet-level network simulation driven by commercial applications while varying computer and network parameters, such as CPU frequency, memory access latency, network topology, and protocols. With extensive validation, we show that DCNSim can accurately capture performance trends caused by changing computer and network parameters. Finally, we argue, through several case studies, that future DCN research should consider computer architecture factors.
Citations: 2
Network stacking considered harmful
Pub Date : 2013-05-14 DOI: 10.1145/2482767.2482780
Robert Surton
The most important challenge facing the future Internet is not technical, but is rather the need to justify placing trust in the technical solutions. Current network models suffer from limitations that result in practical deployments being too complex to reason about. The novel channel market model, based on composing networks by sharing channels through a flat market, offers a better opportunity for reasoning. The old language is still useful, and continues to make sense in the new model. Two design principles, the haggling principle and the composition principle, provide hints for discussing and designing networks in a channel market.
Citations: 0
An algorithm for parallel calculation of trigonometric functions
Pub Date : 2013-05-14 DOI: 10.1145/2482767.2482778
T. Barrera, A. Hast, E. Bengtsson
We propose a new way of calculating the sine and cosine functions. The method is based on recursive applications of a modified complex power algorithm. On a machine with multiple complex multipliers, the method can be used to calculate sines and cosines in logarithmic time. The serial version of the presented method requires only two precomputed constants and no tables. In the parallel versions, a trade-off can be made between the number of parallel processing elements and the size of the tables.
Citations: 0
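The abstract does not spell out the modified complex power algorithm, so the following is only a sketch of the textbook identity that complex-power methods rest on: by De Moivre's formula, $(\cos\varphi + i\sin\varphi)^{n} = \cos n\varphi + i\sin n\varphi$, so a small-angle approximation of $e^{i\theta/2^{k}}$ squared $k$ times yields $\cos\theta + i\sin\theta$. The function name, the second-order small-angle seed, and the default `k = 16` are assumptions for illustration, not the authors' method or constants.

```python
import math


def sincos_by_squaring(theta: float, k: int = 16) -> tuple[float, float]:
    """Approximate (sin(theta), cos(theta)) by repeated complex squaring.

    Assumption: a second-order small-angle seed for exp(i * theta / 2**k);
    this is a generic illustration, not the paper's modified algorithm.
    """
    small = theta / (1 << k)
    # Seed: cos(small) ~ 1 - small^2 / 2, sin(small) ~ small.
    z = complex(1.0 - small * small / 2.0, small)
    for _ in range(k):
        z = z * z  # each squaring doubles the angle
    return z.imag, z.real


if __name__ == "__main__":
    s, c = sincos_by_squaring(1.0)
    print(s, math.sin(1.0))
    print(c, math.cos(1.0))
```

In this serial sketch the $k$ squarings run one after another. Larger `k` shrinks the seed's truncation error but amplifies floating-point rounding through the repeated squarings, so `k` is a tuning choice rather than a fixed constant.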
RFiof: an RF approach to I/O-pin and memory controller scalability for off-chip memories
Pub Date : 2013-05-14 DOI: 10.1145/2482767.2482803
M. Marino
Given that Moore's law behavior continues to hold, core counts are expected to keep growing, which demands ever more memory bandwidth to feed the cores. Memory controller (MC) scalability is crucial to meeting these bandwidth needs, but it is constrained by I/O pin scaling. In this study, we introduce RFiof, a radio-frequency (RF) memory approach that addresses the I/O pin constraints restricting MC scalability in off-chip memory systems while keeping interconnection energy low. In this paper, we model, design, and demonstrate how RFiof achieves high MC I/O pin scalability across different memory technology generations, and we evaluate its area and power/energy impact. We introduce the novel concept of RFpins, which replace traditional MC I/O pins, and use RFMCs (MCs coupled to RF transmitters (TX) and receivers (RX)) with a minimal RF path between each RFMC and the ranks. For a multicore with 32 out-of-order cores configured with off-chip ranks and a 1:1 core-to-MC ratio, we demonstrate that RFiof offers a scalable 4 RFpins per RFMC, comparable to pin-scalable optical solutions, and improves bandwidth and performance by up to 7.2x and 8.6x, respectively, over a traditional baseline constrained by MC I/O pin counts. Furthermore, RFiof reduces MC area usage by about 65.6% and memory-path interconnection energy by 80%.
Citations: 9