
2014 IEEE 33rd International Performance Computing and Communications Conference (IPCCC): Latest Papers

Combinatorial JPT based on orthogonal beamforming for two-cell cooperation
Hojae Lee, Beom Kwon, Seonghyun Kim, Inwoong Lee, Sanghoon Lee
In this paper, we investigate efficient multi-cell cooperation based on CoMP joint processing and transmission (CoMP-JPT) with orthogonal beamforming. A combinatorial optimization algorithm yields the optimal user scheduling for joint transmission from multiple transmitters. The throughput of CoMP-JPT can be significantly improved while maintaining fairness among users in a multi-cell environment.
DOI: 10.1109/PCCC.2014.7017043 (https://doi.org/10.1109/PCCC.2014.7017043); published 2014-12-01
Citations: 0
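The abstract does not spell out the scheduling algorithm. The following is only a minimal Python sketch of what combinatorial user scheduling over orthogonal beams can look like. It rests on illustrative assumptions that are not the authors' choices:

- one transmit antenna per cell, forming a two-element joint channel vector;
- a fixed pair of orthonormal beams;
- equal power per beam;
- a proportional-fair metric `rate / avg_rate`, searched exhaustively over all user-to-beam assignments.

The names `beam_gains` and `pf_schedule` are hypothetical.

```python
import itertools
import math

def beam_gains(h, beams):
    """Gain |h . w|^2 of a user's joint channel h on every beam w."""
    return [abs(sum(hi * wi for hi, wi in zip(h, w))) ** 2 for w in beams]

def pf_schedule(channels, avg_rates, beams, noise, power=1.0):
    """Exhaustive search: assign one distinct user to each orthogonal beam so
    that the proportional-fair metric sum(rate_u / avg_rate_u) is maximal."""
    p = power / len(beams)  # equal power split across beams (assumption)
    best, best_metric = None, -math.inf
    for assignment in itertools.permutations(range(len(channels)), len(beams)):
        metric = 0.0
        for k, u in enumerate(assignment):
            g = beam_gains(channels[u], beams)
            interference = p * (sum(g) - g[k])  # power received from the other beams
            sinr = p * g[k] / (noise + interference)
            metric += math.log2(1 + sinr) / avg_rates[u]
        if metric > best_metric:
            best, best_metric = assignment, metric
    return best, best_metric

# Two transmit antennas (one per cell), two orthonormal beams, three users.
s = 1 / math.sqrt(2)
beams = [(s, s), (s, -s)]
channels = [(1, 1), (1, -1), (0.1, 0)]
print(pf_schedule(channels, [1.0, 1.0, 1.0], beams, noise=0.1)[0])  # (0, 1)
```

Raising user 1's average rate to 1000 makes the proportional-fair metric pick the weak user 2 instead, giving `(0, 2)`. This is how such a metric trades raw throughput for fairness. Exhaustive search grows with the number of ordered user-to-beam assignments, so it is only practical for small user and beam counts.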
Virtual structures and heterogeneous nodes in dependency graphs for detecting metamorphic malware
Gilbert Breves Martins, Rosiane de Freitas, E. Souto
The traditional way to identify malicious programs is to compare the code body with a set of previously stored code patterns, known as signatures, extracted from already identified malware. To defeat this identification process, malware developers can use obfuscation techniques to give their creations the ability to modify their own code each time a new infection takes place. One way to deal with this metamorphic behavior is to use dependency graphs. These are generated by surveying the dependency relationships among code elements, which yields a model that is resilient to code mutations. Analogous to the signature model, a matching procedure that compares these graphs with a reference graph database is used to identify malware code. Since graph matching is an NP-hard problem, this process must be optimized for the identification technique to be practical. Using dependency graphs extracted from binary code, we present an approach that reduces the size of the reference dependency graphs stored in the graph database by differentiating nodes based on their features. Together with the insertion of virtual paths, this makes it possible to build a virtual clique that is used to identify and discard less relevant elements of the original graph. Dependency graph reduction also produces more stable results in the matching process. To validate these claims, we present a methodology for generating these graphs from binary programs. We then compare the results achieved with and without the proposed approach in identifying the Evol and Polip metamorphic malware.
DOI: 10.1109/PCCC.2014.7017069 (https://doi.org/10.1109/PCCC.2014.7017069); published 2014-12-01
Citations: 2
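To make the idea of a mutation-resilient dependency graph concrete, here is a minimal Python sketch. It builds a def-use (read-after-write) graph from a toy instruction list and labels each node by opcode, which is a crude stand-in for heterogeneous node types. The following are all assumptions for illustration:

- the three-tuple instruction format;
- the treatment of `nop` as junk;
- the edge-label profile used for comparison.

The profile is only a cheap invariant. It is not the paper's graph matching, which is NP-hard in general.

```python
from collections import Counter

def dependency_graph(program):
    """Build a def-use graph from (opcode, dest_register, source_registers)
    tuples: edge i -> j when instruction j reads a register last written by i."""
    last_writer = {}          # register -> index of the instruction that last wrote it
    labels, edges = {}, set()
    for idx, (op, dst, srcs) in enumerate(program):
        if op == "nop":       # junk inserted by a mutation engine: no dependencies
            continue
        labels[idx] = op      # node label = opcode (heterogeneous node types)
        for reg in srcs:
            if reg in last_writer:
                edges.add((last_writer[reg], idx))
        if dst is not None:
            last_writer[dst] = idx
    return labels, edges

def edge_label_profile(graph):
    """Cheap invariant for comparison: multiset of (source label, target label)."""
    labels, edges = graph
    return Counter((labels[a], labels[b]) for a, b in edges)

original = [
    ("mov", "eax", []),
    ("mov", "ebx", []),
    ("add", "eax", ["eax", "ebx"]),
    ("xor", "ecx", ["eax"]),
]
mutated = [                  # independent movs swapped, junk nop inserted
    ("mov", "ebx", []),
    ("nop", None, []),
    ("mov", "eax", []),
    ("add", "eax", ["eax", "ebx"]),
    ("xor", "ecx", ["eax"]),
]
print(edge_label_profile(dependency_graph(original)) ==
      edge_label_profile(dependency_graph(mutated)))  # True
```

The mutation reorders the two independent `mov` instructions and inserts a `nop`. This changes the instruction sequence, and therefore any byte-level signature, but leaves the labeled dependency structure unchanged.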
A promising CUDA-accelerated vehicular area network simulator using NS-3
Chok M. Yip, A. Asaduzzaman
Both the size and the computational demands of Vehicular Area Networks (VANETs) are growing. Simulating a VANET requires not only simulating network standards but also modeling node mobility. Such a dynamic system involves computing node distances, routing protocols, application layers, data transmission and reception, etc. A VANET simulation model therefore needs both hardware and software support to handle massive computational loads. Currently available network simulators, such as Network Simulator 3 (NS-3), are not adequate for simulating large-scale VANET systems. In this work, we propose a Compute Unified Device Architecture (CUDA)-assisted VANET simulation model for a multicore Central Processing Unit (CPU) / manycore Graphics Processing Unit (GPU) platform to increase computational throughput. The proposed VANET/GPU simulator uses NS-3 as its core engine and improves throughput by exploiting massively parallel processing on the GPU. Experimental results show an overall computation speedup of up to 129x with the proposed VANET/GPU simulator.
DOI: 10.1109/PCCC.2014.7017048 (https://doi.org/10.1109/PCCC.2014.7017048); published 2014-12-01
Citations: 6
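The per-step node-distance computation mentioned in the abstract is the kind of work that parallelizes well on a GPU, because every vehicle pair can be tested independently. The Python sketch below runs that kernel serially on the CPU. It is not the authors' CUDA code and does not use NS-3. The helper names, the 2-D positions in meters, the straight-line mobility, and the fixed radio range are all assumptions.

```python
import math

def step(positions, velocities, dt):
    """Straight-line mobility update: move every vehicle by velocity * dt."""
    return [(x + vx * dt, y + vy * dt) for (x, y), (vx, vy) in zip(positions, velocities)]

def neighbor_lists(positions, radio_range):
    """For each vehicle, the indices of all other vehicles within radio range.
    Every (i, j) test is independent of the others, which is what makes this
    O(n^2) loop a natural candidate for GPU offload."""
    n = len(positions)
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        xi, yi = positions[i]
        for j in range(n):
            if i != j:
                xj, yj = positions[j]
                if math.hypot(xi - xj, yi - yj) <= radio_range:
                    neighbors[i].append(j)
    return neighbors

positions = [(0, 0), (100, 0), (400, 0)]           # meters (assumed)
velocities = [(0, 0), (0, 0), (-20, 0)]            # meters per second (assumed)
print(neighbor_lists(positions, radio_range=250))  # [[1], [0], []]
positions = step(positions, velocities, dt=10)
print(neighbor_lists(positions, radio_range=250))  # [[1, 2], [0, 2], [0, 1]]
```

In a CUDA port, the nested loop in `neighbor_lists` would typically map to one thread per `(i, j)` pair or one thread per row `i`. That is where the quadratic cost of large VANETs would be spread across GPU cores.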
Test oriented formal model of SDN applications
Jiangyuan Yao, Zhiliang Wang, Xia Yin, Xingang Shi, Jianping Wu, Yahui Li
Control plane applications are the soul of Software-Defined Networking (SDN), and their quality determines the reliability of the network. Unfortunately, the greater programmability of SDN also increases the risk of bugs and makes testing more challenging. Because manual testing appears inefficient, automatic testing methods have become a promising alternative. Both white-box methods (with models) and black-box methods (without models) have limitations. In this paper, we propose a formal model for black-box testing of SDN applications. We use a group of components to describe the data structures stored in the applications and the system behaviors, which makes specifying applications easier and more natural. Based on our models, we present our work-in-progress testing framework. It can iteratively improve the design model through model verification and expose implementation bugs through model-based testing.
DOI: 10.1109/PCCC.2014.7017024 (https://doi.org/10.1109/PCCC.2014.7017024); published 2014-12-01
Citations: 2
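The following minimal Python sketch illustrates the model-based testing loop the abstract describes. A formal model of an application's stored data and behavior runs side by side with the implementation on generated inputs, and any divergence is reported as a bug. None of the example details come from the paper: the MAC-learning switch application, its `packet_in` interface, and the seeded bug in `BuggyLearningSwitch` are all hypothetical.

```python
import random

class LearningSwitchModel:
    """Formal model: the app's stored data (a MAC table) and its packet-in behavior."""
    def __init__(self):
        self.mac_table = {}

    def packet_in(self, in_port, src, dst):
        self.mac_table[src] = in_port             # always learn the latest location
        if dst in self.mac_table:
            return ("forward", self.mac_table[dst])
        return ("flood", None)

class BuggyLearningSwitch:
    """Hypothetical implementation under test with a seeded bug."""
    def __init__(self):
        self.mac_table = {}

    def packet_in(self, in_port, src, dst):
        self.mac_table.setdefault(src, in_port)   # bug: stale port kept after a host moves
        if dst in self.mac_table:
            return ("forward", self.mac_table[dst])
        return ("flood", None)

def model_based_test(impl_factory, trials=200, length=10, seed=1):
    """Drive model and implementation with the same random packet-in sequences;
    return the first trace on which their outputs differ, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        model, impl = LearningSwitchModel(), impl_factory()
        trace = []
        for _ in range(length):
            pkt = (rng.randint(1, 3), rng.choice("ABC"), rng.choice("ABC"))
            trace.append(pkt)
            if model.packet_in(*pkt) != impl.packet_in(*pkt):
                return trace                      # counterexample exposing the bug
    return None

print(model_based_test(BuggyLearningSwitch))   # a counterexample trace
print(model_based_test(LearningSwitchModel))   # None: the model agrees with itself
```

A returned trace is a concrete packet sequence that reproduces the divergence. This matches the abstract's goal of exposing implementation bugs rather than only checking the model.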
A heuristic for logical data buffer allocation in multicore platforms
B. Ries, Walter Unger, M. Odendahl, R. Leupers
In the past, memory allocation and communication between processors and memories in MPSoCs were not a big challenge, because the design space was small. With today's advanced MPSoCs and improved techniques for interfacing Dynamic RAM (DRAM), allocating logical data buffers to physical memories can no longer be managed manually. We present a heuristic for mapping logical data buffers to physical memories and routing the data flows. Our heuristic uses an approximation scheme to obtain a fractional solution, followed by randomized rounding. We evaluate our implementation for different values of ε using representative data from the Long Term Evolution (LTE) standard.
DOI: 10.1109/PCCC.2014.7017040 (https://doi.org/10.1109/PCCC.2014.7017040); published 2014-12-01
Citations: 1
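The randomized rounding step named in the abstract can be illustrated with a minimal Python sketch. It assumes that some approximation scheme has already produced a fractional assignment `fractional[b][m]`, with each row summing to 1. The sketch sends buffer `b` to memory `m` with probability `fractional[b][m]` and redraws until every memory capacity is respected. The function name, the capacity check, and the retry policy are assumptions, and the paper's routing of data flows is not modeled.

```python
import random

def randomized_rounding(fractional, sizes, capacities, tries=100, seed=0):
    """Round a fractional buffer-to-memory assignment to an integral one.

    fractional[b][m] is the fraction of buffer b placed in memory m (rows sum to 1).
    Each buffer is sent to memory m with probability fractional[b][m]; samples are
    redrawn until no memory exceeds its capacity (or `tries` runs out)."""
    rng = random.Random(seed)
    memories = range(len(capacities))
    for _ in range(tries):
        mapping = [rng.choices(memories, weights=row)[0] for row in fractional]
        load = [0] * len(capacities)
        for b, m in enumerate(mapping):
            load[m] += sizes[b]
        if all(used <= cap for used, cap in zip(load, capacities)):
            return mapping
    return None   # no feasible rounding found within the retry budget

frac = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # e.g. output of a relaxed (fractional) problem
print(randomized_rounding(frac, sizes=[4, 4, 2], capacities=[6, 6]))
```

Buffers with integral rows keep their placement, and only genuinely split buffers are decided by chance. A retry loop is one simple way to repair capacity violations. Approximation analyses usually bound how far a single rounding can overshoot instead.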
An efficient spin-lock based multi-core resource sharing protocol
Martin Alfranseder, Michael Deubzer, Benjamin Justus, J. Mottok, Christian Siemers
In this paper, we present PWLP (Preemptable Waiting Locking Protocol), a new lock-based resource sharing protocol for embedded multi-core processors. It is based on the busy-wait model and uses non-preemptive critical sections, while a task that is waiting for a resource may be preempted by higher-priority tasks. Our protocol can be applied in partitioned as well as global scheduling scenarios, with task-fixed, job-fixed, or dynamically assigned priorities. Furthermore, PWLP permits nested requests to shared resources. Finally, we present a case study based on event-based simulations that compares FMLP (Flexible Multiprocessor Locking Protocol) with the proposed PWLP.
DOI: 10.1109/PCCC.2014.7017090 (https://doi.org/10.1109/PCCC.2014.7017090); published 2014-12-01
Citations: 6
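A toy Python state model can illustrate the distinction the abstract draws between a non-preemptive critical section and a preemptable waiting phase. It assumes a FIFO spin queue per resource. As a simplifying assumption that may differ from PWLP's actual rules, a waiter preempted while spinning withdraws its request and must request again when it resumes. The class and method names are hypothetical, and nested requests are not modeled.

```python
from collections import deque

class PreemptableWaitingLock:
    """Toy state model of a FIFO spin lock whose waiting phase is preemptable
    while the critical section itself is not."""
    def __init__(self):
        self.holder = None
        self.waiters = deque()

    def _grant(self):
        if self.holder is None and self.waiters:
            self.holder = self.waiters.popleft()

    def request(self, task):
        """Start spinning at the tail of the queue; True if the lock was granted."""
        self.waiters.append(task)
        self._grant()
        return self.holder == task

    def preempt_waiter(self, task):
        """A higher-priority task preempts `task` while it spins: assumed here to
        cancel its request. The holder is never preempted (non-preemptive section)."""
        self.waiters.remove(task)

    def release(self, task):
        """End of the critical section; the lock passes to the next waiter."""
        assert self.holder == task, "only the holder may release"
        self.holder = None
        self._grant()
        return self.holder

lock = PreemptableWaitingLock()
lock.request("T1"); lock.request("T2"); lock.request("T3")
lock.preempt_waiter("T2")       # T2 is preempted while spinning and leaves the queue
print(lock.release("T1"))       # T3
```

Because the preempted waiter gives up its queue position, the tasks still spinning are not blocked by a task that is not running. Whether PWLP handles this case exactly this way should be checked against the paper.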
Maximizing system's total accrued utility value for parallel and time-sensitive applications
Shuhui Li, Miao Song, P. Wan, Shangping Ren
For a time-sensitive application, the usefulness or quality of the application's end result depends on the time at which the result is delivered or the application completes. A Time Utility Function (TUF) is often used to represent the dependency between an application's accrued value and its completion time. For parallel and time-sensitive applications, each application has multiple tasks that must be executed concurrently to produce a result. Therefore, their execution occupies resources in two dimensions. The spatial dimension is the number of processing units needed to support concurrent tasks, and the temporal dimension is the time needed to complete the application. Because of these parallel and time-sensitive features, execution interference among such applications can occur in both the spatial and temporal domains. In this paper, we first introduce a metric to measure the effect of spatial-temporal interference on applications' accrued values. Second, based on this metric, we develop the Discounting Spatial-Temporal Interference (DSTI) scheduling algorithm to maximize the system's total accrued utility value for a given set of parallel and time-sensitive applications. Our simulation results show that DSTI produces close-to-optimal solutions. It also has a clear advantage over existing approaches in the literature in terms of total accrued utility value and profitable application ratio. It accrues up to 164%, 150%, and 97% more system value than Gang EDF, FCFS with backfilling, and 0-1 Knapsack-based scheduling, respectively. It also achieves profitable application ratios up to 21%, 35%, and 18% higher than the same three algorithms, respectively.
DOI: 10.1109/PCCC.2014.7017062 (https://doi.org/10.1109/PCCC.2014.7017062); published 2014-12-01
Citations: 2
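A short Python sketch can make the accrued-utility objective concrete. Each application needs `width` processing units at once (the spatial dimension) for `duration` time units (the temporal dimension). Its value decays with completion time according to a linear-decay TUF. That TUF shape is an assumption, since the abstract does not fix it. The greedy placement in `accrued_utility` and the brute-force `best_order` are illustrative baselines, not the DSTI algorithm.

```python
from dataclasses import dataclass
from itertools import permutations

@dataclass
class App:
    width: int            # processing units needed concurrently (spatial dimension)
    duration: float       # execution time once started (temporal dimension)
    value: float          # utility if finished by soft_deadline
    soft_deadline: float
    hard_deadline: float

    def utility(self, finish):
        """Assumed linear-decay TUF: full value up to soft_deadline,
        decaying linearly to 0 at hard_deadline, 0 afterwards."""
        if finish <= self.soft_deadline:
            return self.value
        if finish >= self.hard_deadline:
            return 0.0
        remaining = (self.hard_deadline - finish) / (self.hard_deadline - self.soft_deadline)
        return self.value * remaining

def accrued_utility(apps, order, processors):
    """Greedy list scheduling: each app, in the given order, starts at the
    earliest time `width` units are free. Returns total accrued utility."""
    free_at = [0.0] * processors          # time at which each unit becomes free
    total = 0.0
    for i in order:
        app = apps[i]
        free_at.sort()
        start = free_at[app.width - 1]    # the `width` earliest-free units are all free by then
        finish = start + app.duration
        free_at[:app.width] = [finish] * app.width
        total += app.utility(finish)
    return total

def best_order(apps, processors):
    """Brute-force baseline over all orders (feasible only for tiny sets)."""
    return max(permutations(range(len(apps))),
               key=lambda order: accrued_utility(apps, order, processors))

apps = [App(width=4, duration=2, value=10, soft_deadline=2, hard_deadline=4),
        App(width=2, duration=2, value=8, soft_deadline=2, hard_deadline=6)]
print(accrued_utility(apps, (0, 1), processors=4))  # 14.0
print(accrued_utility(apps, (1, 0), processors=4))  # 8.0
print(best_order(apps, processors=4))               # (0, 1)
```

Running the wide application first lets both applications accrue value (14.0). The reverse order pushes the wide one past its hard deadline (8.0), an instance of the spatial-temporal interference the abstract describes. Brute force over orderings grows factorially with the number of applications, which is one reason heuristics are used at scale.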
Analysis of cache tuner architectural layouts for multicore embedded systems
Tosiron Adegbija, A. Gordon-Ross, M. Rawlins
Because the memory hierarchy contributes a large share of a microprocessor's total power, cache tuning is an ideal method for optimizing overall power consumption in embedded systems. Since most embedded systems are power- and area-constrained, the hardware and/or software that orchestrates cache tuning (the cache tuner) must not impose significant power and area overhead. Furthermore, embedded systems increasingly trend toward multicore. Inter-core data sharing, communication, and synchronization therefore add cache tuner design complexity and necessitate cross-core coordination of cache tuning. To minimize overhead, cache tuner design must account for both these overheads and scalability. Whereas prior work proposes low-overhead cache tuners, scaling them to multicore systems requires additional considerations. In this work, we present a low-overhead, scalable cache tuner and extensively evaluate various cache tuner design tradeoffs with respect to power and area for constrained multicore embedded systems. Based on our analysis, we formulate valuable insights and guidelines that assist designers in modeling scalable, efficient cache tuners that best achieve optimization goals while satisfying power and area constraints.
DOI: 10.1109/PCCC.2014.7017091 (https://doi.org/10.1109/PCCC.2014.7017091); published 2014-12-01
Citations: 11
Patterns and modeling of group growth in online social networks
J. Niu, Shaluo Huang, Milica Stojmenovic
We investigate group growth in online social networks by analyzing six different user groups (two million users in total) in the Douban network. The size and longevity of posts in the Douban dataset follow power-law distributions with an exponential cutoff and a heavy tail, respectively. The frequency of user interactions follows a two-stage power-law distribution that can distinguish different types of users. In each group, the number of users and the number of posts/replies they generate within the same time period grow exponentially at the initial stage and oscillate dramatically for the rest of the process. Within a period of time, the number of posts/replies has a power-law relation with the number of active users. We propose an empirical growth model, Twisted Growth (TG), to portray the relation between the number of users and the amount of content they generate. The model derives its equations from historical data, which determine the coefficients. It also rests on the assumption that the content in a group attracts new users to join, leading to user growth. The newcomers, together with the original users, then create new content. We validate the TG model through theoretical analysis and simulations on real datasets.
DOI: 10.1109/PCCC.2014.7017058 (https://doi.org/10.1109/PCCC.2014.7017058); published 2014-12-01
Citations: 0
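The power-law relation between posts/replies and active users can be estimated with an ordinary least-squares fit in log-log space. The minimal Python sketch below is a generic estimator, not the TG model itself. The sample data are synthetic, generated from `posts = 3 * users**1.2`, and are not Douban measurements.

```python
import math

def fit_power_law(users, posts):
    """Least-squares fit of posts = c * users**alpha on log-transformed data.
    Returns (c, alpha)."""
    xs = [math.log(u) for u in users]
    ys = [math.log(p) for p in posts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx                       # slope in log-log space = exponent
    c = math.exp(mean_y - alpha * mean_x)   # intercept in log-log space = log(c)
    return c, alpha

users = [10, 100, 1000, 10000]              # synthetic active-user counts
posts = [3 * u ** 1.2 for u in users]       # synthetic data with c = 3, alpha = 1.2
c, alpha = fit_power_law(users, posts)
print(round(c, 3), round(alpha, 3))         # 3.0 1.2
```

On real, noisy counts, a log-log least-squares fit is a quick first estimate. For the distributional claims in the abstract (exponential cutoff, heavy tail), maximum-likelihood fitting is generally preferred.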
A flexible and scalable high-performance OpenFlow switch on heterogeneous SoC platforms
Shijie Zhou, Weirong Jiang, V. Prasanna
Software Defined Networking (SDN) has been proposed as a flexible solution for next-generation Internet provisioning. OpenFlow is a pioneering SDN protocol that enables a hardware data plane to be managed by a software-based controller in a standard way. In this paper, we present a hardware-software co-design of an OpenFlow switch on a state-of-the-art heterogeneous system-on-chip (SoC) platform. Specifically, we implement the OpenFlow switch on a Xilinx Zynq ZC706 board. The Xilinx Zynq SoC family tightly couples field-programmable gate array (FPGA) fabric with ARM processor cores. This makes it an attractive on-chip implementation platform for SDN switches: high-performance yet highly programmable data plane processing can reside in the programmable logic, while complex control software runs on the ARM processor. Our proposed architecture scales across (a) a range of packet throughput rates and (b) a range of flow table sizes. Post-place-and-route results show that our design, targeted at Xilinx Zynq, can achieve a total throughput of 88 Gbps for a 1K-entry flow table that supports dynamic and hitless updates. Correct operation has been demonstrated on a ZC706 board.
DOI: 10.1109/PCCC.2014.7017053 (https://doi.org/10.1109/PCCC.2014.7017053)
Published: 2014-12-01
Citations: 15
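The OpenFlow switch abstract describes a data plane that matches packets against a flow table with dynamic, hitless updates, while control software runs on the ARM cores. The code below is a hedged, purely software sketch. It is not the authors' FPGA pipeline, whose internals are not described here. It models the matching semantics: a priority-ordered flow table in which omitted header fields act as wildcards. Updates are copy-on-write, so a lookup always reads one consistent snapshot and never sees a half-written table. The class names, header field names such as `in_port` and `tcp_dst`, the action strings, and the 1024-entry default capacity (chosen to mirror the 1K table mentioned above) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# A packet is modelled as a dict of header fields; field names are illustrative only.
Packet = Dict[str, int]


@dataclass(frozen=True)
class FlowEntry:
    priority: int
    match: Tuple[Tuple[str, int], ...]  # (field, value) pairs; omitted fields are wildcards
    action: str

    def matches(self, pkt: Packet) -> bool:
        # Every listed field must be present in the packet with exactly this value.
        return all(pkt.get(name) == value for name, value in self.match)


class FlowTable:
    """Priority-ordered flow table with copy-on-write (hitless-style) updates."""

    def __init__(self, capacity: int = 1024) -> None:
        self.capacity = capacity
        self._entries: Tuple[FlowEntry, ...] = ()

    def lookup(self, pkt: Packet) -> Optional[str]:
        # Take one snapshot; a concurrent update rebinds self._entries but cannot
        # modify the tuple this lookup is already scanning.
        entries = self._entries
        # Entries are kept sorted by descending priority, so the first hit wins.
        for entry in entries:
            if entry.matches(pkt):
                return entry.action
        return None  # table miss; handling is policy-dependent (e.g. send to controller or drop)

    def add(self, entry: FlowEntry) -> None:
        if len(self._entries) >= self.capacity:
            raise OverflowError("flow table full")
        # Build the new table off to the side, then publish it with a single
        # reference assignment, so readers see either the old or the new table.
        new_entries = sorted(self._entries + (entry,), key=lambda e: -e.priority)
        self._entries = tuple(new_entries)

    def remove(self, entry: FlowEntry) -> None:
        # Same copy-on-write pattern for deletions.
        self._entries = tuple(e for e in self._entries if e != entry)
```

A hardware data plane could replace the linear scan with match logic that compares all entries in parallel, for example in a TCAM-style structure. The scan here only illustrates which entry should win.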
Journal: 2014 IEEE 33rd International Performance Computing and Communications Conference (IPCCC)