
Latest publications from the 2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools

A Case for Hardware Task Management Support for the StarSS Programming Model
C. Meenderinck, B. Juurlink
StarSS is a parallel programming model that eases the task of the programmer. He or she has to identify the tasks that can potentially be executed in parallel, along with their inputs and outputs, while the runtime system takes care of the difficult issues: determining inter-task dependencies, synchronization, load balancing, scheduling to optimize data locality, etc. Given these responsibilities, however, the runtime system might become a bottleneck that limits the scalability of the system. The contribution of this paper is two-fold. First, we analyze the scalability of the current software runtime system for several synthetic benchmarks with different dependency patterns and task sizes. We show that for fine-grained tasks the system does not scale beyond five cores. Furthermore, we identify the main scalability bottlenecks of the runtime system. Second, we present the design of Nexus, a hardware support system for StarSS applications that greatly reduces the task management overhead.
Citations: 21
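To make the runtime's role concrete, here is a minimal sketch (in Python, with illustrative names; not the actual StarSS or Nexus API) of how inter-task dependencies can be derived from the declared inputs and outputs of each task:

```python
def build_dependencies(tasks):
    """tasks: list of (name, inputs, outputs) tuples in submission order.
    A task depends on the most recent earlier task that produced one of
    its inputs (read-after-write; WAW/WAR hazards ignored for brevity)."""
    last_writer = {}                       # datum -> task that last wrote it
    deps = {name: set() for name, _, _ in tasks}
    for name, inputs, outputs in tasks:
        for datum in inputs:
            if datum in last_writer:
                deps[name].add(last_writer[datum])
        for datum in outputs:
            last_writer[datum] = name
    return deps

deps = build_dependencies([
    ("t1", [], ["a"]),
    ("t2", ["a"], ["b"]),
    ("t3", ["a"], ["c"]),
    ("t4", ["b", "c"], ["d"]),
])
# t2 and t3 both depend only on t1 and can run in parallel; t4 waits for both.
```

This per-task bookkeeping, repeated at high rates for fine-grained tasks, is exactly the overhead that motivates hardware support such as Nexus.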
Low Power FPGA Implementations of 256-bit Luffa Hash Function
P. Kitsos, N. Sklavos, A. Skodras
Low-power techniques for an FPGA implementation of the Luffa hash function are presented in this paper. This hash function is under consideration for adoption as a standard. Two major gate-level techniques are introduced to reduce power consumption: the pipeline technique (with some variants) and the use of embedded RAM blocks instead of general-purpose logic elements. A power consumption reduction of 1.2 to 8.7 times is achieved with the proposed techniques, compared with an implementation without any low-power measures.
Citations: 1
Storage-Aware Value Prediction
M. Salehi, A. Baniasadi
Despite their huge potential, value predictors have not been used in modern processors. This is partially due to the complex structures associated with such predictors. In this paper we study value predictors and investigate solutions that reduce storage requirements while imposing negligible coverage cost. Our solutions build on the observation that conventional value predictors do not utilize storage efficiently, as they allocate too much space for small and frequently appearing values. We measure the data width requirement and entropy in a subset of predictor resources and find that the stored values have limited widths and very small entropy. We exploit this behavior and suggest different bit-sharing solutions for predictors storing single-byte values.
Citations: 1
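The paper's key observation, that stored values have limited width and low entropy, can be illustrated with a small measurement sketch (the value stream below is hypothetical, not the authors' data or tooling):

```python
import math
from collections import Counter

def width_bits(v):
    """Minimum bits needed to represent the non-negative value v."""
    return max(1, v.bit_length())

def entropy(values):
    """Shannon entropy (in bits) of the value distribution."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A stream dominated by small, frequently recurring values -- the behavior
# that makes a full machine word per predictor entry wasteful and makes
# bit sharing attractive.
stream = [0] * 50 + [1] * 30 + [255] * 15 + [70000] * 5
avg_width = sum(width_bits(v) for v in stream) / len(stream)
h = entropy(stream)
```

Here the average required width is under 3 bits and the entropy is well under 2 bits, even though a naive predictor table would reserve a full word per entry.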
Application Dependent FPGA Testing Method
M. Rozkovec, Jiri Jenícek, O. Novák
Application-dependent FPGA testing can reduce time and memory requirements compared with tests that exercise the complete FPGA structure. This paper describes an FPGA testing methodology that does not require reconfiguration of the tested hardware and thus preserves the conditions that caused erroneous behavior of the FPGA during its operation. We show that the tested part of the FPGA can be efficiently tested with deterministic test patterns even if we have no precise information about the internal FPGA structure. Storing uncompressed deterministic test patterns on the FPGA consumes too much hardware. For this reason we propose to compress the deterministic test patterns with the help of COMPAS, a compression system that uses scan chains for pattern decompression. COMPAS is well suited to current FPGAs, as they can store the scan chain content in LUT-based shift registers. The COMPAS test compression system is based on test-pattern overlapping; we propose an improved version of it. Applying overlapped test patterns requires additional shift registers to preserve test patterns while test responses are recorded into the internal scan chains. The neighborhood of the tested part of the FPGA can be dynamically reconfigured into shift registers and an output response analyzer (ORA). The shift registers contain the compressed test sequence and allow fast test-pattern decompression. Experimental results given in the paper demonstrate the efficiency of the proposed FPGA testing method.
Citations: 23
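The idea behind test-pattern overlapping, storing for each pattern only the bits that must be shifted into the scan chain because the rest is already there, can be sketched as follows (an illustrative model of the technique, not the actual COMPAS encoder):

```python
def min_shift(prev, nxt):
    """Smallest number of bits to shift into a scan chain currently
    holding bit string `prev` so that it holds `nxt`. Shifting a bit in
    at the head pushes the old content one position toward the tail."""
    length = len(prev)
    for k in range(length + 1):            # k == length always matches
        if nxt[k:] == prev[:length - k]:
            return k

def compressed_length(patterns):
    """Total bits stored when consecutive patterns are overlapped."""
    total = len(patterns[0])               # first pattern shifted in fully
    for prev, nxt in zip(patterns, patterns[1:]):
        total += min_shift(prev, nxt)
    return total

patterns = ["0110", "1011", "1101"]
# Each successive pattern reuses 3 of the 4 bits already in the chain,
# so only 4 + 1 + 1 = 6 bits are stored instead of 12.
```

Patterns that overlap well thus cost almost nothing to store, which is why pattern ordering matters in such schemes.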
A C-to-RTL Flow as an Energy Efficient Alternative to Embedded Processors in Digital Systems
Sameer D. Sahasrabuddhe, S. Subramanian, Kunal P. Ghosh, K. Arya, M. Desai
We present a high-level synthesis flow for mapping an algorithm description (in C) to a provably equivalent register transfer level (RTL) description of hardware. This flow uses an intermediate representation which is an orthogonal factorization of the program behavior into control, data and memory aspects, and is suitable for the description of large systems. We show that optimizations such as arbiter-less resource sharing can be efficiently computed on this representation. We apply the flow to a wide range of examples ranging from stream ciphers to database and linear algebra applications. The resulting RTL is then put through a standard ASIC tool chain (synthesis followed by automatic place-and-route), and the performance and power dissipation of the resulting layout is computed. We observe that the energy consumption (per completed task) of each resulting circuit is considerably lower than that of an equivalent executable running on a low-power processor, indicating that this C-to-RTL flow offers an energy efficient alternative to the use of embedded processors in mapping algorithms to digital VLSI systems.
Citations: 6
Path-Delay Fault Testing in Embedded Content Addressable Memories
P. Manikandan, Bjørn B. Larsen, E. Aas
Delay faults in content addressable memories (CAMs) are a major concern in many applications, such as network routers, IP filters, longest-prefix-matching (LPM) search engines, and cache tags, where high-speed data search is significant. This creates the need to analyze critical paths and to detect the associated faults using a minimum number of test patterns. This paper proposes a test method to detect critical-path delay faults in CAM systems using a newly proposed low-power TCAM cell structure. The proposed complement bit walk (CBW) algorithms have low time complexity, requiring 3m+n and 2m+2n operations. Fault simulation of the given TCAM system shows 100% fault coverage for write, search, and pseudo-logic faults.
Citations: 1
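One plausible reading of a complement-bit-walk sequence (a hypothetical illustration; the paper's exact CBW patterns for an m-by-n TCAM array may differ) is a set of words, each differing from a base word in a single complemented position:

```python
def complement_bit_walk(word):
    """Walk a complemented bit across `word` (a list of 0/1 values),
    yielding one test word per bit position."""
    for i in range(len(word)):
        yield word[:i] + [1 - word[i]] + word[i + 1:]

base = [0, 1, 1, 0]
walk = list(complement_bit_walk(base))
# Each generated word differs from the base in exactly one position,
# exercising each bit cell / match condition individually.
```

For an n-bit word this yields n patterns, in line with the linear (in m and n) operation counts quoted in the abstract.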
Re-NUCA: Boosting CMP Performance Through Block Replication
P. Foglia, C. Prete, M. Solinas, Giovanna Monni
Chip Multiprocessor (CMP) systems have become the reference architecture for designing microprocessors, thanks to improvements in semiconductor nanotechnology that have continuously provided a growing number of faster and smaller per-chip transistors. Interest in CMPs has grown because classical techniques for boosting performance, e.g. increasing the clock frequency and the amount of work performed per clock cycle, can no longer deliver significant improvements due to energy constraints and wire-delay effects. CMP systems generally adopt a large last-level cache (LLC) (typically L2 or L3) shared among all cores, and private L1 caches. As the miss resolution time for private caches depends on the response time of the LLC, which is wire-delay dominated, performance is affected by wire delay. NUCA caches have been proposed for single- and multi-core systems as a mechanism for tolerating wire-delay effects on overall performance. In this paper, we introduce a novel NUCA architecture, called Re-NUCA, specifically suited for (but not limited to) CMPs in which cores are placed at different sides of the shared cache. The idea is to allow shared blocks to be replicated inside the shared cache, in order to avoid the limitations on performance improvement that arise in classical D-NUCA caches due to the conflict-hit problem. Our results show that Re-NUCA outperforms D-NUCA by more than 5% on average, and for applications that strongly suffer from the conflict-hit problem we observe performance improvements of up to 15%.
Citations: 12
An Approximate Maximum Common Subgraph Algorithm for Large Digital Circuits
J. Rutgers, P. T. Wolkotte, P. Hölzenspies, J. Kuper, G. Smit
This paper presents an approximate Maximum Common Subgraph (MCS) algorithm, specifically for directed, cyclic graphs representing digital circuits. Because of the application domain, the graphs have convenient properties: they are very sparse, have many different labels, and most vertices have only one predecessor. The algorithm iterates over all vertices once and uses heuristics to find the MCS. Its computational complexity is linear in the size of the graph. Experiments show that very large common subgraphs were found within a few minutes in graphs of up to 200,000 vertices, when a quarter or less of the graphs differ. The variation in run time and result quality is low.
Citations: 9
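A heuristic in the spirit of the paper, a single pass over the vertices that matches by label and by already-matched predecessors, can be sketched as follows (an illustrative sketch, not the authors' exact algorithm):

```python
from collections import defaultdict

def approx_mcs(labels1, preds1, labels2, preds2):
    """Greedy one-pass vertex matching between two labeled circuit graphs.
    labelsX maps vertex -> gate label; predsX maps vertex -> predecessor
    list. Returns a partial mapping from graph-1 to graph-2 vertices."""
    by_label = defaultdict(list)           # label -> candidate g2 vertices
    for v, lab in labels2.items():
        by_label[lab].append(v)
    match, used = {}, set()
    for v in labels1:                      # one pass over graph 1
        for w in by_label[labels1[v]]:
            if w in used:
                continue
            # Accept w only if v's already-matched predecessors map onto
            # predecessors of w (keeps the common subgraph consistent).
            mapped = {match[p] for p in preds1[v] if p in match}
            if mapped <= set(preds2[w]):
                match[v] = w
                used.add(w)
                break
    return match

m = approx_mcs(
    {"a": "AND", "b": "OR", "c": "XOR"}, {"a": [], "b": ["a"], "c": ["b"]},
    {"x": "AND", "y": "OR", "z": "NOT"}, {"x": [], "y": ["x"], "z": ["y"]},
)
# a->x and b->y are matched; c finds no XOR gate in the second graph.
```

With sparse graphs and many distinct labels, as the paper assumes, the candidate lists stay short and the pass stays close to linear in the graph size.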
Trading Hardware Overhead for Communication Performance in Mesh-Type Topologies
C. Cornelius, Philipp Gorski, S. Kubisch, D. Timmermann
Several alternative mesh-type topologies have been published for use in Networks-on-Chip. Due to their regularity, mesh-type topologies often serve as a foundation for investigating new ideas or for customizing the topology to application-specific needs. This paper analyzes existing mesh-type topologies and compares their characteristics in terms of communication and implementation costs. Furthermore, this paper proposes BEAM (Border-Enhanced Mesh), a mesh-type topology for Networks-on-Chip. BEAM uses concentration while requiring only low-radix routers. To this end, additional resources are connected to the outer boundaries of a conventional mesh. As a result, overall bandwidth is traded off against hardware overhead. Simulation and synthesis results show that the conventional mesh stands out for its communication performance, whereas clustered and concentrated topologies offer the least hardware overhead. BEAM lies in between and is an option for balancing hardware costs and communication performance.
Citations: 0
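The bandwidth-versus-hardware trade-off can be made concrete with a first-order count of routers and links (a simplified model for illustration; BEAM's border attachment and the resulting router radices are not captured here):

```python
def mesh_cost(rows, cols):
    """Router and bidirectional-link counts for a plain 2D mesh with one
    router per grid point."""
    routers = rows * cols
    links = rows * (cols - 1) + cols * (rows - 1)
    return routers, links

# 16 cores: a plain 4x4 mesh (one core per router) vs. a 2x2 mesh with
# 4 cores concentrated on each router. Concentration slashes router and
# link hardware, but funnels four cores' traffic through each router.
plain = mesh_cost(4, 4)          # (16, 24)
concentrated = mesh_cost(2, 2)   # (4, 4)
```

This is the same trade the paper measures: the plain mesh wins on communication performance, the concentrated variants on hardware overhead, with BEAM positioned between the two.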
A Design Process for Hardware/Software System Co-design and its Application to Designing a Reconfigurable FPGA
F. Moreno, I. López, R. Sanz
This paper addresses the topic of hardware/software system co-design. It develops two points of view. First, it provides a system-theoretical layout of the problem of designing hardware/software systems. This layout enables the designer to proceed systematically in optimizing the trade-off between the desired functionality, available resources, and operating conditions. Second, the paper describes an application of some of these theoretical principles to the design of an embedded automotive system built on a low-cost FPGA.
Citations: 5