2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD): latest papers

Enabling exact delay synthesis

2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203799

L. Amarù, Mathias Soeken, P. Vuillod, Jiong Luo, A. Mishchenko, P. Gaillardon, Janet Olson, R. Brayton, G. Micheli

Given (i) a Boolean function, (ii) a set of arrival times at the inputs, and (iii) a gate library with associated delay values, the exact delay synthesis problem asks for a circuit implementation which minimizes the arrival time at the output(s). The exact delay synthesis problem, with given input arrival times, relates to computing the communication complexity of a Boolean function, which is an intractable problem. Input arrival times are variable and can take any value, thereby making the exact delay synthesis search space infinite. This paper presents theory and algorithms for exact delay synthesis. We introduce the theory of equioptimizable arrival times, which allows us to partition all arrival time patterns into a finite set of equivalence classes. Thanks to this new theory, we create for the first time exact delay circuit databases covering all Boolean functions up to 5 variables and all possible arrival time patterns. We describe further arrival time compression techniques which enable the creation of larger databases. We propose an enhanced delay synthesis flow capable of dealing with large circuits, combining exact delay logic rewriting and Boolean optimization techniques, attaining unprecedented results. We improve 9/10 of the best known results in the EPFL arithmetic delay synthesis competition, outperforming previous best results by up to 3x. Embedded in a commercial EDA flow for ASICs, our exact delay synthesis techniques reduce the total negative slack by 12.17%, after physical implementation, at negligible area and runtime costs.
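The role of input arrival times can be illustrated with a far simpler special case than the one the paper solves. The sketch below is not the authors' algorithm and does not implement their theory of equioptimizable arrival times. It assumes a single n-input AND built from 2-input gates with a fixed delay (the function names and the `gate_delay` parameter are hypothetical) and applies a Huffman-style greedy merge of the two earliest signals, a heuristic commonly used for this special case. It also shows that shifting every arrival time by the same constant shifts the output arrival time by that constant, which is one simple reason why many arrival patterns behave alike.

```python
import heapq


def greedy_tree_arrival(arrivals, gate_delay=1):
    """Output arrival time of an AND tree built by always merging the
    two earliest-arriving signals with one 2-input gate."""
    heap = list(arrivals)
    heapq.heapify(heap)
    while len(heap) > 1:
        first = heapq.heappop(heap)
        second = heapq.heappop(heap)
        # A gate's output is ready one gate delay after its latest input.
        heapq.heappush(heap, max(first, second) + gate_delay)
    return heap[0]


def normalize(arrivals):
    """Shift a pattern so that its earliest input arrives at time 0."""
    base = min(arrivals)
    return tuple(t - base for t in arrivals)


pattern = (5, 5, 5, 8)
shifted = normalize(pattern)  # (0, 0, 0, 3)
print(greedy_tree_arrival(pattern), greedy_tree_arrival(shifted) + min(pattern))
```

With the late input placed next to the output, the pattern (0, 0, 0, 3) finishes at time 4, whereas a balanced tree that pairs the late input with an early one at the first level would finish at time 5.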

Citations: 14

Simultaneous template assignment and layout decomposition using multiple BCP materials in DSA-MP lithography

2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203784

Kuo-Hao Wu, Shao-Yun Fang

In sub-10-nm technology nodes, directed self-assembly with multiple patterning lithography (DSA-MP) is a promising solution for contact/via layer fabrication. However, previous studies using multiple patterning with a single block copolymer (BCP) material still suffer from low via manufacturability because only a limited set of guiding templates is feasible. To mitigate the problem, multiple patterning in combination with two different BCP materials has been proposed, which allows more flexible DSA-compatible pattern matching. In this paper, we present the first work on simultaneous guiding template assignment and layout decomposition with multiple BCP materials for general via layouts in DSA-MP. An optimal integer linear programming (ILP) formulation and a practical and sophisticated heuristic algorithm are proposed. Experimental results indicate that adopting two different BCP materials can greatly reduce the number of conflicts compared with existing work that uses a single BCP material, and that the proposed heuristic method can efficiently obtain good solutions.
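A minimal sketch of the layout decomposition side of the problem, under strong simplifying assumptions: vias are points, two vias conflict when they share a mask and are closer than a hypothetical `min_spacing`, and guiding template assignment and the choice of BCP material are left out entirely. It is a brute-force search, not the paper's ILP formulation or heuristic, and is only practical for a handful of vias.

```python
from itertools import combinations, product


def count_conflicts(vias, masks, min_spacing):
    """Count via pairs on the same mask that are closer than min_spacing."""
    limit = min_spacing ** 2
    return sum(
        1
        for i, j in combinations(range(len(vias)), 2)
        if masks[i] == masks[j]
        and (vias[i][0] - vias[j][0]) ** 2 + (vias[i][1] - vias[j][1]) ** 2 < limit
    )


def best_decomposition(vias, num_masks, min_spacing):
    """Try every mask assignment; return (conflicts, assignment) with the
    fewest conflicts."""
    return min(
        (count_conflicts(vias, masks, min_spacing), masks)
        for masks in product(range(num_masks), repeat=len(vias))
    )


# Three mutually close vias: two masks cannot separate all of them.
vias = [(0, 0), (1, 0), (0, 1)]
print(best_decomposition(vias, num_masks=2, min_spacing=2)[0])
print(best_decomposition(vias, num_masks=3, min_spacing=2)[0])
```

In this toy layout, two masks leave one conflict and three masks leave none. Grouping nearby vias into a shared guiding template, which is what DSA adds to the picture, would be a further way to remove such conflicts.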

Citations: 9

Approximating complex arithmetic circuits with formal error guarantees: 32-bit multipliers accomplished

2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203807

Milan Ceska, Jiří Matyáš, Vojtěch Mrázek, L. Sekanina, Z. Vašíček, Tomáš Vojnar

We present a novel method for approximating complex arithmetic circuits with formal guarantees on the approximation error. The method integrates formal techniques for approximate equivalence checking into a search-based circuit optimisation algorithm. The key idea of our approach is to employ a novel search strategy that drives the search towards promptly verifiable approximate circuits. The method was implemented within the ABC tool and extensively evaluated on functional approximation of multipliers (with up to 32-bit operands) and adders (with up to 128-bit operands). Within a few hours, we constructed a high-quality Pareto set of 32-bit multipliers providing trade-offs between the circuit error and size. This is the first time that such complex approximate circuits with formal error guarantees have been derived, which demonstrates the outstanding performance and scalability of our approach compared with existing methods, which have either been limited to approximating multipliers with 8-bit operands or have relied on statistical testing only. Our approach thus significantly improves on the capabilities of existing methods and paves the way towards an automated design process for provably correct circuit approximations.
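To make the notion of a guaranteed approximation error concrete, the sketch below computes the exact worst-case absolute error of a toy approximate multiplier by exhaustive enumeration. This is precisely the approach that stops scaling well before 32-bit operands, which is why the paper turns to formal techniques instead. The approximate multiplier (it zeroes the low bits of each operand) and all names are illustrative assumptions, not circuits or code from the paper.

```python
def truncated_mul(a, b, drop_bits):
    """Toy approximate multiplier: clear the low drop_bits of each
    operand, then multiply exactly."""
    mask = ~((1 << drop_bits) - 1)
    return (a & mask) * (b & mask)


def worst_case_error(width, drop_bits):
    """Exact worst-case absolute error over all operand pairs of the
    given bit width (exhaustive, so only feasible for small widths)."""
    limit = 1 << width
    return max(
        a * b - truncated_mul(a, b, drop_bits)
        for a in range(limit)
        for b in range(limit)
    )


print(worst_case_error(width=4, drop_bits=1))
```

For 4-bit operands with one dropped bit, the worst case is 29, reached at 15 x 15 (225 exact versus 14 x 14 = 196). The loop visits 2^(2 x width) operand pairs, so a 32-bit multiplier would need about 1.8 x 10^19 evaluations.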

Citations: 46

Model and integrate medical resource availability into verifiably correct executable medical guidelines

2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203885

Chunhui Guo, Zhicheng Fu, Zhenyu Zhang, Shangping Ren, L. Sha

Improving the effectiveness and safety of patient care is the ultimate objective of medical cyber-physical systems. A recent study shows that the patient death rate can be reduced by computerizing medical guidelines [20]. Most existing medical guideline models are validated and/or verified under the assumption that all medical resources needed for patient care are always available. In reality, however, some medical resources, such as special medical equipment or medical specialists, can be temporarily unavailable for an individual patient. In such cases, safety properties that were validated and/or verified in existing medical guideline models without considering medical resource availability may no longer hold.

Citations: 1

Rapid gate sizing with fewer iterations of Lagrangian Relaxation

2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203797

A. Sharma, D. Chinnery, S. Dhamdhere, C. Chu

Existing Lagrangian Relaxation (LR) based gate sizers take many iterations to converge to a competitive solution. In this paper, we propose a novel LR based gate sizer which dramatically reduces the number of iterations while achieving a similar reduction in leakage power and meeting the timing constraints. The decrease in the iteration count is enabled by an elegant Lagrange multiplier update strategy for rapid coarse-grained optimization as well as finer-grained timing and power recovery techniques, which allow the coarse-grained optimization to terminate early without compromising the solution quality. Since LR iterations dominate the total runtime, our gate sizer achieves an average speedup of 2.5x in runtime and saves 1% more power compared to the previous fastest work.
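For readers unfamiliar with LR-based sizing, the following toy shows the basic loop that such sizers iterate. It is a textbook projected-subgradient update on a single path with one delay constraint, not the multiplier update strategy proposed in the paper. The `(delay, power)` option tables, the step size and the function name are all hypothetical.

```python
def lr_gate_sizing(options, target_delay, iterations=50, step=0.5):
    """options[g] is a list of (delay, power) choices for gate g on one path.

    Returns (power, delay, choice) for the best feasible sizing seen,
    or None if no iteration met the delay target.
    """
    lam = 0.0  # Lagrange multiplier on the path-delay constraint
    best = None
    for k in range(iterations):
        # For a fixed multiplier the relaxed problem separates per gate:
        # each gate minimizes power + lam * delay on its own.
        choice = [
            min(range(len(opts)), key=lambda i: opts[i][1] + lam * opts[i][0])
            for opts in options
        ]
        delay = sum(options[g][c][0] for g, c in enumerate(choice))
        power = sum(options[g][c][1] for g, c in enumerate(choice))
        if delay <= target_delay and (best is None or power < best[0]):
            best = (power, delay, choice)
        # Projected subgradient step with a diminishing step size.
        lam = max(0.0, lam + step / (k + 1) * (delay - target_delay))
    return best


gate = [(4, 1), (2, 3), (1, 6)]  # slower sizes use less power
print(lr_gate_sizing([gate, gate], target_delay=4))
```

Each pass re-sizes every gate, which in a real sizer also means re-running timing analysis. That per-iteration cost is why the abstract stresses that the iteration count dominates total runtime.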

Citations: 11

Clock-aware placement for large-scale heterogeneous FPGAs

2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203821

Yun-Chih Kuo, Chau-Chin Huang, Shih-Chun Chen, Chun-Han Chiang, Yao-Wen Chang, S. Kuo

A modern FPGA often contains an ASIC-like clocking architecture which is crucial to achieve better skew and performance. Existing conventional FPGA placement algorithms seldom consider clocking resources, and thus may lead to clock routing failures. To address the special FPGA clocking architecture, this paper presents a novel clock-aware placement algorithm for large-scale heterogeneous FPGAs. Our algorithm consists of three major stages: (1) a nonlinear global placement framework with clock fence region construction, (2) a clock-aware packing scheme, and (3) clock-aware legalization and detailed placement. We evaluate our results based on the 2017 ISPD Clock-Aware Placement Contest benchmark suite. Compared with the top three winners, the results show that our algorithm achieves the best overall routed wirelength. On average, our algorithm outperforms the top-3 winners by 3.6%, 7.5%, and 12.9% in routed wirelength, respectively.

Citations: 20

ATRIUM: Runtime attestation resilient under memory attacks

2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203803

Shaza Zeitouni, Ghada Dessouky, Orlando Arias, Dean Sullivan, Ahmad Ibrahim, Yier Jin, A. Sadeghi

Remote attestation is an important security service that allows a trusted party (verifier) to verify the integrity of software running on a remote and potentially compromised device (prover). The security of existing remote attestation schemes relies on the assumption that attacks are software-only and that the prover's code cannot be modified at runtime. In practice, however, these schemes can be bypassed under a stronger and more realistic adversary model in which the adversary can control and modify code memory so that benign code is attested but malicious code is executed instead, leaving the underlying system vulnerable to Time of Check Time of Use (TOCTOU) attacks. In this work, we first demonstrate TOCTOU attacks on recently proposed attestation schemes by exploiting physical access to the prover's memory. Then we present the design and proof-of-concept implementation of ATRIUM, a runtime remote attestation system that securely attests both the code's binary and its execution behavior under memory attacks. ATRIUM provides resilience against both software- and hardware-based TOCTOU attacks, while incurring minimal area and performance overhead.
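The TOCTOU gap described above can be shown with a deliberately simplified model. Everything in this sketch is hypothetical (the `Prover` class, the byte strings standing in for code memory, and the use of SHA-256 as the measurement). It models neither ATRIUM nor any specific attestation scheme, only the fact that a measurement taken at one moment says nothing about what is fetched and executed later.

```python
import hashlib


def measure(code: bytes) -> str:
    """Measurement of a code image, as a verifier would expect it."""
    return hashlib.sha256(code).hexdigest()


class Prover:
    """Toy device whose code memory an adversary can rewrite."""

    def __init__(self, code: bytes):
        self.code_memory = code

    def attest(self) -> str:
        # Time of check: hash whatever is in code memory right now.
        return measure(self.code_memory)

    def execute(self) -> bytes:
        # Time of use: run whatever is in code memory right now.
        return self.code_memory


def toctou_demo():
    benign, malicious = b"benign firmware", b"malicious firmware"
    prover = Prover(benign)
    report_ok = prover.attest() == measure(benign)  # attestation passes
    prover.code_memory = malicious  # adversary swaps memory after the check
    ran_malicious = prover.execute() == malicious
    return report_ok, ran_malicious


print(toctou_demo())
```

The verifier accepts the report, yet the device runs different code. Closing that gap requires tying the measurement to what is actually executed, which is the direction the abstract describes for ATRIUM.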

Citations: 78

Energy efficient runtime approximate computing on data flow graphs

2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203876

Mingze Gao, G. Qu

Approximate computing is an emerging computation paradigm that utilizes many applications' intrinsic error resilience to improve power and energy efficiency. Several approaches have been proposed to identify the non-critical computations by analyzing the output sensitivity to the accuracy of the results, and then perform approximate computing on these computations. However, these static approaches use only prior knowledge (e.g., input ranges) for analysis and fail to consider runtime information, which limits the energy saving and incurs large computation errors. In this paper, we propose a runtime approximate computing framework to solve this problem. The basic idea is to use a low-cost method to estimate the impact of each immediate input value on the accuracy of computation at every node in the data flow graph, and then decide whether we should simply use the estimated value or perform an accurate computation. Our novel runtime estimation method is based on converting data to the logarithmic representation. We propose two algorithms that decide, at certain nodes, whether an accurate computation is needed, in order to balance energy saving and computation error. Experimental results show that this tradeoff ranges from 40% energy saving with 4.85% error on average to 8% energy saving with 0.18% error. For the same amount of energy saving, our approach's estimation accuracy is 32x better than that of the static DFG node cutting approach.
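The abstract does not spell out its logarithmic estimation method, so the sketch below substitutes a classic technique of the same flavour, Mitchell's logarithmic multiplication, purely as an illustration. It approximates log2(1 + x) by x for the fractional part, which turns a multiplication into an addition of exponents and fractions. The function name is hypothetical and the code assumes non-negative integers.

```python
def mitchell_multiply(a: int, b: int) -> float:
    """Estimate a * b using Mitchell's approximation log2(1 + x) ~ x."""
    if a == 0 or b == 0:
        return 0.0
    # Write a = 2**k1 * (1 + x1) and b = 2**k2 * (1 + x2) with 0 <= x < 1.
    k1, k2 = a.bit_length() - 1, b.bit_length() - 1
    x1 = a / (1 << k1) - 1.0
    x2 = b / (1 << k2) - 1.0
    s = x1 + x2
    if s < 1.0:
        return float(1 << (k1 + k2)) * (1.0 + s)
    # The fraction sum carries into the exponent.
    return float(1 << (k1 + k2 + 1)) * s


for a, b in [(4, 8), (3, 3), (100, 37)]:
    estimate = mitchell_multiply(a, b)
    print(a, b, estimate, (a * b - estimate) / (a * b))
```

The estimate never exceeds the exact product, and its relative error is at most 1/9 (about 11.1%), reached for example at 3 x 3, where the estimate is 8. A runtime framework of the kind described could compare such a cheap estimate against an error budget before deciding whether to pay for the exact operation. That decision policy is not shown here because the abstract does not specify it.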

Citations: 9

A case for low frequency single cycle multi hop NoCs for energy efficiency and high performance

2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203851

Monodeep Kar, T. Krishna

As the number of cores in a multi-core system increases, network-on-chip (NoC) latency and transmission energy scale unfavorably, since they are directly proportional to the number of hops traversed. Designers often have to trade off energy to get lower latency (for instance, long-distance bypass links with high-radix multi-stage routers) or latency to get lower energy (e.g., scaling down the voltage and frequency of NoC routers and links). This work offers an alternate, previously unexplored design space for latency-energy optimization by harnessing the fact that lower frequency links can actually be used to transmit over longer on-chip distances within a cycle. We leverage a recently proposed micro-architecture that enables the construction of single-cycle multi-hop paths on the fly over a regular mesh network, and augment it with support for dynamic voltage and frequency scaling by decoupling router frequency from link frequency. In essence, we enable packets to traverse only wires from the source to the destination (as if they had a dedicated connection), getting buffered at routers only if necessary (at turns or due to contention). We address the synchronization challenges of multi-hop bypass setup signals in a multi-frequency domain and propose novel static/dynamic router and link frequency assignment techniques. Across synthetic as well as full-system benchmarks, we demonstrate reduced energy with similar or better run-times.
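The observation that a slower clock lets a signal cover more hops per cycle can be checked with back-of-the-envelope arithmetic. The model below is a rough assumption for illustration only: it uses a fixed, hypothetical per-hop wire delay in picoseconds and ignores router pipelines, bypass setup and contention, all of which the paper has to handle.

```python
def hops_per_cycle(period_ps: int, hop_delay_ps: int) -> int:
    """How many hops of wire fit into one clock period (at least one)."""
    return max(1, period_ps // hop_delay_ps)


def traversal_latency_ps(hops, period_ps, hop_delay_ps, multi_hop=True):
    """Wire-only latency of a path, in picoseconds."""
    per_cycle = hops_per_cycle(period_ps, hop_delay_ps) if multi_hop else 1
    cycles = -(-hops // per_cycle)  # ceiling division
    return cycles * period_ps


HOP_DELAY_PS = 100  # assumed wire delay of one hop
for freq_ghz, period_ps in [(2, 500), (1, 1000)]:
    print(
        freq_ghz,
        traversal_latency_ps(10, period_ps, HOP_DELAY_PS, multi_hop=False),
        traversal_latency_ps(10, period_ps, HOP_DELAY_PS),
    )
```

Under these assumptions a 10-hop path takes 5000 ps with one hop per cycle at 2 GHz, but 1000 ps with multi-hop traversal at either 2 GHz (two cycles of five hops) or 1 GHz (one cycle of ten hops). Halving the frequency therefore costs no wire latency in this model, which is the latency-energy opening the abstract points to.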

Citations: 4

Sortex: Efficient timing-driven synthesis of reconfigurable flow-based biochips for scalable single-cell screening

2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203835

Mohamed Ibrahim, Aditya Sridhar, K. Chakrabarty, Ulf Schlichtmann

Single-cell screening is used to sort a stream of cells into clusters (or types) based on pre-specified biomarkers, thus supporting type-driven biochemical analysis. Reconfigurable flow-based microfluidic biochips (RFBs) can be utilized to screen hundreds of heterogeneous cells within a few minutes, but they are overburdened with the control of a large number of valves. To address this problem, we present a pin-constrained RFB design methodology for single-cell screening. The proposed design is analyzed using computational fluid dynamics simulations, mapped to an RC-lumped model, and combined with a high-level synthesis framework, referred to as Sortex. Simulation results show that Sortex significantly reduces the number of control pins and fulfills the timing requirements of single-cell screening.

Citations: 7
