
Proceedings of the 25th edition on Great Lakes Symposium on VLSI: Latest Publications

Untrusted Third Party Digital IP Cores: Power-Delay Trade-off Driven Exploration of Hardware Trojan Secured Datapath during High Level Synthesis
Pub Date : 2015-05-20 DOI: 10.1145/2742060.2742061
A. Sengupta, Saumya Bhadauria
An evolutionary algorithm (EA) driven design space exploration (DSE) of an optimized hardware Trojan secured datapath, based on user power-delay constraints during high level synthesis (HLS), is presented. Hardware Trojan secured datapath generation during HLS has received very little attention, with no effort so far on design space exploration of a Trojan secured datapath optimized under user multi-objective (MO) constraints. The problem deserves attention because producing a Trojan secured datapath is not trivial. Trojan detection alone is less straightforward than concurrent error detection (CED) of transient faults, since it relies on multiple third party intellectual property (3PIP) vendors to facilitate detection, let alone the exploration of a user-optimized Trojan secured datapath under MO constraints. The proposed DSE for hardware Trojan detection includes a novel problem encoding technique that enables exploration of both efficient distinct-vendor allocation and an optimized Trojan secured datapath structure. The exploration backbone of the proposed approach is the bacterial foraging optimization algorithm (BFOA), which is known for its adaptive feature (tumbling/swimming) and simplified model. Comparison with a recent approach indicated an average improvement in quality of results (QoR) of >14.1%.
Citations: 21
Layout Characterization and Power Density Analysis for Shorted-Gate and Independent-Gate 7nm FinFET Standard Cells
Pub Date : 2015-05-20 DOI: 10.1145/2742060.2742093
Tiansong Cui, Bowen Chen, Yanzhi Wang, Shahin Nazarian, Massoud Pedram
In this paper, a power density analysis is presented for the 7nm FinFET technology node based on both shorted-gate (SG) and independent-gate (IG) standard cells operating in multiple supply voltage regimes. A Liberty-formatted standard cell library is established by selecting the appropriate number of fins for the pull-up and pull-down networks of each logic cell. The layouts of both shorted-gate and independent-gate standard cells are then characterized according to lambda-based layout design rules for FinFET devices. Finally, the power density of the 7nm FinFET technology node is analyzed and compared with the 45nm CMOS technology node for different circuits. Experimental results show that the power density of each 7nm FinFET circuit is 3-20 times larger than that of the 45nm CMOS circuit under the spacer-defined technology. Experimental results also show that the back-gate signal enables better control of power consumption for independent-gate FinFETs.
Citations: 5
Exploiting the Expressive Power of Graphene Reconfigurable Gates via Post-Synthesis Optimization
Pub Date : 2015-05-20 DOI: 10.1145/2742060.2742098
S. Miryala, V. Tenace, A. Calimera, E. Macii, M. Poncino, L. Amarù, G. Micheli, P. Gaillardon
In answer to new electronics market demands, the semiconductor industry is looking for different materials, new process technologies and alternative design solutions that can support Silicon replacement in the VLSI domain. The recent introduction of graphene, together with the option of electrostatically controlling its doping profile, has shown a possible way to implement fast and power efficient Reconfigurable Gates (RGs). Also, and this is the most important feature considered in this work, those graphene RGs show higher expressive power, i.e., they implement more complex functions, like Majority, MUX, XOR, with less area than their CMOS counterparts. Unfortunately, state-of-the-art synthesis tools, which have been customized for standard NAND/NOR CMOS gates, do not exploit this feature of graphene RGs. In this paper, we present a post-synthesis tool that translates the gate level netlist obtained from commercial synthesis tools into a more optimized netlist that can efficiently integrate graphene RGs. Experiments conducted on a set of open-source benchmarks demonstrate that the proposed strategy improves, on average, area and performance by 17% and 8.17% respectively.
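The expressive-power argument can be illustrated with a small truth-table check. The sketch below is an illustration only, not the paper's post-synthesis tool: it compares a three-input Majority function, which the abstract says a single graphene RG can implement, with a textbook NAND-only realization that takes four gates. All function names are hypothetical, and the four-gate count describes this particular NAND mapping rather than any figure reported by the authors.

```python
from itertools import product

def nand(*inputs):
    """N-input NAND on 0/1 values."""
    return 0 if all(inputs) else 1

def majority(a, b, c):
    """Reference three-input Majority: 1 when at least two inputs are 1."""
    return 1 if a + b + c >= 2 else 0

def majority_from_nands(a, b, c):
    """NAND-only Majority: three 2-input NANDs feeding one 3-input NAND."""
    return nand(nand(a, b), nand(b, c), nand(a, c))

def equivalent(f, g, n_inputs):
    """Exhaustively compare two Boolean functions over all input combinations."""
    return all(f(*bits) == g(*bits) for bits in product((0, 1), repeat=n_inputs))

if __name__ == "__main__":
    # A netlist rewriter could only replace the four-NAND cone with one
    # Majority-capable gate if the two are functionally equivalent.
    print(equivalent(majority, majority_from_nands, 3))
```

An exhaustive check like this is practical only for small cones of logic; it is shown here because it makes the "one gate versus several" comparison concrete.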
Citations: 5
Delay, Power and Energy Tradeoffs in Deep Voltage-scaled FPGAs
Pub Date : 2015-05-20 DOI: 10.1145/2742060.2742120
M. Abusultan, S. Khatri
In this paper, we present a circuit-level analysis of deep voltage-scaled FPGAs, which operate from full supply to sub-threshold voltages. The logic as well as the interconnect of the FPGA are modeled at the circuit level, and their relative contributions to the delay, power and energy of the FPGA are studied by means of circuit simulations. Three representative designs are studied to explore these design trade-offs. We conclude that the energy and delay-minimal FPGA design is one in which both the interconnect and logic are prevented from scaling below a fixed voltage (about 550mV in our experiments). If power is a more important design factor (at the cost of delay), it is beneficial to operate both the logic and interconnect between 300mV and 800mV.
Citations: 1
Statistically Validating the Impact of Process Variations on Analog and Mixed Signal Designs
Pub Date : 2015-05-20 DOI: 10.1145/2742060.2742122
Ibtissem Seghaier, M. Zaki, S. Tahar
Process variation poses a practical challenge to the performance of analog and mixed signal (AMS) circuits. This paper proposes a Monte Carlo-Jackknife (MC-JK) technique, a variant of the Monte Carlo method, to verify how process variation affects the performance and functionality of AMS designs. We use a behavioral model that incorporates device variation due to the 65 nm technology process. Next, we conduct hypothesis testing based on the MC-JK technique combined with Latin hypercube sampling in a statistical run-time verification environment. Experimental results demonstrate the robustness of our approach in verifying AMS circuits.
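The two statistical ingredients named in this abstract, Latin hypercube sampling and the jackknife, can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' MC-JK procedure: `toy_gain` is a hypothetical behavioral model, the parameter spreads are invented, and the 1.96 factor assumes an approximately normal estimator.

```python
import random
import statistics

def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims with one point per stratum per dimension."""
    columns = []
    for _ in range(n_dims):
        # One uniform draw inside each of the n_samples equal-width strata.
        col = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(col)  # decouple the strata across dimensions
        columns.append(col)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]

def jackknife_mean(values):
    """Leave-one-out jackknife estimate of the mean and its standard error."""
    n = len(values)
    total = sum(values)
    leave_one_out = [(total - v) / (n - 1) for v in values]
    estimate = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((m - estimate) ** 2 for m in leave_one_out)
    return estimate, variance ** 0.5

def toy_gain(u):
    """Hypothetical behavioral model: nominal gain of 20 perturbed by two parameters."""
    d_first = 0.10 * (u[0] - 0.5)   # assumed +/-5 % spread
    d_second = 0.06 * (u[1] - 0.5)  # assumed +/-3 % spread
    return 20.0 * (1.0 - d_first) * (1.0 + d_second)

if __name__ == "__main__":
    rng = random.Random(0)
    gains = [toy_gain(u) for u in latin_hypercube(200, 2, rng)]
    est, se = jackknife_mean(gains)
    # Approximate 95 % interval; a spec limit outside it would be rejected.
    print(f"mean gain {est:.3f}, interval [{est - 1.96 * se:.3f}, {est + 1.96 * se:.3f}]")
```

For the mean, the jackknife reproduces the ordinary sample mean and standard error; its value in practice lies in applying the same leave-one-out recipe to statistics that have no closed-form error estimate.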
Citations: 4
Reconfigurable: Self Adaptive Fault Tolerant Cache Memory for DVS enabled Systems
Pub Date : 2015-05-20 DOI: 10.1145/2742060.2742091
Michail Mavropoulos, G. Keramidas, Grigorios Adamopoulos, D. Nikolos
Processor caches play a critical role in the performance of today's computer systems. As technology scales, manufacturing defects and process variations are expected to leave a large number of cells in a cache faulty. The number of faulty cells varies from die to die and, in the field, depends on the operating conditions (e.g., supply voltage, frequency). Several techniques have been proposed to tolerate faults in caches. A drawback of the redundancy based techniques is that the amount of redundancy is decided at design time, targeting a maximum number of faults, so when the number of faults is small (e.g., at the nominal supply voltage in a system with DVS) only part of the redundant resources is used. In this paper we propose a new reconfigurable, self-adaptive fault tolerant cache scheme. The unique characteristic of our scheme is that it uses its resources both to reduce the misses caused by the faulty blocks and to reduce conflict misses, depending on the number of faults, their distribution in the cache, and the running application. Our experimental results for a wide range of scientific applications and a plethora of fault maps with different SRAM failure probabilities reveal that our proposal can achieve significant benefits.
Citations: 2
Origami: A Convolutional Network Accelerator
Pub Date : 2015-05-20 DOI: 10.1145/2742060.2743766
L. Cavigelli, David Gschwend, Christoph Mayer, Samuel Willi, Beat Muheim, L. Benini
Today, advanced computer vision (CV) systems of ever increasing complexity are being deployed in a growing number of application scenarios with strong real-time and power constraints. Current trends in CV clearly show a rise of neural network-based algorithms, which have recently broken many object detection and localization records. These approaches are very flexible and can be used to tackle many different challenges by only changing their parameters. In this paper, we present the first convolutional network accelerator which is scalable to network sizes that are currently only handled by workstation GPUs, but remains within the power envelope of embedded systems. The architecture has been implemented on 3.09 mm² core area in UMC 65 nm technology and is capable of a throughput of 274 GOp/s at 369 GOp/s/W with an external memory bandwidth of just 525 MB/s full-duplex, a decrease of more than 90% from previous work.
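The figures quoted in this abstract imply a few derived quantities that help place the design within an embedded power envelope. The arithmetic below is a sketch under two assumptions that the abstract does not confirm: that the throughput and energy-efficiency figures refer to the same operating point, and that 1 MB means 10^6 bytes. The function names are illustrative.

```python
def implied_power_w(throughput_gops, efficiency_gops_per_w):
    """Power implied by a throughput and an energy efficiency at the same operating point."""
    return throughput_gops / efficiency_gops_per_w

def ops_per_external_byte(throughput_gops, bandwidth_mb_per_s):
    """Operations performed per byte moved over the quoted external bandwidth."""
    return (throughput_gops * 1e9) / (bandwidth_mb_per_s * 1e6)

def area_efficiency(throughput_gops, area_mm2):
    """Throughput per unit of core area, in GOp/s per mm^2."""
    return throughput_gops / area_mm2

if __name__ == "__main__":
    print(f"implied power: {implied_power_w(274, 369):.3f} W")              # about 0.743 W
    print(f"ops per byte:  {ops_per_external_byte(274, 525):.0f}")          # about 522
    print(f"area eff.:     {area_efficiency(274, 3.09):.1f} GOp/s/mm^2")    # about 88.7
```

Roughly 520 operations per external byte indicates heavy on-chip data reuse, which is consistent with the abstract's emphasis on low memory bandwidth.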
Citations: 149
Design Automation for Biological Models: A Pipeline that Incorporates Spatial and Molecular Complexity
Pub Date : 2015-05-20 DOI: 10.1145/2742060.2743763
Devin P. Sullivan, Rohan Arepally, R. Murphy, J. Tapia, J. Faeder, M. Dittrich, J. Czech
Understanding the dynamics of biochemical networks is a major goal of systems biology. Due to the heterogeneity of cells and the low copy numbers of key molecules, spatially resolved approaches are required to fully understand and model these systems. Until recently, most spatial modeling was performed using geometries obtained either through manual segmentation or manual fabrication, both of which are time-consuming and tedious. Similarly, the system of reactions associated with the model had to be manually defined, a process that is both tedious and error-prone for large networks. As a result, spatially resolved simulations have typically been performed only in a limited number of geometries, which are often highly simplified, and with small reaction networks.
Citations: 6
A Multilayered Design Approach for Efficient Hybrid 3D Photonics Network-on-chip
Pub Date : 2015-05-20 DOI: 10.1145/2742060.2742083
Dharanidhar Dang, B. Patra, R. Mahapatra
In Chip Multiprocessors, traditional metallic interconnects will soon reach their bandwidth and energy dissipation limits. Photonic NoC (PNoC) is a promising alternative for sustaining higher performance as the number of cores on chip rises. Efficient PNoC architectures are needed to reduce laser related energy consumption and maintain high performance. In this work we propose a novel sandwich layered approach to design a 3D PNoC architecture that is able to reduce the number of hops, crossover points, and laser sources using multiplexing techniques. The 3D hybrid PNoC uses high performance 5x5 photonic routers incorporating mode division multiplexing (MDM) along with wavelength division multiplexing (WDM) and time division multiplexing (TDM). Experimental results demonstrate an increase in aggregated bandwidth of up to 4x while reducing average energy consumption per router by 83% as compared to recently reported results.
Citations: 6
Dynamic Power Reduction Techniques in On-Chip Photonic Interconnects
Pub Date : 2015-05-20 DOI: 10.1145/2742060.2742118
B. Neel, M. Kennedy, Avinash Karanth Kodi
Photonic interconnects are a disruptive technology solution that can overcome the power and bandwidth limitations of traditional electrical Networks-on-Chip (NoCs). However, the static power dissipated in the external laser may limit the performance of future optical NoCs by dominating the stringent network power budget. From the analysis of real benchmarks for multi-cores, it is observed that high static power is consumed by the external laser even at low channel utilization. In this paper, we propose runtime power management techniques to reduce laser power consumption by tuning the network in response to actual application characteristics. We scale the number of channels available for communication based on link and buffer utilization. Results on synthetic and real traffic (PARSEC, Splash-2) for 64 cores indicate that our proposed power scaling technique can reduce optical power by about 70% with less than 1% throughput penalty for real traffic.
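Scaling the channel count from link and buffer utilization can be pictured as a simple threshold controller. The sketch below is a hypothetical policy written for illustration, not the authors' algorithm: the thresholds, the channel limits, the one-step adjustment, and the assumption that laser power scales linearly with the number of active channels are all invented here.

```python
def next_channel_count(active, link_util, buffer_util,
                       min_channels=1, max_channels=8, low=0.25, high=0.75):
    """Pick the channel count for the next interval from utilizations in [0, 1]."""
    if not min_channels <= active <= max_channels:
        raise ValueError("active channel count out of range")
    # Use the more congested of the two indicators as the load estimate.
    load = max(link_util, buffer_util)
    if load > high:
        return min(active + 1, max_channels)  # add a channel under pressure
    if load < low:
        return max(active - 1, min_channels)  # drop a channel when idle
    return active                             # hold inside the hysteresis band

def relative_laser_power(active, max_channels=8):
    """Laser power relative to all channels on, assuming linear scaling (an assumption)."""
    return active / max_channels

if __name__ == "__main__":
    channels = 8
    for link, buf in [(0.10, 0.05), (0.12, 0.08), (0.20, 0.30), (0.90, 0.40)]:
        channels = next_channel_count(channels, link, buf)
        print(channels, relative_laser_power(channels))
```

The gap between the `low` and `high` thresholds keeps the controller from oscillating when the load sits near a single cutoff.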
{"title":"Dynamic Power Reduction Techniques in On-Chip Photonic Interconnects","authors":"B. Neel, M. Kennedy, Avinash Karanth Kodi","doi":"10.1145/2742060.2742118","DOIUrl":"https://doi.org/10.1145/2742060.2742118","url":null,"abstract":"Photonic interconnects is a disruptive technology solution that can overcome the power and bandwidth limitations of traditional electrical Network-on-Chips (NoCs). However, the static power dissipated in the external laser may limit the performance of future optical NoCs by dominating the stringent network power budget. From the analysis of real benchmarks for multi-cores, it is observed that high static power is consumed due to the external laser even for low channel utilization. In this paper, we propose runtime power management techniques to reduce the magnitude of laser power consumption by tuning the network in response to actual application characteristics. We scale the number of channels available for communication based on link and buffer utilization. The performance on synthetic and real traffic (PARSEC, Splash-2) for 64-cores indicate that our proposed power scaling technique can reduce optical power by about 70% with less than 1% throughput penalty for real traffic.","PeriodicalId":255133,"journal":{"name":"Proceedings of the 25th edition on Great Lakes Symposium on VLSI","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129371173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
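The abstract above says only that the number of channels is scaled according to link and buffer utilization; it gives no algorithm. As a rough illustration of what such a rule could look like, the following is a minimal sketch and not the authors' method. The class `ScalingPolicy`, the function `next_channel_count`, the per-window measurement model, and every threshold value are hypothetical assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical thresholds; none of these values come from the paper."""
    min_channels: int = 1
    max_channels: int = 8
    scale_up_link: float = 0.75      # link utilization above this adds a channel
    scale_up_buffer: float = 0.50    # buffer occupancy above this adds a channel
    scale_down_link: float = 0.25    # both values must fall below the
    scale_down_buffer: float = 0.10  # scale-down thresholds to drop a channel


def next_channel_count(active: int, link_util: float, buffer_util: float,
                       policy: ScalingPolicy = ScalingPolicy()) -> int:
    """Pick the channel count for the next window from this window's utilization.

    link_util and buffer_util are fractions in [0, 1] measured over one
    (assumed) sampling window.
    """
    if not (0.0 <= link_util <= 1.0 and 0.0 <= buffer_util <= 1.0):
        raise ValueError("utilization values must lie in [0, 1]")
    if link_util > policy.scale_up_link or buffer_util > policy.scale_up_buffer:
        active += 1  # congestion on either metric: enable one more channel
    elif (link_util < policy.scale_down_link
          and buffer_util < policy.scale_down_buffer):
        active -= 1  # both metrics idle: disable one channel to save laser power
    # Clamp to the range of channels the (assumed) hardware can provide.
    return max(policy.min_channels, min(policy.max_channels, active))


if __name__ == "__main__":
    channels = 4
    for link, buf in [(0.9, 0.6), (0.1, 0.05), (0.1, 0.05), (0.5, 0.2)]:
        channels = next_channel_count(channels, link, buf)
        print(channels)
```

Scaling up when either metric is high but scaling down only when both are low is a design choice of this sketch: it biases the controller toward preserving throughput, in the spirit of the small throughput penalty the abstract reports.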