首页 > 最新文献

Proceedings of the 56th Annual Design Automation Conference 2019最新文献

英文 中文
Software Approaches for In-time Resilience 实时弹性的软件方法
Pub Date : 2019-06-02 DOI: 10.1145/3316781.3323487
Aviral Shrivastava, Moslem Didehban
Advances in semiconductor technology have enabled unprecedented growth in safety-critical applications. However, due to unabated scaling, the unreliability of the underlying hardware is only getting worse. For a lot of applications, just recovering from errors is not enough -- the latency between the occurrence of the fault to it's detection and recovery from the fault, i.e., in-time error resilience is of vital importance. This is especially true for real-time applications, where the timing of application events is a crucial part of the correctness of application. While software techniques for resilience are highly desirable since they can be flexibly applied, but achieving reliable, in-time software resilience is still an elusive goal. A new class of recent techniques have started to tackle this problem. This paper presents a succinct overview of existing software resilience techniques from the point-of-view of in-time resilience, and points out future challenges.
半导体技术的进步使安全关键应用实现了前所未有的增长。然而,由于不减的扩展,底层硬件的不可靠性只会变得更糟。对于很多应用来说,仅仅从错误中恢复是不够的——从故障发生到检测到从故障中恢复之间的延迟,即及时的错误恢复能力是至关重要的。对于实时应用程序尤其如此,其中应用程序事件的定时是应用程序正确性的关键部分。虽然弹性的软件技术是非常可取的,因为它们可以灵活地应用,但是实现可靠的、及时的软件弹性仍然是一个难以捉摸的目标。最近一类新的技术已经开始解决这个问题。本文从实时弹性的角度简要概述了现有的软件弹性技术,并指出了未来的挑战。
{"title":"Software Approaches for In-time Resilience","authors":"Aviral Shrivastava, Moslem Didehban","doi":"10.1145/3316781.3323487","DOIUrl":"https://doi.org/10.1145/3316781.3323487","url":null,"abstract":"Advances in semiconductor technology have enabled unprecedented growth in safety-critical applications. However, due to unabated scaling, the unreliability of the underlying hardware is only getting worse. For a lot of applications, just recovering from errors is not enough -- the latency between the occurrence of the fault to it's detection and recovery from the fault, i.e., in-time error resilience is of vital importance. This is especially true for real-time applications, where the timing of application events is a crucial part of the correctness of application. While software techniques for resilience are highly desirable since they can be flexibly applied, but achieving reliable, in-time software resilience is still an elusive goal. A new class of recent techniques have started to tackle this problem. This paper presents a succinct overview of existing software resilience techniques from the point-of-view of in-time resilience, and points out future challenges.","PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"126 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122251621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
ALIGN 对齐
Pub Date : 2019-06-02 DOI: 10.1145/3316781.3323471
K. Kunal, Meghna Madhusudan, Arvind Sharma, Wenbin Xu, S. Burns, R. Harjani, Jiang Hu, D. Kirkpatrick, Sachin S. Sapatnekar
This paper presents analog layout automation efforts under the ALIGN (“Analog Layout, Intelligently Generated from Netlists”) project for fast layout generation using a modular approach based on a mix of algorithmic and machine learning-based tools. The road to rapid turnaround is based on an approach that detects structure and hierarchy in the input netlist and uses a grid based philosophy for layout. The paper provides a view of the current status of the project, challenges in developing open-source code with an academic/industry team, and nuts-and-bolts issues such as working with abstracted PDKs, navigating the “wall” between secured IP and open-source software, and securing access to example designs.
{"title":"ALIGN","authors":"K. Kunal, Meghna Madhusudan, Arvind Sharma, Wenbin Xu, S. Burns, R. Harjani, Jiang Hu, D. Kirkpatrick, Sachin S. Sapatnekar","doi":"10.1145/3316781.3323471","DOIUrl":"https://doi.org/10.1145/3316781.3323471","url":null,"abstract":"This paper presents analog layout automation efforts under the ALIGN (“Analog Layout, Intelligently Generated from Netlists”) project for fast layout generation using a modular approach based on a mix of algorithmic and machine learning-based tools. The road to rapid turnaround is based on an approach that detects structure and hierarchy in the input netlist and uses a grid based philosophy for layout. The paper provides a view of the current status of the project, challenges in developing open-source code with an academic/industry team, and nuts-and-bolts issues such as working with abstracted PDKs, navigating the “wall” between secured IP and open-source software, and securing access to example designs.","PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130328906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 55
QURE
Pub Date : 2019-06-02 DOI: 10.1145/3316781.3317888
Abdullah Ash-Saki, M. Alam, Swaroop Ghosh
Concerted efforts by the academia and the industries e.g., IBM, Google and Intel have brought us to the era of Noisy Intermediate-Scale Quantum (NISQ) computers. Qubits, the basic elements of quantum computer, have been proven extremely susceptible to different noises. Recent experiments have exhibited spatial variations among the qubits in NISQ hardware. Therefore, conventional mapping of qubit done without quality awareness results in significant loss of fidelity for a given workload. In this paper, we have analyzed the effects of various noise sources on the overall fidelity of the given workload for a real NISQ hardware. We have also presented novel optimization technique namely, Qubit Re-allocation (QURE) to maximize the sequence fidelity of a given workload. QURE is scalable and can be applied to future large scale quantum computers. QURE can improve the fidelity of a quantum workload up to 1.54X (1.39X on average) in simulation and up to 1.7X in real device compared to variation oblivious qubit allocation without incurring any physical overhead. CCS CONCEPTS • Hardware → Quantum error correction and fault tolerance;
{"title":"QURE","authors":"Abdullah Ash-Saki, M. Alam, Swaroop Ghosh","doi":"10.1145/3316781.3317888","DOIUrl":"https://doi.org/10.1145/3316781.3317888","url":null,"abstract":"Concerted efforts by the academia and the industries e.g., IBM, Google and Intel have brought us to the era of Noisy Intermediate-Scale Quantum (NISQ) computers. Qubits, the basic elements of quantum computer, have been proven extremely susceptible to different noises. Recent experiments have exhibited spatial variations among the qubits in NISQ hardware. Therefore, conventional mapping of qubit done without quality awareness results in significant loss of fidelity for a given workload. In this paper, we have analyzed the effects of various noise sources on the overall fidelity of the given workload for a real NISQ hardware. We have also presented novel optimization technique namely, Qubit Re-allocation (QURE) to maximize the sequence fidelity of a given workload. QURE is scalable and can be applied to future large scale quantum computers. QURE can improve the fidelity of a quantum workload up to 1.54X (1.39X on average) in simulation and up to 1.7X in real device compared to variation oblivious qubit allocation without incurring any physical overhead. CCS CONCEPTS • Hardware → Quantum error correction and fault tolerance;","PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"130 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116034989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
ApproxLP
Pub Date : 2019-06-02 DOI: 10.1145/3316781.3317774
M. Imani, Alice Sokolova, Ricardo Garcia, Andrew Huang, Fan Wu, Baris Aksanli, Tajana Rosing
In a data hungry world, approximate computing has emerged as one of the solutions to create higher energy efficiency and faster systems, while providing application tailored quality. In this paper, we propose ApproxLP, an Approximate Multiplier based on Linear Planes. We introduce an iterative method for approximating the product of two operands using fitted linear functions with two inputs, referred to as linear planes. The linearization of multiplication allows multiplication operations to be completely replaced with weighted addition. The proposed technique is used to find the significand of the product of two floating point numbers, decreasing the high energy cost of floating point arithmetic. Our method fully exploits the trade-off between accuracy and energy consumption by offering various degrees of approximation at different energy costs. As the level of approximation increases, the approximated product asymptotically approaches the exact product in an iterative manner. The performance of ApproxLP is evaluated over a range of multimedia and machine learning applications. A GPU enhanced by ApproxLP yields significant energy-delay product (EDP) improvement. For multimedia, neural network, and hyperdimensional computing applications, ApproxLP offers on average $2.4 times, 2.7 times $, and $4.3 times $ EDP improvement respectively with sufficient computational quality for the application. ApproxLP also provides up to $4.5 times $ EDP improvement and has $2.3 times $ lower chip area than other state-of-the-art approximate multipliers.CCS CONCEPTS•Hardware → Integrated circuits; • Computer systems organization → Architectures;
{"title":"ApproxLP","authors":"M. Imani, Alice Sokolova, Ricardo Garcia, Andrew Huang, Fan Wu, Baris Aksanli, Tajana Rosing","doi":"10.1145/3316781.3317774","DOIUrl":"https://doi.org/10.1145/3316781.3317774","url":null,"abstract":"In a data hungry world, approximate computing has emerged as one of the solutions to create higher energy efficiency and faster systems, while providing application tailored quality. In this paper, we propose ApproxLP, an Approximate Multiplier based on Linear Planes. We introduce an iterative method for approximating the product of two operands using fitted linear functions with two inputs, referred to as linear planes. The linearization of multiplication allows multiplication operations to be completely replaced with weighted addition. The proposed technique is used to find the significand of the product of two floating point numbers, decreasing the high energy cost of floating point arithmetic. Our method fully exploits the trade-off between accuracy and energy consumption by offering various degrees of approximation at different energy costs. As the level of approximation increases, the approximated product asymptotically approaches the exact product in an iterative manner. The performance of ApproxLP is evaluated over a range of multimedia and machine learning applications. A GPU enhanced by ApproxLP yields significant energy-delay product (EDP) improvement. For multimedia, neural network, and hyperdimensional computing applications, ApproxLP offers on average $2.4 times, 2.7 times $, and $4.3 times $ EDP improvement respectively with sufficient computational quality for the application. ApproxLP also provides up to $4.5 times $ EDP improvement and has $2.3 times $ lower chip area than other state-of-the-art approximate multipliers.CCS CONCEPTS•Hardware → Integrated circuits; • Computer systems organization → Architectures;","PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130476940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 11
LithoGAN
Pub Date : 2019-06-02 DOI: 10.1145/3316781.3317852
Wei Ye, M. Alawieh, Yibo Lin, D. Pan
Lithography simulation is one of the most fundamental steps in process modeling and physical verification. Conventional simulation methods suffer from a tremendous computational cost for achieving high accuracy. Recently, machine learning was introduced to trade off between accuracy and runtime through speeding up the resist modeling stage of the simulation flow. In this work, we propose LithoGAN, an end-to-end lithography modeling framework based on a generative adversarial network (GAN), to map the input mask patterns directly to the output resist patterns. Our experimental results show that LithoGAN can predict resist patterns with high accuracy while achieving orders of magnitude speedup compared to conventional lithography simulation and previous machine learning based approach.
{"title":"LithoGAN","authors":"Wei Ye, M. Alawieh, Yibo Lin, D. Pan","doi":"10.1145/3316781.3317852","DOIUrl":"https://doi.org/10.1145/3316781.3317852","url":null,"abstract":"Lithography simulation is one of the most fundamental steps in process modeling and physical verification. Conventional simulation methods suffer from a tremendous computational cost for achieving high accuracy. Recently, machine learning was introduced to trade off between accuracy and runtime through speeding up the resist modeling stage of the simulation flow. In this work, we propose LithoGAN, an end-to-end lithography modeling framework based on a generative adversarial network (GAN), to map the input mask patterns directly to the output resist patterns. Our experimental results show that LithoGAN can predict resist patterns with high accuracy while achieving orders of magnitude speedup compared to conventional lithography simulation and previous machine learning based approach.","PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127647568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 16
SMatch SMatch
Pub Date : 2019-06-02 DOI: 10.1145/3316781.3317912
R. T. Possignolo, Josep Renau
Designers wait several hours to get synthesis, placement and routing results even for small changes. Commercial FPGA flows allow for resynthesis after code changes, however, they target large code changes with not so effective incremental flows. We propose SMatch, a flow for FPGAs that has a novel incremental elaboration and novel incremental FPGA placement and routing that improves the state-of-the-art by reducing the amount of placement and routing work needed. We evaluate our approach against commercial FPGAs flows. Our method finishes synthesis, placement, and routing in under 30s for most changes of publicly available benchmarks with negligible QoR impact, being over $20 times$ faster than existing incremental FPGA flows. CCS CONCEPTS •Hardware → Methodologies for EDA; Logic synthesis.
{"title":"SMatch","authors":"R. T. Possignolo, Josep Renau","doi":"10.1145/3316781.3317912","DOIUrl":"https://doi.org/10.1145/3316781.3317912","url":null,"abstract":"Designers wait several hours to get synthesis, placement and routing results even for small changes. Commercial FPGA flows allow for resynthesis after code changes, however, they target large code changes with not so effective incremental flows. We propose SMatch, a flow for FPGAs that has a novel incremental elaboration and novel incremental FPGA placement and routing that improves the state-of-the-art by reducing the amount of placement and routing work needed. We evaluate our approach against commercial FPGAs flows. Our method finishes synthesis, placement, and routing in under 30s for most changes of publicly available benchmarks with negligible QoR impact, being over $20 times$ faster than existing incremental FPGA flows. CCS CONCEPTS •Hardware → Methodologies for EDA; Logic synthesis.","PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121570675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Tetris 俄罗斯方块
Pub Date : 2019-06-02 DOI: 10.1145/3316781.3317921
Brendan L. West, Jian Zhou, R. Dreslinski, J. Fowlkes, O. Kripfgans, C. Chakrabarti, T. Wenisch
High volume acquisition rates are imperative for medical ultrasound imaging applications, such as 3D elastography and 3D vector flow imaging. Unfortunately, despite recent algorithmic improvements, high-volume-rate imaging remains computationally infeasible on known platforms.In this paper, we propose TETRIS, a novel hardware accelerator for ultrasound beamforming that enables volume acquisition rates up to the physics limits of acoustic propagation delay. Through algorithmic and hardware optimizations, we enable a streaming system design outclassing previously proposed accelerators in performance while lowering hardware complexity and storage requirements. For a representative imaging task, our proposed system generates physics-limited 13,020 volumes per second in a 2. 5W power budget.CCS CONCEPTS• Hardware → Emerging architectures; 3D integrated circuits.;
{"title":"Tetris","authors":"Brendan L. West, Jian Zhou, R. Dreslinski, J. Fowlkes, O. Kripfgans, C. Chakrabarti, T. Wenisch","doi":"10.1145/3316781.3317921","DOIUrl":"https://doi.org/10.1145/3316781.3317921","url":null,"abstract":"High volume acquisition rates are imperative for medical ultrasound imaging applications, such as 3D elastography and 3D vector flow imaging. Unfortunately, despite recent algorithmic improvements, high-volume-rate imaging remains computationally infeasible on known platforms.In this paper, we propose TETRIS, a novel hardware accelerator for ultrasound beamforming that enables volume acquisition rates up to the physics limits of acoustic propagation delay. Through algorithmic and hardware optimizations, we enable a streaming system design outclassing previously proposed accelerators in performance while lowering hardware complexity and storage requirements. For a representative imaging task, our proposed system generates physics-limited 13,020 volumes per second in a 2. 5W power budget.CCS CONCEPTS• Hardware → Emerging architectures; 3D integrated circuits.;","PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116877201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
ChipSecure
Pub Date : 2019-06-02 DOI: 10.1145/3316781.3324890
M. Mahmoodi, H. Nili, S. Larimian, Xinjie Guo, Dmitri B. Strukov
We exploit randomness in static I-V characteristics and reconfigurability of embedded flash memories to design very efficient physically unclonable function. Leakage current and subthreshold slope variations, nonlinearity, nondeterministic tuning error, and sneak path current in the redesigned commercial flash memory arrays are exploited to create a unique digital fingerprint. A time-multiplexed architecture is designed to enhance the security and expand the challenge-response pair space to 10211. Experimental results demonstrate 50.3% average uniformity, 49.99% average diffuseness, and native < 5% bit error rate. The analysis of the measured data also shows strong resilience against machine learning attacks and possibility for extremely energy efficient, 0.56 pJ/b operation.
{"title":"ChipSecure","authors":"M. Mahmoodi, H. Nili, S. Larimian, Xinjie Guo, Dmitri B. Strukov","doi":"10.1145/3316781.3324890","DOIUrl":"https://doi.org/10.1145/3316781.3324890","url":null,"abstract":"We exploit randomness in static I-V characteristics and reconfigurability of embedded flash memories to design very efficient physically unclonable function. Leakage current and subthreshold slope variations, nonlinearity, nondeterministic tuning error, and sneak path current in the redesigned commercial flash memory arrays are exploited to create a unique digital fingerprint. A time-multiplexed architecture is designed to enhance the security and expand the challenge-response pair space to 10211. Experimental results demonstrate 50.3% average uniformity, 49.99% average diffuseness, and native < 5% bit error rate. The analysis of the measured data also shows strong resilience against machine learning attacks and possibility for extremely energy efficient, 0.56 pJ/b operation.","PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115011681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
DCFNoC
Pub Date : 2019-06-02 DOI: 10.1145/3316781.3317794
Tomás Picornell, J. Flich, Carles Hernández, J. Duato
The adoption of many-cores in safety-critical systems requires real-time capable networks on chip (NoC). In this paper we propose a new time-predictable NoC design paradigm where contention within the network is eliminated. This new paradigm builds on the Channel Dependency Graph (CDG) and guarantees by design the absence of contention. Our delayed conflict-free NoC (DCFNoC) is able to naturally inject messages using a TDM period equal to the optimal theoretical bound and without the need of using a computationally demanding offline process. Results show that DCFNoC guarantees time predictability with very low implementation cost.
{"title":"DCFNoC","authors":"Tomás Picornell, J. Flich, Carles Hernández, J. Duato","doi":"10.1145/3316781.3317794","DOIUrl":"https://doi.org/10.1145/3316781.3317794","url":null,"abstract":"The adoption of many-cores in safety-critical systems requires real-time capable networks on chip (NoC). In this paper we propose a new time-predictable NoC design paradigm where contention within the network is eliminated. This new paradigm builds on the Channel Dependency Graph (CDG) and guarantees by design the absence of contention. Our delayed conflict-free NoC (DCFNoC) is able to naturally inject messages using a TDM period equal to the optimal theoretical bound and without the need of using a computationally demanding offline process. Results show that DCFNoC guarantees time predictability with very low implementation cost.","PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127999921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 13
LAMA
Pub Date : 2019-06-02 DOI: 10.1145/3316781.3317846
Liang Feng, Jieru Zhao, Tingyuan Liang, Sharad Sinha, Wei Zhang
To satisfy increasing computing demands, heterogeneous computing platforms are gaining attention, especially CPU-FPGA platforms. Recently, emerging tightly coupled CPU-FPGA platforms with shared coherent caches (such as the Intel HARP and IBM POWER with CAPI) have been proposed to facilitate data communication and simplify the programming model. In this work, we propose LAMA, a static analysis and dynamic control combined framework for memory access management in such platforms, to further enhance the memory access efficiency and maintain the data consistency. Based on implementation results on the real Intel HARP2 platform, LAMA is shown to improve the performance by 34% on average with low overhead.
{"title":"LAMA","authors":"Liang Feng, Jieru Zhao, Tingyuan Liang, Sharad Sinha, Wei Zhang","doi":"10.1145/3316781.3317846","DOIUrl":"https://doi.org/10.1145/3316781.3317846","url":null,"abstract":"To satisfy increasing computing demands, heterogeneous computing platforms are gaining attention, especially CPU-FPGA platforms. Recently, emerging tightly coupled CPU-FPGA platforms with shared coherent caches (such as the Intel HARP and IBM POWER with CAPI) have been proposed to facilitate data communication and simplify the programming model. In this work, we propose LAMA, a static analysis and dynamic control combined framework for memory access management in such platforms, to further enhance the memory access efficiency and maintain the data consistency. Based on implementation results on the real Intel HARP2 platform, LAMA is shown to improve the performance by 34% on average with low overhead.","PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128821713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
期刊
Proceedings of the 56th Annual Design Automation Conference 2019
全部 Acc. Chem. Res. ACS Applied Bio Materials ACS Appl. Electron. Mater. ACS Appl. Energy Mater. ACS Appl. Mater. Interfaces ACS Appl. Nano Mater. ACS Appl. Polym. Mater. ACS BIOMATER-SCI ENG ACS Catal. ACS Cent. Sci. ACS Chem. Biol. ACS Chemical Health & Safety ACS Chem. Neurosci. ACS Comb. Sci. ACS Earth Space Chem. ACS Energy Lett. ACS Infect. Dis. ACS Macro Lett. ACS Mater. Lett. ACS Med. Chem. Lett. ACS Nano ACS Omega ACS Photonics ACS Sens. ACS Sustainable Chem. Eng. ACS Synth. Biol. Anal. Chem. BIOCHEMISTRY-US Bioconjugate Chem. BIOMACROMOLECULES Chem. Res. Toxicol. Chem. Rev. Chem. Mater. CRYST GROWTH DES ENERG FUEL Environ. Sci. Technol. Environ. Sci. Technol. Lett. Eur. J. Inorg. Chem. IND ENG CHEM RES Inorg. Chem. J. Agric. Food. Chem. J. Chem. Eng. Data J. Chem. Educ. J. Chem. Inf. Model. J. Chem. Theory Comput. J. Med. Chem. J. Nat. Prod. J PROTEOME RES J. Am. Chem. Soc. LANGMUIR MACROMOLECULES Mol. Pharmaceutics Nano Lett. Org. Lett. ORG PROCESS RES DEV ORGANOMETALLICS J. Org. Chem. J. Phys. Chem. J. Phys. Chem. A J. Phys. Chem. B J. Phys. Chem. C J. Phys. Chem. Lett. Analyst Anal. Methods Biomater. Sci. Catal. Sci. Technol. Chem. Commun. Chem. Soc. Rev. CHEM EDUC RES PRACT CRYSTENGCOMM Dalton Trans. Energy Environ. Sci. ENVIRON SCI-NANO ENVIRON SCI-PROC IMP ENVIRON SCI-WAT RES Faraday Discuss. Food Funct. Green Chem. Inorg. Chem. Front. Integr. Biol. J. Anal. At. Spectrom. J. Mater. Chem. A J. Mater. Chem. B J. Mater. Chem. C Lab Chip Mater. Chem. Front. Mater. Horiz. MEDCHEMCOMM Metallomics Mol. Biosyst. Mol. Syst. Des. Eng. Nanoscale Nanoscale Horiz. Nat. Prod. Rep. New J. Chem. Org. Biomol. Chem. Org. Chem. Front. PHOTOCH PHOTOBIO SCI PCCP Polym. Chem.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1