Proceedings of the 59th ACM/IEEE Design Automation Conference: Latest Articles

Pref-X: a framework to reveal data prefetching in commercial in-order cores

Proceedings of the 59th ACM/IEEE Design Automation Conference

Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530569

Quentin Huppert, F. Catthoor, L. Torres, D. Novo

Computer system simulators are major tools used by architecture researchers to develop and evaluate new ideas. Clearly, such evaluations are more conclusive when compared to commercial state-of-the-art architectures. However, the behavior of key components in existing processors is often not disclosed, complicating the construction of faithful reference models. The data prefetching engine is one such obscured component, and it can have a significant impact on key metrics such as performance and energy. In this paper, we propose Pref-X, a framework to analyze functional characteristics of data prefetching in commercial in-order cores. Our framework reveals data prefetches by X-raying into the cache memory at the request granularity, which allows linking memory access patterns with changes in the cache content. To demonstrate the power and accuracy of our methodology, we use Pref-X to replicate the data prefetching mechanisms of two representative processors, namely the Arm Cortex-A7 and the Arm Cortex-A53, with average accuracies of 99.8% and 96.9%, respectively.

Citations: 0

VStore

Proceedings of the 59th ACM/IEEE Design Automation Conference

Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530560

Shengwen Liang, Ying Wang, Ziming Yuan, Cheng Liu, Huawei Li, Xiaowei Li

Graph-based vector search, which uses a graph data structure to find the best matches to user queries based on their semantic similarity, has become instrumental in data science and AI applications. However, deploying graph-based vector search in production systems requires high accuracy and cost-efficiency with low latency and a small memory footprint, which existing work fails to offer. We present VStore, a graph-based vector search solution that collaboratively optimizes accuracy, latency, memory, and data movement on large-scale vector data based on in-storage computing. The evaluation shows that VStore exhibits significant search efficiency improvement and energy reduction while attaining accuracy over CPU, GPU, and ZipNN platforms.

Citations: 3

iMARS

Proceedings of the 59th ACM/IEEE Design Automation Conference

Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530478

Mengyuan Li, Ann Franchesca Laguna, D. Reis, Xunzhao Yin, M. Niemier, X. S. Hu

Recommendation systems (RecSys) suggest items to users by predicting their preferences based on historical data. A typical RecSys handles large embedding tables and many embedding-table-related operations. The memory size and bandwidth of the conventional computer architecture restrict the performance of RecSys. This work proposes an in-memory-computing (IMC) architecture (iMARS) for accelerating the filtering and ranking stages of deep neural network-based RecSys. iMARS leverages IMC-friendly embedding tables implemented inside a ferroelectric FET based IMC fabric. Circuit-level and system-level evaluations show that iMARS achieves 16.8x (713x) end-to-end latency (energy) improvement compared to the GPU counterpart for the MovieLens dataset.

Citations: 1

ODHD

Proceedings of the 59th ACM/IEEE Design Automation Conference

Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530395

Ruixuan Wang, Xun Jiao, X. S. Hu

Outlier detection is a classical and important technique that has been used in different application domains such as medical diagnosis and Internet-of-Things. Recently, machine learning-based outlier detection algorithms, such as one-class support vector machine (OCSVM), isolation forest and autoencoder, have demonstrated promising results in outlier detection. In this paper, we take a radical departure from these classical learning methods and propose ODHD, an outlier detection method based on hyperdimensional computing (HDC). In ODHD, the outlier detection process is based on a P-U learning structure, in which we train a one-class HV based on inlier samples. This HV represents the abstraction information of all inlier samples; hence, any (testing) sample whose corresponding HV is dissimilar from this HV will be considered as an outlier. We perform an extensive evaluation using six datasets across different application domains and compare ODHD with multiple baseline methods including OCSVM, isolation forest, and autoencoder using three metrics including accuracy, F1 score and ROC-AUC. Experimental results show that ODHD outperforms all the baseline methods on every dataset for every metric. Moreover, we perform a design space exploration for ODHD to illustrate the tradeoff between performance and efficiency. The promising results presented in this paper provide a viable option and alternative to traditional learning algorithms for outlier detection.
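The one-class idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the random-projection encoding, the hypervector dimension, the cosine similarity measure, and the quantile-based threshold are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
import random


def make_encoder(num_features, dim=1000, seed=0):
    """Return a function that maps a feature vector to a bipolar hypervector (HV).

    Assumption: the encoding is a sign random projection, one Gaussian row per
    HV dimension. The abstract does not specify the encoding.
    """
    rng = random.Random(seed)
    projection = [[rng.gauss(0.0, 1.0) for _ in range(num_features)]
                  for _ in range(dim)]

    def encode(sample):
        return [1 if sum(w * v for w, v in zip(row, sample)) >= 0 else -1
                for row in projection]

    return encode


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_one_class_hv(encode, inliers):
    """Bundle (element-wise sum) the HVs of all inlier samples into one HV."""
    return [sum(column) for column in zip(*(encode(x) for x in inliers))]


def fit_threshold(encode, one_class_hv, inliers, quantile=0.05):
    """Pick a similarity threshold from the inliers' own similarities.

    Assumption: a low quantile of the training similarities is used as cutoff.
    """
    sims = sorted(cosine(encode(x), one_class_hv) for x in inliers)
    return sims[int(quantile * (len(sims) - 1))]


def is_outlier(encode, one_class_hv, threshold, sample):
    """Flag a sample whose HV is less similar to the one-class HV than the cutoff."""
    return cosine(encode(sample), one_class_hv) < threshold
```

A caller would build the encoder once, train the one-class HV on inlier data only, fit the threshold on the same inliers, and then call `is_outlier` on each test sample. Bundling by summation keeps training to a single pass over the data, which is one reason HDC-style models are often described as lightweight.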

Citations: 6

SCAIE-V: an open-source SCAlable interface for ISA extensions for RISC-V processors

Proceedings of the 59th ACM/IEEE Design Automation Conference

Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530432

M. Damian, J. Oppermann, Christoph Spang, A. Koch

Custom instructions extending a base ISA are often used to increase performance. However, only a few cores provide open interfaces for integrating such ISA Extensions (ISAX). In addition, the degree to which a core's capabilities are exposed for extension varies wildly between interfaces. Thus, even when using open-source cores, the lack of standardized ISAX interfaces typically causes high engineering effort when implementing or porting ISAXes. We present SCAIE-V, a highly portable and feature-rich ISAX interface that supports custom control flow, decoupled execution, multi-cycle instructions, and memory transactions. The cost of the interface itself scales with the complexity of the ISAXes actually used.

Citations: 2

ReSMA

Proceedings of the 59th ACM/IEEE Design Automation Conference

Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530559

Huize Li, Hai Jin, Long Zheng, Yu Huang, Xiaofei Liao, Zhuohui Duan, Dan Chen, Chuangyi Gui

Approximate string matching (ASM) functions as the basic operation kernel for a large number of string processing applications. Existing Von-Neumann-based ASM accelerators suffer from huge intermediate data with the ever-increasing string data, leading to massive off-chip data transmissions. This paper presents a novel ASM processing-in-memory (PIM) accelerator, namely ReSMA, based on ReCAM- and ReRAM-arrays to eliminate the off-chip data transmissions in ASM. We develop a novel ReCAM-friendly filter-and-filtering algorithm to process the q-grams filtering in ReCAM memory. We also design a new data mapping strategy and a new verification algorithm, which enables computing the edit distances totally in ReRAM crossbars for energy saving. Experimental results show that ReSMA outperforms the CPU-, GPU-, FPGA-, ASIC-, and PIM-based solutions by 268.7×, 38.6×, 20.9×, 707.8×, and 14.7× in terms of performance, and 153.8×, 42.2×, 31.6×, 18.3×, and 5.3× in terms of energy-saving, respectively.
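The filter-then-verify structure mentioned in this abstract can be illustrated in software. The sketch below is a generic textbook formulation, not the ReCAM/ReRAM algorithm of ReSMA: it assumes the classic q-gram count bound as the filter and a standard dynamic-programming edit distance as the verifier, and all function names are hypothetical.

```python
from collections import Counter


def qgrams(s, q):
    """Multiset of all length-q substrings of s (empty if s is shorter than q)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def passes_qgram_filter(a, b, q, k):
    """Cheap necessary condition for edit_distance(a, b) <= k.

    Uses the q-gram lemma: strings within edit distance k share at least
    max(len(a), len(b)) - q + 1 - k * q q-grams.
    """
    needed = max(len(a), len(b)) - q + 1 - k * q
    if needed <= 0:
        return True  # the bound is vacuous, so the filter cannot reject
    shared = sum((qgrams(a, q) & qgrams(b, q)).values())
    return shared >= needed


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def approx_matches(query, candidates, q=2, k=1):
    """Return candidates within edit distance k, verifying only filter survivors."""
    return [c for c in candidates
            if passes_qgram_filter(query, c, q, k)
            and edit_distance(query, c) <= k]
```

For example, `approx_matches("kitten", ["sitten", "sitting", "mitten", "kitchen", "dog"])` keeps only `"sitten"` and `"mitten"`. The filter never rejects a true match, so it only reduces how often the quadratic verification step runs.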

Citations: 1

Energy efficient data search design and optimization based on a compact ferroelectric FET content addressable memory

Proceedings of the 59th ACM/IEEE Design Automation Conference

Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530527

Jiahao Cai, M. Imani, K. Ni, Grace Li Zhang, Bing Li, Ulf Schlichtmann, Cheng Zhuo, Xunzhao Yin

Content Addressable Memory (CAM) is widely used for associative search tasks in advanced machine learning models and data-intensive applications due to its highly parallel pattern matching capability. Most state-of-the-art CAM designs focus on reducing the CAM cell area by exploiting nonvolatile memories (NVMs). Little research exists on optimizing the design and energy efficiency of NVM-based CAMs for practical deployment in edge devices and AI hardware. In this paper, we propose a general, compact, and energy-efficient CAM design scheme that alleviates the design overhead by employing just one NVM device in the cell. We also propose an adaptive matchline (ML) precharge and discharge scheme that further optimizes the search energy by fully reducing the ML voltage swing. We consider ferroelectric field-effect transistors (FeFETs) as the representative NVM, and present a 2T-1FeFET CAM array including a sense amplifier implementing the proposed ML scheme. Evaluation results suggest that our proposed 2T-1FeFET CAM design achieves 6.64×/4.74×/9.14×/3.02× better energy efficiency compared with CMOS/ReRAM/STT-MRAM/2FeFET CAM arrays. Benchmarking results show that our approach provides 3.3×/2.1× energy-delay product improvement over the 2T-2R/2FeFET CAM in accelerating query processing applications.

Citations: 3

Adaptive window-based sensor attack detection for cyber-physical systems

Proceedings of the 59th ACM/IEEE Design Automation Conference

Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530555

Lin Zhang, Zifan Wang, Mengyu Liu, Fanxin Kong

Sensor attacks alter sensor readings and spoof Cyber-Physical Systems (CPS) into performing dangerous actions. Existing detection work tends to minimize the detection delay and false alarms at the same time, even though there is a clear trade-off between the two metrics. Instead, we argue that attack detection should dynamically balance the two metrics when a physical system is in different states. Following this argument, we propose an adaptive sensor attack detection system that consists of three components: an adaptive detector, a detection deadline estimator, and a data logger. It can adapt the detection delay, and thus the false alarms, at run time to meet a varying detection deadline and improve usability (i.e., reduce false alarms). Finally, we implement our detection system and validate it using multiple CPS simulators and a reduced-scale autonomous vehicle testbed.

Citations: 4

GaBAN

Proceedings of the 59th ACM/IEEE Design Automation Conference

Pub Date : 2022-07-10 DOI: 10.1163/_eifo_sim_2411

Jiajie Chen, Le Yang, Youhui Zhang

Citations: 0

Accelerating nonlinear DC circuit simulation with reinforcement learning

Proceedings of the 59th ACM/IEEE Design Automation Conference

Pub Date : 2022-07-10 DOI: 10.1145/3489517.3530512

Zhou Jin, Haojie Pei, Yichao Dong, Xiang Jin, Xiao Wu, Weipeng Xing, Dan Niu

DC analysis is the foundation for nonlinear electronic circuit simulation. Pseudo transient analysis (PTA) methods have gained great success among various continuation algorithms. However, PTA tends to be computationally intensive without careful tuning of parameters and proper stepping strategies. In this paper, we harness the latest advances in machine learning to resolve these challenges simultaneously. In particular, active learning is leveraged to provide a fine initial solver environment, in which TD3-based Reinforcement Learning (RL) is implemented to accelerate the simulation on the fly. The RL agent is strengthened with dual agents, priority sampling, and cooperative learning to enhance its robustness and convergence. The proposed algorithms are implemented in an out-of-the-box SPICE-like simulator, which demonstrates a significant speedup: up to 3.1X for the initial stage and 234X for the RL stage.

Citations: 2
