
Latest Publications: 2020 57th ACM/IEEE Design Automation Conference (DAC)

CUGR: Detailed-Routability-Driven 3D Global Routing with Probabilistic Resource Model
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218646
Jinwei Liu, Chak-Wa Pui, Fangzhou Wang, Evangeline F. Y. Young
Many competitive global routers adopt the technique of compressing the 3D routing space into 2D in order to handle today’s massive circuit scales. This has been shown to be an effective way to shorten routing time; however, quality is inevitably sacrificed to varying extents. In this paper, we propose two routing techniques that operate directly on the 3D routing space and can maximally utilize the 3D structure of a grid graph. The first technique, 3D pattern routing, combines pattern routing and layer assignment, and produces solutions that are optimal with respect to the patterns under consideration in terms of a cost function of wire length and routability. The second technique is multi-level 3D maze routing: two levels of maze routing with different cost functions and objectives are designed to maximize routability and to search for the minimum-cost path efficiently. In addition, we design a cost function that is sensitive to resource changes and a post-processing technique called patching that gives the detailed router more flexibility in escaping congested regions. Finally, experimental results show that our global router outperforms all contestants in the ICCAD’19 global routing contest.
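The 3D pattern routing step above selects, among a small set of candidate patterns between two pins, the one minimizing a cost in wire length and routability. A minimal 2D sketch of that selection for L-shaped patterns (the function names, the congestion model, and the restriction to 2D are illustrative assumptions, not the paper's implementation):

```python
def route_l_pattern(p1, p2, congestion):
    """Pick the cheaper of the two L-shaped patterns connecting p1 and p2.

    congestion(x, y) -> extra routability cost of using grid cell (x, y).
    Returns (cost, corner). Wire length is identical for both L shapes,
    so the choice is driven by the congestion term.
    """
    (x1, y1), (x2, y2) = p1, p2
    wirelength = abs(x1 - x2) + abs(y1 - y2)

    def cells_via(corner):
        # Cells covered by the two axis-aligned segments p1 -> corner -> p2.
        cells = set()
        for (ax, ay), (bx, by) in ((p1, corner), (corner, p2)):
            if ay == by:  # horizontal segment
                cells.update((x, ay) for x in range(min(ax, bx), max(ax, bx) + 1))
            else:         # vertical segment
                cells.update((ax, y) for y in range(min(ay, by), max(ay, by) + 1))
        return cells

    return min((wirelength + sum(congestion(x, y) for x, y in cells_via(c)), c)
               for c in ((x2, y1), (x1, y2)))
```

Because both candidates have the same wire length, the routine effectively ranks patterns by how congested the cells they cross are, which is the kind of trade-off the cost function in the paper captures.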
Citations: 36
Convergence-Aware Neural Network Training
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218518
Hyungjun Oh, Yongseung Yu, G. Ryu, Gunjoo Ahn, Yuri Jeong, Yongjun Park, Jiwon Seo
Training a deep neural network (DNN) is expensive, requiring a large amount of computation time. While the training overhead is high, not all computation in DNN training is equal. Some parameters converge faster, so their gradient computation may contribute little to the parameter update; near stationary points, a subset of parameters may change very little. In this paper, we exploit parameter convergence to optimize gradient computation in DNN training. We design a lightweight monitoring technique to track parameter convergence, and we stochastically prune the gradient computation for groups of semantically related parameters, exploiting their convergence correlations. These techniques are efficiently implemented in existing GPU kernels. In our evaluation, the optimization techniques substantially and robustly improve the training throughput for four DNN models on three public datasets.
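The two ingredients above, lightweight convergence monitoring and stochastic pruning of gradient computation, can be sketched as follows. The EMA-based monitor, the threshold, and the skip-probability shape are assumptions for illustration; the paper's technique operates inside GPU kernels on groups of semantically related parameters:

```python
import random

def update_monitor(ema, grad_norms, beta=0.9):
    """Lightweight monitor: exponential moving average of per-group gradient magnitude."""
    return {g: beta * ema.get(g, 0.0) + (1.0 - beta) * n
            for g, n in grad_norms.items()}

def skip_probability(ema_val, threshold=1e-3):
    """Groups whose gradients have shrunk well below the threshold are likely converged,
    so they are skipped with increasing probability as the EMA approaches zero."""
    if ema_val >= threshold:
        return 0.0
    return 1.0 - ema_val / threshold

def should_compute_gradient(group, ema, rng=random.random):
    """Stochastically prune gradient computation for converged groups."""
    return rng() >= skip_probability(ema.get(group, float("inf")))
```

Skipping stochastically rather than deterministically keeps an occasional fresh gradient flowing to "converged" groups, so a group that starts moving again is still detected by the monitor.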
Citations: 2
Algorithm/Hardware Co-Design for In-Memory Neural Network Computing with Minimal Peripheral Circuit Overhead
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218657
Hyungjun Kim, Yulhwa Kim, Sungju Ryu, Jae-Joon Kim
We propose an in-memory neural network accelerator architecture called MOSAIC, which uses a minimal form of peripheral circuits: a 1-bit word-line driver replaces the DAC, and a 1-bit sense amplifier replaces the ADC. To map multi-bit neural networks onto the MOSAIC architecture, which has 1-bit-precision peripheral circuits, we also propose a bit-splitting method that approximates the original network by separating each bit path of the multi-bit network so that each bit path can propagate independently throughout the network. Thanks to the minimal form of its peripheral circuits, MOSAIC can achieve an order of magnitude higher energy and area efficiency than previous in-memory neural network accelerators.
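A minimal sketch of the bit-splitting idea for a single layer: a multi-bit weight matrix is decomposed into 1-bit planes, each plane is evaluated with 1-bit crossbar peripherals, and the partial sums are recombined with their binary significance. The decomposition below is the standard binary expansion; MOSAIC additionally lets each bit path propagate independently through the whole network, which this per-layer sketch does not show:

```python
def split_bit_planes(weights, nbits):
    """Split non-negative integer weights into 1-bit planes, LSB first."""
    return [[[(w >> b) & 1 for w in row] for row in weights]
            for b in range(nbits)]

def bit_split_matvec(weights, x, nbits):
    """Reconstruct a multi-bit matrix-vector product from 1-bit passes."""
    out = [0] * len(weights)
    for b, plane in enumerate(split_bit_planes(weights, nbits)):
        for i, row in enumerate(plane):
            # Each pass uses only 1-bit weights, as a 1-bit crossbar would.
            partial = sum(wi * xi for wi, xi in zip(row, x))
            out[i] += partial << b  # shift restores the plane's significance
    return out
```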
Citations: 8
An Efficient EPIST Algorithm for Global Placement with Non-Integer Multiple-Height Cells *
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218504
Jianli Chen, Zhipeng Huang, Ye Huang, Wen-xing Zhu, Jun Yu, Yao-Wen Chang
With the increasing design requirements of modern circuits, a standard-cell library often contains cells of different row heights to address various trade-offs among performance, power, and area. However, constraining all standard cells to integer multiples of a single row height can cause area overheads and increase power consumption. In this paper, we present an analytical placer that directly handles circuit designs with non-integer multiple-height standard cells and additional layout constraints. Regions of different cell heights are adaptively generated from the global placement result. In particular, an exact penalty iterative shrinkage and thresholding (EPIST) algorithm is employed to efficiently optimize the global placement problem. The convergence of the algorithm is proved, and an acceleration strategy is proposed to improve its performance. Compared with state-of-the-art works, experimental results on the 2017 CAD Contest at ICCAD benchmarks show that our algorithm achieves the best wirelength and area for every benchmark. In particular, our proposed EPIST algorithm provides a new direction for effectively solving large-scale nonlinear optimization problems with non-smooth terms, which are often seen in real-world applications.
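The core move in the iterative shrinkage-and-thresholding family that EPIST belongs to is alternating a gradient step on the smooth term with a shrinkage (proximal) step on the non-smooth term. The paper's exact penalty formulation is beyond a sketch; the toy below shows only the underlying update on a one-dimensional L1-regularized problem, with illustrative names:

```python
def soft_threshold(v, t):
    """Proximal (shrinkage) operator of t*|.|: pull v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def ist_step(x, grad, step, lam):
    """One iterative shrinkage-thresholding update: a gradient step on the
    smooth term, then shrinkage to handle the non-smooth |.| term."""
    return [soft_threshold(xi - step * gi, step * lam)
            for xi, gi in zip(x, grad)]

def minimize_lasso_1d(target, lam, step=0.5, iters=60):
    """Minimize 0.5*(x - target)**2 + lam*|x| by repeated shrinkage steps."""
    x = [0.0]
    for _ in range(iters):
        x = ist_step(x, [x[0] - target], step, lam)
    return x[0]
```

For this objective the minimizer is `target - lam` (when `target > lam`), which the iteration approaches geometrically; the shrinkage step is what lets the method handle non-smooth terms without subgradients.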
Citations: 2
Learning to Quantize Deep Neural Networks: A Competitive-Collaborative Approach
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218576
Md Fahim Faysal Khan, Mohammad Mahdi Kamani, M. Mahdavi, V. Narayanan
By reducing model size and computation cost for dedicated AI accelerator designs, neural network quantization methods have recently attracted considerable attention. Unfortunately, merely minimizing quantization loss with a constant discretization causes accuracy deterioration. In this paper, we propose competitive-collaborative quantization (CCQ), an iterative accuracy-driven learning framework that gradually adapts the bit-precision of each individual layer. Orthogonal to prior quantization policies that keep the first and last layers of the network at full precision, CCQ offers layer-wise competition for any target quantization policy, with holistic layer fine-tuning to recover accuracy, so that state-of-the-art networks can be entirely quantized without significant accuracy degradation.
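One way to picture layer-wise competition is a greedy loop that repeatedly lowers the precision of whichever layer's next quantization step costs the least accuracy. This is only an illustrative stand-in: CCQ's actual framework is iterative and accuracy-driven with fine-tuning between steps, and `accuracy_drop`, the starting bit-width, and the stopping rule below are all assumptions:

```python
def quantize_by_competition(layers, accuracy_drop, target_bits=2, start_bits=8):
    """Greedy sketch of layer-wise competition for a target quantization policy.

    accuracy_drop(layer, bits) -> estimated accuracy loss of moving that layer
    to the given bit-width (in CCQ this estimate would be refreshed by
    fine-tuning; here it is just a caller-supplied function).
    """
    bits = {layer: start_bits for layer in layers}
    schedule = []
    while any(b > target_bits for b in bits.values()):
        candidates = [l for l in layers if bits[l] > target_bits]
        # The "winning" layer is the one quantization hurts least right now.
        winner = min(candidates, key=lambda l: accuracy_drop(l, bits[l] - 1))
        bits[winner] -= 1
        schedule.append((winner, bits[winner]))
    return bits, schedule
```

Note that, unlike policies that pin the first and last layers at full precision, every layer competes here, so all layers eventually reach the target bit-width.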
Citations: 11
Eliminating Redundant Computation in Noisy Quantum Computing Simulation
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218666
Gushu Li, Yufei Ding, Yuan Xie
Noisy Quantum Computing (QC) simulation on a classical machine is very time-consuming, since it requires Monte Carlo simulation with a large number of error-injection trials to model the effect of random noise. Orthogonal to existing QC simulation optimizations, we aim to accelerate the simulation by eliminating redundant computation among these Monte Carlo trials. We observe that the intermediate states of many trials are often the same. Once these states are computed in one trial, they can be temporarily stored and reused in other trials. However, storing all such states would consume significant memory. To leverage the shared intermediate states without introducing too much storage overhead, we statically generate and analyze the Monte Carlo simulation trials before the actual simulation. The trials are reordered to maximize the overlapped computation between consecutive trials, and states that cannot be reused in follow-up simulation are dropped, so that only a few states need to be stored. Experimental results show that the proposed optimization scheme saves on average 80% of the computation with only a small number of state vectors stored. In addition, the proposed simulation scheme demonstrates great scalability, as more computation can be saved with more simulation trials or on future QC devices with reduced error rates.
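A toy sketch of the two steps above: reorder trials so consecutive ones share long gate-sequence prefixes, then replay them while caching the intermediate states along the previous trial. Counting gate applications shows the saving. The gate representation, the greedy ordering heuristic, and keeping only the previous trial's states are illustrative assumptions:

```python
def common_prefix_len(a, b):
    """Length of the shared leading gate sequence of two trials."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def reorder_trials(trials):
    """Greedy static reordering: always pick next the trial sharing the
    longest prefix with the previous one."""
    remaining = list(trials)
    ordered = [remaining.pop(0)]
    while remaining:
        nxt = max(remaining, key=lambda t: common_prefix_len(ordered[-1], t))
        remaining.remove(nxt)
        ordered.append(nxt)
    return ordered

def simulate(trials, apply_gate, init_state):
    """Replay trials, reusing cached intermediate states of the previous
    trial for the shared prefix; returns the number of gates applied."""
    cache = []  # (gate, state_after_gate) pairs along the previous trial
    ops = 0
    for trial in trials:
        shared = common_prefix_len([g for g, _ in cache], trial)
        state = cache[shared - 1][1] if shared else init_state
        cache = cache[:shared]  # drop states that will not be reused
        for gate in trial[shared:]:
            state = apply_gate(state, gate)
            cache.append((gate, state))
            ops += 1
    return ops
```

Because only the previous trial's states are retained, memory stays bounded by one trial's length, which mirrors the paper's goal of exploiting shared states without large storage overhead.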
Citations: 4
Don’t-Care-Based Node Minimization for Threshold Logic Networks
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218506
Yung-Chih Chen, Hao-Ju Chang, Li-Cheng Zheng
Threshold logic has recently re-attracted researchers’ attention due to advances in hardware realization techniques and its applications to deep learning. In the past decade, several design automation techniques for threshold logic have been proposed, such as logic synthesis and logic optimization. Although they are effective, threshold logic network (TLN) optimization based on don’t cares has not been well studied. In this paper, we propose a don’t-care-based node minimization scheme for TLNs. We first present a sufficient condition for don’t cares to exist and a logic-implication-based method to identify the don’t cares of a threshold logic gate (TLG). Then, we transform the problem of TLG minimization with don’t cares into an integer linear programming (ILP) problem, and present a method to compute the necessary constraints for the ILP formulation. We apply the proposed optimization scheme to two sets of TLNs generated by a state-of-the-art synthesis technique. The experimental results show that, for the two sets, it achieves an average of 11% and 19% area reduction in terms of the sum of weights and threshold values, without overhead in TLG count or logic depth. Additionally, it completes the optimization of most TLNs within one minute.
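For concreteness: a threshold logic gate fires when its weighted input sum reaches the threshold, and the minimization objective is the sum of weights and threshold subject to functional equivalence. The toy below replaces the paper's don't-care-aware ILP with an exhaustive search (it ignores don't cares and is only viable for tiny gates), but it demonstrates the same objective:

```python
from itertools import product

def tlg_output(weights, threshold, inputs):
    """A threshold logic gate fires iff the weighted input sum meets the threshold."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)

def tlg_truth_table(weights, threshold):
    """Truth table of the gate over all 2^n input vectors."""
    n = len(weights)
    return tuple(tlg_output(weights, threshold, bits)
                 for bits in product((0, 1), repeat=n))

def minimize_tlg(weights, threshold):
    """Exhaustive stand-in for the ILP: find a functionally equivalent gate
    minimizing the sum of weights and threshold."""
    target = tlg_truth_table(weights, threshold)
    bound = max(max(weights), threshold)
    best = (sum(weights) + threshold, list(weights), threshold)
    for ws in product(range(bound + 1), repeat=len(weights)):
        for t in range(bound + 1):
            cost = sum(ws) + t
            if cost < best[0] and tlg_truth_table(ws, t) == target:
                best = (cost, list(ws), t)
    return best
```

For example, the 2-of-3 majority gate with weights [2, 2, 2] and threshold 4 reduces to weights [1, 1, 1] with threshold 2, cutting the cost from 10 to 5 while computing the same function.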
Citations: 3
Improving the Concurrency Performance of Persistent Memory Transactions on Multicores
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218554
Qing Wang, Youyou Lu, Zhongjie Wu, Fan Yang, J. Shu
Persistent memory provides data persistence to in-memory transaction systems, enabling full ACID properties. However, high data persistence worsens concurrency performance due to the delayed execution of conflicting transactions on multicores. In this paper, we propose SP3 (SPeculative Parallel Persistence) to improve the concurrency performance of persistent memory transactions. SP3 keeps the dependencies between different transactions in a DAG (directed acyclic graph) by detecting conflicts in the read/write sets, and speculatively executes conflicting transactions without waiting for the completion of data persistence. Evaluation shows that SP3 significantly improves concurrency performance and achieves almost linear scalability in most evaluated workloads.
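The dependency DAG above follows from read/write-set conflict detection. A minimal sketch of that construction (the dict-of-sets transaction representation and function names are assumptions for illustration; SP3's actual tracking is inside the transaction runtime):

```python
def conflicts(t1, t2):
    """Two transactions conflict when one writes a location the other reads
    or writes (write-write, write-read, or read-write)."""
    return bool(t1["write"] & (t2["read"] | t2["write"]) or
                t2["write"] & t1["read"])

def build_dependency_dag(txns):
    """Edges run from each transaction to later conflicting ones. Since every
    edge points forward in arrival order, the graph is acyclic; transactions
    with no incoming edge can be executed speculatively right away."""
    edges = {i: set() for i in range(len(txns))}
    for j in range(len(txns)):
        for i in range(j):
            if conflicts(txns[i], txns[j]):
                edges[i].add(j)
    return edges
```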
Citations: 1
Vehicular and Edge Computing for Emerging Connected and Autonomous Vehicle Applications
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218618
S. Baidya, Yu-Jen Ku, Hengyu Zhao, Jishen Zhao, S. Dey
Emerging connected and autonomous vehicles involve complex applications requiring not only optimal computing resource allocations but also efficient computing architectures. In this paper, we unfold the critical performance metrics required for emerging vehicular computing applications and show, with preliminary experimental results, how optimal choices can be made to satisfy static and dynamic computing requirements in terms of these performance metrics. We also discuss the feasibility of edge computing architectures for vehicular computing and show trade-offs among different offloading strategies. The paper points to directions for lightweight, high-performance, and low-power computing paradigms, architectures, and design-space exploration tools to satisfy the evolving applications and requirements of connected and autonomous vehicles.
Citations: 23
A Versatile and Flexible Chiplet-based System Design for Heterogeneous Manycore Architectures
Pub Date : 2020-07-01 DOI: 10.1109/DAC18072.2020.9218654
Hao Zheng, Ke Wang, A. Louri
Heterogeneous manycore architectures are deployed to simultaneously run multiple, diverse applications. This requires various computing capabilities (CPUs, GPUs, and accelerators) and an efficient network-on-chip (NoC) architecture to concurrently handle diverse application communication behavior. However, supporting the concurrent communication requirements of diverse applications is challenging due to dynamic application mapping, the complexity of handling distinct communication patterns, and limited on-chip resources. In this paper, we propose Adapt-NoC, a versatile and flexible NoC architecture for chiplet-based manycore architectures, consisting of adaptable routers and links. Adapt-NoC can dynamically allocate disjoint regions of the NoC, called subNoCs, to concurrently running applications, each of which can be optimized for different communication behavior. The adaptable routers and links can provide various subNoC topologies, satisfying the different latency and bandwidth requirements of various traffic patterns (e.g., all-to-all, one-to-many). Full-system simulation shows that Adapt-NoC can achieve 31% latency reduction, 24% energy saving, and 10% execution time reduction on average, compared to prior designs.
{"title":"A Versatile and Flexible Chiplet-based System Design for Heterogeneous Manycore Architectures","authors":"Hao Zheng, Ke Wang, A. Louri","doi":"10.1109/DAC18072.2020.9218654","DOIUrl":"https://doi.org/10.1109/DAC18072.2020.9218654","url":null,"abstract":"Heterogeneous manycore architectures are deployed to simultaneously run multiple and diverse applications. This requires various computing capabilities (CPUs, GPUs, and accelerators), and an efficient network-on-chip (NoC) architecture to concurrently handle diverse application communication behavior. However, supporting the concurrent communication requirements of diverse applications is challenging due to the dynamic application mapping, the complexity of handling distinct communication patterns and limited on-chip resources. In this paper, we propose Adapt-NoC, a versatile and flexible NoC architecture for chiplet-based manycore architectures, consisting of adaptable routers and links. Adapt-NoC can dynamically allocate disjoint regions of the NoC, called subNoCs, for concurrently-running applications, each of which can be optimized for different communication behavior. The adaptable routers and links are capable of providing various subNoC topologies, satisfying different latency and bandwidth requirements of various traffic patterns (e.g. all-to-all, one-to-many). Full system simulation shows that AdaptNoC can achieve 31% latency reduction, 24% energy saving and 10% execution time reduction on average, when compared to prior designs.","PeriodicalId":428807,"journal":{"name":"2020 57th ACM/IEEE Design Automation Conference (DAC)","volume":"146 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115020590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
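The subNoC idea in the Adapt-NoC abstract — dynamically carving disjoint NoC regions for concurrently running applications — can be illustrated with a toy first-fit allocator. This is a hedged sketch, not the paper's algorithm: the mesh size, the request format, and the greedy placement policy are all assumptions made here for illustration.

```python
# Toy sketch (not Adapt-NoC itself): greedily carving disjoint rectangular
# subNoC regions out of a mesh for concurrently running applications.
# Each request asks for a (rows, cols) region of routers.

def allocate_subnocs(mesh_rows, mesh_cols, requests):
    """First-fit allocation of non-overlapping rectangles.

    Returns a dict mapping app name -> (top, left, rows, cols),
    or None if some request cannot be placed disjointly.
    """
    used = [[False] * mesh_cols for _ in range(mesh_rows)]
    placements = {}
    for app, (r, c) in requests.items():
        spot = None
        # Scan top-to-bottom, left-to-right for the first free r x c window.
        for top in range(mesh_rows - r + 1):
            for left in range(mesh_cols - c + 1):
                if all(not used[top + i][left + j]
                       for i in range(r) for j in range(c)):
                    spot = (top, left)
                    break
            if spot is not None:
                break
        if spot is None:
            return None  # no disjoint region available for this app
        top, left = spot
        for i in range(r):
            for j in range(c):
                used[top + i][left + j] = True
        placements[app] = (top, left, r, c)
    return placements

# Three applications share one 8x8 mesh without overlapping regions.
print(allocate_subnocs(8, 8, {"cnn": (4, 4), "stream": (2, 8), "ctrl": (2, 2)}))
```

The real system additionally reconfigures routers and links so each region can take on a topology matched to its traffic pattern; the sketch only captures the "disjoint regions, allocated on demand" part.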