Proceedings of the 16th ACM International Conference on Computing Frontiers: Latest Papers

Ghost loads: what is the cost of invisible speculation?
Pub Date : 2019-04-30 DOI: 10.1145/3310273.3321558
Christos Sakalis, M. Alipour, Alberto Ros, A. Jimborean, S. Kaxiras, Magnus Själander
Speculative execution is necessary for achieving high performance on modern general-purpose CPUs but, starting with Spectre and Meltdown, it has also been proven to cause severe security flaws. In case of a misspeculation, the architectural state is restored to assure functional correctness but a multitude of microarchitectural changes (e.g., cache updates), caused by the speculatively executed instructions, are commonly left in the system. These changes can be used to leak sensitive information, which has led to a frantic search for solutions that can eliminate such security flaws. The contribution of this work is an evaluation of the cost of hiding speculative side-effects in the cache hierarchy, making them visible only after the speculation has been resolved. For this, we compare (for the first time) two broad approaches: i) waiting for loads to become non-speculative before issuing them to the memory system, and ii) eliminating the side-effects of speculation, a solution consisting of invisible loads (Ghost loads) and performance optimizations (Ghost Buffer and Materialization). While previous work, InvisiSpec, has proposed a similar solution to our latter approach, it has done so with only a minimal evaluation and at a significant performance cost. The detailed evaluation of our solutions shows that: i) waiting for loads to become non-speculative is no more costly than the previously proposed InvisiSpec solution, albeit much simpler, non-invasive in the memory system, and stronger security-wise; ii) hiding speculation with Ghost loads (in the context of a relaxed memory model) can be achieved at the cost of 12% performance degradation and 9% energy increase, which is significantly better than the previous state-of-the-art solution.
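The core idea in the abstract above, keeping speculative cache fills invisible until the speculation resolves, can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyCache`, `speculative_load`, `resolve`, and the address `0x40` are hypothetical names and values, and the way the buffer is filled and materialized here is a simplification rather than the microarchitecture evaluated in the paper.

```python
class ToyCache:
    """Set of cached line addresses; no capacity or timing, illustration only."""

    def __init__(self):
        self.lines = set()

    def contains(self, addr):
        return addr in self.lines

    def fill(self, addr):
        self.lines.add(addr)


def speculative_load(cache, ghost_buffer, addr, invisible):
    """Issue a load while still speculative.

    Baseline (invisible=False): the cache is filled at once, a visible side effect.
    Invisible (invisible=True): the line is parked in ghost_buffer instead.
    """
    if invisible:
        ghost_buffer.add(addr)
    else:
        cache.fill(addr)


def resolve(cache, ghost_buffer, committed):
    """Resolve speculation: publish buffered lines on commit, drop them on squash."""
    if committed:
        for addr in ghost_buffer:
            cache.fill(addr)  # assumed stand-in for making the load visible
    ghost_buffer.clear()


def leaked_after_squash(invisible):
    """Return True if a squashed speculative load leaves a trace in the cache."""
    cache, ghost_buffer = ToyCache(), set()
    secret_dependent_addr = 0x40  # hypothetical address derived from a secret
    speculative_load(cache, ghost_buffer, secret_dependent_addr, invisible)
    resolve(cache, ghost_buffer, committed=False)  # misspeculation
    return cache.contains(secret_dependent_addr)
```

In this model the baseline leaves the secret-dependent line in the cache after a squash, while the buffered variant does not; the performance and energy costs reported in the abstract are outside what such a sketch can show.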
Citations: 22
Energy-efficient approximate least squares accelerator: a case study of radio astronomy calibration processing
Pub Date : 2019-04-30 DOI: 10.1145/3310273.3323161
G. Gillani, A. Krapukhin, A. Kokkeler
Approximate computing allows the introduction of inaccuracy in the computation for cost savings, such as energy consumption, chip-area, and latency. Targeting energy efficiency, approximate designs for multipliers, adders, and multiply-accumulate (MAC) units have been extensively investigated in the past decade. However, accelerator designs for relatively bigger architectures have received less attention so far. The Least Squares (LS) algorithm is widely used in digital signal processing applications, e.g., image reconstruction. This work proposes a novel LS accelerator design based on a heterogeneous architecture, where the heterogeneity is introduced using accurate and approximate processing cores. We have considered a case study of radio astronomy calibration processing that employs a complex-input iterative LS algorithm. Our proposed methodology exploits the intrinsic error-resilience of the aforesaid algorithm, where initial iterations are processed on approximate modules while the later ones are processed on accurate modules. Our energy-quality experiments have shown up to 24% energy savings as compared to an accurate (optimized) counterpart for biased designs and up to 29% energy savings when unbiasing is introduced. The proposed LS accelerator design does not increase the number of iterations and provides sufficient precision to converge to an acceptable solution.
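The scheduling idea described above (early iterations on approximate units, later iterations on accurate ones) can be sketched in software. The following is a minimal illustration under stated assumptions: it uses real-valued gradient descent rather than the complex-input LS algorithm of the case study, and `quantize`, `step`, `lr`, and `approx_iters` are hypothetical stand-ins for approximate hardware, not parameters from the paper.

```python
def quantize(value, step):
    """Round value to the nearest multiple of step (stand-in for an approximate unit)."""
    return round(value / step) * step


def gradient(A, b, x, step=None):
    """Gradient of 0.5 * ||Ax - b||^2; every product is quantized when step is given."""
    mul = (lambda p, q: quantize(p * q, step)) if step else (lambda p, q: p * q)
    residual = [sum(mul(a, xj) for a, xj in zip(row, x)) - bi
                for row, bi in zip(A, b)]
    return [sum(mul(A[i][j], residual[i]) for i in range(len(A)))
            for j in range(len(x))]


def solve_ls(A, b, iters, approx_iters, lr=0.1, step=0.05):
    """Gradient-descent least squares; the first approx_iters iterations are approximate."""
    x = [0.0] * len(A[0])
    for k in range(iters):
        # Early iterations tolerate error, so they use the quantized products.
        g = gradient(A, b, x, step if k < approx_iters else None)
        x = [xj - lr * gj for xj, gj in zip(x, g)]
    return x
```

For example, `solve_ls([[1, 0], [0, 1], [1, 1]], [1, 2, 3], iters=300, approx_iters=150)` runs half of the iterations with quantized products and still ends close to the exact solution `[1, 2]`, because the accurate iterations remove the residual error left by the approximate phase.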
Citations: 3
Highway to HAL: open-sourcing the first extendable gate-level netlist reverse engineering framework
Pub Date : 2019-04-30 DOI: 10.1145/3310273.3323419
Sebastian Wallat, Nils Albartus, Steffen Becker, Max Hoffmann, Maik Ender, Marc Fyrbiak, Adrian Drees, Sebastian Maaßen, C. Paar
Since hardware oftentimes serves as the root of trust in our modern interconnected world, malicious hardware manipulations constitute a ubiquitous threat in the context of the Internet of Things (IoT). Hardware reverse engineering is a prevalent technique to detect such manipulations. Over the last few years, an active research community has significantly advanced the field of hardware reverse engineering. Notably, many open research questions regarding the extraction of functionally correct netlists from Field Programmable Gate Arrays (FPGAs) or Application Specific Integrated Circuits (ASICs) have been tackled. In order to facilitate further analysis of recovered netlists, a software framework is required, serving as the foundation for specialized algorithms. Currently, no such framework is publicly available. Therefore, we provide the first open-source gate-library agnostic framework for gate-level netlist analysis. In this position paper, we demonstrate the workflow of our modular framework HAL on the basis of two case studies and provide profound insights into its technical foundations.
Citations: 15
Iterative machine learning (IterML) for effective parameter pruning and tuning in accelerators
Pub Date : 2019-04-30 DOI: 10.1145/3310273.3321563
Xuewen Cui, Wu-chun Feng
With the rise of accelerators (e.g., GPUs, FPGAs, and APUs) in computing systems, the parallel computing community needs better tools and mechanisms with which to productively extract performance. While modern compilers provide flags to activate different optimizations to improve performance, the effectiveness of such automated optimization depends on the algorithm and its mapping to the underlying accelerator architecture. Currently, however, extracting the best performance from an algorithm on an accelerator requires significant expertise and manual effort to exploit both spatial and temporal sharing of computing resources in order to improve overall performance. In particular, maximizing the performance of an algorithm on an accelerator requires extensive hyperparameter (e.g., thread-block size) selection and tuning. Given the myriad of hyperparameter dimensions to optimize across, the search space of optimizations is generally extremely large, making it infeasible to exhaustively evaluate each optimization configuration. This paper proposes an approach that uses statistical analysis with iterative machine learning (IterML) to prune and tune hyperparameters to achieve better performance. During each iteration, we leverage machine-learning (ML) models to provide pruning and tuning guidance for the subsequent iterations. We evaluate our IterML approach on the selection of the GPU thread-block size across many benchmarks running on an NVIDIA P100 or V100 GPU. The experimental results show that our IterML approach can significantly reduce (i.e., improve) the search effort by 40% to 80%.
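The measure, model, and prune loop described above can be sketched as follows. This is a hedged illustration, not the IterML implementation: `measure` is a hypothetical synthetic runtime model, `predict` is a nearest-neighbor stand-in for the ML models the abstract mentions, and `samples_per_iter` and `keep_fraction` are assumed tuning knobs.

```python
import math


def measure(block_size):
    """Hypothetical runtime model (ms): fastest at 256 threads per block."""
    return 1.0 + (math.log2(block_size) - 8) ** 2


def predict(block_size, measured):
    """Stand-in for the ML model: runtime of the nearest measured size (log2 distance)."""
    nearest = min(measured,
                  key=lambda s: abs(math.log2(s) - math.log2(block_size)))
    return measured[nearest]


def iterative_prune(candidates, samples_per_iter=3, keep_fraction=0.5):
    """Each round: measure a few candidates, then keep the best-predicted fraction."""
    measured = {}
    remaining = sorted(candidates)
    while len(remaining) > 1:
        # Sample evenly across the surviving candidates (deterministic on purpose).
        stride = max(1, len(remaining) // samples_per_iter)
        for size in remaining[::stride]:
            if size not in measured:
                measured[size] = measure(size)
        # Rank every survivor by predicted runtime and prune the slow half.
        ranked = sorted(remaining, key=lambda s: predict(s, measured))
        remaining = sorted(ranked[:max(1, int(len(ranked) * keep_fraction))])
    return remaining[0], len(measured)
```

Calling `iterative_prune([32, 64, 128, 256, 512, 1024])` returns the selected block size together with the number of configurations that were actually measured; on this synthetic model the search finishes without measuring every candidate, which is the kind of saving the abstract refers to as reduced search effort.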
Citations: 5
Realizing parallelism in quantum MISD architecture
Pub Date : 2019-04-30 DOI: 10.1145/3310273.3322823
Suvadip Batabyal, Kounteya Sarkar
We propose an idea to speed up instruction execution through a probabilistic approach, using the parallelism offered by quantum computers. For this, we divide the instruction set of an arbitrary quantum instruction set architecture (QISA) into separate groups and then bias certain qubits representing the group so that only the instructions within the group have a high probability of getting executed in a quantum processor. Therefore, the result generated will be the superposition of the qubits as if all the instructions within the group were executed simultaneously. We show that we can achieve a significant design improvement compared to a classical computer.
Citations: 1
The FitOptiVis ECSEL project: highly efficient distributed embedded image/video processing in cyber-physical systems
Pub Date : 2019-04-30 DOI: 10.1145/3310273.3323437
Z. Al-Ars, T. Basten, A. D. Beer, M. Geilen, Dip Goswami, P. Jääskeläinen, J. Kadlec, M. Alejandro, F. Palumbo, G. Peeren, L. Pomante, F. V. Linden, Jukka Saarinen, T. Säntti, Carlo Sau, M. Zedda
Cyber-Physical Systems (CPS) are systems that are in feedback with their environment, possibly with humans in the loop. They are often distributed, equipped with sensors and actuators, smart, adaptive, and predictive, and they react in real time. Image- and video-processing pipelines are a prime source of environmental information, improving the possibilities of active, relevant feedback. In this context, FitOptiVis aims to provide end-to-end multi-objective optimization for imaging and video pipelines of CPS, with emphasis on energy and performance, leveraging a reference architecture supported by low-power, high-performance, smart devices, and by methods and tools for combined design-time and run-time multi-objective optimization within system and environment constraints.
Citations: 31
On the limitations of the chimera graph topology in using analog quantum computers
Pub Date : 2019-04-30 DOI: 10.1145/3310273.3322830
D. Vert, Renaud Sirdey, Stéphane Louise
This paper investigates the possibility of using an analog quantum computer as commercialized by D-Wave to solve large QUBO problems by means of a single invocation of the quantum annealer. Indeed, this machine solves a spin glass problem with programmable coefficients but subject to quite strong topology restrictions on the set of non-zero coefficients. Rather than mapping problem variables onto multiple qubits, an approach which requires many invocations of the annealer to solve small problems, it is tempting to investigate the existence of sparse relaxations compliant with the qubit interconnection topology of the machine, hence solvable in one invocation of the annealing oracle, but still providing good-quality solutions to the original problem. This paper provides an experimental setup which aims to determine whether or not such convenient relaxations do exist or, rather, are easy to find. Our experiments suggest that this is not the case and, therefore, that solving even moderate-size arbitrary problems with a single call to a quantum annealer is not possible, at least within the constraints of the so-called Chimera topology. We conclude the paper with a number of perspectives that these results imply for the design of heuristics that take advantage of a quantum annealing oracle to solve large-scale problems.
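To make the notion of a sparse relaxation concrete, the sketch below evaluates a tiny QUBO, drops the couplings that a restricted hardware graph cannot represent, and compares the two minimizers. Everything here is an illustrative assumption: the three-variable problem, the path-shaped `allowed_edges`, and the brute-force search stand in for the annealer and the Chimera graph, and the snippet does not reproduce the paper's experimental setup.

```python
from itertools import product


def qubo_energy(Q, x):
    """Energy sum(Q[i, j] * x[i] * x[j]) for a QUBO given as {(i, j): coeff}, i <= j."""
    return sum(c * x[i] * x[j] for (i, j), c in Q.items())


def brute_force_minimum(Q, n):
    """Exhaustively minimize a small QUBO over all 2^n binary assignments."""
    return min(product((0, 1), repeat=n), key=lambda x: qubo_energy(Q, x))


def sparse_relaxation(Q, allowed_edges):
    """Keep linear terms and only the couplings present in the hardware graph."""
    return {(i, j): c for (i, j), c in Q.items()
            if i == j or (i, j) in allowed_edges}


# Hypothetical problem: three variables, all pairs penalized (a triangle).
Q = {(0, 0): -1, (1, 1): -1, (2, 2): -1, (0, 1): 2, (1, 2): 2, (0, 2): 2}
# Hypothetical hardware graph: a path 0-1-2, so the coupling (0, 2) is unavailable.
allowed_edges = {(0, 1), (1, 2)}

exact = brute_force_minimum(Q, 3)
relaxed = brute_force_minimum(sparse_relaxation(Q, allowed_edges), 3)
print(qubo_energy(Q, exact), qubo_energy(Q, relaxed))
```

In this example the minimizer of the relaxed problem scores worse on the original objective than the true optimum, which is a small-scale picture of why a topology-compliant relaxation may fail to yield a good solution to the original problem.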
Citations: 17
Artificial intelligent sensors at the core of cyber-physical-systems: from theory to practical applications
Pub Date : 2019-04-30 DOI: 10.1145/3310273.3324019
D. Pau
Cyber-Physical Systems (CPS) are becoming, without pause, more pervasive in embedded systems. Artificial Intelligence, Machine Learning and Deep Learning are mostly confined to the cloud, where unlimited computing resources seem to be available and evolving tirelessly. Unfortunately, a layered architecture in which dumb sensors are attached to the cloud would quickly become too centralized, poorly scalable and slow to respond in the expected IoT scenario, which will deploy hundreds of billions of sensors communicating through low data rate networks. In that context, STMicroelectronics is developing solutions to bring Artificial Intelligence closer to the sensors. This talk will review new intelligent technological solutions and mechanisms under development and publicly announced, namely STM32CUBE.AI. The talk will explain how they represent the key ingredients needed to design the current and future generations of artificial intelligent cyber-physical embedded systems and derived applications based on STMicroelectronics heterogeneous sensors, microcontrollers and SoCs. In particular, aspects related to how to address current interoperability, productivity and constrained embedded resource gaps will be discussed with practical examples based on STM32CUBE.AI. Moreover, the investigation and design of adaptive and cognitive computational-intelligence techniques able to learn, adopting artificial neural networks, and operate in nonstationary environments will be introduced. Finally, the deployment of networked intelligent cyber-physical systems, able to operate in time-varying environments, will also be discussed.
Citations: 0
Sentiment evaluation of forex news
Pub Date : 2019-04-30 DOI: 10.1145/3310273.3322821
Zhou Cheng, T. Qi, Jixiang Wang, Yu Zhou, Zhihong Wang, Yi Guo, Junfeng Zhao
Sentiment analysis is important for mining opinions from text. Two issues arise in the foreign exchange (Forex) field. 1) For sentiment orientation, most research focuses on product reviews, and fine-grained sentiment analysis of Forex news is lacking. 2) For sentiment intensity, most work considers the intensity of sentiment words but ignores the significance of domain characteristics. To address these two problems, a fine-grained sentiment analysis model (abbreviated WD-SA) is established, which integrates the Weight of sentiment words with Domain features. First, the semantic information of the text is embedded into a vector using word2vec. Then, sentiment orientation is detected by a method that combines a machine learning algorithm with the weights of sentiment words. Finally, features are extracted to assess the intensity of the news. The experimental results show that our algorithm outperforms the state of the art.
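As a rough illustration of the orientation step described above, the sketch below blends a classifier's positive-class probability with a lexicon-weighted score. The lexicon, its weights, the blending factor `alpha`, and the function names are all hypothetical; the abstract does not say how WD-SA actually combines the two signals.

```python
# Hypothetical sentiment lexicon: word -> signed weight (not taken from the paper).
LEXICON = {
    "surge": 0.8, "rally": 0.6, "gain": 0.5,
    "slump": -0.7, "plunge": -0.9, "weak": -0.4,
}


def lexicon_score(tokens):
    """Average signed weight of the sentiment words found, in [-1, 1]."""
    weights = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(weights) / len(weights) if weights else 0.0


def orientation(tokens, classifier_prob_positive, alpha=0.5):
    """Blend a classifier probability with the lexicon score.

    classifier_prob_positive would come from any trained model
    (for example one fed word2vec features); here it is just an input.
    alpha is an assumed blending factor, not a value from the paper.
    """
    ml_score = 2.0 * classifier_prob_positive - 1.0  # map [0, 1] to [-1, 1]
    combined = alpha * ml_score + (1.0 - alpha) * lexicon_score(tokens)
    if combined > 0:
        return "positive"
    if combined < 0:
        return "negative"
    return "neutral"
```

For instance, `orientation(["dollar", "plunge"], 0.6)` lets the strongly negative lexicon entry outweigh a mildly positive classifier output.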
Citations: 0
Simulation with skeletons of applications using dimemas
Pub Date : 2019-04-30 DOI: 10.1145/3310273.3322827
C. Camarero, C. Martínez, J. L. Bosque
Large computer systems, like those in the TOP 500 ranking, comprise hundreds of thousands of cores. Simulating application execution on these systems is very complex and costly. This article explores the option of using application skeletons, together with an analytic simulator, to study the performance of these large systems. To this end, the Dimemas simulator has been enhanced with the capability of simulating application skeletons. This enhancement makes it possible to simulate the skeleton of Lulesh, an application with 90k processes, in a single day. In addition, it generates traces, which are of great value for validating skeletons and simulations.
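To make the idea of replaying a skeleton through an analytic model concrete, here is a minimal sketch that assumes a simple linear latency-plus-bandwidth cost per message. The `Step` type, the parameter names, and the numbers are illustrative only; they are not Dimemas's input format or its actual model.

```python
from dataclasses import dataclass


@dataclass
class Step:
    compute_s: float    # duration of the compute burst, in seconds
    message_bytes: int  # size of the message sent after the burst


def simulate(skeleton, iterations, latency_s, bandwidth_bytes_per_s):
    """Estimate the time for one process to replay the skeleton.

    Each message is charged latency + size / bandwidth (an assumed
    linear cost); contention and overlap are ignored.
    """
    per_iteration = 0.0
    for step in skeleton:
        transfer = latency_s + step.message_bytes / bandwidth_bytes_per_s
        per_iteration += step.compute_s + transfer
    return iterations * per_iteration


if __name__ == "__main__":
    skeleton = [Step(compute_s=0.01, message_bytes=1_000_000)]
    total = simulate(skeleton, iterations=100,
                     latency_s=1e-6, bandwidth_bytes_per_s=1e9)
    print(f"estimated time: {total:.4f} s")
```

Because the skeleton keeps only burst durations and message sizes, the estimate costs a handful of arithmetic operations per step instead of executing the application itself.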
Citations: 0