Anais Estendidos do Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD): latest articles

Ajuste Dinâmico de Threads para Execução Eficiente de Aplicações Iterativas OpenMP (Dynamic Thread Adjustment for Efficient Execution of Iterative OpenMP Applications)

Anais Estendidos do Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD)

Pub Date : 2019-11-12 DOI: 10.5753/wscad_estendido.2019.8702

Marcio N.P. Silva, Aline P. Nascimento, Alexandre da Costa Sena

Two distinct approaches can be adopted to increase the performance of parallel applications on a high-performance architecture: (i) auto-tuning tools; (ii) adapting or creating applications that can adjust themselves to the available environment. While the first solution requires the programmer to adapt the application to the tool and run a profiling of the environment, the second strategy is more complex, since it requires the scientist to be able to create applications with some degree of autonomy. This work therefore proposes and evaluates a strategy for dynamically adjusting the number of threads of iterative OpenMP applications. The results show the feasibility of the proposed strategy, which was able to execute the application for the auction problem efficiently.

Citations: 0
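As a rough illustration of the idea in the abstract above, the sketch below hill-climbs over the thread count of an iterative application using the time measured for each iteration. The abstract does not describe the authors' actual strategy, so the hill-climbing rule, the names `tune_threads` and `synthetic_iteration`, and the cost model are all assumptions made for illustration. In a real OpenMP code the chosen count would be applied with `omp_set_num_threads` before each parallel region.

```python
from typing import Callable


def synthetic_iteration(threads: int) -> float:
    """Hypothetical cost model: parallel work shrinks, overhead grows with threads."""
    return 64.0 / threads + 0.5 * threads


def tune_threads(run_iteration: Callable[[int], float],
                 max_threads: int,
                 iterations: int) -> int:
    """Hill-climb over thread counts, running every iteration of the application.

    run_iteration(t) executes one iteration with t threads and returns its time.
    """
    best_threads = 1
    best_time = run_iteration(best_threads)
    step = 1  # +1 tries one more thread, -1 tries one fewer
    for _ in range(iterations - 1):
        # Clamp the trial value to the valid range [1, max_threads].
        trial = min(max(best_threads + step, 1), max_threads)
        elapsed = run_iteration(trial)
        if trial != best_threads and elapsed < best_time:
            best_threads, best_time = trial, elapsed
        else:
            step = -step  # no gain in this direction, probe the other one
    return best_threads


if __name__ == "__main__":
    print(tune_threads(synthetic_iteration, max_threads=16, iterations=30))
```

Because every trial is itself a useful iteration of the application, the search costs no extra runs; the price is that some iterations execute with a slightly worse thread count while the controller keeps probing.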

Otimização do Método HOPMOC 1D com auxílio das ferramentas Intel Parallel Studio (Optimization of the 1D HOPMOC Method with the Aid of Intel Parallel Studio Tools)

Anais Estendidos do Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD)

Pub Date : 2019-11-12 DOI: 10.5753/wscad_estendido.2019.8697

G. Costa, F. Cabral, C. Osthoff

This work presents a comparative study of different parallelization techniques used to increase the performance of the HOPMOC numerical method for solving hyperbolic partial differential equations of convection-diffusion problems. The goal is to evaluate the gains of two strategies developed from the original version of the code, both aimed at reducing the time spent in synchronization barriers, and to compare them with each other. In addition, the work brings a study that is new relative to other publications involving HOPMOC: an analysis of the relationship between Spin Time and CPU Time to confirm the efficiency of the strategies developed.

Citations: 1

Arquitetura Adaptável para Execução de Redes Neurais Artificiais em Dispositivos FPGA (Adaptable Architecture for Executing Artificial Neural Networks on FPGA Devices)

Anais Estendidos do Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD)

Pub Date : 2019-11-12 DOI: 10.5753/wscad_estendido.2019.8696

W. Castro, M. Heinen, Bruno Neves

Within the field of Artificial Intelligence, Artificial Neural Networks (ANNs) stand out for their ability to learn through training processes and for their wide range of applications, from pattern classification to function computation. Implementing algorithms in hardware allows steps to be parallelized and, consequently, processing to be accelerated. This work proposes a general-purpose hardware architecture for executing ANNs on FPGA devices. Implemented in VHDL, the proposed architecture processes one layer every 3 clock cycles on average. Simulated on the EP3C25F324C6 device, it reached a clock frequency of 106.53 MHz and required 65.5 Kb of memory.

Citations: 0

An Interference-aware Virtual Machine Placement Strategy for Small-scale HPC Applications in Clouds

Anais Estendidos do Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD)

Pub Date : 2019-11-12 DOI: 10.5753/wscad_estendido.2019.8707

Maicon Melo Alves, Lúcia M. A. Drummond

The cross-interference problem may occur when applications are executed in virtual machines placed on the same physical machine. Although many previous works have proposed different strategies for Virtual Machine Placement, none of them has employed a suitable method for predicting cross-interference or considered minimizing the number of physical machines used at the same time. In this thesis, we define the Interference-aware Virtual Machine Placement Problem for small-scale HPC applications in Clouds (IVMPP), which tackles both problems by simultaneously minimizing the cross-interference of small-scale HPC applications that can share physical machines and the number of physical machines used to allocate them. We propose a mathematical formulation and a strategy based on the Iterated Local Search framework to solve this problem. Moreover, we propose a quantitative, multivariate model to predict interference for a set of applications allocated to the same physical machine. Experiments executed in a real scenario, using applications from the oil and gas industry and the HPCC benchmark suite, showed that our method outperforms several heuristics from the related literature in terms of interference while using the same number of physical machines.

Citations: 1
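To make the Iterated Local Search idea from the abstract above more concrete, here is a minimal sketch of ILS applied to a toy placement problem. It is not the thesis's formulation: the pairwise `interference` matrix stands in for the multivariate prediction model, and `machine_penalty`, `capacity`, and all function names are hypothetical choices made for illustration.

```python
import random
from typing import List


def placement_cost(assign: List[int], interference: List[List[float]],
                   machine_penalty: float) -> float:
    """Pairwise interference inside each machine plus a penalty per used machine."""
    total = 0.0
    n = len(assign)
    for a in range(n):
        for b in range(a + 1, n):
            if assign[a] == assign[b]:
                total += interference[a][b]
    return total + machine_penalty * len(set(assign))


def fits(assign: List[int], capacity: int) -> bool:
    """True when no machine hosts more than `capacity` applications."""
    return all(assign.count(m) <= capacity for m in set(assign))


def local_search(assign, interference, machine_penalty, capacity):
    """Descent over single-application moves until no move improves the cost."""
    n = len(assign)
    best = placement_cost(assign, interference, machine_penalty)
    improved = True
    while improved:
        improved = False
        for app in range(n):
            for machine in range(n):  # n machines always suffice for n apps
                if machine == assign[app]:
                    continue
                trial = assign[:app] + [machine] + assign[app + 1:]
                if not fits(trial, capacity):
                    continue
                cost = placement_cost(trial, interference, machine_penalty)
                if cost < best:
                    assign, best, improved = trial, cost, True
    return assign


def iterated_local_search(interference, machine_penalty, capacity,
                          rounds=50, seed=0):
    """Perturb the incumbent, re-optimise locally, keep strict improvements."""
    rng = random.Random(seed)
    n = len(interference)
    # Start with one application per machine, which is always feasible.
    current = local_search(list(range(n)), interference, machine_penalty, capacity)
    for _ in range(rounds):
        perturbed = list(current)
        for app in rng.sample(range(n), k=min(2, n)):
            perturbed[app] = rng.randrange(n)  # random reassignment
        if not fits(perturbed, capacity):
            continue
        candidate = local_search(perturbed, interference, machine_penalty, capacity)
        if (placement_cost(candidate, interference, machine_penalty)
                < placement_cost(current, interference, machine_penalty)):
            current = candidate
    return current
```

The single weighted objective is one simple way to trade interference against machine count; a formulation that treats the two goals differently would change only `placement_cost`.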

Implementações GPGPU do Algoritmo de Otimização por Enxame de Partículas para o Problema da Mochila Multidimensional (GPGPU Implementations of the Particle Swarm Optimization Algorithm for the Multidimensional Knapsack Problem)

Anais Estendidos do Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD)

Pub Date : 2019-11-12 DOI: 10.5753/wscad_estendido.2019.8703

B. S. Munhão, Bianca de Almeida Dantas, Edson Norberto Cáceres, Henrique Mongelli

One of the best-known combinatorial optimization problems, with many practical applications, is the multidimensional knapsack problem (MKP). Despite its popularity and the demand for high-quality solutions, it is an NP-hard problem, which makes it necessary to seek alternative strategies for obtaining good solutions in feasible time. In this context, metaheuristics stand out, as they have been successful in solving various hard problems, including the MKP. This work proposes two GPGPU implementations of the particle swarm optimization (PSO) algorithm. The reduction in the execution times of the GPGPU programs compared with the sequential version was significant, which demonstrated the effectiveness of parallelization strategies for the metaheuristic studied.

Citations: 0
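For readers unfamiliar with PSO on 0/1 problems, the following is a small sequential sketch of a binary PSO for the MKP. It is only a CPU baseline under stated assumptions, not either of the paper's GPGPU implementations: the sigmoid transfer function, the greedy `repair` step, the coefficient values, and every name here are illustrative choices. The per-particle loop is the part a GPU version would typically evaluate in parallel.

```python
import math
import random
from typing import List, Tuple


def is_feasible(x: List[int], weights: List[List[int]], capacities: List[int]) -> bool:
    """Check every knapsack dimension against its capacity."""
    return all(sum(w[j] for j in range(len(x)) if x[j]) <= c
               for w, c in zip(weights, capacities))


def repair(x, profits, weights, capacities):
    """Drop the least profitable selected items until all constraints hold."""
    x = list(x)
    selected = sorted((j for j in range(len(x)) if x[j]), key=lambda j: profits[j])
    for j in selected:
        if is_feasible(x, weights, capacities):
            break
        x[j] = 0
    return x


def value(x, profits) -> int:
    return sum(p for p, bit in zip(profits, x) if bit)


def binary_pso(profits, weights, capacities,
               particles=20, iterations=100, seed=0) -> Tuple[List[int], int]:
    rng = random.Random(seed)
    n = len(profits)
    inertia, c1, c2 = 0.7, 1.5, 1.5  # assumed coefficients
    pos = [repair([rng.randint(0, 1) for _ in range(n)], profits, weights, capacities)
           for _ in range(particles)]
    vel = [[0.0] * n for _ in range(particles)]
    pbest = [list(p) for p in pos]
    gbest = list(max(pbest, key=lambda p: value(p, profits)))
    for _ in range(iterations):
        for i in range(particles):
            for j in range(n):
                v = (inertia * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-6.0, min(6.0, v))  # clamp to keep exp() stable
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))  # sigmoid transfer
                pos[i][j] = 1 if rng.random() < prob else 0
            pos[i] = repair(pos[i], profits, weights, capacities)
            if value(pos[i], profits) > value(pbest[i], profits):
                pbest[i] = list(pos[i])
                if value(pbest[i], profits) > value(gbest, profits):
                    gbest = list(pbest[i])
    return gbest, value(gbest, profits)
```

Repairing each particle after the move keeps the swarm inside the feasible region, which avoids having to tune a penalty term for constraint violations.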

pamPython: proposta de um processador para executar algoritmos Python (pamPython: A Proposal for a Processor to Execute Python Algorithms)

Anais Estendidos do Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD)

Pub Date : 2019-11-12 DOI: 10.5753/wscad_estendido.2019.8694

Tulio Bitencourt, B. Neves

This article describes the development of a processor capable of executing algorithms written in Python. The processor was developed in the hardware description language VHDL, and its main goal was to follow the Python documentation and execute the corresponding Assembly code. This first version resulted in a functional general-purpose architecture.

Citations: 0

Simplicity, Reproducibility and Scalability for Huge Wireless Sensor Network Simulations

Anais Estendidos do Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD)

Pub Date : 2019-11-12 DOI: 10.5753/WSCAD_ESTENDIDO.2019.8713

M. L. Silva, Joubert de Castro Lima

In this work we present two contributions to the wireless sensor network (WSN) literature. The first is a general model for achieving reproducibility at the kernel level in parallel simulators. Unfortunately, users must implement from scratch how their simulations repeat in WSN simulators, but a parallel or distributed simulation imposes the concurrency principle, which is not trivial for non-specialists to implement. Tests using the simulator called JSensor confirmed that the model guarantees the strictest level of reproducibility, even when the simulations use different numbers of threads or different machines across multiple runs. The second contribution is the JSensor simulator, a general-purpose parallel simulator for large-scale WSN applications and high-level distributed algorithms. JSensor introduces more realistic simulation elements, such as the environment, represented by customizable cells, and application events that represent natural phenomena such as lightning, wind, sun, rain, and more. The cells are placed on a grid that represents the environment with user-defined spatial characteristics such as temperature, pressure, and air quality. Experimental evaluations show that JSensor scales well on multi-core computer architectures, reaching a speedup of 7.45 on a 16-core machine with Hyper-Threading technology, so 50% of the cores are virtual. JSensor also proved to be 21% faster than OMNeT++ when simulating a flooding-type model.

Citations: 0
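The speedup figure in the abstract above can be read as a parallel efficiency. The short calculation below uses only the numbers quoted there (speedup 7.45, 16 cores, half of them virtual); the helper name `parallel_efficiency` and the choice to also report efficiency against the 8 physical cores are illustrative assumptions, not something the paper states.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Efficiency = speedup divided by the number of cores used."""
    return speedup / cores


SPEEDUP = 7.45          # reported in the abstract
LOGICAL_CORES = 16      # reported in the abstract
PHYSICAL_CORES = LOGICAL_CORES // 2  # 50% of the cores are virtual

efficiency_logical = parallel_efficiency(SPEEDUP, LOGICAL_CORES)
efficiency_physical = parallel_efficiency(SPEEDUP, PHYSICAL_CORES)

if __name__ == "__main__":
    print(f"per logical core:  {efficiency_logical:.3f}")
    print(f"per physical core: {efficiency_physical:.3f}")
```

Measured against logical cores the efficiency is a little under one half, while against physical cores it is above 0.9, which is why the abstract points out that half of the cores are virtual.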

Hardening Strategies for HPC Applications HPC应用的加固策略

Anais Estendidos do Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD)

Pub Date : 2019-11-12 DOI: 10.5753/WSCAD_ESTENDIDO.2019.8708

Daniel Oliveira, P. Rech, P. Navaux

The reliability of HPC devices is one of the major concerns for supercomputers today and for the next generation. In fact, the high number of devices in large data centers makes the probability of at least one device being corrupted very high. In this work, we first evaluate the problem by performing radiation experiments. The data from the experiments give us realistic error rates for HPC devices. Moreover, we evaluate a representative set of algorithms, deriving general insights into the reliability of parallel algorithms and programming approaches. To better understand the problem, we propose a novel methodology that goes beyond quantifying it. We qualify the error by evaluating the criticality of each corrupted execution through a dedicated set of metrics. We show that, as far as imprecise computing is concerned, simple mismatch detection is not sufficient to evaluate and compare the radiation sensitivity of HPC devices and algorithms. Our analysis quantifies and qualifies radiation effects on application output, correlating the number of corrupted elements with their spatial locality. We also provide the mean relative error (dataset-wise) to evaluate the magnitude of radiation-induced errors. Furthermore, we designed a homemade fault injector, CAROL-FI, to further understand the problem by collecting, through fault-injection campaigns, information that cannot be obtained from radiation experiments. We inject different fault models to analyze the sensitivity of given applications. We show that portions of applications can be graded by different criticalities. Mitigation techniques can then be relaxed or hardened based on the criticality of the particular portions. This work also evaluates the reliability behaviors of six different architectures, ranging from HPC devices to embedded ones, with the aim of isolating code- and architecture-dependent behaviors. For this evaluation, we present and discuss radiation experiments that cover a total of more than 352,000 years of natural exposure and a fault-injection analysis based on a total of more than 120,000 injections. Finally, Error-Correcting Code, Algorithm-Based Fault Tolerance, and Duplication With Comparison hardening strategies are presented and evaluated on HPC devices through radiation experiments. We present and compare both the reliability improvement and the imposed overhead of the selected hardening solutions. Then, we propose and analyze the impact of selective hardening for HPC algorithms. We perform fault-injection campaigns to identify the most critical source code variables and present how to select the best candidates to maximize the reliability/overhead ratio.

Citations: 0
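The abstract above contrasts simple mismatch detection with metrics that capture error magnitude. A minimal sketch of that contrast follows. The exact definitions used by the authors are not given in the abstract, so averaging over every element with a non-zero expected value, the `tolerance` parameter, and the function names are assumptions made for illustration.

```python
from typing import Sequence


def mean_relative_error(expected: Sequence[float], observed: Sequence[float]) -> float:
    """Average of |observed - expected| / |expected| over the whole dataset.

    Elements whose expected value is zero are skipped (assumed convention).
    """
    errors = [abs(o - e) / abs(e) for e, o in zip(expected, observed) if e != 0]
    if not errors:
        raise ValueError("no element with a non-zero expected value")
    return sum(errors) / len(errors)


def corrupted_count(expected: Sequence[float], observed: Sequence[float],
                    tolerance: float = 0.0) -> int:
    """Count elements whose relative deviation exceeds `tolerance`.

    tolerance = 0.0 reproduces plain mismatch detection.
    """
    return sum(1 for e, o in zip(expected, observed)
               if abs(o - e) > tolerance * abs(e))


if __name__ == "__main__":
    golden = [1.0, 2.0, 4.0, 8.0]
    faulty = [1.0, 2.5, 4.0, 6.0]
    print(corrupted_count(golden, faulty))        # strict mismatch count
    print(corrupted_count(golden, faulty, 0.3))   # count with a 30% tolerance
    print(mean_relative_error(golden, faulty))
```

With a tolerance, an output that strict comparison flags as corrupted may count as acceptable for an application that tolerates imprecise results, which is the distinction the abstract draws.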

Estratégias de exploração de vizinhança com GPU para problemas de otimização (GPU Neighborhood Exploration Strategies for Optimization Problems)

Anais Estendidos do Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD)

Pub Date : 2019-11-12 DOI: 10.5753/wscad_estendido.2019.8712

Rodolfo Pereira Araujo, Igor M. Coelho, Leandro A. J. Marzulo

Optimization problems are of great importance to many sectors of industry, from production planning to the distribution and transportation of products. Many problems of interest fall into the NP-hard class, for which no algorithms are known that solve them exactly in polynomial time. Heuristic strategies capable of escaping low-quality local optima (metaheuristics) are therefore generally employed. Local search is, in general, the most expensive step of a metaheuristic in terms of computational time, so it is very important to make good use of the resources it consumes. This dissertation studies the use of multiple neighborhood strategies applied in parallel to explore a larger neighborhood space while making better use of computational resources. Parallel processing of the neighborhood strategies is implemented at a fine-grained level, through GPU processing, and at a coarse-grained level, through multi-core and networked processing, with the two levels combined in a heterogeneous environment, for von Neumann and Dataflow architectures.

Citations: 0

Optimizing Neural Network Training through TensorFlow Profile Analysis in a Shared Memory System

Anais Estendidos do Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD)

Pub Date : 2019-11-12 DOI: 10.5753/wscad_estendido.2019.8701

F. Vilasbôas, Calebe P. Bianchini, Rodrigo Pasti, L. Castro

On the one hand, Deep Neural Networks have emerged as a powerful tool for solving complex problems in image and text analysis. On the other, they are sophisticated learning machines that require deep programming and math skills to be understood and implemented. Therefore, most researchers employ toolboxes and frameworks to design and implement such architectures. This paper performs an execution analysis of TensorFlow, one of the most widely used deep network frameworks available, on a shared memory system. To do so, we chose a text classification problem based on tweet sentiment analysis. The focus of this work is to identify the best environment configuration for training neural networks on a shared memory system. We set five different configurations using environment variables to modify the TensorFlow execution behavior. Results on an Intel Xeon Platinum 8000 series processor show that TensorFlow's default environment configuration achieves a speedup of up to 5.8, while fine-tuning the environment improves that speedup by at least 37%.

Citations: 1
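The abstract above does not name the environment variables the authors changed. As a hedged illustration of the kind of configuration involved, the sketch below sets three threading variables of the OpenMP runtime that CPU builds of TensorFlow commonly rely on (`OMP_NUM_THREADS`, `KMP_BLOCKTIME`, `KMP_AFFINITY`). Which variables and values the paper actually used is an assumption here, and the helper names `threading_env` and `apply_env` are hypothetical.

```python
import os
from typing import Dict


def threading_env(physical_cores: int, blocktime_ms: int = 0) -> Dict[str, str]:
    """Build one candidate configuration of OpenMP threading variables.

    The values are illustrative starting points, not the paper's settings.
    """
    return {
        # Number of OpenMP worker threads, here one per physical core.
        "OMP_NUM_THREADS": str(physical_cores),
        # Milliseconds a thread spins after a parallel region before sleeping.
        "KMP_BLOCKTIME": str(blocktime_ms),
        # Pin threads to cores so they do not migrate between iterations.
        "KMP_AFFINITY": "granularity=fine,compact,1,0",
    }


def apply_env(settings: Dict[str, str]) -> None:
    """Export the settings into the current process environment."""
    os.environ.update(settings)


if __name__ == "__main__":
    config = threading_env(physical_cores=24)
    apply_env(config)
    for name, val in config.items():
        print(f"{name}={val}")
```

Settings like these only take effect if they are exported before the framework is imported, so `apply_env` would be called at the very top of a training script.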
