Ghost loads: what is the cost of invisible speculation?
Christos Sakalis, M. Alipour, Alberto Ros, A. Jimborean, S. Kaxiras, Magnus Själander
Speculative execution is necessary for achieving high performance on modern general-purpose CPUs but, starting with Spectre and Meltdown, it has also been shown to cause severe security flaws. In the case of a misspeculation, the architectural state is restored to ensure functional correctness, but a multitude of microarchitectural changes (e.g., cache updates) caused by the speculatively executed instructions are commonly left in the system. These changes can be used to leak sensitive information, which has led to a frantic search for solutions that can eliminate such security flaws. The contribution of this work is an evaluation of the cost of hiding speculative side-effects in the cache hierarchy, making them visible only after the speculation has been resolved. For this, we compare (for the first time) two broad approaches: i) waiting for loads to become non-speculative before issuing them to the memory system, and ii) eliminating the side-effects of speculation, a solution consisting of invisible loads (Ghost loads) and performance optimizations (Ghost Buffer and Materialization). While previous work, InvisiSpec, has proposed a solution similar to our latter approach, it has done so with only a minimal evaluation and at a significant performance cost. The detailed evaluation of our solutions shows that: i) waiting for loads to become non-speculative is no more costly than the previously proposed InvisiSpec solution, while being much simpler, non-invasive in the memory system, and stronger security-wise; ii) hiding speculation with Ghost loads (in the context of a relaxed memory model) can be achieved at the cost of a 12% performance degradation and a 9% energy increase, which is significantly better than the previous state-of-the-art solution.
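To make the Ghost Buffer and Materialization idea concrete, the following toy Python model (our own illustration, not the authors' simulator; the class names, sizes, and policies are assumptions) shows how a speculative load can fill a side buffer that is only materialized into the cache at commit, so a squashed misspeculation leaves no visible cache change.

```python
# Toy model of hiding speculative loads behind a ghost buffer.
# Illustration only: class names and policies are assumptions, not the
# paper's actual microarchitecture.

class Cache:
    """Minimal cache tracking only which lines are present."""
    def __init__(self):
        self.lines = set()

    def access(self, addr):
        hit = addr in self.lines
        self.lines.add(addr)          # fill on miss (visible side-effect)
        return hit

class GhostBuffer:
    """Side buffer holding data fetched by speculative (ghost) loads."""
    def __init__(self):
        self.entries = set()

    def fill(self, addr):
        self.entries.add(addr)        # invisible to the cache hierarchy

    def squash(self, addrs):
        self.entries -= set(addrs)    # misspeculated data simply disappears

    def materialize(self, addr, cache):
        if addr in self.entries:      # commit: make the side-effect visible
            self.entries.discard(addr)
            cache.lines.add(addr)

cache, ghosts = Cache(), GhostBuffer()

# A speculative load to 0x40 fills only the ghost buffer.
ghosts.fill(0x40)
assert 0x40 not in cache.lines        # no visible cache change yet

# Case 1: the speculation is squashed -> no trace left in the cache.
ghosts.squash([0x40])
assert 0x40 not in cache.lines

# Case 2: a ghost load that later commits -> side-effect made visible.
ghosts.fill(0x80)
ghosts.materialize(0x80, cache)
assert cache.access(0x80)             # now hits, as after a normal load
print("cache contents:", sorted(hex(a) for a in cache.lines))
```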
{"title":"Ghost loads: what is the cost of invisible speculation?","authors":"Christos Sakalis, M. Alipour, Alberto Ros, A. Jimborean, S. Kaxiras, Magnus Själander","doi":"10.1145/3310273.3321558","DOIUrl":"https://doi.org/10.1145/3310273.3321558","url":null,"abstract":"Speculative execution is necessary for achieving high performance on modern general-purpose CPUs but, starting with Spectre and Meltdown, it has also been proven to cause severe security flaws. In case of a misspeculation, the architectural state is restored to assure functional correctness but a multitude of microarchitectural changes (e.g., cache updates), caused by the speculatively executed instructions, are commonly left in the system. These changes can be used to leak sensitive information, which has led to a frantic search for solutions that can eliminate such security flaws. The contribution of this work is an evaluation of the cost of hiding speculative side-effects in the cache hierarchy, making them visible only after the speculation has been resolved. For this, we compare (for the first time) two broad approaches: i) waiting for loads to become non-speculative before issuing them to the memory system, and ii) eliminating the side-effects of speculation, a solution consisting of invisible loads (Ghost loads) and performance optimizations (Ghost Buffer and Materialization). While previous work, InvisiSpec, has proposed a similar solution to our latter approach, it has done so with only a minimal evaluation and at a significant performance cost. The detailed evaluation of our solutions shows that: i) waiting for loads to become non-speculative is no more costly than the previously proposed InvisiSpec solution, albeit much simpler, non-invasive in the memory system, and stronger security-wise; ii) hiding speculation with Ghost loads (in the context of a relaxed memory model) can be achieved at the cost of 12% performance degradation and 9% energy increase, which is significantly better that the previous state-of-the-art solution.","PeriodicalId":431860,"journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134639974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Energy-efficient approximate least squares accelerator: a case study of radio astronomy calibration processing
G. Gillani, A. Krapukhin, A. Kokkeler
Approximate computing allows the introduction of inaccuracy into a computation in exchange for cost savings in energy consumption, chip area, and latency. Targeting energy efficiency, approximate designs for multipliers, adders, and multiply-accumulate (MAC) units have been extensively investigated over the past decade. However, accelerator designs for larger architectures have received far less attention. The Least Squares (LS) algorithm is widely used in digital signal processing applications, e.g., image reconstruction. This work proposes a novel LS accelerator design based on a heterogeneous architecture, where the heterogeneity is introduced using accurate and approximate processing cores. We consider a case study of radio astronomy calibration processing that employs a complex-input iterative LS algorithm. Our proposed methodology exploits the intrinsic error resilience of this algorithm: initial iterations are processed on approximate modules, while later ones run on accurate modules. Our energy-quality experiments show up to 24% energy savings compared to an accurate (optimized) counterpart for biased designs, and up to 29% energy savings when unbiasing is introduced. The proposed LS accelerator design does not increase the number of iterations and provides sufficient precision to converge to an acceptable solution.
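As a rough, software-only illustration of this accuracy-configurable scheme (ours, not the authors' hardware design; float16 rounding merely stands in for approximate processing cores), the sketch below runs an iterative least-squares solve in which early iterations are computed in degraded precision and later ones in full precision.

```python
import numpy as np

def ls_gradient_descent(A, b, iters=200, approx_iters=0, step=None):
    """Iterative least squares; the first `approx_iters` iterations emulate
    approximate cores by rounding the gradient to float16."""
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L, a safe step size
    x = np.zeros(A.shape[1])
    for k in range(iters):
        grad = A.T @ (A @ x - b)
        if k < approx_iters:                     # "approximate module"
            grad = grad.astype(np.float16).astype(np.float64)
        x = x - step * grad
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 10))
x_true = rng.standard_normal(10)
b = A @ x_true + 0.01 * rng.standard_normal(100)

x_mixed = ls_gradient_descent(A, b, iters=200, approx_iters=150)
x_exact = ls_gradient_descent(A, b, iters=200, approx_iters=0)
print("mixed-precision error:", np.linalg.norm(x_mixed - x_true))
print("full-precision error: ", np.linalg.norm(x_exact - x_true))
```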
{"title":"Energy-efficient approximate least squares accelerator: a case study of radio astronomy calibration processing","authors":"G. Gillani, A. Krapukhin, A. Kokkeler","doi":"10.1145/3310273.3323161","DOIUrl":"https://doi.org/10.1145/3310273.3323161","url":null,"abstract":"Approximate computing allows the introduction of inaccuracy in the computation for cost savings, such as energy consumption, chip-area, and latency. Targeting energy efficiency, approximate designs for multipliers, adders, and multiply-accumulate (MAC) have been extensively investigated in the past decade. However, accelerator designs for relatively bigger architectures have been of less attention yet. The Least Squares (LS) algorithm is widely used in digital signal processing applications, e.g., image reconstruction. This work proposes a novel LS accelerator design based on a heterogeneous architecture, where the heterogeneity is introduced using accurate and approximate processing cores. We have considered a case study of radio astronomy calibration processing that employs a complex-input iterative LS algorithm. Our proposed methodology exploits the intrinsic error-resilience of the aforesaid algorithm, where initial iterations are processed on approximate modules while the later ones on accurate modules. Our energy-quality experiments have shown up to 24% of energy savings as compared to an accurate (optimized) counterpart for biased designs and up to 29% energy savings when unbiasing is introduced. The proposed LS accelerator design does not increase the number of iterations and provides sufficient precision to converge to an acceptable solution.","PeriodicalId":431860,"journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132072261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Highway to HAL: open-sourcing the first extendable gate-level netlist reverse engineering framework
Sebastian Wallat, Nils Albartus, Steffen Becker, Max Hoffmann, Maik Ender, Marc Fyrbiak, Adrian Drees, Sebastian Maaßen, C. Paar
Since hardware oftentimes serves as the root of trust in our modern interconnected world, malicious hardware manipulations constitute a ubiquitous threat in the context of the Internet of Things (IoT). Hardware reverse engineering is a prevalent technique for detecting such manipulations. Over the last few years, an active research community has significantly advanced the field of hardware reverse engineering. Notably, many open research questions regarding the extraction of functionally correct netlists from Field Programmable Gate Arrays (FPGAs) or Application Specific Integrated Circuits (ASICs) have been tackled. In order to facilitate further analysis of recovered netlists, a software framework is required, serving as the foundation for specialized algorithms. Currently, no such framework is publicly available. Therefore, we provide the first open-source, gate-library-agnostic framework for gate-level netlist analysis. In this position paper, we demonstrate the workflow of our modular framework HAL on the basis of two case studies and provide insights into its technical foundations.
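To give a flavour of the analyses such a framework enables, here is a minimal, generic gate-level netlist sketch in plain Python (not HAL's actual API; the gate names and types are invented) that computes the fan-in cone of a net, a typical first step when hunting for suspicious logic.

```python
# Generic gate-level netlist sketch (not HAL's API): gates are records with a
# type, input nets, and an output net; we compute the fan-in cone of a net.
from collections import namedtuple

Gate = namedtuple("Gate", ["name", "gate_type", "inputs", "output"])

netlist = [
    Gate("u1", "AND2", ["a", "b"], "n1"),
    Gate("u2", "XOR2", ["n1", "c"], "n2"),
    Gate("u3", "DFF",  ["n2"], "q"),
    Gate("u4", "INV",  ["q"], "out"),
]

driver_of = {g.output: g for g in netlist}   # net -> gate driving it

def fanin_cone(net):
    """Names of all gates that transitively drive `net`."""
    cone, stack = set(), [net]
    while stack:
        g = driver_of.get(stack.pop())
        if g and g.name not in cone:
            cone.add(g.name)
            stack.extend(g.inputs)
    return cone

print("fan-in cone of 'out':", sorted(fanin_cone("out")))
# -> u1..u4; filtering the cone by gate_type == "DFF" would locate state elements
```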
{"title":"Highway to HAL: open-sourcing the first extendable gate-level netlist reverse engineering framework","authors":"Sebastian Wallat, Nils Albartus, Steffen Becker, Max Hoffmann, Maik Ender, Marc Fyrbiak, Adrian Drees, Sebastian Maaßen, C. Paar","doi":"10.1145/3310273.3323419","DOIUrl":"https://doi.org/10.1145/3310273.3323419","url":null,"abstract":"Since hardware oftentimes serves as the root of trust in our modern interconnected world, malicious hardware manipulations constitute a ubiquitous threat in the context of the Internet of Things (IoT). Hardware reverse engineering is a prevalent technique to detect such manipulations. Over the last years, an active research community has significantly advanced the field of hardware reverse engineering. Notably, many open research questions regarding the extraction of functionally correct netlists from Field Programmable Gate Arrays (FPGAs) or Application Specific Integrated Circuits (ASICs) have been tackled. In order to facilitate further analysis of recovered netlists, a software framework is required, serving as the foundation for specialized algorithms. Currently, no such framework is publicly available. Therefore, we provide the first open-source gate-library agnostic framework for gate-level netlist analysis. In this positional paper, we demonstrate the workflow of our modular framework HAL on the basis of two case studies and provide profound insights on its technical foundations.","PeriodicalId":431860,"journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers","volume":"060 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133816425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Iterative machine learning (IterML) for effective parameter pruning and tuning in accelerators
Xuewen Cui, Wu-chun Feng
With the rise of accelerators (e.g., GPUs, FPGAs, and APUs) in computing systems, the parallel computing community needs better tools and mechanisms with which to productively extract performance. While modern compilers provide flags to activate different optimizations, the effectiveness of such automated optimization depends on the algorithm and its mapping to the underlying accelerator architecture. Currently, however, extracting the best performance from an algorithm on an accelerator requires significant expertise and manual effort to exploit both spatial and temporal sharing of computing resources. In particular, maximizing the performance of an algorithm on an accelerator requires extensive hyperparameter (e.g., thread-block size) selection and tuning. Given the many hyperparameter dimensions to optimize across, the search space is generally extremely large, making it infeasible to exhaustively evaluate each configuration. This paper proposes an approach that uses statistical analysis with iterative machine learning (IterML) to prune and tune hyperparameters to achieve better performance. During each iteration, we leverage machine-learning (ML) models to provide pruning and tuning guidance for the subsequent iterations. We evaluate our IterML approach on the selection of the GPU thread-block size across many benchmarks running on an NVIDIA P100 or V100 GPU. The experimental results show that our IterML approach can significantly reduce the search effort, by 40% to 80%.
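A minimal sketch of such an iterate-predict-prune loop is shown below (our simplification: the synthetic runtime function and the random-forest surrogate are assumptions, not the paper's exact models), searching over GPU thread-block sizes.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def runtime(block_size):
    """Synthetic stand-in for measuring a kernel at a given thread-block size."""
    return 1.0 + 0.002 * (block_size - 192) ** 2 / 192 + 0.05 * np.random.rand()

all_sizes = list(range(32, 1025, 32))       # feasible thread-block sizes
candidates = list(all_sizes)
measured = {}                               # block size -> observed runtime

rng = np.random.default_rng(1)
for bs in rng.choice(candidates, size=5, replace=False):   # initial samples
    measured[int(bs)] = runtime(int(bs))

for _ in range(4):                          # iterative ML loop
    X = np.array(sorted(measured)).reshape(-1, 1)
    y = np.array([measured[b] for b in sorted(measured)])
    model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

    remaining = [b for b in candidates if b not in measured]
    preds = model.predict(np.array(remaining).reshape(-1, 1))
    keep = np.argsort(preds)[: max(1, len(remaining) // 2)]   # prune worst half
    candidates = sorted(measured) + [remaining[i] for i in keep]

    best_guess = remaining[int(np.argmin(preds))]             # tune: measure best guess
    measured[best_guess] = runtime(best_guess)

best = min(measured, key=measured.get)
print(f"best block size found: {best} (runtime {measured[best]:.3f})")
print(f"configurations measured: {len(measured)} of {len(all_sizes)}")
```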
{"title":"Iterative machine learning (IterML) for effective parameter pruning and tuning in accelerators","authors":"Xuewen Cui, Wu-chun Feng","doi":"10.1145/3310273.3321563","DOIUrl":"https://doi.org/10.1145/3310273.3321563","url":null,"abstract":"With the rise of accelerators (e.g., GPUs, FPGAs, and APUs) in computing systems, the parallel computing community needs better tools and mechanisms with which to productively extract performance. While modern compilers provide flags to activate different optimizations to improve performance, the effectiveness of such automated optimization depends on the algorithm and its mapping to the underlying accelerator architecture. Currently, however, extracting the best performance from an algorithm on an accelerator requires significant expertise and manual effort to exploit both spatial and temporal sharing of computing resources in order to improve overall performance. In particular, maximizing the performance on an algorithm on an accelerator requires extensive hyperparameter (e.g., thread-block size) selection and tuning. Given the myriad of hyperparameter dimensions to optimize across, the search space of optimizations is generally extremely large, making it infeasible to exhaustively evaluate each optimization configuration. This paper proposes an approach that uses statistical analysis with iterative machine learning (IterML) to prune and tune hyper-parameters to achieve better performance. During each iteration, we leverage machine-learning (ML) models to provide pruning and tuning guidance for the subsequent iterations. We evaluate our IterML approach on the selection of the GPU thread-block size across many benchmarks running on an NVIDIA P100 or V100 GPU. The experimental results show that our IterML approach can significantly reduce (i.e., improve) the search effort by 40% to 80%.","PeriodicalId":431860,"journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122371191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Realizing parallelism in quantum MISD architecture
Suvadip Batabyal, Kounteya Sarkar
We propose an idea to speed up instruction execution through a probabilistic approach, using the parallelism offered by quantum computers. For this, we divide the instruction set of an arbitrary quantum instruction set architecture (QISA) into separate groups and then bias the qubits representing a group so that only the instructions within that group have a high probability of being executed on a quantum processor. The generated result is then the superposition of the qubits, as if all the instructions within the group had been executed simultaneously. We show that this can achieve a significant design improvement compared to a classical computer.
{"title":"Realizing parallelism in quantum MISD architecture","authors":"Suvadip Batabyal, Kounteya Sarkar","doi":"10.1145/3310273.3322823","DOIUrl":"https://doi.org/10.1145/3310273.3322823","url":null,"abstract":"We propose an idea to speed up instruction execution through a probabilistic approach, using the parallelism offered by quantum computers. For this, we divide the instruction set of an arbitrary quantum instruction set architecture (QISA) into separate groups and then bias certain qubits representing the group so that only the instructions within the group have a high probability of getting executed in a quantum processor. Therefore, the result generated will be the superimposition of the qubits as if all the instructions within the group were executed simultaneously. We show that we can achieve a significant design improvement compared to classical computer.","PeriodicalId":431860,"journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121096294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The FitOptiVis ECSEL project: highly efficient distributed embedded image/video processing in cyber-physical systems
Z. Al-Ars, T. Basten, A. D. Beer, M. Geilen, Dip Goswami, P. Jääskeläinen, J. Kadlec, M. Alejandro, F. Palumbo, G. Peeren, L. Pomante, F. V. Linden, Jukka Saarinen, T. Säntti, Carlo Sau, M. Zedda
Cyber-Physical Systems (CPS) are systems that are in feedback with their environment, possibly with humans in the loop. They are often distributed, with sensors and actuators; they are smart, adaptive, and predictive; and they react in real time. Image- and video-processing pipelines are a prime source of environmental information, improving the possibilities for active, relevant feedback. In this context, FitOptiVis aims to provide end-to-end multi-objective optimization for the imaging and video pipelines of CPS, with an emphasis on energy and performance, building on a reference architecture supported by low-power, high-performance, smart devices and by methods and tools for combined design-time and run-time multi-objective optimization within system and environment constraints.
{"title":"The FitOptiVis ECSEL project: highly efficient distributed embedded image/video processing in cyber-physical systems","authors":"Z. Al-Ars, T. Basten, A. D. Beer, M. Geilen, Dip Goswami, P. Jääskeläinen, J. Kadlec, M. Alejandro, F. Palumbo, G. Peeren, L. Pomante, F. V. Linden, Jukka Saarinen, T. Säntti, Carlo Sau, M. Zedda","doi":"10.1145/3310273.3323437","DOIUrl":"https://doi.org/10.1145/3310273.3323437","url":null,"abstract":"Cyber-Physical Systems (CPS) are systems that are in feedback with their environment, possibly with humans in the loop. They are often distributed with sensors and actuators, smart, adaptive and predictive and react in real-time. Image- and video-processing pipelines are a prime source for environmental information improving the possibilities of active, relevant feedback. In such a context, FitOptiVis aims to provide end-to-end multi-objective optimization for imaging and video pipelines of CPS, with emphasis on energy and performance, leveraging on a reference architecture, supported by low-power, high-performance, smart devices, and by methods and tools for combined design-time and run-time multi-objective optimization within system and environment constraints.","PeriodicalId":431860,"journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers","volume":"164 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114480535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the limitations of the chimera graph topology in using analog quantum computers
D. Vert, Renaud Sirdey, Stéphane Louise
This paper investigates the possibility of using an analog quantum computer, as commercialized by D-Wave, to solve large QUBO problems by means of a single invocation of the quantum annealer. Indeed, this machine solves a spin glass problem with programmable coefficients, but subject to quite strong topology restrictions on the set of non-zero coefficients. Rather than mapping problem variables onto multiple qubits, an approach which requires many invocations of the annealer to solve small problems, it is tempting to investigate the existence of sparse relaxations compliant with the qubit interconnection topology of the machine, hence solvable in one invocation of the annealing oracle, yet still providing good-quality solutions to the original problem. This paper provides an experimental setup which aims to determine whether such convenient relaxations exist or, at least, are easy to find. Our experiments suggest that this is not the case and, therefore, that solving even moderate-size arbitrary problems with a single call to a quantum annealer is not possible, at least within the constraints of the so-called Chimera topology. We conclude the paper with a number of perspectives that these results imply for the design of heuristics that take advantage of a quantum annealing oracle to solve large-scale problems.
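A miniature version of this experiment can be sketched as follows (our own illustration: a simple ring topology stands in for the Chimera graph, and exhaustive search stands in for the annealer). It builds a dense random QUBO, derives a topology-compliant sparse relaxation by dropping the disallowed couplings, and scores the relaxation's optimum on the original objective.

```python
import itertools
import numpy as np

def qubo_value(Q, x):
    return x @ Q @ x                     # x in {0,1}^n, Q upper-triangular

def brute_force(Q):
    """Exhaustive QUBO minimisation (stand-in for the annealing oracle)."""
    n = Q.shape[0]
    best_x, best_v = None, np.inf
    for bits in itertools.product([0, 1], repeat=n):
        x = np.array(bits)
        v = qubo_value(Q, x)
        if v < best_v:
            best_x, best_v = x, v
    return best_x, best_v

rng = np.random.default_rng(42)
n = 10
Q = rng.standard_normal((n, n))
Q = np.triu(Q + Q.T)                     # dense random QUBO

# Sparse "hardware" topology: a ring, used as a toy stand-in for Chimera.
allowed = np.eye(n, dtype=bool)          # linear terms always allowed
for i in range(n):
    j = (i + 1) % n
    allowed[min(i, j), max(i, j)] = True
Q_sparse = np.where(allowed, Q, 0.0)     # relaxation: drop disallowed couplings

x_opt, v_opt = brute_force(Q)            # true optimum of the original problem
x_rel, _ = brute_force(Q_sparse)         # optimum of the compliant relaxation

print("original optimum:             ", v_opt)
print("relaxation scored on original:", qubo_value(Q, x_rel))
```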
{"title":"On the limitations of the chimera graph topology in using analog quantum computers","authors":"D. Vert, Renaud Sirdey, Stéphane Louise","doi":"10.1145/3310273.3322830","DOIUrl":"https://doi.org/10.1145/3310273.3322830","url":null,"abstract":"This paper investigates the possibility of using an analog quantum computer as commercialized by D-Wave to solve large QUBO problems by means of a single invocation of the quantum annealer. Indeed this machine solves a spin glass problem with programmable coefficients but subject to quite strong topology restrictions on the set of non-zero coefficients. Rather than mapping problem variables onto multiple qbits, an approach which requires many invocations of the annealer to solve small size problems, it is tempting to investigate the existence of sparse relaxations compliant with the qbits interconnection topology of the machine, hence solvable in one invocation of the annealing oracle, but still providing good-quality solutions to the original problem. This paper provides an experimental setup which aims to determine whether or not such convenient relaxations do exist or, rather, are easy to find. Our experiments suggest that it is not the case and, therefore, that solving even moderate size arbitrary problems with a single call to a quantum annealer is not possible at least within the constraints of the so-called Chimera topology. We conclude the paper with a number of perspectives that this results imply on the design of heuristics taking profit of a quantum annealing oracle to solve large scale problems.","PeriodicalId":431860,"journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125352015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Artificial intelligent sensors at the core of cyber-physical-systems: from theory to practical applications
D. Pau
Cyber-Physical Systems (CPS) are rapidly becoming more pervasive in embedded systems. Artificial Intelligence, Machine Learning, and Deep Learning are mostly confined to the cloud, where seemingly unlimited computing resources are available and evolving tirelessly. Unfortunately, a layered architecture in which dumb sensors are attached to the cloud would quickly become too centralized, poorly scalable, and slow to respond in the expected IoT scenario of hundreds of billions of sensors communicating through low-data-rate networks. In that context, STMicroelectronics is developing solutions to bring Artificial Intelligence closer to the sensors. This talk will review new intelligent technological solutions and mechanisms under development and publicly announced, namely STM32CUBE.AI. The talk will describe how they represent the key ingredients needed to design the current and future generations of artificially intelligent cyber-physical embedded systems and derived applications based on STMicroelectronics' heterogeneous sensors, microcontrollers, and SoCs. In particular, how to address current gaps in interoperability, productivity, and constrained embedded resources will be discussed with practical examples based on STM32CUBE.AI. Moreover, the investigation and design of adaptive and cognitive computational-intelligence techniques, able to learn using artificial neural networks and to operate in non-stationary environments, will be introduced. Finally, the deployment of networked intelligent cyber-physical systems able to operate in time-varying environments will also be discussed.
{"title":"Artificial intelligent sensors at the core of cyber-physical-systems: from theory to practical applications","authors":"D. Pau","doi":"10.1145/3310273.3324019","DOIUrl":"https://doi.org/10.1145/3310273.3324019","url":null,"abstract":"Cyber-Physical Systems (CPS) are becoming, without pace, more pervasive into embedded systems. Artificial Intelligence, Machine Learning and Deep Learning are mostly confined into the cloud, where unlimited computing resources seems to be available and evolving tirelessly. Unfortunately a layered architecture in which dumb sensors are attached to the cloud would become quickly too centralized, poorly scalable and slowly responsive in the IoT expected scenario that will deploy hundreds of billions of sensors communicating through low data rate networks. In that context, STMicroelectronics is developing solutions to bring Artificial Intelligence closer to the sensors. This talk will review new intelligent technological solutions and mechanisms under development and publicly announced, namely STM32CUBE.AI. The talk will tell how they represent the key ingredients needed to design the current and future generation of artificial intelligent cyber-physical embedded systems and derived applications based on STMicroelectronics heterogeneous sensors, micro controllers and SoCs. In particular, aspects related on how address current interoperability, productivity and constrained embedded resource gaps will be discussed with practical examples based on STM32CUBE.AI. Moreover, the investigation and design of adaptive and cognitive computational-intelligence techniques able to learn, adopting artificial neural networks, and operate in nonstationary environments will be introduced. Finally, the deployment of networked intelligent cyber-physical systems, able to operate in time varying environments, will be also commented.","PeriodicalId":431860,"journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117301897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sentiment evaluation of forex news
Zhou Cheng, T. Qi, Jixiang Wang, Yu Zhou, Zhihong Wang, Yi Guo, Junfeng Zhao
Sentiment analysis is important for extracting opinions from text. Two issues arise in the foreign exchange (Forex) field. 1) Regarding sentiment orientation, most research focuses on product reviews and lacks fine-grained sentiment analysis for Forex news. 2) Regarding sentiment intensity, most works consider the intensity of sentiment words but ignore the significance of domain characteristics. To address these two problems, a fine-grained Sentiment Analysis model (abbreviated WD-SA) is established, which integrates the Weight of sentiment words and Domain features. First, the semantic information of the text is embedded into a vector based on word2vec. Then, sentiment orientation is detected by a method that combines a machine learning algorithm with the weights of sentiment words. Finally, features are extracted to estimate the intensity of the news. The experimental results show that our algorithm outperforms the state of the art.
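The pipeline described above can be sketched roughly as follows (our illustration with toy data; the tiny embedding table, lexicon weights, and domain-feature boosts are placeholders for word2vec vectors and the actual WD-SA features).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for word2vec embeddings and a weighted sentiment lexicon.
embedding = {"dollar": [0.9, 0.1], "rises": [0.8, 0.6], "falls": [-0.7, 0.4],
             "sharply": [0.1, 0.9], "euro": [0.8, 0.2], "weakens": [-0.8, 0.3]}
lexicon = {"rises": +1.0, "falls": -1.0, "weakens": -0.8, "sharply": 0.0}
domain_terms = {"sharply": 1.5}          # Forex-specific intensity boosters

def features(text):
    words = text.lower().split()
    vec = np.mean([embedding.get(w, [0.0, 0.0]) for w in words], axis=0)
    sent = sum(lexicon.get(w, 0.0) for w in words)   # weighted lexicon score
    return np.concatenate([vec, [sent]])

train = [("dollar rises sharply", 1), ("euro falls", 0),
         ("euro rises", 1), ("dollar weakens sharply", 0)]
X = np.array([features(t) for t, _ in train])
y = np.array([label for _, label in train])

clf = LogisticRegression().fit(X, y)     # orientation: positive vs negative news

def intensity(text):
    """Lexicon weight amplified by domain characteristics (e.g., 'sharply')."""
    words = text.lower().split()
    boost = np.prod([domain_terms.get(w, 1.0) for w in words])
    return abs(sum(lexicon.get(w, 0.0) for w in words)) * boost

headline = "euro weakens sharply"
print("orientation:", clf.predict([features(headline)])[0])
print("intensity:  ", intensity(headline))
```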
{"title":"Sentiment evaluation of forex news","authors":"Zhou Cheng, T. Qi, Jixiang Wang, Yu Zhou, Zhihong Wang, Yi Guo, Junfeng Zhao","doi":"10.1145/3310273.3322821","DOIUrl":"https://doi.org/10.1145/3310273.3322821","url":null,"abstract":"Sentiment analysis is significant for excavating text opinion. There are two issues in the foreign exchange (Forex) field. 1) In sentiment orientation, most researches focus on product reviews, lack fine-grained sentiment analysis for Forex news. 2) In sentiment intensity, most works consider the intensity of sentiment words but ignore the significance of field characteristics. Aiming at the two problems, a fine-grained Sentiment Analysis model (shorted as WD-SA) is established, which integrates with the Weight of sentiment words and Domain features. First, the semantic information of text is embedded into a vector based on word2vec. Then, sentiment orientation is detected by a method, which combines machine learning algorithm and the weight of sentiment words. Finally, features are extracted to investigate the intensity of news. The experimental results show that our algorithm outperforms the state-of-the-art.","PeriodicalId":431860,"journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers","volume":"193 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134122213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Simulation with skeletons of applications using dimemas
C. Camarero, C. Martínez, J. L. Bosque
Large computer systems, like those in the TOP500 ranking, comprise hundreds of thousands of cores. Simulating application execution on these systems is very complex and costly. This article explores the option of using application skeletons, together with an analytic simulator, to study the performance of these large systems. To this end, the Dimemas simulator has been enhanced with the capability of simulating application skeletons. This enhancement allows the skeleton of Lulesh, an application with 90k processes, to be simulated in a single day. In addition, it also generates traces, which is of great value for validating skeletons and simulations.
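The skeleton idea can be conveyed with a toy analytic model (ours, not Dimemas itself, and it ignores the blocking and synchronization that a real simulator models): a skeleton records only the structure of computation and communication phases, and a simple latency/bandwidth model replays it.

```python
# Toy skeleton-driven analytic simulator (illustration only, not Dimemas).
# A skeleton is a per-process list of events: ("compute", seconds) or
# ("send"/"recv", peer, bytes). Each message is charged latency + size/bandwidth,
# and the slowest process gives the predicted runtime.

LATENCY = 1e-6          # assumed network latency (s)
BANDWIDTH = 10e9        # assumed network bandwidth (B/s)

def message_time(nbytes):
    return LATENCY + nbytes / BANDWIDTH

def simulate(skeletons):
    """Return the predicted makespan of the skeleton application."""
    finish = []
    for events in skeletons:
        t = 0.0
        for ev in events:
            if ev[0] == "compute":
                t += ev[1]
            else:                        # "send" or "recv"
                t += message_time(ev[2])
        finish.append(t)
    return max(finish)

# Two-process stencil-like skeleton: compute, exchange halos, compute again.
skeletons = [
    [("compute", 0.010), ("send", 1, 8192), ("recv", 1, 8192), ("compute", 0.010)],
    [("compute", 0.012), ("send", 0, 8192), ("recv", 0, 8192), ("compute", 0.012)],
]
print(f"predicted runtime: {simulate(skeletons) * 1e3:.3f} ms")
```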
{"title":"Simulation with skeletons of applications using dimemas","authors":"C. Camarero, C. Martínez, J. L. Bosque","doi":"10.1145/3310273.3322827","DOIUrl":"https://doi.org/10.1145/3310273.3322827","url":null,"abstract":"Large computer systems, like those in the TOP 500 ranking, comprise about hundreds of thousands cores. Simulating application execution in these systems is very complex and costly. This article explores the option of using application skeletons, together with an analytic simulator, to study the performance of these large systems. With this aim, the Dimemas simulator has been enhanced with the capability of simulating application skeletons. This enhancement allows simulating the skeleton of Lulesh, an application with 90k processes in a single day. In addition, it also generates traces, which is of great value to validate skeletons and simulations.","PeriodicalId":431860,"journal":{"name":"Proceedings of the 16th ACM International Conference on Computing Frontiers","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131383676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}