Title: Non-volatile Memory Driver for Applying Automated Tiered Storage with Fast Memory and Slow Flash Storage
Authors: Kazuichi Oe, T. Nanri
Pub Date: 2018-11-01 | DOI: 10.1109/CANDARW.2018.00029
Automated tiered storage with fast memory and slow flash storage (ATSMF) is a hybrid storage system that sits between non-volatile memories (NVMs) and solid-state drives (SSDs). ATSMF aims to reduce the average response time of input/output (IO) accesses by migrating areas of concentrated IO access from SSD to NVM. However, the current ATSMF implementation cannot reduce the average response time sufficiently because of a bottleneck in the Linux brd driver, which is used as the NVM access driver: its response time is more than ten times the raw memory access latency. To remove this bottleneck, we developed a block-level driver for NVM called the "two-mode (2M) memory driver." The 2M memory driver provides both a map IO access mode and a direct IO access mode, reducing response time while maintaining compatibility with the Linux device-mapper framework. The direct IO access mode has a drastically lower response time than the Linux brd driver because the ATSMF driver can call the IO access function of the 2M memory driver directly. Experimental results indicate that ATSMF with the 2M memory driver achieves lower IO access response times than ATSMF with the Linux brd driver in most cases.
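The abstract's argument is ultimately arithmetic: once hot areas live on NVM, the NVM driver's own latency dominates the average response time, so a driver path more than ten times slower than raw memory access wipes out much of the benefit of migration. A minimal sketch of that average-response-time model follows; all latency numbers are illustrative assumptions, not measurements from the paper.

```python
# Hedged sketch: average IO response time of a two-tier (NVM + SSD) store.
# All latencies are hypothetical placeholders, not figures from the paper.

MEM_ACCESS_US = 0.2    # assumed raw memory access latency (us)
BRD_DRIVER_US = 2.5    # brd path: >10x memory access, per the abstract
DIRECT_MODE_US = 0.3   # assumed 2M-driver direct IO mode latency (us)
SSD_US = 80.0          # assumed SSD read latency (us)

def avg_response_us(hot_fraction, nvm_latency_us, ssd_latency_us=SSD_US):
    """Average response time when `hot_fraction` of IOs hit areas
    migrated to NVM and the rest still go to the SSD."""
    return hot_fraction * nvm_latency_us + (1.0 - hot_fraction) * ssd_latency_us

for hot in (0.5, 0.9):
    print(f"hot={hot:.0%}  brd: {avg_response_us(hot, BRD_DRIVER_US):6.2f} us"
          f"  direct: {avg_response_us(hot, DIRECT_MODE_US):6.2f} us")
```

The larger the migrated fraction, the more the driver path's own latency matters, which is why replacing brd pays off most under heavy IO concentration.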
{"title":"Non-volatile Memory Driver for Applying Automated Tiered Storage with Fast Memory and Slow Flash Storage","authors":"Kazuichi Oe, T. Nanri","doi":"10.1109/CANDARW.2018.00029","DOIUrl":"https://doi.org/10.1109/CANDARW.2018.00029","url":null,"abstract":"Automated tiered storage with fast memory and slow flash storage (ATSMF) is a hybrid storage system located between non-volatile memories (NVMs) and solid state drives (SSDs). ATSMF aims to reduce average response time for inputoutput (IO) accesses by migrating concentrated IO access areas from SSD to NVM. However, the current ATSMF implementation cannot reduce average response time sufficiently because of the bottleneck caused by the Linux brd driver, which is used for the NVM access driver. The response time of the brd driver is more than ten times larger than memory access speed. To reduce the average response time sufficiently, we developed a block-level driver for NVM called a \"two-mode (2M) memory driver.\" The 2M memory driver has both the. map IO access mode and direct IO access mode to reduce the response time while maintaining compatibility with the Linux device-mapper framework. The direct IO access mode has a drastically lower response time than the Linux brd driver because the ATSMF driver can execute the IO access function of 2M memory driver directly. Experimental results also indicate that ATSMF using the 2M memory driver reduces the IO access response time to less than that of ATSMF using the Linux brd driver in most cases.","PeriodicalId":329439,"journal":{"name":"2018 Sixth International Symposium on Computing and Networking Workshops (CANDARW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114495075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Discovering New Malware Families Using a Linguistic-Based Macros Detection Method
Authors: Hiroya Miura, M. Mimura, Hidema Tanaka
Pub Date: 2018-11-01 | DOI: 10.1109/CANDARW.2018.00085
In recent years, the number of targeted email attacks using malicious macros has been increasing. Malicious macros are malware written in Visual Basic for Applications. Because the source code of malicious macros is typically highly obfuscated, it contains many obfuscated words such as random numbers or strings. New malware families are discovered frequently. To detect unseen malicious macros, previous work proposed a method using natural language processing techniques: it separates a macro's source code into words and detects malicious macros based on word appearance frequency. This method can detect unseen malicious macros, but the unseen samples might still belong to known malware families, and the mechanism and effectiveness of the method were not clear. Since detecting new malware families is a top priority, this paper investigates the mechanism and effectiveness of the method for that task. Our experiments show that using only malicious macros for feature extraction and consolidating obfuscated words into a single token were both effective. We confirmed that the method can discover 89% of new malware families.
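As a concrete illustration of the described pipeline (word separation, consolidation of obfuscated words, frequency-based detection), here is a minimal bag-of-words sketch. The tokenizer, the obfuscation heuristic, the toy corpus, and the classifier choice are assumptions for illustration only and may differ from the paper's actual feature extraction.

```python
# Minimal sketch (not the authors' code): frequency-based detection of
# malicious VBA macros with obfuscated words consolidated into one token.
import re
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def tokenize(src):
    tokens = []
    for t in WORD.findall(src):
        # Heuristic stand-in for the paper's consolidation step: words that
        # look like random identifiers collapse into a single token.
        if any(ch.isdigit() for ch in t) and len(t) > 6:
            tokens.append("<OBF>")
        else:
            tokens.append(t.lower())
    return tokens

# Hypothetical two-document corpus, purely for illustration.
train_src = [
    'Sub AutoOpen() q9x7kd3 = Chr(104) & Chr(116): Shell q9x7kd3 End Sub',
    'Sub Main() MsgBox "monthly report generated" End Sub',
]
train_lbl = [1, 0]  # 1 = malicious, 0 = benign

vec = CountVectorizer(tokenizer=tokenize, lowercase=False, token_pattern=None)
clf = MultinomialNB().fit(vec.fit_transform(train_src), train_lbl)
print(clf.predict(vec.transform(['Sub X() a81bz92q = Chr(99) End Sub'])))
```

Collapsing random-looking identifiers into one `<OBF>` token is what lets frequency statistics generalize across samples whose obfuscated strings never repeat.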
Title: Improving Apache Spark's Cache Mechanism with LRC-Based Method Using Bloom Filter
Authors: Hideo Inagaki, Ryota Kawashima, H. Matsuo
Pub Date: 2018-11-01 | DOI: 10.1109/CANDARW.2018.00096
Memory-and-Disk caching is a common caching mechanism for temporary output in Apache Spark. However, it degrades performance when memory usage reaches its limit because of Spark's LRU (Least Recently Used) cache management. Existing studies have proposed replacing the LRU-based cache mechanism with an LRC (Least Reference Count) based one, since the reference count is a more accurate indicator of the likelihood of future data access. However, frequently used partitions still cannot be determined precisely, because Spark accesses all partitions for user-driven RDD operations even when a partition does not contain the necessary data. In this paper, we propose a cache management method that keeps the necessary partitions in memory by introducing a Bloom filter into existing methods. The Bloom filter prevents unnecessary partitions from being processed, because each partition is first checked for whether it can contain the required data. Frequently used partitions can then be determined properly by measuring partition reference counts. We implemented two architecture variants, a driver-side Bloom filter and an executor-side Bloom filter, to identify the optimal placement of the filter. Evaluation results show that the execution time of the driver-side implementation was reduced by 89% relative to the LRC-based method in a filter-test benchmark.
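The following sketch combines the two ideas under stated assumptions: reference counts drive eviction (LRC), and a per-partition Bloom filter lets a lookup skip partitions that cannot contain the requested key. It is an illustrative model, not Spark's BlockManager or the authors' implementation.

```python
# Illustrative sketch: LRC eviction plus per-partition Bloom filters.
# Not Spark internals; data structures are simplified stand-ins.
import hashlib

class BloomFilter:
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, key):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all(self.bits >> pos & 1 for pos in self._positions(key))

class LRCCache:
    def __init__(self, capacity):
        self.capacity, self.parts, self.refs, self.blooms = capacity, {}, {}, {}

    def put(self, pid, records):
        if len(self.parts) >= self.capacity:
            victim = min(self.refs, key=self.refs.get)  # least reference count
            for d in (self.parts, self.refs, self.blooms):
                d.pop(victim)
        self.parts[pid], self.refs[pid] = records, 0
        bf = BloomFilter()
        for key, _ in records:
            bf.add(key)
        self.blooms[pid] = bf

    def lookup(self, key):
        for pid, bf in self.blooms.items():
            if not bf.might_contain(key):
                continue  # skip partitions that cannot hold the key
            self.refs[pid] += 1
            for k, v in self.parts[pid]:
                if k == key:
                    return v
        return None

cache = LRCCache(capacity=2)
cache.put(0, [("a", 1), ("b", 2)])
cache.put(1, [("c", 3)])
print(cache.lookup("c"))  # touches only partition 1; the filter skips 0
```

Because Bloom filters have no false negatives, skipping a partition on a negative answer is always safe; false positives only cost an unnecessary scan.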
{"title":"Improving Apache Spark's Cache Mechanism with LRC-Based Method Using Bloom Filter","authors":"Hideo Inagaki, Ryota Kawashima, H. Matsuo","doi":"10.1109/CANDARW.2018.00096","DOIUrl":"https://doi.org/10.1109/CANDARW.2018.00096","url":null,"abstract":"Memory-and-Disk caching is a common caching mechanism for temporal output in Apache Spark. However, it causes performance degradation when memory usage has reached its limit because of the Spark's LRU (Least Recently Used) based cache management. Existing studies have reported that replacement of LRU-based cache mechanism to LRC (Least Reference Count) based one that is a more accurate indicator of the likelihood of future data access. However, frequently used partitions cannot be determined because Spark accesses all of partitions for user-driven RDD operations, even if partitions do not include necessary data. In this paper, we propose a cache management method that enables allocating necessary partitions to the memory by introducing the bloom filter into existing methods. The bloom filter prevents unnecessary partitions from being processed because partitions are checked whether required data is contained. Furthermore, frequently used partitions can be properly determined by measuring the reference count of partitions. We implemented two architecture types, the driver-side bloom filter and the executor-side bloom filter, to consider the optimal place of the bloom filter. Evaluation results showed that the execution time of the driver-side implementation was reduced by 89% in a filter-test benchmark based on the LRC-based method.","PeriodicalId":329439,"journal":{"name":"2018 Sixth International Symposium on Computing and Networking Workshops (CANDARW)","volume":"243 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121029368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Employing Genetic Algorithm and Particle Filtering as an Alternative for Indoor Device Positioning
Authors: Guilherme David Branco, J. Bordim
Pub Date: 2018-11-01 | DOI: 10.1109/CANDARW.2018.00009
Radio signals can enable seamless interaction with physical objects, for instance by guiding users from their position to a particular object within a room or store. To achieve this, a mechanism is needed that allows users to identify and locate objects of interest. Trilateration, fingerprinting, and particle filters are the mechanisms usually employed for position estimation in indoor environments. This paper explores the use of a Genetic Algorithm (GA) combined with a Particle Filter (PF) as an alternative way to estimate indoor object positions. The proposed scheme, named EPF (Evolutionary Particle Filter), is compared against a particle filter and trilateration. Simulation results show that EPF improves positioning accuracy by 1.5 cm (10%) and 30 cm (300%) over the particle filter and trilateration, respectively.
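A hedged sketch of what a GA-augmented particle filter for range-based indoor positioning could look like: particles are weighted by how well they explain measured distances to fixed anchors, and a GA step (fitness-proportional selection, arithmetic crossover, Gaussian mutation) replaces plain resampling. The anchor layout, noise levels, and GA operators are assumptions; the paper's EPF design may differ.

```python
# Hedged sketch of an evolutionary particle filter (GA operators in place
# of plain resampling). All parameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])  # assumed beacons
true_pos = np.array([3.0, 4.0])
measured = np.linalg.norm(anchors - true_pos, axis=1) + rng.normal(0, 0.2, 3)

particles = rng.uniform(0, 10, size=(200, 2))
for _ in range(30):
    # Fitness: likelihood of the measured anchor distances for each particle
    # (sigma = 1.0 is an assumed measurement-noise scale for weighting).
    dists = np.linalg.norm(particles[:, None, :] - anchors[None], axis=2)
    err = ((dists - measured) ** 2).sum(axis=1)
    weights = np.exp(-err / 2.0)
    weights /= weights.sum()

    # GA step: fitness-proportional parent selection + arithmetic crossover.
    idx_a = rng.choice(len(particles), size=len(particles), p=weights)
    idx_b = rng.choice(len(particles), size=len(particles), p=weights)
    alpha = rng.uniform(size=(len(particles), 1))
    children = alpha * particles[idx_a] + (1 - alpha) * particles[idx_b]
    children += rng.normal(0, 0.05, children.shape)  # Gaussian mutation
    particles = children

print("estimate:", particles.mean(axis=0), "truth:", true_pos)
```

Crossover lets the population jump between high-likelihood regions rather than only diffusing locally, which is the usual motivation for evolutionary variants of particle filtering.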
{"title":"Employing Genetic Algorithm and Particle Filtering as an Alternative for Indoor Device Positioning","authors":"Guilherme David Branco, J. Bordim","doi":"10.1109/CANDARW.2018.00009","DOIUrl":"https://doi.org/10.1109/CANDARW.2018.00009","url":null,"abstract":"Radio signals may contribute to seamless interactions with physical objects providing means to guide users from their position to a particular object within a room or store for instance. To achieve such a goal, a mechanism is needed to allow users to identify and locate objects of interest. Trilateration, fingerprinting and particle filter are usually employed as mechanisms for position estimation in indoor environments. This paper explores the the use of Genetic Algorithms (GA) combined with Particle Filter (PF) mechanism as an alternative to estimate indoor object position. The proposed scheme, named EPF (Evolutionary Particle Filter) has been compared to particle filter and trilateration. Simulation results show that the proposed EPF improves positioning accuracy by 1.5 cm (10%) and 30 cm (300%) over particle filter and trilateration, respectively.","PeriodicalId":329439,"journal":{"name":"2018 Sixth International Symposium on Computing and Networking Workshops (CANDARW)","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126274887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Towards Large Scale Packet Capture and Network Flow Analysis on Hadoop
Authors: M. Z. N. L. Saavedra, W. E. Yu
Pub Date: 2018-11-01 | DOI: 10.1109/CANDARW.2018.00043
Network traffic continues to grow at a compounded yearly rate. However, network traffic is still commonly analyzed on vertically scaled machines, which do not scale as well as distributed computing platforms. Hadoop's horizontally scalable ecosystem provides a better environment for processing network captures stored in packet capture (PCAP) files. This paper proposes a framework called hcap for analyzing PCAPs on Hadoop, inspired by the existing hadoop-pcap library from Réseaux IP Européens (RIPE) but built completely from the ground up. The hcap framework improves several aspects of the hadoop-pcap library, namely protocol, error, and log handling. Results show that, while other methods still outperform hcap, it performs better than hadoop-pcap by 15% in scan queries and 18% in join queries. It is also more tolerant of broken PCAP entries, which reduces preprocessing time and data loss, and it speeds up the conversion process used in other methods by 85%.
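The tolerance claim concerns not aborting a whole job when a capture file contains truncated or corrupt records. Below is a standalone sketch of that behavior on the classic libpcap format (24-byte global header, 16-byte per-record headers); the skip/stop heuristics are assumptions, not hcap's actual logic.

```python
# Hedged sketch: iterate libpcap records, skipping broken entries instead
# of failing. Heuristics are illustrative; hcap's real handling may differ.
import struct

def iter_pcap_records(data, snaplen_cap=262144):
    # Check the little-endian pcap magic number (big-endian files omitted
    # here for brevity).
    if len(data) < 24 or struct.unpack("<I", data[:4])[0] != 0xA1B2C3D4:
        return
    off = 24  # skip the 24-byte global header
    while off + 16 <= len(data):
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack_from("<IIII", data, off)
        off += 16
        if incl_len > snaplen_cap or incl_len > len(data) - off:
            break  # implausible or truncated record: stop instead of raising
        yield ts_sec, ts_usec, data[off:off + incl_len]
        off += incl_len

# Usage: count records while tolerating a truncated tail.
# with open("capture.pcap", "rb") as f:          # hypothetical file
#     n = sum(1 for _ in iter_pcap_records(f.read()))
```

A reader that degrades gracefully like this salvages every record before the corruption point, which is where the reported reduction in preprocessing loss comes from.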
Title: Suppressing Chain Size of Blockchain-Based Information Sharing for Swarm Robotic Systems
Authors: Y. Nishida, Kosuke Kaneko, Subodh Sharma, K. Sakurai
Pub Date: 2018-11-01 | DOI: 10.1109/CANDARW.2018.00102
Swarm robotics is a research field in which a group of autonomous robots executes tasks cooperatively. Sharing information among robots is central to optimal system performance. Because the swarm's network structure changes constantly as the robots move, it is difficult to guarantee that information is shared with all swarm members. In this work, we propose an approach to information sharing in swarm robotic systems based on blockchain technology. The distributed ledger of a blockchain has the potential to solve the information-sharing problem and to let robots easily synchronize their state. However, because a blockchain persistently keeps all past transactions, the growth of its chain size is one of the serious issues in operating blockchain technology. In this paper, we introduce a methodology for sharing information among autonomous robots and demonstrate through experiments how the size of the data recorded in the blockchain affects the chain size. Compared with our previous approach, the proposed approach suppressed the growth in chain size, reducing the increase by about 73.0% when each node repeatedly shared roughly 2.8 KB of image data 100 times.
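The abstract reports the chain-size effect but does not spell out the suppression mechanism. One standard way to suppress growth is to record a fixed-size digest of the shared image on-chain and keep the image itself off-chain; the toy sketch below illustrates that tradeoff. It is an assumed mechanism for illustration, not the authors' implementation.

```python
# Toy chain-size comparison: storing a ~2.8 KB payload per block versus a
# fixed-size SHA-256 digest with the image kept off-chain. Illustrative only.
import hashlib, json

def build_chain(payloads):
    chain, prev = [], "0" * 64
    for p in payloads:
        block = {"prev": prev, "data": p}
        prev = hashlib.sha256(json.dumps(block).encode()).hexdigest()
        chain.append(block)
    return chain

image = "x" * 2800                      # stand-in for ~2.8 KB of image data
full = build_chain([image] * 100)       # payload on-chain, shared 100 times
hashed = build_chain([hashlib.sha256(image.encode()).hexdigest()] * 100)

size = lambda c: sum(len(json.dumps(b)) for b in c)
print(f"on-chain payloads: {size(full)} bytes, digests: {size(hashed)} bytes")
```

Because a digest has constant size, chain growth becomes independent of the shared data's size, at the cost of needing a separate channel to fetch the payload itself.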
{"title":"Suppressing Chain Size of Blockchain-Based Information Sharing for Swarm Robotic Systems","authors":"Y. Nishida, Kosuke Kaneko, Subodh Sharma, K. Sakurai","doi":"10.1109/CANDARW.2018.00102","DOIUrl":"https://doi.org/10.1109/CANDARW.2018.00102","url":null,"abstract":"Swarm robotics is a research field in which a group of autonomous robots execute tasks through cooperative works. Sharing information among robots is a central function for an optimal performance of the system. Given that the swarm network structure constantly changes when robots move, it becomes difficult to guarantee on information sharing by all swarm members. We, in this work, propose an approach for information sharing on swarm robotic systems by using Blockchain technology. A function of distributed ledger in Blockchain technology has possibility to solve the information sharing problem and to easily synchronize their state. However, because Blockchain persistently keeps past transactions, the increase of its chain size is one of the serious issues to manage Blockchain technology. In this paper, we introduce a methodology to share information among autonomous robots and demonstrate through experiments that how the differences in data size recorded in the blockchain affect the chain size. As a result, compared with our previous approach, we succeeded in suppressing increase in chain size by using the proposal approach; it was reduced the amount of increase in chain size about 73.0% when each node repeatedly shared about 2.8KB image data by 100 times.","PeriodicalId":329439,"journal":{"name":"2018 Sixth International Symposium on Computing and Networking Workshops (CANDARW)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132371256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Neural Cryptography Based on the Topology Evolving Neural Networks
Authors: Yuetong Zhu, Danilo Vasconcellos Vargas, K. Sakurai
Pub Date: 2018-11-01 | DOI: 10.1109/CANDARW.2018.00091
Modern cryptographic schemes are developed from mathematical theory. Recent work has shown a new direction: cryptography based on neural networks, in which a cryptographic scheme is generated automatically rather than by learning a specific algorithm. While one kind of neural network has been used to realize such a scheme, it is unknown whether neural cryptography can be realized with other neural network architectures. In this paper, we use this property to create a neural cryptography scheme on a new topology-evolving neural network architecture called the Spectrum-diverse unified neuroevolution architecture. First, we conduct experiments to verify that the Spectrum-diverse unified neuroevolution architecture can achieve automatic encryption and decryption. Subsequently, we conduct experiments to achieve a neural symmetric cryptosystem using adversarial training.
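For readers unfamiliar with the setup, the symmetric scheme and adversarial objective follow the Alice/Bob/Eve formulation in the style of Abadi and Andersen's adversarial neural cryptography: Alice encrypts with a shared key, Bob decrypts with the key, and Eve attacks the ciphertext alone. The sketch below expresses that training objective with small fixed-topology PyTorch MLPs; the paper's point is realizing such a scheme with a topology-evolving architecture instead, which is not reproduced here.

```python
# Minimal sketch of the adversarial neural-cryptography objective with
# fixed MLPs; the paper instead evolves the network topology.
# Bits are encoded as values in [-1, 1].
import torch
import torch.nn as nn

N = 16  # assumed plaintext/key/ciphertext length in bits

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                         nn.Linear(64, out_dim), nn.Tanh())

alice, bob, eve = mlp(2 * N, N), mlp(2 * N, N), mlp(N, N)
opt_ab = torch.optim.Adam(list(alice.parameters()) + list(bob.parameters()), lr=1e-3)
opt_e = torch.optim.Adam(eve.parameters(), lr=1e-3)
l1 = nn.L1Loss()

for step in range(2000):
    p = torch.randint(0, 2, (256, N)).float() * 2 - 1  # random plaintexts
    k = torch.randint(0, 2, (256, N)).float() * 2 - 1  # shared keys

    # Train Alice/Bob: Bob must recover p while Eve's error is pushed up.
    c = alice(torch.cat([p, k], dim=1))
    loss_ab = l1(bob(torch.cat([c, k], dim=1)), p) - l1(eve(c), p)
    opt_ab.zero_grad(); loss_ab.backward(); opt_ab.step()

    # Train Eve against the updated (detached) ciphertexts.
    c = alice(torch.cat([p, k], dim=1)).detach()
    loss_e = l1(eve(c), p)
    opt_e.zero_grad(); loss_e.backward(); opt_e.step()
```

The alternating updates are the "adversarial training" the abstract refers to: Alice and Bob jointly minimize Bob's reconstruction error while maximizing Eve's, and Eve is retrained against the current ciphertexts.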
{"title":"Neural Cryptography Based on the Topology Evolving Neural Networks","authors":"Yuetong Zhu, Danilo Vasconcellos Vargas, K. Sakurai","doi":"10.1109/CANDARW.2018.00091","DOIUrl":"https://doi.org/10.1109/CANDARW.2018.00091","url":null,"abstract":"Modern cryptographic schemes is developed based on the mathematical theory. Recently works show a new direction about cryptography based on the neural networks. Instead of learning a specific algorithm, a cryptographic scheme is generated automatically. While one kind of neural network is used to achieve the scheme, the idea of the neural cryptography can be realized by other neural network architecture is unknown. In this paper, we make use of this property to create neural cryptography scheme on a new topology evolving neural network architecture called Spectrum-diverse unified neuroevolution architecture. First, experiments are conducted to verify that Spectrum-diverse unified neuroevolution architecture is able to achieve automatic encryption and decryption. Subsequently, we do experiments to achieve the neural symmetric cryptosystem by using adversarial training.","PeriodicalId":329439,"journal":{"name":"2018 Sixth International Symposium on Computing and Networking Workshops (CANDARW)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132454304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: A Cache Replacement Policy with Considering Global Fluctuations of Priority Values
Authors: J. Tada
Pub Date: 2018-11-01 | DOI: 10.1109/CANDARW.2018.00077
In high-associativity caches, the hardware overhead of the cache replacement policy becomes a problem. To avoid this problem, the Adaptive Demotion Policy (ADP) has been proposed. The ADP focuses on demoting priority values at a cache miss, and it achieves higher performance than conventional cache replacement policies. The ADP can be implemented with small hardware resources, including its priority-value update logic. It can also suit various applications through appropriate selection of its insertion, promotion, and selection policies; if suitable policies could be selected dynamically for the running application, performance would improve further. To achieve such dynamic selection, this paper focuses on the global fluctuations of the priority values. First, the cache is divided into several partitions. On every cache access, the total of the priority values in each partition is updated. At every set interval, the fluctuations of the total priority values across all partitions are checked, and this information is used to detect the behavior of the application. This paper adapts the mechanism to the ADP; the adapted cache replacement policy is called ADP-G. The performance evaluation shows that ADP-G achieves MPKI reductions and IPC improvements compared to the LRU policy, the RRIP policy, and the ADP.
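The monitoring mechanism can be sketched compactly: partition the cache sets, accumulate each partition's total priority value on every access, and inspect how those totals fluctuate at a fixed interval. The sketch below models only this monitor; the ADP's insertion/promotion/selection policies are abstracted away, and the fluctuation test is an assumed placeholder.

```python
# Hedged sketch of an ADP-G style monitor: per-partition totals of cache
# priority values, checked at fixed intervals. The fluctuation test and
# policy switch are illustrative placeholders, not the paper's exact logic.

class PriorityFluctuationMonitor:
    def __init__(self, num_partitions, interval):
        self.totals = [0] * num_partitions      # running priority sums
        self.prev = [0] * num_partitions        # totals at last checkpoint
        self.interval, self.accesses = interval, 0
        self.num_partitions = num_partitions

    def on_access(self, set_index, priority_delta, num_sets):
        part = set_index * self.num_partitions // num_sets
        self.totals[part] += priority_delta
        self.accesses += 1
        if self.accesses % self.interval == 0:
            return self._checkpoint()
        return None

    def _checkpoint(self):
        deltas = [t - p for t, p in zip(self.totals, self.prev)]
        self.prev = list(self.totals)
        # Placeholder heuristic: a large spread across partitions suggests a
        # phase change, so the controller could switch ADP sub-policies.
        spread = max(deltas) - min(deltas)
        return "switch_policy" if spread > self.interval // 2 else "keep_policy"

monitor = PriorityFluctuationMonitor(num_partitions=4, interval=1000)
# Called from a cache model on every access, e.g.:
# decision = monitor.on_access(set_index=17, priority_delta=+1, num_sets=2048)
```

Keeping only one running sum per partition is what keeps the monitor's hardware cost small, in line with the ADP's low-overhead design goal.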
{"title":"A Cache Replacement Policy with Considering Global Fluctuations of Priority Values","authors":"J. Tada","doi":"10.1109/CANDARW.2018.00077","DOIUrl":"https://doi.org/10.1109/CANDARW.2018.00077","url":null,"abstract":"In the high-associativity caches, the hardware overheads of the cache replacement policy become problem. To avoid this problem, the Adaptive Demotion Policy (ADP) is proposed. The ADP focuses on the priority value demotion at a cache miss, and it can achieve a higher performance compared with conventional cache replacement policies. The ADP can be implemented with small hardware resources, and the priority value update logic can be implemented with a small hardware cost. The ADP can suit for various applications by the appropriate selection of its insertion, promotion and selection policies. If the dynamic selection of the suitable policies for the running application is possible, the performance of the cache replacement policy will be increased. In order to achieve the dynamic selection of the suitable policies, this paper focuses on the global fluctuations of the priority values. At first, the cache is partitioned into several partitions. At every cache access, the total of priority values in each partition is calculated. At every set interval, the fluctuations of total priority values in all the partitions are checked, and the information is used to detect the behavior of the application. This paper adapts this mechanism to the ADP, and the adapted cache replacement policy is called the ADP-G. The performance evaluation shows that the ADP-G achieves the MPKI reductions and the IPC improvements, compared to the LRU policy, the RRIP policy and the ADP.","PeriodicalId":329439,"journal":{"name":"2018 Sixth International Symposium on Computing and Networking Workshops (CANDARW)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133302982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: An Trace-Driven Performance Prediction Method for Exploring NoC Design Optimization
Authors: Naoya Niwa, Tomohiro Totoki, Hiroki Matsutani, M. Koibuchi, H. Amano
Pub Date: 2018-11-01 | DOI: 10.1109/CANDARW.2018.00042
Performance prediction for a NoC-based Chip Multi-Processor (CMP) is one of the main design concerns. Generally, there is a trade-off between accuracy and time overhead in predicting the performance of computer systems. In particular, the time overhead grows proportionally or exponentially with the number of cores when using a cycle-accurate full-system simulator such as gem5. In this study, we propose an accurate and scalable method to predict the influence of NoC design parameters on performance. Our method counts the number of execution cycles for the target NoC based on statistics from a single full-system simulation run that uses a fully-connected NoC. To evaluate accuracy and execution time overhead, we use randomly generated processor allocations on a NoC with a 3D mesh topology. The Mean Absolute Percentage Error of the estimated cycle counts is about 4.7%, and the Maximum Absolute Percentage Error is about 8.5%.
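To make the reported metrics concrete, the sketch below scores estimated cycle counts against reference (e.g., gem5) results using Mean and Maximum Absolute Percentage Error, with a deliberately simplified per-hop delay model standing in for the paper's trace-driven estimator; the delay model and all numbers are assumptions.

```python
# Hedged sketch: estimate 3D-mesh cycle counts from a fully-connected-NoC
# trace by charging extra per-hop router delay, then score with MAPE.
# The one-term delay model is an assumed simplification of the paper's method.

def mesh3d_hops(src, dst):
    return sum(abs(a - b) for a, b in zip(src, dst))  # Manhattan distance

def estimate_cycles(base_cycles, packets, per_hop_delay=3):
    # packets: (src_xyz, dst_xyz) pairs observed in the fully-connected run,
    # where every transfer took a single hop.
    extra = sum(per_hop_delay * (mesh3d_hops(s, d) - 1) for s, d in packets)
    return base_cycles + extra

def mape(estimates, actuals):
    errs = [abs(e - a) / a for e, a in zip(estimates, actuals)]
    return 100 * sum(errs) / len(errs), 100 * max(errs)

# Hypothetical numbers purely to exercise the functions:
est = [estimate_cycles(100000, [((0, 0, 0), (2, 3, 1))] * 500)]
act = [105000]
print("MAPE %.2f%%, max APE %.2f%%" % mape(est, act))
```

The appeal of the approach is that the expensive cycle-accurate simulation runs once, and each candidate topology is then scored by cheap post-processing of the same trace.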
{"title":"An Trace-Driven Performance Prediction Method for Exploring NoC Design Optimization","authors":"Naoya Niwa, Tomohiro Totoki, Hiroki Matsutani, M. Koibuchi, H. Amano","doi":"10.1109/CANDARW.2018.00042","DOIUrl":"https://doi.org/10.1109/CANDARW.2018.00042","url":null,"abstract":"The performance prediction for a NoC-based Chip Multi-Processor (CMP) is one of the main design concerns. Generally, there is a trade-off between accuracy and time overhead on the performance prediction of computer systems. In particular, the time overhead is proportional or exponential to the number of cores when using a cycle-accurate full-system simulation, such as gem5. In this study, we propose an accurate and scalable method to predict the influence of design NoC parameters on its performance. Our method counts the number of execution cycles when employing the target NoC based on the statistics of one-time execution of a full-system simulation using a fully-connected NoC. To evaluate the accuracy and execution time overhead, we use the case that randomly generates allocations of processors with 3D mesh topology NoC. Its Mean Absolute Percentage Error of the estimated cycles is about 4.7%, and the Maximum Absolute Percentage Error is about 8.5%.","PeriodicalId":329439,"journal":{"name":"2018 Sixth International Symposium on Computing and Networking Workshops (CANDARW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131060522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Application of Machine Learning Techniques on Prediction of Future Processor Performance
Authors: Goktug Inal, Gürhan Küçük
Pub Date: 2018-11-01 | DOI: 10.1109/CANDARW.2018.00044
Today's processors utilize many datapath resources of various sizes. In this study, we focus on single-threaded microprocessors and apply machine learning techniques to predict a processor's future performance trend by collecting and processing processor statistics. This type of performance prediction can be useful for many ongoing computer architecture research topics. Such studies today mostly rely on history- and threshold-based prediction schemes, which collect statistics at runtime and decide on new resource configurations depending on the outcomes of threshold conditions. The proposed offline-training-based machine learning methodology is an orthogonal technique that may further improve the performance of such existing algorithms. We show that our neural network based prediction mechanism achieves around 70% accuracy in predicting the performance trend (gain or loss in the near future) of applications, a noticeably better result than the accuracy obtained by naïve history-based prediction models.
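A minimal sketch of the offline-training formulation: feature vectors of per-interval processor statistics (the specific counters here are assumptions) with binary gain/loss labels, fed to a small neural network. Synthetic data stands in for real traces.

```python
# Hedged sketch of an offline-trained trend predictor: a small neural net
# over per-interval processor statistics. Features and labels are synthetic.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Assumed feature vector per interval: [ipc, l2_mpki, branch_mpki, rob_occ]
X = rng.uniform(size=(2000, 4))
# Synthetic labeling rule standing in for measured outcomes: high IPC with
# low miss pressure tends to mean a near-future performance gain.
y = (X[:, 0] - 0.5 * X[:, 1] - 0.3 * X[:, 2] + rng.normal(0, 0.1, 2000) > 0.2)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
clf.fit(X_tr, y_tr)
print("trend accuracy:", clf.score(X_te, y_te))
```

Training offline on traces and deploying only the cheap forward pass at runtime is what makes the approach orthogonal to the threshold-based schemes it aims to complement.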
{"title":"Application of Machine Learning Techniques on Prediction of Future Processor Performance","authors":"Goktug Inal, Gürhan Küçük","doi":"10.1109/CANDARW.2018.00044","DOIUrl":"https://doi.org/10.1109/CANDARW.2018.00044","url":null,"abstract":"Today, processors utilize many datapath resources with various sizes. In this study, we focus on single thread microprocessors, and apply machine learning techniques to predict processors' future performance trend by collecting and processing processor statistics. This type of a performance prediction can be useful for many ongoing computer architecture research topics. Today, these studies mostly rely on history-and threshold-based prediction schemes, which collect statistics and decide on new resource configurations depending on the results of those threshold conditions at runtime. The proposed offline training-based machine learning methodology is an orthogonal technique, which may further improve the performance of such existing algorithms. We show that our neural network based prediction mechanism achieves around 70% accuracy for predicting performance trend (gain or loss in the near future) of applications. This is a noticeably better result compared to accuracy results obtained by naïve history based prediction models.","PeriodicalId":329439,"journal":{"name":"2018 Sixth International Symposium on Computing and Networking Workshops (CANDARW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130499728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}