Pub Date: 2019-10-01, DOI: 10.1109/MCSoC.2019.00036
Design of Knowledge Templates and Multi-View Symbols for Experiential Learning
Takayuki Hoshino, R. Yoshioka
A design of a knowledge description method based on Knowledge Templates and Multi-view Symbols for experiential learning is proposed. The proposed method is a unique approach for acquiring empirical data as knowledge and applying it to computing. For the domain of experiential learning, a knowledge model is designed based on the concept of Knowledge Templates, and a corresponding representation language is designed based on Multi-view Symbols. This allows both domain-specific knowledge and subjective knowledge to be described and acquired more easily and reliably than with natural languages. The designs also demonstrate how these concepts can be applied to a specific knowledge domain. In addition, the design is evaluated through simulated visualizations of knowledge and use-case-based analysis.
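As a rough illustration of the Knowledge Template idea, the following Python sketch models a template whose slots are filled from empirical observations and whose symbols carry multiple views; the field names and structure are assumptions for illustration only, not the schema defined in the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical sketch of a Knowledge Template record: all field names and the
# slot/view structure are illustrative assumptions, not the paper's actual schema.
@dataclass
class MultiViewSymbol:
    name: str
    views: Dict[str, str]            # view name -> picture/description reference

@dataclass
class KnowledgeTemplate:
    domain: str                      # e.g. "experiential learning"
    slots: Dict[str, str]            # slot name -> expected value type
    symbols: List[MultiViewSymbol] = field(default_factory=list)

    def instantiate(self, observations: Dict[str, str]) -> Dict[str, str]:
        """Fill template slots from observed (empirical) data; unknown slots stay empty."""
        return {slot: observations.get(slot, "") for slot in self.slots}

# Example: describing a small experiential-learning episode with a tiny template.
template = KnowledgeTemplate(
    domain="experiential learning",
    slots={"activity": "text", "observation": "text", "feeling": "text"},
)
record = template.instantiate({"activity": "baking bread", "observation": "dough rose slowly"})
print(record)
```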
{"title":"Design of Knowledge Templates and Multi-View Symbols for Experiential Learning","authors":"Takayuki Hoshino, R. Yoshioka","doi":"10.1109/MCSoC.2019.00036","DOIUrl":"https://doi.org/10.1109/MCSoC.2019.00036","url":null,"abstract":"A design of a knowledge description method based on Knowledge Templates and Multi-view Symbols for experiential learning is proposed. The proposed method is a unique approach for acquiring empirical data as knowledge and applying them to computing. For the domain of experiential learning, a knowledge model is designed based on the concept of Knowledge Templates, and a corresponding representation language is designed based on Multi-view Symbols. This allows description of both domain specific knowledge and subjective knowledge to be acquired more easily and reliably compared to using natural languages. These designs also demonstrate the application of these concepts to a specific knowledge domain. In addition, the design is evaluated by simulated visualizations of knowledge and use-case based analysis.","PeriodicalId":104240,"journal":{"name":"2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132912511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-10-01, DOI: 10.1109/MCSoC.2019.00035
Convolutional Neural Network for Classification of Source Codes
Hiroki Ohashi, Y. Watanobe
A method to classify source code based on convolutional neural networks is presented. The goal of the neural networks is to predict the type of algorithm used in the corresponding source code, so that the result can be used for various kinds of assistance and assessment in programming education. In the proposed method, source code is converted into a sequence that represents the structure of the code without any keywords, such as variable names or function names. In the present paper, the models and implementation of the proposed method are presented. An experiment considering several algorithm types is also conducted. For evaluation of the proposed method, source code accumulated in an online judge system is used. The results of the experiment demonstrate that the proposed method can predict the algorithm used in the given source code with a high degree of accuracy.
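To make the pipeline concrete, the sketch below shows a small 1D convolutional classifier over a keyword-free token-type sequence, in the spirit of the described method; the vocabulary size, sequence length, number of algorithm classes, and layer sizes are assumptions, not the paper's actual model.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: VOCAB_SIZE, SEQ_LEN, NUM_CLASSES and the layer sizes
# are assumptions, not the architecture reported in the paper.
VOCAB_SIZE = 64      # number of distinct structural token types (identifiers removed)
NUM_CLASSES = 8      # number of algorithm types to predict
SEQ_LEN = 512

class CodeCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, 32)
        self.conv = nn.Sequential(
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(64, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        self.fc = nn.Linear(64, NUM_CLASSES)

    def forward(self, token_ids):                  # token_ids: (batch, SEQ_LEN)
        x = self.embed(token_ids).transpose(1, 2)  # -> (batch, 32, SEQ_LEN)
        x = self.conv(x).squeeze(-1)               # -> (batch, 64)
        return self.fc(x)                          # logits over algorithm classes

model = CodeCNN()
dummy = torch.randint(0, VOCAB_SIZE, (4, SEQ_LEN))  # a batch of tokenized programs
print(model(dummy).shape)                            # torch.Size([4, 8])
```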
{"title":"Convolutional Neural Network for Classification of Source Codes","authors":"Hiroki Ohashi, Y. Watanobe","doi":"10.1109/MCSoC.2019.00035","DOIUrl":"https://doi.org/10.1109/MCSoC.2019.00035","url":null,"abstract":"A method to classify source code based on convolutional neural networks is presented. The goal of the neural networks is to predict the type of algorithm that is used in the corresponding source code so that the result obtained can be used for different kinds of assistance and assessment for programming education. In the proposed method, source code is converted into a sequence that represents the structure of the code without any keywords, such as variable names or function names. In present paper, models and implementation of the proposed method are presented. An experiment considering several algorithm types is also conducted. For evaluation of the proposed method, source code accumulated in an online judge system is used. The results of the experiment demonstrate that the proposed method can predict the algorithm used in the given source code to a high degree of accuracy.","PeriodicalId":104240,"journal":{"name":"2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115469528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-10-01, DOI: 10.1109/MCSoC.2019.00030
Low-Cost Congestion Detection Mechanism for Networks-on-Chip
Zhengqian Han, M. Meyer, Xin Jiang, Takahiro Watanabe
Congestion detection has become a hot issue in Networks-on-Chip (NoC). Congestion-aware routing algorithms are designed to avoid congestion in the network; however, most of them achieve only small performance benefits while drastically increasing cost as the mesh size grows. Taking a different approach, we utilize the network itself to detect congestion. In this paper, a congestion detection mechanism is proposed that is capable of locating congestion in the network within several cycles. The mechanism is then applied to routing algorithm selection and task scheduling. The most suitable routing algorithm is selected according to the detection result. If the congestion status changes, the mechanism also detects the change and judges whether the routing algorithm needs to be changed. Moreover, the detection result reveals the effect of task scheduling and whether it needs to be changed. Experimental results show that with the proposed detection mechanism, a suitable routing algorithm can be selected successfully according to the congestion status and a better task-scheduling choice can be made. Consequently, the performance of the NoC with the proposed congestion detection mechanism increases.
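As a software-level illustration of threshold-style congestion detection followed by routing-algorithm selection (the paper's actual in-network mechanism is not detailed in the abstract), a minimal Python sketch might look as follows; the occupancy sampling, threshold, and algorithm names are assumptions.

```python
import numpy as np

# Toy model of congestion detection on a 2D mesh NoC. Buffer-occupancy sampling
# and the threshold value below are assumptions made only for illustration.
MESH = 4                      # 4x4 mesh of routers
THRESHOLD = 0.75              # occupancy ratio above which a router counts as congested

def detect_congestion(buffer_occupancy: np.ndarray) -> list:
    """Return (x, y) coordinates of routers whose occupancy exceeds the threshold."""
    hot = np.argwhere(buffer_occupancy > THRESHOLD)
    return [tuple(xy) for xy in hot]

def pick_routing_algorithm(congested: list) -> str:
    """Hypothetical selection rule: switch to an adaptive algorithm once congestion appears."""
    return "adaptive_odd_even" if congested else "deterministic_xy"

occupancy = np.random.rand(MESH, MESH)       # sampled per-router buffer occupancy
spots = detect_congestion(occupancy)
print("congested routers:", spots)
print("selected routing algorithm:", pick_routing_algorithm(spots))
```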
{"title":"Low-Cost Congestion Detection Mechanism for Networks-on-Chip","authors":"Zhengqian Han, M. Meyer, Xin Jiang, Takahiro Watanabe","doi":"10.1109/MCSoC.2019.00030","DOIUrl":"https://doi.org/10.1109/MCSoC.2019.00030","url":null,"abstract":"Congestion detection has become a hot issue in Networks-on-Chip. Congestion-aware routing algorithms are designed to avoid congestion in the network; however, most of them achieve small performance benefits while drastically increasing the cost if the mesh size becomes larger. Using a different approach, we utilize the network itself to detect the congestion. In this paper, a congestion detecting mechanism is proposed, which is capable of locating where the congestion is in the network within several cycles. Then, the mechanism is applied to routing algorithm selection and task scheduling. The most suitable routing algorithm is selected according to the detection result. If the congestion status changes, the mechanism can also detect the change and judge whether the routing algorithm needs to be changed. Moreover, from the detection result, we can know the effect of task scheduling and judge whether it needs to be changed. Experimental results show that with the proposed detecting mechanism, the suitable routing algorithm is able to be successfully selected according to the congestion status and the better choice of task scheduling can be made. Consequently, the performance of NoC with the proposed congestion detection mechanism increases.","PeriodicalId":104240,"journal":{"name":"2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"509 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120932916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-10-01, DOI: 10.1109/MCSoC.2019.00049
Graph Transformations and Derivation of Scheduling Constraints Applied to the Mapping of Real-Time Distributed Applications
Stéphane Louise
Synchronous Data-Flow, as a deterministic variant of Kahn Process Networks, is a good model for distributed applications because it allows the properties of an application to be verified both at design time and at run-time. Real-Time extensions exist that allow Real-Time clocks to be specified for some of the processes in the graph. With the addition of Real-Time clocks, the behavior of the complete system can easily be separated into a nominal Real-Time mode, in which all the data required for processing is available when a clock tick occurs, and an error mode that is triggered when this condition is not met. Our contribution in this paper is, first, to show a set of graph transformations that account for execution and communication time on a real platform while maximizing the parallelism of execution and, second, on top of these transformations, to provide the execution constraints as a linear program that must be satisfied at run-time to guarantee the real-time requirements. The approach is illustrated on a subset of a real-life automotive example.
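A toy example of how such execution constraints can be expressed and checked as a linear program, here with SciPy for two actors with assumed worst-case execution times and a deadline; this is only a sketch of the idea, not the paper's exact formulation.

```python
from scipy.optimize import linprog

# Minimal sketch: two actors A -> B with assumed worst-case execution times and a
# deadline. Variables x = [start_A, start_B]. The constraint set is illustrative.
WCET_A, WCET_B, DEADLINE = 2.0, 3.0, 10.0

# Objective: schedule B as early as possible (minimize start_B).
c = [0.0, 1.0]

# Constraints in the form A_ub @ x <= b_ub:
#   start_A + WCET_A <= start_B   ->  start_A - start_B <= -WCET_A
#   start_B + WCET_B <= DEADLINE  ->  start_B <= DEADLINE - WCET_B
A_ub = [[1.0, -1.0],
        [0.0,  1.0]]
b_ub = [-WCET_A, DEADLINE - WCET_B]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print("feasible:", res.success, "start times:", res.x)   # e.g. start_A=0, start_B=2
```

If the LP is infeasible, the real-time requirement cannot be met on the given platform, which is exactly the kind of run-time guarantee the constraints are meant to encode.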
{"title":"Graph Transformations and Derivation of Scheduling Constraints Applied to the Mapping of Real-Time Distributed Applications","authors":"Stéphane Louise","doi":"10.1109/MCSoC.2019.00049","DOIUrl":"https://doi.org/10.1109/MCSoC.2019.00049","url":null,"abstract":"Synchronous Data-Flow as a deterministic variation of Khan Process Networks is a good model for distributed applications that allows for verification of the properties of the applications both at the design level and at run-time. Real-Time extensions exist which allow to specify Real-Time clocks for some of the processes in the graph. With the addition of Real-Time clocks, the behavior of the complete system can be easily differentiated from a nominal Real-Time mode where all the required data for processing is available when a clock tick occurs, and an error mode can be triggered when the condition is not met. Our contribution in this paper is –first– to show a set of graph transformations that allow to account for the execution and communication time on a real platform while at the same time maximizing the parallelism of execution and –second– on top of these transformations to provide the execution constraints as a linear program that must be met at run-time to guarantee the real-time requirements. It is illustrated on a subset of a real-life automotive example.","PeriodicalId":104240,"journal":{"name":"2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"162 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117139526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-10-01, DOI: 10.1109/MCSoC.2019.00028
A Semi-Lossless Image Compression Procedure using a Lossless Mode of JPEG
Md.Atiqur Rahman, Mohamed Hamada
Communication over the internet continues to grow day by day, particularly video calling, for which sending a massive amount of data over the internet and storing it on a computer is a big challenge. Many compression algorithms, such as block transform, vector quantization, and JPEG, are used to convert a large dataset into a format that can be sent over the internet at high speed and stored in a small space on a computer. In this paper, a new procedure is proposed that uses a lossless mode of JPEG and removes the first and last bits from the exact binary pattern of each pixel, which provides a better result than the state-of-the-art techniques. In this technique, the two bits are removed after a little preprocessing, and the remaining binary pattern is then replaced by a fixed value. Lastly, average code word length, compression ratio, and PSNR are used to compare the performance of the proposed procedure with the state-of-the-art techniques. The experimental results indicate that the proposed procedure provides better results than the state-of-the-art techniques.
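A minimal numpy sketch of the bit-removal idea, dropping the most and least significant bit of each 8-bit pixel and re-expanding with fixed bits; the paper's preprocessing step and exact fixed replacement values are not given in the abstract, so this is only an assumption-laden illustration of the principle, not the reported procedure.

```python
import numpy as np

# Toy illustration: keep the middle 6 bits of each uint8 pixel (drop bit 7 and bit 0).
# The paper's preprocessing and its fixed replacement values are unspecified here,
# so the reconstruction below simply fixes the dropped MSB to 0 and the LSB to 1.
def strip_first_last_bits(pixels: np.ndarray) -> np.ndarray:
    """Return 6-bit values (0..63) obtained by dropping each pixel's MSB and LSB."""
    return (pixels.astype(np.uint8) >> 1) & 0x3F

def approximate_reconstruction(stripped: np.ndarray) -> np.ndarray:
    """Re-expand to 8 bits with fixed values for the two removed bits."""
    return (stripped.astype(np.uint8) << 1) | 0x01

img = np.random.randint(0, 256, (4, 4), dtype=np.uint8)
restored = approximate_reconstruction(strip_first_last_bits(img))
mse = np.mean((img.astype(float) - restored.astype(float)) ** 2)
psnr = 10 * np.log10(255.0 ** 2 / mse) if mse > 0 else float("inf")
print("toy PSNR (dB):", round(psnr, 2))
```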
{"title":"A Semi-Lossless Image Compression Procedure using a Lossless Mode of JPEG","authors":"Md.Atiqur Rahman, Mohamed Hamada","doi":"10.1109/MCSoC.2019.00028","DOIUrl":"https://doi.org/10.1109/MCSoC.2019.00028","url":null,"abstract":"Day by day, communication through the internet is progressing; particularly video calling for which sending a massive amount of data over the internet and saving on a computer is being a big challenge. For which, there are many compression algorithms; such as block transform, vector quantization and JPEG are used to convert a big dataset in such a format so that it can be sent over the internet at a high speed and stored in a small space in computer. In this paper, a new procedure has been proposed using a lossless mode of JPEG by removing the first and last bits from the exact binary pattern of each pixel which provides a better result than the state-of-the-art techniques. In this technique, the two bits are removed after a little bit preprocessing and then the remaining binary pattern is replaced by a fixed value each. Lastly, average code word, compression ratio and PSNR are used to assess the performance of the proposed procedure with the state-of-the-art techniques. From the experimental results, it looks that the proposed procedure provides better results than the state-of-the-art techniques.","PeriodicalId":104240,"journal":{"name":"2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117242255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-10-01, DOI: 10.1109/MCSoC.2019.00026
Unified Symbol Framework to Improve UI Comprehension
R. Yoshioka, Naoyuki Murata
The objective of this research is to develop a method to serve a generic and unified set of symbology (or words) to software UI elements, where each symbol is backed by multi-view explanations to support comprehension. The problem of selecting appropriate symbols and words for the UI is solved by reusable components connected to a dictionary of common symbols and words. The approach is unique in that the meaning of each symbol is explained by a set of pictures providing multiple explanations of its context, in other words, Multi-view Symbols. The method is realized by an online dictionary of Multi-view Symbols and a UI framework that provides seamless on-demand access to the dictionary. The design and implementation of the symbol and dictionary system are described, and an example of embedding it into a client application is provided.
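A hypothetical sketch of what a Multi-view Symbol dictionary entry and an on-demand lookup could look like in Python; the entry fields, symbol names, and view names are illustrative assumptions rather than the framework's actual API.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical dictionary entry for a Multi-view Symbol: each view maps to a
# picture (or caption) that explains the symbol in a particular context.
@dataclass
class SymbolEntry:
    symbol_id: str
    label: str
    views: Dict[str, str]   # view name -> explanation picture reference

DICTIONARY = {
    "save": SymbolEntry("save", "Save",
                        {"office": "floppy-disk.png", "cloud": "upload-arrow.png"}),
}

def explain(symbol_id: str, view: str) -> str:
    """Return the explanation for a symbol in the requested view, falling back to its label."""
    entry = DICTIONARY.get(symbol_id)
    if entry is None:
        return "unknown symbol"
    return entry.views.get(view, entry.label)

print(explain("save", "cloud"))   # -> upload-arrow.png
```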
{"title":"Unified Symbol Framework to Improve UI Comprehension","authors":"R. Yoshioka, Naoyuki Murata","doi":"10.1109/MCSoC.2019.00026","DOIUrl":"https://doi.org/10.1109/MCSoC.2019.00026","url":null,"abstract":"The objective of this research is to develop a method to serve a generic and unified set of symbology (or words) to software UI elements where each symbol is backed by multi-view explanations to support comprehension. The problem of selecting appropriate symbols and words for the UI is solved by reusable components that are connected to a dictionary of common symbols and words. It is unique from other approaches such that the meaning of symbols is explained by a set of pictures that provide multiple explanations of the context, in other words, Multi-view Symbols. The method is realized by an online dictionary of Multi-view Symbols and a UI framework that provide seamless on-demand access to the dictionary. The design and implementation of the symbol and dictionary system is described and an example of embedding it into a client application is provided.","PeriodicalId":104240,"journal":{"name":"2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127686378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-10-01, DOI: 10.1109/MCSoC.2019.00025
Implementation of Content-Based Anonymization Edge Router on NetFPGA
Akihiro Fukuhara, Tomomu Iwai, Yuiko Sakuma, H. Nishi
In recent years, a large number of Internet of Things (IoT) devices have appeared, and various services using data from such devices have been proposed. However, the collected raw data include private information, and thus privacy problems arise. Data anonymization is a method for removing privacy-sensitive information from raw data. Data anonymization for IoT data services should satisfy the following requirements. First, the raw data should be anonymized between a device and the cloud server. Second, the anonymization methods and the destinations of the collected data should be flexibly configurable, as they depend on data types and agreements with data suppliers. Third, network transparency is necessary for ease of installation. However, conventional data anonymization systems do not satisfy these requirements. We propose anonymization hardware that functions as a network router at the network edge. It directly anonymizes data in network packets. Moreover, it decides the destination IP address of the packets and anonymizes data based on their content. For high-throughput, low-power processing of packets, the proposed hardware was implemented using a field-programmable gate array. The throughput of the proposed hardware achieved 10 Gbps wire speed, and its power consumption was lower than that of a software implementation.
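The following Python sketch models the router's content-based behavior in software, anonymizing selected payload fields and choosing a forwarding destination from the packet content; the field names, anonymization rules, and destination addresses are illustrative assumptions, not the hardware's actual configuration.

```python
import hashlib
import json

# Software model of content-based anonymization and routing. All rules, field
# names, and IP addresses below are illustrative assumptions.
RULES = {
    "name": lambda v: hashlib.sha256(v.encode()).hexdigest()[:12],  # pseudonymize identifier
    "temperature": lambda v: round(float(v)),                       # coarsen sensor precision
}
DESTINATIONS = {"temperature": "10.0.0.10", "power": "10.0.0.20"}
DEFAULT_DEST = "10.0.0.1"

def anonymize_and_route(payload: bytes) -> tuple:
    """Anonymize sensitive fields in a JSON payload and pick a destination by content."""
    record = json.loads(payload)
    for name, rule in RULES.items():
        if name in record:
            record[name] = rule(record[name])
    dest = next((ip for key, ip in DESTINATIONS.items() if key in record), DEFAULT_DEST)
    return dest, json.dumps(record).encode()

dest, out = anonymize_and_route(b'{"name": "Alice", "temperature": 21.73}')
print(dest, out)
```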
{"title":"Implementation of Content-Based Anonymization Edge Router on NetFPGA","authors":"Akihiro Fukuhara, Tomomu Iwai, Yuiko Sakuma, H. Nishi","doi":"10.1109/MCSoC.2019.00025","DOIUrl":"https://doi.org/10.1109/MCSoC.2019.00025","url":null,"abstract":"In recent years, a large number of Internet of Things (IoT) devices have appeared. Accordingly, various services using data from such devices have been proposed. However, the collected raw data include private information, and thus, privacy problems arise. Data anonymization is a method for removing privacy-sensitive information from raw data. Data anonymization for IoT data services should satisfy the following requirements. First, the raw data should be anonymized between a device and the cloud server. Second, the anonymization methods and the destinations of the collected data should be flexibly configured, as they depend on data types and agreements with data suppliers. Third, network transparency is necessary for ease of installation. However, conventional data anonymization systems do not satisfy these requirements. We propose anonymization hardware that functions as a network router on network edges. It directly anonymizes data in network packets. Moreover, it decides the destination IP address of the packets and anonymizes data based on their content. For high-throughput and low-power processing of the packets, the proposed hardware was implemented by using a field-programmable gate array. The throughput of the proposed hardware achieved 10 Gbps wire speed, and the power consumption was lower than that of software implementation.","PeriodicalId":104240,"journal":{"name":"2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133913251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-10-01, DOI: 10.1109/MCSoC.2019.00039
An on-Communication Multiple-TSV Defects Detection and Localization for Real-Time 3D-ICs
K. Dang, Akram Ben Ahmed, Xuan-Tu Tran
This paper presents "On Communication Through-Silicon-Via Test" (OCTT), an ECC-based method to localize faults without halting the operation of TSV-based 3D-IC systems. OCTT consists of two major parts named Statistical Detector and Isolation and Check. While Statistical Detector could detect open and short defects in TSVs that work without interrupting data transactions, the Isolation and Check algorithm enhances the ability to localize fault position. The Monte-Carlo simulations of Statistical Detector show ×2 increment in the number of detected faults when compared to conventional ECC-based techniques. While Isolation and Check helps localize the number of defects up to ×4 and ×5 higher. In addition, the worst case execution time is below 65,000 cycles with no performance degradation for testing which could be easily integrated into real-time applications.
{"title":"An on-Communication Multiple-TSV Defects Detection and Localization for Real-Time 3D-ICs","authors":"K. Dang, Akram Ben Ahmed, Xuan-Tu Tran","doi":"10.1109/MCSoC.2019.00039","DOIUrl":"https://doi.org/10.1109/MCSoC.2019.00039","url":null,"abstract":"This paper presents \"On Communication Through-Silicon-Via Test\" (OCTT), an ECC-based method to localize faults without halting the operation of TSV-based 3D-IC systems. OCTT consists of two major parts named Statistical Detector and Isolation and Check. While Statistical Detector could detect open and short defects in TSVs that work without interrupting data transactions, the Isolation and Check algorithm enhances the ability to localize fault position. The Monte-Carlo simulations of Statistical Detector show ×2 increment in the number of detected faults when compared to conventional ECC-based techniques. While Isolation and Check helps localize the number of defects up to ×4 and ×5 higher. In addition, the worst case execution time is below 65,000 cycles with no performance degradation for testing which could be easily integrated into real-time applications.","PeriodicalId":104240,"journal":{"name":"2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129565979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-10-01, DOI: 10.1109/MCSoC.2019.00024
Prototype of FPGA Dynamic Reconfiguration Based-on Context-Oriented Programming
Takeshi Ohkawa, Ikuta Tanigawa, Mikiko Sato, K. Hisazumi, Nobuhiko Ogura, Harumi Watanabe
Acceleration by FPGA is expected for real-time edge processing as well as for server applications in the cloud. A robot is one example that needs such acceleration, for processing such as image recognition and actuation based on visual feedback. As systems become more complex, a management mechanism for FPGA dynamic reconfiguration is required. In this paper, we propose a system development method that includes FPGA acceleration. The key idea of the proposed method is FPGA reconfiguration based on a context, as defined in Context-Oriented Programming (COP). This idea helps solve the cross-cutting concern problem at runtime, a problem that decreases development efficiency; it thus makes FPGA reconfiguration easy to manage from software when the whole system changes. In the evaluation, we compare the FPGA reconfiguration time for switching a context with the context-switching time of COP software written in C++. The results indicate that the proposed method is feasible for handling FPGA contexts.
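A minimal Python sketch of the context-oriented idea, in which activating a context selects between a software layer and a placeholder FPGA reconfiguration call; the class, function names, and bitstream file are hypothetical and stand in for whatever driver interface a real system would use.

```python
# Minimal sketch of context-oriented dispatch: the active context decides whether a
# layer runs in software or triggers a (stubbed) FPGA bitstream load.
def reconfigure_fpga(bitstream):
    # Placeholder for a partial-reconfiguration driver call; not a real API.
    print(f"[stub] loading bitstream {bitstream}")

class ContextManager:
    def __init__(self):
        self.active = "software"
        self.implementations = {}

    def register(self, context, func):
        self.implementations[context] = func

    def activate(self, context):
        if context != self.active and context == "fpga":
            reconfigure_fpga("image_recognition.bit")   # hypothetical bitstream name
        self.active = context

    def call(self, *args):
        return self.implementations[self.active](*args)

manager = ContextManager()
manager.register("software", lambda frame: f"sw-recognized {frame}")
manager.register("fpga", lambda frame: f"hw-recognized {frame}")

print(manager.call("frame-0"))     # runs the software layer
manager.activate("fpga")           # context switch triggers reconfiguration
print(manager.call("frame-1"))     # now runs the hardware-backed layer
```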
{"title":"Prototype of FPGA Dynamic Reconfiguration Based-on Context-Oriented Programming","authors":"Takeshi Ohkawa, Ikuta Tanigawa, Mikiko Sato, K. Hisazumi, Nobuhiko Ogura, Harumi Watanabe","doi":"10.1109/MCSoC.2019.00024","DOIUrl":"https://doi.org/10.1109/MCSoC.2019.00024","url":null,"abstract":"Acceleration by FPGA is expected for real-time edge processing as well as server applications in the cloud. A robot is one of the examples which need the acceleration of processing such as image recognition processing and actuation based on its visual feedback. As the system is more complex, it is required to introduce a management mechanism of FPGA dynamic reconfiguration. In this paper, we propose a method of system development which includes FPGA acceleration. The key idea of the proposed method is the FPGA reconfiguration based on a context, which is defined in Context-Oriented Programming (COP). This idea contributes to solve the cross-cutting concern problem at runtime. The problem causes to decrease the efficiency of development. Thus, this idea makes easily manage to FPGA reconfiguration with software in case of changing a whole system. In evaluation, we compare the reconfiguration time of FPGA to switch a context with the context switching time of the COP software written in C++ language. It indicates that the proposed method is feasible to handle FPGA context.","PeriodicalId":104240,"journal":{"name":"2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114089177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-10-01, DOI: 10.1109/MCSoC.2019.00013
Multicore Power Estimation using Independent Component Analysis Based Modeling
Mark Sagi, N. Doan, Thomas Wild, A. Herkersdorf
State-of-the-art power estimation research for multicore processors combines performance counters that collect run-time activity information with an offline-generated power model. To generate these power models, the package power is measured and the activity information is traced while synthetic workloads are executed. These workloads stress distinct core components in order to expose power responses so that the activity information has low collinearity. The measurements are then combined into a power model describing the general power behavior. However, one of the main drawbacks of these synthetic workloads is that they are usually custom-designed for a given multicore architecture and are hardly available. In this paper, we present a methodology to generate power models using freely available benchmarks, e.g. PARSEC/Splash-2. To minimize the collinearity of the activity information caused by the uncontrolled/unspecified behavior of these more general benchmarks, we propose to use independent component analysis. This avoids the use of synthetic workloads and reduces the relative error by 24% in the average case compared to prior state-of-the-art work. Although we also observe a 22% increase in relative error in the worst case for our approach, this can easily be improved by using either different or more training benchmarks. These promising results give a strong indication that independent component analysis could be used directly with real application workloads, opening the possibility of building and improving power models at runtime.
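A sketch of the modeling flow under simplifying assumptions, using scikit-learn's FastICA to decorrelate synthetic, partly collinear counter traces before fitting a linear power model; the counter data, dimensions, and number of components are chosen only for illustration.

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.linear_model import LinearRegression

# Illustrative flow: performance-counter samples (deliberately collinear) are
# transformed with ICA, then regressed against measured package power.
rng = np.random.default_rng(0)
n_samples, n_counters = 500, 8
counters = rng.random((n_samples, n_counters))
counters[:, 3] = 0.9 * counters[:, 2] + 0.1 * rng.random(n_samples)   # induce collinearity
power = 5.0 + counters @ rng.random(n_counters) + 0.05 * rng.standard_normal(n_samples)

ica = FastICA(n_components=5, random_state=0)
components = ica.fit_transform(counters)       # statistically independent activity signals

model = LinearRegression().fit(components, power)
pred = model.predict(components)
rel_err = np.mean(np.abs(pred - power) / power)
print(f"mean relative error: {rel_err:.3%}")
```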
{"title":"Multicore Power Estimation using Independent Component Analysis Based Modeling","authors":"Mark Sagi, N. Doan, Thomas Wild, A. Herkersdorf","doi":"10.1109/MCSoC.2019.00013","DOIUrl":"https://doi.org/10.1109/MCSoC.2019.00013","url":null,"abstract":"State-of-the-art power estimation research for multicore processors combine performance counters that collect run-time activity information with an offline-generated power model. To generate these power models, the package power is measured and the activity information is traced while synthetic workloads are executed. These workloads stress distinct core components in order to expose power responses so that the activity information has low collinearity. The measurements are then combined into a power model describing the general power behavior. However, one of the main drawbacks of these synthetic workloads is that they are most of the time custom-designed for a given multi-core architecture and are hardly available. In this paper, we present a methodology to generate power models using freely available benchmarks, e.g. PARSEC/Splash-2. To minimize the collinearity of the activity information due to the uncontrolled/unspecified behavior of these more general benchmarks, we propose to use independent component analysis. This allows to avoid the use of synthetic workloads and a reduction of the relative error by 24% in the average case, when compared to prior state-of-the-art work. Although, we also observe an increase of 22% relative error in the worst case for our approach, this can easily be improved by using either different or more training benchmarks. These promising results give a strong indication that independent component analysis could directly be used with real application workload, leading to the possibility to build/improve power models during runtime.","PeriodicalId":104240,"journal":{"name":"2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121105426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}