Pub Date: 2021-12-01 | DOI: 10.1109/MCSoC51149.2021.00043
Shunpei Sugawara, Yoichi Shimomura, Ryusuke Egawa, H. Takizawa
Even expert HPC programmers must invest considerable time and effort in empirically establishing effective performance tuning strategies for their target systems. When the target system is changed or updated, it is therefore desirable that their performance tuning expertise be ported to the new system as much as possible. In this paper, we focus on multiple generations of NEC SX-series vector systems. We have documented the performance tuning expertise for the previous generations and built a machine-usable database of performance tuning cases. This paper investigates how much the expertise recorded in the database can contribute to performance tuning for the latest generation, NEC SX-Aurora TSUBASA (SX-AT). Since both the system architecture and the software stack, including the compilers, have been completely renewed for SX-AT, this paper discusses the differences in performance tuning across system generations. It also discusses how to express performance tuning techniques in a machine-usable way. The case study in this paper indicates that the Xevolver approach of using user-defined code transformations can express most of the vectorization-aware performance tuning techniques, and is thus promising for recording performance tuning expertise in a future-proof fashion.
Title: Portability of Vectorization-aware Performance Tuning Expertise across System Generations
Venue: 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)
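The core idea is that a tuning technique can be recorded as a reusable code transformation rather than a one-off manual edit. As a toy illustration only (this is not Xevolver's interface, which operates on real source code), the sketch below records one vectorization-aware technique, a loop interchange that moves the longest loop innermost so the vectorized loop gets a long vector length, as a named entry in a hypothetical tuning-case database. `Loop`, `TUNING_CASES`, and `apply_case` are invented names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Loop:
    var: str         # loop index variable
    trip_count: int  # number of iterations


# A loop nest is listed outermost first; the last loop is the one a vector
# compiler would normally vectorize.
LoopNest = Tuple[Loop, ...]
Transform = Callable[[LoopNest], LoopNest]


def longest_loop_innermost(nest: LoopNest) -> LoopNest:
    """Interchange so the loop with the largest trip count becomes innermost.

    Keeps the relative order of the remaining loops. Assumes the interchange
    is legal (no dependence is violated); a real tool would have to check this.
    """
    longest = max(nest, key=lambda loop: loop.trip_count)
    others = tuple(loop for loop in nest if loop is not longest)
    return others + (longest,)


# Hypothetical database of named, reusable tuning cases.
TUNING_CASES: Dict[str, Transform] = {
    "longest_loop_innermost": longest_loop_innermost,
}


def apply_case(name: str, nest: LoopNest) -> LoopNest:
    """Look up a recorded tuning case by name and apply it to a loop nest."""
    return TUNING_CASES[name](nest)


if __name__ == "__main__":
    nest = (Loop("j", 10_000), Loop("i", 3))  # short inner loop: poor vector length
    tuned = apply_case("longest_loop_innermost", nest)
    print([loop.var for loop in tuned])
```

Keeping the technique as data (a name mapped to a transformation) rather than as edited source is what would let it be re-applied when the target system changes.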
Pub Date: 2021-12-01 | DOI: 10.1109/MCSoC51149.2021.00050
Toru Tamahashi, R. Yoshioka, Takayuki Hoshino
A user interaction method to support knowledge creation in a hybrid museum experience is proposed and evaluated. Based on two intentions, the method incorporates a knowledge creation process for visitor experiences into the interaction scheme of the user interface. The first intention is to prompt the user actions required for an effective knowledge experience, including individual learning. The second intention is to document the resulting knowledge with sufficient information for sharing and reuse. The method is designed as part of an application for a hybrid museum experience so that the digital device does not distract the visitor from the museum exhibit. This paper presents the proposed UI interaction method, its implementation in the application, and an evaluation study of its effects. The evaluation was conducted with a group of curators to obtain professional feedback on the method's effect on observation behavior and knowledge creation. As a result, we found that a user interface for expressing one's own impressions and viewing the impressions of others helped deepen understanding of the exhibits.
Title: UI Method to Support Knowledge Creation in Hybrid Museum Experience
Venue: 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)
Pub Date: 2021-12-01 | DOI: 10.1109/MCSoC51149.2021.00018
Aurelien Bloch, S. Brunet, M. Mattavelli
The performance of programs written in languages following the dataflow model of computation (MoC) largely depends on the configuration (partitioning, mapping, scheduling, and buffer dimensioning) chosen during the synthesis stages. Furthermore, this programming paradigm is particularly well suited to heterogeneous parallel systems because it is inherently free of memory contention and exposes opportunities for parallelism. Together, these two observations show the need for an easy, automated way to evaluate design configurations and find good ones. The paper describes the methodology required for clock-accurate profiling of high-level dataflow programs written in RVC-CAL when synthesized on heterogeneous CPU/GPU co-processing platforms. It also extends to the heterogeneous paradigm an existing methodology for qualitatively estimating the performance of such programs as a function of the provided configuration, without the need to synthesize and profile every single configuration on the actual hardware platform. This approach is validated using two application programs and several configurations.
Title: Performance Estimation of High-Level Dataflow Program on Heterogeneous Platforms
Venue: 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)
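To make "evaluate configurations without synthesizing each one" concrete, here is a much simpler stand-in than the paper's methodology: combine profiled clock cycles per actor firing (which differ between CPU and GPU) with firing counts, score each actor-to-unit mapping by its busiest unit, and search all mappings. The function names, actors, and numbers are hypothetical, and the model deliberately ignores dependencies, communication, and buffer sizes.

```python
from itertools import product
from typing import Dict, Iterable, Optional, Tuple

Firings = Dict[str, int]           # actor -> number of firings in one run
Costs = Dict[str, Dict[str, int]]  # actor -> {unit -> clock cycles per firing}
Mapping = Dict[str, str]           # actor -> processing unit


def estimate_cycles(firings: Firings, costs: Costs, mapping: Mapping) -> int:
    """Estimate run time as the busiest unit's total cycles.

    Ignores inter-actor dependencies, communication, and buffer effects,
    so this is an optimistic estimate. Requires at least one actor.
    """
    load: Dict[str, int] = {}
    for actor, n in firings.items():
        unit = mapping[actor]
        load[unit] = load.get(unit, 0) + n * costs[actor][unit]
    return max(load.values())


def best_mapping(firings: Firings, costs: Costs, units: Iterable[str]) -> Tuple[int, Mapping]:
    """Exhaustively score every actor-to-unit mapping with the estimate."""
    actors = list(firings)
    unit_list = list(units)
    best: Optional[Tuple[int, Mapping]] = None
    for choice in product(unit_list, repeat=len(actors)):
        mapping = dict(zip(actors, choice))
        cycles = estimate_cycles(firings, costs, mapping)
        if best is None or cycles < best[0]:  # keep the first of equal scores
            best = (cycles, mapping)
    assert best is not None
    return best


firings = {"A": 10, "B": 10, "C": 5}
costs = {"A": {"cpu": 100, "gpu": 20}, "B": {"cpu": 50, "gpu": 200}, "C": {"cpu": 30, "gpu": 30}}
print(best_mapping(firings, costs, ["cpu", "gpu"]))
```

Exhaustive search grows as units^actors, so a real tool would prune or search heuristically; the point here is only that each candidate is scored from profiling data rather than by re-synthesis.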
Pub Date: 2021-12-01 | DOI: 10.1109/MCSoC51149.2021.00021
D. Suzuki, Takahiro Oka, T. Hanyu
A binary convolutional neural network (BCNN) accelerator using a nonvolatile field-programmable gate array (NV-FPGA) with only-once-write shifting is presented. During basic BCNN operation, the feature maps and weights are read from the block RAM (BRAM) and serially transferred to processing elements. Only-once-write shifting makes it possible to greatly reduce the write power consumed by such serial data transfer in the NV-FPGA. Meanwhile, since BCNN computation consists of nested loops, its memory accesses potentially exhibit temporal locality. This means that once data is read from the BRAM, it can be reused across several layers. By exploiting this property through loop interchange, the number of memory accesses is minimized and the idle time is maximized. If the BRAM is nonvolatile, the standby energy otherwise wasted in the idle state is completely eliminated by power gating. As a result, the proposed BCNN accelerator consumes 66.5% less energy than a conventional volatile-FPGA-based BCNN accelerator in a typical digit recognition task on the MNIST dataset.
Title: A Memory-Access-Minimized BCNN Accelerator Using Nonvolatile FPGA with Only-Once-Write Shifting
Venue: 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)
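For context on what the processing elements compute: in the standard BCNN formulation, weights and activations are restricted to {-1, +1}, so a dot product reduces to XNOR followed by a population count. The sketch below shows that textbook arithmetic; it is not necessarily the paper's PE design, and `pack_bits` and `binary_dot` are hypothetical names.

```python
from typing import Sequence


def pack_bits(values: Sequence[int]) -> int:
    """Pack a {-1, +1} vector into an int: bit i is 1 when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v not in (-1, 1):
            raise ValueError("binary values must be -1 or +1")
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two packed length-n {-1, +1} vectors via XNOR + popcount.

    Each matching bit contributes +1 and each mismatching bit -1, so the
    result is matches - (n - matches) = 2 * matches - n.
    """
    mask = (1 << n) - 1
    matches = bin(~(a_bits ^ w_bits) & mask).count("1")  # XNOR, then popcount
    return 2 * matches - n


a = [1, -1, 1, 1]
w = [1, 1, -1, 1]
print(binary_dot(pack_bits(a), pack_bits(w), len(a)))
```

Because each operand is a single bit, a whole vector fits in one BRAM word, which is why data movement rather than arithmetic tends to dominate the energy budget that the abstract targets.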
Pub Date: 2021-12-01 | DOI: 10.1109/MCSoC51149.2021.00011
Dong-gill Jung, Dae-Geun Park
LiDAR sensors, one type of sensor used in autonomous vehicles, obtain distance data from the time of flight of light. A LiDAR sensor can measure data at high speed, and its data are more precise than those of other sensors. The sensors transmit a large amount of data per sensing period. Autonomous vehicles contain many electronic devices, so their data channels and the resources of the domain control unit that controls the system are limited. In this environment, reducing LiDAR sensor data without compromising the original data can have a substantial positive impact on autonomous vehicle systems. In this paper, we propose a differential partial update for reducing LiDAR sensor data, together with semantic detection to eliminate the resulting noise and increase the reliability of the data. The sensor processor extracts only the parts of the continuous distance data that have changed, excluding the unchanged parts, and transmits them to the host. High-difference noise is eliminated by sliding-window filtering. Semantic detection marks only the parts that change and detects movement in the field of view. Plain differential partial updates reduce the amount of data by 59.31% in a simple case. A semantic detection partial update can reduce the amount of data by 83.41%. With graphics processing unit (GPU) acceleration, this process can also reduce computing time by 61.36%.
Title: Accelerated on-Chip Algorithm Based on Semantic Region-Based Partial Difference Detection for LiDAR-Vision Depth Data Transmission Reduction in Lightweight Controller Systems of Autonomous Vehicle
Venue: 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)
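A one-dimensional sketch of a differential partial update with a sliding-window noise filter. Assumptions not taken from the paper: a frame is a flat list of distances, a change is any difference above a threshold, and the window filter simply discards changes with too few changed neighbours (isolated spikes). Real LiDAR frames are 2-D, and the paper's filter and semantic detection may differ; all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Update = Tuple[int, float]  # (index in the frame, new distance value)


def changed_mask(prev: Sequence[float], curr: Sequence[float], threshold: float) -> List[bool]:
    """Mark positions whose distance changed by more than `threshold`."""
    return [abs(c - p) > threshold for p, c in zip(prev, curr)]


def suppress_isolated(mask: Sequence[bool], radius: int = 1, min_count: int = 2) -> List[bool]:
    """Sliding-window filter: keep a change only if at least `min_count`
    positions in [i - radius, i + radius] (itself included) also changed."""
    n = len(mask)
    kept = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        kept.append(mask[i] and sum(mask[lo:hi]) >= min_count)
    return kept


def partial_update(prev: Sequence[float], curr: Sequence[float], threshold: float = 0.0,
                   radius: int = 1, min_count: int = 2) -> List[Update]:
    """Sensor side: return only the (index, value) pairs to send to the host."""
    mask = suppress_isolated(changed_mask(prev, curr, threshold), radius, min_count)
    return [(i, curr[i]) for i, keep in enumerate(mask) if keep]


def apply_update(prev: Sequence[float], updates: Sequence[Update]) -> List[float]:
    """Host side: rebuild the current frame from the previous one plus updates."""
    frame = list(prev)
    for i, value in updates:
        frame[i] = value
    return frame


prev = [10.0] * 8
curr = [10.0, 10.0, 50.0, 10.0, 10.0, 20.0, 21.0, 22.0]  # index 2 is an isolated spike
updates = partial_update(prev, curr)
print(updates, apply_update(prev, updates))
```

Only three of eight values are sent here, and the isolated spike at index 2 is never transmitted, so the host frame keeps the previous value at that position.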
In modern life, text is becoming one of the most popular means of communication. As a result, storing text compactly and transferring it quickly over the internet have become challenging issues, and text compression has become an important research field. Many algorithms for text compression have already been developed, and new algorithms are being devised to fulfil the demands of current technology. This research article proposes a text compression technique based on: (i) the Burrows-Wheeler transform; (ii) an alternative method of run-length coding; (iii) finding repeated patterns more frequently; and (iv) arithmetic coding. Compared with other state-of-the-art methods, the proposed approach achieves better compression ratios.
Title: Text Compression Based on an Alternative Approach of Run-Length Coding Using Burrows-Wheeler Transform and Arithmetic Coding
Authors: Md. Atiqur Rahman, Mohamed Hamada, Md. Asfaqur Rahman
DOI: 10.1109/MCSoC51149.2021.00049
Venue: 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), 2021-12-01
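The first two stages can be sketched in a few lines. This is the textbook Burrows-Wheeler transform with a `$` sentinel (assumed absent from the input) and ordinary run-length coding; the paper's alternative run-length method, its repeated-pattern step, and the arithmetic coder are not reproduced, and the function names are hypothetical.

```python
from typing import List, Tuple

SENTINEL = "$"  # assumed not to occur in the input text


def bwt(text: str) -> str:
    """Naive Burrows-Wheeler transform: last column of the sorted rotations.
    Quadratic memory and time; fine for illustration, not for large inputs."""
    if SENTINEL in text:
        raise ValueError("input must not contain the sentinel")
    s = text + SENTINEL
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str) -> str:
    """Naive inverse: repeatedly prepend the last column and re-sort, which
    rebuilds the sorted rotation table; the row ending in the sentinel is
    the original text."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    original = next(row for row in table if row.endswith(SENTINEL))
    return original[:-1]


def run_length_encode(s: str) -> List[Tuple[str, int]]:
    """Plain run-length coding into (symbol, run length) pairs."""
    runs: List[Tuple[str, int]] = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


transformed = bwt("banana")
print(transformed, run_length_encode(transformed), inverse_bwt(transformed))
```

The BWT itself does not shrink the text; it groups equal symbols into runs (`nn`, `aa` above), which is what makes the following run-length and entropy-coding stages effective.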
Pub Date: 2021-12-01 | DOI: 10.1109/MCSoC51149.2021.00031
Albert Budi Christian, Chih-Yu Lin, Lan-Da Van, Y. Tseng
Inter-vehicle communication is under continuous development to provide a better driving experience. Exchanging information between vehicles and Road Side Units (RSUs) can reduce the number of accidents by notifying drivers of the information obtained. In general, broadcast information for vehicles is sent in an ad hoc manner. However, such unfiltered information may be useless to, and wasted on, most vehicles. This raises the question of whether precise information can be delivered only to the target vehicles without interfering with other, non-target vehicles. To attain this objective, a transmission system based on computer vision (CV) and sensor fusion, in which information is exchanged between the RSU and the Vehicle On-board Unit (OBU), is developed. To correctly transmit specific information to the target vehicles, we propose a data-fusion-driven, lane-level precision data transmission system that uses three kinds of sensory input: a Road Side Camera (RSC), GPS, and a magnetometer. By combining common features from these sensory inputs, our system can select the receivers of specific information on the road. Our system focuses on the scenario in which a message is transmitted to the target vehicles located in a certain lane. The experimental evaluation shows a recognition rate of 87.34%, and the generated messages have a total delay of less than 72 ms.
Title: Data Fusion Driven Lane-level Precision Data Transmission for V2X Road Applications
Venue: 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)
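One simple way to combine the three inputs, sketched under explicit assumptions: camera detections already carry a lane index and a position in a shared local metric frame, GPS reports have been converted to that frame, and a vehicle is a target if its magnetometer heading agrees with the lane direction and its GPS position is closest to a camera detection in the target lane. The thresholds, class names, and nearest-neighbour association are illustrative choices, not the paper's fusion method.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Detection:
    """Vehicle seen by the road-side camera, assumed already mapped to
    local road coordinates in metres and assigned a lane index."""
    x: float
    y: float
    lane: int


@dataclass
class Report:
    """Position (GPS, converted to the same local metres) and compass
    heading (magnetometer, degrees) reported by a vehicle's OBU."""
    vehicle_id: str
    x: float
    y: float
    heading_deg: float


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two headings, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def select_targets(detections: Sequence[Detection], reports: Sequence[Report],
                   target_lane: int, lane_heading_deg: float,
                   max_dist: float = 5.0, max_heading_err: float = 30.0) -> List[str]:
    """Return IDs of reporting vehicles matched to a camera detection in target_lane."""
    targets = []
    for r in reports:
        # Reject vehicles travelling in a different direction from the lane.
        if angle_diff(r.heading_deg, lane_heading_deg) > max_heading_err:
            continue
        # Associate the GPS report with the nearest camera detection.
        best = min(detections, key=lambda d: math.hypot(d.x - r.x, d.y - r.y), default=None)
        if best is None or math.hypot(best.x - r.x, best.y - r.y) > max_dist:
            continue
        if best.lane == target_lane:
            targets.append(r.vehicle_id)
    return targets


detections = [Detection(0.0, 0.0, 1), Detection(0.0, 3.5, 2)]
reports = [Report("A", 0.5, 0.2, 92.0), Report("B", 0.3, 3.4, 88.0),
           Report("C", 0.1, 0.1, 270.0), Report("D", 50.0, 0.0, 90.0)]
print(select_targets(detections, reports, target_lane=1, lane_heading_deg=90.0))
```

The heading check separates opposite-direction traffic that GPS alone could place in the wrong lane, and the camera supplies the lane index that GPS accuracy alone usually cannot resolve.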
Pub Date: 2021-12-01 | DOI: 10.1109/MCSoC51149.2021.00024
RK Risikesh, Sharad Sinha, N. Rao
Neural network model execution is becoming an increasingly compute-intensive task. With advances in optimisation techniques such as lower bit-width precision, quantization, and model compression, efficient ways of implementing these techniques are needed. Most Instruction Set Architectures (ISAs) do not support low bit-width vector instructions. In this work, we present an extension of the vector specification of the RISC-V ISA that targets lower bit-width, or variable-precision (1 to 16 bits), Multiply and Accumulate (MAC) operations. We demonstrate the proposed ISA extension by integrating it with a RISC-V processor named PicoRV32, which serves as the baseline processor in this work. We introduce bit-serial multiplication along with variable bit-precision support to demonstrate the advantage over a 16-bit baseline processor model. We also build an assembler for the proposed instructions for easier integration into the testbench of the RTL model. We implement the processor on a Xilinx Zynq-based FPGA. We observe that, compared to the baseline RISC-V vector processor, which supports only 8-, 16-, and 32-bit vector instructions, our processor with variable-precision support (1 to 16 bits) runs 1.14x faster on average on a matrix multiplication test program. The proposed processor architecture reduces the memory footprint by up to 1.88x compared with a baseline 16-bit vector processor.
Title: Variable Bit-Precision Vector Extension for RISC-V Based Processors
Venue: 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)
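To illustrate why variable precision helps, here is a behavioural sketch, not the paper's RTL or instruction encoding: a bit-serial multiplier spends one step per multiplier bit, so a 4-bit MAC takes a quarter of the steps of a 16-bit one, and narrower operands pack more values into each 32-bit word. Operands are assumed unsigned, values are assumed not to straddle word boundaries, and all function names are hypothetical.

```python
from typing import Sequence, Tuple


def bit_serial_mul(a: int, b: int, bits: int) -> Tuple[int, int]:
    """Unsigned shift-and-add multiply that consumes one bit of b per step.

    Returns (product, steps); steps equals `bits`, so lower precision means
    fewer steps. Signed operands would need extra handling.
    """
    if not (0 <= a < (1 << bits) and 0 <= b < (1 << bits)):
        raise ValueError(f"operands must fit in {bits} unsigned bits")
    acc = 0
    for i in range(bits):
        if (b >> i) & 1:
            acc += a << i
    return acc, bits


def mac(xs: Sequence[int], ws: Sequence[int], bits: int) -> Tuple[int, int]:
    """Multiply-accumulate over two vectors; returns (sum, total steps)."""
    total = steps = 0
    for x, w in zip(xs, ws):
        product, n = bit_serial_mul(x, w, bits)
        total += product
        steps += n
    return total, steps


def words_needed(n_elems: int, bits: int, word_bits: int = 32) -> int:
    """Words needed to store n_elems values of width `bits`,
    never splitting a value across two words."""
    per_word = word_bits // bits
    return -(-n_elems // per_word)  # ceiling division


print(mac([3, 5, 7], [2, 4, 1], bits=3))
print(words_needed(100, 4), words_needed(100, 16))
```

With these assumptions, 100 four-bit values fit in 13 words versus 50 words at 16 bits; widths that do not divide 32 (for example 3 bits) leave a few bits unused per word.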
Pub Date: 2021-12-01 | DOI: 10.1109/MCSoC51149.2021.00025
T. Cao, Chen Liu, Yuan Gao, W. Goh
This paper presents a parasitic-aware modelling approach, called the αβ-matrix model, for simulating neural networks (NNs) implemented with memristor crossbar arrays. The line resistance, which is the key parasitic in a memristor crossbar array, is analyzed and incorporated into the model. The proposed method estimates the IR drop due to line resistance with a computational complexity of O(mn), in contrast to the O(m²n²) required by the classical matrix-based Kirchhoff's Current Law (KCL) equation solver. The impact of the crossbar array parasitics on vector-matrix multiplication (VMM) computation and multi-layer NN classification accuracy is also analyzed. The advantages of the proposed parasitic-aware model are demonstrated through an example of a 2-layer perceptron implemented with a resistive random access memory (RRAM) crossbar array for MNIST handwritten digit classification. A classification accuracy of 97.3% is achieved on 64×64 6-bit RRAM crossbar arrays. Compared to the KCL solver, the classification accuracy degradation is less than 0.4% for line resistances up to 4.5 Ω.
Title: Parasitic-Aware Modelling for Neural Networks Implemented with Memristor Crossbar Array
Venue: 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)
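As a reference point for what the line-resistance parasitics perturb, the sketch below computes the ideal, parasitic-free crossbar VMM, where each column current is the sum of input voltages times cell conductances, and maps weights onto 64 conductance levels for a 6-bit cell. The conductance range, the restriction to non-negative weights, and the function names are illustrative assumptions (signed weights are commonly handled with a differential pair of cells), and the αβ-matrix IR-drop model itself is not reproduced.

```python
from typing import List, Sequence


def weight_to_conductance(w: float, w_max: float, g_min: float, g_max: float,
                          bits: int = 6) -> float:
    """Map a weight in [0, w_max] to one of 2**bits conductance levels in [g_min, g_max]."""
    if not 0.0 <= w <= w_max:
        raise ValueError("weight outside [0, w_max]")
    steps = (1 << bits) - 1  # 63 steps between 64 levels for a 6-bit cell
    level = round(w / w_max * steps)
    return g_min + level / steps * (g_max - g_min)


def ideal_vmm(voltages: Sequence[float], conductances: Sequence[Sequence[float]]) -> List[float]:
    """Parasitic-free crossbar: column current I_j = sum_i V_i * G_ij (Ohm's law + KCL).

    Costs O(mn) for an m x n array; line-resistance IR drop is ignored, so
    the voltage seen by every cell equals the applied row voltage.
    """
    n_cols = len(conductances[0])
    return [sum(v * row[j] for v, row in zip(voltages, conductances)) for j in range(n_cols)]


G = [[1e-6, 2e-6],
     [3e-6, 4e-6]]  # siemens, rows = inputs, columns = outputs
print(ideal_vmm([0.1, 0.2], G))  # amperes per column
```

With line resistance, cells far from the drivers see a reduced voltage, so the real column currents fall below these ideal values; that gap is what a parasitic-aware model has to estimate.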