
2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC): Latest Papers

Intelligent congestion control for NoC architecture in Gem5 simulator
Smriti Srivastava, M. Shaikh, G. Shivaneetha, Minal Moharir
Congestion significantly degrades the performance of a network-on-chip (NoC) because it substantially increases latency and power consumption. Machine learning techniques help in designing routing methods that keep the network aware of the traffic status. This paper presents a congestion-aware Q-routing algorithm based on the Q-learning model of reinforcement learning. Under heavy traffic, the proposed algorithm improves NoC performance by routing packets along less congested paths, which in turn reduces congestion in the network. This is possible because Q-learning lets the network track local and non-local congestion by estimating Q-values. The Q-values guide each node in sending a data packet along an optimal path, avoiding busy routes. Simulations on the gem5 simulator with uniform link latency show that Q-routing performs better in a high-load environment than the traditional XY and Odd-Even routing methods, with performance gains of 5.73% and 12.73%, respectively. With link latencies randomly assigned to create a practical congestion-prone scenario, the proposed method outperformed XY and Odd-Even routing with performance gains of 7.38% and 15.19%, respectively.
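To make the idea of Q-values guiding a node concrete, here is a minimal sketch of the textbook Q-routing update, in which each node keeps an estimated delivery time per destination and neighbor. It is not necessarily the exact formulation used in the paper; the class name `QRouter`, the learning rate `alpha`, and the delay figures are assumptions chosen for illustration.

```python
from collections import defaultdict


class QRouter:
    """Per-node table of estimated delivery times: q[dest][neighbor]."""

    def __init__(self, neighbors, alpha=0.5, initial=0.0):
        self.neighbors = list(neighbors)
        self.alpha = alpha  # learning rate (assumed value)
        self.q = defaultdict(lambda: {n: initial for n in self.neighbors})

    def best_neighbor(self, dest):
        # Forward along the neighbor with the lowest estimated delivery time.
        return min(self.neighbors, key=lambda n: self.q[dest][n])

    def best_estimate(self, dest):
        # Value a node reports back to the upstream router.
        return min(self.q[dest].values())

    def update(self, dest, neighbor, queue_delay, link_delay, neighbor_estimate):
        # Q <- Q + alpha * ((queue + link + neighbor's best estimate) - Q)
        target = queue_delay + link_delay + neighbor_estimate
        old = self.q[dest][neighbor]
        self.q[dest][neighbor] = old + self.alpha * (target - old)
        return self.q[dest][neighbor]


router = QRouter(neighbors=["east", "north"], alpha=0.5)
dest = (3, 3)  # hypothetical mesh coordinates of the destination
# East currently has a long queue; north is lightly loaded.
router.update(dest, "east", queue_delay=8.0, link_delay=1.0, neighbor_estimate=4.0)
router.update(dest, "north", queue_delay=1.0, link_delay=1.0, neighbor_estimate=4.0)
print(router.best_neighbor(dest))
```

Because the queueing delay feeds into the estimate, a congested output port quickly looks more expensive than a quieter one, which is how the node learns to steer packets away from busy routes.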
DOI: https://doi.org/10.1109/MCSoC57363.2022.00062 (published 2022-12-01)
Citations: 1
Implementation of Edge-cloud Cooperative CNN Inference on an IoT Platform
Yuan Wang, H. Shibamura, KuanYi Ng, Koji Inoue
Since the Internet of Things (IoT) has become more widely used in various industrial situations, Artificial Intelligence (AI) programs, particularly Convolutional Neural Network (CNN) applications, are projected to be implemented on edge devices to meet high-accuracy and huge industry computing needs. Offloading computing-intensive workloads to the cloud is a promising solution for compact energy-constrained edge devices, but it tends to incur significant costs in total execution latency. For flexible and fine-grained offloading, this paper aims to design and implement an edge-cloud cooperative CNN inference framework on an IoT platform by targeting TensorFlow Lite. We have confirmed the implementation's feasibility and accuracy through the verification of implementing LeNet, AlexNet, and VGGNet. Intending to perform high-performance edge-cloud AI executions on the presented IoT platform, we evaluate the performance overhead (total execution latency) of the provided implementation and identify the current bottlenecks of the target platform for enhancing it.
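The trade-off between offloading and total execution latency can be sketched as a search over split points: run the first layers on the edge device, ship the intermediate tensor, and finish in the cloud. This is only an illustrative model, not the framework described in the paper; the function name `best_split` and all per-layer timings are made-up assumptions, and the cost of returning the result to the device is ignored.

```python
def best_split(edge_ms, cloud_ms, transfer_ms):
    """Return (k, latency): layers [0, k) run on the edge, layers [k, n) in the cloud.

    transfer_ms[k] is the cost of shipping the tensor that enters layer k.
    transfer_ms[0] is the raw input; transfer_ms[n] is 0 because nothing is offloaded.
    """
    n = len(edge_ms)
    if len(cloud_ms) != n or len(transfer_ms) != n + 1:
        raise ValueError("expected n edge times, n cloud times, n + 1 transfer times")
    candidates = []
    for k in range(n + 1):
        latency = sum(edge_ms[:k]) + transfer_ms[k] + sum(cloud_ms[k:])
        candidates.append((latency, k))
    latency, k = min(candidates)
    return k, latency


# Hypothetical per-layer timings in milliseconds for a four-layer network.
edge_ms = [12.0, 30.0, 45.0, 8.0]
cloud_ms = [1.0, 2.0, 3.0, 0.5]
transfer_ms = [40.0, 25.0, 6.0, 3.0, 0.0]

split, latency = best_split(edge_ms, cloud_ms, transfer_ms)
print(split, latency)
```

With these numbers, running only the first layer locally wins: the raw input is expensive to upload, while later layers are slow on the edge device.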
DOI: https://doi.org/10.1109/MCSoC57363.2022.00060 (published 2022-12-01)
Citations: 2
Algorithm to Interconvert SQL and Procedural Visual Queries
Tomonori Suzuki, Y. Watanobe, Divij G. Singh
In this paper, we propose an algorithm to convert between SQL and procedural languages in both directions. SQL is a declarative language, so some of its features are not evaluated in top-to-bottom order; the algorithm rewrites them so that they are evaluated top to bottom. The algorithm also supports SQL DML (SELECT, INSERT, UPDATE, DELETE). This helps students and inexperienced users who are learning SQL to understand it, and helps experienced users to understand nontrivial, difficult-to-read SQL. The paper also introduces a system architecture for interconversion between SQL and procedural languages. This architecture allows the system to support a variety of RDBMSs.
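As a small illustration of what evaluating SQL top to bottom can mean, the sketch below writes one query twice: once declaratively, and once as procedural steps in the order the clauses are logically evaluated (FROM, WHERE, SELECT, ORDER BY). It is a hand-written example, not output of the paper's algorithm; the `users` table and its rows are assumptions.

```python
import sqlite3

rows = [("alice", 30), ("bob", 15), ("carol", 22)]


def procedural_query(table):
    """Procedural form of: SELECT name FROM users WHERE age >= 18 ORDER BY name."""
    result = []
    for name, age in table:      # FROM users
        if age >= 18:            # WHERE age >= 18
            result.append(name)  # SELECT name
    result.sort()                # ORDER BY name
    return result


# Run the declarative form on an in-memory database for comparison.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
declarative = [
    r[0]
    for r in conn.execute("SELECT name FROM users WHERE age >= 18 ORDER BY name")
]
conn.close()

print(procedural_query(rows), declarative)
```

Note that SELECT is written first in the SQL text but runs after FROM and WHERE in the procedural version, which is the kind of reordering a learner has to internalize.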
DOI: https://doi.org/10.1109/MCSoC57363.2022.00048 (published 2022-12-01)
Citations: 0
Online Image Sensor Fault Detection for Autonomous Vehicles
Yizhi Chen, Wenyao Zhu, Dejiu Chen, Zhonghai Lu
Automated driving vehicles show great potential for the near-future market because of the safety and convenience they offer drivers and passengers. Because autonomous vehicles rely on many image sensors, the reliability of those sensors has attracted considerable research interest. We propose an online image sensor fault detection method that detects faults by comparing the historical variances of normal pixels and defective pixels. For faulty pixels without uncertainty, with a detection window of more than 30 frames, we obtain 100% accuracy and 100% recall on realistic continuous traffic images from the KITTI data set. We also explore the influence of uncertainty in faulty pixel values from 0% to 25% and study different fixed thresholds and a dynamic threshold for the decision. At 15% uncertainty, a strict threshold of 0.1 has high accuracy (99.16%) but low recall (34.46%), while a loose threshold of 0.3 has relatively high recall (83.78%) but misclassifies too many normal pixels, giving 18.17% accuracy. Our dynamic threshold balances accuracy and recall: it achieves 100% accuracy and 58.69% recall at 5% uncertainty, and 78.38% accuracy and 55.39% recall at 15% uncertainty. Based on the detected rate of damaged pixels, we develop a health score for evaluating the image sensor system intuitively. The score can also help in deciding when to replace cameras.
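The variance comparison can be sketched as follows: over a window of frames, a stuck pixel barely changes while healthy pixels follow the scene, so its temporal variance is far below that of its peers. The decision rule here (flag a pixel whose variance falls below `threshold` times the median variance) is an assumption made for illustration and may differ from the paper's thresholds; the frames are synthetic random data, not KITTI images.

```python
import random
from statistics import median, pvariance


def detect_stuck_pixels(frames, threshold=0.1):
    """Flag pixels whose variance over time is far below the typical variance.

    frames: list of equally sized flat frames (lists of pixel intensities).
    Returns the indices of suspected faulty pixels.
    """
    num_pixels = len(frames[0])
    variances = [pvariance([f[i] for f in frames]) for i in range(num_pixels)]
    reference = median(variances)  # typical variance of a healthy pixel
    return [i for i, v in enumerate(variances) if v < threshold * reference]


rng = random.Random(0)
# 30-frame window of a 16-pixel synthetic "sensor" with changing content.
frames = [[rng.randint(0, 255) for _ in range(16)] for _ in range(30)]
for frame in frames:
    frame[5] = 255  # pixel 5 is stuck at full brightness in every frame

print(detect_stuck_pixels(frames))
```

A fault value with some uncertainty would give the stuck pixel a small nonzero variance, which is where the choice between a strict and a loose threshold starts to trade accuracy against recall.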
DOI: https://doi.org/10.1109/MCSoC57363.2022.00028 (published 2022-12-01)
Citations: 0
Movie Oriented Positive Negative Emotion Classification from EEG Signal using Wavelet transformation and Machine learning Approaches
Abu Saleh Musa Miah, Jungpil Shin, Md. Al Mehedi Hasan, M. I. Molla, Y. Okuyama, Yoichi Tomioka
Electroencephalography (EEG) sensors play an important role in developing brain-computer interfaces (BCI) to enhance human-computer interaction (HCI). Much current research aims to develop EEG-based HCI systems for control and monitoring. However, researchers still face challenges in developing such systems because of noise from physiological, internal, and external artefacts. This study proposes a method to find useful electrodes and extract potential information from the brain for classifying positive and negative emotions. The emotion EEG signals were recorded with 14 electrodes from 30 young participants. Two movies were used for the positive and negative emotions. In the proposed method, we first extract five bands from the EEG using the wavelet transform and then calculate the standard deviation (SD), average power (AVP), and mean absolute value (MAV) of the wavelet information in each band. Finally, we apply an extra tree classifier (ETC), random forest (RF), and support vector machine (SVM) to classify the emotion from the feature vector. Among the three classifiers, ETC achieved higher accuracy on the F3, FC5, T8, FC6, F8, and AF4 electrodes. This indicates that these electrodes carry potential information for positive-negative emotion classification.
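The three per-band statistics can be written out directly. The sketch below assumes the usual definitions (population standard deviation, mean of squared coefficients for average power, mean of absolute values for MAV); the paper may normalize differently, and the band names and coefficient values are placeholders rather than real wavelet output.

```python
import math


def band_features(coeffs):
    """Return (SD, AVP, MAV) for one band's wavelet coefficients."""
    n = len(coeffs)
    mean = sum(coeffs) / n
    sd = math.sqrt(sum((c - mean) ** 2 for c in coeffs) / n)  # population SD
    avp = sum(c * c for c in coeffs) / n                      # average power
    mav = sum(abs(c) for c in coeffs) / n                     # mean absolute value
    return sd, avp, mav


def feature_vector(bands):
    """Concatenate the three features of every band into one flat vector."""
    vector = []
    for name in sorted(bands):
        vector.extend(band_features(bands[name]))
    return vector


# Placeholder coefficients for two hypothetical bands of one electrode.
bands = {"alpha": [1.0, -1.0, 3.0, -3.0], "beta": [2.0, 2.0, 2.0, 2.0]}
print(feature_vector(bands))
```

With five bands per electrode this yields 15 values per electrode, which is the kind of feature vector a tree ensemble or SVM can consume.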
DOI: https://doi.org/10.1109/MCSoC57363.2022.00014 (published 2022-12-01)
Citations: 1
High-Performance Asynchronous CNN Accelerator with Early Termination
Tan Rong Loo, T. Teo, Mulat Ayinet Tiruye, I-Chyn Wey
Convolutional neural networks (CNNs), especially very deep networks, are highly computation-intensive, resulting in long delays and high power consumption. The dynamically varying environmental conditions in real-world inference can produce highly complex problems, hence the need for these inefficient deep networks to guarantee satisfactory accuracy at all times. Several studies employ approximation techniques to execute only part of the network's computation in an attempt to eliminate unnecessary computation. However, such approaches are still highly sequential in nature, since they still need to run the whole network. This paper proposes an early-termination architecture on an already-trained CNN that tests partial results midway through the network and reduces computation by terminating the main network when a partial result is sufficient as the inference result. The first proposal is implemented as a synchronous circuit; by its nature, however, all memory elements must capture even when no new data is generated. The second proposal uses an asynchronous circuit to significantly reduce power consumption and further speed up the architecture, since an operation need not wait for the slowest critical path in the circuit. The proposed circuits were designed on an FPGA platform. Compared with the synchronous circuit, the asynchronous circuit shows a nearly 20% increase in speed and about a 12% reduction in power consumption.
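In software terms, early termination amounts to checking a side classifier after each stage and stopping once its answer looks confident enough. The sketch below uses a softmax-confidence threshold as the exit test, which is a common choice but an assumption here; the paper's hardware criterion may differ, and the toy stages and heads are placeholders for real layers.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def infer_with_early_exit(x, stages, exit_heads, threshold=0.9):
    """Run stages in order; stop as soon as a side classifier is confident.

    Returns (predicted_class, number_of_stages_executed).
    """
    for depth, (stage, head) in enumerate(zip(stages, exit_heads), start=1):
        x = stage(x)
        probs = softmax(head(x))
        confidence = max(probs)
        # The last stage always produces the final answer.
        if confidence >= threshold or depth == len(stages):
            return probs.index(confidence), depth


def double(v):
    # Toy "layer": each stage sharpens the separation between classes.
    return [2.0 * t for t in v]


def identity_head(v):
    # Toy side classifier: use the activations directly as logits.
    return v


stages = [double, double, double]
heads = [identity_head, identity_head, identity_head]
print(infer_with_early_exit([1.0, 0.0], stages, heads))
```

In this toy run the confidence after the first stage is about 0.88, so the network continues; after the second stage it exceeds 0.9 and the third stage is never executed.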
DOI: https://doi.org/10.1109/MCSoC57363.2022.00031 (published 2022-12-01)
Citations: 0
Using scheduling entropy amplification in CUDA/OpenMP code to exhibit non-reproducibility issues
D. Defour
Rounding error or cancellation, which can appear with each floating-point operation, combined with the lack of control over execution order in parallel code, leads to numerical issues such as a lack of numerical reproducibility. To increase the chance of discovering such numerical issues, this article proposes a simple solution based on an index interposer and an index scrambler that amplifies the possible combinations of execution order.
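The underlying effect is easy to reproduce on a single thread: floating-point addition is not associative, so the same values summed in a different order can give a different result. The sketch below emulates a scrambled execution order with a permutation of indices; it only imitates the idea of an index scrambler and is not the CUDA/OpenMP mechanism from the paper. The helper names and input values are assumptions.

```python
import random
from itertools import permutations


def ordered_sum(values, order):
    """Accumulate values in the given index order, as one thread schedule might."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


def scrambled_order(n, seed):
    """Hypothetical stand-in for an index scrambler: a seeded permutation of 0..n-1."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return order


values = [1e16, 1.0, -1e16, 1.0]

in_order = ordered_sum(values, [0, 1, 2, 3])   # the first 1.0 is absorbed by 1e16
reordered = ordered_sum(values, [0, 2, 1, 3])  # large terms cancel first
distinct = {ordered_sum(values, p) for p in permutations(range(len(values)))}

print(in_order, reordered, sorted(distinct))
print(ordered_sum(values, scrambled_order(len(values), seed=1)))
```

Trying many scrambled orders raises the chance that a test run lands on one of the orderings that exposes the discrepancy, instead of the single order a default schedule happens to produce.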
DOI: https://doi.org/10.1109/MCSoC57363.2022.00040 (published 2022-12-01)
Citations: 0
Critical Signature Assertion and On-the-Fly Recovery for Control Flow Errors in Processors
Ing-Jer Huang, Yi-Ju Ke, Shih-Jung Pao
This paper presents a highly effective hybrid control flow error (CFE) detection and recovery mechanism for fault-tolerant instruction set processors. The mechanism consists of two innovations: critical signature assertion (CSA) and on-the-fly recovery (OTFR). The proposed mechanism is evaluated on a commercial 32-bit microcontroller core, Andes N801s. Compared with related work, our approach achieves up to 75% lower memory size overhead and up to 221% lower performance overhead, and reduces the error correction latency by up to 54%, at a reasonable cost of 3470 gates (+19%), 967 µW (+17%) of power, and a sacrifice of merely 0.3% in fault coverage.
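For readers unfamiliar with control flow error detection, the general principle can be sketched as checking every executed block transition against the set of transitions the program's control flow graph allows. This is a generic illustration of signature-style checking, not the CSA or OTFR mechanism of the paper; the block names and successor table are hypothetical.

```python
# Hypothetical control flow graph: block name -> blocks it may legally jump to.
LEGAL_SUCCESSORS = {
    "entry": {"loop"},
    "loop": {"loop", "exit"},
    "exit": set(),
}


def first_control_flow_error(trace, successors):
    """Return the index of the first illegal transition in trace, or None."""
    for i in range(1, len(trace)):
        if trace[i] not in successors[trace[i - 1]]:
            return i
    return None


good_trace = ["entry", "loop", "loop", "exit"]
bad_trace = ["entry", "exit"]  # a fault made execution skip the loop

print(first_control_flow_error(good_trace, LEGAL_SUCCESSORS))
print(first_control_flow_error(bad_trace, LEGAL_SUCCESSORS))
```

A recovery scheme would then use the detected position to resume from a known-good point; how that is done efficiently in hardware is the subject of the paper.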
DOI: https://doi.org/10.1109/MCSoC57363.2022.00052 (published 2022-12-01)
Citations: 1
Efficient and High-Performance Sparse Matrix-Vector Multiplication on a Many-Core Array
Peiyao Shi, Aaron Stillmaker, B. Baas
Sparse matrix-vector multiplication (SpMV) is a critical operation in scientific computing, engineering, and other applications. Eight functionally-equivalent SpMV implementations are created for a fine-grained many-core platform with independent shared memory modules. These implementations are compared with a general-purpose processor (Intel Core-i7 3720QM) and a graphics processing unit (GPU, NVIDIA Quadro 620) and results are scaled to 32 nm CMOS. The performance (throughput per chip area) for all three platforms is compared when operating on a set of seven unstructured sparse matrices of varying dimensions up to 3.6 billion elements. The many-core implementations show a $54\times$ greater performance than the general-purpose processor, and $40\times$ greater performance than the GPU.
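As a reference point for what SpMV computes, here is a plain sequential version over the compressed sparse row (CSR) format. CSR is a common storage choice but an assumption here, since the listing does not say which formats the eight implementations use; the small matrix is made up.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values:  nonzero entries, row by row
    col_idx: column index of each nonzero
    row_ptr: row_ptr[r]..row_ptr[r + 1] delimits the nonzeros of row r
    """
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# A = [[10,  0,  0],
#      [ 0,  0, 20],
#      [30,  0, 40]]
values = [10.0, 20.0, 30.0, 40.0]
col_idx = [0, 2, 0, 2]
row_ptr = [0, 1, 2, 4]
x = [1.0, 2.0, 3.0]

print(spmv_csr(values, col_idx, row_ptr, x))
```

Each row's dot product is independent of the others, which is what makes the operation a natural fit for distribution across many small cores; the irregular reads of `x` are the hard part.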
DOI: https://doi.org/10.1109/MCSoC57363.2022.00038 (published 2022-12-01)
Citations: 0
A Realization of IO Physical Memory Protection for RISC-V Systems
Jien Hau Ng, Chee Hong Ang, Hwa Chaw Law
Physical memories, or RAMs, are essential components of a computer system that hold the temporary information both software and hardware need to work properly. When a system's security is compromised (e.g., by a malicious application), sensitive information held in memory can be leaked, for example to “the cloud”. The RISC-V privileged architecture standard adopts a method called Physical Memory Protection (PMP) to segregate a system's memory into regions with different policies and permissions, preventing unprivileged software from accessing unauthorized regions. However, PMP does not prevent malicious software from hijacking an Input/Output (IO) device with Direct Memory Access (DMA) capability to gain unauthorized access indirectly; hence, a similar method commonly termed “IOPMP” is being developed in the RISC-V community. This paper describes an early implementation of IOPMP and how it is used to protect physical memory regions in a RISC-V system. The potential performance impact of IOPMP is then briefly discussed. There is still work to be done, and this early IOPMP implementation allows various aspects of the protection method, such as its scalability, practicality, and effectiveness, to be studied for future enhancement.
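The region-and-permission idea can be modeled in a few lines: every DMA request carries the identity of the device that issued it, and a rule table decides whether that device may read or write the target range. This is a simplified software model built on assumptions (the `Rule` fields, the source IDs, first-match priority, and default deny); it does not reproduce the register encoding of PMP or of the IOPMP proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    base: int           # start address of the protected region
    size: int           # region length in bytes
    perms: str          # subset of "rw"
    sources: frozenset  # hypothetical IDs of the devices this rule applies to


def check_access(rules, source_id, addr, length, kind):
    """Return True if a DMA access of kind 'r' or 'w' is permitted.

    Only full containment in a region counts as a match in this model.
    """
    for rule in rules:  # the first matching rule decides
        in_region = rule.base <= addr and addr + length <= rule.base + rule.size
        if source_id in rule.sources and in_region:
            return kind in rule.perms
    return False        # default deny when no rule matches


rules = [
    Rule(0x8000_0000, 0x1000, "r", frozenset({1})),      # device 1: read-only buffer
    Rule(0x8000_1000, 0x1000, "rw", frozenset({1, 2})),  # shared read/write buffer
]

print(check_access(rules, 1, 0x8000_0010, 4, "r"))   # device 1 reads its buffer
print(check_access(rules, 1, 0x8000_0010, 4, "w"))   # device 1 tries to write it
print(check_access(rules, 2, 0x8000_0010, 4, "r"))   # device 2 has no rule there
```

Because every bus transaction has to pass such a check, the number of rules and the way they are matched bear directly on the scalability and performance questions the abstract raises.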
DOI: https://doi.org/10.1109/MCSoC57363.2022.00066 · Published: 2022-12-01
Citations: 0