Pub Date: 2018-11-01 DOI: 10.1109/SBESC.2018.00018
Hadley M. Siqueira, M. Kreutz
Real-time embedded systems need time-predictable software and hardware to guarantee correct system behavior. Precision Timed Machines are architectures designed for timing predictability and repeatability. They help to improve design time and the efficiency of real-time embedded systems by allowing the timing properties of modules to be verified separately. This paper presents a Simultaneous Multithreading Precision Timed Machine named Hivek-RT that can execute hard real-time and conventional threads in parallel. It employs a repeatable thread-interleaved pipeline with an exposed memory hierarchy composed of scratchpads, caches, and a predictable SDRAM memory controller. The proposed architecture is well suited for real-time embedded systems: experimental results show that it offers improved throughput, has a low memory footprint, and achieves a memory bandwidth of 90% of the theoretical value while providing deterministic access times to the memory hierarchy. This paper is an extended version of the paper presented at the 8th Brazilian Symposium on Computing Systems Engineering.
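The idea behind a repeatable thread-interleaved pipeline can be illustrated with a minimal sketch. It assumes the simplest possible policy, a fixed round-robin schedule in which each hardware thread owns every N-th cycle and an idle slot is never handed to another thread. The abstract does not describe Hivek-RT's actual issue policy, so the function names and the schedule here are hypothetical and only show why such a pipeline makes one thread's timing independent of the others.

```python
def interleave(programs):
    """Issue one instruction per cycle in fixed round-robin thread order.

    programs: one list of instruction labels per hardware thread.
    Returns a trace of (cycle, thread_id, instruction_or_None) tuples.
    Assumption: an idle slot stays empty instead of going to another thread.
    """
    num_threads = len(programs)
    longest = max(len(p) for p in programs)
    trace = []
    for cycle in range(longest * num_threads):
        tid = cycle % num_threads    # the slot owner depends only on the cycle
        idx = cycle // num_threads   # position in that thread's own stream
        instr = programs[tid][idx] if idx < len(programs[tid]) else None
        trace.append((cycle, tid, instr))
    return trace


def issue_cycles(trace, tid):
    """Cycles in which thread `tid` actually issued an instruction."""
    return [cycle for cycle, t, instr in trace if t == tid and instr is not None]


# Thread 0 runs the same program next to two different sets of co-runners.
trace_a = interleave([["a0", "a1", "a2"], ["b0"], ["c0", "c1"], []])
trace_b = interleave([["a0", "a1", "a2"], ["x0", "x1", "x2", "x3", "x4"], [], ["y0"]])

print(issue_cycles(trace_a, 0))
print(issue_cycles(trace_b, 0))
```

Both runs give thread 0 the same issue cycles, which is the property that lets a thread's timing be verified in isolation. A real design that also runs conventional threads would need a more elaborate slot policy than this sketch.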
{"title":"A simultaneous multithreading processor architecture with predictable timing behavior","authors":"Hadley M. Siqueira, M. Kreutz","doi":"10.1109/SBESC.2018.00018","DOIUrl":"https://doi.org/10.1109/SBESC.2018.00018","url":null,"PeriodicalId":50594,"journal":{"name":"Design Automation for Embedded Systems","volume":"24 1","pages":"45-62"},"PeriodicalIF":1.4,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/SBESC.2018.00018","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41799570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-11-01 DOI: 10.1109/SBESC.2018.00015
Diogo M. F. Izidio, Antonyus P. A. Ferreira, Edna Barros
Automatic recognition of vehicle license plates is a growing need for improving safety and traffic control, specifically in major urban centers. However, license plate recognition is generally computationally intensive: the entire input image frame is scanned, the plates found are segmented, and character recognition is then performed on each segmented character. This paper presents a methodology for engineering a system that detects and recognizes Brazilian license plates using convolutional neural networks (CNNs) and is suitable for embedded systems. The resulting system detects license plates in the captured image using the Tiny YOLOv3 architecture and identifies their characters using a second convolutional network trained on synthetic images and fine-tuned with real license plate images. The proposed architecture has proven robust to angle, lighting, and noise variations while requiring a single forward pass for each network, therefore allowing faster processing compared to other deep learning approaches. Our methodology was validated using real license plate images under different environmental conditions and reached a detection rate of 99.37% and an overall recognition rate of 98.43%, with an average time of 2.70 s to process 1024 × 768 images containing a single license plate on a Raspberry Pi 3 (ARM Cortex-A53 CPU). To improve the recognition accuracy, an ensemble of CNN models was tested instead of a single CNN model, which increased the average processing time to 4.88 s per image while increasing the recognition rate to 99.53%. Finally, we discuss the impact of using an ensemble of CNNs, considering the accuracy-performance trade-off when engineering embedded systems for license plate recognition.
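The accuracy-performance trade-off mentioned above can be made concrete with a short worked calculation. It uses only the four figures reported in the abstract (2.70 s and 98.43% for the single CNN, 4.88 s and 99.53% for the ensemble). The variable names are illustrative, and reading "1 minus recognition rate" as an error rate is an interpretation rather than something the abstract states.

```python
# Figures reported in the abstract (Raspberry Pi 3, one plate per image).
single_time_s, single_recognition = 2.70, 0.9843
ensemble_time_s, ensemble_recognition = 4.88, 0.9953

# Cost: how much longer the ensemble takes per image (about 1.81x).
slowdown = ensemble_time_s / single_time_s

# Benefit: assumed error rate = 1 - recognition rate (about 1.57% vs 0.47%).
single_error = 1.0 - single_recognition
ensemble_error = 1.0 - ensemble_recognition

# Relative reduction in unrecognized plates (about 70%).
relative_error_reduction = 1.0 - ensemble_error / single_error

print(f"slowdown: {slowdown:.2f}x")
print(f"error rate: {single_error:.2%} -> {ensemble_error:.2%}")
print(f"relative error reduction: {relative_error_reduction:.0%}")
```

Under this reading, the ensemble removes roughly 70% of the remaining recognition errors at the price of about 1.8 times the processing time per image.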
{"title":"An embedded automatic license plate recognition system using deep learning","authors":"Diogo M. F. Izidio, Antonyus P. A. Ferreira, Edna Barros","doi":"10.1109/SBESC.2018.00015","DOIUrl":"https://doi.org/10.1109/SBESC.2018.00015","url":null,"PeriodicalId":50594,"journal":{"name":"Design Automation for Embedded Systems","volume":"24 1","pages":"23-43"},"PeriodicalIF":1.4,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/SBESC.2018.00015","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45764629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-08-07 DOI: 10.1007/s10617-018-9214-3
Daniel Gregorek, Alberto Garcia-Ortiz
The emergence of many-core processors places novel demands on system design. Power limitations and abundant parallelism require efficient and scalable run-time management. The integration of dedicated hardware to enhance the performance of the run-time management system is gaining increasing importance. However, the design of a run-time manager for many-core systems generally suffers from excessive evaluation time. Previous works either do not provide the required flexibility or do not achieve a reasonable evaluation time for the simulation framework. We propose the novel simulation framework Agamid to foster the development and evaluation of hardware-enhanced run-time management for many-core systems. Our transaction-level framework performs design point evaluation of hardware-enhanced run-time management at the timescale of seconds. We use a hybrid simulation approach that considers the run-time management and the user application at different levels of abstraction. The framework provides a generic run-time manager to compare arbitrary management systems and HW/SW partitionings. The implementation of the run-time manager facilitates direct execution on the host machine and a detailed synchronization model. Agamid applies user application workloads by means of transaction-based task graphs. An extensible system-call interface allows arbitrary interaction between the user application and the run-time management system. The thorough calibration of the RTM timing model enables reasonable approximations of the management overhead. Our evaluation considers the accuracy, wall time, and design space exploration capabilities of Agamid. Our findings substantiate the usefulness of integrating the modeling of the run-time management, hardware architecture, and user application into a single transaction-level framework.
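To give a feel for why management overhead matters when a workload is expressed as a task graph, here is a toy sketch. It is not Agamid's API or timing model: the function `makespan`, the greedy dispatch policy, and the single constant `dispatch_overhead` charged per task are all hypothetical simplifications, and the task graph is assumed to be acyclic.

```python
import heapq


def makespan(tasks, deps, num_cores, dispatch_overhead):
    """Finish time of an acyclic task graph under a greedy run-time manager.

    tasks: {name: duration}; deps: {name: [predecessor names]}.
    Assumption: every dispatch costs `dispatch_overhead` on the chosen core.
    """
    remaining = {t: len(deps.get(t, [])) for t in tasks}
    successors = {t: [] for t in tasks}
    for task, preds in deps.items():
        for pred in preds:
            successors[pred].append(task)

    ready = sorted(t for t, n in remaining.items() if n == 0)
    running = []                      # min-heap of (finish_time, task)
    now, free, finished = 0.0, num_cores, 0

    while finished < len(tasks):
        # Dispatch ready tasks while cores are available.
        while ready and free > 0:
            task = ready.pop(0)
            heapq.heappush(running, (now + dispatch_overhead + tasks[task], task))
            free -= 1
        # Advance time to the next completion and release its successors.
        now, done = heapq.heappop(running)
        free += 1
        finished += 1
        for succ in successors[done]:
            remaining[succ] -= 1
            if remaining[succ] == 0:
                ready.append(succ)
    return now


# Diamond-shaped graph: A precedes B and C, which both precede D.
tasks = {"A": 2.0, "B": 3.0, "C": 3.0, "D": 1.0}
deps = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}

print(makespan(tasks, deps, num_cores=2, dispatch_overhead=0.0))
print(makespan(tasks, deps, num_cores=2, dispatch_overhead=0.5))
```

With two cores the critical path contains three dispatches, so a 0.5 overhead per dispatch lengthens the schedule from 6.0 to 7.5 time units. Sweeping `num_cores` and `dispatch_overhead` is a crude stand-in for the design-point evaluation the abstract describes.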
{"title":"The Agamid design-space exploration framework","authors":"Daniel Gregorek, Alberto Garcia-Ortiz","doi":"10.1007/s10617-018-9214-3","DOIUrl":"https://doi.org/10.1007/s10617-018-9214-3","url":null,"PeriodicalId":50594,"journal":{"name":"Design Automation for Embedded Systems","volume":"17 7","pages":"293-314"},"PeriodicalIF":1.4,"publicationDate":"2018-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138524200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-06-05 DOI: 10.1007/s10617-018-9213-4
Gunasekaran Manogaran, N. Chilamkurti, Ching-Hsien Hsu
{"title":"Special issue on recent advancements in machine learning algorithms for internet of things","authors":"Gunasekaran Manogaran, N. Chilamkurti, Ching-Hsien Hsu","doi":"10.1007/s10617-018-9213-4","DOIUrl":"https://doi.org/10.1007/s10617-018-9213-4","url":null,"abstract":"","PeriodicalId":50594,"journal":{"name":"Design Automation for Embedded Systems","volume":"22 1","pages":"199 - 200"},"PeriodicalIF":1.4,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1007/s10617-018-9213-4","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"52170356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-06-01 DOI: 10.1007/s10617-018-9208-1
Rabeh Ayari, Imane Hafnaoui, G. Beltrame, G. Nicolescu
{"title":"ImGA: an improved genetic algorithm for partitioned scheduling on heterogeneous multi-core systems","authors":"Rabeh Ayari, Imane Hafnaoui, G. Beltrame, G. Nicolescu","doi":"10.1007/s10617-018-9208-1","DOIUrl":"https://doi.org/10.1007/s10617-018-9208-1","url":null,"abstract":"","PeriodicalId":50594,"journal":{"name":"Design Automation for Embedded Systems","volume":"22 1","pages":"183 - 197"},"PeriodicalIF":1.4,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1007/s10617-018-9208-1","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"52170282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-05-14 DOI: 10.1007/s10617-018-9209-0
Xinsheng Zhang, Teng Gao, Dongdong Gao
{"title":"A new deep spatial transformer convolutional neural network for image saliency detection","authors":"Xinsheng Zhang, Teng Gao, Dongdong Gao","doi":"10.1007/s10617-018-9209-0","DOIUrl":"https://doi.org/10.1007/s10617-018-9209-0","url":null,"abstract":"","PeriodicalId":50594,"journal":{"name":"Design Automation for Embedded Systems","volume":"22 1","pages":"243 - 256"},"PeriodicalIF":1.4,"publicationDate":"2018-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1007/s10617-018-9209-0","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"52170323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-04-16 DOI: 10.1007/s10617-018-9202-7
P. Anuradha, H. Rallapalli, G. Narsimha
{"title":"Energy efficient scheduling algorithm for the multicore heterogeneous embedded architectures","authors":"P. Anuradha, H. Rallapalli, G. Narsimha","doi":"10.1007/s10617-018-9202-7","DOIUrl":"https://doi.org/10.1007/s10617-018-9202-7","url":null,"abstract":"","PeriodicalId":50594,"journal":{"name":"Design Automation for Embedded Systems","volume":"22 1","pages":"1 - 12"},"PeriodicalIF":1.4,"publicationDate":"2018-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1007/s10617-018-9202-7","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"52170230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2017-11-28 DOI: 10.1007/s10617-017-9196-6
Lokesh Sivanandam, Uma Maheswari Oorkavalan, S. Periyasamy
{"title":"RETRACTED ARTICLE: Test data compression for digital circuits using tetrad state skip scheme","authors":"Lokesh Sivanandam, Uma Maheswari Oorkavalan, S. Periyasamy","doi":"10.1007/s10617-017-9196-6","DOIUrl":"https://doi.org/10.1007/s10617-017-9196-6","url":null,"abstract":"","PeriodicalId":50594,"journal":{"name":"Design Automation for Embedded Systems","volume":"21 1","pages":"197 - 211"},"PeriodicalIF":1.4,"publicationDate":"2017-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1007/s10617-017-9196-6","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"52170207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}