In recent years, the growth of traffic in modern cities has demanded novel technology to support drivers and to protect passengers and other third parties involved in transportation. Thanks to rapid technological progress and innovation, many Advanced Driver Assistance Systems (A/DAS) based on Machine Learning (ML) algorithms have emerged to address the increasing demand for practical A/DAS applications. Fast and accurate execution of A/DAS algorithms is essential for preventing loss of life and property. High-speed hardware accelerators are vital for processing the high volume of data captured by increasingly sophisticated sensors and for executing the complex mathematical models of modern deep learning (DL) algorithms. One of the fundamental challenges in this new era is to design energy-efficient and portable ML-enabled platforms for vehicles to provide driver assistance and safety. This article presents recent progress in ML-driven A/DAS technology to offer new insights for researchers. We cover standard ML models and optimization approaches based on widely accepted open-source frameworks extensively used in A/DAS applications, and we highlight related articles on ML and its sub-branches, neural networks (NNs) and DL. We also report implementation issues, benchmarking problems, and potential challenges for future research. Popular embedded hardware platforms used to implement A/DAS applications, such as Field Programmable Gate Arrays (FPGAs), central processing units (CPUs), Graphics Processing Units (GPUs), and Application Specific Integrated Circuits (ASICs), are compared with respect to their performance and resource utilization. We examine the hardware and software development environments used in implementing A/DAS applications and report their advantages and disadvantages. We provide performance comparisons of common A/DAS tasks such as traffic sign recognition, road and lane detection, vehicle and pedestrian detection, driver behavior analysis, and multi-tasking. Considering the current research dynamics, A/DAS will remain one of the most popular application fields for vehicular transportation in the near future.
Network topology and routing algorithms are pivotal design decisions that profoundly impact the performance of Network-on-Chip (NoC) systems. As core counts rise, so does the competition for shared resources, spotlighting the critical need for carefully designed routing algorithms that circumvent deadlocks to ensure optimal network efficiency. This research builds on the Triplet-Based Architecture (TriBA) and its Distributed Minimal Routing Algorithm (DM4T) to overcome the limitations of previous approaches. While DM4T exhibits performance advantages over earlier routing algorithms, its deterministic nature and its potential for circular dependencies during routing can lead to deadlocks and congestion. This work therefore addresses these vulnerabilities while leveraging the performance benefits of TriBA and DM4T. It introduces a novel approach that merges a proactive deadlock prevention mechanism with Intermediate Adjacent Shortest Path Routing (IASPR). This combination guarantees both deadlock-free and livelock-free routing, ensuring reliable communication within the network. The key to this integration is a flow model-based data transfer categorization technique, which prevents the formation of circular dependencies and reduces redundant distance calculations during routing. By addressing these challenges, the proposed approach improves both routing latency and throughput. To rigorously assess the performance of TriBA network topologies under varying configurations, extensive simulations were undertaken, covering TriBA networks of 9 and 27 nodes with the DM4T and IASPR routing algorithms and the proactive deadlock prevention method. The gem5 simulator, operating under the Garnet 3.0 network model with a standalone protocol for synthetic traffic patterns, was used for simulations at high injection rates, spanning diverse synthetic traffic patterns and PARSEC benchmark suite applications. The simulations quantified the effectiveness of the proposed approach, revealing reductions in average latency of 40.17% and 34.05% compared to the lookup table and DM4T, respectively, along with notable increases in average throughput of 7.48% and 5.66%.
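To make the general idea of deadlock-aware minimal routing concrete, the following sketch routes a packet hop by hop by always choosing the adjacent node with the smallest remaining hop distance to the destination, while a forbidden-turn set stands in for a proactive deadlock-prevention policy that breaks circular dependencies. This is an illustrative Python sketch, not the paper's DM4T or IASPR implementation; the graph, node numbering, and turn set are assumptions made for the example.

```python
from collections import deque

def hop_distance(adj, src, dst):
    """BFS hop distance between two nodes of the topology graph."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        if u == dst:
            return dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return float("inf")

def next_hop(adj, cur, dst, forbidden_turns=frozenset(), prev=None):
    """Pick the adjacent node that minimizes the remaining hop distance,
    skipping turns (prev -> cur -> candidate) listed in forbidden_turns.
    The forbidden-turn set is a simplified stand-in for a deadlock-
    prevention policy that forbids cycles of channel dependencies."""
    candidates = [v for v in adj[cur]
                  if (prev, cur, v) not in forbidden_turns]
    return min(candidates, key=lambda v: hop_distance(adj, v, dst))

# Toy example on a small ring-like graph (not the TriBA topology).
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
path, prev, cur = [0], None, 0
while cur != 2:
    nxt = next_hop(adj, cur, 2, forbidden_turns={(0, 3, 2)}, prev=prev)
    prev, cur = cur, nxt
    path.append(cur)
print(path)  # minimal path that respects the forbidden turn, e.g. [0, 1, 2]
```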
The authentication solutions currently employed in the Internet of Things (IoT) are either inadequate or computationally intensive, given the resource-constrained nature of IoT devices. This challenges researchers to devise efficient solutions for embedding an important security tenet like authentication. In IoT, the most popular machine-to-machine communication protocol used at the application layer is Message Queuing Telemetry Transport (MQTT). However, the MQTT protocol inherently lacks security-related functions such as authentication, authorization, confidentiality, access control, and data integrity, which is unacceptable for IoT-driven mission-critical applications connected over public networks. In such situations, security is hardened by employing a transport-layer security protocol such as TLS, which entails significant computational overhead. This paper presents a novel scheme to enhance MQTT security by providing a lightweight multi-factor authentication scheme based on elliptic curve cryptography (ECC). The proposed scheme uses a low-cost signature and a fuzzy extractor to correct errors in imprinted biometrics in noisy environments. The scheme attains mutual authentication, generates a securely agreed-upon session key for secret communication, and guarantees perfect forward secrecy. Furthermore, a rigorous informal security analysis shows that the proposed scheme resists cryptographic attacks, including known session key attacks. In addition, an empirical study has been carried out to assess the effectiveness of the proposed scheme in the Cooja simulated environment.
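The paper's multi-factor scheme with biometric fuzzy extraction is not reproduced here, but the ECC building block it relies on, an ephemeral elliptic-curve Diffie-Hellman exchange stretched into a session key, can be sketched as follows. This is a minimal illustration using the pyca/cryptography package; the curve choice, HKDF parameters, and the idea of carrying public keys in MQTT CONNECT/CONNACK payloads are assumptions for the example, not details taken from the paper.

```python
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives import hashes

# Each party generates an ephemeral key pair on a standard curve;
# fresh ephemeral keys per session are what give forward secrecy.
device_priv = ec.generate_private_key(ec.SECP256R1())
broker_priv = ec.generate_private_key(ec.SECP256R1())

# Public keys are exchanged (e.g. inside MQTT CONNECT/CONNACK payloads,
# an assumption for this sketch) and both sides derive the same secret.
device_shared = device_priv.exchange(ec.ECDH(), broker_priv.public_key())
broker_shared = broker_priv.exchange(ec.ECDH(), device_priv.public_key())

def derive_session_key(shared_secret: bytes) -> bytes:
    """Stretch the raw ECDH secret into a fixed-length session key."""
    return HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                info=b"mqtt-session").derive(shared_secret)

assert derive_session_key(device_shared) == derive_session_key(broker_shared)
```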
Considering the scaling limitations of current Complementary Metal Oxide Semiconductor (CMOS) technology, Quantum-dot Cellular Automata (QCA) is emerging as one of the alternatives. Because QCA operates at the molecular scale, defects are more likely to occur in it. Therefore, substantial development of QCA-oriented defect characterization, corresponding fault models, and test generation is required. In this paper, a test generation algorithm for QCA combinational circuits is proposed. The FAN (fanout-oriented) test generation algorithm is extended for QCA. The proposed Automatic Test Pattern Generator (ATPG) for QCA targets the Single Stuck-at Fault (SSF) set produced by novel Multiple Missing Cells (MMC) defects. The proposed ATPG is based on QCA-oriented test generation properties and is guided by the proposed testability measures.
The MCNC benchmark circuits are synthesized into QCA using the proposed synthesis algorithms to check the effectiveness of the proposed ATPG. The ATPG is developed in C++ and tested on the MCNC benchmark circuits. Further, the ATPG-generated test vectors are validated at the QCA device level to demonstrate their correctness. The QCADesigner-E tool is used for the device-level implementation of the MCNC benchmark circuits.
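To illustrate what a test vector for a single stuck-at fault means, the sketch below generates one by brute force for a toy two-gate circuit: it compares fault-free and faulty simulations over all input combinations and returns any input that makes their outputs differ. This is a didactic Python illustration only; the paper's ATPG is FAN-based, guided by testability measures, and targets QCA-specific MMC-induced faults, none of which is captured here. The circuit, its net names, and the chosen fault are assumptions for the example.

```python
from itertools import product

def simulate(inputs, fault=None):
    """Evaluate a tiny combinational circuit: out = (a AND b) OR c.
    Nets: 'a', 'b', 'c', 'n1' (AND output), 'out'.
    `fault` is an optional (net, stuck_value) pair."""
    v = dict(inputs)
    def f(net, val):
        return fault[1] if fault and fault[0] == net else val
    v["a"], v["b"], v["c"] = f("a", v["a"]), f("b", v["b"]), f("c", v["c"])
    v["n1"] = f("n1", v["a"] & v["b"])
    v["out"] = f("out", v["n1"] | v["c"])
    return v["out"]

def generate_test(fault):
    """Return an input vector that distinguishes the faulty circuit
    from the fault-free one, or None if the fault is untestable."""
    for a, b, c in product((0, 1), repeat=3):
        vec = {"a": a, "b": b, "c": c}
        if simulate(vec) != simulate(vec, fault):
            return vec
    return None

# A test for 'n1' stuck-at-0 must set a = b = 1 (activate) and c = 0 (propagate).
print(generate_test(("n1", 0)))  # {'a': 1, 'b': 1, 'c': 0}
```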
The k-Nearest Neighbors (kNN) algorithm is a fundamental machine learning classification technique with wide-ranging applications. Among various kNN implementation choices, FPGA-based heterogeneous systems have gained popularity due to the inherent parallelism, energy efficiency, and reconfigurability of FPGAs. However, implementing the kNN algorithm on embedded FPGA platforms, typically characterized by limited programmable resources shared among various application-specific hardware units, necessitates a kNN accelerator architecture that balances high performance, hardware efficiency, and flexibility. To address this challenge, this paper presents a kNN hardware accelerator unit designed to optimize resource utilization by using sequential (i.e., accumulation-based) rather than pipelined/parallel distance computations. The proposed architecture incorporates two key algorithmic optimizations to reduce the iteration count of the sequential distance computation loop: a dynamic lower bound enabling early termination of the distance computation, and an online element selection that maximizes partial distance growth per iteration. We further enhance the accelerator's performance by incorporating multiple optimized sequential distance computation units, each dedicated to processing a segment of the training dataset. Our experiments demonstrate that the proposed approach is scalable, making it applicable to various hardware platforms and resource constraints. In particular, when implemented on an AMD Zynq device, the proposed single-core kNN accelerator occupies a mere 5% of the FPGA's resources while delivering a speedup of 3–5 times compared to the kNN software implementation running on the accompanying ARM A9 processor. For the 8-core kNN accelerator, the resource utilization stands at 30%, while the speedup factor ranges between 25 and 35.
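The two loop-count optimizations can be mimicked in software. In the sketch below, the squared distance is accumulated one dimension at a time and the accumulation is aborted as soon as the partial sum exceeds the current k-th best distance (the dynamic bound); dimensions are visited in an order chosen to grow the partial sum quickly. This is a minimal Python sketch under stated assumptions, not the paper's hardware architecture: the dimension-ordering heuristic (largest deviation of the query from the training-set mean) is a stand-in for the paper's online element selection, and the dataset is synthetic.

```python
import heapq
import numpy as np

def knn_early_exit(train, labels, query, k=3):
    """Sequential (accumulation-based) kNN with early termination."""
    # Visit dimensions that are likely to contribute large terms first
    # (heuristic stand-in for online element selection).
    order = np.argsort(-np.abs(query - train.mean(axis=0)))
    heap = []  # max-heap via negation of distance, size <= k
    for x, y in zip(train, labels):
        bound = -heap[0][0] if len(heap) == k else np.inf
        dist = 0.0
        for d in order:                      # sequential accumulation
            dist += (x[d] - query[d]) ** 2
            if dist > bound:                 # dynamic lower bound exceeded
                break                        # -> early termination
        else:                                # full distance computed
            if len(heap) < k:
                heapq.heappush(heap, (-dist, y))
            else:
                heapq.heappushpop(heap, (-dist, y))
    votes = [y for _, y in heap]
    return max(set(votes), key=votes.count)  # majority label among k nearest

rng = np.random.default_rng(0)
train = rng.normal(size=(1000, 16))
labels = rng.integers(0, 2, size=1000)
print(knn_early_exit(train, labels, rng.normal(size=16)))
```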
Deep Neural Networks (DNNs) are widely used in many application domains. However, they require a vast amount of computation and memory accesses to deliver outstanding accuracy. In this paper, we propose a scheme to predict whether the output of each ReLU-activated neuron will be zero or a positive number, in order to skip the computation of those neurons that will likely output zero. Our predictor, named Mixture-of-Rookies, combines two inexpensive components. The first exploits the high linear correlation between binarized (1-bit) and full-precision (8-bit) dot products, whereas the second clusters together neurons that tend to output zero at the same time. We propose a novel clustering scheme based on an analysis of angles, as the sign of the dot product of two vectors depends on the cosine of the angle between them. We implement our hybrid zero-output predictor on top of a state-of-the-art DNN accelerator. Experimental results show that our scheme introduces a small area overhead of 5.3% while achieving a speedup of 1.2x and reducing energy consumption by 16.5% on average for a set of diverse DNNs.
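The first component's intuition can be demonstrated in a few lines: a 1-bit (sign-only) dot product is used to guess the sign of the full-precision pre-activation, and neurons predicted non-positive would have their computation skipped. The Python sketch below uses synthetic Gaussian weights and activations purely for illustration, so its measured agreement is not the accuracy reported in the paper, and the threshold of zero is an assumption rather than the paper's calibrated setting.

```python
import numpy as np

def predict_nonzero(weights_bin, activations_bin, threshold=0):
    """Predict whether a ReLU neuron's output will be positive using only
    a 1-bit (sign) dot product; if the binary score is below `threshold`,
    the full-precision computation is predicted to yield zero after ReLU."""
    return np.dot(weights_bin, activations_bin) > threshold

rng = np.random.default_rng(0)
w = rng.normal(size=256)                      # one neuron's weight vector
a = rng.normal(size=(10000, 256))             # input activation vectors
full = a @ w                                  # full-precision pre-activations
pred = np.array([predict_nonzero(np.sign(w), np.sign(x)) for x in a])
actual = full > 0                             # true "positive after ReLU" flags
print("sign-prediction agreement:", (pred == actual).mean())
```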
The increasing demand for rapid application development tools, especially those employing high-level languages such as Python, has underscored the importance of utilizing a wide array of popular libraries while addressing real-time constraints in distributed hardware systems. This paper introduces PyIgH, a unified architecture of an IgH EtherCAT master based on Python, specifically designed to satisfy hard real-time requirements in an EtherCAT network. Implemented as a Python module, PyIgH exposes the functionalities and capabilities of an open-source EtherCAT master, facilitating seamless configuration and control of EtherCAT slave devices within the Python runtime environment. A real-time adaptation of the POSIX library, encapsulated within Python, is also utilized to satisfy the timing requirements of EtherCAT. The feasibility of the proposed approach is verified by analyzing the real-time performance in terms of periodicity and in-controller delay of the EtherCAT control task with a 1 kHz cycle. Experimental results demonstrate that PyIgH is suitable for hard real-time applications and serves as a valid alternative to conventional low-level EtherCAT masters. Additionally, a practical application involving motion control of a six-axis collaborative robot showcases the consistent performance of PyIgH within a real-time multi-tasking environment.
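The periodicity requirement being evaluated, a fixed 1 kHz cycle whose jitter does not accumulate, can be sketched generically as below. This is not the PyIgH API: `process_data_exchange` is a hypothetical placeholder for the EtherCAT send/receive step, and plain `time.sleep` is only a best-effort stand-in for the real-time adapted POSIX timing that PyIgH relies on for hard guarantees.

```python
import time

CYCLE_NS = 1_000_000   # 1 ms period -> 1 kHz control cycle

def cyclic_task(process_data_exchange, cycles=5000):
    """Fixed-period loop driven by absolute deadlines, so latency in one
    cycle does not shift all later cycles."""
    next_wakeup = time.monotonic_ns() + CYCLE_NS
    worst_jitter_ns = 0
    for _ in range(cycles):
        remaining = next_wakeup - time.monotonic_ns()
        if remaining > 0:
            time.sleep(remaining / 1e9)       # best-effort wait until deadline
        worst_jitter_ns = max(worst_jitter_ns,
                              abs(time.monotonic_ns() - next_wakeup))
        process_data_exchange()               # e.g. EtherCAT frame send/receive
        next_wakeup += CYCLE_NS               # absolute deadline for next cycle
    return worst_jitter_ns / 1e3              # worst-case wakeup jitter in us

print("worst-case wakeup jitter [us]:", cyclic_task(lambda: None))
```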
People rely on their upper and lower extremities to carry out their daily tasks. Depending on how frequently these extremities are used, and on factors such as age, genetics, and body weight, their capability may decrease. The range of motion (ROM) of the joints is measured to evaluate this decrease. Different systems, such as conventional goniometers, mobile phone applications, and sensor-based systems, can measure the ROM value. Still, it can be challenging to measure this parameter in situations such as training or other movement activities. A partially wireless goniometer and a companion 3D visualization and control GUI were developed in our previous study. However, it was difficult to mount on limbs that are far apart, and it could not be used on both legs to measure the hip angles. Therefore, this study presents a fully wireless goniometer system that can measure in real time and simultaneously show joint movements in a 3D model for the upper and lower extremities. The angle values required for the ROM were measured with two IMU sensors. Two ESP32s were used as microcontrollers, and a fully wireless system was enabled by transferring data via ESP-NOW and Bluetooth. Thanks to ESP-NOW, the system has lower latency than other protocols and can transmit data over longer distances. The developed system can also perform activity recognition, which is not available in other goniometers. The measurements of the system were compared with a conventional goniometer, and the results were found to be completely correlated.
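The core computation in such a two-IMU goniometer, turning two segment orientations into one joint angle, can be sketched as follows. This assumes each IMU reports its orientation as a unit quaternion (e.g., from onboard sensor fusion); the paper's actual fusion, calibration, and ESP-NOW/Bluetooth data path are not reproduced, and the example quaternions are illustrative.

```python
import numpy as np

def quat_conj(q):
    """Conjugate of a unit quaternion [w, x, y, z]."""
    return np.array([q[0], -q[1], -q[2], -q[3]])

def quat_mul(a, b):
    """Hamilton product of two quaternions [w, x, y, z]."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def joint_angle_deg(q_proximal, q_distal):
    """Angle of the relative rotation between the two limb segments,
    each described by the orientation quaternion of its IMU."""
    q_rel = quat_mul(quat_conj(q_proximal), q_distal)
    w = np.clip(abs(q_rel[0]), 0.0, 1.0)
    return np.degrees(2.0 * np.arccos(w))

# Example: distal segment flexed 90 degrees about the x-axis
# relative to the proximal segment.
q_prox = np.array([1.0, 0.0, 0.0, 0.0])
q_dist = np.array([np.cos(np.pi/4), np.sin(np.pi/4), 0.0, 0.0])
print(joint_angle_deg(q_prox, q_dist))   # ~90.0
```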
RISC-V is a computer architecture that has recently attracted considerable attention due to its advantageous qualities: it is an open instruction set based on reduced and simple instructions. For this reason, it has become an appealing choice for a wide range of computing applications and has positioned itself as a disruptive force in a wide variety of fields, including those that involve the development of safety-critical software, as in the space sector. The ability to evaluate the activities performed within a processor is of paramount importance in this type of system to ensure the fulfillment of requirements during space missions. The monitoring of these events inside the processor is managed by an instrument called the Hardware Performance Monitor (HPM). This work presents the implementation of the Sscofpmf extension of the HPM, compliant with the RISC-V privileged specification. The paper details the redesign of the existing performance counters from a previously implemented RISC-V baseline version. A comparison of resource utilization and power consumption between the two versions is also provided. As expected, the Sscofpmf extension version has higher resource utilization. Nevertheless, the paper shows that the additional functionalities included in the system have been validated without any change in the processor clock frequency, so the extension does not introduce any performance overhead.