
IET Computers and Digital Techniques: Latest Articles

Automated planning for finding alternative bug traces
IF 1.2, Zone 4 (Computer Science), Q4 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE), Pub Date: 2020-09-18, DOI: 10.1049/iet-cdt.2019.0283
Rajib Lochan Jana, Soumyajit Dey, Arijit Mondal, Pallab Dasgupta

Bug traces serve as references for patching a microprocessor design after a bug has been found. Unless the root cause of a bug has been detected and patched, variants of the bug may return through alternative bug traces that follow a different sequence of micro-architectural events. To avoid this, the verification engineer must think of every possible way in which the bug may return, which is a complex problem for a modern microprocessor. This study proposes a methodology that gleans high-level descriptions of the micro-architectural steps and uses them in an artificial intelligence planning framework to find alternative pathways through which a bug may return. The plans are then translated into simulation test cases that explore these potential bug scenarios. The planning tool essentially automates the verification engineer's task of exploring alternative sequences of micro-architectural steps that may allow a bug to return. The proposed methodology is demonstrated in three case studies.

IET Computers and Digital Techniques, vol. 14, no. 6, pp. 322-335
Citations: 1
Optimised HEVC encoder intra-only configuration
IF 1.2, Zone 4 (Computer Science), Q4 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE), Pub Date: 2020-09-18, DOI: 10.1049/iet-cdt.2019.0197
Nejmeddine Bahri, Randa Khemiri

High-efficiency video coding (HEVC) is the latest video coding standard, aimed at halving the bitrate for the same video quality compared with H.264/AVC. This encoding performance makes HEVC more suitable for high-definition video applications. However, it comes with high computational complexity, which makes real-time video encoding hard to achieve on a classic embedded processor. Multicore programmable processors could be a very promising way to overcome this computational complexity. Software optimisations that provide fast algorithms for the most complex functions could also be an efficient way to speed up the encoding process. In this context, this study presents a fast mode decision algorithm for the intra prediction module. The algorithm reduces the number of intra prediction modes to be tested instead of performing a full intra mode search. Experimental results for the all-intra configuration show that the proposed fast intra mode decision saves up to 46.79% of the intra prediction time on average. Encoding performance in terms of video quality and bitrate is not significantly affected.

IET Computers and Digital Techniques, vol. 14, no. 6, pp. 256-262
Citations: 4
Efficient VLSI architectures of lifting based 3D discrete wavelet transform
IF 1.2, Zone 4 (Computer Science), Q4 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE), Pub Date: 2020-09-18, DOI: 10.1049/iet-cdt.2020.0038
M. Mohamed Asan Basiri

Discrete wavelet transform (DWT) is widely used in image and video compression due to its high compression ratio and resolution. This study proposes efficient very large scale integration (VLSI) architectures for lifting-based 3D-DWT using (5,3) and (9,7) Daubechies wavelets. The advantage of the proposed architectures is the absence of storage buffers between the row, column, and temporal processes. Also, five and nine frames of the 3D signal can be processed in parallel using the proposed (5,3) and (9,7) lifting-based DWTs, respectively. Owing to this parallelism and the elimination of storage buffers, the throughput of the proposed design is greater than that of other existing techniques. The authors have implemented all the existing and proposed 3D-DWTs using a 45 nm CMOS library with Cadence and an Artix-7 FPGA with Xilinx Vivado. The synthesis results show that the proposed designs achieve a significant throughput improvement over various existing designs. For example, the proposed (9,7) lifting-based 3D-DWT achieves an 85.4% improvement in throughput over the conventional design.

IET Computers and Digital Techniques, vol. 14, no. 6, pp. 247-255
Citations: 3
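
For readers unfamiliar with the lifting scheme named in the entry above, the following is a minimal software sketch of one level of a 1D integer (5,3) lifting transform. It assumes the reversible 5/3 filter form commonly used in image coding, with mirrored boundaries. The function names `lift53_forward` and `lift53_inverse` are hypothetical, and the sketch says nothing about the buffer-free 3D hardware architecture the paper proposes.

```python
def lift53_forward(x):
    """One level of an integer (5,3) lifting transform on an even-length list.

    Returns (s, d): the low-pass (smooth) and high-pass (detail) halves.
    """
    n = len(x)
    assert n >= 2 and n % 2 == 0, "sketch assumes an even-length signal"
    even, odd = x[0::2], x[1::2]
    half = n // 2
    # Predict step: each odd sample minus the floor-mean of its even neighbours.
    # The right neighbour of the last odd sample is mirrored at the boundary.
    d = [odd[i] - ((even[i] + even[min(i + 1, half - 1)]) >> 1)
         for i in range(half)]
    # Update step: each even sample plus a rounded quarter of adjacent details.
    # The left detail of the first even sample is mirrored at the boundary.
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2)
         for i in range(half)]
    return s, d


def lift53_inverse(s, d):
    """Undo lift53_forward exactly by reversing the update and predict steps."""
    half = len(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    odd = [d[i] + ((even[i] + even[min(i + 1, half - 1)]) >> 1)
           for i in range(half)]
    x = [0] * (2 * half)
    x[0::2], x[1::2] = even, odd
    return x
```

Because each lifting step is undone with the same integer arithmetic, the inverse reconstructs the input exactly. A 3D transform applies the same 1D step along rows, columns, and frames.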
Towards IP integration on SoC: a case study of high-throughput and low-cost wrapper design on a novel IBUS architecture
IF 1.2, Zone 4 (Computer Science), Q4 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE), Pub Date: 2020-09-18, DOI: 10.1049/iet-cdt.2019.0090
Xiaokun Yang, Shi Sha, Ishaq Unwala, Jiang Lu

Integrating third-party intellectual properties (IPs) into a new system-on-chip (SoC) architecture is a big challenge. This study therefore first presents a new bus protocol named integrated bus (IBUS) and, more importantly, proposes a configurable bus wrapper for connecting AXI3-interfaced IPs to IBUS, aiming to find the optimal balance between bus efficiency and resource cost in terms of field-programmable gate array slice count, bus transfer latency, and energy consumption. As a case study, the authors implemented three IBUS wrappers to integrate three AXI3-interfaced verification IPs into an IBUS SoC. Experimental results show that the proposed work achieves a higher valid data throughput ( in the block test and in the cipher test) compared with designs based on conventional bridge-based SoC integration, as well as a large reduction in the normalised slice-time-power (18.73% in the block benchmark and 23.45% in the cipher benchmark) when the slice number, data transfer latency, and energy dissipation are given equal weights.

IET Computers and Digital Techniques, vol. 14, no. 6, pp. 353-362
Citations: 1
In memory computation using quantum-dot cellular automata
IF 1.2, Zone 4 (Computer Science), Q4 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE), Pub Date: 2020-09-18, DOI: 10.1049/iet-cdt.2020.0008
Mrinal Goswami, Jayanta Pal, Mayukh Roy Choudhury, Pritam P. Chougule, Bibhash Sen

Conventional computing systems face enormous pressure to cope with today's rising demand for computing speed. In the search for high-speed computing in the nano-scale era, it has become essential to explore a viable alternative that overcomes the physical limits of complementary metal-oxide-semiconductor (CMOS) technology. In that direction, processing-in-memory (PIM) is gaining importance because it keeps computation as close as possible to memory. It promises to beat the latencies of the conventional stored-program concept by embedding storage and data computation in a single unit. The bit storing and processing capability of the Akers array provides the foundation of PIM. In addition, quantum-dot cellular automata (QCA) is emerging as a promising nanoelectronic technology to replace CMOS and deliver fast devices in the nanoelectronics era. This work presents a novel PIM concept that embeds the Akers array in QCA to achieve high-speed computing at the nano scale. A QCA implementation of universal logic using the Akers array demonstrates its processing power and potential. A universal function is considered for testing the effectiveness of the proposed PIM cell. The performance evaluation indicates the efficacy of QCA PIM over the conventional von Neumann architecture.

IET Computers and Digital Techniques, vol. 14, no. 6, pp. 336-343
Citations: 5
Efficient parallelisation of the packet classification algorithms on multi-core central processing units using multi-threading application program interfaces
IF 1.2, Zone 4 (Computer Science), Q4 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE), Pub Date: 2020-09-10, DOI: 10.1049/iet-cdt.2019.0118
Mahdi Abbasi, Milad Rafiee

The categorisation of network packets according to multiple parameters, such as sender and receiver addresses, is called packet classification. Packet classification lies at the core of Software-Defined Networking (SDN)-based network applications. Because network traffic keeps getting faster, there is an urgent need for packet classification at higher speeds. Although packet classification algorithms can be accelerated through hardware implementation, this solution imposes high costs and offers limited development capacity. Current software methods, on the other hand, are relatively slow. A practical solution is to parallelise packet classification using multi-core processors. In this study, the Thread, parallel patterns library (PPL), open multi-processing (OpenMP), and threading building blocks (TBB) libraries are examined and used to parallelise three packet classification algorithms: tuple space search, tuple pruning search, and hierarchical tree. According to the results, the type of algorithm and the rulesets may influence the performance of the parallelisation libraries. In general, the TBB-based method shows the best performance among the parallelisation libraries owing to its work-stealing mechanism, and it can accelerate the classification process by up to 8.3 times on a system with a quad-core processor.

IET Computers and Digital Techniques, vol. 14, no. 6, pp. 313-321
Citations: 1
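
As a rough illustration of the tuple space search named in the entry above, the sketch below groups rules by their (source prefix length, destination prefix length) tuple and probes one hash table per tuple. It is a simplified two-field model with hypothetical function names, not the authors' implementation. The thread pool only shows how a packet batch can be split across workers: the paper evaluates C++ libraries such as OpenMP and TBB, and CPython threads will not show comparable speed-ups.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def mask(addr, prefix_len):
    """Keep the top prefix_len bits of a 32-bit address."""
    return addr >> (32 - prefix_len) if prefix_len else 0


def build_tuple_space(rules):
    """rules: iterable of (priority, src, src_len, dst, dst_len, action).

    A lower priority number means a more important rule (an assumption of this sketch).
    """
    space = defaultdict(dict)
    for prio, src, src_len, dst, dst_len, action in rules:
        key = (mask(src, src_len), mask(dst, dst_len))
        table = space[(src_len, dst_len)]
        if key not in table or prio < table[key][0]:
            table[key] = (prio, action)
    return space


def classify(space, src, dst):
    """Probe every tuple's hash table and return the best-priority action."""
    best = None
    for (src_len, dst_len), table in space.items():
        hit = table.get((mask(src, src_len), mask(dst, dst_len)))
        if hit is not None and (best is None or hit[0] < best[0]):
            best = hit
    return best[1] if best else None


def classify_batch(space, packets, workers=4):
    """Classify (src, dst) pairs concurrently; result order matches input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: classify(space, *p), packets))
```

The tuple space is read-only during lookup, so packets can be classified independently without locking. That independence is what makes the problem a natural fit for the multi-threading libraries compared in the paper.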
High throughput and area-efficient FPGA implementation of AES for high-traffic applications
IF 1.2, Zone 4 (Computer Science), Q4 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE), Pub Date: 2020-09-10, DOI: 10.1049/iet-cdt.2019.0179
Karim Shahbazi, Seok-Bum Ko

This study presents a high-throughput field-programmable gate array (FPGA) implementation of advanced encryption standard-128 (AES-128). AES is a well-known symmetric-key encryption algorithm that offers high security against different attacks and is widely used in different applications. The main goal of this study is to design a cryptosystem with high throughput and high FPGA efficiency (FPGA-Eff) for high-traffic applications. To achieve high throughput, loop unrolling and inner and outer pipelining techniques are employed. In AES, substitution bytes (Sub-Bytes) is one of the costly functions: it occupies a large number of resources and has a large delay. To reduce the area of Sub-Bytes, the new-affine-transformation, which combines the inverse isomorphic and affine transformations, is proposed and employed. In addition, AES has been modified to suit the proposed architecture. For the first nine rounds, Shift-Rows and Sub-Bytes are exchanged, and Shift-Rows is merged with Add-Round-Key. To equalise latency between stages, Mix-Columns is divided into two stages. AES is implemented in counter mode on a Xilinx Virtex-5 using VHDL. The proposed implementation achieves a throughput of 79.7 Gbps, an FPGA-Eff of 13.3 Mbps/slice, and a frequency of 622.4 MHz. Compared with the state-of-the-art work, the proposed design improves data throughput by 8.02% and FPGA-Eff by 22.63%.

IET Computers and Digital Techniques, vol. 14, no. 6, pp. 344-352
Citations: 21
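
To make the cost of the Sub-Bytes step mentioned in the entry above more concrete, here is a small sketch of the textbook definition of the AES S-box: a multiplicative inverse in GF(2^8) followed by an affine transformation. This is the standard formulation, not the paper's merged new-affine-transformation or its hardware mapping. The helper names `gf_mul`, `gf_inv`, and `sub_byte` are illustrative only.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B  # reduce by the AES field polynomial
    return result


def gf_inv(a):
    """Multiplicative inverse in GF(2^8) via a^254; 0 maps to 0 by convention."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x, k):
    """Rotate an 8-bit value left by k bits."""
    return ((x << k) | (x >> (8 - k))) & 0xFF


def sub_byte(x):
    """AES S-box entry: field inverse, then the affine map with constant 0x63."""
    inv = gf_inv(x)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63
```

The field inversion dominates the cost of this step. Hardware designs therefore often move the inversion into a smaller composite field and fold the surrounding linear maps together, which is the kind of saving the abstract describes.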
VLSI implementation of anti-notch lattice structure for identification of exon regions in Eukaryotic genes
IF 1.2, Zone 4 (Computer Science), Q4 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE), Pub Date: 2020-05-21, DOI: 10.1049/iet-cdt.2019.0086
Vikas Pathak, Satyasai Jagannath Nanda, Amit Mahesh Joshi, Sitanshu Sekhar Sahu

In a eukaryotic gene, identification of exon regions is crucial for protein formation. The periodic-3 property of exon regions has been used for their identification. An anti-notch infinite impulse response (IIR) filter is mostly employed to recognise this periodic-3 property. The lattice structure realisation of the anti-notch IIR filter requires less hardware than direct form-II structures. In this study, a hardware implementation of the IIR anti-notch filter lattice structure is carried out on a Zynq-series (Zybo board) field-programmable gate array (FPGA). The performance of the hardware design is improved using techniques such as retiming, pipelining, and unfolding, and is finally assessed on various eukaryotic genes. The hardware implementation reduces the time needed to analyse the DNA sequence of eukaryotic genes for protein formation, which plays a significant role in detecting individual diseases from genetic reports. The performance evaluation is also carried out in the MATLAB simulation environment, and the results are found to be similar. An application-specific integrated circuit (ASIC) implementation of the anti-notch filter lattice structure is also carried out with the Cadence RTL Compiler. The FPGA implementation is 31 to 34 times faster and the ASIC implementation is 58 to 64 times faster than the MATLAB platform, with similar prediction accuracy.

在真核基因中,外显子区域的鉴定对蛋白质的形成至关重要。外显子区域的周期-3性质已用于其鉴定。抗陷波无限脉冲响应(IIR)滤波器主要用于识别这种周期-3性质。反陷波IIR滤波器的晶格结构实现需要比直接来自II结构更少的硬件。本研究在Zynq系列(Zybo板)现场可编程门阵列(FPGA)上实现了IIR抗陷波滤波器晶格结构的硬件实现。硬件设计的性能已经通过重新定时、流水线和展开等技术得到了改善,并最终在各种真核基因上进行了评估。硬件实现缩短了分析真核基因DNA序列以形成蛋白质的时间框架,这在从遗传报告中检测单个疾病方面发挥了重要作用。在此,在MATLAB仿真环境中进行了性能评估,结果相似。在CADENCE-RTL编译器上还实现了反陷波滤波器格结构的专用集成电路(ASIC)实现。与具有类似预测精度的MATLAB平台生成的结果相比,FPGA实现快31到34倍,ASIC实现快58到64倍。
IET Computers and Digital Techniques, vol. 14, no. 5, pp. 217-229, published 2020-05-21. https://doi.org/10.1049/iet-cdt.2019.0086
Citations: 6
Lower complexity error location detection block of adjacent error correcting decoder for SRAMs
IF 1.2 Zone 4 Computer Science Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2020-05-21 DOI: 10.1049/iet-cdt.2019.0268
Raj Kumar Maity, Sayan Tripathi, Jagannath Samanta, Jaydeb Bhaumik

Multiple cell upsets (MCUs) caused by radiation are an important issue for the reliability of embedded static random access memories (SRAMs). Multiple random and adjacent error correcting codes have been extensively employed for several years to protect stored data in SRAMs against MCUs. A compact and fast error correcting codec is desirable in most of these applications. In this study, simplified expressions for the error location detection (ELD) block of single error correction-double error detection-double adjacent error correction (SEC-DED-DAEC) and single error correction-double error detection-triple adjacent error correction (SEC-DED-TAEC) decoders have been obtained by employing Karnaugh maps. The conventional SEC-DED-DAEC and SEC-DED-TAEC decoders have been designed and implemented on both field-programmable gate array (FPGA) and ASIC platforms using these simplified ELD expressions. On the FPGA platform, the proposed designs for the SEC-DED-DAEC and SEC-DED-TAEC decoders achieve a 1.37–28.40% improvement in area and up to 14.74% improvement in delay compared to existing designs, while the ASIC-based designs provide a 2.20–26.81% reduction in area and a 0.30–28.96% reduction in delay compared to existing related works. The proposed design can therefore be considered an efficient alternative to traditional adjacent error correcting decoders in resource-constrained applications.

IET Computers and Digital Techniques, vol. 14, no. 5, pp. 210-216, published 2020-05-21. https://doi.org/10.1049/iet-cdt.2019.0268
Citations: 10
Accelerated LiDAR data processing algorithm for self-driving cars on the heterogeneous computing platform
IF 1.2 Zone 4 Computer Science Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2020-05-19 DOI: 10.1049/iet-cdt.2019.0166
Wei Li, Jun Liang, Yunquan Zhang, Haipeng Jia, Lin Xiao, Qing Li

In recent years, light detection and ranging (LiDAR) has been widely used in self-driving cars, where the LiDAR data processing algorithm is the core algorithm for environment perception. Self-driving cars also place high demands on the real-time performance of this algorithm. The LiDAR point cloud is characterised by high density and uneven distribution, which poses a severe challenge for the implementation and optimisation of data processing algorithms. Taking into account the distribution characteristics of LiDAR data and the characteristics of the data processing algorithm, this study implements and optimises the LiDAR data processing algorithm on an NVIDIA Tegra X2 computing platform and greatly improves its real-time performance. The experimental results show that, compared with an Intel® Core™ i7 industrial personal computer, the optimised algorithm improves feature extraction by nearly 4.5 times, obstacle clustering by nearly 3.5 times, and the performance of the whole algorithm by 2.3 times.

IET Computers and Digital Techniques, vol. 14, no. 5, pp. 201-209, published 2020-05-19. https://doi.org/10.1049/iet-cdt.2019.0166
Citations: 0