Workshop on Design and Architectures for Signal and Image Processing (14th edition): Latest Publications

Hardware-software implementation of the PointPillars network for 3D object detection in point clouds

Workshop on Design and Architectures for Signal and Image Processing (14th edition)

Pub Date : 2021-01-18 DOI: 10.1145/3441110.3441150

Joanna Stanisz, K. Lis, T. Kryjak, M. Gorgon

In this paper, we present a hardware-software implementation of a deep neural network for object detection based on a point cloud obtained by a LiDAR sensor. The Brevitas/PyTorch tools were used for network quantisation, and the FINN tool for hardware implementation in the reprogrammable Zynq UltraScale+ MPSoC device. The PointPillars network was used in the research, as it is a reasonable compromise between detection accuracy and computational complexity. The obtained results show that a substantial reduction in computational precision, together with a few simplifications of the network architecture, allows the solution to be implemented on a heterogeneous embedded platform with reasonable detection accuracy.

在本文中，我们提出了一种基于激光雷达传感器获得的点云的深度神经网络目标检测的硬件软件实现。Brevitas / PyTorch工具用于网络量化，FINN工具用于可编程Zynq UltraScale+ MPSoC设备的硬件实现。研究中使用了PointPillars网络，因为它在检测精度和计算复杂度之间取得了合理的折衷。结果表明，该方案在计算精度上有相当大的限制，并且对网络架构进行了一些简化，使得该方案能够在异构嵌入式平台上实现，并且具有合理的检测精度。

Citations: 1
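The PointPillars entry credits much of the embedded fit to reduced computational precision. As a rough illustration of what lowering precision means (not the Brevitas or FINN API, and not the authors' quantisation scheme), the sketch below applies plain symmetric, per-tensor uniform quantisation to randomly generated weights at a few assumed bit widths (8, 4 and 2) and reports the worst-case reconstruction error.

```python
import random


def quantize_symmetric(values, bits):
    """Quantise floats to signed integers in [-qmax, qmax] using one per-tensor scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest level and clip, so every integer fits in `bits` bits.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map integer levels back to approximate real values."""
    return [qi * scale for qi in q]


random.seed(0)
# Assumed stand-in for a layer's weights: 1000 samples from N(0, 0.1^2).
weights = [random.gauss(0.0, 0.1) for _ in range(1000)]
for bits in (8, 4, 2):
    q, scale = quantize_symmetric(weights, bits)
    err = max(abs(w - d) for w, d in zip(weights, dequantize(q, scale)))
    print(f"{bits}-bit: {2 * (2 ** (bits - 1) - 1) + 1} levels, max abs error {err:.4f}")
```

Since no value is clipped here, the error of each weight stays within half a quantisation step, and that step grows as the bit width shrinks; quantisation-aware training tools exist to recover accuracy lost this way.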

Convolutional Fully-Connected Capsule Network (CFC-CapsNet)

Workshop on Design and Architectures for Signal and Image Processing (14th edition)

Pub Date : 2021-01-18 DOI: 10.1145/3441110.3441148

Pouya Shiri, A. Baniasadi

Capsule Networks (CapsNets) are a new generation of classifiers with several advantages over their predecessors, including higher robustness to affine-transformed datasets and the ability to detect overlapping images. Although CapsNets achieve state-of-the-art accuracy on the MNIST digit recognition dataset, they fall behind Convolutional Neural Networks (CNNs) on other datasets. Moreover, CapsNets are slow compared to CNNs. In this work, we propose Convolutional Fully Connected (CFC) CapsNet as an enhanced alternative architecture to the conventional CapsNet [8]. CFC-CapsNet is a more efficient network: compared to the conventional CapsNet, training and testing are faster and slightly higher accuracy is achieved. CFC-CapsNet also has fewer trainable weights (parameters) and is therefore more efficient in terms of memory usage. The code for CFC-CapsNet is available on GitHub.

胶囊网络(Capsule Networks, CapsNets)是新一代的分类器，与之前的分类器相比具有许多优点。这些优点包括对仿射变换数据集的更高鲁棒性和重叠图像的检测。capnet虽然在MNIST数字识别数据集上获得了最先进的精度，但在其他数据集上却落后于卷积神经网络(cnn)。此外，与cnn相比，capnet速度较慢。在这项工作中，我们提出卷积全连接(CFC) CapsNet作为传统CapsNet的另一种增强架构[8]。CFC-CapsNet是一个更高效的网络:与传统的CapsNet相比，训练和测试执行得更快，准确性略高。CFC-CapsNet包含较少的可训练权值(参数)，因此在内存使用方面更有效。CFC-CapsNet的代码可在Github 1上获得。

Citations: 5

Automotive perception system evaluation with reference data obtained by a UAV

Workshop on Design and Architectures for Signal and Image Processing (14th edition)

Pub Date : 2021-01-18 DOI: 10.1145/3441110.3441151

Krzysztof Błachut, M. Danilowicz, Hubert Szolc, Mateusz Wasala, T. Kryjak, Nikodem Pankiewicz, M. Komorkiewicz

Testing and evaluation of an automotive perception system is a complicated task which requires special equipment and infrastructure. To compute key performance indicators and compare the results with the real-world situation, some additional sensors and manual data labelling are often required. In this article, we propose a different approach, which is based on a UAV equipped with a 4K camera flying above the test track. Thanks to the synchronisation of the sensors between the tested vehicle and the UAV, it is possible to precisely determine the positions of the objects around the car and correlate them with the perception system readings. The performed experiments indicate that this approach could be an interesting alternative to the existing evaluation solutions.

汽车感知系统的测试和评估是一项复杂的任务，需要特殊的设备和基础设施。为了计算关键性能指标并将结果与实际情况进行比较，通常需要一些额外的传感器和手动数据标记。在本文中，我们提出了一种不同的方法，该方法基于配备4K摄像机的无人机在测试轨道上方飞行。由于测试车辆和无人机之间传感器的同步，可以精确地确定车辆周围物体的位置，并将它们与感知系统读数相关联。实验表明，这种方法可能是现有评估解决方案的一个有趣的替代方案。

Citations: 2
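The UAV entry correlates object positions measured from above with the vehicle's perception readings to compute key performance indicators. The paper's exact metrics are not given here; as an assumed example, the sketch below greedily matches detections to reference positions within a distance gate and reports mean position error, misses and false detections. It presumes both sets are already time-synchronised and expressed in one metric frame, which is the role the sensor synchronisation described in the abstract would play. The function name, the 2 m gate and the sample positions are hypothetical.

```python
import math


def match_and_score(reference, detected, gate_m=2.0):
    """Greedily pair perception detections with UAV reference positions, closest pairs first.

    Both lists hold (x, y) positions in metres, already expressed in one common frame
    for the same synchronised timestamp. Pairs farther apart than gate_m are not matched.
    Returns (mean position error of matched pairs, missed reference objects, false detections).
    """
    candidate_pairs = sorted(
        (math.dist(ref, det), i, j)
        for i, ref in enumerate(reference)
        for j, det in enumerate(detected)
    )
    used_ref, used_det, errors = set(), set(), []
    for distance, i, j in candidate_pairs:
        if distance > gate_m:
            break  # remaining pairs are even farther apart
        if i in used_ref or j in used_det:
            continue
        used_ref.add(i)
        used_det.add(j)
        errors.append(distance)
    mean_error = sum(errors) / len(errors) if errors else float("nan")
    return mean_error, len(reference) - len(used_ref), len(detected) - len(used_det)


# Hypothetical frame: the UAV sees two vehicles; the car reports one of them plus a ghost.
reference = [(10.0, 0.0), (20.0, 3.5)]
detected = [(10.3, 0.4), (35.0, 0.0)]
print(match_and_score(reference, detected))  # mean error ~0.5 m, 1 missed, 1 false
```

Greedy matching is a simple choice; an optimal assignment (for example, the Hungarian algorithm) would be a natural replacement when objects are densely packed.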

Low-Power Sign-Magnitude FFT Design for FMCW Radar Signal Processing

Workshop on Design and Architectures for Signal and Image Processing (14th edition)

Pub Date : 2021-01-18 DOI: 10.1145/3441110.3441145

O. Meteer, M. Bekooij

Fully integrated CMOS frequency-modulated continuous-wave (FMCW) radar ICs are under development, and computing FFTs in them costs a significant amount of energy. In this paper we introduce a power-efficient FFT solution that exploits the fact that intermediate results of FFT computations typically have small amplitudes in FMCW radar systems. We propose using the sign-magnitude number representation combined with a custom, unsigned Booth multiplier that does not generate negative numbers internally, significantly decreasing switching activity. RTL power-simulation results show up to 46.45% lower power usage for our sign-magnitude radix-2 FFT implementation compared to a two's complement design, with only a 6.67% lower maximum clock speed.

完全集成的CMOS调频连续波雷达ic正在开发中，其中计算fft需要消耗大量的能量。本文介绍了一种节能的FFT解决方案，该方案利用了FMCW雷达系统中FFT计算的中间结果通常具有较小的振幅。我们建议使用符号大小数表示法与自定义的无符号布斯乘法器相结合，该乘法器内部不会产生负数，从而显着减少切换活动。RTL功率仿真结果显示，与2的互补设计相比，我们的符号幅度基数为2的FFT实现的功耗降低了46.45%，而最大时钟速度仅降低了6.67%。

Citations: 1
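To make the switching-activity argument in the FFT entry concrete, the sketch below counts bit flips between consecutive words, a crude proxy for dynamic power, for small-amplitude values with random signs, assuming a 16-bit word. In two's complement, a sign change of a small value flips all the upper bits; in sign-magnitude, it flips only the sign bit plus a few low bits. This is an illustration of the representation effect only; it does not model the authors' Booth multiplier or an RTL power simulation.

```python
import random

WIDTH = 16  # assumed datapath width for this illustration


def twos_complement(value: int, width: int = WIDTH) -> int:
    """Bit pattern of value in two's complement."""
    return value & ((1 << width) - 1)


def sign_magnitude(value: int, width: int = WIDTH) -> int:
    """Bit pattern of value in sign-magnitude: the MSB is the sign, the rest is |value|."""
    sign = 1 if value < 0 else 0
    return (sign << (width - 1)) | abs(value)


def toggles(values, encode) -> int:
    """Total bit flips between consecutive encoded words (a crude switching-activity proxy)."""
    words = [encode(v) for v in values]
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


# One sign change of a small value: 0x0001 -> 0xFFFF versus 0x0001 -> 0x8001.
print(toggles([1, -1], twos_complement), toggles([1, -1], sign_magnitude))  # 15 1

random.seed(1)
# Small-amplitude samples with random signs, standing in for FFT intermediate results.
samples = [random.randint(-20, 20) for _ in range(10_000)]
print("two's complement:", toggles(samples, twos_complement))
print("sign-magnitude:  ", toggles(samples, sign_magnitude))
```

The advantage depends on amplitudes staying small relative to the word width; for large values spread over the full range, the two representations toggle a similar number of bits.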

On Cache Limits for Dataflow Applications and Related Efficient Memory Management Strategies

Workshop on Design and Architectures for Signal and Image Processing (14th edition)

Pub Date : 2021-01-18 DOI: 10.1145/3441110.3441573

Alemeh Ghasemi, R. Cataldo, J. Diguet, Kevin J. M. Martin

The dataflow paradigm frees the designer to focus on the functionality of an application, independently of the underlying architecture that executes it. While mapping the computational part of a dataflow application onto the cores seems obvious, the memory aspects are not handled as naturally. Dataflow compilers usually do not consider the presence of caches when generating code. A generally accepted idea is that bigger and multi-level caches improve application performance. Unfortunately, state-of-the-art dataflow compilers may prove an exception to this rule. This paper presents two efficient memory management strategies for dataflow applications, based on a study of the impact of cache sharing, cache size, and the number of cache levels on these applications. The results show that bigger is not always better, and that the foreseen future of more cores and bigger caches does not guarantee better performance for dataflow applications without software changes. We propose two strategies, which can be used concurrently, to address the memory aspects of the dataflow model: copy-on-write and non-temporal memory transfers. Experimental results show that we speed up a computer stereo vision application by 2.1× and reduce the number of L1 data cache misses by 45%, while keeping the actors' source code and design intact.

数据流范式使设计人员能够专注于应用程序的功能，而不依赖于执行应用程序的底层体系结构。虽然将数据流计算部分映射到核心似乎很明显，但内存方面并没有相应地匹配。数据流编译器在生成代码时通常不考虑缓存的存在。一个普遍接受的观点是，更大的多级缓存可以提高应用程序的性能。不幸的是，最先进的数据流编译器可能是这条规则的例外。本文通过研究共享、大小和缓存级别对数据流应用程序的影响，提出了两种有效的内存管理策略。结果表明，更大并不总是更好，并且可以预见的更多核心和更大缓存的未来并不能保证数据流应用程序的无软件更好的性能。我们提出了两种可以并发使用的策略来解决数据流模型的内存方面的问题:写时复制和非临时内存传输。实验结果表明，在保持演员源代码和设计完整的情况下，我们将计算机立体视觉应用程序的速度提高了2.1倍，将L1数据缓存丢失次数减少了45%。

{"title":"On Cache Limits for Dataflow Applications and Related Efficient Memory Management Strategies","authors":"Alemeh Ghasemi, R. Cataldo, J. Diguet, Kevin J. M. Martin","doi":"10.1145/3441110.3441573","DOIUrl":"https://doi.org/10.1145/3441110.3441573","url":null,"abstract":"The dataflow paradigm frees the designer to focus on the functionality of an application, independently from the underlying architecture executing it. While mapping the dataflow computational part to the cores seems obvious, the memory aspects do not match accordingly. Dataflow compilers usually do not consider the presence of caches when generating code. A generally accepted idea is that bigger and multi-level caches improve the performance of applications. Unfortunately, state-of-the-art dataflow compilers may prove the exception to this rule. This paper presents two efficient memory management strategies for dataflow applications through a study on the impact of sharing, size, and the number of levels of caches on them. The results show that bigger is not always better, and the foreseen future of more cores and bigger caches do not guarantee software-free better performance for dataflow applications. We propose two strategies, that can be used concurrently, to address the memory aspects of the dataflow model: copy-on-write and non-temporal memory transfers. 
Experimental results show that we speed up a computer stereo vision application by 2.1 × and reduce the number of L1 data cache misses by 45% while maintaining the actors’ source code and design intact.","PeriodicalId":398729,"journal":{"name":"Workshop on Design and Architectures for Signal and Image Processing (14th edition)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121667171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0
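The dataflow entry names copy-on-write without describing it. The sketch below shows the generic idea under an assumed setting: a broadcast actor hands the same token to several consumers, which share one buffer, and a private copy is made only when a consumer actually writes to it. The class names are hypothetical, and Python references stand in for memory buffers, so this shows the avoided copies rather than the cache behaviour the paper measures. Non-temporal transfers rely on processor-specific streaming stores and are not shown.

```python
class SharedBuffer:
    """A token's payload plus the number of handles currently referring to it."""

    def __init__(self, data: bytearray):
        self.data = data
        self.refs = 0


class CowHandle:
    """A consumer's view of a token: reads go to the shared buffer, and the first
    write while the buffer is still shared triggers a private copy (hypothetical API)."""

    copies_made = 0  # physical copies performed, tracked only for illustration

    def __init__(self, buffer: SharedBuffer):
        self._buffer = buffer
        buffer.refs += 1

    def read(self, index: int) -> int:
        return self._buffer.data[index]

    def write(self, index: int, value: int) -> None:
        if self._buffer.refs > 1:
            # Other consumers still share this buffer: detach onto a private copy first.
            private = SharedBuffer(bytearray(self._buffer.data))
            self._buffer.refs -= 1
            private.refs = 1
            self._buffer = private
            CowHandle.copies_made += 1
        self._buffer.data[index] = value


# A broadcast actor hands one frame to three consumers; only one of them modifies it.
frame = SharedBuffer(bytearray(b"\x10\x20\x30\x40"))
readers = [CowHandle(frame) for _ in range(2)]
writer = CowHandle(frame)
writer.write(0, 0xFF)
print(readers[0].read(0), writer.read(0), CowHandle.copies_made)  # 16 255 1
```

Eagerly copying the token for every consumer would have made three copies here; copy-on-write makes one, and only because one consumer mutates its input.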

Multiple Transform Selection concept modeling and implementation using Interface Based SDF graphs

Workshop on Design and Architectures for Signal and Image Processing (14th edition)

Pub Date : 2021-01-18 DOI: 10.1145/3441110.3441153

Naouel Haggui, Fatma Belghith, W. Hamidouche, N. Masmoudi, J. Nezan

Recent studies predict that video data will account for 82% of Internet traffic by 2022. This has motivated MPEG to define a new video coding standard called Versatile Video Coding (VVC), to be released by the end of 2020. VVC will make it possible to handle new video formats and will significantly improve video compression over its predecessor, HEVC: the objective is to halve the required bit rate at equivalent quality. These advances require more complex algorithms, although the increase in complexity has been limited throughout the standardization process. To decrease the complexity of VVC, and consequently the coding execution time, several methods have been introduced at different stages of the encoder. The aim of this paper is to explore the parallelism available in VVC to accelerate the encoding and decoding processes. This paper focuses on the transform block, and more specifically on the new Multiple Transform Selection (MTS) concept introduced by VVC. Moreover, it presents a study of several granularity levels of Interface-Based Synchronous Dataflow (IBSDF) models and their impact on the performance obtained on x86 architectures. An IBSDF dataflow graph has been developed to reveal the parallelism available in MTS. The PREESM fast prototyping tool is then used to map and schedule MTS on virtual and real parallel architectures and to generate efficient parallel implementations on real architectures. PREESM has been used in this work to explore the potential parallelism offered by MTS and to demonstrate the efficiency of MTS on multicore x86 architectures. Experimental results show a speed-up close to the optimum.

最近的研究预测，到2022年，视频数据将占互联网流量的82%。这一事实促使MPEG定义了一种新的视频编码标准，称为多功能视频编码(VVC)，该标准将于2020年底发布。VVC将提供处理新视频格式的可能性，并在其前身HEVC的基础上显著改善视频压缩。实际上，目标是在同等质量下将必要的比特率降低一半。这些进步需要使用更复杂的算法，尽管在整个标准化过程中复杂性的增加是有限的。为了降低VVC的复杂度，从而减少编码的执行时间，在编码器的不同阶段引入了几种方法。本文的目的是探索VVC的可用并行性，以加快编码和解码过程。本文重点研究了变换块，特别是VVC引入的多重变换选择(Multiple Transform Selection, MTS)的新概念。此外，还研究了基于接口的同步数据流(IBSDF)模型的几种粒度级别及其对x86架构下性能的影响。通过建立IBSDF数据流图来揭示MTS的可用并行性，然后利用PREESM快速原型工具对MTS在虚拟和真实并行体系结构上的映射和调度，并在真实体系结构上生成高效的并行实现。在这项工作中，PREESM被用于探索MTS提供的潜在并行性，并证明MTS在多核x86架构上的效率。实验结果表明，加速速度接近最优。

{"title":"Multiple Transform Selection concept modeling and implementation using Interface Based SDF graphs","authors":"Naouel Haggui, Fatma Belghith, W. Hamidouche, N. Masmoudi, J. Nezan","doi":"10.1145/3441110.3441153","DOIUrl":"https://doi.org/10.1145/3441110.3441153","url":null,"abstract":"Recent studies predict that video data accounts for 82% of Internet traffic by 2022. This fact has motivated MPEG to define a new Video Coding Standard called Versatile Video Coding (VVC), which will be released by the end of 2020. VVC will offer the possibility to handle new video formats and to improve significantly video compression over its predecessor HEVC. Indeed, the objective is to reduce the necessary bit rate by half, at equivalent quality. These advances require the use of more complex algorithms, although the increase in complexity has been limited throughout the standardization process. In order to decrease the complexity of VVC and consequently the coding execution time, several methods have been introduced at different stages of the encoder. The aim of this paper is to explore the available parallelism of VVC to accelerate the coding and the decoding processes. This paper focuses on the transformation block and more specifically the new concept of Multiple Transform Selection (MTS) introduced by VVC. Moreover, a study of several granularity levels of Interface-Based Synchronous Dataflow (IBSDF) models and their impact on the performances obtained on x86 architectures is presented. IBSDF dataflow graph has been developed to reveal the available parallelism of MTS. The PREESM fast prototyping tool is then used for the mapping and the scheduling of MTS on virtual and real parallel architectures and for generating efficient parallel implementations on real architectures. PREESM has been used in this work to explore the potential parallelism offered by MTS and to prove the efficiency of MTS on multicore x86 architectures. 
Experimental results show a speed-up close to the optimum.","PeriodicalId":398729,"journal":{"name":"Workshop on Design and Architectures for Signal and Image Processing (14th edition)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131931103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 2
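IBSDF builds on Synchronous Dataflow (SDF), in which every actor produces and consumes a fixed number of tokens per firing. A first step in scheduling such a graph is solving the balance equation produced × r[src] = consumed × r[dst] on every edge for the repetition vector r, the number of firings of each actor per graph iteration; multiple firings of a stateless actor within one iteration can expose data parallelism. The sketch below does this for a flat SDF graph only: the hierarchical "interface" part of IBSDF and PREESM's own mapping and scheduling are not modelled, and the block graph in the example is hypothetical, not the authors' MTS graph.

```python
from collections import deque
from fractions import Fraction
from math import gcd, lcm


def repetition_vector(actors, edges):
    """Smallest positive integer firing counts r with produced * r[src] == consumed * r[dst]
    on every edge. edges holds (src, dst, produced_per_firing, consumed_per_firing) tuples;
    the graph is assumed to be connected."""
    neighbours = {actor: [] for actor in actors}
    for src, dst, produced, consumed in edges:
        neighbours[src].append((dst, Fraction(produced, consumed)))  # r[dst] = r[src] * p / c
        neighbours[dst].append((src, Fraction(consumed, produced)))  # r[src] = r[dst] * c / p
    rates = {actors[0]: Fraction(1)}
    queue = deque([actors[0]])
    while queue:
        actor = queue.popleft()
        for other, ratio in neighbours[actor]:
            expected = rates[actor] * ratio
            if other not in rates:
                rates[other] = expected
                queue.append(other)
            elif rates[other] != expected:
                raise ValueError("inconsistent SDF graph: no valid repetition vector")
    if len(rates) != len(actors):
        raise ValueError("graph is not connected")
    # Scale the rational rates to the smallest positive integers.
    scale = lcm(*(rate.denominator for rate in rates.values()))
    counts = {actor: int(rate * scale) for actor, rate in rates.items()}
    divisor = gcd(*counts.values())
    return {actor: count // divisor for actor, count in counts.items()}


# Hypothetical graph: a source emits one 64-coefficient block, a row-transform actor handles
# 8 coefficients per firing, and a sink collects the whole block again.
actors = ["Src", "RowTransform", "Sink"]
edges = [("Src", "RowTransform", 64, 8), ("RowTransform", "Sink", 8, 64)]
print(repetition_vector(actors, edges))  # {'Src': 1, 'RowTransform': 8, 'Sink': 1}
```

The eight RowTransform firings per iteration are the kind of parallelism a multicore scheduler could distribute across cores, provided the actor keeps no state between firings; coarser or finer token rates change that granularity, which is the trade-off the entry studies.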

Gegelati: Lightweight Artificial Intelligence through Generic and Evolvable Tangled Program Graphs

Workshop on Design and Architectures for Signal and Image Processing (14th edition)

Pub Date : 2020-12-15 DOI: 10.1145/3441110.3441575

K. Desnos, Nicolas Sourbier, Pierre-Yves Raumer, Olivier Gesny, M. Pelcat

Tangled Program Graph (TPG) is a reinforcement learning technique based on genetic programming concepts. On state-of-the-art learning environments, TPGs have been shown to offer competence comparable to Deep Neural Networks (DNNs) for a fraction of their computational and storage cost. This lightness of TPGs, for both training and inference, makes them an interesting model for implementing Artificial Intelligences (AIs) on embedded systems with limited computational and storage resources. In this paper, we introduce the Gegelati library for TPGs. Besides introducing the general concepts and features of the library, the paper details two main contributions: 1/ the parallelization of the deterministic training process of TPGs, to support heterogeneous Multiprocessor Systems-on-Chips (MPSoCs); 2/ support for customizable instruction sets and data types within the genetically evolved programs of the TPG model. The scalability of the parallel training process is demonstrated through experiments on architectures ranging from a high-end 24-core processor to a low-power heterogeneous MPSoC. The impact of customizable instructions on the outcome of a training process is demonstrated on a state-of-the-art reinforcement learning environment.

纠缠程序图(TPG)是一种基于遗传规划概念的强化学习技术。在最先进的学习环境中，TPGs已被证明可以提供与深度神经网络(dnn)相当的能力，而其计算和存储成本只是前者的一小部分。对于训练和推理来说，这种轻量级的TPGs使它们成为在计算和存储资源有限的嵌入式系统上实现人工智能(ai)的有趣模型。本文介绍了用于TPGs的Gegelati库。除了介绍该库的一般概念和特点外，本文还详细介绍了两个主要贡献:1/为支持异构多处理器片上系统(MPSoCss)，实现了TPGs的确定性训练过程的并行化。2/在TPG模型的遗传进化程序中支持可定制的指令集和数据类型。通过从高端24核处理器到低功耗异构mpsoc的架构实验，证明了并行训练过程的可扩展性。可定制指令对训练过程结果的影响在最先进的强化学习环境中进行了演示。

{"title":"Gegelati: Lightweight Artificial Intelligence through Generic and Evolvable Tangled Program Graphs","authors":"K. Desnos, Nicolas Sourbier, Pierre-Yves Raumer, Olivier Gesny, M. Pelcat","doi":"10.1145/3441110.3441575","DOIUrl":"https://doi.org/10.1145/3441110.3441575","url":null,"abstract":"Tangled Program Graph (TPG) is a reinforcement learning technique based on genetic programming concepts. On state-of-the-art learning environments, TPGs have been shown to offer comparable competence with Deep Neural Networks (DNNs), for a fraction of their computational and storage cost. This lightness of TPGs, both for training and inference, makes them an interesting model to implement Artificial Intelligences (AIs) on embedded systems with limited computational and storage resources. In this paper, we introduce the Gegelati library for TPGs. Besides introducing the general concepts and features of the library, two main contributions are detailed in the paper: 1/ The parallelization of the deterministic training process of TPGs, for supporting heterogeneous Multiprocessor Systems-on-Chipss (MPSoCss). 2/ The support for customizable instruction sets and data types within the genetically evolved programs of the TPG model. The scalability of the parallel training process is demonstrated through experiments on architectures ranging from a high-end 24-core processor to a low-power heterogeneous MPSoCs. 
The impact of customizable instructions on the outcome of a training process is demonstrated on a state-of-the-art reinforcement learning environment.","PeriodicalId":398729,"journal":{"name":"Workshop on Design and Architectures for Signal and Image Processing (14th edition)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126202449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 9
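To show how a trained TPG produces an action, the sketch below follows the general TPG execution model: starting from a root team, each outgoing edge runs its program on the observation to compute a bid, the highest bid wins, and the walk continues until an edge leads to an action rather than another team, skipping edges back to teams already visited. This is not the Gegelati C++ API. The instruction set, the three-field program format and the register count are hypothetical simplifications, and training (evolution) is not shown.

```python
import operator

# Hypothetical, user-customisable instruction set: each instruction combines
# a register value with one observation value.
INSTRUCTIONS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "max": max,
}


def run_program(program, observation, n_registers=4):
    """Execute a linear program of (instruction, dest_register, obs_index) triples.
    Register 0 holds the bid once the program finishes."""
    registers = [0.0] * n_registers
    for name, dst, src in program:
        registers[dst] = INSTRUCTIONS[name](registers[dst], observation[src])
    return registers[0]


class Team:
    def __init__(self, name):
        self.name = name
        # Each edge is (program, destination); a destination is a Team or an action label.
        self.edges = []


def tpg_act(root, observation):
    """Walk the graph from the root: at each team, follow the edge whose program bids
    highest, skipping edges back to already-visited teams, until an action is reached.
    Assumes every team keeps at least one action edge, so the walk always terminates."""
    visited = set()
    team = root
    while True:
        visited.add(team)
        candidates = [
            (program, dest)
            for program, dest in team.edges
            if not (isinstance(dest, Team) and dest in visited)
        ]
        _, dest = max(candidates, key=lambda edge: run_program(edge[0], observation))
        if not isinstance(dest, Team):
            return dest
        team = dest


root, sub = Team("root"), Team("sub")
sub.edges = [([("add", 0, 0)], "LEFT"), ([("add", 0, 1)], "RIGHT")]
root.edges = [([("add", 0, 0), ("mul", 0, 1)], sub), ([("sub", 0, 0)], "NOOP")]
print(tpg_act(root, [2.0, 3.0]))  # root -> sub (bid 6 vs -2), then RIGHT (bid 3 vs 2)
```

Only the programs along the chosen path are executed, which is one reason TPG inference can be cheap compared with evaluating every layer of a DNN; swapping entries in `INSTRUCTIONS` mirrors, in miniature, the customizable instruction sets the entry describes.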