{"title":"Latency optimized architectures for a real-time inference pipeline for control tasks","authors":"Florian Schellroth, Jannik Lehner, A. Verl","doi":"10.1109/ict4da53266.2021.9672224","DOIUrl":null,"url":null,"abstract":"With the increasing development of GPUs, the inference time of CNNs continues to decrease. This enables new AI applications in manufacturing that have a direct impact on the control of a process. For this, a GPU is integrated into a real-time system so that the CNN can be executed in real-time. However, it is not sufficient to consider the inference process only, but also to minimize the latency of the whole pipeline. For this purpose, execution strategies of the inference pipeline are presented and evaluated in this paper. The presented architectures are compared using criteria for latency, implementation effort, and exchangeability. The latencies are quantified with measurements on a demonstrator. As a result, the most synchronous architecture has the lowest latency but is not suitable for the use in a service-oriented architecture as targeted by the Industry 4.0. For this, another architecture is presented, providing a good balance between latency and service orientation.","PeriodicalId":371663,"journal":{"name":"2021 International Conference on Information and Communication Technology for Development for Africa (ICT4DA)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2021-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Conference on Information and Communication Technology for Development for Africa (ICT4DA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ict4da53266.2021.9672224","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
With the continuing development of GPUs, the inference time of CNNs keeps decreasing. This enables new AI applications in manufacturing that directly influence the control of a process. To this end, a GPU is integrated into a real-time system so that the CNN can be executed in real-time. However, it is not sufficient to consider the inference step alone; the latency of the whole pipeline must be minimized. For this purpose, execution strategies for the inference pipeline are presented and evaluated in this paper. The presented architectures are compared using criteria for latency, implementation effort, and exchangeability. The latencies are quantified with measurements on a demonstrator. As a result, the most synchronous architecture has the lowest latency but is not suitable for use in a service-oriented architecture as targeted by Industry 4.0. Therefore, another architecture is presented that provides a good balance between latency and service orientation.
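To make the whole-pipeline perspective concrete, the following is a minimal sketch, not the authors' implementation, of a fully synchronous acquisition–inference–output cycle in which the end-to-end latency is measured rather than the inference time alone. The placeholder CNN, the `capture_frame` stand-in, and the stage breakdown are illustrative assumptions.

```python
import time
import torch

# Illustrative sketch of a fully synchronous inference cycle.
# The dummy CNN, capture_frame() stand-in, and stage names are
# assumptions for illustration, not the architecture from the paper.

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Conv2d(3, 8, kernel_size=3).eval().to(device)  # placeholder CNN

def capture_frame() -> torch.Tensor:
    # Stand-in for a real camera/sensor acquisition step.
    return torch.rand(1, 3, 224, 224)

def run_cycle():
    t0 = time.perf_counter()               # start of the whole-pipeline measurement
    frame = capture_frame()                # 1. acquisition
    frame = frame.to(device)               # 2. synchronous host-to-device transfer
    with torch.no_grad():
        out = model(frame)                 # 3. CNN inference
    if device == "cuda":
        torch.cuda.synchronize()           # wait for the GPU before stopping the clock
    result = out.mean().item()             # 4. device-to-host copy / postprocessing
    latency = time.perf_counter() - t0     # end-to-end latency, not inference time only
    return result, latency

if __name__ == "__main__":
    _, latency = run_cycle()
    print(f"pipeline latency: {latency * 1e3:.2f} ms")
```

Note that the final `.item()` call forces a blocking device-to-host copy, so the measured interval covers acquisition, transfers, and postprocessing as well as the inference kernel itself, which is the distinction the abstract draws.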