
Latest Publications: IEEE Journal on Emerging and Selected Topics in Circuits and Systems

A Highly-Scalable Deep-Learning Accelerator With a Cost-Effective Chip-to-Chip Adapter and a C2C-Communication-Aware Scheduler
IF 3.7 | CAS Tier 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-07-01 | DOI: 10.1109/JETCAS.2024.3421553
Jicheon Kim;Chunmyung Park;Eunjae Hyun;Xuan Truong Nguyen;Hyuk-Jae Lee
Multi-chip-module (MCM) technology heralds a new era for scalable DNN inference systems, offering a cost-effective alternative to large-scale monolithic designs by lowering fabrication and design costs. Nevertheless, MCMs often incur resource and performance overheads due to inter-chip communication, which largely reduce the performance gains of a scaled-out system. To address these challenges, this paper introduces a highly-scalable DNN accelerator with a lightweight chip-to-chip adapter (C2CA) and a C2C-communication-aware scheduler. Our design employs a C2CA for inter-chip communication, which accurately models an MCM system with constrained C2C bandwidth, e.g., about 1/16, 1/8, or 1/4 of the on-chip bandwidth. We empirically reveal that the limited C2C bandwidth largely affects the overall performance gain of an MCM system. For example, compared with a single-core engine, a four-chip MCM system with constrained C2C bandwidth only achieves $2.60\times$, $3.27\times$, $2.84\times$, and $2.74\times$ performance gains on ResNet50, DarkNet19, MobileNetV1, and EfficientNetS, respectively. To mitigate this problem, we propose a novel C2C-communication-aware scheduler with forward and backward inter-layer scheduling. Specifically, our scheduler effectively utilizes the C2C bandwidth while a core is performing its own computation. To demonstrate the effectiveness and practicality of our concept, we modeled our design in Verilog HDL and implemented it on an FPGA board, a Xilinx ZCU104. The experimental results show significant throughput improvements over a single-chip configuration, yielding average enhancements of $1.87\times$ and $3.43\times$ for two-chip and four-chip configurations, respectively, on ResNet50, DarkNet19, MobileNetV1, and EfficientNetS.
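To see why constrained C2C bandwidth erodes scaling gains, the Python model below treats each layer's time on an n-chip MCM as the maximum of its share of compute and the time to exchange activations over the C2C link. This is a back-of-envelope sketch with hypothetical layer sizes and bandwidth units, not the paper's simulator, scheduler, or hardware model.

```python
# A back-of-envelope model of MCM scaling under constrained C2C bandwidth
# (illustrative only; layer sizes and bandwidth units are hypothetical).

def mcm_speedup(layers, n_chips, c2c_ratio, on_chip_bw=1.0):
    """layers: list of (compute_cycles, activation_bytes) per layer.

    Each layer's time on an n-chip MCM is taken as the max of its share of
    compute and the time to move its activations over the C2C link.
    """
    single_chip = sum(compute for compute, _ in layers)
    c2c_bw = on_chip_bw * c2c_ratio
    multi_chip = sum(max(compute / n_chips, acts / c2c_bw)
                     for compute, acts in layers)
    return single_chip / multi_chip

# With C2C bandwidth at 1/8 of on-chip, four chips fall well short of the
# ideal 4x because communication-bound layers dominate:
print(mcm_speedup([(100, 4), (80, 8), (120, 2)], n_chips=4, c2c_ratio=1 / 8))
# ~2.38x; overlapping C2C transfers with computation (as the proposed
# scheduler does) pushes effective layer time back toward compute / n_chips.
```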
Secure consensus control for constrained multi-agent systems against intermittent denial-of-service attacks: an adaptive dynamic programming method
IF 4.6 | CAS Tier 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-06-28 | DOI: 10.1109/jetcas.2024.3420396
Zhen Gao, Ning Zhao, Guangdeng Zong, Xudong Zhao
{"title":"Secure consensus control for constrained multi-agent systems against intermittent denial-of-service attacks: an adaptive dynamic programming method","authors":"Zhen Gao, Ning Zhao, Guangdeng Zong, Xudong Zhao","doi":"10.1109/jetcas.2024.3420396","DOIUrl":"https://doi.org/10.1109/jetcas.2024.3420396","url":null,"abstract":"","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":null,"pages":null},"PeriodicalIF":4.6,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141508148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
IEEE Journal on Emerging and Selected Topics in Circuits and Systems information for authors
IF 3.7 | CAS Tier 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-06-01 | DOI: 10.1109/JETCAS.2024.3417549
{"title":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems information for authors","authors":"","doi":"10.1109/JETCAS.2024.3417549","DOIUrl":"https://doi.org/10.1109/JETCAS.2024.3417549","url":null,"abstract":"","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":null,"pages":null},"PeriodicalIF":3.7,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10579095","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141494812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
IEEE Journal on Emerging and Selected Topics in Circuits and Systems Publication Information
IF 3.7 | CAS Tier 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-06-01 | DOI: 10.1109/JETCAS.2024.3405090
{"title":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems Publication Information","authors":"","doi":"10.1109/JETCAS.2024.3405090","DOIUrl":"https://doi.org/10.1109/JETCAS.2024.3405090","url":null,"abstract":"","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":null,"pages":null},"PeriodicalIF":3.7,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10579073","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141495163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
IEEE Circuits and Systems Society
IF 3.7 | CAS Tier 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-06-01 | DOI: 10.1109/JETCAS.2024.3405094
{"title":"IEEE Circuits and Systems Society","authors":"","doi":"10.1109/JETCAS.2024.3405094","DOIUrl":"https://doi.org/10.1109/JETCAS.2024.3405094","url":null,"abstract":"","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":null,"pages":null},"PeriodicalIF":3.7,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10579094","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141495247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Guest Editorial Advances in Generative Visual Signal Coding and Processing
IF 3.7 | CAS Tier 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-06-01 | DOI: 10.1109/JETCAS.2024.3403318
Zhibo Chen;Heming Sun;Li Zhang;Fan Zhang
This special issue of IEEE Journal on Emerging and Selected Topics in Circuits and Systems (JETCAS) is dedicated to demonstrating the latest developments in algorithms, implementations, and applications related to visual signal coding and processing with generative models. In recent years, generative models have emerged as one of the most significant and rapidly developing areas of research in artificial intelligence. They have proved to be an important instrument for advancing research in AI-based visual signal coding and processing. For instance, the variational autoencoder (VAE) has been used as a fundamental framework for end-to-end learned image coding, the autoregressive (AR) model has been extensively studied for efficient entropy coding, and the generative adversarial network (GAN) has been utilized frequently to enhance the subjective quality of coding schemes. Meanwhile, generative models have also been explored in various visual signal processing tasks, including quality assessment, restoration, enhancement, editing, and interpolation.
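For context, a standard formulation underlying VAE-based end-to-end codecs (the generic rate-distortion Lagrangian from the learned-compression literature, not a result of any specific paper in this issue) trains the analysis transform $g_a$, synthesis transform $g_s$, and entropy model $p_{\hat{y}}$ jointly by minimizing

$$\mathcal{L} = R + \lambda D = \mathbb{E}\!\left[-\log_2 p_{\hat{y}}(\hat{y})\right] + \lambda\,\mathbb{E}\!\left[d(x, \hat{x})\right], \qquad \hat{y} = Q(g_a(x)),\quad \hat{x} = g_s(\hat{y}),$$

where $\lambda$ trades the bit-rate term $R$ against the distortion term $D$. An autoregressive context model can sharpen $p_{\hat{y}}$ and thus lower $R$, but at the cost of serial decoding, which is one reason efficient entropy coding remains an active topic.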
Parameter Reduction of Kernel-Based Video Frame Interpolation Methods Using Multiple Encoders
IF 3.7 | CAS Tier 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-04-30 | DOI: 10.1109/JETCAS.2024.3395418
Issa Khalifeh;Luka Murn;Ebroul Izquierdo
Video frame interpolation synthesises a new frame from existing frames. Several approaches have been devised to handle this core computer vision problem. Kernel-based approaches use an encoder-decoder architecture to extract features from the inputs and generate weights for a local separable convolution operation which is used to warp the input frames. The warped inputs are then combined to obtain the final interpolated frame. The ease of implementation of such an approach and favourable performance have enabled it to become a popular method in the field of interpolation. One downside, however, is that the encoder-decoder feature extractor is large and uses a lot of parameters. We propose a Multi-Encoder Method for Parameter Reduction (MEMPR) that can significantly reduce parameters by up to 85% whilst maintaining a similar level of performance. This is achieved by leveraging multiple encoders to focus on different aspects of the input. The approach can also be used to improve the performance of kernel-based models in a parameter-effective manner. To encourage the adoption of such an approach in potential future kernel-based methods, the approach is designed to be modular, intuitive and easy to implement. It is implemented on some of the most impactful kernel-based works such as SepConvNet, AdaCoFNet and EDSC. Extensive experiments on datasets with varying ranges of motion highlight the effectiveness of the MEMPR approach and its generalisability to different convolutional backbones and kernel-based operators.
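To make the separable-kernel warping step concrete, the NumPy sketch below computes each output pixel as $k_v^{\top} P\, k_h$, the per-pixel vertical and horizontal weight vectors applied to the local input patch $P$. This is an illustrative sketch, not the authors' implementation; the array shapes and the two-frame combination shown in the trailing comment are assumptions.

```python
import numpy as np

def separable_local_conv(frame, kv, kh):
    """Warp a grayscale frame with per-pixel separable kernels.

    frame: (H, W); kv, kh: (H, W, K) vertical/horizontal weights, so the
    effective K x K kernel at pixel (y, x) is outer(kv[y, x], kh[y, x]).
    """
    H, W = frame.shape
    K = kv.shape[-1]
    pad = K // 2
    padded = np.pad(frame, pad, mode="edge")
    out = np.empty((H, W), dtype=np.float64)
    for y in range(H):
        for x in range(W):
            patch = padded[y:y + K, x:x + K]           # K x K neighbourhood
            out[y, x] = kv[y, x] @ patch @ kh[y, x]    # kv^T P kh
    return out

# The interpolated frame is the sum of both warped inputs; the encoder-decoder
# (or MEMPR's multiple smaller encoders) predicts kv/kh for each input frame:
# mid = separable_local_conv(f0, kv0, kh0) + separable_local_conv(f1, kv1, kh1)
```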
TM-GAN: A Transformer-Based Multi-Modal Generative Adversarial Network for Guided Depth Image Super-Resolution
IF 3.7 | CAS Tier 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-04-29 | DOI: 10.1109/JETCAS.2024.3394495
Jiang Zhu;Van Kwan Zhi Koh;Zhiping Lin;Bihan Wen
Despite significant strides in deep single image super-resolution (SISR), the development of robust guided depth image super-resolution (GDSR) techniques presents a notable challenge. Effective GDSR methods must not only exploit the properties of the target image but also integrate complementary information from the guidance image. The state of the art in guided image super-resolution has been dominated by convolutional neural network (CNN) based methods. However, CNNs are limited in capturing global information effectively, and their traditional regression training can struggle to generate precise high-frequency details, unlike transformers, which have shown remarkable success in deep learning through the self-attention mechanism. Drawing inspiration from the transformative impact of transformers in both language and vision applications, we propose a Transformer-based Multi-modal Generative Adversarial Network dubbed TM-GAN. TM-GAN leverages the global contextual understanding and detailed feature extraction capabilities of transformers within a GAN architecture to process and integrate multi-modal data sources for GDSR. Experimental evaluations on a variety of RGB-D datasets demonstrate its superiority over state-of-the-art methods, showcasing the effectiveness of transformer-based techniques for GDSR.
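As a minimal sketch of transformer-based multi-modal fusion, the PyTorch block below lets depth tokens attend to RGB guidance tokens via cross-attention. This is a generic building block: the module name, dimensions, and query/key roles are assumptions for illustration, not TM-GAN's actual generator.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Depth tokens query RGB guidance tokens via multi-head cross-attention."""

    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, depth_tokens, rgb_tokens):
        # depth_tokens, rgb_tokens: (batch, num_tokens, dim)
        fused, _ = self.attn(depth_tokens, rgb_tokens, rgb_tokens)
        return self.norm(depth_tokens + fused)  # residual + norm

fusion = CrossModalFusion()
depth = torch.randn(2, 256, 64)     # low-resolution depth features as tokens
guide = torch.randn(2, 256, 64)     # RGB guidance features as tokens
print(fusion(depth, guide).shape)   # torch.Size([2, 256, 64])
```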
Compressed-Domain Vision Transformer for Image Classification
IF 3.7 | CAS Tier 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-04-29 | DOI: 10.1109/JETCAS.2024.3394878
Ruolei Ji;Lina J. Karam
Compressed-domain visual task schemes, in which visual processing or computer vision is performed directly on compressed-domain representations, have been shown to achieve higher computational efficiency during training and deployment by avoiding the need to decode the compressed visual information, while delivering competitive or even better performance than corresponding spatial-domain visual tasks. This work concerns learning-based compressed-domain image classification, where classification is performed directly on compressed-domain representations, also known as latent representations, obtained with a learning-based visual encoder. In this paper, a compressed-domain Vision Transformer (cViT) is proposed to perform image classification in the learning-based compressed domain. For this purpose, the Vision Transformer (ViT) architecture is adopted and modified to perform classification directly in the compressed domain. As part of this work, a novel feature patch embedding is introduced that leverages the within- and cross-channel information in the compressed domain. In addition, an adaptation training strategy is designed to take the weights of a pre-trained spatial-domain ViT and adapt them to the compressed-domain classification task. Furthermore, the pre-trained ViT position embeddings are interpolated for initialization to further improve the performance of cViT. The experimental results show that the proposed cViT outperforms existing compressed-domain classification networks in terms of Top-1 and Top-5 classification accuracy. Moreover, the proposed cViT yields competitive classification accuracy with significantly higher computational efficiency compared to pixel-domain approaches.
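The position-embedding interpolation mentioned above is a standard ViT adaptation trick; the sketch below shows the common recipe (the generic technique, with hypothetical grid sizes, and not necessarily the paper's exact procedure).

```python
import torch
import torch.nn.functional as F

def resize_pos_embed(pos_embed, old_grid, new_grid):
    """pos_embed: (1, 1 + old_grid**2, dim), with a leading [CLS] token."""
    cls_tok, grid_tok = pos_embed[:, :1], pos_embed[:, 1:]
    dim = grid_tok.shape[-1]
    # reshape token sequence back into a 2D grid, interpolate, and flatten
    grid_tok = grid_tok.reshape(1, old_grid, old_grid, dim).permute(0, 3, 1, 2)
    grid_tok = F.interpolate(grid_tok, size=(new_grid, new_grid),
                             mode="bicubic", align_corners=False)
    grid_tok = grid_tok.permute(0, 2, 3, 1).reshape(1, new_grid ** 2, dim)
    return torch.cat([cls_tok, grid_tok], dim=1)

# e.g., adapting a 224-px ViT-B (14 x 14 patch grid) to a 16 x 16 latent grid:
pe = torch.randn(1, 1 + 14 * 14, 768)
print(resize_pos_embed(pe, 14, 16).shape)  # torch.Size([1, 257, 768])
```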
FVIFormer: Flow-Guided Global-Local Aggregation Transformer Network for Video Inpainting
IF 3.7 | CAS Tier 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-04-25 | DOI: 10.1109/JETCAS.2024.3392972
Weiqing Yan;Yiqiu Sun;Guanghui Yue;Wei Zhou;Hantao Liu
Video inpainting has been extensively used in recent years. Established works usually utilise the similarity between the missing region and its surrounding features to inpaint the visually damaged content in a multi-stage manner. However, due to the complexity of video content, this may destroy the structural information of objects within the video. In addition, the presence of moving objects in the damaged regions of the video can further increase the difficulty of this work. To address these issues, we propose a flow-guided global-local aggregation Transformer network for video inpainting. First, we use a pre-trained optical flow completion network to repair the defective optical flow of video frames. Then, we propose a content inpainting module, which uses the completed optical flow as a guide and propagates global content across the video frames using efficient temporal and spatial Transformers to inpaint the corrupted regions of the video. Finally, we propose a structural rectification module to enhance the coherence of content around the missing regions by combining the extracted local and global features. In addition, considering the efficiency of the overall framework, we also optimise the self-attention mechanism with depth-wise separable encoding to improve the speed of training and testing. We validate the effectiveness of our method on the YouTube-VOS and DAVIS video datasets. Extensive experimental results demonstrate the effectiveness of our approach in edge-complementing video content that has undergone stabilisation algorithms.
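Flow-guided propagation generally relies on backward warping of features by optical flow; the PyTorch sketch below shows this common building block. This is an assumption about the mechanism, not a reproduction of the paper's propagation module.

```python
import torch
import torch.nn.functional as F

def flow_warp(feat, flow):
    """feat: (B, C, H, W); flow: (B, 2, H, W) in pixels, channel 0 = dx."""
    B, _, H, W = feat.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(feat.device)  # (2, H, W)
    coords = base.unsqueeze(0) + flow                            # sample points
    # normalise to [-1, 1] as required by grid_sample
    gx = 2.0 * coords[:, 0] / (W - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (H - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                         # (B, H, W, 2)
    return F.grid_sample(feat, grid, align_corners=True)

feat = torch.randn(1, 8, 32, 32)
flow = torch.zeros(1, 2, 32, 32)   # zero flow leaves features unchanged
print(torch.allclose(flow_warp(feat, flow), feat, atol=1e-5))  # True
```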