IEEE Journal on Emerging and Selected Topics in Circuits and Systems: Latest Articles


Energy-Efficient and Rotationally Adjustable Millimeter-Wave Wireless Interconnects

IF 3.7 Zone 2 (Engineering Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Journal on Emerging and Selected Topics in Circuits and Systems

Pub Date : 2024-07-03 DOI: 10.1109/JETCAS.2024.3422371

Abhishek Sharma;Yanghyo Rod Kim

Conventional interconnects experience significant mechanical durability, mobility, and signal integrity challenges when dealing with moving parts or implementing extensive interconnect networks. As a result, they often hinder the performance of advanced autonomous and high-performance computing systems. This paper presents a fully rotatable and diagonally flexible ultra-short distance (≈ 1 mm) wireless interconnect. The proposed wireless interconnect comprises a 57-GHz transceiver integrated with a folded dipole antenna through wire bonding, enabling a flexible contactless connection. Here, two folded dipoles communicate in the Fresnel zone (radiative near-field), where we leverage the longitudinal electric fields to alleviate the polarization mismatch over the entire rotation angle. We have implemented a non-coherent on-off keying (OOK) modulation scheme and employed an automatic gain control (AGC) loop and offset canceling feedback loop to compensate for the transmission degradation and signal imbalance. The proposed system consumes 58.2 mW of power under a 1 V supply while transferring data at a rate of 10-Gb/s, achieving 5.82-pJ/bit energy efficiency.
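
The energy-efficiency figure quoted in the abstract follows directly from the reported power and data rate. A minimal sketch of that arithmetic is below; the function name `energy_per_bit_pj` is illustrative and does not come from the paper.

```python
def energy_per_bit_pj(power_mw: float, rate_gbps: float) -> float:
    """Energy per bit in pJ/bit, given power in mW and data rate in Gb/s."""
    power_w = power_mw * 1e-3          # mW -> W (joules per second)
    rate_bps = rate_gbps * 1e9         # Gb/s -> bits per second
    return power_w / rate_bps * 1e12   # J/bit -> pJ/bit


# Values quoted in the abstract: 58.2 mW at 10 Gb/s
print(round(energy_per_bit_pj(58.2, 10.0), 2))  # 5.82
```

Dividing milliwatts by gigabits per second gives picojoules per bit directly, which is why 58.2 mW at 10 Gb/s reads as 5.82 pJ/bit.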


Vol. 14, No. 3, pp. 551-562.

Citations: 0

A Highly-Scalable Deep-Learning Accelerator With a Cost-Effective Chip-to-Chip Adapter and a C2C-Communication-Aware Scheduler

Pub Date : 2024-07-01 DOI: 10.1109/JETCAS.2024.3421553

Jicheon Kim;Chunmyung Park;Eunjae Hyun;Xuan Truong Nguyen;Hyuk-Jae Lee

Multi-chip-module (MCM) technology heralds a new era for scalable DNN inference systems, offering a cost-effective alternative to large-scale monolithic designs by lowering fabrication and design costs. Nevertheless, MCMs often incur resource and performance overheads due to inter-chip communication, which largely reduce a performance gain in a scaling-out system. To address these challenges, this paper introduces a highly-scalable DNN accelerator with a lightweight chip-to-chip adapter (C2CA) and a C2C-communication-aware scheduler. Our design employs a C2CA for inter-chip communication, which accurately illustrates an MCM system with a constrained C2C bandwidth, e.g., about 1/16, 1/8, or 1/4 of an on-chip bandwidth. We empirically reveal that the limited C2C bandwidth largely affects the overall performance gain of an MCM system. For example, compared with the one-core engine, a four-chip MCM system with a constrained C2C bandwidth only achieves $2.60\times$, $3.27\times$, $2.84\times$, and $2.74\times$ performance gains on ResNet50, DarkNet19, MobileNetV1, and EfficientNetS, respectively. Mitigating the problem, we propose a novel C2C-communication-aware scheduler with forward and backward inter-layer scheduling. Specifically, our scheduler effectively utilizes a C2C bandwidth while a core is performing its own computation. To demonstrate the effectiveness and practicality of our concept, we modeled our design with Verilog HDL and implemented it on an FPGA board, i.e., Xilinx ZCU104. The experimental results demonstrate that the system shows significant throughput improvements compared to a single-chip configuration, yielding average enhancements of $1.87\times$ and $3.43\times$ for two-chip and four-chip configurations, respectively, on ResNet50, DarkNet19, MobileNetV1, and EfficientNetS.
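
One way to read the speedup figures in this abstract is as a fraction of ideal linear scaling. The sketch below computes that fraction from the quoted numbers only. It assumes the constrained-bandwidth gains and the scheduler-enabled gains are measured against a comparable single-chip baseline, which the abstract does not state explicitly, and it uses a plain arithmetic mean; the names `parallel_efficiency` and `constrained_4chip` are illustrative.

```python
# Speedups over the one-core engine quoted in the abstract for the four-chip
# system with constrained C2C bandwidth.
constrained_4chip = {
    "ResNet50": 2.60,
    "DarkNet19": 3.27,
    "MobileNetV1": 2.84,
    "EfficientNetS": 2.74,
}


def parallel_efficiency(speedup: float, n_chips: int) -> float:
    """Fraction of ideal linear scaling achieved (1.0 means perfect scaling)."""
    return speedup / n_chips


# Arithmetic mean across the four networks (an assumption of this sketch).
mean_constrained = sum(constrained_4chip.values()) / len(constrained_4chip)

print(f"mean constrained 4-chip speedup: {mean_constrained:.4f}x")
print(f"efficiency, constrained 4-chip: {parallel_efficiency(mean_constrained, 4):.4f}")
# Average throughput gains reported in the abstract: 1.87x (2 chips), 3.43x (4 chips)
print(f"efficiency, reported 2-chip:    {parallel_efficiency(1.87, 2):.4f}")
print(f"efficiency, reported 4-chip:    {parallel_efficiency(3.43, 4):.4f}")
```

Under these assumptions, the gap between the constrained four-chip efficiency and the reported four-chip efficiency gives a rough sense of how much of the inter-chip communication cost the scheduler hides behind computation.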


Vol. 14, No. 3, pp. 455-468.

Citations: 0

Secure Consensus Control for Constrained Multi-Agent Systems Against Intermittent Denial-of-Service Attacks: An Adaptive Dynamic Programming Method

Pub Date : 2024-06-28 DOI: 10.1109/JETCAS.2024.3420396

Zhen Gao;Ning Zhao;Guangdeng Zong;Xudong Zhao

Combining the use of the adaptive dynamic programming method and optimized backstepping strategy, this paper focuses on the secure consensus problem for constrained nonlinear multi-agent systems (MASs) subject to denial-of-service (DoS) attacks and input delay. Since network channels between some agents often suffer from intrusions by attackers during data transmission, we consider information transfers in both attack-sleep and attack-active scenarios, and construct a novel distributed observer with a switched mechanism to estimate the leader’s state information. In order to optimize system performances while ensuring that the system states do not exceed constraint sets, a new performance index function and a tan-type barrier Lyapunov function (BLF) are introduced. Besides, by employing the Pade approximation and an intermediate variable, the effect of input delay is removed. As a consequence, the proposed optimal control can smoothly steer the nonlinear MASs to realize the followers-leader consensus tracking goal, and all system states are consistently constrained within their compact sets. Finally, simulation results verify the effectiveness of this control scheme.
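
The abstract mentions handling input delay with a Pade approximation. As a rough illustration of the idea, the sketch below compares the exact frequency response of a pure delay with the first-order Pade form (1 - tau*s/2) / (1 + tau*s/2). The abstract does not say which order the authors use, so the first-order choice, the delay value, and the function names here are assumptions made for illustration only.

```python
import cmath


def delay_exact(tau: float, omega: float) -> complex:
    """Frequency response of a pure delay e^{-tau*s} evaluated at s = j*omega."""
    return cmath.exp(-1j * omega * tau)


def delay_pade1(tau: float, omega: float) -> complex:
    """First-order Pade approximation (1 - tau*s/2) / (1 + tau*s/2) at s = j*omega."""
    s = 1j * omega
    return (1 - tau * s / 2) / (1 + tau * s / 2)


tau = 0.1  # hypothetical input delay in seconds
for omega in (1.0, 5.0, 20.0):
    err = abs(delay_exact(tau, omega) - delay_pade1(tau, omega))
    print(f"omega={omega:5.1f} rad/s  |error|={err:.2e}")
```

The approximation keeps unit magnitude at every frequency, like the true delay, and its phase error grows with the product of frequency and delay. Replacing the delay with a rational transfer function is what allows it to be absorbed into a finite-dimensional state-space model.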


Vol. 14, No. 4, pp. 705-716. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10577119

Citations: 0

IEEE Journal on Emerging and Selected Topics in Circuits and Systems Information for Authors

Pub Date : 2024-06-01 DOI: 10.1109/JETCAS.2024.3417549

Citations: 0

IEEE Journal on Emerging and Selected Topics in Circuits and Systems Publication Information

Pub Date : 2024-06-01 DOI: 10.1109/JETCAS.2024.3405090

Citations: 0

IEEE Circuits and Systems Society

Pub Date : 2024-06-01 DOI: 10.1109/JETCAS.2024.3405094

Citations: 0

Guest Editorial Advances in Generative Visual Signal Coding and Processing

Pub Date : 2024-06-01 DOI: 10.1109/JETCAS.2024.3403318

Zhibo Chen;Heming Sun;Li Zhang;Fan Zhang

This special issue of IEEE Journal on Emerging and Selected Topics in Circuits and Systems (JETCAS) is dedicated to demonstrating the latest developments in algorithms, implementations, and applications related to visual signal coding and processing with generative models. In recent years, generative models have emerged as one of the most significant and rapidly developing areas of research in artificial intelligence. They have proved to be an important instrument for advancing research in AI-based visual signal coding and processing. For instance, the variational autoencoder (VAE) has been used as a fundamental framework for end-to-end learned image coding, the autoregressive (AR) model has been extensively studied for efficient entropy coding, and the generative adversarial network (GAN) has been utilized frequently to enhance the subjective quality of coding schemes. Meanwhile, generative models have also been explored in various visual signal processing tasks, including quality assessment, restoration, enhancement, editing, and interpolation.


Vol. 14, No. 2, pp. 145-148. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10579096

Citations: 0

Parameter Reduction of Kernel-Based Video Frame Interpolation Methods Using Multiple Encoders

Pub Date : 2024-04-30 DOI: 10.1109/JETCAS.2024.3395418

Issa Khalifeh;Luka Murn;Ebroul Izquierdo

Video frame interpolation synthesises a new frame from existing frames. Several approaches have been devised to handle this core computer vision problem. Kernel-based approaches use an encoder-decoder architecture to extract features from the inputs and generate weights for a local separable convolution operation which is used to warp the input frames. The warped inputs are then combined to obtain the final interpolated frame. The ease of implementation of such an approach and favourable performance have enabled it to become a popular method in the field of interpolation. One downside, however, is that the encoder-decoder feature extractor is large and uses a lot of parameters. We propose a Multi-Encoder Method for Parameter Reduction (MEMPR) that can significantly reduce parameters by up to 85% whilst maintaining a similar level of performance. This is achieved by leveraging multiple encoders to focus on different aspects of the input. The approach can also be used to improve the performance of kernel-based models in a parameter-effective manner. To encourage the adoption of such an approach in potential future kernel-based methods, the approach is designed to be modular, intuitive and easy to implement. It is implemented on some of the most impactful kernel-based works such as SepConvNet, AdaCoFNet and EDSC. Extensive experiments on datasets with varying ranges of motion highlight the effectiveness of the MEMPR approach and its generalisability to different convolutional backbones and kernel-based operators.
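
The local separable convolution described in this abstract can be sketched for a single output pixel as follows. In the kernel-based methods named above, a network predicts a pair of 1D kernels per pixel and per input frame; this sketch skips the network entirely and takes the kernels as given. The function name `separable_local_conv`, the border clamping, and the toy image are assumptions made for illustration.

```python
def separable_local_conv(image, kv, kh, y, x):
    """Synthesize one output pixel from a patch centred on (y, x).

    kv and kh are the per-pixel vertical and horizontal 1D kernels; their
    outer product forms the 2D kernel applied to the patch. Out-of-range
    coordinates are clamped to the border (an assumption of this sketch).
    """
    n = len(kv)
    r = n // 2
    h, w = len(image), len(image[0])
    total = 0.0
    for i in range(n):
        yy = min(max(y + i - r, 0), h - 1)
        for j in range(n):
            xx = min(max(x + j - r, 0), w - 1)
            total += kv[i] * kh[j] * image[yy][xx]
    return total


# Toy 4x4 image whose pixel value encodes its position (row * 4 + column)
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
# One-hot kernels that sample one pixel down and one pixel to the right
kv = [0.0, 0.0, 1.0]
kh = [0.0, 0.0, 1.0]
print(separable_local_conv(image, kv, kh, 1, 1))  # 10.0, i.e. image[2][2]
```

A pair of n-tap 1D kernels needs 2n coefficients per pixel rather than the n² of a full 2D kernel. In a full interpolation pipeline the same operation would be applied to each input frame with its own kernel pair and the warped results combined, as the abstract describes.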


Vol. 14, No. 2, pp. 245-260. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10510388

Citations: 0

TM-GAN: A Transformer-Based Multi-Modal Generative Adversarial Network for Guided Depth Image Super-Resolution

Pub Date : 2024-04-29 DOI: 10.1109/JETCAS.2024.3394495

Jiang Zhu;Van Kwan Zhi Koh;Zhiping Lin;Bihan Wen

Despite significant strides in deep single image super-resolution (SISR), the development of robust guided depth image super-resolution (GDSR) techniques presents a notable challenge. Effective GDSR methods must not only exploit the properties of the target image but also integrate complementary information from the guidance image. The state-of-the-art in guided image super-resolution has been dominated by convolutional neural network (CNN) based methods, which leverage CNN as their architecture. However, CNN has limitations in capturing global information effectively, and their traditional regression training techniques can sometimes lead to challenges in the precise generating of high-frequency details, unlike transformers that have shown remarkable success in deep learning through the self-attention mechanism. Drawing inspiration from the transformative impact of transformers in both language and vision applications, we propose a Transformer-based Multi-modal Generative Adversarial Network dubbed TM-GAN. TM-GAN is designed to effectively process and integrate multi-modal data, leveraging the global contextual understanding and detailed feature extraction capabilities of transformers within a GAN architecture for GDSR, aiming to effectively integrate and utilize multi-modal data sources. Experimental evaluations of TM-GAN on a variety of RGB-D datasets demonstrate its superiority over the state-of-the-art methods, showcasing its effectiveness in leveraging transformer-based techniques for GDSR.


Vol. 14, No. 2, pp. 261-274.

Citations: 0

Compressed-Domain Vision Transformer for Image Classification

Pub Date : 2024-04-29 DOI: 10.1109/JETCAS.2024.3394878

Ruolei Ji;Lina J. Karam

Compressed-domain visual task schemes, where visual processing or computer vision are directly performed on the compressed-domain representations, were shown to achieve a higher computational efficiency during training and deployment by avoiding the need to decode the compressed visual information while resulting in a competitive or even better performance as compared to corresponding spatial-domain visual tasks. This work is concerned with learning-based compressed-domain image classification, where the image classification is performed directly on compressed-domain representations, also known as latent representations, that are obtained using a learning-based visual encoder. In this paper, a compressed-domain Vision Transformer (cViT) is proposed to perform image classification in the learning-based compressed-domain. For this purpose, the Vision Transformer (ViT) architecture is adopted and modified to perform classification directly in the compressed-domain. As part of this work, a novel feature patch embedding is introduced leveraging the within- and cross-channel information in the compressed-domain. Also, an adaptation training strategy is designed to adopt the weights from the pre-trained spatial-domain ViT and adapt these to the compressed-domain classification task. Furthermore, the pre-trained ViT weights are utilized through interpolation for position embedding initialization to further improve the performance of cViT. The experimental results show that the proposed cViT outperforms the existing compressed-domain classification networks in terms of Top-1 and Top-5 classification accuracies. Moreover, the proposed cViT can yield competitive classification accuracies with a significantly higher computational efficiency as compared to pixel-domain approaches.
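
The abstract notes that pre-trained ViT weights are reused through interpolation to initialize the position embeddings. A minimal sketch of that general idea is below: a square grid of position embeddings is bilinearly resized to a new grid size. The abstract does not specify the interpolation scheme, so the bilinear mode, the corner-aligned coordinate mapping, and the name `resize_pos_embed` are assumptions of this sketch rather than the authors' method.

```python
def resize_pos_embed(grid, new_size):
    """Bilinearly resize a square grid of position embeddings.

    grid[y][x] is a list of floats (one embedding vector per patch position).
    Corner positions of the old and new grids are aligned, a choice made for
    this sketch only.
    """
    old = len(grid)
    dim = len(grid[0][0])
    scale = (old - 1) / (new_size - 1) if new_size > 1 else 0.0
    out = []
    for y in range(new_size):
        fy = y * scale
        y0 = int(fy)
        y1 = min(y0 + 1, old - 1)
        wy = fy - y0
        row = []
        for x in range(new_size):
            fx = x * scale
            x0 = int(fx)
            x1 = min(x0 + 1, old - 1)
            wx = fx - x0
            row.append([
                (1 - wy) * ((1 - wx) * grid[y0][x0][d] + wx * grid[y0][x1][d])
                + wy * ((1 - wx) * grid[y1][x0][d] + wx * grid[y1][x1][d])
                for d in range(dim)
            ])
        out.append(row)
    return out


# Toy 2x2 grid with 1-dimensional embeddings, resized to 3x3
pretrained = [[[0.0], [2.0]], [[4.0], [6.0]]]
resized = resize_pos_embed(pretrained, 3)
print(resized[1][1])  # [3.0], the average of the four corners
```

Resizing in this way lets a model whose token grid differs from the pre-trained one start from spatially consistent embeddings instead of a random initialization.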


Vol. 14, No. 2, pp. 299-310.

Citations: 0
