Pub Date : 2025-02-10, DOI: 10.1109/JETCAS.2025.3540360
Christian Herglotz;Daniel Palomino;Olivier Le Meur;C.-C. Jay Kuo
The past years have shown that, due to the global success of video communication technology, the corresponding hardware systems nowadays contribute significantly to pollution and resource consumption on a global scale, accounting for 1% of global greenhouse gas emissions in 2018. This aspect of sustainability has thus received increasing attention in academia and industry. In this paper, we present different aspects of sustainability, including resource consumption and greenhouse gas emissions, with a major focus on the energy consumption during the use of video systems. Finally, we provide an overview of recent research in the domain of green video communications, showing promising results and highlighting areas where more research should be performed.
Title: "Circuits and Systems for Green Video Communications: Fundamentals and Recent Trends," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 1, pp. 4-15.
Pub Date : 2025-02-07, DOI: 10.1109/JETCAS.2025.3539948
Mohammad Ghasempour;Hadi Amirpour;Christian Timmerer
Live video streaming's growing demand for high-quality content has resulted in significant energy consumption, creating challenges for sustainable media delivery. Traditional adaptive video streaming approaches rely on the over-provisioning of resources, leading to a fixed bitrate ladder that is often inefficient for the heterogeneous set of use cases and video content. Although dynamic approaches like per-title encoding optimize the bitrate ladder for each video, they mainly target video-on-demand to avoid latency and fail to address energy consumption. In this paper, we present LiveESTR, a method for building a quality- and energy-aware bitrate ladder for live video streaming. LiveESTR eliminates the need for exhaustive video encoding processes on the server side, ensuring that the bitrate ladder construction process is fast and energy efficient. A lightweight multi-label classification model, along with a lookup table, is utilized to estimate the optimized resolution-bitrate pair in the bitrate ladder. Furthermore, both spatial and temporal resolutions are supported to achieve high energy savings while preserving compression efficiency. A tunable parameter $\lambda$ and a threshold $\tau$ are introduced to balance the trade-off between compression/quality and energy efficiency. Experimental results show that LiveESTR reduces encoder and decoder energy consumption by 74.6% and 29.7%, respectively, with only a 2.1% increase in Bjøntegaard Delta Rate (BD-Rate) compared to traditional per-title encoding. Furthermore, it is shown that by increasing $\lambda$ to prioritize video quality, LiveESTR achieves 2.2% better compression efficiency in terms of BD-Rate while still reducing decoder energy consumption by 7.5%.
Title: "Real-Time Quality- and Energy-Aware Bitrate Ladder Construction for Live Video Streaming," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 1, pp. 83-93. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10877851
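The abstract above describes picking a resolution-bitrate pair from a lookup table, steered by a tunable $\lambda$ and a threshold $\tau$. A minimal sketch of such a selection step, where the class labels, table entries, and the exact scoring rule are illustrative assumptions rather than the authors' implementation:

```python
# Hypothetical lookup-table rung selection in the spirit of LiveESTR.
# Each rung: (height, bitrate_kbps, predicted_quality, predicted_energy).
# Table contents and the quality/energy scores are made up for illustration.
LADDER_TABLE = {
    "low_complexity":  [(540, 1200, 0.92, 0.35), (720, 2400, 0.95, 0.55)],
    "high_complexity": [(720, 3000, 0.90, 0.60), (1080, 6000, 0.96, 1.00)],
}

def select_rung(content_class: str, lam: float, tau: float):
    """Pick the rung maximizing quality - lam * energy, skipping rungs
    whose predicted energy exceeds the threshold tau."""
    candidates = [r for r in LADDER_TABLE[content_class] if r[3] <= tau]
    if not candidates:  # fall back to the cheapest rung if all exceed tau
        candidates = [min(LADDER_TABLE[content_class], key=lambda r: r[3])]
    return max(candidates, key=lambda r: r[2] - lam * r[3])
```

With `lam = 0` the highest-quality admissible rung wins; raising `lam` shifts the choice toward energy-cheaper rungs, mirroring the trade-off the paper tunes.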
Pub Date : 2025-02-04, DOI: 10.1109/JETCAS.2025.3538652
Runyu Yang;Dong Liu;Feng Wu;Wen Gao
Learned image compression has shown remarkable compression efficiency gains over traditional image compression solutions, which is partially attributed to the learned entropy models and the adopted entropy coding engine. However, the inference of the entropy models and the sequential nature of the entropy coding both incur high time complexity. Meanwhile, neural network-based entropy models usually involve floating-point computations, which cause inconsistent probability estimation and decoding failures across different platforms. We address these limitations by introducing an efficient and cross-platform entropy coding method, chain coding-based latent compression (CC-LC), into learned image compression. First, we leverage classic chain coding and carefully design a block-based entropy coding procedure, significantly reducing the number of coding symbols and thus the coding time. Second, since CC-LC is not based on neural networks, we propose a rate estimation network as a surrogate of CC-LC during end-to-end training. Third, we alternately train the analysis/synthesis networks and the rate estimation network for rate-distortion optimization, making the learned latent fit CC-LC. Experimental results show that our method achieves much lower time complexity than other learned image compression methods, ensures cross-platform consistency, and has compression efficiency comparable with BPG. Our code and models are publicly available at https://github.com/Yang-Runyu/CC-LC.
Title: "Learned Image Compression With Efficient Cross-Platform Entropy Coding," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 1, pp. 72-82.
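CC-LC builds on classic chain coding. As background, a minimal sketch of the textbook 8-connected Freeman chain code, which represents a contour as a start point plus a sequence of direction indices; this is the traditional technique only, not the authors' block-based latent coder:

```python
# Classic 8-connected Freeman chain coding.
# Direction index -> (dx, dy), counter-clockwise starting from east.
DIRS = [(1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1)]

def chain_code(points):
    """Encode a contour (list of 8-adjacent (x, y) points) as direction indices."""
    code = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        code.append(DIRS.index((x1 - x0, y1 - y0)))
    return code

def decode(start, code):
    """Reconstruct the contour from a start point and its chain code."""
    pts = [start]
    for c in code:
        dx, dy = DIRS[c]
        x, y = pts[-1]
        pts.append((x + dx, y + dy))
    return pts
```

Because each step needs only three bits (one of eight directions), and neighboring steps are highly correlated, chain codes compress contours cheaply without any neural inference, which is the property CC-LC exploits for fast, integer-only entropy coding.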
Pub Date : 2025-02-03, DOI: 10.1109/JETCAS.2025.3538016
Rashed Al Amin;Roman Obermaisser
Super-resolution (SR) systems represent a rapidly advancing area within Information and Communication Technology (ICT) due to their significant applications in computer vision and visual communication. Integrating SR systems with Deep Neural Networks (DNNs) is a widely adopted method for achieving faster and improved image reconstruction. However, the real-time computational demands, extensive energy overhead, and huge memory footprints associated with DNN-based SR systems limit their throughput and scalability. Field-programmable gate arrays (FPGAs) present a viable and promising solution for exploring the structure and architecture of SR systems due to their reconfigurable nature and parallel computing capabilities. While existing FPGA-based solutions can effectively reduce computational latency in SR systems, they often result in higher resource and energy consumption. Besides, traditional SR techniques generally focus on either upscaling or downscaling images or videos without offering any scaling reconfigurability. To address these limitations, this paper introduces BiDSRS+, a novel FPGA-based, resource-efficient, and reconfigurable real-time SR system using a modified bicubic interpolation method. In addition, BiDSRS+ supports both upscaling and downscaling of images and videos, enhancing its versatility. Evaluations conducted on the Xilinx ZCU 102 FPGA board reveal substantial resource savings, with reductions of 44x LUT, 31x BRAM, and 35x DSP utilization compared to state-of-the-art DNN-based SR systems, albeit with a trade-off in throughput of 0.5x. Furthermore, when compared to leading algorithm-based SR systems, BiDSRS+ achieves reductions of 5.8x LUT, 1.75x BRAM, and 2.3x power consumption, without compromising throughput. Due to its high resource efficiency and reconfigurability with a throughput of 4K@60 FPS, BiDSRS+ offers significant advantages in promoting sustainable and energy-efficient green video communication.
Title: "BiDSRS+: Resource Efficient Reconfigurable Real Time Bidirectional Super Resolution System for FPGAs," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 1, pp. 120-132.
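BiDSRS+ is built around a modified bicubic interpolation method. For reference, a sketch of the standard one-dimensional bicubic convolution kernel (Keys' kernel with a = -0.5); the paper's hardware-friendly modification is not reproduced here, and 2-D bicubic resampling applies this kernel separably along each axis:

```python
def cubic_kernel(x: float, a: float = -0.5) -> float:
    """Keys' bicubic convolution kernel (a = -0.5 is the common choice)."""
    x = abs(x)
    if x < 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def resample_1d(samples, t):
    """Interpolate a 1-D signal at fractional position t from 4 neighbors."""
    i = int(t)
    frac = t - i
    total = 0.0
    for k in range(-1, 3):
        idx = min(max(i + k, 0), len(samples) - 1)  # clamp at the borders
        total += samples[idx] * cubic_kernel(frac - k)
    return total
```

Because the kernel weights depend only on the fractional position, a hardware design can precompute them for a fixed set of scaling phases, which is one reason bicubic scaling maps so economically onto FPGA LUT/DSP resources.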
As silicon scaling nears its limits and the Big Data era unfolds, in-memory computing is increasingly important for overcoming the Von Neumann bottleneck and thus enhancing modern computing performance. One of the rising in-memory technologies is the memristor, a resistor capable of memorizing state based on an applied voltage, making it useful for both storage and computation. Another emerging computing paradigm is Approximate Computing, which allows errors in calculations in order to reduce die area, processing time, and energy consumption. To combine both concepts and leverage their benefits, we propose the memristor-based adaptive approximate adder ApprOchs, which can selectively compute segments of an addition either approximately or exactly. ApprOchs is designed to adapt to the given input data and thus compute only as much as is needed, a quality current state-of-the-art (SoA) in-memory adders lack. Although it also uses OR-based approximation in the lower k bits, ApprOchs has the edge over S-SINC because it can skip the computation of the upper n-k bits for a small number of possible input combinations ($2^{2k}$ of $2^{2n}$ possible combinations skip the upper bits). Compared to SoA in-memory approximate adders, ApprOchs outperforms them in energy consumption while remaining highly competitive in error behavior, with moderate speed and area efficiency. In application use cases, ApprOchs demonstrates its energy efficiency, particularly in machine learning applications. In MNIST classification using deep convolutional neural networks, we achieve 78.4% energy savings compared to SoA approximate adders with the same accuracy as exact adders at 98.9%, while for k-means clustering we observed a 69% reduction in energy consumption with no quality drop in clustering results compared to the exact computation. For image blurring, we achieve up to 32.7% energy reduction over the exact computation, and in its most promising configuration ($k=3$), the ApprOchs adder consumes 13.4% less energy than the most energy-efficient competing SoA design (S-SINC+), while achieving similarly excellent median image quality at 43.74 dB PSNR and 0.995 SSIM.
Title: "ApprOchs: A Memristor-Based In-Memory Adaptive Approximate Adder," by Dominik Ochs;Lukas Rapp;Leandro Borzyk;Nima Amirafshar;Nima TaheriNejad. Pub Date: 2025-01-31, DOI: 10.1109/JETCAS.2025.3537328. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 1, pp. 105-119.
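The OR-based lower-part approximation that ApprOchs shares with S-SINC can be illustrated in a few lines of software: the lower k bits are "added" with a bitwise OR (no carry chain), while the upper n-k bits use an exact add. Bit widths here are illustrative, and the actual design is an in-memory memristor circuit, not software:

```python
def approx_add(a: int, b: int, n: int, k: int) -> int:
    """Approximate n-bit add: bitwise OR on the lower k bits (no carries),
    exact addition on the upper n-k bits with no carry-in from below."""
    mask = (1 << k) - 1
    low = (a & mask) | (b & mask)     # approximate lower k bits
    high = (a >> k) + (b >> k)        # exact upper part
    return ((high << k) | low) & ((1 << (n + 1)) - 1)
```

For example, `approx_add(1, 1, 8, 2)` yields 1 instead of the exact 2, since OR cannot generate a carry, while operands whose low bits do not overlap (e.g. 5 + 6) still come out exact. The data-adaptive skip in ApprOchs corresponds to the case where both upper parts are zero, which is why exactly $2^{2k}$ of the $2^{2n}$ input combinations can bypass the upper-bit computation.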
Pub Date : 2025-01-23, DOI: 10.1109/JETCAS.2025.3533041
Alexandre Mercat;Joose Sainio;Steven Le Moan;Christian Herglotz
High-dynamic range (HDR) video content has gained popularity due to its enhanced color depth and luminance range, but it also presents new challenges in terms of compression efficiency and energy consumption. In this paper, we present an in-depth study of the compression performance and energy efficiency of HDR video encoding using High-Efficiency Video Coding (HEVC). In addition to using a native 10-bit HDR encoding configuration as a reference, we explore whether applying tone mapping to an 8-bit representation before encoding can result in additional bitrate and energy savings without compromising visual quality. The main contributions of this work are as follows: 1) a detailed evaluation of four HDR video encoding configurations, three of which leverage tone mapping techniques, 2) a comprehensive experimental setup involving over 15,000 individual encodings across three open-source HEVC encoders (Kvazaar, x265, and SVT-HEVC) and multiple presets, 3) the use of two advanced perception-based metrics for BD-rate calculations, one of which is specifically tailored to capture colour distortions, and 4) an open-source dataset consisting of all experimental results for further research. Among the three tone-mapping configurations tested, our findings show that a simple bit-shifting approach can achieve significant reductions in both bitrate and energy consumption compared to the native 10-bit HDR encoding configuration. This research aims to lay an initial foundation for understanding the balance between coding efficiency and energy consumption in HDR video encoding, offering valuable insights to guide future advancements in the field.
Title: "Do We Need 10 bits? Assessing HEVC Encoders for Energy-Efficient HDR Video Streaming," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 1, pp. 31-43. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10851260
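The bit-shifting tone mapping highlighted in the findings above can be stated directly: a 10-bit HDR sample is reduced to 8 bits by dropping the two least significant bits, and expanded back for display by shifting up. A minimal sketch (the paper's full pipeline and metrics are not reproduced here):

```python
def tonemap_10_to_8(sample10: int) -> int:
    """Map a 10-bit sample (0..1023) to 8 bits (0..255) by bit shifting."""
    return sample10 >> 2

def inverse_8_to_10(sample8: int) -> int:
    """Coarse expansion of an 8-bit sample back into the 10-bit range."""
    return sample8 << 2
```

The round trip quantizes values to multiples of 4, i.e. at most 3 codes of error per sample, which is why the 8-bit path can cut bitrate and encoding energy while keeping perceptual quality close to the native 10-bit configuration.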
The significant growth in global video data traffic can be mitigated by saliency-based video coding schemes that seek to increase coding efficiency without any loss of objective visual quality by compressing salient video regions less heavily than non-salient regions. However, conducting salient object detection (SOD) on every video frame before encoding tends to lead to substantial complexity and energy consumption overhead, especially if state-of-the-art deep learning techniques are used in saliency detection. This work introduces a saliency-guided video encoding framework that reduces the energy consumption of frame-by-frame SOD by increasing the detection interval and applying the proposed region-of-interest (ROI) tracking between successive detections. The computational complexity of our ROI tracking technique is kept low by predicting object movements from motion vectors, which are inherently calculated during encoding. Our experimental results demonstrate that the proposed ROI tracking solution saves 86-95% of energy and attains 84-94% accuracy relative to frame-by-frame SOD. Correspondingly, integrating our proposal into the complete saliency-guided video coding scheme reduces energy consumption on the CPU by 79-82% at a cost of less than 5% in weighted PSNR. These findings indicate that our solution has significant potential for low-cost and low-power streaming media applications.
Title: "Energy-Efficient Saliency-Guided Video Coding Framework for Real-Time Applications," by Tero Partanen;Minh Hoang;Alexandre Mercat;Joose Sainio;Jarno Vanne. Pub Date: 2025-01-02, DOI: 10.1109/JETCAS.2024.3525339. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 1, pp. 44-57. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10820524
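The core idea of the ROI tracking described above, shifting the ROI between SOD runs using motion vectors the encoder has already computed, can be sketched as follows. The block size, motion-vector layout, and mean-shift update rule are assumptions for illustration, not the authors' exact algorithm:

```python
# Hypothetical ROI update from encoder motion vectors: move the ROI by the
# mean motion vector of the blocks it covers, avoiding a full SOD pass.
def update_roi(roi, motion_vectors, block=16):
    """roi = (x, y, w, h) in pixels; motion_vectors maps a block index
    (bx, by) to its per-block displacement (dx, dy) in pixels."""
    x, y, w, h = roi
    dxs, dys = [], []
    for bx in range(x // block, (x + w) // block + 1):
        for by in range(y // block, (y + h) // block + 1):
            dx, dy = motion_vectors.get((bx, by), (0, 0))
            dxs.append(dx)
            dys.append(dy)
    mdx = sum(dxs) / len(dxs)
    mdy = sum(dys) / len(dys)
    return (int(x + mdx), int(y + mdy), w, h)
```

Since the motion vectors are a free by-product of encoding, this update costs only a few arithmetic operations per frame, which is where the reported energy savings over running SOD on every frame come from.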
Pub Date : 2024-12-30, DOI: 10.1109/JETCAS.2024.3524260
Peilin Chen;Xiaohan Fang;Meng Wang;Shiqi Wang;Siwei Ma
The Human Visual System (HVS), with its intricate sophistication, is capable of achieving ultra-compact information compression for visual signals. This remarkable ability is coupled with high generalization capability and energy efficiency. By contrast, the state-of-the-art Versatile Video Coding (VVC) standard achieves a compression ratio of around 1,000 times for raw visual data. This notable disparity motivates the research community to draw inspiration to effectively handle the immense volume of visual data in a green way. Therefore, this paper provides a survey of how visual data can be efficiently represented for green multimedia, in particular when the ultimate task is knowledge extraction instead of visual signal reconstruction. We introduce recent research efforts that promote green, sustainable, and efficient multimedia in this field. Moreover, we discuss how the deep understanding of the HVS can benefit the research community, and envision the development of future green multimedia technologies.
{"title":"Compact Visual Data Representation for Green Multimedia–A Human Visual System Perspective","authors":"Peilin Chen;Xiaohan Fang;Meng Wang;Shiqi Wang;Siwei Ma","doi":"10.1109/JETCAS.2024.3524260","DOIUrl":"https://doi.org/10.1109/JETCAS.2024.3524260","url":null,"abstract":"The Human Visual System (HVS), with its intricate sophistication, is capable of achieving ultra-compact information compression for visual signals. This remarkable ability is coupled with high generalization capability and energy efficiency. By contrast, the state-of-the-art Versatile Video Coding (VVC) standard achieves a compression ratio of around 1,000 times for raw visual data. This notable disparity motivates the research community to draw inspiration from the HVS to handle the immense volume of visual data in a green way. Therefore, this paper provides a survey of how visual data can be efficiently represented for green multimedia, in particular when the ultimate task is knowledge extraction instead of visual signal reconstruction. We introduce recent research efforts that promote green, sustainable, and efficient multimedia in this field. Moreover, we discuss how a deep understanding of the HVS can benefit the research community, and envision the development of future green multimedia technologies.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"15 1","pages":"16-30"},"PeriodicalIF":3.7,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143602038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-25DOI: 10.1109/JETCAS.2024.3523246
Daiane Freitas;Patrick Rosa;Leonardo Müller;Daniel Palomino;Cláudio M. Diniz;Mateus Grellert;Guilherme Corrêa
In modern video encoders, sub-pixel motion models are used to represent smoother transitions between neighboring frames, which is especially useful in regions with intense movement. The AV1 video codec introduces adaptive filtering for sub-pixel interpolation in the inter-frame prediction stage, enhancing flexibility in Motion Estimation (ME) and Motion Compensation (MC), using three filter types: Regular, Sharp, and Smooth. However, the increased variety of filters leads to higher complexity and energy consumption, particularly during the resource-intensive generation of sub-pixel samples. To address this challenge, this paper presents a hardware accelerator optimized for AV1 interpolation, incorporating energy-saving features for unused filters. The accelerator includes one precise version that can be used for both MC and ME and two approximate versions for ME, designed to maximize hardware efficiency and minimize implementation costs. The proposed design can process videos at resolutions up to 4320p at 50 frames per second for MC and 2,656.14 million samples per second for ME, with a power dissipation ranging between 21.25 mW and 40.06 mW, and an average coding efficiency loss between 0.67% and 1.11%, depending on the filter type and version.
{"title":"Low-Power Multiversion Interpolation Filter Accelerator With Hardware Reuse for AV1 Codec","authors":"Daiane Freitas;Patrick Rosa;Leonardo Müller;Daniel Palomino;Cláudio M. Diniz;Mateus Grellert;Guilherme Corrêa","doi":"10.1109/JETCAS.2024.3523246","DOIUrl":"https://doi.org/10.1109/JETCAS.2024.3523246","url":null,"abstract":"In modern video encoders, sub-pixel motion models are used to represent smoother transitions between neighboring frames, which is especially useful in regions with intense movement. The AV1 video codec introduces adaptive filtering for sub-pixel interpolation in the inter-frame prediction stage, enhancing flexibility in Motion Estimation (ME) and Motion Compensation (MC), using three filter types: Regular, Sharp, and Smooth. However, the increased variety of filters leads to higher complexity and energy consumption, particularly during the resource-intensive generation of sub-pixel samples. To address this challenge, this paper presents a hardware accelerator optimized for AV1 interpolation, incorporating energy-saving features for unused filters. The accelerator includes one precise version that can be used for both MC and ME and two approximate versions for ME, designed to maximize hardware efficiency and minimize implementation costs. The proposed design can process videos at resolutions up to 4320p at 50 frames per second for MC and 2,656.14 million samples per second for ME, with a power dissipation ranging between 21.25 mW and 40.06 mW, and an average coding efficiency loss between 0.67% and 1.11%, depending on the filter type and version.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"15 1","pages":"133-142"},"PeriodicalIF":3.7,"publicationDate":"2024-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143602016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
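The sub-pixel filtering that dominates the accelerator's workload can be sketched in software as a symmetric 8-tap FIR over integer-pel samples. This is a minimal reference sketch, not the paper's hardware design: the function name `interp_halfpel_row` is made up for the example, and the default coefficients only illustrate the AV1-style 8-tap, 7-bit-precision form (taps summing to 128), not the exact per-offset tables of the AV1 specification.

```python
def interp_halfpel_row(samples, taps=(0, 2, -14, 76, 76, -14, 2, 0)):
    """Horizontal half-pel interpolation with a symmetric 8-tap FIR filter.

    samples: list of 8-bit integer-pel values for one row.
    taps: filter coefficients summing to 128 (7-bit filter precision);
          illustrative of an AV1-style regular filter, not the spec table.
    """
    out = []
    # Each half-pel sample between positions i and i+1 needs four integer
    # samples on either side, so the output is 7 samples shorter than the input.
    for i in range(3, len(samples) - 4):
        acc = sum(t * samples[i - 3 + k] for k, t in enumerate(taps))
        out.append(min(max((acc + 64) >> 7, 0), 255))  # round, shift, clip to 8 bits
    return out
```

Each output sample costs eight multiply-accumulates per filter pass, which is why generating sub-pixel candidates for every ME search position is so expensive, and why approximate filter versions for ME can pay off: ME only ranks candidates, while MC must reproduce the decoder's exact samples.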
Pub Date : 2024-12-13DOI: 10.1109/JETCAS.2024.3502893
{"title":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems Information for Authors","authors":"","doi":"10.1109/JETCAS.2024.3502893","DOIUrl":"https://doi.org/10.1109/JETCAS.2024.3502893","url":null,"abstract":"","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"14 4","pages":"835-835"},"PeriodicalIF":3.7,"publicationDate":"2024-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10799918","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142821275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}