Alon Harell, Yalda Foroutan, Nilesh Ahuja, Parual Datta, Bhavya Kanzariya, V Srinivasa Somayazulu, Omesh Tickoo, Anderson de Andrade, Ivan V Bajic
{"title":"Rate-Distortion Theory in Coding for Machines and its Applications.","authors":"Alon Harell, Yalda Foroutan, Nilesh Ahuja, Parual Datta, Bhavya Kanzariya, V Srinivasa Somayazulu, Omesh Tickoo, Anderson de Andrade, Ivan V Bajic","doi":"10.1109/TPAMI.2025.3548516","DOIUrl":null,"url":null,"abstract":"<p><p>Recent years have seen a tremendous growth in both the capability and popularity of automatic machine analysis of media, especially images and video. As a result, a growing need for efficient compression methods optimised for machine vision, rather than human vision, has emerged. To meet this growing demand, significant developments have been made in image and video coding for machines. Unfortunately, while there is a substantial body of knowledge regarding rate-distortion theory for human vision, the same cannot be said of machine analysis. In this paper, we greatly extend the current rate-distortion theory for machines, providing insight into important design considerations of machine-vision codecs. We then utilise this newfound understanding to improve several methods for learned image coding for machines. Our proposed methods achieve state-of-the-art rate-distortion performance on several computer vision tasks - classification, instance and semantic segmentation, and object detection.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2025-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TPAMI.2025.3548516","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Recent years have seen a tremendous growth in both the capability and popularity of automatic machine analysis of media, especially images and video. As a result, a growing need for efficient compression methods optimised for machine vision, rather than human vision, has emerged. To meet this growing demand, significant developments have been made in image and video coding for machines. Unfortunately, while there is a substantial body of knowledge regarding rate-distortion theory for human vision, the same cannot be said of machine analysis. In this paper, we greatly extend the current rate-distortion theory for machines, providing insight into important design considerations of machine-vision codecs. We then utilise this newfound understanding to improve several methods for learned image coding for machines. Our proposed methods achieve state-of-the-art rate-distortion performance on several computer vision tasks - classification, instance and semantic segmentation, and object detection.