Human–Machine Collaborative Image Compression Method Based on Implicit Neural Representations
Authors: Huanyang Li; Xinfeng Zhang
Journal: IEEE Journal on Emerging and Selected Topics in Circuits and Systems (impact factor 3.7; JCR Q2, Engineering, Electrical & Electronic)
Published: 2024-04-09
DOI: 10.1109/JETCAS.2024.3386639
URL: https://ieeexplore.ieee.org/document/10495030/
With the explosive increase in the volume of images intended for analysis by AI, image coding for machines has been proposed to transmit information in a machine-interpretable format, thereby enhancing image compression efficiency. However, because of their high compression ratios, such coding schemes often lose image details and features and blur semantic information, making them less suitable for human viewing. Balancing visual quality against machine vision accuracy at a given compression ratio is therefore a critical problem. To address it, we introduce a human-machine collaborative image coding framework based on Implicit Neural Representations (INR), which effectively reduces the information transmitted for machine vision tasks at the decoding side while maintaining high-efficiency image compression for human vision compared with the INR compression framework. To enhance the model's perception of images for machine vision, we design a semantic embedding enhancement module to assist in understanding image semantics. Specifically, we employ the Swin Transformer model to initialize image features, ensuring that the embeddings of the compression model are effectively applicable to downstream visual tasks. Extensive experimental results demonstrate that our method significantly outperforms other image compression methods in classification tasks while ensuring image compression efficiency.
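For readers unfamiliar with INR-based compression, the general idea can be sketched as follows: a small coordinate network maps each pixel position to a color, so the network weights become the payload that is transmitted. This is a minimal illustration under stated assumptions, not the authors' architecture. The layer sizes, the sine activation, the 16-bit weight precision, and all function names are hypothetical. The weights are random rather than fitted to an image, and the semantic embedding module and Swin Transformer initialization are not modeled.

```python
import numpy as np

def make_coords(height, width):
    """Return an (H*W, 2) array of pixel coordinates normalized to [-1, 1]."""
    ys = np.linspace(-1.0, 1.0, height)
    xs = np.linspace(-1.0, 1.0, width)
    grid_y, grid_x = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([grid_x.ravel(), grid_y.ravel()], axis=1)

def init_mlp(layer_sizes, seed=0):
    """Create random (weight, bias) pairs; a real codec would fit these to one image."""
    rng = np.random.default_rng(seed)
    params = []
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        w = rng.uniform(-1.0, 1.0, size=(fan_in, fan_out)) / np.sqrt(fan_in)
        b = np.zeros(fan_out)
        params.append((w, b))
    return params

def decode(params, coords):
    """Evaluate the coordinate network: (x, y) -> RGB values in [0, 1]."""
    h = coords
    for w, b in params[:-1]:
        h = np.sin(h @ w + b)  # sine activation is an assumption of this sketch
    w, b = params[-1]
    return 1.0 / (1.0 + np.exp(-(h @ w + b)))  # sigmoid keeps colors in [0, 1]

def count_params(params):
    """Total number of scalar weights and biases in the network."""
    return sum(w.size + b.size for w, b in params)

def bits_per_pixel(params, height, width, bits_per_weight=16):
    """Rate of the representation: total weight bits divided by pixel count."""
    return count_params(params) * bits_per_weight / (height * width)

height, width = 32, 48
params = init_mlp([2, 16, 16, 3])  # hypothetical layer sizes
rgb = decode(params, make_coords(height, width)).reshape(height, width, 3)
bpp = bits_per_pixel(params, height, width)
```

In this sketch the rate depends only on the network size and weight precision, not on the image content, which is why INR codecs trade reconstruction quality against the number and bit depth of the weights.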
About the journal:
The IEEE Journal on Emerging and Selected Topics in Circuits and Systems is published quarterly and solicits, with particular emphasis on emerging areas, special issues on topics that cover the entire scope of the IEEE Circuits and Systems (CAS) Society, namely the theory, analysis, design, tools, and implementation of circuits and systems, spanning their theoretical foundations, applications, and architectures for signal and information processing.