Pub Date: 2024-10-29 DOI: 10.1109/TIP.2024.3485484
Chenghao Xu;Jiexi Yan;Muli Yang;Cheng Deng
In the practical application of image generation, dealing with long-tailed data distributions is a common challenge for diffusion-based generative models. To tackle this issue, we investigate the head-class accumulation effect in diffusion models’ latent space, particularly focusing on its correlation to the noise sampling strategy. Our experimental analysis indicates that employing a consistent sampling distribution for the noise prior across all classes leads to a significant bias towards head classes in the noise sampling distribution, which results in poor quality and diversity of the generated images. Motivated by this observation, we propose a novel sampling strategy named Bias-aware Prior Adjusting (BPA) to debias diffusion models in the class-imbalanced scenario. With BPA, each class is automatically assigned an adaptive noise sampling distribution prior during training, effectively mitigating the influence of class imbalance on the generation process. Extensive experiments on several benchmarks demonstrate that images generated using our proposed BPA showcase elevated diversity and superior quality.
Title: Rethinking Noise Sampling in Class-Imbalanced Diffusion Models. Journal: IEEE Transactions on Image Processing, vol. 33, pp. 6298-6308.
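The idea of giving each class its own noise prior can be illustrated with a small sketch. This is not the paper's BPA implementation: the abstract does not specify how the per-class priors are parameterized or learned, so the per-class Gaussian mean and log standard deviation below, and the names `sample_class_noise`, `class_means`, and `class_log_stds`, are assumptions made only for illustration.

```python
import math
import random


def sample_class_noise(label, class_means, class_log_stds, rng):
    """Draw one noise vector from the Gaussian prior assigned to `label`.

    Assumption: each class has a diagonal Gaussian prior, stored as a mean
    vector and a log standard deviation vector. A shared prior corresponds
    to every class having mean 0 and log std 0.
    """
    mu = class_means[label]
    log_std = class_log_stds[label]
    # Reparameterized draw: mean + std * standard normal sample.
    return [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mu, log_std)]


# Hypothetical setup: class 0 is a head class, class 1 is a tail class
# whose prior has been shifted and widened.
class_means = {0: [0.0, 0.0, 0.0], 1: [0.5, 0.5, 0.5]}
class_log_stds = {0: [0.0, 0.0, 0.0], 1: [0.2, 0.2, 0.2]}

rng = random.Random(0)
head_noise = sample_class_noise(0, class_means, class_log_stds, rng)
tail_noise = sample_class_noise(1, class_means, class_log_stds, rng)
```

In a training loop, the per-class parameters would be updated alongside the model rather than fixed by hand as they are here.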
Pub Date: 2024-10-25 DOI: 10.1109/TIP.2024.3482191
Yuanman Li;Yingjie He;Changsheng Chen;Li Dong;Bin Li;Jiantao Zhou;Xia Li
Recent advances in deep learning algorithms have shown impressive progress in image copy-move forgery detection (CMFD). However, these algorithms lack generalizability in practical scenarios where the copied regions are not present in the training images, or the cloned regions are part of the background. Additionally, these algorithms utilize convolution operations to distinguish source and target regions, leading to unsatisfactory results when the target regions blend well with the background. To address these limitations, this study proposes a novel end-to-end CMFD framework that integrates the strengths of conventional and deep learning methods. Specifically, the study develops a deep cross-scale PatchMatch (PM) method that is customized for CMFD to locate copy-move regions. Unlike existing deep models, our approach utilizes features extracted from high-resolution scales to seek explicit and reliable point-to-point matching between source and target regions. Furthermore, we propose a novel pairwise rank learning framework to separate source and target regions. By leveraging the strong prior of point-to-point matches, the framework can identify subtle differences and effectively discriminate between source and target regions, even when the target regions blend well with the background. Our framework is fully differentiable and can be trained end-to-end. Comprehensive experimental results highlight the remarkable generalizability of our scheme across various copy-move scenarios, significantly outperforming existing methods.
Title: Image Copy-Move Forgery Detection via Deep PatchMatch and Pairwise Ranking Learning. Journal: IEEE Transactions on Image Processing, vol. 34, pp. 425-440.
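As a rough illustration of pairwise rank learning over matched points, the sketch below uses a generic hinge-style ranking loss. The abstract does not state the loss the authors use, so the hinge form, the `margin` parameter, and the function name `pairwise_ranking_loss` are assumptions. Each matched pair contributes one source-point score and one target-point score, and the loss asks the target score to exceed the source score by at least the margin.

```python
def pairwise_ranking_loss(source_scores, target_scores, margin=1.0):
    """Mean hinge ranking loss over matched source/target point pairs.

    Assumption: a higher score means "more likely to be the pasted
    (target) region". A pair incurs no loss once the target score
    exceeds the source score by at least `margin`.
    """
    if len(source_scores) != len(target_scores):
        raise ValueError("each source point needs exactly one matched target point")
    if not source_scores:
        return 0.0
    losses = [
        max(0.0, margin - (t - s))
        for s, t in zip(source_scores, target_scores)
    ]
    return sum(losses) / len(losses)


# Two matched pairs: the first is already well separated, the second is not.
loss = pairwise_ranking_loss([0.0, 0.0], [2.0, 0.5], margin=1.0)
```

Because the loss depends only on the score difference within each matched pair, it can separate source from target even when both regions look alike in absolute terms, which is the situation the abstract describes for targets that blend into the background.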
High dynamic range (HDR) video offers a more realistic visual experience than standard dynamic range (SDR) video, but introduces new challenges for both compression and transmission. Rate control is an effective technology for overcoming these challenges and ensuring optimal HDR video delivery. However, the rate control algorithm in the latest video coding standard, Versatile Video Coding (VVC), is tailored to SDR videos and does not produce good coding results when encoding HDR videos. To address this problem, a data-driven $\lambda$