Attention Down-Sampling Transformer, Relative Ranking and Self-Consistency for Blind Image Quality Assessment
Mohammed Alsaafin, Musab Alsheikh, Saeed Anwar, Muhammad Usman
arXiv:2409.07115 (arXiv - EE - Image and Video Processing, 2024-09-11)
Abstract
No-reference image quality assessment (NR-IQA) is the challenging task of estimating image quality without access to the original reference. We introduce an improved mechanism for extracting local and non-local information from images via transformer encoders and CNNs: the transformer encoders mitigate locality bias and produce a non-local representation by sequentially processing CNN features, which inherently capture local visual structure.
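A minimal PyTorch sketch of this pipeline is given below. The ResNet-50 backbone, embedding width, head count, and mean pooling are illustrative assumptions, not the paper's exact architecture.

# Sketch of the CNN-to-Transformer pipeline described above. Backbone,
# d_model, nhead, and pooling are assumptions for illustration only.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class CNNTransformerIQA(nn.Module):
    def __init__(self, d_model=512, nhead=8, num_layers=2):
        super().__init__()
        backbone = resnet50(weights=None)
        # Keep the convolutional stages only; they capture local structure.
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])
        self.proj = nn.Conv2d(2048, d_model, kernel_size=1)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           batch_first=True)
        # Self-attention over the spatial grid mitigates locality bias.
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x):
        f = self.proj(self.cnn(x))            # (B, d_model, H', W')
        seq = f.flatten(2).transpose(1, 2)    # (B, H'*W', d_model) tokens
        z = self.encoder(seq).mean(dim=1)     # non-local representation
        return self.head(z).squeeze(-1)       # scalar quality score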
A stronger connection between subjective and objective assessments is established by sorting images within each batch according to relative distance information.
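A hedged sketch of this batch-wise ranking idea follows; the margin-based pairwise formulation is an illustrative choice, not necessarily the paper's exact loss.

# Batch-wise relative ranking: predictions should preserve the ground-truth
# ordering within each batch. Margin and pairwise form are assumptions.
import torch
import torch.nn.functional as F

def relative_ranking_loss(pred, mos, margin=0.5):
    """pred, mos: (B,) predicted scores and subjective scores (MOS)."""
    # Every ordered pair (i, j) with mos[i] > mos[j] should satisfy
    # pred[i] > pred[j] by at least `margin`.
    diff_mos = mos.unsqueeze(1) - mos.unsqueeze(0)     # (B, B)
    diff_pred = pred.unsqueeze(1) - pred.unsqueeze(0)  # (B, B)
    mask = (diff_mos > 0).float()
    loss = F.relu(margin - diff_pred) * mask
    return loss.sum() / mask.sum().clamp(min=1)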
A self-consistency approach to self-supervision is presented, explicitly addressing the degradation of NR-IQA models under equivariant transformations: the model is kept robust by enforcing consistency between an image and its horizontally flipped counterpart.
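The sketch below illustrates the flip-consistency term and one plausible combined objective, reusing relative_ranking_loss from the previous sketch; the L1 fidelity term and the loss weights are assumptions for illustration.

# Self-consistency: the model should give the same quality estimate for an
# image and its horizontal flip. Fidelity term and weights are assumptions.
import torch
import torch.nn.functional as F

def self_consistency_loss(model, x):
    pred = model(x)
    pred_flipped = model(torch.flip(x, dims=[-1]))  # horizontal flip
    return F.l1_loss(pred, pred_flipped)

def training_loss(model, x, mos, lam_rank=1.0, lam_sc=1.0):
    pred = model(x)
    return (F.l1_loss(pred, mos)
            + lam_rank * relative_ranking_loss(pred, mos)
            + lam_sc * self_consistency_loss(model, x))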
Empirical evaluation on five popular image quality assessment datasets shows that the proposed model outperforms alternative NR-IQA algorithms, especially on smaller datasets. Code is available at \href{https://github.com/mas94/ADTRS}{https://github.com/mas94/ADTRS}.