A method for evaluating deep generative models of images for hallucinations in high-order spatial context

Pattern Recognition Letters (IF 3.9; Zone 3, Computer Science; JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) Pub Date: 2024-09-02 DOI: 10.1016/j.patrec.2024.08.023
Rucha Deshpande, Mark A. Anastasio, Frank J. Brooks
Pattern Recognition Letters, Volume 186, Pages 23-29. Journal Article. https://www.sciencedirect.com/science/article/pii/S0167865524002551
Citations: 0

Abstract

Deep generative models (DGMs) have the potential to revolutionize diagnostic imaging. Generative adversarial networks (GANs) are one widely employed kind of DGM. The overarching problem with deploying any sort of DGM in mission-critical applications is a lack of adequate and/or automatic means of assessing the domain-specific quality of generated images. In this work, we demonstrate several objective and human-interpretable tests of images output by two popular DGMs. These tests serve two goals: (i) ruling out DGMs for downstream, domain-specific applications, and (ii) quantifying hallucinations in the expected spatial context in DGM-generated images. The designed datasets are made public, and the proposed tests could also serve as benchmarks and aid the prototyping of emerging DGMs. Although these tests are demonstrated on GANs, they can be employed as a benchmark for evaluating any DGM. Specifically, we designed several stochastic context models (SCMs) of distinct image features that can be recovered after generation by a trained DGM. Together, these SCMs encode features as per-image constraints in prevalence, position, intensity, and/or texture. Several of these features are high-order, algorithmic pixel-arrangement rules which are not readily expressed in covariance matrices. We designed and validated statistical classifiers to detect specific effects of the known arrangement rules. We then tested the rates at which two different DGMs correctly reproduced the feature context under a variety of training scenarios and degrees of feature-class similarity. We found that ensembles of generated images can appear largely accurate visually, and show high accuracy in ensemble measures, while not exhibiting the known spatial arrangements. The main conclusion is that SCMs can be engineered, and serve as benchmarks, to quantify numerous per-image errors, i.e., hallucinations, that may not be captured in ensemble statistics but plausibly can affect subsequent use of the DGM-generated images.
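
The gap between ensemble measures and per-image errors can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the 2x2 binary images, the toy arrangement rule (exactly two feature pixels, both in the same row), and the function names are hypothetical and are not the SCMs, classifiers, or datasets of the paper. The sketch builds a "generated" set whose mean image matches the reference set pixel for pixel, yet half of its images break the rule.

```python
# Toy illustration only: the rule, the images, and all names are hypothetical.

def satisfies_rule(image):
    """Toy per-image rule: exactly two feature pixels, both in the same row."""
    coords = [(r, c) for r, row in enumerate(image)
              for c, value in enumerate(row) if value == 1]
    return len(coords) == 2 and coords[0][0] == coords[1][0]

def hallucination_rate(images):
    """Fraction of images that violate the per-image rule."""
    return sum(not satisfies_rule(im) for im in images) / len(images)

def mean_image(images):
    """Ensemble statistic: pixel-wise mean over the whole set."""
    n = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [[sum(im[r][c] for im in images) / n for c in range(cols)]
            for r in range(rows)]

top = [[1, 1], [0, 0]]        # satisfies the rule
bottom = [[0, 0], [1, 1]]     # satisfies the rule
diag = [[1, 0], [0, 1]]       # violates the rule (pixels in different rows)
anti_diag = [[0, 1], [1, 0]]  # violates the rule

reference = [top, bottom, top, bottom]
generated = [top, diag, bottom, anti_diag]

print(mean_image(reference) == mean_image(generated))  # True
print(hallucination_rate(reference))                   # 0.0
print(hallucination_rate(generated))                   # 0.5
```

In this toy case the first-order ensemble statistic cannot distinguish the two sets, while the per-image check reports that half of the generated images violate the known arrangement.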

Source journal: Pattern Recognition Letters (Engineering & Technology; Computer Science: Artificial Intelligence)
CiteScore: 12.40
Self-citation rate: 5.90%
Articles published: 287
Review time: 9.1 months
About the journal: Pattern Recognition Letters aims at rapid publication of concise articles of broad interest in pattern recognition. Subject areas include all the current fields of interest represented by the Technical Committees of the International Association of Pattern Recognition, and other developing themes involving learning and recognition.