Grade Inflation in Generative Models.

ArXiv Pub Date : 2025-01-22
Phuc Nguyen, Miao Li, Alexandra Morgan, Rima Arnaout, Ramy Arnaout
Journal: ArXiv
Publication type: Journal Article
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11722526/pdf/
Citations: 0

Abstract

Generative models hold great potential, but only if one can trust the evaluation of the data they generate. We show that many commonly used quality scores for comparing two-dimensional distributions of synthetic vs. ground-truth data give better results than they should, a phenomenon we call the "grade inflation problem." We show that the correlation score, Jaccard score, earth-mover's score, and Kullback-Leibler (relative-entropy) score all suffer grade inflation. We propose that any score that values all datapoints equally, as these do, will also exhibit grade inflation; we refer to such scores as "equipoint" scores. We introduce the concept of "equidensity" scores, and present the Eden score, to our knowledge the first example of such a score. We found that Eden avoids grade inflation and agrees better with human perception of goodness-of-fit than the equipoint scores above. We propose that any reasonable equidensity score will avoid grade inflation. We identify a connection between equidensity scores and Rényi entropy of negative order. We conclude that equidensity scores are likely to outperform equipoint scores for generative models, and for comparing low-dimensional distributions more generally.
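
As a rough illustration of the kinds of scores the abstract names, the sketch below computes a correlation score, a Jaccard score, and a Kullback-Leibler divergence between two binned two-dimensional samples, plus the Rényi entropy of a chosen order. The abstract gives no formulas, so everything here is an assumption: these are common textbook forms (the Jaccard variant is the weighted min/max form), not necessarily the definitions the authors used, and the function names, bin count, and value range are hypothetical. The Eden score is not reproduced because the abstract does not define it.

```python
import numpy as np


def histogram2d_prob(points, bins, value_range):
    """Bin an (n, 2) array of points into a 2D histogram that sums to 1."""
    counts, _, _ = np.histogram2d(
        points[:, 0], points[:, 1], bins=bins, range=value_range
    )
    return counts / counts.sum()


def correlation_score(p, q):
    """Pearson correlation between the flattened bin masses (assumed form)."""
    return float(np.corrcoef(p.ravel(), q.ravel())[0, 1])


def jaccard_score(p, q):
    """Weighted Jaccard: sum of bin-wise minima over sum of bin-wise maxima."""
    return float(np.minimum(p, q).sum() / np.maximum(p, q).sum())


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; eps smoothing avoids infinities from empty bins."""
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))


def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha (alpha != 1), taken over the support of p.

    For negative alpha, the sum of p**alpha is dominated by low-mass bins.
    """
    if alpha == 1:
        raise ValueError("alpha = 1 is the Shannon limit; use a different order")
    p = p[p > 0]
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))


# Hypothetical data: a "ground-truth" sample and a slightly wider "synthetic" one.
rng = np.random.default_rng(0)
truth = rng.normal(size=(5000, 2))
synthetic = rng.normal(scale=1.2, size=(5000, 2))

value_range = [[-5, 5], [-5, 5]]
p = histogram2d_prob(truth, bins=20, value_range=value_range)
q = histogram2d_prob(synthetic, bins=20, value_range=value_range)

print("correlation:", correlation_score(p, q))
print("jaccard:", jaccard_score(p, q))
print("KL(p || q):", kl_divergence(p, q))
print("Renyi entropy of p, order -1:", renyi_entropy(p, alpha=-1.0))
```

In the three comparison scores above, each bin contributes in proportion to the mass it holds, so densely populated regions dominate the result. That is one way to read the abstract's description of scores that value all datapoints equally, though the paper's own definitions should be taken as authoritative.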
