A study of the evaluation metrics for generative images containing combinational creativity

IF 1.7 | Zone 3 (Engineering Technology) | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Ai Edam-Artificial Intelligence for Engineering Design Analysis and Manufacturing | Pub Date: 2023-03-23 | DOI: 10.1017/S0890060423000069
Boheng Wang, Yunhuai Zhu, Liuqing Chen, Jingcheng Liu, Lingyun Sun, P. Childs
Citations: 0

Abstract

In machine content generation, the state-of-the-art text-to-image model DALL⋅E has advanced and diverse capabilities for generating combinational images from specific textual prompts. The images generated by DALL⋅E appear to exhibit an appreciable level of combinational creativity, close to that of humans, in visualizing a combinational idea. Several common metrics, such as IS, FID, GIQA, and CLIP, can be applied to assess the quality of images produced by generative models, but it is unclear whether they are equally applicable to images containing combinational creativity. In this study, we collected image data generated by a machine (DALL⋅E) and by human designers. The group rankings obtained from the Consensual Assessment Technique (CAT) and the Turing Test (TT) served as benchmarks for assessing combinational creativity. Because the metrics rest on different mathematical principles and take different starting points in evaluating image quality, we introduced two measures, coincident rate (CR) and average rank variation (ARV), to place them in a comparable space. We then conducted an experiment that calculated how consistent each metric's group ranking was with the benchmarks. By comparing the CR and ARV consistency results, we summarized the applicability of the existing evaluation metrics to generative images containing combinational creativity. Of the four metrics, GIQA showed the closest consistency with CAT and TT. It therefore shows potential as an automated assessment for images containing combinational creativity, and could be used to evaluate such images in design and engineering tasks such as conceptual sketches, digital design images, and prototyping images.
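The abstract names CR and ARV but does not define them, so the following is only a minimal sketch under assumed definitions: CR is taken here as the fraction of rank positions where a metric's group ranking agrees with the benchmark ranking, and ARV as the mean absolute shift in rank per group. The function names, group labels, and rankings are hypothetical illustrations, not the authors' implementation or data.

```python
from typing import Sequence


def _check(benchmark: Sequence[str], metric: Sequence[str]) -> None:
    # Both rankings must be non-empty orderings of the same set of groups.
    if not benchmark or sorted(benchmark) != sorted(metric):
        raise ValueError("rankings must be non-empty and contain the same groups")


def coincident_rate(benchmark: Sequence[str], metric: Sequence[str]) -> float:
    """Assumed CR: share of rank positions where both rankings name the same group."""
    _check(benchmark, metric)
    matches = sum(b == m for b, m in zip(benchmark, metric))
    return matches / len(benchmark)


def average_rank_variation(benchmark: Sequence[str], metric: Sequence[str]) -> float:
    """Assumed ARV: mean absolute difference between a group's two rank positions."""
    _check(benchmark, metric)
    metric_rank = {group: i for i, group in enumerate(metric)}
    shifts = [abs(i - metric_rank[group]) for i, group in enumerate(benchmark)]
    return sum(shifts) / len(shifts)


# Hypothetical rankings of four image groups, best first.
benchmark_ranking = ["G1", "G2", "G3", "G4"]  # e.g. from a human-judged benchmark
metric_ranking = ["G1", "G3", "G2", "G4"]     # e.g. from an automated metric

print(coincident_rate(benchmark_ranking, metric_ranking))         # 0.5
print(average_rank_variation(benchmark_ranking, metric_ranking))  # 0.5
```

Under these assumed definitions, a higher CR and a lower ARV would both indicate that a metric's ranking sits closer to the benchmark.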
Source journal
CiteScore: 4.40
Self-citation rate: 14.30%
Articles published: 27
Review time: >12 weeks
Journal description: The journal publishes original articles about significant AI theory and applications based on the most up-to-date research in all branches and phases of engineering. Suitable topics include: analysis and evaluation; selection; configuration and design; manufacturing and assembly; and concurrent engineering. Specifically, the journal is interested in the use of AI in planning, design, analysis, simulation, qualitative reasoning, spatial reasoning and graphics, manufacturing, assembly, process planning, scheduling, numerical analysis, optimization, distributed systems, multi-agent applications, cooperation, cognitive modeling, learning and creativity. AI EDAM is also interested in original, major applications of state-of-the-art knowledge-based techniques to important engineering problems.
Latest articles from this journal
Does empathy lead to creativity? A simulation-based investigation on the role of team trait empathy on nominal group concept generation and early concept screening
A knowledge-enabled approach for user experience-driven product improvement at the conceptual design stage
Free-text inspiration search for systematic bio-inspiration support of engineering design
Tool life prediction via SMB-enabled monitor based on BPNN coupling algorithms for sustainable manufacturing
A comparative review on the role of stimuli in idea generation