A study of the evaluation metrics for generative images containing combinational creativity

IF 2.3; Zone 3, Engineering & Technology; JCR Q3, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Ai Edam-Artificial Intelligence for Engineering Design Analysis and Manufacturing; Pub Date: 2023-03-23; DOI: 10.1017/S0890060423000069

Boheng Wang, Yunhuai Zhu, Liuqing Chen, Jingcheng Liu, Lingyun Sun, P. Childs


Citations: 0

Abstract

In machine content generation, the state-of-the-art text-to-image model DALL⋅E has advanced and diverse capabilities for generating combinational images from specific textual prompts. The images generated by DALL⋅E appear to exhibit a level of combinational creativity close to that of humans in visualizing a combinational idea. Several common metrics, such as IS, FID, GIQA, and CLIP, can be applied to assess the quality of images produced by generative models, but it is unclear whether they are equally applicable to images containing combinational creativity. In this study, we collected image data from a machine (DALL⋅E) and from human designers. Group rankings from the Consensual Assessment Technique (CAT) and the Turing Test (TT) served as the benchmarks for assessing combinational creativity. Because the metrics differ in their mathematical principles and in their starting points for evaluating image quality, we introduced coincident rate (CR) and average rank variation (ARV) as two comparable spaces. We then conducted an experiment that calculated how consistent each metric's group ranking was with the benchmarks. By comparing the CR and ARV consistency results, we summarized the applicability of the existing evaluation metrics to generative images containing combinational creativity. Of the four metrics, GIQA was the most consistent with CAT and TT. It shows potential as an automated assessment for images containing combinational creativity and could be used to evaluate such images in design and engineering tasks such as conceptual sketches, digital design images, and prototyping images.
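The abstract names CR and ARV but does not define them, so the following is only a minimal sketch under assumed definitions: CR is taken here as the share of rank positions where a metric's group ranking agrees with the benchmark ranking, and ARV as the mean absolute shift in rank position per group. The function names and the group labels are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def _check_same_groups(benchmark: Sequence[str], metric: Sequence[str]) -> None:
    """Both rankings must be non-empty orderings of the same groups."""
    if not benchmark or sorted(benchmark) != sorted(metric):
        raise ValueError("rankings must be non-empty and contain the same groups")


def coincident_rate(benchmark: Sequence[str], metric: Sequence[str]) -> float:
    """Assumed definition: fraction of rank positions holding the same group."""
    _check_same_groups(benchmark, metric)
    matches = sum(b == m for b, m in zip(benchmark, metric))
    return matches / len(benchmark)


def average_rank_variation(benchmark: Sequence[str], metric: Sequence[str]) -> float:
    """Assumed definition: mean absolute change in rank position per group."""
    _check_same_groups(benchmark, metric)
    metric_position = {group: i for i, group in enumerate(metric)}
    shifts = [abs(i - metric_position[group]) for i, group in enumerate(benchmark)]
    return sum(shifts) / len(shifts)


# Hypothetical rankings of four image groups (illustrative, not data from the paper).
benchmark_ranking = ["A", "B", "C", "D"]   # e.g. a CAT- or TT-based ordering
metric_ranking = ["A", "C", "B", "D"]      # e.g. an ordering produced by one metric

print(coincident_rate(benchmark_ranking, metric_ranking))
print(average_rank_variation(benchmark_ranking, metric_ranking))
```

Under these assumed definitions, a higher CR and a lower ARV would both indicate that a metric's ranking is closer to the benchmark; the paper's actual formulas may differ.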


Source journal

Ai Edam-Artificial Intelligence for Engineering Design Analysis and Manufacturing (Engineering & Technology - Engineering: Manufacturing)

CiteScore

4.40

Self-citation rate

14.30%

Articles published

Review time

>12 weeks

Journal description: The journal publishes original articles about significant AI theory and applications based on the most up-to-date research in all branches and phases of engineering. Suitable topics include: analysis and evaluation; selection; configuration and design; manufacturing and assembly; and concurrent engineering. Specifically, the journal is interested in the use of AI in planning, design, analysis, simulation, qualitative reasoning, spatial reasoning and graphics, manufacturing, assembly, process planning, scheduling, numerical analysis, optimization, distributed systems, multi-agent applications, cooperation, cognitive modeling, learning and creativity. AI EDAM is also interested in original, major applications of state-of-the-art knowledge-based techniques to important engineering problems.