A survey of generative models used in text-to-image
Jingjing Xu, Jiahao Du, Junyi Wang
Applied and Computational Engineering (Journal Article), published 2024-07-25
DOI: 10.54254/2755-2721/79/20241286
Abstract
The emergence and rapid development of neural networks have been pivotal in advancing text-to-image generative models. This is especially true of generative adversarial networks (GANs), variational autoencoders (VAEs), and autoregressive (AR) models, which offer distinct routes to image generation. Progress has also depended on datasets such as MS COCO, Flickr30K, Visual Genome, and Conceptual Captions. It has further relied on evaluation metrics including Inception Score (IS), Fréchet Inception Distance (FID), precision, and recall. In this review, we examine the mechanisms and significance of each model and technique. GANs and VAEs stand out as significant model families in image generation, each excelling in different aspects, so we discuss both because their strengths are complementary. We also cover autoregressive models to give a well-rounded assessment of current advances in the field. Among datasets, MS COCO provides a large and diverse image collection and serves as a cornerstone for model training. Flickr30K, Visual Genome, and Conceptual Captions contribute additional labeled image-text examples that further enrich training. Widely recognized metrics allow these models to be evaluated and compared on a common footing. In conclusion, the field's recent achievements owe much to the integration of these components. VAEs and GANs complement each other's strengths, while datasets supply the training signal and metrics provide the basis for evaluating text-to-image models. This survey highlights how models, metrics, and datasets together drive the field forward.
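Two of the metrics named above, FID and IS, have compact closed forms, and a short sketch may make them more concrete. The Python below is a minimal illustration, not the survey's own implementation. It assumes that Inception feature vectors (for FID) and class-probability vectors p(y|x) (for IS) have already been extracted from a pretrained Inception network. That extraction step is omitted, and the function names `frechet_distance` and `inception_score` are hypothetical. FID fits a Gaussian to each feature set and computes ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^{1/2}). IS is exp(E_x[KL(p(y|x) || p(y))]), where p(y) is the marginal class distribution over generated images.

```python
import numpy as np


def _sqrtm_psd(mat: np.ndarray) -> np.ndarray:
    """Matrix square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # clamp tiny negative eigenvalues from round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """FID-style distance between feature sets of shape (n_samples, dim).

    Assumes features were already extracted (hypothetical upstream step).
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    cov_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))
    # Tr((cov_r cov_g)^{1/2}) == Tr((A cov_g A)^{1/2}) with A = cov_r^{1/2};
    # A cov_g A is symmetric PSD, so eigh can be used safely.
    a = _sqrtm_psd(cov_r)
    tr_covmean = np.trace(_sqrtm_psd(a @ cov_g @ a))
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_covmean)


def inception_score(probs: np.ndarray, eps: float = 1e-12) -> float:
    """Single-split IS from an (n_images, n_classes) matrix of p(y|x)."""
    p_y = probs.mean(axis=0, keepdims=True)  # marginal class distribution p(y)
    # Per-image KL(p(y|x) || p(y)); eps guards against log(0).
    kl = np.sum(probs * (np.log(probs + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))
```

For identical feature sets, `frechet_distance` returns a value near zero. For one-hot predictions spread evenly over k classes, `inception_score` approaches k. Published IS values are usually averaged over several splits of the generated samples, which this sketch omits.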