Authors: Changhong Jing, Bing Xue, Ju-dong Pan
Venue: Proceedings of the 5th International Conference on Computer Science and Software Engineering
Published: 2022-10-21
DOI: 10.1145/3569966.3569990 (https://doi.org/10.1145/3569966.3569990)
CTI-GAN: Cross-Text-Image Generative Adversarial Network for Bidirectional Cross-modal Generation
Cross-modal tasks between text and images are an increasingly active research area. This paper proposes a cross-text-image generative adversarial network (CTI-GAN) that performs bidirectional cross-modal generation: it generates images from text and text from images by connecting text and image modeling. Hierarchical LSTM encoding improves text feature extraction, and feature pyramid fusion makes full use of the features at each layer to improve the image feature representation. Experiments verify the effectiveness of these improvements for image-text generation: the improved algorithm completes cross-modal image-text generation efficiently and improves the accuracy of the generated samples. On the text-to-image task, the inception score of CTI-GAN is about 2% higher than that of StackGAN++, HDGAN, GAN-INT-CLS, and other models evaluated on the same dataset under the same conditions.
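The abstract reports results as an inception score but does not restate how that metric is computed. The sketch below follows the commonly used definition, the exponential of the mean KL divergence between each image's class distribution p(y|x) and the marginal p(y). It is not the authors' evaluation code. In practice the probabilities come from a pretrained Inception classifier applied to generated images; here they are passed in directly, and the function name `inception_score` and the `eps` smoothing term are assumptions made for illustration.

```python
import math


def inception_score(probs, eps=1e-12):
    """Inception score from per-image class probabilities.

    probs: list of lists, one row per generated image, each row a
           class-probability vector p(y|x) that sums to 1.
    eps:   small constant (an assumption here) that keeps log() finite.
    """
    n = len(probs)
    num_classes = len(probs[0])

    # Marginal class distribution p(y): average of p(y|x) over all images.
    marginal = [sum(p[k] for p in probs) / n for k in range(num_classes)]

    # Sum of KL(p(y|x) || p(y)) over images; zero-probability terms add nothing.
    total_kl = 0.0
    for p in probs:
        total_kl += sum(
            pk * (math.log(pk + eps) - math.log(mk + eps))
            for pk, mk in zip(p, marginal)
            if pk > 0.0
        )

    # The score is the exponential of the mean KL divergence.
    return math.exp(total_kl / n)


if __name__ == "__main__":
    # Confident and diverse predictions push the score toward the class count.
    print(inception_score([[1.0, 0.0], [0.0, 1.0]]))
    # Identical, uninformative predictions give the minimum score of 1.
    print(inception_score([[0.5, 0.5], [0.5, 0.5]]))
```

The score ranges from 1 (every image gets the same prediction) up to the number of classes (each image is classified confidently and the classes are covered evenly), so a relative gain such as the roughly 2% reported above is only meaningful when the compared models are scored with the same classifier on the same dataset.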