Computational creativity of neural network Midjourney in a polymodal space

Litera · Publication date: 2024-06-01 · DOI: 10.25136/2409-8698.2024.6.70890

Christina Petrovna Zhikulina, Viktoriya Vladimirovna Kostromina

Abstract

This article examines polymodal space in the context of the computational creativity of neural networks. The object of research is a polymodal environment that integrates a series of heterogeneous codes to express a common idea; the subject is the possibility of creating polymodal digital art from text and voice prompts in the generative network Midjourney. The aim of the study is to prove that computational creativity can be detected and described from the results of iterations in the image-creation process, which in turn would allow a complex polymodal system to be treated as a separate digital category of polymodality. Two methods were used: continuous sampling, to collect linguistic units as they occurred during the analysis, and contextual analysis, to systematically identify and describe verbal and non-verbal contexts. An experiment with Midjourney was conducted to identify patterns in the creation of a graphic space from text and voice input, and the results of the iterations were then compared and contrasted with the original image. The scientific novelty of the work lies in the fact that polymodal space has so far received little study in the context of neural networks and their generative ability. The experiment produced the following results: (1) the term ‘polymodality’ applies to the generative network Midjourney and its ‘digital art’ because three channels are present: verbal, visual, and voice; (2) tests showed that the network's ability to create images from prompts is at a high level, but gross technical errors prevent users from fully reaching the desired result when they generate an image; (3) the summarized data allow us to speak of features of computational creativity in generative networks.
