Computational creativity of neural network Midjourney in a polymodal space

Christina Petrovna Zhikulina, Viktoriya Vladimirovna Kostromina
Journal: Litera. Publication date: 2024-06-01. Publication type: Journal Article.
DOI: 10.25136/2409-8698.2024.6.70890 (https://doi.org/10.25136/2409-8698.2024.6.70890)
Citations: 0

Abstract

This article examines polymodal space in the field of computational creativity in neural networks. The object of research is a polymodal environment that integrates a series of heterogeneous codes to express a common idea; the subject is the possibility of creating polymodal digital art from text and voice prompts in the generative network Midjourney. The aim of the study is to prove that computational creativity can be detected and described from the results of iterations in the image-creation process, which in turn makes it possible to treat a complex polymodal system as a separate digital category of polymodality. Two methods were used: continuous sampling, to collect linguistic units as they occurred during the analysis, and contextual analysis, to systematically identify and describe the verbal and non-verbal contexts. An experiment with Midjourney was conducted to identify patterns in the creation of a graphic space from text and voice input, and the results of the iterations were then compared and contrasted with the original image. The scientific novelty lies in the lack of prior research on polymodal space in the context of neural networks and their generative ability. The experiment yielded the following results: the term 'polymodality' applies to the generative network Midjourney and its 'digital art' because three channels are present: verbal, visual and voice; tests showed that the network's ability to create images from prompts is at a high level, but gross technical errors prevent users from fully reaching the desired result when they generate an image; taken together, the data indicate that generative networks exhibit features of computational creativity.