Scene Synthesis with Automated Generation of Textual Descriptions

Julian Müller-Huschke, Marcel Ritter, M. Harders
DOI: 10.2312/egs.20221026 (https://doi.org/10.2312/egs.20221026)
Published in: Eurographics ... Workshop on 3D Object Retrieval : EG 3DOR. Eurographics Workshop on 3D Object Retrieval, volume 27 1, pages 33-36
Publication date: 2022-01-01
Publication type: Journal Article
Citations: 0

Abstract

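Most current research on automatically captioning and describing scenes with spatial content focuses on images. We outline how descriptive text for a synthesized 3D scene can be generated via a suitable intermediate representation employed in the synthesis algorithm. As an example, we synthesize scenes of medieval village settings and generate their descriptions. Our system employs graph grammars, Markov Chain Monte Carlo optimization, and a natural language generation pipeline. Randomly placed objects are evaluated and optimized by a cost function capturing neighborhood relations, path layouts, and collisions. Further, in a pilot study we assess the performance of our framework by comparing the generated descriptions to descriptions provided by human subjects. While the human descriptions were often short and low-effort, the highest-rated ones clearly outperformed our generated ones. Nevertheless, averaged over all collected human descriptions, the study participants rated them as less accurate than the automated ones.

CCS Concepts: • Computing methodologies → Computer graphics; Natural language generation

Figure 1: (Left) Example of a procedurally generated 3D scene. (Right) Description generated automatically with our framework:

The scene consists of three roads meeting at an intersection, a group of trees, an oak tree and three market stands. The three market stands are next to the first road. The group of trees consists of three pine trees and three bushes. The first market stand consists of a sign to the right of a table. A big pot of stew is in the middle of this table. The second market stand consists of a sign besides of a table. A big pot of stew is in the middle of this table. The third market stand consists of three flowerpots on top of a table and a sign. This sign is to the right of this table.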
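The abstract states that randomly placed objects are optimized by Markov Chain Monte Carlo against a cost function covering neighborhood relations, path layouts, and collisions, but it gives no formulas. The following is a minimal sketch of that idea, not the authors' implementation. The object names, bounding-circle radii, road geometry, cost terms, weights, and the fixed-temperature Metropolis rule are all assumptions made for illustration.

```python
import math
import random

# Hypothetical objects: name -> bounding-circle radius (assumed values).
RADII = {"stand_1": 1.0, "stand_2": 1.0, "stand_3": 1.0, "oak": 1.5}
# Hypothetical neighborhood relations: pairs that should end up adjacent.
NEIGHBORS = [("stand_1", "stand_2"), ("stand_2", "stand_3")]
# Hypothetical road running along the x-axis with this half-width.
ROAD_HALF_WIDTH = 1.0


def cost(layout):
    """Weighted sum of collision, neighborhood, and path terms (weights assumed)."""
    names = list(layout)
    collision = 0.0
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # Positive overlap means the two bounding circles intersect.
            overlap = RADII[a] + RADII[b] - math.dist(layout[a], layout[b])
            collision += max(0.0, overlap) ** 2
    neighborhood = 0.0
    for a, b in NEIGHBORS:
        # Positive gap means the two neighbors are not yet touching.
        gap = math.dist(layout[a], layout[b]) - (RADII[a] + RADII[b])
        neighborhood += max(0.0, gap) ** 2
    path = 0.0
    for name, (_, y) in layout.items():
        # Penalize objects whose bounding circle reaches into the road.
        intrusion = ROAD_HALF_WIDTH + RADII[name] - abs(y)
        path += max(0.0, intrusion) ** 2
    return 10.0 * collision + 1.0 * neighborhood + 10.0 * path


def random_layout(seed=0):
    """Place every object uniformly at random in a 20 x 20 area."""
    rng = random.Random(seed)
    return {name: (rng.uniform(-10, 10), rng.uniform(-10, 10)) for name in RADII}


def optimize(layout, steps=5000, temperature=0.5, step_size=0.5, seed=0):
    """Metropolis sampling over object positions; returns the best layout seen."""
    rng = random.Random(seed)
    current = dict(layout)
    current_cost = cost(current)
    best, best_cost = dict(current), current_cost
    for _ in range(steps):
        # Propose moving one randomly chosen object by a Gaussian offset.
        name = rng.choice(list(current))
        x, y = current[name]
        proposal = dict(current)
        proposal[name] = (x + rng.gauss(0, step_size), y + rng.gauss(0, step_size))
        proposal_cost = cost(proposal)
        # Always accept improvements; accept worse layouts with
        # probability exp(-increase / temperature).
        accept = math.exp(min(0.0, (current_cost - proposal_cost) / temperature))
        if rng.random() < accept:
            current, current_cost = proposal, proposal_cost
            if current_cost < best_cost:
                best, best_cost = dict(current), current_cost
    return best, best_cost


if __name__ == "__main__":
    start = random_layout(seed=1)
    best, best_cost = optimize(start, seed=1)
    print(f"initial cost {cost(start):.3f}, best cost {best_cost:.3f}")
```

Tracking the best layout separately from the current one means the returned cost never exceeds the initial cost, even though the sampler occasionally accepts worse proposals to escape local minima.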
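The abstract describes generating text from an intermediate representation shared with the synthesis step. As a rough illustration of how a hierarchical scene representation could be turned into sentences like "The group of trees consists of three pine trees and three bushes", here is a small template-based sketch. The `Node` class, the helper functions, and the pluralization and ordinal rules are hypothetical simplifications, not the natural language generation pipeline used in the paper, and spatial relations such as "to the right of" are omitted.

```python
from collections import Counter
from dataclasses import dataclass, field

NUMBER_WORDS = {2: "two", 3: "three", 4: "four", 5: "five"}
ORDINALS = ["first", "second", "third", "fourth", "fifth"]


@dataclass
class Node:
    """Hypothetical scene-graph node: an object kind plus its child parts."""
    kind: str
    parts: list = field(default_factory=list)


def plural(noun):
    """Naive English plural; enough for the nouns used in this sketch."""
    if noun.endswith(("s", "sh", "ch", "x")):
        return noun + "es"
    return noun + "s"


def noun_phrase(kind, count):
    """Render 'an oak tree' or 'three pine trees'."""
    if count == 1:
        article = "an" if kind[0] in "aeiou" else "a"
        return f"{article} {kind}"
    return f"{NUMBER_WORDS.get(count, str(count))} {plural(kind)}"


def join(items):
    """Join phrases as 'a, b and c'."""
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def describe_parts(label, node):
    """One sentence listing the children of a node, grouped by kind."""
    counts = Counter(child.kind for child in node.parts)  # keeps first-seen order
    phrases = [noun_phrase(kind, n) for kind, n in counts.items()]
    return f"The {label} consists of {join(phrases)}."


def describe(scene):
    """Describe the scene, then each composite child, numbering repeated kinds."""
    sentences = [describe_parts("scene", scene)]
    totals = Counter(child.kind for child in scene.parts)
    seen = Counter()
    for child in scene.parts:
        index = seen[child.kind]
        seen[child.kind] += 1
        if not child.parts:
            continue  # leaf objects are already covered by the first sentence
        label = child.kind
        if totals[child.kind] > 1:
            label = f"{ORDINALS[index]} {child.kind}"
        sentences.append(describe_parts(label, child))
    return " ".join(sentences)


if __name__ == "__main__":
    scene = Node("scene", [
        Node("group of trees", [Node("pine tree")] * 3 + [Node("bush")] * 3),
        Node("oak tree"),
        Node("market stand", [Node("table"), Node("sign")]),
        Node("market stand", [Node("table"), Node("sign")]),
    ])
    print(describe(scene))
```

Grouping children by kind before realizing them keeps the text compact, and the ordinal labels ("first", "second") give later sentences an unambiguous way to refer to repeated objects.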