From hearing to seeing: Linking auditory and visual place perceptions with soundscape-to-image generative artificial intelligence

Computers, Environment and Urban Systems · IF 7.1 · CAS Tier 1 (Earth Sciences) · JCR Q1 (Environmental Studies) · Pub Date: 2024-05-01 · DOI: 10.1016/j.compenvurbsys.2024.102122
Yonggai Zhuang, Yuhao Kang, Teng Fei, Meng Bian, Yunyan Du
Volume 110, Article 102122 · Full text: https://www.sciencedirect.com/science/article/pii/S0198971524000516
Citations: 0

Abstract

People experience the world through multiple senses simultaneously, and these senses together shape our sense of place. Prior quantitative geography studies have mostly emphasized human visual perceptions, neglecting auditory perceptions of place because the acoustic environment is difficult to characterize vividly. Moreover, few studies have synthesized the two sensory dimensions (auditory and visual) in understanding the human sense of place. To bridge these gaps, we propose a Soundscape-to-Image Diffusion model, a generative Artificial Intelligence (AI) model supported by Large Language Models (LLMs), that visualizes soundscapes by generating street view images. Using audio-image pairs, acoustic environments are first represented as high-dimensional semantic audio vectors. Our Soundscape-to-Image Diffusion model, which comprises a Low-Resolution Diffusion Model and a Super-Resolution Diffusion Model, then translates those semantic audio vectors into effective visual representations of place. We evaluated the model using both machine-based and human-centered approaches. We show that the generated street view images align with common human perceptions and accurately depict several key street elements of the original soundscapes, demonstrating that soundscapes carry sufficient information to infer the visual attributes of places. This study stands at the forefront of the intersection between generative AI and human geography, demonstrating how human multi-sensory experiences can be linked. We aim to enrich geospatial data science and AI studies with human experiences. The work has the potential to inform multiple domains such as human geography, environmental psychology, and urban design and planning, and to advance our knowledge of human-environment relationships.
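The pipeline the abstract describes (audio-image pairs → semantic audio vector → low-resolution diffusion → super-resolution) can be sketched in miniature. The sketch below is purely illustrative: the function names, the embedding size, and the toy "denoising" update are assumptions, not the authors' implementation, and the two diffusion stages are replaced with trivial stand-ins that only preserve the data flow and tensor shapes.

```python
import numpy as np

AUDIO_EMBED_DIM = 512  # assumed length of the semantic audio vector

def encode_soundscape(waveform: np.ndarray) -> np.ndarray:
    """Stand-in for the audio encoder trained on audio-image pairs:
    maps a waveform to a fixed-length semantic vector."""
    spectrum = np.abs(np.fft.rfft(waveform))  # magnitude spectrum
    # Pool the spectrum down to a fixed embedding size via interpolation.
    return np.interp(np.linspace(0, len(spectrum) - 1, AUDIO_EMBED_DIM),
                     np.arange(len(spectrum)), np.log1p(spectrum))

def low_res_diffusion(audio_vec: np.ndarray, size: int = 64,
                      steps: int = 10, seed: int = 0) -> np.ndarray:
    """Toy stand-in for the Low-Resolution Diffusion Model: iteratively
    'denoise' Gaussian noise conditioned on the audio vector."""
    rng = np.random.default_rng(seed)
    img = rng.standard_normal((size, size, 3))  # start from pure noise
    guidance = np.tanh(audio_vec.mean())        # scalar conditioning signal
    for _ in range(steps):
        img = 0.9 * img + 0.1 * guidance        # fake reverse-diffusion step
    return img

def super_res_diffusion(low_res: np.ndarray, scale: int = 4) -> np.ndarray:
    """Toy stand-in for the Super-Resolution Diffusion Model: here just
    a nearest-neighbour upsample of the low-resolution output."""
    return low_res.repeat(scale, axis=0).repeat(scale, axis=1)

# Run the sketch end to end on a synthetic one-second "soundscape".
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
waveform = np.sin(2 * np.pi * 440.0 * t)   # 440 Hz tone at 16 kHz
audio_vec = encode_soundscape(waveform)    # shape (512,)
low = low_res_diffusion(audio_vec)         # shape (64, 64, 3)
high = super_res_diffusion(low)            # shape (256, 256, 3)
print(audio_vec.shape, low.shape, high.shape)
```

In the real model, each stand-in would be a learned network, but the cascade structure (conditioning vector feeds a low-resolution generator whose output feeds a super-resolution stage) is the same.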

Source Journal
CiteScore: 13.30
Self-citation rate: 7.40%
Publications per year: 111
Review time: 32 days
Journal description: Computers, Environment and Urban Systems is an interdisciplinary journal publishing cutting-edge and innovative computer-based research on environmental and urban systems that privileges the geospatial perspective. The journal welcomes original high-quality scholarship of a theoretical, applied, or technological nature, and provides a stimulating presentation of perspectives, research developments, overviews of important new technologies, and uses of major computational, information-based, and visualization innovations. Applied and theoretical contributions demonstrate the scope of computer-based analysis fostering a better understanding of environmental and urban systems, their spatial scope, and their dynamics.
Latest articles in this journal
Estimating the density of urban trees in 1890s Leeds and Edinburgh using object detection on historical maps
The role of data resolution in analyzing urban form and PM2.5 concentration
Causal discovery and analysis of global city carbon emissions based on data-driven and hybrid intelligence
Editorial Board
Exploring the built environment impacts on Online Car-hailing waiting time: An empirical study in Beijing