Pub Date : 2024-10-23, DOI: 10.1109/TASLP.2024.3485516
Inmo Yeon;Iljoo Jeong;Seungchul Lee;Jung-Woo Choi
Accurate estimation of indoor space geometries is vital for constructing precise digital twins, whose broad industrial applications include navigation in unfamiliar environments and efficient evacuation planning, particularly in low-light conditions. This study introduces EchoScan, a deep neural network model that utilizes acoustic echoes to perform room geometry inference. Conventional sound-based techniques rely on estimating geometry-related room parameters such as wall position and room size, which limits the diversity of inferable room geometries. In contrast, EchoScan overcomes this limitation by directly inferring room floorplan maps and height maps, enabling it to handle rooms with complex shapes, including curved walls. Framing the prediction of floorplan and height maps as a segmentation task enables the model to leverage both low- and high-order reflections. The use of high-order reflections further allows EchoScan to infer complex room shapes even when some walls are unobservable from the position of the audio device. Herein, EchoScan was trained and evaluated using room impulse responses (RIRs) synthesized from complex environments, including the Manhattan and Atlanta layouts, employing a practical audio device configuration compatible with commercial, off-the-shelf devices.
{"title":"EchoScan: Scanning Complex Room Geometries via Acoustic Echoes","authors":"Inmo Yeon;Iljoo Jeong;Seungchul Lee;Jung-Woo Choi","doi":"10.1109/TASLP.2024.3485516","DOIUrl":"https://doi.org/10.1109/TASLP.2024.3485516","url":null,"abstract":"Accurate estimation of indoor space geometries is vital for constructing precise digital twins, whose broad industrial applications include navigation in unfamiliar environments and efficient evacuation planning, particularly in low-light conditions. This study introduces EchoScan, a deep neural network model that utilizes acoustic echoes to perform room geometry inference. Conventional sound-based techniques rely on estimating geometry-related room parameters such as wall position and room size, thereby limiting the diversity of inferable room geometries. Contrarily, EchoScan overcomes this limitation by directly inferring room floorplan maps and height maps, thereby enabling it to handle rooms with complex shapes, including curved walls. The segmentation task for predicting floorplan and height maps enables the model to leverage both low- and high-order reflections. The use of high-order reflections further allows EchoScan to infer complex room shapes when some walls of the room are unobservable from the position of an audio device. Herein, EchoScan was trained and evaluated using RIRs synthesized from complex environments, including the Manhattan and Atlanta layouts, employing a practical audio device configuration compatible with commercial, off-the-shelf devices.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"4768-4782"},"PeriodicalIF":4.1,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142598612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Beamforming has been used in a wide range of applications to extract the signal of interest from microphone array observations, which contain not only the signal of interest but also noise, interference, and reverberation. The recently proposed interference-controlled maximum noise reduction (ICMR) beamformer provides a flexible way to control the amount of interference attenuation and noise suppression, but it requires accurate estimation of the manifold vector of the interference sources, which is challenging in real-world applications. To address this issue, we introduce an interference-controlled maximum noise reduction network (ICMRNet), a deep neural network (DNN)-based method for manifold vector estimation. With densely connected, modified conformer blocks and an end-to-end training strategy, the interference manifold is learned directly from the observed signals. This approach, akin to ICMR, adapts to time-varying interference and demonstrates a superior convergence rate and extraction efficacy compared with linearly constrained minimum variance (LCMV)-based neural beamformers when appropriate attenuation factors are selected. Moreover, via learning-based extraction, ICMRNet effectively suppresses reverberation components within the target signal. Comparative analysis against baseline methods validates the efficacy of the proposed method.
{"title":"Interference-Controlled Maximum Noise Reduction Beamformer Based on Deep-Learned Interference Manifold","authors":"Yichen Yang;Ningning Pan;Wen Zhang;Chao Pan;Jacob Benesty;Jingdong Chen","doi":"10.1109/TASLP.2024.3485551","DOIUrl":"https://doi.org/10.1109/TASLP.2024.3485551","url":null,"abstract":"Beamforming has been used in a wide range of applications to extract the signal of interest from microphone array observations, which consist of not only the signal of interest, but also noise, interference, and reverberation. The recently proposed interference-controlled maximum noise reduction (ICMR) beamformer provides a flexible way to control the specified amount of the interference attenuation and noise suppression; but it requires accurate estimation of the manifold vector of the interference sources, which is challenging to achieve in real-world applications. To address this issue, we introduce an interference-controlled maximum noise reduction network (ICMRNet) in this study, which is a deep neural network (DNN)-based method for manifold vector estimation. With densely connected modified conformer blocks and the end-to-end training strategy, the interference manifold is learned directly from the observation signals. This approach, akin to ICMR, adeptly adapts to time-varying interference and demonstrates superior convergence rate and extraction efficacy as compared to the linearly constrained minimum variance (LCMV)-based neural beamformers when appropriate attenuation factors are selected. Moreover, via learning-based extraction, ICMRNet effectively suppresses reverberation components within the target signal. Comparative analysis against baseline methods validates the efficacy of the proposed method.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"4676-4690"},"PeriodicalIF":4.1,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142587647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Temporal knowledge graph reasoning aims to predict missing links (facts) at future timestamps. However, most existing methods have a common limitation: they focus on learning dynamic representations of temporal knowledge graphs and rarely consider static characteristics that remain unchanged over time. To address this limitation, we propose to learn dynamic and static representations for temporal knowledge graph reasoning (DSTKG), which introduces two latent variables to capture the dynamic and static characteristics of entities in temporal knowledge graphs. First, we use a Bi-GRU-based inference network to learn the static latent representation of historical facts and a nonlinear discrete-time transition-based inference network to learn the dynamic latent representation. Then, we sample the latent variables multiple times using the re-parameterization trick to obtain high-quality embeddings and make predictions at future timestamps. The empirical results on four benchmark datasets show that our model is more effective than state-of-the-art approaches. Compared with the strong baseline model DBKGE (RotatE), the proposed model achieves performance improvements of 2.69%, 1.59%