L3DAS23: Learning 3D Audio Sources for Audio-Visual Extended Reality
Pub Date: 2024-03-12 | DOI: 10.1109/OJSP.2024.3376297 | IEEE Open Journal of Signal Processing, vol. 5, pp. 632-640
Riccardo F. Gramaccioni;Christian Marinoni;Changan Chen;Aurelio Uncini;Danilo Comminiello
The primary goal of the L3DAS (Learning 3D Audio Sources) project is to stimulate and support collaborative research on machine learning techniques applied to 3D audio signal processing. To this end, the L3DAS23 Challenge, presented at IEEE ICASSP 2023, focuses on two spatial audio tasks of great practical interest: 3D speech enhancement (3DSE) and 3D sound event localization and detection (3DSELD). Both tasks are evaluated within augmented reality applications. This paper describes the main results of the challenge. We provide the L3DAS23 dataset, a collection of first-order Ambisonics recordings in reverberant simulated environments. We retain some general characteristics of the previous L3DAS challenges, using a pair of first-order Ambisonics microphones to capture the audio signals and involving multiple-source and multiple-perspective Ambisonics recordings. In this new edition, however, we introduce audio-visual scenarios by including images that depict the frontal view of each environment as captured from the perspective of the microphones. This addition enriches the challenge, giving participants tools to explore a combination of audio and images for solving the 3DSE and 3DSELD tasks. In addition to the new dataset, we provide updated baseline models designed to take advantage of audio-image pairs. To ensure accessibility and reproducibility, we also supply a supporting API for effortless replication of our results. Lastly, we present the results achieved by the participants of the L3DAS23 Challenge.
{"title":"L3DAS23: Learning 3D Audio Sources for Audio-Visual Extended Reality","authors":"Riccardo F. Gramaccioni;Christian Marinoni;Changan Chen;Aurelio Uncini;Danilo Comminiello","doi":"10.1109/OJSP.2024.3376297","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3376297","url":null,"abstract":"The primary goal of the L3DAS (Learning 3D Audio Sources) project is to stimulate and support collaborative research studies concerning machine learning techniques applied to 3D audio signal processing. To this end, the L3DAS23 Challenge, presented at IEEE ICASSP 2023, focuses on two spatial audio tasks of paramount interest for practical uses: 3D speech enhancement (3DSE) and 3D sound event localization and detection (3DSELD). Both tasks are evaluated within augmented reality applications. The aim of this paper is to describe the main results obtained from this challenge. We provide the L3DAS23 dataset, which comprises a collection of first-order Ambisonics recordings in reverberant simulated environments. Indeed, we maintain some general characteristics of the previous L3DAS challenges, featuring a pair of first-order Ambisonics microphones to capture the audio signals and involving multiple-source and multiple-perspective Ambisonics recordings. However, in this new edition, we introduce audio-visual scenarios by including images that depict the frontal view of the environments as captured from the perspective of the microphones. This addition aims to enrich the challenge experience, giving participants tools for exploring a combination of audio and images for solving the 3DSE and 3DSELD tasks. In addition to a brand-new dataset, we provide updated baseline models designed to take advantage of audio-image pairs. To ensure accessibility and reproducibility, we also supply supporting API for an effortless replication of our results. Lastly, we present the results achieved by the participants of the L3DAS23 Challenge.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"632-640"},"PeriodicalIF":2.9,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10468560","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141447968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Auditory EEG Decoding Challenge for ICASSP 2023
Pub Date: 2024-03-12 | DOI: 10.1109/OJSP.2024.3376296 | IEEE Open Journal of Signal Processing, vol. 5, pp. 652-661
Mohammad Jalilpour Monesi;Lies Bollens;Bernd Accou;Jonas Vanthornhout;Hugo Van Hamme;Tom Francart
This paper describes the auditory EEG challenge, organized as one of the Signal Processing Grand Challenges at ICASSP 2023. The challenge provides EEG recordings of 85 subjects who listened to continuous speech, in the form of audiobooks or podcasts, while their brain activity was recorded. Recordings from 71 subjects were provided as a training set so that participants could train their models on a relatively large dataset; the remaining 14 subjects were held out for evaluation. The challenge consists of two tasks that relate electroencephalogram (EEG) signals to the presented speech stimulus. The first task, match-mismatch, aims to determine which of two speech segments induced a given EEG segment. The second task is a regression task whose goal is to reconstruct the speech envelope from the EEG. For the match-mismatch task, the performance of the different teams was close to the baseline model, and the models generalized well to unseen subjects. In contrast, for the regression task, the top teams significantly improved over the baseline models on the held-out stories test set while failing to generalize to unseen subjects.
{"title":"Auditory EEG Decoding Challenge for ICASSP 2023","authors":"Mohammad Jalilpour Monesi;Lies Bollens;Bernd Accou;Jonas Vanthornhout;Hugo Van Hamme;Tom Francart","doi":"10.1109/OJSP.2024.3376296","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3376296","url":null,"abstract":"This paper describes the auditory EEG challenge, organized as one of the Signal Processing Grand Challenges at ICASSP 2023. The challenge provides EEG recordings of 85 subjects who listened to continuous speech, as audiobooks or podcasts, while their brain activity was recorded. EEG recordings of 71 subjects were provided as a training set such that challenge participants could train their models on a relatively large dataset. The remaining 14 subjects were used as held-out subjects in evaluating the challenge. The challenge consists of two tasks that relate electroencephalogram (EEG) signals to the presented speech stimulus. The first task, match-mismatch, aims to determine which of two speech segments induced a given EEG segment. In the second regression task, the goal is to reconstruct the speech envelope from the EEG. For the match-mismatch task, the performance of different teams was close to the baseline model, and the models did generalize well to unseen subjects. In contrast, For the regression task, the top teams significantly improved over the baseline models in the held-out stories test set while failing to generalize to unseen subjects.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"652-661"},"PeriodicalIF":2.9,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10468639","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141447999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents an overview of the e-Prevention: Person Identification and Relapse Detection Challenge, which was an open call for researchers at ICASSP 2023. The challenge addressed the analysis and processing of long-term continuous recordings of biosignals from wearable sensors, namely accelerometers, gyroscopes, and heart rate monitors embedded in smartwatches, together with sleep information and daily step counts, in order to extract high-level representations of the wearer's activity and behavior, termed digital phenotypes. To assess how well these digital phenotypes quantify behavioral patterns, two tasks were evaluated in two distinct tracks: 1) identification of the wearer of the smartwatch, and 2) detection of psychotic relapses in patients on the psychotic spectrum. The long-term data used in this challenge were acquired during the e-Prevention project (Zlatintsi et al., 2022), an innovative integrated system for medical support that facilitates effective monitoring and relapse prevention in patients with mental disorders. Two baseline systems, one for each task, were described, and validation scores for both tasks were provided to the participants. Herein, we present an overview of the approaches and methods, the performance analysis, and the results of the top-5 ranked participating teams, which achieved accuracies between 91% and 95% in track 1, while in track 2 mean PR-AUC and ROC-AUC scores between 0.6051 and 0.6489 were obtained. Finally, we make the datasets publicly available at https://robotics.ntua.gr/eprevention-sp-challenge/
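The relapse-detection track is reported above with mean PR-AUC and ROC-AUC scores. The sketch below shows one way such per-patient scores could be computed and averaged; averaging across patients and the input format are assumptions for illustration, not the official evaluation script.

# Sketch of summarizing relapse-detection results with PR-AUC and ROC-AUC.
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

def mean_auc_scores(per_patient):
    # per_patient: list of (labels, scores) pairs; labels in {0, 1} mark relapse
    # periods, scores are predicted relapse likelihoods from a digital-phenotype model.
    pr = [average_precision_score(y, s) for y, s in per_patient]
    roc = [roc_auc_score(y, s) for y, s in per_patient]
    return float(np.mean(pr)), float(np.mean(roc))

# Example with random scores for two hypothetical patients:
rng = np.random.default_rng(0)
data = [(rng.integers(0, 2, 100), rng.random(100)) for _ in range(2)]
print(mean_auc_scores(data))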