This paper presents the 6th edition of the “Drone-vs-Bird” detection challenge, jointly organized with the WOSDETC workshop within the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023. The main objective of the challenge is to advance the current state of the art in detecting the presence of one or more Unmanned Aerial Vehicles (UAVs) in real video scenes, while facing challenging conditions such as moving cameras, disturbing environmental factors, and the presence of birds flying in the foreground. For this purpose, a video dataset was provided for training the proposed solutions, and a separate test dataset was released a few days before the challenge deadline to assess their performance. The dataset has continually expanded over consecutive installments of the Drone-vs-Bird challenge and remains openly available to the research community for non-commercial purposes. The challenge attracted novel signal processing solutions, mainly based on deep learning algorithms. The paper presents the results achieved by the teams that successfully participated in the 2023 challenge, offering a concise overview of the state of the art in the field of drone detection using video signal processing. Additionally, the paper provides valuable insights into potential directions for future research, building upon the main strengths and limitations of the solutions presented by the participating teams.
{"title":"The Drone-vs-Bird Detection Grand Challenge at ICASSP 2023: A Review of Methods and Results","authors":"Angelo Coluccia;Alessio Fascista;Lars Sommer;Arne Schumann;Anastasios Dimou;Dimitrios Zarpalas","doi":"10.1109/OJSP.2024.3379073","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3379073","url":null,"abstract":"This paper presents the 6th edition of the “Drone-vs-Bird” detection challenge, jointly organized with the WOSDETC workshop within the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023. The main objective of the challenge is to advance the current state-of-the-art in detecting the presence of one or more Unmanned Aerial Vehicles (UAVs) in real video scenes, while facing challenging conditions such as moving cameras, disturbing environmental factors, and the presence of birds flying in the foreground. For this purpose, a video dataset was provided for training the proposed solutions, and a separate test dataset was released a few days before the challenge deadline to assess their performance. The dataset has continually expanded over consecutive installments of the Drone-vs-Bird challenge and remains openly available to the research community, for non-commercial purposes. The challenge attracted novel signal processing solutions, mainly based on deep learning algorithms. The paper illustrates the results achieved by the teams that successfully participated in the 2023 challenge, offering a concise overview of the state-of-the-art in the field of drone detection using video signal processing. Additionally, the paper provides valuable insights into potential directions for future research, building upon the main pros and limitations of the solutions presented by the participating teams.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"766-779"},"PeriodicalIF":2.9,"publicationDate":"2024-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10475518","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141448001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-03-18, DOI: 10.1109/OJSP.2024.3378593
Mike D. Thornton;Danilo P. Mandic;Tobias J. Reichenbach
The electroencephalogram (EEG) offers a non-invasive means by which a listener's auditory system may be monitored during continuous speech perception. Reliable auditory-EEG decoders could facilitate the objective diagnosis of hearing disorders, or find applications in cognitively-steered hearing aids. Previously, we developed decoders for the ICASSP Auditory EEG Signal Processing Grand Challenge (SPGC). These decoders placed first in the match-mismatch task: given a short temporal segment of EEG recordings and two candidate speech segments, the task is to identify which of the two speech segments is temporally aligned, or matched, with the EEG segment. The decoders made use of cortical responses to the speech envelope, as well as speech-related frequency-following responses, to relate the EEG recordings to the speech stimuli. Here we comprehensively document the methods by which the decoders were developed. We extend our previous analysis by exploring the association between speaker characteristics (pitch and sex) and classification accuracy, and provide a full statistical analysis of the final performance of the decoders as evaluated on a held-out portion of the dataset. Finally, the generalisation capabilities of the decoders are characterised by evaluating them on an entirely different dataset, which contains EEG recorded under a variety of speech-listening conditions. The results show that the match-mismatch decoders achieve accurate and robust classification, and can even serve as auditory attention decoders without additional training.
{"title":"Decoding Envelope and Frequency-Following EEG Responses to Continuous Speech Using Deep Neural Networks","authors":"Mike D. Thornton;Danilo P. Mandic;Tobias J. Reichenbach","doi":"10.1109/OJSP.2024.3378593","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3378593","url":null,"abstract":"The electroencephalogram (EEG) offers a non-invasive means by which a listener's auditory system may be monitored during continuous speech perception. Reliable auditory-EEG decoders could facilitate the objective diagnosis of hearing disorders, or find applications in cognitively-steered hearing aids. Previously, we developed decoders for the ICASSP Auditory EEG Signal Processing Grand Challenge (SPGC). These decoders placed first in the match-mismatch task: given a short temporal segment of EEG recordings, and two candidate speech segments, the task is to identify which of the two speech segments is temporally aligned, or matched, with the EEG segment. The decoders made use of cortical responses to the speech envelope, as well as speech-related frequency-following responses, to relate the EEG recordings to the speech stimuli. Here we comprehensively document the methods by which the decoders were developed. We extend our previous analysis by exploring the association between speaker characteristics (pitch and sex) and classification accuracy, and provide a full statistical analysis of the final performance of the decoders as evaluated on a heldout portion of the dataset. Finally, the generalisation capabilities of the decoders are characterised, by evaluating them using an entirely different dataset which contains EEG recorded under a variety of speech-listening conditions. The results show that the match-mismatch decoders achieve accurate and robust classification accuracies, and they can even serve as auditory attention decoders without additional training.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"700-716"},"PeriodicalIF":2.9,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10474145","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141447973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-03-18, DOI: 10.1109/OJSP.2024.3378594
Liuyin Yang;Bob Van Dyck;Marc M. Van Hulle
Speech envelope reconstruction from EEG has been shown to hold clinical potential for assessing speech intelligibility. Linear models are commonly used to this end, but they have recently been outperformed in reconstruction scores by non-linear deep neural networks, particularly by dilated convolutional networks. This study presents Sea-Wave, a WaveNet-based architecture for speech envelope reconstruction that outperforms the state-of-the-art model. Our model is an extension of our submission to the Auditory EEG Challenge of the ICASSP Signal Processing Grand Challenge 2023. We improve upon our prior work by evaluating model components and hyperparameters through an ablation study and a hyperparameter search, respectively. Our best subject-independent model achieves a Pearson correlation of 22.58% on seen and 11.58% on unseen subjects. After subject-specific fine-tuning, we find an average relative improvement of 30% for the seen subjects and a Pearson correlation of 56.57% for the best seen subject. Finally, we explore several model visualizations to obtain a better understanding of the model, the differences across subjects, and the EEG features that relate to auditory perception.
{"title":"Sea-Wave: Speech Envelope Reconstruction From Auditory EEG With an Adapted WaveNet","authors":"Liuyin Yang;Bob Van Dyck;Marc M. Van Hulle","doi":"10.1109/OJSP.2024.3378594","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3378594","url":null,"abstract":"Speech envelope reconstruction from EEG is shown to bear clinical potential to assess speech intelligibility. Linear models are commonly used to this end, but they have recently been outperformed in reconstruction scores by non-linear deep neural networks, particularly by dilated convolutional networks. This study presents Sea-Wave, a WaveNet-based architecture for speech envelope reconstruction that outperforms the state-of-the-art model. Our model is an extension of our submission for the Auditory EEG Challenge of the ICASSP Signal Processing Grand Challenge 2023. We improve upon our prior work by evaluating model components and hyperparameters through an ablation study and hyperparameter search, respectively. Our best subject-independent model achieves a Pearson correlation of 22.58% on seen and 11.58% on unseen subjects. After subject-specific fine-tuning, we find an average relative improvement of 30% for the seen subjects and a Pearson correlation of 56.57% for the best seen subject.Finally, we explore several model visualizations to obtain a better understanding of the model, the differences across subjects and the EEG features that relate to auditory perception.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"686-699"},"PeriodicalIF":2.9,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10474194","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141448000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The ADReSS-M Signal Processing Grand Challenge was held at the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023). The challenge targeted difficult automatic prediction problems of great societal and medical relevance, namely, the detection of Alzheimer's Dementia (AD) and the estimation of cognitive test scores. Participants were invited to create models for the assessment of cognitive function based on spontaneous speech data. Most of these models employed signal processing and machine learning methods. The ADReSS-M challenge was designed to assess the extent to which predictive models built on speech in one language generalise to another language. The language data compiled and made available for ADReSS-M comprised English, for model training, and Greek, for model testing and validation. To the best of our knowledge, no previous shared research task had investigated acoustic features of the speech signal or linguistic characteristics in the context of multilingual AD detection. This paper describes the context of the ADReSS-M challenge, its data sets, its predictive tasks, the evaluation methodology we employed, our baseline models and results, and the top five submissions. The paper concludes with a summary discussion of the ADReSS-M results and our critical assessment of the future outlook in this field.
{"title":"An Overview of the ADReSS-M Signal Processing Grand Challenge on Multilingual Alzheimer's Dementia Recognition Through Spontaneous Speech","authors":"Saturnino Luz;Fasih Haider;Davida Fromm;Ioulietta Lazarou;Ioannis Kompatsiaris;Brian MacWhinney","doi":"10.1109/OJSP.2024.3378595","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3378595","url":null,"abstract":"The ADReSS-M Signal Processing Grand Challenge was held at the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023. The challenge targeted difficult automatic prediction problems of great societal and medical relevance, namely, the detection of Alzheimer's Dementia (AD) and the estimation of cognitive test scoress. Participants were invited to create models for the assessment of cognitive function based on spontaneous speech data. Most of these models employed signal processing and machine learning methods. The ADReSS-M challenge was designed to assess the extent to which predictive models built based on speech in one language generalise to another language. The language data compiled and made available for ADReSS-M comprised English, for model training, and Greek, for model testing and validation. To the best of our knowledge no previous shared research task investigated acoustic features of the speech signal or linguistic characteristics in the context of multilingual AD detection. This paper describes the context of the ADReSS-M challenge, its data sets, its predictive tasks, the evaluation methodology we employed, our baseline models and results, and the top five submissions. The paper concludes with a summary discussion of the ADReSS-M results, and our critical assessment of the future outlook in this field.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"738-749"},"PeriodicalIF":2.9,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10474114","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141447982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The ICASSP 2023 Deep Noise Suppression (DNS) Challenge marks the fifth edition of the DNS challenge series. DNS challenges were organized from 2019 to 2023 to foster research in the field of DNS. Previous DNS challenges were held at INTERSPEECH 2020, ICASSP 2021, INTERSPEECH 2021, and ICASSP 2022. This challenge aims to advance models capable of jointly addressing denoising, dereverberation, and interfering talker suppression, with separate tracks focusing on headset and speakerphone scenarios. The challenge facilitates personalized deep noise suppression by providing accompanying enrollment clips for each test clip, each containing the primary talker only, which can be used to compute a speaker identity feature and disentangle primary and interfering speech. While the majority of models submitted to the challenge were personalized, the same teams emerged as the winners in both tracks. The best models improved the challenge score by 0.145 and 0.141 in the two tracks, respectively, relative to the unprocessed noisy audio of the blind test set. We present additional analysis and draw comparisons to previous challenges.
{"title":"ICASSP 2023 Deep Noise Suppression Challenge","authors":"Harishchandra Dubey;Ashkan Aazami;Vishak Gopal;Babak Naderi;Sebastian Braun;Ross Cutler;Alex Ju;Mehdi Zohourian;Min Tang;Mehrsa Golestaneh;Robert Aichner","doi":"10.1109/OJSP.2024.3378602","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3378602","url":null,"abstract":"The ICASSP 2023 Deep Noise Suppression (DNS) Challenge marks the fifth edition of the DNS challenge series. DNS challenges were organized from 2019 to 2023 to foster research in the field of DNS. Previous DNS challenges were held at INTERSPEECH 2020, ICASSP 2021, INTERSPEECH 2021, and ICASSP 2022. This challenge aims to advance models capable of jointly addressing denoising, dereverberation, and interfering talker suppression, with separate tracks focusing on headset and speakerphone scenarios. The challenge facilitates personalized deep noise suppression by providing accompanying enrollment clips for each test clip, each containing the primary talker only, which can be used to compute a speaker identity feature and disentangle primary and interfering speech. While the majority of models submitted to the challenge were personalized, the same teams emerged as the winners in both tracks. The best models demonstrated improvements of 0.145 and 0.141 in the challenge's score, respectively, when compared to the noisy blind test set. We present additional analysis and draw comparisons to previous challenges.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"725-737"},"PeriodicalIF":2.9,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10474162","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141447985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-03-18, DOI: 10.1109/OJSP.2024.3378604
Miguel Bhagubai;Lauren Swinnen;Evy Cleeren;Wim Van Paesschen;Maarten De Vos;Christos Chatzichristos
The diagnosis of epilepsy can be confirmed in-hospital via video-electroencephalography (vEEG). Currently, long-term monitoring is limited to self-reporting of seizure occurrences by the patients. In recent years, the development of wearable sensors has allowed monitoring patients outside of specialized environments. The application of wearable EEG devices for monitoring epileptic patients in ambulatory environments is still hampered by the low performance achieved by automated seizure detection frameworks. In this work, we present the results of a seizure detection grand challenge, organized as an attempt to stimulate the development of automated methodologies for the detection of seizures on wearable EEG. The main drawback in developing wearable EEG seizure detection algorithms is the lack of data needed for training such frameworks. In this challenge, we provided participants with a large dataset of 42 patients with focal epilepsy, containing continuous recordings of behind-the-ear (bte) EEG. We challenged participants to develop a robust seizure classifier based on wearable EEG. Additionally, we proposed a subtask to motivate data-centric approaches to improving the training and performance of seizure detection models. An additional dataset, containing recordings from a bte-EEG wearable device, was employed to evaluate the work submitted by participants. In this paper, we present the five best-scoring methodologies. The best-performing approach was a feature-based decision tree ensemble algorithm with data augmentation via Fourier Transform surrogates. The organization of this challenge is of high importance for improving automated EEG analysis for epilepsy diagnosis, working towards implementing these technologies in clinical practice.
{"title":"Towards Automated Seizure Detection With Wearable EEG – Grand Challenge","authors":"Miguel Bhagubai;Lauren Swinnen;Evy Cleeren;Wim Van Paesschen;Maarten De Vos;Christos Chatzichristos","doi":"10.1109/OJSP.2024.3378604","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3378604","url":null,"abstract":"The diagnosis of epilepsy can be confirmed in-hospital via video-electroencephalography (vEEG). Currently, long-term monitoring is limited to self-reporting seizure occurrences by the patients. In recent years, the development of wearable sensors has allowed monitoring patients outside of specialized environments. The application of wearable EEG devices for monitoring epileptic patients in ambulatory environments is still dampened by the low performance achieved by automated seizure detection frameworks. In this work, we present the results of a seizure detection grand challenge, organized as an attempt to stimulate the development of automated methodologies for detection of seizures on wearable EEG. The main drawbacks for developing wearable EEG seizure detection algorithms is the lack of data needed for training such frameworks. In this challenge, we provided participants with a large dataset of 42 patients with focal epilepsy, containing continuous recordings of behind-the-ear (bte) EEG. We challenged participants to develop a robust seizure classifier based on wearable EEG. Additionally, we proposed a subtask in order to motivate data-centric approaches to improve the training and performance of seizure detection models. An additional dataset, containing recordings with a bte-EEG wearable device, was employed to evaluate the work submitted by participants. In this paper, we present the five best scoring methodologies. The best performing approach was a feature-based decision tree ensemble algorithm with data augmentation via Fourier Transform surrogates. The organization of this challenge is of high importance for improving automated EEG analysis for epilepsy diagnosis, working towards implementing these technologies in clinical practice.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"717-724"},"PeriodicalIF":2.9,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10474132","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141447983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Radio environment maps (REMs) hold a central role in optimizing wireless network deployment, enhancing network performance, and ensuring effective spectrum management. Conventional REM prediction methods are either excessively time-consuming, e.g., ray tracing, or inaccurate, e.g., statistical models, limiting their adoption in modern, inherently dynamic wireless networks. Deep learning-based REM prediction has recently attracted considerable attention as an appealing, accurate, and time-efficient alternative. However, existing works on REM prediction using deep learning are either confined to 2D maps or use a relatively small dataset. In this paper, we introduce a runtime-efficient REM prediction framework based on U-Nets, trained on a large-scale dataset of 3D maps. In addition, data preprocessing steps are investigated to further refine the REM prediction accuracy. The proposed U-Net framework, along with the preprocessing steps, is evaluated in the context of the 2023 IEEE ICASSP Signal Processing Grand Challenge, namely, the First Pathloss Radio Map Prediction Challenge.
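For orientation, the sketch below shows a minimal 2-D U-Net of the kind the abstract refers to, mapping a stack of input map channels to a per-pixel pathloss prediction. The depth, channel widths, and the choice of two input channels are illustrative assumptions and do not reproduce the architecture or preprocessing evaluated in the paper.

```python
# Hedged sketch: tiny encoder-decoder U-Net with skip connections for radio map
# prediction. Input channels stand in for rasterized map layers (e.g. buildings,
# terrain); every hyperparameter here is an assumption for illustration.
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU())

class TinyUNet(nn.Module):
    def __init__(self, in_channels=2, base=16):
        super().__init__()
        self.enc1 = conv_block(in_channels, base)
        self.enc2 = conv_block(base, base * 2)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(base * 2, base * 4)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, 1, 1)                 # one pathloss value per pixel

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)

# Toy usage: predict a 256x256 pathloss map from two input map channels
model = TinyUNet()
print(model(torch.randn(1, 2, 256, 256)).shape)  # torch.Size([1, 1, 256, 256])
```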