Pub Date: 2022-11-10 | DOI: 10.3389/frsip.2022.984901
Recent advances in photoacoustic blind source spectral unmixing approaches and the enhanced detection of endogenous tissue chromophores
Valeria Grasso, Hafiz Wajahat Hassan, P. Mirtaheri, Regine Willumeit-Römer, J. Jose
Recently, learning-based algorithms have played a crucial role in extracting features of vital importance from multi-spectral photoacoustic imaging. In particular, advances in spectral photoacoustic unmixing algorithms can identify tissue biomarkers without a priori information, which has the potential to enhance the diagnosis and treatment of a large number of diseases. Here, we investigated the latest progress in spectral photoacoustic unmixing approaches. We evaluated the sensitivity of different unsupervised Blind Source Separation (BSS) techniques, such as Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Non-negative Matrix Factorization (NNMF), in distinguishing absorbers from spectral photoacoustic imaging (sPAI). In addition, the performance of a recently developed superpixel photoacoustic unmixing (SPAX) framework was examined in detail. Near-infrared spectroscopy (NIRS) was used to validate the performance of the different unmixing algorithms. Although NNMF showed superior unmixing performance to PCA and ICA in terms of correlation and processing time, it is still prone to unmixing misinterpretation due to spectral coloring artifacts. The SPAX framework, which also compensates for the spectral coloring effect, showed improved sensitivity and specificity of the unmixed components. In addition, SPAX reveals the most and least prominent tissue components from sPAI at a volumetric scale in a data-driven way. Phantom measurements and in vivo studies were conducted to benchmark the performance of the BSS algorithms and the SPAX framework.
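As an illustration of the kind of BSS technique evaluated here, the following is a minimal sketch of NNMF-based spectral unmixing, assuming scikit-learn; the wavelength count, pixel grid, component number, and variable names are illustrative stand-ins, not the authors' SPAX pipeline.

```python
# Minimal sketch of NNMF-based spectral unmixing of multi-spectral
# photoacoustic data; shapes and data are illustrative placeholders.
import numpy as np
from sklearn.decomposition import NMF

n_wavelengths, n_pixels = 16, 128 * 128              # hypothetical acquisition
pa_images = np.random.rand(n_wavelengths, n_pixels)  # stand-in for real sPAI data

# Factorize pa_images ≈ spectra @ abundances with non-negativity on both
# factors, so each component reads as an absorber spectrum plus its
# spatial abundance map.
model = NMF(n_components=3, init="nndsvd", max_iter=500)
spectra = model.fit_transform(pa_images)    # (n_wavelengths, 3) endmember spectra
abundances = model.components_              # (3, n_pixels) per-pixel abundances
abundance_maps = abundances.reshape(3, 128, 128)
```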
{"title":"Recent advances in photoacoustic blind source spectral unmixing approaches and the enhanced detection of endogenous tissue chromophores","authors":"Valeria Grasso, Hafiz Wajahat Hassan, P. Mirtaheri, Regine Willumeit-Rӧmer, J. Jose","doi":"10.3389/frsip.2022.984901","DOIUrl":"https://doi.org/10.3389/frsip.2022.984901","url":null,"abstract":"Recently, the development of learning-based algorithms has shown a crucial role to extract features of vital importance from multi-spectral photoacoustic imaging. In particular, advances in spectral photoacoustic unmixing algorithms can identify tissue biomarkers without a priori information. This has the potential to enhance the diagnosis and treatment of a large number of diseases. Here, we investigated the latest progress within spectral photoacoustic unmixing approaches. We evaluated the sensitivity of different unsupervised Blind Source Separation (BSS) techniques such as Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Non-negative Matrix Factorization (NNMF) to distinguish absorbers from spectral photoacoustic imaging. Besides, the performance of a recently developed superpixel photoacoustic unmixing (SPAX) framework has been also examined in detail. Near-infrared spectroscopy (NIRS) has been used to validate the performance of the different unmixing algorithms. Although the NNMF has shown superior unmixing performance than PCA and ICA in terms of correlation and processing time, this is still prone to unmixing misinterpretation due to spectral coloring artifact. Thus, the SPAX framework, which also compensates for the spectral coloring effect, has shown improved sensitivity and specificity of the unmixed components. In addition, the SPAX also reveals the most and less prominent tissue components from sPAI at a volumetric scale in a data-driven way. Phantom experimental measurements and in vivo studies have been conducted to benchmark the performance of the BSS algorithms and the SPAX framework.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86933783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-10-17 | DOI: 10.3389/frsip.2022.1014700
Multimodal detection of typical absence seizures in home environment with wearable electrodes
C. Chatzichristos, Lauren Swinnen, Jaiver Macea, Miguel M. C. Bhagubai, W. van Paesschen, M. de Vos
Patients with absence epilepsy fail to report almost 90% of their seizures. The clinical gold standard for assessing absence seizures is video-electroencephalography (vEEG) recorded in the hospital, an expensive and obtrusive procedure that also requires extended reviewing time. Wearable sensors that record electroencephalography (EEG), accelerometer, and gyroscope signals have been used, for the first time, to monitor epileptic patients in their home environment. We developed a pipeline for accurate and robust absence seizure detection that reduces the review time of the long recordings. Our results show that multimodal analysis of absence seizures can improve robustness to false alarms while retaining high sensitivity in seizure detection.
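One plausible way multimodal analysis can suppress false alarms is to gate EEG-based detections with the motion sensors, since motion artifacts are a common source of spurious EEG events and typical absence seizures involve behavioral arrest. The sketch below is a hypothetical fusion rule with invented thresholds and window scores, not the authors' pipeline.

```python
# Hedged sketch of a window-level multimodal fusion rule: keep an EEG-based
# detection only when accelerometer energy is low. Thresholds and scores are
# illustrative assumptions.
import numpy as np

def fuse_detections(eeg_scores, accel_magnitude, eeg_thresh=0.8, motion_thresh=2.0):
    """Per-window fusion: high EEG evidence AND low movement energy."""
    eeg_scores = np.asarray(eeg_scores)
    accel_magnitude = np.asarray(accel_magnitude)
    return (eeg_scores > eeg_thresh) & (accel_magnitude < motion_thresh)

# Windows 2 and 3 have high EEG scores, but window 3 is movement-heavy,
# so only window 2 survives fusion.
detections = fuse_detections([0.1, 0.2, 0.9, 0.95], [0.5, 0.4, 0.3, 5.0])
print(detections)  # [False False  True False]
```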
{"title":"Multimodal detection of typical absence seizures in home environment with wearable electrodes","authors":"C. Chatzichristos, Lauren Swinnen, Jaiver Macea, Miguel M. C. Bhagubai, W. van Paesschen, M. de Vos","doi":"10.3389/frsip.2022.1014700","DOIUrl":"https://doi.org/10.3389/frsip.2022.1014700","url":null,"abstract":"Patients with absence epilepsy fail to report almost 90% of their seizures. The clinical gold standard to assess absence seizures is video-electroencephalography (vEEG) recorded in the hospital, an expensive and obtrusive procedure which requires also extended reviewing time. Wearable sensors, which allow the recording of electroencephalography (EEG), accelerometer and gyroscope have been used to monitor epileptic patients in their home environment for the first time. We developed a pipeline for accurate and robust absence seizure detection while reducing the review time of the long recordings. Our results show that multimodal analysis of absence seizures can improve the robustness to false alarms, while retaining a high sensitivity in seizure detection.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89516934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-10-05 | DOI: 10.3389/frsip.2022.986293
COVID-19 respiratory sound analysis and classification using audio textures
Leticia Silva, Carlos Valadão, L. Lampier, D. Delisle-Rodríguez, Eliete Caldeira, T. Bastos-Filho, S. Krishnan
Since the COVID-19 outbreak, a major scientific effort has been made by researchers and companies worldwide to develop digital diagnostic tools that screen for this disease using biomedical signals such as cough and speech. Joint time–frequency feature extraction techniques and machine learning (ML)-based models have been widely explored for respiratory diseases such as influenza, pertussis, and COVID-19 to find biomarkers in acoustic sounds generated by the human respiratory system. In recent years, a variety of computationally efficient local texture descriptors have been introduced for discriminating textures, such as local binary patterns and local ternary patterns, among others. In this work, we propose an audio texture analysis of sounds emitted by subjects suspected of COVID-19 infection, using time–frequency spectrograms. This feature extraction approach has not been widely used for biomedical sounds, particularly for COVID-19 or other respiratory diseases. We hypothesize that textural sound analysis based on local binary patterns and local ternary patterns yields a better classification model by discriminating between people with COVID-19 and healthy subjects. Cough, speech, and breath sounds from the INTERSPEECH 2021 ComParE and Cambridge KDD databases were processed and analyzed to evaluate the proposed feature extraction method with ML techniques and distinguish COVID-19-positive from COVID-19-negative sounds. The results were evaluated in terms of the unweighted average recall (UAR) and show that the proposed method performs well for cough, speech, and breath sound classification, with UARs of up to 100.00%, 60.67%, and 95.00%, respectively, making it an effective tool for preliminary COVID-19 screening.
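A minimal sketch of the texture-descriptor idea follows, assuming SciPy, scikit-image, and scikit-learn; the FFT size, LBP parameters, and data are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: local binary patterns on a time-frequency spectrogram, summarized
# as a histogram feature, with UAR (macro-averaged recall) as the metric.
import numpy as np
from scipy.signal import spectrogram
from skimage.feature import local_binary_pattern
from sklearn.metrics import recall_score

fs = 16000
audio = np.random.randn(fs)                 # stand-in for a cough recording
_, _, spec = spectrogram(audio, fs=fs, nperseg=256)
log_spec = np.log1p(spec)

# Encode each spectrogram pixel by comparing it with P neighbors at radius R,
# then use the histogram of pattern codes as the texture descriptor.
P, R = 8, 1
lbp = local_binary_pattern(log_spec, P, R, method="uniform")
feature, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)

# Unweighted average recall (UAR) is simply macro-averaged recall.
y_true, y_pred = [0, 0, 1, 1], [0, 1, 1, 1]
uar = recall_score(y_true, y_pred, average="macro")
```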
{"title":"COVID-19 respiratory sound analysis and classification using audio textures","authors":"Leticia Silva, Carlos Valadão, L. Lampier, D. Delisle-Rodríguez, Eliete Caldeira, T. Bastos-Filho, S. Krishnan","doi":"10.3389/frsip.2022.986293","DOIUrl":"https://doi.org/10.3389/frsip.2022.986293","url":null,"abstract":"Since the COVID-19 outbreak, a major scientific effort has been made by researchers and companies worldwide to develop a digital diagnostic tool to screen this disease through some biomedical signals, such as cough, and speech. Joint time–frequency feature extraction techniques and machine learning (ML)-based models have been widely explored in respiratory diseases such as influenza, pertussis, and COVID-19 to find biomarkers from human respiratory system-generated acoustic sounds. In recent years, a variety of techniques for discriminating textures and computationally efficient local texture descriptors have been introduced, such as local binary patterns and local ternary patterns, among others. In this work, we propose an audio texture analysis of sounds emitted by subjects in suspicion of COVID-19 infection using time–frequency spectrograms. This approach of the feature extraction method has not been widely used for biomedical sounds, particularly for COVID-19 or respiratory diseases. We hypothesize that this textural sound analysis based on local binary patterns and local ternary patterns enables us to obtain a better classification model by discriminating both people with COVID-19 and healthy subjects. Cough, speech, and breath sounds from the INTERSPEECH 2021 ComParE and Cambridge KDD databases have been processed and analyzed to evaluate our proposed feature extraction method with ML techniques in order to distinguish between positive or negative for COVID-19 sounds. The results have been evaluated in terms of an unweighted average recall (UAR). The results show that the proposed method has performed well for cough, speech, and breath sound classification, with a UAR up to 100.00%, 60.67%, and 95.00%, respectively, to infer COVID-19 infection, which serves as an effective tool to perform a preliminary screening of COVID-19.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"2000 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88291081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-10-03 | DOI: 10.3389/frsip.2022.975932
Balancing bias and performance in polyphonic piano transcription systems
L. Marták, Rainer Kelz, Gerhard Widmer
Current state-of-the-art methods for polyphonic piano transcription tend to use high-capacity neural networks. Most models are trained “end-to-end” and learn a mapping from audio input to pitch labels. They require large training corpora consisting of many audio recordings of different piano models and temporally aligned pitch labels. Previous work has shown that neural network-based systems struggle to generalize to unseen note combinations, as they tend to learn note combinations by heart. Semi-supervised linear matrix decomposition is a frequently used alternative approach to piano transcription, one that does not have this particular drawback. The disadvantages of linear methods start to show when they encounter recordings of pieces played on unseen pianos, a scenario where neural networks seem relatively untroubled. A recently proposed approach called “Differentiable Dictionary Search” (DDS) combines the modeling capacity of deep density models with the linear mixing model of matrix decomposition to balance the mutual advantages and disadvantages of the standalone approaches. This makes it better suited to model unseen sources, while generalization to unseen note combinations should be unaffected, because the mixing model is not learned and thus cannot acquire a corpus bias. In its initially proposed form, however, DDS uses computational resources too inefficiently to be applied to piano music transcription. To reduce computational demands and memory requirements, we propose a number of modifications. These adjustments finally enable a fair comparison of our modified DDS variant with a semi-supervised matrix decomposition baseline, as well as a state-of-the-art deep neural network system trained end-to-end. In systematic experiments with both musical and “unmusical” piano recordings (real musical pieces and unusual chords), we provide quantitative and qualitative analyses at the frame level, characterizing the behavior of the modified approach, along with a comparison to several related methods. The results generally show the fundamental promise of the model and, in particular, demonstrate improvement in situations where a corpus bias incurred by learning from musical material of a specific genre would be problematic.
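The linear mixing model at the heart of dictionary-based transcription can be sketched as frame-wise non-negative least squares against fixed per-note spectral templates; the sizes and random stand-in data below are illustrative, and the sketch deliberately omits the deep density models that DDS adds on top.

```python
# Hedged sketch of the linear mixing model used by matrix-decomposition
# transcription: magnitude spectrogram ≈ fixed note dictionary @ activations.
import numpy as np
from scipy.optimize import nnls

n_bins, n_notes, n_frames = 512, 88, 100
dictionary = np.abs(np.random.randn(n_bins, n_notes))    # stand-in note templates
magnitudes = np.abs(np.random.randn(n_bins, n_frames))   # stand-in input spectrogram

# Frame-wise non-negative least squares: find activations[:, t] >= 0 such
# that dictionary @ activations[:, t] ≈ magnitudes[:, t].
activations = np.column_stack(
    [nnls(dictionary, magnitudes[:, t])[0] for t in range(n_frames)]
)
# Thresholding activations per note and frame yields frame-level pitch labels.
```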
{"title":"Balancing bias and performance in polyphonic piano transcription systems","authors":"L. Marták, Rainer Kelz , Gerhard Widmer ","doi":"10.3389/frsip.2022.975932","DOIUrl":"https://doi.org/10.3389/frsip.2022.975932","url":null,"abstract":"Current state-of-the-art methods for polyphonic piano transcription tend to use high capacity neural networks. Most models are trained “end-to-end”, and learn a mapping from audio input to pitch labels. They require large training corpora consisting of many audio recordings of different piano models and temporally aligned pitch labels. It has been shown in previous work that neural network-based systems struggle to generalize to unseen note combinations, as they tend to learn note combinations by heart. Semi-supervised linear matrix decomposition is a frequently used alternative approach to piano transcription–one that does not have this particular drawback. The disadvantages of linear methods start to show when they encounter recordings of pieces played on unseen pianos, a scenario where neural networks seem relatively untroubled. A recently proposed approach called “Differentiable Dictionary Search” (DDS) combines the modeling capacity of deep density models with the linear mixing model of matrix decomposition in order to balance the mutual advantages and disadvantages of the standalone approaches, making it better suited to model unseen sources, while generalization to unseen note combinations should be unaffected, because the mixing model is not learned, and thus cannot acquire a corpus bias. In its initially proposed form, however, DDS is too inefficient in utilizing computational resources to be applied to piano music transcription. To reduce computational demands and memory requirements, we propose a number of modifications. These adjustments finally enable a fair comparison of our modified DDS variant with a semi-supervised matrix decomposition baseline, as well as a state-of-the-art, deep neural network based system that is trained end-to-end. In systematic experiments with both musical and “unmusical” piano recordings (real musical pieces and unusual chords), we provide quantitative and qualitative analyses at the frame level, characterizing the behavior of the modified approach, along with a comparison to several related methods. The results will generally show the fundamental promise of the model, and in particular demonstrate improvement in situations where a corpus bias incurred by learning from musical material of a specific genre would be problematic.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"56 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84903881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-09-30 | DOI: 10.3389/frsip.2022.904866
On the Relative Importance of Visual and Spatial Audio Rendering on VR Immersion
Thomas Potter, Z. Cvetković, E. De Sena
A study was performed using a virtual environment to investigate the relative importance of spatial audio fidelity and video resolution for perceived audio-visual quality and immersion. Subjects wore a head-mounted display and headphones and were presented with a virtual environment featuring music and speech stimuli, using three levels each of spatial audio quality and video resolution. Spatial audio was rendered monaurally, binaurally with head-tracking, and binaurally with head-tracking and room acoustic rendering. Video was rendered at resolutions of 0.5, 1.5, and 2.5 megapixels per eye. Results showed that both video resolution and spatial audio rendering had a statistically significant effect on both immersion and audio-visual quality. Most strikingly, under the conditions tested in the experiment, adding room acoustic rendering to head-tracked binaural audio yielded the same improvement in immersion as increasing the video resolution five-fold, from 0.5 to 2.5 megapixels per eye.
{"title":"On the Relative Importance of Visual and Spatial Audio Rendering on VR Immersion","authors":"Thomas Potter, Z. Cvetković, E. De Sena","doi":"10.3389/frsip.2022.904866","DOIUrl":"https://doi.org/10.3389/frsip.2022.904866","url":null,"abstract":"A study was performed using a virtual environment to investigate the relative importance of spatial audio fidelity and video resolution on perceived audio-visual quality and immersion. Subjects wore a head-mounted display and headphones and were presented with a virtual environment featuring music and speech stimuli using three levels each of spatial audio quality and video resolution. Spatial audio was rendered monaurally, binaurally with head-tracking, and binaurally with head-tracking and room acoustic rendering. Video was rendered at resolutions of 0.5 megapixels per eye, 1.5 megapixels per eye, and 2.5 megapixels per eye. Results showed that both video resolution and spatial audio rendering had a statistically significant effect on both immersion and audio-visual quality. Most strikingly, the results showed that under the conditions that were tested in the experiment, the addition of room acoustic rendering to head-tracked binaural audio had the same improvement on immersion as increasing the video resolution five-fold, from 0.5 megapixels per eye to 2.5 megapixels per eye.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"116 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87666816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-09-30 | DOI: 10.3389/frsip.2022.884541
Estimation of the Optimal Spherical Harmonics Order for the Interpolation of Head-Related Transfer Functions Sampled on Sparse Irregular Grids
David Bau, Johannes M. Arend, C. Pörschmann
Conventional individual head-related transfer function (HRTF) measurements are demanding in terms of measurement time and equipment. For more flexibility, free body movement (FBM) measurement systems provide an easy-to-use way to measure full-spherical HRTF datasets with less effort. However, having no fixed measurement installation implies that the HRTFs are not sampled on a predefined regular grid but rely on the individual movements of the subject. Furthermore, depending on the measurement effort, a rather small number of measurements can be expected, ranging, for example, from 50 to 150 sampling points. Spherical harmonics (SH) interpolation has been extensively studied recently as one method to obtain full-spherical datasets from such sparse measurements, but previous studies primarily focused on regular full-spherical sampling grids. For irregular grids, it remains unclear up to which spatial order meaningful SH coefficients can be calculated and how the resulting interpolation error compares to regular grids. This study investigates SH interpolation of selected irregular grids obtained from HRTF measurements with an FBM system. Intending to derive general constraints for SH interpolation of irregular grids, the study analyzes how the variation of the SH order affects the interpolation results. Moreover, the study demonstrates the importance of Tikhonov regularization for SH interpolation, which is popular for solving ill-posed numerical problems associated with such irregular grids. As a key result, the study shows that the optimal SH order that minimizes the interpolation error depends mainly on the grid and the regularization strength but is almost independent of the selected HRTF set. Based on these results, the study proposes to determine the optimal SH order by minimizing the interpolation error of a reference HRTF set sampled on the sparse and irregular FBM grid. Finally, the study verifies the proposed method for estimating the optimal SH order by comparing interpolation results of irregular and equivalent regular grids, showing that the differences are small when the SH interpolation is optimally parameterized.
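One common formulation of Tikhonov-regularized SH fitting on an irregular grid is sketched below, assuming SciPy's complex spherical harmonics; the order, regularization strength, grids, and stand-in HRTF values are illustrative assumptions, not the paper's exact method or its order-selection procedure.

```python
# Hedged sketch: fit SH coefficients to sparse, irregular HRTF samples via
# regularized least squares, then interpolate by re-evaluating the basis.
import numpy as np
from scipy.special import sph_harm

def sh_matrix(order, azimuth, colatitude):
    """Complex SH basis: one row per direction, one column per (n, m)."""
    cols = []
    for n in range(order + 1):
        for m in range(-n, n + 1):
            cols.append(sph_harm(m, n, azimuth, colatitude))
    return np.stack(cols, axis=1)

order, eps = 4, 1e-4
az = np.random.uniform(0, 2 * np.pi, 80)           # 80 irregular directions
col = np.random.uniform(0.1, np.pi - 0.1, 80)
h = np.random.randn(80) + 1j * np.random.randn(80)  # stand-in HRTF bin values

Y = sh_matrix(order, az, col)                      # (80, (order+1)^2)
# Tikhonov-regularized normal equations: (Y^H Y + eps I) c = Y^H h
c = np.linalg.solve(Y.conj().T @ Y + eps * np.eye(Y.shape[1]), Y.conj().T @ h)

# Interpolate to any new direction by evaluating the SH basis there.
h_interp = sh_matrix(order, np.array([0.3]), np.array([1.2])) @ c
```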
{"title":"Estimation of the Optimal Spherical Harmonics Order for the Interpolation of Head-Related Transfer Functions Sampled on Sparse Irregular Grids","authors":"David Bau, Johannes M. Arend, C. Pörschmann","doi":"10.3389/frsip.2022.884541","DOIUrl":"https://doi.org/10.3389/frsip.2022.884541","url":null,"abstract":"Conventional individual head-related transfer function (HRTF) measurements are demanding in terms of measurement time and equipment. For more flexibility, free body movement (FBM) measurement systems provide an easy-to-use way to measure full-spherical HRTF datasets with less effort. However, having no fixed measurement installation implies that the HRTFs are not sampled on a predefined regular grid but rely on the individual movements of the subject. Furthermore, depending on the measurement effort, a rather small number of measurements can be expected, ranging, for example, from 50 to 150 sampling points. Spherical harmonics (SH) interpolation has been extensively studied recently as one method to obtain full-spherical datasets from such sparse measurements, but previous studies primarily focused on regular full-spherical sampling grids. For irregular grids, it remains unclear up to which spatial order meaningful SH coefficients can be calculated and how the resulting interpolation error compares to regular grids. This study investigates SH interpolation of selected irregular grids obtained from HRTF measurements with an FBM system. Intending to derive general constraints for SH interpolation of irregular grids, the study analyzes how the variation of the SH order affects the interpolation results. Moreover, the study demonstrates the importance of Tikhonov regularization for SH interpolation, which is popular for solving ill-posed numerical problems associated with such irregular grids. As a key result, the study shows that the optimal SH order that minimizes the interpolation error depends mainly on the grid and the regularization strength but is almost independent of the selected HRTF set. Based on these results, the study proposes to determine the optimal SH order by minimizing the interpolation error of a reference HRTF set sampled on the sparse and irregular FBM grid. Finally, the study verifies the proposed method for estimating the optimal SH order by comparing interpolation results of irregular and equivalent regular grids, showing that the differences are small when the SH interpolation is optimally parameterized.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81030124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-09-29 | DOI: 10.3389/frsip.2022.884384
Deep learning based markerless motion tracking as a clinical tool for movement disorders: Utility, feasibility and early experience
R. N. Tien, Anand Tekriwal, Dylan J. Calame, Jonathan P. Platt, Sunderland Baker, L. Seeberger, Drew S Kern, A. Person, S. Ojemann, John A. Thompson, D. Kramer
Clinical assessments of movement disorders currently rely on the administration of rating scales, which, while clinimetrically validated and reliable, depend on clinicians’ subjective analyses, resulting in interrater differences. Intraoperative microelectrode recording for deep brain stimulation targeting similarly relies on clinicians’ subjective evaluations of movement-related neural activity. Digital motion tracking can improve the diagnosis, assessment, and treatment of movement disorders by generating objective, standardized measures of patients’ kinematics. Motion tracking with concurrent neural recording also enables motor neuroscience studies to elucidate the neurophysiology underlying movements. Despite these promises, motion tracking has seen limited adoption in clinical settings due to the drawbacks of conventional motion tracking systems and practical limitations associated with clinical settings. However, recent advances in deep learning based computer vision algorithms have made accurate, robust markerless motion tracking viable in any setting where digital video can be captured. Here, we review and discuss the potential clinical applications and technical limitations of deep learning based markerless motion tracking methods, with a focus on DeepLabCut (DLC), an open-source software package that has been extensively applied in animal neuroscience research. We first provide a general overview of DLC, discuss its present usage, and describe the advantages that DLC confers over other motion tracking methods for clinical use. We then present preliminary results from three ongoing studies that demonstrate the use of DLC for 1) movement disorder patient assessment and diagnosis, 2) intraoperative motor mapping for deep brain stimulation targeting, and 3) intraoperative neural and kinematic recording for basic human motor neuroscience.
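For orientation, a typical DLC project strings together a handful of entry points, as described in the DLC documentation; in the sketch below, the project name, experimenter, and video paths are placeholders, and a real project involves GUI-based labeling and GPU training between these calls.

```python
# Hedged sketch of a typical DeepLabCut workflow; function names follow the
# DLC documentation, while the project details are placeholders.
import deeplabcut

config_path = deeplabcut.create_new_project(
    "movement-assessment", "clinician", ["/data/videos/patient01.mp4"]
)
deeplabcut.extract_frames(config_path)          # sample frames for labeling
deeplabcut.label_frames(config_path)            # opens the manual-labeling GUI
deeplabcut.create_training_dataset(config_path)
deeplabcut.train_network(config_path)           # GPU training
deeplabcut.analyze_videos(config_path, ["/data/videos/patient02.mp4"])
# analyze_videos writes per-frame keypoint coordinates that can feed
# downstream kinematic analyses.
```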
{"title":"Deep learning based markerless motion tracking as a clinical tool for movement disorders: Utility, feasibility and early experience","authors":"R. N. Tien, Anand Tekriwal, Dylan J. Calame, Jonathan P. Platt, Sunderland Baker, L. Seeberger, Drew S Kern, A. Person, S. Ojemann, John A. Thompson, D. Kramer","doi":"10.3389/frsip.2022.884384","DOIUrl":"https://doi.org/10.3389/frsip.2022.884384","url":null,"abstract":"Clinical assessments of movement disorders currently rely on the administration of rating scales, which, while clinimetrically validated and reliable, rely on clinicians’ subjective analyses, resulting in interrater differences. Intraoperative microelectrode recording for deep brain stimulation targeting similarly relies on clinicians’ subjective evaluations of movement-related neural activity. Digital motion tracking can improve the diagnosis, assessment, and treatment of movement disorders by generating objective, standardized measures of patients’ kinematics. Motion tracking with concurrent neural recording also enables motor neuroscience studies to elucidate the neurophysiology underlying movements. Despite these promises, motion tracking has seen limited adoption in clinical settings due to the drawbacks of conventional motion tracking systems and practical limitations associated with clinical settings. However, recent advances in deep learning based computer vision algorithms have made accurate, robust markerless motion. tracking viable in any setting where digital video can be captured. Here, we review and discuss the potential clinical applications and technical limitations of deep learning based markerless motion tracking methods with a focus on DeepLabCut (DLC), an open-source software package that has been extensively applied in animal neuroscience research. We first provide a general overview of DLC, discuss its present usage, and describe the advantages that DLC confers over other motion tracking methods for clinical use. We then present our preliminary results from three ongoing studies that demonstrate the use of DLC for 1) movement disorder patient assessment and diagnosis, 2) intraoperative motor mapping for deep brain stimulation targeting and 3) intraoperative neural and kinematic recording for basic human motor neuroscience.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"128 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90550429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-09-21 | DOI: 10.3389/frsip.2022.999457
Performance evaluation of automatic speech recognition systems on integrated noise-network distorted speech
E. Kumalija, Y. Nakamoto
In VoIP applications, such as Interactive Voice Response and VoIP-phone conversation transcription, speech signals are degraded not only by environmental noise but also by transmission network quality and by distortions induced by encoding and decoding algorithms. Therefore, automatic speech recognition (ASR) systems need to handle integrated noise-network distorted speech. In this study, we present a comparative analysis of a speech-to-text system trained on clean speech against one trained on integrated noise-network distorted speech. Training an ASR model on a noise-network distorted speech dataset improves its robustness. Although the performance of an ASR model trained on clean speech depends on the noise type, this is not the case when the noise is further distorted by network transmission. The model trained on noise-network distorted speech exhibited a 60% improvement in word error rate (WER), match error rate (MER), and word information lost (WIL) over the model trained on clean speech. Furthermore, the ASR model trained with noise-network distorted speech could tolerate a jitter of less than 20% and a packet loss of less than 15% without a decrease in performance. However, WER, MER, and WIL increased in proportion to the jitter and packet loss as these exceeded 20% and 15%, respectively. Overall, the model trained on noise-network distorted speech exhibited higher robustness than the model trained on clean speech, and could also tolerate signal-to-noise ratio (SNR) values of 5 dB and above without loss of performance, independent of noise type.
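The three reported metrics can be computed with, for example, the jiwer package; the transcript strings below are illustrative stand-ins.

```python
# Minimal sketch of computing WER, MER, and WIL with the jiwer package.
import jiwer

reference = "please transfer me to customer support"
hypothesis = "please transfer me to customer sport"

wer = jiwer.wer(reference, hypothesis)   # word error rate
mer = jiwer.mer(reference, hypothesis)   # match error rate
wil = jiwer.wil(reference, hypothesis)   # word information lost
print(f"WER={wer:.3f}  MER={mer:.3f}  WIL={wil:.3f}")
```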
{"title":"Performance evaluation of automatic speech recognition systems on integrated noise-network distorted speech","authors":"E. Kumalija, Y. Nakamoto","doi":"10.3389/frsip.2022.999457","DOIUrl":"https://doi.org/10.3389/frsip.2022.999457","url":null,"abstract":"In VoIP applications, such as Interactive Voice Response and VoIP-phone conversation transcription, speech signals are degraded not only by environmental noise but also by transmission network quality, and distortions induced by encoding and decoding algorithms. Therefore, there is a need for automatic speech recognition (ASR) systems to handle integrated noise-network distorted speech. In this study, we present a comparative analysis of a speech-to-text system trained on clean speech against one trained on integrated noise-network distorted speech. Training an ASR model on noise-network distorted speech dataset improves its robustness. Although the performance of an ASR model trained on clean speech depends on noise type, this is not the case when noise is further distorted by network transmission. The model trained on noise-network distorted speech exhibited a 60% improvement rate in the word error rate (WER), word match rate (MER), and word information lost (WIL) over the model trained on clean speech. Furthermore, the ASR model trained with noise-network distorted speech could tolerate a jitter of less than 20% and a packet loss of less than 15%, without a decrease in performance. However, WER, MER, and WIL increased in proportion to the jitter and packet loss as they exceeded 20% and 15%, respectively. Additionally, the model trained on noise-network distorted speech exhibited higher robustness compared to that trained on clean speech. The ASR model trained on noise-network distorted speech can also tolerate signal-to-noise (SNR) values of 5 dB and above, without the loss of performance, independent of noise type.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"37 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90113584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-09-13 | DOI: 10.3389/frsip.2022.981453
Deep unfolding for multi-measurement vector convolutional sparse coding to denoise unobtrusive electrocardiography signals
E. Fotiadou, Raoul Melaet, R. Vullings
The use of wearable technology for monitoring a person’s health status is becoming increasingly popular. Unfortunately, this technology typically suffers from low-quality measurement data, making the acquisition of, for instance, the heart rate from electrocardiography data recorded by non-adhesive sensors challenging. Such sensors are prone to motion artifacts, and hence the electrocardiogram (ECG) measurements require signal processing to enhance their quality and enable detection of the heart rate. Over the last years, considerable progress has been made in the use of deep neural networks for many signal processing challenges. Yet, for healthcare applications their success is limited because the large datasets required to train these networks are typically not available. In this paper we propose a method to embed prior knowledge about the measurement data and problem statement in the network architecture to make it more data efficient. Our method aims to enhance the quality of ECG signals by describing them from the perspective of a multi-measurement vector convolutional sparse coding model and using a deep unfolded neural network architecture to learn the model parameters. The sparse coding problem was solved using the Alternating Direction Method of Multipliers. Our method was evaluated by denoising ECG signals that were corrupted by adding noise to clean ECG signals, subsequently detecting the heartbeats in the denoised data, and comparing these to the heartbeats and derived heart rate variability features detected in the clean ECG signals. This evaluation demonstrated a signal-to-noise ratio (SNR) improvement ranging from 17 to 27 dB and an improvement in heart rate detection (i.e., F1 score) ranging between 0 and 50%, where the range depends on the SNR of the input signals. The performance of the method was compared to that of a denoising encoder-decoder neural network and a wavelet-based denoising method, showing equivalent and better performance, respectively.
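For reference, ADMM applied to a basic sparse coding problem, min over x of 0.5||Dx - y||^2 + lam||x||_1, reduces to the three-step iteration sketched below; the dictionary, data, and hyperparameters are illustrative stand-ins, and this single-measurement, non-convolutional form is the textbook subproblem, not the paper's unfolded multi-measurement model.

```python
# Hedged sketch of ADMM for sparse coding: quadratic x-update, soft-threshold
# z-update (proximal L1 step), and dual ascent on u.
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_sparse_code(D, y, lam=0.1, rho=1.0, n_iter=100):
    n = D.shape[1]
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    A = D.T @ D + rho * np.eye(n)        # factor reused in every x-update
    Dty = D.T @ y
    for _ in range(n_iter):
        x = np.linalg.solve(A, Dty + rho * (z - u))  # quadratic subproblem
        z = soft_threshold(x + u, lam / rho)         # proximal L1 step
        u = u + x - z                                # dual update
    return z

D = np.random.randn(64, 128)
y = D @ (np.random.rand(128) * (np.random.rand(128) < 0.05))  # sparse ground truth
codes = admm_sparse_code(D, y)
```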
{"title":"Deep unfolding for multi-measurement vector convolutional sparse coding to denoise unobtrusive electrocardiography signals","authors":"E. Fotiadou, Raoul Melaet, R. Vullings","doi":"10.3389/frsip.2022.981453","DOIUrl":"https://doi.org/10.3389/frsip.2022.981453","url":null,"abstract":"The use of wearable technology for monitoring a person’s health status is becoming increasingly more popular. Unfortunately, this technology typically suffers from low-quality measurement data, making the acquisition of, for instance, the heart rate based on electrocardiography data from non-adhesive sensors challenging. Such sensors are prone to motion artifacts and hence the electrocardiogram (ECG) measurements require signal processing to enhance their quality and enable detection of the heart rate. Over the last years, considerable progress has been made in the use of deep neural networks for many signal processing challenges. Yet, for healthcare applications their success is limited because the required large datasets to train these networks are typically not available. In this paper we propose a method to embed prior knowledge about the measurement data and problem statement in the network architecture to make it more data efficient. Our proposed method aims to enhance the quality of ECG signals by describing ECG signals from the perspective of a multi-measurement vector convolutional sparse coding model and use a deep unfolded neural network architecture to learn the model parameters. The sparse coding problem was solved using the Alternation Direction Method of Multipliers. Our method was evaluated by denoising ECG signals, that were corrupted by adding noise to clean ECG signals, and subsequently detecting the heart beats from the denoised data and compare these to the heartbeats and derived heartrate variability features detected in the clean ECG signals. This evaluation demonstrated an improved in the signal-to-noise ratio (SNR) improvement ranging from 17 to 27 dB and an improvement in heart rate detection (i.e. F1 score) ranging between 0 and 50%, where the range depends on the SNR of the input signals. The performance of the method was compared to that of a denoising encoder-decoder neural network and a wavelet-based denoising method, showing equivalent and better performance, respectively.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"64 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89700685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-09-08 | DOI: 10.3389/frsip.2022.884254
MPEG-5 LCEVC for 3.0 Next Generation Digital TV in Brazil
L. Ciccarelli, S. Ferrara, Florian Maurer
TV 3.0 is the next-generation digital broadcasting system developed in Brazil by the SBTVD Forum. The ambition of TV 3.0 is significantly higher than that of previous generations, as it targets the delivery of IP-based signals for applications such as 8K, HDR, virtual and augmented reality, video enhancement, and scalability. To deliver such services, more advanced and flexible compression technologies are required. MPEG-5 Part 2 Low Complexity Enhancement Video Coding (LCEVC) is a new video coding standard that works in combination with a separate video standard (e.g., H.264/AVC [H.264/AVC], H.265/HEVC [H.265/HEVC], H.266/VVC [H.266/VVC], AV1 [AV1]) to enhance the quality of a video. In the typical scenario, the enhanced quality takes the form of a higher-resolution video obtained by adding details coded in an enhancement layer to a lower-resolution version of the same video coded in a base layer. The LCEVC format also provides the ability to signal the bit depth of the base layer independently from that of the enhancement layer, allowing HDR at bit depths of up to 14 bits. MPEG-5 LCEVC was selected by the SBTVD committee as part of TV 3.0 in December 2021. In this paper we describe the proposal submitted for LCEVC in response to the SBTVD Call for Proposals (CfP) for TV 3.0.
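The layered principle can be sketched in a few lines: downscale and encode a base, upscale the decoded base, and code the residual as an enhancement layer. In the toy sketch below, average pooling and nearest-neighbor upscaling stand in for real codecs and upsamplers, and the residual is kept lossless, so reconstruction is exact; this is not the normative LCEVC toolchain.

```python
# Hedged sketch of the two-layer idea behind LCEVC-style enhancement coding.
import numpy as np

def downscale2x(img):
    # 2x2 average pooling (stand-in for downscaling before base encoding).
    return img.reshape(img.shape[0] // 2, 2, img.shape[1] // 2, 2).mean(axis=(1, 3))

def upscale2x(img):
    # Nearest-neighbor upscaling (stand-in for the decoder-side upsampler).
    return np.kron(img, np.ones((2, 2)))

frame = np.random.rand(128, 128)       # stand-in for a source frame
base = downscale2x(frame)              # would be coded by e.g. H.264/AVC
reconstructed_base = upscale2x(base)   # decoder-side upscaled base
residual = frame - reconstructed_base  # coded as the enhancement layer
output = reconstructed_base + residual # full-resolution reconstruction
assert np.allclose(output, frame)      # exact here because residual is lossless
```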
{"title":"MPEG-5 LCEVC for 3.0 Next Generation Digital TV in Brazil","authors":"L. Ciccarelli, S. Ferrara, Florian Maurer","doi":"10.3389/frsip.2022.884254","DOIUrl":"https://doi.org/10.3389/frsip.2022.884254","url":null,"abstract":"TV 3.0 is the next generation digital broadcasting system developed in Brazil by the SBTVD Forum. The ambition of TV 3.0 is significantly higher than that of previous generations as it targets the delivery of IP based signals for applications, such as 8K, HDR, virtual and augmented reality, video enhancement and scalability. To deliver such services, more advanced and flexible compression technologies are required. MPEG-5 Part 2 Low Complexity Enhancement Video Coding (LCEVC) is a new video coding standard which works in combination with a separate video standard (e.g., H.264/AVC [H.264/AVC], H.265/HEVC [H.265/HEVC], H.266/VVC [H.266/VVC], AV1 [AV1]) to enhance the quality of a video. In the typical scenario, the enhanced quality is provided in terms of a higher resolution video obtained by adding details coded through an enhancement layer to a lower resolution version of the same video coded through a base layer. The LCEVC format also provides the ability to signal the bit-depth of the base layer independently from that of the enhancement layer and allowing up to 14-bit depth HDR. MPEG-5 LCEVC has been selected by the SBTVD committee as part of the TV 3.0 in December 2021. In this paper we describe the proposal submitted for LCEVC in response to the SBTVD Call for Proposals (CfP) for TV 3.0.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81003708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}