
Latest publications in Frontiers in signal processing

Recent advances in photoacoustic blind source spectral unmixing approaches and the enhanced detection of endogenous tissue chromophores
Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2022-11-10 DOI: 10.3389/frsip.2022.984901
Valeria Grasso, Hafiz Wajahat Hassan, P. Mirtaheri, Regine Willumeit-Rӧmer, J. Jose
Recently, the development of learning-based algorithms has played a crucial role in extracting features of vital importance from multi-spectral photoacoustic imaging. In particular, advances in spectral photoacoustic unmixing algorithms can identify tissue biomarkers without a priori information, which has the potential to enhance the diagnosis and treatment of a large number of diseases. Here, we investigated the latest progress in spectral photoacoustic unmixing approaches. We evaluated the sensitivity of different unsupervised Blind Source Separation (BSS) techniques, such as Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Non-negative Matrix Factorization (NNMF), in distinguishing absorbers from spectral photoacoustic imaging. In addition, the performance of a recently developed superpixel photoacoustic unmixing (SPAX) framework has also been examined in detail. Near-infrared spectroscopy (NIRS) has been used to validate the performance of the different unmixing algorithms. Although NNMF has shown unmixing performance superior to PCA and ICA in terms of correlation and processing time, it remains prone to misinterpretation due to the spectral coloring artifact. The SPAX framework, which also compensates for the spectral coloring effect, has shown improved sensitivity and specificity of the unmixed components. In addition, SPAX reveals the most and least prominent tissue components from sPAI at a volumetric scale in a data-driven way. Phantom experimental measurements and in vivo studies have been conducted to benchmark the performance of the BSS algorithms and the SPAX framework.
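To make the BSS comparison concrete, here is a small sketch (not the authors' code) that unmixes two synthetic absorber spectra from simulated multi-wavelength pixels using scikit-learn's PCA, FastICA, and NMF; the spectra, wavelength range, and all parameters are hypothetical stand-ins for a real sPAI acquisition.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA, NMF

rng = np.random.default_rng(0)
wavelengths = np.linspace(680, 970, 30)       # nm, hypothetical sPAI wavelength sweep

# Two hypothetical absorber spectra (stand-ins for endogenous chromophores)
s1 = np.exp(-((wavelengths - 760) / 40) ** 2)
s2 = np.exp(-((wavelengths - 900) / 60) ** 2)
S = np.stack([s1, s2])                         # (2, n_wavelengths)

# 500 pixels, each a non-negative mixture of the two spectra plus noise
A = rng.uniform(0, 1, size=(500, 2))
X = A @ S + 0.01 * rng.standard_normal((500, len(wavelengths)))

# PCA/ICA recover spectra as principal axes / mixing columns; NNMF additionally
# enforces the non-negativity that physical absorption spectra obey.
pca_spectra = PCA(n_components=2).fit(X).components_
ica_spectra = FastICA(n_components=2, random_state=0).fit(X).mixing_.T
nmf_spectra = NMF(n_components=2, init="nndsvd", max_iter=500).fit(np.clip(X, 0, None)).components_

# Correlate each true spectrum with each NNMF estimate (best match per source)
corr = np.abs(np.corrcoef(np.vstack([S, nmf_spectra]))[:2, 2:])
print("NNMF vs. true spectra |corr|:\n", corr.round(2))
```

On data like this, the non-negativity constraint is what lets NNMF return physically interpretable spectra, whereas PCA and ICA components may contain negative values.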
Citations: 5
Multimodal detection of typical absence seizures in home environment with wearable electrodes
Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2022-10-17 DOI: 10.3389/frsip.2022.1014700
C. Chatzichristos, Lauren Swinnen, Jaiver Macea, Miguel M. C. Bhagubai, W. van Paesschen, M. de Vos
Patients with absence epilepsy fail to report almost 90% of their seizures. The clinical gold standard for assessing absence seizures is video-electroencephalography (vEEG) recorded in the hospital, an expensive and obtrusive procedure that also requires extended reviewing time. Wearable sensors that record electroencephalography (EEG), accelerometer, and gyroscope signals have been used, for the first time, to monitor epileptic patients in their home environment. We developed a pipeline for accurate and robust absence seizure detection that also reduces the review time of the long recordings. Our results show that multimodal analysis of absence seizures can improve robustness to false alarms while retaining high sensitivity in seizure detection.
Citations: 6
COVID-19 respiratory sound analysis and classification using audio textures
Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2022-10-05 DOI: 10.3389/frsip.2022.986293
Leticia Silva, Carlos Valadão, L. Lampier, D. Delisle-Rodríguez, Eliete Caldeira, T. Bastos-Filho, S. Krishnan
Since the COVID-19 outbreak, a major scientific effort has been made by researchers and companies worldwide to develop digital diagnostic tools that screen for this disease using biomedical signals such as cough and speech. Joint time–frequency feature extraction techniques and machine learning (ML)-based models have been widely explored for respiratory diseases such as influenza, pertussis, and COVID-19 to find biomarkers in the acoustic sounds generated by the human respiratory system. In recent years, a variety of texture-discrimination techniques and computationally efficient local texture descriptors have been introduced, such as local binary patterns and local ternary patterns. In this work, we propose an audio texture analysis of sounds emitted by subjects suspected of COVID-19 infection, based on time–frequency spectrograms. This feature extraction approach has not been widely used for biomedical sounds, particularly for COVID-19 or other respiratory diseases. We hypothesize that textural sound analysis based on local binary patterns and local ternary patterns yields a better classification model by discriminating between people with COVID-19 and healthy subjects. Cough, speech, and breath sounds from the INTERSPEECH 2021 ComParE and Cambridge KDD databases have been processed and analyzed to evaluate the proposed feature extraction method with ML techniques in order to distinguish between COVID-19-positive and COVID-19-negative sounds. The results have been evaluated in terms of unweighted average recall (UAR). The proposed method performed well for cough, speech, and breath sound classification, with UARs of up to 100.00%, 60.67%, and 95.00%, respectively, for inferring COVID-19 infection, and thus serves as an effective tool for preliminary COVID-19 screening.
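The spectrogram-plus-texture idea can be sketched in a few lines (this is an illustration of the general technique, not the authors' pipeline): compute a time–frequency spectrogram, apply a local binary pattern operator, and summarize the codes as a fixed-length histogram feature vector. The synthetic signal and all parameter choices below are hypothetical.

```python
import numpy as np
from scipy.signal import spectrogram
from skimage.feature import local_binary_pattern

fs = 16000
t = np.arange(fs) / fs                                    # 1 s of synthetic audio
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 440 * t) * np.exp(-3 * t) + 0.05 * rng.standard_normal(fs)

# Time-frequency representation (log-magnitude spectrogram)
f, frames, Sxx = spectrogram(x, fs=fs, nperseg=256, noverlap=128)
img = np.log1p(Sxx)

# Uniform LBP: each pixel is coded by comparing it with P neighbors at radius R
P, R = 8, 1
codes = local_binary_pattern(img, P, R, method="uniform")

# Histogram of codes = fixed-length texture descriptor for an ML classifier
hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
print("LBP texture feature vector:", hist.round(3))
```

The resulting histogram (P + 2 bins for the "uniform" variant) would then be fed to a classifier such as an SVM; a local ternary pattern descriptor would be computed analogously with a three-way threshold.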
Citations: 0
Balancing bias and performance in polyphonic piano transcription systems
Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2022-10-03 DOI: 10.3389/frsip.2022.975932
L. Marták, Rainer Kelz, Gerhard Widmer
Current state-of-the-art methods for polyphonic piano transcription tend to use high-capacity neural networks. Most models are trained “end-to-end” and learn a mapping from audio input to pitch labels. They require large training corpora consisting of many audio recordings of different piano models and temporally aligned pitch labels. Previous work has shown that neural network-based systems struggle to generalize to unseen note combinations, as they tend to learn note combinations by heart. Semi-supervised linear matrix decomposition is a frequently used alternative approach to piano transcription, one that does not have this particular drawback. The disadvantages of linear methods start to show when they encounter recordings of pieces played on unseen pianos, a scenario in which neural networks seem relatively untroubled. A recently proposed approach called “Differentiable Dictionary Search” (DDS) combines the modeling capacity of deep density models with the linear mixing model of matrix decomposition to balance the mutual advantages and disadvantages of the standalone approaches. This makes it better suited to modeling unseen sources, while generalization to unseen note combinations should be unaffected, because the mixing model is not learned and thus cannot acquire a corpus bias. In its initially proposed form, however, DDS uses computational resources too inefficiently to be applied to piano music transcription. To reduce computational demands and memory requirements, we propose a number of modifications. These adjustments finally enable a fair comparison of our modified DDS variant with a semi-supervised matrix decomposition baseline, as well as with a state-of-the-art deep neural network-based system trained end-to-end. In systematic experiments with both musical and “unmusical” piano recordings (real musical pieces and unusual chords), we provide quantitative and qualitative analyses at the frame level, characterizing the behavior of the modified approach, along with a comparison to several related methods. The results generally show the fundamental promise of the model and, in particular, demonstrate improvement in situations where a corpus bias incurred by learning from musical material of a specific genre would be problematic.
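The "linear mixing model" that the semi-supervised baseline relies on can be illustrated with a minimal sketch (this shows the generic fixed-dictionary decomposition idea, not DDS itself): a dictionary W holds one spectral template per note and stays fixed, and only the non-negative activations H are estimated from the mixture spectrogram via multiplicative updates. Dictionary, sizes, and data below are entirely synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bins, n_notes, n_frames = 64, 5, 40

# Fixed "dictionary": one magnitude-spectrum template per note (random peaks here)
W = np.abs(rng.standard_normal((n_bins, n_notes))) + 0.1

# Ground-truth activations: a few sparse note onsets
H_true = np.zeros((n_notes, n_frames))
H_true[rng.integers(0, n_notes, 10), rng.integers(0, n_frames, 10)] = 1.0
V = W @ H_true + 1e-3                          # observed mixture spectrogram

# Multiplicative updates for H only (W stays fixed = semi-supervised),
# minimizing the Euclidean distance ||V - WH||^2 while keeping H >= 0
H = np.ones((n_notes, n_frames))
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)

err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {err:.4f}")
```

Because the mixing model W @ H is fixed and linear, the decomposition cannot memorize note combinations, which is exactly the property the abstract credits to decomposition-based methods; DDS replaces the fixed templates with deep density models while keeping this linear mixing stage.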
Citations: 1
On the Relative Importance of Visual and Spatial Audio Rendering on VR Immersion
Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2022-09-30 DOI: 10.3389/frsip.2022.904866
Thomas Potter, Z. Cvetković, E. De Sena
A study was performed using a virtual environment to investigate the relative importance of spatial audio fidelity and video resolution for perceived audio-visual quality and immersion. Subjects wore a head-mounted display and headphones and were presented with a virtual environment featuring music and speech stimuli using three levels each of spatial audio quality and video resolution. Spatial audio was rendered monaurally, binaurally with head-tracking, and binaurally with head-tracking and room acoustic rendering. Video was rendered at resolutions of 0.5 megapixels per eye, 1.5 megapixels per eye, and 2.5 megapixels per eye. Results showed that both video resolution and spatial audio rendering had a statistically significant effect on both immersion and audio-visual quality. Most strikingly, under the conditions tested in the experiment, adding room acoustic rendering to head-tracked binaural audio improved immersion as much as increasing the video resolution five-fold, from 0.5 megapixels per eye to 2.5 megapixels per eye.
Citations: 6
Estimation of the Optimal Spherical Harmonics Order for the Interpolation of Head-Related Transfer Functions Sampled on Sparse Irregular Grids
Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2022-09-30 DOI: 10.3389/frsip.2022.884541
David Bau, Johannes M. Arend, C. Pörschmann
Conventional individual head-related transfer function (HRTF) measurements are demanding in terms of measurement time and equipment. For more flexibility, free body movement (FBM) measurement systems provide an easy-to-use way to measure full-spherical HRTF datasets with less effort. However, having no fixed measurement installation implies that the HRTFs are not sampled on a predefined regular grid but depend on the individual movements of the subject. Furthermore, depending on the measurement effort, only a rather small number of measurements can be expected, ranging, for example, from 50 to 150 sampling points. Spherical harmonics (SH) interpolation has recently been studied extensively as a method for obtaining full-spherical datasets from such sparse measurements, but previous studies primarily focused on regular full-spherical sampling grids. For irregular grids, it remains unclear up to which spatial order meaningful SH coefficients can be calculated and how the resulting interpolation error compares to that of regular grids. This study investigates SH interpolation of selected irregular grids obtained from HRTF measurements with an FBM system. Intending to derive general constraints for SH interpolation of irregular grids, the study analyzes how varying the SH order affects the interpolation results. Moreover, the study demonstrates the importance of Tikhonov regularization for SH interpolation, a popular technique for solving the ill-posed numerical problems associated with such irregular grids. As a key result, the study shows that the optimal SH order minimizing the interpolation error depends mainly on the grid and the regularization strength but is almost independent of the selected HRTF set. Based on these results, the study proposes determining the optimal SH order by minimizing the interpolation error of a reference HRTF set sampled on the sparse and irregular FBM grid. Finally, the study verifies the proposed method for estimating the optimal SH order by comparing interpolation results on irregular and equivalent regular grids, showing that the differences are small when the SH interpolation is optimally parameterized.
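The core numerical step, order-limited SH fitting with Tikhonov regularization on an irregular grid, can be sketched as follows (a minimal illustration, assuming nothing about the paper's implementation; the smooth directional target stands in for a single frequency bin of an HRTF, and the order, regularization strength, and 80-point grid are illustrative choices).

```python
import numpy as np

try:  # SciPy >= 1.15: sph_harm_y(n, m, colatitude, azimuth)
    from scipy.special import sph_harm_y
    def sh(m, n, azi, col):
        return sph_harm_y(n, m, col, azi)
except ImportError:  # older SciPy: sph_harm(m, n, azimuth, colatitude)
    from scipy.special import sph_harm
    def sh(m, n, azi, col):
        return sph_harm(m, n, azi, col)

def sh_matrix(order, azi, col):
    """Complex SH basis matrix, shape (len(azi), (order + 1) ** 2)."""
    return np.stack([sh(m, n, azi, col)
                     for n in range(order + 1)
                     for m in range(-n, n + 1)], axis=1)

rng = np.random.default_rng(1)
N = 80                                        # sparse, irregular sampling
azi = rng.uniform(0, 2 * np.pi, N)            # azimuth
col = np.arccos(rng.uniform(-1, 1, N))        # colatitude, uniform on the sphere

# Smooth directional target (an order-1 function) plus measurement noise
truth = lambda a, c: 1 + np.cos(c) + 0.5 * np.sin(c) * np.cos(a)
y = truth(azi, col) + 0.01 * rng.standard_normal(N)

order, eps = 4, 1e-3                          # SH order and regularization strength
Y = sh_matrix(order, azi, col)

# Tikhonov-regularized least squares: c = (Y^H Y + eps I)^(-1) Y^H y
coeffs = np.linalg.solve(Y.conj().T @ Y + eps * np.eye(Y.shape[1]), Y.conj().T @ y)

# Interpolate to 200 unseen directions and measure the error
azi2 = rng.uniform(0, 2 * np.pi, 200)
col2 = np.arccos(rng.uniform(-1, 1, 200))
est = (sh_matrix(order, azi2, col2) @ coeffs).real
rmse = np.sqrt(np.mean((est - truth(azi2, col2)) ** 2))
print(f"interpolation RMSE at SH order {order}: {rmse:.4f}")
```

Sweeping `order` in a loop and recording the RMSE of a reference dataset is the spirit of the paper's proposal for picking the optimal order: without the `eps * I` term, the normal-equation matrix can become near-singular for irregular grids at higher orders, which is where Tikhonov regularization matters.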
Citations: 1
Deep learning based markerless motion tracking as a clinical tool for movement disorders: Utility, feasibility and early experience
Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2022-09-29 DOI: 10.3389/frsip.2022.884384
R. N. Tien, Anand Tekriwal, Dylan J. Calame, Jonathan P. Platt, Sunderland Baker, L. Seeberger, Drew S Kern, A. Person, S. Ojemann, John A. Thompson, D. Kramer
Clinical assessments of movement disorders currently rely on the administration of rating scales, which, while clinimetrically validated and reliable, rely on clinicians’ subjective analyses, resulting in interrater differences. Intraoperative microelectrode recording for deep brain stimulation targeting similarly relies on clinicians’ subjective evaluations of movement-related neural activity. Digital motion tracking can improve the diagnosis, assessment, and treatment of movement disorders by generating objective, standardized measures of patients’ kinematics. Motion tracking with concurrent neural recording also enables motor neuroscience studies to elucidate the neurophysiology underlying movement. Despite these promises, motion tracking has seen limited adoption in clinical settings due to the drawbacks of conventional motion tracking systems and the practical constraints of clinical environments. However, recent advances in deep learning based computer vision algorithms have made accurate, robust markerless motion tracking viable in any setting where digital video can be captured. Here, we review and discuss the potential clinical applications and technical limitations of deep learning based markerless motion tracking methods with a focus on DeepLabCut (DLC), an open-source software package that has been extensively applied in animal neuroscience research. We first provide a general overview of DLC, discuss its present usage, and describe the advantages that DLC confers over other motion tracking methods for clinical use. We then present our preliminary results from three ongoing studies that demonstrate the use of DLC for 1) movement disorder patient assessment and diagnosis, 2) intraoperative motor mapping for deep brain stimulation targeting and 3) intraoperative neural and kinematic recording for basic human motor neuroscience.
Deep learning based markerless motion tracking as a clinical tool for movement disorders: Utility, feasibility and early experience
Pub Date : 2022-09-29 DOI: 10.3389/frsip.2022.884384
R. N. Tien, Anand Tekriwal, Dylan J. Calame, Jonathan P. Platt, Sunderland Baker, L. Seeberger, Drew S Kern, A. Person, S. Ojemann, John A. Thompson, D. Kramer
Citations: 2
Performance evaluation of automatic speech recognition systems on integrated noise-network distorted speech
Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2022-09-21 DOI: 10.3389/frsip.2022.999457
E. Kumalija, Y. Nakamoto
In VoIP applications, such as Interactive Voice Response and VoIP-phone conversation transcription, speech signals are degraded not only by environmental noise but also by transmission network quality and by distortions induced by encoding and decoding algorithms. Therefore, there is a need for automatic speech recognition (ASR) systems to handle integrated noise-network distorted speech. In this study, we present a comparative analysis of a speech-to-text system trained on clean speech against one trained on integrated noise-network distorted speech. Training an ASR model on a noise-network distorted speech dataset improves its robustness. Although the performance of an ASR model trained on clean speech depends on noise type, this is not the case when noise is further distorted by network transmission. The model trained on noise-network distorted speech exhibited a 60% improvement rate in the word error rate (WER), match error rate (MER), and word information lost (WIL) over the model trained on clean speech. Furthermore, the ASR model trained with noise-network distorted speech could tolerate a jitter of less than 20% and a packet loss of less than 15% without a decrease in performance. However, WER, MER, and WIL increased in proportion to jitter and packet loss once these exceeded 20% and 15%, respectively. Additionally, the model trained on noise-network distorted speech exhibited higher robustness compared to that trained on clean speech. The ASR model trained on noise-network distorted speech can also tolerate signal-to-noise ratio (SNR) values of 5 dB and above, without loss of performance, independent of noise type.
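The word error rate reported above is the standard word-level Levenshtein (edit) distance normalized by the reference length. A minimal sketch of that metric follows; this is not the authors' evaluation code, and `word_error_rate` is an illustrative name:

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance between the two
    transcripts, normalized by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub,              # substitution or match
                           dp[i - 1][j] + 1,  # deletion
                           dp[i][j - 1] + 1)  # insertion
    return dp[len(ref)][len(hyp)] / len(ref)

# One deleted word out of six reference words -> WER = 1/6.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

MER and WIL are computed from the same alignment counts, penalizing mismatches and lost word information rather than raw edits.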
Citations: 1
Deep unfolding for multi-measurement vector convolutional sparse coding to denoise unobtrusive electrocardiography signals
Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2022-09-13 DOI: 10.3389/frsip.2022.981453
E. Fotiadou, Raoul Melaet, R. Vullings
The use of wearable technology for monitoring a person’s health status is becoming increasingly popular. Unfortunately, this technology typically suffers from low-quality measurement data, making the acquisition of, for instance, the heart rate based on electrocardiography data from non-adhesive sensors challenging. Such sensors are prone to motion artifacts, and hence the electrocardiogram (ECG) measurements require signal processing to enhance their quality and enable detection of the heart rate. Over the last years, considerable progress has been made in the use of deep neural networks for many signal processing challenges. Yet, for healthcare applications their success is limited because the large datasets required to train these networks are typically not available. In this paper we propose a method to embed prior knowledge about the measurement data and problem statement in the network architecture to make it more data efficient. Our proposed method aims to enhance the quality of ECG signals by describing them from the perspective of a multi-measurement vector convolutional sparse coding model and using a deep unfolded neural network architecture to learn the model parameters. The sparse coding problem was solved using the Alternating Direction Method of Multipliers. Our method was evaluated by denoising ECG signals that were corrupted by adding noise to clean ECG signals, subsequently detecting the heart beats from the denoised data, and comparing these to the heartbeats and derived heart rate variability features detected in the clean ECG signals. This evaluation demonstrated an improvement in signal-to-noise ratio (SNR) ranging from 17 to 27 dB and an improvement in heart rate detection (i.e., F1 score) ranging between 0 and 50%, where the range depends on the SNR of the input signals. The performance of the method was compared to that of a denoising encoder-decoder neural network and a wavelet-based denoising method, showing equivalent and better performance, respectively.
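In ADMM-based sparse coding of the kind used here, the l1-regularized sub-update reduces to element-wise soft-thresholding, the proximal operator of the l1 norm. Below is a minimal sketch of that single step, not the authors' full unfolded network; the function name and the example values are illustrative:

```python
def soft_threshold(x, tau):
    """Proximal operator of tau * |x|: shrink x toward zero by tau,
    setting values inside [-tau, tau] exactly to zero (this is what
    produces sparsity in the coefficient vector)."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0

# One ADMM z-update applied element-wise to a coefficient vector:
z = [soft_threshold(v, 0.5) for v in [2.0, -0.3, -1.5, 0.1]]
print(z)  # [1.5, 0.0, -1.0, 0.0]
```

In a deep-unfolded architecture, each network layer mimics one such iteration, with quantities like the threshold learned from data instead of fixed in advance.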
Citations: 0
MPEG-5 LCEVC for 3.0 Next Generation Digital TV in Brazil
Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2022-09-08 DOI: 10.3389/frsip.2022.884254
L. Ciccarelli, S. Ferrara, Florian Maurer
TV 3.0 is the next generation digital broadcasting system developed in Brazil by the SBTVD Forum. The ambition of TV 3.0 is significantly higher than that of previous generations, as it targets the delivery of IP based signals for applications such as 8K, HDR, virtual and augmented reality, video enhancement, and scalability. To deliver such services, more advanced and flexible compression technologies are required. MPEG-5 Part 2 Low Complexity Enhancement Video Coding (LCEVC) is a new video coding standard which works in combination with a separate video standard (e.g., H.264/AVC [H.264/AVC], H.265/HEVC [H.265/HEVC], H.266/VVC [H.266/VVC], AV1 [AV1]) to enhance the quality of a video. In the typical scenario, the enhanced quality is provided in terms of a higher resolution video obtained by adding details coded through an enhancement layer to a lower resolution version of the same video coded through a base layer. The LCEVC format also provides the ability to signal the bit-depth of the base layer independently from that of the enhancement layer, allowing up to 14-bit depth HDR. MPEG-5 LCEVC was selected by the SBTVD committee as part of TV 3.0 in December 2021. In this paper we describe the proposal submitted for LCEVC in response to the SBTVD Call for Proposals (CfP) for TV 3.0.
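The base-plus-enhancement structure described above can be illustrated with a toy one-dimensional codec: a downsampled base layer plus a residual enhancement layer reconstructs the input exactly. This is a didactic sketch only, under assumed sample-repetition upsampling and an even-length signal; actual LCEVC specifies particular upsamplers, transforms, and entropy coding:

```python
def downsample(signal):
    """Base layer: keep every other sample (crude 2x downsampling)."""
    return signal[::2]

def upsample(base):
    """Predict full resolution from the base layer by sample repetition."""
    out = []
    for s in base:
        out.extend([s, s])
    return out

def encode(signal):
    """Split an even-length signal into a base layer and a residual
    enhancement layer (the details the base-layer prediction misses)."""
    base = downsample(signal)
    prediction = upsample(base)
    residual = [s - p for s, p in zip(signal, prediction)]
    return base, residual

def decode(base, residual):
    """Reconstruct: upsample the base layer, then add back the details."""
    prediction = upsample(base)
    return [p + r for p, r in zip(prediction, residual)]

signal = [3, 4, 7, 8, 2, 1, 5, 5]
base, residual = encode(signal)
assert decode(base, residual) == signal  # lossless under these assumptions
print(base, residual)
```

A legacy decoder could consume only the base layer, while an LCEVC-aware decoder adds the enhancement layer, which is the backward-compatibility property the standard exploits.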
Citations: 1