
Latest publications in Frontiers in signal processing

Does Deep Learning-Based Super-Resolution Help Humans With Face Recognition?
Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2022-04-20 DOI: 10.3389/frsip.2022.854737
Erik Velan, M. Fontani, Sergio Carrato, M. Jerian
The last decade witnessed a renaissance of machine learning for image processing. Super-resolution (SR) is one of the areas where deep learning techniques have achieved impressive results, with a specific focus on the SR of facial images. Examining and comparing facial images is one of the critical activities in forensic video analysis; a compelling question is thus whether recent SR techniques could help face recognition (FR) performed by a human operator, especially in the challenging scenario where only very low-resolution images are available, which is typical of surveillance recordings. This paper addresses such a question through a simple yet insightful experiment: we used two state-of-the-art deep learning-based SR algorithms to enhance some very low-resolution faces of 30 worldwide celebrities. We then asked a heterogeneous group of more than 130 individuals to recognize them and compared the recognition accuracy against the one achieved by presenting a simple bicubic-interpolated version of the same faces. Results are somewhat surprising: despite an undisputed general superiority of SR-enhanced images in terms of visual appearance, SR techniques brought no considerable advantage in overall recognition accuracy.
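The baseline in this comparison is plain bicubic upscaling of the low-resolution face crops. A minimal sketch of that baseline, assuming OpenCV is available and using a hypothetical `run_sr_model` placeholder for whichever deep SR network is being compared, could look like this:

```python
import cv2
import numpy as np

def bicubic_baseline(face_lr: np.ndarray, scale: int = 4) -> np.ndarray:
    """Upscale a low-resolution face crop with bicubic interpolation,
    the reference the SR-enhanced images are compared against."""
    h, w = face_lr.shape[:2]
    return cv2.resize(face_lr, (w * scale, h * scale), interpolation=cv2.INTER_CUBIC)

if __name__ == "__main__":
    # Hypothetical file names; `run_sr_model` stands in for a deep SR network.
    face_lr = cv2.imread("celebrity_lr.png")       # assumed low-res input crop
    baseline = bicubic_baseline(face_lr, scale=4)  # bicubic reference image
    # enhanced = run_sr_model(face_lr)             # deep-learning SR (not shown)
    cv2.imwrite("celebrity_bicubic.png", baseline)
```

Both versions of each face would then be shown to participants under the same conditions to compare recognition accuracy.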
Citations: 1
Spatiotemporal Features Fusion From Local Facial Regions for Micro-Expressions Recognition
Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2022-04-13 DOI: 10.3389/frsip.2022.861469
Mouath Aouayeb, Catherine Soladié, W. Hamidouche, K. Kpalma, R. Séguier
Facial micro-expressions (MiEs) analysis has applications in various fields, including emotional intelligence, psychotherapy, and police investigation. However, because MiEs are fast, subtle, and local reactions, there is a challenge for humans and machines to detect and recognize them. In this article, we propose a deep learning approach that addresses the locality and the temporal aspects of MiE by learning spatiotemporal features from local facial regions. Our proposed method is particularly unique in that we use two fusion-based squeeze and excitation (SE) strategies to drive the model to learn the optimal combination of extracted spatiotemporal features from each area. The proposed architecture enhances a previous solution of an automatic system for micro-expression recognition (MER) from local facial regions using a composite deep learning model of convolutional neural network (CNN) and long short-term memory (LSTM). Experiments on three spontaneous MiE datasets show that the proposed solution outperforms state-of-the-art approaches. Our code is presented at https://github.com/MouathAb/AnalyseMiE-CNN_LSTM_SE as an open source.
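The fusion strategies build on squeeze-and-excitation (SE) gating. The sketch below shows a generic SE block in PyTorch, not the authors' exact fusion module; the tensor shapes and the `reduction` ratio are illustrative assumptions:

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Channel-wise squeeze-and-excitation: global average pooling ('squeeze'),
    then a small bottleneck MLP produces per-channel gates ('excitation')."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # re-weight feature maps channel by channel

# Example: gate features extracted from one facial region (hypothetical sizes).
features = torch.randn(8, 64, 14, 14)   # (batch, channels, H, W)
gated = SqueezeExcite(64)(features)
```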
Citations: 2
Horizons in Single-Lead ECG Analysis From Devices to Data
Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2022-04-11 DOI: 10.3389/frsip.2022.866047
A. Abdou, S. Krishnan
Single-lead wearable electrocardiographic (ECG) devices for remote monitoring are emerging as critical components of long-term continuous health and wellness monitoring applications. These sensors make it simple to monitor chronically ill patients and the elderly in long-term care homes, as well as empower users focused on fitness and wellbeing with timely health and lifestyle information and metrics. This article addresses future developments in single-lead electrocardiogram (ECG) wearables, their design concepts, signal processing, machine learning (ML), and emerging healthcare applications. A literature review of multiple wearable ECG remote monitoring devices is first performed, covering the Apple Watch, Kardia, Zio, BioHarness, Bittium Faros, and Carnation Ambulatory Monitor. Zio showed the longest wear time, with patients wearing the patch for a maximum of 14 days, but required users to mail the device to a processing center for analysis. The Apple Watch and Kardia showed good-quality acquisition of raw ECG but are not continuous monitoring devices. The design considerations for single-lead ECG wearable devices can be classified as follows: power needs, computational complexity, signal quality, and human factors. These dimensions mirror the hardware and software characteristics of ECG wearables and can act as a checklist for future single-lead ECG wearable designs. Trends in ECG de-noising, signal processing, feature extraction, compressive sensing (CS), and remote monitoring applications are then reviewed to show the emerging opportunities and recent innovations in single-lead ECG wearables.
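As an illustration of the de-noising step mentioned above, the following sketch applies a zero-phase Butterworth band-pass filter to a synthetic single-lead recording; the 0.5-40 Hz pass-band, sampling rate, and signal are assumptions for the example, not values taken from any of the reviewed devices:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_ecg(ecg: np.ndarray, fs: float, low: float = 0.5, high: float = 40.0,
                 order: int = 4) -> np.ndarray:
    """Zero-phase Butterworth band-pass filter, a common first step for
    suppressing baseline wander and high-frequency noise in single-lead ECG."""
    nyq = fs / 2.0
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    return filtfilt(b, a, ecg)

# Hypothetical recording: 10 s at 250 Hz with slow drift and additive noise.
fs = 250.0
t = np.arange(0, 10, 1 / fs)
raw = np.sin(2 * np.pi * 1.2 * t) + 0.5 * np.sin(2 * np.pi * 0.2 * t) \
      + 0.05 * np.random.randn(t.size)
clean = bandpass_ecg(raw, fs)
```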
Citations: 3
Multivariate Lipschitz Analysis of the Stability of Neural Networks
Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2022-04-05 DOI: 10.3389/frsip.2022.794469
K. Gupta, F. Kaakai, B. Pesquet-Popescu, J. Pesquet, Fragkiskos D. Malliaros
The stability of neural networks with respect to adversarial perturbations has been extensively studied. One of the main strategies consists of quantifying the Lipschitz regularity of neural networks. In this paper, we introduce a multivariate Lipschitz constant-based stability analysis of fully connected neural networks, allowing us to capture the influence of each input or group of inputs on the neural network stability. Our approach relies on a suitable re-normalization of the input space, with the objective of performing a more precise analysis than the one provided by a global Lipschitz constant. We investigate the mathematical properties of the proposed multivariate Lipschitz analysis and show its usefulness in better understanding the sensitivity of the neural network with regard to groups of inputs. We display the results of this analysis by a new representation designed for machine learning practitioners and safety engineers, termed a Lipschitz star. The Lipschitz star is a graphical and practical tool to analyze the sensitivity of a neural network model during its development, with regard to different combinations of inputs. By leveraging this tool, we show that it is possible to build robust-by-design models using spectral normalization techniques for controlling the stability of a neural network, given a safety Lipschitz target. Thanks to our multivariate Lipschitz analysis, we can also measure the efficiency of adversarial training in inference tasks. We perform experiments on various open-access tabular datasets, and also on a real Thales Air Mobility industrial application subject to certification requirements.
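For context, the coarse global bound that the multivariate analysis refines is the product of layer spectral norms. A minimal sketch of that classical bound, assuming a fully connected network with 1-Lipschitz activations and random example weights, is shown below; it is not the authors' multivariate estimator:

```python
import numpy as np

def global_lipschitz_upper_bound(weights: list) -> float:
    """Product of layer spectral norms: a classical upper bound on the Lipschitz
    constant of a fully connected network with 1-Lipschitz activations (e.g. ReLU).
    The paper's analysis is finer-grained, resolving groups of inputs."""
    bound = 1.0
    for W in weights:
        bound *= np.linalg.norm(W, ord=2)  # largest singular value of the layer
    return bound

# Hypothetical 3-layer network with random weight matrices.
rng = np.random.default_rng(0)
layers = [rng.standard_normal((64, 16)),
          rng.standard_normal((32, 64)),
          rng.standard_normal((1, 32))]
print(global_lipschitz_upper_bound(layers))
```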
Citations: 1
A Deep-Learning Based Framework for Source Separation, Analysis, and Synthesis of Choral Ensembles
Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2022-04-05 DOI: 10.3389/frsip.2022.808594
Pritish Chandna, Helena Cuesta, Darius Petermann, E. Gómez
Choral singing in the soprano, alto, tenor and bass (SATB) format is a widely practiced and studied art form with significant cultural importance. Despite the popularity of the choral setting, it has received little attention in the field of Music Information Retrieval. However, the recent publication of high-quality choral singing datasets as well as recent developments in deep learning based methodologies applied to the field of music and speech processing, have opened new avenues for research in this field. In this paper, we use some of the publicly available choral singing datasets to train and evaluate state-of-the-art source separation algorithms from the speech and music domains for the case of choral singing. Furthermore, we evaluate existing monophonic F0 estimators on the separated unison stems and propose an approximation of the perceived F0 of a unison signal. Additionally, we present a set of applications combining the proposed methodologies, including synthesizing a single singer voice from the unison, and transposing and remixing the separated stems into a synthetic multi-singer choral signal. We finally conduct a set of listening tests to perform a perceptual evaluation of the results we obtain with the proposed methodologies.
Citations: 4
CEL-Unet: Distance Weighted Maps and Multi-Scale Pyramidal Edge Extraction for Accurate Osteoarthritic Bone Segmentation in CT Scans
Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2022-04-05 DOI: 10.3389/frsip.2022.857313
Matteo Rossi, L. Marsilio, L. Mainardi, A. Manzotti, P. Cerveri
Unet architectures are being investigated for automatic image segmentation of bones in CT scans because of their ability to address size-varying anatomies and pathological deformations. Nonetheless, changes in mineral density, narrowing of joint spaces, and formation of largely irregular osteophytes may easily disrupt automatism, requiring extensive manual refinement. A novel Unet variant, called CEL-Unet, is presented to boost the segmentation quality of the femur and tibia in the osteoarthritic knee joint. The neural network embeds a region-aware branch and two contour-aware branches in the decoding path. The paper features three main technical novelties: 1) directed connections between contour and region branches progressively at different decoding scales; 2) pyramidal edge extraction in the contour branch to perform multi-resolution edge processing; 3) a distance-weighted cross-entropy loss function to increase delineation quality at the sharp edges of the shapes. A set of 700 knee CT scans was used to train the model and test segmentation performance. Qualitatively, CEL-Unet correctly segmented cases where state-of-the-art architectures failed. Quantitatively, the Jaccard indexes of femur and tibia segmentation were 0.98 and 0.97, with median 3D reconstruction errors less than 0.80 and 0.60 mm, outperforming competitive Unet models. The results were evaluated against knee arthroplasty planning based on personalized surgical instruments (PSI). Excellent agreement with reference data was found for femoral (0.11°) and tibial (0.05°) alignments of the distal and proximal cuts computed on the reconstructed surfaces. The bone segmentation was effective for large pathological deformations and osteophytes, making the technique potentially usable in PSI-based surgical planning, where the reconstruction accuracy of the bony shapes is one of the main critical factors for the success of the operation.
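To illustrate the third novelty, the sketch below implements one plausible form of a distance-weighted cross-entropy loss in PyTorch, where pixels near shape boundaries receive larger weights derived from a precomputed distance map; the exact weighting used in CEL-Unet may differ, and the tensor shapes and `alpha` parameter are assumptions:

```python
import torch
import torch.nn.functional as F

def distance_weighted_ce(logits: torch.Tensor, target: torch.Tensor,
                         dist_map: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Cross-entropy where each pixel is weighted more heavily the closer it lies
    to a shape boundary. `dist_map` holds per-pixel distances to the nearest edge
    (precomputed, e.g. with a distance transform); this is an illustrative form."""
    per_pixel = F.cross_entropy(logits, target, reduction="none")  # (B, H, W)
    weights = 1.0 + alpha * torch.exp(-dist_map)                   # peaks at edges
    return (weights * per_pixel).mean()

# Hypothetical shapes: 3-class segmentation of a 128x128 CT slice, batch of 2.
logits = torch.randn(2, 3, 128, 128)
target = torch.randint(0, 3, (2, 128, 128))
dist_map = torch.rand(2, 128, 128) * 20.0
loss = distance_weighted_ce(logits, target, dist_map)
```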
Citations: 0
Imagined Speech Classification Using Six Phonetically Distributed Words
Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2022-03-25 DOI: 10.3389/frsip.2022.760643
Y. Varshney, Azizuddin Khan
Imagined speech can be used to send commands without any muscle movement or emitting audio. Research in this area is still at an early stage, and there is a shortage of open-access datasets for imagined speech analysis. In this work, we have proposed an openly accessible electroencephalography (EEG) dataset for six imagined words. We have selected six phonetically distributed, monosyllabic, and emotionally neutral words from the W-22 CID word lists. The phonetic distribution of the words covered different places of consonant articulation and different positions of tongue advancement for vowel pronunciation. The selected words were “could,” “yard,” “give,” “him,” “there,” and “toe.” The experiment was performed with 15 subjects, each performing the overt and imagined speech task for the displayed word. Each word was presented 50 times in random order. EEG signals were recorded during the experiment using a 64-channel EEG acquisition system with a sampling rate of 2,048 Hz. A preliminary analysis of the recorded data is presented by performing the classification of EEGs corresponding to the imagined words. The achieved accuracy is above the chance level for all subjects, which suggests that the recorded EEGs contain distinctive information about the imagined words.
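As a hedged illustration of the kind of preliminary classification described above (not the authors' pipeline), the sketch below extracts log band-power features from hypothetical 64-channel epochs and cross-validates a linear discriminant classifier over six classes; the epoch length, frequency band, and synthetic data are assumptions:

```python
import numpy as np
from scipy.signal import welch
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def bandpower_features(epochs: np.ndarray, fs: float) -> np.ndarray:
    """Log band-power (4-40 Hz) per channel for each epoch.
    `epochs` has shape (n_epochs, n_channels, n_samples)."""
    freqs, psd = welch(epochs, fs=fs, nperseg=min(512, epochs.shape[-1]), axis=-1)
    band = (freqs >= 4) & (freqs <= 40)
    return np.log(psd[..., band].mean(axis=-1))  # (n_epochs, n_channels)

# Hypothetical data standing in for recorded 64-channel, 2,048 Hz epochs.
fs = 2048.0
X_raw = np.random.randn(300, 64, int(2 * fs))   # 300 epochs of 2 s each
y = np.random.randint(0, 6, 300)                # labels for six imagined words
X = bandpower_features(X_raw, fs)
scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5)
print(scores.mean())
```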
Citations: 3
CRLBs for Location and Velocity Estimation for MIMO Radars in CES-Distributed Clutter
Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2022-03-25 DOI: 10.3389/frsip.2022.822285
N. Rojhani, M. Greco, F. Gini
In this article, we investigate the problem of jointly estimating target location and velocity for widely separated multiple-input multiple-output (MIMO) radar operating in correlated non-Gaussian clutter, modeled by a complex elliptically symmetric (CES) distribution. More specifically, we derive the Cramér–Rao lower bounds (CRLBs) when the target is modeled by the Swerling 0 model and the clutter is complex t-distributed. We thoroughly analyze the impact of the clutter correlation and spikiness to provide accurate performance estimation. Index terms—Cramér–Rao lower bounds (CRLBs), MIMO radar, location and velocity estimation, performance analysis, complex elliptically symmetric (CES) distributed, and complex t-distribution.
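For reference, the bounds derived in the paper are specializations of the general Cramér–Rao inequality: for an unbiased estimator of the parameter vector (here target position and velocity), the error covariance is bounded below by the inverse Fisher information matrix computed from the likelihood of the received data under the CES clutter model, and the per-parameter CRLBs are the diagonal entries of that inverse. In standard form:

```latex
\[
  \operatorname{Cov}\!\big(\hat{\boldsymbol{\theta}}\big) \succeq \mathbf{I}^{-1}(\boldsymbol{\theta}),
  \qquad
  \big[\mathbf{I}(\boldsymbol{\theta})\big]_{ij}
    = -\,\mathbb{E}\!\left[
        \frac{\partial^{2} \ln p(\mathbf{z};\boldsymbol{\theta})}
             {\partial \theta_{i}\,\partial \theta_{j}}
      \right],
\]
```

where \(\boldsymbol{\theta}\) collects the target location and velocity components and \(p(\mathbf{z};\boldsymbol{\theta})\) is the likelihood of the received MIMO data under the assumed clutter distribution.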
Citations: 2
Depth Map Super-Resolution via Cascaded Transformers Guidance
Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2022-03-24 DOI: 10.3389/frsip.2022.847890
I. Ariav, I. Cohen
Depth information captured by affordable depth sensors is characterized by low spatial resolution, which limits potential applications. Several methods have recently been proposed for guided super-resolution of depth maps using convolutional neural networks to overcome this limitation. In a guided super-resolution scheme, high-resolution depth maps are inferred from low-resolution ones with the additional guidance of a corresponding high-resolution intensity image. However, these methods are still prone to texture copying issues due to improper guidance by the intensity image. We propose a multi-scale residual deep network for depth map super-resolution. A cascaded transformer module incorporates high-resolution structural information from the intensity image into the depth upsampling process. The proposed cascaded transformer module achieves linear complexity in image resolution, making it applicable to high-resolution images. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art techniques for guided depth super-resolution.
Citations: 3
Speech Localization at Low Bitrates in Wireless Acoustics Sensor Networks
Q3 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2022-03-17 DOI: 10.3389/frsip.2022.800003
Mariem Bouafif Mansali, Pablo Pérez Zarazaga, Tom Bäckström, Z. Lachiri
The use of speech source localization (SSL) and its applications offer great possibilities for the design of speaker local positioning systems with wireless acoustic sensor networks (WASNs). Recent works have shown that data-driven front-ends can outperform traditional algorithms for SSL when trained to work in specific domains, depending on factors like reverberation and noise levels. However, such localization models consider localization directly from raw sensor observations, without consideration for transmission losses in WASNs. In contrast, when sensors reside in separate real-life devices, we need to quantize, encode, and transmit sensor data, decreasing the performance of localization, especially when the transmission bitrate is low. In this work, we investigate the effect of low-bitrate transmission on a Direction of Arrival (DoA) estimator. We analyze the performance of a deep neural network (DNN) based framework as a function of the audio encoding bitrate for compressed signals by employing recent communication codecs including PyAWNeS, Opus, EVS, and Lyra. Experimental results show that training the DNN on input encoded with the PyAWNeS codec at 16.4 kB/s can improve the accuracy significantly, and up to 50% of the accuracy degradation at a low bitrate can be recovered for almost all codecs. Our results further show that, when one of the two channels can be encoded with a bitrate higher than 32 kB/s, the best accuracy of the trained model is obtained by keeping the raw data for the second channel; for a lower bitrate, it is preferable to encode the two channels similarly. More importantly, for practical applications, a more generalized model trained with a randomly selected codec for each channel shows a large accuracy gain when at least one of the two channels is encoded with PyAWNeS.
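The paper's estimator is a DNN, but a compact classical reference point for two-channel DoA estimation is GCC-PHAT followed by a far-field angle conversion. The sketch below is such a baseline, assuming a two-microphone array with known spacing; the sampling rate, spacing, and synthetic signals are illustrative assumptions:

```python
import numpy as np

def gcc_phat_tdoa(x1: np.ndarray, x2: np.ndarray, fs: float) -> float:
    """Time difference of arrival between two microphone signals via GCC-PHAT,
    a classical front-end often used as a baseline for DoA estimation."""
    n = x1.size + x2.size
    X1, X2 = np.fft.rfft(x1, n=n), np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12              # PHAT weighting
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

def doa_from_tdoa(tdoa: float, mic_distance: float, c: float = 343.0) -> float:
    """Broadside angle (radians) for a two-microphone array under the
    far-field model; the argument of arcsin is clipped to stay valid."""
    return np.arcsin(np.clip(tdoa * c / mic_distance, -1.0, 1.0))

# Hypothetical signals: channel 2 lags channel 1 by 5 samples.
fs, d = 16000.0, 0.1
x1 = np.random.randn(16000)
x2 = np.roll(x1, 5)
angle = doa_from_tdoa(gcc_phat_tdoa(x1, x2, fs), d)
```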
Citations: 0