IEEE Journal of Selected Topics in Signal Processing: Latest Articles

Speech Acoustic Markers Can Detect Mild Cognitive Impairment in Parkinson’s Disease
IF 13.7 | Zone 1 (Engineering and Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-10-17 | DOI: 10.1109/JSTSP.2025.3620716
Kara M. Smith;James R. Williamson;Thomas F. Quatieri
Background: Speech biomarkers have been used to assess motor dysfunction in people with Parkinson’s disease (PD), but speech biomarkers for mild cognitive impairment in PD (PD-MCI) have not been well studied. Objective: To identify speech acoustic features associated with PD-MCI and evaluate the performance of a model to discriminate participants with PD-MCI from those with normal cognitive status (PD-NC). Methods: We analyzed speech samples from 42 participants with PD, diagnosed as either PD-MCI or PD-NC using the Movement Disorder Society Task Force Tier II criteria as a gold-standard classification of MCI. A reading passage and a picture description task were analyzed for acoustic features, which were used to generate individual Gaussian mixture models (GMMs) and then a final fused GMM to discriminate PD-MCI and PD-NC participants. Results: The picture description task yielded a larger number of acoustic features that were highly associated with PD-MCI status compared to the reading task. Fusing the model outputs from the picture description task resulted in an AUC of 0.82 for discriminating PD-MCI from PD-NC participants. The acoustic features associated with PD-MCI stemmed from multiple speech subsystems. Conclusion: PD-MCI has a distinct speech acoustic signature that may be harnessed to develop better tools to detect and monitor this complication.
IEEE Journal of Selected Topics in Signal Processing, vol. 19, no. 5, pp. 731-740. Full text: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11206405
Citations: 0
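The score fusion and AUC evaluation mentioned in the PD-MCI abstract above can be illustrated with a small sketch. It is not the authors' pipeline: the abstract does not say how the GMM outputs were fused, so averaging z-normalized scores is an assumption, as are the function names (`auc`, `zscore`, `fuse`) and the toy numbers. The AUC itself uses the standard Mann-Whitney formulation.

```python
from statistics import mean, pstdev


def auc(pos_scores, neg_scores):
    """Area under the ROC curve as the Mann-Whitney statistic (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def zscore(values):
    """Normalize one model's scores to zero mean and unit variance."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma > 0 else 0.0 for v in values]


def fuse(score_lists):
    """Assumed fusion rule: average the z-normalized scores per participant."""
    normalized = [zscore(scores) for scores in score_lists]
    return [mean(column) for column in zip(*normalized)]


# Made-up scores from two hypothetical per-feature models; 1 = PD-MCI, 0 = PD-NC.
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.4, 0.7, 0.5, 0.2, 0.1]
model_b = [2.0, 3.0, 0.5, 1.0, 0.8, 0.2]

fused = fuse([model_a, model_b])
pos = [s for s, y in zip(fused, labels) if y == 1]
neg = [s for s, y in zip(fused, labels) if y == 0]
fused_auc = auc(pos, neg)
print(f"fused AUC = {fused_auc:.2f}")
```

With only 42 participants, as in the study, an AUC like this would normally be estimated under cross-validation rather than on the data used to fit the models.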
Detecting Neurocognitive Disorders Through Analyses of Topic Evolution and Cross-Modal Consistency in Visual-Stimulated Narratives
IF 13.7 | Zone 1 (Engineering and Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-10-15 | DOI: 10.1109/JSTSP.2025.3622049
Jinchao Li;Yuejiao Wang;Junan Li;Jiawen Kang;Bo Zheng;Ka Ho Wong;Brian Kan-Wing Mak;Helene H. Fung;Jean Woo;Man-Wai Mak;Timothy Kwok;Vincent Mok;Xianmin Gong;Xixin Wu;Xunying Liu;Patrick C. M. Wong;Helen Meng
Early detection of neurocognitive disorders (NCDs) is crucial for timely intervention and disease management. Given that language impairments manifest early in NCD progression, visual-stimulated narrative (VSN)-based analysis offers a promising avenue for NCD detection. Current VSN-based NCD detection methods primarily focus on linguistic microstructures (e.g., lexical diversity) that are closely tied to bottom-up, stimulus-driven cognitive processes. While these features illuminate basic language abilities, the higher-order linguistic macrostructures (e.g., topic development) that may reflect top-down, concept-driven cognitive abilities remain underexplored. These macrostructural patterns are crucial for NCD detection, yet challenging to quantify due to their abstract and complex nature. To bridge this gap, we propose two novel macrostructural approaches: (1) a Dynamic Topic Model (DTM) to track topic evolution over time, and (2) a Text-Image Temporal Alignment Network (TITAN) to measure cross-modal consistency between narrative and visual stimuli. Experimental results show the effectiveness of the proposed approaches in NCD detection, with TITAN achieving superior performance across three corpora: ADReSS (F1 = 0.8889), ADReSSo (F1 = 0.8504), and CU-MARVEL-RABBIT (F1 = 0.7238). Feature contribution analysis reveals that macrostructural features (e.g., topic variability, topic change rate, and topic consistency) constitute the most significant contributors to the model's decision pathways, outperforming the investigated microstructural features. These findings underscore the value of macrostructural analysis for understanding linguistic-cognitive interactions associated with NCDs.
IEEE Journal of Selected Topics in Signal Processing, vol. 19, no. 5, pp. 741-756.
Citations: 0
Robust MRI Reconstruction by Smoothed Unrolling (SMUG)
IF 13.7 | Zone 1 (Engineering and Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-10-10 | DOI: 10.1109/JSTSP.2025.3615540
Shijun Liang;Van Hoang Minh Nguyen;Jinghan Jia;Ismail R. Alkhouri;Sijia Liu;Saiprasad Ravishankar
As the popularity of deep learning (DL) in the field of magnetic resonance imaging (MRI) continues to rise, recent research has indicated that DL-based MRI reconstruction models might be excessively sensitive to minor input disturbances, including worst-case or random additive perturbations. This sensitivity often leads to unstable aliased images. This raises the question of how to devise DL techniques for MRI reconstruction that can be robust to these variations. To address this problem, we propose a novel image reconstruction framework, termed Smoothed Unrolling (SMUG), which advances a deep unrolling-based MRI reconstruction model using a randomized smoothing (RS)-based robust learning approach. RS, which improves the tolerance of a model against input noise, has been widely used in the design of adversarial defense approaches for image classification tasks. Yet, we find that the conventional design that applies RS to the entire DL-based MRI model is ineffective. In this paper, we show that SMUG and its variants address the above issue by customizing the RS process based on the unrolling architecture of DL-based MRI reconstruction models. We theoretically analyze the robustness of our method in the presence of perturbations. Compared to vanilla RS and other recent approaches, we show that SMUG improves the robustness of MRI reconstruction with respect to a diverse set of instability sources, including worst-case and random noise perturbations to input measurements, varying measurement sampling rates, and different numbers of unrolling steps.
IEEE Journal of Selected Topics in Signal Processing, vol. 19, no. 7, pp. 1558-1573.
Citations: 0
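The randomized smoothing (RS) idea referenced in the SMUG abstract above replaces a function f with its average over Gaussian-perturbed inputs, g(x) = E[f(x + δ)] with δ ~ N(0, σ²I). The sketch below shows only this generic, "vanilla" form on a toy function; it does not reproduce SMUG, which according to the abstract customizes RS to the unrolling architecture. The names `smooth` and `hard_threshold` and all parameter values are illustrative assumptions.

```python
import random


def smooth(f, x, sigma=0.1, n_samples=100, seed=0):
    """Monte Carlo estimate of E[f(x + noise)] with i.i.d. Gaussian noise."""
    rng = random.Random(seed)
    total = None
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        out = f(noisy)
        total = out if total is None else [t + o for t, o in zip(total, out)]
    return [t / n_samples for t in total]


def hard_threshold(v):
    """Toy stand-in for an unstable model: output jumps from 0 to 1 at 0.5."""
    return [1.0 if vi > 0.5 else 0.0 for vi in v]


# Right at the jump, a tiny input change flips the raw output between 0 and 1,
# while the smoothed output changes gradually and sits near 0.5 there.
at_jump = smooth(hard_threshold, [0.5], sigma=0.1, n_samples=2000)
far_from_jump = smooth(hard_threshold, [2.0], sigma=0.1)
print(at_jump, far_from_jump)
```

The design trade-off is the usual one for RS: a larger σ gives more stability against input perturbations but blurs the function more.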
XPPG-PCA: Reference-Free Automatic Speech Severity Evaluation With Principal Components
IF 13.7 | Zone 1 (Engineering and Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-10-03 | DOI: 10.1109/JSTSP.2025.3617859
Bence Mark Halpern;Thomas B. Tienkamp;Teja Rebernik;Rob J.J.H. van Son;Sebastiaan A.H.J. de Visscher;Max J.H. Witjes;Defne Abur;Tomoki Toda
Reliably evaluating the severity of a speech pathology is crucial in healthcare. However, the current reliance on expert evaluations by speech-language pathologists presents several challenges: while their assessments are highly skilled, they are also subjective, time-consuming, and costly, which can limit the reproducibility of clinical studies and place a strain on healthcare resources. While automated methods exist, they have significant drawbacks. Reference-based approaches require transcriptions or healthy speech samples, restricting them to read speech and limiting their applicability. Existing reference-free methods are also flawed; supervised models often learn spurious shortcuts from data, while handcrafted features are often unreliable and restricted to specific speech tasks. This paper introduces XPPG-PCA (x-vector phonetic posteriorgram principal component analysis), a novel, unsupervised, reference-free method for speech severity evaluation. Using three Dutch oral cancer datasets, we demonstrate that XPPG-PCA performs comparably to, or exceeds, established reference-based methods. Our experiments confirm its robustness against data shortcuts and noise, showing its potential for real-world clinical use. Taken together, our results show that XPPG-PCA provides a robust, generalizable solution for the objective assessment of speech pathology, with the potential to significantly improve the efficiency and reliability of clinical evaluations across a range of disorders. An open-source implementation is available.
IEEE Journal of Selected Topics in Signal Processing, vol. 19, no. 5, pp. 783-795.
Citations: 0
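The XPPG-PCA abstract above does not spell out how features are built or how the principal components are turned into a severity score. As one plausible reading, the sketch below projects per-recording feature vectors onto their first principal component and uses the projection as an unsupervised score. It assumes the feature vectors already exist; the function name `first_component`, the power-iteration solver, and the toy data are all illustrative assumptions, not the released implementation.

```python
import math


def first_component(rows, iters=200):
    """First principal direction via power iteration, plus per-row projections.

    Caveats: the sign of the direction is arbitrary, and the all-ones starting
    vector fails if it happens to be orthogonal to the leading eigenvector.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Toy "features": four recordings lying exactly on the line y = 2x.
features = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
direction, scores = first_component(features)
print(direction, scores)
```

Because the sign of a principal direction is arbitrary, a score like this would need to be oriented against at least one known reference (for example, a healthy control) before being read as "higher means more severe".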
Dysarthric Speech Intelligibility Assessment by Custom Keyword Spotting
IF 13.7 | Zone 1 (Engineering and Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-09-01 | DOI: 10.1109/JSTSP.2025.3604709
Anuprabha M;Krishna Gurugubelli;Anil Kumar Vuppala
Traditionally, dysarthric speech intelligibility assessment systems have focused on speech as the primary input, utilizing methods such as extraction of relevant speech features, classification models, alignment of Automatic Speech Recognition (ASR) outputs, and comparisons between speech representations of dysarthric and healthy speakers. However, to achieve an automated intelligibility assessment that closely mirrors the auditory-perceptual evaluations conducted by clinicians, a model that captures both the acoustic characteristics of dysarthric speech and the linguistic structure related to word pronunciation is needed. Inspired by the practices of clinicians, this study introduces a novel text-guided dysarthric speech intelligibility assessment framework that leverages custom keyword spotting (DySIA-CKWS). The model evaluates intelligibility by detecting specific keywords and is extensively tested using the UA-Speech database for speaker-wise analysis and across word groups of varying complexity. To ensure robustness, the system’s performance is further validated on the TORGO database, demonstrating its adaptability in cross-database settings. Statistical analysis demonstrates strong alignment between predicted and subjective intelligibility scores, with a Pearson Correlation Coefficient (PCC) of 0.9588 and a Spearman’s Correlation Coefficient (SCC) of 0.9141, achieved using the proposed system on the UA-Speech database. The findings emphasize the importance of word selection and showcase the model’s effectiveness in diagnosing dysarthric speech, offering a significant advancement in intelligibility assessment methodologies.
IEEE Journal of Selected Topics in Signal Processing, vol. 19, no. 5, pp. 757-766.
Citations: 0
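The two agreement measures reported in the DySIA-CKWS abstract above, the Pearson and Spearman correlation coefficients, can be computed as in the following sketch. The formulas are the standard ones; the score lists are made-up numbers, not data from the paper, and the helper names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Hypothetical predicted vs. clinician-rated intelligibility (percent).
predicted = [12.0, 35.0, 48.0, 70.0, 91.0]
subjective = [10.0, 29.0, 55.0, 62.0, 95.0]
print(pearson(predicted, subjective), spearman(predicted, subjective))
```

Pearson rewards a linear relationship between predicted and subjective scores, while Spearman only requires that speakers be ordered the same way, which is why the two reported values can differ.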
Accelerating Multiuser Beamforming With Full-Dimension One-Bit Chains
IF 13.7 | Zone 1 (Engineering and Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-08-01 | DOI: 10.1109/JSTSP.2025.3590607
Lina Liu;Dongning Guo
Massive multiple-input multiple-output (MIMO) systems are vital for achieving high spectral efficiencies at mid-band and millimeter wave frequencies. Conventional hybrid MIMO architectures, which use fewer digital chains than antennas, offer a balance between performance, cost, and energy consumption but often prolong channel estimation. This paper proposes a novel architecture that integrates a set of full-dimension digital chains with one-bit analog-to-digital converters (ADCs) to overcome these limitations and provide an alternative trade-off. By assigning one digital chain to each receive antenna, the proposed approach captures energy from all receive antennas and accelerates angle-of-arrival (AoA) estimation and beam computation. Likelihood-based AoA estimation methods are developed to optimize analog beamforming in narrowband and wideband channels, in both single-user and multiuser scenarios. Numerical results, including the equivalent signal-to-noise ratio per bit post-equalization, demonstrate that full-dimension one-bit digital chains significantly improve the efficiency of beamforming.
IEEE Journal of Selected Topics in Signal Processing, vol. 19, no. 6, pp. 1203-1217.
Citations: 0
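To make the idea of likelihood-based AoA estimation from one-bit samples (mentioned in the beamforming abstract above) more concrete, here is a textbook-style sketch. It is not the paper's estimator. It assumes a single path with unit gain, a half-wavelength uniform linear array, known unit-modulus pilots, a known noise level, and a plain grid search; every name and parameter value is an illustrative assumption. Each sign bit is modeled as a probit observation, P(bit = b) = Φ(b·μ/σ), where μ is the noiseless real or imaginary component.

```python
import cmath
import math
import random


def steering(theta_deg, n_antennas):
    """Array response of a half-wavelength ULA for a plane wave from theta."""
    phase = math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * phase * n) for n in range(n_antennas)]


def one_bit(z):
    """One-bit ADC on I and Q: keep only the signs of the two components."""
    return (1.0 if z.real >= 0 else -1.0, 1.0 if z.imag >= 0 else -1.0)


def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def log_likelihood(theta_deg, bits, pilots, sigma):
    """Sum of log-probabilities of the observed sign bits for a candidate angle."""
    a = steering(theta_deg, len(bits[0]))
    total = 0.0
    for snapshot, s in zip(bits, pilots):
        for (b_re, b_im), a_n in zip(snapshot, a):
            mean = a_n * s
            # Floor the probabilities so the logarithm never sees zero.
            total += math.log(max(phi(b_re * mean.real / sigma), 1e-300))
            total += math.log(max(phi(b_im * mean.imag / sigma), 1e-300))
    return total


def estimate_aoa(bits, pilots, sigma, grid):
    """Grid-search maximum-likelihood estimate of the angle of arrival."""
    return max(grid, key=lambda t: log_likelihood(t, bits, pilots, sigma))


# Simulated one-bit snapshots; sigma is the noise std per real dimension.
rng = random.Random(1)
true_theta, n_antennas, n_snapshots, sigma = 20.0, 16, 8, 0.3
pilots = [cmath.exp(2j * math.pi * k / n_snapshots) for k in range(n_snapshots)]
a_true = steering(true_theta, n_antennas)
bits = [[one_bit(a_n * s + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)))
         for a_n in a_true] for s in pilots]

grid = [g * 0.5 for g in range(-180, 181)]  # -90 to 90 degrees in 0.5 steps
theta_hat = estimate_aoa(bits, pilots, sigma, grid)
print(f"estimated AoA = {theta_hat:.1f} degrees")
```

Rotating the pilot phase across snapshots acts as a simple dither: each antenna's phase is observed relative to several quadrant boundaries instead of one.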
Overview of Automatic Speech Analysis and Technologies for Neurodegenerative Disorders: Diagnosis and Assistive Applications
IF 13.7 | Zone 1 (Engineering and Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-21 | DOI: 10.1109/JSTSP.2025.3591062
Shakeel A. Sheikh;Md. Sahidullah;Ina Kodrasi
Advancements in spoken language technologies for neurodegenerative speech disorders are crucial for meeting both clinical and technological needs. This overview paper is vital for advancing the field, as it presents a comprehensive review of state-of-the-art methods in pathological speech detection, automatic speech recognition, pathological speech intelligibility enhancement, intelligibility and severity assessment, and data augmentation approaches for pathological speech. It also highlights key challenges, such as ensuring robustness, privacy, and interpretability. The paper concludes by exploring promising future directions, including the adoption of multimodal approaches and the integration of large language models to further advance speech technologies for neurodegenerative speech disorders.
IEEE Journal of Selected Topics in Signal Processing, vol. 19, no. 5, pp. 700-716.
Citations: 0
Mixed-Precision Quantization: Make the Best Use of Bits Where They Matter Most
IF 13.7 | Zone 1 (Engineering and Technology) | Q1 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-16 | DOI: 10.1109/JSTSP.2025.3589745
Yiming Fang;Li Chen;Yunfei Chen;Weidong Wang;Changsheng You
Mixed-precision quantization offers superior performance to fixed-precision quantization. It has been widely used in signal processing, communication systems, and machine learning. In mixed-precision quantization, bit allocation is essential. Hence, in this paper, we propose a new bit allocation framework for mixed-precision quantization from a search perspective. First, we formulate a general bit allocation problem for mixed-precision quantization. Then we introduce the penalized particle swarm optimization (PPSO) algorithm to address the integer consumption constraint. To improve efficiency and avoid iterations on infeasible solutions within the PPSO algorithm, a greedy criterion particle swarm optimization (GC-PSO) algorithm is proposed. The corresponding convergence analysis is derived based on dynamical system theory. Furthermore, we apply the above framework to some specific classic fields, i.e., finite impulse response (FIR) filters, receivers, and gradient descent. Numerical examples in each application underscore the superiority of the proposed framework over existing algorithms.
Volume 19, Issue 6, Pages 1218-1233
Citations: 0
Resolving Domain Mismatches in Electrolaryngeal Speech Enhancement With Linguistic Intermediates
IF 13.7 Zone 1 (Engineering & Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-06-30 DOI: 10.1109/JSTSP.2025.3584195
Lester Phillip Violeta;Wen-Chin Huang;Ding Ma;Ryuichi Yamamoto;Kazuhiro Kobayashi;Tomoki Toda
We investigate the use of linguistic intermediates to resolve domain mismatches in the electrolaryngeal (EL) speech enhancement task. We first propose the use of linguistic encoders to produce bottleneck feature intermediates within a recognition, alignment, and synthesis framework, which improves performance by removing the timbre mismatches between the pretraining (typical) and fine-tuning (EL) data. We then further improve this by introducing discrete text intermediates, which effectively alleviate temporal mismatches between the source (EL) and target (typical) data to improve prosody modeling. Our findings show that by simply using bottleneck feature intermediates, more intelligible and natural-sounding speech can already be synthesized, as shown by a significant 16% improvement in character error rate and a 0.83 improvement in naturalness score compared to the baseline. Moreover, through the use of discrete phoneme-level intermediates, we can further improve the modeling of the temporal structure of typical speech and obtain another absolute improvement of 1.4% in character error rate and 0.2 in naturalness compared to the initially proposed system. Finally, we also verify these findings on a larger pseudo-EL dataset of 14 speakers and another set of 3 real-world EL speakers, which consistently show that using the phoneme-level intermediates is the most effective approach in terms of phoneme error rate. We conclude the research by summarizing the advantages and disadvantages of each proposed technique.
Volume 19, Issue 5, Pages 827-839. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11059307
Citations: 0
IEEE Signal Processing Society Publication Information
IF 8.7 Zone 1 (Engineering & Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-06-27 DOI: 10.1109/JSTSP.2025.3570399
Volume 19, Issue 4, Page C2. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11054319
Citations: 0