
Computer Speech and Language: Latest Publications

Multimodal laryngoscopic video analysis for assisted diagnosis of vocal fold paralysis
IF 3.4 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-10-06 · DOI: 10.1016/j.csl.2025.101891
Yucong Zhang , Xin Zou , Jinshan Yang , Wenjun Chen , Juan Liu , Faya Liang , Ming Li
This paper presents the Multimodal Laryngoscopic Video Analyzing System (MLVAS), a novel system that leverages both audio and video data to automatically extract key video segments and metrics from raw laryngeal videostroboscopic videos for assisted clinical assessment. The system integrates video-based glottis detection with an audio keyword spotting method to analyze both video and audio data, identifying patient vocalizations and refining video highlights to ensure optimal inspection of vocal fold movements. Beyond key video segment extraction from the raw laryngeal videos, MLVAS is able to generate effective audio and visual features for Vocal Fold Paralysis (VFP) detection. Pre-trained audio encoders are used to encode the patient’s voice into audio features. Visual features are generated by measuring the angle deviation of both the left and right vocal folds from the estimated glottal midline on the segmented glottis masks. To obtain better masks, we introduce a diffusion-based refinement that follows traditional U-Net segmentation to reduce false positives. We conducted several ablation studies to demonstrate the effectiveness of each module and modality in the proposed MLVAS. The experimental results on a public segmentation dataset show the effectiveness of our proposed segmentation module. In addition, unilateral VFP classification results on a real-world clinical dataset demonstrate MLVAS’s ability to provide reliable and objective metrics as well as visualization for assisted clinical diagnosis.
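The abstract does not include code; the sketch below is only a toy illustration of the kind of visual feature it describes: estimating a glottal midline from a binary glottis mask via PCA and measuring how far the left and right fold regions deviate from that midline. The mask construction, function names, and the PCA-based geometry are assumptions for illustration, not the authors' pipeline.

```python
import numpy as np

def glottal_angle_features(mask: np.ndarray):
    """Toy left/right vocal-fold angle deviation (degrees) from the glottal
    midline, given a binary glottis mask of shape (H, W). Illustrative only."""
    ys, xs = np.nonzero(mask)
    if xs.size < 10:
        return float("nan"), float("nan")
    pts = np.stack([xs, ys], axis=1).astype(float)
    pts -= pts.mean(axis=0)
    # Principal axis of the glottal area approximates the glottal midline.
    _, _, vt = np.linalg.svd(pts, full_matrices=False)
    midline, normal = vt[0], vt[1]
    side = pts @ normal                     # signed offset across the glottis

    def half_angle(half_pts):
        # Approximate one fold's direction by the principal axis of that half of the mask.
        _, _, v = np.linalg.svd(half_pts - half_pts.mean(axis=0), full_matrices=False)
        cosang = abs(v[0] @ midline)
        return float(np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0))))

    return half_angle(pts[side < 0]), half_angle(pts[side >= 0])

# Synthetic wedge-shaped "glottis" mask just to exercise the function.
mask = np.zeros((100, 100), dtype=np.uint8)
for r in range(20, 80):
    half = (r - 20) // 6 + 1
    mask[r, 50 - half:50 + half] = 1
print(glottal_angle_features(mask))
```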
Citations: 0
A speech prediction model based on codec modeling and transformer decoding
IF 3.4 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-09-26 · DOI: 10.1016/j.csl.2025.101892
Heming Wang , Yufeng Yang , DeLiang Wang
Speech prediction is essential for tasks like packet loss concealment and algorithmic delay compensation. This paper proposes a novel prediction algorithm that leverages a speech codec and a transformer decoder to autoregressively predict missing frames. Unlike text-guided methods that require auxiliary information, the proposed approach operates solely on speech for prediction. A comparative study is conducted to evaluate the proposed and existing speech prediction methods on packet loss concealment (PLC) and frame-wise speech prediction tasks. Comprehensive experiments demonstrate that the proposed model achieves superior prediction results, substantially outperforming other state-of-the-art baselines, including on a recent PLC challenge. We also systematically examine factors influencing prediction performance, including context window length, prediction length, and training and inference strategies.
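No implementation details beyond the abstract are given here; the following PyTorch sketch shows the general shape of autoregressive prediction over discrete codec tokens: a causal transformer reads the tokens of the received context and greedily continues them for the missing frames, after which a codec decoder (not shown) would resynthesize the waveform. The vocabulary size, model dimensions, and the random context tokens are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class CausalTokenPredictor(nn.Module):
    """Decoder-only transformer over codec tokens (illustrative sketch)."""
    def __init__(self, vocab=1024, d_model=256, n_layers=4, n_heads=4, max_len=512):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model,
                                           batch_first=True, norm_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, tokens):                       # tokens: (B, T)
        T = tokens.size(1)
        x = self.embed(tokens) + self.pos(torch.arange(T, device=tokens.device))
        mask = nn.Transformer.generate_square_subsequent_mask(T).to(tokens.device)
        return self.head(self.backbone(x, mask=mask))   # (B, T, vocab)

@torch.no_grad()
def predict_missing_frames(model, context_tokens, n_future):
    """Greedy autoregressive continuation of codec tokens past the received context."""
    seq = context_tokens
    for _ in range(n_future):
        logits = model(seq)[:, -1]                   # logits for the next token
        seq = torch.cat([seq, logits.argmax(-1, keepdim=True)], dim=1)
    return seq[:, context_tokens.size(1):]           # only the predicted tokens

model = CausalTokenPredictor()
context = torch.randint(0, 1024, (1, 40))            # placeholder codec tokens
print(predict_missing_frames(model, context, n_future=10).shape)  # torch.Size([1, 10])
```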
Citations: 0
AIPO: Automatic Instruction Prompt Optimization by model itself with “Gradient Ascent”
IF 3.4 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-09-23 · DOI: 10.1016/j.csl.2025.101889
Kyeonghye Park, Daeshik Kim
Large language models (LLMs) can perform a variety of tasks such as summarization, translation, and question answering by generating answers from a user input prompt. The text that is used as input to the model, including the instruction, is called the input prompt. There are two types of input prompt: zero-shot prompting provides a question with no examples, whereas few-shot prompting provides a question with multiple examples. The way the input prompt is set can have a large impact on the accuracy of model generation. The relevant research area is called prompt engineering. Prompt engineering, and especially prompt optimization, is used to find the optimal prompts for each model and task. Manually written prompts can be optimal, but writing them is time-consuming and expensive. Therefore, research is being conducted on automatically generating prompts that are as effective as human-crafted ones for each task. We propose Automatic Instruction Prompt Optimization (AIPO), which allows the model to generate an initial prompt directly through instruction induction when given a task in a zero-shot setting and then to improve the initial prompt into an optimal prompt for the model based on the “gradient ascent” algorithm. With the final prompt generated by AIPO, we achieve more accurate generation than with manual prompts on benchmark datasets, regardless of the output format.
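As a rough illustration of this kind of critique-and-rewrite loop (the abstract's "gradient ascent" over prompts), the sketch below drafts an initial prompt by instruction induction and iteratively keeps only rewrites that score better on a small dev set. The `llm` and `score` functions are stubs standing in for a real model API and a real task metric, and the loop structure is an assumption, not the paper's exact algorithm.

```python
import random

def llm(prompt: str) -> str:
    """Stub for a call to an actual LLM API; returns text so the sketch runs."""
    return "improved: " + prompt

def score(prompt: str, dev_set) -> float:
    """Stub task metric (e.g. accuracy on a dev set); replace with real evaluation."""
    return random.random()

def aipo_like_loop(task_desc: str, dev_set, steps: int = 5) -> str:
    # Step 1: instruction induction -- let the model draft an initial prompt.
    prompt = llm(f"Write an instruction prompt for this task: {task_desc}")
    best, best_score = prompt, score(prompt, dev_set)
    for _ in range(steps):
        # Step 2: textual 'gradient' -- a critique of the current prompt's failures.
        critique = llm(f"The prompt:\n{best}\nscored {best_score:.2f}. "
                       "Explain its weaknesses briefly.")
        # Step 3: 'ascent' -- propose an edited prompt that addresses the critique.
        candidate = llm(f"Rewrite the prompt to fix these weaknesses:\n{critique}")
        cand_score = score(candidate, dev_set)
        if cand_score > best_score:          # keep only improving edits
            best, best_score = candidate, cand_score
    return best

print(aipo_like_loop("Summarize a news article in one sentence.", dev_set=[]))
```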
Citations: 0
Editorial: Special issue on security and privacy in speech communication
IF 3.4 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-09-20 · DOI: 10.1016/j.csl.2025.101890
Ingo Siegert, Sneha Das, Jennifer Williams
{"title":"Editorial: Special issue on security and privacy in speech communication","authors":"Ingo Siegert,&nbsp;Sneha Das,&nbsp;Jennifer Williams","doi":"10.1016/j.csl.2025.101890","DOIUrl":"10.1016/j.csl.2025.101890","url":null,"abstract":"","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"97 ","pages":"Article 101890"},"PeriodicalIF":3.4,"publicationDate":"2025-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145736605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Continual End-to-End Speech-to-Text translation using augmented bi-sampler
IF 3.4 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-09-20 · DOI: 10.1016/j.csl.2025.101885
Balaram Sarkar, Pranav Karande, Ankit Malviya, Chandresh Kumar Maurya
Speech-to-Text (ST) is the translation of speech in one language into text in another language. Earlier models for ST used a pipeline approach combining automatic speech recognition (ASR) and machine translation (MT). Such models suffer from cascaded error propagation, high latency, and high memory consumption. Therefore, End-to-End (E2E) ST models were proposed. Adapting E2E ST models to new language pairs results in deteriorated performance on the previously trained language pairs. This phenomenon is called Catastrophic Forgetting (CF). Therefore, we need ST models that can learn continually. The present work proposes a novel continual learning (CL) framework for E2E ST tasks. The core idea behind our approach combines proportional-language sampling (PLS), random sampling (RS), and augmentation. RS helps the model perform well on the current task by sampling aggressively from it. PLS is used to sample equal proportions from past task data, but it may cause over-fitting. To mitigate that, a combined PLS+RS approach is used, dubbed the continual bi-sampler (CBS). However, CBS still suffers from over-fitting due to repeated samples from the past tasks. Therefore, we apply various augmentation strategies combined with CBS, which we call the continual augmented bi-sampler (CABS). We perform experiments on 4 language pairs of the MuST-C (One to Many) and mTEDx (Many to Many) datasets and achieve gains of 68.38% and 41%, respectively, in the average BLEU score compared to baselines. CABS also mitigates average forgetting by 82.2% on the MuST-C dataset compared to the Gradient Episodic Memory (GEM) baseline. The results show that the proposed CL-based E2E ST ensures knowledge retention across previously trained languages. To the best of our knowledge, E2E ST models have not been studied in a CL setup before.
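The sampler itself is simple to sketch. The toy Python below mixes equal-proportion draws from each past task (PLS) with random draws from the current task (RS) and applies an optional augmentation hook, which is the rough shape of the CBS/CABS scheme described in the abstract; the 50/50 split, per-task shares, and augmentation are placeholder choices, not the paper's settings.

```python
import random

def cabs_batch(past_tasks, current_task, batch_size, past_fraction=0.5, augment=None):
    """Illustrative continual (augmented) bi-sampler: equal-proportion sampling
    from past task datasets (PLS) plus random sampling from the current task (RS),
    with an optional augmentation hook (the 'A' in CABS)."""
    n_past = int(batch_size * past_fraction)
    n_cur = batch_size - n_past
    batch = []
    if past_tasks:
        per_task = max(1, n_past // len(past_tasks))   # equal share per past task
        for data in past_tasks:
            batch.extend(random.choices(data, k=per_task))
    batch.extend(random.sample(current_task, k=min(n_cur, len(current_task))))
    if augment is not None:
        batch = [augment(x) for x in batch]            # e.g. speed/noise perturbation
    random.shuffle(batch)
    return batch

# Toy usage with strings standing in for (speech, translation) pairs.
past = [[f"en-de:{i}" for i in range(100)], [f"en-fr:{i}" for i in range(100)]]
current = [f"en-es:{i}" for i in range(100)]
print(len(cabs_batch(past, current, batch_size=16)))
```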
Citations: 0
Automatic design optimization of preference-based subjective evaluation with online learning in crowdsourcing environment
IF 3.4 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-09-12 · DOI: 10.1016/j.csl.2025.101888
Yusuke Yasuda, Tomoki Toda
Preference-based subjective evaluation is a key method for reliably evaluating generative media. However, its huge number of pair combinations makes it prohibitively difficult to apply to large-scale evaluation using crowdsourcing. To address this issue, we propose an automatic optimization method for preference-based subjective evaluation in terms of pair combination selection and the allocation of evaluation volumes, using online learning in a crowdsourcing environment. We use a preference-based online learning method based on a sorting algorithm to identify the total order of systems with minimum sample volumes. Our online learning algorithm supports parallel and asynchronous execution under the fixed-budget conditions required for crowdsourcing. Our experiment on the preference-based subjective evaluation of synthetic speech naturalness shows that our method successfully optimizes the preference-based test, reducing the number of pair combinations from 351 to 83 and allocating optimal evaluation volumes, ranging from 30 to 663, for each pair without compromising evaluation accuracy or wasting budget allocations.
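Since 351 pair combinations corresponds to 27 systems, the saving can be pictured with a sorting-based design: the sketch below ranks systems by binary-insertion sort, where each comparison would be one crowdsourced A/B preference test, so only on the order of n log n pairs are evaluated instead of all n(n-1)/2. The vote collection is a stub, and the paper's adaptive allocation of evaluation volumes per pair is not modeled here.

```python
import random

def collect_votes(sys_a, sys_b, n_votes=30):
    """Stub for a crowdsourced A/B preference test; returns True if A is preferred.
    A real deployment would dispatch n_votes pairwise listening trials."""
    return random.random() < 0.5

def preference_sort(systems, n_votes=30):
    """Binary-insertion sort over systems using pairwise preference tests,
    so only O(n log n) pair combinations are ever evaluated."""
    ranking = []
    comparisons = 0
    for s in systems:
        lo, hi = 0, len(ranking)
        while lo < hi:                      # binary search for the insert position
            mid = (lo + hi) // 2
            comparisons += 1
            if collect_votes(s, ranking[mid], n_votes):
                hi = mid                    # s preferred over ranking[mid]
            else:
                lo = mid + 1
        ranking.insert(lo, s)
    return ranking, comparisons

ranking, used = preference_sort([f"system_{i}" for i in range(27)])
print(used, "pair comparisons instead of", 27 * 26 // 2)
```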
Citations: 0
Public perceptions of speech technology trust in the United Kingdom
IF 3.4 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-09-12 · DOI: 10.1016/j.csl.2025.101884
Jennifer Williams , Tayyaba Azim , Anna-Maria Piskopani , Richard Hyde , Shuo Zhang , Zack Hodari
Speech technology is now pervasive throughout the world, impacting a variety of socio-technical use-cases. Speech technology is a broad term encompassing capabilities that translate, analyse, transcribe, generate, modify, enhance, or summarise human speech. Many of the technical features and the possibility of speech data misuse are not often revealed to the users of such systems. When combined with the rapid development of AI and the plethora of use-cases where speech-based AI systems are now being applied, the consequence is that researchers, regulators, designers and government policymakers still have little understanding of the public’s perception of speech technology. Our research explores the public’s perceptions of trust in speech technology by asking people about their experiences, awareness of their rights, their susceptibility to being harmed, their expected behaviour, and ethical choices governing behavioural responsibility. We adopt a multidisciplinary lens to our work, in order to present a fuller picture of the United Kingdom (UK) public perspective through a series of socio-technical scenarios in a large-scale survey. We analysed survey responses from 1,000 participants from the UK, where people from different walks of life were asked to reflect on existing, emerging, and hypothetical speech technologies. Our socio-technical scenarios are designed to provoke and stimulate debate and discussion on principles of trust, privacy, responsibility, fairness, and transparency. We found that gender is a statistically significant factor correlated to awareness of rights and trust. We also found that awareness of rights is statistically correlated to perceptions of trust and responsible use of speech technology. By understanding the notions of responsibility in behaviour and differing perspectives of trust, our work encapsulates the current state of public acceptance of speech technology in the UK. Such an understanding has the potential to affect how regulatory and policy frameworks are developed, how the UK invests in its AI research and development ecosystem, and how speech technology that is developed within the UK might be received by global stakeholders.
Citations: 0
Advanced noise-aware speech enhancement algorithm via adaptive dictionary selection based on compressed sensing in the time-frequency domain
IF 3.4 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-09-11 · DOI: 10.1016/j.csl.2025.101887
Naser Sharafi , Salman Karimi , Samira Mavaddati
Speech signal enhancement and noise reduction play a vital role in applications such as telecommunications, audio broadcasting, and military systems. This paper proposes a novel speech enhancement method based on compressive sensing principles in the time-frequency domain, incorporating sparse representation and dictionary learning techniques. The proposed method constructs an optimal dictionary of atoms that can sparsely represent clean speech signals. A key component of the framework is a noise-aware block, which leverages multiple pre-trained noise dictionaries along with the spectral features of noisy speech to build a composite noise model. It isolates noise-only segments, computes their sparse coefficients, and evaluates energy contributions across all candidate dictionaries. The dictionary with the highest energy is then selected as the dominant noise type. The algorithm dynamically adapts to handle unseen noise types by selecting the most similar noise structure present in the dictionary pool, offering a degree of generalization. The proposed system is evaluated under three clearly defined scenarios: (i) using a baseline sparse representation model, (ii) incorporating dictionary learning with a fixed noise model, and (iii) employing the full adaptive noise-aware framework. The method demonstrates strong performance against nine types of noise (non-stationary, periodic, and static) across a wide SNR range (-5 dB to +20 dB). On average, it yields 16.71 % improvement in PESQ and 3.39 % in STOI compared to existing techniques. Simulation results confirm the superiority of the proposed approach in both noise suppression and speech intelligibility, highlighting its potential as a robust tool for speech enhancement in real-world noisy environments.
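The dictionary-selection step in the noise-aware block can be pictured with a small numpy sketch: code a noise-only segment against each pre-trained noise dictionary and keep the dictionary that captures the most energy. A plain least-squares fit stands in for the sparse solver a real implementation would use, and the dictionary shapes and data are synthetic placeholders.

```python
import numpy as np

def select_noise_dictionary(noise_segment, dictionaries):
    """Pick the pre-trained noise dictionary that best explains a noise-only
    spectrogram segment, by the energy its coefficients capture."""
    energies = []
    for D in dictionaries:                               # D: (n_freq_bins, n_atoms)
        coeffs, *_ = np.linalg.lstsq(D, noise_segment, rcond=None)
        recon = D @ coeffs
        energies.append(float(np.sum(recon ** 2)))       # energy captured by this dictionary
    return int(np.argmax(energies)), energies

rng = np.random.default_rng(0)
dicts = [rng.standard_normal((64, 20)) for _ in range(3)]   # placeholder noise dictionaries
segment = dicts[1] @ rng.standard_normal((20, 5))           # noise drawn from dictionary #1
chosen, _ = select_noise_dictionary(segment, dicts)
print("selected noise type:", chosen)                       # expected: 1
```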
Citations: 0
Electroglottography-based speech content classification using stacked BiLSTM-FCN network for clinical applications
IF 3.4 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-09-04 · DOI: 10.1016/j.csl.2025.101886
Srinidhi Kanagachalam, Deok-Hwan Kim
In this study, we introduce a new approach to classifying human speech content based on Electroglottographic (EGG) signals. In general, identifying human speech using EGG signals is challenging and largely unaddressed, as human speech may contain pathology due to vocal cord damage. In this paper, we propose a deep learning-based approach called Stacked BiLSTM-FCN to identify speech content for both healthy and pathological speakers. This deep learning-based technique integrates a recurrent neural network (RNN) that utilizes bidirectional long short-term memory (BiLSTM) with a convolutional network that uses a squeeze-and-excitation layer; it learns features from the EGG signals and classifies them based on the learned features. Experiments on the existing Saarbruecken Voice Database (SVD) dataset, which contains healthy and pathological voices with different pitch levels, showed an accuracy of 92.09% for the proposed model. Further evaluations demonstrate the generalization performance and robustness of the proposed method for application in clinical laboratories to identify speech content with different pathologies and varying accent types.
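For illustration, a minimal PyTorch version of a stacked BiLSTM branch combined with a convolutional (FCN) branch gated by squeeze-and-excitation might look like the sketch below; the layer sizes, number of classes, and pooling choices are assumptions, not the architecture reported in the paper.

```python
import torch
import torch.nn as nn

class SqueezeExcite1d(nn.Module):
    """Channel-wise squeeze-and-excitation gate for 1-D feature maps."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(channels, channels // reduction), nn.ReLU(),
                                nn.Linear(channels // reduction, channels), nn.Sigmoid())
    def forward(self, x):                        # x: (B, C, T)
        w = self.fc(x.mean(dim=-1))              # squeeze over time, excite per channel
        return x * w.unsqueeze(-1)

class BiLSTMFCN(nn.Module):
    """Illustrative stacked BiLSTM + FCN classifier for EGG feature sequences."""
    def __init__(self, n_features=40, n_classes=10, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2,
                            bidirectional=True, batch_first=True)
        self.fcn = nn.Sequential(
            nn.Conv1d(n_features, 128, 8, padding="same"), nn.BatchNorm1d(128), nn.ReLU(),
            SqueezeExcite1d(128),
            nn.Conv1d(128, 256, 5, padding="same"), nn.BatchNorm1d(256), nn.ReLU(),
            SqueezeExcite1d(256),
            nn.Conv1d(256, 128, 3, padding="same"), nn.BatchNorm1d(128), nn.ReLU())
        self.out = nn.Linear(2 * hidden + 128, n_classes)
    def forward(self, x):                        # x: (B, T, n_features)
        rnn_out, _ = self.lstm(x)
        rnn_feat = rnn_out[:, -1]                # last time step of the BiLSTM branch
        conv_feat = self.fcn(x.transpose(1, 2)).mean(dim=-1)   # global average pool
        return self.out(torch.cat([rnn_feat, conv_feat], dim=-1))

model = BiLSTMFCN()
print(model(torch.randn(2, 200, 40)).shape)      # torch.Size([2, 10])
```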
Citations: 0
Deep feature representations and fusion strategies for speech emotion recognition from acoustic and linguistic modalities: A systematic review
IF 3.4 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-09-01 · DOI: 10.1016/j.csl.2025.101873
Andrea Chaves-Villota , Ana Jimenez-Martín , Mario Jojoa-Acosta , Alfonso Bahillo , Juan Jesús García-Domínguez
Emotion Recognition (ER) has gained significant attention due to its importance in advanced human-machine interaction and its widespread real-world applications. In recent years, research on ER systems has focused on multiple key aspects, including the development of high-quality emotional databases, the selection of robust feature representations, and the implementation of advanced classifiers leveraging AI-based techniques. Despite this progress, ER still faces significant challenges and gaps that must be addressed to develop accurate and reliable systems. To systematically assess these critical aspects, particularly those centered on AI-based techniques, we employed the PRISMA methodology. We include journal and conference papers that provide essential insights into key parameters required for dataset development, covering emotion modeling (categorical or dimensional), the type of speech data (natural, acted, or elicited), the most common modalities integrated with acoustic and linguistic data from speech, and the technologies used. Following this methodology, we identified the key representative features that serve as critical sources of emotional information in both modalities. For the acoustic modality, these included features extracted from the time and frequency domains, while for the linguistic modality, earlier word embeddings and the most common transformer models were considered. In addition, Deep Learning (DL) and attention-based methods were analyzed for both. Given the importance of effectively combining these diverse features for improving ER, we then explore fusion techniques based on the level of abstraction. Specifically, we focus on traditional approaches, including feature-, decision-, DL-, and attention-based fusion methods. Next, we provide a comparative analysis to assess the performance of the approaches included in our study. Our findings indicate that for the most commonly used datasets in the literature, IEMOCAP and MELD, the integration of acoustic and linguistic features reached weighted accuracies (WA) of 85.71% and 63.80%, respectively. Finally, we discuss the main challenges and propose future guidelines that could enhance the performance of ER systems using acoustic and linguistic features from speech.
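As a concrete reference point for the fusion levels discussed in the review, the short numpy sketch below contrasts feature-level (early) fusion, which concatenates modality embeddings before a single classifier, with decision-level (late) fusion, which combines per-modality class posteriors; the embedding sizes and posterior values are made-up placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
acoustic_emb = rng.standard_normal(192)      # e.g. an utterance-level acoustic embedding
linguistic_emb = rng.standard_normal(768)    # e.g. a transformer sentence embedding

# Feature-level (early) fusion: concatenate embeddings before classification.
fused_features = np.concatenate([acoustic_emb, linguistic_emb])   # shape (960,)

# Decision-level (late) fusion: combine class posteriors from two unimodal classifiers.
def late_fusion(p_acoustic, p_linguistic, w=0.5):
    """Weighted average of per-modality class posteriors, renormalized."""
    p = w * p_acoustic + (1.0 - w) * p_linguistic
    return p / p.sum()

p_a = np.array([0.6, 0.2, 0.1, 0.1])         # placeholder posteriors over 4 emotions
p_l = np.array([0.3, 0.4, 0.2, 0.1])
print(fused_features.shape, late_fusion(p_a, p_l))
```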
Citations: 0