
Speech Communication: Latest Publications

Lateral channel dynamics and F3 modulation: Quantifying para-sagittal articulation in Australian English /l/
IF 3.0 | CAS Tier 3 (Computer Science) | JCR Q2 (Acoustics) | Pub Date: 2025-12-13 | DOI: 10.1016/j.specom.2025.103345
Jia Ying
This study investigates articulatory-acoustic relationships in Australian English /l/ using simultaneous 3D electromagnetic articulography (EMA) and acoustic recordings from six speakers producing /l/ in onset and coda positions with /æ/ and /ɪ/ vowels. Linear mixed-effects models revealed significant relationships between tongue lateralization and all three formants, with F3 emerging as the primary acoustic correlate of lateralization (β = 0.081, p < 0.001). Acoustic properties of /l/ were strongly influenced by vowel context, with significant vowel-lateralization interactions for F1 and F2, indicating that the acoustic consequences of lateralization vary by vowel environment. Temporal analysis revealed position-dependent timing relationships: F3 preceded articulatory peaks in coda position but showed near-synchronous timing in onset position, while F1 and F2 consistently lagged behind articulatory peaks across all conditions. These findings suggest distinct articulatory-acoustic coupling mechanisms for onset versus coda /l/, with F3 serving as an anticipatory cue in coda position. The results highlight the complex, context-dependent nature of /l/'s articulatory-acoustic relationships and underscore the importance of considering both spectral and temporal dimensions in understanding liquid consonant production.
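As a concrete illustration of the modeling approach, the sketch below fits a linear mixed-effects model of the kind the abstract describes, with F3 predicted from a lateralization index and context, and a by-speaker random intercept. The data file and column names (speaker, vowel, position, lateralization, F3) are illustrative assumptions, not the authors' actual pipeline.

```python
# Minimal sketch, assuming a long-format table of EMA-derived lateralization
# indices paired with formant measurements; file and columns are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("ema_acoustics.csv")  # hypothetical data file

# F3 modeled from lateralization, vowel context, and syllable position,
# with a random intercept per speaker.
model = smf.mixedlm("F3 ~ lateralization * vowel + position",
                    data=df, groups=df["speaker"])
result = model.fit()
print(result.summary())  # the lateralization slope plays the role of the reported beta
```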
Citations: 0
A review on speech emotion recognition for low-resource and Indigenous languages
IF 3.0 | CAS Tier 3 (Computer Science) | JCR Q2 (Acoustics) | Pub Date: 2025-12-09 | DOI: 10.1016/j.specom.2025.103342
Himashi Rathnayake, Jesin James, Gianna Leoni, Ake Nicholas, Catherine Watson, Peter Keegan
Speech emotion recognition (SER) is an emerging field in human–computer interaction. Although numerous studies have focused on SER for well-resourced languages, the literature reveals a significant gap in research on low-resource and Indigenous (LRI) languages. This paper presents a comprehensive review of the existing literature on SER in the context of LRI languages, analysing critical factors to consider at each stage of designing an SER system. The review indicates that most studies on SER for LRI languages adopt emotion categories established for well-resourced languages, often assuming the universality of emotions. However, the literature suggests that this approach may be limited due to emotional disparities influenced by cultural variations. Additionally, the review underscores that current SER systems typically lack community-oriented methodologies in the development of technology for LRI languages. The importance of feature selection is highlighted, with evidence suggesting that a combination of traditional machine learning methods and carefully selected acoustic features may offer viable options for SER in these languages. Furthermore, the review identifies a need for further exploration of semi-supervised and unsupervised approaches to enhance SER capabilities in LRI contexts. Overall, current SER systems for LRI languages lag behind state-of-the-art standards due to the lack of resources, indicating that there is still much work to be done in this area.
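The review's suggestion that traditional machine learning with carefully selected acoustic features remains viable for LRI languages can be made concrete with a small sketch: utterance-level MFCC and pitch statistics fed to an SVM. The corpus-loading helper and labels are hypothetical placeholders, not a system from the surveyed literature.

```python
# Minimal sketch, assuming 16 kHz wav files paired with emotion labels.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def acoustic_features(path):
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)
    # Pool frame-level features into one fixed-length utterance vector.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1),
                           [f0.mean(), f0.std()]])

paths, labels = load_corpus()  # hypothetical helper: wav paths and emotion labels
X = np.stack([acoustic_features(p) for p in paths])
print(cross_val_score(SVC(kernel="rbf"), X, labels, cv=5).mean())
```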
Citations: 0
Bottom-up modeling of phoneme learning: Universal sensitivity and language-specific transformation
IF 3.0 | CAS Tier 3 (Computer Science) | JCR Q2 (Acoustics) | Pub Date: 2025-12-04 | DOI: 10.1016/j.specom.2025.103343
Frank Lihui Tan, Youngah Do
This study investigates the emergence and development of universal phonetic sensitivity during early phonological learning using an unsupervised modeling approach. Autoencoder models were trained on raw acoustic input from English and Mandarin to simulate bottom-up perceptual development, with a focus on phoneme contrast learning. The results show that phoneme-like categories and feature-aligned representational spaces can emerge from context-free acoustic exposure alone. Crucially, the model exhibits universal phonetic sensitivity as a transient developmental stage that varies across contrasts and gradually gives way to language-specific perception—a trajectory that parallels infant perceptual development. Different featural contrasts remain universally discriminable for varying durations over the course of learning. These findings support the view that universal sensitivity is not innately fixed but emerges through learning, and that early phonological development proceeds along a mosaic, feature-dependent trajectory.
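A minimal sketch of the bottom-up setup, under assumptions: an autoencoder reconstructs individual context-free acoustic frames, and its bottleneck is what one would probe for emergent phoneme-like categories. Dimensions are illustrative and the random tensors stand in for real acoustic frames; this is not the paper's configuration.

```python
import torch
import torch.nn as nn

class FrameAutoencoder(nn.Module):
    def __init__(self, n_feats=39, n_hidden=256, n_latent=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_feats, n_hidden), nn.ReLU(),
                                     nn.Linear(n_hidden, n_latent))
        self.decoder = nn.Sequential(nn.Linear(n_latent, n_hidden), nn.ReLU(),
                                     nn.Linear(n_hidden, n_feats))

    def forward(self, x):
        z = self.encoder(x)            # latent code probed for phoneme contrasts
        return self.decoder(z), z

model = FrameAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

frames = torch.randn(64, 39)           # stand-in for a batch of acoustic frames
recon, latent = model(frames)
loss = nn.functional.mse_loss(recon, frames)
loss.backward()
opt.step()
```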
Citations: 0
Speaker-conditioned phrase break prediction for text-to-speech with phoneme-level pre-trained language model
IF 3.0 | CAS Tier 3 (Computer Science) | JCR Q2 (Acoustics) | Pub Date: 2025-11-29 | DOI: 10.1016/j.specom.2025.103331
Dong Yang, Yuki Saito, Takaaki Saeki, Tomoki Koriyama, Wataru Nakata, Detai Xin, Hiroshi Saruwatari
This paper advances phrase break prediction (also known as phrasing) in multi-speaker text-to-speech (TTS) systems. We integrate speaker-specific features by leveraging speaker embeddings to enhance the performance of the phrasing model. We further demonstrate that these speaker embeddings can capture speaker-related characteristics solely from the phrasing task. Besides, we explore the potential of pre-trained speaker embeddings for unseen speakers through a few-shot adaptation method. Furthermore, we pioneer the application of phoneme-level pre-trained language models to this TTS front-end task, which significantly boosts the accuracy of the phrasing model. Our methods are rigorously assessed through both objective and subjective evaluations, demonstrating their effectiveness.
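To make the conditioning idea concrete, here is a schematic sketch: phoneme-level representations are concatenated with a speaker embedding and classified per position as break versus no break. A plain embedding plus BiLSTM stands in for the phoneme-level pre-trained language model; all sizes and the per-phoneme formulation are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class PhraseBreakPredictor(nn.Module):
    def __init__(self, n_phonemes=100, n_speakers=50, d_model=256, d_spk=64):
        super().__init__()
        self.phone_emb = nn.Embedding(n_phonemes, d_model)  # stand-in for a pre-trained phoneme LM
        self.speaker_emb = nn.Embedding(n_speakers, d_spk)
        self.encoder = nn.LSTM(d_model, d_model, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * d_model + d_spk, 2)        # break vs. no break

    def forward(self, phonemes, speaker_id):
        h, _ = self.encoder(self.phone_emb(phonemes))       # (B, T, 2*d_model)
        spk = self.speaker_emb(speaker_id)                  # (B, d_spk)
        spk = spk.unsqueeze(1).expand(-1, h.size(1), -1)    # broadcast over time
        return self.out(torch.cat([h, spk], dim=-1))        # per-position break logits

logits = PhraseBreakPredictor()(torch.randint(0, 100, (2, 20)), torch.tensor([3, 7]))
```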
Citations: 0
Effect of individual characteristics on impressions of one’s own recorded voice
IF 3.0 | CAS Tier 3 (Computer Science) | JCR Q2 (Acoustics) | Pub Date: 2025-11-27 | DOI: 10.1016/j.specom.2025.103335
Hikaru Yanagida, Yusuke Ijima, Naohiro Tawara
This study aims to identify individual characteristics such as age, gender, personality traits, and values that influence the perception of one’s own recorded voice. While previous studies have shown that the perception of one’s own recorded voice is different from that of others, and that these differences are influenced by individual characteristics, only a limited number of individual characteristics were examined in past research. In our study, we conducted a large-scale subjective experiment with 141 Japanese participants using multiple individual characteristics. Participants evaluated impressions of their own recorded voices and the voices of others, and we analyzed the relationship between each of the individual characteristics and the voice impressions. Our findings showed that individual characteristics such as the frequency of listening to one’s own recorded voice (which had not been examined in the previous studies) influenced the perception of one’s own recorded voice. We further analyzed the use of combinations of multiple individual characteristics, including those that influenced impressions in a single use, to predict impressions of one’s own recorded voice and found that they were better predicted by the combination of multiple individual characteristics than by the use of a single individual characteristic.
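The final point, that combinations of characteristics predict impressions better than any single characteristic, is the familiar gain of multiple regression over simple regression. A toy illustration on synthetic data follows; all values and coefficients are invented, not the study's.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 141                                   # matches the study's sample size
X = rng.normal(size=(n, 5))               # five stand-in individual characteristics
y = X @ np.array([0.5, 0.3, 0.2, 0.1, 0.4]) + rng.normal(scale=1.0, size=n)

single = cross_val_score(LinearRegression(), X[:, [0]], y, cv=5).mean()
combined = cross_val_score(LinearRegression(), X, y, cv=5).mean()
print(f"cross-validated R^2, single: {single:.2f}, combined: {combined:.2f}")
```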
Citations: 0
Self-Supervised Learning for Speaker Recognition: A study and review
IF 3.0 | CAS Tier 3 (Computer Science) | JCR Q2 (Acoustics) | Pub Date: 2025-11-24 | DOI: 10.1016/j.specom.2025.103333
Theo Lepage, Reda Dehak
Deep learning models trained in a supervised setting have revolutionized audio and speech processing. However, their performance inherently depends on the quantity of human-annotated data, making them costly to scale and prone to poor generalization under unseen conditions. To address these challenges, Self-Supervised Learning (SSL) has emerged as a promising paradigm, leveraging vast amounts of unlabeled data to learn relevant representations. The application of SSL for Automatic Speech Recognition (ASR) has been extensively studied, but research on other downstream tasks, notably Speaker Recognition (SR), remains in its early stages. This work describes major SSL instance-invariance frameworks (e.g., SimCLR, MoCo, and DINO), initially developed for computer vision, along with their adaptation to SR. Various SSL methods for SR, proposed in the literature and built upon these frameworks, are also presented. An extensive review of these approaches is then conducted: (1) the effect of the main hyperparameters of SSL frameworks is investigated; (2) the role of SSL components is studied (e.g., data-augmentation, projector, positive sampling); and (3) SSL frameworks are evaluated on SR with in-domain and out-of-domain data, using a consistent experimental setup, and a comprehensive comparison of SSL methods from the literature is provided. Specifically, DINO achieves the best downstream performance and effectively models intra-speaker variability, although it is highly sensitive to hyperparameters and training conditions, while SimCLR and MoCo provide robust alternatives that effectively capture inter-speaker variability and are less prone to collapse. This work aims to highlight recent trends and advancements, identifying current challenges in the field.
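Of the frameworks named, SimCLR is the most compact to write down: two augmented views of each utterance are positives and all other items in the batch are negatives, scored with the NT-Xent loss. The sketch below shows that loss applied to speaker embeddings; the encoder outputs are random stand-ins and the temperature is illustrative.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.1):
    # z1, z2: (B, D) embeddings of two augmented views of the same utterances.
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2B, D)
    sim = z @ z.t() / tau                               # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                   # exclude self-similarity
    B = z1.size(0)
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(0, B)])
    return F.cross_entropy(sim, targets)                # each view must pick its pair

z1, z2 = torch.randn(8, 128), torch.randn(8, 128)       # stand-ins for encoder outputs
loss = nt_xent(z1, z2)
```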
Citations: 0
Adaptive weighting in a transformer framework for multimodal emotion recognition
IF 3.0 | CAS Tier 3 (Computer Science) | JCR Q2 (Acoustics) | Pub Date: 2025-11-24 | DOI: 10.1016/j.specom.2025.103332
Weijie Lu, Yunfeng Xu, Jintan Gu
Multimodal dialogue emotion recognition is rapidly emerging as a research hotspot with broad application prospects. In recent years, researchers have invested considerable effort in fusing feature information across modalities, but fine-grained analysis of each modality's features remains insufficient, and differences in how strongly each modality influences the recognition result have not been fully considered. To address this problem, we propose a Transformer-based multimodal interaction model with an adaptive weighted fusion mechanism (TIAWFM). The model effectively captures deep inter-modal correlations in multimodal emotion recognition tasks, mitigating the limitations of unimodal representations. We observe that incorporating specific conversational contexts and dynamically allocating weights to each modality not only fully leverages the model's capabilities but also enables more accurate capture of the emotional information embedded in the features. We conducted extensive experiments on two benchmark multimodal datasets, IEMOCAP and MELD. Experimental results demonstrate that TIAWFM exhibits significant advantages in dynamically integrating multimodal information, leading to notable improvements in both the accuracy and robustness of emotion recognition.
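The abstract does not spell out the fusion mechanism, so the following is only a generic sketch of adaptive weighted fusion: per-utterance weights over the audio, text, and visual streams are predicted from the features themselves and applied as a convex combination. Dimensions and the gating design are assumptions, not the TIAWFM configuration.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    def __init__(self, d=256, n_modalities=3):
        super().__init__()
        self.gate = nn.Linear(n_modalities * d, n_modalities)

    def forward(self, feats):                    # feats: list of (B, d) tensors
        stacked = torch.stack(feats, dim=1)      # (B, M, d)
        w = torch.softmax(self.gate(torch.cat(feats, dim=-1)), dim=-1)  # (B, M)
        return (w.unsqueeze(-1) * stacked).sum(dim=1)  # weighted sum, (B, d)

fused = AdaptiveFusion()([torch.randn(4, 256) for _ in range(3)])
```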
Citations: 0
Towards unsupervised speech recognition without pronunciation models
IF 3.0 | CAS Tier 3 (Computer Science) | JCR Q2 (Acoustics) | Pub Date: 2025-11-15 | DOI: 10.1016/j.specom.2025.103330
Junrui Ni, Liming Wang, Yang Zhang, Kaizhi Qian, Heting Gao, Mark Hasegawa-Johnson, James Glass, Chang D. Yoo
Recent advancements in supervised automatic speech recognition (ASR) have achieved remarkable performance, largely due to the growing availability of large transcribed speech corpora. However, most languages lack sufficient paired speech and text data to effectively train these systems. In this article, we tackle the challenge of developing ASR systems without paired speech and text corpora by proposing the removal of reliance on a phoneme lexicon. We explore a new research direction: word-level unsupervised ASR, and experimentally demonstrate that an unsupervised speech recognizer can emerge from joint speech-to-speech and text-to-text masked token-infilling. Using a curated speech corpus containing a fixed number of English words, our system iteratively refines the word segmentation structure and achieves a word error rate of between 20%–23%, depending on the vocabulary size, without parallel transcripts, oracle word boundaries, or a pronunciation lexicon. This innovative model surpasses the performance of previous unsupervised ASR models under the lexicon-free setting.
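For reference, the word error rate quoted above is the Levenshtein edit distance between hypothesis and reference word sequences (substitutions, insertions, deletions) normalized by reference length; a self-contained implementation:

```python
def wer(ref: str, hyp: str) -> float:
    r, h = ref.split(), hyp.split()
    # Dynamic-programming edit-distance table.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2 edits / 6 words = 0.33
```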
Citations: 0
The discriminative capacity of English segments in forensic speaker comparison
IF 3.0 | CAS Tier 3 (Computer Science) | JCR Q2 (Acoustics) | Pub Date: 2025-11-11 | DOI: 10.1016/j.specom.2025.103329
Paul Foulkes, Vincent Hughes, Kayleigh Peters, Jasmine Rouse
This study compares the relative performance of formant- and MFCC-based analyses of the same dataset, extending the work of Franco-Pedroso and Gonzalez-Rodriguez (2016). Using a corpus of read speech from 24 male English speakers we extracted vowel formant data and segmentally-based MFCCs of all phonemes. Data were taken from three versions of the corpus: 10 min and 3 min samples with wholly automated segment labelling and data extraction (the 10U and 3U datasets), and 3 min samples with manually corrected segment labelling and manual checking of formant tracking (the 3C dataset). The datasets were split in half and used for nine speaker discrimination tests: six tests using formants or MFCCs in each of the 10U, 3U and 3C datasets, and three fused systems combining formants and MFCCs for each dataset.
The formant-based tests revealed that the best performing segments were /ɪ/, /eɪ/, /aɪ/, /e/, /ʌ/ and /əː/. These vowels also performed well in MFCC-based tests, along with the three nasal consonants /m, n, ŋ/ and /k/. Relatively similar patterns were found for the three datasets. There was also a correlation with segment frequency: more frequent phonemes generally yielded better results. In addition, formant-based measures gave better EER and Cllr values than segmentally-based MFCCs. For formants, the best results came from the 10U dataset, while for MFCCs the best results came from the manually corrected 3C dataset. The effect of manual correction was starkest for consonants. Finally, the fused systems performed very well, with both formant- and MFCC-based systems producing EERs close to 0 in some cases. The best systems were those using the 3C dataset. The fused 10U system generally produced notably weaker LLRs, presumably because of the inevitably larger number of data labelling errors.
While the study is not forensically realistic, it has a number of implications for forensic speaker comparison. First, the best performing segments are those vowels in which formant separation is clear, and consonants (nasals) with formant structure. Second, manual correction of data is beneficial, especially for consonants. MFCCs are high dimensional data relative to vowel formants taken at a segment’s midpoint. Misalignment of automated labelling and tracking is thus potentially more likely to have a deleterious effect on MFCCs. While the 10U dataset yielded the best scores for vowel formants, there is a danger that it overestimates the discriminatory power of those segments. A degree of manual correction is therefore worthwhile. Finally, although MFCC data yielded worse scores on a segment by segment basis, the fused system worked very well. Further research is therefore merited on MFCC-based analysis of segments as variables in speaker comparison, and more broadly in phonetic research.
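The two headline metrics, EER and Cllr, can be written down compactly. The sketches below assume lists of same-speaker and different-speaker scores (for EER) and likelihood ratios (for Cllr, the log-likelihood-ratio cost); the inputs are synthetic placeholders, not the study's data.

```python
import numpy as np

def eer(same, diff):
    # Sweep thresholds; EER is where miss rate and false-alarm rate cross.
    ts = np.sort(np.concatenate([same, diff]))
    fnr = np.array([(same < t).mean() for t in ts])
    fpr = np.array([(diff >= t).mean() for t in ts])
    i = int(np.argmin(np.abs(fnr - fpr)))
    return (fnr[i] + fpr[i]) / 2

def cllr(lr_same, lr_diff):
    # Cllr = 0.5 * (mean log2(1 + 1/LR) over same-speaker pairs
    #             + mean log2(1 + LR) over different-speaker pairs)
    return 0.5 * (np.mean(np.log2(1 + 1 / lr_same)) +
                  np.mean(np.log2(1 + lr_diff)))

rng = np.random.default_rng(1)
print(eer(rng.normal(2, 1, 500), rng.normal(0, 1, 500)))                     # synthetic scores
print(cllr(np.exp(rng.normal(1, 1, 500)), np.exp(rng.normal(-1, 1, 500))))  # synthetic LRs
```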
Citations: 0
Ultrasound imaging in second language research: Systematic review and thematic analysis
IF 3.0 | CAS Tier 3 (Computer Science) | JCR Q2 (Acoustics) | Pub Date: 2025-11-01 | DOI: 10.1016/j.specom.2025.103324
Eija M.A. Aalto, Hana Ben Asker, Lucie Ménard, Walcir Cardoso, Catherine Laporte
Several publications have explored second language (L2) articulation through lingual ultrasound imaging technology. This systematic review and thematic analysis collate and evaluate these studies, focusing on methodologies, experimental setups, and findings. The review includes 31 works: 23 on ultrasound biofeedback and 8 on characterizing L2 articulation. English is the predominant language studied (82 % as L1 or L2), with participants mainly young adults (2–60 participants per study). The 23 ultrasound biofeedback studies showed significant variation in session numbers and length, including 16 PICO studies (i.e. study design with participants, intervention, controls/comparison group, outcome) where ultrasound biofeedback was compared to auditory feedback and/or control conditions. Data analysis in the biofeedback studies often included acoustic or perceptual assessments in addition to, or instead of, ultrasound data analysis. Analysis of the results indicates that ultrasound biofeedback is effective for improving L2 articulation. However, the PICO studies revealed that while ultrasound biofeedback may offer certain advantages, these findings remain preliminary and warrant further investigation. Learner characteristics and target selection may affect biofeedback efficacy. Ultrasound also proved valuable for characterizing L2 articulation by showing articulatory and coarticulatory patterns, particularly in English sounds like /ɹ/, /l/, and various vowels. L2 characterization studies frequently used dynamic speech movement analysis. Moving forward, researchers are encouraged to use dynamic movement analysis in biofeedback studies as well, to deepen understanding of articulation processes. Expanding linguistic and demographic diversity in future research is essential to capturing language heterogeneity.
Citations: 0