
IEEE/ACM Transactions on Audio, Speech, and Language Processing: Latest Publications

CoNeTTE: An Efficient Audio Captioning System Leveraging Multiple Datasets With Task Embedding
IF 4.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Acoustics) | Pub Date: 2024-07-18 | DOI: 10.1109/TASLP.2024.3430813
Étienne Labbé;Thomas Pellegrini;Julien Pinquier
Automated Audio Captioning (AAC) involves generating natural language descriptions of audio content, using encoder-decoder architectures. An audio encoder produces audio embeddings fed to a decoder, usually a Transformer decoder, for caption generation. In this work, we describe our model, whose novelty, compared to existing models, lies in the use of a ConvNeXt architecture as audio encoder, adapted from the vision domain to audio classification. This model, called CNext-trans, achieved state-of-the-art scores on the AudioCaps (AC) dataset and performed competitively on Clotho (CL), while using four to forty times fewer parameters than existing models. We examine potential biases in the AC dataset due to its origin from AudioSet by investigating an unbiased encoder's impact on performance. Using the well-known PANN's CNN14, for instance, as an unbiased encoder, we observed a 0.017 absolute reduction in SPIDEr score (where higher scores indicate better performance). To improve cross-dataset performance, we conducted experiments by combining multiple AAC datasets (AC, CL, MACS, WavCaps) for training. Although this strategy enhanced overall model performance across datasets, it still fell short compared to models trained specifically on a single target dataset, indicating the absence of a one-size-fits-all model. To mitigate performance gaps between datasets, we introduced a Task Embedding (TE) token, allowing the model to identify the source dataset for each input sample. We provide insights into the impact of these TEs on both the form (words) and content (sound event types) of the generated captions. The resulting model, named CoNeTTE, an unbiased CNext-trans model enriched with dataset-specific Task Embeddings, achieved SPIDEr scores of 0.467 and 0.310 on AC and CL, respectively.
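The Task Embedding idea can be illustrated with a short sketch: one learned vector per source dataset is prepended to the decoder's input sequence, so the model knows which dataset's captioning style to produce. This is not the authors' code; the dimensionality, dataset names, and the `build_decoder_input` helper are assumptions, and random vectors stand in for learned embeddings.

```python
import numpy as np

# Illustrative sketch: prepending a dataset-specific Task Embedding (TE) token
# to the decoder input so the model can identify the source dataset.
rng = np.random.default_rng(0)
d_model = 256
datasets = ["audiocaps", "clotho", "macs", "wavcaps"]

# One learned vector per source dataset (random stand-ins here).
task_embeddings = {name: rng.normal(size=d_model) for name in datasets}

def build_decoder_input(caption_token_embs: np.ndarray, dataset: str) -> np.ndarray:
    """Prepend the TE vector for `dataset` to the caption token embeddings.

    caption_token_embs: (seq_len, d_model) embeddings of the caption tokens.
    Returns an array of shape (seq_len + 1, d_model).
    """
    te = task_embeddings[dataset][None, :]          # (1, d_model)
    return np.concatenate([te, caption_token_embs], axis=0)

caption = rng.normal(size=(12, d_model))             # 12 caption tokens
decoder_input = build_decoder_input(caption, "clotho")
print(decoder_input.shape)                           # (13, 256)
```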
{"title":"CoNeTTE: An Efficient Audio Captioning System Leveraging Multiple Datasets With Task Embedding","authors":"Étienne Labbé;Thomas Pellegrini;Julien Pinquier","doi":"10.1109/TASLP.2024.3430813","DOIUrl":"10.1109/TASLP.2024.3430813","url":null,"abstract":"Automated Audio Captioning (AAC) involves generating natural language descriptions of audio content, using encoder-decoder architectures. An audio encoder produces audio embeddings fed to a decoder, usually a Transformer decoder, for caption generation. In this work, we describe our model, which novelty, compared to existing models, lies in the use of a ConvNeXt architecture as audio encoder, adapted from the vision domain to audio classification. This model, called CNext-trans, achieved state-of-the-art scores on the AudioCaps (AC) dataset and performed competitively on Clotho (CL), while using four to forty times fewer parameters than existing models. We examine potential biases in the AC dataset due to its origin from AudioSet by investigating unbiased encoder's impact on performance. Using the well-known PANN's CNN14, for instance, as an unbiased encoder, we observed a 0.017 absolute reduction in SPIDEr score (where higher scores indicate better performance). To improve cross-dataset performance, we conducted experiments by combining multiple AAC datasets (AC, CL, MACS, WavCaps) for training. Although this strategy enhanced overall model performance across datasets, it still fell short compared to models trained specifically on a single target dataset, indicating the absence of a one-size-fits-all model. To mitigate performance gaps between datasets, we introduced a Task Embedding (TE) token, allowing the model to identify the source dataset for each input sample. We provide insights into the impact of these TEs on both the form (words) and content (sound event types) of the generated captions. The resulting model, named CoNeTTE, an unbiased CNext-trans model enriched with dataset-specific Task Embeddings, achieved SPIDEr scores of 0.467 and 0.310 on AC and CL, respectively.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"3785-3794"},"PeriodicalIF":4.1,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141745639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
The VoicePrivacy 2022 Challenge: Progress and Perspectives in Voice Anonymisation
IF 4.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Acoustics) | Pub Date: 2024-07-18 | DOI: 10.1109/TASLP.2024.3430530
Michele Panariello;Natalia Tomashenko;Xin Wang;Xiaoxiao Miao;Pierre Champion;Hubert Nourtel;Massimiliano Todisco;Nicholas Evans;Emmanuel Vincent;Junichi Yamagishi
The VoicePrivacy Challenge promotes the development of voice anonymisation solutions for speech technology. In this paper we present a systematic overview and analysis of the second edition held in 2022. We describe the voice anonymisation task and datasets used for system development and evaluation, present the different attack models used for evaluation, and the associated objective and subjective metrics. We describe three anonymisation baselines, provide a summary description of the anonymisation systems developed by challenge participants, and report objective and subjective evaluation results for all. In addition, we describe post-evaluation analyses and a summary of related work reported in the open literature. Results show that solutions based on voice conversion better preserve utility, that an alternative which combines automatic speech recognition with synthesis achieves greater privacy, and that a privacy-utility trade-off remains inherent to current anonymisation solutions. Finally, we present our ideas and priorities for future VoicePrivacy Challenge editions.
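One of the objective metrics typically reported in this setting is the equal error rate (EER) of the speaker-verification attacker: the higher the attacker's EER on anonymised speech, the stronger the privacy. The following is a minimal, self-contained sketch of an EER computation on made-up attacker scores, not challenge code or data.

```python
import numpy as np

# Sketch of an objective privacy metric: EER of an ASV attacker.
# The score arrays below are synthetic stand-ins.
def equal_error_rate(target_scores, nontarget_scores):
    scores = np.concatenate([target_scores, nontarget_scores])
    labels = np.concatenate([np.ones_like(target_scores), np.zeros_like(nontarget_scores)])
    order = np.argsort(scores)
    labels = labels[order]
    n_target = labels.sum()
    n_nontarget = len(labels) - n_target
    misses = np.cumsum(labels) / n_target                      # targets below threshold
    false_alarms = 1.0 - np.cumsum(1 - labels) / n_nontarget   # non-targets above it
    idx = np.argmin(np.abs(misses - false_alarms))             # operating point where rates cross
    return (misses[idx] + false_alarms[idx]) / 2

rng = np.random.default_rng(0)
eer = equal_error_rate(rng.normal(1.0, 1.0, 1000), rng.normal(-1.0, 1.0, 1000))
print(f"EER = {eer:.3f}")
```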
{"title":"The VoicePrivacy 2022 Challenge: Progress and Perspectives in Voice Anonymisation","authors":"Michele Panariello;Natalia Tomashenko;Xin Wang;Xiaoxiao Miao;Pierre Champion;Hubert Nourtel;Massimiliano Todisco;Nicholas Evans;Emmanuel Vincent;Junichi Yamagishi","doi":"10.1109/TASLP.2024.3430530","DOIUrl":"10.1109/TASLP.2024.3430530","url":null,"abstract":"The VoicePrivacy Challenge promotes the development of voice anonymisation solutions for speech technology. In this paper we present a systematic overview and analysis of the second edition held in 2022. We describe the voice anonymisation task and datasets used for system development and evaluation, present the different attack models used for evaluation, and the associated objective and subjective metrics. We describe three anonymisation baselines, provide a summary description of the anonymisation systems developed by challenge participants, and report objective and subjective evaluation results for all. In addition, we describe post-evaluation analyses and a summary of related work reported in the open literature. Results show that solutions based on voice conversion better preserve utility, that an alternative which combines automatic speech recognition with synthesis achieves greater privacy, and that a privacy-utility trade-off remains inherent to current anonymisation solutions. Finally, we present our ideas and priorities for future VoicePrivacy Challenge editions.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"3477-3491"},"PeriodicalIF":4.1,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141745638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Adapting Knowledge for Few-shot Table-to-Text Generation
IF 5.4 | CAS Tier 2 (Computer Science) | JCR Q1 (Acoustics) | Pub Date: 2024-07-18 | DOI: 10.1109/taslp.2024.3430480
Zhixin Guo, Mingxuan Yan, Jiexing Qi, Jianping Zhou, Ziwei He, Guanjie Zheng, Xinbing Wang, Chenghu Zhou
{"title":"Adapting Knowledge for Few-shot Table-to-Text Generation","authors":"Zhixin Guo, Mingxuan Yan, Jiexing Qi, Jianping Zhou, Ziwei He, Guanjie Zheng, Xinbing Wang, Chenghu Zhou","doi":"10.1109/taslp.2024.3430480","DOIUrl":"https://doi.org/10.1109/taslp.2024.3430480","url":null,"abstract":"","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"38 1","pages":""},"PeriodicalIF":5.4,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141745641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
ASiT: Local-Global Audio Spectrogram Vision Transformer for Event Classification
IF 4.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Acoustics) | Pub Date: 2024-07-16 | DOI: 10.1109/TASLP.2024.3428908
Sara Atito Ali Ahmed;Muhammad Awais;Wenwu Wang;Mark D. Plumbley;Josef Kittler
Transformers, which were originally developed for natural language processing, have recently generated significant interest in the computer vision and audio communities due to their flexibility in learning long-range relationships. Constrained by the data hungry nature of transformers and the limited amount of labelled data, most transformer-based models for audio tasks are finetuned from ImageNet pretrained models, despite the huge gap between the domain of natural images and audio. This has motivated the research in self-supervised pretraining of audio transformers, which reduces the dependency on large amounts of labeled data and focuses on extracting concise representations of audio spectrograms. In this paper, we propose Local-Global Audio Spectrogram vIsion Transformer, namely ASiT, a novel self-supervised learning framework that captures local and global contextual information by employing group masked model learning and self-distillation. We evaluate our pretrained models on both audio and speech classification tasks, including audio event classification, keyword spotting, and speaker identification. We further conduct comprehensive ablation studies, including evaluations of different pretraining strategies. The proposed ASiT framework significantly boosts the performance on all tasks and sets a new state-of-the-art performance in five audio and speech classification tasks, outperforming recent methods, including the approaches that use additional datasets for pretraining.
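A rough sketch of the corruption step behind group masked model learning is given below: contiguous groups of spectrogram patches are hidden before the student network sees them, while a teacher would see the clean input for self-distillation (the teacher and loss are omitted here). Patch counts, the group size, and the `group_mask` helper are assumptions, not the ASiT implementation.

```python
import numpy as np

# Illustrative group masking of flattened spectrogram patches.
rng = np.random.default_rng(0)

def group_mask(spec_patches: np.ndarray, group: int = 4, mask_ratio: float = 0.5):
    """Mask contiguous groups of patches along the time axis.

    spec_patches: (num_patches, patch_dim) flattened spectrogram patches.
    Returns the masked patches and a boolean mask (True = masked).
    """
    n = spec_patches.shape[0]
    mask = np.zeros(n, dtype=bool)
    n_groups = int(np.ceil(n / group))
    chosen = rng.choice(n_groups, size=int(mask_ratio * n_groups), replace=False)
    for g in chosen:
        mask[g * group:(g + 1) * group] = True
    masked = spec_patches.copy()
    masked[mask] = 0.0                     # replace masked groups with zeros
    return masked, mask

patches = rng.normal(size=(64, 16 * 16))   # e.g. 64 patches of 16x16 bins
masked, mask = group_mask(patches)
print(mask.sum(), "of", len(mask), "patches masked")
```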
{"title":"ASiT: Local-Global Audio Spectrogram Vision Transformer for Event Classification","authors":"Sara Atito Ali Ahmed;Muhammad Awais;Wenwu Wang;Mark D. Plumbley;Josef Kittler","doi":"10.1109/TASLP.2024.3428908","DOIUrl":"10.1109/TASLP.2024.3428908","url":null,"abstract":"Transformers, which were originally developed for natural language processing, have recently generated significant interest in the computer vision and audio communities due to their flexibility in learning long-range relationships. Constrained by the data hungry nature of transformers and the limited amount of labelled data, most transformer-based models for audio tasks are finetuned from ImageNet pretrained models, despite the huge gap between the domain of natural images and audio. This has motivated the research in self-supervised pretraining of audio transformers, which reduces the dependency on large amounts of labeled data and focuses on extracting concise representations of audio spectrograms. In this paper, we propose \u0000<bold>L</b>\u0000ocal-\u0000<bold>G</b>\u0000lobal \u0000<bold>A</b>\u0000udio \u0000<bold>S</b>\u0000pectrogram v\u0000<bold>I</b>\u0000sion \u0000<bold>T</b>\u0000ransformer, namely ASiT, a novel self-supervised learning framework that captures local and global contextual information by employing group masked model learning and self-distillation. We evaluate our pretrained models on both audio and speech classification tasks, including audio event classification, keyword spotting, and speaker identification. We further conduct comprehensive ablation studies, including evaluations of different pretraining strategies. The proposed ASiT framework significantly boosts the performance on all tasks and sets a new state-of-the-art performance in five audio and speech classification tasks, outperforming recent methods, including the approaches that use additional datasets for pretraining.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"3684-3693"},"PeriodicalIF":4.1,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141720828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Investigating Long-Term and Short-Term Time-Varying Speaker Verification
IF 4.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Acoustics) | Pub Date: 2024-07-16 | DOI: 10.1109/TASLP.2024.3428910
Xiaoyi Qin;Na Li;Shufei Duan;Ming Li
The performance of speaker verification systems can be adversely affected by time domain variations. However, limited research has been conducted on time-varying speaker verification due to the absence of appropriate datasets. This paper aims to investigate the impact of long-term and short-term time variations in speaker verification and proposes solutions to mitigate these effects. For long-term speaker verification (i.e., cross-age speaker verification), we introduce an age-decoupling adversarial learning method to learn age-invariant speaker representations by mining age information from the VoxCeleb dataset. For short-term speaker verification, we collect the SMIIP-TimeVarying (SMIIP-TV) Dataset, which includes recordings at multiple time slots every day from 373 speakers for 90 consecutive days and other relevant meta information. Using this dataset, we analyze the time variation of speaker embeddings and propose a novel but realistic time-varying speaker verification task, termed incremental sequence-pair speaker verification. This task involves continuous interaction between enrollment audios and a sequence of testing audios with the aim of improving performance over time. We introduce the template updating method to counter the negative effects over time, formulate the template updating process as a Markov Decision Process, and propose a template updating method based on deep reinforcement learning (DRL). The policy network of DRL is treated as an agent that determines whether and by how much the template should be updated. In summary, this paper releases our collected database, investigates both the long-term and short-term time-varying scenarios, and provides insights and solutions into time-varying speaker verification.
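The template updating idea can be sketched as follows: after each accepted test utterance, the enrollment template is blended with the new embedding, where the fixed blending weight stands in for the action the paper's DRL policy would output (whether and how much to update). This is a minimal illustration with random embeddings, not the proposed method.

```python
import numpy as np

# Minimal template-updating sketch for incremental sequence-pair verification.
def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def update_template(template, test_emb, alpha, accept_threshold=0.5):
    score = cosine(template, test_emb)
    accepted = score >= accept_threshold
    if accepted and alpha > 0:
        # Blend the accepted embedding into the template, then renormalize.
        template = (1 - alpha) * template + alpha * test_emb / np.linalg.norm(test_emb)
        template /= np.linalg.norm(template)
    return template, score, accepted

rng = np.random.default_rng(0)
template = rng.normal(size=192)
template /= np.linalg.norm(template)
for day in range(5):                                  # a short testing sequence
    test = template + 0.05 * rng.normal(size=192)     # same speaker, slight drift
    template, score, ok = update_template(template, test, alpha=0.2)
    print(f"day {day}: score={score:.3f} accepted={ok}")
```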
{"title":"Investigating Long-Term and Short-Term Time-Varying Speaker Verification","authors":"Xiaoyi Qin;Na Li;Shufei Duan;Ming Li","doi":"10.1109/TASLP.2024.3428910","DOIUrl":"10.1109/TASLP.2024.3428910","url":null,"abstract":"The performance of speaker verification systems can be adversely affected by time domain variations. However, limited research has been conducted on time-varying speaker verification due to the absence of appropriate datasets. This paper aims to investigate the impact of long-term and short-term time-varying in speaker verification and proposes solutions to mitigate these effects. For long-term speaker verification (i.e., cross-age speaker verification), we introduce an age-decoupling adversarial learning method to learn age-invariant speaker representation by mining age information from the VoxCeleb dataset. For short-term speaker verification, we collect the SMIIP-TimeVarying (SMIIP-TV) Dataset, which includes recordings at multiple time slots every day from 373 speakers for 90 consecutive days and other relevant meta information. Using this dataset, we analyze the time-varying of speaker embeddings and propose a novel but realistic time-varying speaker verification task, termed incremental sequence-pair speaker verification. This task involves continuous interaction between enrollment audios and a sequence of testing audios with the aim of improving performance over time. We introduce the template updating method to counter the negative effects over time, and then formulate the template updating processing as a Markov Decision Process and propose a template updating method based on deep reinforcement learning (DRL). The policy network of DRL is treated as an agent to determine if and how much should the template be updated. In summary, this paper releases our collected database, investigates both the long-term and short-term time-varying scenarios and provides insights and solutions into time-varying speaker verification.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"3408-3423"},"PeriodicalIF":4.1,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141720854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
From Raw Speech to Fixed Representations: A Comprehensive Evaluation of Speech Embedding Techniques
IF 4.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Acoustics) | Pub Date: 2024-07-12 | DOI: 10.1109/TASLP.2024.3426301
Dejan Porjazovski;Tamás Grósz;Mikko Kurimo
Speech embeddings, fixed-size representations derived from raw audio data, play a crucial role in diverse machine learning applications. Despite the abundance of speech embedding techniques, selecting the most suitable one remains challenging. Existing studies often focus on intrinsic or extrinsic aspects, seldom exploring both simultaneously. Furthermore, comparing the state-of-the-art pre-trained models with prior speech embedding solutions is notably scarce in the literature. To address these gaps, we undertake a comprehensive evaluation of both small and large-scale speech embedding models, which, in our opinion, needs to incorporate both intrinsic and extrinsic assessments. The intrinsic experiments delve into the models' ability to pick speaker-related characteristics and assess their discriminative capacities, providing insights into their inherent capabilities and internal workings. Concurrently, the extrinsic experiments evaluate whether the models learned semantic cues during pre-training. The findings underscore the superior performance of the large-scale pre-trained models, albeit at an elevated computational cost. The base self-supervised models show comparable results to their large counterparts, making them a better choice for many applications. Furthermore, we show that by selecting the most crucial dimensions, the models' performance often does not suffer drastically and even improves in some cases. This research contributes valuable insights into the nuanced landscape of speech embeddings, aiding researchers and practitioners in making informed choices for various applications.
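A simple way to picture the dimension-selection experiment is sketched below: embedding dimensions are ranked by a class-separability score on a labelled probe set, and only the top-ranked ones are kept before re-evaluating a lightweight classifier. The scoring rule, the nearest-centroid probe, and the synthetic data are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

# Rank embedding dimensions by between-class variance, keep the top-k,
# and compare a nearest-centroid probe on full vs. reduced embeddings.
rng = np.random.default_rng(0)
n_classes, per_class, dim = 5, 40, 256
labels = np.repeat(np.arange(n_classes), per_class)
# Synthetic embeddings where only the first 32 dimensions carry class signal.
emb = rng.normal(size=(n_classes * per_class, dim))
emb[:, :32] += labels[:, None] * 0.8

class_means = np.stack([emb[labels == c].mean(axis=0) for c in range(n_classes)])
between_class_var = class_means.var(axis=0)            # per-dimension score
top_k = np.argsort(between_class_var)[::-1][:32]        # keep the 32 best dims

def nearest_centroid_accuracy(x, y):
    centroids = np.stack([x[y == c].mean(axis=0) for c in range(n_classes)])
    pred = np.argmin(((x[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    return (pred == y).mean()

print("full :", nearest_centroid_accuracy(emb, labels))
print("top-k:", nearest_centroid_accuracy(emb[:, top_k], labels))
```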
{"title":"From Raw Speech to Fixed Representations: A Comprehensive Evaluation of Speech Embedding Techniques","authors":"Dejan Porjazovski;Tamás Grósz;Mikko Kurimo","doi":"10.1109/TASLP.2024.3426301","DOIUrl":"10.1109/TASLP.2024.3426301","url":null,"abstract":"Speech embeddings, fixed-size representations derived from raw audio data, play a crucial role in diverse machine learning applications. Despite the abundance of speech embedding techniques, selecting the most suitable one remains challenging. Existing studies often focus on intrinsic or extrinsic aspects, seldom exploring both simultaneously. Furthermore, comparing the state-of-the-art pre-trained models with prior speech embedding solutions is notably scarce in the literature. To address these gaps, we undertake a comprehensive evaluation of both small and large-scale speech embedding models, which, in our opinion, needs to incorporate both intrinsic and extrinsic assessments. The intrinsic experiments delve into the models' ability to pick speaker-related characteristics and assess their discriminative capacities, providing insights into their inherent capabilities and internal workings. Concurrently, the extrinsic experiments evaluate whether the models learned semantic cues during pre-training. The findings underscore the superior performance of the large-scale pre-trained models, albeit at an elevated computational cost. The base self-supervised models show comparable results to their large counterparts, making them a better choice for many applications. Furthermore, we show that by selecting the most crucial dimensions, the models' performance often does not suffer drastically and even improves in some cases. This research contributes valuable insights into the nuanced landscape of speech embeddings, aiding researchers and practitioners in making informed choices for various applications.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"3546-3560"},"PeriodicalIF":4.1,"publicationDate":"2024-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10596685","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141613380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Implicit Self-Supervised Language Representation for Spoken Language Diarization
IF 4.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Acoustics) | Pub Date: 2024-07-12 | DOI: 10.1109/TASLP.2024.3426978
Jagabandhu Mishra;S. R. Mahadeva Prasanna
The use of spoken language diarization (LD) as a preprocessing system might be essential in a code-switched (CS) scenario. Furthermore, implicit frameworks are preferable to explicit ones, as implicit frameworks can be easily adapted to deal with low/zero resource languages. Inspired by speaker diarization literature, three frameworks based on (a) fixed segmentation, (b) change-point-based segmentation, and (c) end-to-end (E2E) are used in this study to perform LD. The initial exploration on the constructed text-to-speech female language diarization (TTSF-LD) dataset shows that using the x-vector as an implicit language representation with an appropriate analysis window length achieves performance comparable to explicit LD. The best implicit LD performance of 6.4% in terms of Jaccard error rate (JER) is achieved by using the E2E framework. However, using the natural Microsoft CS dataset, the performance of the E2E implicit LD degrades to 60.4% JER. The performance degradation is due to the inability of the x-vector representation to capture language-specific traits. To address this shortcoming, a self-supervised implicit language representation framework is used in this study. Compared to the x-vector representation, the self-supervised representation yields a relative improvement of 63.9%, achieving a JER of 21.8% when used in conjunction with the E2E framework.
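For readers unfamiliar with the metric, the sketch below computes a simplified, frame-level version of the Jaccard error rate between reference and hypothesised language labels; the official JER additionally maps system labels to reference labels, which is omitted here. The toy label sequences are made up.

```python
import numpy as np

# Simplified frame-level Jaccard error rate for two-language diarization,
# assuming system labels already correspond to reference labels.
def jaccard_error_rate(ref, hyp, labels=(0, 1)):
    jers = []
    for lang in labels:
        ref_set = ref == lang
        hyp_set = hyp == lang
        union = np.logical_or(ref_set, hyp_set).sum()
        inter = np.logical_and(ref_set, hyp_set).sum()
        jers.append(1.0 - inter / union if union else 0.0)
    return float(np.mean(jers))

# Toy code-switched utterance: 0 = primary language, 1 = secondary language.
ref = np.array([0] * 50 + [1] * 20 + [0] * 30)
hyp = np.array([0] * 45 + [1] * 30 + [0] * 25)
print(f"JER = {jaccard_error_rate(ref, hyp):.3f}")
```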
{"title":"Implicit Self-Supervised Language Representation for Spoken Language Diarization","authors":"Jagabandhu Mishra;S. R. Mahadeva Prasanna","doi":"10.1109/TASLP.2024.3426978","DOIUrl":"10.1109/TASLP.2024.3426978","url":null,"abstract":"The use of spoken language diarization (LD) as a preprocessing system might be essential in a code-switched (CS) scenario. Furthermore, implicit frameworks are preferable to explicit ones, as implicit frameworks can be easily adapted to deal with low/zero resource languages. Inspired by speaker diarization literature, three frameworks based on (a) fixed segmentation, (b) change-point-based segmentation, and (c) end-to-end (E2E) are used in this study to perform LD. The initial exploration in the constructed text-to-speech female language diarization (TTSF-LD) dataset shows, that using the x-vector as implicit language representation with appropriate analysis window length achieves, comparable performance to explicit LD. The best implicit LD performance of 6.4% in terms of Jaccard error rate (JER) is achieved by using the E2E framework. However, using the natural Microsoft CS dataset, the performance of the E2E implicit LD degrades to 60.4% JER. The performance degradation is due to the inability of the x-vector representation to capture language-specific traits. To address this shortcoming, a self-supervised implicit language representation framework is used in this study. Compared to the x-vector representation, the self-supervised representation yields a relative improvement of 63.9%, achieving a JER of 21.8% when used in conjunction with the E2E framework.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"3393-3407"},"PeriodicalIF":4.1,"publicationDate":"2024-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141613404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Unsupervised Face-Masked Speech Enhancement Using Generative Adversarial Networks With Human-in-the-Loop Assessment Metrics
IF 4.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Acoustics) | Pub Date: 2024-07-12 | DOI: 10.1109/TASLP.2024.3426996
Syu-Siang Wang;Jia-Yang Chen;Bo-Ren Bai;Shih-Hau Fang;Yu Tsao
The utilization of face masks is an essential healthcare measure, particularly during times of pandemics, yet it can present challenges in communication in our daily lives. To address this problem, we propose a novel approach known as the human-in-the-loop StarGAN (HL–StarGAN) face-masked speech enhancement method. HL–StarGAN comprises a discriminator, a classifier, a metric assessment predictor, and a generator that leverages an attention mechanism. The metric assessment predictor, referred to as MaskQSS, incorporates human participants in its development and serves as a “human-in-the-loop” module during the learning process of HL–StarGAN. The overall HL–StarGAN model was trained using an unsupervised learning strategy that simultaneously focuses on the reconstruction of the original clean speech and the optimization of human perception. To implement HL–StarGAN, we created a face-masked speech database named “FMVD,” which comprises recordings from 34 speakers in three distinct face-masked scenarios and a clean condition. We conducted subjective and objective tests on the proposed HL–StarGAN using this database. The outcomes of the test results are as follows: (1) MaskQSS successfully predicted the quality scores of face-masked voices, outperforming several existing speech assessment methods. (2) The integration of the MaskQSS predictor enhanced the ability of HL–StarGAN to transform face-masked voices into high-quality speech; this enhancement is evident in both objective and subjective tests, outperforming conventional StarGAN and CycleGAN-based systems.
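The human-in-the-loop idea amounts to folding a frozen quality predictor into the generator objective, so the generator is also rewarded for outputs that a MaskQSS-like model would score highly. The sketch below shows that loss composition with tiny stand-in networks and assumed loss weights; it is not the HL–StarGAN architecture.

```python
import torch
import torch.nn as nn

# Toy loss composition: adversarial + predicted-quality + reconstruction terms.
generator = nn.Sequential(nn.Linear(257, 64), nn.ReLU(), nn.Linear(64, 257))
discriminator = nn.Sequential(nn.Linear(257, 64), nn.ReLU(), nn.Linear(64, 1))
maskqss = nn.Sequential(nn.Linear(257, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
for p in maskqss.parameters():
    p.requires_grad_(False)              # the quality predictor stays frozen

masked_speech = torch.randn(8, 257)      # toy batch of face-masked spectra
enhanced = generator(masked_speech)

adv_loss = -discriminator(enhanced).mean()        # fool the discriminator
quality_loss = (1.0 - maskqss(enhanced)).mean()   # push predicted quality up
recon_loss = (enhanced - masked_speech).abs().mean()   # toy reconstruction term
generator_loss = adv_loss + 1.0 * quality_loss + 10.0 * recon_loss
generator_loss.backward()
print(float(generator_loss))
```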
{"title":"Unsupervised Face-Masked Speech Enhancement Using Generative Adversarial Networks With Human-in-the-Loop Assessment Metrics","authors":"Syu-Siang Wang;Jia-Yang Chen;Bo-Ren Bai;Shih-Hau Fang;Yu Tsao","doi":"10.1109/TASLP.2024.3426996","DOIUrl":"10.1109/TASLP.2024.3426996","url":null,"abstract":"The utilization of face masks is an essential healthcare measure, particularly during times of pandemics, yet it can present challenges in communication in our daily lives. To address this problem, we propose a novel approach known as the human-in-the-loop StarGAN (HL–StarGAN) face-masked speech enhancement method. HL–StarGAN comprises discriminator, classifier, metric assessment predictor, and generator that leverages an attention mechanism. The metric assessment predictor, referred to as MaskQSS, incorporates human participants in its development and serves as a “human-in-the-loop” module during the learning process of HL–StarGAN. The overall HL–StarGAN model was trained using an unsupervised learning strategy that simultaneously focuses on the reconstruction of the original clean speech and the optimization of human perception. To implement HL–StarGAN, we created a face-masked speech database named “FMVD,” which comprises recordings from 34 speakers in three distinct face-masked scenarios and a clean condition. We conducted subjective and objective tests on the proposed HL–StarGAN using this database. The outcomes of the test results are as follows: (1) MaskQSS successfully predicted the quality scores of face-masked voices, outperforming several existing speech assessment methods. (2) The integration of the MaskQSS predictor enhanced the ability of HL–StarGAN to transform face-masked voices into high-quality speech; this enhancement is evident in both objective and subjective tests, outperforming conventional StarGAN and CycleGAN-based systems.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"3826-3837"},"PeriodicalIF":4.1,"publicationDate":"2024-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10596684","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141613381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Unsupervised Adaptive Speaker Recognition by Coupling-Regularized Optimal Transport
IF 4.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Acoustics) | Pub Date: 2024-07-12 | DOI: 10.1109/TASLP.2024.3426934
Ruiteng Zhang;Jianguo Wei;Xugang Lu;Wenhuan Lu;Di Jin;Lin Zhang;Junhai Xu
Cross-domain speaker recognition (SR) can be improved by unsupervised domain adaptation (UDA) algorithms. UDA algorithms often reduce domain mismatch at the cost of decreasing the discrimination of speaker features. In contrast, optimal transport (OT) has the potential to achieve domain alignment while preserving the speaker discrimination capability in UDA applications; however, naively applying OT to measure global probability distribution discrepancies between the source and target domains may induce negative transports where samples belonging to different speakers are coupled in transportation. These negative transports reduce the SR model's discriminative power, degrading the SR performance. This paper proposes a coupling-regularized optimal transport (CROT) algorithm for cross-domain SR to reduce the negative transport during UDA. In the proposed CROT, two consecutive processing modules regularize the coupling paths for the OT solution: a progressive inter-speaker constraint (PISC) module and a coupling-smoothed regularization (CSR) module. The PISC, designed as a pseudo-label memory bank with curriculum learning, is first applied to select valid samples to guarantee that coupling samples are from the same speaker. The CSR, designed to control the information entropy of the coupling paths further, reduces the effect of negative transport in UDA. To evaluate the effectiveness of the proposed algorithm, cross-domain SR experiments were conducted under different target domains, speaker encoders, corpora, and acoustic features. Experimental results showed that CROT achieved a 50% relative reduction in equal error rates compared to conventional OT-based UDAs, outperforming the state-of-the-art UDAs.
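The building block being regularized here is entropic optimal transport, where the coupling matrix between source- and target-domain samples is computed with Sinkhorn iterations and the entropy weight controls how spread out the coupling paths are. The sketch below is the generic Sinkhorn algorithm on toy embeddings, not the proposed CROT with its PISC and CSR modules.

```python
import numpy as np

# Generic entropic optimal transport via Sinkhorn iterations.
def sinkhorn(cost, eps=0.05, n_iter=200):
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)    # uniform marginals
    K = np.exp(-cost / eps)
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]                 # coupling matrix

rng = np.random.default_rng(0)
src = rng.normal(size=(6, 32))                         # source-domain embeddings
tgt = rng.normal(size=(8, 32))                         # target-domain embeddings
cost = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(-1)
cost /= cost.max()                                     # rescale to avoid exp underflow
coupling = sinkhorn(cost)
print(coupling.shape, round(coupling.sum(), 4))        # (6, 8), mass sums to 1.0
```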
{"title":"Unsupervised Adaptive Speaker Recognition by Coupling-Regularized Optimal Transport","authors":"Ruiteng Zhang;Jianguo Wei;Xugang Lu;Wenhuan Lu;Di Jin;Lin Zhang;Junhai Xu","doi":"10.1109/TASLP.2024.3426934","DOIUrl":"10.1109/TASLP.2024.3426934","url":null,"abstract":"Cross-domain speaker recognition (SR) can be improved by unsupervised domain adaptation (UDA) algorithms. UDA algorithms often reduce domain mismatch at the cost of decreasing the discrimination of speaker features. In contrast, optimal transport (OT) has the potential to achieve domain alignment while preserving the speaker discrimination capability in UDA applications; however, naively applying OT to measure global probability distribution discrepancies between the source and target domains may induce negative transports where samples belonging to different speakers are coupled in transportation. These negative transports reduce the SR model's discriminative power, degrading the SR performance. This paper proposes a coupling-regularized optimal transport (CROT) algorithm for cross-domain SR to reduce the negative transport during UDA. In the proposed CROT, two consecutive processing modules regularize the coupling paths for the OT solution: a progressive inter-speaker constraint (PISC) module and a coupling-smoothed regularization (CSR) module. The PISC, designed as a pseudo-label memory bank with curriculum learning, is first applied to select valid samples to guarantee that coupling samples are from the same speaker. The CSR, designed to control the information entropy of the coupling paths further, reduces the effect of negative transport in UDA. To evaluate the effectiveness of the proposed algorithm, cross-domain SR experiments were conducted under different target domains, speaker encoders, corpora, and acoustic features. Experimental results showed that CROT achieved a 50% relative reduction in equal error rates compared to conventional OT-based UDAs, outperforming the state-of-the-art UDAs.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"3603-3617"},"PeriodicalIF":4.1,"publicationDate":"2024-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141613403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Latent Space Interpolation of Synthesizer Parameters Using Timbre-Regularized Auto-Encoders
IF 4.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Acoustics) | Pub Date: 2024-07-12 | DOI: 10.1109/TASLP.2024.3426987
Gwendal Le Vaillant;Thierry Dutoit
Sound synthesizers are ubiquitous in modern music production but manipulating their presets, i.e. the sets of synthesis parameters, demands expert skills. This study presents a novel variational auto-encoder model tailored for black-box synthesizer preset interpolation, which enables the intuitive generation of new presets from pre-existing ones. Leveraging multi-head self-attention networks, the model efficiently learns latent representations of synthesis parameters, aligning these with perceived timbre dimensions through attribute-based regularization. It is able to gradually transition between diverse presets, surpassing traditional linear parametric interpolation methods. Furthermore, we introduce an objective and reproducible evaluation method, based on linearity and smoothness metrics computed on a broad set of audio features. The model's efficacy is demonstrated through subjective experiments, whose results also highlight significant correlations with the proposed objective metrics. The model is validated using a widespread frequency modulation synthesizer with a large set of interdependent parameters. It can be adapted to various commercial synthesizers, and can perform other tasks such as modulations and extrapolations.
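Latent-space preset interpolation itself is straightforward once an encoder and decoder exist: two presets are encoded, their latent codes are linearly blended, and each blend is decoded back to a full parameter set. The sketch below uses random linear maps as stand-ins for the trained auto-encoder and assumes normalized synthesizer parameters.

```python
import numpy as np

# Illustrative latent-space preset interpolation with stand-in encoder/decoder.
rng = np.random.default_rng(0)
n_params, latent_dim = 144, 16
enc = rng.normal(size=(latent_dim, n_params)) / np.sqrt(n_params)
dec = rng.normal(size=(n_params, latent_dim)) / np.sqrt(latent_dim)

def encode(preset):
    return enc @ preset

def decode(z):
    return np.clip(dec @ z, 0.0, 1.0)                    # normalized synth params

preset_a = rng.uniform(size=n_params)                    # two existing presets
preset_b = rng.uniform(size=n_params)
z_a, z_b = encode(preset_a), encode(preset_b)

steps = np.linspace(0.0, 1.0, 9)                         # 9-step interpolation
interpolated = [decode((1 - t) * z_a + t * z_b) for t in steps]
print(len(interpolated), interpolated[0].shape)          # 9 presets of 144 params
```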
{"title":"Latent Space Interpolation of Synthesizer Parameters Using Timbre-Regularized Auto-Encoders","authors":"Gwendal Le Vaillant;Thierry Dutoit","doi":"10.1109/TASLP.2024.3426987","DOIUrl":"10.1109/TASLP.2024.3426987","url":null,"abstract":"Sound synthesizers are ubiquitous in modern music production but manipulating their presets, i.e. the sets of synthesis parameters, demands expert skills. This study presents a novel variational auto-encoder model tailored for black-box synthesizer preset interpolation, which enables the intuitive generation of new presets from pre-existing ones. Leveraging multi-head self-attention networks, the model efficiently learns latent representations of synthesis parameters, aligning these with perceived timbre dimensions through attribute-based regularization. It is able to gradually transition between diverse presets, surpassing traditional linear parametric interpolation methods. Furthermore, we introduce an objective and reproducible evaluation method, based on linearity and smoothness metrics computed on a broad set of audio features. The model's efficacy is demonstrated through subjective experiments, whose results also highlight significant correlations with the proposed objective metrics. The model is validated using a widespread frequency modulation synthesizer with a large set of interdependent parameters. It can be adapted to various commercial synthesizers, and can perform other tasks such as modulations and extrapolations.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"3379-3392"},"PeriodicalIF":4.1,"publicationDate":"2024-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141613382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0