
Computer Speech and Language: Latest Publications

Modeling the temporal envelope of sub-band signals for improving the performance of children’s speech recognition system in zero-resource scenario
IF 3.4 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-12 · DOI: 10.1016/j.csl.2026.101954
Kaustav Das, Biswaranjan Pattanayak, Gayadhar Pradhan
Children’s KWS (keyword spotting) systems often experience a significant decline in performance when acoustic mismatches occur between training and testing conditions. Although multiple factors contribute to such mismatches, pitch and speaking rate are the two predominant sources. This work proposes a pitch-robust acoustic feature based on the temporal envelope of sub-band signals to develop a children’s KWS system in the zero-resource scenario. To accomplish this, the speech signal is first passed through M non-overlapping band-pass filters arranged on a linear scale to break it down into sub-bands. Then, the temporal envelope of each sub-band signal is estimated using the Hilbert transform. The mean values of the estimated envelopes are computed over an analysis frame and logarithmically compressed to yield an M-dimensional feature vector per analysis frame, here termed the logarithmically compressed averaged temporal envelope of sub-band signals (LC-ATESS). The efficacy of the proposed LC-ATESS feature is tested on a deep neural network-hidden Markov model (DNN-HMM) based acoustic model. The observed KWS results are superior to those of conventional Mel-frequency cepstral coefficients (MFCC), MFCC computed after spectral smoothing, and features calculated from single-frequency spectra, both with and without data augmentation, across clean and noisy test scenarios.
Citations: 0
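As a rough illustration of the LC-ATESS front-end described in the abstract, the Python sketch below decomposes a signal with linearly spaced band-pass filters, takes the Hilbert envelope of each sub-band, and log-compresses the frame-wise envelope means. It is not the authors' implementation: the filter order, band edges, frame length, and shift are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def lc_atess(signal, fs, num_bands=20, frame_len=0.025, frame_shift=0.010, eps=1e-10):
    """Log-compressed averaged temporal envelope of sub-band signals (illustrative sketch)."""
    nyq = fs / 2.0
    edges = np.linspace(50.0, nyq - 50.0, num_bands + 1)      # linearly spaced, non-overlapping bands
    frame = int(round(frame_len * fs))
    shift = int(round(frame_shift * fs))
    n_frames = 1 + max(0, (len(signal) - frame) // shift)
    feats = np.zeros((n_frames, num_bands))
    for b in range(num_bands):
        sos = butter(4, [edges[b], edges[b + 1]], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, signal)
        env = np.abs(hilbert(band))                            # temporal envelope via the Hilbert transform
        for t in range(n_frames):
            seg = env[t * shift:t * shift + frame]
            feats[t, b] = np.log(np.mean(seg) + eps)           # frame-averaged envelope, log-compressed
    return feats                                               # shape: (n_frames, num_bands)

# feats = lc_atess(np.random.randn(16000), fs=16000)          # 1 s of audio -> (n_frames, 20) features
```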
On the use of DiaPer models and matching algorithm for RTVE speaker diarization 2024 dataset
IF 3.4 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-06 · DOI: 10.1016/j.csl.2026.101948
Juan Ignacio Alvarez-Trejos, Sara Barahona, Laura Herrera-Alarcon, Jérémie Touati, Alicia Lozano-Diez
Speaker diarization in broadcast media presents significant challenges due to long-duration recordings, numerous speakers, and complex acoustic conditions. End-to-end neural diarization models like DiaPer (Diarization with Perceiver), which directly predict speaker activity from audio features without intermediate clustering steps, have shown promising results. However, their application to extended recordings remains computationally prohibitive due to quadratic complexity with respect to input length. This paper addresses these limitations by proposing a framework that applies DiaPer to short audio chunks and subsequently reconciles speaker identities across segments using a matching algorithm. We systematically analyze optimal chunk durations for DiaPer processing and introduce an enhanced chunk-matching algorithm leveraging state-of-the-art speaker embeddings, comparing Emphasized Channel Attention, Propagation and Aggregation in Time Delay Neural Network (ECAPA-TDNN), Residual Networks (ResNet), and Reshape Dimensions Network (ReDimNet) architectures. Our experimental evaluation on the challenging Radio Televisión Española (RTVE) datasets shows that ReDimNet embeddings consistently outperform alternatives, achieving substantial improvements in speaker identity consistency across segments. The proposed approach yields a Diarization Error Rate (DER) of 17.34% on the RTVE 2024 test set, which is competitive with state-of-the-art systems while achieving a 63.6% relative improvement over the baseline DiaPer model applied directly to complete audio recordings. This demonstrates that end-to-end neural approaches can be successfully extended to hour-long recordings while maintaining computational efficiency.
Citations: 0
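The chunk-and-reconcile strategy described above can be illustrated with a simple embedding-matching routine: each chunk's local speakers are assigned to a running set of global speakers by maximizing cosine similarity with the Hungarian algorithm, and unmatched speakers open new global identities. This is only a sketch under assumed interfaces (one embedding per local speaker per chunk); the paper's matching algorithm and thresholds may differ.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def relabel_chunks(chunk_embeddings, sim_threshold=0.5):
    """Cross-chunk speaker matching on embedding cosine similarity (sketch)."""
    global_centroids = []           # one running centroid per global speaker
    mappings = []                   # per chunk: local speaker index -> global speaker id
    for emb in chunk_embeddings:    # emb: (n_local_speakers, dim)
        emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
        mapping = {}
        if global_centroids:
            cent = np.stack([c / np.linalg.norm(c) for c in global_centroids])
            sim = emb @ cent.T                                   # cosine similarity matrix
            rows, cols = linear_sum_assignment(-sim)             # maximise total similarity
            for r, c in zip(rows, cols):
                if sim[r, c] >= sim_threshold:
                    mapping[r] = c
                    global_centroids[c] = 0.5 * (global_centroids[c] + emb[r])
        for r in range(emb.shape[0]):                            # unmatched locals become new globals
            if r not in mapping:
                mapping[r] = len(global_centroids)
                global_centroids.append(emb[r].copy())
        mappings.append(mapping)
    return mappings

# maps = relabel_chunks([np.random.randn(2, 192), np.random.randn(3, 192)])
```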
Two-stage multiple instance learning networks with attention-based hybrid aggregation for speech emotion recognition
IF 3.4 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-03 · DOI: 10.1016/j.csl.2026.101946
Shiqing Zhang, Chen Chen, Dandan Wang, Xin Tao, Xiaoming Zhao
In common categorical speech emotion recognition (SER) tasks, the emotion corpora used often provide ground-truth labels at the utterance level rather than the segment level. However, such coarse-grained labeling relies on the assumption that emotion expression in an utterance is uniformly distributed, which is ill-suited to characterizing human emotional ambiguity in real scenarios. To alleviate this issue, this work proposes two-stage multiple instance learning (MIL) networks equipped with attention-based hybrid aggregation for SER. From the viewpoint of MIL, an utterance is treated as a bag and divided into segments, each of which is taken as an instance. Each instance is then processed in two stages: a segment-level acoustic feature encoder in stage-1, and a MIL-based hybrid aggregator in stage-2. In particular, in stage-1 multiple-level acoustic features are encoded for each segment, and a cross-attention mechanism is employed to perform feature enhancement and fusion. In stage-2, a MIL-based hybrid aggregator, consisting of adaptive aggregation, instance selection, and attention-based aggregation, is designed to obtain the final utterance-level results. The proposed method is evaluated on the public IEMOCAP and MELD datasets, and experimental results demonstrate its effectiveness.
Citations: 0
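For the stage-2 aggregation, a standard attention-based MIL pooling layer can be sketched as follows; the paper's hybrid aggregator additionally combines adaptive aggregation and instance selection, which are omitted here, and the feature dimension and class count are assumptions.

```python
import torch
import torch.nn as nn

class AttentionMILAggregator(nn.Module):
    """Attention-based MIL pooling over segment-level (instance) embeddings (sketch)."""
    def __init__(self, dim, hidden=128, num_classes=4):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, instances):                 # instances: (batch, n_segments, dim)
        scores = self.attn(instances)             # (batch, n_segments, 1) unnormalised attention
        weights = torch.softmax(scores, dim=1)    # normalise over segments within the bag
        bag = (weights * instances).sum(dim=1)    # weighted sum -> utterance-level representation
        return self.classifier(bag), weights.squeeze(-1)

# logits, attn = AttentionMILAggregator(dim=256)(torch.randn(8, 10, 256))
```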
QuAVA: A privacy-aware architecture for conversational desktop Content Retrieval systems
IF 3.4 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-02 · DOI: 10.1016/j.csl.2026.101950
Nikolaos Malamas, Andreas L. Symeonidis, John B. Theocharis
Question Answering (QA) and Content Retrieval (CR) systems have seen a substantial boost in performance in recent years by leveraging state-of-the-art Transformer models to process user expressions and to retrieve and extract the requested information. Despite these steady improvements in language understanding, very little effort has gone into designing such systems for personal desktop use, where data are kept locally rather than sent to cloud services, and where decisions and outputs are transparent and explainable to the user. To that end, we present QuAVA, a conversational desktop content retrieval assistant, designed on four pillars: privacy and security, explainability, low-resource requirements, and multi-source data fusion. QuAVA is a data- and privacy-preserving assistant that enables users to access their private data, such as files, emails, and message exchanges, conversationally and transparently. The proposed architecture automatically extracts and preprocesses content from various sources and organizes it in a three-layered hierarchical structure, namely a topic, a subtopic, and a content layer, by employing ML algorithms for clustering and labeling. This way, users can navigate and access information via a set of conversation rules embedded in the assistant. We conduct a qualitative comparison of the QuAVA architecture with other well-established QA and CR architectures against the four pillars defined, as well as privacy tests, and conclude that QuAVA is, to our knowledge, the only virtual assistant that satisfies all of them.
Citations: 0
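A minimal sketch of the three-layer organization (topic, subtopic, content) using off-the-shelf TF-IDF features and k-means; QuAVA's actual clustering and labeling algorithms are not specified here, and the cluster counts are illustrative assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def build_hierarchy(documents, n_topics=5, n_subtopics=3, seed=0):
    """Two clustering passes yielding a topic -> subtopic -> content hierarchy (sketch)."""
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(documents)
    topic_ids = KMeans(n_clusters=n_topics, random_state=seed, n_init=10).fit_predict(X)
    hierarchy = {}
    for t in range(n_topics):
        idx = [i for i, tid in enumerate(topic_ids) if tid == t]
        if len(idx) < n_subtopics:                      # too few documents: keep a single subtopic
            hierarchy[t] = {0: idx}
            continue
        sub_ids = KMeans(n_clusters=n_subtopics, random_state=seed, n_init=10).fit_predict(X[idx])
        hierarchy[t] = {s: [idx[j] for j, sid in enumerate(sub_ids) if sid == s]
                        for s in range(n_subtopics)}
    return hierarchy    # the content layer is the set of document indices in each leaf
```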
Emotion-guided cross-modal alignment for multimodal depression detection
IF 3.4 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-29 · DOI: 10.1016/j.csl.2026.101951
Wenzhe Jia, Yuhang Wang, Yahui Kang
Depression detection from multimodal data is crucial for early intervention and mental health monitoring. Existing systems, however, face three challenges: (i) capturing subtle affective cues that distinguish depressive states from normal emotional variations, (ii) establishing reliable correspondence between heterogeneous speech and text modalities, and (iii) handling severe class imbalance in real-world corpora. To address these challenges, we propose a framework that integrates explicit emotion supervision, cross-modal alignment, and metric-oriented optimization for robust multimodal depression detection. Acoustic and lexical features are augmented with emotion-category embeddings derived from supervision signals to provide affective context, while semantic correspondence is reinforced through a contrastive alignment objective. To mitigate imbalance, we directly optimize macro-F1 with the Lovász loss. On the Emotional Audio-Textual Depression Corpus (EATD-Corpus), our framework achieves 87.40% ± 0.46% macro-F1 with dataset-provided emotions and 83.15% with predicted emotions, compared to 71.82% without emotion information. Cross-dataset evaluation on the Distress Analysis Interview Corpus – Wizard of Oz (DAIC-WOZ) shows consistent gains, including a 12.34% F1 improvement with emotion augmentation. This integrated approach—combining emotion supervision, cross-modal alignment, and metric-oriented optimization—represents a novel contribution to depression detection. Our framework provides a practical and robust solution for real-world multimodal depression detection.
Citations: 0
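The emotion-conditioning and cross-modal alignment ideas can be sketched with two small PyTorch helpers: a symmetric InfoNCE-style contrastive loss over paired speech/text embeddings, and concatenation of learned emotion-category embeddings onto modality features. The paper's exact alignment objective and its Lovász-based macro-F1 loss are not reproduced here; the temperature, embedding sizes, and number of emotion categories are assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(speech_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss pulling paired speech/text embeddings together (sketch)."""
    s = F.normalize(speech_emb, dim=-1)            # (batch, dim)
    t = F.normalize(text_emb, dim=-1)
    logits = s @ t.T / temperature                 # (batch, batch) similarity matrix
    targets = torch.arange(s.size(0), device=s.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))

def emotion_augment(features, emotion_ids, emotion_embedding):
    """Concatenate an emotion-category embedding onto a modality's features (sketch)."""
    return torch.cat([features, emotion_embedding(emotion_ids)], dim=-1)

# emo = torch.nn.Embedding(7, 32)                                  # 7 assumed emotion categories
# x = emotion_augment(torch.randn(8, 256), torch.randint(0, 7, (8,)), emo)
```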
Medical related word enhancement framework: A new method for large language model in medical dialogue generation
IF 3.4 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-29 · DOI: 10.1016/j.csl.2026.101949
Yujiang Liu, Lijun Fu, Xiaojun Xia
With the development of AI technology, and especially since the emergence of large language models (LLMs), medical chatbot responses have become more accurate and reasonable than before. However, owing to the high cost of data annotation and of the hardware needed for training or fine-tuning on domain-specific data, it is difficult for researchers or physicians to train suitable models for medical consultation. In this paper, we propose a new framework for medical dialogue generation that addresses this problem. It is a vector-level optimization scheme that applies different strategies during the training and testing stages. In the training stage, the original response and medical-related words are supervised by two LLMs, which together form a twin network. In the testing stage, we combine their hidden states to obtain the fused response output. Extensive experiments show that our framework is effective and yields performance improvements on five medical chat datasets, offering new research directions for medical chatbots.
Citations: 0
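A highly simplified sketch of the test-time fusion step described above: the hidden states of the twin models are blended with a tunable weight before decoding. The weight value, the choice of layer, and the commented Hugging Face-style usage are assumptions, not the paper's specification.

```python
import torch

def fuse_hidden_states(h_response, h_medical, alpha=0.7):
    """Convex combination of the twin models' hidden states (sketch; alpha is an assumed weight)."""
    return alpha * h_response + (1.0 - alpha) * h_medical

# Hypothetical usage with two Hugging Face-style causal LMs sharing a tokenizer:
# h_a = model_a(input_ids, output_hidden_states=True).hidden_states[-1]
# h_b = model_b(input_ids, output_hidden_states=True).hidden_states[-1]
# logits = model_a.lm_head(fuse_hidden_states(h_a, h_b))
```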
LaFresCat: A studio-quality Catalan multi-accent speech dataset for text-to-speech synthesis
IF 3.4 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-21 · DOI: 10.1016/j.csl.2026.101945
Alex Peiró-Lilja, Carme Armentano-Oller, José Giraldo, Wendy Elvira-García, Ignasi Esquerra, Rodolfo Zevallos, Cristina España-Bonet, Martí Llopart-Font, Baybars Külebi, Mireia Farrús
Current text-to-speech (TTS) systems are capable of learning the phonetics of a language accurately given that the speech data used to train such models covers all phonetic phenomena. For languages with different varieties, this includes all their richness and accents. This is the case of Catalan, a mid-resourced language with several dialects or accents. Although there are various publicly available corpora, there is a lack of high-quality open-access data for speech technologies covering its variety of accents. Common Voice includes recordings of Catalan speakers from different regions; however, accent labeling has been shown to be inaccurate, and artificially enhanced samples may be unsuitable for TTS. To address these limitations, we present LaFresCat, the first studio-quality Catalan multi-accent dataset. LaFresCat comprises 3.5 h of professionally recorded speech covering four of the most prominent Catalan accents: Balearic, Central, North-Western, and Valencian. In this work, we provide a detailed description of the dataset design: utterances were selected to be phonetically balanced, detailed speaker instructions were provided, native speakers from the regions corresponding to the Catalan accents were hired, and the recordings were formatted and post-processed. The resulting dataset, LaFresCat, is publicly available. To preliminarily evaluate the dataset, we trained and assessed a lightweight flow-based TTS system, which is also provided as a by-product. We also analyzed LaFresCat samples and the corresponding TTS-generated samples at the phonetic level, employing expert annotations and Pillai scores to quantify acoustic vowel overlap. Preliminary results suggest a significant improvement in predicted mean opinion score (UTMOS), with an increase of 0.42 points when the TTS system is fine-tuned on LaFresCat rather than trained from scratch, starting from a pre-trained version based on Central Catalan data from Common Voice. Subsequent human expert annotations achieved nearly 90% accuracy in accent classification for LaFresCat recordings. However, although the TTS tends to homogenize pronunciation, it still learns distinct dialectal patterns. This assessment offers key insights for establishing a baseline to guide future evaluations of Catalan multi-accent TTS systems and further studies of LaFresCat.
Citations: 0
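The Pillai scores mentioned above for quantifying acoustic vowel overlap can be obtained from a one-way MANOVA on F1/F2 measurements; a plain NumPy sketch is shown below. It assumes two formant dimensions and generic group labels (however the categories are defined in the paper), not the authors' exact procedure.

```python
import numpy as np

def pillai_score(formants, groups):
    """Pillai's trace from a one-way MANOVA on formant data (sketch).

    formants: (n, 2) array of F1/F2 values; groups: length-n array of category labels.
    Values near 0 indicate heavy overlap between the categories, values near 1 clear separation.
    """
    formants = np.asarray(formants, dtype=float)
    groups = np.asarray(groups)
    grand_mean = formants.mean(axis=0)
    H = np.zeros((formants.shape[1], formants.shape[1]))   # between-group SSCP matrix
    E = np.zeros_like(H)                                   # within-group SSCP matrix
    for g in np.unique(groups):
        sub = formants[groups == g]
        diff_mean = (sub.mean(axis=0) - grand_mean)[:, None]
        H += len(sub) * (diff_mean @ diff_mean.T)
        centred = sub - sub.mean(axis=0)
        E += centred.T @ centred
    return float(np.trace(H @ np.linalg.inv(H + E)))       # Pillai's trace = tr(H (H + E)^-1)
```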
Trainable multi-channel front-ends for joint beamforming and speaker embedding extraction
IF 3.4 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-16 · DOI: 10.1016/j.csl.2026.101944
Ladislav Mošner, Oldřich Plchot, Lukáš Burget, Chunlei Zhang, Jan Černocký, Meng Yu
Multi-channel speaker verification (SV), employing numerous microphones for capturing enrollment and/or test recordings, has gained attention for its benefits in far-field scenarios. While some studies approach the problem by designing multi-channel embedding extractors, we focus on building and thoroughly analyzing a framework that integrates beamforming pre-processing with single-channel embedding extraction. This strategy benefits from accommodating both multi-channel and single-channel inputs. Furthermore, it provides a human-interpretable intermediate output – enhanced speech – that can be independently evaluated and related to SV performance. We first focus on the front-end, taking advantage of deep-learning source separation for the direct or indirect mask estimation required by the beamformer. We compare alternative single-channel network architectures, subsequently extended to multi-channel ones by reference channel attention (RCA). We also analyze the impact of fusing beamformer and network outputs. Finally, we show the improvements brought by end-to-end fine-tuning of the entire architecture, facilitated by our newly designed multi-channel corpus, MultiSV2, which extends our previous MultiSV dataset.
Citations: 0
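A common way to pair a mask-estimation network with a beamformer, as in the front-end described above, is mask-driven MVDR in the Souden formulation. The NumPy sketch below is a generic illustration rather than the authors' pipeline, and the reference channel is fixed here instead of being selected by reference channel attention.

```python
import numpy as np

def masked_mvdr(stft, speech_mask, noise_mask, ref_channel=0, eps=1e-8):
    """Mask-driven MVDR beamforming (sketch).

    stft:        (channels, frames, freqs) complex STFT of the multi-channel mixture
    speech_mask: (frames, freqs) mask in [0, 1] estimated by the separation network
    noise_mask:  (frames, freqs) mask in [0, 1]
    returns:     (frames, freqs) complex STFT of the beamformed signal
    """
    C, T, F_ = stft.shape
    out = np.zeros((T, F_), dtype=complex)
    u = np.zeros(C); u[ref_channel] = 1.0                 # one-hot reference-channel selector
    for f in range(F_):
        Y = stft[:, :, f]                                 # (C, T) per-frequency observations
        phi_s = (speech_mask[:, f] * Y) @ Y.conj().T / (speech_mask[:, f].sum() + eps)
        phi_n = (noise_mask[:, f] * Y) @ Y.conj().T / (noise_mask[:, f].sum() + eps)
        phi_n += eps * np.eye(C)                          # diagonal loading for numerical stability
        num = np.linalg.solve(phi_n, phi_s)               # Phi_n^{-1} Phi_s
        w = (num / (np.trace(num) + eps)) @ u             # reference-channel MVDR weights
        out[:, f] = w.conj() @ Y
    return out
```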
V-APA: A Voice-driven Agentic Process Automation System
IF 3.4 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-13 · DOI: 10.1016/j.csl.2026.101938
Myeong-Ha Hwang, Jikang Shin, Junseong Bang
While voice interaction facilitates hands-free control for Robotic Process Automation (RPA), real-world deployment faces significant challenges regarding robustness to ASR errors, reliable context tracking, and safeguards against unsafe execution. To address these, we propose V-APA, a voice-driven agentic spoken-dialogue system that automates administrative workflows through policy-driven orchestration, selecting dialogue actions online rather than following fixed, hand-crafted rule flows. The system incorporates three primary robustness and safety mechanisms: N-best ASR hypothesis fusion to mitigate recognition noise, Dialogue State Tracking (DST) for persistent context preservation across turns, and risk-aware confirmation gates to prevent high-impact mis-executions. V-APA is implemented using a practical, reproducible stack featuring Whisper-family ASR, a transformer-based intent ensemble (BERT, RoBERTa, T5), rule-based slot extraction, and LangGraph for dynamic multi-step orchestration. Out-of-scope requests are handled by an optional open-weight LLM fallback based on the Llama-3-8B architecture. Evaluated on 400 spoken task scenarios using a calibrated per-module latency model, results demonstrate that the proposed system significantly improves reliability and safety while maintaining an interactive turn-level latency of approximately 0.5 s.
Citations: 0
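The N-best fusion and risk-aware confirmation gate can be sketched as ASR-confidence-weighted soft voting over an intent ensemble, with a confirmation step for high-impact intents that fall below a confidence threshold. The intent labels, threshold, and interfaces here are illustrative assumptions, not the system's actual configuration.

```python
from collections import defaultdict

HIGH_RISK_INTENTS = {"delete_record", "send_payment"}     # illustrative high-impact intent labels

def decide_action(nbest, intent_models, confirm_threshold=0.8):
    """Fuse N-best hypotheses and ensemble intent votes; gate risky, low-confidence actions (sketch).

    nbest:         list of (transcript, asr_confidence) pairs from the recognizer
    intent_models: callables mapping a transcript to (intent_label, probability)
    """
    scores = defaultdict(float)
    for text, asr_conf in nbest:
        for model in intent_models:
            intent, prob = model(text)
            scores[intent] += asr_conf * prob              # ASR-confidence-weighted soft voting
    total = sum(scores.values()) or 1.0
    intent, score = max(scores.items(), key=lambda kv: kv[1])
    confidence = score / total
    if intent in HIGH_RISK_INTENTS and confidence < confirm_threshold:
        return {"action": "ask_confirmation", "intent": intent, "confidence": confidence}
    return {"action": "execute", "intent": intent, "confidence": confidence}

# decision = decide_action([("delete the last record", 0.9), ("delete the last records", 0.6)],
#                          [lambda t: ("delete_record", 0.85)])
```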
An experimental study of diffusion-based general speech restoration with predictive-guided conditioning
IF 3.4 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-01-11 · DOI: 10.1016/j.csl.2026.101940
Da-Hee Yang, Joon-Hyuk Chang
This study presents a hybrid speech restoration framework that integrates predictive-guided conditioning into a diffusion-based generative model to address complex distortions, including noise, reverberation, and bandwidth reduction. The proposed method employs the outputs of a predictive model to guide the diffusion process, enabling more accurate reconstruction under challenging acoustic conditions. Furthermore, during the final sampling stage, the outputs of the predictive and generative models are fused with a tunable ratio, balancing signal fidelity and perceptual naturalness. Experimental results demonstrate that the proposed approach significantly improves objective restoration metrics compared to conventional diffusion baselines. However, the perceptual quality varies with the fusion ratio, revealing a trade-off between objective gains and subjective preference. These findings highlight the potential of predictive-guided conditioning for robust speech restoration and provide insights into optimizing the balance between predictive and generative contributions.
Citations: 0
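In its simplest reading, the final-stage fusion described above is a convex combination of the predictive estimate and the diffusion sample; which endpoint favors signal fidelity versus perceptual naturalness is our interpretation of the abstract, not a stated mapping.

```python
import numpy as np

def fuse_outputs(predictive, generative, ratio=0.5):
    """Blend the predictive estimate with the diffusion sample at the last sampling step (sketch).

    ratio=1.0 keeps only the predictive estimate, ratio=0.0 only the generative sample;
    intermediate values trade the two contributions off against each other.
    """
    predictive = np.asarray(predictive)
    generative = np.asarray(generative)
    return ratio * predictive + (1.0 - ratio) * generative
```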