
Latest Computer Science Literature

Identifying robust and dataset-independent acoustic biomarkers of depression through multi-model feature consensus analysis
Computer Speech and Language · IF 3.4 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-10-01 · Epub Date: 2026-02-19 · DOI: 10.1016/j.csl.2026.101960
Musyysb Yousufi, Rytis Maskeliunas
Speech is one of the most abundant and natural sources of acoustic data, carrying both prosodic and spectral information, and acoustic features can help diagnose mental and emotional health issues. In recent years, several researchers have investigated speech features as a way to detect depression. However, most frameworks perform well only on the data on which they were trained and generalize poorly to new speakers, recording devices, or languages. This research aims to identify reliable and interpretable acoustic features that serve as stable indicators of depression across speech datasets.
This study used two publicly available datasets, E-DAIC and MODMA. A total of 107 handcrafted prosodic, spectral, and voice-quality acoustic features were extracted from 4-second segments, with 1-second overlap for long recordings and padding for short clips. Subject-aware pre-processing prevented speaker-level overlap between splits. Five feature selection algorithms were applied, and their outputs were integrated through a consensus-based rank-aggregation framework to identify depression-related features consistent across both datasets. The resulting feature set was evaluated with four classifier architectures through a K-sweep analysis. Correlation-alignment domain adaptation was used to reduce distribution mismatch by aligning second-order statistics between the source and target domains, enabling a robust cross-dataset transfer evaluation. Bidirectional cross-dataset evaluation demonstrated effective generalization in both transfer directions. Models trained on E-DAIC achieved F1 = 0.49–0.52 on MODMA (92%–94% of within-dataset performance), while MODMA-trained models achieved F1 = 0.34–0.35 on E-DAIC, exceeding E-DAIC's own within-dataset baseline. The negative domain loss observed for E-DAIC (−0.22 to −0.24) reflects high intra-dataset heterogeneity from naturalistic recording conditions rather than poor generalizability. These findings demonstrate that robust acoustic depression biomarkers can be learned from diverse datasets, enabling cross-linguistic depression detection.
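The correlation-alignment step described above — matching second-order statistics between source and target feature distributions — can be sketched in a few lines. This is a generic re-implementation of the standard correlation-alignment recipe, not the authors' code; the feature matrices and the ridge value `eps` are illustrative.

```python
import numpy as np

def _sym_mat_power(m, p):
    # matrix power of a symmetric positive-definite matrix via eigendecomposition
    w, v = np.linalg.eigh(m)
    return (v * w ** p) @ v.T

def coral_align(source, target, eps=1e-3):
    """Re-color source features so their covariance matches the target's.

    source, target: (n_samples, n_features) arrays from two datasets.
    eps: small ridge term keeping both covariances invertible.
    """
    d = source.shape[1]
    cs = np.cov(source, rowvar=False) + eps * np.eye(d)
    ct = np.cov(target, rowvar=False) + eps * np.eye(d)
    # whiten with the source covariance, then re-color with the target's
    return source @ _sym_mat_power(cs, -0.5) @ _sym_mat_power(ct, 0.5)

rng = np.random.default_rng(0)
src = rng.normal(size=(500, 4)) * np.array([1.0, 2.0, 3.0, 4.0])
tgt = rng.normal(size=(600, 4)) * np.array([4.0, 3.0, 2.0, 1.0])
aligned = coral_align(src, tgt)
```

After alignment the source covariance closely matches the target covariance, which is what allows a classifier trained on one dataset to be evaluated on the other.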
Citations: 0
Review and perspectives on multimodal perception, mutual cognition, and embodied execution for human–robot collaboration in Industry 5.0
Robotics and Computer-integrated Manufacturing · IF 11.4 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Pub Date: 2026-10-01 · Epub Date: 2026-02-28 · DOI: 10.1016/j.rcim.2026.103280
Kai Ding, Qingyuan Mao, Yaqian Zhang, Yirong Zhang, Pai Zheng, Lihui Wang
Industry 5.0 represents a paradigm shift from efficiency-oriented automation to human-centric, resilient, and sustainable manufacturing, where human–robot collaboration (HRC) plays a crucial role by combining human flexibility with robotic precision. However, current HRC systems remain reactive and fragmented, lacking the alignment across perception, cognition, and execution required for seamless collaboration and robust generalization. While generative large models (GLMs) are emerging as a promising solution to these challenges, their integration into HRC exhibits a notable temporal lag compared to robotic domains, necessitating a systematic cross-domain synergy. This paper presents a review of GLM-enhanced HRC and proposes a prospective blueprint of multimodal perception, mutual cognition, and embodied execution for HRC in Industry 5.0. This blueprint outlines potential pathways toward human-centric smart manufacturing by synergizing generative artificial intelligence and embodied intelligence.
Citations: 0
Design of single-channel speech enhancement algorithm in noisy acoustic environments
Computer Speech and Language · IF 3.4 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-10-01 · Epub Date: 2026-02-12 · DOI: 10.1016/j.csl.2026.101955
Yi-Fu Zhao, Guang-Hui Dong, Nan Liu
In speech enhancement, Transformer- and self-attention-based denoising networks are widely used and perform well, and speech enhancement serves as a valuable front-end for speech recognition. However, existing dual-branch architectures extract the natural speech phase insufficiently, because the phase spectrum is sensitive and easily over-compensated, and traditional dilated-convolution architectures are unsuitable for resource-constrained devices, creating an urgent need for lightweight alternatives. Thus, this paper proposes TFEM-PHASEN-MINI, a discrete dual-branch phase extraction architecture based on Base and Detail Feature Modules. It uses a DilatedReparamBlock to replace the Dense Encoder's dilated-convolution module, balancing computational efficiency and performance by fusing convolutional neural networks and Transformers. It also designs a time–frequency feature extraction module to verify the integration of speech recognition modules into speech enhancement, and adds a Phase Enhancement Module that addresses insufficient phase-spectrum phoneme feature extraction (caused by magnitude-spectrum over-compensation) via parallel phase estimation. On the VoiceBank+DEMAND dataset, it achieves scores of 3.44, 4.72, 4.18, 17.13, 2.10, and 0.96 for PESQ, CSIG, COVL, FWSSNR, CEPS, and STOI, respectively. On the DNS-Challenge dataset, it attains 3.20 and 3.57 for WB-PESQ and NB-PESQ, respectively. On the EARS-WHAM test set and its blind test set, it improves PESQ, CSIG, CBAK, COVL, SSNR, FWSSNR, CEPS, and STOI by 0.56, 1.00, 0.94, 0.83, 8.42, 5.26, 0.21, and 0.15, respectively, and achieves non-intrusive scores of 3.80 (Overall Quality), 4.18 (Noisiness), 4.32 (Discontinuity), 3.85 (Coloration), and 3.45 (Loudness), showing strong generalization. Although its CBAK and SSNR are relatively lower on the VoiceBank+DEMAND dataset, it remains competitive overall. Computational-complexity and on-device inference tests verify the balance between its efficiency and accuracy.
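Dilated convolutions of the kind the DilatedReparamBlock replaces enlarge the receptive field without adding parameters by spacing the kernel taps apart. A minimal causal 1-D version in plain NumPy — the kernel and dilation values are illustrative, not the paper's module:

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """Causal dilated 1-D convolution: y[t] = sum_j w[j] * x[t - j*dilation]."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])  # zero-pad the past so y keeps len(x)
    return np.array([
        sum(w[j] * xp[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

# an impulse input reveals the tap positions: taps land `dilation` samples apart
x = np.zeros(8)
x[0] = 1.0
y = dilated_conv1d(x, w=[1.0, 1.0], dilation=2)
```

Stacking such layers with dilations 1, 2, 4, … grows the receptive field exponentially in depth while each layer stays cheap, which is why this family of blocks is attractive for lightweight enhancement models.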
Citations: 0
Sentence representations for semantic textual similarity: A systematic review
Computer Speech and Language · IF 3.4 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-10-01 · Epub Date: 2026-02-28 · DOI: 10.1016/j.csl.2026.101970
Larissa Guder, João Paulo Aires, Hígor Uélinton da Silva, Felipe Meneguzzi, Dalvan Griebler
In natural language processing (NLP), generating semantically rich representations of sentences can improve performance on multiple tasks, such as question answering, duplicate detection, sentiment analysis, and machine translation. Recent machine-learning approaches to NLP can produce text representations that carry syntactic and semantic information. This article surveys recent work on generating sentence representations for semantic textual similarity tasks. We conduct our survey using a systematic literature review approach: we retrieve papers from several digital libraries and summarize their key techniques and findings. We propose a taxonomy to facilitate understanding of the sentence-level semantic textual similarity task. In our analysis, we describe the current state of the art in sentence representation for semantic textual similarity and propose a guideline for working on this task.
Citations: 0
Cross-linguistic analysis of prosodic features based on wavelet prominence: A study of L2 English and L1 Sindhi lexical stress using large language & deep learning models
Computer Speech and Language · IF 3.4 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-10-01 · Epub Date: 2026-02-13 · DOI: 10.1016/j.csl.2026.101953
Abdul Malik Abbasi, Imtiaz Husain
This study presents a cross-linguistic analysis of prosodic features in English and Sindhi, with an emphasis on modelling lexical stress and rhythmic prominence using advanced artificial intelligence techniques. The proposed framework integrates wavelet-based signal processing with Deep Learning architectures and prosodic embeddings extracted from Large Language Models (LLMs). We address the lack of computational research on Sindhi lexical stress and investigate the central research question of whether a fused representation of CWT-based prosodic prominence and Wav2Vec 2.0 embeddings can accurately model stress patterns and support cross-lingual transfer to L2 English. Trained on lexical stress patterns in Sindhi, the system is applied to English speech data from speakers with diverse first-language (L1) backgrounds to automatically predict syllable prominence. Experimental results show that the hybrid model combining continuous wavelet transform (CWT) features with BiLSTM and Wav2Vec 2.0 embeddings achieves a stress classification accuracy of 92.1%, outperforming baseline models by a significant margin. Feature ablation analysis confirms duration as the most predictive cue in Sindhi, while pitch dominates in English. The model's prominence estimates show strong alignment with human-assigned CEFR ratings (Pearson’s r = 0.78, p < 0.001), validating its perceptual reliability. These findings underscore the effectiveness of interpretable, AI-driven approaches for multilingual prosody modelling and highlight their practical utility in speech synthesis, automatic speech recognition, and language learning technologies.
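Wavelet-based prominence of the kind described here is usually computed by convolving a 1-D prosodic contour (e.g., f0 or energy per frame) with wavelets at several scales and taking the strongest response per frame. A rough, self-contained sketch using a Ricker ("Mexican hat") wavelet — the scales and synthetic contour are illustrative, not the paper's configuration:

```python
import numpy as np

def ricker(points, a):
    # Ricker (Mexican-hat) wavelet with width parameter a
    t = np.arange(points) - (points - 1) / 2.0
    amp = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return amp * (1.0 - (t / a) ** 2) * np.exp(-0.5 * (t / a) ** 2)

def cwt_prominence(signal, scales):
    # max absolute wavelet response across scales, per sample
    rows = [np.convolve(signal, ricker(10 * s, s), mode="same") for s in scales]
    return np.abs(np.array(rows)).max(axis=0)

# a synthetic prosodic contour with one prominent bump around frame 50
frames = np.arange(100)
contour = np.exp(-0.5 * ((frames - 50) / 4.0) ** 2)
prom = cwt_prominence(contour, scales=[2, 4, 8])
```

The per-frame prominence peaks where the contour has a salient bump; in the paper's pipeline such a prominence track would be one input alongside Wav2Vec 2.0 embeddings.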
Citations: 0
Towards human-centric manufacturing: Task planning under uncertainties in human–robot collaborative assembly
Robotics and Computer-integrated Manufacturing · IF 11.4 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Pub Date: 2026-10-01 · Epub Date: 2026-03-05 · DOI: 10.1016/j.rcim.2026.103293
Yingchao You, Ze Ji, Changyun Wei
Task planning plays a pivotal role in ensuring smooth collaboration between humans and robots by efficiently allocating tasks among agents and scheduling available resources. Although some recently proposed task planners incorporate human factors into their frameworks, few explicitly account for human-related uncertainties, which can lead to task failures. To address this gap, this study introduces a physical-exertion-aware task planner that explicitly considers uncertainty in both human factors and task execution time. The uncertainties associated with physical exertion and execution time are modelled using the Single-Valued Triangular Neutrosophic (SVTN) number method. Furthermore, a reinforcement learning-based approach is developed to learn adaptive task-allocation policies and scheduling under these uncertainties. The experimental results indicate that the reinforcement learning-based approach effectively reduces performance variability compared with the benchmark methods.
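The triangular shape underlying the SVTN modelling can be pictured with an ordinary triangular distribution for an uncertain task duration: a planner draws samples between optimistic and pessimistic bounds with a most-likely mode. The bounds below are made-up numbers, and this sketch deliberately omits the truth/indeterminacy/falsity membership components of a full SVTN number:

```python
import numpy as np

rng = np.random.default_rng(42)

# uncertain assembly-task duration, in seconds: (optimistic, most likely, pessimistic)
left, mode, right = 4.0, 6.0, 11.0
samples = rng.triangular(left, mode, right, size=10_000)

# the analytic mean of a triangular distribution is (left + mode + right) / 3
analytic_mean = (left + mode + right) / 3.0
```

A planner sampling durations this way can score candidate allocations by expected makespan rather than a single point estimate, which is what makes the schedule robust to execution-time variability.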
Citations: 0
A dual-loop framework for manufacturability-aware topology optimization of electric vehicle structures via wire arc additive manufacturing
Robotics and Computer-integrated Manufacturing · IF 11.4 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS · Pub Date: 2026-10-01 · Epub Date: 2026-03-02 · DOI: 10.1016/j.rcim.2026.103273
Qiang Cui, Chuan Yu, Daoqian Yang, Jiangshan Li, Chunyang Yu
Wire Arc Additive Manufacturing (WAAM) enables efficient fabrication of large-scale electric vehicle (EV) structures, yet its integration with Discrete Topology Optimization (DTO) is often limited by static and conservative manufacturability constraints. This study presents a dual-loop framework that tightly couples DTO with WAAM through adaptive constraint refinement and in-situ process feedback. An inner loop performs real-time path compensation and process parameter adjustment based on geometric deviation monitoring, while an outer loop updates inclination-based manufacturability constraints using accumulated fabrication knowledge. Printability is characterized by minimum self-supporting and maximum compensable angle thresholds, allowing manufacturability to be modeled as a graded design variable. Both hard and soft constraint strategies are incorporated into the DTO formulation to regulate overhang-sensitive members. A full-scale electric vehicle chassis is used as a running case throughout the paper to demonstrate the proposed framework, spanning constrained DTO, deposition experiments, and robotic WAAM fabrication, and showing improved printability while preserving load-efficient topologies.
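The graded-printability idea — a minimum self-supporting angle plus a maximum compensable range below it — reduces to a small classification rule per structural member. A sketch with hypothetical threshold values (the actual WAAM thresholds are process-dependent and would come from the deposition experiments):

```python
def classify_member(inclination_deg,
                    min_self_support=45.0,    # hypothetical: members this steep need no support
                    max_compensation=30.0):   # hypothetical: reach of path compensation below it
    """Grade a member's printability from its inclination to the build plate."""
    if inclination_deg >= min_self_support:
        return "self-supporting"
    if inclination_deg >= min_self_support - max_compensation:
        return "compensable"     # printable with in-situ path compensation
    return "unprintable"         # hard constraint: excluded from the design space

grades = [classify_member(a) for a in (60.0, 20.0, 10.0)]
```

In the dual-loop framework, the outer loop would tighten or relax these thresholds as fabrication knowledge accumulates, while the optimizer treats the three grades as a graded design variable rather than a binary cutoff.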
Citations: 0
An exhaustive evaluation method for open-domain LLM dialogue by constructing recursive CoT
IF 3.4 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-10-01 · Epub Date: 2026-02-13 · DOI: 10.1016/j.csl.2026.101957
Shengjie Zhao, Zhenping Xie
In recent years, evaluation methods based on large language models (LLMs) have demonstrated advanced performance in reference-free evaluation of open-domain dialogue quality. However, existing approaches often rely on simple, manually crafted evaluation instructions, lacking the depth and diversity to reflect complex human thinking processes. To address these limitations, we propose the Rec-CoT-Eval framework, a reference-free method for evaluating dialogue quality that automatically constructs a Chain-of-Thought (CoT) through interaction with LLMs. Unlike existing methods that depend on manually crafted instructions, our approach enables the automatic construction of a CoT for evaluation. We treat each evaluation metric as a root task and use prompts to guide the LLMs in recursively decomposing it into sub-problems in a top-down manner. By solving these sub-problems, a comprehensive evaluation CoT is constructed. Ultimately, this CoT is used as a prompt for the LLMs, enabling them to act as dialogue quality evaluation agents and perform reference-free evaluation of target dialogues. Furthermore, the framework incorporates an optional human-computer interaction mechanism, designed to meet the need for fine-grained and personalized customization of evaluation criteria in practical industrial applications. This mechanism allows evaluators to dynamically modify the generated CoT when necessary, integrating expert knowledge to enhance evaluation accuracy and personalization. Experimental results demonstrate that our proposed method achieves a higher correlation with human judgments and outperforms existing approaches.
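The top-down recursive decomposition at the core of Rec-CoT-Eval can be sketched roughly as follows. Here `ask_llm` stands in for the actual prompted LLM call that proposes sub-problems, and the depth-based stopping rule is an assumption, since the paper's prompt templates are not reproduced here:

```python
from typing import Callable, List


def build_cot(task: str,
              ask_llm: Callable[[str], List[str]],
              max_depth: int = 2) -> List[str]:
    # Recursively decompose an evaluation metric (the root task) into
    # sub-problems, top-down, collecting the leaves into a flat
    # Chain-of-Thought. `ask_llm` is a stand-in for a real LLM call
    # that returns sub-problems for a task (empty list = atomic task).
    if max_depth == 0:
        return [task]
    subtasks = ask_llm(task)
    if not subtasks:  # leaf: the task cannot be decomposed further
        return [task]
    cot: List[str] = []
    for sub in subtasks:
        cot.extend(build_cot(sub, ask_llm, max_depth - 1))
    return cot


# Usage with a stubbed decomposer (hypothetical metric and sub-problems):
rules = {"coherence": ["topical relevance", "logical flow"]}
steps = build_cot("coherence", lambda t: rules.get(t, []))
# steps == ["topical relevance", "logical flow"]
```

The resulting step list would then be serialized into the evaluation prompt; the optional human-in-the-loop mechanism corresponds to editing this list before it is handed to the evaluating LLM.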
Computer Speech and Language, Volume 100, Article 101957.
Citations: 0
Improvements in Spanish audio transcription workflows: Integrating preprocessing, LLM-based correction, and speaker diarization and identification
IF 3.4 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-10-01 · Epub Date: 2026-02-26 · DOI: 10.1016/j.csl.2026.101966
Gonzalo Nieto Montero, Santiago Hernández, Juan Casal
Robust, richly-annotated transcription of Spanish broadcast audio remains difficult under realistic conditions even for state-of-the-art multilingual ASR systems. This paper advances Spanish speech transcription through a framework that couples (i) targeted audio preprocessing, (ii) large language model (LLM) post-correction with deterministic verification, and (iii) diarization plus speaker identity assignment to produce both more accurate and more informative transcripts. First, we show that applying HDemucs vocal isolation followed by band-limited filtering improves WhisperX (Whisper large-v3) performance on modern RTVE broadcast test sets, reaching 10.82% WER on RTVE2022DB (2.79% relative reduction vs. WhisperX) and 10.36% on RTVE2020DB. To define the boundaries of this approach, we also evaluate the NVIDIA Canary-1B-v2 model, observing that these gains are model-dependent. Second, we introduce a verification algorithm for LLM-based correction that constrains the model to a purely corrective role via normalized-text equivalence checks and bounded edit-distance acceptance, preserving pipeline determinism while retaining LLM benefits. On two formatting-noise stress tests (RTVE2017-week subtitles and noisy VoxPopuli-es), this mechanism nearly halves case- and punctuation-sensitive error rate and identifies a robust operating region for tolerance thresholds. Third, we enrich transcripts with speaker names by combining WhisperX/pyannote diarization with audio-embedding matching and complementary transcript-driven (LLM) identification, achieving 29.92% DER on RTVE2022DB, an improvement over the challenge reference baseline. Together, the modules deliver cleaner, speaker-aware transcripts that surpass the strongest zero-shot WhisperX baseline and illustrate how carefully combining off-the-shelf models can advance Spanish ASR in realistic conditions without training.
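The deterministic verification gate for LLM post-correction can be sketched as follows: accept the LLM's output only if it is equivalent to the ASR hypothesis after normalization, or within a bounded edit distance of it, and otherwise fall back to the original hypothesis. The `normalize` rules and the `max_dist` threshold are illustrative assumptions, not the exact operating point identified in the paper:

```python
import re
import string


def normalize(text: str) -> str:
    # Lowercase, strip punctuation, and collapse whitespace so that
    # purely case- and punctuation-level edits compare as equivalent.
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance, one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def accept_correction(asr_text: str, llm_text: str, max_dist: int = 3) -> str:
    # The LLM is confined to a corrective role: its output is kept only
    # when it is a normalized match or a small bounded edit; otherwise
    # the original ASR hypothesis is returned, preserving determinism.
    if normalize(asr_text) == normalize(llm_text):
        return llm_text
    if levenshtein(normalize(asr_text), normalize(llm_text)) <= max_dist:
        return llm_text
    return asr_text
```

Under this gate, a restyled hypothesis such as "Hola, mundo." is accepted against "hola mundo", while a free rewrite whose normalized form drifts beyond the distance bound is rejected.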
Computer Speech and Language, Volume 100, Article 101966.
Citations: 0
Attention based convolutional residual squeeze excited capsule network for aspect based sentiment classification in Malayalam movie reviews
IF 3.4 3区 计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2026-10-01 Epub Date: 2026-02-03 DOI: 10.1016/j.csl.2026.101952
Sharika TR, Julia Punithamalar Dhas
One of the main functions of Natural Language Processing (NLP) is sentiment analysis, which extracts attitudes, ideas, views or judgments about a given topic. The Internet is a vast and unstructured information source full of text documents, including evaluations and opinions. Firstly, the input texts are pre-processed using an efficient NLP method such as tokenization, stemming, removal of empty sets, stop words removal and morphological segmentation. These pre-processed texts serve as the input for the feature extraction stage. Using the three methods of Improved Term Frequency-Inverse Document Frequency (ITF-IDF), Latent Semantic Analysis (LSA) and Extended Bidirectional Encoder Representations from Transformers (E-BERT), the review-based features are extracted. Aspect-based features are extracted from the review text using the Aspect Related Feature (ARF) extraction method. By enhancing term weights with improved frequency scaling, the model improves on regular TF-IDF and includes more subtle contextual meanings and relationships with words. Finally, applying both types of features, a new Attention-based Convolutional Residual Squeeze Excited Capsule Network (A-CR-SECapNet) model is created to classify sentiment polarities as positive, negative and neutral. The Convolutional Residual Module captures spatial relationships to learn deeper networks that mitigate vanishing gradients. The SE Module improves the attentiveness of the network by dynamically reweighting the channel-wise information from features that correlate with important sentiment variables. The CapNet preserves the spatial relationships between words to maintain the dependence of sentiment between features. Finally, the performance of the model is further improved by fine-tuning the parameters using the Modified Gazelle Optimization (MGO) optimization method. 
In the results section, the proposed model is compared to existing models in terms of precision, F1-score, accuracy, recall, mean squared error (MSE) and mean absolute percentage error (MAPE). The proposed model produced the best results, demonstrating its superiority.
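One plausible reading of "improved frequency scaling" over plain TF-IDF is sublinear term-frequency damping combined with smoothed IDF. The sketch below is an assumption for illustration, since the abstract does not give the exact ITF-IDF formula:

```python
import math
from collections import Counter
from typing import Dict, List


def itf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    # TF-IDF with log-scaled (sublinear) term frequency and smoothed
    # IDF, as one plausible form of "improved frequency scaling";
    # this is an illustrative sketch, not the paper's exact formula.
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights: List[Dict[str, float]] = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (1.0 + math.log(count))              # damped TF
                  * math.log((1 + n) / (1 + df[term]))  # smoothed IDF
            for term, count in tf.items()
        })
    return weights
```

Note the effect of the smoothed IDF: a term appearing in every document (here, a word shared by all reviews) receives weight zero, while rarer aspect- or sentiment-bearing terms are amplified.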
Computer Speech and Language, Volume 100, Article 101952.
Citations: 0
Book学术
Literature sharing · Smart journal selection · Latest publications · Contact us: info@booksci.cn
Book学术 provides a free academic resource search service, helping scholars in China and abroad retrieve Chinese- and English-language literature, and is committed to delivering the most convenient, high-quality experience.
Copyright © 2023 Book学术 All rights reserved.