
Latest Publications from Applied Intelligence

Attribute reduction algorithm based on variable precision neighborhood rough set with bivariate and inclusion degree
IF 3.5, CAS Region 2 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-02-25. DOI: 10.1007/s10489-026-07106-3
Shangzhi Wu, Jie Liu, Zhengwei Hao

Attribute reduction improves information processing efficiency by removing redundant attributes without degrading the information in the dataset. Two issues are essential in attribute reduction: how information granules are delineated and how fault-tolerant the model is. To address them, an attribute reduction algorithm based on a variable precision neighborhood rough set with bivariate and inclusion degree is proposed. Firstly, the concepts of bivariate and inclusion degree are introduced into the variable precision neighborhood rough set: the bivariate is used to compute distances between information granules and thus to partition them, while the inclusion degree increases the fault tolerance of the model when calculating the upper and lower approximations. Secondly, the resulting model improves both the partitioning of information granules and fault tolerance, and on this foundation an attribute reduction algorithm is proposed. Finally, experiments on real datasets show that the algorithm effectively removes redundant attributes.
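As a rough illustration of the mechanism sketched in this abstract (not the authors' exact algorithm), the snippet below builds distance-based neighborhoods and applies an inclusion-degree threshold, in the spirit of Ziarko's variable precision model, to obtain tolerant lower and upper approximations. The Euclidean distance, the thresholds `delta` and `beta`, and all function names are assumptions.

```python
import numpy as np

def neighborhoods(X, delta):
    """Neighborhood of each sample: all samples within distance delta."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    return [np.flatnonzero(row <= delta) for row in d]

def variable_precision_approximations(X, y, target, delta=0.3, beta=0.8):
    """Tolerant lower/upper approximations of the class `target`."""
    lower, upper = [], []
    for i, nb in enumerate(neighborhoods(X, delta)):
        incl = np.mean(y[nb] == target)   # inclusion degree of the neighborhood in the class
        if incl >= beta:                  # fault-tolerant lower approximation
            lower.append(i)
        if incl > 1 - beta:               # fault-tolerant upper approximation
            upper.append(i)
    return lower, upper

# toy usage on random data
X = np.random.rand(20, 4)
y = np.random.randint(0, 2, 20)
low, up = variable_precision_approximations(X, y, target=1)
```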

Citations: 0
CCNN-SCD: a deep composite architecture for skin cancer detection using feature engineering-aided convolutional neural networks for dermatological diagnosis
IF 3.5, CAS Region 2 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-02-25. DOI: 10.1007/s10489-026-07111-6
Madiha Hameed, Aneela Zameer Jaffery, Muhammad Yousaf Hamza, Muhammad Asif Zahoor Raja

Melanoma is a growing and among the most lethal forms of skin cancer, yet it can often be cured if detected at an early stage. The primary cause of skin cancer is abnormal growth of skin cells, often resulting from prolonged exposure to sunlight. Unfortunately, identifying malignant skin lesions at an early stage is both costly and challenging. The grading of skin cancer is determined by the location of the tumor and the type of cells involved. Accurate classification of lesions requires high precision and recall, presenting a significant challenge in dermatology. This paper proposes a new system, the Composite Convolutional Neural Network (CCNN) skin cancer detector (SCD), built on composite convolutional neural networks. The framework operates in three primary stages. In the first stage, denoising, normalization, and augmentation of dermoscopic images are performed to improve data quality and variability. The second stage uses transfer learning with ResNet50, InceptionV3, and VGG16 to extract high-level discriminative features. In the classification module, skin lesions are classified with a customized CNN, in which feature dimensionality is reduced by an attention-based autoencoder to improve the generalization of the model. Lastly, the deep features are evaluated with conventional machine learning classifiers, namely k-nearest neighbor (KNN), random forest classifier (RFC), and support vector machine (SVM), to compare their performance; an accuracy of 93.53%, a precision of 94.02%, and a recall of 93.61% were achieved, demonstrating the strength of the proposed approach and its potential use in clinical dermatological diagnosis.
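A minimal sketch of the three-stage pipeline as described: a ResNet50 backbone stands in for the transfer-learning feature extractors (instantiated here without pretrained weights to keep the sketch self-contained), a small autoencoder stands in for the attention-based dimensionality reducer, and the reduced features feed the classical classifiers. All sizes and module names are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
from torchvision import models
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

# stage 2: frozen backbone as a high-level feature extractor
# (the paper would use pretrained weights; weights=None avoids a download here)
backbone = models.resnet50(weights=None)
backbone.fc = nn.Identity()               # expose the 2048-d pooled features
backbone.eval()

# stage 3: a plain autoencoder bottleneck standing in for the attention-based reducer
class Reducer(nn.Module):
    def __init__(self, d_in=2048, d_z=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, d_z), nn.ReLU())
        self.dec = nn.Linear(d_z, d_in)   # reconstruction head used during training
    def forward(self, f):
        z = self.enc(f)
        return z, self.dec(z)

reducer = Reducer()
x = torch.rand(8, 3, 224, 224)            # stand-in dermoscopic batch
with torch.no_grad():
    z, _ = reducer(backbone(x))

# final stage: classical classifiers on the reduced deep features
y = torch.randint(0, 2, (8,)).numpy()
for clf in (KNeighborsClassifier(3), RandomForestClassifier(), SVC()):
    clf.fit(z.numpy(), y)
```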

Citations: 0
DSAN: a dual-scale spatial-temporal aggregation network for robust gait recognition in one-shot and cross-view scenarios
IF 3.5, CAS Region 2 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-02-25. DOI: 10.1007/s10489-026-07146-9
Liang Ren, Xiuhui Wang, Wei Qi Yan

Gait recognition technology represents a promising biometric identification method with extensive applications in fields such as visual surveillance. However, addressing the few-shot learning challenge remains an urgent issue in practical applications. In this paper, we propose a novel Dual-Scale Spatial-Temporal Aggregation Network (DSAN), which extracts gait features at different temporal scales through a dual-branch architecture. The Multi-Scale Temporal Aggregation Module (MSTA) is designed to aggregate multi-scale local temporal information from long-term gait features, employing an attention mechanism to reduce redundant information and enhance temporal feature representation. Additionally, the Covariance-guided Spatial-Temporal Attention Module (CoSTA) integrates short-term features into long-term representations through global covariance pooling and attention mechanisms, further improving feature expressiveness. Furthermore, we introduce the View Variation Adversarial Normalization (VVAN), which mitigates cross-view discrepancies under few-shot conditions via adversarial learning. Different from GaitGL, DSAN explicitly captures long-term and short-term dynamics and performs cross-scale fusion via MSTA and CoSTA. In contrast to GaitDAN’s multi-discriminator formulation, VVAN employs a single multi-class view discriminator, requiring only a change in the classifier output dimension when the view set varies. Extensive experiments are conducted on the CASIA-B and OUMVLP datasets, with DSAN evaluated in a one-shot setting on CASIA-B. The experimental results demonstrate that DSAN achieves state-of-the-art performance in gait recognition tasks and effectively alleviates the adverse impact of the one-shot problem on recognition accuracy.
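The MSTA idea described above, parallel temporal convolutions at several scales fused by an attention gate, can be sketched as follows. The kernel sizes, the squeeze-and-excitation-style gate, and the residual connection are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class MultiScaleTemporalAggregation(nn.Module):
    """Sketch of an MSTA-style block over per-frame gait features."""
    def __init__(self, channels, scales=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(channels, channels, k, padding=k // 2) for k in scales
        )
        self.gate = nn.Sequential(                 # channel-attention gate
            nn.AdaptiveAvgPool1d(1),
            nn.Conv1d(len(scales) * channels, channels, 1),
            nn.Sigmoid(),
        )
        self.proj = nn.Conv1d(len(scales) * channels, channels, 1)

    def forward(self, x):                          # x: (batch, channels, time)
        multi = torch.cat([b(x) for b in self.branches], dim=1)
        return self.proj(multi) * self.gate(multi) + x   # gated fusion + residual

feats = torch.rand(2, 64, 30)                      # 30-frame feature sequence
out = MultiScaleTemporalAggregation(64)(feats)     # -> (2, 64, 30)
```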

Citations: 0
Matrix-based multi-granularity multi-source fuzzy information fusion for three-dimensional dynamic data
IF 3.5, CAS Region 2 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-02-25. DOI: 10.1007/s10489-026-07098-0
Xiaoyan Zhang, Jiaxin Guo

Multi-source information fusion is a key technique in big data technology, essential for data mining and knowledge discovery. When merging traditional multi-source systems into a single system, data loss is a common issue. Multi-granularity information fusion methods are designed to solve this problem. However, there is limited research on how to use these methods dynamically to extract valuable information when changes occur in three dimensions: objects, information sources, and the number of attributes. Additionally, information fusion in multi-source fuzzy information systems can better handle uncertainty in data. This paper proposes a method for calculating multi-granularity fusion operators based on a similarity relationship matrix and builds an incremental update fusion mechanism for six scenarios: the addition or removal of objects, information sources, and attributes. Experimental results on 12 public datasets show that the proposed dynamic fusion method has clear advantages in dealing with complex changes and can update the fusion operators more efficiently.
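A toy sketch of the matrix-based idea: a fuzzy similarity relation matrix over objects, plus an incremental update for the "object added" scenario that computes one new row and column instead of rebuilding the matrix. The Gaussian similarity kernel and `sigma` are assumptions.

```python
import numpy as np

def similarity_matrix(X, sigma=0.5):
    """Fuzzy similarity relation: r_ij = exp(-||x_i - x_j||^2 / (2*sigma^2))."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def add_object(R, X, x_new, sigma=0.5):
    """Incremental update when one object arrives: O(n) work for the new
    row/column instead of O(n^2) recomputation of the whole matrix."""
    d2 = ((X - x_new) ** 2).sum(-1)
    r = np.exp(-d2 / (2 * sigma ** 2))
    R = np.block([[R, r[:, None]], [r[None, :], np.ones((1, 1))]])
    return R, np.vstack([X, x_new])

X = np.random.rand(6, 3)
R = similarity_matrix(X)
R, X = add_object(R, X, np.random.rand(3))
assert np.allclose(R, similarity_matrix(X))   # matches recomputation from scratch
```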

Citations: 0
FREQuency ATTribution: benchmarking frequency-based occlusion for time series data
IF 3.5, CAS Region 2 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-02-24. DOI: 10.1007/s10489-026-07116-1
Dominique Mercier, Andreas Dengel, Sheraz Ahmed

Deep neural networks are among the most successful algorithms in terms of performance and scalability across different domains. However, since these networks are black boxes, their usability is severely restricted by a lack of interpretability. Existing interpretability methods, moreover, are not specifically tailored to the analysis of time-series networks. This paper shows that an analysis in the frequency domain can not only highlight relevant areas of the input signal better than existing methods but is also more robust to fluctuations in the signal. We present FreqAtt, a framework that enables post-hoc interpretation of time-series analysis. To achieve this, the relevant frequencies are evaluated, and the signal is either filtered or the relevant input data is marked. FreqAtt is evaluated using a wide range of statistical metrics to provide a broad overview of its performance. The results show that frequency-based attribution, especially in combination with traditional attribution on top of the frequency-optimized signal, provides strong performance across different metrics.
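The core occlusion idea can be sketched as follows: transform the signal with an FFT, zero out one frequency band at a time, and attribute relevance to a band by how much the model's output changes. The eight-band partition and the toy spectral "model" below are illustrative assumptions, not the FreqAtt implementation.

```python
import numpy as np

def frequency_occlusion_attribution(signal, predict, n_bands=8):
    """Score each frequency band by how much zeroing it changes the model output.
    `predict` maps a 1-D signal to a scalar score; assumed given."""
    spec = np.fft.rfft(signal)
    base = predict(signal)
    edges = np.linspace(0, len(spec), n_bands + 1, dtype=int)
    scores = np.zeros(n_bands)
    for b in range(n_bands):
        occluded = spec.copy()
        occluded[edges[b]:edges[b + 1]] = 0          # occlude one band
        scores[b] = abs(base - predict(np.fft.irfft(occluded, n=len(signal))))
    return scores                                     # higher = more relevant band

# toy model: responds to the 5 Hz component of a 1-second, 128 Hz signal
t = np.arange(128) / 128.0
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 30 * t)
predict = lambda s: np.abs(np.fft.rfft(s))[5]
print(frequency_occlusion_attribution(x, predict).round(2))  # band 0 dominates
```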

Citations: 0
Tracking a fixed-wing unmanned aerial vehicle: an experimental evaluation
IF 3.5, CAS Region 2 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-02-24. DOI: 10.1007/s10489-026-07148-7
Yong Wang, Xiangyu Zhu, Zhiyang Sun, Robert Laganiere, Lu Ding

Tracking a fixed-wing unmanned aerial vehicle (UAV) plays a key role in aerospace applications, but this research area lacks a high-diversity, large-scale benchmark dataset, which is a key prerequisite both for the comprehensive evaluation of UAV-to-UAV (UAV2UAV) tracking algorithms and for the development of deep learning methods. It is thus essential to construct a UAV2UAV dataset to fill this gap and advance research in the domain. To this end, we release a large-scale dataset, the UAV2UAV-2 dataset, for fixed-wing UAV tracking. Our UAV2UAV-2 dataset includes 49 videos with more than 27k frames in total. A training dataset with more than 4k images covering various background scenarios is also provided, which can be used to further validate the training process. A comprehensive performance evaluation of 24 tracking methods is carried out on this dataset. Three representative tracking methods trained on this training dataset achieve superior results compared to their original counterparts. This indicates that training with dedicated data can improve tracking performance, and the approach can be extended to more tracking algorithms. A detailed analysis of the performance results is also provided. The data will be available at: https://github.com/zxysysu/UAV2UAV-2.

Citations: 0
TRTP: a three-stage robust task planning framework for open worlds via visual-language models and digital twin simulation
IF 3.5, CAS Region 2 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-02-23. DOI: 10.1007/s10489-026-07143-y
Yuanjin Qu, Xiangtao Hu, Fei Chen, Zhihong Wei

Robotic task planning in open-world environments remains challenging due to the requirements of cross-modal semantic understanding, action feasibility verification, and reliable closed-loop correction. This paper presents TRTP, a three-stage robust task-planning framework that combines vision-language models with digital twin simulation to address these issues. The framework consists of three tightly integrated components. The spatial prompt generation module converts scene images into structured spatial prompts that encode relations such as orientation and containment. The task planning module uses these prompts, along with task instructions and demonstration videos, to produce an initial sequence of executable actions. The digital twin simulation module evaluates spatial reachability and physical feasibility in a virtual environment and provides structured corrective feedback. Together, these modules form a complete multimodal, closed-loop planning pipeline that mitigates failures arising from incomplete spatial modeling or the lack of physical validation. We further develop a multi-model inference and cross-scoring data generation pipeline to support fine-tuning of the vision-language models used in the spatial-prompt and task planning modules, thereby reducing reliance on manual annotation. Experimental evaluations demonstrate that TRTP achieves marked improvements in task consistency, physical feasibility, and overall success rate compared with baselines that omit spatial prompts or feedback. The results confirm the complementary contributions of the three stages and show that TRTP generalizes effectively to diverse real-world robotic scenarios. Home page: https://trtp.github.io/
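A skeletal sketch of the three-stage closed loop as described; `vlm` and `simulator` are hypothetical interfaces invented for illustration, not the paper's actual APIs.

```python
def trtp_loop(scene_image, instruction, vlm, simulator, max_rounds=3):
    """Minimal sketch of the TRTP-style loop. Assumed interfaces:
      vlm.spatial_prompts(img)        -> structured spatial relations
      vlm.plan(prompts, instruction)  -> ordered list of executable actions
      simulator.check(plan)           -> (ok, feedback) from the digital twin
      vlm.revise(plan, feedback)      -> corrected plan
    """
    prompts = vlm.spatial_prompts(scene_image)    # stage 1: spatial prompt generation
    plan = vlm.plan(prompts, instruction)         # stage 2: initial task plan
    for _ in range(max_rounds):                   # stage 3: simulate and correct
        ok, feedback = simulator.check(plan)      # reachability + physical feasibility
        if ok:
            return plan
        plan = vlm.revise(plan, feedback)         # structured corrective feedback
    return plan
```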

Citations: 0
Benefit-cost frontier-aware semantic reasoning for zero-shot object navigation
IF 3.5, CAS Region 2 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-02-23. DOI: 10.1007/s10489-026-07144-x
Hanrui Chen, Liqi Yan, Qifan Wang, Jianhui Zhang, Fangli Guan, Pan Li

Zero-shot object navigation involves locating target objects in unseen environments, a task fundamental to embodied intelligence. Traditional approaches, particularly recent vision-language navigation models, leverage Large Language Models (LLMs) to enable multimodal reasoning based on real-time visual perception. However, two key challenges remain: (1) semantic mismatches emerge between the global map and the environment when frontier selection optimizes a single criterion; and (2) general-purpose LLMs are difficult to optimize for navigation tasks, leading to limited navigation-specific reasoning capabilities. To address these issues, we propose Benefit Frontier Semantic Map (BFSMap) to enable human-like semantic exploration through iterative reasoning. First, BFSMap reformulates semantic mapping as an image captioning task, integrating multiple maps to derive an optimal frontier that balances benefit and cost, thereby alleviating the vision-language misalignment of prior end-to-end methods. Second, we introduce a lightweight semantic-aware benefit prediction module (LightSA), trained from scratch with a novel prompt learning strategy for cross-view semantic reasoning, to update the semantic maps. Third, we design a modular object-aware decision-making policy that mimics human reasoning to identify the target object and promptly correct suboptimal paths. Our model achieves state-of-the-art performance (+2.8% SR on HM3D, +1.2% SR on MP3D compared to the baseline) without relying on LLMs, demonstrating the promise of efficient navigation models based on human-like reasoning in unknown environments. Our code will be available.
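The benefit-cost frontier selection can be illustrated with a toy scorer: each candidate frontier gets a semantic benefit (predicted by LightSA in the paper; supplied as an input here) minus a normalized travel cost. The linear trade-off and the distance-based cost are assumptions.

```python
import numpy as np

def select_frontier(frontiers, robot_xy, benefit):
    """Pick the frontier with the best benefit-cost trade-off.
    `benefit[i]` is a semantic relevance score in [0, 1]; cost is
    straight-line distance as a stand-in for path length."""
    frontiers = np.asarray(frontiers, dtype=float)
    cost = np.linalg.norm(frontiers - robot_xy, axis=1)
    cost = cost / (cost.max() + 1e-8)        # normalize to [0, 1]
    score = benefit - cost                   # balance benefit against travel cost
    return frontiers[int(np.argmax(score))]

frontiers = [(2.0, 1.0), (5.0, 5.0), (0.5, 3.0)]
benefit = np.array([0.4, 0.9, 0.3])          # e.g. "bedroom" frontier when seeking a bed
print(select_frontier(frontiers, robot_xy=np.array([0.0, 0.0]), benefit=benefit))
```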

Citations: 0
FCFCNN: frequency coupled fusion convolutional neural network for hyperspectral and LiDAR data classification
IF 3.5, CAS Region 2 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-02-21. DOI: 10.1007/s10489-026-07126-z
Yin Yin, Yining Feng, Chuanming Song, Xianghai Wang

With the rapid advancement of remote sensing (RS) technology, the difficulty of acquiring multi-source RS data is significantly decreasing. Land use/land cover (LULC) classification methods that fuse hyperspectral (HS) image and light detection and ranging (LiDAR) have become a current research hotspot. However, current mainstream methods primarily focus on extracting the most salient features from HS image and LiDAR data. They emphasize the rich spectral information of HS image and the precise elevation information of LiDAR, often overlooking the important frequency-domain characteristics inherent in both data sources. Based on this, we propose a novel network framework called frequency coupled fusion convolutional neural network (FCFCNN), which aims to enhance classification performance by integrating frequency features from HS image and LiDAR data as supplementary information in the feature fusion process. The network is designed with a dual-branch structure. The first branch focuses on frequency information, employing a frequency-splitting module to separately extract high-frequency and low-frequency features from HS image and LiDAR data. Subsequently, the local enhanced position attention module (LEPAM) and coupling strategy are used to fuse these high-frequency and low-frequency features, respectively. The second branch focuses on learning the spectral-spatial features of HS image and the elevation features in LiDAR data. Subsequently, all features extracted from the two branches are fused at the feature level. Finally, they are combined with spectral-spatial and elevation information through decision-level fusion to achieve accurate classification. The experimental results on three real RS datasets demonstrate that our method exhibits better effectiveness and accuracy compared to existing technologies.
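A minimal sketch of a frequency-splitting step of the kind this abstract describes: an FFT low-pass mask separates each modality into complementary low- and high-frequency parts that the two fusion branches could then process. The circular mask and `cutoff` fraction are assumptions, not the paper's module.

```python
import torch

def frequency_split(x, cutoff=0.25):
    """Split a feature map (batch, channels, H, W) into low/high-frequency parts.
    `cutoff` is the fraction of the spectrum kept as 'low'; an assumption."""
    spec = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    h, w = x.shape[-2:]
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    r = torch.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    mask = (r <= cutoff * min(h, w) / 2).to(x.dtype)   # centered low-pass disc
    low = torch.fft.ifft2(torch.fft.ifftshift(spec * mask, dim=(-2, -1))).real
    return low, x - low                                # complementary high-pass

hsi = torch.rand(1, 30, 64, 64)     # hyperspectral patch (30 bands)
lidar = torch.rand(1, 1, 64, 64)    # LiDAR elevation channel
(hs_low, hs_high), (li_low, li_high) = frequency_split(hsi), frequency_split(lidar)
```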

Citations: 0
UniVoice: a unified framework for text-to-speech, singing voice synthesis, and opera singing synthesis
IF 3.5, CAS Region 2 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-02-20. DOI: 10.1007/s10489-026-07137-w
Yue Zhou, Peng Bai, Xiaodong Shi

Text-to-speech (TTS), singing voice synthesis (SVS), and opera singing synthesis (OSS) aim to convert linguistic or musical inputs into intelligible and expressive speech. Although these tasks share a similar generative pipeline, they differ substantially in acoustic characteristics such as pitch range, prosody, and expressiveness. These distinctions pose significant challenges for unified modeling, particularly in learning representations that are both generalizable across tasks and sufficiently expressive for each task’s specific requirements. Existing approaches typically employ separate models for each task, resulting in increased computational overhead and limited cross-task knowledge transfer. To address these challenges, we propose UniVoice, a unified voice synthesis framework designed to jointly model TTS, SVS, and OSS. UniVoice adopts discrete acoustic units as a unified representation across tasks, simplifying the learning objective. To accommodate task-specific variability, we introduce the task-specific adapter with Top-1 routing that enables the acoustic model to specialize through lightweight, learnable adapters, effectively mitigating cross-task interference while preserving parameter efficiency. Furthermore, we incorporate a task-aware postnet that refines unit sequences based on task identity and musical context, enhancing both expressiveness and output quality. The final waveform is synthesized using a unit-based vocoder built on BigVGAN, conditioned on discrete units and pitch information. Experiments on multiple datasets spanning TTS, SVS, and OSS demonstrate that UniVoice significantly outperforms existing single-task and multi-task baselines. Specifically, UniVoice achieves improvements of 0.26, 0.12, and 0.22 in average Mean Opinion Scores (MOS) over the multi-task baseline for TTS, SVS, and OSS, respectively. Moreover, UniVoice achieves these gains with a compact model size of only 49.4M parameters and maintains a fast inference speed, indicating its effectiveness in balancing synthesis quality and computational efficiency.
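The task-specific adapter with Top-1 routing can be sketched as a per-task bottleneck adapter selected by task id and added residually to a shared layer, so only the routed adapter specializes while the backbone stays shared. All dimensions, the placeholder backbone layer, and the task-id encoding are assumptions.

```python
import torch
import torch.nn as nn

class TaskAdapterLayer(nn.Module):
    """Sketch of Top-1 task routing: one lightweight bottleneck adapter per task
    (TTS / SVS / OSS), selected by task id."""
    def __init__(self, d_model=256, d_bottleneck=32, n_tasks=3):
        super().__init__()
        self.backbone = nn.Linear(d_model, d_model)   # stands in for a shared layer
        self.adapters = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_bottleneck), nn.ReLU(),
                nn.Linear(d_bottleneck, d_model),
            )
            for _ in range(n_tasks)
        )

    def forward(self, x, task_id):                    # Top-1 routing: pick one adapter
        h = self.backbone(x)
        return h + self.adapters[task_id](h)          # residual, task-specific path

layer = TaskAdapterLayer()
units = torch.rand(4, 100, 256)                       # discrete-unit embeddings
tts_out = layer(units, task_id=0)                     # 0=TTS, 1=SVS, 2=OSS (assumed)
```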

Citations: 0