Pub Date: 2024-05-13 | DOI: 10.1007/s11063-024-11615-y
Qi An, Shan Jiang
Recognizing the pivotal role that the choice of distance metric plays in designing a clustering algorithm, we focus on innovating the k-means method by redefining the distance metric in its distortion measure. In this study, we introduce a novel k-means clustering algorithm utilizing a distance metric derived from the ℓ_p quasi-norm with p ∈ (0, 1). Through an illustrative example, we showcase the advantageous properties of the proposed distance metric, compared to commonly used alternatives, for revealing natural groupings in data. Subsequently, we present a novel k-means-type heuristic integrating this sub-one quasi-norm-based distance, offer a step-by-step iterative relocation scheme, and prove convergence to a Kuhn-Tucker point. Finally, we empirically validate the effectiveness of our clustering method through experiments on synthetic and real-life datasets, both in their original form and with additional noise introduced. We also investigate the performance of the proposed method as a subroutine in a deep-learning clustering algorithm. Our results demonstrate the efficacy of the proposed k-means algorithm in capturing distinctive patterns exhibited by certain data types.
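The abstract does not reproduce the paper's relocation scheme, but the assignment step under the sub-one distance can be sketched in NumPy. This is a minimal illustration, not the paper's algorithm: the ℓ_p^p distortion (p = 0.5) drives assignments, and the coordinate-wise median is used as a robust stand-in for the center update, which has no closed form for p < 1; the `init` parameter is a convenience of this sketch.

```python
import numpy as np

def lp_distance(x, c, p=0.5):
    """Sub-one distortion: sum_j |x_j - c_j|^p with p in (0, 1).
    (The 1/p-th root is omitted: it does not change argmin assignments.)"""
    return np.sum(np.abs(x - c) ** p, axis=-1)

def kmeans_lp(X, k, p=0.5, iters=50, init=None, seed=0):
    """k-means-style relocation under the l_p quasi-norm distortion."""
    rng = np.random.default_rng(seed)
    if init is None:
        centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    else:
        centers = np.asarray(init, dtype=float).copy()
    for _ in range(iters):
        # assignment step: nearest center under the l_p^p distortion
        d = np.stack([lp_distance(X, c, p) for c in centers])  # (k, n)
        labels = d.argmin(axis=0)
        # update step: for p < 1 the minimizer has no closed form; the
        # coordinate-wise median is used here as a robust stand-in
        for j in range(k):
            if np.any(labels == j):
                centers[j] = np.median(X[labels == j], axis=0)
    return labels, centers
```

Because |·|^p with p < 1 grows sub-linearly, large coordinate deviations are penalized less aggressively than under the Euclidean norm, which is what gives the metric its robustness to noisy features.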
{"title":"Sub-One Quasi-Norm-Based k-Means Clustering Algorithm and Analyses","authors":"Qi An, Shan Jiang","doi":"10.1007/s11063-024-11615-y","DOIUrl":"https://doi.org/10.1007/s11063-024-11615-y","url":null,"abstract":"<p>Recognizing the pivotal role of choosing an appropriate distance metric in designing the clustering algorithm, our focus is on innovating the <i>k</i>-means method by redefining the distance metric in its distortion. In this study, we introduce a novel <i>k</i>-means clustering algorithm utilizing a distance metric derived from the <span>(ell _p)</span> quasi-norm with <span>(pin (0,1))</span>. Through an illustrative example, we showcase the advantageous properties of the proposed distance metric compared to commonly used alternatives for revealing natural groupings in data. Subsequently, we present a novel <i>k</i>-means type heuristic by integrating this sub-one quasi-norm-based distance, offer a step-by-step iterative relocation scheme, and prove the convergence to the Kuhn-Tucker point. Finally, we empirically validate the effectiveness of our clustering method through experiments on synthetic and real-life datasets, both in their original form and with additional noise introduced. We also investigate the performance of the proposed method as a subroutine in a deep learning clustering algorithm. 
Our results demonstrate the efficacy of the proposed <i>k</i>-means algorithm in capturing distinctive patterns exhibited by certain data types.</p>","PeriodicalId":51144,"journal":{"name":"Neural Processing Letters","volume":"46 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140938931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-05-11 | DOI: 10.1007/s11063-024-11449-8
Lei Xia, Jianfeng Tang, Guangli Li, Jun Fu, Shukai Duan, Lidan Wang
The echo state network (ESN) is an efficient recurrent neural network that has achieved good results in time series prediction tasks, but its application to time series classification tasks remains underdeveloped. In this study, we address the time series classification problem with echo state networks. We propose a new framework called the forward echo state convolutional network (FESCN). It consists of two parts, an encoder and a decoder: the encoder is a forward-topology echo state network (FT-ESN), and the decoder mainly consists of a convolutional layer and a max-pooling layer. We apply the proposed framework to the univariate time series datasets of the UCR archive and compare it with six traditional methods and four neural network models. The experimental findings demonstrate that FESCN outperforms the other methods in overall classification accuracy. Additionally, we investigated the impact of reservoir size on network performance and observed that the best classification results were obtained with a reservoir size of 32. Finally, we examined the network's performance under noise interference; the results show that FESCN is more stable than the echo memory network (EMN).
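The encoder side of such a framework rests on the standard ESN state update. The sketch below is illustrative only: the abstract does not specify the FT-ESN construction, so a strictly lower-triangular ("forward-only") reservoir mask is used here as one plausible reading of the forward topology.

```python
import numpy as np

def init_reservoir(n_in, n_res, scale=0.9, seed=0):
    """Random input weights plus a reservoir restricted to forward
    connections (strictly lower-triangular mask) -- an assumption,
    not the paper's exact FT-ESN construction."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
    W = np.tril(rng.uniform(-0.5, 0.5, (n_res, n_res)), k=-1)
    # a strictly triangular matrix has spectral radius 0, so bound the
    # largest singular value instead to keep the echo state property
    W *= scale / np.linalg.norm(W, 2)
    return W_in, W

def run_reservoir(W_in, W, u_seq, leak=1.0):
    """Collect leaky-tanh reservoir states for a 1-D input sequence."""
    x = np.zeros(W.shape[0])
    states = []
    for u in u_seq:
        x = (1 - leak) * x + leak * np.tanh(W_in @ np.atleast_1d(u) + W @ x)
        states.append(x.copy())
    return np.array(states)  # (T, n_res): the encoding a decoder would consume
```

The fixed, untrained reservoir is what makes ESN encoders cheap: only the decoder (here, the convolutional classifier) needs gradient training.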
{"title":"Time Series Classification Based on Forward Echo State Convolution Network","authors":"Lei Xia, Jianfeng Tang, Guangli Li, Jun Fu, Shukai Duan, Lidan Wang","doi":"10.1007/s11063-024-11449-8","DOIUrl":"https://doi.org/10.1007/s11063-024-11449-8","url":null,"abstract":"<p>The Echo state network (ESN) is an efficient recurrent neural network that has achieved good results in time series prediction tasks. Still, its application in time series classification tasks has yet to develop fully. In this study, we work on the time series classification problem based on echo state networks. We propose a new framework called forward echo state convolutional network (FESCN). It consists of two parts, the encoder and the decoder, where the encoder part is composed of a forward topology echo state network (FT-ESN), and the decoder part mainly consists of a convolutional layer and a max-pooling layer. We apply the proposed network framework to the univariate time series dataset UCR and compare it with six traditional methods and four neural network models. The experimental findings demonstrate that FESCN outperforms other methods in terms of overall classification accuracy. Additionally, we investigated the impact of reservoir size on network performance and observed that the optimal classification results were obtained when the reservoir size was set to 32. 
Finally, we investigated the performance of the network under noise interference, and the results show that FESCN has a more stable network performance compared to EMN (echo memory network).</p>","PeriodicalId":51144,"journal":{"name":"Neural Processing Letters","volume":"49 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140938985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-05-10 | DOI: 10.1007/s11063-024-11606-z
Xin Ye, Xiang Tian, Bolun Zheng, Fan Zhou, Yaowu Chen
Knowledge distillation is a model compression technique that transfers knowledge learned by teacher networks to student networks. Existing knowledge distillation methods greatly expand the forms of knowledge, but also make the distillation models complex and symmetric. However, few studies have explored the commonalities among these methods. In this study, we propose a concise distillation framework to unify these methods and a method to construct asymmetric knowledge distillation under the framework. Asymmetric distillation aims to enable differentiated knowledge transfers for different distillation objects. We designed a multi-stage shallow-wide branch bifurcation method to distill different knowledge representations and a grouping ensemble strategy to supervise the network to teach and learn selectively. Finally, we conducted experiments using image classification benchmarks to verify the proposed method. Experimental results show that our implementation achieves considerable improvements over existing methods, demonstrating the effectiveness of the method and the potential of the framework.
{"title":"A Unified Asymmetric Knowledge Distillation Framework for Image Classification","authors":"Xin Ye, Xiang Tian, Bolun Zheng, Fan Zhou, Yaowu Chen","doi":"10.1007/s11063-024-11606-z","DOIUrl":"https://doi.org/10.1007/s11063-024-11606-z","url":null,"abstract":"<p>Knowledge distillation is a model compression technique that transfers knowledge learned by teacher networks to student networks. Existing knowledge distillation methods greatly expand the forms of knowledge, but also make the distillation models complex and symmetric. However, few studies have explored the commonalities among these methods. In this study, we propose a concise distillation framework to unify these methods and a method to construct asymmetric knowledge distillation under the framework. Asymmetric distillation aims to enable differentiated knowledge transfers for different distillation objects. We designed a multi-stage shallow-wide branch bifurcation method to distill different knowledge representations and a grouping ensemble strategy to supervise the network to teach and learn selectively. Consequently, we conducted experiments using image classification benchmarks to verify the proposed method. Experimental results show that our implementation can achieve considerable improvements over existing methods, demonstrating the effectiveness of the method and the potential of the framework.</p>","PeriodicalId":51144,"journal":{"name":"Neural Processing Letters","volume":"21 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140942261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-05-10 | DOI: 10.1007/s11063-024-11630-z
Qian Lang, Jing Xu, Huiwen Zhang, Zhengxin Wang
In this paper, group consensus is investigated for a class of nonlinear multi-agent systems subject to denial-of-service (DoS) attacks. First, a first-order nonlinear multi-agent system is constructed, which is divided into M subsystems, each with a unique leader. A protocol is then proposed and a Lyapunov function candidate is chosen. By means of stability theory, a sufficient criterion involving the duration of DoS attacks, the coupling strength, and the control gain is obtained for achieving group consensus in the first-order system; that is, the nodes in each subsystem can track the leader of their group. Furthermore, the result is extended to nonlinear second-order multi-agent systems, and the controller is improved accordingly to obtain sufficient conditions for group consensus. Additionally, the lower bounds on the coupling strength and the average interval of DoS attacks can be determined from the obtained sufficient conditions. Finally, several numerical simulations are presented to illustrate the effectiveness of the proposed controllers and the derived theoretical results.
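The tracking behaviour described above can be previewed with a much-simplified simulation. The sketch below drops the nonlinear agent dynamics, the group structure, and the DoS interruptions treated in the paper, and just Euler-integrates a linear pinned-consensus protocol so that followers converge to a pinned leader value; all function and parameter names are of this sketch, not the paper.

```python
import numpy as np

def simulate_pinned_consensus(A, s, pinned, c=1.0, g=2.0, dt=0.01, steps=20000, x0=None):
    """Euler simulation of the linear pinned-consensus protocol
        x_i' = -c * sum_j L[i, j] * x_j - d_i * (x_i - s),
    where d_i = g on pinned nodes and 0 elsewhere."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    d = np.array([g if i in pinned else 0.0 for i in range(n)])
    L = np.diag(A.sum(axis=1)) - A  # graph Laplacian of the follower network
    for _ in range(steps):
        x = x + dt * (-c * (L @ x) - d * (x - s))
    return x
```

With a connected follower graph and at least one pinned node, all states converge to the leader value s, mirroring the "nodes track the leader of their group" conclusion in the linear special case.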
{"title":"Pinning Group Consensus of Multi-agent Systems Under DoS Attacks","authors":"Qian Lang, Jing Xu, Huiwen Zhang, Zhengxin Wang","doi":"10.1007/s11063-024-11630-z","DOIUrl":"https://doi.org/10.1007/s11063-024-11630-z","url":null,"abstract":"<p>In this paper, group consensus is investigated for a class of nonlinear multi-agent systems suffered from the DoS attacks. Firstly, a first-order nonlinear multi-agent system is constructed, which is divided into <i>M</i> subsystems and each subsystem has an unique leader. Then a protocol is proposed and a Lyapunov function candidate is chosen. By means of the stability theory, a sufficient criterion, which involves the duration of DoS attacks, coupling strength and control gain, is obtained for achieving group consensus in first-order system. That is, the nodes in each subsystem can track the leader of that group. Furthermore, the result is extended to nonlinear second-order multi-agent systems and the controller is also improved to obtain sufficient conditions for group consensus. Additionally, the lower bounds of the coupling strength and average interval of DoS attacks can be determined from the obtained sufficient conditions. Finally, several numerical simulations are presented to explain the effectiveness of the proposed controllers and the derived theoretical results.</p>","PeriodicalId":51144,"journal":{"name":"Neural Processing Letters","volume":"27 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140938927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-05-09 | DOI: 10.1007/s11063-024-11623-y
Manu Augustine, Om Prakash Yadav, Ashish Nayyar, Dheeraj Joshi
Fuzzy cognitive maps (FCMs) provide a rapid and efficient approach to system modeling and simulation. The literature demonstrates numerous successful applications of FCMs in identifying failure modes. The standard process of failure mode identification using FCMs involves monitoring crucial concept/node values for excesses. Threshold functions are used to limit node values to a pre-specified range, usually [0, 1] or [-1, +1]. However, traditional FCMs using the tanh threshold function have two crucial drawbacks for this particular purpose: (i) a tendency to reduce the values of state vector components, and (ii) a potential inability to reach a limit state with clearly identifiable failure states. The reason is the inherent mathematical nature of the tanh function, which is asymptotic to the horizontal lines demarcating the edges of the specified range. To overcome these limitations, this paper introduces a novel modified tanh threshold function that effectively addresses both issues.
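The asymptotic-boundary problem is easy to demonstrate numerically. The abstract does not give the paper's modified function, so the sketch below uses a hypothetical gain-then-clip variant purely to illustrate the issue: plain tanh never attains ±1 under iteration, while the clipped variant reaches the limit state exactly.

```python
import numpy as np

def tanh_threshold(x):
    return np.tanh(x)

def modified_tanh_threshold(x, gain=1.5):
    """Hypothetical fix (not the paper's exact function): amplify, then
    clip, so saturated nodes can actually reach the limit values -1/+1,
    which plain tanh only approaches asymptotically."""
    return np.clip(gain * np.tanh(x), -1.0, 1.0)

def fcm_run(W, a0, f, steps=50):
    """Iterate an FCM state vector: a(t+1) = f(a(t) + a(t) @ W)."""
    a = np.asarray(a0, dtype=float)
    for _ in range(steps):
        a = f(a + a @ W)
    return a
```

A node whose limit value sits exactly on the range boundary gives an unambiguous failure indication, which is the point of drawback (ii) above.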
{"title":"Use of a Modified Threshold Function in Fuzzy Cognitive Maps for Improved Failure Mode Identification","authors":"Manu Augustine, Om Prakash Yadav, Ashish Nayyar, Dheeraj Joshi","doi":"10.1007/s11063-024-11623-y","DOIUrl":"https://doi.org/10.1007/s11063-024-11623-y","url":null,"abstract":"<p>Fuzzy cognitive maps (FCMs) provide a rapid and efficient approach for system modeling and simulation. The literature demonstrates numerous successful applications of FCMs in identifying failure modes. The standard process of failure mode identification using FCMs involves monitoring crucial concept/node values for excesses. Threshold functions are used to limit the value of nodes within a pre-specified range, which is usually [0, 1] or [-1, + 1]. However, traditional FCMs using the <i>tanh</i> threshold function possess two crucial drawbacks for this particular.Purpose(i) a tendency to reduce the values of state vector components, and (ii) the potential inability to reach a limit state with clearly identifiable failure states. The reason for this is the inherent mathematical nature of the <i>tanh</i> function in being asymptotic to the horizontal line demarcating the edge of the specified range. To overcome these limitations, this paper introduces a novel modified <i>tanh</i> threshold function that effectively addresses both issues.</p>","PeriodicalId":51144,"journal":{"name":"Neural Processing Letters","volume":"25 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140938983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-05-09 | DOI: 10.1007/s11063-024-11621-0
Peng Guo, Shuguo Pan, Peng Hu, Ling Pei, Baoguo Yu
In the unsupervised domain adaptation (UDA) depth estimation task (Akada et al., Self-supervised learning of domain invariant features for depth estimation, in: 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 3377-3387 (2022), 10.1109/WACV51458.2022.00107), a recent adaptive approach is to use a bidirectional transformation network to transfer style between the target- and source-domain inputs, and then train a depth estimation network in each domain. However, the domain adaptation process and the style transfer may introduce defects and biases, often leading to depth holes and missing instance-edge depth in the target domain's depth output. To address these issues, we propose a training network improved in both model structure and supervision constraints. First, we introduce an edge-guided self-attention mechanism into the task network of each domain to enhance the network's attention to high-frequency edge features, maintain clear boundaries, and fill in missing depth regions. Furthermore, we utilize an edge detection algorithm to extract edge features from the target-domain input. Then we establish edge consistency constraints between inter-domain entities, narrowing the gap between domains and easing domain-to-domain transfer. Our experiments demonstrate that the proposed method effectively solves the aforementioned problems, producing higher-quality depth maps and outperforming existing state-of-the-art methods.
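An edge consistency constraint of the kind described can be sketched concretely. The abstract does not name the edge detector, so a Sobel operator is used here as a stand-in; the loss penalizes the L1 distance between the edge maps of two domain renderings of the same scene.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def conv2d_valid(img, k):
    """Naive 'valid' 2-D convolution with a 3x3 kernel."""
    H, W = img.shape
    out = np.zeros((H - 2, W - 2))
    for i in range(H - 2):
        for j in range(W - 2):
            out[i, j] = np.sum(img[i:i + 3, j:j + 3] * k)
    return out

def edge_map(img):
    gx, gy = conv2d_valid(img, SOBEL_X), conv2d_valid(img, SOBEL_Y)
    return np.hypot(gx, gy)  # gradient magnitude

def edge_consistency_loss(img_a, img_b):
    """L1 distance between the edge maps of two renderings of a scene."""
    return np.mean(np.abs(edge_map(img_a) - edge_map(img_b)))
```

Because style transfer should change appearance but not geometry, edges are a natural domain-invariant signal to anchor the constraint on.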
{"title":"Unsupervised Domain Adaptation Depth Estimation Based on Self-attention Mechanism and Edge Consistency Constraints","authors":"Peng Guo, Shuguo Pan, Peng Hu, Ling Pei, Baoguo Yu","doi":"10.1007/s11063-024-11621-0","DOIUrl":"https://doi.org/10.1007/s11063-024-11621-0","url":null,"abstract":"<p>In the unsupervised domain adaptation (UDA) (Akada et al. Self-supervised learning of domain invariant features for depth estimation, in: 2022 IEEE/CVF winter conference on applications of computer vision (WACV), pp 3377–3387 (2022). 10.1109/WACV51458.2022.00107) depth estimation task, a new adaptive approach is to use the bidirectional transformation network to transfer the style between the target and source domain inputs, and then train the depth estimation network in their respective domains. However, the domain adaptation process and the style transfer may result in defects and biases, often leading to depth holes and instance edge depth missing in the target domain’s depth output. To address these issues, We propose a training network that has been improved in terms of model structure and supervision constraints. First, we introduce a edge-guided self-attention mechanism in the task network of each domain to enhance the network’s attention to high-frequency edge features, maintain clear boundaries and fill in missing areas of depth. Furthermore, we utilize an edge detection algorithm to extract edge features from the input of the target domain. Then we establish edge consistency constraints between inter-domain entities in order to narrow the gap between domains and make domain-to-domain transfers easier. 
Our experimental demonstrate that our proposed method effectively solve the aforementioned problem, resulting in a higher quality depth map and outperforming existing state-of-the-art methods.</p>","PeriodicalId":51144,"journal":{"name":"Neural Processing Letters","volume":"2 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140938928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-05-08 | DOI: 10.1007/s11063-024-11466-7
Chao Huang, Zhao Kang, Hong Wu
Image anomaly detection and localization perform not only image-level anomaly classification but also pixel-level localization of anomalous regions. Recently, this problem has received much research attention due to its wide application in various fields. This paper proposes ProtoAD, a prototype-based neural network for image anomaly detection and localization. First, the patch features of normal images are extracted by a deep network pre-trained on natural images. Then, the prototypes of the normal patch features are learned by non-parametric clustering. Finally, we construct an image anomaly localization network (ProtoAD) by appending to the feature extraction network an L2 feature normalization, a 1×1 convolutional layer, a channel max-pooling, and a subtraction operation. We use the prototypes as the kernels of the 1×1 convolutional layer; therefore, our neural network does not need a training phase and can conduct anomaly detection and localization in an end-to-end manner. Extensive experiments on two challenging industrial anomaly detection datasets, MVTec AD and BTAD, demonstrate that ProtoAD achieves competitive performance compared to state-of-the-art methods at a higher inference speed. The code and pre-trained models are publicly available at https://github.com/98chao/ProtoAD.
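The four appended operations map directly onto per-pixel cosine similarity against the prototypes, which the following NumPy sketch makes explicit (the function name and shapes are illustrative, not the released code's API): L2 normalization plus a 1×1 convolution with prototype kernels computes per-pixel dot products, channel max-pooling keeps the closest prototype, and the subtraction turns similarity into an anomaly score.

```python
import numpy as np

def protoad_score(feat_map, prototypes):
    """Anomaly map from prototypes used as 1x1 conv kernels.

    feat_map:   (H, W, C) patch features of a test image
    prototypes: (K, C) prototypes clustered from normal patch features
    """
    # L2 feature normalization (for both features and prototypes)
    f = feat_map / (np.linalg.norm(feat_map, axis=-1, keepdims=True) + 1e-8)
    p = prototypes / (np.linalg.norm(prototypes, axis=-1, keepdims=True) + 1e-8)
    sim = f @ p.T            # 1x1 convolution == per-pixel dot products, (H, W, K)
    best = sim.max(axis=-1)  # channel max-pooling: similarity to closest prototype
    return 1.0 - best        # subtraction: high score == anomalous pixel
```

Since the prototypes are fixed after clustering, the whole pipeline involves no gradient training, which is why the method needs no training phase.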
{"title":"A Prototype-Based Neural Network for Image Anomaly Detection and Localization","authors":"Chao Huang, Zhao Kang, Hong Wu","doi":"10.1007/s11063-024-11466-7","DOIUrl":"https://doi.org/10.1007/s11063-024-11466-7","url":null,"abstract":"<p>Image anomaly detection and localization perform not only image-level anomaly classification but also locate pixel-level anomaly regions. Recently, it has received much research attention due to its wide application in various fields. This paper proposes ProtoAD, a prototype-based neural network for image anomaly detection and localization. First, the patch features of normal images are extracted by a deep network pre-trained on nature images. Then, the prototypes of the normal patch features are learned by non-parametric clustering. Finally, we construct an image anomaly localization network (ProtoAD) by appending the feature extraction network with <i>L</i>2 feature normalization, a <span>(1times 1)</span> convolutional layer, a channel max-pooling, and a subtraction operation. We use the prototypes as the kernels of the <span>(1times 1)</span> convolutional layer; therefore, our neural network does not need a training phase and can conduct anomaly detection and localization in an end-to-end manner. Extensive experiments on two challenging industrial anomaly detection datasets, MVTec AD and BTAD, demonstrate that ProtoAD achieves competitive performance compared to the state-of-the-art methods with a higher inference speed. 
The code and pre-trained models are publicly available at https://github.com/98chao/ProtoAD.</p>","PeriodicalId":51144,"journal":{"name":"Neural Processing Letters","volume":"45 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140938881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-05-08 | DOI: 10.1007/s11063-024-11613-0
Kyungdeuk Ko, Donghyeon Kim, Kyungseok Oh, Hanseok Ko
Voice conversion (VC) is the task of changing the speech of a source speaker to a target voice while preserving the linguistic information of the source speech. Existing VC methods typically use the mel-spectrogram as both input and output, so a separate vocoder is required to transform the mel-spectrogram into a waveform. Consequently, VC performance varies with vocoder performance, and noisy speech can be generated due to problems such as train-test mismatch. In this paper, we propose WaveVC, a speech- and fundamental-frequency-consistent raw audio voice conversion method. Unlike other methods, WaveVC does not require a separate vocoder and can perform VC directly on the raw audio waveform using 1D convolutions, eliminating the performance degradation caused by the train-test mismatch of the vocoder. In the training phase, WaveVC employs a speech loss and an F0 loss to preserve the content of the source speech and generate F0-consistent speech using pre-trained networks. In the test phase, the F0 feature of the source speech is concatenated with a content embedding vector to ensure that the converted speech follows the fundamental frequency contour of the source speech. WaveVC achieves higher performance than baseline methods in both many-to-many and any-to-any VC. The converted samples are available online.
{"title":"WaveVC: Speech and Fundamental Frequency Consistent Raw Audio Voice Conversion","authors":"Kyungdeuk Ko, Donghyeon Kim, Kyungseok Oh, Hanseok Ko","doi":"10.1007/s11063-024-11613-0","DOIUrl":"https://doi.org/10.1007/s11063-024-11613-0","url":null,"abstract":"<p>Voice conversion (VC) is a task for changing the speech of a source speaker to the target voice while preserving linguistic information of the source speech. The existing VC methods typically use mel-spectrogram as both input and output, so a separate vocoder is required to transform mel-spectrogram into waveform. Therefore, the VC performance varies depending on the vocoder performance, and noisy speech can be generated due to problems such as train-test mismatch. In this paper, we propose a speech and fundamental frequency consistent raw audio voice conversion method called WaveVC. Unlike other methods, WaveVC does not require a separate vocoder and can perform VC directly on raw audio waveform using 1D convolution. This eliminates the issue of performance degradation caused by the train-test mismatch of the vocoder. In the training phase, WaveVC employs speech loss and F0 loss to preserve the content of the source speech and generate F0 consistent speech using the pre-trained networks. WaveVC is capable of converting voices while maintaining consistency in speech and fundamental frequency. In the test phase, the F0 feature of the source speech is concatenated with a content embedding vector to ensure the converted speech follows the fundamental frequency flow of the source speech. WaveVC achieves higher performances than baseline methods in both many-to-many VC and any-to-any VC. 
The converted samples are available online.</p>","PeriodicalId":51144,"journal":{"name":"Neural Processing Letters","volume":"37 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140887887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-05-08 | DOI: 10.1007/s11063-024-11614-z
Jingyu Zhao, Ruwei Li, Maocun Tian, Weidong An
To address the poor representation capability and low data utilization of end-to-end speech recognition models in deep learning, this study proposes an end-to-end speech recognition model based on multi-scale feature fusion and multi-view self-supervised learning (MM-ASR), trained under a multi-task learning paradigm. The proposed method emphasizes the importance of inter-layer information within shared encoders, aiming to enhance the model's representation capability via the multi-scale feature fusion module. Moreover, we apply multi-view self-supervised learning to exploit the data effectively. Our approach is rigorously evaluated on the Aishell-1 dataset, and its effectiveness is further validated on the English WSJ corpus. The experimental results show a noteworthy 4.6% reduction in character error rate, indicating significantly improved speech recognition performance. These findings showcase the effectiveness and potential of the proposed MM-ASR model for end-to-end speech recognition tasks.
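One common realization of fusing inter-layer information in a shared encoder is a softmax-weighted sum over layer outputs. The abstract does not detail the fusion module, so the sketch below is only a plausible reading: the weights would be learned in the model, whereas here they default to uniform.

```python
import numpy as np

def multi_scale_fusion(layer_feats, logits=None):
    """Fuse hidden states from several encoder layers by a softmax-weighted
    sum over layers -- one plausible reading of 'multi-scale feature fusion'."""
    F = np.stack(layer_feats)                  # (L, T, D) stacked layer outputs
    if logits is None:
        logits = np.zeros(len(layer_feats))    # uniform weights as a stand-in
    w = np.exp(logits) / np.exp(logits).sum()  # normalized fusion weights
    return np.tensordot(w, F, axes=1)          # (T, D) fused representation
```

Lower layers tend to carry acoustic detail and higher layers more phonetic abstraction, so exposing all of them to the decoder is the motivation for this kind of fusion.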
{"title":"Multi-view Self-supervised Learning and Multi-scale Feature Fusion for Automatic Speech Recognition","authors":"Jingyu Zhao, Ruwei Li, Maocun Tian, Weidong An","doi":"10.1007/s11063-024-11614-z","DOIUrl":"https://doi.org/10.1007/s11063-024-11614-z","url":null,"abstract":"<p>To address the challenges of the poor representation capability and low data utilization rate of end-to-end speech recognition models in deep learning, this study proposes an end-to-end speech recognition model based on multi-scale feature fusion and multi-view self-supervised learning (MM-ASR). It adopts a multi-task learning paradigm for training. The proposed method emphasizes the importance of inter-layer information within shared encoders, aiming to enhance the model’s characterization capability via the multi-scale feature fusion module. Moreover, we apply multi-view self-supervised learning to effectively exploit data information. Our approach is rigorously evaluated on the Aishell-1 dataset and further validated its effectiveness on the English corpus WSJ. The experimental results demonstrate a noteworthy 4.6<span>(%)</span> reduction in character error rate, indicating significantly improved speech recognition performance . These findings showcase the effectiveness and potential of our proposed MM-ASR model for end-to-end speech recognition tasks.</p>","PeriodicalId":51144,"journal":{"name":"Neural Processing Letters","volume":"29 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140942262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-05-08 | DOI: 10.1007/s11063-024-11605-0
Shuangmei Wang, Yang Cao, Tieru Wu
Few-shot class-incremental learning (FSCIL) struggles to incrementally recognize novel classes from few examples without catastrophic forgetting of old classes or overfitting to new classes. We propose TLCE, which ensembles multiple pre-trained models to improve separation of novel and old classes. Specifically, we use episodic training to map images from old classes to quasi-orthogonal prototypes, which minimizes interference between old and new classes. Then, we incorporate the use of ensembling diverse pre-trained models to further tackle the challenge of data imbalance and enhance adaptation to novel classes. Extensive experiments on various datasets demonstrate that our transfer learning ensemble approach outperforms state-of-the-art FSCIL methods.
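The two ingredients named above, quasi-orthogonal prototypes and an ensemble of pre-trained embedders, can be sketched as follows. This is an assumption-laden illustration, not the paper's method: random high-dimensional unit vectors stand in for the episodically trained prototypes (they are nearly orthogonal by concentration of measure), and the ensemble simply averages cosine similarities before a nearest-prototype decision.

```python
import numpy as np

def quasi_orthogonal_prototypes(n_classes, dim, seed=0):
    """Random unit vectors: in high dimension these are nearly orthogonal,
    a cheap stand-in for episodically trained class prototypes."""
    rng = np.random.default_rng(seed)
    P = rng.normal(size=(n_classes, dim))
    return P / np.linalg.norm(P, axis=1, keepdims=True)

def ensemble_predict(feats_per_model, protos_per_model):
    """Average cosine similarities from several pre-trained embedding
    models, then pick the nearest class prototype for each query."""
    scores = None
    for f, P in zip(feats_per_model, protos_per_model):
        f = f / np.linalg.norm(f, axis=-1, keepdims=True)
        s = f @ P.T                       # (n_queries, n_classes)
        scores = s if scores is None else scores + s
    return np.argmax(scores, axis=-1)
```

Keeping old-class prototypes (quasi-)orthogonal limits interference when new-class prototypes are added, which is the mechanism behind the forgetting resistance claimed above.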
{"title":"TLCE: Transfer-Learning Based Classifier Ensembles for Few-Shot Class-Incremental Learning","authors":"Shuangmei Wang, Yang Cao, Tieru Wu","doi":"10.1007/s11063-024-11605-0","DOIUrl":"https://doi.org/10.1007/s11063-024-11605-0","url":null,"abstract":"<p>Few-shot class-incremental learning (FSCIL) struggles to incrementally recognize novel classes from few examples without catastrophic forgetting of old classes or overfitting to new classes. We propose TLCE, which ensembles multiple pre-trained models to improve separation of novel and old classes. Specifically, we use episodic training to map images from old classes to quasi-orthogonal prototypes, which minimizes interference between old and new classes. Then, we incorporate the use of ensembling diverse pre-trained models to further tackle the challenge of data imbalance and enhance adaptation to novel classes. Extensive experiments on various datasets demonstrate that our transfer learning ensemble approach outperforms state-of-the-art FSCIL methods.</p>","PeriodicalId":51144,"journal":{"name":"Neural Processing Letters","volume":"12 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140887658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}