
Neural Processing Letters: Latest Articles

A Unified Asymmetric Knowledge Distillation Framework for Image Classification
IF 3.1 | CAS Tier 4, Computer Science | Q2 Computer Science | Pub Date: 2024-05-10 | DOI: 10.1007/s11063-024-11606-z
Xin Ye, Xiang Tian, Bolun Zheng, Fan Zhou, Yaowu Chen

Knowledge distillation is a model compression technique that transfers knowledge learned by a teacher network to a student network. Existing knowledge distillation methods greatly expand the forms of knowledge, but they also make distillation models complex and symmetric. However, few studies have explored the commonalities among these methods. In this study, we propose a concise distillation framework that unifies these methods, together with a method for constructing asymmetric knowledge distillation under the framework. Asymmetric distillation aims to enable differentiated knowledge transfer for different distillation objects. We designed a multi-stage shallow-wide branch bifurcation method to distill different knowledge representations and a grouping ensemble strategy that supervises the network to teach and learn selectively. Finally, we conducted experiments on image classification benchmarks to verify the proposed method. The results show that our implementation achieves considerable improvements over existing methods, demonstrating the effectiveness of the method and the potential of the framework.
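The teacher-to-student transfer that the abstract builds on can be illustrated with the standard soft-label distillation loss; this is only the generic KD term (temperature-softened KL divergence), not the paper's asymmetric branches or grouping strategy, which the abstract does not specify.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T softens the distribution.
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=4.0):
    # KL(teacher || student) on temperature-softened outputs,
    # scaled by T^2 as in standard knowledge distillation.
    p = softmax(teacher_logits, T)   # soft teacher targets
    q = softmax(student_logits, T)   # student predictions
    return (T ** 2) * np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean()

# Identical logits give zero loss; diverging logits give a positive loss.
s = np.array([[2.0, 1.0, 0.1]])
t = np.array([[2.0, 1.0, 0.1]])
```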

Citations: 0
Pinning Group Consensus of Multi-agent Systems Under DoS Attacks
IF 3.1 | CAS Tier 4, Computer Science | Q2 Computer Science | Pub Date: 2024-05-10 | DOI: 10.1007/s11063-024-11630-z
Qian Lang, Jing Xu, Huiwen Zhang, Zhengxin Wang

In this paper, group consensus is investigated for a class of nonlinear multi-agent systems subject to DoS attacks. First, a first-order nonlinear multi-agent system is constructed and divided into M subsystems, each with a unique leader. A control protocol is then proposed and a Lyapunov function candidate is chosen. By means of stability theory, a sufficient criterion involving the duration of DoS attacks, the coupling strength, and the control gain is obtained for achieving group consensus in the first-order system; that is, the nodes in each subsystem can track the leader of that group. The result is then extended to nonlinear second-order multi-agent systems, and the controller is improved accordingly to obtain sufficient conditions for group consensus. Additionally, lower bounds on the coupling strength and the average interval of DoS attacks can be determined from the obtained sufficient conditions. Finally, several numerical simulations are presented to illustrate the effectiveness of the proposed controllers and the derived theoretical results.
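The setup the abstract describes can be sketched numerically: two subgroups of first-order agents, each pinned to its own leader, with the control input dropped during DoS intervals. This is a minimal linear illustration under assumed gains and a periodic attack pattern, not the paper's nonlinear protocol or its stability criterion.

```python
import numpy as np

# Minimal sketch (not the paper's exact protocol): two subgroups of
# first-order integrator agents, each pinned to its own static leader.
# During DoS intervals the control input is dropped entirely.
def simulate(T=2000, dt=0.01, c=1.0, k=2.0):
    leaders = np.array([1.0, -1.0])          # one leader per subgroup
    group = np.array([0, 0, 1, 1])           # agent -> subgroup index
    A = np.array([[0, 1, 0, 0],              # intra-group adjacency
                  [1, 0, 0, 0],
                  [0, 0, 0, 1],
                  [0, 0, 1, 0]], float)
    x = np.array([0.3, -0.2, 0.5, 0.1])      # initial agent states
    for step in range(T):
        dos = (step % 200) < 40              # periodic DoS: 20% of the time
        if dos:
            u = np.zeros_like(x)             # attack: no control available
        else:
            coupling = c * (A @ x - A.sum(1) * x)   # sum_j a_ij (x_j - x_i)
            pinning = k * (leaders[group] - x)      # leader-tracking term
            u = coupling + pinning
        x = x + dt * u                       # Euler integration step
    return x, leaders[group]

x, ref = simulate()
```

Despite the control being disabled 20% of the time, each agent still converges to its own group's leader, which is the qualitative behavior the sufficient criterion guarantees.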

Citations: 0
Use of a Modified Threshold Function in Fuzzy Cognitive Maps for Improved Failure Mode Identification
IF 3.1 | CAS Tier 4, Computer Science | Q2 Computer Science | Pub Date: 2024-05-09 | DOI: 10.1007/s11063-024-11623-y
Manu Augustine, Om Prakash Yadav, Ashish Nayyar, Dheeraj Joshi

Fuzzy cognitive maps (FCMs) provide a rapid and efficient approach to system modeling and simulation. The literature demonstrates numerous successful applications of FCMs in identifying failure modes. The standard process of failure mode identification using FCMs involves monitoring crucial concept/node values for excesses. Threshold functions are used to limit the value of nodes to a pre-specified range, usually [0, 1] or [-1, +1]. However, traditional FCMs using the tanh threshold function have two crucial drawbacks for this purpose: (i) a tendency to reduce the values of state vector components, and (ii) a potential inability to reach a limit state with clearly identifiable failure states. The reason is the inherent mathematical nature of the tanh function, which is only asymptotic to the horizontal lines demarcating the edges of the specified range. To overcome these limitations, this paper introduces a novel modified tanh threshold function that effectively addresses both issues.
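The asymptote problem is easy to demonstrate. The abstract does not give the paper's exact modified function, so the sketch below uses one hypothetical fix for illustration: rescaling tanh so it attains ±1 at a finite input and clipping beyond it, applied inside a standard synchronous FCM update.

```python
import numpy as np

# Standard tanh never attains -1/+1 (it is only asymptotic to them), so
# an FCM thresholded with tanh may never show a clearly saturated
# failure state. One hypothetical fix (NOT the paper's exact function,
# which the abstract does not specify) rescales tanh to reach +/-1 at a
# finite input L and clips beyond it.
def modified_tanh(x, L=2.0):
    return np.clip(np.tanh(x) / np.tanh(L), -1.0, 1.0)

def fcm_step(state, W, f):
    # One synchronous FCM update: new state = f(W^T state),
    # where W[i, j] is the causal weight from concept i to concept j.
    return f(W.T @ state)

W = np.array([[0.0, 0.8],      # concept 0 excites concept 1
              [0.0, 0.0]])
s = np.array([1.0, 0.0])
```

With plain tanh the activated concept stays strictly below its driving weight's saturation level; the rescaled function pushes node values closer to (and, at finite inputs, onto) the range boundary.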

Citations: 0
Unsupervised Domain Adaptation Depth Estimation Based on Self-attention Mechanism and Edge Consistency Constraints
IF 3.1 | CAS Tier 4, Computer Science | Q2 Computer Science | Pub Date: 2024-05-09 | DOI: 10.1007/s11063-024-11621-0
Peng Guo, Shuguo Pan, Peng Hu, Ling Pei, Baoguo Yu

In the unsupervised domain adaptation (UDA) depth estimation task (Akada et al., Self-supervised learning of domain invariant features for depth estimation, in: 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 3377–3387 (2022), 10.1109/WACV51458.2022.00107), a recent adaptive approach uses a bidirectional transformation network to transfer style between the target- and source-domain inputs, and then trains the depth estimation network in each domain. However, the domain adaptation process and the style transfer may introduce defects and biases, often leading to depth holes and missing instance-edge depth in the target domain's output. To address these issues, we propose a training network improved in terms of both model structure and supervision constraints. First, we introduce an edge-guided self-attention mechanism into the task network of each domain to strengthen the network's attention to high-frequency edge features, maintain clear boundaries, and fill in missing depth regions. Furthermore, we utilize an edge detection algorithm to extract edge features from the target-domain input, and we establish edge consistency constraints between inter-domain entities in order to narrow the gap between domains and ease domain-to-domain transfer. Our experiments demonstrate that the proposed method effectively solves the aforementioned problems, producing higher-quality depth maps and outperforming existing state-of-the-art methods.
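One plausible form of an edge-consistency constraint, sketched here as an assumption (the abstract does not give the paper's exact loss), is to penalize disagreement between the gradient-magnitude map of the predicted depth and that of the input image:

```python
import numpy as np

# Hypothetical edge-consistency term: gradient magnitudes of the
# predicted depth map are encouraged to align with those of the image.
def grad_mag(img):
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]   # horizontal finite difference
    gy[:-1, :] = img[1:, :] - img[:-1, :]   # vertical finite difference
    return np.sqrt(gx ** 2 + gy ** 2)

def edge_consistency_loss(depth, image):
    # L1 distance between max-normalized edge maps.
    e_d, e_i = grad_mag(depth), grad_mag(image)
    e_d = e_d / (e_d.max() + 1e-8)
    e_i = e_i / (e_i.max() + 1e-8)
    return np.abs(e_d - e_i).mean()

img = np.zeros((8, 8)); img[:, 4:] = 1.0              # a vertical edge
depth_good = img.copy()                                # edges aligned
depth_bad = np.zeros((8, 8)); depth_bad[4:, :] = 1.0   # edge elsewhere
```

A depth map whose discontinuities coincide with the image's edges incurs a lower penalty than one whose edges fall elsewhere, which is the behavior the constraint is meant to enforce.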

Citations: 0
A Prototype-Based Neural Network for Image Anomaly Detection and Localization
IF 3.1 | CAS Tier 4, Computer Science | Q2 Computer Science | Pub Date: 2024-05-08 | DOI: 10.1007/s11063-024-11466-7
Chao Huang, Zhao Kang, Hong Wu

Image anomaly detection and localization perform not only image-level anomaly classification but also pixel-level localization of anomalous regions. The task has recently received much research attention owing to its wide application in various fields. This paper proposes ProtoAD, a prototype-based neural network for image anomaly detection and localization. First, the patch features of normal images are extracted by a deep network pre-trained on natural images. Then, the prototypes of the normal patch features are learned by non-parametric clustering. Finally, we construct an image anomaly localization network (ProtoAD) by appending the feature extraction network with L2 feature normalization, a 1×1 convolutional layer, channel max-pooling, and a subtraction operation. We use the prototypes as the kernels of the 1×1 convolutional layer; therefore, our neural network needs no training phase and can conduct anomaly detection and localization in an end-to-end manner. Extensive experiments on two challenging industrial anomaly detection datasets, MVTec AD and BTAD, demonstrate that ProtoAD achieves competitive performance compared to state-of-the-art methods at a higher inference speed. The code and pre-trained models are publicly available at https://github.com/98chao/ProtoAD.
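The scoring head described in the abstract (normalize, 1×1 convolution with prototype kernels, channel max-pool, subtraction) can be sketched directly; with L2-normalized features the 1×1 convolution computes per-location cosine similarity to each prototype. This is a NumPy illustration of that pipeline, not the released implementation.

```python
import numpy as np

# Sketch of the ProtoAD scoring head as described in the abstract:
# L2-normalize patch features, correlate with prototype kernels
# (equivalent to a 1x1 convolution), max-pool over the prototype
# channel, and subtract from 1 to obtain a per-pixel anomaly score.
def anomaly_map(features, prototypes):
    # features:   (H, W, C) patch features from a pre-trained backbone
    # prototypes: (K, C) cluster centers of normal patch features
    f = features / (np.linalg.norm(features, axis=-1, keepdims=True) + 1e-8)
    p = prototypes / (np.linalg.norm(prototypes, axis=-1, keepdims=True) + 1e-8)
    sim = np.einsum('hwc,kc->hwk', f, p)   # 1x1 conv with K kernels
    best = sim.max(axis=-1)                # channel max-pooling
    return 1.0 - best                      # subtraction: anomaly score

protos = np.array([[1.0, 0.0], [0.0, 1.0]])      # two "normal" prototypes
feat = np.zeros((2, 2, 2)); feat[..., 0] = 1.0   # patches matching proto 0
feat[1, 1] = np.array([-1.0, 0.0])               # one anomalous patch
amap = anomaly_map(feat, protos)
```

Patches close to any prototype score near zero; the patch far from all prototypes scores high, giving the pixel-level localization map.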

Citations: 0
WaveVC: Speech and Fundamental Frequency Consistent Raw Audio Voice Conversion
IF 3.1 | CAS Tier 4, Computer Science | Q2 Computer Science | Pub Date: 2024-05-08 | DOI: 10.1007/s11063-024-11613-0
Kyungdeuk Ko, Donghyeon Kim, Kyungseok Oh, Hanseok Ko

Voice conversion (VC) is the task of changing the speech of a source speaker to a target voice while preserving the linguistic information of the source speech. Existing VC methods typically use the mel-spectrogram as both input and output, so a separate vocoder is required to transform the mel-spectrogram into a waveform. Consequently, VC performance varies with vocoder performance, and noisy speech can be generated due to problems such as train-test mismatch. In this paper, we propose a speech- and fundamental-frequency-consistent raw-audio voice conversion method called WaveVC. Unlike other methods, WaveVC does not require a separate vocoder and can perform VC directly on the raw audio waveform using 1D convolution, eliminating the performance degradation caused by the vocoder's train-test mismatch. In the training phase, WaveVC employs a speech loss and an F0 loss to preserve the content of the source speech and to generate F0-consistent speech using pre-trained networks. In the test phase, the F0 feature of the source speech is concatenated with a content embedding vector to ensure that the converted speech follows the fundamental-frequency contour of the source speech. WaveVC achieves higher performance than baseline methods in both many-to-many and any-to-any VC. The converted samples are available online.
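The test-phase conditioning step the abstract mentions amounts to concatenating the F0 contour with the frame-level content embedding. A minimal sketch under assumed shapes (the paper's decoder and exact F0 normalization are not given in the abstract):

```python
import numpy as np

# Sketch of the test-phase conditioning described in the abstract: the
# source speech's F0 contour is concatenated with a content embedding
# along the channel axis before decoding. The decoder itself is omitted,
# and the mean/variance normalization of F0 is an assumption.
def condition(content_emb, f0):
    # content_emb: (T, C) frame-level content embedding
    # f0:          (T,) fundamental-frequency contour in Hz
    f0_norm = (f0 - f0.mean()) / (f0.std() + 1e-8)   # normalize F0
    return np.concatenate([content_emb, f0_norm[:, None]], axis=1)

emb = np.zeros((5, 16))
f0 = np.array([100.0, 110.0, 120.0, 130.0, 140.0])
z = condition(emb, f0)   # (5, 17): one extra channel carries F0
```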

Citations: 0
Multi-view Self-supervised Learning and Multi-scale Feature Fusion for Automatic Speech Recognition
IF 3.1 | CAS Tier 4, Computer Science | Q2 Computer Science | Pub Date: 2024-05-08 | DOI: 10.1007/s11063-024-11614-z
Jingyu Zhao, Ruwei Li, Maocun Tian, Weidong An

To address the poor representation capability and low data-utilization rate of end-to-end speech recognition models in deep learning, this study proposes an end-to-end speech recognition model based on multi-scale feature fusion and multi-view self-supervised learning (MM-ASR), trained under a multi-task learning paradigm. The proposed method emphasizes the importance of inter-layer information within shared encoders, aiming to enhance the model's representation capability via a multi-scale feature fusion module. Moreover, we apply multi-view self-supervised learning to exploit the data effectively. Our approach is rigorously evaluated on the Aishell-1 dataset, and its effectiveness is further validated on the English WSJ corpus. The experimental results demonstrate a noteworthy 4.6% reduction in character error rate, indicating significantly improved speech recognition performance. These findings showcase the effectiveness and potential of our proposed MM-ASR model for end-to-end speech recognition tasks.
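One common way to fuse inter-layer encoder features, shown here purely as a hypothetical illustration (the abstract does not describe the module's internals), is a softmax-weighted sum of same-shaped features from several layers:

```python
import numpy as np

# Hypothetical multi-scale feature fusion: frame-level features from
# several encoder layers are combined with softmax-normalized weights.
# In a trained model the logits would be learnable; here they are fixed.
def fuse(features, logits):
    # features: list of (T, C) arrays from different encoder layers
    w = np.exp(logits) / np.exp(logits).sum()   # softmax fusion weights
    return sum(wi * f for wi, f in zip(w, features))

f1 = np.ones((4, 8)) * 1.0   # shallow-layer features
f2 = np.ones((4, 8)) * 3.0   # deep-layer features
fused = fuse([f1, f2], logits=np.array([0.0, 0.0]))  # equal weights
```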

Citations: 0
TLCE: Transfer-Learning Based Classifier Ensembles for Few-Shot Class-Incremental Learning
IF 3.1 | CAS Tier 4, Computer Science | Q2 Computer Science | Pub Date: 2024-05-08 | DOI: 10.1007/s11063-024-11605-0
Shuangmei Wang, Yang Cao, Tieru Wu

Few-shot class-incremental learning (FSCIL) struggles to incrementally recognize novel classes from few examples without catastrophically forgetting old classes or overfitting to new ones. We propose TLCE, which ensembles multiple pre-trained models to improve the separation of novel and old classes. Specifically, we use episodic training to map images from old classes to quasi-orthogonal prototypes, minimizing interference between old and new classes. We then ensemble diverse pre-trained models to further tackle the challenge of data imbalance and enhance adaptation to novel classes. Extensive experiments on various datasets demonstrate that our transfer-learning ensemble approach outperforms state-of-the-art FSCIL methods.
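The quasi-orthogonal-prototype idea can be sketched by assigning each class an orthonormal target vector and classifying by cosine similarity; orthogonal targets mean one class's direction contributes nothing to another's score, which is the interference-minimization the abstract refers to. The episodic training and the backbone ensemble are omitted.

```python
import numpy as np

# Sketch: build (quasi-)orthogonal class prototypes via QR decomposition
# of a random matrix, then classify embeddings by nearest prototype.
def make_prototypes(n_classes, dim, seed=0):
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, n_classes)))
    return q.T                              # rows are orthonormal prototypes

def classify(embedding, prototypes):
    e = embedding / np.linalg.norm(embedding)
    return int(np.argmax(prototypes @ e))   # nearest prototype by cosine

protos = make_prototypes(n_classes=5, dim=64)
```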

Citations: 0
A Parallel Model for Jointly Extracting Entities and Relations
IF 3.1 | CAS Tier 4, Computer Science | Q2 Computer Science | Pub Date: 2024-05-07 | DOI: 10.1007/s11063-024-11616-x
Zuqin Chen, Yujie Zheng, Jike Ge, Wencheng Yu, Zining Wang

Extracting relational triples from a piece of text is an essential task in knowledge graph construction. However, most existing methods either identify entities before predicting their relations, or detect relations before recognizing the associated entities. This ordering can lead to error accumulation: once an error occurs in the initial step, it propagates to subsequent steps. To solve this problem, we propose a parallel model for jointly extracting entities and relations, called PRE-Span, which consists of two mutually independent submodules. Specifically, candidate entities and relations are first generated by enumerating token sequences in sentences. Then, two independent submodules (an Entity Extraction Module and a Relation Detection Module) predict entities and relations. Finally, the predictions of the two submodules are analyzed to select entities and relations, which are jointly decoded to obtain relational triples. The advantage of this method is that all triples can be extracted in a single step. Extensive experiments on the WebNLG*, NYT*, NYT, and WebNLG datasets show that our model outperforms other baselines at 94.4%, 88.3%, 86.5%, and 83.0%, respectively.
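The candidate-generation step described in the abstract is span enumeration: every token subsequence up to a maximum width becomes a candidate. A minimal sketch (the maximum width and the two downstream scoring submodules are assumptions/omissions):

```python
# Sketch of the candidate-generation step: candidate entities are
# produced by enumerating token subsequences (spans) of a sentence up
# to a maximum width. Downstream scoring submodules are not reproduced.
def enumerate_spans(tokens, max_width=3):
    spans = []
    for start in range(len(tokens)):
        # end is exclusive; span width is capped at max_width tokens
        for end in range(start + 1, min(start + max_width, len(tokens)) + 1):
            spans.append((start, end, " ".join(tokens[start:end])))
    return spans

cands = enumerate_spans(["Marie", "Curie", "won", "the", "prize"], max_width=2)
```

For a 5-token sentence with width cap 2 this yields 5 single-token plus 4 two-token candidates; the Entity Extraction Module and Relation Detection Module would then score these spans independently.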

从文本中提取关系三元组是构建知识图谱的一项基本任务。然而,现有的大多数方法要么是先识别实体再预测其关系,要么是先检测关系再识别相关实体。这种顺序可能会导致错误累积,因为一旦初始步骤出现错误,就会累积到后续步骤。为了解决这个问题,我们提出了一种联合提取实体和关系的并行模型,称为 PRE-Span,它由两个相互独立的子模块组成。具体来说,首先通过枚举句子中的标记序列生成候选实体和关系。然后,设计两个独立的子模块(实体提取模块和关系检测模块)来预测实体和关系。最后,对两个子模块的预测结果进行分析,选出实体和关系,并对它们进行联合解码,得到关系三。这种方法的优点是只需一步就能提取所有三元组。在 WebNLG*、NYT*、NYT 和 WebNLG 数据集上进行的大量实验表明,我们的模型优于其他基线模型的比例分别为 94.4%、88.3%、86.5% 和 83.0%。
{"title":"A Parallel Model for Jointly Extracting Entities and Relations","authors":"Zuqin Chen, Yujie Zheng, Jike Ge, Wencheng Yu, Zining Wang","doi":"10.1007/s11063-024-11616-x","DOIUrl":"https://doi.org/10.1007/s11063-024-11616-x","url":null,"abstract":"<p>Extracting relational triples from a piece of text is an essential task in knowledge graph construction. However, most existing methods either identify entities before predicting their relations, or detect relations before recognizing associated entities. This order may lead to error accumulation because once there is an error in the initial step, it will accumulate to subsequent steps. To solve this problem, we propose a parallel model for jointly extracting entities and relations, called PRE-Span, which consists of two mutually independent submodules. Specifically, candidate entities and relations are first generated by enumerating token sequences in sentences. Then, two independent submodules (Entity Extraction Module and Relation Detection Module) are designed to predict entities and relations. Finally, the predicted results of the two submodules are analyzed to select entities and relations, which are jointly decoded to obtain relational triples. The advantage of this method is that all triples can be extracted in just one step. Extensive experiments on the WebNLG*, NYT*, NYT and WebNLG datasets show that our model outperforms other baselines at 94.4%, 88.3%, 86.5% and 83.0%, respectively.</p>","PeriodicalId":51144,"journal":{"name":"Neural Processing Letters","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140887757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Non-linear Time Series Prediction using Improved CEEMDAN, SVD and LSTM 利用改进的 CEEMDAN、SVD 和 LSTM 进行非线性时间序列预测
IF 3.1 4区 计算机科学 Q2 Computer Science Pub Date : 2024-05-06 DOI: 10.1007/s11063-024-11622-z
Sameer Poongadan, M. C. Lineesh

This study recommends a new time series forecasting model, the ICEEMDAN - SVD - LSTM model, which combines Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN), Singular Value Decomposition (SVD) and a Long Short-Term Memory (LSTM) network. It can be applied to analyse non-linear and non-stationary data. The framework of this model comprises three levels: the ICEEMDAN level, the SVD level and the LSTM level. The first level uses ICEEMDAN to decompose the series into a set of IMF components and a residue. The second level uses SVD to de-noise each IMF component and the residue. The third level uses LSTM to forecast each resulting IMF component and the residue. The forecasts of all IMF components and the residue are then summed to obtain the forecast of the original series. The proposed model is contrasted with existing models, namely the LSTM, EMD - LSTM, EEMD - LSTM, CEEMDAN - LSTM, EEMD - SVD - LSTM, ICEEMDAN - LSTM and CEEMDAN - SVD - LSTM models. The comparison demonstrates the advantage of the recommended model over the traditional ones.
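The SVD de-noising step in the second level can be illustrated with a minimal singular-spectrum-style sketch in NumPy. This is an illustrative assumption, not the paper's procedure: the Hankel window length, the retained rank and the function interface are all made up for the example.

```python
import numpy as np

def svd_denoise(x, window=20, rank=2):
    """De-noise a 1-D series: embed it in a Hankel (trajectory) matrix,
    keep only the leading singular components, then average back."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = n - window + 1
    # Trajectory matrix: each column is a length-`window` slice of the series
    H = np.column_stack([x[i:i + window] for i in range(k)])
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    s[rank:] = 0.0                      # truncate: drop the noise subspace
    Hr = (U * s) @ Vt                   # low-rank reconstruction of H
    out = np.zeros(n)
    cnt = np.zeros(n)
    for j in range(k):                  # diagonal averaging back to a series
        out[j:j + window] += Hr[:, j]
        cnt[j:j + window] += 1.0
    return out / cnt
```

In the full pipeline, each ICEEMDAN component and the residue would be passed through a de-noising step like this before being forecast by its own LSTM, and the component forecasts summed to recover the original series.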

本研究推荐了一种新的时间序列预测模型,即 ICEEMDAN - SVD - LSTM 模型,它将改进的完整集合 EMD 与自适应噪声、奇异值分解和长短期记忆网络结合在一起。它可用于分析非线性和非平稳数据。该模型的框架包括三个层次,即 ICEEMDAN 层次、SVD 层次和 LSTM 层次。第一级利用 ICEEMDAN 将序列分解为一些 IMF 成分和残差。第二级中的 SVD 对每个 IMF 分量和残差进行去噪处理。在第三级中,LSTM 对所有产生的 IMF 分量和残差进行预测。为了获得原始数据的预测值,需要将所有 IMF 分量和残差的预测值相加。建议的模型与其他现有模型进行了对比,即 LSTM 模型、EMD - LSTM 模型、EEMD - LSTM 模型、CEEMDAN - LSTM 模型、EEMD - SVD - LSTM 模型、ICEEMDAN - LSTM 模型和 CEEMDAN - SVD - LSTM 模型。对比结果证明了推荐模型比传统模型更有潜力。
{"title":"Non-linear Time Series Prediction using Improved CEEMDAN, SVD and LSTM","authors":"Sameer Poongadan, M. C. Lineesh","doi":"10.1007/s11063-024-11622-z","DOIUrl":"https://doi.org/10.1007/s11063-024-11622-z","url":null,"abstract":"<p>This study recommends a new time series forecasting model, namely ICEEMDAN - SVD - LSTM model, which coalesces Improved Complete Ensemble EMD with Adaptive Noise, Singular Value Decomposition and Long Short Term Memory network. It can be applied to analyse Non-linear and non-stationary data. The framework of this model is comprised of three levels, namely ICEEMDAN level, SVD level and LSTM level. The first level utilized ICEEMDAN to break up the series into some IMF components along with a residue. The SVD in the second level accounts for de-noising of every IMF component and residue. LSTM forecasts all the resultant IMF components and residue in third level. To obtain the forecasted values of the original data, the predictions of all IMF components and residue are added. The proposed model is contrasted with other extant ones, namely LSTM model, EMD - LSTM model, EEMD - LSTM model, CEEMDAN - LSTM model, EEMD - SVD - LSTM model, ICEEMDAN - LSTM model and CEEMDAN - SVD - LSTM model. The comparison bears witness to the potential of the recommended model over the traditional models.</p>","PeriodicalId":51144,"journal":{"name":"Neural Processing Letters","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140887562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0