Pub Date: 2024-05-07 | DOI: 10.1007/s11063-024-11616-x
Zuqin Chen, Yujie Zheng, Jike Ge, Wencheng Yu, Zining Wang
Extracting relational triples from text is an essential task in knowledge graph construction. However, most existing methods either identify entities before predicting their relations, or detect relations before recognizing the associated entities. This ordering can lead to error accumulation: an error in the first step propagates to all subsequent steps. To solve this problem, we propose a parallel model for jointly extracting entities and relations, called PRE-Span, which consists of two mutually independent submodules. Specifically, candidate entities and relations are first generated by enumerating token sequences in sentences. Then, two independent submodules (an Entity Extraction Module and a Relation Detection Module) predict entities and relations. Finally, the predictions of the two submodules are combined to select entities and relations, which are jointly decoded to obtain relational triples. The advantage of this method is that all triples can be extracted in a single step. Extensive experiments show that our model outperforms other baselines, reaching 94.4%, 88.3%, 86.5% and 83.0% on the WebNLG*, NYT*, NYT and WebNLG datasets, respectively.
Title: A Parallel Model for Jointly Extracting Entities and Relations (Neural Processing Letters)
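The candidate-generation step described above can be sketched as follows; the maximum span length and the pairing of spans into candidate (subject, object) arguments are illustrative assumptions, not details taken from the paper:

```python
def enumerate_spans(tokens, max_len=2):
    """Enumerate contiguous token sub-sequences up to max_len tokens;
    each (start, end, text) span is a candidate entity."""
    spans = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            spans.append((start, end, " ".join(tokens[start:end])))
    return spans

def candidate_pairs(spans):
    """Pair distinct candidate spans as potential (subject, object)
    arguments of a relation, to be scored by the two submodules."""
    return [(s, o) for s in spans for o in spans if s != o]

tokens = "Obama was born in Hawaii".split()
spans = enumerate_spans(tokens, max_len=2)
```

Because both submodules score the same enumerated candidates, the two predictions can be made independently and decoded jointly in one pass.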
Pub Date: 2024-05-06 | DOI: 10.1007/s11063-024-11622-z
Sameer Poongadan, M. C. Lineesh
This study proposes a new time series forecasting model, the ICEEMDAN-SVD-LSTM model, which combines Improved Complete Ensemble EMD with Adaptive Noise (ICEEMDAN), Singular Value Decomposition (SVD) and a Long Short-Term Memory (LSTM) network. It can be applied to analyse non-linear and non-stationary data. The framework comprises three levels: the ICEEMDAN level, the SVD level and the LSTM level. The first level uses ICEEMDAN to decompose the series into a set of IMF components plus a residue. The SVD in the second level de-noises every IMF component and the residue. In the third level, an LSTM forecasts each resulting IMF component and the residue. The forecasts of all IMF components and the residue are then summed to obtain the forecast of the original series. The proposed model is compared with existing models, namely the LSTM, EMD-LSTM, EEMD-LSTM, CEEMDAN-LSTM, EEMD-SVD-LSTM, ICEEMDAN-LSTM and CEEMDAN-SVD-LSTM models. The comparison demonstrates the advantage of the proposed model over the traditional ones.
Title: Non-linear Time Series Prediction using Improved CEEMDAN, SVD and LSTM
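The three-level structure (decompose, process each component, then sum the component forecasts) can be sketched with simple stand-ins; the moving-average split and persistence forecast below merely take the place of ICEEMDAN, SVD and the LSTM, which are far more involved:

```python
def decompose(series, window=3):
    """Stand-in for ICEEMDAN: split the series into a smooth component
    (moving average) and a remainder, which sum back to the original."""
    half = window // 2
    trend = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        trend.append(sum(series[lo:hi]) / (hi - lo))
    remainder = [x - t for x, t in zip(series, trend)]
    return [trend, remainder]

def forecast_component(component):
    """Stand-in for the per-component SVD + LSTM stage: naive persistence."""
    return component[-1]

def forecast(series):
    # The final prediction is the sum of the per-component forecasts,
    # mirroring the additive reconstruction described in the abstract.
    return sum(forecast_component(c) for c in decompose(series))

series = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = forecast(series)
```

The key structural property is that the decomposition is additive, so forecasting each component separately and summing recovers a forecast of the original series.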
Pub Date: 2024-05-04 | DOI: 10.1007/s11063-024-11627-8
Yunyun Wang, Qinghao Li, Ziyi Hua
Unlike Unsupervised Domain Adaptation (UDA), Source-Free Unsupervised Domain Adaptation (SFUDA), which transfers source knowledge to the target domain using only the source model and without accessing the source data, has attracted much attention recently. One mainstream SFUDA approach fine-tunes the source model by self-training on pseudo-labels generated for the target data. However, due to the significant differences between domains, these target pseudo-labels often contain noise, which inevitably degrades target performance. To address this, we propose an SFUDA method with adaptive pseudo-label learning named Dual Classifier Adaptation (DCA). In DCA, a dual classifier structure adaptively learns target pseudo-labels through cooperation between the source and target classifiers. Simultaneously, minimax entropy is introduced for target learning, in order to adapt the target data to the source model while also capturing the intrinsic cluster structure of the target domain. Compared with a range of UDA and SFUDA methods, DCA achieves leading performance on several benchmark datasets.
Title: Dual Classifier Adaptation: Source-Free UDA via Adaptive Pseudo-Labels Learning
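The two ingredients can be sketched minimally as follows; the agreement-plus-confidence rule for classifier cooperation is a hypothetical stand-in, since the abstract does not spell out the exact rule:

```python
import math

def entropy(probs):
    """Shannon entropy of a predictive distribution; in minimax entropy
    training it is maximized or minimized by different parts of the model."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_pseudo_label(src_probs, tgt_probs, threshold=0.8):
    """Hypothetical cooperation rule: keep a pseudo-label only when the
    source and target classifiers agree on the argmax class and the
    averaged confidence is high enough; otherwise return None."""
    src_cls = max(range(len(src_probs)), key=src_probs.__getitem__)
    tgt_cls = max(range(len(tgt_probs)), key=tgt_probs.__getitem__)
    conf = (src_probs[src_cls] + tgt_probs[tgt_cls]) / 2
    if src_cls == tgt_cls and conf >= threshold:
        return src_cls
    return None

label = select_pseudo_label([0.9, 0.1], [0.85, 0.15])
```

Samples on which the two classifiers disagree are simply left unlabeled, which is one way noisy pseudo-labels can be filtered adaptively.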
Pub Date: 2024-05-02 | DOI: 10.1007/s11063-024-11619-8
Haowei Chen, Chen Li, Jiajing Liang, Lihua Tian
With the continued growth of online communication, the number of texts in the form of dialogue between individuals has increased exponentially, and reviewing previous dialogue content before starting a new conversation is very challenging. Against this background, we propose a new dialogue summarization algorithm based on multi-task learning. Specifically, Minimum Risk Training is used as the loss function to alleviate the mismatch between the training and testing objectives. Then, to address the model's difficulty in distinguishing gender pronouns, we design an auxiliary gender-pronoun discrimination task based on contrastive learning. Finally, an auxiliary task for reducing exposure bias is introduced: the summary generated during inference is incorporated into another round of training, reducing the gap between the decoder inputs seen during training and testing. Experimental results show that our model outperforms strong baselines on three public dialogue summarization datasets: SAMSUM, DialogSum, and CSDS.
Title: A Dialogues Summarization Algorithm Based on Multi-task Learning
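Minimum Risk Training replaces the usual token-level cross-entropy with the expected risk of candidate summaries under the model's own distribution. A minimal sketch, where `alpha` is a sharpness hyperparameter and the risk would typically be 1 minus a ROUGE score (both assumptions for illustration):

```python
import math

def mrt_loss(cand_scores, cand_risks, alpha=1.0):
    """Minimum Risk Training objective: expected risk of the candidate
    summaries under the model's (sharpened) probability distribution."""
    weights = [math.exp(alpha * s) for s in cand_scores]
    z = sum(weights)
    q = [w / z for w in weights]          # normalized candidate distribution
    return sum(qi * ri for qi, ri in zip(q, cand_risks))

# Two candidates: the model puts more mass on the low-risk summary,
# so the expected risk (the loss) is small.
loss = mrt_loss(cand_scores=[2.0, 0.0], cand_risks=[0.1, 0.9])
```

Because the loss is computed from whole candidate summaries scored by an evaluation-style risk, training and testing optimize consistent objectives.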
Pub Date: 2024-05-02 | DOI: 10.1007/s11063-024-11625-w
Harsha Putla, Chanakya Patibandla, Krishna Pratap Singh, P Nagabhushan
This research explores the vulnerability of selective reincarnation, a concept in Multi-Agent Reinforcement Learning (MARL), in response to observation poisoning attacks. Observation poisoning is an adversarial strategy that subtly manipulates an agent’s observation space, potentially leading to a misdirection in its learning process. The primary aim of this paper is to systematically evaluate the robustness of selective reincarnation in MARL systems against the subtle yet potentially debilitating effects of observation poisoning attacks. Through assessing how manipulated observation data influences MARL agents, we seek to highlight potential vulnerabilities and inform the development of more resilient MARL systems. Our experimental testbed was the widely used HalfCheetah environment, utilizing the Independent Deep Deterministic Policy Gradient algorithm within a cooperative MARL setting. We introduced a series of triggers, namely Gaussian noise addition, observation reversal, random shuffling, and scaling, into the teacher dataset of the MARL system provided to the reincarnating agents of HalfCheetah. Here, the “teacher dataset” refers to the stored experiences from previous training sessions used to accelerate the learning of reincarnating agents in MARL. This approach enabled the observation of these triggers’ significant impact on reincarnation decisions. Specifically, the reversal technique showed the most pronounced negative effect for maximum returns, with an average decrease of 38.08% in Kendall’s tau values across all the agent combinations. With random shuffling, Kendall’s tau values decreased by 17.66%. On the other hand, noise addition and scaling aligned with the original ranking by only 21.42% and 32.66%, respectively. The results, quantified by Kendall’s tau metric, indicate the fragility of the selective reincarnation process under adversarial observation poisoning. 
Our findings also reveal that vulnerability to observation poisoning varies significantly among agent combinations, with some markedly more susceptible than others. This investigation deepens our understanding of selective reincarnation's robustness against observation poisoning attacks, which is crucial both for developing more secure MARL systems and for making informed decisions about agent reincarnation.
Title: A Pilot Study of Observation Poisoning on Selective Reincarnation in Multi-Agent Reinforcement Learning
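The four triggers applied to the teacher dataset can be sketched as pure functions on a single observation vector; the noise scale and the scaling factor of 2 below are illustrative assumptions, not values from the paper:

```python
import random

def poison(observation, trigger, rng=None):
    """Apply one of the four observation-poisoning triggers to a single
    observation vector, returning a new list."""
    rng = rng or random.Random(0)
    if trigger == "noise":        # additive Gaussian noise
        return [x + rng.gauss(0.0, 0.1) for x in observation]
    if trigger == "reversal":     # reverse the observation dimensions
        return observation[::-1]
    if trigger == "shuffle":      # randomly permute the dimensions
        shuffled = observation[:]
        rng.shuffle(shuffled)
        return shuffled
    if trigger == "scale":        # rescale every dimension
        return [2.0 * x for x in observation]
    raise ValueError(trigger)

obs = [0.1, 0.2, 0.3]
```

Each trigger preserves the observation's shape, so the poisoned teacher data still looks structurally valid to the reincarnating agents.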
Pub Date: 2024-04-29 | DOI: 10.1007/s11063-024-11607-y
Yongshuai Zhang, Jiajin Huang, Jian Yang
Social recommendation aims to improve recommendation performance by learning user interest and social representations from users' interaction records and social relations. Intuitively, these learned representations entangle user interest factors with social factors, because users' interaction behaviors and social relations affect each other. A high-quality recommender system should recommend items to a user according to his or her interest factors. However, most existing social recommendation models aggregate the two kinds of representations indiscriminately, which limits their recommendation performance. In this paper, we develop a model called Disentangled Variational autoencoder for Social Recommendation (DVSR) to disentangle interest and social factors from the two kinds of user representations. First, we perform a preliminary analysis of the entangled information on three popular social recommendation datasets. Then, we present the architecture of DVSR, which is based on the Variational AutoEncoder (VAE) framework. In addition to the traditional VAE training objective, we use contrastive estimation to penalize the mutual information between interest and social factors. Extensive experiments are conducted on three benchmark datasets to evaluate the effectiveness of our model.
Title: Disentangled Variational Autoencoder for Social Recommendation
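The contrastive penalty on the mutual information between the two factors can be sketched with an InfoNCE-style estimator; the exact estimator used in the paper is not given in the abstract, so this particular form is an assumption:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def contrastive_mi_estimate(interest, social):
    """InfoNCE-style contrastive estimate of the mutual information between
    paired interest and social factors: each user's own social factor is the
    positive, other users' factors are negatives. Minimizing this estimate
    pushes the two factors toward independence."""
    n = len(interest)
    total = 0.0
    for i in range(n):
        scores = [math.exp(dot(interest[i], social[j])) for j in range(n)]
        total += math.log(scores[i] / sum(scores))
    return math.log(n) + total / n

# Perfectly aligned factor pairs yield a higher MI estimate than mismatched ones.
aligned = contrastive_mi_estimate([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_mi_estimate([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Adding this estimate as a penalty alongside the VAE objective discourages the interest representation from carrying social information, and vice versa.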
Pub Date: 2024-04-29 | DOI: 10.1007/s11063-024-11624-x
Yuge Xu, Chuanlong Lyu
Long-tailed recognition performs poorly on minority classes. The extremely imbalanced distribution of classifier weight norms leads to a decision boundary biased toward majority classes. To address this issue, we propose Class-Balanced Regularization (CBR) to balance the distribution of classifier weight norms so that the model can make more balanced and reasonable classification decisions. Specifically, instead of using a fixed regularization factor, CBR adjusts the per-class L2 regularization factors to be positively correlated with class sample frequency: it trains balanced classifiers by increasing the L2 norm penalty for majority classes and reducing it for minority classes. Since CBR mainly adjusts the classifier rather than the feature extractor, we adopt a two-stage training algorithm: in the first stage the network is trained with conventional empirical risk minimization, and in the second stage CBR is applied to adjust the classifier. To validate the effectiveness of CBR, we perform extensive experiments on the CIFAR10-LT, CIFAR100-LT, and ImageNet-LT datasets. The results demonstrate that CBR significantly improves performance by effectively balancing the distribution of classifier weight norms.
Title: Class-Balanced Regularization for Long-Tailed Recognition
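A minimal sketch of the per-class penalty follows; the exact scaling rule is an assumption, since the abstract only states that the factor is positively correlated with class sample frequency:

```python
def cbr_penalty(class_weights, class_counts, base=1e-3):
    """Class-Balanced Regularization sketch: the L2 penalty on each class's
    classifier weight vector is scaled by that class's relative sample
    frequency, so majority classes are penalized more heavily."""
    total = sum(class_counts)
    penalty = 0.0
    for w, n in zip(class_weights, class_counts):
        factor = base * (n / total)          # frequency-proportional factor
        penalty += factor * sum(x * x for x in w)
    return penalty

# With identical weight vectors, the majority class (900 samples)
# contributes a larger share of the penalty than the minority class (100).
p = cbr_penalty([[1.0, 1.0], [1.0, 1.0]], [900, 100])
```

Shrinking majority-class weight norms harder than minority-class ones counteracts the biased decision boundary described above.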
Pub Date: 2024-04-27 | DOI: 10.1007/s11063-024-11508-0
Xinyi Chang, Chunyu Du, Xinjing Song, Weifeng Liu, Yanjiang Wang
Few-shot learning has achieved satisfactory progress over the years, but these methods implicitly assume that the data in the base (source) classes and the novel (target) classes are sampled from the same distribution (domain), which often does not hold in practice. The purpose of cross-domain few-shot learning (CD-FSL) is to identify novel target classes from a small number of labeled instances on the target domain when there is a domain shift between the source and target domains. However, in CD-FSL, the knowledge the network learns on the source domain often fails to adapt when transferred to the target domain, since instances on the two domains do not obey the same distribution. To overcome this problem, we propose a Target Oriented Dynamic Adaption (TODA) model, which uses a tiny amount of target data to guide the network to dynamically adjust and adapt during training. Specifically, this work proposes a domain-specific adapter to alleviate the network's maladaptation when transferring to the target domain. Combined with the mainstream backbone network, the domain-specific adapter makes the extracted features more specific to tasks in the target domain and reduces the influence of tasks in the source domain. In addition, we propose an adaptive optimization method that assigns different weights to the optimization tasks according to their importance.
Extensive experiments on several benchmark datasets demonstrate the effectiveness of our TODA method.
Title: Target Oriented Dynamic Adaption for Cross-Domain Few-Shot Learning
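The adaptive weighting of optimization tasks can be sketched as a softmax over task-importance scores; this specific rule is a hypothetical stand-in, since the abstract does not define how the weights are computed:

```python
import math

def adaptive_task_weights(task_importances):
    """Hypothetical softmax weighting over optimization tasks:
    more important tasks receive larger (normalized) weights."""
    exps = [math.exp(v) for v in task_importances]
    z = sum(exps)
    return [e / z for e in exps]

def weighted_loss(task_losses, task_importances):
    """Combine per-task losses using the adaptive weights."""
    weights = adaptive_task_weights(task_importances)
    return sum(w * l for w, l in zip(weights, task_losses))

loss = weighted_loss([1.0, 2.0], [0.0, 0.0])  # equal importance: plain average
```

Raising one task's importance pulls the combined objective toward that task, which is the behavior the adaptive optimization method aims for.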
Pub Date: 2024-04-27 | DOI: 10.1007/s11063-024-11579-z
Claudio Filipi Gonçalves dos Santos, João Paulo Papa
Regularization helps improve machine learning models by penalizing them during training. Such approaches act on the input, internal, or output layers. Regarding the latter, label smoothing is widely used to introduce noise into the label vector, making learning more challenging. This work proposes a new label regularization method, Random Label Smoothing, which assigns random values to the labels while preserving their semantics during training. The idea is to change the entire label into fixed arbitrary values. Results show improvements in image classification and super-resolution tasks, outperforming state-of-the-art techniques for such purposes.
Title: Rethinking Regularization with Random Label Smoothing
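A minimal sketch of the idea: replace the 0/1 targets with randomly drawn fixed values while keeping the true class's value the largest, so the label's semantics survive. The (low, high) value range and the uniform spread over the remaining classes are illustrative assumptions, not the paper's exact scheme:

```python
import random

def random_label_smoothing(one_hot, low=0.6, high=0.9, rng=None):
    """Assign a random value (drawn once) to the true class and spread the
    remaining probability mass uniformly over the other classes."""
    rng = rng or random.Random(0)
    hot = rng.uniform(low, high)               # value for the true class
    cold = (1.0 - hot) / (len(one_hot) - 1)    # value for every other class
    return [hot if y == 1 else cold for y in one_hot]

target = random_label_smoothing([0, 1, 0, 0])
```

Unlike classic label smoothing with one fixed epsilon, the target values here vary from label to label, which is the "random" ingredient of the method.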
Pub Date: 2024-04-26 | DOI: 10.1007/s11063-024-11612-1
Xiao-Song Yang
In this paper we investigate deep neural networks for binary classification of datasets from a geometric perspective, in order to understand their working mechanism. First, we establish a geometric result on the injectivity of a finite set under a projection from Euclidean space to the real line. Then, by introducing the notions of alternative points and alternative number, we propose an approach to designing DNNs for binary classification of finitely many labeled points on the real line, thus proving the existence of a binary classification neural network whose hidden layers have width two and whose number of hidden layers does not exceed the cardinality of the finite labeled set. We also demonstrate geometrically how the dataset is transformed across the hidden layers of a narrow DNN in a binary classification task.
Title: A Geometric Theory for Binary Classification of Finite Datasets by DNNs with Relu Activations
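The injectivity result can be illustrated computationally: for a finite point set, almost every direction projects the points injectively onto the real line, so a random search finds one quickly. This sketch demonstrates the geometric fact, not the paper's construction:

```python
import random

def injective_direction(points, rng=None, tries=1000):
    """Search for a direction whose 1-D projection is injective on a finite
    point set; for finitely many points, almost every direction works."""
    rng = rng or random.Random(0)
    dim = len(points[0])
    for _ in range(tries):
        d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        proj = [sum(x * di for x, di in zip(p, d)) for p in points]
        if len(set(proj)) == len(points):   # all projections distinct
            return d
    return None

pts = [(0, 0), (0, 1), (1, 0), (1, 1)]
d = injective_direction(pts)
```

Only finitely many "bad" directions exist (those collapsing some pair of points), so a generic random direction succeeds with probability one.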