Explaining Siamese networks in few-shot learning

IF 4.3 3区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Machine Learning Pub Date : 2024-04-29 DOI:10.1007/s10994-024-06529-8

Andrea Fedele, Riccardo Guidotti, Dino Pedreschi

{"title":"Explaining Siamese networks in few-shot learning","authors":"Andrea Fedele, Riccardo Guidotti, Dino Pedreschi","doi":"10.1007/s10994-024-06529-8","DOIUrl":null,"url":null,"abstract":"<p>Machine learning models often struggle to generalize accurately when tested on new class distributions that were not present in their training data. This is a significant challenge for real-world applications that require quick adaptation without the need for retraining. To address this issue, few-shot learning frameworks, which includes models such as Siamese Networks, have been proposed. Siamese Networks learn similarity between pairs of records through a metric that can be easily extended to new, unseen classes. However, these systems lack interpretability, which can hinder their use in certain applications. To address this, we propose a data-agnostic method to explain the outcomes of Siamese Networks in the context of few-shot learning. Our explanation method is based on a post-hoc perturbation-based procedure that evaluates the contribution of individual input features to the final outcome. As such, it falls under the category of post-hoc explanation methods. We present two variants, one that considers each input feature independently, and another that evaluates the interplay between features. Additionally, we propose two perturbation procedures to evaluate feature contributions. Qualitative and quantitative results demonstrate that our method is able to identify highly discriminant intra-class and inter-class characteristics, as well as predictive behaviors that lead to misclassification by relying on incorrect features.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"38 1","pages":""},"PeriodicalIF":4.3000,"publicationDate":"2024-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine Learning","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10994-024-06529-8","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Machine learning models often struggle to generalize accurately when tested on new class distributions that were not present in their training data. This is a significant challenge for real-world applications that require quick adaptation without the need for retraining. To address this issue, few-shot learning frameworks, which includes models such as Siamese Networks, have been proposed. Siamese Networks learn similarity between pairs of records through a metric that can be easily extended to new, unseen classes. However, these systems lack interpretability, which can hinder their use in certain applications. To address this, we propose a data-agnostic method to explain the outcomes of Siamese Networks in the context of few-shot learning. Our explanation method is based on a post-hoc perturbation-based procedure that evaluates the contribution of individual input features to the final outcome. As such, it falls under the category of post-hoc explanation methods. We present two variants, one that considers each input feature independently, and another that evaluates the interplay between features. Additionally, we propose two perturbation procedures to evaluate feature contributions. Qualitative and quantitative results demonstrate that our method is able to identify highly discriminant intra-class and inter-class characteristics, as well as predictive behaviors that lead to misclassification by relying on incorrect features.

Abstract Image

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

解释少儿学习中的连体网络

机器学习模型在对训练数据中不存在的新类别分布进行测试时，往往难以准确泛化。对于需要快速适应而无需重新训练的实际应用来说，这是一个巨大的挑战。为了解决这个问题，有人提出了少量学习框架，其中包括连体网络（Siamese Networks）等模型。连体网络通过一种指标来学习记录对之间的相似性，这种指标可以很容易地扩展到新的、未见过的类别。然而，这些系统缺乏可解释性，这可能会阻碍它们在某些应用中的使用。为了解决这个问题，我们提出了一种与数据无关的方法，来解释连体网络在少量学习中的结果。我们的解释方法基于一种事后扰动程序，该程序可评估各个输入特征对最终结果的贡献。因此，它属于事后解释方法的范畴。我们提出了两种变体，一种是独立考虑每个输入特征，另一种是评估特征之间的相互作用。此外，我们还提出了两种扰动程序来评估特征贡献。定性和定量结果表明，我们的方法能够识别高区分度的类内和类间特征，以及依赖不正确特征而导致误分类的预测行为。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Machine Learning 工程技术-计算机：人工智能

CiteScore

11.00

自引率

2.70%

发文量

162

审稿时长

3 months

期刊介绍： Machine Learning serves as a global platform dedicated to computational approaches in learning. The journal reports substantial findings on diverse learning methods applied to various problems, offering support through empirical studies, theoretical analysis, or connections to psychological phenomena. It demonstrates the application of learning methods to solve significant problems and aims to enhance the conduct of machine learning research with a focus on verifiable and replicable evidence in published papers.