Pub Date: 2024-04-16 | DOI: 10.1109/TAI.2024.3356669
K. P. Suba Subbalakshmi;Wojciech Samek;Xia Ben Hu
This special issue brings together seven articles that address different aspects of explainable and interpretable artificial intelligence (AI). Over the years, machine learning (ML) and AI models have posted strong performance across several tasks. This has sparked interest in deploying these methods in critical applications such as health and finance. However, to be deployable in the field, ML and AI models must be trustworthy. Explainable and interpretable AI are two areas of research that have become increasingly important for ensuring the trustworthiness, and hence deployability, of advanced AI and ML methods. Interpretable AI refers to models that obey domain-specific constraints so that humans can understand them more readily; in essence, they are not black-box models. Explainable AI, on the other hand, refers to methods that are typically used to explain a separate black-box model.
{"title":"Guest Editorial: New Developments in Explainable and Interpretable Artificial Intelligence","authors":"K. P. Suba Subbalakshmi;Wojciech Samek;Xia Ben Hu","doi":"10.1109/TAI.2024.3356669","DOIUrl":"https://doi.org/10.1109/TAI.2024.3356669","url":null,"abstract":"This special issue brings together seven articles that address different aspects of explainable and interpretable artificial intelligence (AI). Over the years, machine learning (ML) and AI models have posted strong performance across several tasks. This has sparked interest in deploying these methods in critical applications like health and finance. However, to be deployable in the field, ML and AI models must be trustworthy. Explainable and interpretable AI are two areas of research that have become increasingly important to ensure trustworthiness and hence deployability of advanced AI and ML methods. Interpretable AI are models that obey some domain-specific constraints so that they are better understandable by humans. In essence, they are not black-box models. On the other hand, explainable AI refers to models and methods that are typically used to explain another black-box model.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 4","pages":"1427-1428"},"PeriodicalIF":0.0,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10500898","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140559370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-04-16 | DOI: 10.1109/TAI.2024.3389611
Hezhe Sun;Yufei Wang;Huiwen Yang;Kaixuan Huo;Yuzhe Li
Privacy-aware machine learning paradigms have attracted widespread attention due to their ability to safeguard the local privacy of data owners, preventing the leakage of private information to untrustworthy platforms or malicious third parties. This article focuses on characterizing the interactions between the learner and the data owner within this privacy-aware training process. Here, the data owner hesitates to transmit the original gradient to the learner because of potential cybersecurity issues such as gradient leakage and membership inference. To address this concern, we propose a Stackelberg game framework that models the training process. In this framework, the data owner's objective is not to maximize the discrepancy between the gradient the learner obtains and the true gradient, but rather to ensure that the learner obtains a gradient closely resembling one deliberately designed by the data owner; the learner's objective is to recover the true gradient as accurately as possible. We derive the optimal encoder and decoder under mismatched cost functions and characterize the equilibrium for specific cases, balancing model accuracy and local privacy. Numerical examples illustrate the main results, and we conclude with a discussion that suggests future investigations into reliable countermeasure designs.
{"title":"Strategic Gradient Transmission With Targeted Privacy-Awareness in Model Training: A Stackelberg Game Analysis","authors":"Hezhe Sun;Yufei Wang;Huiwen Yang;Kaixuan Huo;Yuzhe Li","doi":"10.1109/TAI.2024.3389611","DOIUrl":"https://doi.org/10.1109/TAI.2024.3389611","url":null,"abstract":"Privacy-aware machine learning paradigms have sparked widespread concern due to their ability to safeguard the local privacy of data owners, preventing the leakage of private information to untrustworthy platforms or malicious third parties. This article focuses on characterizing the interactions between the learner and the data owner within this privacy-aware training process. Here, the data owner hesitates to transmit the original gradient to the learner due to potential cybersecurity issues, such as gradient leakage and membership inference. To address this concern, we propose a Stackelberg game framework that models the training process. In this framework, the data owner's objective is not to maximize the discrepancy between the learner's obtained gradient and the true gradient but rather to ensure that the learner obtains a gradient closely resembling one deliberately designed by the data owner, while the learner's objective is to recover the true gradient as accurately as possible. We derive the optimal encoder and decoder using mismatched cost functions and characterize the equilibrium for specific cases, balancing model accuracy and local privacy. Numerical examples illustrate the main results, and we conclude with expanding discussions to suggest future investigations into reliable countermeasure designs.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 9","pages":"4635-4648"},"PeriodicalIF":0.0,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142169723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-04-16 | DOI: 10.1109/TAI.2024.3388389
Mingfu Xue;Xin Wang;Yinghao Wu;Shifeng Ni;Leo Yu Zhang;Yushu Zhang;Weiqiang Liu
Intellectual property (IP) protection for deep neural networks (DNNs) has raised serious concerns in recent years. Most existing works embed watermarks in the DNN model for IP protection, which requires modifying the model and does not consider interpretability. In this article, for the first time, we propose an interpretable IP protection method for DNNs based on explainable artificial intelligence. Compared with existing works, the proposed method does not modify the DNN model, and the ownership-verification decision is interpretable. We extract the intrinsic features of the DNN model using deep Taylor decomposition. Since these intrinsic features capture the model's unique interpretation of its decisions, they can be regarded as a fingerprint of the model. If the fingerprint of a suspected model matches that of the original model, the suspected model is considered a pirated copy. Experimental results demonstrate that the fingerprints can be successfully used to verify model ownership and that the test accuracy of the model is not affected. Furthermore, the proposed method is robust to fine-tuning, pruning, watermark-overwriting, and adaptive attacks.
{"title":"An Explainable Intellectual Property Protection Method for Deep Neural Networks Based on Intrinsic Features","authors":"Mingfu Xue;Xin Wang;Yinghao Wu;Shifeng Ni;Leo Yu Zhang;Yushu Zhang;Weiqiang Liu","doi":"10.1109/TAI.2024.3388389","DOIUrl":"https://doi.org/10.1109/TAI.2024.3388389","url":null,"abstract":"Intellectual property (IP) protection for deep neural networks (DNNs) has raised serious concerns in recent years. Most existing works embed watermarks in the DNN model for IP protection, which need to modify the model and do not consider/mention interpretability. In this article, for the first time, we propose an interpretable IP protection method for DNN based on explainable artificial intelligence. Compared with existing works, the proposed method does not modify the DNN model, and the decision of the ownership verification is interpretable. We extract the intrinsic features of the DNN model by using deep Taylor decomposition. Since the intrinsic feature is composed of unique interpretation of the model's decision, the intrinsic feature can be regarded as fingerprint of the model. If the fingerprint of a suspected model is the same as the original model, the suspected model is considered as a pirated model. Experimental results demonstrate that the fingerprints can be successfully used to verify the ownership of the model and the test accuracy of the model is not affected. Furthermore, the proposed method is robust to fine-tuning attack, pruning attack, watermark overwriting attack, and adaptive attack.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 9","pages":"4649-4659"},"PeriodicalIF":0.0,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142169670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}