As person image variations are likely to cause a part misalignment problem, most previous person Re-Identification (ReID) works adopt local feature partition or additional landmark annotations to acquire aligned person features and boost ReID performance. However, such approaches either achieve only coarse-grained part alignments without considering detailed image variations within each part, or require extra annotated landmarks to train a usable pose estimation model. In this work, we propose an effective Discriminable and Reliable Transformer (DRFormer) framework to learn part-aligned person representations with only person identity labels. Specifically, the DRFormer framework consists of Discriminable Feature Transformer (DFT) and Reliable Feature Transformer (RFT) modules, which generate discriminable and reliable high-order features, respectively. To reduce the dimension of high-order features, the DFT module utilizes a Self-Attentive Kronecker Product (SAKP) algorithm that promotes the representational capability of the compressed features via a self-attention strategy. To eliminate background noise, the RFT module mines foreground regions and adaptively aggregates foreground features via a Gumbel-Softmax strategy. Moreover, the proposed framework derives from an interpretable motivation and elegantly solves part misalignment without using feature partition or pose estimation. This paper theoretically and experimentally demonstrates the superiority of the proposed DRFormer framework, which achieves state-of-the-art performance on various person ReID datasets.
Title: DRFormer: A Discriminable and Reliable Feature Transformer for Person Re-Identification
Authors: Pingyu Wang; Xingjian Zheng; Linbo Qing; Bonan Li; Fei Su; Zhicheng Zhao; Honggang Chen
DOI: 10.1109/TIFS.2024.3520304
IEEE Transactions on Information Forensics and Security, vol. 20, pp. 980-995 (IF 6.3). Published 2024-12-19.
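The abstract's Gumbel-Softmax foreground mining can be illustrated with a straight-through selector over per-region scores. The sketch below is a minimal, hypothetical reading of that idea: the region features, foreground logits, and single-region selection are illustrative assumptions, not the paper's actual architecture.

```python
import math
import random

def gumbel_softmax(logits, tau=1.0, hard=True, rng=random.Random(0)):
    """Sample from a categorical distribution via the Gumbel-Softmax trick.

    Adds Gumbel(0, 1) noise to the logits, applies a temperature-scaled
    softmax, and (optionally) returns a hard one-hot vector, as in the
    straight-through Gumbel-Softmax estimator.
    """
    g = [-math.log(-math.log(rng.random() + 1e-20) + 1e-20) for _ in logits]
    y = [(l + n) / tau for l, n in zip(logits, g)]
    m = max(y)
    e = [math.exp(v - m) for v in y]
    s = sum(e)
    soft = [v / s for v in e]
    if not hard:
        return soft
    k = soft.index(max(soft))
    return [1.0 if i == k else 0.0 for i in range(len(soft))]

# Toy example: 4 spatial regions, each with a 2-D feature and a
# "foreground" logit; aggregate only the selected region's features.
features = [[0.2, 0.1], [0.9, 0.8], [0.1, 0.0], [0.85, 0.7]]
fg_logits = [-2.0, 3.0, -1.5, 2.5]  # regions 1 and 3 look like foreground
mask = gumbel_softmax(fg_logits, tau=0.5)
aggregated = [sum(m * f[d] for m, f in zip(mask, features)) for d in range(2)]
print(mask, aggregated)
```

In training, the soft probabilities would carry gradients while the hard mask is used in the forward pass; here only the forward selection is shown.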
Pub Date: 2024-12-18 | DOI: 10.1109/TIFS.2024.3520020
Yiping Zhang;Yuntao Shou;Wei Ai;Tao Meng;Keqin Li
With the recent advances in computer vision, age estimation has significantly improved in overall accuracy. However, because the most common methods do not take into account the class imbalance problem in age estimation datasets, they suffer from a large bias when recognizing long-tailed groups. To achieve high-quality imbalanced learning for long-tailed groups, the dominant solution is for the feature extractor to learn the discriminative features of different groups and for the classifier to provide appropriate and unbiased margins for different groups based on those discriminative features. Therefore, in this paper, we propose an innovative collaborative learning framework (GroupFace) that integrates a multi-hop attention graph convolutional network and a dynamic group-aware margin strategy based on reinforcement learning. Specifically, to extract the discriminative features of different groups, we design an enhanced multi-hop attention graph convolutional network. This network is capable of capturing the interactions of neighboring nodes at different distances, fusing local and global information to model facial deep aging, and exploring diverse representations of different groups. In addition, to further address the class imbalance problem, we design a dynamic group-aware margin strategy based on reinforcement learning that provides appropriate and unbiased margins for different groups. The strategy divides the samples into four age groups and identifies the optimal margins for the various age groups by employing a Markov decision process. Under the guidance of the agent, the feature representation bias and the classification margin deviation between different groups can be reduced simultaneously, balancing inter-class separability and intra-class proximity. After joint optimization, our architecture achieves excellent performance on several age estimation benchmark datasets.
It not only achieves large improvements in overall estimation accuracy but also gains balanced performance in long-tailed group estimation.
Title: GroupFace: Imbalanced Age Estimation Based on Multi-Hop Attention Graph Convolutional Network and Group-Aware Margin Optimization
Authors: Yiping Zhang; Yuntao Shou; Wei Ai; Tao Meng; Keqin Li
DOI: 10.1109/TIFS.2024.3520020
IEEE Transactions on Information Forensics and Security, vol. 20, pp. 605-619. Published 2024-12-18.
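A group-dependent margin can be pictured as an additive penalty on the target logit before the softmax, with tail groups receiving larger margins. The sketch below uses fixed margins purely for illustration; in the paper an RL agent adapts them, and the group split and values here are hypothetical.

```python
import math

def group_margin_ce(logits, target, group_of, margins):
    """Cross-entropy with an additive, group-dependent margin.

    Subtracts a per-group margin from the target logit before the
    softmax, so that tail groups (given larger margins) must be
    separated by a wider gap to achieve the same loss.
    """
    adj = list(logits)
    adj[target] -= margins[group_of[target]]
    m = max(adj)
    logsumexp = m + math.log(sum(math.exp(v - m) for v in adj))
    return logsumexp - adj[target]

# Toy: 4 classes mapped to 2 age groups; the tail group (1) gets a
# larger margin, which raises its loss for otherwise symmetric logits.
group_of = {0: 0, 1: 0, 2: 1, 3: 1}
margins = {0: 0.1, 1: 0.5}
head_loss = group_margin_ce([2.0, 0.5, 0.1, 0.2], target=0,
                            group_of=group_of, margins=margins)
tail_loss = group_margin_ce([0.1, 0.5, 2.0, 0.2], target=2,
                            group_of=group_of, margins=margins)
print(head_loss, tail_loss)
```

The larger tail margin yields a larger loss for the same logit pattern, pushing the optimizer to widen the tail group's decision boundary.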
The Internet of Medical Things (IoMT) has gained significant research focus in both academic and medical institutions. Nevertheless, the sensitive data involved in IoMT raises concerns regarding user validation and data privacy. To address these concerns, certificateless signcryption (CLSC) has emerged as a promising solution, offering authenticity, confidentiality, and unforgeability. Unfortunately, most existing CLSC schemes are impractical for IoMT due to their heavy computational and storage requirements. Additionally, these schemes are vulnerable to quantum computing attacks. Designing an efficient post-quantum CLSC scheme therefore remains an open problem. In this work, we propose PQ-CLSCL, a novel post-quantum CLSC scheme with linkability for IoMT. Our design facilitates secure transmission of medical data between physicians and patients, effectively validating user legitimacy and minimizing the risk of private information leakage. To achieve this, we leverage lattice sampling algorithms and hash functions to generate the partial secret key, then employ the sign-then-encrypt method and design a link label. We also formalize and prove the security of our design, including indistinguishability against chosen-ciphertext attacks (IND-CCA2), existential unforgeability against chosen-message attacks (EU-CMA), and linkability. Finally, a comprehensive performance evaluation shows that our computation overhead is just 5% of that of existing schemes. The evaluation results demonstrate that our solution is practical and efficient.
Title: Efficient and Secure Post-Quantum Certificateless Signcryption With Linkability for IoMT
Authors: Shiyuan Xu; Xue Chen; Yu Guo; Siu-Ming Yiu; Shang Gao; Bin Xiao
DOI: 10.1109/TIFS.2024.3520007
IEEE Transactions on Information Forensics and Security, vol. 20, pp. 1119-1134. Published 2024-12-18.
Open Access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10806671
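The sign-then-encrypt flow with a link label can be sketched structurally. The stand-in primitives below (HMAC as the "signature", an XOR keystream as the "encryption") are deliberate toys replacing the paper's lattice-based components; only the message flow and the idea of a linkable, key-derived tag are being illustrated.

```python
import hashlib
import hmac
import os

def toy_sign(sk, msg):
    # Stand-in for a lattice signature: HMAC-SHA256 tag.
    return hmac.new(sk, msg, hashlib.sha256).digest()

def toy_encrypt(key, data):
    # Stand-in for a KEM/DEM: XOR with a SHAKE-256 keystream.
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))

toy_decrypt = toy_encrypt  # XOR keystream is its own inverse

def signcrypt(sender_sk, shared_key, msg):
    sig = toy_sign(sender_sk, msg)                  # 1) sign
    ciphertext = toy_encrypt(shared_key, msg + sig)  # 2) then encrypt
    # Link label: a deterministic tag derived from the sender's partial
    # secret key, so two signcryptions by the same sender can be linked
    # without revealing the key itself.
    link = hashlib.sha256(b"link" + sender_sk).digest()
    return ciphertext, link

sk, key = os.urandom(32), os.urandom(32)
msg = b"patient vitals: hr=72"
ct, link1 = signcrypt(sk, key, msg)
_, link2 = signcrypt(sk, key, b"another record")
pt = toy_decrypt(key, ct)
recovered, sig = pt[:-32], pt[-32:]
print(recovered == msg, sig == toy_sign(sk, recovered), link1 == link2)
```

The two link labels match because both ciphertexts come from the same sender, which is exactly the linkability property; nothing here reflects the scheme's actual lattice sampling or security proofs.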
Localization is a key premise for implementing the applications of underwater acoustic sensor networks (UASNs). However, the inhomogeneous medium and the open nature of the underwater environment make this task challenging. This paper studies the privacy-preserving localization problem for UASNs, considering both direct and indirect data threats. To handle the direct data threat, a privacy-preserving localization protocol is designed for sensor nodes, where mutual information is adopted to acquire the optimal noise added on anchor nodes. With the range information collected from anchor nodes, a ray tracing model is employed for sensor nodes to compensate for the range bias caused by the straight-line propagation assumption. Then, a differential privacy (DP) based deep learning localization estimator is designed to calculate the positions of sensor nodes, with perturbations added to the forward propagation of the deep learning framework so that indirect data leakage is avoided. In addition, theoretical analyses of the Cramer-Rao Lower Bound (CRLB), the privacy budget, and the complexity are provided. The main innovations of this paper are: 1) the mutual information-based localization protocol acquires better noise than traditional noise-adding mechanisms; 2) the DP-based deep learning estimator avoids the leakage of training data caused by overfitting in traditional deep learning-based solutions. Finally, both simulations and experiments are conducted to verify the effectiveness of our approach.
Title: Privacy-Preserving Localization for Underwater Acoustic Sensor Networks: A Differential Privacy-Based Deep Learning Approach
Authors: Jing Yan; Yuhan Zheng; Xian Yang; Cailian Chen; Xinping Guan
DOI: 10.1109/TIFS.2024.3518069
IEEE Transactions on Information Forensics and Security, vol. 20, pp. 737-752. Published 2024-12-18.
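Perturbing the forward propagation for DP typically means clipping an activation to bound its sensitivity and then adding calibrated noise. The sketch below shows that pattern for one linear layer with Laplace noise; the layer shapes, the clipping rule, and which layer the paper actually perturbs are all assumptions for illustration.

```python
import math
import random

def dp_forward(x, weights, clip=1.0, epsilon=1.0, rng=random.Random(0)):
    """One noisy linear layer: clip the activation's L2 norm to bound
    sensitivity, then add Laplace(0, clip/epsilon) noise - the standard
    activation-perturbation recipe for differential privacy."""
    h = [sum(w * v for w, v in zip(row, x)) for row in weights]
    norm = math.sqrt(sum(v * v for v in h)) or 1.0
    h = [v * min(1.0, clip / norm) for v in h]  # clip L2 norm to `clip`
    scale = clip / epsilon                       # Laplace noise scale

    def lap():
        # Inverse-CDF sampling of Laplace(0, scale)
        u = rng.random() - 0.5
        return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

    return [v + lap() for v in h]

x = [0.3, 0.7, 0.1]
W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4]]
noisy = dp_forward(x, W, clip=1.0, epsilon=2.0)
print(noisy)
```

A smaller epsilon (tighter privacy budget) enlarges the noise scale, trading localization accuracy for stronger protection of the training data.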
AI-synthesized speech, also known as deepfake speech, has recently raised significant concerns due to the rapid advancement of speech synthesis and speech conversion techniques. Previous works often rely on distinguishing synthesizer artifacts to identify deepfake speech. However, excessive reliance on these specific synthesizer artifacts may result in unsatisfactory performance when addressing speech signals created by unseen synthesizers. In this paper, we propose a robust deepfake speech detection method that employs feature decomposition to learn synthesizer-independent content features as a complement for detection. Specifically, we propose a dual-stream feature decomposition learning strategy that decomposes the learned speech representation using a synthesizer stream and a content stream. The synthesizer stream specializes in learning synthesizer features through supervised training with synthesizer labels. Meanwhile, the content stream focuses on learning synthesizer-independent content features, enabled by a pseudo-labeling-based supervised learning method. This method randomly transforms speech to generate speed and compression labels for training. Additionally, we employ an adversarial learning technique to reduce the synthesizer-related components in the content stream. The final classification is determined by concatenating the synthesizer and content features. To enhance the model's robustness to different synthesizer characteristics, we further propose a synthesizer feature augmentation strategy that randomly blends the characteristic styles within real and fake audio features and randomly shuffles the synthesizer features with the content features. 
Experimental results on four deepfake speech benchmark datasets demonstrate that our model achieves state-of-the-art robust detection performance across various evaluation scenarios, including cross-method, cross-dataset, and cross-language evaluations.
Title: Robust AI-Synthesized Speech Detection Using Feature Decomposition Learning and Synthesizer Feature Augmentation
Authors: Kuiyuan Zhang; Zhongyun Hua; Yushu Zhang; Yifang Guo; Tao Xiang
DOI: 10.1109/TIFS.2024.3520001
IEEE Transactions on Information Forensics and Security, vol. 20, pp. 871-885. Published 2024-12-18.
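The augmentation strategy combines two ideas that can be sketched concretely: blending the "style" (mean/std statistics) of real and fake features, and shuffling which synthesizer feature is paired with which content feature. The blending rule and feature shapes below are generic stand-ins, not the paper's exact formulation.

```python
import random

def blend_styles(feat_a, feat_b, alpha):
    """Mix the (mean, std) style statistics of two feature vectors while
    keeping feat_a's normalized content - a common style-mixing trick."""
    def stats(f):
        m = sum(f) / len(f)
        s = (sum((v - m) ** 2 for v in f) / len(f)) ** 0.5 or 1.0
        return m, s
    ma, sa = stats(feat_a)
    mb, sb = stats(feat_b)
    m = alpha * ma + (1 - alpha) * mb
    s = alpha * sa + (1 - alpha) * sb
    return [(v - ma) / sa * s + m for v in feat_a]

rng = random.Random(0)
real = [rng.gauss(0.0, 1.0) for _ in range(8)]
fake = [rng.gauss(2.0, 0.5) for _ in range(8)]
mixed = blend_styles(real, fake, alpha=0.5)

# Shuffle which synthesizer feature is concatenated with which content
# feature, simulating unseen synthesizer/content combinations.
synth_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
content_feats = [[0.1], [0.2], [0.3]]
order = list(range(len(synth_feats)))
rng.shuffle(order)
pairs = [synth_feats[i] + c for i, c in zip(order, content_feats)]
print(mixed, pairs)
```

Both operations enlarge the set of feature combinations seen in training without needing any new audio, which is the stated goal of the augmentation.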
Pub Date: 2024-12-18 | DOI: 10.1109/TIFS.2024.3520014
Bo Gao;Weiwei Liu;Guangjie Liu;Fengyuan Nie;Jianan Huang
Deep learning-based website fingerprinting (WF) attacks dominate website traffic classification. In the real world, two main challenges limit their effectiveness: on the one hand, it is difficult to counter the effect of content updates while still describing page features accurately in traffic representations; on the other hand, model accuracy relies on training with numerous samples, which requires constant manual labeling. The key to solving these problems is to find a website traffic representation that displays page features stably and accurately, and to perform self-supervised learning that does not rely on manual labeling. This study introduces the multi-level resource-coherented graph convolutional neural network (MRCGCN), a self-supervised learning-based WF attack. It analyzes website traffic using resources, which are coarser-grained than packets, as the basic unit, preserving the page's unique resource layout while improving the robustness of the representations. Then, we utilize an echelon-ordered graph kernel function to extract the graph topology as the label for website traffic. Finally, a two-channel graph convolutional neural network is designed to construct a self-supervised learning-based traffic classifier. We evaluated the WF attacks using real data in both closed- and open-world scenarios. 
Title: Multi-Level Resource-Coherented Graph Learning for Website Fingerprinting Attacks
Authors: Bo Gao; Weiwei Liu; Guangjie Liu; Fengyuan Nie; Jianan Huang
DOI: 10.1109/TIFS.2024.3520014
IEEE Transactions on Information Forensics and Security, vol. 20, pp. 693-708. Published 2024-12-18.
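Treating resources rather than packets as the basic unit can be pictured as grouping a packet trace into direction-coherent bursts and linking consecutive bursts into a graph. This is a simplified reading for illustration; the burst heuristic and graph construction below are assumptions, and the paper's multi-level representation is richer.

```python
def packets_to_resource_graph(trace):
    """Group a (direction, size) packet trace into direction-coherent
    bursts, treat each burst as one 'resource' node carrying its total
    byte count, and link consecutive resources with edges."""
    nodes, cur_dir, cur_size = [], None, 0
    for direction, size in trace:
        if direction == cur_dir:
            cur_size += size
        else:
            if cur_dir is not None:
                nodes.append((cur_dir, cur_size))
            cur_dir, cur_size = direction, size
    nodes.append((cur_dir, cur_size))
    edges = [(i, i + 1) for i in range(len(nodes) - 1)]
    return nodes, edges

# '+' = outgoing request packets, '-' = incoming response packets
trace = [('+', 600), ('-', 1500), ('-', 1500), ('-', 400),
         ('+', 500), ('-', 1500), ('-', 900)]
nodes, edges = packets_to_resource_graph(trace)
print(nodes, edges)  # 4 resource nodes, 3 chain edges
```

Because a page update that repackages a resource into different packet sizes leaves the burst totals roughly unchanged, this coarser unit is more stable than per-packet features.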
Model Inversion Attacks (MIAs) pose a certain threat to the data privacy of learning-based systems, as they enable adversaries to reconstruct identifiable features of the training distribution with only query access to the victim model. In the context of deep learning, the primary challenges associated with MIAs are suboptimal attack success rates and the corresponding high computational costs. Prior efforts assumed that the expansive search space caused these limitations and employed generative models to constrain its dimensions. Despite the initial success of these generative-based solutions, recent experiments have cast doubt on this fundamental assumption, leaving two open questions: which factors determine MIA performance, and how can these factors be manipulated to improve MIAs? To answer these questions, we reframe MIAs from the perspective of information flow. This new formulation allows us to establish a lower bound for the error probability of MIAs, determined by two critical factors: (1) the size of the search space and (2) the mutual information between the input and output random variables. Through a detailed analysis of generative-based MIAs within this theoretical framework, we uncover a trade-off between the size of the search space and the generation capability of generative models. Based on these theoretical conclusions, we introduce the Query-Efficient Model Inversion Approach (QE-MIA). By strategically selecting an appropriate search space and introducing additional mutual information, QE-MIA reduces query overhead by 60%-70% while concurrently enhancing the attack success rate by 5%-25%.
Title: Query-Efficient Model Inversion Attacks: An Information Flow View
Authors: Yixiao Xu; Binxing Fang; Mohan Li; Xiaolei Liu; Zhihong Tian
DOI: 10.1109/TIFS.2024.3518779
IEEE Transactions on Information Forensics and Security, vol. 20, pp. 1023-1036. Published 2024-12-18.
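A lower bound on reconstruction error driven by search-space size and mutual information is exactly the shape of the textbook Fano inequality, which can serve as a stand-in here: P_e >= 1 - (I(X;Y) + 1) / log2(|X|), with I in bits. The paper's own bound may differ; the sketch only illustrates how the two factors trade off.

```python
import math

def fano_error_lower_bound(mutual_info_bits, search_space_size):
    """Fano-style lower bound on the probability of failing to recover
    X from Y: P_e >= 1 - (I(X;Y) + 1) / log2(|X|). Clamped at 0."""
    return max(0.0, 1.0 - (mutual_info_bits + 1.0) / math.log2(search_space_size))

# Shrinking the search space, or leaking more information through the
# victim model, both lower the error bound (i.e., make inversion easier).
big_space = fano_error_lower_bound(mutual_info_bits=8.0, search_space_size=2 ** 30)
small_space = fano_error_lower_bound(mutual_info_bits=8.0, search_space_size=2 ** 12)
more_info = fano_error_lower_bound(mutual_info_bits=11.0, search_space_size=2 ** 30)
print(big_space, small_space, more_info)  # 0.7, 0.25, 0.6
```

This matches the abstract's two levers: QE-MIA both restricts the search space and injects additional mutual information, each of which loosens the error floor an attacker faces.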
Pub Date: 2024-12-18 | DOI: 10.1109/TIFS.2024.3520015
Yuhang Qiu;Honghui Chen;Xingbo Dong;Zheng Lin;Iman Yi Liao;Massimo Tistarelli;Zhe Jin
Determining dense feature points on fingerprints for constructing deep fixed-length representations for accurate matching, particularly at the pixel level, is of significant interest. To explore the interpretability of fingerprint matching, we propose a multi-stage interpretable fingerprint matching network, namely Interpretable Fixed-length Representation for Fingerprint Matching via Vision Transformer (IFViT), which consists of two primary modules. The first module, an interpretable dense registration module, establishes a Vision Transformer (ViT)-based Siamese network to capture long-range dependencies and the global context in fingerprint pairs. It provides interpretable dense pixel-wise correspondences of feature points for fingerprint alignment and enhances interpretability in the subsequent matching stage. The second module takes into account both local and global representations of the aligned fingerprint pair to achieve interpretable fixed-length representation extraction and matching. It employs the ViTs trained in the first module with an additional fully connected layer and retrains them to simultaneously produce the discriminative fixed-length representation and interpretable dense pixel-wise correspondences of feature points.
{"title":"IFViT: Interpretable Fixed-Length Representation for Fingerprint Matching via Vision Transformer","authors":"Yuhang Qiu;Honghui Chen;Xingbo Dong;Zheng Lin;Iman Yi Liao;Massimo Tistarelli;Zhe Jin","doi":"10.1109/TIFS.2024.3520015","DOIUrl":"10.1109/TIFS.2024.3520015","url":null,"abstract":"Determining dense feature points on fingerprints used in constructing deep fixed-length representations for accurate matching, particularly at the pixel level, is of significant interest. To explore the interpretability of fingerprint matching, we propose a multi-stage interpretable fingerprint matching network, namely Interpretable Fixed-length Representation for Fingerprint Matching via Vision Transformer (IFViT), which consists of two primary modules. The first module, an interpretable dense registration module, establishes a Vision Transformer (ViT)-based Siamese Network to capture long-range dependencies and the global context in fingerprint pairs. It provides interpretable dense pixel-wise correspondences of feature points for fingerprint alignment and enhances the interpretability in the subsequent matching stage. The second module takes into account both local and global representations of the aligned fingerprint pair to achieve an interpretable fixed-length representation extraction and matching. It employs the ViTs trained in the first module with the additional fully connected layer and retrains them to simultaneously produce the discriminative fixed-length representation and interpretable dense pixel-wise correspondences of feature points. 
Extensive experimental results on diverse publicly available fingerprint databases demonstrate that the proposed framework not only exhibits superior performance on dense registration and matching but also significantly promotes interpretability in deep fixed-length representation-based fingerprint matching.","PeriodicalId":13492,"journal":{"name":"IEEE Transactions on Information Forensics and Security","volume":"20 ","pages":"559-573"},"PeriodicalIF":6.3,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142849468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
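The interpretable dense correspondences described above can be illustrated with a small sketch: given two sets of L2-normalized per-pixel descriptors from a Siamese encoder (here random stand-ins, not the actual IFViT features), mutual nearest neighbours under cosine similarity yield the pixel-wise matches used for alignment. All names, shapes, and data below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def mutual_nn_matches(feat_a, feat_b):
    """Dense correspondence by mutual nearest neighbours.

    feat_a, feat_b: (N, C) arrays of L2-normalized descriptors,
    one row per pixel location. Returns (i, j) index pairs where
    feat_a[i] and feat_b[j] are each other's nearest neighbour.
    """
    sim = feat_a @ feat_b.T             # (N, N) cosine similarities
    nn_ab = sim.argmax(axis=1)          # best match in b for each row of a
    nn_ba = sim.argmax(axis=0)          # best match in a for each row of b
    return [(i, j) for i, j in enumerate(nn_ab) if nn_ba[j] == i]

# Toy example: b is a row-permuted copy of a, so the recovered
# matches should exactly follow the permutation.
rng = np.random.default_rng(0)
a = rng.normal(size=(6, 8))
a /= np.linalg.norm(a, axis=1, keepdims=True)
perm = np.array([2, 0, 5, 1, 4, 3])
b = a[perm]
matches = mutual_nn_matches(a, b)
print(matches)
```

The mutual-nearest-neighbour filter is what makes the correspondences interpretable: each retained pair can be visualized as a line between two pixel locations, and one-sided (ambiguous) matches are discarded.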
Assessing the stealthiness of adversarial perturbations is challenging due to the lack of appropriate evaluation metrics. Existing evaluation metrics, e.g., $L_{p}$ norms or Image Quality Assessment (IQA), fall short of assessing the pixel-level stealthiness of subtle adversarial perturbations since these metrics are primarily designed for traditional distortions. To bridge this gap, we present the first comprehensive study on the subjective and objective assessment of the stealthiness of adversarial perturbations from a visual perspective at a pixel level. Specifically, we propose new subjective assessment criteria for human observers to score adversarial stealthiness in a fine-grained manner. Then, we create a large-scale adversarial example dataset comprising 10586 pairs of clean and adversarial samples encompassing twelve state-of-the-art adversarial attacks. To obtain the subjective scores according to the proposed criteria, we recruit 60 human observers, and each adversarial example is evaluated by at least 15 observers. The mean opinion score of each adversarial example is utilized for labeling. Finally, we develop a three-stage objective scoring model that mimics human scoring habits to predict adversarial perturbation’s stealthiness. Experimental results demonstrate that our objective model exhibits superior consistency with the human visual system, surpassing commonly employed metrics like PSNR and SSIM.
{"title":"Stealthiness Assessment of Adversarial Perturbation: From a Visual Perspective","authors":"Hangcheng Liu;Yuan Zhou;Ying Yang;Qingchuan Zhao;Tianwei Zhang;Tao Xiang","doi":"10.1109/TIFS.2024.3520016","DOIUrl":"10.1109/TIFS.2024.3520016","url":null,"abstract":"Assessing the stealthiness of adversarial perturbations is challenging due to the lack of appropriate evaluation metrics. Existing evaluation metrics, e.g., <inline-formula> <tex-math>$L_{p}$ </tex-math></inline-formula> norms or Image Quality Assessment (IQA), fall short of assessing the pixel-level stealthiness of subtle adversarial perturbations since these metrics are primarily designed for traditional distortions. To bridge this gap, we present the first comprehensive study on the subjective and objective assessment of the stealthiness of adversarial perturbations from a visual perspective at a pixel level. Specifically, we propose new subjective assessment criteria for human observers to score adversarial stealthiness in a fine-grained manner. Then, we create a large-scale adversarial example dataset comprising 10586 pairs of clean and adversarial samples encompassing twelve state-of-the-art adversarial attacks. To obtain the subjective scores according to the proposed criterion, we recruit 60 human observers, and each adversarial example is evaluated by at least 15 observers. The mean opinion score of each adversarial example is utilized for labeling. Finally, we develop a three-stage objective scoring model that mimics human scoring habits to predict adversarial perturbation’s stealthiness. 
Experimental results demonstrate that our objective model exhibits superior consistency with the human visual system, surpassing commonly employed metrics like PSNR and SSIM.","PeriodicalId":13492,"journal":{"name":"IEEE Transactions on Information Forensics and Security","volume":"20 ","pages":"898-913"},"PeriodicalIF":6.3,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142849467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
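The mean-opinion-score labeling step described above is straightforward to sketch: each adversarial example's label is the mean of its raw observer ratings. The ratings array, the 1-5 scale, and the per-example target means below are fabricated toy data, not the paper's dataset; real subjective studies typically also screen out outlier observers first.

```python
import numpy as np

def mean_opinion_scores(scores):
    """scores: (num_examples, num_observers) raw stealthiness ratings.
    Returns the per-example mean opinion score (MOS) used as a label."""
    scores = np.asarray(scores, dtype=float)
    return scores.mean(axis=1)

# Toy ratings: 3 adversarial examples, 15 observers each, on an
# illustrative 1-5 stealthiness scale (higher = less noticeable).
rng = np.random.default_rng(1)
ratings = np.clip(
    rng.normal(loc=[[4.5], [2.0], [3.0]], scale=0.5, size=(3, 15)),
    1, 5)
mos = mean_opinion_scores(ratings)
print(mos.round(2))
```

Averaging at least 15 ratings per example, as the study does, shrinks the per-label noise by roughly the square root of the observer count, which is what makes the MOS a stable regression target for the objective scoring model.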
Pub Date: 2024-12-16; DOI: 10.1109/TIFS.2024.3518065
Apollo Albright;Boris Gelfand;Michael Dixon
We show that a class of optical physical unclonable functions (PUFs) can be efficiently PAC-learned to arbitrary precision with arbitrarily high probability, even in the presence of intentionally injected noise, given access to polynomially many challenge-response pairs, under mild and practical assumptions about the distributions of the noise and challenge vectors. We motivate our analysis by identifying similarities between the integrated version of Pappu’s original optical PUF design and the post-quantum Learning with Errors (LWE) cryptosystem. We derive polynomial bounds for the required number of samples and the computational complexity of a linear regression algorithm, based on size parameters of the PUF, the distributions of the challenge and noise vectors, and the desired accuracy and probability of success of the regression algorithm. We use a similar analysis to that done by Bootle et al. [“LWE without modular reduction and improved side-channel attacks against BLISS,” in Advances in Cryptology – ASIACRYPT 2018], who demonstrated a learning attack on poorly implemented versions of LWE cryptosystems. This extends the results of Rührmair et al. [“Optical PUFs reloaded,” Cryptology ePrint Archive, 2013], who presented a theoretical framework showing that a subset of this class of PUFs is learnable in polynomial time in the absence of injected noise, under the assumption that the optics of the PUF were either linear or had negligible nonlinear effects. (Rührmair et al. also included an experimental validation of this technique, which of course included measurement uncertainty, demonstrating robustness to the presence of natural noise.) We recommend that the design of strong PUFs should be treated as a cryptographic engineering problem in physics, as PUF designs would benefit greatly from basing their physics and security on standard cryptographic assumptions. 
Finally, we identify future research directions, including suggestions for how to modify an LWE-based optical PUF design to better defend against cryptanalytic attacks.
{"title":"Learnability of Optical Physical Unclonable Functions Through the Lens of Learning With Errors","authors":"Apollo Albright;Boris Gelfand;Michael Dixon","doi":"10.1109/TIFS.2024.3518065","DOIUrl":"10.1109/TIFS.2024.3518065","url":null,"abstract":"We show that a class of optical physical unclonable functions (PUFs) can be efficiently PAC-learned to arbitrary precision with arbitrarily high probability, even in the presence of intentionally injected noise, given access to polynomially many challenge-response pairs, under mild and practical assumptions about the distributions of the noise and challenge vectors. We motivate our analysis by identifying similarities between the integrated version of Pappu’s original optical PUF design and the post-quantum Learning with Errors (LWE) cryptosystem. We derive polynomial bounds for the required number of samples and the computational complexity of a linear regression algorithm, based on size parameters of the PUF, the distributions of the challenge and noise vectors, and the desired accuracy and probability of success of the regression algorithm. We use a similar analysis to that done by Bootle et al. [“LWE without modular reduction and improved side-channel attacks against BLISS,” in Advances in Cryptology – ASIACRYPT 2018], who demonstrated a learning attack on poorly implemented versions of LWE cryptosystems. This extends the results of Rührmair et al. [“Optical PUFs reloaded,” Cryptology ePrint Archive, 2013], who presented a theoretical framework showing that a subset of this class of PUFs is learnable in polynomial time in the absence of injected noise, under the assumption that the optics of the PUF were either linear or had negligible nonlinear effects. (Rührmair et al. also included an experimental validation of this technique, which of course included measurement uncertainty, demonstrating robustness to the presence of natural noise.) 
We recommend that the design of strong PUFs should be treated as a cryptographic engineering problem in physics, as PUF designs would benefit greatly from basing their physics and security on standard cryptographic assumptions. Finally, we identify future research directions, including suggestions for how to modify an LWE-based optical PUF design to better defend against cryptanalytic attacks.","PeriodicalId":13492,"journal":{"name":"IEEE Transactions on Information Forensics and Security","volume":"20 ","pages":"886-897"},"PeriodicalIF":6.3,"publicationDate":"2024-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10802998","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142832519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
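The core of the learning attack discussed above is ordinary linear regression over challenge-response pairs: when the device behaves as a noisy linear map r = Wc + e (the LWE-without-modular-reduction setting), least squares recovers W to high precision given polynomially many samples. The sketch below is a minimal synthetic illustration of that principle, with made-up dimensions and noise level, not the paper's actual attack pipeline or a model of real PUF optics.

```python
import numpy as np

rng = np.random.default_rng(42)
n, m, num_samples = 16, 4, 2000   # challenge dim, response dim, # of CRPs

# Hidden linear map standing in for the PUF transfer matrix / LWE secret.
W_true = rng.normal(size=(m, n))

# Polynomially many challenge-response pairs with injected noise.
C = rng.normal(size=(num_samples, n))             # challenge vectors
noise = 0.05 * rng.normal(size=(num_samples, m))  # injected noise e
R = C @ W_true.T + noise                          # noisy responses r = Wc + e

# Model the device by least squares over the observed CRPs.
W_est, *_ = np.linalg.lstsq(C, R, rcond=None)
W_est = W_est.T

max_err = np.abs(W_est - W_true).max()
print(f"max coefficient error: {max_err:.4f}")
```

Because the noise is averaged over all samples, the coefficient error shrinks on the order of the noise level divided by the square root of the sample count, which is why injected noise alone cannot prevent PAC learning of such a linear structure; a hard cryptographic assumption (e.g., genuine modular reduction as in LWE) is needed instead.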