Pub Date : 2025-12-25 DOI: 10.1109/TIFS.2025.3648567
Yufeng Zhang;Xilai Wang;Wenwei Song;Wenxiong Kang
Dynamic hand gesture authentication (DHGA) has emerged as a promising biometric technology, offering enhanced theoretical security over conventional unimodal systems by combining both physiological and behavioral characteristics. Existing DHGA research predominantly focuses on controlled lab conditions and therefore generalizes poorly to uncontrolled application conditions. To bridge this gap, we propose a novel Skeleton-assistant Standardization and Authentication Framework (SSAF) that incorporates a generic data preprocessing method before authentication. First, we introduce a Geometry-Environment Standardization (GE-Stan) method to standardize five primary geometric and environmental factors that induce data distribution discrepancies, significantly improving robustness across different sessions and scenarios. Notably, the GE-Stan method can be applied to most existing algorithms and brings substantial improvement. Second, we design an Appearance and Motion Network (AM-Net) to fully leverage standardized video and skeleton data. It decouples appearance and motion features using specialized representation and processing strategies. As a result, our SSAF achieves a flexible balance between accuracy and efficiency, enabling up to a $3.6\times$ efficiency boost with only minor accuracy trade-offs. Finally, to support real-world evaluation, we also contribute a new challenging dataset, SCUT-RealDHGA, captured under uncontrolled practical conditions with diverse backgrounds and illuminations. Extensive experiments across three DHGA datasets demonstrate that SSAF outperforms existing methods in terms of accuracy, efficiency, and robustness. The code and dataset are available at https://github.com/SCUT-BIP-Lab/SSAF
Title: Enhancing Perceptron Constancy for Real-World Dynamic Hand Gesture Authentication
Journal: IEEE Transactions on Information Forensics and Security, vol. 21, pp. 886-899
In recent years, with the widespread application of Vision Transformer (ViT) in visual trackers, their robustness has received increasing attention. However, by focusing on global interactions between image patches, ViT reduces sensitivity to local noise, posing new challenges for adversarial attacks. Meanwhile, existing decision-based adversarial attack methods often overlook the differences in noise sensitivity between patches, further limiting the compression efficiency of adversarial noise, especially in ViT. In visual tracking, existing adversarial attack methods primarily target Siamese network-based trackers, and research on adversarial attacks against Transformer-based trackers, particularly decision-based black-box attacks, is still relatively limited. To implement effective black-box attacks on Transformer-based trackers, this paper proposes patch-based adversarial noise compression (PANC), a decision-based adversarial attack method. This method compresses adversarial noise patch by patch, significantly improving compression efficiency and attack concealment. PANC also introduces a noise sensitivity matrix that dynamically adds and reduces adversarial noise, optimizing the spatial distribution of noise while decreasing the number of queries. We validated the proposed PANC attack method on several Transformer-based trackers, including OSTrack, STARK, TransT, and MixformerV2, and on three public large-scale benchmark datasets: GOT-10k, TrackingNet, and LaSOT. Experimental results show that, compared to the existing state-of-the-art adversarial attack method, the IoU attack, PANC compresses the noise level to 10% and improves attack effectiveness by 162% while using only 45.7% of the queries. Furthermore, PANC can serve as an initialization or post-processing optimization strategy for other adversarial attack methods, providing a more flexible and efficient mechanism for adversarial example generation. Our work reveals the vulnerabilities of existing Transformer-based visual trackers and offers new ideas for further improving the efficiency and concealment of adversarial attacks.
Title: Towards Patch-Based Noise Compression for Adversarial Attack Against Transformer-Based Visual Tracking
Authors: Peng Gao;Long Xu;Wen-Jia Tang;Fei Wang;Hamido Fujita;Hanan Aljuaid;Ru-Yue Yuan
Journal: IEEE Transactions on Information Forensics and Security, vol. 21, pp. 854-869
Pub Date : 2025-12-25 DOI: 10.1109/TIFS.2025.3648551
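The patch-by-patch compression idea in the PANC abstract above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function name `compress_noise_patchwise`, the shrink factor, the boolean `sensitive` mask (a crude stand-in for the noise sensitivity matrix, whose actual definition the abstract does not give), and the `toy_oracle` that replaces a real tracker. It is not the authors' algorithm, only a minimal greedy variant of decision-based noise compression.

```python
import numpy as np

def compress_noise_patchwise(noise, is_adversarial, patch=4, shrink=0.5, rounds=3):
    """Greedily shrink adversarial noise one patch at a time.

    noise          : (H, W) perturbation that already makes the attack succeed.
    is_adversarial : hypothetical decision oracle, noise -> bool (one query per call).
    Returns the compressed noise and the number of oracle queries used.
    """
    noise = noise.astype(float).copy()
    h, w = noise.shape
    rows, cols = (h + patch - 1) // patch, (w + patch - 1) // patch
    # Patches where shrinking has already broken the attack are skipped later,
    # which saves queries. This mask is an illustrative simplification.
    sensitive = np.zeros((rows, cols), dtype=bool)
    queries = 0
    for _ in range(rounds):
        for i in range(rows):
            for j in range(cols):
                block = (slice(i * patch, (i + 1) * patch),
                         slice(j * patch, (j + 1) * patch))
                if sensitive[i, j] or not np.any(noise[block]):
                    continue
                saved = noise[block].copy()
                noise[block] = saved * shrink      # tentative compression
                queries += 1
                if not is_adversarial(noise):      # attack broke: undo and remember
                    noise[block] = saved
                    sensitive[i, j] = True
    return noise, queries

def toy_oracle(n):
    # Stand-in for "the tracker still fails": only the top-left patch matters.
    return n[:4, :4].sum() >= 8.0

start = np.ones((8, 8))
compressed, used = compress_noise_patchwise(start, toy_oracle)
print(np.linalg.norm(compressed) / np.linalg.norm(start), used)
```

In this toy setting the noise survives only where the oracle depends on it, and the skip mask avoids re-querying a patch that has already proved sensitive.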
Title: Coffer: An Efficient and Scalable TEE on RISC-V
Authors: Mingde Ren, Jiatong Chen, Ziquan Wang, Fengwei Zhang, Zhenyu Ning, Heming Cui
Journal: IEEE Transactions on Information Forensics and Security, volume "126 1"
Pub Date : 2025-12-25 DOI: 10.1109/tifs.2025.3648190
Pub Date : 2025-12-24 DOI: 10.1109/TIFS.2025.3648202
Nanxing Meng;Qizao Wang;Bin Li;Xiangyang Xue
With rich temporal-spatial information, video-based person re-identification methods have shown broad prospects. Although tracklets can be easily obtained with ready-made tracking models, annotating identities is still expensive and impractical. Therefore, some video-based methods propose using only a few identity annotations or camera labels to facilitate feature learning. They also simply average the frame features of each tracklet, overlooking unexpected variations and inherent identity consistency within tracklets. In this paper, we propose the Self-Supervised Refined Clustering (SSR-C) framework without relying on any annotation or auxiliary information to promote unsupervised video person re-identification. Specifically, we first propose the Noise-Filtered Tracklet Partition (NFTP) module to reduce the feature bias of tracklets caused by noisy tracking results, and sequentially partition the noise-filtered tracklets into “sub-tracklets”. Then, we cluster and further merge sub-tracklets using the self-supervised signal from the tracklet partition, which is enhanced through a progressive strategy to generate reliable pseudo labels, facilitating intra-class cross-tracklet aggregation. Moreover, we propose the Class Smoothing Classification (CSC) loss to efficiently promote model learning. Extensive experiments on the MARS and DukeMTMC-VideoReID datasets demonstrate that our proposed SSR-C for unsupervised video person re-identification achieves state-of-the-art results and is comparable to advanced supervised methods. The code is available at https://github.com/Darylmeng/SSRC-Reid
Title: Unleashing the Potential of Tracklets for Unsupervised Video Person Re-Identification
Journal: IEEE Transactions on Information Forensics and Security, vol. 21, pp. 431-445
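The tracklet handling described in the SSR-C abstract above (filter noisy frames, then split the remainder sequentially into sub-tracklets instead of averaging the whole tracklet) can be sketched as follows. The abstract does not state how NFTP identifies noisy frames, so the cosine-similarity-to-mean criterion, the `keep_ratio` parameter, and the function name `filter_and_partition` are assumptions made purely for illustration. Frame features are assumed to be non-zero vectors.

```python
import numpy as np

def filter_and_partition(frames, num_parts=2, keep_ratio=0.8):
    """Drop outlier frames, then split the rest into ordered sub-tracklets.

    frames : (T, D) per-frame feature vectors of one tracklet, in temporal order.
    Returns the kept frame indices and one mean feature per sub-tracklet.
    """
    frames = np.asarray(frames, dtype=float)
    unit = frames / np.linalg.norm(frames, axis=1, keepdims=True)
    center = unit.mean(axis=0)
    sims = unit @ center                      # higher = closer to the tracklet's mean direction
    keep = max(num_parts, int(round(keep_ratio * len(frames))))
    kept_idx = np.sort(np.argsort(-sims)[:keep])   # most similar frames, temporal order restored
    parts = np.array_split(frames[kept_idx], num_parts)
    return kept_idx, [p.mean(axis=0) for p in parts]

# Five frames; the third one looks like a tracking error (different direction).
tracklet = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [1.0, -0.1], [1.0, 0.0]]
kept_idx, sub_feats = filter_and_partition(tracklet)
print(kept_idx, sub_feats)
```

Because sub-tracklets cut from the same tracklet are known to share an identity, they provide a label-free signal that a clustering step could use to decide which clusters to merge.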
Pub Date : 2025-12-24 DOI: 10.1109/tifs.2025.3646493
Lihai Nie, Xiaodong Dong, Lili Shi, Laiping Zhao, Zheli Liu
Title: Stinger: A Light-weight Website Fingerprinting Defense through Poisoning Packet Sequences
Journal: IEEE Transactions on Information Forensics and Security, volume "4 1"
Decentralized storage auditing approaches are designed to ensure data security when decentralized storage providers may be dishonest. However, the need for data updates introduces new challenges to the design of decentralized storage auditing approaches. Existing approaches can support dynamic auditing for updated files. Unfortunately, they can only deal with block-level updating, which is counter-intuitive and requires conversion from semantic changes to binary changes. Furthermore, existing dynamic auditing approaches require data owners to recalculate auxiliary auditing information (e.g., auditing authenticators), which imposes unnecessary additional burdens on data owners, particularly those with constrained resources in decentralized storage environments. In this paper, we focus on image files and propose $\textsf{iAudit}$, an efficient pixel-level dynamic image auditing approach in decentralized storage. We first design a novel image authenticator with image pixels for efficient dynamic auditing, which combines convolution operations and polynomial commitment in authenticator construction. Additionally, we build an owner-free dynamic mechanism for dynamic decentralized storage auditing by utilizing zero-knowledge proof techniques. In this way, the dynamic operation overheads incurred by auditing can be completely eliminated from the data owners. A prototype of $\textsf{iAudit}$ is implemented, and extensive experimental results demonstrate that $\textsf{iAudit}$ outperforms state-of-the-art works, achieving over a $210\times$ speedup for the data owner in the dynamic update phase.
Title: iAudit: Toward Efficient Pixel-Level Dynamic Image Auditing in Decentralized Storage
Authors: Haiyang Yu;Yinglong Gao;Shen Su;Zhen Yang;Yuwen Chen;Shui Yu
Journal: IEEE Transactions on Information Forensics and Security, vol. 21, pp. 913-928
Pub Date : 2025-12-24 DOI: 10.1109/TIFS.2025.3648191
Pub Date : 2025-12-24 DOI: 10.1109/tifs.2025.3648158
Shijie Rao, Yidong Huang, Xueqian Zhang, Hao Fang, Ajian Liu, Jun Wan, Kaiyu Cui, Yali Li
Title: HySpeFAS: A Hyperspectral Face Anti-spoofing Dataset based on Snapshot Compressive Imaging
Journal: IEEE Transactions on Information Forensics and Security, volume "45 1"
Pub Date : 2025-12-24 DOI: 10.1109/TIFS.2025.3648161
Yinbin Miao;Xin Wang;Guijuan Wang;Yibing Wang;Kaifa Zheng;Xinghua Li;Zhiquan Liu;Robert H. Deng
With the widespread use of encrypted spatial data, many range query schemes have emerged to address potential security risks caused by access pattern leakage. However, most existing schemes rely on a dual-server model to hide access patterns and often involve complex spatial relation judgments during range comparisons, leading to low query efficiency. To address these issues, we propose a novel Fast and Access Hidden Range Query (FAHRQ) scheme. First, we introduce an efficient range membership verification technique based on Bloom filters and a Lagrange interpolation function, combine it with homomorphic encryption to ensure the confidentiality of spatial data and the computational flexibility of related operations, and achieve access pattern hiding with a single server. Then, we construct an index using an R-tree and employ Bloom filters and prefix 0-1 encoding to accelerate the minimum bounding rectangle intersection judgment, enabling secure and efficient range queries over encrypted spatial data while maintaining retrieval accuracy. Finally, we give a formal security analysis to show that our scheme hides access patterns while protecting data security, and conduct extensive experiments to demonstrate that our scheme improves query efficiency by $5$-$7\times$ compared to existing schemes.
Title: Search Me in the Dark: Access Pattern-Hidden Range Query Over Encrypted Spatial Data
Journal: IEEE Transactions on Information Forensics and Security, vol. 21, pp. 784-797
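The FAHRQ abstract above mentions range membership verification and prefix encoding. As background, the classical prefix membership test reduces "is v in [lo, hi]?" to a set intersection, which is what makes it compatible with set structures such as Bloom filters. The sketch below shows only that plaintext reduction with ordinary Python sets. The helper names (`prefix_family`, `range_cover`, `in_range`) are hypothetical, and the paper's prefix 0-1 encoding, Bloom filters, Lagrange interpolation, and encryption layer are not reproduced here. It assumes `0 <= lo <= hi < 2**bits`.

```python
def prefix_family(value, bits):
    """All prefixes of a bits-wide value, from the exact string to all wildcards."""
    s = format(value, f"0{bits}b")
    return {s[:k] + "*" * (bits - k) for k in range(bits + 1)}

def range_cover(lo, hi, bits):
    """Minimal set of prefixes whose union is exactly the interval [lo, hi]."""
    cover = set()

    def walk(prefix, node_lo, node_hi):
        if node_hi < lo or node_lo > hi:
            return                                  # subtree lies outside the range
        if lo <= node_lo and node_hi <= hi:
            cover.add(prefix + "*" * (bits - len(prefix)))
            return                                  # subtree lies fully inside
        mid = (node_lo + node_hi) // 2
        walk(prefix + "0", node_lo, mid)
        walk(prefix + "1", mid + 1, node_hi)

    walk("", 0, (1 << bits) - 1)
    return cover

def in_range(value, lo, hi, bits=4):
    # value is in [lo, hi] exactly when its prefix family meets the range cover.
    return bool(prefix_family(value, bits) & range_cover(lo, hi, bits))

print(sorted(range_cover(6, 14, 4)))
print(in_range(9, 6, 14), in_range(15, 6, 14))
```

In a Bloom-filter-based design, one of the two prefix sets would typically be inserted into the filter and the other probed against it, at the cost of a tunable false-positive rate.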
Machine learning (ML) is highly effective for accurate encrypted malicious traffic identification when high-quality training data are available. In practice, obtaining such data is costly and challenging. As a result, many ML-based models are inevitably trained on low-quality data and perform poorly. To enhance performance, some methods utilize various sample selection techniques to choose confident samples for model training. However, they often rely on a single metric for this selection, which restricts their adaptability across diverse datasets and noise conditions. In this paper, we propose BAPTISM, a robust framework for identifying encrypted malicious traffic with low-quality training data. Specifically, BAPTISM selects a suitable base model for each task and trains it with early stopping to generate traffic representations before overfitting occurs. Then, we devise an adaptive metric selection strategy to select confident samples. By employing two metrics (JSD and CSD) to assess the characteristics of the traffic representations from distinct perspectives, we find the more suitable metric for each class and apply it for confident sample selection. Based on the confident samples and the selected metric for each class, we develop a label correction tactic that adapts to the nature of each class to improve the quality of the training data. Finally, we employ a parallel training strategy to train the base model with the corrected data, further mitigating the impact of low-quality data. We conduct experiments across three real-world malicious traffic datasets with various noise settings. The results demonstrate that BAPTISM is compatible with different base models and outperforms competing methods across noise ratios ranging from 20% to 90%. Meanwhile, BAPTISM consistently selects the confident samples with the highest purity and volume under each setting.
Title: BAPTISM: A Robust Framework for Encrypted Malicious Traffic Identification With Low-Quality Training Data
Authors: Xiang Luo;Chang Liu;Gang Xiong;Gaopeng Gou;Zhen Li;Junzheng Shi;Li Guo;Binxing Fang
Journal: IEEE Transactions on Information Forensics and Security, vol. 21, pp. 960-975
Pub Date : 2025-12-24 DOI: 10.1109/TIFS.2025.3648170
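The confident-sample selection step in the BAPTISM abstract above can be illustrated with a small sketch. The abstract does not expand "JSD" or "CSD". The code assumes JSD means the Jensen-Shannon divergence between a model's predicted class distribution and the one-hot (possibly noisy) label, which is a common choice in noisy-label learning but is not confirmed by the source. The per-class mean threshold and the function names are likewise illustrative assumptions, and CSD is not modelled at all.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Row-wise Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint)."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    m = 0.5 * (p + q)

    def kl(a, b):
        return np.sum(a * np.log2(a / b), axis=-1)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def select_confident(probs, labels, num_classes):
    """Mark a sample confident if its divergence is at most its class mean.

    probs  : (N, C) predicted class probabilities from an early-stopped model.
    labels : (N,) given labels, some of which may be wrong.
    """
    labels = np.asarray(labels)
    onehot = np.eye(num_classes)[labels]
    div = js_divergence(np.asarray(probs, dtype=float), onehot)
    keep = np.zeros(len(labels), dtype=bool)
    for c in range(num_classes):
        members = labels == c
        if members.any():
            keep[members] = div[members] <= div[members].mean()  # assumed threshold
    return div, keep

probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.1, 0.9]]
labels = [0, 0, 0, 1]          # the second sample's label disagrees with the model
div, keep = select_confident(probs, labels, num_classes=2)
print(np.round(div, 3), keep)
```

A per-class threshold is used here because a single global threshold tends to starve classes that the model fits slowly. Which metric and threshold suit each class is exactly the choice the abstract says BAPTISM makes adaptively.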