Pub Date: 2026-02-24 | DOI: 10.1109/TIFS.2026.3667459
Guoyuan Lin;Weiqi Luo;Peijia Zheng;Jiwu Huang
The increasing use of Voice over Internet Protocol (VoIP) technology in telecom fraud has become a serious global concern. Its ability to spoof caller IDs and IP addresses, together with the use of overseas or anonymized servers, makes VoIP-based scams difficult to trace and regulate. As a result, distinguishing VoIP calls from conventional mobile phone calls based on voice signal characteristics is crucial for enhancing anti-fraud measures. However, existing forensic techniques often struggle to accurately identify speech transmitted via VoIP. To address this challenge, we propose a dual-level 1D-CNN that leverages both frame and utterance features for effective VoIP detection. After evaluating a range of acoustic features, we primarily focus on short-frame Mel-Frequency Cepstral Coefficients (MFCCs) due to their effectiveness in capturing VoIP characteristics. Given the frame-based processing and transmission nature of VoIP, we employ a 1D-CNN, rather than the more commonly used 2D-CNN that treats spectrograms as images, to extract frame-level codec features. Finally, we propose a dual-level classification strategy: the frame-level classifier captures encoding discrepancies within individual frames, while the utterance-level classifier aggregates these frame-level features to learn global encoding patterns through global covariance pooling. Experimental results on the VoIP Phone Call Identification Database (VPCID) demonstrate that the proposed method consistently outperforms existing approaches, delivering superior accuracy and robustness across a wide range of challenging scenarios. Moreover, comprehensive ablation studies validate the effectiveness and rationale behind the design of the proposed model architecture.
Title: VoIP Call Identification via a Dual-Level 1D-CNN With Frame and Utterance Features
IEEE Transactions on Information Forensics and Security, vol. 21, pp. 2389-2402
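The utterance-level aggregation described above, global covariance pooling over frame-level features, can be sketched in a few lines of NumPy. This is a minimal illustration only; the 20-dimensional frame features and the upper-triangle flattening are assumptions, not the paper's exact configuration.

```python
import numpy as np

def covariance_pool(frame_feats):
    """Aggregate frame-level features (T x D) into one utterance-level
    descriptor via global covariance pooling."""
    centered = frame_feats - frame_feats.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (len(frame_feats) - 1)   # (D, D) covariance
    iu = np.triu_indices(cov.shape[0])                     # covariance is symmetric
    return cov[iu]                                         # keep upper triangle only

rng = np.random.default_rng(0)
frames = rng.normal(size=(100, 20))   # 100 frames of 20-dim frame-level features
desc = covariance_pool(frames)
print(desc.shape)  # (210,) = 20*21/2 upper-triangular entries
```

The pooled descriptor captures second-order statistics across the whole utterance, which is what lets a downstream classifier learn global encoding patterns rather than per-frame ones.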
Pub Date: 2026-02-24 | DOI: 10.1109/tifs.2026.3667457
Zhihao Liu, Guanghua Liu, Jia Zhang, Chenlong Wang, Tao Jiang
Title: A Novel Perspective on Gradient Defense: Layer-Specific Protection Against Privacy Leakage
IEEE Transactions on Information Forensics and Security
Title: GUARD: A Unified Open-Set and Closed-Set Gait Recognition Framework via Feature Reconstruction on Wi-Fi CSI
Ying Liang, Wenjie Wu, Haobo Li, Lijun Cui, Jianguo Ju, Pengfei Xu
DOI: 10.1109/tifs.2026.3667485
IEEE Transactions on Information Forensics and Security
Pub Date: 2026-02-23 | DOI: 10.1109/TIFS.2026.3666908
Xun Ma;Xinchen Lyu;Chenshan Ren;Guoshun Nan;Qimei Cui
Online model training is pivotal for enabling multiuser semantic communication systems to adapt to dynamic channel conditions. However, conventional frameworks suffer from prohibitive communication overhead and vulnerabilities to privacy attacks, hindering practical deployment. This paper proposes semantic information mixup (SIMix), a secure and efficient training framework that integrates Over-the-Air Mixup (OAM) with label-aware user grouping to jointly optimize spectral efficiency and semantic security. The OAM mixes semantic features of multiple users via wireless channels, inherently obfuscating sensitive data while reducing communication overhead. A closed-form Tx-Rx scaling optimization minimizes the mean square error (MSE) of over-the-air computation under channel noise, ensuring stable convergence in low-SNR regimes. Furthermore, an extended max-clique algorithm dynamically partitions users into groups with minimal intra-label similarity, reducing model inversion attack success rates. Experiments on CIFAR-10 and Tiny ImageNet demonstrate that the proposed approach is superior in terms of communication efficiency and security: it reduces communication overhead by up to 25%, limits reconstructions to 17.58 dB PSNR (a 20.98 dB reduction) under model inversion attacks, and lowers the attack success rate by 13.44% under label inference attacks, while achieving comparable transmission accuracy.
Title: Secure and Efficient Model Training Framework for Multiuser Semantic Communications via Over-the-Air Mixup
IEEE Transactions on Information Forensics and Security, vol. 21, pp. 2358-2372
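The over-the-air mixup idea, several users' semantic feature vectors superimposed on the channel so no single user's features are exposed, can be sketched as follows. The Dirichlet mixing weights and the simple SNR-based noise model are assumptions for illustration, not the paper's closed-form Tx-Rx scaling.

```python
import numpy as np

rng = np.random.default_rng(1)

def over_the_air_mixup(features, snr_db=10.0):
    """Mix several users' semantic feature vectors on the channel: the
    receiver observes a weighted superposition plus Gaussian channel noise."""
    lam = rng.dirichlet(np.ones(len(features)))          # mixing weights, sum to 1
    mixed = sum(w * f for w, f in zip(lam, features))    # over-the-air superposition
    sig_pow = np.mean(mixed ** 2)
    noise_std = np.sqrt(sig_pow / 10 ** (snr_db / 10))   # noise power from target SNR
    return mixed + rng.normal(scale=noise_std, size=mixed.shape), lam

users = [rng.normal(size=64) for _ in range(4)]          # 4 users' 64-dim features
mixed, lam = over_the_air_mixup(users)
print(mixed.shape)  # (64,)
```

Because the receiver only ever sees the noisy mixture, an attacker inverting the model must untangle all users' contributions at once, which is the source of the obfuscation the abstract describes.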
Pub Date: 2026-02-23 | DOI: 10.1109/TIFS.2026.3666910
Zhiqiang Zhang;Youwen Zhu;Xiaodong Yang;Xiaohui Ding;Changhee Hahn;Jian Wang;Junbeom Hur
Attribute-Based Encryption (ABE) enables fine-grained access control over outsourced data, but its key generation process typically requires users to disclose their complete attribute sets, introducing significant privacy risks. Existing privacy-preserving approaches—such as those based on zero-knowledge proofs or tightly coupled interactive protocols—suffer from limited scalability, high communication costs, and insufficient support for selective attribute disclosure. To address these limitations, we propose a privacy-enhancing key generation protocol guided by the principle of Minimal Disclosure, which ensures that users disclose only the minimally necessary subset of attributes required for authorization. Our protocol decouples attribute verification from key issuance: users first obtain cryptographically verifiable attribute tokens, and later issue blinded key requests over selectively chosen attributes. This design enables selective disclosure, supports reusable attribute credentials, and enhances user autonomy. To improve scalability, we introduce a lightweight batch verification mechanism that reduces computation and communication overhead for the attribute authority. We prove that our protocol achieves the binding and hiding properties under standard cryptographic assumptions, and we formally verify these guarantees in the symbolic model using the ProVerif tool. In addition, we propose two privacy metrics—Attribute Inference Gain (AIG) and Privacy Gain (PG)—alongside an entropy-based analysis to quantify resistance against attribute inference attacks. Experimental results show that our scheme effectively mitigates inference leakage while offering substantial efficiency gains compared to existing schemes.
Title: Decoupled and Privacy-Preserving Key Generation in ABE Under the Minimal Disclosure Principle
IEEE Transactions on Information Forensics and Security, vol. 21, pp. 2478-2491
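The Minimal Disclosure principle itself can be illustrated with a toy policy matcher: given an OR-of-ANDs access policy, the user discloses only the smallest clause they can satisfy. This is an illustration of the principle only; the actual protocol involves verifiable attribute tokens and blinded key requests, none of which appear here, and the attribute names are invented.

```python
def minimal_disclosure(user_attrs, policy_clauses):
    """For an OR-of-ANDs access policy, disclose only the smallest clause
    (AND of attributes) the user can satisfy; everything else stays hidden."""
    held = set(user_attrs)
    satisfiable = [set(c) for c in policy_clauses if set(c) <= held]
    return min(satisfiable, key=len) if satisfiable else None

disclosed = minimal_disclosure(
    user_attrs={"doctor", "cardiology", "senior", "hospital-A"},
    policy_clauses=[("doctor", "hospital-A"),
                    ("nurse", "senior", "hospital-A")],
)
print(sorted(disclosed))  # ['doctor', 'hospital-A']; 'cardiology' and 'senior' stay private
```

In the paper's setting, the blinded key request would then be issued only over this minimal subset, which is what bounds the attribute authority's inference gain.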
Pub Date: 2026-02-23 | DOI: 10.1109/TIFS.2026.3666853
Ke Cheng;Jixin Zhang;Haiyun Li;Zipeng Zhong;Mingwu Zhang;Zheng Qin
Face recognition (FR) brings convenience to people’s lives while also posing security risks. Some malicious users employ FR attacks to impersonate the identity of a target. To reveal the security risks, recent work has attacked black-box FR models by utilizing substitute models to generate adversarial face images that are misclassified as the target individual, relying on the attack transferability of substitute models. However, the substitute models cannot accurately approximate the target model, which leads to a lower FR attack success rate and degraded adversarial face image quality. To address this issue, we propose PPOM-Attack, a substitute model-free Perturbation Prediction and Optimization Method for black-box adversarial Attack against face recognition. PPOM-Attack directly obtains feedback from the target model instead of using substitute models, thereby avoiding any discrepancy with the attack objective. To achieve this goal, we design a proximal policy optimization (PPO)-based agent to predict the perturbation regions in the face image and self-adaptively disturb the regions. To maintain high-quality adversarial face images, we further propose a minimum brightness offsets method specifically designed to generate perturbations that minimize the feature embedding difference between the adversarial and targeted face images. The experimental results show that our approach outperforms state-of-the-art FR attack methods by an average of 21.7% in terms of attack success rate, while achieving better image quality on seven FR models.
Title: PPOM-Attack: A Substitute Model-Free Perturbation Prediction and Optimization Method for Black-Box Adversarial Attack Against Face Recognition
IEEE Transactions on Information Forensics and Security, vol. 21, pp. 2580-2595
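The core optimization, nudging brightness inside a selected region so the adversarial embedding approaches the target identity's embedding, can be sketched with a toy linear embedder and a greedy search standing in for the PPO agent. Everything here (the embedder `W`, the step size, the fixed region) is an illustrative assumption, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.normal(size=(16, 64))            # toy linear "face embedder" (assumption)

def embed(x):
    v = W @ x
    return v / np.linalg.norm(v)         # unit-norm embedding, as FR models use

def brightness_offset_attack(x, target, region, steps=50, delta=0.02):
    """Greedily apply small brightness offsets inside a chosen region so the
    adversarial embedding moves toward the target identity's embedding."""
    adv, t = x.copy(), embed(target)
    for _ in range(steps):
        best, best_d = None, np.linalg.norm(embed(adv) - t)
        for i in region:
            for s in (+delta, -delta):
                cand = adv.copy()
                cand[i] = np.clip(cand[i] + s, 0.0, 1.0)   # keep valid pixel range
                d = np.linalg.norm(embed(cand) - t)
                if d < best_d:
                    best, best_d = cand, d
        if best is None:                  # no single offset improves any more
            break
        adv = best
    return adv

x, target = rng.uniform(size=64), rng.uniform(size=64)
region = list(range(16))                  # region a PPO agent would predict
adv = brightness_offset_attack(x, target, region)
d0 = np.linalg.norm(embed(x) - embed(target))
d1 = np.linalg.norm(embed(adv) - embed(target))
print(d1 <= d0)  # True: embedding moved toward the target
```

Note the perturbation is confined to the predicted region, which is what keeps the rest of the image (and hence perceived quality) untouched.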
Pub Date: 2026-02-19 | DOI: 10.1109/TIFS.2026.3666295
Lei Zhou;Youwen Zhu;Rongke Liu
In response to emerging regulations on the “right to be forgotten”, federated unlearning (FU) has been proposed to ensure privacy compliance by efficiently eliminating the influence of specific data from federated learning (FL) models. However, existing FU studies primarily focus on improving unlearning efficiency, with little attention given to the potential privacy risks introduced by FU itself. To bridge this research gap, we propose a novel federated unlearning inversion attack (FUIA) to expose potential privacy leakage in FU. This work represents the first systematic study on the privacy vulnerabilities inherent in FU. FUIA can be applied to three major FU scenarios: sample unlearning, client unlearning, and class unlearning, demonstrating broad applicability and threat potential. Specifically, the server, acting as an honest-but-curious attacker, continuously records model parameter changes throughout the unlearning process and analyzes the differences before and after unlearning to infer the gradient information of forgotten data, enabling the reconstruction of its features or labels. FUIA directly undermines the goal of FU to eliminate the influence of specific data, exploiting vulnerabilities in the FU process to reconstruct forgotten data, thereby revealing flaws in privacy protection. Moreover, we explore two potential defense strategies that introduce a trade-off between privacy protection and model performance. Extensive experiments on multiple benchmark datasets and various FU methods demonstrate that FUIA effectively reveals private information of forgotten data.
Title: Model Inversion Attack Against Federated Unlearning
IEEE Transactions on Information Forensics and Security, vol. 21, pp. 2342-2357
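The key observation, that parameter differences around an (un)learning update leak gradient information about the forgotten data, can be demonstrated on a toy linear classifier. The sketch below uses the well-known fact that the cross-entropy gradient on the logits is negative only at the true class; it is a minimal illustration, not the paper's full reconstruction pipeline.

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy linear classifier takes one SGD step on the "forgotten" sample,
# mimicking the parameter change an unlearning update would undo.
W, b = rng.normal(size=(5, 8)), np.zeros(5)
x, y_true, lr = rng.normal(size=8), 3, 0.1

p = softmax(W @ x + b)
grad_logits = p.copy()
grad_logits[y_true] -= 1.0                 # dL/dlogits = p - y for cross-entropy
b_new = b - lr * grad_logits               # the SGD update on the bias

# The attacker observes only the before/after parameter difference.
# Since the bias gradient is negative exactly at the true class, the single
# positive bias change exposes the forgotten sample's label.
inferred = int(np.argmax(b_new - b))
print(inferred)  # 3
```

Scaling this idea from one bias vector to full model parameters is what lets an honest-but-curious server reconstruct features as well as labels.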
Pub Date: 2026-02-19 | DOI: 10.1109/TIFS.2026.3666459
Senming Yan;Lei Shi;Wei Wang;Jing Ren;Ying Li;Limin Sun
Guided by the principle of “Never Trust, Always Verify”, Zero Trust Architecture (ZTA) mandates continuous monitoring and analysis of users and entities, highlighting the critical role of behavior analytics. However, the growing volume of audit data and its complex contextual information render many existing behavior analytics methods insufficient. Moreover, most approaches rely on high-quality labeled data for supervised training, limiting their effectiveness against previously unseen malicious behaviors. To address these challenges, we propose the Large Language Model for Behavior Analytics (LLMBA) framework. LLMBA leverages a Large Language Model (LLM) to analyze behavioral patterns of internal users and entities, capitalizing on the LLM’s strong ability to model sequential data. We introduce a multi-level behavior encoding scheme to capture both contextual and temporal information from behavior records, producing rich input representations for the LLM-enhanced model. The LLM is fine-tuned using self-supervised learning, enabling the detection of unknown malicious behaviors. To reduce the computational and storage overhead inherent in LLMs, we apply knowledge distillation to compress the model while maintaining high detection performance. Extensive experiments on the CERT Insider Threat dataset demonstrate that LLMBA outperforms state-of-the-art baselines in detection accuracy. Furthermore, the compressed student model achieves superior performance compared with existing methods under comparable runtime constraints, making LLMBA highly suitable for real-world deployment.
Furthermore, the compressed student model achieves superior performance compared with existing methods under comparable runtime constraints, making LLMBA highly suitable for real-world deployment.
Title: LLMBA: Efficient Behavior Analytics via Large Pretrained Models in Zero Trust Networks
IEEE Transactions on Information Forensics and Security, vol. 21, pp. 2403-2415
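The knowledge-distillation step used to compress the LLM can be illustrated with the standard temperature-softened KL objective. This is a generic Hinton-style sketch; the temperature and the toy logits are assumptions, not values from the paper.

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T     # temperature-soften the logits
    e = np.exp(z - z.max())
    return e / e.sum()

def distill_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) over temperature-softened distributions,
    scaled by T^2 as in standard knowledge distillation."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T * T * float(np.sum(p * (np.log(p) - np.log(q))))

teacher = [4.0, 1.0, 0.2]
print(distill_loss([3.9, 1.1, 0.1], teacher))  # small: student mimics teacher
print(distill_loss([0.0, 4.0, 0.0], teacher))  # larger: distributions disagree
```

Training the student on these softened targets (usually mixed with the hard-label loss) is what lets a much smaller model keep most of the detection performance at a fraction of the runtime cost.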
Pub Date: 2026-02-19 | DOI: 10.1109/TIFS.2026.3666307
Xin Cheng;Hao Wang;Xiangyang Luo;Bin Ma;Baowei Wang;Bin Li;Jinwei Wang
The quantization step is a crucial parameter in the JPEG compression process, and provides prior knowledge for JPEG image steganography and forensics. Existing neural network-based methods typically estimate the quantization steps for all discrete cosine transform (DCT) subbands jointly, by treating the entire quantization table as a unified input and leveraging the inter-subband relationships. However, subband relationships vary across different quantization tables, leading to poor generalization for methods that rely heavily on such relationships. To address the above issues, we depart from the strategy that relies on inter-subband relationships and instead train the model on a specific single subband. To compensate for the possible decrease in accuracy due to the lack of relationships between subbands, we extract the ranking features and histogram features from the DCT coefficient histograms of the subbands. Ranking features capture local patterns in DCT histograms by modeling the relative relationships between neighboring coefficients, thereby compensating for the absence of local detail. On the other hand, histogram features represent the overall distribution pattern of the DCT coefficient histograms and capture the global trends and statistical properties in the subbands. We subsequently employ convolutional groups and multilayer perceptron (MLP) structures to extract compression artifacts from these two features. Finally, we introduce a comprehensive evaluation metric, called GenAQt, to quantify the algorithm’s generalization ability across quantization tables. The experimental results demonstrate that our method maintains high accuracy across quantization tables, with RelGenAQt (relative accuracy decrease) exceeding 81% and AbsGenAQt (absolute accuracy decrease) being less than 0.38.
Title: Rethinking Cross-Table Quantization Step Estimation: From Global and Local Perspectives
IEEE Transactions on Information Forensics and Security, vol. 21, pp. 2326-2341
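The signal the histogram features exploit is simple: after dequantization, a subband's DCT coefficients sit at integer multiples of its quantization step. A minimal estimator built on that divisibility fact looks like the following; it is a classical heuristic for intuition, not the paper's learned single-subband model.

```python
import numpy as np

def estimate_q(dequant_coeffs, q_max=32):
    """Estimate a JPEG subband's quantization step: dequantized DCT
    coefficients are integer multiples of the true step, so the largest
    candidate dividing (almost) all nonzero coefficients wins."""
    c = np.abs(np.round(dequant_coeffs)).astype(int)
    c = c[c > 0]                          # zeros are multiples of every candidate
    candidates = [q for q in range(1, q_max + 1)
                  if np.mean(c % q == 0) > 0.99]
    return max(candidates) if candidates else 1

true_q = 6
multipliers = np.tile(np.arange(-20, 21), 100)   # simulated quantized values
coeffs = true_q * multipliers                    # dequantized subband coefficients
print(estimate_q(coeffs))  # 6
```

Real JPEG pipelines add rounding and truncation noise that smears these histogram peaks, which is why the paper learns ranking and histogram features instead of relying on exact divisibility.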