
Latest Publications in IEEE Signal Processing Letters

MLLM-TA: Leveraging Multimodal Large Language Models for Precise Temporal Video Grounding
IF 3.2 | CAS Tier 2 (Engineering & Technology) | Q2, ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-12-04 | DOI: 10.1109/LSP.2024.3511426
Yi Liu;Haowen Hou;Fei Ma;Shiguang Ni;Fei Richard Yu
In untrimmed video tasks, identifying temporal boundaries in videos is crucial for temporal video grounding. With the emergence of multimodal large language models (MLLMs), recent studies have focused on endowing these models with the capability of temporal perception in untrimmed videos. To address this challenge, in this paper we introduce a multimodal large language model named MLLM-TA with precise temporal perception to obtain temporal attention. Unlike traditional MLLMs, which answer temporal questions with one or two words related to temporal information, we leverage the text description proficiency of MLLMs to acquire video temporal attention through description. Specifically, we design dual temporal-aware generative branches aimed at the visual space of the entire video and the textual space of global descriptions, simultaneously generating mutually supervised, consistent temporal attention and thereby enhancing the video temporal perception capabilities of MLLMs. Finally, we evaluate our approach on both the video grounding and highlight detection tasks on three popular benchmarks: Charades-STA, ActivityNet Captions, and QVHighlights. Extensive results show that MLLM-TA significantly outperforms previous approaches in both zero-shot and supervised settings, achieving state-of-the-art performance.
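The listing gives only the abstract, so the snippet below is a minimal, hypothetical sketch of the mutual-supervision idea it describes: two branches each score the video frames, and a symmetric KL term penalizes disagreement between their temporal-attention distributions. The function name, tensor shapes, and choice of loss are assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def mutual_consistency_loss(att_visual: torch.Tensor,
                            att_textual: torch.Tensor) -> torch.Tensor:
    """Symmetric KL divergence between two frame-level attention maps.

    att_visual, att_textual: (batch, num_frames) unnormalized scores,
    one from the visual branch and one from the description branch.
    """
    p = F.log_softmax(att_visual, dim=-1)
    q = F.log_softmax(att_textual, dim=-1)
    # kl_div expects log-probabilities as input and probabilities as target.
    kl_pq = F.kl_div(p, q.exp(), reduction="batchmean")
    kl_qp = F.kl_div(q, p.exp(), reduction="batchmean")
    return 0.5 * (kl_pq + kl_qp)

# Toy usage: two branches scoring 64 frames for a batch of 2 videos.
loss = mutual_consistency_loss(torch.randn(2, 64), torch.randn(2, 64))
```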
IEEE Signal Processing Letters, vol. 32, pp. 281-285.
Citations: 0
Beyond Diagonal RIS: Key to Next-Generation Integrated Sensing and Communications?
IF 3.2 | CAS Tier 2 (Engineering & Technology) | Q2, ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-12-04 | DOI: 10.1109/LSP.2024.3511395
Tara Esmaeilbeig;Kumar Vijay Mishra;Mojtaba Soltanalian
Reconfigurable intelligent surfaces (RIS) offer unprecedented flexibility for smart wireless channels. Recent research shows that RIS platforms enhance signal quality, coverage, and link capacity in integrated sensing and communication (ISAC) systems. This paper explores the use of fully-connected beyond diagonal RIS (BD-RIS) in ISAC. BD-RIS provides additional degrees of freedom by allowing non-zero off-diagonal elements in the scattering matrix, enhancing functionality and performance. We aim to maximize the weighted sum of the signal-to-noise ratio (SNR) at both the radar receiver and communication users using BD-RIS. Numerical results demonstrate the advantages of BD-RIS in ISAC, significantly improving SNR for both radar and communication users.
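As a rough numerical toy (not the letter's actual optimization), the sketch below contrasts a conventional diagonal RIS, whose scattering matrix holds only per-element phase shifts, with a fully-connected beyond-diagonal RIS, whose scattering matrix is constrained only to be unitary and symmetric. The channel model, element count, and the random feasible BD-RIS point are all assumptions meant to illustrate the extra off-diagonal degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 16                                   # number of RIS elements
h = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)  # Tx -> RIS
g = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)  # RIS -> Rx

# Diagonal RIS: co-phase the cascaded channel (the classical optimum).
theta_diag = np.exp(-1j * (np.angle(h) + np.angle(g)))
snr_diag = np.abs(g @ np.diag(theta_diag) @ h) ** 2

# Fully-connected BD-RIS: Theta must be unitary and symmetric.  A random
# feasible point is Theta = U @ U.T with U unitary (QR of a Gaussian matrix);
# an optimized Theta over this larger set can only do better than a random draw.
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
U, _ = np.linalg.qr(A)
theta_bd = U @ U.T
snr_bd = np.abs(g @ theta_bd @ h) ** 2

print(f"diagonal RIS gain: {snr_diag:.2f}, random BD-RIS gain: {snr_bd:.2f}")
```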
IEEE Signal Processing Letters, vol. 32, pp. 216-220. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10777522
Citations: 0
FAMSeC: A Few-Shot-Sample-Based General AI-Generated Image Detection Method
IF 3.2 | CAS Tier 2 (Engineering & Technology) | Q2, ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-12-04 | DOI: 10.1109/LSP.2024.3511421
Juncong Xu;Yang Yang;Han Fang;Honggu Liu;Weiming Zhang
The explosive growth of generative AI has saturated the internet with AI-generated images, raising security concerns and increasing the need for reliable detection methods. The primary requirement for such detection is generalizability, typically achieved by training on numerous fake images from various models. However, practical limitations, such as closed-source models and restricted access, often result in limited training samples. Therefore, training a general detector with few-shot samples is essential for modern detection mechanisms. To address this challenge, we propose FAMSeC, a general AI-generated image detection method based on a LoRA-based Forgery Awareness Module and a Semantic feature-guided Contrastive learning strategy. To learn effectively from limited samples and prevent overfitting, we developed a forgery awareness module (FAM) based on LoRA, maintaining the generalization of pre-trained features. Additionally, to cooperate with FAM, we designed a semantic feature-guided contrastive learning strategy (SeC), making the FAM focus more on the differences between real and fake images than on the features of the samples themselves. Experiments show that FAMSeC outperforms state-of-the-art methods, enhancing classification accuracy by 14.55% with just 0.56% of the training samples.
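The letter's SeC loss is not spelled out in this listing; the sketch below is only a generic supervised contrastive loss over real/fake labels, the family of objectives the abstract alludes to. The function name, temperature, and batch handling are assumptions.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(feats: torch.Tensor,
                                labels: torch.Tensor,
                                temperature: float = 0.1) -> torch.Tensor:
    """feats: (batch, dim) embeddings; labels: (batch,) 0 = real, 1 = fake."""
    z = F.normalize(feats, dim=-1)
    sim = z @ z.t() / temperature                       # pairwise similarities
    mask_self = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim.masked_fill_(mask_self, float("-inf"))          # exclude self-pairs
    log_prob = sim - sim.logsumexp(dim=-1, keepdim=True)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)   # same-label positives
    same &= ~mask_self
    pos_counts = same.sum(dim=-1).clamp(min=1)
    # Average log-likelihood of positives for each anchor that has one.
    loss = -(log_prob.masked_fill(~same, 0.0).sum(dim=-1) / pos_counts)
    return loss[same.any(dim=-1)].mean()

loss = supervised_contrastive_loss(torch.randn(8, 128),
                                    torch.randint(0, 2, (8,)))
```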
IEEE Signal Processing Letters, vol. 32, pp. 226-230.
Citations: 0
NeRF-DA: Neural Radiance Fields Deblurring With Active Learning
IF 3.2 | CAS Tier 2 (Engineering & Technology) | Q2, ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-12-04 | DOI: 10.1109/LSP.2024.3511350
Sejun Hong;Eunwoo Kim
Neural radiance fields (NeRF) represent multi-view images as 3D scenes, achieving photo-realistic novel view synthesis quality. However, multi-view images captured in real-world scenarios are often poorly aligned and contain blur or noise. Deblur-NeRF, which uses kernel deformation to improve sharpness, is effective, but the quantity and imbalance of blurred training samples significantly affect the overall results. In this study, we propose neural radiance fields deblurring with active learning (NeRF-DA), focusing on high-quality blurred images for 3D scene modeling. NeRF-DA uses pool-based active learning with uncertainty estimation to improve model efficiency with a high-quality training set. Subsequently, we deblur the data using the trained model and proceed with NeRF training by selecting the best-sharpened images for querying. Experiments on both camera motion blur and defocus blur demonstrate that NeRF-DA significantly enhances the quality of the existing Deblur-NeRF.
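The exact uncertainty estimate used by NeRF-DA is not given here; as a hedged illustration of pool-based active learning, the snippet below scores candidate blurred views by ensemble disagreement and queries the top-k. The function name and the variance criterion are assumptions.

```python
import numpy as np

def select_queries(pool_predictions: np.ndarray, k: int) -> np.ndarray:
    """pool_predictions: (num_models, num_candidates) per-model scores.

    Returns indices of the k candidates with the highest predictive variance,
    i.e. the views the ensemble disagrees on the most.
    """
    uncertainty = pool_predictions.var(axis=0)   # disagreement as uncertainty
    return np.argsort(uncertainty)[::-1][:k]

rng = np.random.default_rng(1)
preds = rng.standard_normal((5, 100))            # 5 models, 100 candidate views
queried = select_queries(preds, k=10)            # indices to label / train on next
```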
IEEE Signal Processing Letters, vol. 32, pp. 261-265.
Citations: 0
Improved Free-of-CPP ADMM-Based Iterative Decoding Algorithm of Binary LDPC Codes
IF 3.2 | CAS Tier 2 (Engineering & Technology) | Q2, ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-12-04 | DOI: 10.1109/LSP.2024.3511336
Jing Bai;Zedong An;Yuhao Chi;Guanghui Song;Chau Yuen
Iterative decoding algorithms based on the alternating direction method of multipliers (ADMM) have emerged as an alternative way to decode low-density parity-check (LDPC) codes and have spurred a wave of research on applying mathematical optimization to LDPC decoding. Improving error-correcting performance is a key issue in enhancing the superiority of ADMM decoding. In this letter, we investigate an efficient ADMM-based iterative decoder for binary LDPC codes. First, we build a mathematical programming equivalent of the maximum-likelihood (ML) decoding problem by transforming the parity-check constraints into multiple equivalent linear constraints and eliminating check-polytope projection (CPP). Then, an iterative algorithm based on the ADMM technique is developed to solve this free-of-CPP (FCPP) formulation, and each ADMM update can be computed efficiently. Moreover, the proposed ADMM-FCPP decoding algorithm is shown to have per-iteration complexity linear in the LDPC code length. Finally, simulation results demonstrate the superiority of the proposed decoder in error-correcting performance compared with state-of-the-art ADMM-based decoders.
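The FCPP formulation itself is not reproduced in this listing. The toy below only shows the generic scaled-form ADMM iteration that such decoders rely on, with the expensive check-polytope projection replaced, purely for illustration, by a trivial box projection onto [0, 1]^n; the quadratic objective and all names are assumptions.

```python
# Toy problem:  minimize (1/2)||x - c||^2  subject to  x in [0, 1]^n,
# split as f(x) = (1/2)||x - c||^2, g(z) = indicator of the box, x = z.
import numpy as np

def admm_box_qp(c: np.ndarray, rho: float = 1.0, iters: int = 100) -> np.ndarray:
    x = np.zeros_like(c)
    z = np.zeros_like(c)
    u = np.zeros_like(c)                       # scaled dual variable
    for _ in range(iters):
        x = (c + rho * (z - u)) / (1.0 + rho)  # closed-form x-update
        z = np.clip(x + u, 0.0, 1.0)           # projection step, O(n) per iteration
        u = u + x - z                          # dual ascent
    return z

sol = admm_box_qp(np.array([-0.3, 0.4, 1.7]))  # -> approximately [0.0, 0.4, 1.0]
```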
IEEE Signal Processing Letters, vol. 32, pp. 211-215.
Citations: 0
High-Accuracy DOA Estimation for Non-Collinear Sparse Uniform Array
IF 3.2 | CAS Tier 2 (Engineering & Technology) | Q2, ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-12-03 | DOI: 10.1109/LSP.2024.3510462
Hongyong Wang;Xiaolong Chen;Weibo Deng;Caisheng Zhang;Yonghua Xue
Conventional sparse uniform arrays (SUAs) are composed of multiple identical, strictly collinear uniform linear arrays. By adjusting the baseline length between the subarrays, the array aperture can be made arbitrarily large, thus substantially improving the accuracy of direction-of-arrival (DOA) estimation. However, in practical applications it is challenging to meet the strict collinearity requirement due to geographical constraints. In this letter, to address this problem, we propose the non-collinear sparse uniform array (NCSUA) model to mitigate the influence of non-ideal terrain and enhance the practicality of the SUA. A novel estimation algorithm is then proposed to resolve the angle ambiguity in the NCSUA and effectively achieve high-accuracy DOA estimation. Compared with the conventional SUA, numerical simulation results demonstrate the superiority of the NCSUA employing the new de-ambiguity algorithm in both DOA estimation performance and practical applications.
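The letter's de-ambiguity algorithm is not reproduced here; the toy below only illustrates why the ambiguity arises in the first place: a baseline much longer than half a wavelength maps several distinct angles to the same wrapped interferometric phase. The wavelength, baseline, and true angle are arbitrary assumed values.

```python
import numpy as np

wavelength = 1.0
baseline = 10.0 * wavelength             # long sparse-array baseline
true_theta = np.deg2rad(12.0)

# The interferometric phase is only observed modulo 2*pi.
phase = (2 * np.pi * baseline / wavelength) * np.sin(true_theta)
wrapped = np.mod(phase + np.pi, 2 * np.pi) - np.pi

# Every integer wrap k gives another angle consistent with the measurement.
k = np.arange(-15, 16)
sin_candidates = (wrapped + 2 * np.pi * k) * wavelength / (2 * np.pi * baseline)
candidates = np.rad2deg(np.arcsin(sin_candidates[np.abs(sin_candidates) <= 1]))
print(candidates)    # many candidate DOAs, one of them close to 12 degrees
```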
IEEE Signal Processing Letters, vol. 32, pp. 206-210.
Citations: 0
General Steganalysis of Generative Linguistic Steganography Based on Dynamic Segment-Level Lexical Association Extraction
IF 3.2 | CAS Tier 2 (Engineering & Technology) | Q2, ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-12-02 | DOI: 10.1109/LSP.2024.3510457
Songbin Li;Hui Du;Jingang Wang
In scenarios where steganographic texts from various steganographic domains, generated by different generative steganography algorithms, are mixed, most existing linguistic steganalysis methods lack network structures designed to account for the differences between steganographic texts from different domains, leaving room for improvement in their general detection performance. To address this issue, we propose a general generative linguistic steganalysis method built on the idea of dynamically extracting lexical association features of different steganographic domains at the segment level. We utilize a dynamic-static text feature matrix to construct a word-importance semantic encoding module that mines steganography-sensitive word features of different steganographic domains. Based on the obtained features, we propose a word-correlation multi-scale perception module to focus on the segment-level lexical association changes caused by secret-information embedding in different domains. Experimental results show that this method improves on the detection accuracy of existing mainstream linguistic steganalysis methods in various mixed steganography scenarios.
IEEE Signal Processing Letters, vol. 32, pp. 191-195.
Citations: 0
Scene Inference Using Saliency Graphs With Trust-Theoretic Semantic Information Encoding
IF 3.2 | CAS Tier 2 (Engineering & Technology) | Q2, ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-11-28 | DOI: 10.1109/LSP.2024.3508538
Preeti Meena;Himanshu Kumar;Sandeep Yadav
Scene inference refers to identifying the scene from a given set of scene representations such as images. A saliency graph of a scene contains scene-defining objects along with the semantic information between them in the graph representation. Existing methods consider the entire scene and weight all semantic information uniformly for scene inference, resulting in suboptimal performance. This letter presents an optimal edge-weight estimation using a trust-theoretic framework to encode semantic information effectively in saliency graphs. We utilize the notion of converged global absolute trust in the saliency scores of salient objects to compute the weighting of semantic information. Experimental results highlight the efficacy of the proposed method.
IEEE Signal Processing Letters, vol. 32, pp. 256-260.
Citations: 0
Onset-and-Offset-Aware Sound Event Detection via Differentiable Frame-to-Event Mapping
IF 3.2 | CAS Tier 2 (Engineering & Technology) | Q2, ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-11-28 | DOI: 10.1109/LSP.2024.3509336
Tomoya Yoshinaga;Keitaro Tanaka;Yoshiaki Bando;Keisuke Imoto;Shigeo Morishima
This paper presents a sound event detection (SED) method that handles sound event boundaries in a statistically principled manner. A typical approach to SED is to train a deep neural network (DNN) in a supervised manner such that the model predicts frame-wise event activities. Since the predicted activities often contain fine insertion and deletion errors due to their temporal fluctuations, post-processing has been applied to obtain more accurate onset and offset boundaries. Existing post-processing methods are, however, non-differentiable and prohibit end-to-end (E2E) training. In this paper, we propose an E2E detection method based on a probabilistic formulation of sound event sequences called a hidden semi-Markov model (HSMM). The HSMM is utilized to transform frame-wise features predicted by a DNN into posterior probabilities of sound events represented by their class labels and temporal boundaries. We jointly train the DNN and HSMM in a supervised E2E manner by maximizing the event-wise posterior probabilities of the HSMM. This objective is a differentiable function thanks to the forward-backward algorithm of the HSMM. Experimental results with real recordings show that our method outperforms baseline systems with standard post-processing methods.
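A full HSMM forward-backward pass is longer than fits in this listing; the sketch below is a plain HMM forward recursion in log space (a deliberate simplification of the letter's HSMM) that shows why the mapping from frame-wise DNN scores to a sequence-level likelihood stays differentiable and can sit inside end-to-end training. The state count, transition matrix, and emission handling are assumptions.

```python
import math
import torch

def hmm_log_likelihood(log_emissions: torch.Tensor,
                       log_trans: torch.Tensor,
                       log_init: torch.Tensor) -> torch.Tensor:
    """log_emissions: (T, S) frame-wise log p(obs_t | state),
    log_trans: (S, S) log transition matrix, log_init: (S,) log prior."""
    alpha = log_init + log_emissions[0]
    for t in range(1, log_emissions.shape[0]):
        # logsumexp over previous states keeps every step differentiable.
        alpha = torch.logsumexp(alpha.unsqueeze(1) + log_trans, dim=0) + log_emissions[t]
    return torch.logsumexp(alpha, dim=0)

T, S = 50, 4                                     # 50 frames, 4 event states
dnn_scores = torch.randn(T, S, requires_grad=True)
ll = hmm_log_likelihood(dnn_scores.log_softmax(dim=-1),
                        torch.full((S, S), -math.log(S)),   # uniform transitions
                        torch.full((S,), -math.log(S)))     # uniform prior
ll.backward()   # gradients flow back into the frame-wise DNN scores
```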
IEEE Signal Processing Letters, vol. 32, pp. 186-190. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10771642
Citations: 0
A Dual-Branch Multidomain Feature Fusion Network for Axial Super-Resolution in Optical Coherence Tomography
IF 3.2 | CAS Tier 2 (Engineering & Technology) | Q2, ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2024-11-28 | DOI: 10.1109/LSP.2024.3509337
Quanqing Xu;Xiang He;Muhao Xu;Kaixuan Hu;Weiye Song
High-resolution retinal optical coherence tomography (OCT) images are crucial for the diagnosis of numerous retinal diseases, but images acquired by narrow-bandwidth OCT devices suffer from degraded axial resolution and are of limited use for disease diagnosis. Deep learning-based methods can enhance the axial resolution of OCT images, but most focus on improving the model architecture; the potential of fully exploiting the fusion of spatial- and frequency-domain information for image reconstruction has not been fully explored. This paper proposes a Dual-branch Multidomain Feature Fusion Network (MDFNet). The core module of the model consists of a parallel Enhanced Multi-scale Spatial Feature module and an Auxiliary Frequency Feature module, providing non-interfering dual-domain feature information to improve the reconstruction quality. MDFNet achieved the best performance in tests on mouse retina and human retina datasets, outperforming state-of-the-art (SOTA) algorithms by 0.11 dB and 0.18 dB, respectively. In addition, this method performed best in the retinal layer segmentation test.
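The MDFNet blocks are not specified in this listing; the sketch below only illustrates the general dual-branch idea of pairing a spatial feature map with a frequency-domain view of the same input before fusion. The layer sizes, the FFT-based frequency branch, and the fusion by concatenation are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class ToyDualBranch(nn.Module):
    def __init__(self, channels: int = 16):
        super().__init__()
        self.spatial = nn.Conv2d(1, channels, kernel_size=3, padding=1)
        self.freq = nn.Conv2d(2, channels, kernel_size=3, padding=1)
        self.fuse = nn.Conv2d(2 * channels, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, H, W) OCT B-scan.
        spec = torch.fft.fft2(x, norm="ortho")
        freq_in = torch.cat([spec.real, spec.imag], dim=1)   # (batch, 2, H, W)
        fused = torch.cat([self.spatial(x), self.freq(freq_in)], dim=1)
        return self.fuse(fused)

model = ToyDualBranch()
out = model(torch.randn(2, 1, 64, 64))   # output keeps the input spatial size
```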
IEEE Signal Processing Letters, vol. 32, pp. 461-465.
Citations: 0