
Latest publications from the Journal of Visual Communication and Image Representation

Image cropping based on order learning
IF 2.6 · CAS Tier 4 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-08-01 · DOI: 10.1016/j.jvcir.2024.104253

This paper proposes a novel approach to image cropping, called the crop region comparator (CRC), which learns ordering relationships between the aesthetic qualities of different crop regions. CRC employs a single-region refinement (SR) module and an inter-region correlation (IC) module. First, we design the SR module to identify essential information in an original image and consider the composition of each crop candidate. Thus, the SR module helps CRC adaptively find the best crop region according to the essential information. Second, we develop the IC module, which aggregates the information across two crop candidates to analyze their differences effectively and estimate their ordering relationship reliably. Then, we decide the final crop region based on the relative aesthetic scores of all crop candidates, computed by comparing them in a pairwise manner. Extensive experimental results demonstrate that the proposed CRC algorithm outperforms existing image cropping techniques on various datasets.
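
The pairwise ranking step, in which every crop candidate is scored by comparison against the others and the candidate with the highest relative score is kept, can be illustrated with a small Python sketch. The `compare` function below is a hypothetical stand-in for the trained CRC network; its aspect-ratio heuristic is purely illustrative and not the authors' model.

```python
# Minimal sketch of pairwise, order-learning style crop selection. compare() is
# a hypothetical placeholder for the trained CRC network.
import itertools
import numpy as np

def compare(crop_a, crop_b):
    """Return a toy probability that crop_a is aesthetically better than crop_b."""
    def score(c):
        x0, y0, x1, y1 = c
        return -abs((x1 - x0) / (y1 - y0) - 4 / 3)   # toy proxy: prefer ~4:3 crops
    return 1.0 / (1.0 + np.exp(score(crop_b) - score(crop_a)))

def select_best_crop(candidates):
    """Aggregate pairwise comparison results into a relative score per candidate."""
    scores = np.zeros(len(candidates))
    for i, j in itertools.permutations(range(len(candidates)), 2):
        scores[i] += compare(candidates[i], candidates[j])
    return candidates[int(np.argmax(scores))]

candidates = [(0, 0, 400, 300), (0, 0, 300, 300), (50, 40, 450, 340)]
print(select_best_crop(candidates))
```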

Citations: 0
Blind omnidirectional image quality assessment based on semantic information replenishment
IF 2.6 · CAS Tier 4 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-08-01 · DOI: 10.1016/j.jvcir.2024.104241

Blind Omnidirectional Image Quality Assessment (BOIQA) is of great significance to the development of immersive media technology. Most BOIQA metrics project the raw spherical omnidirectional images (OIs) into other plane spaces for better feature representation. For these metrics, the semantic information at the shared boundaries of adjacent views is prone to being destroyed, hindering semantic understanding and further performance improvement. To tackle this problem, we propose a lightweight but effective BOIQA metric that replenishes the damaged semantic information. Specifically, a multi-stream semantic information replenishment module is constructed from multi-scale feature representations and designed to restore the destroyed semantic information from two adjacent views. For module learning, more than 20,000 image triplets are further built. Then, the restored features are integrated for the final quality prediction. To verify the effectiveness of the proposed method, extensive experiments are conducted on public OIQA databases, and the results prove the superior performance of the proposed method.
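
As a rough illustration of the replenishment idea, the PyTorch sketch below stitches the feature maps of two adjacent views along their shared boundary, applies convolutions at several dilation rates as a crude multi-scale representation, and splits the result back into per-view features. The module name, channel widths, and scale choices are assumptions made for illustration, not the paper's actual architecture.

```python
# Illustrative multi-scale module operating across a shared view boundary.
import torch
import torch.nn as nn

class BoundaryReplenish(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # Parallel branches with different dilation rates act as a simple
        # multi-scale feature representation across the shared boundary.
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in (1, 2, 4)
        ])
        self.fuse = nn.Conv2d(3 * channels, channels, 1)

    def forward(self, left_feat, right_feat):
        # Stitch adjacent views along the width so the convolutions see
        # context on both sides of their shared boundary.
        x = torch.cat([left_feat, right_feat], dim=3)              # (B, C, H, 2W)
        ms = torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)
        x = x + self.fuse(ms)                                      # residual replenishment
        w = left_feat.shape[3]
        return x[..., :w], x[..., w:]                              # replenished views

left, right = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
l2, r2 = BoundaryReplenish()(left, right)
print(l2.shape, r2.shape)
```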

Citations: 0
Advancements in low light image enhancement techniques and recent applications
IF 2.6 · CAS Tier 4 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-08-01 · DOI: 10.1016/j.jvcir.2024.104223

Low-light image enhancement is an effective solution for improving image recognition by both humans and machines. Due to low illuminance, images captured in such conditions carry less color information than those taken in daylight, resulting in degraded images characterized by distortion, low contrast, low brightness, a narrow gray range, and noise. Low-light image enhancement techniques therefore play a crucial role in improving the effectiveness of object detection. This paper reviews state-of-the-art low-light image enhancement techniques and their developments in recent years. Techniques such as gray transformation, histogram equalization, defogging, Retinex, image fusion, and wavelet transformation are examined, focusing on their working principles and assessing their ability to improve image quality. Further discussion addresses the contributions of deep learning and cognitive approaches, including attention mechanisms and adversarial methods, to image enhancement.
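
As a concrete example of one classical technique from this list, the NumPy sketch below performs global histogram equalization, spreading a narrow gray range over the full intensity scale via the cumulative distribution function. It is a textbook baseline shown for orientation, not a method proposed in the reviewed papers.

```python
# Minimal global histogram equalization for an 8-bit grayscale image.
import numpy as np

def equalize_hist(gray):
    """Map a narrow gray range onto the full 0-255 scale using the CDF."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())   # normalize to [0, 1]
    lut = np.round(cdf * 255).astype(np.uint8)          # old intensity -> new intensity
    return lut[gray]

# Synthetic low-light image: values squeezed into a dark, narrow range.
dark = np.random.randint(10, 60, size=(64, 64)).astype(np.uint8)
enhanced = equalize_hist(dark)
print(dark.min(), dark.max(), "->", enhanced.min(), enhanced.max())
```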

Citations: 0
Towards robust image watermarking via random distortion assignment based meta-learning
IF 2.6 · CAS Tier 4 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-08-01 · DOI: 10.1016/j.jvcir.2024.104238

Recently, deep learning-based image watermarking methods have been proposed for copyright protection and are robust to common post-processing operations. However, they suffer distinct performance drops under open-set distortions, where the distortions applied to testing samples are unseen in the training stage. To address this issue, we propose a random distortion assignment-based meta-learning framework for robust image watermarking, where meta-train and meta-test tasks are constructed to simulate open-set distortion scenarios. The embedding and extraction network for watermark information is built on an invertible neural network and equipped with a multi-stage distortion layer, which applies random combinations of basic post-processing operators. Besides, to obtain a better balance between robustness and visual imperceptibility, a hybrid loss function is designed that considers global and local similarities based on wavelet decomposition to capture multi-scale texture information. Extensive experiments with various open-set distortions verify the superiority of the proposed method.
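
A minimal sketch of the random distortion assignment idea is given below: a distortion stage picks a random subset of basic post-processing operators and applies them in sequence to a watermarked image during training. The operator set, parameter ranges, and function names are illustrative assumptions rather than the paper's exact distortion layer.

```python
# Sketch of a multi-stage distortion layer built from random combinations of
# basic post-processing operators (all operators here are toy examples).
import random
import numpy as np

def add_noise(img, sigma=5.0):
    return np.clip(img + np.random.normal(0, sigma, img.shape), 0, 255)

def box_blur(img, k=3):
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def downscale_upscale(img, factor=2):
    small = img[::factor, ::factor]
    up = np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)
    return up[:img.shape[0], :img.shape[1]]

OPERATORS = [add_noise, box_blur, downscale_upscale]

def random_distortion(img, max_stages=3):
    """Randomly assign 1..max_stages operators and apply them in sequence."""
    for op in random.sample(OPERATORS, k=random.randint(1, max_stages)):
        img = op(img)
    return img

stego = np.random.randint(0, 256, (64, 64)).astype(np.float64)
print(random_distortion(stego).shape)
```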

Citations: 0
A lightweight and continuous dimensional emotion analysis system of facial expression recognition under complex background
IF 2.6 · CAS Tier 4 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-08-01 · DOI: 10.1016/j.jvcir.2024.104260

Facial expression recognition technology has promising prospects in Internet of Things applications. Given the limited hardware computing capability and strict real-time processing requirements, this paper proposes a lightweight emotion analysis system based on edge computing that can be deployed on edge devices. To further improve the accuracy of dimensional emotion analysis, we propose a modified MobileNetV3 network structure to measure the intensity of dimensional emotion. The optimization scheme includes introducing an improved efficient channel attention mechanism and a feature pyramid network, adjusting the structure of the model, and optimizing the loss function. Furthermore, the system uses Intel's OpenVINO toolkit to make the model more suitable for stable operation and provides an operable human-computer interaction interface. The experimental results show that the system has the advantages of few parameters, high recognition accuracy, and low latency. The optimized network size is reduced by 67% compared with the original, the root mean square errors of potency and activation are 0.413 and 0.389, and the latency is within 9 ms on the Myriad. This work meets the requirements of practical applications and is of practical significance for continuous dimensional emotion analysis. The source code is available at https://github.com/SCNU-RISLAB/Lightweight-and-Continuous-Dimensional-Emotion-Analysis-System-of-Facial-Expression-Recognition/.
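
The improved efficient channel attention mechanism presumably builds on the standard ECA block, which the following PyTorch sketch reproduces: global average pooling, a 1D convolution across channels, and sigmoid gating. The specific improvements used in this work are not detailed here.

```python
# Standard Efficient Channel Attention (ECA) block; the paper uses an improved
# variant whose modifications are not reproduced in this sketch.
import torch
import torch.nn as nn

class ECA(nn.Module):
    def __init__(self, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):                      # x: (B, C, H, W)
        y = x.mean(dim=(2, 3))                 # global average pooling -> (B, C)
        y = self.conv(y.unsqueeze(1))          # 1D conv across channels -> (B, 1, C)
        w = torch.sigmoid(y).squeeze(1)        # per-channel attention weights
        return x * w.unsqueeze(-1).unsqueeze(-1)

feat = torch.randn(2, 16, 8, 8)
print(ECA()(feat).shape)                       # torch.Size([2, 16, 8, 8])
```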

Citations: 0
A novel hybrid network model for image steganalysis
IF 2.6 · CAS Tier 4 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-08-01 · DOI: 10.1016/j.jvcir.2024.104251

Steganalysis attempts to discover hidden signals in suspected carriers or, at the least, to detect which media contain hidden signals. Conventional approaches to steganalysis depend on artificially designed image features. However, these methods are time-consuming and labor-intensive, and the statistical methods may not produce optimal outcomes. Deep learning-based steganalysis algorithms that use convolutional neural network (CNN) structures, such as ZhuNet, obviate the need for artificially designed features while optimizing the feature extraction and classification processes via training and learning. This approach greatly boosts the applicability and effectiveness of steganalysis. Nevertheless, CNN-based steganalysis algorithms do have some limitations. To begin with, the feature extraction of stego images, which relies on deep neural networks, lacks consideration of the interdependence of local features when constructing the overall feature map. Furthermore, CNN-based steganalysis models use all features indiscriminately to classify stego images, which can potentially reduce the models' accuracy. Based on ZhuNet, we provide a novel hybrid network model known as ZhuNet-ATT-BiLSTM to tackle the aforementioned concerns. This model introduces a Bidirectional Long Short-Term Memory (BiLSTM) structure to mutually learn the relationships between image features, ensuring comprehensive utilization of stego image features. In addition, an attention mechanism is integrated into steganalysis to dynamically allocate weights to feature data, amplifying the signal of the vital features while effectively attenuating the less important and irrelevant features. Lastly, the enhanced model is verified with two open datasets: Bossbase 1.01 and COCO. According to the experimental findings, the proposed hybrid network model improves image steganalysis accuracy compared with earlier methods.
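
The following PyTorch sketch illustrates how the attention and BiLSTM components could sit on top of CNN feature maps: spatial positions are read as a sequence, a BiLSTM models their interdependence, and a learned attention layer weights the positions before classification. Channel sizes, the backbone, and the pooling scheme are assumptions, not ZhuNet-ATT-BiLSTM's exact configuration.

```python
# Attention + BiLSTM head over CNN feature maps (illustrative configuration).
import torch
import torch.nn as nn

class AttBiLSTMHead(nn.Module):
    def __init__(self, in_channels=32, hidden=64, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(in_channels, hidden, batch_first=True, bidirectional=True)
        self.att = nn.Linear(2 * hidden, 1)          # scores each spatial position
        self.cls = nn.Linear(2 * hidden, num_classes)

    def forward(self, fmap):                         # fmap: (B, C, H, W)
        b, c, h, w = fmap.shape
        seq = fmap.flatten(2).transpose(1, 2)        # (B, H*W, C): sequence of positions
        out, _ = self.lstm(seq)                      # (B, H*W, 2*hidden)
        alpha = torch.softmax(self.att(out), dim=1)  # attention weights over positions
        pooled = (alpha * out).sum(dim=1)            # weighted feature aggregation
        return self.cls(pooled)

fmap = torch.randn(4, 32, 16, 16)                    # e.g. output of a CNN backbone
print(AttBiLSTMHead()(fmap).shape)                   # torch.Size([4, 2])
```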

Citations: 0
Multimodal spatiotemporal aggregation for point cloud accumulation
IF 2.6 · CAS Tier 4 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-08-01 · DOI: 10.1016/j.jvcir.2024.104243

Point cloud accumulation is a crucial technique in point cloud analysis, facilitating various downstream tasks such as surface reconstruction. Current methods rely merely on raw LiDAR points, yielding unsatisfactory performance due to the limited geometric information, particularly in complex scenarios characterized by intricate motions, diverse objects, and an increased number of frames. In this paper, we introduce camera modality data, which are usually acquired alongside LiDAR data at minimal expense. To this end, we present the Multimodal Spatiotemporal Aggregation solution (termed MSA) to thoroughly explore and aggregate these two distinct modalities (sparse 3D points and multi-view 2D images). Concretely, we propose a multimodal spatial aggregation module to bridge the data gap between different modalities in the Bird's-Eye-View (BEV) space and further fuse them with learnable adaptive channel-wise weights. By assembling their respective strengths, this module generates a reliable and consistent scene representation. Subsequently, we design a temporal aggregation module to capture continuous motion information across consecutive sequences, which is beneficial for identifying the motion state of the foreground scene and enables the model to extend robustly to longer sequences. Experiments demonstrate that MSA outperforms state-of-the-art (SoTA) point cloud accumulation methods across all evaluation metrics on the public benchmark, especially with more frames.
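
A minimal PyTorch sketch of the fusion step is shown below: LiDAR and camera features, already projected into a shared BEV space, are blended with learnable adaptive channel-wise weights. The gating design and dimensions are illustrative assumptions rather than the MSA module itself.

```python
# Adaptive channel-wise fusion of LiDAR-BEV and camera-BEV features (sketch).
import torch
import torch.nn as nn

class ChannelWiseBEVFusion(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # Predicts a per-channel weight in [0, 1] from both modalities.
        self.gate = nn.Sequential(
            nn.Linear(2 * channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, channels),
            nn.Sigmoid(),
        )

    def forward(self, bev_lidar, bev_camera):                     # each: (B, C, H, W)
        stats = torch.cat([bev_lidar.mean(dim=(2, 3)),
                           bev_camera.mean(dim=(2, 3))], dim=1)   # (B, 2C)
        w = self.gate(stats).unsqueeze(-1).unsqueeze(-1)          # (B, C, 1, 1)
        return w * bev_lidar + (1.0 - w) * bev_camera             # adaptive blend

lidar = torch.randn(2, 64, 128, 128)
camera = torch.randn(2, 64, 128, 128)
print(ChannelWiseBEVFusion()(lidar, camera).shape)
```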

Citations: 0
Learning saliency-awareness Siamese network for visual object tracking
IF 2.6 · CAS Tier 4 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-08-01 · DOI: 10.1016/j.jvcir.2024.104237

Siamese trackers have emerged as the predominant paradigm in visual tracking owing to their robust similarity matching. However, relying on dense regression strategies to predict the target's axis-aligned bounding box often leads to excessive background pixels. This limitation can compromise the model's accuracy, especially when tracking non-rigid targets. To tackle this issue, this paper presents a novel saliency-awareness Siamese network for visual object tracking. Compared with bounding box regression networks, our method achieves accurate pixel-level target tracking. Specifically, a two-level U-structure encoder-decoder model is tailored to learn the saliency of backbone features. Additionally, a dual-pipeline parallel tracking framework is proposed that allows top-down multi-stage saliency mask prediction by integrating the aforementioned model into the Siamese network. Finally, a convolutional head is devised to generate a precise binary mask for tracking. Extensive experiments on four benchmarks, including VOT2016, VOT2019, GOT-10k, and UAV123, demonstrate that our tracker achieves superior tracking performance.
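
As a rough sketch of the two-level U-structure encoder-decoder idea, the PyTorch module below downsamples backbone features once, upsamples them back with a skip connection, and predicts a per-pixel saliency map. Channel widths and layer choices are assumptions made for illustration only.

```python
# Two-level U-structure encoder-decoder producing a saliency map (sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLevelUStructure(nn.Module):
    def __init__(self, in_channels=256, mid=64):
        super().__init__()
        self.enc1 = nn.Conv2d(in_channels, mid, 3, padding=1)
        self.enc2 = nn.Conv2d(mid, mid, 3, stride=2, padding=1)   # level-2 downsample
        self.dec2 = nn.Conv2d(mid, mid, 3, padding=1)
        self.dec1 = nn.Conv2d(2 * mid, mid, 3, padding=1)         # skip-connection fusion
        self.head = nn.Conv2d(mid, 1, 1)                          # saliency logits

    def forward(self, x):
        e1 = F.relu(self.enc1(x))
        e2 = F.relu(self.enc2(e1))
        d2 = F.interpolate(F.relu(self.dec2(e2)), size=e1.shape[2:],
                           mode="bilinear", align_corners=False)
        d1 = F.relu(self.dec1(torch.cat([d2, e1], dim=1)))
        return torch.sigmoid(self.head(d1))                       # per-pixel saliency

feat = torch.randn(1, 256, 31, 31)                                # e.g. correlation features
print(TwoLevelUStructure()(feat).shape)                           # torch.Size([1, 1, 31, 31])
```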

Citations: 0
Performance evaluation of efficient segmentation and classification based iris recognition using sheaf attention network
IF 2.6 · CAS Tier 4 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-08-01 · DOI: 10.1016/j.jvcir.2024.104262

Iris recognition, a precise biometric identification technique, relies on the distinct epigenetic patterns within the iris. Existing methods often face challenges related to segmentation accuracy and classification efficiency. To improve the accuracy and efficiency of iris recognition systems, this research proposes an innovative approach to iris recognition, focusing on efficient segmentation and classification using a convolutional neural network with sheaf attention networks (CSAN). The main objective is to develop an integrated framework that optimizes iris segmentation and classification. Subsequently, a dense extreme inception multipath-guided upsampling network is employed for accurate segmentation. Finally, classifiers, including the convolutional neural network with sheaf attention networks, are evaluated. The findings indicate that the proposed method achieves superior iris recognition accuracy and robustness, making it suitable for applications such as secure authentication and access control. Compared with existing approaches, CSAN obtains 99.98%, 99.35%, 99.45%, and 99.65% accuracy on the four proposed datasets, respectively.

Citations: 0
Detection of HEVC double compression based on boundary effect of TU and non-zero DCT coefficient distribution
IF 2.6 · CAS Tier 4 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-08-01 · DOI: 10.1016/j.jvcir.2024.104255

Video content tampering is a growing concern. When tampering, a perpetrator needs to decompress and re-compress the video. Thus, detecting whether a video has undergone double compression is an important issue in video forensics. In this paper, a novel method is proposed for detecting High Efficiency Video Coding (HEVC) double compression. Firstly, theoretical and statistical analyses are presented of the quality degradation during double compression and of the impact of the Quantization Parameter (QP) on the Transform Unit (TU) and Discrete Cosine Transform (DCT) coefficient distributions. Then, 3 sub-features based on the boundary effect of TUs and the non-zero DCT coefficient distribution are calculated. Further, the 3 sub-features are combined into a 26-dimensional feature vector, which serves as the proposed detection feature and is finally fed to a Multilayer Perceptron (MLP) classifier. Experiments are conducted on video sets with different settings, and the results show that our method achieves better performance than two existing methods in several situations.
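
The final classification stage, a 26-dimensional feature vector fed to a Multilayer Perceptron, can be sketched with scikit-learn as below. The synthetic features and toy labelling rule merely stand in for the TU boundary-effect and non-zero DCT statistics extracted by the actual method.

```python
# 26-dimensional feature vector per video clip classified by an MLP (sketch).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 26))                      # stand-in 26-dimensional features
y = (X[:, 0] + X[:, 1] - X[:, 2] > 0).astype(int)   # toy rule: 1 = double compressed

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```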

Citations: 0