Image cropping based on order learning
Pub Date: 2024-08-01 | DOI: 10.1016/j.jvcir.2024.104253
This paper proposes a novel approach to image cropping, called the crop region comparator (CRC), which learns ordering relationships between the aesthetic qualities of different crop regions. CRC employs a single-region refinement (SR) module and an inter-region correlation (IC) module. First, we design the SR module to identify essential information in an original image and to consider the composition of each crop candidate; the SR module thus helps CRC adaptively find the best crop region according to this essential information. Second, we develop the IC module, which aggregates information across two crop candidates to analyze their differences effectively and estimate their ordering relationship reliably. We then determine the crop region from the relative aesthetic scores of all crop candidates, computed by comparing them in a pairwise manner. Extensive experimental results demonstrate that the proposed CRC algorithm outperforms existing image cropping techniques on various datasets.
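A minimal sketch of the pairwise scoring idea the abstract describes: each candidate's relative aesthetic score is the sum of its pairwise comparisons against every other candidate. The `comparator` callable and the toy `quality` table are assumptions for illustration, not the CRC model.

```python
# Sketch (not the authors' code) of aggregating pairwise comparator outputs
# into relative aesthetic scores. comparator(a, b) is a hypothetical stand-in
# for the IC-module output: how strongly crop `a` beats crop `b`.

from itertools import permutations

def rank_crops(candidates, comparator):
    """Score each crop candidate by summing its pairwise win evidence."""
    scores = {c: 0.0 for c in candidates}
    for a, b in permutations(candidates, 2):
        scores[a] += comparator(a, b)   # accumulate evidence that a beats b
    # the candidate with the highest relative score is the final crop region
    return max(candidates, key=lambda c: scores[c])

# toy usage: pretend larger "quality" numbers mean better composition
crops = ["crop_0", "crop_1", "crop_2"]
quality = {"crop_0": 0.2, "crop_1": 0.9, "crop_2": 0.5}
best = rank_crops(crops, lambda a, b: float(quality[a] > quality[b]))
print(best)  # -> crop_1
```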
{"title":"Image cropping based on order learning","authors":"","doi":"10.1016/j.jvcir.2024.104253","DOIUrl":"10.1016/j.jvcir.2024.104253","url":null,"abstract":"<div><p>A novel approach to image cropping, called crop region comparator (CRC), is proposed in this paper, which learns ordering relationships between aesthetic qualities of different crop regions. CRC employs the single-region refinement (SR) module and the inter-region correlation (IC) module. First, we design the SR module to identify essential information in an original image and consider the composition of each crop candidate. Thus, the SR module helps CRC to adaptively find the best crop region according to the essential information. Second, we develop the IC module, which aggregates the information across two crop candidates to analyze their differences effectively and estimate their ordering relationship reliably. Then, we decide the crop region based on the relative aesthetic scores of all crop candidates, computed by comparing them in a pairwise manner. Extensive experimental results demonstrate that the proposed CRC algorithm outperforms existing image cropping techniques on various datasets.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142011686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Blind omnidirectional image quality assessment based on semantic information replenishment
Pub Date: 2024-08-01 | DOI: 10.1016/j.jvcir.2024.104241
Blind Omnidirectional Image Quality Assessment (BOIQA) is of great significance to the development of immersive media technology. Most BOIQA metrics project the raw spherical omnidirectional images (OIs) into other plane spaces for better feature representation. For these metrics, the semantic information at the shared boundaries of adjacent views is prone to being destroyed, hindering semantic understanding and further performance improvement. To tackle this problem, we propose a lightweight but effective BOIQA metric that replenishes the damaged semantic information. Specifically, a multi-stream semantic information replenishment module is constructed from multi-scale feature representations and is designed to restore the destroyed semantic information from two adjacent views. For module learning, more than 20,000 image triplets are built. The restored features are then integrated for the final quality prediction. To verify the effectiveness of the proposed method, extensive experiments are conducted on public OIQA databases, and the results confirm its superior performance.
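As a rough illustration only (the paper's actual module design is not given here), the sketch below shows one plausible form of a multi-scale block that takes two adjacent views and predicts restored boundary semantics; all layer choices and names are assumptions.

```python
# PyTorch sketch (an assumption, not the paper's architecture) of a
# multi-scale replenishment block: features from two adjacent views are
# extracted at several scales and fused to restore their shared boundary.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ReplenishBlock(nn.Module):
    def __init__(self, in_ch=3, ch=16, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        # one conv per scale; 2 * in_ch because the two views are stacked
        self.convs = nn.ModuleList(
            nn.Conv2d(2 * in_ch, ch, 3, padding=1) for _ in scales
        )
        self.fuse = nn.Conv2d(ch * len(scales), in_ch, 1)

    def forward(self, view_a, view_b):
        x = torch.cat([view_a, view_b], dim=1)
        feats = []
        for s, conv in zip(self.scales, self.convs):
            y = F.avg_pool2d(x, s) if s > 1 else x        # coarser scales
            y = F.relu(conv(y))
            feats.append(F.interpolate(y, size=x.shape[-2:], mode="bilinear",
                                       align_corners=False))
        return self.fuse(torch.cat(feats, dim=1))          # restored boundary map

a, b = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
print(ReplenishBlock()(a, b).shape)  # torch.Size([1, 3, 64, 64])
```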
{"title":"Blind omnidirectional image quality assessment based on semantic information replenishment","authors":"","doi":"10.1016/j.jvcir.2024.104241","DOIUrl":"10.1016/j.jvcir.2024.104241","url":null,"abstract":"<div><p>Blind Omnidirectional Image Quality Assessment (BOIQA) is of great significance to the development of the immersive media technology. Most BOIQA metrics are achieved by projecting the raw spherical OIs into other plane spaces for better feature representation. For these metrics, the semantic information at the shared boundaries of the adjacent views are prone to be destroyed, hindering the semantic understanding and further performance improvement. To tackle this problem, we propose a lightweight but effective BOIQA metric via replenishing the damaged semantic information. Specifically, a multi-stream semantic information replenishment module is constructed by the multi-scale feature representation, which is designed to restore the destroyed semantic information from two adjacent views. For module learning, more than 20,000 image triplets are further built. Then, the restored features are integrated for the final quality prediction. To testify the effectiveness of the proposed method, extensive experiments are conducted on the public OIQA databases, and the results prove the superior performance of the proposed method.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141782635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Advancements in low light image enhancement techniques and recent applications
Pub Date: 2024-08-01 | DOI: 10.1016/j.jvcir.2024.104223
Low-light image enhancement is an effective solution for improving image recognition by both humans and machines. Due to low illuminance, images captured in such conditions carry less color information than those taken in daylight, resulting in degraded images characterized by distortion, low contrast, low brightness, a narrow gray range, and noise. Low-light image enhancement techniques therefore play a crucial role in improving the effectiveness of object detection. This paper reviews state-of-the-art low-light image enhancement techniques and their developments in recent years. Techniques such as gray transformation, histogram equalization, defogging, Retinex, image fusion, and wavelet transformation are examined, focusing on their working principles and assessing their ability to improve image quality. Further discussion addresses the contributions of deep learning and cognitive approaches, including attention mechanisms and adversarial methods, to image enhancement.
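For two of the classic techniques surveyed here, a short worked example: gamma (gray-level) transformation brightens dark pixels, and histogram equalization stretches a narrow gray range across the full intensity interval. Pure NumPy; production pipelines would typically use OpenCV or scikit-image.

```python
import numpy as np

def gamma_correct(img, gamma=0.5):
    """Brighten a low-light image: out = in**gamma, with gamma < 1."""
    x = img.astype(np.float64) / 255.0
    return (np.power(x, gamma) * 255.0).astype(np.uint8)

def hist_equalize(img):
    """Spread a narrow gray range over the full [0, 255] interval."""
    hist, _ = np.histogram(img.flatten(), bins=256, range=(0, 256))
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) * 255.0 / (cdf.max() - cdf.min())  # normalized CDF
    return cdf[img].astype(np.uint8)  # map each pixel through the CDF

dark = np.random.randint(0, 60, size=(64, 64), dtype=np.uint8)  # synthetic low-light
print(gamma_correct(dark).mean() > dark.mean())  # True: image is brighter
print(hist_equalize(dark).max())                 # ~255: full gray range used
```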
{"title":"Advancements in low light image enhancement techniques and recent applications","authors":"","doi":"10.1016/j.jvcir.2024.104223","DOIUrl":"10.1016/j.jvcir.2024.104223","url":null,"abstract":"<div><p>Low-light image enhancement is an effective solution for improving image recognition by both humans and machines. Due to low illuminance, images captured in such conditions possess less color information compared to those taken in daylight, resulting in occluded images characterized by distortion, low contrast, low brightness, a narrow gray range, and noise. Low-light image enhancement techniques play a crucial role in enhancing the effectiveness of object detection. This paper reviews state-of-the-art low-light image enhancement techniques and their developments in recent years. Techniques such as gray transformation, histogram equalization, defogging, Retinex, image fusion, and wavelet transformation are examined, focusing on their working principles and assessing their ability to improve image quality. Further discussion addresses the contributions of deep learning and cognitive approaches, including attention mechanisms and adversarial methods, to image enhancement.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141782636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards robust image watermarking via random distortion assignment based meta-learning
Pub Date: 2024-08-01 | DOI: 10.1016/j.jvcir.2024.104238
Recently, deep learning-based image watermarking methods have been proposed for copyright protection that are robust to common post-processing operations. However, they suffer distinct performance drops under open-set distortions, where the distortions applied to testing samples are unseen during training. To address this issue, we propose a random distortion assignment-based meta-learning framework for robust image watermarking, in which meta-train and meta-test tasks are constructed to simulate open-set distortion scenarios. The watermark embedding and extraction network is built on an invertible neural network and equipped with a multi-stage distortion layer that applies random combinations of basic post-processing operators. In addition, to better balance robustness and visual imperceptibility, a hybrid loss function is designed that considers global and local similarities based on wavelet decomposition to capture multi-scale texture information. Extensive experiments covering various open-set distortions verify the superiority of the proposed method.
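A hedged sketch of the random distortion assignment idea: meta-train and meta-test tasks each receive a random combination of basic post-processing operators, with some operators withheld from meta-training so that meta-test distortions are unseen. The operator names and the `sample_task` helper are hypothetical.

```python
import random

BASIC_OPS = ["jpeg", "gaussian_noise", "blur", "resize", "crop", "dropout"]

def sample_task(n_ops=2, forbidden=frozenset()):
    """Pick a random multi-stage distortion pipeline avoiding `forbidden` ops."""
    pool = [op for op in BASIC_OPS if op not in forbidden]
    return tuple(random.sample(pool, n_ops))

random.seed(0)
# meta-train never sees "dropout"; meta-test uses only the held-out operators,
# so its distortion combinations are unseen during meta-training
meta_train = [sample_task(forbidden={"dropout"}) for _ in range(4)]
meta_test = [sample_task(forbidden=set(BASIC_OPS) - {"dropout", "blur"})
             for _ in range(2)]
print(meta_train, meta_test)
```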
{"title":"Towards robust image watermarking via random distortion assignment based meta-learning","authors":"","doi":"10.1016/j.jvcir.2024.104238","DOIUrl":"10.1016/j.jvcir.2024.104238","url":null,"abstract":"<div><p>Recently, deep learning-based image watermarking methods have been proposed for copyright protection, which are robust to common post-processing operations. However, they suffer from distinct performance drops to open-set distortions, where distortions applied on testing samples are unseen in the training stage. To address this issue, we propose a random distortion assignment-based meta-learning framework for robust image watermarking, where meta-train and meta-test tasks are constructed to simulate open-set distortion scenarios. The embedding and extraction network of watermark information is constructed based on the invertible neural network and equipped with a multi-stage distortion layer, which can conduct random combinations of basic post-processing operators. Besides, to obtain a better balance between robustness and visual imperceptibility, a hybrid loss function is designed by considering global and local similarities based on wavelet decomposition to capture multi-scale texture information. Extensive experiments are conducted by considering various open-set distortions to verify the superiority of the proposed method.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141782631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A lightweight and continuous dimensional emotion analysis system of facial expression recognition under complex background
Pub Date: 2024-08-01 | DOI: 10.1016/j.jvcir.2024.104260
Facial expression recognition technology has promising prospects in Internet of Things applications. Given limited hardware computing capability and high real-time processing requirements, this paper proposes a lightweight emotion analysis system based on edge computing that can be deployed on edge devices. To further improve the accuracy of dimensional emotion analysis, we propose a modified MobileNetV3 network structure to measure the intensity of dimensional emotion. The optimization scheme introduces an improved efficient channel attention mechanism and a feature pyramid network, adjusts the model structure, and optimizes the loss function. Furthermore, the system uses Intel's OpenVINO toolkit to make the model more suitable for stable operation and provides an operable human–computer interaction interface. The experimental results show that the system has few parameters, high recognition accuracy, and low latency. The optimized network is 67% smaller than the original, the root mean square errors of potency and activation are 0.413 and 0.389, and the latency is within 9 ms on Myriad hardware. This work meets the requirements of practical applications and is of essential significance for continuous dimensional emotion analysis. The source code is available at https://github.com/SCNU-RISLAB/Lightweight-and-Continuous-Dimensional-Emotion-Analysis-System-of-Facial-Expression-Recognition/.
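The reported 0.413 and 0.389 figures are per-dimension root mean square errors; a small worked example of that metric, with made-up predictions on a dimensional-emotion scale, follows.

```python
import numpy as np

def rmse(pred, target):
    """Root mean square error between predictions and ground truth."""
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(target)) ** 2)))

# toy values on a [-1, 1] dimensional-emotion scale (numbers are made up)
potency_pred, potency_true = [0.1, -0.4, 0.8], [0.3, -0.5, 0.6]
activation_pred, activation_true = [0.2, 0.0, -0.7], [0.1, 0.2, -0.5]
print(rmse(potency_pred, potency_true),       # one RMSE per dimension,
      rmse(activation_pred, activation_true)) # as in the paper's figures
```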
{"title":"A lightweight and continuous dimensional emotion analysis system of facial expression recognition under complex background","authors":"","doi":"10.1016/j.jvcir.2024.104260","DOIUrl":"10.1016/j.jvcir.2024.104260","url":null,"abstract":"<div><p>Facial expression recognition technology has a brilliant prospect in applying the Internet of Things systems. Concerning the limited hardware computing capability and high real-time processing requirements, this paper proposes a lightweight emotion analysis system based on edge computing, which could be deployed on edge devices. To further improve the accuracy of dimensional emotion analysis, we propose a modified network structure of MobileNetV3 to measure the intensity of dimensional emotion. The optimization scheme includes introducing the improved efficient channel attention mechanism and the feature pyramid network, adjusting the structure of the model, and optimizing the loss function. Furthermore, the system uses Intel’s OpenVINO toolkit to make the model more suitable for stable operation and provides an operable human–computer interaction interface. The experimental results show that the system has the advantages of few parameters, high recognition accuracy, and low latency. The optimized network size is reduced by 67% compared with the original, the root mean square error of potency and activation is 0.413 and 0.389, and the latency is up to 9ms in Myriad. This work meets the requirements of practical applications and has essential significance with the demand for continuous dimensional emotion analysis. The source code is available at <span><span>https://github.com/SCNU-RISLAB/Lightweight-and-Continuous-Dimensional-Emotion-Analysis-System-of-Facial-Expression-Recognition/</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142049920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A novel hybrid network model for image steganalysis
Pub Date: 2024-08-01 | DOI: 10.1016/j.jvcir.2024.104251
Steganalysis attempts to discover hidden signals in suspected carriers, or at least to detect which media contain hidden signals. Conventional approaches to steganalysis depend on hand-designed image features; these methods are time-consuming and labor-intensive, and the underlying statistical methods may not produce optimal outcomes. Deep learning-based steganalysis algorithms that use convolutional neural network (CNN) structures, such as ZhuNet, obviate the need for hand-designed features while optimizing feature extraction and classification through training, which greatly boosts the applicability and effectiveness of steganalysis. Nevertheless, CNN-based steganalysis algorithms have limitations. First, the feature extraction of stego images by deep neural networks does not consider the interdependence of local features when constructing the overall feature map. Furthermore, CNN-based steganalysis models use all features indiscriminately to classify stego images, which can reduce accuracy. Building on ZhuNet, we provide a novel hybrid network model, ZhuNet-ATT-BiLSTM, to tackle these concerns. The model introduces a Bidirectional Long Short-Term Memory (BiLSTM) structure to learn the relationships between image features and thereby make comprehensive use of stego image features. In addition, an attention mechanism dynamically allocates weights to the feature data, amplifying the signal of vital features while effectively attenuating less important and irrelevant ones. Finally, the enhanced model is verified on two open datasets, Bossbase 1.01 and COCO. The experimental findings show that the proposed hybrid network model improves image steganalysis accuracy compared with earlier methods.
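A minimal PyTorch sketch, not the authors' released ZhuNet-ATT-BiLSTM, of the hybrid structure the abstract describes: CNN feature maps are unrolled into a sequence, a BiLSTM models dependencies between local features, and an attention layer weights them before the cover/stego classification head. All layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class HybridSteganalyzer(nn.Module):
    def __init__(self, ch=8, hidden=32):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
                                 nn.AvgPool2d(4))          # local feature maps
        self.bilstm = nn.LSTM(ch, hidden, batch_first=True,
                              bidirectional=True)          # local-feature dependence
        self.att = nn.Linear(2 * hidden, 1)                # attention scores
        self.head = nn.Linear(2 * hidden, 2)               # cover vs. stego

    def forward(self, x):
        f = self.cnn(x)                                    # (B, C, H, W)
        seq = f.flatten(2).transpose(1, 2)                 # (B, H*W, C) sequence
        h, _ = self.bilstm(seq)                            # (B, H*W, 2*hidden)
        w = torch.softmax(self.att(h), dim=1)              # weights over positions
        pooled = (w * h).sum(dim=1)                        # attention pooling
        return self.head(pooled)

logits = HybridSteganalyzer()(torch.rand(2, 1, 64, 64))
print(logits.shape)  # torch.Size([2, 2])
```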
{"title":"A novel hybrid network model for image steganalysis","authors":"","doi":"10.1016/j.jvcir.2024.104251","DOIUrl":"10.1016/j.jvcir.2024.104251","url":null,"abstract":"<div><p>Steganalysis attempts to discover hidden signals in suspected carriers or at the least detect which media contain hidden signals. Conventional approaches to steganalysis depend on artificially designed image features. However, these methods are time-consuming and labor-intensive. Additionally, the statistical methods may not produce optimal outcomes. Deep learning-based steganalysis algorithms which use convolutional neural network (CNN) structures, such as ZhuNet, obviate the need for artificially design features while optimizing the feature extraction and classification processes via training and learning. This approach greatly boosts the applicability and effectiveness of steganalysis. Nevertheless, it is important to note that CNN-based steganalysis algorithms do have some limitations. To begin with, the feature extraction of stego images, which relies on deep neural networks, lacks consideration for the interdependence of local features when constructing the overall feature map. Furthermore, CNN-based steganalysis models use all features indiscriminately to classify stego images, which can potentially reduce the models’ accuracy. Based on ZhuNet, we provide a novel hybrid network model known as ZhuNet-ATT-BiLSTM in order to tackle the aforementioned concerns. This model introduces a Bidirectional Long Short-Term Memory (BiLSTM) structure to mutually learn about the relationships between image features to ensure comprehensive utilization of stego image features. In addition, an attention mechanism is integrated for steganalysis to dynamically allocate weights to feature data, amplifying the signal for the vital features while effectively attenuating the less important and irrelevant features. Lastly, the enhanced model is verified with two open datasets: Bossbase 1.01 and COCO. According to experimental findings, the proposed hybrid network model improves the image steganalysis accuracy by comparing with earlier methods.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141933778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multimodal spatiotemporal aggregation for point cloud accumulation
Pub Date: 2024-08-01 | DOI: 10.1016/j.jvcir.2024.104243
Point cloud accumulation is a crucial technique in point cloud analysis, facilitating downstream tasks such as surface reconstruction. Current methods rely solely on raw LiDAR points and yield unsatisfactory performance due to limited geometric information, particularly in complex scenarios characterized by intricate motions, diverse objects, and larger numbers of frames. In this paper, we introduce camera modality data, which is usually acquired alongside LiDAR data at minimal expense. To this end, we present the Multimodal Spatiotemporal Aggregation solution (termed MSA) to thoroughly explore and aggregate these two distinct modalities (sparse 3D points and multi-view 2D images). Concretely, we propose a multimodal spatial aggregation module that bridges the data gap between the modalities in Bird's-Eye-View (BEV) space and fuses them with learnable adaptive channel-wise weights. By combining their respective strengths, this module generates a reliable and consistent scene representation. Subsequently, we design a temporal aggregation module to capture continuous motion information across consecutive sequences, which helps identify the motion state of the foreground scene and lets the model extend robustly to longer sequences. Experiments demonstrate that MSA outperforms state-of-the-art (SoTA) point cloud accumulation methods across all evaluation metrics on the public benchmark, especially with more frames.
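A hedged sketch of the channel-wise fusion idea: two BEV feature maps, one per modality, are blended with learnable per-channel weights predicted from the features themselves. The module below is an illustrative assumption, not MSA's implementation.

```python
import torch
import torch.nn as nn

class ChannelWiseFusion(nn.Module):
    def __init__(self, ch):
        super().__init__()
        # predict a per-channel gate from the concatenated modalities
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(2 * ch, ch, 1), nn.Sigmoid())

    def forward(self, bev_lidar, bev_cam):
        w = self.gate(torch.cat([bev_lidar, bev_cam], dim=1))  # (B, C, 1, 1)
        return w * bev_lidar + (1 - w) * bev_cam               # adaptive blend

lidar = torch.rand(1, 64, 128, 128)  # sparse-point BEV features (toy)
cam = torch.rand(1, 64, 128, 128)    # multi-view image BEV features (toy)
print(ChannelWiseFusion(64)(lidar, cam).shape)  # torch.Size([1, 64, 128, 128])
```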
{"title":"Multimodal spatiotemporal aggregation for point cloud accumulation","authors":"","doi":"10.1016/j.jvcir.2024.104243","DOIUrl":"10.1016/j.jvcir.2024.104243","url":null,"abstract":"<div><p>Point cloud accumulation is a crucial technique in point cloud analysis, facilitating various downstream tasks like surface reconstruction. Current methods merely rely on raw LiDAR points, yielding unsatisfactory performance due to the limited geometric information, particularly in complex scenarios characterized by intricate motions, diverse objects, and an increased number of frames. In this paper, we introduce camera modality data, which is usually acquired alongside LiDAR data at minimal expense. To this end, we present the <strong>M</strong>ultimodal <strong>S</strong>patiotemporal <strong>A</strong>ggregation solution (termed MSA) to thoroughly explore and aggregate these two distinct modalities (sparse 3D points and multi-view 2D images). Concretely, we propose a multimodal spatial aggregation module to bridge the data gap between different modalities in the Bird’s-Eye-View (BEV) space and further fuse them by learnable adaptive channel-wise weights. By assembling their respective strengths, this module generates a reliable and consistent scene representation. Subsequently, we design a temporal aggregation module to capture continuous motion information across consecutive sequences, which is beneficial for identifying the motion state of the foreground scene and enabling the model to extend robustly to longer sequences. Experiments demonstrate MSA outperforms state-of-the-art (SoTA) point cloud accumulation methods across all evaluation metrics in the public benchmark, especially with more frames.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141782633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning saliency-awareness Siamese network for visual object tracking
Pub Date: 2024-08-01 | DOI: 10.1016/j.jvcir.2024.104237
Siamese trackers have emerged as the predominant paradigm in visual tracking owing to their robust similarity matching. However, relying on dense regression strategies to predict the target's axis-aligned bounding box often includes excessive background pixels, which can compromise the model's accuracy, especially when tracking non-rigid targets. To tackle this issue, this paper presents a novel saliency-awareness Siamese network for visual object tracking. Compared to bounding box regression networks, our method achieves accurate pixel-level target tracking. Specifically, a two-level U-structure encoder-decoder model is tailored to learn the saliency of backbone features. Additionally, a dual-pipeline parallel tracking framework is proposed that integrates this model into the Siamese network, allowing top-down multi-stage saliency mask prediction. Finally, a convolutional head generates a precise binary mask for tracking. Extensive experiments on four benchmarks, VOT2016, VOT2019, GOT-10k, and UAV123, demonstrate that our tracker achieves superior tracking performance.
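A small numerical illustration of the motivation: even the tightest axis-aligned box around a non-rectangular target contains a substantial fraction of background pixels, which a pixel-level mask avoids.

```python
import numpy as np

def mask_to_axis_aligned_box(mask):
    """Tightest axis-aligned bounding box (x0, y0, x1, y1) of a binary mask."""
    ys, xs = np.nonzero(mask)
    return xs.min(), ys.min(), xs.max(), ys.max()

mask = np.zeros((100, 100), dtype=bool)
ys, xs = np.ogrid[:100, :100]
mask[(ys - 50) ** 2 + (xs - 50) ** 2 <= 30 ** 2] = True   # a round target

x0, y0, x1, y1 = mask_to_axis_aligned_box(mask)
box_area = (x1 - x0 + 1) * (y1 - y0 + 1)
print(mask.sum() / box_area)  # ~0.76: about a quarter of the box is background
```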
{"title":"Learning saliency-awareness Siamese network for visual object tracking","authors":"","doi":"10.1016/j.jvcir.2024.104237","DOIUrl":"10.1016/j.jvcir.2024.104237","url":null,"abstract":"<div><p>Siamese trackers have emerged as the predominant paradigm in visual tracking owing to their robust similarity matching. However, relying on dense regression strategies for predicting the target’s axis-aligned bounding box often leads to excessive background pixels. This limitation can compromise the model’s accuracy, especially when tracking non-rigid targets. To tackle this issue, this paper presents a novel saliency-awareness Siamese network for visual object tracking. Compared to the bounding box regression network, our method achieves accurate pixel-level target tracking. Specifically, a two-level U-structure encode-decode model is tailored to learn the saliency of backbone features. Additionally, a dual-pipeline parallel tracking framework is proposed, allowing top-down multi-stage saliency mask prediction, by integrating the aforementioned model into the Siamese network. Finally, a convolutional head is devised to generate a precise binary mask for tracking. Extensive experiments on four benchmarks, including VOT2016, VOT2019, GOT-10k, and UAV123, demonstrate that our tracker achieves superior tracking performance.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141707669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Performance evaluation of efficient segmentation and classification based iris recognition using sheaf attention network
Pub Date: 2024-08-01 | DOI: 10.1016/j.jvcir.2024.104262
Iris recognition, a precise biometric identification technique, relies on the distinct epigenetic patterns within the iris. Existing methods often face challenges in segmentation accuracy and classification efficiency. To improve the accuracy and efficiency of iris recognition systems, this research proposes an innovative approach focusing on efficient segmentation and classification using convolutional neural networks with sheaf attention networks (CSAN). The main objective is to develop an integrated framework that optimizes iris segmentation and classification. A dense extreme inception multipath guided up-sampling network is employed for accurate segmentation, and classifiers including the convolutional neural network with sheaf attention networks are then evaluated. The findings indicate that the proposed method achieves superior iris recognition accuracy and robustness, making it suitable for applications such as secure authentication and access control. Compared with existing approaches, CSAN obtains accuracies of 99.98%, 99.35%, 99.45%, and 99.65% on the four proposed datasets, respectively.
{"title":"Performance evaluation of efficient segmentation and classification based iris recognition using sheaf attention network","authors":"","doi":"10.1016/j.jvcir.2024.104262","DOIUrl":"10.1016/j.jvcir.2024.104262","url":null,"abstract":"<div><p>Iris recognition, a precise biometric identification technique, relies on the distinct epigenetic patterns within the iris. Existing methods often face challenges related to segmentation accuracy and classification efficiency. To improve the accuracy and efficiency of iris recognition systems, this research proposes an innovative approach for iris recognition, focusing on efficient segmentation and classification using Convolutional neural networks with Sheaf Attention Networks (CSAN). Main objective is to develop an integrated framework that optimizes iris segmentation and classification. Subsequently, dense extreme inception multipath guided up sampling network is employed for accurate segmentation. Finally, classifiers including convolutional neural network with sheaf attention networks are evaluated. The findings indicate that the proposed method achieves superior iris recognition accuracy and robustness, making it suitable for applications such as secure authentication and access control. By comparing with existing approaches CSAN obtains 99.98%, 99.35%, 99.45% and 99.65% accuracy for the four different proposed datasets respectively.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142058394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detection of HEVC double compression based on boundary effect of TU and non-zero DCT coefficient distribution
Pub Date: 2024-08-01 | DOI: 10.1016/j.jvcir.2024.104255
Video content tampering is a growing concern. To tamper with content, a perpetrator needs to decompress and re-compress the video, so detecting whether a video has undergone double compression is an important issue in video forensics. In this paper, a novel method is proposed for detecting High Efficiency Video Coding (HEVC) double compression. First, theoretical and statistical analyses are presented of the quality degradation during double compression and of the impact of the Quantization Parameter (QP) on the Transform Unit (TU) and the Discrete Cosine Transform (DCT) coefficient distribution. Three sub-features based on the boundary effect of TUs and on non-zero DCT coefficients are then calculated and combined into a 26-dimensional feature, which is finally fed to a Multilayer Perceptron (MLP) classifier. Experiments on video sets with different settings show that our method achieves better performance than two existing methods under several situations.
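A hedged sketch of the final classification stage on synthetic data: three sub-feature vectors are concatenated into the 26-dimensional feature the paper names and fed to an MLP classifier. The per-sub-feature dimensions and the toy labels are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n = 200
# stand-ins for the three sub-features (TU boundary effect + DCT statistics)
sub1, sub2, sub3 = rng.random((n, 10)), rng.random((n, 10)), rng.random((n, 6))
X = np.hstack([sub1, sub2, sub3])           # combined 26-dimensional feature
y = (X[:, 0] + X[:, 10] > 1.0).astype(int)  # toy "double compressed" label

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X[:150], y[:150])                   # train on the first 150 samples
print(clf.score(X[150:], y[150:]))          # held-out accuracy on toy data
```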
{"title":"Detection of HEVC double compression based on boundary effect of TU and non-zero DCT coefficient distribution","authors":"","doi":"10.1016/j.jvcir.2024.104255","DOIUrl":"10.1016/j.jvcir.2024.104255","url":null,"abstract":"<div><p>Video content tampering is a growing concern. While tampering, perpetrator needs to decompress and re-compress a video. Thus, detecting whether a video has undergone double compression is an important issue in video forensics. In this paper, a novel method is proposed for the detection of High Efficiency Video Coding (HEVC) double compression. Firstly, theoretical and statistical analysis are stated on the quality degradation during double compression, and the impact of Quantization Parameter (QP) on Transforming Unit (TU) and Discrete Cosine Transform (DCT) coefficients distribution. Then 3 sub-features based on Boundary Effect of TU and Non-Zero DCT Coefficients are calculated. Further, 3 sub-features are combined into a 26-dimension feature as the proposed detection feature, which is finally fed to the Multilayer Perceptron (MLP) classifier. Experiments are conducted on video sets with different settings, and the result proves that our method achieves better performance under several situations compared with two existing methods.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141933779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}