Image and Vision Computing: Latest Articles


Efficient and robust multi-camera 3D object detection in bird-eye-view

IF 4.2 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Image and Vision Computing

Pub Date : 2025-02-01 DOI: 10.1016/j.imavis.2025.105428

Yuanlong Wang, Hengtao Jiang, Guanying Chen, Tong Zhang, Jiaqing Zhou, Zezheng Qing, Chunyan Wang, Wanzhong Zhao

Bird's-eye view (BEV) representations are increasingly used in autonomous driving perception because they provide a comprehensive, unobstructed view of the vehicle's surroundings. Compared to transformer- or depth-based methods, ray-transformation-based methods are more efficient and better suited to vehicle deployment. However, these methods typically depend on accurate extrinsic camera parameters, making them vulnerable to performance degradation when calibration errors or installation changes occur. In this work, we follow ray-transformation-based methods and propose an extrinsic-parameter-free approach, which reduces reliance on accurate offline extrinsic calibration by using a neural network to predict extrinsic parameters online, thereby improving the robustness of the model. In addition, we propose a multi-level and multi-scale image encoder to better encode image features and adopt a more intensive temporal fusion strategy. Our framework contains four main designs: (1) a multi-level and multi-scale image encoder, which leverages inter-layer and intra-layer multi-scale information for better performance; (2) a ray transformation with the extrinsic-parameter-free approach, which transfers image features to BEV space and lessens the impact of extrinsic disturbances on the model's detection performance; (3) an intensive temporal fusion strategy using motion information from five historical frames; and (4) a high-performance BEV encoder that efficiently reduces the spatial dimensions of a voxel-based feature map and fuses the multi-scale and multi-frame BEV features. Experiments on nuScenes show that our best model (R101@900 × 1600) achieves a competitive 41.7% mAP and 53.8% NDS on the validation set, outperforming several state-of-the-art visual BEV models in 3D object detection.
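To illustrate why methods of this family are sensitive to extrinsic calibration, the following is a minimal sketch of a pinhole projection from a 3D point to a pixel, the kind of lookup a ray transformation could rely on. It is not taken from the paper: the function name `project_point`, the identity rotation, and the intrinsic values are all hypothetical, chosen only to make the arithmetic easy to follow.

```python
def project_point(point_ego, rotation, translation, intrinsics):
    """Project a 3D point from the ego frame to pixel coordinates.

    rotation (3x3 nested list) and translation (length 3) are the assumed
    extrinsics, mapping ego coordinates to camera coordinates:
    p_cam = R * p_ego + t. intrinsics is (fx, fy, cx, cy) in pixels.
    Returns (u, v), or None when the point lies behind the camera.
    """
    x, y, z = point_ego
    cam = [
        rotation[i][0] * x + rotation[i][1] * y + rotation[i][2] * z + translation[i]
        for i in range(3)
    ]
    if cam[2] <= 0:
        return None  # not visible from this camera
    fx, fy, cx, cy = intrinsics
    return (fx * cam[0] / cam[2] + cx, fy * cam[1] / cam[2] + cy)


# Hypothetical camera: identity rotation, 1000 px focal length,
# principal point at (800, 450).
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
INTRINSICS = (1000.0, 1000.0, 800.0, 450.0)

calibrated = project_point((1.0, 0.5, 10.0), IDENTITY, [0.0, 0.0, 0.0], INTRINSICS)
# The same point with a 10 cm error in the assumed camera position.
perturbed = project_point((1.0, 0.5, 10.0), IDENTITY, [0.1, 0.0, 0.0], INTRINSICS)
print(calibrated, perturbed)
```

In this toy setup a 10 cm translation error moves the sampled pixel by roughly 10 px at 10 m depth, which suggests how a fixed lookup could end up sampling the wrong image features when the calibration drifts.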

Volume 154, Article 105428

Citations: 0

Advancing brain tumor segmentation and grading through integration of FusionNet and IBCO-based ALCResNet

IF 4.2 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Image and Vision Computing

Pub Date : 2025-02-01 DOI: 10.1016/j.imavis.2025.105432

Abbas Rehman, Gu Naijie, Asma Aldrees, Muhammad Umer, Abeer Hakeem, Shtwai Alsubai, Lucia Cascone

Brain tumors represent a significant global health challenge, characterized by uncontrolled cerebral cell growth. The variability in size, shape, and anatomical positioning complicates computational classification, which is crucial for effective treatment planning. Accurate detection is essential, as even small diagnostic inaccuracies can significantly increase the mortality risk. Tumor grade stratification is also critical for automated diagnosis; however, current deep learning models often fall short in achieving the desired effectiveness. In this study, we propose an advanced approach that leverages cutting-edge deep learning techniques to improve early detection and tumor severity grading, facilitating automated diagnosis. Clinical bioinformatics datasets are used to source representative brain tumor images, which undergo pre-processing and data augmentation via a Generative Adversarial Network (GAN). The images are then classified using the Adaptive Layer Cascaded ResNet (ALCResNet) model, optimized with the Improved Border Collie Optimization (IBCO) algorithm for enhanced diagnostic accuracy. The integration of FusionNet for precise segmentation and the IBCO-enhanced ALCResNet for optimized feature extraction and classification forms a novel framework. This unique combination ensures not only accurate segmentation but also enhanced precision in grading tumor severity, addressing key limitations of existing methodologies. For segmentation, the FusionNet deep learning model is employed to identify abnormal regions, which are subsequently classified as Meningioma, Glioma, or Pituitary tumors using ALCResNet. Experimental results demonstrate significant improvements in tumor identification and severity grading, with the proposed method achieving superior precision (99.79%) and accuracy (99.33%) compared to existing classifiers and heuristic approaches.

Volume 154, Article 105432

Citations: 0

EMA-GS: Improving sparse point cloud rendering with EMA gradient and anchor upsampling

IF 4.2 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Image and Vision Computing

Pub Date : 2025-02-01 DOI: 10.1016/j.imavis.2025.105433

Ding Yuan, Sizhe Zhang, Hong Zhang, Yangyan Deng, Yifan Yang

The 3D Gaussian Splatting (3D-GS) technique combines 3D Gaussian primitives with differentiable rasterization for real-time, high-quality novel view synthesis. However, in sparse regions of the initial point cloud, it often produces blurring and needle-like artifacts owing to the inadequacies of the existing densification criterion. To address this, an approach is introduced that uses the Exponential Moving Average (EMA) of homodirectional positional gradients as the densification criterion. Additionally, in the early stages of training, anchors are upsampled near representative locations to infill details into the sparse initial point clouds. Tests on challenging datasets such as Mip-NeRF 360, Tanks and Temples, and DeepBlending demonstrate that the proposed method achieves fine detail recovery without redundant Gaussians, handling complex scenes with high-quality reconstruction and without requiring excessive storage. The code will be available upon the acceptance of the article.
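The idea of an EMA-based criterion can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' method: it tracks a plain EMA of a 2D positional gradient vector and does not attempt to reproduce the paper's handling of homodirectional gradients. The decay `beta`, the threshold, and the function names are hypothetical.

```python
import math


def update_ema(ema, grad, beta=0.9):
    """One EMA step on a gradient vector: ema <- beta * ema + (1 - beta) * grad."""
    return tuple(beta * e + (1.0 - beta) * g for e, g in zip(ema, grad))


def should_densify(ema, threshold):
    """Flag a primitive for densification when the EMA magnitude is large."""
    return math.hypot(*ema) > threshold


# A primitive whose gradient keeps pointing the same way.
consistent = (0.0, 0.0)
for _ in range(50):
    consistent = update_ema(consistent, (1.0, 0.0))

# A primitive whose gradient flips sign every step.
oscillating = (0.0, 0.0)
for step in range(50):
    sign = 1.0 if step % 2 == 0 else -1.0
    oscillating = update_ema(oscillating, (sign, 0.0))

print(should_densify(consistent, 0.5), should_densify(oscillating, 0.5))
```

Because the average is taken over vectors instead of magnitudes, gradients that repeatedly reverse direction largely cancel, while a persistent gradient accumulates toward its full value; that is one plausible reason to prefer a smoothed signal over a single-step gradient norm.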

Citations: 0

Pixel integration from fine to coarse for lightweight image super-resolution

IF 4.2 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Image and Vision Computing

Pub Date : 2025-02-01 DOI: 10.1016/j.imavis.2024.105362

Yuxiang Wu, Xiaoyan Wang, Xiaoyan Liu, Yuzhao Gao, Yan Dou

Recently, Transformer-based methods have made significant progress on image super-resolution. They encode long-range dependencies between image patches through a self-attention mechanism. However, extracting all tokens from the entire feature map is computationally expensive. In this paper, we propose a novel lightweight image super-resolution approach, the pixel integration network (PIN). Specifically, our method employs fine pixel integration and coarse pixel integration from local and global receptive fields. In particular, coarse pixel integration is implemented by a retractable attention, consisting of dense and sparse self-attention. To enrich features with contextual information, a spatial-gate mechanism and depth-wise convolution are introduced into the multi-layer perceptron. In addition, a spatial frequency fusion block is adopted to obtain more comprehensive, detailed, and stable information at the end of deep feature extraction. Extensive experiments demonstrate that PIN achieves state-of-the-art performance with few parameters on lightweight super-resolution.

Volume 154, Article 105362

Citations: 0

Understanding document images by introducing explicit semantic information and short-range information interaction

IF 4.2 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Image and Vision Computing

Pub Date : 2025-02-01 DOI: 10.1016/j.imavis.2024.105392

Yufeng Cheng, Dongxue Wang, Shuang Bai, Jingkai Ma, Chen Liang, Kailong Liu, Tao Deng

Methods for the document visual question answering (DocVQA) task have achieved great success by using pre-trained multimodal models. However, two issues limit further improvement in their performance. First, previous methods did not use explicit semantic information for answer prediction. Second, these methods predict answers based only on the results of global information interaction and therefore generate low-quality answers. To address these issues, we propose to use document semantic segmentation to introduce explicit semantic information of documents into the DocVQA task, and we design a star-shaped topology structure to enable the interaction of different tokens in short-range contexts. In this way, we obtain token representations with richer multimodal and contextual information for the DocVQA task. With these two strategies, our method achieves 0.8430 ANLS (Average Normalized Levenshtein Similarity) on the test set of the DocVQA dataset, demonstrating its effectiveness.
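For readers unfamiliar with the ANLS metric quoted above, here is a minimal sketch of how it is commonly computed: each prediction is scored against every accepted answer by one minus the normalized Levenshtein distance, scores whose distance reaches a threshold (commonly 0.5) are zeroed, the best score per question is kept, and the results are averaged. The lowercasing and whitespace stripping are assumptions about normalization, and this sketch is not the official evaluation script.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def anls(predictions, ground_truths, tau=0.5):
    """Average Normalized Levenshtein Similarity over a list of questions.

    predictions: one predicted string per question.
    ground_truths: one list of accepted answer strings per question.
    """
    if not predictions:
        return 0.0
    total = 0.0
    for pred, answers in zip(predictions, ground_truths):
        best = 0.0
        for ans in answers:
            p, a = pred.strip().lower(), ans.strip().lower()
            longest = max(len(p), len(a))
            nl = levenshtein(p, a) / longest if longest else 0.0
            # Distances at or above tau count as a miss.
            best = max(best, 1.0 - nl if nl < tau else 0.0)
        total += best
    return total / len(predictions)


score = anls(["invoice", "2024", "cat"], [["Invoice", "INV"], ["2025"], ["dog"]])
print(round(score, 4))
```

The threshold means a near miss such as "2024" for "2025" still earns partial credit (0.75 here), while an unrelated answer earns nothing.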

Volume 154, Article 105392

Citations: 0

TPSFusion: A Transformer-based pyramid screening fusion network for 6D pose estimation

IF 4.2 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Image and Vision Computing

Pub Date : 2025-02-01 DOI: 10.1016/j.imavis.2024.105402

Jiaqi Zhu, Bin Li, Xinhua Zhao

RGB-D based 6D pose estimation is a key technology for autonomous driving and robotics applications. Recently, methods based on dense correspondence have made great progress. However, they still suffer from a heavy computational burden and insufficient fusion of the two modalities. In this paper, we propose a novel 6D pose estimation algorithm (TPSFusion) based on a Transformer and multi-level pyramid fusion features. We first introduce a Multi-modal Features Fusion module, composed of the Multi-modal Attention Fusion block (MAF) and the Multi-level Screening-feature Fusion block (MSF), to enable high-quality cross-modality information interaction. Subsequently, we introduce a new weight estimation branch to calculate the contribution of different keypoints. Our method achieves competitive results on the YCB-Video, LineMOD, and Occlusion LineMOD datasets.

Citations: 0

Cross-set data augmentation for semi-supervised medical image segmentation

IF 4.2 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Image and Vision Computing

Pub Date : 2025-02-01 DOI: 10.1016/j.imavis.2024.105407

Qianhao Wu, Xixi Jiang, Dong Zhang, Yifei Feng, Jinhui Tang

Medical image semantic segmentation is a fundamental yet challenging research task. However, training a fully supervised model for this task requires a substantial amount of pixel-level annotated data, which poses a significant challenge for annotators due to the necessity of specialized medical expert knowledge. To mitigate the labeling burden, a semi-supervised medical image segmentation model that leverages both a small quantity of labeled data and a substantial amount of unlabeled data has attracted prominent attention. However, the performance of current methods is constrained by the distribution mismatch problem between limited labeled and unlabeled datasets. To address this issue, we propose a cross-set data augmentation strategy aimed at minimizing the feature divergence between labeled and unlabeled data. Our approach involves mixing labeled and unlabeled data, as well as integrating ground truth with pseudo-labels to produce augmented samples. By employing three distinct cross-set data augmentation strategies, we enhance the diversity of the training dataset and fully exploit the perturbation space. Our experimental results on COVID-19 CT data, spinal cord gray matter MRI data and prostate T2-weighted MRI data substantiate the efficacy of our proposed approach. The code has been released at: CDA.
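The abstract does not spell out its three augmentation strategies, so the following is only a generic sketch of what mixing a labeled and an unlabeled sample could look like. It assumes a CutMix-style region swap driven by a binary mask, with the same mask applied to the ground-truth label and the pseudo-label so that image and supervision stay aligned. The function name and the toy 2 × 2 arrays are hypothetical.

```python
def cross_set_mix(labeled_img, label, unlabeled_img, pseudo_label, mask):
    """Mix a labeled and an unlabeled sample with a binary mask.

    Where mask[r][c] is 1 the pixel (and its supervision) comes from the
    labeled sample; where it is 0 it comes from the unlabeled sample and
    its pseudo-label. All inputs are equally sized 2D nested lists.
    """
    rows, cols = len(mask), len(mask[0])

    def blend(a, b):
        return [
            [a[r][c] if mask[r][c] else b[r][c] for c in range(cols)]
            for r in range(rows)
        ]

    return blend(labeled_img, unlabeled_img), blend(label, pseudo_label)


labeled_img = [[1, 1], [1, 1]]
label = [[1, 0], [0, 0]]          # ground-truth segmentation
unlabeled_img = [[9, 9], [9, 9]]
pseudo_label = [[0, 0], [0, 1]]   # model prediction on the unlabeled image
mask = [[1, 1], [0, 0]]           # top row from labeled, bottom row from unlabeled

mixed_img, mixed_label = cross_set_mix(
    labeled_img, label, unlabeled_img, pseudo_label, mask
)
print(mixed_img, mixed_label)
```

A training sample built this way contains pixels from both sets, which is one way a model could be exposed to both distributions within a single image.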

Volume 154, Article 105407

Citations: 0

AGSAM-Net: UAV route planning and visual guidance model for bridge surface defect detection

IF 4.2 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Image and Vision Computing

Pub Date : 2025-02-01 DOI: 10.1016/j.imavis.2025.105416

Rongji Li, Ziqian Wang

Crack width is a critical indicator of bridge structural health. This paper proposes a UAV-based method for detecting bridge surface defects and quantifying crack width, aiming to improve efficiency and accuracy. The system integrates a UAV with a visual navigation system to capture high-resolution images (7322 × 5102 pixels) and GPS data, followed by image resolution computation and plane correction. For crack detection and segmentation, we introduce AGSAM-Net, a multi-class semantic segmentation network enhanced with attention gating to accurately identify and segment cracks at the pixel level. The system processes 8064 × 6048 pixel images in 2.4 s, with a detection time of 0.5 s per 540 × 540 pixel crack bounding box. By incorporating distance data, the system achieves over 90% accuracy in crack width quantification across multiple datasets. The study also explores potential collaboration with robotic arms, offering new insights into automated bridge maintenance.
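The step from a pixel-level crack mask to a physical width depends on the image resolution at the bridge surface. A minimal sketch of that conversion, assuming a simple pinhole camera facing the surface head-on, is shown below. The formula is the standard ground sampling distance relation, but the function names and the camera values (35 mm focal length, 3.5 µm pixel pitch, 2 m distance) are hypothetical and not taken from the paper.

```python
def ground_sampling_distance_mm(distance_m, focal_length_mm, pixel_pitch_um):
    """Millimetres of surface covered by one pixel under a pinhole model.

    GSD = distance * pixel_pitch / focal_length, with all three lengths
    converted to millimetres first.
    """
    distance_mm = distance_m * 1000.0
    pixel_pitch_mm = pixel_pitch_um / 1000.0
    return distance_mm * pixel_pitch_mm / focal_length_mm


def crack_width_mm(width_px, distance_m, focal_length_mm, pixel_pitch_um):
    """Convert a crack width measured in pixels to millimetres."""
    gsd = ground_sampling_distance_mm(distance_m, focal_length_mm, pixel_pitch_um)
    return width_px * gsd


# Hypothetical setup: camera 2 m from the surface.
gsd = ground_sampling_distance_mm(2.0, 35.0, 3.5)
width = crack_width_mm(4, 2.0, 35.0, 3.5)
print(f"{gsd:.3f} mm/px, crack width {width:.2f} mm")
```

With these assumed values each pixel covers about 0.2 mm, so a crack four pixels wide corresponds to roughly 0.8 mm. This also shows why an error in the measured distance carries over proportionally into the estimated width.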

Citations: 0

Co-salient object detection with consensus mining and consistency cross-layer interactive decoding

IF 4.2 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Image and Vision Computing

Pub Date : 2025-02-01 DOI: 10.1016/j.imavis.2025.105414

Yanliang Ge, Jinghuai Pan, Junchao Ren, Min He, Hongbo Bi, Qiao Zhang

The main goal of co-salient object detection (CoSOD) is to extract a group of notable objects that appear together in the image. The existing methods face two major challenges: the first is that in some complex scenes or in the case of interference by other salient objects, the mining of consensus cues for co-salient objects is inadequate; the second is that other methods input consensus cues from top to bottom into the decoder, which ignores the compactness of the consensus and lacks cross-layer interaction. To solve the above problems, we propose a consensus mining and consistency cross-layer interactive decoding network, called CCNet, which consists of two key components, namely, a consensus cue mining module (CCM) and a consistency cross-layer interactive decoder (CCID). Specifically, the purpose of CCM is to fully mine the cross-consensus clues among the co-salient objects in the image group, so as to achieve the group consistency modeling of the group of images. Furthermore, CCID accepts features of different levels as input and receives semantic information of group consensus from CCM, which is used to guide features of other levels to learn higher-level feature representations and cross-layer interaction of group semantic consensus clues, thereby maintaining the consistency of group consensus cues and enabling accurate co-saliency map prediction. We evaluated the proposed CCNet using four widely accepted metrics across three challenging CoSOD datasets and the experimental results demonstrate that our proposed approach outperforms other existing state-of-the-art CoSOD methods, particularly on the CoSal2015 and CoSOD3k datasets. The results of our method are available at https://github.com/jinghuaipan/CCNet.

Volume 154, Article 105414

Citations: 0

Transparency and privacy measures of biometric patterns for data processing with synthetic data using explainable artificial intelligence

IF 4.2 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Image and Vision Computing

Pub Date : 2025-02-01 DOI: 10.1016/j.imavis.2025.105429

Achyut Shankar, Hariprasath Manoharan, Adil O. Khadidos, Alaa O. Khadidos, Shitharth Selvarajan, S.B. Goyal

In this paper, the need for biometric authentication with synthetic data is analyzed with the aim of increasing data security in transmission systems. As more biometric patterns are represented, the complexity of recognition changes, and low-security features are enabled in the transmission process. Hence, security is increased using image biometric patterns, where synthetic data is created with an explainable artificial intelligence technique so that appropriate decisions can be made. Further, sample data is generated in each case so that all changing representations are minimized as the original image set values increase. Moreover, the data flows at each identified biometric pattern are increased, with partial decisive strategies followed in the proposed approach. Furthermore, the complete interpretabilities present in captured images or biometric patterns are reduced, so that the generated data is maximized for all end users. To verify the outcome of the proposed approach, four scenarios with comparative performance metrics are simulated; the comparative analysis finds that the proposed approach is less robust and complex at a rate of 4% and 6%, respectively.

Volume 154, Article 105429

Citations: 0
