
2021 IEEE International Conference on Image Processing (ICIP): Latest Publications

Joint Anomaly Detection and Inpainting for Microscopy Images Via Deep Self-Supervised Learning
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506454
Ling Huang, Deruo Cheng, Xulei Yang, Tong Lin, Yiqiong Shi, Kaiyi Yang, B. Gwee, B. Wen
While microscopy enables material scientists to view and analyze microstructures, the imaging results often include defects and anomalies with varied shapes and locations. The presence of such anomalies significantly degrades the quality of microscopy images and the subsequent analytical tasks. Compared to classic feature-based methods, recent advancements in deep learning provide a more efficient, accurate, and scalable approach to detect and remove anomalies in microscopy images. However, most deep inpainting and anomaly detection schemes require a certain level of supervision, i.e., either annotation of the anomalies or a corpus of purely normal data, both of which are limited in practice for supervision-starved microscopy applications. In this work, we propose a self-supervised deep learning scheme for joint anomaly detection and inpainting of microscopy images. The proposed anomaly detection model can be trained over a mixture of normal and abnormal microscopy images without any labeling. Instead of a two-stage scheme, our multi-task model simultaneously detects abnormal regions and removes the defects via joint training. To benchmark such microscopy applications under a real-world setup, we propose a novel dataset of real microscopic images of integrated circuits, dubbed MIIC. The proposed dataset contains tens of thousands of normal microscopic images, of which we labeled hundreds containing various imaging and manufacturing anomalies and defects for testing. Experiments show that the proposed model outperforms various popular and state-of-the-art competing methods for both microscopy image anomaly detection and inpainting.
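As an illustration of what such a jointly trained model can look like, below is a minimal PyTorch-style sketch of a shared encoder with an anomaly-mask head and an inpainting head, trained with a self-supervised loss that only reconstructs regions predicted as normal. The architecture, channel sizes, and loss weighting are assumptions for illustration, not the authors' exact network.

```python
import torch
import torch.nn as nn

class JointAnomalyNet(nn.Module):
    """Shared encoder with two heads: an anomaly-mask head and an inpainting head."""
    def __init__(self, ch=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Head 1: per-pixel anomaly probability (detection).
        self.mask_head = nn.Sequential(
            nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 1), nn.Sigmoid(),
        )
        # Head 2: reconstructed (inpainted) image.
        self.inpaint_head = nn.Sequential(
            nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 1),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.mask_head(z), self.inpaint_head(z)

def joint_loss(x, mask, recon, lam=0.1):
    # Reconstruction is weighted towards regions predicted as normal, so the model
    # learns to "explain" normal structure and to flag what it cannot explain.
    recon_err = (recon - x) ** 2
    l_inpaint = ((1 - mask) * recon_err).mean()
    l_sparsity = lam * mask.mean()        # discourage marking everything as anomalous
    return l_inpaint + l_sparsity

model = JointAnomalyNet()
imgs = torch.rand(4, 1, 64, 64)           # unlabeled microscopy patches
mask, recon = model(imgs)
loss = joint_loss(imgs, mask, recon)
loss.backward()
```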
Citations: 2
3d Point Cloud Completion Using Stacked Auto-Encoder For Structure Preservation
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506398
S. Kumari, S. Raman
The 3D point cloud completion problem deals with completing a shape from partial points. It finds use in many vision-related applications. Here, structure plays an important role. Most existing approaches either do not consider structural information or consider structure at the decoder only. For maintaining the structure, it is also necessary to maintain the positions of the available 3D points, yet most approaches neglect this aspect. In this paper, we propose to employ a stacked auto-encoder in conjunction with a shared Multi-Layer Perceptron (MLP). The MLP converts each 3D point into a feature vector, and the stacked auto-encoder helps maintain the available structural positions of the input points. Further, it exploits the redundancy present in the feature vector, which helps incorporate coarse-to-fine scale information for a better shape representation. The embedded feature is finally decoded by a structure-preserving decoder. Both the encoding and the decoding operations of our method take care of preserving the structure of the available shape information. The experimental results demonstrate the structure-preserving capability of our network compared to state-of-the-art methods.
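The following is a minimal PyTorch sketch of the encoder described here: a shared MLP (implemented as 1x1 convolutions applied identically to every point) followed by a stacked auto-encoder over the pooled global feature. Dimensions and layer counts are illustrative assumptions, and the structure-preserving decoder is omitted.

```python
import torch
import torch.nn as nn

class PointEncoder(nn.Module):
    """Shared MLP (per-point), followed by a stacked auto-encoder on the pooled feature."""
    def __init__(self, feat=128, codes=(64, 32)):
        super().__init__()
        # Shared MLP: the same weights are applied to every 3D point (Conv1d over points).
        self.shared_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, feat, 1), nn.ReLU(),
        )
        # Stacked auto-encoder: progressively narrower codes on the global feature.
        dims = (feat,) + codes
        self.encoders = nn.ModuleList(
            nn.Linear(dims[i], dims[i + 1]) for i in range(len(dims) - 1))
        self.decoders = nn.ModuleList(
            nn.Linear(dims[i + 1], dims[i]) for i in reversed(range(len(dims) - 1)))

    def forward(self, pts):                       # pts: (B, N, 3) partial point cloud
        f = self.shared_mlp(pts.transpose(1, 2))  # (B, feat, N) per-point features
        g = f.max(dim=2).values                   # (B, feat) global feature via max pooling
        h = g
        for enc in self.encoders:                 # encode through the stack
            h = torch.relu(enc(h))
        for dec in self.decoders:                 # decode back to the feature size
            h = torch.relu(dec(h))
        return h                                  # refined global feature for a decoder

enc = PointEncoder()
partial = torch.rand(2, 1024, 3)
print(enc(partial).shape)                         # torch.Size([2, 128])
```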
Citations: 2
Hierarchical and Multi-Level Cost Aggregation For Stereo Matching
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506215
Wei Guo, Ziyu Zhu, F. Xia, Jiarui Sun, Yong Zhao
Convolutional neural networks based on deep learning have greatly improved the performance of stereo matching. To obtain higher disparity estimation accuracy in ill-posed regions, this paper proposes a hierarchical and multi-level model based on a novel cost aggregation module (HMLNet). This cost aggregation consists of two main modules: the multi-level cost aggregation module, which incorporates global context information by fusing information across different levels, and the hourglass+ module, which makes fuller use of volumes at the same level to better regularize the cost volumes. We also take advantage of disparity refinement with residual learning to boost robustness in challenging situations. We conducted comprehensive experiments on the Sceneflow, KITTI 2012, and KITTI 2015 datasets. The competitive results prove that our approach outperforms many other stereo matching algorithms.
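As a rough sketch of multi-level cost aggregation, the PyTorch snippet below upsamples a coarse 4D cost volume, fuses it with the fine-level volume via residual 3D convolutions, and regresses disparity with a soft-argmin. It is an assumption-laden illustration of the general idea, not the HMLNet or hourglass+ implementation; channel counts and resolutions are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLevelAggregation(nn.Module):
    """Fuse cost volumes of shape (B, C, D, H, W) computed at two resolutions."""
    def __init__(self, ch=16):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv3d(ch * 2, ch, 3, padding=1), nn.ReLU(),
            nn.Conv3d(ch, ch, 3, padding=1),
        )

    def forward(self, cost_fine, cost_coarse):
        # Upsample the coarse volume (over disparity, height, width) to the fine
        # resolution, then fuse both levels so global context reaches the fine level.
        up = F.interpolate(cost_coarse, size=cost_fine.shape[2:],
                           mode='trilinear', align_corners=False)
        fused = torch.cat([cost_fine, up], dim=1)
        return cost_fine + self.refine(fused)     # residual refinement

agg = MultiLevelAggregation()
fine = torch.rand(1, 16, 48, 64, 128)      # (B, C, D, H, W) at 1/4 resolution
coarse = torch.rand(1, 16, 24, 32, 64)     # at 1/8 resolution
out = agg(fine, coarse)

# Collapse channels and regress disparity with a soft-argmin over the D axis.
vol = out.mean(dim=1)                                            # (B, D, H, W)
prob = F.softmax(-vol, dim=1)
disp_values = torch.arange(vol.shape[1], dtype=torch.float32).view(1, -1, 1, 1)
disp = (prob * disp_values).sum(dim=1)
print(disp.shape)                                                # torch.Size([1, 64, 128])
```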
Citations: 0
Part-Based Feature Squeezing To Detect Adversarial Examples in Person Re-Identification Networks
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506511
Yu Zheng, Senem Velipasalar
Although deep neural networks (DNNs) have achieved top performance in different computer vision tasks, such as object detection, image segmentation and person re-identification (ReID), they can easily be deceived by adversarial examples, which are carefully crafted images with perturbations that are imperceptible to human eyes. Such adversarial examples can significantly degrade the performance of existing DNNs. There are also targeted attacks that mislead classifiers into making specific decisions based on attackers’ intentions. In this paper, we propose a new method to effectively detect adversarial examples presented to a person ReID network. The proposed method utilizes part-based feature squeezing to detect the adversarial examples. We apply two types of squeezing to segmented body parts to better detect adversarial examples. We perform extensive experiments over three major datasets with different attacks, and compare the detection performance of the proposed body part-based approach with a ReID method that is not part-based. Experimental results show that the proposed method can effectively detect the adversarial examples, and has the potential to avoid significant decreases in person ReID performance caused by adversarial examples.
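Feature squeezing, in general, compares a network's response to an input with its response to "squeezed" versions of that input (e.g., reduced bit depth, smoothing); a large discrepancy suggests an adversarial perturbation. The sketch below applies two common squeezers to each body-part crop and thresholds the embedding distance. The choice of squeezers, the threshold, and the assumed `model` (a ReID embedding network) and `part_crops` (crops from a body-part segmenter) are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def reduce_bit_depth(x, bits=4):
    """Squeezer 1: quantize pixel values (x in [0, 1]) to fewer bits."""
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels

def median_blur(x, k=3):
    """Squeezer 2: simple median smoothing using a k x k sliding window."""
    pad = k // 2
    patches = F.unfold(F.pad(x, [pad] * 4, mode='reflect'), k)   # (B, C*k*k, H*W)
    b, _, n = patches.shape
    c = x.shape[1]
    patches = patches.view(b, c, k * k, n)
    return patches.median(dim=2).values.reshape(x.shape)

def is_adversarial(model, part_crops, threshold=1.0):
    """Flag an input as adversarial if any body-part crop reacts strongly to squeezing."""
    score = 0.0
    for crop in part_crops:                      # each crop: (1, 3, H, W) in [0, 1]
        f_orig = model(crop)
        f_bits = model(reduce_bit_depth(crop))
        f_blur = model(median_blur(crop))
        # L1 distance between embeddings of the original and squeezed crops.
        score = max(score,
                    (f_orig - f_bits).abs().mean().item(),
                    (f_orig - f_blur).abs().mean().item())
    return score > threshold, score
```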
Citations: 3
Convolutional Neural Networks for Omnidirectional Image Quality Assessment: Pre-Trained or Re-Trained?
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506192
Abderrezzaq Sendjasni, M. Larabi, F. A. Cheikh
The use of convolutional neural networks (CNNs) for image quality assessment (IQA) has become the focus of many researchers. Various pre-trained models are fine-tuned and used for this task. In this paper, we conduct a benchmark study of seven state-of-the-art pre-trained models for IQA of omnidirectional images. To this end, we first train these models using an omnidirectional database and compare their performance with the pre-trained versions. Then, we compare the use of viewports versus equirectangular (ERP) images as inputs to the models. Finally, for the viewport-based models, we explore the impact of the number of input viewports on the models’ performance. Experimental results demonstrate the performance gain of the re-trained CNNs compared to their pre-trained versions. Also, the viewport-based approach outperforms the ERP-based one independently of the number of selected views.
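A minimal sketch of the re-training setup follows: a pre-trained backbone (ResNet-50 is used here only as a stand-in for the seven benchmarked models) has its classifier replaced by a quality-regression head, and per-viewport scores are averaged into one prediction. The input sizes, number of viewports, and head layout are assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

class ViewportIQA(nn.Module):
    """Pre-trained backbone re-trained (fine-tuned) to regress a quality score per viewport."""
    def __init__(self, freeze_backbone=False):
        super().__init__()
        backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        backbone.fc = nn.Identity()                  # keep the 2048-d pooled features
        if freeze_backbone:                          # "pre-trained" setting: features fixed
            for p in backbone.parameters():
                p.requires_grad = False
        self.backbone = backbone
        self.head = nn.Sequential(nn.Linear(2048, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, viewports):                    # (B, V, 3, H, W): V viewports per image
        b, v, c, h, w = viewports.shape
        feats = self.backbone(viewports.view(b * v, c, h, w))
        scores = self.head(feats).view(b, v)
        return scores.mean(dim=1)                    # aggregate the viewport scores

model = ViewportIQA()
batch = torch.rand(2, 6, 3, 224, 224)                # 6 viewports per omnidirectional image
mos_pred = model(batch)                              # predicted quality scores, shape (2,)
```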
Citations: 0
Learning Of Linear Video Prediction Models In A Multi-Modal Framework For Anomaly Detection
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506049
Giulia Slavic, Abrham Shiferaw Alemaw, L. Marcenaro, C. Regazzoni
This paper proposes a method for performing future-frame prediction and anomaly detection on video data in a multi-modal framework based on Dynamic Bayesian Networks (DBNs). In particular, odometry data and video data from a moving vehicle are fused. A Markov Jump Particle Filter (MJPF) is learned on odometry data, and its features are used to aid the learning of a Kalman Variational Autoencoder (KVAE) on video data. Consequently, anomaly detection can be performed on video data using the learned model. We evaluate the proposed method using multi-modal data from a vehicle performing different tasks in a closed environment.
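To make the general mechanism concrete, the sketch below scores a frame by combining a Kalman-filter innovation on the odometry stream with a video prediction error, in the spirit of filter-based anomaly detection. It is a simplified stand-in, not the MJPF/KVAE implementation, and the state-space matrices and the fusion weight are assumptions.

```python
import numpy as np

def kalman_innovation(x_prev, P_prev, z, F, H, Q, R):
    """One Kalman step on odometry; returns the innovation-based anomaly score and the updated state."""
    x_pred = F @ x_prev
    P_pred = F @ P_prev @ F.T + Q
    innov = z - H @ x_pred                          # how surprising the new odometry reading is
    S = H @ P_pred @ H.T + R
    # Mahalanobis distance of the innovation = odometry anomaly score for this step.
    score = float(innov @ np.linalg.inv(S) @ innov)
    # Standard update so the filter keeps tracking after the anomaly check.
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ innov
    P_new = (np.eye(len(x_prev)) - K @ H) @ P_pred
    return score, x_new, P_new

def frame_anomaly(video_pred, video_obs, odo_score, alpha=0.5):
    """Fuse the video prediction error and the odometry innovation into one anomaly score."""
    video_err = float(np.mean((video_pred - video_obs) ** 2))
    return alpha * video_err + (1 - alpha) * odo_score
```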
Citations: 4
Mobile Registration Number Plate Recognition Using Artificial Intelligence
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506699
Syed Talha Abid Ali, Abdul Hakeem Usama, I. R. Khan, M. Khan, Asif Siddiq
Automatic License Plate Recognition (ALPR) has for years remained a persistent topic of research due to its numerous practical applications, especially in Intelligent Transportation Systems (ITS). Many currently available solutions are still not robust in various real-world circumstances and often impose constraints such as fixed backgrounds and constant distances and camera angles. This paper presents an efficient multi-language repudiate ALPR system based on machine learning. A Convolutional Neural Network (CNN) is trained and fine-tuned for the recognition stage to become more dynamic and pliant to diverse backgrounds. For license plate (LP) detection, the newly released YOLOv5 object detection framework is used. Data augmentation techniques such as grayscale conversion and rotation are also used to generate an augmented dataset for training. The proposed methodology achieved a recognition rate of 92.2%, producing better results than the commercially available systems PlateRecognizer (67%) and OpenALPR (77%). Our experiments validated that the proposed methodology can meet the pressing requirement of real-time analysis in Intelligent Transportation Systems (ITS).
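A sketch of the two-stage pipeline is shown below: YOLOv5 (loaded through the standard ultralytics torch.hub entry point) localizes the plate, and a small CNN classifies segmented characters. The weights file `plate_yolov5.pt`, the `segment_chars` helper, and the `CharCNN` layout are hypothetical placeholders, not the paper's exact components.

```python
import torch
import torch.nn as nn

# Stage 1: plate detection with YOLOv5. Weights fine-tuned on plate images are assumed
# to exist at 'plate_yolov5.pt'; the hub call below is the standard ultralytics entry point.
detector = torch.hub.load('ultralytics/yolov5', 'custom', path='plate_yolov5.pt')

# Stage 2: a small character-recognition CNN (illustrative, not the paper's exact network).
class CharCNN(nn.Module):
    def __init__(self, n_classes=36):               # digits 0-9 plus letters A-Z
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64 * 8 * 8, n_classes)

    def forward(self, x):                            # x: (B, 1, 32, 32) character crops
        return self.classifier(self.features(x).flatten(1))

def read_plate(image, recognizer, segment_chars):
    """Detect the plate, segment characters (helper assumed), and classify each character."""
    dets = detector(image)                           # YOLOv5 inference on a numpy/PIL image
    boxes = dets.xyxy[0]                             # (n, 6): x1, y1, x2, y2, conf, cls
    if len(boxes) == 0:
        return None
    x1, y1, x2, y2 = boxes[0, :4].int().tolist()
    plate = image[y1:y2, x1:x2]
    chars = segment_chars(plate)                     # hypothetical char-segmentation step
    logits = recognizer(torch.stack(chars))
    alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    return "".join(alphabet[i] for i in logits.argmax(dim=1).tolist())
```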
Citations: 2
Listen To The Pixels
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506019
S. Chowdhury, Subhrajyoti Dasgupta, Sudip Das, U. Bhattacharya
Performing sound source separation and visual object segmentation jointly in naturally occurring videos is a notoriously difficult task, especially in the absence of annotated data. In this study, we leverage the concurrency between audio and visual modalities in an attempt to solve the joint audio-visual segmentation problem in a self-supervised manner. Human beings interact with the physical world through a few sensory systems such as vision, hearing, movement, etc. The usefulness of the interplay of such systems lies in the concept of degeneracy [1], which tells us that cross-modal signals can educate each other without the presence of an external supervisor. In this work, we exploit the fact that learning from one modality inherently helps find patterns in others by introducing a novel audio-visual fusion technique. Also, to the best of our knowledge, we are the first to address the partially occluded sound source segmentation task. Our study shows that the proposed model significantly outperforms existing state-of-the-art methods in both visual and audio source separation tasks.
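The snippet below sketches one simple way to fuse the two modalities: an audio embedding is correlated with per-pixel visual features to produce a sound-source segmentation map, while a separate head predicts a spectrogram mask for separation. It is an illustrative assumption about the fusion, not the authors' architecture; all layer sizes are placeholders.

```python
import torch
import torch.nn as nn

class AudioVisualFusion(nn.Module):
    """Correlate an audio embedding with per-pixel visual features to localize the source."""
    def __init__(self, dim=64, freq_bins=256):
        super().__init__()
        self.visual = nn.Sequential(                  # frame -> (B, dim, H/4, W/4)
            nn.Conv2d(3, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.audio = nn.Sequential(                   # spectrogram -> (B, dim)
            nn.Conv2d(1, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.sep_head = nn.Linear(dim, freq_bins)     # frequency mask for separation

    def forward(self, frame, spec):
        v = self.visual(frame)                        # (B, dim, h, w)
        a = self.audio(spec)                          # (B, dim)
        # Correlating the audio vector with every visual location gives a
        # heat-map of "pixels that sound like this audio" (segmentation).
        seg = torch.einsum('bchw,bc->bhw', v, a).sigmoid()
        sep_mask = self.sep_head(a).sigmoid()         # mask over spectrogram frequencies
        return seg, sep_mask

model = AudioVisualFusion()
seg, sep = model(torch.rand(2, 3, 224, 224), torch.rand(2, 1, 256, 64))
print(seg.shape, sep.shape)        # torch.Size([2, 56, 56]) torch.Size([2, 256])
```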
Citations: 3
Joint Co-Attention And Co-Reconstruction Representation Learning For One-Shot Object Detection
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506387
Jinghui Chu, Jiawei Feng, Peiguang Jing, Wei Lu
One-shot object detection aims to detect all candidate instances in a target image whose label class is unavailable during training, given only one labeled query image at test time. Nevertheless, insufficient utilization of the only known sample is a significant cause of the performance degradation of current one-shot object detection models. To tackle this problem, we develop joint co-attention and co-reconstruction (CoAR) representation learning for one-shot object detection. The main contributions are as follows. First, we propose a high-order feature fusion operation to exploit the deep co-attention of each target-query pair, which aims to enhance the correlation within the same class. Second, we use a low-rank structure to reconstruct the target-query feature at the channel level, which aims to remove irrelevant noise and enhance the latent similarity between the region proposals in the target image and the query image. Experiments on both the PASCAL VOC and MS COCO datasets demonstrate that our method outperforms previous state-of-the-art algorithms.
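A compact sketch of the two ideas follows: a co-attention step that weights target locations by their similarity to a pooled query feature, and a low-rank (bottlenecked) channel reconstruction that gates out class-irrelevant channels. Feature sizes and the rank are assumptions; this is not the CoAR implementation.

```python
import torch
import torch.nn as nn

class CoAttentionFusion(nn.Module):
    """Co-attention between query/target features plus a low-rank channel reconstruction."""
    def __init__(self, c=256, rank=32):
        super().__init__()
        self.proj_q = nn.Conv2d(c, c, 1)
        self.proj_t = nn.Conv2d(c, c, 1)
        # Low-rank channel reconstruction: C -> rank -> C bottleneck.
        self.down = nn.Linear(c, rank)
        self.up = nn.Linear(rank, c)

    def forward(self, f_target, f_query):
        # f_target: (B, C, H, W) target-image features; f_query: (B, C, h, w) query features.
        b, c, h, w = f_target.shape
        t = self.proj_t(f_target).flatten(2)                   # (B, C, HW)
        q = self.proj_q(f_query).flatten(2).mean(dim=2)        # (B, C) pooled query vector
        # Co-attention: weight target locations by similarity to the query class.
        attn = torch.softmax(torch.einsum('bcn,bc->bn', t, q), dim=1)   # (B, HW)
        attended = f_target * attn.view(b, 1, h, w)
        # Low-rank reconstruction over channels suppresses class-irrelevant noise.
        ch = attended.mean(dim=(2, 3))                          # (B, C) channel descriptor
        gate = torch.sigmoid(self.up(torch.relu(self.down(ch))))
        return attended * gate.view(b, c, 1, 1)

fusion = CoAttentionFusion()
out = fusion(torch.rand(2, 256, 32, 32), torch.rand(2, 256, 8, 8))
print(out.shape)                                                # torch.Size([2, 256, 32, 32])
```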
Citations: 3
Generating Aesthetic Based Critique For Photographs
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506385
Yong-Yaw Yeo, John See, Lai-Kuan Wong, Hui-Ngo Goh
The recent surge in deep learning methods across multiple modalities has resulted in an increased interest in image captioning. Most advances in image captioning are still focused on the generation of factual-centric captions, which mainly describe the contents of an image. However, generating captions that provide a meaningful and opinionated critique of photographs is less studied. This paper presents a framework for leveraging aesthetic features encoded by an image aesthetic scorer to synthesize human-like textual critique via a sequence decoder. Experiments on a large-scale dataset show that the proposed method is capable of producing promising results on relevant metrics relating to semantic diversity and synonymity, with qualitative observations demonstrating the same. We also suggest the use of Word Mover’s Distance as a semantically intuitive and informative metric for this task.
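As a sketch of how aesthetic features might drive a sequence decoder, the snippet below seeds a GRU's hidden state with the aesthetic feature vector and trains it with teacher forcing over critique tokens. The vocabulary size, dimensions, and the assumption that the aesthetic scorer provides a fixed-length feature vector are illustrative, not the paper's exact setup.

```python
import torch
import torch.nn as nn

class CritiqueDecoder(nn.Module):
    """Condition a GRU language decoder on an aesthetic feature vector."""
    def __init__(self, vocab_size=10000, aesthetic_dim=512, hidden=512, emb=256):
        super().__init__()
        self.init_h = nn.Linear(aesthetic_dim, hidden)   # aesthetic features seed the state
        self.embed = nn.Embedding(vocab_size, emb)
        self.gru = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, aesthetic_feat, tokens):
        # aesthetic_feat: (B, aesthetic_dim) from the aesthetic scorer (assumed given);
        # tokens: (B, T) critique tokens used for teacher forcing.
        h0 = torch.tanh(self.init_h(aesthetic_feat)).unsqueeze(0)   # (1, B, hidden)
        x = self.embed(tokens)                                      # (B, T, emb)
        y, _ = self.gru(x, h0)
        return self.out(y)                                          # (B, T, vocab) next-token logits

decoder = CritiqueDecoder()
feat = torch.rand(4, 512)                         # aesthetic features for 4 photos
tokens = torch.randint(0, 10000, (4, 12))         # tokenized critiques
logits = decoder(feat, tokens[:, :-1])            # predict each next token
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 10000), tokens[:, 1:].reshape(-1))
```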
Citations: 3