
Latest Publications: 2022 IEEE 5th International Conference on Image Processing Applications and Systems (IPAS)

Real or Fake? A Practical Method for Detecting Tempered Images
Pub Date : 2022-12-05 DOI: 10.1109/IPAS55744.2022.10052973
Ching-yu Kao, Hongjia Wan, Karla Markert, Konstantin Böttinger
Tampering with images has become something almost anyone can do, whether for fake news, fake evidence presented in court, or forged documents. The main reason is that editing tools such as Photoshop are simple to use, which makes this an urgent problem to solve. Hence, automatic tools that help tell manipulated images apart are critical for fighting misinformation campaigns. Here we propose and evaluate a neural network-based method. It can detect whether images have been artificially modified (classification) and further indicate the forged parts (segmentation). Our proposed method outperforms most baseline methods. Last but not least, it is effective not only on JPEG images but also on other formats.
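As a rough illustration of what such a two-headed network looks like, the sketch below (PyTorch) pairs a shared convolutional encoder with an image-level classification head and a per-pixel segmentation head. It is a generic stand-in under assumed input sizes, not the architecture from the paper.

```python
# Minimal sketch of a joint classification + segmentation network for tamper detection.
# Illustrative only: not the authors' model; layer sizes are assumptions.
import torch
import torch.nn as nn

class TamperNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Segmentation head: per-pixel forgery logits, upsampled back to input size.
        self.seg_head = nn.Sequential(
            nn.Conv2d(64, 1, 1),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
        )
        # Classification head: single "real vs. fake" logit for the whole image.
        self.cls_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1))

    def forward(self, x):
        feats = self.encoder(x)
        return self.cls_head(feats), self.seg_head(feats)

model = TamperNet()
cls_logit, seg_logits = model(torch.randn(1, 3, 256, 256))
print(cls_logit.shape, seg_logits.shape)  # torch.Size([1, 1]) torch.Size([1, 1, 256, 256])
```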
Citations: 0
A novel license plate detection based Time-To-Collision calculation for forward collision warning using Azure Kinect
Pub Date : 2022-12-05 DOI: 10.1109/IPAS55744.2022.10053071
Zhouyan Qiu, J. Martínez-Sánchez, P. Arias
A Forward Collision Warning (FCW) system constantly measures the relative position of the vehicle ahead and then predicts collisions. This paper proposes a new cost-effective and computationally efficient FCW method that uses a time-of-flight (ToF) camera to measure the relevant distances to the front vehicle based on license plate detection. First, a Yolo V7 model is used to detect license plates and thereby identify vehicles in front of the ego vehicle. Second, the distance between the front vehicle and the ego vehicle is determined by analyzing the depth map captured by the time-of-flight camera. In addition, the relative speed of the vehicle can be calculated from the change in the direct distance between the license plate and the camera across two consecutive frames. With a processing speed of 25–30 frames per second, the proposed FCW system is capable of determining relative distances and speeds within 26 meters in real time.
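The time-to-collision logic the abstract describes can be illustrated with a short sketch: the distance to the plate in two consecutive depth frames gives the closing speed, and TTC is the current distance divided by that speed. The frame rate and distances below are made-up values, not measurements from the paper.

```python
# Minimal sketch of license-plate-based time-to-collision (TTC) from two depth readings.
def time_to_collision(d_prev_m: float, d_curr_m: float, fps: float):
    """Return TTC in seconds, or None if the gap to the front vehicle is not closing."""
    dt = 1.0 / fps
    closing_speed = (d_prev_m - d_curr_m) / dt  # m/s, positive when approaching
    if closing_speed <= 0:
        return None
    return d_curr_m / closing_speed

# Example: plate at 20.0 m, then 19.8 m one frame later at 25 fps
# -> closing speed 5 m/s -> TTC of about 3.96 s.
print(time_to_collision(20.0, 19.8, fps=25.0))
```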
Citations: 0
Accurate Medicinal Plant Identification in Natural Environments by Embedding Mutual Information in a Convolution Neural Network Model
Pub Date : 2022-12-05 DOI: 10.1109/IPAS55744.2022.10053008
Lida Shahmiri, P. Wong, L. Dooley
Medicinal plants are a primary source of disease treatment in many countries. Since most are consumed directly, however, eating the wrong herbal plant can have serious consequences and even lead to death. Automatic, accurate recognition of plant species to help users who do not have specialist knowledge of herbal plants is thus a desirable aim. Several automatic medicinal plant identification systems have been proposed, though most are significantly constrained either by the small number of species they cover or by requiring manual image segmentation of plant leaves. This means leaves are captured against a plain background rather than being identified directly in their natural surroundings, which often involve complex and noisy backgrounds. While deep learning (DL) based methods have made considerable strides in recent times, their potential has not always been maximised because they are trained with samples which are not always fully representative of the intra-class and inter-class differences between the plant species concerned. This paper addresses this challenge by incorporating mutual information into a Convolutional Neural Network (CNN) model to select samples for the training, validation, and testing sets based on a similarity measure. A critical comparative evaluation of this new CNN medicinal plant classification model, incorporating a mutual information guided training (MIGT) algorithm for sample selection, corroborates the superior classification performance achieved on the VNPlant-200 dataset, with an average accuracy of more than 97%, while the precision and recall values are also consistently above 97%. This is significantly better than existing CNN classification methods for this dataset, as it crucially means false positive rates are substantially lower, affording improved identification reliability.
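The similarity measure at the core of the paper is mutual information between images; a minimal sketch of estimating it from a joint intensity histogram is shown below. How the authors actually use the scores to split the VNPlant-200 samples is not described here, so the split logic is omitted.

```python
# Minimal sketch: mutual information between two grayscale images, estimated from
# their joint intensity histogram. The bin count is an illustrative choice.
import numpy as np

def mutual_information(img_a: np.ndarray, img_b: np.ndarray, bins: int = 32) -> float:
    """MI (in nats) between two equally sized grayscale images."""
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of image A
    py = pxy.sum(axis=0, keepdims=True)   # marginal of image B
    nz = pxy > 0                          # avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

# Example with random "images": an image has high MI with itself, low MI with noise.
a = np.random.randint(0, 256, (64, 64))
b = np.random.randint(0, 256, (64, 64))
print(mutual_information(a, a), mutual_information(a, b))
```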
Citations: 1
A CNN Architecture for Detection and Segmentation of Colorectal Polyps from CCE Images
Pub Date : 2022-12-05 DOI: 10.1109/IPAS55744.2022.10052795
A. Tashk, Kasim E. Şahin, J. Herp, E. Nadimi
Colon capsule endoscopy (CCE), a novel 2D biomedical imaging modality based on visible light, provides a broader view of potential gastrointestinal lesions such as polyps within the small and large intestines than conventional colonoscopy. Because the quality of images acquired via CCE is low, artificial intelligence methods are proposed to help detect and localize polyps with an acceptable level of efficiency and performance. In this paper, a new deep neural network architecture known as AID-U-Net is proposed. AID-U-Net consists of two distinct types of paths: a) two main contracting/expansive paths, and b) two sub contracting/expansive paths. The role of the main paths is to localize polyps as the target objects in a high-resolution, multi-scale manner, while the two sub paths are responsible for preserving and conveying the information of low-resolution, low-scale target objects. Furthermore, the proposed network architecture is simple enough that the model can be deployed for real-time processing. AID-U-Net with a VGG19 backbone shows better performance in detecting polyps in CCE images than other state-of-the-art U-Net models such as conventional U-Net, U-Net++, and U-Net3+ with different pre-trained backbones like ImageNet, VGG19, ResNeXt50, ResNet50, InceptionV3 and InceptionResNetV2.
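The abstract names contracting/expansive paths as the building blocks of AID-U-Net without specifying them; the sketch below shows one generic contracting/expansive pair in PyTorch, purely as an illustration of the concept and not the AID-U-Net architecture itself.

```python
# Generic sketch of a single contracting/expansive path pair (toy U-Net-style block)
# producing a 1-channel polyp mask. Illustrative only; not AID-U-Net.
import torch
import torch.nn as nn

class ContractExpand(nn.Module):
    def __init__(self):
        super().__init__()
        # Contracting path: downsample while increasing channels.
        self.down = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Expansive path: upsample back to the input resolution.
        self.up = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 2, stride=2),
        )

    def forward(self, x):
        return self.up(self.down(x))

mask_logits = ContractExpand()(torch.randn(1, 3, 128, 128))
print(mask_logits.shape)  # torch.Size([1, 1, 128, 128])
```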
Citations: 2
Experimental Validation of Photogrammetry based 3D Reconstruction Software
Pub Date : 2022-12-05 DOI: 10.1109/IPAS55744.2022.10053055
Razeen Hussain, Marianna Pizzo, Giorgio Ballestin, Manuela Chessa, F. Solari
3D reconstruction is of interest to several fields. However, obtaining a 3D model is usually a time-consuming task that involves manual measurements and reproduction of the object using CAD software, which is not always feasible (e.g. for organic shapes). The need to quickly obtain a dimensionally accurate 3D model of an object has led to the development of several reconstruction techniques, either vision based (using photogrammetry), based on laser scanners, or a combination of the two. The contribution of this study is an analysis of the performance of currently available 3D reconstruction frameworks, with the aim of providing a guideline for novice users who may be unfamiliar with 3D reconstruction technologies. We evaluate various software packages on a synthetic dataset representing objects of various shapes and sizes. For comparison, we consider metrics such as the mean errors of the reconstructed point clouds and meshes and the reconstruction time. Our results indicate that Colmap produces the best reconstruction.
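A typical cloud-to-cloud accuracy metric of the kind mentioned (mean error of the reconstructed point cloud) can be sketched as the mean nearest-neighbour distance to the ground-truth synthetic model; whether this matches the paper's exact definition is an assumption.

```python
# Minimal sketch of a mean cloud-to-cloud error between a reconstruction and ground truth.
import numpy as np
from scipy.spatial import cKDTree

def mean_cloud_error(reconstructed: np.ndarray, ground_truth: np.ndarray) -> float:
    """Mean distance from each reconstructed point to its closest ground-truth point."""
    tree = cKDTree(ground_truth)
    distances, _ = tree.query(reconstructed)
    return float(distances.mean())

# Example with random (N, 3) arrays standing in for a real model and its reconstruction.
gt = np.random.rand(1000, 3)
rec = gt + np.random.normal(scale=0.01, size=gt.shape)  # simulated reconstruction noise
print(mean_cloud_error(rec, gt))
```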
Citations: 1
Vision Transformer for Automatic Student Engagement Estimation
Pub Date : 2022-12-05 DOI: 10.1109/IPAS55744.2022.10052945
Sandeep Mandia, Kuldeep Singh, R. Mitharwal
The availability of the internet and the quality of online content have attracted more learners to online platforms, a shift further stimulated by COVID-19. Students of different cognitive capabilities join the learning process. However, it is challenging for the instructor to identify the level of comprehension of an individual learner, specifically when learners waver in responding to feedback. A learner's facial expressions relate to content comprehension and engagement. This paper presents the use of the vision transformer (ViT) to model automatic estimation of student engagement by learning end-to-end features from facial images. The ViT architecture enlarges the receptive field of the model by exploiting multi-head attention operations. The model is trained using various loss functions to handle class imbalance. The ViT is evaluated on the Dataset for Affective States in E-Environments (DAiSEE); it outperformed the frame-level baseline by approximately 8% and the other two video-level benchmarks by 8.78% and 2.78%, achieving an overall accuracy of 55.18%. In addition, ViT with focal loss was also able to produce a well-balanced distribution among classes, except for one minority class.
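The focal loss used to handle class imbalance has a standard form, sketched below for a multi-class setting; the gamma value and the four-class setup are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch of multi-class focal loss: down-weights easy examples so minority
# engagement classes contribute more to training.
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor, gamma: float = 2.0) -> torch.Tensor:
    """(1 - p_t)^gamma * cross-entropy, averaged over the batch."""
    ce = F.cross_entropy(logits, targets, reduction="none")  # per-sample -log p_t
    p_t = torch.exp(-ce)                                     # probability of the true class
    return ((1.0 - p_t) ** gamma * ce).mean()

# Example: 4 engagement levels, batch of 8 random predictions.
logits = torch.randn(8, 4)
targets = torch.randint(0, 4, (8,))
print(focal_loss(logits, targets))
```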
Citations: 0
6D Pose Estimation for Precision Assembly
Pub Date : 2022-12-05 DOI: 10.1109/IPAS55744.2022.10052989
Ola Skeik, M. S. Erden, X. Kong
The assembly of 3D products with complex geometry and materials, such as a concentrator photovoltaics solar panel unit, is typically conducted manually. This results in low efficiency, precision and throughput. This study is motivated by an actual industrial need and targets automation of the currently manual assembly process. By replacing manual assembly with robotic assembly systems, efficiency and throughput could be improved. Prior to assembly, it is essential to estimate the pose of the objects to be assembled with high precision. The choice of machine vision is important and plays a critical role in the overall accuracy of such a complex task. Therefore, this work focuses on 6D pose estimation for precision assembly utilizing a 3D vision sensor. The sensor we use is a 3D structured light scanner which can generate high-quality point cloud data in addition to 2D images. A 6D pose estimation method is developed for an actual industrial solar-cell object, one of the four objects in an assembly unit of a concentrator photovoltaics solar panel. The proposed approach is a hybrid one in which a Mask R-CNN network is trained on our custom dataset and the trained model's predicted 2D bounding boxes are used for point cloud segmentation. Then, the iterative closest point algorithm is used to estimate the object's pose by matching the CAD model to the segmented object in the point cloud.
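The final alignment step (matching the CAD model to the segmented cloud with the iterative closest point algorithm) can be sketched with Open3D's point-to-point ICP as below; the library choice, the correspondence threshold and the identity initial guess are assumptions, not details from the paper.

```python
# Minimal sketch: recover a 6D pose by aligning CAD model points to segmented scan
# points with ICP (Open3D). Threshold and initial guess are illustrative values.
import numpy as np
import open3d as o3d

def estimate_pose(cad_points: np.ndarray, segmented_points: np.ndarray) -> np.ndarray:
    """Return the 4x4 rigid transform mapping the CAD model onto the segmented cloud."""
    source = o3d.geometry.PointCloud()
    source.points = o3d.utility.Vector3dVector(cad_points)
    target = o3d.geometry.PointCloud()
    target.points = o3d.utility.Vector3dVector(segmented_points)
    result = o3d.pipelines.registration.registration_icp(
        source, target,
        0.01,        # max correspondence distance (metres); depends on scanner noise
        np.eye(4),   # initial guess; a rough pose prior would speed up convergence
        o3d.pipelines.registration.TransformationEstimationPointToPoint(),
    )
    return result.transformation  # 4x4 matrix encoding the 6D pose
```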
Citations: 0
AI based Automatic Vehicle Detection from Unmanned Aerial Vehicles (UAV) using YOLOv5 Model
Pub Date : 2022-12-05 DOI: 10.1109/IPAS55744.2022.10053056
A. Panthakkan, N. Valappil, S. Al-Mansoori, Hussain Al-Ahmad
Detection of moving vehicles from unmanned aerial vehicles (UAVs) is becoming a significant study area in traffic control, surveillance, and military applications. The challenge is to keep computational complexity minimal so that the system can also run in real time. Applications of vehicle detection from UAVs include traffic parameter estimation, violation detection, number plate reading, and parking lot monitoring. The one-stage detection model YOLOv5 is used in this research work to develop a deep neural model-based vehicle detection system on highways from UAVs. In our system, several strategies appropriate for small-vehicle recognition under an aerial view angle are put forth, which achieve real-time detection and high accuracy by incorporating an optimal pooling approach and a dense topology method. Tilting the orientation of aerial photographs can improve the system's effectiveness. Metrics such as hit rate, accuracy, and precision are used to assess the performance of the proposed hybrid model, and its performance is compared with that of other state-of-the-art algorithms.
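A minimal way to reproduce the detection stage with an off-the-shelf model is to load the public YOLOv5 weights from Torch Hub and keep only vehicle classes, as sketched below; this uses the generic COCO-pretrained model and a placeholder image path, not the model trained for this work.

```python
# Minimal sketch: off-the-shelf YOLOv5 inference on an aerial frame, filtered to vehicles.
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
results = model("uav_frame.jpg")            # placeholder path to an aerial image
detections = results.pandas().xyxy[0]       # one row per detection: box, confidence, class name
vehicles = detections[detections["name"].isin(["car", "truck", "bus", "motorcycle"])]
print(vehicles[["xmin", "ymin", "xmax", "ymax", "confidence", "name"]])
```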
Citations: 0
XAI enhancing cyber defence against adversarial attacks in industrial applications
Pub Date : 2022-12-05 DOI: 10.1109/IPAS55744.2022.10052858
Georgios Makridis, Spyros Theodoropoulos, Dimitrios Dardanis, I. Makridis, Maria Margarita Separdani, G. Fatouros, D. Kyriazis, Panagiotis Koulouris
In recent years there has been a surge of interest in the interpretability and explainability of AI systems, largely motivated by the need to ensure the transparency and accountability of Artificial Intelligence (AI) operations, as well as by the need to minimize the cost and consequences of poor decisions. Another challenge that needs to be mentioned is cyber security attacks against AI infrastructures in manufacturing environments. This study examines eXplainable AI (XAI)-enhanced approaches against adversarial attacks for optimizing cyber defense methods in manufacturing image classification tasks. The examined XAI methods were applied to an image classification task, providing some insightful results regarding the utility of Local Interpretable Model-agnostic Explanations (LIME), saliency maps, and Gradient-weighted Class Activation Mapping (Grad-CAM) as methods to fortify a dataset against gradient evasion attacks. To this end, we “attacked” the XAI-enhanced images and used them as input to the classifier to measure its robustness. Given the analyzed dataset, our research indicates that LIME-masked images are more robust to adversarial attacks. We additionally propose an encoder-decoder schema that predicts (decodes) the masked images in a timely manner, making the proposed approach sufficient for a real-life problem.
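Producing a LIME-masked image of the kind the study evaluates can be sketched with the lime package: keep only the superpixels that most support the predicted class and hide the rest. The predict_fn argument and the parameter values are placeholders, not the study's settings.

```python
# Minimal sketch: build a LIME-masked image that keeps only the regions driving the
# classifier's decision. `predict_fn` must map a batch of RGB images to class probabilities.
import numpy as np
from lime import lime_image

def lime_masked_image(image: np.ndarray, predict_fn) -> np.ndarray:
    """Return the input image with everything except the top LIME regions hidden."""
    explainer = lime_image.LimeImageExplainer()
    explanation = explainer.explain_instance(
        image, predict_fn, top_labels=1, hide_color=0, num_samples=1000
    )
    masked, _ = explanation.get_image_and_mask(
        explanation.top_labels[0], positive_only=True, num_features=5, hide_rest=True
    )
    return masked
```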
Citations: 0
Image Processing and Control of Tracking Intelligent Vehicle Based on Grayscale Camera
Pub Date : 2022-12-05 DOI: 10.1109/IPAS55744.2022.10053002
Jian Zhang, Yufan Liu, Ao Li, Jinshan Zeng, Hongtu Xie
In order to realize rapid and stable recognition and automatic tracking of various complex roads by intelligent vehicles, this paper proposes image processing and cascade Proportional-Integral-Derivative (PID) steering and speed control algorithms based on a CMOS grayscale camera, in the context of the national college student intelligent vehicle competition. First, the grayscale image of the track is acquired by the grayscale camera. Then, the Otsu method is used to binarize the image, and the information of the black boundary guide line is extracted. In order to improve racing speed, the various track elements in the image are identified and classified, and the deviation between the actual centerline position of the intelligent vehicle and the ideal centerline position is calculated. Third, a discrete incremental cascade PID control algorithm is used to compute the pulse width modulation (PWM) signal corresponding to the deviation. The PWM signal then acts on the steering motor through the driving circuit, driving the intelligent vehicle to keep to the middle of the road and thereby achieving automatic tracking guidance. Experiments show that the intelligent vehicle of this design can identify complex roads quickly and stably, accurately complete automatic tracking, and achieve higher speed performance.
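Two of the steps described above, Otsu binarization of the track image and an incremental PID update from the centerline deviation, can be sketched as follows; the OpenCV call is standard, while the PID gains are illustrative placeholders rather than the competition-tuned values.

```python
# Minimal sketch of Otsu binarization (OpenCV) and an incremental PID step that
# converts the centerline deviation into a steering PWM adjustment.
import cv2

def binarize_track(gray_frame):
    """Otsu thresholding: returns a binary image separating track from guide lines."""
    _, binary = cv2.threshold(gray_frame, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary

class IncrementalPID:
    def __init__(self, kp=0.8, ki=0.1, kd=0.05):   # illustrative gains
        self.kp, self.ki, self.kd = kp, ki, kd
        self.e1 = 0.0  # error one step ago
        self.e2 = 0.0  # error two steps ago

    def step(self, error: float) -> float:
        """Return the PWM increment for the current centerline deviation."""
        delta = (self.kp * (error - self.e1)
                 + self.ki * error
                 + self.kd * (error - 2 * self.e1 + self.e2))
        self.e2, self.e1 = self.e1, error
        return delta
```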
Citations: 0