
Latest publications: 2022 IEEE 5th International Conference on Image Processing Applications and Systems (IPAS)

White Flies and Black Aphids Detection in Field Vegetable Crops using Deep Learning
Pub Date : 2022-12-05 DOI: 10.1109/IPAS55744.2022.10052855
Nikolaos Giakoumoglou, E. Pechlivani, N. Katsoulas, D. Tzovaras
Digital image processing for the early detection of plant pests such as insects in vegetable crops is essential for crop yield and quality. In recent years, deep learning has made strides in digital image processing, opening up new possibilities for pest monitoring. In this paper, state-of-the-art deep learning models are presented to detect common insect pests in vegetable cultivation, namely whiteflies and black aphids. Due to the absence of data sources addressing the aforementioned insect pests, adhesive traps for catching the target insects were used to create an annotated image dataset. In total, 225 images were collected, and 5904 insect instances were labelled by expert agronomists. This dataset poses many challenges, such as the tiny size of the objects, occlusions, and visual resemblance. Object detection models such as YOLOv3, YOLOv5, Faster R-CNN, Mask R-CNN, and RetinaNet were used as baseline algorithms for benchmark experiments. To achieve accurate results, data augmentation was used. This study has addressed these challenges by applying deep learning models that are able to handle the detection of tiny objects resulting from the very small insect size. The experimental results exhibit a mean Average Precision (mAP) of 75%. The dataset is available for download at https://zenodo.org/record/7139220
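The abstract gives no implementation details, so the following is a rough, illustrative sketch only of how one of the listed baselines (YOLOv5) could be run on a single sticky-trap image; the weights file, class names, and image path are assumptions, not the authors' artifacts.

```python
# Illustrative sketch only: running a YOLOv5 detector on a sticky-trap image.
# The weights file and class names below are hypothetical, not the authors' artifacts.
import torch

# Load a YOLOv5 model from the official hub; 'trap_insects.pt' stands in for
# weights fine-tuned on an annotated trap-image dataset such as the one described above.
model = torch.hub.load("ultralytics/yolov5", "custom", path="trap_insects.pt")
model.conf = 0.25  # confidence threshold; tiny insects may need a lower value

results = model("trap_image.jpg")          # run inference on one trap photo
detections = results.pandas().xyxy[0]      # bounding boxes as a pandas DataFrame
print(detections[["name", "confidence"]])  # e.g. rows labelled 'whitefly' or 'black_aphid'
```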
Citations: 3
Bacterial Blight and Cotton Leaf Curl Virus Detection Using Inception V4 Based CNN Model for Cotton Crops
Pub Date : 2022-12-05 DOI: 10.1109/IPAS55744.2022.10052835
Sohail Anwar, Abdul Rahim Kolachi, Shadi Khan Baloch, Shoaib R. Soomro
The agriculture sector is an important pillar of the global economy. The cotton crop is considered one of the prominent agricultural resources. It is widely cultivated in India, China, Pakistan, the USA, Brazil, and other countries of the world. Worldwide cotton crop production is severely affected by numerous diseases such as cotton leaf curl virus (CLCV/CLCuV), bacterial blight, and boll rot. Image processing techniques together with machine learning algorithms have been successfully employed in numerous fields and have also been used for crop disease detection. In this study, we present a deep learning-based method for classifying diseases of the cotton crop, including bacterial blight and cotton leaf curl virus (CLCV). The dataset of cotton leaves showing disease symptoms was collected from various locations in Sindh, Pakistan. We employ the Inception v4 architecture as a convolutional neural network to identify diseased plant leaves, in particular those affected by bacterial blight and CLCV. The accuracy of the designed model is 98.26%, which is a marked improvement over existing models and systems.
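As a hedged illustration of the kind of pipeline the abstract describes (not the authors' code), an Inception v4 backbone can be instantiated for a small set of cotton-disease classes with the timm library; the class list and file path below are assumptions.

```python
# Sketch only: an Inception v4 classifier head for cotton-leaf disease classes.
# Class names and paths are placeholders, not the authors' dataset.
import timm
import torch
from PIL import Image
from torchvision import transforms

classes = ["healthy", "bacterial_blight", "clcv"]  # assumed label set
model = timm.create_model("inception_v4", pretrained=True, num_classes=len(classes))
model.eval()

# Inception v4 expects 299x299 inputs; 0.5/0.5 normalization is the timm default for this model.
preprocess = transforms.Compose([
    transforms.Resize((299, 299)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

img = preprocess(Image.open("cotton_leaf.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    probs = model(img).softmax(dim=1)
print(dict(zip(classes, probs.squeeze().tolist())))
```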
Citations: 1
Computer Vision-Based Bengali Sign Language To Text Generation
Pub Date : 2022-12-05 DOI: 10.1109/IPAS55744.2022.10052928
Tonjih Tazalli, Zarin Anan Aunshu, Sumaya Sadbeen Liya, Magfirah Hossain, Zareen Mehjabeen, M. Ahmed, Muhammad Iqbal Hossain
Around 7% of people worldwide have hearing and speech impairments. They use sign language as their communication method. In our country, many people are born with hearing and speech impairments. Therefore, our primary focus is to work for those people by converting Bangla sign language into text. Various projects on Bangla sign language have already been carried out by others. However, they focused more on separate alphabets and numerical digits. That is why we want to concentrate on Bangla word signs, since communication is done using words or phrases rather than individual letters. There is no proper database for Bangla word-level sign language, so we want to build a database for our work using BDSL. In sign language recognition (SLR), there are usually two types of scenarios: isolated SLR, which recognizes signs word by word, and continuous SLR, which translates a whole sentence at once. We are working on isolated SLR. We introduce a method that uses PyTorch and YOLOv5 in a video classification model to convert Bangla sign language into text from videos in which each video contains only one signed word. Here, we have achieved an accuracy rate of 76.29% on the training dataset and 51.44% on the testing dataset. We are working to build a system that will make it easier for hearing- and speech-disabled people to interact with the general public.
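The abstract does not detail the video classification step; the sketch below illustrates one plausible reading, running a YOLOv5 detector on sampled frames and taking a majority vote over the detected word labels. The weights file, video path, and sampling rate are hypothetical.

```python
# Sketch only: classify a one-word sign video by running a YOLOv5 detector on
# sampled frames and taking a majority vote. Weights and paths are hypothetical.
from collections import Counter
import cv2
import torch

model = torch.hub.load("ultralytics/yolov5", "custom", path="bdsl_words.pt")

cap = cv2.VideoCapture("sign_clip.mp4")
votes = Counter()
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame_idx += 1
    if frame_idx % 5:            # sample roughly every 5th frame
        continue
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    det = model(rgb)             # per-frame word detections
    for label in det.pandas().xyxy[0]["name"]:
        votes[label] += 1
cap.release()

print(votes.most_common(1))      # the most frequently detected word label for the clip
```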
Citations: 2
DWT Collusion Resistant Video Watermarking Using Tardos Family Codes
Pub Date : 2022-12-05 DOI: 10.1109/IPAS55744.2022.10053023
Abdul Rehman, Gaëtan Le Guelvouit, J. Dion, F. Guilloud, M. Arzel
A fingerprinting process is an efficient means of protecting multimedia content and preventing illegal distribution. The goal is to find the individuals who were engaged in the production and illicit distribution of a multimedia product. We investigated a discrete wavelet transform (DWT) based blind video watermarking strategy tied to probabilistic fingerprinting codes in order to resist collusion attacks on high-resolution videos. We used FFmpeg to run a variety of collusion attacks (e.g., averaging, darkening, and lightening) on high-resolution video and compared the most frequently suggested code generators and decoders in the literature in order to find at least one colluder within the necessary code length. The Laarhoven code generator and nearest neighbor search (NNS) decoder outperform all other generators and decoders suggested in the literature in terms of computational time, colluder detection, and resources.
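The paper's exact embedding scheme is not given in the abstract; as an illustration of a generic DWT-based additive watermark of the kind mentioned (not the authors' method), one fingerprint sequence can be added to the LL sub-band of a frame's luma channel:

```python
# Sketch only: embedding one fingerprint bit per DWT coefficient of a frame's luma channel.
# This is a generic additive DWT scheme for illustration, not the authors' exact method.
import numpy as np
import pywt

def embed_bits(luma: np.ndarray, bits: np.ndarray, strength: float = 4.0) -> np.ndarray:
    """Embed 0/1 fingerprint bits (mapped to -1/+1) into the LL sub-band of a 2-D DWT."""
    ll, (lh, hl, hh) = pywt.dwt2(luma.astype(np.float32), "haar")
    flat = ll.reshape(-1)                           # view into the LL coefficients
    n = min(len(bits), flat.size)
    flat[:n] += strength * (2.0 * bits[:n] - 1.0)   # map {0,1} -> {-1,+1} and add
    marked = pywt.idwt2((ll, (lh, hl, hh)), "haar")
    return np.clip(marked, 0, 255).astype(np.uint8)

# Example: a random 128-bit probabilistic fingerprint embedded into one stand-in frame.
frame_luma = np.random.randint(0, 256, (720, 1280), dtype=np.uint8)
fingerprint = np.random.randint(0, 2, 128)
watermarked = embed_bits(frame_luma, fingerprint)
```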
Citations: 0
Society Information
Pub Date : 2022-12-05 DOI: 10.1109/ipas55744.2022.10052899
{"title":"Society Infrormation","authors":"","doi":"10.1109/ipas55744.2022.10052899","DOIUrl":"https://doi.org/10.1109/ipas55744.2022.10052899","url":null,"abstract":"","PeriodicalId":322228,"journal":{"name":"2022 IEEE 5th International Conference on Image Processing Applications and Systems (IPAS)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116726229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Evaluating Attention in Convolutional Neural Networks for Blended Images
Pub Date : 2022-12-05 DOI: 10.1109/IPAS55744.2022.10052853
Andrea Portscher, Sebastian Stabinger, A. Rodríguez-Sánchez
In neuroscientific experiments, blended images are used to examine how attention mechanisms in the human brain work. They are particularly suited for this research area, as a subject needs to focus on particular features in an image to be able to classify superimposed objects. As Convolutional Neural Networks (CNNs) take some inspiration from the mammalian visual system – such as the hierarchical structure where different levels of abstraction are processed on different network layers – we examine how CNNs perform on this task. More specifically, we evaluate the performance of four popular CNN architectures (ResNet18, ResNet50, CORnet-Z, and Inception V3) on the classification of objects in blended images. Since humans can rather easily solve this task by applying object-based attention, we also augment all architectures with a multi-headed self-attention mechanism to examine its effect on performance. Lastly, we analyse if there is a correlation between the similarity of a network architecture's structure to the human visual system and its ability to correctly classify objects in blended images. Our findings showed that adding a self-attention mechanism reliably increases the similarity to the V4 area of the human ventral stream, an area where attention has a large influence on the processing of visual stimuli.
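As a minimal sketch of the augmentation described, a multi-headed self-attention layer can be applied over a CNN backbone's spatial feature map in PyTorch; the channel count, number of heads, and feature-map size below are illustrative assumptions, not the paper's settings.

```python
# Sketch only: wrapping CNN feature maps with multi-headed self-attention.
# Channel and head counts are illustrative, not the values used in the paper.
import torch
import torch.nn as nn

class FeatureMapSelfAttention(nn.Module):
    def __init__(self, channels: int = 512, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=channels, num_heads=num_heads,
                                          batch_first=True)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, channels, height, width) from a CNN backbone
        b, c, h, w = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)        # (batch, h*w, channels)
        attended, _ = self.attn(tokens, tokens, tokens)  # self-attention over spatial positions
        return attended.transpose(1, 2).reshape(b, c, h, w)

# Example: attention over a 14x14 ResNet-style feature map.
x = torch.randn(2, 512, 14, 14)
print(FeatureMapSelfAttention()(x).shape)  # torch.Size([2, 512, 14, 14])
```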
Citations: 0
A Tool for Thermal Image Annotation and Automatic Temperature Extraction around Orthopedic Pin Sites
Pub Date : 2022-12-05 DOI: 10.1109/IPAS55744.2022.10053084
S. Annadatha, M. Fridberg, S. Kold, O. Rahbek, M. Shen
Existing annotation tools are mainly designed for visible-light images to support supervised machine learning problems. A few tools exist for extracting temperature information from thermal images. However, they are time- and manpower-consuming, require different stages of data management, and are not automated. This paper focuses on addressing the limitations of existing tools in handling big thermal datasets for annotation and in extracting temperature distributions in the Region of Interest (ROI) around orthopedic surgical wounds, and it provides flexibility for a researcher to integrate thermal image analysis into wound care machine learning models. We present an easy-to-use research tool for one-click annotation of orthopedic pin sites and extraction of thermal information, a preliminary step of research to estimate the reliability of thermography for home-based surveillance of post-operative infection. The proposed tool maps annotations from the visible registered image onto the thermal and radiometric images. Mapping these annotations from visible registered images avoids manual bias in annotating thermal images. The novelty of the proposed work lies in integrating the functionality of an annotation tool that processes thermal images to acquire single-click manual annotations with the extraction of temperature distributions in the ROI from those annotations; this is also crucial for research on deep learning-based investigation of surgical wound infections.
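The tool's internal interface is not specified in the abstract; the sketch below only illustrates the final temperature-extraction step, reducing an annotated circular ROI on a radiometric (per-pixel temperature) array to summary statistics. Array names, ROI definition, and values are placeholders.

```python
# Sketch only: summary temperatures inside a circular ROI of a radiometric image.
# The calibration and ROI definition here are placeholders, not the tool's internals.
import numpy as np

def roi_temperature_stats(temp_c: np.ndarray, center: tuple, radius: int) -> dict:
    """Mean/max temperature (deg C) within a circular ROI around a pin site."""
    yy, xx = np.ogrid[:temp_c.shape[0], :temp_c.shape[1]]
    mask = (yy - center[0]) ** 2 + (xx - center[1]) ** 2 <= radius ** 2
    roi = temp_c[mask]
    return {"mean": float(roi.mean()), "max": float(roi.max()), "pixels": int(roi.size)}

# Example with a synthetic 240x320 temperature map and one annotated pin site.
temps = 30.0 + 2.0 * np.random.rand(240, 320)      # stand-in radiometric data in deg C
print(roi_temperature_stats(temps, center=(120, 160), radius=15))
```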
Citations: 0
RailSet: A Unique Dataset for Railway Anomaly Detection
Pub Date : 2022-12-05 DOI: 10.1109/IPAS55744.2022.10052883
Arij Zouaoui, Ankur Mahtani, Mohamed Amine Hadded, S. Ambellouis, J. Boonaert, H. Wannous
Understanding the driving environment is one of the key factors in achieving an autonomous vehicle. In particular, the detection of anomalies in the traffic lane is a high-priority scenario, as it directly involves the vehicle's safety. Recent state-of-the-art image processing techniques for anomaly detection are all based on deep neural networks. These algorithms require a considerable amount of annotated data for training and test purposes. While many datasets exist in the field of autonomous road vehicles, such datasets are extremely rare in the railway domain. In this work, we present a new, innovative dataset for railway anomaly detection called RailSet. It consists of 6600 high-quality manually annotated images containing normal situations and 1100 images of railway defects such as hole anomalies and rail discontinuities. Due to the lack of anomaly samples in public images and the difficulty of creating anomalies in the railway environment, we artificially generate images of abnormal scenes using a deep learning algorithm named StyleMapGAN. This dataset is created as a contribution to the development of autonomous trains able to perceive track damage in front of the train. The dataset is available at this link.
Citations: 3
Union Embedding and Backbone-Attention boost Zero-Shot Learning Model (UBZSL)
Pub Date : 2022-12-05 DOI: 10.1109/IPAS55744.2022.10052972
Ziyu Li
Zero-Shot Learning (ZSL) aims to identify categories that are never seen during training. Many ZSL methods are available, and their number is steadily increasing. Even so, some issues remain to be resolved, such as class embedding and image feature extraction. Human-annotated attributes have been used in recent work on class embedding. However, this type of attribute does not adequately represent the semantic and visual aspects of each class, and annotating these attributes is time-consuming. Furthermore, ZSL methods for extracting image features rely on pre-trained image representations or fine-tuned models, focusing on learning appropriate mappings between image representations and attributes. To reduce the dependency on manual annotation and improve classification effectiveness, we believe that ZSL would benefit from using Contrastive Language-Image Pre-Training (CLIP), either alone or combined with manual annotation. For this purpose, we propose an improved ZSL model named UBZSL. It uses CLIP combined with manual annotation as its class embedding method and uses an attention map for feature extraction. Experiments show that the performance of our ZSL model on the CUB dataset is greatly improved compared to current models.
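As a minimal sketch, assuming the openai clip package and an illustrative prompt template, class embeddings of the kind described can be obtained from CLIP text features; the model variant and class names are assumptions, not the paper's configuration.

```python
# Sketch only: building class embeddings from CLIP text features.
# Model choice, prompt template, and class names are assumptions for illustration.
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

class_names = ["Black footed Albatross", "Laysan Albatross", "Sooty Albatross"]  # e.g. CUB classes
prompts = clip.tokenize([f"a photo of a {name}" for name in class_names]).to(device)

with torch.no_grad():
    class_embeddings = model.encode_text(prompts)                    # (num_classes, 512)
    class_embeddings /= class_embeddings.norm(dim=-1, keepdim=True)  # unit-normalize

print(class_embeddings.shape)  # these vectors can stand in for hand-annotated attribute vectors
```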
Citations: 0
Paper Review Samples
Pub Date : 2022-12-05 DOI: 10.1109/IPAS55744.2022.10052807
1. Is the paper relevant to the conference topics? Very relevant.
2. Is there any originality of the presented work? (5: high originality, ..., 1: no originality) Rating: 5.
3. How can you rate the structure of the paper? (5: well, ..., 1: poor) Rating: 4.
4. How do you rate the appropriateness of the research/study method? (5: excellent, ..., 1: poor) Rating: 4.
5. How do you rate the relevance and clarity of drawings, figures and tables? (5: excellent, 1: poor) Rating: 4.
6. How do you rate the appropriateness of the abstract as a description of the paper? (5: excellent, ..., 1: poor) Rating: 4.
7. Are references adequate, recent, and correctly cited? (5: excellent, ..., 1: poor) Rating: 4.
8. Are discussions and conclusions appropriate? (5: excellent, ..., 1: poor) Rating: 4.
9. Please add some comments on the paper if you have any. The paper is well written. The authors address the problem of audio signal augmentation based on Trans-GAN.
Citations: 0