
2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ): latest publications

Improving the Efficient Neural Architecture Search via Rewarding Modifications
Pub Date : 2020-11-25 DOI: 10.1109/IVCNZ51579.2020.9290732
I. Gallo, Gabriele Magistrali, Nicola Landro, Riccardo La Grassa
Nowadays, a challenge for the scientific community concerning deep learning is to design architectural models that obtain the best performance on specific data sets. Building effective models is not a trivial task, and it can be very time-consuming if done manually. Neural Architecture Search (NAS) has achieved remarkable results in deep learning applications in the past few years. It involves training a recurrent neural network (RNN) controller using Reinforcement Learning (RL) to automatically generate architectures. Efficient Neural Architecture Search (ENAS) was created to address the prohibitively expensive computational complexity of NAS by using weight sharing. In this paper, we propose Improved-ENAS (I-ENAS), a further improvement of ENAS that augments the reinforcement learning training method by modifying the reward of each tested architecture according to the results obtained for previously tested architectures. We have conducted many experiments on different public domain datasets and demonstrated that I-ENAS, in the worst case, matches the performance of ENAS, while in many other cases it outperforms ENAS in terms of the convergence time needed to achieve better accuracies.
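The reward-modification idea can be illustrated with a small sketch. The shaping rule below (crediting an architecture for beating the best previously tested one) is a hypothetical stand-in for the paper's actual formulation, which is not reproduced here:

```python
# Hypothetical sketch of the I-ENAS reward-modification idea: the raw
# reward (validation accuracy) of each sampled architecture is adjusted
# using results from previously tested architectures.  The exact shaping
# rule here is an illustrative assumption, not the published one.

class RewardShaper:
    def __init__(self):
        self.history = []  # accuracies of previously tested architectures

    def shaped_reward(self, accuracy):
        """Reward an architecture relative to the best seen so far."""
        baseline = max(self.history) if self.history else 0.0
        self.history.append(accuracy)
        # Positive bonus when the new architecture beats the running best,
        # a penalty otherwise; the raw accuracy is kept as a base term.
        return accuracy + (accuracy - baseline)

shaper = RewardShaper()
r1 = shaper.shaped_reward(0.70)  # no history yet: 0.70 + 0.70 = 1.40
r2 = shaper.shaped_reward(0.75)  # beats 0.70: 0.75 + 0.05 = 0.80
r3 = shaper.shaped_reward(0.72)  # below best 0.75: 0.72 - 0.03 = 0.69
```

The controller's policy gradient then pushes harder toward architectures that improve on history, which is one plausible mechanism for the faster convergence the abstract reports.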
Citations: 0
Experimental Validation of Bias in Checkerboard Corner Detection
Pub Date : 2020-11-25 DOI: 10.1109/IVCNZ51579.2020.9290652
M. J. Edwards, M. Hayes, R. Green
The sub-pixel corner refinement algorithm in OpenCV is widely used to refine checkerboard corner location estimates to sub-pixel precision. This paper shows, using both simulations and a large dataset of real images, that the algorithm produces estimates with significant bias and noise, both of which depend on the sub-pixel corner location. In the real images, the noise ranged from around 0.013 px at the pixel centre to 0.0072 px at the edges, a difference of around $1.8\times$. The bias could not be determined from the real images due to residual lens distortion; in the simulated images it had a maximum magnitude of 0.043 px.
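The paper studies OpenCV's corner refinement specifically; as a simpler illustration of how sub-pixel estimators work in general, and why they can be biased when the true intensity profile differs from the fitted model, here is the classic three-point parabola fit. This is an illustrative technique, not the OpenCV algorithm itself:

```python
def subpixel_offset(f_left, f_center, f_right):
    """Refine a peak location to sub-pixel precision by fitting a
    parabola through three neighbouring samples.  Returns the offset
    (in pixels, roughly in [-0.5, 0.5]) of the fitted peak from the
    centre sample.  When the true profile is not quadratic, this
    estimate is systematically biased -- the kind of location-dependent
    bias the paper measures for the OpenCV refiner."""
    denom = f_left - 2.0 * f_center + f_right
    if denom == 0:
        return 0.0
    return 0.5 * (f_left - f_right) / denom

# Response strongest at the centre sample but slightly larger on the
# right neighbour: the fitted peak lies between the two samples.
off = subpixel_offset(1.0, 3.0, 2.0)  # 1/6 px to the right
```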
Citations: 2
Predicting physician gaze in clinical settings using optical flow and positioning
Pub Date : 2020-11-25 DOI: 10.1109/IVCNZ51579.2020.9290716
A. Govindaswamy, E. Montague, D. Raicu, J. Furst
Electronic health record systems, used in clinical settings to facilitate informed decision making, affect the dynamics between the physician and the patient during clinical interactions. The interaction between the patient and the physician can impact patient satisfaction and overall health outcomes. Gaze during patient-doctor interactions has been found to impact the patient-physician relationship and is an important measure of attention towards humans and technology. This study aims to automatically label physician gaze in video interactions, which is typically measured using extensive human coding. In this study, physicians’ gaze is predicted at any time during the recorded video interaction using optical flow and body positioning coordinates as image features. Findings show that physician gaze could be predicted with an accuracy of over 83%. Our approach highlights the potential for the model to serve as an annotation tool, reducing the extensive human labor of annotating videos for physician gaze. These interactions can further be connected to patient ratings to better understand patient outcomes.
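A sketch of the kind of feature construction the abstract describes. A real pipeline would use dense optical flow (e.g. Farneback) for the motion cue; plain frame differencing is substituted here to keep the sketch dependency-free, and the positioning coordinates are hypothetical:

```python
def motion_magnitude(prev_frame, next_frame):
    """Mean absolute intensity change between two grayscale frames --
    a crude, dependency-free stand-in for the magnitude of the dense
    optical-flow field the paper uses as a motion cue."""
    total, n = 0.0, 0
    for row_p, row_n in zip(prev_frame, next_frame):
        for p, q in zip(row_p, row_n):
            total += abs(q - p)
            n += 1
    return total / n

def feature_vector(prev_frame, next_frame, head_xy, screen_xy):
    """Concatenate the motion cue with (hypothetical) positioning
    coordinates, e.g. the physician's head and the EHR screen, to form
    one per-timestep feature vector for a gaze classifier."""
    return [motion_magnitude(prev_frame, next_frame), *head_xy, *screen_xy]

f0 = [[0, 0], [0, 0]]
f1 = [[4, 0], [0, 4]]
fv = feature_vector(f0, f1, head_xy=(120.0, 80.0), screen_xy=(300.0, 90.0))
# fv -> [2.0, 120.0, 80.0, 300.0, 90.0]
```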
Citations: 1
Wavelet Based Thresholding for Fourier Ptychography Microscopy
Pub Date : 2020-11-25 DOI: 10.1109/IVCNZ51579.2020.9290707
Nazabat Hussain, Mojde Hasanzade, D. Breiby, M. Akram
Computational microscopy algorithms can be used to improve resolution by synthesizing a bigger numerical aperture. Fourier Ptychographic (FP) microscopy utilizes multiple exposures, each illuminated by a coherent source at a unique incidence angle. The recorded images are often corrupted with background noise, and preprocessing improves the quality of the FP recovered image. The preprocessing involves data denoising, thresholding, and intensity balancing. We propose a wavelet-based thresholding scheme for noise removal. Any image can be decomposed into its coarse approximation, horizontal details, vertical details, and diagonal details using suitable wavelets. The details are extracted to find a suitable threshold, which is then used to perform thresholding. In the proposed algorithm, two wavelet families, Daubechies and Biorthogonal with compact support (db4, db30, bior2.2, and bior6.8), have been used in conjunction with ptychographic phase retrieval. The obtained results show that the wavelet-based thresholding significantly improves the quality of the reconstructed FP microscopy image.
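The paper uses Daubechies and biorthogonal wavelets; a hand-rolled sketch with the simpler Haar wavelet shows the two ingredients such schemes build on, decomposition into approximation and detail subbands, and soft thresholding of the details:

```python
def haar2d(img):
    """One-level 2-D Haar decomposition of an even-sized grayscale image
    into (approximation, horizontal, vertical, diagonal) subbands, using
    the orthonormal 1/2 scaling."""
    A, H, V, D = [], [], [], []
    for i in range(0, len(img), 2):
        ra, rh, rv, rd = [], [], [], []
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            ra.append((a + b + c + d) / 2.0)  # coarse approximation
            rh.append((a - b + c - d) / 2.0)  # horizontal detail
            rv.append((a + b - c - d) / 2.0)  # vertical detail
            rd.append((a - b - c + d) / 2.0)  # diagonal detail
        A.append(ra); H.append(rh); V.append(rv); D.append(rd)
    return A, H, V, D

def soft_threshold(band, t):
    """Shrink detail coefficients toward zero by t (soft thresholding);
    small coefficients, assumed to be noise, are zeroed out."""
    return [[max(abs(x) - t, 0.0) * (1 if x >= 0 else -1) for x in row]
            for row in band]

A, H, V, D = haar2d([[9, 7], [3, 5]])   # A=[[12.0]] H=[[0.0]] V=[[4.0]] D=[[2.0]]
Vs = soft_threshold(V, 1.5)             # [[2.5]]
```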
Citations: 1
Leveraging Linguistically-aware Object Relations and NASNet for Image Captioning
Pub Date : 2020-11-25 DOI: 10.1109/IVCNZ51579.2020.9290719
Naeha Sharif, M. Jalwana, Bennamoun, Wei Liu, Syed Afaq Ali Shah
Image captioning is a challenging vision-to-language task, which has garnered a lot of attention over the past decade. The introduction of Encoder-Decoder based architectures expedited the research in this area and provided the backbone of the most recent systems. Moreover, leveraging relationships between objects for holistic scene understanding, which in turn improves captioning, has recently sparked interest among researchers. Our proposed model encodes the spatial and semantic proximity of object pairs into linguistically-aware relationship embeddings. Moreover, it captures the global semantics of the image using NASNet. This way, true semantic relations that are not apparent in visual content of an image can be learned, such that the decoder can attend to the most relevant object relations and visual features to generate more semantically-meaningful captions. Our experiments highlight the usefulness of linguistically-aware object relations as well as NASNet visual features for image captioning.
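A sketch of what encoding the spatial proximity of an object pair might look like. The descriptor below (IoU, normalised centre offset, log area ratio) is an illustrative assumption, not the paper's actual linguistically-aware relationship embedding:

```python
import math

def spatial_relation(box_a, box_b):
    """Simple spatial-proximity descriptor for an object pair, each box
    given as (x1, y1, x2, y2).  Returns (iou, dx, dy, log-area-ratio),
    the kind of geometric signal a relationship embedding could encode."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection-over-union of the two boxes
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter)
    # Offset of b's centre from a's centre, normalised by a's size
    dx = ((bx1 + bx2) - (ax1 + ax2)) / (2.0 * (ax2 - ax1))
    dy = ((by1 + by2) - (ay1 + ay2)) / (2.0 * (ay2 - ay1))
    return (iou, dx, dy, math.log(area_b / area_a))

# Two equal-sized boxes, the second shifted half a width to the right:
rel = spatial_relation((0, 0, 2, 2), (1, 0, 3, 2))
```

Such a vector, concatenated with word embeddings of the object labels, is one plausible input to the "linguistically-aware" pair encoding the abstract describes.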
Citations: 3
CoCoNet: A Collaborative Convolutional Network applied to fine-grained bird species classification
Pub Date : 2020-11-25 DOI: 10.1109/IVCNZ51579.2020.9290677
Tapabrata (Rohan) Chakraborty, B. McCane, S. Mills, U. Pal
We present an end-to-end deep network for fine-grained visual categorization called Collaborative Convolutional Network (CoCoNet). The network uses a collaborative layer after the convolutional layers to represent an image as an optimal weighted collaboration of features learned from training samples as a whole, rather than one at a time. This gives CoCoNet more power to encode the fine-grained nature of the data with limited samples. We perform a detailed study of the performance with 1-stage and 2-stage transfer learning. The ablation study shows that the proposed method consistently outperforms its constituent parts. CoCoNet also outperforms several state-of-the-art competing methods. Experiments have been performed on the fine-grained bird species classification problem as a representative example, but the method may be applied to other similar tasks. We also introduce a new public dataset for fine-grained species recognition, that of Indian endemic birds, and report initial results on it.
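The collaborative layer represents an image using all training features at once. A minimal sketch in the spirit of collaborative representation, solving a ridge-regularised least squares over the whole training set (restricted to two training vectors so the 2x2 system has a closed form); this is one interpretation of the idea, not the paper's layer:

```python
def collaborative_weights(D, x, lam):
    """Represent feature vector x as a weighted collaboration of ALL
    training feature vectors (the rows of D) at once, by solving the
    ridge-regularised normal equations (D D^T + lam*I) w = D x.
    Restricted to two training vectors so the 2x2 system can be solved
    in closed form without a linear-algebra library."""
    d1, d2 = D
    # Entries of the 2x2 Gram matrix G = D D^T + lam*I
    g11 = sum(a * a for a in d1) + lam
    g22 = sum(b * b for b in d2) + lam
    g12 = sum(a * b for a, b in zip(d1, d2))
    # Right-hand side D x
    r1 = sum(a * c for a, c in zip(d1, x))
    r2 = sum(b * c for b, c in zip(d2, x))
    det = g11 * g22 - g12 * g12
    return ((g22 * r1 - g12 * r2) / det, (g11 * r2 - g12 * r1) / det)

# x coincides with the first training vector; with a small regulariser
# essentially all of the collaboration weight lands on it.
w = collaborative_weights(D=([1.0, 0.0], [0.0, 1.0]), x=[1.0, 0.0], lam=0.1)
```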
Citations: 1
Evolutionary Algorithm Based Residual Block Search for Compression Artifact Removal
Pub Date : 2020-11-25 DOI: 10.1109/IVCNZ51579.2020.9290620
Rishil Shah
Lossy image compression is ubiquitously used for storage and transmission at lower bit rates. Among the existing lossy image compression methods, the JPEG standard is the most widely used technique in the multimedia world. Over the years, numerous methods have been proposed to suppress the compression artifacts introduced in JPEG-compressed images. However, all current learning-based methods rely on deep convolutional neural networks (CNNs) that are manually designed by researchers. The network design process requires extensive computational resources and expertise. Focusing on this issue, we investigate evolutionary search for finding the optimal residual-block-based architecture for artifact removal. We first define a residual network structure and its corresponding genotype representation used in the search. Then, we provide details of the evolutionary algorithm and the multi-objective function used to find the optimal residual block architecture. Finally, we present experimental results that indicate the effectiveness of our approach and compare its performance with existing artifact removal networks. The proposed approach is scalable and portable to numerous low-level vision tasks.
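A toy sketch of the evolutionary machinery: a genotype listing layer choices for a residual block, point mutation, and an elitist loop. The search space, encoding, and fitness below are illustrative assumptions; the paper searches over real trained networks with a multi-objective function:

```python
import random

# Illustrative genotype: a residual block described as a list of layer
# choices drawn from a small search space (an assumed encoding).
SEARCH_SPACE = ["conv3x3", "conv5x5", "sep3x3", "identity"]

def mutate(genotype, rate, rng):
    """Point mutation: each gene is resampled with probability `rate`."""
    return [rng.choice(SEARCH_SPACE) if rng.random() < rate else g
            for g in genotype]

def evolve(fitness, length=4, pop_size=8, generations=20, rng=None):
    """Minimal elitist loop: keep the best individual each generation
    and refill the population with its mutants."""
    rng = rng or random.Random(0)
    pop = [[rng.choice(SEARCH_SPACE) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        best = max(pop, key=fitness)
        pop = [best] + [mutate(best, 0.3, rng) for _ in range(pop_size - 1)]
    return max(pop, key=fitness)

# Toy fitness standing in for PSNR after training: prefer blocks made
# of separable 3x3 convolutions.
best = evolve(lambda g: sum(1 for op in g if op == "sep3x3"),
              rng=random.Random(42))
```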
Citations: 0
Deep Learning Methods for Virus Identification from Digital Images
Pub Date : 2020-11-25 DOI: 10.1109/IVCNZ51579.2020.9290670
Luxin Zhang, W. Yan
The use of deep learning methods for virus identification from digital images is a timely research topic. Given an electron microscopy image, virus recognition utilizing deep learning approaches is critical at present, because virus identification by human experts is relatively slow and time-consuming. In this project, our objective is to develop deep learning methods for automatic virus identification from digital images; four viral species are taken into consideration, namely SARS, MERS, HIV, and COVID-19. In this work, we first examine virus morphological characteristics and propose a novel loss function aimed at virus identification from the given electron micrographs. We take attention mechanisms into account for virus localization and classification in digital images. In order to generate the most reliable estimates of bounding boxes and classifications for a virus as a visual object, we train and test five deep learning models: R-CNN, Fast R-CNN, Faster R-CNN, YOLO, and SSD, based on our dataset of virus electron microscopy images. Additionally, we explain the evaluation approaches. The conclusion reveals that SSD and Faster R-CNN outperform the other models in virus identification.
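Producing "the most reliable estimate of bounding boxes" in this family of detectors conventionally involves non-maximum suppression over overlapping candidates; a minimal sketch (the boxes and threshold are illustrative):

```python
def nms(boxes, iou_thresh=0.5):
    """Greedy non-maximum suppression, the standard post-processing step
    detectors such as Faster R-CNN, YOLO, and SSD use to keep only the
    most confident box among overlapping candidates.  Each box is
    (x1, y1, x2, y2, score)."""
    def iou(a, b):
        iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = iw * ih
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(a) + area(b) - inter)

    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        # Keep a candidate only if it does not heavily overlap a kept box
        if all(iou(box, k) < iou_thresh for k in kept):
            kept.append(box)
    return kept

# Two heavily overlapping candidates for one virion plus one distinct one:
picked = nms([(0, 0, 10, 10, 0.9), (1, 1, 11, 11, 0.8), (20, 20, 30, 30, 0.7)])
```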
Citations: 4
Pothole Detection and Dimension Estimation System using Deep Learning (YOLO) and Image Processing
Pub Date : 2020-11-25 DOI: 10.1109/IVCNZ51579.2020.9290547
P. Chitale, Kaustubh Y. Kekre, Hrishikesh Shenai, R. Karani, Jay Gala
The world is advancing towards an autonomous environment at a great pace, and this has become the need of the hour, especially during the current pandemic situation. The pandemic has hindered the functioning of many sectors, one of them being road development and maintenance. Creating a safe working environment for workers is a major concern of road maintenance during such difficult times. This can be achieved to some extent with the help of an autonomous system that aims at reducing human dependency. In this paper, one such system, for pothole detection and dimension estimation, is proposed. The proposed system uses a Deep Learning based algorithm, YOLO (You Only Look Once), for pothole detection. Further, an image processing based triangular similarity measure is used for pothole dimension estimation. The proposed system provides reasonably accurate results for both pothole detection and dimension estimation, and it also helps reduce the time required for road maintenance. The system uses a custom-made dataset consisting of images of waterlogged and dry potholes of various shapes and sizes.
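The triangular similarity measure reduces to the pinhole relation D = W * F / P (known width W, focal length F in pixels, perceived width P in pixels); a minimal sketch in which all calibration numbers are hypothetical:

```python
def distance_to_object(known_width, focal_length_px, perceived_width_px):
    """Triangular-similarity range estimate: an object of known physical
    width W appearing P pixels wide under focal length F (in pixels)
    lies at distance D = W * F / P."""
    return known_width * focal_length_px / perceived_width_px

def pixel_to_metric(pixel_size, distance, focal_length_px):
    """Invert the same relation to convert a measured pixel extent
    (e.g. of a detected pothole bounding box) into a physical size at
    the estimated distance."""
    return pixel_size * distance / focal_length_px

# Hypothetical calibration: a 0.3 m reference marker seen 150 px wide
# with a 600 px focal length is 1.2 m away; a pothole 200 px wide at
# that range is then 0.4 m across.
d = distance_to_object(0.3, 600.0, 150.0)
w = pixel_to_metric(200.0, d, 600.0)
```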
Citation count: 18
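The abstract above names a "triangular similarity measure" for dimension estimation. The sketch below shows the standard triangle-similarity relation that phrase usually refers to: an object of real width W at distance D spans P pixels, so P = (W × F) / D for a camera with focal length F (in pixels). All function names and calibration values here are illustrative assumptions, not taken from the paper.

```python
def focal_length_px(known_width_m: float, known_distance_m: float, width_px: float) -> float:
    """Calibrate: recover the focal length in pixels from a reference object
    of known real-world width photographed at a known distance."""
    return (width_px * known_distance_m) / known_width_m


def object_width_m(focal_px: float, distance_m: float, width_px: float) -> float:
    """Estimate the real-world width of a detected object (e.g. the bounding
    box YOLO returns for a pothole) from its pixel width and distance."""
    return (width_px * distance_m) / focal_px


# Calibration: a 0.5 m reference marker at 2.0 m spans 400 px -> F = 1600 px.
f = focal_length_px(0.5, 2.0, 400.0)
# A detected pothole box spanning 320 px at an estimated 2.5 m distance.
w = object_width_m(f, 2.5, 320.0)  # 0.5 m
```

The same proportion can be run in either direction: with a known object width it yields distance, and with a known distance it yields width, which is how a single calibration shot supports both estimates.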
History and Evolution of Single Pass Connected Component Analysis
Pub Date : 2020-11-25 DOI: 10.1109/IVCNZ51579.2020.9290585
D. Bailey
The techniques for single pass connected component analysis have undergone significant changes from their initial development to current state-of-the-art algorithms. This review traces the evolution of the algorithms, and explores the linkages and development of ideas introduced by various researchers. Three significant developments are: the recycling of labels to enable processing with resources proportional to the image width; reduction of overheads associated with label merging; and processing of multiple pixels in parallel. These are of particular interest to those developing high speed and low latency image processing and machine vision systems.
{"title":"History and Evolution of Single Pass Connected Component Analysis","authors":"D. Bailey","doi":"10.1109/IVCNZ51579.2020.9290585","DOIUrl":"https://doi.org/10.1109/IVCNZ51579.2020.9290585","url":null,"abstract":"The techniques for single pass connected component analysis have undergone significant changes from their initial development to current state-of-the-art algorithms. This review traces the evolution of the algorithms, and explores the linkages and development of ideas introduced by various researchers. Three significant developments are: the recycling of labels to enable processing with resources proportional to the image width; reduction of overheads associated with label merging; and processing of multiple pixels in parallel. These are of particular interest to those developing high speed and low latency image processing and machine vision systems.","PeriodicalId":164317,"journal":{"name":"2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125755150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citation count: 2
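The abstract above highlights label merging as one of the key developments in single pass connected component analysis. The sketch below illustrates the underlying idea — provisional labels assigned in a raster scan, with a union-find table resolving merges when two labelled runs meet. It is a minimal 4-connectivity labelling over a full image; real single-pass CCA algorithms instead accumulate per-label features row by row and recycle labels, as the review describes, so treat this as a didactic assumption rather than any algorithm from the paper.

```python
def label_components(img):
    """Raster-scan 4-connectivity labelling with union-find label merging.
    `img` is a list of rows of 0/1 values; returns a row-major label image."""
    h, w = len(img), len(img[0])
    parent = [0]  # parent[0] is the background label

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    labels = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            left = labels[y][x - 1] if x > 0 else 0
            up = labels[y - 1][x] if y > 0 else 0
            if left == 0 and up == 0:
                parent.append(len(parent))       # new provisional label
                labels[y][x] = len(parent) - 1
            elif left and up:
                a, b = find(left), find(up)
                parent[max(a, b)] = min(a, b)    # merge the two label trees
                labels[y][x] = min(a, b)
            else:
                labels[y][x] = left or up
    # Second sweep resolves provisional labels to their merged representatives.
    return [[find(lab) for lab in row] for row in labels]
```

A U-shaped blob exercises the merge path: its two vertical arms receive different provisional labels, which the bottom row unites into one component.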
Journal
2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ)