
Latest publications from the 2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)

Semi-supervised learning with cross-localisation in shared GAN latent space for enhanced OCT data augmentation
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034570
Deep learning methods have demonstrated state-of-the-art performance for the segmentation of the retina and choroid in optical coherence tomography (OCT) images. These methods are automatic and fast, yielding high accuracy and precision and thus reducing the load of manual analysis. However, deep learning usually requires large amounts of diverse, labelled data for training, which can be difficult or infeasible to obtain, especially for medical images. For example, privacy concerns and the lack of confidentiality agreements are common obstacles to the sharing of useful training data. Additionally, some data can be significantly more difficult to obtain in the first place, such as that of rare pathologies. Even where sufficient data is available, the cost and time of image labelling can be significant. In many cases, data augmentation is employed to enlarge the training set. Similarly, semi-supervised learning (SSL) can be used to exploit potentially large amounts of unlabelled data that would otherwise go unused. Motivated by this, in this study we propose an enhanced StyleGAN2-based data augmentation method for OCT images that employs SSL through a novel cross-localisation technique. For OCT image patches, the proposed method significantly improved classification accuracy over the previous GAN data augmentation approach, which uses labelled data only. The technique works by automatically learning, mixing, and injecting unlabelled styles into the labelled data to further increase the diversity of the synthetic data. The proposed method can be trained using differing quantities of labelled and unlabelled data simultaneously. The method is simple, effective, and generalizable, and can easily be applied to extend StyleGAN2. Hence, there is also significant potential for the proposed method to be applied for data augmentation in other domains and imaging modalities where unlabelled data exists.
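The core operation the abstract describes, injecting unlabelled styles into labelled latent codes, resembles StyleGAN2's style mixing. The sketch below shows a minimal per-layer mixing step in the shared "w+" latent space; the names (`mix_styles`, `crossover`) are illustrative, and the paper's cross-localisation logic is not reproduced here.

```python
import torch

def mix_styles(w_labelled: torch.Tensor, w_unlabelled: torch.Tensor,
               crossover: int) -> torch.Tensor:
    """Inject unlabelled styles into a labelled latent code.

    w_labelled, w_unlabelled: (num_layers, w_dim) per-layer style vectors
    in StyleGAN2's extended "w+" space. Layers before `crossover` keep the
    labelled (coarse/structural) styles; layers from `crossover` onwards
    take the unlabelled (fine/texture) styles.
    """
    mixed = w_labelled.clone()
    mixed[crossover:] = w_unlabelled[crossover:]
    return mixed

# Usage: keep labelled structure, borrow unlabelled texture styles.
w_lab = torch.randn(14, 512)  # 14 style layers, 512-dim w: typical StyleGAN2
w_unl = torch.randn(14, 512)
w_mix = mix_styles(w_lab, w_unl, crossover=8)  # feed w_mix to the synthesiser
```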
Citations: 1
Analysis of the Over-Exposure Problem for Robust Scene Parsing
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034628
Developing a reliable high-level perception system that can work stably in different environments is highly useful, especially for autonomous driving tasks. Many previous studies have investigated extreme cases such as dark, rainy and foggy environments and proposed various datasets for these different tasks. In this work, we explore another extreme case: destructive over-exposure, which may cause varying degrees of content loss due to the limitations of dynamic range. Such over-exposure cases can be found in most outdoor datasets with structured or unstructured environments, but they are usually neglected because they are mixed with other well-exposed images. To analyse the influence of this kind of corruption, we generate realistic over-exposed images based on existing outdoor datasets, using a simple but controllable formula proposed from a photographer's point of view. Our simulation is realistic, as indicated by illumination distributions similar to those of real over-exposed images. We also conduct several experiments on our over-exposed datasets and observe performance drops with state-of-the-art segmentation models. Subsequently, to address the over-exposure problem, we compare several image restoration approaches for over-exposure recovery and demonstrate their potential effectiveness as a preprocessing step in scene parsing tasks.
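The paper's exact formula is not given in this listing, so the sketch below uses a generic photographic exposure model: push the image up by a number of f-stops in approximately linear light, then clip to the displayable range so saturated highlights lose content. All names (`simulate_overexposure`, `stops`) are illustrative.

```python
import numpy as np

def simulate_overexposure(img: np.ndarray, stops: float = 2.0,
                          gamma: float = 2.2) -> np.ndarray:
    """Simulate destructive over-exposure on a float RGB image in [0, 1].

    Each f-stop doubles the exposure in linear light; clipping models the
    sensor's limited dynamic range, which is where content is lost.
    """
    linear = img ** gamma                 # sRGB -> approximate linear light
    exposed = linear * (2.0 ** stops)     # +`stops` f-stops of exposure
    clipped = np.clip(exposed, 0.0, 1.0)  # highlights saturate and lose detail
    return clipped ** (1.0 / gamma)       # back to display (gamma) space
```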
Citations: 0
Towards Generalized Deepfake Detection With Continual Learning On Limited New Data: Anonymous Authors
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034569
Advancements in deep learning make it increasingly easy to produce highly realistic fake images and videos (also known as deepfakes), which could undermine trust in public discourse and pose threats to national and economic security. Despite diligent efforts to develop deepfake detection techniques, existing approaches often generalize poorly when the characteristics of new data and tasks differ significantly from those involved in the initial training phase. The detectors' limited generalizability hinders their widespread adoption if they cannot handle unseen manipulations in an open set. One solution to this issue is to endow the detectors with the capability of lifelong learning from new data to improve themselves. However, it is not uncommon in real-world scenarios for the amount of training data associated with a particular deepfake algorithm to be limited. Therefore, the effectiveness and agility of a continual learning scheme depend heavily on its ability to learn from limited new data. In this work, we propose a deepfake detection approach that combines spectral analysis and continual learning methods to pave the way towards generalized deepfake detection with limited new data. We demonstrate the generalization capability of the proposed approach through experiments on five deepfake datasets. The experimental results show that our proposed approach is effective in addressing catastrophic forgetting despite being updated with limited new data, decreasing the average forgetting rate by 35.04% and increasing the average accuracy by 22.45% compared with training without continual learning.
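The abstract does not define its average forgetting rate, so the sketch below assumes the standard continual-learning formulation: the mean drop from each old task's best accuracy at any earlier training stage to its accuracy after the final update.

```python
import numpy as np

def average_forgetting(acc: np.ndarray) -> float:
    """acc[i, j]: accuracy on task j after training on task i (T x T).

    Forgetting for each old task j is its best accuracy at any earlier
    stage minus its accuracy after the final stage, averaged over the
    T-1 old tasks. Standard definition, assumed rather than taken from
    the paper.
    """
    final = acc[-1, :-1]               # accuracy on old tasks at the end
    best = acc[:-1, :-1].max(axis=0)   # best accuracy each old task reached
    return float(np.mean(best - final))

acc = np.array([[0.90, 0.10, 0.10],
                [0.70, 0.90, 0.20],
                [0.60, 0.80, 0.90]])
print(average_forgetting(acc))  # (0.90-0.60 + 0.90-0.80) / 2 = 0.20
```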
Citations: 0
Effective Utilisation of Multiple Open-Source Datasets to Improve Generalisation Performance of Point Cloud Segmentation Models
Pub Date : 2022-11-29 DOI: 10.1109/DICTA56598.2022.10034566
Matthew Howe, Boris Repasky, Timothy Payne
Utilising a single point cloud segmentation model can be desirable in situations where point cloud source, quality, and content are unknown. In these situations the segmentation model must handle these variations with predictable and consistent results. Although deep learning can segment point clouds accurately, it often generalises poorly, adapting badly to data that differs from the data it was trained on. To address this issue, we propose to utilise multiple available open-source, fully annotated datasets to train and test models that are better able to generalise. The open-source datasets we utilise are DublinCity, DALES, ISPRS, Swiss3DCities, SensatUrban, SUM, and H3D [5], [11], [10], [1], [3], [2], [6]. In this paper we discuss the combination of these datasets into a simple training set and a challenging test set which evaluates multiple aspects of the generalisation task. We show that a naive combination and training produces improved results, as expected. We also show that an improved sampling strategy, which decreases sampling variations, further increases generalisation performance substantially. Experiments to isolate which variables contribute this performance boost found that none does so individually; rather, it is the consistency of the samples on which the model is evaluated that yields the improvement.
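The abstract does not spell out the improved sampling strategy, so the sketch below shows one generic way to reduce sampling variation across heterogeneous sources: voxel-grid averaging, which resamples every dataset to a roughly common point density. The function name and voxel size are illustrative, not taken from the paper.

```python
import numpy as np

def voxel_downsample(points: np.ndarray, voxel_size: float = 0.1) -> np.ndarray:
    """Average all points falling in each cell of a regular voxel grid.

    points: (N, D) array whose first three columns are x, y, z. Resampling
    every source cloud onto the same grid gives a roughly uniform density
    regardless of the original scanner or flight pattern.
    """
    keys = np.floor(points[:, :3] / voxel_size).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    counts = np.bincount(inverse).astype(float)
    out = np.empty((counts.size, points.shape[1]))
    for d in range(points.shape[1]):  # mean of every column per voxel
        out[:, d] = np.bincount(inverse, weights=points[:, d]) / counts
    return out
```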
Citations: 0
TW-BAG: Tensor-wise Brain-aware Gate Network for Inpainting Disrupted Diffusion Tensor Imaging
Pub Date : 2022-10-31 DOI: 10.1109/DICTA56598.2022.10034593
Zihao Tang, Xinyi Wang, Lihaowen Zhu, M. Cabezas, Dongnan Liu, Michael H Barnett, Weidong (Tom) Cai, Chengyu Wang
Diffusion Weighted Imaging (DWI) is an advanced imaging technique commonly used in neuroscience and neurological clinical research, typically analysed through a Diffusion Tensor Imaging (DTI) model. Volumetric scalar metrics, including fractional anisotropy, mean diffusivity, and axial diffusivity, can be derived from the DTI model to summarise water diffusivity and other quantitative microstructural information for clinical studies. However, clinical practice constraints can lead to sub-optimal DWI acquisitions with missing slices (due either to a limited field of view or to the acquisition of disrupted slices). To avoid discarding valuable subjects in group-wise studies, we propose a novel 3D Tensor-Wise Brain-Aware Gate network (TW-BAG) for inpainting disrupted DTIs. The proposed method is tailored to the problem with a dynamic gate mechanism and independent tensor-wise decoders. We evaluated the proposed method on the publicly available Human Connectome Project (HCP) dataset using common image similarity metrics derived from the predicted tensors and scalar DTI metrics. Our experimental results show that the proposed approach can reconstruct the original brain DTI volume and recover relevant clinical imaging information.
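The scalar metrics the abstract derives from the tensor model have standard closed forms in terms of the tensor's eigenvalues. A minimal sketch using those standard definitions (not code from the paper):

```python
import numpy as np

def dti_scalars(eigvals):
    """Standard DTI scalar metrics from a diffusion tensor's eigenvalues.

    eigvals: the three eigenvalues of the 3x3 diffusion tensor.
    Returns fractional anisotropy (FA), mean diffusivity (MD), and
    axial diffusivity (AD, the largest eigenvalue).
    """
    l1, l2, l3 = sorted(eigvals, reverse=True)
    md = (l1 + l2 + l3) / 3.0
    ad = l1
    fa = np.sqrt(1.5 * ((l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2)
                 / (l1 ** 2 + l2 ** 2 + l3 ** 2))
    return fa, md, ad

print(dti_scalars([1.7e-3, 0.3e-3, 0.3e-3]))  # anisotropic, white-matter-like
```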
Citations: 0
Automatic Cattle Identification using YOLOv5 and Mosaic Augmentation: A Comparative Analysis
Pub Date : 2022-10-21 DOI: 10.1109/DICTA56598.2022.10034585
Rabindra Dulal, Lihong Zheng, M. A. Kabir, S. McGrath, J. Medway, D. Swain, Will Swain
You Only Look Once (YOLO) is a single-stage object detection model popular for its real-time detection, accuracy, and speed. This paper investigates the YOLOv5 model for identifying cattle in yards. The current solution to cattle identification relies on radio-frequency identification (RFID) tags, and problems occur when an RFID tag is lost or damaged. A biometric solution can identify the cattle, making it possible to reassign a lost or damaged tag or to replace the RFID-based system altogether. Muzzle patterns in cattle are a unique biometric identifier, like a fingerprint in humans. This paper presents our recent research utilizing five popular object detection models: we examine the architecture of YOLOv5, investigate the performance of eight backbones with the YOLOv5 model, and study the influence of mosaic augmentation in YOLOv5 through experimental results on the available cattle muzzle images. Finally, we conclude that YOLOv5 has excellent potential for automatic cattle identification. Our experiments show that YOLOv5 with a transformer backbone performed best, with a mean Average Precision mAP@0.5 (the average AP at an IoU threshold of 50%) of 0.995 and mAP@0.5:0.95 (the average AP over IoU thresholds from 50% to 95% at intervals of 5%) of 0.9366. In addition, our experiments show that mosaic augmentation increases the accuracy of the model across all backbones used in our experiments. Moreover, we can also detect cattle from partial muzzle images.
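Mosaic augmentation, as popularised by YOLOv4/v5, tiles four training images into one canvas around a random centre so each batch mixes scales and contexts. The sketch below shows only the image-side operation (bounding-box remapping is omitted) and assumes, for brevity, that all four inputs are already `size`×`size`; the names are illustrative.

```python
import numpy as np

def mosaic4(imgs, size: int = 640) -> np.ndarray:
    """Tile four size x size images into one mosaic around a random centre."""
    canvas = np.zeros((size, size, 3), dtype=imgs[0].dtype)
    cx, cy = np.random.randint(size // 4, 3 * size // 4, size=2)
    regions = [(0, 0, cx, cy), (cx, 0, size, cy),
               (0, cy, cx, size), (cx, cy, size, size)]
    for img, (x1, y1, x2, y2) in zip(imgs, regions):
        h, w = y2 - y1, x2 - x1
        canvas[y1:y2, x1:x2] = img[:h, :w]  # naive crop; YOLOv5 rescales instead
    return canvas

batch = [np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8) for _ in range(4)]
mosaic = mosaic4(batch)
```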
Citations: 5
MKIS-Net: A Light-Weight Multi-Kernel Network for Medical Image Segmentation
Pub Date : 2022-10-15 DOI: 10.1109/DICTA56598.2022.10034573
T. M. Khan, Muhammad Arsalan, A. Robles-Kelly, E. Meijering
Image segmentation is an important task in medical imaging. It constitutes the backbone of a wide variety of clinical diagnostic methods, treatments, and computer-aided surgeries. In this paper, we propose a multi-kernel image segmentation net (MKIS-Net), which uses multiple kernels to create an efficient receptive field and enhance segmentation performance. As a result of its multi-kernel design, MKIS-Net is a light-weight architecture with a small number of trainable parameters. Moreover, these multi-kernel receptive fields also contribute to better segmentation results. We demonstrate the efficacy of MKIS-Net on several tasks including segmentation of retinal vessels, skin lesion segmentation, and chest X-ray segmentation. The performance of the proposed network is quite competitive, and often superior, in comparison to state-of-the-art methods. Moreover, in some cases MKIS-Net has more than an order of magnitude fewer trainable parameters than existing medical image segmentation alternatives and is at least four times smaller than other light-weight architectures.
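The multi-kernel idea, parallel convolutions with different kernel sizes whose outputs are concatenated to widen the effective receptive field, can be sketched as below. This is a generic PyTorch block illustrating the concept, not the published MKIS-Net architecture; `out_ch` is assumed divisible by the number of kernels.

```python
import torch
import torch.nn as nn

class MultiKernelBlock(nn.Module):
    """Parallel convolutions with different kernel sizes, concatenated."""

    def __init__(self, in_ch: int, out_ch: int, kernels=(1, 3, 5, 7)):
        super().__init__()
        branch_ch = out_ch // len(kernels)  # out_ch assumed divisible
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, branch_ch, k, padding=k // 2) for k in kernels
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each branch sees the same input at a different receptive field.
        return torch.cat([b(x) for b in self.branches], dim=1)

block = MultiKernelBlock(3, 32)
y = block(torch.randn(1, 3, 128, 128))  # -> (1, 32, 128, 128)
```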
Citations: 6
Prompt-guided Scene Generation for 3D Zero-Shot Learning
Pub Date : 2022-09-29 DOI: 10.1109/DICTA56598.2022.10034623
Majid Nasiri, A. Cheraghian, T. Chowdhury, Sahar Ahmadi, Morteza Saberi, Shafin Rahman
Zero-shot learning on 3D point cloud data is a relatively underexplored problem compared to its 2D image counterpart. 3D data brings new challenges for ZSL due to the unavailability of robust pre-trained feature extraction models. To address this problem, we propose a prompt-guided 3D scene generation and supervision method that augments 3D data to train the network better, exploring the complex interplay of seen and unseen objects. First, we merge the point clouds of two 3D models in certain ways described by a prompt. The prompt acts like an annotation describing each 3D scene. Later, we perform contrastive learning to train our proposed architecture in an end-to-end manner. We argue that 3D scenes can relate objects more efficiently than single objects, because popular language models (like BERT) achieve high performance when objects appear in a context. Our proposed prompt-guided scene generation method encapsulates data augmentation and prompt-based annotation/captioning to improve 3D ZSL performance. We achieve state-of-the-art ZSL and generalized ZSL performance on synthetic (ModelNet40, ModelNet10) and real-scanned (ScanObjectNN) 3D object datasets.
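A toy stand-in for the prompt-guided merging step: place one object's point cloud beside another's and emit a caption-style prompt describing the layout. The template and names below are hypothetical; the paper's actual prompt vocabulary and placement rules are not reproduced here.

```python
import numpy as np

def compose_scene(pc_a: np.ndarray, pc_b: np.ndarray,
                  name_a: str, name_b: str,
                  offset=(1.0, 0.0, 0.0)):
    """Merge two (N, 3) object clouds into one scene plus a text prompt."""
    scene = np.concatenate([pc_a, pc_b + np.asarray(offset)], axis=0)
    prompt = f"a scene with a {name_a} next to a {name_b}"  # hypothetical template
    return scene, prompt

chair = np.random.rand(1024, 3)  # stand-ins for sampled CAD model points
table = np.random.rand(1024, 3)
scene, prompt = compose_scene(chair, table, "chair", "table")
```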
Citations: 1
Regularizing Neural Network Training via Identity-wise Discriminative Feature Suppression
Pub Date : 2022-09-29 DOI: 10.1109/DICTA56598.2022.10034562
Avraham Chapman, Lingqiao Liu
It is well-known that a deep neural network has a strong fitting capability and can easily achieve a low training error even with randomly assigned class labels. When the number of training samples is small, or the class labels are noisy, networks tend to memorize patterns specific to individual instances to minimize the training error. This leads to the issue of overfitting and poor generalisation performance. This paper explores a remedy by suppressing the network's tendency to rely on instance-specific patterns for empirical error minimisation. The proposed method is based on an adversarial training framework. It suppresses features that can be utilized to identify individual instances among samples within each class. This leads to classifiers only using features that are both discriminative across classes and common within each class. We call our method Adversarial Suppression of Identity Features (ASIF), and demonstrate the usefulness of this technique in boosting generalisation accuracy when faced with small datasets or noisy labels. Our source code is available.
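One standard mechanism for this kind of adversarial suppression is a gradient-reversal layer: an auxiliary head is trained to tell individual instances apart, while reversed gradients push the backbone to erase whatever features that head relies on. The sketch below shows that mechanism as a plausible reading of the abstract, not the paper's exact formulation.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates gradients in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

def identity_suppression_logits(features: torch.Tensor,
                                identity_head: torch.nn.Module) -> torch.Tensor:
    # The identity head minimises its instance-discrimination loss as usual,
    # but the reversed gradient trains the backbone to *maximise* it,
    # suppressing instance-specific features.
    return identity_head(GradReverse.apply(features))
```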
Citations: 0
Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization: Anonymous submission Paper ID 73
Pub Date : 2022-04-13 DOI: 10.1109/DICTA56598.2022.10034605
Zhixi Cai, Kalin Stefanov, Abhinav Dhall, Munawar Hayat
Due to its high societal impact, deepfake detection is receiving active attention in the computer vision community. Most deepfake detection methods target identity-, facial attribute-, and adversarial perturbation-based spatio-temporal modifications applied at the whole-video level or at random locations, where the meaning of the content stays intact. However, a sophisticated deepfake may contain only a small segment of video/audio manipulation, through which the meaning of the content can be, for example, completely inverted from a sentiment perspective. We introduce a content-driven audio-visual deepfake dataset, termed Localized Audio Visual DeepFake (LAV-DF), explicitly designed for the task of learning temporal forgery localization. Specifically, the content-driven audio-visual manipulations are performed strategically to change the sentiment polarity of the whole video. Our baseline method for benchmarking the proposed dataset is a 3DCNN model, termed Boundary Aware Temporal Forgery Detection (BA-TFD), which is guided via contrastive, boundary matching, and frame classification loss functions. Our extensive quantitative and qualitative analysis demonstrates the proposed method's strong performance on temporal forgery localization and deepfake detection tasks.
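The abstract names three guiding objectives: frame classification, boundary matching, and a contrastive loss. Below is a hedged sketch of how such a composite objective could be wired up; the weights, margin, and tensor shapes are illustrative, not values from the paper.

```python
import torch
import torch.nn.functional as F

def batfd_style_loss(frame_logits, frame_labels, bm_pred, bm_target,
                     z_real, z_fake, weights=(1.0, 1.0, 0.1), margin=0.99):
    """Composite loss: per-frame real/fake classification + boundary-map
    regression + a contrastive term pushing real and fake clip embeddings
    apart. Illustrative only."""
    cls = F.binary_cross_entropy_with_logits(frame_logits, frame_labels)
    bm = F.mse_loss(bm_pred, bm_target)
    dist = (z_real - z_fake).pow(2).sum(-1).sqrt()
    con = F.relu(margin - dist).mean()  # hinge: keep embeddings >= margin apart
    return weights[0] * cls + weights[1] * bm + weights[2] * con
```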
Citations: 1