Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034575
Multi-Object Tracking (MOT) has been a popular and challenging topic in computer vision. However, the identity issue, i.e., an object being wrongly associated with another object of a different identity, remains a difficult problem. To address it, two factors are of great importance. First, multiple cues from different sources are needed for robust tracking in complicated situations where a single-source cue may not be reliable. Second, switchers, i.e., the objects that confuse targets and cause identity issues, deserve more attention so that such errors can be anticipated and avoided. Based on these motivations, we propose a method for MOT that takes more cues, as well as information about potential switchers, into consideration. Rather than relying on a single appearance cue, as is common, we exploit cues from tracklet surroundings and historical appearance features and combine all cues in a unified manner. Unlike usual tracking methods, the proposed tracking classifier learns to apply different strategies in varied situations with respect to a switcher. Extensive experiments show that our proposed method achieves competitive results on challenging MOT benchmarks.
Title: Multi-Object Tracking with Multiple Cues and Switcher-Aware Classification
Published in: 2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)
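As a rough illustration of the multi-cue association idea above, the sketch below fuses several pairwise cue scores with fixed weights and greedily matches tracklets to detections. The cue names, weights, threshold, and the greedy matcher are illustrative assumptions; the paper's switcher-aware classifier is learned, not hand-weighted.

```python
# Hedged sketch: fusing several pairwise cues into one association score
# and greedily matching tracklets to detections. All names and weights
# below are illustrative, not the paper's learned classifier.

def fuse_cues(cues, weights):
    """Weighted combination of per-pair cue scores in [0, 1]."""
    return sum(weights[name] * cues[name] for name in weights)

def greedy_associate(score_matrix, threshold=0.5):
    """Match each row (tracklet) to its best column (detection),
    highest scores first, skipping anything below the threshold."""
    pairs = sorted(
        ((s, i, j) for i, row in enumerate(score_matrix)
                   for j, s in enumerate(row)),
        reverse=True)
    used_rows, used_cols, matches = set(), set(), []
    for s, i, j in pairs:
        if s < threshold:
            break
        if i not in used_rows and j not in used_cols:
            used_rows.add(i)
            used_cols.add(j)
            matches.append((i, j))
    return matches

weights = {"appearance": 0.5, "motion": 0.3, "surroundings": 0.2}
score = fuse_cues({"appearance": 0.9, "motion": 0.8, "surroundings": 0.6},
                  weights)  # 0.45 + 0.24 + 0.12 = 0.81
```

A learned classifier would replace the fixed weights with a model conditioned on the switcher context, but the association structure stays the same.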
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034599
Cell nuclei segmentation is important for histopathology image analysis. While deep learning has demonstrated promising results for automated cell nuclei segmentation, it is difficult to obtain accurate ground-truth annotations due to the visual complexity of histopathology images and the high density of cells. Weakly supervised cell segmentation can greatly reduce the annotation effort while maintaining high accuracy. However, current weakly supervised segmentation methods typically require centroid annotations for all cells, which is still a tedious task. In this study, we propose a semi- and weakly-supervised cell segmentation network named Deep Double Edge Enhancement Network (D2E2-Net) that uses only a small number of annotated points. Our method focuses on suppressing background noise to further enhance cell-boundary delineation. Our experimental results demonstrate state-of-the-art performance on three public histopathology image datasets.
Title: D2E2-Net: Double Deep Edge Enhancement for Weakly-Supervised Cell Nuclei Segmentation with Incomplete Point Annotations
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034608
Jiajie Chen
Many loss functions are derived from the cross-entropy loss, such as the Large-Margin Softmax Loss, which makes classification more rigorous and prevents over-fitting, and the Focal Loss, which alleviates class imbalance in object detection by down-weighting the loss of well-classified examples. However, these two cross-entropy-derived losses have not been treated under a common formulation. To this end, we subdivide entropy-based losses into regularizer-based and focal-based entropy losses and propose a novel optimized Hybrid Focal Margin Loss to handle extreme class imbalance and prevent over-fitting for crack segmentation. We evaluated our proposal on three crack segmentation datasets (DeepCrack-DB, CRACK500 and our private PanelCrack dataset). Our experiments demonstrate that the Focal Margin component can further increase the IoU of cracks by 0.43 points on DeepCrack-DB and by 0.44 points on our PanelCrack dataset.
Title: Optimized Hybrid Focal Margin Loss for Crack Segmentation
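Since the abstract above builds on the Focal Loss, a minimal sketch of the standard binary focal loss (down-weighting well-classified examples) may help; the hybrid focal-margin variant itself is not reproduced here.

```python
import math

# Hedged sketch of the standard binary focal loss the paper builds on:
# FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).
# Not the paper's Hybrid Focal Margin Loss.

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """p: predicted probability of the positive class; y: label in {0, 1}.
    The (1 - p_t)^gamma factor shrinks the loss of easy examples."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confident correct prediction contributes far less than a hard one:
easy = focal_loss(0.95, 1)   # well-classified positive
hard = focal_loss(0.30, 1)   # misclassified positive
```

With gamma = 0 and alpha = 1 this reduces to the plain cross-entropy, which is what makes the focusing parameter easy to ablate.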
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034596
Visual object detection has made significant progress with the advent of deep neural networks and has been extensively applied. This work reports a novel application that aims to detect individual microatolls, which are circular coral colonies, in island images captured by drones. We first describe the data collection and labelling used to create a microatoll detection dataset. On this dataset, state-of-the-art object detectors are then evaluated for the task. To better adapt a detector to the characteristics of microatolls, we propose a modified detector called Microatoll-Net. It actively extracts features from the surrounding area of a microatoll to differentiate it from distractors and thereby improve detection, and we design multiple ways to incorporate this information into the detector. Experiments show the efficacy of the proposed Microatoll-Net, especially on the most challenging region of an island. The code and dataset will be released soon.
Title: Detecting Microatolls from Drone Images
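For context on how detectors such as Microatoll-Net are typically scored against labelled boxes, here is a minimal sketch of intersection-over-union for axis-aligned boxes; it is the standard detection criterion, not code from the paper.

```python
# Hedged sketch: intersection-over-union for axis-aligned boxes given
# as (x1, y1, x2, y2). Standard detection-evaluation arithmetic.

def box_iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A predicted box is usually counted as a true positive when its IoU with a ground-truth microatoll exceeds a threshold such as 0.5.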
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034568
Amar Ali N. Khan, N. Aouf
This paper tackles the problem of Deep Learning (DL) based Multispectral Visual Odometry (MSVO) by adopting a backpropagation mechanism to optimise multispectral feature matching. Through imaging-data abstraction, we remove all modality-specific artefacts from the image streams and therefore focus only on the inherent structure of the scene, as encoded by edge maps. The systematic use of multiple loss functions enables the edge-map encoding to be learned through supervised backpropagation, eliminating errors attributable to the multispectral feature matching problem. To our knowledge, no other work is designed to eliminate the multispectral drift present in end-to-end DL-based VO solutions. Experimental datasets are used to validate our approach and demonstrate the quality of the results achieved.
Title: Backpropagation Based Deep Multispectral VO Drift Elimination
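As a rough illustration of the edge-map abstraction described above, the sketch below computes a plain Sobel gradient magnitude. The paper learns its edge encoding, so this fixed filter only illustrates why edges are a largely modality-independent representation of scene structure.

```python
# Hedged sketch: fixed Sobel gradient magnitude as a stand-in for a
# learned edge-map encoder. Works on a 2D grayscale image given as a
# list of lists; valid region only (no border padding).

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def edge_map(img):
    h, w = len(img), len(img[0])
    out = [[0.0] * (w - 2) for _ in range(h - 2)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y - 1][x - 1] = (gx * gx + gy * gy) ** 0.5
    return out
```

The same intensity step produces the same response whether the source frame is visible-light or thermal, which is the property MSVO exploits.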
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034618
Zaenab Alammar, Laith Alzubaidi, Jinglan Zhang, José Santamaréa, Yuefeng Li
The musculoskeletal system comprises the muscles and skeleton of the body; in particular, it contains joints, muscles, bones, cartilage, ligaments, bursae, and tendons. This system enables the body's movement and supports its stability. Screening for musculoskeletal abnormalities is particularly critical, as more than 1.7 billion people worldwide are affected by musculoskeletal conditions. Determining whether a radiograph is normal or abnormal is a critical diagnostic task. The most common mistake in the emergency department is the incorrect diagnosis of fractures, which can lead to delayed treatment and temporary or permanent disability. Accordingly, several studies have shown how a deep learning (DL) system can accurately detect fractures in the musculoskeletal system. This paper reviews the specific impact of using DL for musculoskeletal X-ray imaging. As far as we know, this is the first review focusing on the topic. In particular, it examines the most significant aspects of how machine learning (ML) and DL address the problem. It introduces the importance of DL methods in musculoskeletal X-ray imaging and describes the MURA (musculoskeletal radiographs) dataset as an example. Convolutional neural networks (CNNs) are identified as one of the most widely adopted DL solutions, and several enhancements are described. Finally, current open challenges and suggested solutions are presented to help researchers propose new developments.
Title: A Concise Review on Deep Learning for Musculoskeletal X-ray Images
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034634
A popular choice when designing a semantic segmentation model is to adopt a pre-trained Deep Convolutional Neural Network (DCNN) as a backbone and add extra modules for better semantic representation and competitive segmentation results. However, the large number of parameters and substantial memory footprint of these DCNN architectures make such large models unsuitable for real-time applications on mobile devices. To address this issue, this study proposes a very lightweight model, called the Short-term Dense Bottleneck Network (SDBNet). By staging a series of bottleneck blocks, an efficient module, termed SDB, is carefully designed to provide diverse fields of view for better contextualization of varied geometrical objects in a complex scene. For precise localization, a shallow branch is deployed in parallel to the SDB module and shares spatial details with it at multiple stages. At the decoder end, a simple yet effective feature refinement and semantic aggregation module is deployed for better context assimilation and region identification. The proposed model is evaluated on three public benchmarks, and the results on the Cityscapes (70.8%), CamVid (73.2%) and KITTI (51.8%) test sets demonstrate competitive performance in the real-time category.
Title: SDBNet: Lightweight Real-Time Semantic Segmentation Using Short-Term Dense Bottleneck
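As a back-of-the-envelope illustration of why bottleneck blocks keep models lightweight, the sketch below compares the weight count of a plain 3x3 convolution with a 1x1 → 3x3 → 1x1 bottleneck at the same channel width. The reduction ratio and channel count are illustrative assumptions, not SDBNet's actual configuration.

```python
# Hedged arithmetic sketch: parameter count of a plain 3x3 convolution
# versus a 1x1 -> 3x3 -> 1x1 bottleneck, the kind of saving a
# sub-1.5M-parameter model depends on. Biases ignored throughout.

def conv_params(c_in, c_out, k):
    """Weights of a k x k convolution."""
    return c_in * c_out * k * k

def plain_3x3(c):
    return conv_params(c, c, 3)

def bottleneck(c, r=4):
    """1x1 reduce to c//r, 3x3 at c//r, 1x1 expand back to c."""
    m = c // r
    return (conv_params(c, m, 1)
            + conv_params(m, m, 3)
            + conv_params(m, c, 1))

# At 128 channels: 147,456 plain weights vs 17,408 bottleneck weights,
# roughly an 8.5x reduction for a similar receptive field.
```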
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034616
Author One
Glaucoma is a silent killer of eyesight that affects people of all ages. The loss of sight from glaucoma is irreversible and usually gradual, with treatments limited to slowing its progression. Early detection is therefore important to prevent vision loss. Colour fundus photographs (CFPs) are often used to diagnose glaucoma, and in recent years there has been increasing interest in developing convolutional neural network (CNN)-based approaches for automated glaucoma assessment using CFPs. CNN models vary notably in network depth, computational cost, and performance. This paper examines whether computationally light CNNs can detect glaucoma as well as computationally heavy ones. To that end, it evaluates the performance of seven state-of-the-art CNNs of varying computational intensity: MobileNetV2, MobileNetV3, Custom ResNet, InceptionV3, ResNet50, 18-Layer CNN and InceptionResNetV2. The publicly available large-scale attention-based glaucoma (LAG) dataset is used for the experiments. With its 1,711 “glaucomatous” and 3,143 “non-glaucomatous” sample images, the LAG dataset is the largest publicly available glaucoma dataset to date.
Title: Evaluating the performance of different convolutional neural networks in glaucoma detection
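The three figures reported above relate to confusion-matrix counts in the standard way; the sketch below computes them, with illustrative counts rather than the paper's results.

```python
# Hedged sketch: accuracy, sensitivity, and specificity from confusion
# counts, the metrics used to compare the CNNs above. The counts in the
# test below are illustrative, not the LAG results.

def screening_metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # recall on glaucomatous eyes
    specificity = tn / (tn + fp)   # recall on healthy eyes
    return accuracy, sensitivity, specificity
```

Sensitivity matters most for a screening tool, since a missed glaucomatous eye (a false negative) is costlier than a false alarm.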
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034597
Various fields and industries have widely adopted Machine Learning (ML) to automate manual processes and enable data-driven decision making. The Vehicle Leasing Return Assessment (VLRA) process requires all leased vehicles to be appraised for damage at the end of the contract period: the damage must be classified and a repair cost determined. This manual process adds time and labor overhead and introduces high variance in the final cost due to human biases. A data-driven ML method is needed to automate and streamline VLRA to keep up with increasing demand and ensure an optimal customer experience. In this work, we present Object Regression, an end-to-end detection and cost-prediction model that leverages multi-modal image and vector data for damage detection and cost prediction in a single detection/regression network. Using Faster-RCNN coupled with a ResNet50 backbone, we extend the capabilities of the standard two-stage object detector to exploit the inherent relationship between data modalities that standalone detection or prediction models do not leverage. We partner with one of Europe's biggest car manufacturers and detail the process of converting an industrial dataset for an ML task. We also showcase the performance improvements that can be achieved using highly related multi-modal data.
Title: Object Regression: Multi-Modal Data Enhanced Object Detection for Leasing Vehicle Return Assessment
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034617
Six degrees of freedom (6DOF) pose estimation is one of the common challenges in many robotic and computer vision applications. Most state-of-the-art methods focus on conventional cameras. In this paper, we address the problem of event camera pose estimation, predicting the camera pose with a deep learning based method composed of a convolutional and a recurrent neural network connected to a dense-layer regressor. We present results for a set of convolutional networks, including commonly used ones, and demonstrate the performance of the proposed method on several datasets. The results demonstrate the superiority of the proposed method compared to state-of-the-art methods.
Title: Deep Learning For Pose Estimation From Event Camera
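For context, pose estimators like the one above are commonly scored with a translation/rotation error pair; the sketch below computes the standard pair (Euclidean distance plus the angle between unit quaternions) and is not the paper's evaluation code.

```python
import math

# Hedged sketch: the usual 6DOF pose-error pair used to score pose
# estimators: translation distance and the rotation angle between two
# unit quaternions.

def pose_error(t_pred, q_pred, t_true, q_true):
    """t_* are (x, y, z) translations; q_* are unit quaternions
    (w, x, y, z). Returns (translation error, rotation error in rad)."""
    t_err = math.dist(t_pred, t_true)
    # abs() folds the q / -q double cover onto one hemisphere.
    dot = abs(sum(a * b for a, b in zip(q_pred, q_true)))
    r_err = 2.0 * math.acos(min(1.0, dot))
    return t_err, r_err
```

Averaging (or taking the median of) these two errors over a trajectory gives the per-sequence figures typically reported for pose regression networks.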