Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034575
Multi-Object Tracking (MOT) has been a popular and challenging topic in computer vision. However, the identity issue, i.e., an object being wrongly associated with another object of a different identity, remains a difficult problem. To address it, two factors are of great importance. First, multiple cues from different sources are needed for robust tracking in complicated situations where a single-source cue may not be reliable. Second, switchers, i.e., the objects that confuse targets and cause identity issues, deserve more attention so that such errors can be anticipated and avoided. Based on these motivations, we propose a method for MOT that takes more cues, as well as information about potential switchers, into consideration. Rather than relying on a single appearance cue, as is common, we exploit cues from tracklet surroundings and historical appearance features and combine all cues in a unified manner. Unlike usual tracking methods, the proposed tracking classifier learns to apply different strategies in varied situations with respect to a switcher. Extensive experiments show that our proposed method achieves competitive results on challenging MOT benchmarks.
Title: Multi-Object Tracking with Multiple Cues and Switcher-Aware Classification
Published in: 2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)
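As a rough illustration of the multi-cue association idea above, the sketch below fuses several pairwise cue scores with fixed weights and greedily matches tracklets to detections. The cue names, weights, threshold, and the greedy matcher are illustrative assumptions; the paper's switcher-aware classifier is learned, not hand-weighted.

```python
# Hedged sketch: fusing several pairwise cues into one association score
# and greedily matching tracklets to detections. All names and weights
# below are illustrative, not the paper's learned classifier.

def fuse_cues(cues, weights):
    """Weighted combination of per-pair cue scores in [0, 1]."""
    return sum(weights[name] * cues[name] for name in weights)

def greedy_associate(score_matrix, threshold=0.5):
    """Match each row (tracklet) to its best column (detection),
    highest scores first, skipping anything below the threshold."""
    pairs = sorted(
        ((s, i, j) for i, row in enumerate(score_matrix)
                   for j, s in enumerate(row)),
        reverse=True)
    used_rows, used_cols, matches = set(), set(), []
    for s, i, j in pairs:
        if s < threshold:
            break
        if i not in used_rows and j not in used_cols:
            used_rows.add(i)
            used_cols.add(j)
            matches.append((i, j))
    return matches

weights = {"appearance": 0.5, "motion": 0.3, "surroundings": 0.2}
score = fuse_cues({"appearance": 0.9, "motion": 0.8, "surroundings": 0.6},
                  weights)  # 0.45 + 0.24 + 0.12 = 0.81
```

A learned classifier would replace the fixed weights with a model conditioned on the switcher context, but the association structure stays the same.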
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034599
Cell nuclei segmentation is important for histopathology image analysis. While deep learning has demonstrated promising results for automated cell nuclei segmentation, it is difficult to obtain accurate ground-truth annotations due to the visual complexity of histopathology images and the high density of cells. Weakly supervised cell segmentation can greatly reduce the annotation effort while maintaining high accuracy. However, current weakly supervised segmentation methods typically require centroid annotations for all cells, which is still a tedious task. In this study, we propose a semi- and weakly-supervised cell segmentation network named Deep Double Edge Enhancement Network (D2E2-Net) that uses only a small number of annotated points. Our method focuses on suppressing background noise to further enhance cell-boundary delineation. Our experimental results demonstrate state-of-the-art performance on three public histopathology image datasets.
Title: D2E2-Net: Double Deep Edge Enhancement for Weakly-Supervised Cell Nuclei Segmentation with Incomplete Point Annotations
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034608
Jiajie Chen
Many loss functions are derived from the cross-entropy loss, such as the Large-Margin Softmax Loss, which makes classification more rigorous and prevents over-fitting, and the Focal Loss, which alleviates class imbalance in object detection by down-weighting the loss of well-classified examples. However, these two cross-entropy-derived losses have not been treated under a common formulation. To this end, we subdivide entropy-based losses into regularizer-based and focal-based entropy losses and propose a novel optimized Hybrid Focal Margin Loss to handle extreme class imbalance and prevent over-fitting for crack segmentation. We evaluated our proposal on three crack segmentation datasets (DeepCrack-DB, CRACK500 and our private PanelCrack dataset). Our experiments demonstrate that the Focal Margin component can further increase the IoU of cracks by 0.43 points on DeepCrack-DB and by 0.44 points on our PanelCrack dataset.
Title: Optimized Hybrid Focal Margin Loss for Crack Segmentation
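Since the abstract above builds on the Focal Loss, a minimal sketch of the standard binary focal loss (down-weighting well-classified examples) may help; the hybrid focal-margin variant itself is not reproduced here.

```python
import math

# Hedged sketch of the standard binary focal loss the paper builds on:
# FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).
# Not the paper's Hybrid Focal Margin Loss.

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """p: predicted probability of the positive class; y: label in {0, 1}.
    The (1 - p_t)^gamma factor shrinks the loss of easy examples."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confident correct prediction contributes far less than a hard one:
easy = focal_loss(0.95, 1)   # well-classified positive
hard = focal_loss(0.30, 1)   # misclassified positive
```

With gamma = 0 and alpha = 1 this reduces to the plain cross-entropy, which is what makes the focusing parameter easy to ablate.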
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034596
Visual object detection has made significant progress with the advent of deep neural networks and has been extensively applied. This work reports a novel application that aims to detect individual microatolls, which are circular coral colonies, in island images captured by drones. We first describe the data collection and labelling used to create a microatoll detection dataset. On this dataset, state-of-the-art object detectors are then evaluated for the task. To better adapt a detector to the characteristics of microatolls, we propose a modified detector called Microatoll-Net. It actively extracts features from the surrounding area of a microatoll to differentiate it from distractors and thereby improve detection, and we design multiple ways to incorporate this information into the detector. Experiments show the efficacy of the proposed Microatoll-Net, especially on the most challenging region of an island. The code and dataset will be released soon.
Title: Detecting Microatolls from Drone Images
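For context on how detectors such as Microatoll-Net are typically scored against labelled boxes, here is a minimal sketch of intersection-over-union for axis-aligned boxes; it is the standard detection criterion, not code from the paper.

```python
# Hedged sketch: intersection-over-union for axis-aligned boxes given
# as (x1, y1, x2, y2). Standard detection-evaluation arithmetic.

def box_iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A predicted box is usually counted as a true positive when its IoU with a ground-truth microatoll exceeds a threshold such as 0.5.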
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034568
Amar Ali N. Khan, N. Aouf
This paper tackles the problem of Deep Learning (DL) based Multispectral Visual Odometry (MSVO) by adopting a backpropagation mechanism to optimise multispectral feature matching. Through imaging-data abstraction, we remove all modality-specific artefacts from the image streams and therefore focus only on the inherent structure of the scene, as encoded by edge maps. The systematic use of multiple loss functions enables the edge-map encoding to be learned through supervised backpropagation, eliminating errors attributable to the multispectral feature matching problem. To our knowledge, no other work is designed to eliminate the multispectral drift present in end-to-end DL-based VO solutions. Experimental datasets are used to validate our approach and demonstrate the quality of the results achieved.
Title: Backpropagation Based Deep Multispectral VO Drift Elimination
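As a rough illustration of the edge-map abstraction described above, the sketch below computes a plain Sobel gradient magnitude. The paper learns its edge encoding, so this fixed filter only illustrates why edges are a largely modality-independent representation of scene structure.

```python
# Hedged sketch: fixed Sobel gradient magnitude as a stand-in for a
# learned edge-map encoder. Works on a 2D grayscale image given as a
# list of lists; valid region only (no border padding).

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def edge_map(img):
    h, w = len(img), len(img[0])
    out = [[0.0] * (w - 2) for _ in range(h - 2)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y - 1][x - 1] = (gx * gx + gy * gy) ** 0.5
    return out
```

The same intensity step produces the same response whether the source frame is visible-light or thermal, which is the property MSVO exploits.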
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034618
Zaenab Alammar, Laith Alzubaidi, Jinglan Zhang, José Santamaréa, Yuefeng Li
The musculoskeletal system comprises the muscles and skeleton of the body; in particular, it contains joints, muscles, bones, cartilage, ligaments, bursae, and tendons. This system enables the body's movement and supports its stability. Screening for musculoskeletal abnormalities is particularly critical, as more than 1.7 billion people worldwide are affected by musculoskeletal conditions. Determining whether a radiograph is normal or abnormal is a critical diagnostic task. The most common mistake in the emergency department is the incorrect diagnosis of fractures, which can lead to delayed treatment and temporary or permanent disability. Accordingly, several studies have shown how a deep learning (DL) system can accurately detect fractures in the musculoskeletal system. This paper reviews the specific impact of using DL for musculoskeletal X-ray imaging. As far as we know, this is the first review focusing on the topic. In particular, it examines the most significant aspects of how machine learning (ML) and DL address the problem. It introduces the importance of DL methods in musculoskeletal X-ray imaging and describes the MURA (musculoskeletal radiographs) dataset as an example. Convolutional neural networks (CNNs) are identified as one of the most widely adopted DL solutions, and several enhancements are described. Finally, current open challenges and suggested solutions are presented to help researchers propose new developments.
Title: A Concise Review on Deep Learning for Musculoskeletal X-ray Images
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034634
A popular choice when designing a semantic segmentation model is to adopt a pre-trained Deep Convolutional Neural Network (DCNN) as a backbone and add extra modules for better semantic representation and competitive segmentation results. However, the large number of parameters and substantial memory footprint of these DCNN architectures make such large models unsuitable for real-time applications on mobile devices. To address this issue, this study proposes a very lightweight model, called the Short-term Dense Bottleneck Network (SDBNet). By staging a series of bottleneck blocks, an efficient module, termed SDB, is carefully designed to provide diverse fields of view for better contextualization of varied geometrical objects in a complex scene. For precise localization, a shallow branch is deployed in parallel to the SDB module and shares spatial details with it at multiple stages. At the decoder end, a simple yet effective feature refinement and semantic aggregation module is deployed for better context assimilation and region identification. The proposed model is evaluated on three public benchmarks, and the results on the Cityscapes (70.8%), CamVid (73.2%) and KITTI (51.8%) test sets demonstrate competitive performance in the real-time category.
Title: SDBNet: Lightweight Real-Time Semantic Segmentation Using Short-Term Dense Bottleneck
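As a back-of-the-envelope illustration of why bottleneck blocks keep models lightweight, the sketch below compares the weight count of a plain 3x3 convolution with a 1x1 → 3x3 → 1x1 bottleneck at the same channel width. The reduction ratio and channel count are illustrative assumptions, not SDBNet's actual configuration.

```python
# Hedged arithmetic sketch: parameter count of a plain 3x3 convolution
# versus a 1x1 -> 3x3 -> 1x1 bottleneck, the kind of saving a
# sub-1.5M-parameter model depends on. Biases ignored throughout.

def conv_params(c_in, c_out, k):
    """Weights of a k x k convolution."""
    return c_in * c_out * k * k

def plain_3x3(c):
    return conv_params(c, c, 3)

def bottleneck(c, r=4):
    """1x1 reduce to c//r, 3x3 at c//r, 1x1 expand back to c."""
    m = c // r
    return (conv_params(c, m, 1)
            + conv_params(m, m, 3)
            + conv_params(m, c, 1))

# At 128 channels: 147,456 plain weights vs 17,408 bottleneck weights,
# roughly an 8.5x reduction for a similar receptive field.
```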
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034616
Author One
Glaucoma is a silent killer of eyesight that affects people of all ages. The loss of sight from glaucoma is irreversible and usually gradual, with treatments limited to slowing its progression. Early detection is therefore important to prevent vision loss. Colour fundus photographs (CFPs) are often used to diagnose glaucoma, and in recent years there has been increasing interest in developing convolutional neural network (CNN)-based approaches for automated glaucoma assessment using CFPs. CNN models vary notably in network depth, computational cost, and performance. This paper examines whether computationally light CNNs can detect glaucoma as well as computationally heavy ones. To that end, it evaluates the performance of seven state-of-the-art CNNs of varying computational intensity: MobileNetV2, MobileNetV3, Custom ResNet, InceptionV3, ResNet50, 18-Layer CNN and InceptionResNetV2. The publicly available large-scale attention-based glaucoma (LAG) dataset is used for the experiments. With its 1,711 “glaucomatous” and 3,143 “non-glaucomatous” sample images, the LAG dataset is the largest publicly available glaucoma dataset to date.
Title: Evaluating the performance of different convolutional neural networks in glaucoma detection
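The three figures reported above relate to confusion-matrix counts in the standard way; the sketch below computes them, with illustrative counts rather than the paper's results.

```python
# Hedged sketch: accuracy, sensitivity, and specificity from confusion
# counts, the metrics used to compare the CNNs above. The counts in the
# test below are illustrative, not the LAG results.

def screening_metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # recall on glaucomatous eyes
    specificity = tn / (tn + fp)   # recall on healthy eyes
    return accuracy, sensitivity, specificity
```

Sensitivity matters most for a screening tool, since a missed glaucomatous eye (a false negative) is costlier than a false alarm.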
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034597
Various fields and industries have widely adopted Machine Learning (ML) to automate manual processes and enable data-driven decision making. The Vehicle Leasing Return Assessment (VLRA) process requires all leased vehicles to be appraised for damage at the end of the contract period: the damage must be classified and a repair cost determined. This manual process adds time and labor overhead and introduces high variance in the final cost due to human biases. A data-driven ML method is needed to automate and streamline VLRA to keep up with increasing demand and ensure an optimal customer experience. In this work, we present Object Regression, an end-to-end detection and cost-prediction model that leverages multi-modal image and vector data for damage detection and cost prediction in a single detection/regression network. Using Faster-RCNN coupled with a ResNet50 backbone, we extend the capabilities of the standard two-stage object detector to exploit the inherent relationship between data modalities that standalone detection or prediction models do not leverage. We partner with one of Europe's biggest car manufacturers and detail the process of converting an industrial dataset for an ML task. We also showcase the performance improvements that can be achieved using highly related multi-modal data.
Title: Object Regression: Multi-Modal Data Enhanced Object Detection for Leasing Vehicle Return Assessment
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034617
Six degrees of freedom (6DOF) pose estimation is one of the common challenges in many robotic and computer vision applications. Most state-of-the-art methods focus on conventional cameras. In this paper, we address the problem of event camera pose estimation, predicting the camera pose with a deep learning based method composed of a convolutional and a recurrent neural network connected to a dense-layer regressor. We present results for a set of convolutional networks, including commonly used ones, and demonstrate the performance of the proposed method on several datasets. The results demonstrate the superiority of the proposed method compared to state-of-the-art methods.
Title: Deep Learning For Pose Estimation From Event Camera
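For context, pose estimators like the one above are commonly scored with a translation/rotation error pair; the sketch below computes the standard pair (Euclidean distance plus the angle between unit quaternions) and is not the paper's evaluation code.

```python
import math

# Hedged sketch: the usual 6DOF pose-error pair used to score pose
# estimators: translation distance and the rotation angle between two
# unit quaternions.

def pose_error(t_pred, q_pred, t_true, q_true):
    """t_* are (x, y, z) translations; q_* are unit quaternions
    (w, x, y, z). Returns (translation error, rotation error in rad)."""
    t_err = math.dist(t_pred, t_true)
    # abs() folds the q / -q double cover onto one hemisphere.
    dot = abs(sum(a * b for a, b in zip(q_pred, q_true)))
    r_err = 2.0 * math.acos(min(1.0, dot))
    return t_err, r_err
```

Averaging (or taking the median of) these two errors over a trajectory gives the per-sequence figures typically reported for pose regression networks.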