
Latest publications from the 2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)

Feature Similarity and its Correlation with Accuracy in Knowledge Distillation
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034621
Knowledge distillation (KD) has emerged as a popular model compression technique to transfer knowledge from a larger, more performant teacher network to a more compact student network to improve its accuracy. Depending on the type of knowledge being transferred, KD can be categorised as response-based, feature-based or similarity-based distillation [5], [26]. Inspired by Bucilua et al. [3], KD was originally proposed by Hinton et al. [8] as a response-based distillation technique for image classification, which transferred the so-called “dark knowledge” by “softening” the teacher's prediction vector before distilling it to the student.
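To make the “softening” step concrete, below is a minimal sketch of a response-based distillation loss in the style of Hinton et al. [8]; the temperature T and mixing weight alpha are illustrative hyperparameters, not values from this paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Response-based KD: match the student to the teacher's softened predictions."""
    # Dividing logits by T > 1 "softens" the prediction vector, exposing the
    # teacher's dark knowledge about inter-class similarities.
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_soft_student = F.log_softmax(student_logits / T, dim=1)
    # KL divergence between the softened distributions; the T^2 factor keeps
    # gradient magnitudes comparable across temperatures.
    kd_term = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * T * T
    # Ordinary cross-entropy against the ground-truth labels.
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1 - alpha) * ce_term
```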
Citations: 0
Machine Vision Approach for Slipper Lobster Weight Estimation
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034627
Computer vision techniques have been successfully applied across a large number of industries for a variety of purposes. In this work we extend the capabilities of computer vision to slipper lobster weight estimation. Our proposed method combines machine learning and traditional computer vision techniques to first detect slipper lobsters and their eyes. We then develop an algorithm that determines which eyes belong to which slipper lobster and estimates weight from the distance between the eyes. The proposed method correctly identifies 86% of lobster eye pairs and estimates weight with a mean error of 4.78 g. Our weight estimation method achieves high accuracy and has the potential to be implemented within aquaculture operations in the future.
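As a rough sketch of the pairing-and-regression step, the following assumes lobster detections as bounding boxes and eyes as keypoints; the containment-based pairing rule and the power-law weight function are hypothetical stand-ins, since the abstract does not specify the actual calibration.

```python
import math

def estimate_weights(lobster_boxes, eye_points, a=0.05, b=2.0):
    """Pair detected eyes to lobsters and map inter-eye distance to weight.

    lobster_boxes: list of (x1, y1, x2, y2) detections.
    eye_points: list of (x, y) eye detections.
    a, b: hypothetical power-law coefficients (weight = a * distance ** b).
    """
    weights = []
    for (x1, y1, x2, y2) in lobster_boxes:
        # Hypothetical pairing rule: an eye belongs to a lobster if its
        # point falls inside that lobster's bounding box.
        eyes = [(x, y) for (x, y) in eye_points if x1 <= x <= x2 and y1 <= y <= y2]
        if len(eyes) != 2:
            weights.append(None)  # no clean eye pair for this lobster
            continue
        (ax, ay), (bx, by) = eyes
        distance = math.hypot(ax - bx, ay - by)
        weights.append(a * distance ** b)
    return weights
```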
Citations: 0
Robust Knowledge Adaptation for Federated Unsupervised Person ReID
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034631
Jianfeng Weng, Kun Hu, Tingting Yao, Jingya Wang, Zhiyong Wang
Person Re-identification (ReID) has been extensively studied in recent years due to increasing public security demands. However, collecting and handling sensitive personal data raises privacy concerns. Federated learning has therefore been explored for person ReID, with the aim of sharing minimal sensitive data between different parties (clients). However, existing federated-learning-based person ReID methods generally rely on laborious and time-consuming data annotation, and cross-domain consistency is difficult to guarantee. In this work, we therefore propose a federated unsupervised cluster-contrastive (FedUCC) learning method for person ReID. FedUCC introduces a three-stage modelling strategy that follows a coarse-to-fine manner: generic knowledge, specialized knowledge and patch knowledge are discovered using a deep neural network. This enables clients to share mutual knowledge while retaining local domain-specific knowledge, based on the kinds of network layers and their parameters. Comprehensive experiments on 8 public benchmark datasets demonstrate the state-of-the-art performance of our proposed method.
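A minimal sketch of the layer-wise sharing idea: only parameters in designated generic layers are averaged across clients, while domain-specific layers stay local. The prefix name backbone. is an assumption for illustration; FedUCC's actual split into generic, specialized and patch knowledge is more involved than this FedAvg-style aggregation.

```python
import copy
import torch

def partial_federated_average(client_states, shared_prefixes=("backbone.",)):
    """Average only shared layers across client state dicts (FedAvg-style)."""
    global_state = copy.deepcopy(client_states[0])
    for name in list(global_state.keys()):
        if name.startswith(shared_prefixes):
            # Generic knowledge: average this parameter over all clients.
            stacked = torch.stack([cs[name].float() for cs in client_states])
            global_state[name] = stacked.mean(dim=0)
        # Parameters outside the shared prefixes are treated as
        # domain-specific and remain local to each client.
    return global_state
```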
Citations: 1
End-to-End Traffic Sign Damage Assessment
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034587
Traffic sign damage monitoring is a practical issue facing large operations all over the world. Despite the scale of traffic sign damage and its consequent impact on public safety, damage audits are performed manually. By automating components of damage assessment, we can greatly improve the effectiveness and efficiency of the process and thereby alleviate its negative impact on traffic safety. In this paper, traffic sign damage assessment is explored as a computer vision problem approached with deep learning. We specifically focus on occlusion-type damage that hinders sign legibility. This paper makes several contributions. Firstly, it provides a comprehensive survey of related work on this problem. Secondly, it extends the generation of synthetic images for such a study. Most importantly, it proposes an extension of the EfficientDet object detection framework to address the challenge. It is shown that synthetic images can be successfully used to train an object detector variant to assess the level of damage in traffic signs, measured on a scale from 0.0 to 1.0. The extended framework achieves a damage assessment root mean squared error (RMSE) of 0.087 on a synthetic test set while maintaining its object detection capabilities.
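To illustrate the regression extension, here is a hypothetical per-detection damage head appended to a detector's box features, together with the RMSE metric used in the evaluation; the feature dimension and the use of a sigmoid to bound the output are assumptions consistent with the reported 0.0 to 1.0 damage scale.

```python
import torch
import torch.nn as nn

class DamageHead(nn.Module):
    """Hypothetical regression head predicting a damage score in [0, 1]."""
    def __init__(self, in_features: int = 256):  # placeholder feature size
        super().__init__()
        self.fc = nn.Linear(in_features, 1)

    def forward(self, detection_features: torch.Tensor) -> torch.Tensor:
        # Sigmoid bounds the prediction to the 0.0-1.0 damage scale.
        return torch.sigmoid(self.fc(detection_features)).squeeze(-1)

def rmse(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Root mean squared error, the metric reported as 0.087 in the paper.
    return torch.sqrt(torch.mean((pred - target) ** 2))
```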
Citations: 0
ISAR Ship Classification Using Metadata Features
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034611
Inverse synthetic aperture radar (ISAR) is a common radar imaging technique used to characterise and classify non-cooperative targets. Different approaches to classification have been proposed, including the traditional approach of using geometric features extracted from images of known targets and, more recently, deep learning approaches that utilise transfer learning to deal with the small training datasets typically available. In a real-world scenario, however, no target training data may be available, and a different approach to classification is then required. In this work, we develop a deep neural network-based approach that utilises metadata features to enhance the performance of ISAR ship classification, and we provide an alternative metadata-only solution for ISAR ship classification.
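As a sketch of what a metadata-only classifier could look like, a small MLP over numeric metadata features is shown below; the feature count, layer widths and class count are placeholders, since the abstract does not enumerate the metadata fields used.

```python
import torch.nn as nn

NUM_METADATA_FEATURES = 8   # placeholder: e.g. numeric sensor/collection fields
NUM_SHIP_CLASSES = 5        # placeholder class count

# A metadata-only classifier requires no target imagery at all, matching
# the scenario where no target training data is available.
metadata_classifier = nn.Sequential(
    nn.Linear(NUM_METADATA_FEATURES, 64),
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, NUM_SHIP_CLASSES),
)
```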
Citations: 0
SC-CrackSeg: A Real-Time Shared Feature Pyramid Network for Crack Detection and Segmentation
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034629
Detecting cracks is important in a number of civil engineering applications. Recent advances in computer vision have enabled automatic crack detection and fine-grained segmentation using deep learning. However, the models used in previous work are often large and are therefore mainly suitable for offline structure monitoring, where images taken from a site are analysed later by a powerful computer. In this work, we address the segmentation problem in an online setting, which permits the use of mobile inspection devices, such as drones with limited computing power, to monitor structures independently in real time. We propose SC-CrackSeg, which has a very small number of parameters and provides very high segmentation accuracy. Our main contribution is a multi-branch information-sharing architecture that efficiently captures the global perspective while maintaining the fine, high-resolution details that are key to crack detection. SC-CrackSeg extends a previously proposed model but is optimized specifically for this application: reduction to a single input, a more efficient context mining module, and a simpler feature fusion module. We evaluate SC-CrackSeg on large crack detection datasets and the results show that our proposed model is competitive against existing methods.
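To give a feel for the branch-sharing idea, the toy sketch below runs a shared encoder over a full-resolution detail branch and a downsampled context branch, then fuses them; this is not the actual SC-CrackSeg architecture, whose context mining and fusion modules are not specified in the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchFusion(nn.Module):
    """Toy detail/context split with shared weights and 1x1 feature fusion."""
    def __init__(self, channels: int = 32):
        super().__init__()
        self.encoder = nn.Conv2d(3, channels, 3, padding=1)  # shared by branches
        self.fuse = nn.Conv2d(2 * channels, 1, 1)            # fusion to mask logit

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        detail = F.relu(self.encoder(image))                     # fine details
        context = F.relu(self.encoder(F.avg_pool2d(image, 4)))   # global view
        context = F.interpolate(context, size=detail.shape[-2:],
                                mode="bilinear", align_corners=False)
        # Concatenate both branches and predict a per-pixel crack logit.
        return self.fuse(torch.cat([detail, context], dim=1))
```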
Citations: 1
Single image rain removal using cWGAN network
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034600
Atmospheric conditions such as rain degrade visibility, creating problems for computer vision applications. Early rain removal works were video-based and could therefore exploit temporal information, making the rain removal task much easier. In single image de-raining, the lack of temporal information creates challenges. Deep learning-based networks have recently become popular for single-image rain removal; such networks may or may not use image decomposition. This paper presents an end-to-end conditional Wasserstein Generative Adversarial Network (cWGAN) that restores a rain-free image from a rainy image without requiring image decomposition. The network is trained using a combination of the Wasserstein loss, the mean absolute error (L1) loss, and the VGG (perceptual) loss to improve the quality of the generated rain-free images. Two networks, U-Net and W-Net, are trained as generators to show the network's performance. The proposed cWGAN is an end-to-end network that does not require further enhancement. An extensive test using natural and synthetic rainy images reveals that the proposed cWGAN network is competitive against recent single image de-raining techniques.
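A minimal sketch of the combined generator objective (Wasserstein adversarial term plus L1 and perceptual terms) is shown below; the loss weights and the vgg_features extractor are illustrative assumptions, not values from the paper.

```python
import torch.nn.functional as F

def generator_loss(critic_scores, derained, clean, vgg_features,
                   w_adv=1.0, w_l1=10.0, w_vgg=1.0):
    """Combined Wasserstein + L1 + perceptual loss for the de-raining generator."""
    # WGAN generator term: raise the critic's score on generated images.
    adversarial = -critic_scores.mean()
    # Pixel-level mean absolute error against the rain-free ground truth.
    pixel = F.l1_loss(derained, clean)
    # Perceptual term: distance between VGG feature maps of the two images;
    # vgg_features is assumed to be a frozen, pretrained feature extractor.
    perceptual = F.l1_loss(vgg_features(derained), vgg_features(clean))
    return w_adv * adversarial + w_l1 * pixel + w_vgg * perceptual
```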
Citations: 0
Eating Activity Monitoring in Home Environments Using Smartphone-Based Video Recordings
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034636
Food intake monitoring plays an important role in personal dietary systems. Existing video-based eating activity monitoring systems typically use recordings taken with an identical device in a single laboratory setting. In contrast, we explore videos recorded using smartphones to recognize eating gestures in home environments. For this purpose, we collected 20 eating sessions from 14 participants using different smartphones. The data is labelled into eating and non-eating classes. To recognize eating activity from video, we employed three deep learning approaches: 3D CNN, SlowFast network, and CNN-LSTM. Our approach achieved a best F1-score of 0.560 with the SlowFast network when evaluated using the Leave-One-Subject-Out (LOSO) scheme. Our preliminary results suggest that video-based food intake monitoring can be used in home environments. However, our models failed to recognize eating activity when the user bends to pick food from the plate; more videos with such eating styles need to be incorporated into the training data to enhance performance.
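The LOSO scheme amounts to grouping folds by participant, so each fold trains on 13 participants and tests on the held-out one. A minimal sketch with scikit-learn's LeaveOneGroupOut follows; the array shapes and clip features are placeholders.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

# Placeholder data: one feature row per video clip, with the participant ID
# as the group label so each fold holds out exactly one person's sessions.
X = np.random.rand(280, 128)                  # clip features (placeholder)
y = np.random.randint(0, 2, size=280)         # 1 = eating, 0 = non-eating
participants = np.random.randint(0, 14, size=280)

logo = LeaveOneGroupOut()
for train_idx, test_idx in logo.split(X, y, groups=participants):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # Train one of the three models (3D CNN, SlowFast, CNN-LSTM) on the
    # training split here and accumulate the held-out fold's F1 score.
    pass
```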
Citations: 2
A deep learning multi-capture segmentation modality for retinal OCT imaging
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034625
Advances in image processing and deep learning methods have enhanced the analysis of optical coherence tomography (OCT) scans, which provide high-quality cross-sectional images of the posterior part of the eye (retina and choroid). These automatic methods support the diagnosis and monitoring of ocular conditions by automatically segmenting the required tissues and quantifying the thickness of tissue layers. Their performance is often affected by image quality, such as the presence of speckle noise or variations in the focus used to capture the OCT images, and changes in image quality can negatively impact segmentation performance. In this study, OCT images of different capture modalities (i.e. various focus and denoise settings) are used to analyze how image quality factors affect the segmentation performance of the U-Net and TransU-Net methods, comparing their segmentation results. To deal with the various modalities (i.e. different image quality aspects), an image-to-image translation process with CycleGAN is proposed to standardize image quality and to facilitate the segmentation process of these methods. Results demonstrate that using this image-to-image process as a denoising technique for OCT images captured with an enhanced depth imaging focus modality gave the best performance with the TransU-Net method, with a Dice coefficient of 0.99, improving on the segmentation performance of the U-Net method. The proposed technique provides a viable alternative for OCT instrument-agnostic segmentation.
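For reference, the Dice coefficient used to report the 0.99 segmentation score can be computed as below, assuming binary segmentation masks.

```python
import torch

def dice_coefficient(pred_mask: torch.Tensor, true_mask: torch.Tensor,
                     eps: float = 1e-7) -> torch.Tensor:
    """Dice = 2|A intersect B| / (|A| + |B|) for binary segmentation masks."""
    pred = pred_mask.float().flatten()
    true = true_mask.float().flatten()
    intersection = (pred * true).sum()
    # eps guards against division by zero when both masks are empty.
    return (2.0 * intersection + eps) / (pred.sum() + true.sum() + eps)
```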
Citations: 0
Dual-stream Convolutional Neural Networks for Koala Detection and Tracking
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034583
Conservation of koalas is an urgent task for Australia given their rapidly declining numbers in the wild. To better estimate koala populations and analyse koala activity, a camera network was deployed to capture video of koalas from zoos and the wild. This led to the creation of the world's first koala video tracking dataset. Based on this dataset, a two-stream convolutional neural network model was constructed to detect and track koala activity in the video. The model has two branches, one using semantic information for object detection in the original video frames, and the other using optical flow for motion information tracking. Both branches use Yolov5, which generates the positions of objects detected in colour or infrared video. Finally, the features generated by the two branches are fused to determine the final position of the koala in each frame. Experimental results show that the dual-stream network can significantly improve the tracking performance when compared with the baseline model that uses only semantic information for tracking.
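As a sketch of the motion branch's input, dense optical flow between consecutive frames can be computed as below; Farneback flow and the video path are illustrative choices, since the abstract does not name the flow algorithm used.

```python
import cv2

# Read two consecutive frames from a koala video (path is a placeholder).
cap = cv2.VideoCapture("koala_clip.mp4")
ok1, prev_frame = cap.read()
ok2, curr_frame = cap.read()
cap.release()

if ok1 and ok2:
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    # Dense optical flow (H x W x 2): per-pixel motion vectors that the
    # motion branch would consume alongside the semantic (RGB) branch.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
```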
Citations: 0