Latest publications from the 2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)

Regional Video Object Segmentation by Efficient Motion-Aware Mask Propagation
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034578
The use of optical flow to aid feature matching has been employed in recent self-supervised video object segmentation (VOS) methods and has shown promising results. However, computing pixel-wise optical flow is costly, and the optical flow can also be further utilized for efficient regional segmentation. To address these challenges, we propose an efficient motion-aware mask propagation approach, dubbed EMMP, for self-supervised VOS. EMMP introduces an efficient patch optical flow to compute the motion offsets of image patches for dynamic matching ROI generation. Fine-grained pixel-wise feature matching is performed based on the dynamic matching ROIs for mask propagation. To reduce redundant segmentation while avoiding unnecessary computations, we re-use the patch optical flow to estimate reliable foreground ROIs in the next frame and perform regional segmentation. Evaluation on benchmark VOS datasets shows that EMMP achieves competitive performance with significant wall-clock speed-ups compared to existing self-supervised training methods, e.g., EMMP slightly outperforms MAMP and runs about 2× faster on segmentation. In addition, EMMP performs on par with many supervised training methods.
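To make the patch-offset abstraction concrete, here is a minimal sketch (not the authors' implementation; `patch_offsets` and `shift_roi` are hypothetical helpers) that pools a flow field into one motion offset per patch and uses it to shift a matching ROI:

```python
import numpy as np

def patch_offsets(flow, patch=8):
    """Average a dense flow field (H, W, 2) over non-overlapping patches
    to get one (dy, dx) motion offset per patch. Illustrative only:
    EMMP computes patch flow directly for efficiency rather than
    pooling a dense field as done here."""
    H, W, _ = flow.shape
    ph, pw = H // patch, W // patch
    trimmed = flow[:ph * patch, :pw * patch]
    blocks = trimmed.reshape(ph, patch, pw, patch, 2)
    return blocks.mean(axis=(1, 3))  # (ph, pw, 2)

def shift_roi(roi, offset):
    """Shift a matching ROI (y0, x0, y1, x1) by a patch motion offset."""
    dy, dx = int(round(offset[0])), int(round(offset[1]))
    y0, x0, y1, x1 = roi
    return (y0 + dy, x0 + dx, y1 + dy, x1 + dx)
```

The shifted ROIs stand in for the "dynamic matching ROIs" within which fine-grained pixel-wise matching would then run.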
Citations: 0
Can Synthetic Data Improve Multi-Class Counting of Surgical Instruments?
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034591
Counting is a common preventative measure taken to ensure surgical instruments are not retained during surgery, which could cause serious detrimental effects including chronic pain and sepsis. A hybrid human-AI system could support or partially automate this manual counting of instruments. An important element in evaluating the viability of deep learning computer vision-based counting is a suitable large-scale dataset of surgical instruments. Other domains, such as crowd analysis and instance counting, have leveraged synthetic datasets to evaluate and augment different approaches. We present a synthetic dataset (SORT), which is complemented by a smaller real-world dataset of surgical instruments (MSMI), to assess the hypothesis that synthetic training data can improve the performance of multi-class multi-instance counting models when applied to real-world data. In this preliminary study, we provide comparative baselines for various popular counting techniques on synthetic data, such as direct regression, segmentation, localisation, and density estimation. These experiments are repeated at different resolutions - full high-definition (1920 × 1080 pixels), half (960 × 540 pixels), and quarter (480 × 270 pixels) - to measure the robustness of different supervision methods to varying image scales. The results indicate that neither the degree of supervision nor the image resolution during model training impacts performance significantly on the synthetic data. However, when testing on the real-world instrument dataset, the models trained on synthetic data were significantly less accurate. These results indicate a need for further work in either the refinement of the synthetic depictions or fine-tuning on real-world data to achieve similar performance in domain adaptation scenarios compared to training and testing solely on the synthetic data.
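As one reference point among the baselines listed, density estimation recovers a count by integrating each predicted per-class density map; a minimal sketch (hypothetical helper, not the authors' code):

```python
import numpy as np

def counts_from_density(density_maps):
    """Recover instance counts from per-class density maps by summing
    each map -- each map is assumed to integrate to the true count.
    Sketch of the density-estimation counting baseline."""
    # density_maps: (num_classes, H, W)
    return density_maps.reshape(density_maps.shape[0], -1).sum(axis=1)
```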
计数是一种常见的预防措施,以确保手术器械在手术过程中不会被保留,这可能导致严重的有害影响,包括慢性疼痛和败血症。人工智能混合系统可以支持或部分自动化这种手动计数仪器。评估使用深度学习计算机视觉计数的可行性的一个重要因素是合适的大规模手术器械数据集。其他领域,如人群分析和实例计数,已经利用合成数据集来评估和增强不同的方法。我们提出了一个合成数据集(SORT),并辅以一个较小的现实世界的手术器械数据集(MSMI),以评估合成训练数据在应用于现实世界数据时是否能提高多类多实例计数模型的性能。在这项初步研究中,我们为各种流行的合成数据计数技术提供了比较基线,如直接回归、分割、局部化和密度估计。这些实验在不同的分辨率下重复进行——全高清(1080 × 1920像素)、半高清(690 × 540像素)和四分之一高清(480 × 270像素)——以衡量不同监督方法对不同图像尺度的鲁棒性。结果表明,模型训练过程中的监督程度和图像分辨率对合成数据的性能都没有显著影响。然而,当在真实世界的仪器数据集上进行测试时,在合成数据上训练的模型的准确性显着降低。这些结果表明,需要进一步改进合成描述或对现实世界数据进行微调,以便在领域适应场景中实现与仅在合成数据上进行训练和测试相似的性能。
{"title":"Can Synthetic Data Improve Multi-Class Counting of Surgical Instruments?","authors":"","doi":"10.1109/DICTA56598.2022.10034591","DOIUrl":"https://doi.org/10.1109/DICTA56598.2022.10034591","url":null,"abstract":"Counting is a common preventative measure taken to ensure surgical instruments are not retained during surgery, which could cause serious detrimental effects including chronic pain and sepsis. A hybrid human-AI system could support or partially automate this manual counting of instruments. An important element to evaluate the viability of using deep learning computer vision-based counting is a suitable large-scale dataset of surgical instruments. Other domains, such as crowd analysis and instance counting, have leveraged synthetic datasets to evaluate and augment different approaches. We present a synthetic dataset (SORT), which is complemented by a smaller real-world dataset of surgical instruments (MSMI), to assess the hypothesis whether synthetic training data can improve the performance of multiclass multi-instance counting models when applied to real-world data. In this preliminary study, we provide comparative baselines for various popular counting techniques on synthetic data, such as direct regression, segmentation, localisation, and density estimation. These experiments are repeated at different resolutions - full high-definition (1080 × 1920 pixels), half (690 × 540 pixels), and quarter (480 × 270 pixels) - to measure the robustness of different supervision methods to varying image scales. The results indicate that neither the degree of supervision nor the image resolution during model training impact performance significantly on the synthetic data. However, when testing on the real-world instrument dataset, the models trained on synthetic data were significantly less accurate. 
These results indicate a need for further work in either the refinement of the synthetic depictions or fine-tuning on real-world data to achieve similar performance in domain adaptation scenarios compared to training and testing solely on the synthetic data.","PeriodicalId":159377,"journal":{"name":"2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131158049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Unsupervised Deep Learning for Online Foreground Segmentation Exploiting Low-Rank and Sparse Priors
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034581
This paper proposes a simple approach to unsupervised deep learning for foreground object segmentation. Robust principal component analysis (RPCA) can achieve background subtraction by minimizing nuclear and $\ell_{1}$ norms to exploit the prior knowledge about spatio-temporal sparseness and low-rankness of the foreground objects and background scene. With a combination of these norms as a loss function, the proposed method trains a U-Net-based model so as to encode and decode the sparse foreground objects for a batch of input images with a low-rank background. Once the model has learned enough features common to the foreground objects, it has the potential to detect them from any single image regardless of the low-rankness and sparseness. The proposed model performs online object segmentation with much less computational expense than that of RPCA. These advantages over RPCA are demonstrated with background subtraction in video surveillance. It is also shown experimentally that the present method can build up a well-generalized cell nuclei segmentation model from only a few dozen unannotated training images.
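The loss described above combines the two norms of the classic RPCA objective: the nuclear norm of the low-rank background plus a weighted $\ell_1$ norm of the sparse foreground. A minimal numerical sketch (not the authors' training code):

```python
import numpy as np

def rpca_loss(L, S, lam):
    """RPCA-style objective: nuclear norm (sum of singular values) of the
    low-rank background L plus lam times the l1 norm of the sparse
    foreground S. Sketch of the loss the paper builds its training
    objective from, not the authors' implementation."""
    nuclear = np.linalg.svd(L, compute_uv=False).sum()
    sparse = lam * np.abs(S).sum()
    return nuclear + sparse
```

In the paper this combination serves as an unsupervised loss, so no foreground masks are needed during training.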
Citations: 0
An energy-efficient AkidaNet for morphologically similar weeds and crops recognition at the Edge
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034619
A myriad of factors influence modern agriculture, including water scarcity [1], climate change [2], biodiversity loss [3], pollutants [4], etc. Weed invasion is one of the greatest environmental threats to the productivity of agriculture. A threatening invasive species in Western Australia [5], and in many regions of the world [6], is wild radish (Raphanus raphanistrum). It has detrimental impacts on production costs, crop yield (reductions from 10% to 90% [7]), and crop quality due to its capacity to compete with crops for nutrients, light and water [8], [9]. In particular, the influence of wild radish on the quality and yield of canola has always been a concern in weed control [10]. Canola oil is known to be one of the world's healthiest vegetable oils; the crop can be grown in winter or spring and is an environmentally friendly biofuel [11]. Canola production has a market value of AU$2.2 billion and increased significantly to over four million tonnes in Australia from 2012 to 2013 [12]. With more than two million tonnes of canola seed exported by Australia every year, Australia has become the world's second largest exporter of canola [13]. However, the challenge of spraying herbicides only on targeted weeds arises from the high visual similarity between canola and wild radish.
Citations: 0
Kidney Tumor Segmentation and Classification Using Deep Neural Network on CT Images
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034638
Kidney disease is one of many severe chronic diseases that a person can have. Early detection of this disease can be pivotal for proper treatment. Different neural networks have proven useful for disease prediction as modern science progresses. In this paper, we propose a segmentation-based kidney tumor classification method using a Deep Neural Network (DNN). We carried out our work in two steps. First, we segmented kidneys using a manual segmentation technique together with UNet and SegNet. Then, for the classification task, modified MobileNetV2, VGG16 and InceptionV3 models were trained on the segmented kidney data. The CT KIDNEY DATASET: Normal-Cyst-Tumor and Stone dataset (published on Kaggle) was used to train our models. Finally, the classification models MobileNetV2, VGG16 and InceptionV3 achieved 95.29%, 99.48% and 97.38% accuracy on the test set, respectively. We found that the VGG16 model has the best accuracy and the highest sensitivity and specificity. The explainable AI method Grad-CAM has been applied to explain our model's results.
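For reference, the per-class sensitivity and specificity used to compare the three backbones can be derived from a confusion matrix; a minimal sketch (hypothetical helper, not the authors' evaluation code):

```python
import numpy as np

def sensitivity_specificity(cm):
    """Per-class sensitivity (recall) and specificity from a multi-class
    confusion matrix (rows = true class, cols = predicted class).
    One-vs-rest: each class's negatives are all other classes."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp            # missed instances of the class
    fp = cm.sum(axis=0) - tp            # other classes predicted as it
    tn = cm.sum() - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp)
```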
Citations: 7
Multi-Domain Thermal Object Detection Using Generative Adversarial Networks
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034641
Autonomous driving (AD) is undeniably gaining a substantial role in the automotive industry. AD exploits emerging advances in object detection that benefit from improvements in deep learning algorithms, especially convolutional neural networks (CNNs). Scene perception is a crucial task in self-driving vehicles. Scene perception in AD is the ability to extract relevant data from the surroundings. The efficacy of perception largely depends on the sensors and cameras used in capturing the scene. It also depends on the surrounding environmental conditions, which affect the sensors and cameras.
Citations: 0
3D LiDAR Transformer for City-scale Vegetation Segmentation and Biomass Estimation
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034637
3D LiDAR has transformed various urban infrastructure management practices, including urban vegetation detection and monitoring. The accessibility and convenience of use of LiDAR observations in ecological investigations have substantially improved because of advancements in LiDAR hardware systems and data processing techniques. In this paper, we introduce a slot attention-based network for semantic segmentation and biomass estimation of vegetation, which we name the 3D semantic vegetation transformer (3DSVT). Our proposed method first extracts point features using RandLA-Net; these features are then passed to slot attention to extract object-central features for semantic segmentation. Finally, vegetation biomass is computed based on the resultant semantic segmentation. We compare our proposed approach to state-of-the-art 3D point cloud semantic segmentation methods on the SensatUrban and Semantic3D datasets. The experiments show that our proposed method gives promising results and can thus be used to analyse and compute the vegetation biomass of 3D point clouds at large scale.
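The slot-attention pooling step can be sketched as points competing for slots via a softmax, with each slot then updated to the weighted mean of the point features it claims. A stripped-down illustration (no learned projections or GRU update, unlike full slot attention, and not the 3DSVT code):

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def slot_attention_step(slots, feats):
    """One slot-attention update. feats: (N, D) point features,
    slots: (K, D). The softmax is over slots, so points compete for
    slots; each slot becomes the attention-weighted mean of the
    features assigned to it."""
    attn = softmax(feats @ slots.T, axis=1)                    # (N, K)
    weights = attn / (attn.sum(axis=0, keepdims=True) + 1e-8)  # per-slot
    return weights.T @ feats                                   # (K, D)
```

Iterating this update lets each slot settle on one object-central feature, which is the property 3DSVT exploits for segmentation.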
Citations: 0
A Multi-Granularity Feature Fusion Model for Pedestrian Attribute Recognition
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034642
Pedestrian attributes are defined as pedestrian appearance features that can be observed directly, usually including gender, age, clothing, etc. The purpose of pedestrian attribute recognition (PAR) is to perform semantic analysis on a given pedestrian image, which is widely used in person re-identification [1] and human detection [2]. Owing to the influence of factors such as changeable postures, occlusion, uneven lighting and different perspectives, some features with weak semantics in pedestrian images are difficult to learn, which makes classification more difficult.
Citations: 0
Local quality assessment of patient specific synthetic-CT via voxel-wise analysis
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034622
Synthetic-Computed Tomography (sCT) generation is a critical component of Magnetic Resonance Imaging (MRI)-only radiation therapy workflows. The sCT computed from MRI is generally assessed by measuring Hounsfield Units (HU) discrepancies with a reference CT. The aim of this work was to propose a process for the blind assessment of local errors in generated sCTs where a reference CT is unavailable, allowing for safe MRI-only radiation therapy treatment planning. A personalised inter-patient registration method was applied to align a cohort of reference CTs into the same coordinate system. This process resulted in probability maps for each segmented organ, a mean CT image and a standard deviation map. These data were propagated to the anatomical space of each sCT, allowing out-of-distribution intensities to be detected at the voxel level by computing local z-scores. Probability maps of organs were used to weight the resulting z-scores, reducing the bias induced by the registration around structures. Two sCT generation methods were chosen as examples to illustrate this methodology: an atlas-based method (ABM) and a deep-learning approach based on a Generative Adversarial Network (GAN) architecture. 39 patients treated with external beam radiotherapy for prostate cancer, with co-registered CT and MR pairs, were used for sCT generation. 26 of these patients were selected as reference CTs, and the sCTs of the remaining 13 patients were assessed. Accurate inter-individual registration was achieved, with mean Dice scores higher than 0.91 for all organs. The average volume of error represented 0.29% of the image for the ABM, 0.37% for the GAN. The proposed methodology produced 3D volumes which identify significant local sCT errors. Depending on their size and location, these errors could lead to inaccurate tissue density computation during radiation therapy. This work provides an automated QA method aimed at preventing incorrect radiation dose delivery to patients.
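The voxel-wise check described above can be sketched as a z-score of the sCT against the cohort mean and standard-deviation maps, weighted by the organ probability map; a minimal illustration (hypothetical helpers, not the authors' pipeline):

```python
import numpy as np

def weighted_zscore(sct, mean_ct, std_ct, organ_prob, eps=1e-6):
    """Voxel-wise z-score of a synthetic CT against cohort mean/std maps,
    weighted by the organ probability map so voxels where inter-patient
    registration is uncertain contribute less. Sketch of the described
    QA step, not the authors' implementation."""
    z = (sct - mean_ct) / (std_ct + eps)
    return organ_prob * z

def error_volume_fraction(z_map, thresh=3.0):
    """Fraction of voxels flagged as out-of-distribution."""
    return float((np.abs(z_map) > thresh).mean())
```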
Citations: 0
Underwater Object Detection Enhancement via Channel Stabilization
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034594
The complex marine environment exacerbates the challenges of object detection manifold. With the advent of the modern era, marine trash presents a danger to the aquatic ecosystem, and it has always been challenging to address this issue comprehensively. There is therefore a significant need to detect marine deposits precisely and locate them accurately in challenging aquatic surroundings. To mitigate the harm such waste causes to the marine environment, underwater object detection is a crucial tool. Our work explains the image enhancement strategies used and the experiments exploring the best detection obtained after applying these methods. Specifically, we evaluate the performance of Detectron2's backbones using different base models and configurations for the underwater detection task. We first propose a channel stabilization technique on top of a simplified image enhancement model to help reduce haze and colour cast in training images. The proposed procedure shows improved results on the multi-scale objects present in the dataset. After processing the images, we explore various backbones in Detectron2 to obtain the best detection accuracy for these images. In addition, we use a sharpening filter with augmentation techniques, which highlights the profile of an object and helps us recognize it easily. We demonstrate our results by verifying them on the TrashCan dataset, in both its instance and material versions. We then identify the best-performing backbone for this setting and apply our channel stabilization and augmentation methods to it. We also compare our detection results from Detectron2 using the best backbones with those from Deformable Transformer. The detection result for small-size objects in the instance version of TrashCan 1.0 gives us a 9.53 box AP, an absolute gain of 7 over the baseline.
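The abstract names two enhancement steps — channel stabilization to reduce colour cast, and a sharpening filter to emphasise object contours — without spelling out their formulas. A minimal sketch under the assumption of a gray-world style per-channel gain and a box-blur unsharp mask (all names and formulas here are illustrative, not the paper's exact method):

```python
import numpy as np

def stabilize_channels(img):
    """Gray-world style channel stabilization: rescale each colour channel
    so all channels share the global mean, reducing the blue/green cast
    typical of underwater images. Illustrative sketch only."""
    img = img.astype(np.float64)
    channel_means = img.reshape(-1, 3).mean(axis=0)
    gains = channel_means.mean() / np.maximum(channel_means, 1e-6)
    return np.clip(img * gains, 0.0, 255.0)

def unsharp_mask(img, amount=1.0):
    """Simple 3x3 box-blur based unsharp mask: add back the difference
    between the image and its blur to sharpen object profiles."""
    h, w = img.shape[:2]
    pad = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    blur = sum(pad[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    return np.clip(img + amount * (img - blur), 0.0, 255.0)

# Toy image with a strong green cast, as in a typical underwater frame.
img = np.zeros((8, 8, 3))
img[..., 0] = 50.0    # red
img[..., 1] = 200.0   # green-dominated
img[..., 2] = 50.0    # blue
balanced = stabilize_channels(img)   # channel means equalised
sharp = unsharp_mask(balanced)       # contours emphasised (flat here)
```

On this flat toy image the three channel means are equalised by the gains and the unsharp mask leaves the values unchanged; on a real frame the second step would amplify edges around objects.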
{"title":"Underwater Object Detection Enhancement via Channel Stabilization","authors":"","doi":"10.1109/DICTA56598.2022.10034594","DOIUrl":"https://doi.org/10.1109/DICTA56598.2022.10034594","url":null,"abstract":"The complex marine environment exacerbates the challenges of object detection manifold. With the advent of the modern era, marine trash presents a danger to the aquatic ecosystem, and it has always been challenging to address this issue with complete grip. Therefore, there is a significant need to precisely detect marine deposits and locate them accurately in challenging aquatic surroundings. To ensure the safety of the marine environment caused by waste, the deployment of underwater object detection is a crucial tool to mitigate the harm of such waste. Our work explains the image enhancement strategies used and experiments exploring the best detection obtained after applying these methods. Specifically, we evaluate Detectron 2's backbones performance using different base models and configurations for the underwater detection task. We first propose a channel stabilization technique on top of a simplified image enhancement model to help reduce haze and colour cast in training images. The proposed procedure shows improved results on multi-scale size objects present in the data set. After processing the images, we explore various backbones in Detectron2 to give the best detection accuracy for these images. In addition, we use a sharpening filter with augmentation techniques. This highlights the profile of the object which helps us recognize it easily. We demonstrate our results by verifying these on TrashCan Data set, both instance and material version. We then explore the best-performing backbone method for this setting. We apply our channel stabilization and augmentation methods to the best-performing technique. 
We also compare our detection results from Detectron2 using the best backbones with those from Deformable Transformer. The detection result for small-size objects in the instance version of TrashCan 1.0 gives us a 9.53 box AP, an absolute gain of 7 over the baseline.","PeriodicalId":159377,"journal":{"name":"2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125160764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Journal
2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)