
Latest publications: 2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)

UCL: Unsupervised Curriculum Learning for Utility Pole Classification from Aerial Imagery
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034610
This paper introduces a machine learning-based approach for detecting electric poles, an essential part of power grid maintenance. With the increasing popularity of deep learning, several such approaches have been proposed for electric pole detection. However, most of these approaches are supervised, requiring a large amount of labeled data, which is time-consuming and labor-intensive to collect. Unsupervised deep learning approaches have the potential to remove the need for huge amounts of training data. This paper presents an unsupervised deep learning framework for utility pole classification. The framework combines a Convolutional Neural Network (CNN) and a clustering algorithm with a selection operation: the CNN extracts meaningful features from aerial imagery, the clustering algorithm generates pseudo-labels for the resulting features, and the selection operation filters out reliable samples with which to further fine-tune the CNN. The fine-tuned model then replaces the initial CNN, improving the framework, and this process is repeated iteratively so that the model progressively learns the prominent patterns in the data. The presented framework is trained and tested on a small dataset of utility poles provided by “Mention Fuvex” (a Spanish company utilizing long-range drones for power line inspection). Our extensive experimentation demonstrates the progressive learning behavior of the proposed method and yields promising classification scores on the utility pole dataset.
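The cluster-then-select loop described above can be sketched in a few lines. This is a hypothetical, minimal illustration only (a plain-Python k-means over toy 2-D "features" and a nearest-to-centroid selection rule); the paper's actual CNN features, clustering algorithm, and reliability criterion are not specified in this abstract.

```python
def kmeans(points, k=2, iters=10):
    """Minimal k-means: returns (centroids, pseudo-labels for each sample)."""
    # naive deterministic init: first and last samples (fine for this toy demo)
    centroids = [points[0], points[-1]]
    labels = [0] * len(points)
    for _ in range(iters):
        # cluster assignment plays the role of pseudo-label generation
        for i, p in enumerate(points):
            labels[i] = min(range(k),
                            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
        # recompute centroids from the current assignments
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, labels

def select_reliable(points, centroids, labels, keep_frac=0.5):
    """Selection operation: keep the samples closest to their assigned centroid."""
    dist = [sum((a - b) ** 2 for a, b in zip(p, centroids[l]))
            for p, l in zip(points, labels)]
    order = sorted(range(len(points)), key=lambda i: dist[i])
    return order[: max(1, int(len(points) * keep_frac))]

# toy "features": two well-separated blobs standing in for pole / non-pole crops
feats = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1), (5.0, 5.1), (5.1, 5.0), (4.9, 5.2)]
cents, labs = kmeans(feats)
reliable = select_reliable(feats, cents, labs, keep_frac=0.5)
```

In the full framework, the `reliable` subset would be used to fine-tune the feature extractor, after which features are re-extracted and the loop repeats.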
Citations: 0
Transformer with enhanced encoder and monotonic decoder for Automatic Speech recognition
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034576
Automatic speech recognition (ASR) systems map input speech signals to the corresponding output text. An ASR system mainly consists of two stages: encoding the speech signal into an intermediate feature representation, and decoding that representation to obtain the corresponding characters. It is therefore essential to extract features and temporal dependencies from the speech signal effectively. It is also necessary to implement a decoding strategy that can adequately leverage the monotonic property of speech transcription. In this paper, we propose speech transcription with a Transformer, aiming to enhance encoder effectiveness by combining the strength of convolutional neural networks with self-attention. In addition, we investigate possibilities for incorporating monotonicity on the decoder side. Experimental results show the effectiveness of our proposals for ASR.
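The monotonic property mentioned above can be illustrated with a toy greedy hard-alignment rule: each output step may only attend to encoder frames at or after the previously attended frame. This is a hypothetical sketch, not the paper's actual decoder mechanism, which the abstract does not detail.

```python
def monotonic_attend(energies):
    """Greedy monotonic alignment: for each output step, attend to the
    highest-energy encoder frame at or after the previously attended one."""
    path, prev = [], 0
    for row in energies:
        # restrict the search to frames >= prev (the monotonicity constraint)
        j = max(range(prev, len(row)), key=lambda k: row[k])
        path.append(j)
        prev = j
    return path

# toy attention energies: 3 output steps over 4 encoder frames
E = [
    [0.1, 0.9, 0.0, 0.0],
    [0.95, 0.2, 0.7, 0.1],  # global argmax is frame 0, but going back is forbidden
    [0.0, 0.1, 0.2, 0.9],
]
path = monotonic_attend(E)  # → [1, 2, 3]
```

Note how step 2 picks frame 2 even though frame 0 has higher energy: the alignment can never move backwards, matching the left-to-right nature of speech transcription.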
Citations: 0
SARFish: Space-Based Maritime Surveillance Using Complex Synthetic Aperture Radar Imagery
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034640
T. Cao, Connor Luckett, Jerome Williams, T. Cooke, Ben Yip, Arvind Rajagopalan, S. Wong
Maritime Surveillance (MS) involves the detection and classification of maritime vessels using a range of imaging modalities, from radio frequencies such as radar imaging to visible frequencies such as electro-optic (EO) imaging. Within the radar imaging category, Synthetic Aperture Radar (SAR) imagery plays an essential role in MS, since SAR can operate day and night in most weather conditions while providing the required spatial resolution [1]. An important task of MS is to monitor illegal, unreported, and unregulated (IUU) fishing activities, which have damaged natural ecosystems at an estimated cost of billions of dollars to fisheries industries and governments worldwide [2]. Space-borne SAR imagery is especially suitable for monitoring IUU fishing activities since it provides worldwide sensing and large image coverage, on the order of hundreds of kilometers per image scene.
Citations: 0
GAN-Uplift: 2D to 3D Uplift with Generative Adversarial Networks
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034635
Human pose estimation and prediction have many applications, from autonomous vehicles to video game development, animation, and security. In many instances humans are recorded on video in two dimensions, and this two-dimensional representation must be uplifted into three dimensions before it can be fully utilised. This paper proposes lifting two-dimensional skeleton representations of human movement into three-dimensional skeleton representations over a sequence of human action: 2D to 3D uplift. The proposed approach builds on the work in HP-GAN [1], utilising a generative adversarial network (GAN) with a recurrent neural network encoder-decoder as the generator and a multilayer fully connected neural network as the critic. A novel element of the approach adds random noise drawn from a normal distribution to the z dimension of each joint and uses a custom loss function combining joint position in 3D space and bone length. The proposed algorithm, GAN-Uplift, successfully uplifts 2D motion sequences into their respective 3D motion sequences with a sequence mean joint accuracy of 30.9 mm, outperforming several state-of-the-art methods and coming within 0.4 mm of the best state-of-the-art models on the Human3.6M skeleton dataset. In addition, whereas state-of-the-art methods uplift a single pose from a sequence of pose inputs, GAN-Uplift uplifts a sequence of human poses rather than a single pose.
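The custom loss combining joint position and bone length can be sketched as follows. This is a minimal illustrative version under assumed conventions (joints as (x, y, z) tuples, bones as parent-child index pairs, a hypothetical weight `w_bone`); the paper's exact formulation is not given in this abstract.

```python
import math

def pose_loss(pred, target, bones, w_bone=0.5):
    """Sketch of a joint-position + bone-length loss.
    pred/target: lists of (x, y, z) joints; bones: (parent, child) index pairs."""
    def dist(a, b):
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    # mean Euclidean error over joints
    joint_err = sum(dist(p, t) for p, t in zip(pred, target)) / len(pred)
    # mean absolute difference in bone lengths (penalises skeleton distortion)
    bone_err = sum(abs(dist(pred[i], pred[j]) - dist(target[i], target[j]))
                   for i, j in bones) / len(bones)
    return joint_err + w_bone * bone_err

target = [(0, 0, 0), (0, 1, 0), (0, 2, 0)]   # a 3-joint "limb"
pred = [(1, 0, 0), (1, 1, 0), (1, 2, 0)]     # rigid shift: bone lengths preserved
bones = [(0, 1), (1, 2)]
loss = pose_loss(pred, target, bones)        # joint error 1.0, bone error 0.0
```

The bone-length term is what keeps an uplifted skeleton anatomically plausible even when individual joints are slightly mislocated.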
Citations: 0
Dual Image QR Codes: The Best of Both Worlds
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034633
Due to the high adoption rate of QR codes across the world, researchers have attempted to improve classical QR codes either by making their appearance more meaningful to human perception or by increasing their capacity to store more messages. In this work, we propose dual image QR codes that aim to improve both aspects while remaining scannable by standard QR code readers. We improve the appearance of the QR code using the halftone QR principle and increase its capacity with the lenticular imaging technique. To test the robustness of the proposed QR code, we evaluated six important parameters and searched for appropriate conditions across 24,000 combinations. From the experiments, we found 3,714 appropriate conditions that achieved a 100% successful scanning rate. Lastly, we also list example use cases for the proposed dual image QR codes in real-world situations.
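The halftone QR principle referenced above is commonly described as subdividing each QR module into subpixels, keeping a data-carrying subpixel so standard readers still decode, while the rest approximate the embedded image's tone. The sketch below is a schematic illustration of that general idea (with invented conventions: 1 = white pixel, a simple tone threshold, a 3x3 subdivision), not the authors' actual rendering pipeline.

```python
def halftone_module(bit, tone, sub=3):
    """Render one QR module as a sub x sub pixel block.
    bit: QR data bit (1 = dark module); tone: embedded image tone, 0=black..1=white.
    The centre subpixel keeps the data bit; the rest follow the image tone."""
    pixel = 1 if tone > 0.5 else 0            # crude threshold "halftoning"
    block = [[pixel for _ in range(sub)] for _ in range(sub)]
    block[sub // 2][sub // 2] = 0 if bit else 1   # dark data bit -> black centre
    return block

dark_on_light = halftone_module(1, 0.9)   # black centre in a white block
light_on_dark = halftone_module(0, 0.2)   # white centre in a black block
```

A real halftone QR would dither the surrounding subpixels against the source image rather than thresholding a single tone value, but the reader-compatibility trick is the same: the sampling point each decoder checks stays faithful to the data.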
Citations: 0
A Robust Approach for Small-Scale Object Detection From Aerial-View
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034601
In computer vision, object detection is an important task that has seen significant progress, but object detection in aerial images remains a challenging task for researchers. Small target size, low resolution, occlusion, attitude, and scale variations are major concerns with aerial images that prevent many state-of-the-art object detectors from performing well. In our experimentation, we modified CenterNet and compared results achieved using nine different CNN-based backbones, i.e., ResNet18, ResNet34, ResNet50, ResNet101, ResNet152, Res2Net50, Res2Net101, DLA34, and Hourglass104, finding promising results using a variant of CenterNet with Hourglass104 as the backbone. We used three challenging datasets to validate our approach: VisDrone, Stanford, and AU-AIR. Using the standard mAP, we achieved validation results of 91.62, 75.62, and 34.85 on the AU-AIR, Stanford, and VisDrone datasets respectively. We also compared the mAP achieved at IoU@0.5 and IoU@0.75 across the different backbones. Our approach achieves promising results compared to those reported in recent research.
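The IoU@0.5 and IoU@0.75 thresholds mentioned above both rest on the standard intersection-over-union overlap measure, which can be computed directly:

```python
def iou(a, b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2),
    the overlap measure behind the IoU@0.5 / IoU@0.75 mAP thresholds."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# this detection counts as a true positive at IoU@0.5 but not at IoU@0.75:
v = iou((0, 0, 10, 10), (0, 0, 10, 6))  # inter=60, union=100 → 0.6
```

The stricter 0.75 threshold is particularly punishing for the small aerial targets discussed here, since a localisation error of a few pixels can drop the overlap below the cutoff.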
Citations: 0
Disentangling Convolutional Neural Network towards an explainable Vehicle Classifier
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034615
Vehicle category classification is an integral part of intelligent transportation systems (ITS). In this context, vision-based approaches are of increasing interest due to recent progress in camera hardware and machine learning algorithms. Currently, an end-to-end approach based on Convolutional Neural Networks (CNNs) is the state of the art for vision-based classification. However, the inherent black-box nature of CNNs and the difficulty of modifying existing categories or adding new ones currently limit their application in ITS. Here, we present an alternative classification approach that partially removes these limitations. It consists of three parts: 1) a CNN-based detector for semantically strong vehicle parts provides the basis for 2) a feature construction step, followed by 3) the final classification based on a decision tree. Ultimately this approach will allow the training-intensive part detector to be kept fixed once a sufficiently large set of vehicle parts has been trained; existing categories can be modified and new ones added by changing only the feature construction and classification steps. We illustrate the effectiveness of this approach by extending the vehicle classifier from 11 to 16 categories through the addition of an “articulate” feature. In addition, the vehicle parts provide clear interpretability, and the conceptually simple feature construction and decision tree classifier make the approach explainable. Nevertheless, the part-based classifier achieves accuracy comparable to an end-to-end CNN model trained on all 16 classes.
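The explainability argument above boils down to classification rules over human-readable part features. The toy tree below illustrates the idea; the part names, thresholds, and categories are invented for illustration, and the paper's actual learned tree and feature set are not given in this abstract.

```python
def classify(parts):
    """Toy hand-written decision tree over detected vehicle-part counts.
    Every decision is inspectable: each branch names a part and a threshold."""
    if parts.get("articulation_point", 0) > 0:   # the "articulate" feature idea
        return "articulated truck"
    if parts.get("wheel", 0) <= 2:
        return "motorcycle"
    if parts.get("wheel", 0) >= 6:
        return "heavy truck"
    return "car"

# each input is the output of the (fixed) part detector, not raw pixels
examples = [
    {"wheel": 4},
    {"wheel": 2},
    {"wheel": 6, "articulation_point": 1},
]
labels = [classify(p) for p in examples]
```

Because the tree operates on part counts rather than pixels, adding a category only requires a new branch (or retraining the tree), while the expensive part detector stays untouched.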
Citations: 0
A Region Adaptive Motion Estimation Strategy Leveraging on the Edge Position Difference Measure: Anonymous ICME submission
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034565
To capture motion homogeneity between successive frames, edge position difference (EPD) measure-based motion modeling (EPD-MM) has shown good motion compensation capabilities. The EPD-MM technique is underpinned by the fact that, from one frame to the next, edges map to edges, and such a mapping can be captured by an appropriate motion model. However, the EPD-MM approach may produce an inferior-quality motion model in regions of the current frame where moving edges are few in number. For such regions, traditional pixel intensity difference (PID) measure-based motion modeling (PID-MM) may yield superior motion compensation. Therefore, in this paper, the current frame is first partitioned into two regions (an edge-dominant region and an edge-sparse region) based on the frequency of moving edge pixels. This segmentation is carried out over the EPD image, since it carries the distance of every pixel from its nearest edge. For motion modeling, the EPD-MM technique is then adopted in the edge-dominant region, and the PID-MM approach is chosen for the rest of the current frame. Experimental results show a prediction PSNR improvement of 1.90 dB from the proposed approach over the baseline EPD-MM approach, which does not differentiate between edge-dominant and edge-sparse regions. Moreover, if this predicted frame is employed as an additional reference frame to encode current frames, bit rate savings of up to 7.84% are achievable over an HEVC reference codec.
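The region partition step can be sketched as a per-block decision on moving-edge density. The block size and threshold below are hypothetical placeholders; the paper's actual partitioning parameters are not stated in this abstract.

```python
def partition_blocks(edge_map, block=4, thresh=0.25):
    """Split a binary moving-edge map into block x block tiles and label each
    'EPD-MM' (edge-dominant) or 'PID-MM' (edge-sparse) by edge-pixel frequency."""
    h, w = len(edge_map), len(edge_map[0])
    labels = {}
    for by in range(0, h, block):
        for bx in range(0, w, block):
            tile = [edge_map[y][x]
                    for y in range(by, min(by + block, h))
                    for x in range(bx, min(bx + block, w))]
            density = sum(tile) / len(tile)   # fraction of moving-edge pixels
            labels[(by, bx)] = "EPD-MM" if density >= thresh else "PID-MM"
    return labels

# 4x8 toy frame: left half dense with moving edges, right half empty
em = [[1, 1, 0, 1, 0, 0, 0, 0],
      [1, 0, 1, 1, 0, 0, 0, 0],
      [0, 1, 1, 0, 0, 0, 0, 0],
      [1, 1, 0, 1, 0, 0, 0, 0]]
region = partition_blocks(em)
```

Each tile then gets the motion model suited to it: the edge-based model where edges are plentiful enough to constrain it, and the intensity-based model elsewhere.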
Citations: 1
FaceCook: Attribute-Controllable Face Generation Based on Linear Scaling Factors
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034574
With the excellent disentanglement properties of state-of-the-art generative models, image editing has been the dominant approach to controlling the attributes of synthesized face images. However, these edited results often suffer from artifacts or incorrect feature rendering, especially when there is a large discrepancy between the image to be edited and the desired feature set. We therefore propose a new approach that maps the latent vectors of the generative model to scaling factors by solving a set of multivariate linear equations. The coefficients of the equations are the eigenvectors of the weight parameters of the pre-trained model, which form the basis of a hyper coordinate system. Both qualitative and quantitative results show that the proposed method outperforms the baseline in terms of image diversity. In addition, the method is much more time-efficient, since synthesized images with the desired features can be obtained directly from the latent vectors rather than through the former process of editing randomly generated images in redundant steps.
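The core computational step described above is solving a linear system whose coefficient matrix is built from eigenvectors. A generic solver sketch follows; the 2-D matrix and right-hand side are invented stand-ins (real eigenvectors of a pre-trained model's weights would be high-dimensional), so this shows only the mechanics, not the paper's pipeline.

```python
def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting: solve A x = b.
    Here A's rows stand in for eigenvectors of the pre-trained weights and
    x for the attribute scaling factors."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivot
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c and M[r][c]:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * p for a, p in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

# hypothetical 2-D example: 2x + y = 5 and x + 3y = 10
A = [[2.0, 1.0], [1.0, 3.0]]
x = solve(A, [5.0, 10.0])   # → [1.0, 3.0]
```

Solving the system once per latent vector is cheap compared with the iterative edit-and-regenerate loop the paper replaces, which is where the claimed time efficiency comes from.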
Citations: 0
Convolution-mix-Transformer Generator model to synthesize PET images from CT scans
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034560
T. Thanh
PET/CT is a type of medical imaging that has been shown to be useful for both disease diagnosis and treatment monitoring. However, PET/CT imaging systems are still not widely used in hospitals because they require the injection of radioactive material and PET scanners remain scarce, whereas CT scanners are widely available in health-care facilities. In this paper, we introduce a new Convolution-mix-Transformer Generator network for translating CT images to PET images. The method has been tested on PET/CT images of 791 patients; the CT images are trained over their full intensity range rather than the narrow windows imaging systems often apply to make specific parts of the body easier to see. We generate PET images and analyze SUV values based on the CT images. We evaluate the results with five common measures: peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), mean structural similarity index measure (MSSIM), mean absolute error (MAE), and Fréchet inception distance (FID), on which our results are significantly better than those of other models. Our model successfully converts CT images to PET images through a trained image-to-image translation approach. The generated PET images maintain both the regional and global features of medical imaging.
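Two of the five evaluation measures named in the abstract, PSNR and MAE, can be computed directly; the sketch below is illustrative only (the images are synthetic stand-ins, assumed to be float arrays in [0, 1], and this is not the authors' evaluation code).

```python
import numpy as np

def mae(ref: np.ndarray, gen: np.ndarray) -> float:
    """Mean absolute error between reference and generated image; lower is better."""
    return float(np.mean(np.abs(ref - gen)))

def psnr(ref: np.ndarray, gen: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; higher is better."""
    mse = float(np.mean((ref - gen) ** 2))
    return float("inf") if mse == 0 else 10.0 * np.log10(peak**2 / mse)

# Synthetic stand-ins for a reference PET slice and a generated one.
rng = np.random.default_rng(0)
ref = rng.random((64, 64))
gen = np.clip(ref + rng.normal(0.0, 0.05, ref.shape), 0.0, 1.0)

print(f"MAE  = {mae(ref, gen):.4f}")
print(f"PSNR = {psnr(ref, gen):.2f} dB")
```

SSIM, MSSIM, and FID involve windowed statistics and a pre-trained Inception network, so in practice they are taken from libraries such as scikit-image rather than reimplemented.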
Citations: 0