
Latest publications: 2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)

A Lightweight Image Decolorization Approach based on Contrast Preservation
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034612
Nowadays, color-to-gray conversion, known as decolorization, is widely used not only in image processing but also in machine learning and deep learning-based applications, owing to its lower complexity. In the conventional decolorization process, fixed conversion coefficients are generally used for entire images. Using fixed coefficients, namely weighting parameters, for entire images may degrade the quality of the original color images. On the other hand, unfixed conversion coefficients, which depend on the content of the images, yield better decolorization performance than fixed coefficients. The critical points in the decolorization process are to preserve spatial information, such as contrast, and to keep computational complexity low. In this study, a very fast decolorization method is proposed without sacrificing contrast preservation. The proposed method exploits a gradient-based correlation similarity approach, in which global and local contrast information is considered for a total of 66 distinct weighting coefficients. On the CADIK dataset, the COLOR250 dataset, and high-resolution images, these weighting coefficients are pruned using the frequency with which each coefficient performs best. Experimental results show that the proposed method can be used in real time without compromising decolorization performance.
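The 66 distinct weighting coefficients are plausibly the triplets (wr, wg, wb) with wr + wg + wb = 1 on a 0.1 grid, of which there are exactly 66. A minimal NumPy sketch of choosing among such fixed candidates by a gradient-correlation contrast score (the helper names and the particular score are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def candidate_weights(step=0.1):
    # Enumerate all (wr, wg, wb) with wr + wg + wb = 1 on a 0.1 grid;
    # this yields exactly 66 distinct triplets.
    n = round(1 / step)
    ws = []
    for i in range(n + 1):
        for j in range(n - i + 1):
            k = n - i - j
            ws.append((i * step, j * step, k * step))
    return ws

def contrast_score(gray, color):
    # Correlation between the grayscale gradient magnitude and a simple
    # color-contrast proxy (gradient of the channel-mean image).
    def grad_mag(img):
        gy, gx = np.gradient(img)
        return np.hypot(gx, gy)
    g = grad_mag(gray).ravel()
    c = grad_mag(color.mean(axis=2)).ravel()
    g = g - g.mean()
    c = c - c.mean()
    denom = np.linalg.norm(g) * np.linalg.norm(c)
    return float(g @ c / denom) if denom > 0 else 0.0

def decolorize(color):
    # Exhaustive (unpruned) search: pick the weight triplet whose
    # grayscale best preserves contrast under the score above.
    best_w, best_s, best_g = None, -np.inf, None
    for w in candidate_weights():
        gray = color @ np.array(w)
        s = contrast_score(gray, color)
        if s > best_s:
            best_w, best_s, best_g = w, s, gray
    return best_g, best_w
```

The sketch performs the unpruned exhaustive search over all 66 candidates; the paper's speed-up comes from pruning this set by how often each triplet wins.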
Citations: 0
Identification of suspicious naevi in dermoscopy images by learning their appearance
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034588
Naevi are benign skin lesions that appear on the skin surface as small brown, tan, or pink spots. They are important to monitor, as a high number of naevi is the strongest known phenotypic risk factor for melanoma, and about a third of melanomas are thought to derive directly from a precursor naevus. The main aim of this research is to model dermoscopic naevus appearance using machine learning and to characterise naevi by classifying them as suspicious or non-suspicious. To extract the prominent appearance features of naevi, principal component analysis (PCA) and a convolutional autoencoder (CAE) were implemented for automated feature extraction. These features were then used to classify naevi with random forest (RF) and artificial neural network (ANN) classifiers. Using the features extracted by the CAE, the ANN achieved high average accuracy, specificity, sensitivity, precision, and AUC of 95.62%, 91.24%, 100%, 91.95%, and 95.6%, respectively. In addition, with RF, both the PCA- and CAE-based methods had an overall accuracy of 88.46%. Moreover, RF was used to rank the features, which helped in selecting those most useful for naevus classification. If validated clinically, machine learning (ML) approaches might be an efficient guide in the early detection of melanoma by identifying the suspicious naevi clinicians need to assess carefully.
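As a rough illustration of the PCA branch of the pipeline, the sketch below extracts PCA features with plain NumPy and uses a nearest-centroid rule as a lightweight stand-in for the RF/ANN classifiers (the stand-in classifier and all names are assumptions for illustration, not the paper's models):

```python
import numpy as np

def pca_features(X, k):
    # Project flattened lesion images onto the top-k principal
    # components (rows of Vt from the SVD of the centred data).
    mu = X.mean(axis=0)
    Xc = X - mu
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T, mu, Vt[:k]

def nearest_centroid_fit(F, y):
    # One centroid per class ("suspicious" vs "non-suspicious")
    # in PCA feature space.
    classes = np.unique(y)
    return classes, np.stack([F[y == c].mean(axis=0) for c in classes])

def nearest_centroid_predict(F, classes, centroids):
    # Assign each sample to the class of its nearest centroid.
    d = np.linalg.norm(F[:, None, :] - centroids[None, :, :], axis=2)
    return classes[d.argmin(axis=1)]
```

In practice the paper feeds these features to RF and ANN classifiers; the nearest-centroid rule just keeps the sketch self-contained.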
Citations: 2
Unified Framework for Effective Knowledge Distillation in Single-stage Object Detectors
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034590
Knowledge distillation for object detection has seen far less activity in the literature than for classification tasks. This is primarily due to the complexity of object detection, which involves localisation as well as classification. In this paper, we propose a three-pronged distillation framework which includes: (a) homogenised logit-matching; (b) hint learning; and (c) soft masking. Prediction matching is a commonly used response-based distillation technique, which suffers from knowledge loss because detected objects are filtered through non-maximal suppression. To circumvent this problem, we draw inspiration from the idea of hint learning and propose output logit-matching, where the teacher and student output feature maps are matched directly, without filtering any of the teacher's detections as in prediction matching. We demonstrate that by transferring the raw knowledge of the high-performing teacher, we reduce the knowledge loss and thereby improve the student's performance. We then perform an ablation study to understand whether early-, middle-, or late-stage hint learning is most beneficial. Finally, we propose an alternative imitation masking technique called "soft" masking that uses a 2D Gaussian to mask regions of interest on a feature map. As opposed to vanilla "hard" imitation masking, we show that this method satisfies the philosophy of using softened labels for effective knowledge distillation.
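A minimal sketch of the "soft" masking idea, assuming a 2D Gaussian centred on each object is used to weight an L2 feature-imitation loss (the exact loss form and all function names are assumptions, not the paper's implementation):

```python
import numpy as np

def soft_mask(h, w, center, sigma):
    # "Soft" imitation mask: a 2D Gaussian centred on an object,
    # instead of a binary (hard) box mask.
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = center
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

def masked_distill_loss(student_fm, teacher_fm, centers, sigma=4.0):
    # Weight the squared teacher-student feature difference by the
    # union (elementwise max) of the per-object Gaussian masks, so
    # imitation softly focuses on regions of interest.
    h, w = teacher_fm.shape[-2:]
    mask = np.zeros((h, w))
    for c in centers:
        mask = np.maximum(mask, soft_mask(h, w, c, sigma))
    diff = (student_fm - teacher_fm) ** 2
    return float((mask * diff).sum() / (mask.sum() + 1e-8))
```

The mask peaks at 1 on each object centre and decays smoothly, which is what distinguishes it from a hard 0/1 box mask.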
Citations: 0
Understanding the Effect of Smartphone Cameras on Estimating Munsell Soil Color from Imagery
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034624
Many theories have been developed over the years in attempts to better explain how colours work and how best to calculate and describe colour differences. In 1913, Albert Munsell introduced the Atlas of the Munsell Color System, arranging colour along the tristimulus dimensions of hue, value, and chroma. The Munsell colour system has enabled professionals to bridge the disciplines of art and science and is the basis of many professional applications today, such as food science [1], dentistry [2], printing [3], painting [4], and soil science [5].
Citations: 0
A Filtering Method for SIFT based Palm Vein Recognition
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034589
A key issue with palm vein images is that slight movements of the fingers and thumb, or changes in hand pose, can stretch the skin in different areas and alter the vein patterns. This can produce palm vein images with an infinite number of variations for a given subject. This paper presents a novel filtering method for SIFT-based feature matching, referred to as the Median Distance (MMD) Filter, which examines the differences between keypoint coordinates, calculates the mean and the median in each direction, and uses a set of rules to determine the correct matches. Our experiments conducted on the 850 nm subset of the CASIA dataset show that the MMD filter can detect and filter false positives that were not detected by other filtering methods. Comparison against existing SIFT-based palm vein recognition systems demonstrates that the proposed MMD filter produces excellent performance with lower EER values.
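The mean/median filtering idea can be sketched as follows, assuming correct matches on a roughly rigid palm share a consistent displacement between the two images; the specific thresholding rules below are illustrative, not the paper's exact rule set:

```python
import numpy as np

def mmd_filter(src_pts, dst_pts, tol=5.0):
    # For each putative SIFT match, compute the keypoint-coordinate
    # displacement (dx, dy).  Correct matches cluster around a common
    # displacement, so matches whose displacement strays far from the
    # per-direction median (and mean) are rejected as false positives.
    d = dst_pts - src_pts                       # (n, 2) displacements
    med = np.median(d, axis=0)
    mean = d.mean(axis=0)
    keep = (np.abs(d - med).max(axis=1) <= tol) & \
           (np.abs(d - mean).max(axis=1) <= 2 * tol)
    return keep                                 # boolean mask of inliers
```

The median test does most of the work (it is robust to outliers); the looser mean test is a secondary sanity check.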
Citations: 0
Explainable Deep Learning for Medical Imaging Models Through Class Specific Semantic Dictionaries
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034639
Explainability is important in the design and deployment of neural networks. It allows engineers to design better models and can give end-users an improved understanding of the outputs. However, many explainability methods are unsuited to the domain of medical imaging. Saliency mapping methods only describe which regions of an input image contributed to the output, but do not explain the important visual features within those regions. Feature visualization methods have not yet proven useful in medical imaging because the visual complexity of the images generally results in uninterpretable features. In this work, we propose a novel explainability technique called "Class Specific Semantic Dictionaries". It extends saliency mapping and feature visualisation methods to enable the analysis of neural network decision-making in the context of medical image diagnosis. By utilising gradient information from the fully connected layers, our approach gives insight into the channels the network deems important for the diagnosis of each particular disease. The important channels for a class are contextualised by showing highly activating examples from the training data, providing an understanding of the learned features through example. The explainability techniques are combined into a single user interface (UI) to streamline the evaluation of neural networks. To demonstrate how our new method overcomes the explainability challenges of medical imaging models, we analyse COVID-Net, an open-source convolutional neural network for diagnosing COVID-19 from chest x-rays. We present evidence that, despite achieving 96.3% accuracy on the test data, COVID-Net uses confounding variables not indicative of underlying disease to discriminate between COVID-positive and COVID-negative patients and may not generalise well to new data.
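For a network that ends in global average pooling followed by a single fully connected layer, the gradient of a class logit with respect to a channel's pooled activation is simply the corresponding FC weight, so channels can be ranked by gradient times activation. A sketch under that architectural assumption (not COVID-Net's exact head; all names are illustrative):

```python
import numpy as np

def channel_importance(fm, fc_weights, class_idx):
    # fm: (channels, H, W) feature map from the last conv block.
    # fc_weights: (classes, channels) weights of the final FC layer.
    # With global average pooling, d(logit_c)/d(pooled_k) = W[c, k],
    # so gradient x activation reduces to W[c, k] * pooled_k.
    pooled = fm.mean(axis=(1, 2))        # (channels,)
    grads = fc_weights[class_idx]        # (channels,)
    scores = grads * pooled
    order = np.argsort(scores)[::-1]     # channels, most important first
    return order, scores
```

The ranked channels would then be contextualised by retrieving the training images that activate each one most strongly, as the abstract describes.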
Citations: 0
Semantic multi-modal reprojection for robust visual question answering
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034598
Wei Luo
Despite recent progress in the development of vision-language models for accurate visual question answering (VQA), the robustness of these models is still quite limited in the presence of out-of-distribution datasets that include unanswerable questions. In our work, we first construct a randomized VQA dataset with unanswerable questions to test the robustness of a state-of-the-art VQA model. The dataset combines the visual inputs with randomized questions from the VQA v2 dataset to test the sensitivity of the model's predictions. We establish that even on unanswerable questions that are not relevant to the visual clues, a state-of-the-art VQA model either fails to predict the "unknown" answer or gives an inaccurate answer with a high softmax score. To alleviate this issue without retraining the large backbone models, we propose a technique called Cross Modal Augmentation (CMA), a multi-modal semantic augmentation applied at test time only, which reprojects the visual and textual inputs into multiple copies while maintaining semantic information. These multiple instances, with similar semantics, are then fed to the same model and the predictions are combined to achieve a more robust output. We demonstrate that this model-agnostic technique enables the VQA model to provide more robust answers in scenarios that may include unanswerable questions.
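The combine-then-abstain logic of such test-time augmentation can be sketched with placeholder `model` and `augment` callables (CMA's actual reprojection operates on vision-language features; this sketch only illustrates averaging the predictions and falling back to "unknown" when the ensemble is unconfident):

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cma_predict(model, image, question, augment, n=8, tau=0.5):
    # Reproject the (image, question) pair into n semantically similar
    # copies, average the softmax outputs, and abstain when the
    # averaged distribution is diffuse (as for irrelevant questions).
    probs = np.mean([softmax(model(*augment(image, question)))
                     for _ in range(n)], axis=0)
    pred = int(probs.argmax())
    return pred if probs.max() >= tau else -1   # -1 means "unknown"
```

With an identity `augment`, this degenerates to thresholded single-model prediction; the robustness gain comes from the augmented copies disagreeing on unanswerable inputs.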
Citations: 1
RPPG Detection in Children with Autism Spectrum Disorder during Robot-child Interaction Studies
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034613
Children with ASD (autism spectrum disorder) have difficulties expressing their emotions and interacting socially with their environment. In recent studies, assistive robots have been used to support children's social skills, and emotion recognition is performed to improve the quality of the interaction between the robot and the child. In this study, rPPG (remote photoplethysmography) signals were extracted from face images captured by camera during interaction between the robot and children with ASD. These signals were then compared with physiological data obtained from an Empatica E4 wristband. The results were evaluated, and the factors that might affect them were discussed. Although the correlation between these data was low, the results showed some advantages over both the E4 wristband and emotion recognition from the face. Unlike the signals obtained from the E4, rPPG signals can still be acquired while the child moves. In addition, even when no emotion can be detected from the child's face, rPPG signals can be obtained. The aim of the study is to use rPPG signals for emotion recognition as an alternative method, since other emotion recognition modalities face challenges during robot-child interaction with children with ASD.
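One common rPPG baseline, which the sketch below assumes in place of the paper's unspecified extraction pipeline, is to track the mean green-channel intensity of the face region over time and read the heart rate off the dominant frequency in the plausible pulse band:

```python
import numpy as np

def estimate_bpm(green_trace, fs):
    # Minimal rPPG sketch: remove the DC component of the mean
    # green-channel trace, keep the 0.7-4 Hz band (42-240 BPM, a
    # conventional pulse range), and report the dominant frequency
    # as the heart rate in beats per minute.
    x = np.asarray(green_trace, dtype=float)
    x = x - x.mean()
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    band = (freqs >= 0.7) & (freqs <= 4.0)
    return 60.0 * freqs[band][spec[band].argmax()]
```

Real rPPG pipelines add face tracking, detrending, and chrominance-based projections (e.g. CHROM or POS) to survive the motion this abstract highlights; the sketch covers only the spectral step.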
Citations: 1
Stereo Saliency Detection by Modeling Concatenation Cost Volume Feature
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034604
RGB-D image pair based salient object detection models aim to localize the salient objects in an RGB image with extra depth information about the scene provided to guide the detection process. The conventional practice for this task involves explicitly using depth as input to achieve multi-modal learning. In this paper, we observe two main issues within existing RGB-D saliency detection frameworks. Firstly, we claim that it is better to define depth as extra prior information instead of as a part of the input for RGB-D saliency detection, as we can directly perform saliency detection based only on the appearance information from the RGB image, while we cannot perform saliency detection given only the depth data. Secondly, there exists a huge domain gap in terms of the source of depth between different benchmark testing datasets, e.g. depth from Kinect and stereo cameras. In this paper, we focus on the variant of stereo image pair based saliency detection, where the depth is “implicitly” encoded in the stereo image pair for effective RGB-D saliency detection. Experimental results illustrate the effectiveness of our solution.
{"title":"Stereo Saliency Detection by Modeling Concatenation Cost Volume Feature","authors":"","doi":"10.1109/DICTA56598.2022.10034604","DOIUrl":"https://doi.org/10.1109/DICTA56598.2022.10034604","url":null,"abstract":"RGB-D image pair based salient object detection models aim to localize the salient objects in an RGB image with extra depth information about the scene provided to guide the detection process. The conventional practice for this task involves explicitly using depth as input to achieve multi-modal learning. In this paper, we observe two main issues within existing RGB-D saliency detection frameworks. Firstly, we claim that it is better to define depth as extra prior information instead of as a part of the input for RGB-D saliency detection, as we can directly perform saliency detection based only on the appearance information from the RGB image, while we cannot perform saliency detection given only the depth data. Secondly, there exists a huge domain gap in terms of the source of depth between different benchmark testing datasets, e.g. depth from Kinect and stereo cameras. In this paper, we focus on the variant of stereo image pair based saliency detection, where the depth is “implicitly” encoded in the stereo image pair for effective RGB-D saliency detection. 
Experimental results illustrate the effectiveness of our solution.","PeriodicalId":159377,"journal":{"name":"2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128185631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
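A concatenation cost volume, as named in this paper's title, stacks left-view features with disparity-shifted right-view features at every candidate disparity, so a downstream network can reason about stereo correspondence and hence implicit depth. The numpy sketch below shows the standard construction; the feature shapes, shift convention, and zero-padding are generic illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def concat_cost_volume(feat_l, feat_r, max_disp):
    """Build a concatenation cost volume from stereo feature maps.

    feat_l, feat_r: (C, H, W) feature maps from the left/right view.
    Returns a (max_disp, 2*C, H, W) volume: at disparity d, left features
    are concatenated with right features shifted d pixels to the right.
    """
    c, h, w = feat_l.shape
    volume = np.zeros((max_disp, 2 * c, h, w), dtype=feat_l.dtype)
    for d in range(max_disp):
        volume[d, :c] = feat_l                      # left features at every d
        if d == 0:
            volume[d, c:] = feat_r
        else:
            # columns with no valid correspondence stay zero-padded
            volume[d, c:, :, d:] = feat_r[:, :, :-d]
    return volume

rng = np.random.default_rng(0)
feat_l = rng.random((8, 4, 6), dtype=np.float32)
feat_r = rng.random((8, 4, 6), dtype=np.float32)
vol = concat_cost_volume(feat_l, feat_r, max_disp=3)
print(vol.shape)  # prints (3, 16, 4, 6)
```

Because the two feature sets are concatenated rather than correlated, the network sees raw appearance from both views at each disparity, which is what lets depth stay "implicitly" encoded instead of being an explicit input.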
Learning Deeply Domain-invariant Features for Action Recognition Around the Clock
Pub Date : 2022-11-30 DOI: 10.1109/DICTA56598.2022.10034580
Due to the numerous potential applications in visual surveillance and nighttime driving, recognizing human action in low-light conditions remains a difficult problem in computer vision. Existing methods separate action recognition and dark enhancement into two distinct steps to accomplish this task. However, isolating recognition from enhancement impedes end-to-end learning of the space-time representation for video action classification. This paper presents a domain adaptation-based action recognition approach that uses adversarial learning in cross-domain settings to learn cross-domain action recognition. The model is trained with supervised learning on a large amount of labelled data from the source domain (daytime action sequences), while deep domain-invariant features enable unsupervised learning on large amounts of unlabelled data from the target domain (nighttime action sequences). The resulting augmented model, named 3D-DiNet, can be trained using standard backpropagation with an additional layer. It achieves SOTA performance on the InFAR and XD145 action datasets.
{"title":"Learning Deeply Domain-invariant Features for Action Recognition Around the Clock","authors":"","doi":"10.1109/DICTA56598.2022.10034580","DOIUrl":"https://doi.org/10.1109/DICTA56598.2022.10034580","url":null,"abstract":"Due to the numerous potential applications in visual surveillance and nighttime driving, recognizing human action in low-light conditions remains a difficult problem in computer vision. Existing methods separate action recognition and dark enhancement into two distinct steps to accomplish this task. However, Isolating the recognition and enhancement impedes end-to-end learning of the space-time representation for video action classification. This paper presents a domain adaptation-based action recognition approach that uses adversarial learning in cross-domain settings to learn cross-domain action recognition. Supervised learning can train it on a large amount of labelled data from the source domain (daytime action sequences). However, it uses deep domain invariant features to perform unsupervised learning on many unlabelled data from the target domain (nighttime action sequences). The resulting augmented model, named 3D-DiNet can be trained using standard backpropagation with an additional layer. It achieves SOTA performance on InFAR and XD145 actions datasets.","PeriodicalId":159377,"journal":{"name":"2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128286434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
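Adversarial domain adaptation of the kind this abstract describes typically hinges on a gradient reversal layer: the layer is an identity in the forward pass, but the domain classifier's gradient is negated (and scaled) before flowing back into the shared backbone, so the backbone learns features the domain classifier cannot separate. The numpy sketch below illustrates only that mechanism; the toy shapes, the scaling factor, and the linear feature extractor are assumptions for illustration, not the paper's 3D-DiNet architecture.

```python
import numpy as np

def grad_reverse_backward(grad_from_domain_head, lam=1.0):
    """Gradient reversal layer: identity forward, -lam * grad backward."""
    return -lam * grad_from_domain_head

# Toy shared feature extractor: features = W @ x.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
x = rng.standard_normal(3)
features = W @ x          # forward pass is unchanged by the reversal layer

# Suppose the domain classifier sends this gradient back to the features.
g_domain = np.array([0.2, -0.1, 0.4, 0.0])
g_features = grad_reverse_backward(g_domain, lam=0.5)  # negated, scaled copy

# The backbone update therefore *ascends* the domain loss (confusing the
# domain classifier), while any task-label gradient is added as usual.
dW = np.outer(g_features, x)  # chain rule through features = W @ x
print(g_features)
```

This single sign flip is the "additional layer" idea that lets such models train with standard backpropagation: the domain classifier minimizes its loss while the shared features are pushed the opposite way.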