
Latest publications from 2020 Digital Image Computing: Techniques and Applications (DICTA)

A Novel Signature Watermarking Scheme for Identity Protection
Pub Date : 2020-11-29 DOI: 10.1109/DICTA51227.2020.9363396
Sunpreet Sharma, J. Zou, G. Fang
A novel non-blind watermarking technique for identity protection is presented. The proposed watermarking scheme uses the owner's signature as the watermark, through which the ownership and validity of the document can be proven and kept intact. The proposed scheme is robust, imperceptible and faster than other state-of-the-art methods. Experimental simulations and evaluations of the proposed method show excellent results from both objective and subjective viewpoints.
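The abstract does not describe the embedding algorithm itself, so the following is only a minimal Python sketch of what a generic non-blind additive scheme looks like: the signature is scaled and added to the host image, and extraction requires the original host at the detector. The spatial-domain embedding and the scaling factor alpha are illustrative assumptions, not the authors' method.

```python
import numpy as np

def embed_nonblind(host, signature, alpha=0.05):
    """Additively embed a (resized) binary signature into the host image."""
    # Fit the signature to the host size; a real scheme would typically embed
    # in a transform domain (e.g. DWT/DCT) rather than the spatial domain.
    sig = np.resize(signature.astype(float), host.shape)
    return host.astype(float) + alpha * sig

def extract_nonblind(watermarked, host, alpha=0.05):
    """Non-blind extraction: the original host is required at the detector."""
    return (watermarked - host.astype(float)) / alpha

host = np.random.randint(0, 256, (64, 64)).astype(float)
signature = (np.random.rand(64, 64) > 0.5).astype(float)   # stand-in for a scanned signature
wm = embed_nonblind(host, signature)
recovered = extract_nonblind(wm, host)
print(np.allclose(recovered, np.resize(signature, host.shape)))  # True
```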
Citations: 4
Temporal 3D RetinaNet for fish detection
Pub Date : 2020-11-29 DOI: 10.1109/DICTA51227.2020.9363372
Zhou Shen, Chuong H. Nguyen
Automatic detection and tracking of fish provides valuable information for marine life science. Deep convolutional networks have been applied with some success, but performance is affected by challenging imaging conditions, including complex backgrounds, lighting variation and the low visibility of the underwater environment. Existing works, including Fast R-CNN and RetinaNet, rely on single-frame fish detection and suffer from noisy and unreliable detections. In this paper, we propose and examine two 3D deep learning networks that use temporal features to improve fish detection performance. The first, called 3D-backbone RetinaNet, uses a 3D ResNet backbone to capture temporal information and is found to perform worse than 2D RetinaNet. The second, called 3D-subnets RetinaNet, uses a 3D regression subnet and a 3D classification subnet to extract temporal information and is found to perform better than 2D RetinaNet. To validate the performance of these networks, we also created a new fish dataset, which will be made publicly available together with the code of the proposed networks.
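As a rough illustration of the temporal idea (not the authors' architecture), the sketch below contrasts per-frame 2D convolution with a 3D convolution whose kernel also spans an assumed window of T frames; the window length, channel counts and use of PyTorch are all assumptions.

```python
import torch
import torch.nn as nn

T = 5                                     # assumed temporal window (frames)
frames = torch.randn(1, 3, T, 128, 128)   # (batch, channels, time, height, width)

# 2D baseline: each frame is processed independently.
conv2d = nn.Conv2d(3, 16, kernel_size=3, padding=1)
per_frame = torch.stack([conv2d(frames[:, :, t]) for t in range(T)], dim=2)

# 3D variant: the kernel also spans the temporal axis, so neighbouring
# frames contribute to each feature map used for detection.
conv3d = nn.Conv3d(3, 16, kernel_size=(3, 3, 3), padding=1)
temporal = conv3d(frames)

print(per_frame.shape, temporal.shape)    # both torch.Size([1, 16, 5, 128, 128])
```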
Citations: 5
Contour Detection of Multiple Moving Objects in Unconstrained Scenes using Optical Strain
Pub Date : 2020-11-29 DOI: 10.1109/DICTA51227.2020.9363368
Maria Oliver-Parera, Julien Muzeau, P. Ladret, P. Bertolino
Moving Object Detection (MOD) is still an active area of research due to the variety of scenarios it must tackle and the different characteristics that may appear in them. Finding a single method that performs well in all situations is therefore a challenging task. In this paper we address the MOD problem from a physical point of view: given the optical flow between two images, we propose to find its motion boundaries by means of the optical strain, which measures the deformation of a vector field. As optical strain responds to all the motion in a sequence, we work on temporal windows and apply thresholding to them in order to separate noise from real motion. The proposed approach shows competitive results when compared to other methods on known datasets.
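Optical strain is the symmetric part of the optical-flow Jacobian. A minimal NumPy sketch of the strain magnitude used to expose motion boundaries could look like the following, assuming pixel-unit spacing and an illustrative threshold.

```python
import numpy as np

def optical_strain_magnitude(flow):
    """flow: (H, W, 2) dense optical flow (u = horizontal, v = vertical)."""
    u, v = flow[..., 0], flow[..., 1]
    du_dy, du_dx = np.gradient(u)          # derivatives along rows (y) and columns (x)
    dv_dy, dv_dx = np.gradient(v)
    # Symmetric part of the flow Jacobian (infinitesimal strain tensor).
    e_xx, e_yy = du_dx, dv_dy
    e_xy = 0.5 * (du_dy + dv_dx)
    return np.sqrt(e_xx**2 + e_yy**2 + 2.0 * e_xy**2)

flow = np.zeros((100, 100, 2))
flow[40:60, 40:60, 0] = 2.0                # a small patch moving 2 px to the right
strain = optical_strain_magnitude(flow)
boundary = strain > 0.1                    # illustrative threshold: high strain = motion boundary
print(boundary.sum() > 0)                  # strain concentrates on the patch contour
```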
Citations: 0
Evolutionary Attention Network for Medical Image Segmentation
Pub Date : 2020-11-29 DOI: 10.1109/DICTA51227.2020.9363425
T. Hassanzadeh, D. Essam, R. Sarker
Medical image segmentation is an active research topic that aims to locate an organ or possible abnormalities in a medical image. Using a Convolutional Neural Network (CNN) is a successful technique for medical image segmentation. However, developing a CNN is a difficult task, especially when it includes complex structures such as an attention mechanism. A CNN equipped with an attention mechanism is able to focus on a specific part of an image to extract a Region Of Interest (ROI), which can play a significant role in increasing the accuracy of image segmentation. Given the difficulty of developing an attention network, in this paper we introduce a new evolutionary technique to generate an attention network automatically for medical image segmentation. To the best of our knowledge, this is the first attempt to create an attention network using an evolutionary technique. To do this, a new encoding model is introduced to create a network topology, along with its training parameters, to ease the complexity of developing a CNN. A Genetic Algorithm (GA) is then applied to evolve the networks. To show the capability of the proposed technique, we used three publicly available medical segmentation datasets. The obtained results show that the proposed model can generate a network tailored to each dataset, and that the developed networks achieve high performance for medical image segmentation.
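The encoding model is not spelled out in the abstract, so the toy sketch below only illustrates the general GA loop: a genome is a list of hypothetical block descriptions (filters, kernel size, attention gate on or off), and the fitness function is a placeholder for training the decoded network and scoring it on a validation set. All names and choices here are illustrative assumptions.

```python
import random

# Hypothetical genome: each gene describes one encoder block of a segmentation
# network (filters, kernel size, whether an attention gate is attached).
def random_gene():
    return {"filters": random.choice([16, 32, 64, 128]),
            "kernel": random.choice([3, 5]),
            "attention": random.random() < 0.5}

def random_genome(depth=4):
    return [random_gene() for _ in range(depth)]

def mutate(genome, rate=0.2):
    return [random_gene() if random.random() < rate else g for g in genome]

def crossover(a, b):
    cut = random.randint(1, len(a) - 1)
    return a[:cut] + b[cut:]

def fitness(genome):
    # Placeholder: the real method would decode the genome into a network,
    # train it, and return a validation score (e.g. Dice) on the dataset.
    return sum(g["filters"] for g in genome) / 512.0

population = [random_genome() for _ in range(10)]
for generation in range(5):
    population.sort(key=fitness, reverse=True)
    parents = population[:4]
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(len(population) - len(parents))]
    population = parents + children
print(fitness(max(population, key=fitness)))
```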
Citations: 1
WEmbSim: A Simple yet Effective Metric for Image Captioning
Pub Date : 2020-11-29 DOI: 10.1109/DICTA51227.2020.9363392
Naeha Sharif, Lyndon White, Bennamoun, Wei Liu, Syed Afaq Ali Shah
Automatic image caption evaluation is still the subject of intensive research, driven by the need to generate captions that meet adequacy and fluency requirements. Based on our past attempts at developing highly sophisticated learning-based metrics, we have discovered that a simple cosine similarity measure using the Mean of Word Embeddings (MOWE) of captions can achieve surprisingly high performance on unsupervised caption evaluation. This inspires our proposed metric, WEmbSim, which beats complex measures such as SPICE, CIDEr and WMD in system-level correlation with human judgments. Moreover, it also achieves the best accuracy at matching human consensus scores for caption pairs, compared with commonly used unsupervised methods. We therefore believe that WEmbSim sets a new baseline that any more complex metric must beat to be justified.
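Since the metric is explicitly a cosine similarity between MOWE vectors of captions, a small sketch is straightforward; the toy embedding table and the averaging over multiple references below are assumptions standing in for pre-trained word embeddings and the paper's exact pooling.

```python
import numpy as np

# Toy embedding table; in practice MOWE is built from pre-trained word embeddings.
rng = np.random.default_rng(0)
vocab = {w: rng.normal(size=50) for w in
         "a dog runs on the grass brown is running over green".split()}

def mowe(caption):
    """Mean of Word Embeddings for one caption."""
    vectors = [vocab[w] for w in caption.lower().split() if w in vocab]
    return np.mean(vectors, axis=0)

def wembsim(candidate, references):
    """Cosine similarity between the candidate MOWE and the mean reference MOWE
    (pooling over references is an assumed choice)."""
    c = mowe(candidate)
    r = np.mean([mowe(ref) for ref in references], axis=0)
    return float(np.dot(c, r) / (np.linalg.norm(c) * np.linalg.norm(r)))

print(wembsim("a brown dog is running over the green grass",
              ["a dog runs on the grass"]))
```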
Citations: 1
Max-Variance Convolutional Neural Network Model Compression
Pub Date : 2020-11-29 DOI: 10.1109/DICTA51227.2020.9363347
Tanya Boone-Sifuentes, A. Robles-Kelly, A. Nazari
In this paper, we present a method for convolutional neural network model compression based on the removal of filter banks that correspond to unimportant weights. To do this, we start from the relationship between consecutive layers so as to obtain a factor that can be used to assess the degree to which each pair of filters is coupled. This allows us to use the unit-response of the coupling between two layers to remove pathways in the network that are negligible. Moreover, since the back-propagation gradients tend to diminish as the chain rule is applied from the output to the input layer, we maximise the variance of the coupling factors while enforcing a monotonicity constraint that ensures the most relevant pathways are preserved. We show results on widely used networks employing classification and facial expression recognition datasets. In our experiments, our approach delivers a very competitive trade-off between compression rate and performance compared to both the uncompressed models and alternatives in the literature.
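The exact coupling factor and unit-response computation are not given in the abstract; the sketch below only illustrates the general idea of scoring filters by how strongly they are coupled to the next layer and pruning the weakest pathways, using a simple norm-based proxy rather than the paper's factor.

```python
import numpy as np

# Toy proxy for scoring filters by their coupling to the next layer; this is NOT
# the paper's exact coupling factor, just a norm-based stand-in used to illustrate
# pathway removal between consecutive layers.
conv1 = np.random.randn(32, 3, 3, 3)     # layer l:   32 filters over 3 input channels
conv2 = np.random.randn(64, 32, 3, 3)    # layer l+1: 64 filters over the 32 outputs of l

own_strength = np.linalg.norm(conv1.reshape(32, -1), axis=1)             # per-filter kernel norm
outgoing = np.linalg.norm(conv2.transpose(1, 0, 2, 3).reshape(32, -1), axis=1)
coupling = own_strength * outgoing        # weak filters feeding weak connections score low

keep = np.argsort(coupling)[8:]           # drop the 8 least-coupled pathways (25% compression)
pruned_conv1 = conv1[keep]
pruned_conv2 = conv2[:, keep]
print(pruned_conv1.shape, pruned_conv2.shape)   # (24, 3, 3, 3) (64, 24, 3, 3)
```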
Citations: 1
A-DeepPixBis: Attentional Angular Margin for Face Anti-Spoofing
Pub Date : 2020-11-29 DOI: 10.1109/DICTA51227.2020.9363382
M. Hossain, L. Rupty, Koushik Roy, Mohammed Hasan, Shirshajit Sengupta, Nabeel Mohammed
Face Anti-Spoofing (FAS) systems are used to identify malicious spoofing attempts targeting face recognition systems using media such as video replays or printed paper. With the increasing adoption of face recognition technology as a biometric authentication method, FAS techniques are gaining in importance. From a learning perspective, such systems pose a binary classification task. When implemented with neural-network-based solutions, it is common to use the binary cross entropy (BCE) function as the loss to optimize. In this study, we propose a variant of BCE that enforces a margin in angular space and incorporate it in training the DeepPixBis model [1]. In addition, we also present a method to incorporate such a loss for attentive pixel-wise supervision applicable in a fully convolutional setting. Our proposed approach achieves competitive scores in both intra- and inter-dataset testing on multiple benchmark datasets, consistently outperforming vanilla DeepPixBis. Interestingly, on Protocol 4 of OULU-NPU, considered to be the hardest protocol, our proposed method achieves 5.22% ACER, which is only 0.22% higher than the current state of the art, without requiring any expensive neural architecture search.
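The precise angular-margin formulation is not given in the abstract; one plausible way to add a margin in angular space to BCE is to express the logit as the cosine between an embedding and a class direction and penalise genuine samples by an additive angle, as in the hedged sketch below. The margin and scale values, and the single "live" class direction, are illustrative assumptions.

```python
import numpy as np

def angular_margin_bce(embeddings, weight, labels, margin=0.2, scale=10.0):
    """BCE on a cosine logit with an additive angular margin for the genuine class.
    One plausible formulation only; hyperparameters are illustrative."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weight / np.linalg.norm(weight)
    cos = e @ w                                       # cosine similarity per sample
    theta = np.arccos(np.clip(cos, -1 + 1e-7, 1 - 1e-7))
    # Genuine samples (label 1) must beat the decision boundary by `margin` radians.
    logits = scale * np.where(labels == 1, np.cos(theta + margin), cos)
    probs = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-7
    return float(-np.mean(labels * np.log(probs + eps) +
                          (1 - labels) * np.log(1 - probs + eps)))

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 128))            # per-image (or per-pixel) embeddings
w = rng.normal(size=128)                   # binary "live" class direction
labels = np.array([1, 1, 1, 1, 0, 0, 0, 0])
print(angular_margin_bce(emb, w, labels))
```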
Citations: 13
Generalised Zero-shot Learning with Multi-modal Embedding Spaces
Pub Date : 2020-11-29 DOI: 10.1109/DICTA51227.2020.9363405
Rafael Felix, M. Sasdelli, Ben Harwood, G. Carneiro
Generalised zero-shot learning (GZSL) methods aim to classify previously seen and unseen visual classes by leveraging the semantic information of those classes. In the context of GZSL, semantic information is non-visual data such as a text description of the seen and unseen classes. Previous GZSL methods have explored transformations between visual and semantic spaces, as well as the learning of a joint latent visual and semantic space. In these methods, even though learning has explored a combination of spaces (i.e., visual, semantic or joint latent space), inference has tended to focus on using just one of the spaces. Hypothesising that inference should explore all three spaces, we propose a new GZSL method based on a multi-modal classification over the visual, semantic and joint latent spaces. Another issue affecting current GZSL methods is the intrinsic bias toward the classification of seen classes, a problem that is usually mitigated by a domain classifier which modulates seen and unseen classification. Our proposed approach replaces the modulated classification with a computationally simpler multi-domain classification based on averaging the multi-modal calibrated classifiers from the seen and unseen domains. Experiments on GZSL benchmarks show that our proposed approach achieves competitive results compared with the state of the art.
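A toy sketch of the multi-modal inference step follows, with random logits standing in for the calibrated classifiers learned in the visual, semantic and joint latent spaces; the classifier internals, calibration and simple averaging are assumptions used only to illustrate combining the three spaces at inference time.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
num_classes, batch = 10, 4                 # seen + unseen classes scored together

# Stand-ins for the per-space classifiers (visual, semantic, joint latent); in the
# method each would be a trained, calibrated model applied to the test image.
logits_visual = rng.normal(size=(batch, num_classes))
logits_semantic = rng.normal(size=(batch, num_classes))
logits_latent = rng.normal(size=(batch, num_classes))

# Multi-modal inference: average the class posteriors over the three spaces.
posterior = (softmax(logits_visual) + softmax(logits_semantic) + softmax(logits_latent)) / 3.0
prediction = posterior.argmax(axis=1)
print(prediction)
```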
Citations: 0
Synthetic Data for the Analysis of Archival Documents: Handwriting Determination
Pub Date : 2020-11-29 DOI: 10.1109/DICTA51227.2020.9363410
Christian Bartz, Laurenz Seidel, Duy-Hung Nguyen, Joseph Bethge, Haojin Yang, C. Meinel
Archives contain a wealth of information and are invaluable for historical research. Thanks to digitization, many archives are preserved in a digital format, making it easier to share and access documents from an archive. Handwriting and handwritten notes are common in archives and contain a lot of information that cannot be extracted by analyzing documents with Optical Character Recognition (OCR) for printed text. In this paper, we present an approach for determining whether a scanned document contains handwriting. As a preprocessing step, this approach can help to identify documents that need further analysis with a full recognition pipeline. Our method consists of a deep neural network that classifies whether a document contains handwriting. It is designed to overcome the most significant challenge when working with archival data: the scarcity of annotated training data. To overcome this problem, we introduce a data generation method that allows us to successfully train the proposed deep neural network. Our experiments show that our model, trained on synthetic data, achieves promising results on a real-world dataset from an art-historical archive.
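The paper's generation pipeline is not detailed in the abstract; the toy sketch below only illustrates the underlying idea of synthesising labelled patches (printed text with or without an overlaid "handwritten" stroke) on which a binary classifier could then be trained. All drawing choices here are hypothetical.

```python
import random
from PIL import Image, ImageDraw, ImageFont

def synthetic_patch(size=(128, 64)):
    """Generate one labelled patch: printed text, optionally with a fake handwritten stroke."""
    img = Image.new("L", size, color=255)
    draw = ImageDraw.Draw(img)
    draw.text((5, 5), "Lorem ipsum dolor", fill=0, font=ImageFont.load_default())
    has_handwriting = random.random() < 0.5
    if has_handwriting:
        # Fake "handwriting": a jittery polyline crossing the patch.
        points = [(x, random.randint(30, 55)) for x in range(10, size[0] - 10, 8)]
        draw.line(points, fill=0, width=2)
    return img, int(has_handwriting)

dataset = [synthetic_patch() for _ in range(1000)]   # (image, label) pairs for a binary classifier
print(sum(label for _, label in dataset))
```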
Citations: 1
Object Based Remote Sensing Using Sentinel Data
Pub Date : 2020-11-29 DOI: 10.1109/DICTA51227.2020.9363427
C. McLaughlin, A. Woodley, S. Geva, Timothy Chappell, W. Kelly, W. Boles, Lance De Vine, Holly Hutson
Identifying changes on the Earth's surface is one of the most fundamental aspects of Earth observation from satellite images. Historically, the predominant form of analysis has measured change at the pixel level. Here, we present a new strategy that conducts the analysis at the object level. Per-object features are fed into a random forest regressor. We have tested our approach in Queensland, Australia, using Sentinel data. We find that the object-based approach either outperforms or is comparable to alternative approaches.
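A minimal scikit-learn sketch of the object-based setup follows; the per-object features (band statistics at two dates) and the change target are hypothetical stand-ins for whatever the authors actually extract from the Sentinel imagery.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_objects = 500

# Hypothetical per-object features: mean band values for two dates
# (the paper's actual features and change target may differ).
features_t1 = rng.normal(size=(n_objects, 10))
features_t2 = features_t1 + rng.normal(scale=0.1, size=(n_objects, 10))
X = np.hstack([features_t1, features_t2])
y = np.abs(features_t2 - features_t1).mean(axis=1)     # stand-in change magnitude per object

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X[:400], y[:400])
print(model.score(X[400:], y[400:]))                   # R^2 on held-out objects
```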
Citations: 2