
Latest publications — 2018 Digital Image Computing: Techniques and Applications (DICTA)

In Situ Cane Toad Recognition
Pub Date : 2018-12-01 DOI: 10.1109/DICTA.2018.8615780
D. Konovalov, Simindokht Jahangard, L. Schwarzkopf
Cane toads are invasive, toxic to native predators, compete with native insectivores, and have a devastating impact on Australian ecosystems, prompting the Australian government to list toads as a key threatening process under the Environment Protection and Biodiversity Conservation Act 1999. Mechanical cane toad traps could be made more native-fauna friendly if they could distinguish invasive cane toads from native species. Here we designed and trained a Convolutional Neural Network (CNN) starting from the Xception CNN. The XToadGmp toad-recognition CNN we developed was trained end-to-end using heat-map Gaussian targets. After training, XToadGmp required minimal image pre- and post-processing, and when tested on 720×1280 images it achieved 97.1% classification accuracy on 1863 toad and 2892 non-toad test images, none of which were used in training.
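The abstract only outlines the XToadGmp design, but the general idea — an Xception backbone producing a per-location heat map trained against Gaussian targets, reduced to an image-level score by global max pooling — can be sketched as follows. This is a minimal illustration, not the authors' code; the input size, the single sigmoid heat-map layer and the optimizer are assumptions.

```python
# Minimal sketch: Xception backbone + 1-channel heat-map head.
# The heat-map model is trained on Gaussian targets; the scoring model
# reduces the heat map to an image-level probability via global max pooling.
import tensorflow as tf

def build_heatmap_classifier(input_shape=(256, 256, 3)):
    backbone = tf.keras.applications.Xception(
        weights="imagenet", include_top=False, input_shape=input_shape)
    heatmap = tf.keras.layers.Conv2D(
        1, 1, activation="sigmoid", name="toad_heatmap")(backbone.output)
    # Image-level probability = maximum response anywhere in the heat map.
    score = tf.keras.layers.GlobalMaxPooling2D(name="toad_score")(heatmap)

    heatmap_model = tf.keras.Model(backbone.input, heatmap)   # trained on Gaussian targets
    scoring_model = tf.keras.Model(backbone.input, score)     # used at test time
    heatmap_model.compile(optimizer="adam", loss="binary_crossentropy")
    return heatmap_model, scoring_model
```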
Citations: 5
Table Detection in Document Images using Foreground and Background Features
Pub Date : 2018-12-01 DOI: 10.1109/DICTA.2018.8615795
Saman Arif, F. Shafait
Table detection is an important step in many document analysis systems. It is a difficult problem due to the variety of table layouts and encoding techniques and the similarity of tabular regions to non-tabular document elements. Earlier approaches to table detection are based on heuristic rules or require additional PDF metadata. Recently proposed methods based on machine learning have shown good results. This paper demonstrates a performance improvement over these table detection techniques. The proposed solution is based on the observation that tables tend to contain more numeric data, and it therefore applies color coding/coloration as a signal for distinguishing numeric from textual data. A deep-learning-based Faster R-CNN is used to detect tabular regions in document images. To gauge the performance of the proposed solution, the publicly available UNLV dataset is used. Performance measures indicate an improvement over the best in-class strategies.
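As a rough illustration of the detection stage, the sketch below fine-tunes torchvision's Faster R-CNN implementation for a two-class (background/table) problem. It is not the authors' configuration, and the paper's key step — colour-coding numeric versus textual tokens before detection — is assumed to happen as a preprocessing step that is not shown.

```python
# Minimal sketch: fine-tune torchvision's Faster R-CNN for table detection.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def build_table_detector(num_classes=2):  # background + table
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model

# Training-step sketch: `images` is a list of CHW float tensors, `targets` a list
# of dicts with "boxes" (N x 4) and "labels" (N,) for each document image:
#   loss_dict = model(images, targets); loss = sum(loss_dict.values())
```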
Citations: 32
DICTA 2018 Keynotes
Pub Date : 2018-12-01 DOI: 10.1109/dicta.2018.8615756
Citations: 0
DGDI: A Dataset for Detecting Glomeruli on Renal Direct Immunofluorescence
Pub Date : 2018-12-01 DOI: 10.1109/DICTA.2018.8615769
Kun Zhao, Yuliang Tang, Teng Zhang, J. Carvajal, Daniel F. Smith, A. Wiliem, Peter Hobson, A. Jennings, B. Lovell
With the growing popularity of whole slide scanners, there is a high demand for computer-aided diagnostic techniques for this newly digitized pathology data. The ability to extract effective information from digital slides, which serve as fundamental representations of the prognostic data patterns or structures, provides promising opportunities to improve the accuracy of automatic disease diagnosis. Recent advances in computer vision have shown that Convolutional Neural Networks (CNNs) can be used to analyze digitized pathology images, providing more consistent and objective information to pathologists. In this paper, to advance the development of computer-aided diagnosis systems for the renal direct immunofluorescence test, we introduce a new benchmark dataset for Detecting Glomeruli on renal Direct Immunofluorescence (DGDI). To build the baselines, we investigate various CNN-based detectors on DGDI. Experiments demonstrate that DGDI well represents the challenges of renal direct immunofluorescence image analysis and encourages progress in developing new approaches to understanding renal disease.
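A minimal inference sketch of how such CNN-based detectors are typically applied to a slide image is given below; the detector architecture, score threshold and image handling are assumptions rather than the DGDI baselines' exact settings.

```python
# Minimal sketch: run a trained torchvision-style detector over one
# immunofluorescence image and keep glomerulus detections above a threshold.
import torch
from PIL import Image
from torchvision.transforms.functional import to_tensor

def detect_glomeruli(model, image_path, score_threshold=0.5):
    model.eval()
    image = to_tensor(Image.open(image_path).convert("RGB"))
    with torch.no_grad():
        output = model([image])[0]            # list-in, list-of-dicts-out API
    keep = output["scores"] >= score_threshold
    return output["boxes"][keep], output["scores"][keep]
```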
Citations: 5
Left Ventricle Volume Measuring using Echocardiography Sequences
Pub Date : 2018-12-01 DOI: 10.1109/DICTA.2018.8615766
Yi Guo, S. Green, L. Park, Lauren Rispen
Measuring left ventricle (LV) volume is a challenging problem in physiological studies. One non-intrusive method suited to this task is echocardiography. By extracting the left ventricle area from ultrasound images, the volume can be approximated by the size of that area. The core of the problem then becomes identifying the left ventricle in noisy images while taking spatio-temporal information into account. We propose adaptive sparse smoothing for left ventricle segmentation in each frame of an echocardiography video, which provides robustness against the strong speckle noise in ultrasound imagery. We then further adjust the identified left ventricle boundaries (represented as curves in a polar coordinate system) using fixed-rank principal component analysis as a post-processing step. The method is tested on two data sets, in which an expert physiologist labelled the left ventricle area for some frames, and is compared against an active-contour-based method. The experimental results clearly show that the proposed method is more accurate than the competing approach.
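The fixed-rank PCA post-processing and the area-to-volume approximation can be illustrated with a short sketch: stack the per-frame boundary curves (radius as a function of angle in polar coordinates) into a matrix, keep only the top-k principal components, and integrate each smoothed curve to get an area. The rank, the radius parameterisation and the area formula below are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch: rank-k PCA smoothing of boundary curves across frames,
# then a polar-coordinate area estimate per frame.
import numpy as np

def fixed_rank_smooth(radii, k=3):
    """radii: (n_frames, n_angles) array of boundary radii per frame."""
    mean = radii.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(radii - mean, full_matrices=False)
    s[k:] = 0.0                                  # truncate to rank k
    return mean + (u * s) @ vt

def area_from_radii(radii_frame):
    """Approximate enclosed area of one polar boundary curve: 0.5 * integral of r^2 dtheta."""
    theta = np.linspace(0, 2 * np.pi, radii_frame.size, endpoint=False)
    dtheta = theta[1] - theta[0]
    return 0.5 * np.sum(radii_frame ** 2) * dtheta
```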
Citations: 1
Convolutional 3D Attention Network for Video Based Freezing of Gait Recognition
Pub Date : 2018-12-01 DOI: 10.1109/DICTA.2018.8615791
Renfei Sun, Zhiyong Wang, K. E. Martens, S. Lewis
Freezing of gait (FoG) is defined as a brief, episodic absence or marked reduction of forward progression of the feet despite the intention to walk. It is a typical symptom of Parkinson's disease (PD) and has a significant impact on the quality of life of PD patients. Typically, trained experts need to review a patient's gait for clinical diagnosis, which is time-consuming and subjective. Automatic FoG identification from videos provides a promising way to address these issues by formulating FoG identification as a human action recognition task. However, most existing human action recognition algorithms are limited in this task, as FoG is very subtle and can easily be overlooked when interfered with by irrelevant motion. In this paper, we propose a novel action recognition algorithm, namely the convolutional 3D attention network (C3DAN), which addresses this issue by learning an informative region for more effective recognition. The network consists of two main parts: a Spatial Attention Network (SAN) and a 3-dimensional convolutional network (C3D). SAN generates an attention region from coarse to fine, while C3D extracts discriminative features. Our approach is able to localize the attention region without manual annotation and to extract discriminative features in an end-to-end way. We evaluate the proposed C3DAN on a video dataset collected from 45 PD patients in a clinical setting for the quantification of FoG in PD. We obtained a sensitivity of 68.2%, a specificity of 80.8% and an accuracy of 79.3%, outperforming several state-of-the-art human action recognition methods. To the best of our knowledge, this work is one of the first studies to detect FoG from clinical videos.
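A toy version of the two-branch idea (a spatial attention map multiplied into the clip before 3D convolutional feature extraction) is sketched below in PyTorch; the layer sizes and depths are placeholders, not the authors' C3DAN architecture.

```python
# Minimal sketch: spatial attention branch weights each location of the clip,
# then a small 3D CNN pools the attended clip into a classification score.
import torch.nn as nn

class TinySpatialAttention3D(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        # Attention branch: one map per location, applied to every channel.
        self.attention = nn.Sequential(
            nn.Conv3d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(8, 1, kernel_size=1), nn.Sigmoid())
        # 3D convolutional feature extractor (stand-in for C3D).
        self.c3d = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1))
        self.fc = nn.Linear(64, num_classes)

    def forward(self, clip):                     # clip: (batch, 3, frames, H, W)
        attended = clip * self.attention(clip)   # broadcast over the 3 channels
        features = self.c3d(attended).flatten(1)
        return self.fc(features)
```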
Citations: 19
Size-Invariant Attention Accuracy Metric for Image Captioning with High-Resolution Residual Attention
Pub Date : 2018-12-01 DOI: 10.1109/DICTA.2018.8615788
Zongjian Zhang, Qiang Wu, Yang Wang, Fang Chen
Spatial visual attention mechanisms have achieved significant performance improvements for image captioning. To quantitatively evaluate attention mechanisms, the "attention correctness" metric has been proposed, which calculates the sum of the attention weights generated for ground-truth regions. However, this metric cannot consistently measure attention accuracy across element regions with large variance in size. Moreover, its evaluations are inconsistent with captioning performance across different fine-grained attention resolutions. To address these problems, this paper proposes a size-invariant evaluation metric obtained by normalizing the "attention correctness" metric with the size percentage of the attended region. To demonstrate the effectiveness of the size-invariant metric, the paper further proposes a high-resolution residual attention model that uses RefineNet as the Fully Convolutional Network (FCN) encoder. By using the COCO-Stuff dataset, we can achieve pixel-level evaluations on both object and "stuff" regions. We use our metric to evaluate the proposed attention model across four fine-grained resolutions (27×27, 40×40, 60×60, 80×80). The results demonstrate that, compared with the "attention correctness" metric, our size-invariant metric is more consistent with captioning performance and more effective for evaluating attention accuracy.
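The two quantities can be written down compactly. The sketch below computes the original "attention correctness" (attention mass inside the ground-truth region) and one plausible reading of the size-invariant version — dividing by the fraction of the image the region occupies; the paper's exact normalization may differ.

```python
import numpy as np

def attention_correctness(attention, gt_mask):
    """attention: 2-D array of attention weights; gt_mask: same-shape 0/1 mask."""
    attention = attention / attention.sum()
    return float(attention[gt_mask.astype(bool)].sum())

def size_invariant_correctness(attention, gt_mask):
    region_fraction = gt_mask.astype(bool).mean()   # size percentage of the GT region
    return attention_correctness(attention, gt_mask) / region_fraction

# Example: uniform attention over a 4x4 map, ground-truth region covering 2x2 cells.
attn = np.full((4, 4), 1.0 / 16)
mask = np.zeros((4, 4)); mask[:2, :2] = 1
print(attention_correctness(attn, mask), size_invariant_correctness(attn, mask))
# -> 0.25 and 1.0: uniform attention is neither over- nor under-attending the region.
```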
Citations: 0
Image Enhancement for Face Recognition in Adverse Environments
Pub Date : 2018-12-01 DOI: 10.1109/DICTA.2018.8615793
D. Kamenetsky, Sau Yee Yiu, Martyn Hole
Face recognition in adverse environments, such as at long distances or in low light conditions, remains a challenging task for current state-of-the-art face matching algorithms. The facial images taken in these conditions are often low resolution and low quality due to the effects of atmospheric turbulence and/or insufficient amount of light reaching the camera. In this work, we use an atmospheric turbulence mitigation algorithm (MPE) to enhance low resolution RGB videos of faces captured either at long distances or in low light conditions. Due to its interactive nature, MPE is tuned to work well in each specific environment. We also propose three image enhancement techniques that further improve the images produced by MPE: two for low light imagery (MPEf and fMPE) and one for long distance imagery (MPEh). Experimental results show that all three methods significantly improve the image quality and face recognition performance, allowing effective face recognition in almost complete darkness (at close range) or at distances up to 200m (in daylight).
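MPE itself is not described in enough detail here to reproduce. As a generic stand-in for the underlying idea of combining many short-exposure video frames to suppress noise, the sketch below simply averages a temporal window of frames; it is a baseline illustration only, not the MPE, MPEf, fMPE or MPEh methods evaluated in the paper.

```python
# Generic baseline sketch: sliding-window temporal averaging of video frames.
import numpy as np

def temporal_average(frames, window=15):
    """frames: (n, H, W, 3) uint8 video; returns float32 frames averaged over a window."""
    frames = frames.astype(np.float32)
    out = np.empty_like(frames)
    for i in range(len(frames)):
        lo, hi = max(0, i - window // 2), min(len(frames), i + window // 2 + 1)
        out[i] = frames[lo:hi].mean(axis=0)
    return out
```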
Citations: 4
Object Classification using Deep Learning on Extremely Low-Resolution Time-of-Flight Data
Pub Date : 2018-12-01 DOI: 10.1109/DICTA.2018.8615877
Ana Daysi Ruvalcaba-Cardenas, T. Scoleri, Geoffrey Day
This paper proposes two novel deep learning models for 2D and 3D classification of objects in extremely low-resolution time-of-flight imagery. The models have been developed to suit contemporary range-imaging hardware based on a recently fabricated Single Photon Avalanche Diode (SPAD) camera with 64×64 pixel resolution. As this is the first prototype of its kind, only a small data set has been collected so far, which makes training models challenging. To bypass this hurdle, transfer learning is applied to the widely used VGG-16 convolutional neural network (CNN), with supplementary layers added specifically to handle SPAD data. This classifier and the renowned Faster R-CNN detector offer benchmark models for comparison with a newly created 3D CNN operating on time-of-flight data acquired by the SPAD sensor. Another contribution of this work is the proposed shot-noise removal algorithm, which is particularly useful for mitigating the camera's sensitivity in situations of excessive lighting. The models have been tested in both low-light indoor settings and outdoor daytime conditions, on eight objects exhibiting small physical dimensions, low reflectivity and featureless structures, located at ranges from 25 m to 700 m. Despite these adverse factors, the proposed 2D model achieved 95% average precision and recall, with higher accuracy for the 3D model.
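The transfer-learning setup can be sketched as follows: reuse VGG-16's pretrained convolutional features and attach a small classification head for the eight-class SPAD task. The head below and the choice to freeze the backbone are placeholders, not the authors' supplementary layers.

```python
# Minimal sketch: VGG-16 transfer learning with a replaced classification head.
import torch.nn as nn
import torchvision

def build_spad_classifier(num_classes=8, freeze_features=True):
    vgg = torchvision.models.vgg16(weights="DEFAULT")
    if freeze_features:
        for p in vgg.features.parameters():      # keep ImageNet convolutional features
            p.requires_grad = False
    # Replace the ImageNet head with a smaller one for the SPAD object classes.
    vgg.classifier = nn.Sequential(
        nn.Linear(512 * 7 * 7, 256), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(256, num_classes))
    return vgg
```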
Citations: 7
Cluster-Based Crowd Movement Behavior Detection
Pub Date : 2018-12-01 DOI: 10.1109/DICTA.2018.8615809
Meng Yang, Lida Rashidi, A. S. Rao, S. Rajasegarar, Mohadeseh Ganji, M. Palaniswami, C. Leckie
Crowd behaviour monitoring and prediction is an important research topic in video surveillance that has gained increasing attention. In this paper, we propose a novel architecture for crowd event detection, which comprises methods for object detection, clustering of various groups of objects, characterizing the movement patterns of the groups, detecting group events, and finding the change points of group events. In the proposed framework, we use clusters to represent the groups of objects/people present in the scene. We then extract the movement patterns of the various groups of objects over the video sequence. We define several crowd events and propose a methodology to detect the change points of group events over time. We evaluated our scheme using six video sequences from benchmark datasets, which include events such as walking, running, global merging, local merging, global splitting and local splitting. We compared our scheme with state-of-the-art methods and showed the superiority of our method in accurately detecting crowd behavioural changes.
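The grouping step can be illustrated with a short sketch: cluster the detected people positions in a frame and summarise each group by its centroid, which can then be tracked across frames to characterise merging, splitting, walking or running. The clustering algorithm (DBSCAN) and its parameters here are assumptions, not the paper's exact choices.

```python
# Minimal sketch: cluster per-frame person detections into groups.
from sklearn.cluster import DBSCAN

def group_centroids(positions, eps=50.0, min_samples=3):
    """positions: (n_people, 2) array of pixel coordinates detected in one frame."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(positions)
    centroids = {}
    for label in set(labels) - {-1}:             # -1 marks noise / ungrouped people
        centroids[label] = positions[labels == label].mean(axis=0)
    return labels, centroids
```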
Citations: 6