
Latest publications: 2018 Digital Image Computing: Techniques and Applications (DICTA)

Accurate Shift Estimation under One-Parameter Geometric Distortion using the Brosc Filter
Pub Date : 2018-12-01 DOI: 10.1109/DICTA.2018.8615835
P. Fletcher, Matthew R. Arnison, Eric W. Chong
Shift estimation is the task of estimating an unknown translation factor which best relates two relatively distorted representations of the same image data. Where distortion is large and also includes rotation and scaling, estimates of the global distortion can be obtained with good accuracy using RST-matching methods, but such algorithms are slow and complicated. Where geometric distortion is small, correlation-based methods can achieve millipixel accuracy. These methods begin to fail, however, when even quite small geometric distortions are present, such as rotation by 1° or 2°, or a scaling by as little as 5%. A new spatially-variant filter, the brosc filter ("better rotation or scaling"), can be used to preserve the accuracy of correlation-based shift estimation where the expected distortion can be modelled as a single parameter, for example, as a pure rotation, a pure scaling, or a pure scaling along a known axis. By applying the brosc filter before shift estimation, shift accuracy under geometric distortion is improved, and a variant of the brosc filter using complex arithmetic provides in addition an estimate of the single parameter representing the unknown distortion.
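The brosc filter itself is not publicly documented, but the correlation-based shift estimation it is designed to protect can be illustrated with standard phase correlation. Below is a minimal NumPy sketch, assuming two same-sized grayscale images; it returns only the integer-pixel peak, without the sub-pixel refinement or the brosc pre-filtering described in the paper.

```python
import numpy as np

def estimate_shift(img_a, img_b):
    """Integer-pixel shift estimate between two same-sized grayscale images
    via phase correlation (peak of the normalised cross-power spectrum)."""
    F_a = np.fft.fft2(img_a)
    F_b = np.fft.fft2(img_b)
    cross_power = F_a * np.conj(F_b)
    cross_power /= np.abs(cross_power) + 1e-12      # keep phase only
    corr = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint wrap around to negative shifts.
    dy, dx = [p if p <= s // 2 else p - s for p, s in zip(peak, corr.shape)]
    return dy, dx
```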
Citations: 0
Descriptor-Driven Keypoint Detection
Pub Date : 2018-12-01 DOI: 10.1109/DICTA.2018.8615841
A. Sluzek
A methodology is proposed (and illustrated on exemplary cases) for detecting keypoints in such a way that usability of those keypoints in image matching tasks can be potentially maximized. Following the approach used for MSER detection, we localize keypoints at image patches for which the selected keypoint descriptor is maximally stable under fluctuations of the parameter(s) (e.g. image threshold, scale, shift, etc.) determining how configurations of those patches evolve. In this way, keypoint descriptors are used in the scenarios where descriptors' volatility due to minor image distortions is minimized and, thus, performances of keypoint matching are prospectively maximized. Experimental verification on selected types of keypoint descriptors fully confirmed this hypothesis. Additionally, a novel concept of semi-dense feature representation of images (based on the proposed methodology) has been preliminarily discussed and illustrated (and its prospective links with deep learning and tracking applications highlighted).
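As a rough illustration of the MSER-style stability criterion described above, the sketch below sweeps a binarisation threshold, computes a descriptor of a fixed patch at each setting, and scores stability as the largest pairwise descriptor distance; patches whose descriptors move least would be retained as keypoints. The `patch_descriptor` here is a simple intensity histogram, a hypothetical stand-in for the real keypoint descriptor, and the threshold values are illustrative.

```python
import numpy as np

def patch_descriptor(patch):
    # Hypothetical stand-in descriptor: a normalised 16-bin intensity histogram.
    hist, _ = np.histogram(patch, bins=16, range=(0, 255))
    return hist / max(hist.sum(), 1)

def descriptor_stability(patch, thresholds=(96, 112, 128, 144, 160)):
    """Lower score = descriptor changes less as the binarisation threshold varies."""
    descs = [patch_descriptor((patch > t) * 255) for t in thresholds]
    return max(np.linalg.norm(a - b) for a in descs for b in descs)
```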
Citations: 0
Deep Learning Models for Facial Expression Recognition
Pub Date : 2018-12-01 DOI: 10.1109/DICTA.2018.8615843
Atul Sajjanhar, Zhaoqi Wu, Q. Wen
We investigate facial expression recognition using state-of-the-art classification models. Recently, CNNs have been extensively used for face recognition; however, they have not been thoroughly evaluated for facial expression recognition. In this paper, we train and test a CNN model for facial expression recognition. The performance of this CNN model is used as a benchmark for evaluating other pre-trained deep CNN models. We evaluate the performance of Inception and VGG, which are pre-trained for object recognition, and compare these with VGG-Face, which is pre-trained for face recognition. All experiments are performed on publicly available face databases, namely CK+, JAFFE and FACES.
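A rough illustration of the pre-trained-CNN comparison: the Keras sketch below freezes a VGG16 backbone and adds a new softmax head for expression classes. ImageNet weights are used here as a stand-in, since VGG-Face weights do not ship with Keras, and the 7-class output and 224x224 input are assumptions.

```python
import tensorflow as tf

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False                                 # reuse frozen pre-trained features
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(7, activation="softmax"),    # assumed 7 expression classes (e.g. CK+)
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=10)     # expects face crops resized to 224x224
```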
Citations: 25
Early Experience of Depth Estimation on Intricate Objects using Generative Adversarial Networks
Pub Date : 2018-12-01 DOI: 10.1109/DICTA.2018.8615783
Wai Y. K. San, Teng Zhang, Shaokang Chen, A. Wiliem, Dario Stefanelli, B. Lovell
Object parts within a scene observed by the human eye exhibit their own unique depth. Producing a single image with an accurate depth of field has many applications, namely: virtual and augmented reality, mobile robotics, digital photography and medical imaging. In this work, we aim to exploit the effectiveness of conditional Generative Adversarial Networks (GAN) to improve depth estimation from a single, inexpensive monocular camera sensor. The complexity of object shapes, textures and environmental conditions makes depth estimation challenging. Our approach is evaluated on a novel depth-map dataset, which we release publicly, containing challenging photo-depth image pairs. Standard evaluation metrics against other depth-map estimation techniques demonstrate the effectiveness of our approach. The effectiveness of the GAN on different test data is demonstrated both qualitatively and quantitatively.
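In the usual conditional-GAN formulation, a generator G maps an RGB image x (and noise z) to a depth map while a discriminator D judges image-depth pairs (x, y). A sketch of that objective is below; the L1 reconstruction term weighted by λ is an assumption borrowed from common image-to-image GAN practice, as the abstract does not state the exact loss used.

$$\min_G \max_D \; \mathbb{E}_{x,y}\big[\log D(x,y)\big] + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x,z))\big)\big] + \lambda\, \mathbb{E}_{x,y,z}\big[\lVert y - G(x,z)\rVert_1\big]$$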
Citations: 0
A General Approach to Segmentation in CT Grayscale Images using Variable Neighborhood Search
Pub Date : 2018-12-01 DOI: 10.1109/DICTA.2018.8615823
T. Siriapisith, Worapan Kusakunniran, P. Haddawy
Medical image segmentation is essential for several tasks including pre-treatment planning and tumor monitoring. Computed tomography (CT) is the most useful imaging modality for abdominal organs and tumors, with benefits of high imaging resolution and few motion artifacts. Unfortunately, CT images contain only limited information of intensity and gradient, which makes accurate segmentation a challenge. In this paper, we propose a 2D segmentation method that applies the concept of variable neighborhood search (VNS) by iteratively alternating search through intensity and gradient spaces. By alternating between the two search spaces, the technique can escape local minima that occur when segmenting in a single search space. The main techniques used in the proposed framework are graph-cut with probability density function (GCPDF) and graph-cut based active contour (GCBAC). The presented method is quantitatively evaluated on a public clinical dataset, which includes various sizes of liver tumor, kidney and spleen. The segmentation performance is evaluated using dice similarity coefficient (DSC), Jaccard similarity coefficient (JSC), and volume difference (VD). The presented method achieves the outstanding segmentation performance with a DSC of 84.48±5.84%, 76.93±8.24%, 91.70±2.68% and 89.27±5.21%, for large liver tumor, small liver tumor, kidney and spleen, respectively.
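The core of the method is the alternation between the two search spaces. The sketch below shows only that alternation loop; `graphcut_intensity` and `graphcut_gradient` are hypothetical callables standing in for the paper's GCPDF and GCBAC refinement steps, each mapping an (image, mask) pair to an updated mask.

```python
def vns_segment(image, init_mask, graphcut_intensity, graphcut_gradient, max_iters=10):
    """Variable-neighborhood-search style alternation between two search spaces.
    graphcut_intensity / graphcut_gradient are hypothetical placeholders for the
    GCPDF and GCBAC steps; masks are assumed to be NumPy boolean arrays."""
    mask = init_mask
    for _ in range(max_iters):
        prev = mask
        mask = graphcut_intensity(image, mask)   # refine in the intensity space
        mask = graphcut_gradient(image, mask)    # escape local minima via the gradient space
        if (mask == prev).all():                 # converged: no change after a full cycle
            break
    return mask
```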
Citations: 1
Binarization of Color Character Strings in Scene Images using Deep Neural Network
Pub Date : 2018-12-01 DOI: 10.1109/DICTA.2018.8615837
Wenjiao Bian, T. Wakahara, Tao Wu, He Tang, Jirui Lin
This paper addresses the problem of binarizing multicolored character strings in scene images with complex backgrounds and heavy image degradation. The proposed method consists of three steps. The first step is combinatorial generation of binarized images via every dichotomization of the K clusters obtained by K-means clustering of the input image's pixels in the HSI color space. The second step is classification of each binarized image, using a deep neural network, into two categories: character string and non-character string. The final step is selection of the single binarized image with the highest character-string score as the optimal binarization result. Experimental results on the ICDAR 2003 robust word recognition dataset show that the proposed method achieves a correct binarization rate of 87.4%, which is highly competitive with the state of the art in binarization of scene character strings.
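A minimal sketch of the first step, assuming per-pixel HSI values are already available as an (N, 3) array: cluster the pixels with K-means and enumerate every non-empty proper subset of clusters as a candidate foreground (both polarities of each dichotomy, i.e. 2^K − 2 candidate masks). The CNN string/non-string classifier of the second step is not shown.

```python
from itertools import combinations
import numpy as np
from sklearn.cluster import KMeans

def candidate_binarizations(pixels_hsi, image_shape, k=4):
    """pixels_hsi: (N, 3) HSI values of the image's pixels; yields one boolean
    mask per non-empty proper subset of the K colour clusters."""
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(pixels_hsi)
    labels = labels.reshape(image_shape)
    for r in range(1, k):
        for fg in combinations(range(k), r):
            yield np.isin(labels, fg)            # foreground = chosen subset of clusters
```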
Citations: 1
Colour Analysis of Strawberries on a Real Time Production Line
Pub Date : 2018-12-01 DOI: 10.1109/DICTA.2018.8615779
Gilbert Eaton, Andrew Busch, Rudi Bartels, Yongsheng Gao
A novel system has been designed in which colour analysis algorithms grade the ripeness of packed strawberries on a fast-paced production line. The strawberry quality system acquires images at a rate of 2 punnets/s and feeds them to the two algorithms. Using the CIELAB and HSV colour spaces, both underripe and overripe colour features are analysed, yielding F1 scores of 94.7% and 90.6% respectively when measured on multiple-defect samples. The single-defect class results scored 80.1% and 77.1%. The algorithm's total time for the current hardware configuration is 121 ms maximum and 80 ms average, well below the required time window of 500 ms. 105,542 punnets have been assessed by the algorithm, which has rejected 4,952 in total (4.9%), helping to ensure the quality of the product shipped to customers and avoiding costly returns.
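The paper's exact colour thresholds are not given; the sketch below only illustrates the kind of per-punnet measurement involved, converting a BGR frame to HSV with OpenCV and measuring the fraction of pixels in an assumed green/white "unripe" hue band. The bounds are illustrative assumptions, not the values used by the system.

```python
import cv2
import numpy as np

def unripe_fraction(bgr_image):
    """Fraction of pixels falling in an assumed green/white 'unripe' band.
    The HSV bounds are illustrative, not the paper's thresholds."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    lower = np.array([25, 40, 40])       # OpenCV hue runs 0-179; ~25-90 covers yellow-green
    upper = np.array([90, 255, 255])
    mask = cv2.inRange(hsv, lower, upper)
    return float(np.count_nonzero(mask)) / mask.size
```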
Citations: 1
RGB-D Fall Detection via Deep Residual Convolutional LSTM Networks
Pub Date : 2018-12-01 DOI: 10.1109/DICTA.2018.8615759
A. Abobakr, M. Hossny, Hala Abdelkader, S. Nahavandi
The development of smart healthcare environments has witnessed impressive advancements exploiting recent technological capabilities. Since falls are a major health concern, especially among older adults, low-cost fall detection systems have become an indispensable component of these environments. This paper proposes an integrable, privacy-preserving and efficient fall detection system based on depth images acquired with a Kinect RGB-D sensor. The proposed system uses an end-to-end deep learning architecture composed of convolutional and recurrent neural networks to detect fall events. The deep convolutional network (ConvNet) analyses the human body and extracts visual features from input sequence frames. Fall events are detected by modeling the complex temporal dependencies between subsequent frame features using Long Short-Term Memory (LSTM) recurrent neural networks. Both models are combined and jointly trained in an end-to-end ConvLSTM architecture. This allows the model to learn visual representations and the complex temporal dynamics of fall motions simultaneously. The proposed method has been validated on the public URFD fall detection dataset and compared with different approaches, including accelerometer-based methods. We achieved near-unity sensitivity and specificity rates in detecting fall events.
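A minimal Keras sketch of the ConvNet-then-LSTM idea on clips of depth frames is below. The clip length, frame size and layer widths are assumptions, and the network is far smaller than the residual ConvNet described in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

SEQ_LEN, H, W = 30, 64, 64                          # assumed clip length and depth-frame size
model = tf.keras.Sequential([
    layers.Input(shape=(SEQ_LEN, H, W, 1)),
    layers.TimeDistributed(layers.Conv2D(16, 3, padding="same", activation="relu")),
    layers.TimeDistributed(layers.MaxPooling2D()),
    layers.TimeDistributed(layers.Flatten()),       # per-frame visual features
    layers.LSTM(64),                                # temporal dependencies across frames
    layers.Dense(1, activation="sigmoid"),          # fall / no-fall
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```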
Citations: 25
Fine-Grained Categorization by Deep Part-Collaboration Convolution Net
Pub Date : 2018-12-01 DOI: 10.1109/DICTA.2018.8615855
Qiyu Liao, H. Holewa, Min Xu, Dadong Wang
In the part-based categorization context, the ability to learn representative features from numerous tiny object parts is as important as localizing the parts exactly. We propose a new deep net structure for fine-grained categorization that follows a taxonomy workflow, which makes it interpretable and understandable for humans. By training customized sub-nets on each manually annotated part, we increase the state-of-the-art part-based classification accuracy on the fine-grained CUB-200-2011 dataset by 2.1%. Our study shows the proposed method produces stronger activations for discriminating fine part differences while maintaining high computing performance, by applying a set of strategies to optimize the deep net structure.
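A schematic Keras sketch of the per-part sub-network idea: one small sub-net per annotated part crop, with the resulting features concatenated for the final classifier. The part count, crop size and sub-net depth are assumptions, and the crops themselves would come from the manually annotated part boxes.

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_PARTS, CROP = 3, 96                                  # assumed: e.g. head / wing / body crops
def part_subnet():
    return tf.keras.Sequential([
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.GlobalAveragePooling2D(),
    ])

inputs = [layers.Input(shape=(CROP, CROP, 3)) for _ in range(NUM_PARTS)]
features = [part_subnet()(x) for x in inputs]            # one customized sub-net per part
merged = layers.Concatenate()(features)                  # part collaboration via feature fusion
out = layers.Dense(200, activation="softmax")(merged)    # 200 bird classes in CUB-200-2011
model = tf.keras.Model(inputs, out)
```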
Citations: 3
Hand Detection using Deformable Part Models on an Egocentric Perspective
Pub Date : 2018-12-01 DOI: 10.1109/DICTA.2018.8615781
Sergio R. Cruz, Antoni B. Chan
The egocentric perspective is a recent viewpoint brought by new devices such as the GoPro and Google Glass, which are becoming increasingly available to the public. The hands are the most consistently present objects in the egocentric perspective and can reveal much about people and their activities, but the nature of the perspective and the ever-changing shape of the hands make them difficult to detect. Previous work has focused on indoor environments or controlled data, which are simpler to handle; in this work we use data with changing backgrounds and variable illumination, which is more challenging. We use a Deformable Part Model based approach to generate hand proposals, since it can handle the many gestures the hand can adopt and rivals other techniques at locating the hands while reducing the number of proposals. We also use the location and size at which hands appear in the image to reduce the number of detections. Finally, a CNN classifier is applied to remove the remaining false positives and produce the final hand detections.
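A schematic sketch of the three-stage pipeline: generate hand proposals, gate them with a simple location/size prior, then keep only boxes accepted by a CNN scorer. `dpm_hand_proposals` and `cnn_hand_score` are hypothetical placeholders for the DPM detector and the CNN verifier, and the prior thresholds are illustrative.

```python
def detect_hands(frame, dpm_hand_proposals, cnn_hand_score,
                 min_area=900, score_thresh=0.5):
    """dpm_hand_proposals(frame) -> list of (x, y, w, h) boxes (hypothetical DPM stage);
    cnn_hand_score(frame, box) -> probability of 'hand' (hypothetical CNN verifier)."""
    height = frame.shape[0]
    detections = []
    for (x, y, w, h) in dpm_hand_proposals(frame):
        if w * h < min_area:            # too small to be a hand at egocentric range
            continue
        if y + h < height // 3:         # prior: hands rarely appear near the top of the frame
            continue
        if cnn_hand_score(frame, (x, y, w, h)) >= score_thresh:
            detections.append((x, y, w, h))
    return detections
```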
Citations: 7