
Latest publications from the 2014 International Conference on Digital Image Computing: Techniques and Applications (DICTA)

Automatic UAV Forced Landing Site Detection Using Machine Learning
Xufeng Guo, S. Denman, C. Fookes, Luis Mejías Alvarez, S. Sridharan
The commercialization of aerial image processing is highly dependent on platforms such as UAVs (Unmanned Aerial Vehicles). However, the lack of an automated UAV forced landing site detection system has been identified as one of the main impediments to allowing UAV flight over populated areas in civilian airspace. This article proposes a UAV forced landing site detection system based on machine learning approaches including the Gaussian Mixture Model and the Support Vector Machine. A range of learning parameters is analysed, including the number of Gaussian mixtures, the support vector kernel (linear, radial basis function (RBF), and polynomial), and the order of the RBF and polynomial kernels. Moreover, a modified footprint operator is employed during feature extraction to better describe the geometric characteristics of the local area surrounding a pixel. The performance of the presented system is compared to a baseline UAV forced landing site detection system which uses edge features and an Artificial Neural Network (ANN) region type classifier. Experiments conducted on aerial image datasets captured over typical urban environments reveal that improved landing site detection can be achieved with an SVM classifier with an RBF kernel using a combination of colour and texture features. Compared to the baseline system, the proposed system provides a significant improvement in terms of the chance of detecting a safe landing area, and its performance is more stable than the baseline in the presence of changes to the UAV altitude.
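The three kernel families compared in this abstract can be sketched as plain functions; the `gamma`, `degree`, and offset `c` values below are illustrative defaults, not parameters reported in the paper:

```python
import numpy as np

def linear_kernel(x, y):
    # inner product of two feature vectors
    return float(np.dot(x, y))

def rbf_kernel(x, y, gamma=0.5):
    # Gaussian radial basis function; gamma controls the kernel width
    d2 = np.sum((np.asarray(x, float) - np.asarray(y, float)) ** 2)
    return float(np.exp(-gamma * d2))

def poly_kernel(x, y, degree=3, c=1.0):
    # inhomogeneous polynomial kernel of the given degree
    return float((np.dot(x, y) + c) ** degree)
```

The "order" the paper tunes corresponds to `degree` for the polynomial kernel and the width parameter for the RBF kernel.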
{"title":"Automatic UAV Forced Landing Site Detection Using Machine Learning","authors":"Xufeng Guo, S. Denman, C. Fookes, Luis Mejías Alvarez, S. Sridharan","doi":"10.1109/DICTA.2014.7008097","DOIUrl":"https://doi.org/10.1109/DICTA.2014.7008097","url":null,"abstract":"The commercialization of aerial image processing is highly dependent on the platforms such as UAVs (Unmanned Aerial Vehicles). However, the lack of an automated UAV forced landing site detection system has been identified as one of the main impediments to allow UAV flight over populated areas in civilian airspace. This article proposes a UAV forced landing site detection system that is based on machine learning approaches including the Gaussian Mixture Model and the Support Vector Machine. A range of learning parameters are analysed including the number of Guassian mixtures, support vector kernels including linear, radial basis function Kernel (RBF) and polynormial kernel (poly), and the order of RBF kernel and polynormial kernel. Moreover, a modified footprint operator is employed during feature extraction to better describe the geometric characteristics of the local area surrounding a pixel. The performance of the presented system is compared to a baseline UAV forced landing site detection system which uses edge features and an Artificial Neural Network (ANN) region type classifier. Experiments conducted on aerial image datasets captured over typical urban environments reveal improved landing site detection can be achieved with an SVM classifier with an RBF kernel using a combination of colour and texture features. 
Compared to the baseline system, the proposed system provides significant improvement in term of the chance to detect a safe landing area, and the performance is more stable than the baseline in the presence of changes to the UAV altitude.","PeriodicalId":146695,"journal":{"name":"2014 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125261356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 25
A Non-Rigid 3D Multi-Modal Registration Algorithm Using Partial Volume Interpolation and the Sum of Conditional Variance
Mst. Nargis Aktar, M. Alam, M. Pickering
Multi-modal medical image registration provides complementary information from the fusion of various medical imaging modalities. This paper presents a volume-based multi-modal affine registration algorithm to register images acquired using different magnetic resonance imaging (MRI) modes. In the proposed algorithm, the sum-of-conditional variance (SCV) similarity measure is used. The SCV is considered to be a state-of-the-art similarity measure for registering multi-modal images. However, the main drawback of the SCV is that it uses only quantized information to calculate a joint histogram. To overcome this limitation, we propose to use partial volume interpolation (PVI) in the joint histogram calculation to improve the performance of the existing registration algorithm. To evaluate the performance of the registration algorithm, different similarity measures were compared in conjunction with gradient-based Gauss-Newton (GN) optimization to optimize the spatial transformation parameters. The experimental evaluation shows that the proposed approach provides a higher success rate and comparable accuracy to other methods that have been recently proposed for multi-modal medical image registration.
{"title":"A Non-Rigid 3D Multi-Modal Registration Algorithm Using Partial Volume Interpolation and the Sum of Conditional Variance","authors":"Mst. Nargis Aktar, M. Alam, M. Pickering","doi":"10.1109/DICTA.2014.7008088","DOIUrl":"https://doi.org/10.1109/DICTA.2014.7008088","url":null,"abstract":"Multi-modal medical image registration provides complementary information from the fusion of various medical imaging modalities. This paper presents a volume based multi-modal affine registration algorithm to register images acquired using different magnetic resonance imaging (MRI) modes. In the proposed algorithm, the sum-of-conditional variance (SCV) similarity measure is used. The SCV is considered to be a state-of-the- art similarity measure for registering multi-modal images. However, the main drawback of the SCV is that it uses only quantized information to calculate a joint histogram. To overcome this limitation, we propose to use partial volume interpolation (PVI) in the joint histogram calculation to improve the performance of the existing registration algorithm. To evaluate the performance of the registration algorithm, different similarity measures were compared in conjunction with gradient-based Gauss-Newton (GN) optimization to optimize the spatial transformation parameters. 
The experimental evaluation shows that the proposed approach provides a higher success rate and comparable accuracy to other methods that have been recently proposed for multi-modal medical image registration.","PeriodicalId":146695,"journal":{"name":"2014 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"125 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131715111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Fusion of Multiple Sensor Data to Recognise Moving Objects in Wide Area Motion Imagery
S. Fehlmann, C. Pontecorvo, D. Booth, P. Janney, Robert Christie, N. Redding, Mike Royce, Merrilyn J. Fiebig
This work addresses the problem of extracting semantics associated with multiple, cooperatively managed motion imagery sensors to support indexing and search of large imagery collections. The extracted semantics relate to the motion and identity of vehicles within a scene, viewed from aircraft and the ground. Semantic extraction required three steps: Video Moving Target Indication (VMTI), imagery fusion, and object recognition. VMTI used a previously published algorithm, with some novel modifications allowing detection and tracking in low frame rate, Wide Area Motion Imagery (WAMI), and Full Motion Video (FMV). Following this, the data from multiple sensors were fused to identify the highest-resolution image corresponding to each moving object. A final recognition stage attempted to fit each delineated object to a database of 3D models to determine its type. A proof-of-concept was developed to process imagery collected during a recent experiment using a state-of-the-art airborne surveillance sensor providing WAMI, with coincident narrower-area FMV sensors and simultaneous collection by a ground-based camera. An indication of the potential utility of the system was obtained using ground-truthed examples.
{"title":"Fusion of Multiple Sensor Data to Recognise Moving Objects in Wide Area Motion Imagery","authors":"S. Fehlmann, C. Pontecorvo, D. Booth, P. Janney, Robert Christie, N. Redding, Mike Royce, Merrilyn J. Fiebig","doi":"10.1109/DICTA.2014.7008110","DOIUrl":"https://doi.org/10.1109/DICTA.2014.7008110","url":null,"abstract":"This work addresses the problem of extracting semantics associated with multiple, cooperatively managed motion imagery sensors to support indexing and search of large imagery collections. The extracted semantics relate to the motion and identity of vehicles within a scene, viewed from aircraft and the ground. Semantic extraction required three steps: Video Moving Target Indication (VMTI), imagery fusion, and object recognition. VMTI used a previously published algorithm, with some novel modifications allowing detection and tracking in low frame rate, Wide Area Motion Imagery (WAMI), and Full Motion Video (FMV). Following this, the data from multiple sensors were fused to identify a highest resolution image, corresponding to each moving object. A final recognition stage attempted to fit each delineated object to a database of 3D models to determine its type. A proof-of-concept has been developed to allow processing of imagery collected during a recent experiment using a state of the art airborne surveillance sensor providing WAMI, with coincident narrower-area FMV sensors and simultaneous collection by a ground-based camera. 
An indication of the potential utility of the system was obtained using ground-truthed examples.","PeriodicalId":146695,"journal":{"name":"2014 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116520849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Multi-Focus Image Fusion via Boundary Finding and Multi-Scale Morphological Focus-Measure
Yu Zhang, X. Bai, Tao Wang
Multi-focus image fusion extracts the focused regions from multiple images of the same scene and combines them to produce one fully focused image. The key is to find the focused regions in the source images. In this paper, we transform the problem of finding the focused regions into one of finding the boundaries between the focused and defocused regions in the source images, and propose a novel image fusion method via boundary finding and a multi-scale morphological focus-measure. Firstly, a morphological focus-measure, consisting of multi-scale morphological gradients, is proposed to measure the focus of the images. Secondly, a novel boundary finding method is presented, which utilizes the relations of the focus information of the source images. Thirdly, the found boundaries naturally segment the source images into regions with the same focus condition, and the focused regions can be selected simply by comparing the focus-measures of the corresponding regions. Fourthly, the detected focused regions are reconstructed to obtain the decision map for the multi-focus image fusion. Finally, the fused image is produced according to the decision map and the given fusion rule. Experimental results demonstrate the proposed algorithm outperforms other spatial domain fusion algorithms.
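A multi-scale morphological focus-measure of the general kind described here can be sketched as follows; the square structuring element, the scale set, and the 1/scale weighting are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

def morph_gradient(img, size):
    # grey-scale morphological gradient: dilation minus erosion with a
    # (size x size) square structuring element, computed by brute force
    pad = size // 2
    p = np.pad(img, pad, mode='edge')
    h, w = img.shape
    dil = np.zeros((h, w))
    ero = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            win = p[i:i + size, j:j + size]
            dil[i, j] = win.max()
            ero[i, j] = win.min()
    return dil - ero

def focus_measure(img, scales=(3, 5, 7)):
    # multi-scale focus measure: combine gradients at several scales,
    # down-weighting the coarser (larger) structuring elements
    return sum(morph_gradient(img, s) / s for s in scales)
```

Sharp (in-focus) regions produce strong gradients at every scale, so their combined measure exceeds that of defocused regions, which is what lets the boundary-finding step separate the two.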
{"title":"Multi-Focus Image Fusion via Boundary Finding and Multi-Scale Morphological Focus-Measure","authors":"Yu Zhang, X. Bai, Tao Wang","doi":"10.1109/DICTA.2014.7008116","DOIUrl":"https://doi.org/10.1109/DICTA.2014.7008116","url":null,"abstract":"Multi-focus image fusion is to extract the focused regions from the multiple images of the same scene and combine them together to produce one fully focused image. The key is to find the focused regions from the source images. In this paper, we transform the problem of finding the focused regions to find the boundaries between the focused and defocused regions in the source images, and propose a novel image fusion method via boundary finding and a multi-scale morphological focus-measure. Firstly, a morphological focus-measure, consisted of multi- scale morphological gradients, is proposed to measure the focus of the images. Secondly, a novel boundary finding method is presented, which utilizes the relations of the focus information of the source images. Thirdly, the found boundaries naturally segment the source images into regions with the same focus condition and the focused regions can be simply selected by comparing the focus-measures of the corresponding regions. Fourthly, the detected focused regions are reconstructed to obtain the decision map for the multi-focus image fusion. Finally, the fused image is produced according to the decision map and the given fusion rule. 
Experimental results demonstrate the proposed algorithm outperforms other spatial domain fusion algorithms.","PeriodicalId":146695,"journal":{"name":"2014 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"180 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115036032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Fast Intermode Selection for HEVC Video Coding Using Phase Correlation
Pallab Kanti Podder, M. Paul, Manzur Murshed, Subrata Chakraborty
The recent High Efficiency Video Coding (HEVC) standard demonstrates higher rate-distortion (RD) performance than its predecessor H.264/AVC through a number of new tools, especially larger and asymmetric inter-mode variable-size motion estimation and compensation. This requires more than 4 times the computational time of H.264/AVC. Reducing encoding time while maintaining standard video quality has therefore been a major concern for researchers, and the reduction of computational time through smart selection of the appropriate modes in HEVC is our motivation. To accomplish this task, we use phase correlation to approximate the motion information between current and reference blocks by comparing it with a number of different binary pattern templates, and then select a subset of motion estimation modes without exhaustively exploring all possible modes. The experimental results exhibit that the proposed HEVC-PC (HEVC with Phase Correlation) scheme outperforms the standard HEVC scheme in terms of computational time while preserving the same quality of the video sequences. More specifically, around 40% of encoding time is saved compared to the exhaustive mode selection in HEVC.
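Phase correlation itself, the building block this scheme relies on, estimates the translation between two blocks from the peak of the inverse FFT of the normalized cross-power spectrum. A minimal sketch (integer shifts only, square blocks assumed for simplicity):

```python
import numpy as np

def phase_correlation(a, b):
    # estimate the translation of block a relative to block b: the inverse
    # FFT of the normalized cross-power spectrum has a sharp peak at the
    # (dy, dx) shift that best aligns the two blocks
    A = np.fft.fft2(a)
    B = np.fft.fft2(b)
    cps = A * np.conj(B)
    cps /= np.abs(cps) + 1e-12       # keep phase only
    corr = np.fft.ifft2(cps).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    return int(dy), int(dx)
```

The shape of the resulting correlation surface (a single sharp peak versus a diffuse one) is the kind of motion evidence that can be matched against binary pattern templates to rule out unlikely partition modes.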
{"title":"Fast Intermode Selection for HEVC Video Coding Using Phase Correlation","authors":"Pallab Kanti Podder, M. Paul, Manzur Murshed, Subrata Chakraborty","doi":"10.1109/DICTA.2014.7008109","DOIUrl":"https://doi.org/10.1109/DICTA.2014.7008109","url":null,"abstract":"The recent High Efficiency Video Coding (HEVC) Standard demonstrates higher rate-distortion (RD) performance compared to its predecessor H.264/AVC using different new tools especially larger and asymmetric inter-mode variable size motion estimation and compensation. This requires more than 4 times computational time compared to H.264/AVC. As a result it has always been a big concern for the researchers to reduce the amount of time while maintaining the standard quality of the video. The reduction of computational time by smart selection of the appropriate modes in HEVC is our motivation. To accomplish this task in this paper, we use phase correlation to approximate the motion information between current and reference blocks by comparing with a number of different binary pattern templates and then select a subset of motion estimation modes without exhaustively exploring all possible modes. The experimental results exhibit that the proposed HEVC-PC (HEVC with Phase Correlation) scheme outperforms the standard HEVC scheme in terms of computational time while preserving-the same quality of the video sequences. 
More specifically, around 40% encoding time is reduced compared to the exhaustive mode selection in HEVC.","PeriodicalId":146695,"journal":{"name":"2014 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134188018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
Infrared Ship Target Image Smoothing Based on Adaptive Mean Shift
Zhaoying Liu, Changming Sun, X. Bai, F. Zhou
Infrared (IR) image denoising is important for IR image analysis. In this paper, we propose a method based on adaptive range bandwidth mean shift for IR ship target image smoothing, aiming to effectively suppress noise as well as preserve important target structures. First, local image properties, including the mean value and standard deviation, are combined to build a salient region map, and a thresholding method is applied to obtain a binary mask on the target region. Then, we develop an adaptive range bandwidth mean shift method for image denoising. By associating the range bandwidth of the mean shift with local region saliency, we can adjust the bandwidth adaptively, thus smoothing the background region while preserving important target structures. Experimental results show that this method works well for IR ship target images with different backgrounds. Compared with some existing image denoising methods, it demonstrates superior performance in denoising and target preservation.
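The core idea of range mean shift with a per-sample bandwidth can be sketched in one dimension; here `bandwidths` is simply an array supplied by the caller, standing in for the saliency-driven bandwidths the paper derives:

```python
import numpy as np

def mean_shift_filter(values, bandwidths, iters=10):
    # range-only mean shift with a per-sample (adaptive) bandwidth: each
    # estimate repeatedly moves to the mean of the original samples lying
    # within its own bandwidth, converging toward local intensity modes;
    # a small bandwidth preserves structure, a large one smooths it away
    data = np.asarray(values, float)
    bw = np.asarray(bandwidths, float)
    v = data.copy()
    for _ in range(iters):
        v = np.array([data[np.abs(data - x) <= b].mean()
                      for x, b in zip(v, bw)])
    return v
```

Assigning small bandwidths to salient (target) pixels and large bandwidths to background pixels reproduces the paper's behaviour: the background is flattened while target edges survive.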
{"title":"Infrared Ship Target Image Smoothing Based on Adaptive Mean Shift","authors":"Zhaoying Liu, Changming Sun, X. Bai, F. Zhou","doi":"10.1109/DICTA.2014.7008113","DOIUrl":"https://doi.org/10.1109/DICTA.2014.7008113","url":null,"abstract":"Infrared (IR) image denoising is important for IR image analysis. In this paper, we propose a method based on adaptive range bandwidth mean shift for IR ship target image smoothing, aiming to effectively suppress noise as well as preserve important target structures. First, local image properties, including the mean value and standard deviation, are combined to build a salient region map, and a thresholding method is applied to obtain a binary mask on the target region. Then, we develop an adaptive range bandwidth mean shift method for image denoising. By associating the range bandwidth of the mean shift with local region saliency, we can adjust the bandwidth adaptively, thus to smooth the background region while preserving important target structures. Experimental results show that this method works well for IR ship target images with different backgrounds. It demonstrates superior performance for image denoising and target preserving, compared with some existing image denoising methods.","PeriodicalId":146695,"journal":{"name":"2014 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129477183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Crowd Behavior Recognition Using Dense Trajectories
Muhammad Rizwan Khokher, A. Bouzerdoum, S. L. Phung
This article presents a new method for crowd behavior recognition, using dynamic features extracted from dense trajectories. The histogram of oriented gradient and motion boundary histogram descriptors are computed at dense points along motion trajectories, and tracked using median filtering and displacement information obtained from a dense optical flow field. Then a global representation of the scene is obtained using a bag-of-words model of the extracted features. Locality-constrained linear encoding with sum pooling and L2-plus-power normalization are employed in the bag-of-words model. Finally, a support vector machine classifier is trained to recognize the crowd behavior in a short video sequence. The proposed method is tested on two benchmark datasets, and its performance is compared with those of some existing methods. Experimental results show that the proposed approach achieves a classification rate of 93.8% on PETS2009 S3 and an area-under-the-curve score of 0.985 on the UMN dataset.
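The simplest form of the bag-of-words pooling step is hard assignment of descriptors to codewords; note the paper actually uses the softer locality-constrained linear coding (LLC), for which this is only a baseline sketch:

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    # hard-assignment bag-of-words: map each local descriptor to its
    # nearest codeword (squared Euclidean distance) and L1-normalize the
    # resulting word counts into a global scene representation
    d = np.asarray(descriptors, float)
    c = np.asarray(codebook, float)
    words = ((d[:, None, :] - c[None, :, :]) ** 2).sum(-1).argmin(1)
    hist = np.bincount(words, minlength=len(c)).astype(float)
    return hist / hist.sum()
```

The resulting fixed-length histogram is what a support vector machine can be trained on, one vector per video clip.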
{"title":"Crowd Behavior Recognition Using Dense Trajectories","authors":"Muhammad Rizwan Khokher, A. Bouzerdoum, S. L. Phung","doi":"10.1109/DICTA.2014.7008098","DOIUrl":"https://doi.org/10.1109/DICTA.2014.7008098","url":null,"abstract":"This article presents a new method for crowd behavior recognition, using dynamic features extracted from dense trajectories. The histogram of oriented gradient and motion boundary histogram descriptors are computed at dense points along motion trajectories, and tracked using median filtering and displacement information obtained from a dense optical flow field. Then a global representation of the scene is obtained using a bag-of-words model of the extracted features. The locality-constrained linear encoding with sum pooling and L2 plus power normalization are employed in the bag-of-words model. Finally, a support vector machine classifier is trained to recognize the crowd behavior in a short video sequence. The proposed method is tested on two benchmark datasets, and its performance is compared with those of some existing methods. Experimental results show that the proposed approach can achieve a classification rate of 93.8% on PETS2009 S3 and area under the curve score of 0.985 on UMN datasets respectively.","PeriodicalId":146695,"journal":{"name":"2014 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"65 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128713807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Dual Graph Regularized NMF for Hyperspectral Unmixing
Lei Tong, J. Zhou, Xiao Bai, Yongsheng Gao
Hyperspectral unmixing is an important technique for estimating the fractions of different land cover types from remote sensing imagery. In recent years, nonnegative matrix factorization (NMF) with various constraints has been introduced into hyperspectral unmixing. Among these methods, graph-based constraints have proven useful in capturing the latent manifold structure of the hyperspectral data in the feature space. In this paper, we propose to integrate graph-based constraints, based on the manifold assumption in the feature space and consistency in the spatial space, to regularize the NMF method. Results on both synthetic and real data have validated the effectiveness of the proposed method.
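The unregularized core the paper builds on is plain multiplicative-update NMF, factoring the pixel matrix into endmember signatures and abundances; the dual graph regularizers would add Laplacian terms to these updates. A minimal sketch (iteration count and initialization are arbitrary choices here):

```python
import numpy as np

def nmf(V, k, iters=500, seed=0):
    # Lee-Seung multiplicative updates for V ~= W @ H with nonnegative
    # factors; in unmixing terms W holds endmember spectra and H holds
    # per-pixel abundances. Updates keep both factors nonnegative because
    # every term in the ratios is nonnegative.
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + 0.1
    H = rng.random((k, m)) + 0.1
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
    return W, H
```

A graph-regularized variant multiplies extra neighbourhood-agreement terms into these same ratios, pulling the abundances of spectrally or spatially adjacent pixels toward each other.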
{"title":"Dual Graph Regularized NMF for Hyperspectral Unmixing","authors":"Lei Tong, J. Zhou, Xiao Bai, Yongsheng Gao","doi":"10.1109/DICTA.2014.7008103","DOIUrl":"https://doi.org/10.1109/DICTA.2014.7008103","url":null,"abstract":"Hyperspectral unmixing is an important technique for estimating fraction of different land cover types from remote sensing imagery. In recent years, nonnegative matrix factorization (NMF) with various constraints have been introduced into hyperspectral unmixing. Among these methods, graph based constraint have been proved to be useful in capturing the latent manifold structure of the hyperspectral data in the feature space. In this paper, we propose to integrate graph-based constraints based on manifold assumption in feature spaces and consistency of spatial space to regularize the NMF method. Results on both synthetic and real data have validated the effectiveness of the proposed method.","PeriodicalId":146695,"journal":{"name":"2014 International Conference on Digital Image Computing: Techniques and Applications (DICTA)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128852137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
Image Segmentation Based on Spatially Coherent Gaussian Mixture Model
Guangpu Shao, Junbin Gao, Tianjiang Wang, Fang Liu, Yucheng Shu, Yong Yang
It has been demonstrated that a finite mixture model (FMM) with Gaussian distributions is a powerful tool for modeling the probability density function of image data, with wide applications in computer vision and image analysis. We propose a simple yet effective way to enhance the robustness of finite mixture models by incorporating local spatial constraints. It is natural to assume that the label of an image pixel is influenced by those of its neighboring pixels. We use a mean template to represent the local spatial constraints. Our algorithm is better than other mixture models based on Markov random fields (MRF), as our method avoids inferring the posterior field distribution and choosing the temperature parameter. We use the expectation maximization (EM) algorithm to optimize all the model parameters. Besides, the proposed algorithm is fully free of empirically adjusted hyperparameters. The idea used in our method can also be adopted in other mixture models. Several experiments on synthetic and real-world images have been conducted to demonstrate the effectiveness, efficiency and robustness of the proposed method.
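One way to realize a mean-template spatial constraint of the kind described, sketched under the assumption that the template is a plain 3x3 neighbourhood average applied to the E-step responsibilities:

```python
import numpy as np

def smooth_responsibilities(resp):
    # mean-template spatial constraint: replace each pixel's per-class
    # posterior (shape: classes x height x width) with the average over
    # its 3x3 neighbourhood (edge-replicated padding), then renormalize
    # so the posteriors still sum to one at every pixel
    k, h, w = resp.shape
    out = np.empty_like(resp, dtype=float)
    for c in range(k):
        p = np.pad(resp[c], 1, mode='edge')
        out[c] = sum(p[i:i + h, j:j + w]
                     for i in range(3) for j in range(3)) / 9.0
    return out / out.sum(axis=0, keepdims=True)
```

Inserted between the E- and M-steps of EM, this pulls an isolated, noise-driven label toward the consensus of its neighbours without any MRF inference or temperature parameter.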
Image Segmentation Based on Spatially Coherent Gaussian Mixture Model
Guangpu Shao, Junbin Gao, Tianjiang Wang, Fang Liu, Yucheng Shu, Yong Yang
DICTA 2014, November 2014. doi:10.1109/DICTA.2014.7008111
Cited by: 3
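The spatially coherent mixture described above can be illustrated with a short sketch: standard EM for a per-pixel Gaussian mixture, except that each pixel's mixing weights are set to the neighbourhood mean of the responsibilities (a "mean template" over a 3x3 window). This is an illustrative reading of the abstract, not the authors' implementation; the window size and initialization are assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def spatial_gmm_segment(img, k=2, iters=30):
    """EM for a per-pixel Gaussian mixture in which each pixel's mixing
    weights are the 3x3 neighbourhood mean of the responsibilities
    (a simple 'mean template' spatial constraint)."""
    h, w = img.shape
    x = img.astype(float).ravel()
    mu = np.quantile(x, np.linspace(0.25, 0.75, k))   # spread-out initial means
    var = np.full(k, x.var() + 1e-6)
    pi = np.full((h * w, k), 1.0 / k)                 # per-pixel mixing weights
    for _ in range(iters):
        # E-step: responsibilities under the current parameters
        lik = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = pi * lik
        r /= r.sum(1, keepdims=True) + 1e-12
        # spatial constraint: average each responsibility map over a 3x3 window,
        # so a pixel's prior follows its neighbours (no MRF inference, no
        # temperature parameter)
        for j in range(k):
            pi[:, j] = uniform_filter(r[:, j].reshape(h, w), size=3).ravel()
        pi /= pi.sum(1, keepdims=True)
        # M-step: closed-form Gaussian updates
        nk = r.sum(0) + 1e-12
        mu = (r * x[:, None]).sum(0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(0) / nk + 1e-6
    return r.argmax(1).reshape(h, w)
```

On a noisy two-region image this yields smoother labels than an independent-pixel GMM, because isolated misclassified pixels are pulled toward their neighbours' component.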
The Influence of Temporal Information on Human Action Recognition with Large Number of Classes
O. V. R. Murthy, Roland Göcke
Human action recognition from video input has seen much interest over the last decade. In recent years, the trend has clearly been towards action recognition in real-world, unconstrained conditions (i.e. not acted), with an ever-growing number of action classes. Much of the work so far has used single frames or sequences of frames in which each frame was treated individually. This paper investigates the contribution that temporal information can make to human action recognition in the context of a large number of action classes. The key contributions are: (i) we propose an information channel, complementary to the Bag-of-Words framework, that models the temporal occurrence of local information in videos; (ii) we investigate the influence of sensible local information whose temporal occurrence is more vital than any local information alone. Experimental validation on the action recognition datasets with the largest number of classes to date shows the effectiveness of the proposed approach.
DICTA 2014, November 2014. doi:10.1109/DICTA.2014.7008131
Cited by: 5
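One simple way to realise a temporal channel complementary to Bag-of-Words, in the spirit of the abstract above, is to keep the usual codeword count histogram and additionally histogram *when* each codeword occurs, by splitting the normalised video timeline into temporal bins. This is a hedged sketch of the general idea; the bin scheme and normalisation are assumptions, not the paper's exact method.

```python
import numpy as np

def temporal_bow(assignments, times, n_words, n_bins=3):
    """Return a standard BoW histogram concatenated with a complementary
    temporal-occurrence channel: per codeword, a histogram over temporal
    bins of the normalised timestamps at which it occurs."""
    counts = np.bincount(assignments, minlength=n_words).astype(float)
    bow = counts / max(counts.sum(), 1.0)             # orderless count channel
    # normalise timestamps to [0, 1] and assign each feature to a temporal bin
    t = (times - times.min()) / max(np.ptp(times), 1e-9)
    bins = np.minimum((t * n_bins).astype(int), n_bins - 1)
    temporal = np.zeros((n_words, n_bins))
    np.add.at(temporal, (assignments, bins), 1.0)     # when each word occurs
    temporal /= max(temporal.sum(), 1.0)
    return np.concatenate([bow, temporal.ravel()])
```

Two videos containing the same codewords in different temporal order produce identical `bow` parts but different `temporal` parts, which is exactly the information an orderless Bag-of-Words representation discards.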