
2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops: Latest Papers

Audio-visual speech synchronization detection using a bimodal linear prediction model
Kshitiz Kumar, Jirí Navrátil, E. Marcheret, V. Libal, G. Ramaswamy, G. Potamianos
In this work, we study the problem of detecting audio-visual (AV) synchronization in video segments containing a speaker in frontal head pose. The problem holds important applications in biometrics, for example spoofing detection, and it constitutes an important step in AV segmentation necessary for deriving AV fingerprints in multimodal speaker recognition. To attack the problem, we propose a time-evolution model for AV features and derive an analytical approach to capture the notion of synchronization between them. We report results on an appropriate AV database, using two types of visual features extracted from the speaker's facial area: geometric ones and features based on the discrete cosine image transform. Our results demonstrate that the proposed approach provides substantially better AV synchrony detection over a baseline method that employs mutual information, with the geometric visual features outperforming the image transform ones.
DOI: 10.1109/CVPRW.2009.5204303 (https://doi.org/10.1109/CVPRW.2009.5204303), published 2009-06-20
Citations: 12
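The abstract of "Audio-visual speech synchronization detection using a bimodal linear prediction model" names mutual information as its baseline synchrony measure. As a rough illustration only, the sketch below estimates mutual information between an audio feature stream and a visual feature stream under a joint Gaussian assumption. The function name, the feature shapes, and the Gaussian assumption itself are choices made here for illustration; the abstract does not say how the baseline computes mutual information.

```python
import numpy as np

def gaussian_mutual_information(a, v):
    """Mutual information (in nats) between two feature streams.

    a: array of shape (T, da), audio features per frame (hypothetical layout).
    v: array of shape (T, dv), visual features per frame (hypothetical layout).
    Assumes the stacked features are jointly Gaussian, so that
    I(A; V) = 0.5 * (log|C_a| + log|C_v| - log|C_joint|).
    """
    a = np.asarray(a, dtype=float)
    v = np.asarray(v, dtype=float)
    da = a.shape[1]
    # Joint covariance of the stacked (audio, visual) feature vectors.
    cov = np.cov(np.hstack([a, v]), rowvar=False)
    _, logdet_joint = np.linalg.slogdet(cov)
    _, logdet_a = np.linalg.slogdet(cov[:da, :da])
    _, logdet_v = np.linalg.slogdet(cov[da:, da:])
    return 0.5 * (logdet_a + logdet_v - logdet_joint)
```

A synchronized pair of streams should score noticeably higher than the same pair with one stream shifted in time, which is the intuition behind using such a score as a synchrony detector.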
Color calibration of multi-projector displays through automatic optimization of hardware settings
R. M. Steele, Mao Ye, Ruigang Yang
We describe a system that performs automatic, camera-based photometric projector calibration by adjusting hardware settings (e.g. brightness, contrast, etc.). The approach has two basic advantages over software-correction methods. First, there is no software interface imposed on graphical programs: all imagery displayed on the projector benefits from the calibration immediately, without render-time overhead or code changes. Secondly, the approach benefits from the fact that projector hardware settings typically are capable of expanding or shifting color gamuts (e.g. trading off maximum brightness versus darkness of black levels), something that software methods, which only shrink gamuts, cannot do. In practice this means that hardware settings can possibly match colors between projectors while maintaining a larger overall color gamut (e.g. better contrast) than software-only correction can. The prototype system is fully automatic. The space of hardware settings is explored by using a computer-controlled universal remote to navigate each projector's menu system. An off-the-shelf camera observes each projector's response curves. A cost function is computed for the curves based on their similarity to each other, as well as intrinsic characteristics, including color balance, black level, gamma, and dynamic range. An approximate optimum is found using a heuristic combinatoric search. Results show significant qualitative improvements in the absolute colors, as well as the color consistency, of the display.
DOI: 10.1109/CVPRW.2009.5204322 (https://doi.org/10.1109/CVPRW.2009.5204322), published 2009-06-20
Citations: 5
Learning to segment using machine-learned penalized logistic models
Yong Yue, H. Tagare
Classical maximum-a-posteriori (MAP) segmentation uses generative models for images. However, creating tractable generative models can be difficult for complex images. Moreover, generative models require auxiliary parameters to be included in the maximization, which makes the maximization more complicated. This paper proposes an alternative to the MAP approach: using a penalized logistic model to directly model the segmentation posterior. This approach has two advantages: (1) it requires fewer auxiliary parameters, and (2) it provides a standard way of incorporating powerful machine-learning methods into segmentation so that complex image phenomena can be learned easily from a training set. The technique is used to segment cardiac ultrasound image sequences, which have substantial spatio-temporal contrast variation that is cumbersome to model. Experimental results show that the method gives accurate segmentations of the endocardium in spite of the contrast variation.
DOI: 10.1109/CVPRW.2009.5204343 (https://doi.org/10.1109/CVPRW.2009.5204343), published 2009-06-20
Citations: 6
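To make the idea of directly modeling a segmentation posterior with a penalized logistic model more concrete, here is a minimal sketch of a generic L2-penalized logistic regression fitted by gradient descent on per-pixel feature vectors. This is a textbook stand-in, not the model from "Learning to segment using machine-learned penalized logistic models": the penalty form, the optimizer, and all names and parameters (`fit_penalized_logistic`, `lam`, `lr`, `n_iter`) are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function mapping scores to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_penalized_logistic(X, y, lam=0.1, lr=0.1, n_iter=2000):
    """Fit weights by minimizing mean log-loss + 0.5 * lam * ||w||^2.

    X: (n, d) per-pixel features (hypothetical), y: (n,) labels in {0, 1}.
    The bias term b is left unpenalized.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)
        grad_w = X.T @ (p - y) / n + lam * w  # log-loss gradient plus ridge term
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def posterior(X, w, b):
    """Estimated probability that each pixel belongs to the foreground class."""
    return sigmoid(X @ w + b)
```

Thresholding `posterior(X, w, b)` at 0.5 yields a label per pixel; the penalty weight `lam` trades off fit to the training set against the size of the weights.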
Biometric data hiding: A 3 factor authentication approach to verify identity with a single image using steganography, encryption and matching
Neha Agrawal, M. Savvides
Digital steganography uses host data to hide a piece of information in such a way that it is imperceptible to a human observer. Its main objectives are imperceptibility, robustness and high payload. DCT Domain message embedding in Spread Spectrum Steganography describes a novel method of using redundancy in DCT coefficients. We improve upon the method of DCT embedding by using the sign of the DCT coefficients to get better accuracy of retrieved data and more robustness under channel attacks such as channel noise and JPEG compression artifacts, while maintaining the visual imperceptibility of the cover image, and we extend the method further to obtain higher payloads. We also apply this method to secure biometric data hiding, transmission and recovery. We hide iris code templates and fingerprints in the host image, which can be any arbitrary image, such as a face image (itself a biometric modality), and transmit the resulting imperceptible stego-image securely and robustly for authentication. At the receiving end we obtain perfect reconstruction and classification of the iris codes and retrieval of the fingerprints without any knowledge of the cover image, i.e. a blind method of steganography, which in this case is used to hide a biometric template in another biometric modality.
DOI: 10.1109/CVPRW.2009.5204308 (https://doi.org/10.1109/CVPRW.2009.5204308), published 2009-06-20
Citations: 50
Square Loss based regularized LDA for face recognition using image sets
Yanlin Geng, Caifeng Shan, Pengwei Hao
In this paper, we focus on face recognition over image sets, where each set is represented by a linear subspace. Linear Discriminant Analysis (LDA) is adopted for discriminative learning. After investigating the relation between regularization on Fisher Criterion and Maximum Margin Criterion, we present a unified framework for regularized LDA. With the framework, the ratio-form maximization of regularized Fisher LDA can be reduced to the difference-form optimization with an additional constraint. By incorporating the empirical loss as the regularization term, we introduce a generalized Square Loss based Regularized LDA (SLR-LDA) with suggestion on parameter setting. Our approach achieves superior performance to the state-of-the-art methods on face recognition. Its effectiveness is also evidently verified in general object and object category recognition experiments.
DOI: 10.1109/CVPRW.2009.5204307 (https://doi.org/10.1109/CVPRW.2009.5204307), published 2009-06-20
Citations: 5
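For readers unfamiliar with the two criteria named in the abstract of "Square Loss based regularized LDA for face recognition using image sets", the block below writes them in their common textbook forms: a ratio form for the (regularized) Fisher criterion and a difference form for the Maximum Margin Criterion. Here $S_b$ and $S_w$ are the between-class and within-class scatter matrices, $W$ is the projection, and $\lambda$ is a regularization weight; these symbols and the ridge-style regularizer are assumptions for illustration, and the paper's exact formulation may differ.

```latex
\begin{align}
J_{\mathrm{Fisher}}(W) &= \operatorname{tr}\!\left[ \left( W^{\top} (S_w + \lambda I)\, W \right)^{-1} W^{\top} S_b\, W \right], \\
J_{\mathrm{MMC}}(W) &= \operatorname{tr}\!\left[ W^{\top} (S_b - S_w)\, W \right].
\end{align}
```

The first expression is maximized as a ratio of between-class to within-class scatter, while the second maximizes their difference, which is the ratio-form versus difference-form distinction the abstract refers to.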
An alignment based similarity measure for hand detection in cluttered sign language video
Ashwin Thangali, S. Sclaroff
Locating hands in sign language video is challenging due to a number of factors. Hand appearance varies widely across signers due to anthropometric variations and varying levels of signer proficiency. Video can be captured under varying illumination, camera resolutions, and levels of scene clutter, e.g., high-res video captured in a studio vs. low-res video gathered by a Web cam in a user's home. Moreover, the signers' clothing varies, e.g., skin-toned clothing vs. contrasting clothing, short-sleeved vs. long-sleeved shirts, etc. In this work, the hand detection problem is addressed in an appearance matching framework. The histogram of oriented gradient (HOG) based matching score function is reformulated to allow non-rigid alignment between pairs of images to account for hand shape variation. The resulting alignment score is used within a support vector machine hand/not-hand classifier for hand detection. The new matching score function yields improved performance (in ROC area and hand detection rate) over the vocabulary guided pyramid match kernel (VGPMK) and the traditional, rigid HOG distance on American Sign Language video gestured by expert signers. The proposed match score function is computationally less expensive (for training and testing), has fewer parameters and is less sensitive to parameter settings than VGPMK. The proposed detector works well on test sequences from an inexpert signer in a non-studio setting with cluttered background.
DOI: 10.1109/CVPRW.2009.5204266 (https://doi.org/10.1109/CVPRW.2009.5204266), published 2009-06-20
Citations: 10
Dual domain auxiliary particle filter with integrated target signature update
C. M. Johnston, N.A. Mould, J. Havlicek, Guoliang Fan
For the first time, we formulate an auxiliary particle filter jointly in the pixel domain and modulation domain for tracking infrared targets. This dual domain approach provides an information-rich image representation comprising the pixel domain frames acquired directly from an imaging infrared sensor as well as 18 amplitude modulation functions obtained through a multicomponent AM-FM image analysis. The new dual domain auxiliary particle filter successfully tracks all of the difficult targets in the well-known AMCOM closure sequences in terms of both centroid location and target magnification. In addition, we incorporate the template update procedure into the particle filter formulation to extend the previously studied dual domain track consistency checking mechanism far beyond the normalized cross correlation (NCC) trackers of the past by explicitly quantifying the differences in target signature evolution between the modulation and pixel domains. Experimental results indicate that the dual domain auxiliary particle filter with integrated target signature update provides a significant performance advantage relative to several recent competing algorithms.
DOI: 10.1109/CVPRW.2009.5204143 (https://doi.org/10.1109/CVPRW.2009.5204143), published 2009-06-20
Citations: 7
Fast features for time constrained object detection
G. Overett, L. Petersson
This paper concerns itself with the development and design of fast features suitable for time constrained object detection. Primarily we consider three aspects of feature design: the form of the precomputed datatype (e.g. the integral image), the form of the features themselves (i.e. the measurements made of an image), and the models/weak-learners used to construct weak classifiers (class, non-class statistics). The paper is laid out as a guide to feature designers, demonstrating how appropriate choices in combining the above three characteristics can prevent bottlenecks in the run-time evaluation of classifiers. This leads to reductions in the computational time of the features themselves and, by providing more discriminant features, reductions in the time taken to reach specific classification error rates. Results are compared using variants of the well-known Haar-like feature types, Rectangular Histogram of Oriented Gradient (RHOG) features and a special set of Histogram of Oriented Gradient features which are highly optimized for speed. Experimental results suggest the adoption of this set of features for time-critical applications. Time-constrained comparisons are presented using pedestrian and road sign detection problems. Comparison results are presented on time-error plots, which are a replacement of the traditional ROC performance curves.
DOI: 10.1109/CVPRW.2009.5204293 (https://doi.org/10.1109/CVPRW.2009.5204293), published 2009-06-20
Citations: 4
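The abstract of "Fast features for time constrained object detection" cites the integral image as an example of a precomputed datatype and Haar-like features as one feature family built on it. The sketch below shows the standard integral image construction and how a rectangle sum, and from it a simple two-rectangle Haar-like response, can be read off in constant time. The function names and the particular two-rectangle layout are illustrative choices, not taken from the paper.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row and column prepended.

    ii[r, c] holds the sum of img[:r, :c], so lookups need no bounds checks.
    """
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of img[top:top+height, left:left+width] using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def haar_two_rect_horizontal(ii, top, left, height, width):
    """Left half minus right half of a window; width is assumed to be even."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum
```

Because every rectangle sum costs four lookups regardless of its size, the per-feature cost is independent of the window scale, which is why this kind of precomputation suits time-constrained detectors.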
Multi-modal laughter recognition in video conversations
Sergio Escalera, Eloi Puertas, P. Radeva, O. Pujol
Laughter detection is an important area of interest in the Affective Computing and Human-computer Interaction fields. In this paper, we propose a multi-modal methodology based on the fusion of audio and visual cues to deal with the laughter recognition problem in face-to-face conversations. The audio features are extracted from the spectrogram and the video features are obtained by estimating the mouth movement degree and using a smile and laughter classifier. Finally, the multi-modal cues are included in a sequential classifier. Results over videos from the public discussion blog of the New York Times show that both types of features perform better when considered together by the classifier. Moreover, the sequential methodology is shown to significantly outperform the results obtained by an AdaBoost classifier.
笑声检测是情感计算和人机交互领域的一个重要研究领域。在本文中,我们提出了一种基于视听线索融合的多模态方法来处理面对面对话中的笑声识别问题。从频谱图中提取音频特征,并使用微笑和笑声分类器估计嘴部运动程度,从而获得视频特征。最后,将多模态线索包含在顺序分类器中。来自纽约时报公共讨论博客的视频结果表明,当分类器同时考虑这两种类型的特征时,它们的表现更好。此外,顺序方法显示出明显优于Adaboost分类器获得的结果。
{"title":"Multi-modal laughter recognition in video conversations","authors":"Sergio Escalera, Eloi Puertas, P. Radeva, O. Pujol","doi":"10.1109/CVPRW.2009.5204268","DOIUrl":"https://doi.org/10.1109/CVPRW.2009.5204268","url":null,"abstract":"Laughter detection is an important area of interest in the Affective Computing and Human-computer Interaction fields. In this paper, we propose a multi-modal methodology based on the fusion of audio and visual cues to deal with the laughter recognition problem in face-to-face conversations. The audio features are extracted from the spectogram and the video features are obtained estimating the mouth movement degree and using a smile and laughter classifier. Finally, the multi-modal cues are included in a sequential classifier. Results over videos from the public discussion blog of the New York Times show that both types of features perform better when considered together by the classifier. Moreover, the sequential methodology shows to significantly outperform the results obtained by an Adaboost classifier.","PeriodicalId":431981,"journal":{"name":"2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134481443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17
Regularization of diffusion tensor field using coupled robust anisotropic diffusion filters
Songyuan Tang, Yong Fan, Hongtu Zhu, P. Yap, Wei Gao, Weili Lin, D. Shen
This paper presents a method to simultaneously regularize diffusion weighted images and their estimated diffusion tensors, with the goal of suppressing noise and restoring tensor information. We enforce a data fidelity constraint, using coupled robust anisotropic diffusion filters, to ensure consistency of the restored diffusion tensors with the regularized diffusion weighted images. The filters are designed to take advantage of robust statistics and to be adapted to the anisotropic nature of diffusion tensors, so that they effectively preserve boundaries between piecewise constant regions in both the tensor volume and the diffusion weighted images during the regularization process. To facilitate Euclidean operations on the diffusion tensors, log-Euclidean metrics are adopted when performing the filtering. Experimental results on simulated and real image data demonstrate the effectiveness of the proposed method.
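As a rough illustration of why log-Euclidean metrics make Euclidean operations on tensors convenient, the sketch below averages symmetric positive-definite 3×3 tensors by taking the matrix logarithm, averaging in the log domain, and mapping back with the matrix exponential. This is not the authors' filter: the function names (`spd_log`, `spd_exp`, `log_euclidean_mean`) and the use of NumPy are assumptions made for illustration only.

```python
import numpy as np

def spd_log(tensor):
    """Matrix logarithm of a symmetric positive-definite tensor via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(tensor)
    # Scale each eigenvector column by log(eigenvalue), then rotate back.
    return (eigvecs * np.log(eigvals)) @ eigvecs.T

def spd_exp(sym):
    """Matrix exponential of a symmetric matrix; the result is positive-definite."""
    eigvals, eigvecs = np.linalg.eigh(sym)
    return (eigvecs * np.exp(eigvals)) @ eigvecs.T

def log_euclidean_mean(tensors, weights=None):
    """Weighted mean of SPD tensors computed in the log domain (hypothetical helper)."""
    logs = np.array([spd_log(t) for t in tensors])
    if weights is None:
        weights = np.ones(len(logs))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalise so the weights sum to one
    mean_log = np.tensordot(weights, logs, axes=1)  # ordinary Euclidean average
    return spd_exp(mean_log)

if __name__ == "__main__":
    a = np.diag([1.0, 1.0, 1.0])
    b = np.diag([4.0, 4.0, 4.0])
    print(log_euclidean_mean([a, b]))
```

Because the averaging happens on matrix logarithms, the result is always positive-definite, and for tensors that commute (such as the diagonal ones in the usage example) it reduces to the element-wise geometric mean of the eigenvalues.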
DOI: https://doi.org/10.1109/CVPRW.2009.5204342 (published 2009-06-20)
Citations: 3