Fingerspelling Recognition with Semi-Markov Conditional Random Fields

2013 IEEE International Conference on Computer Vision. Pub Date: 2013-12-01. DOI: 10.1109/ICCV.2013.192

Taehwan Kim, Gregory Shakhnarovich, Karen Livescu

Pages: 1521-1528

Citations: 24

Abstract

Recognition of gesture sequences is in general a very difficult problem, but in certain domains the difficulty may be mitigated by exploiting the domain's "grammar". One such grammatically constrained gesture sequence domain is sign language. In this paper we investigate fingerspelling recognition, which can be very challenging because of the quick, small motions of the fingers. Most prior work on this task has assumed a closed vocabulary of fingerspelled words; here we study the more natural open-vocabulary case, where the only domain knowledge is the set of possible fingerspelled letters and the statistics of their sequences. We develop a semi-Markov conditional model approach, in which feature functions are defined over segments of video and their corresponding letter labels. We use classifiers of letters and of linguistic handshape features, along with expected motion profiles, to define segmental feature functions. This approach reduces the letter error rate (Levenshtein distance between hypothesized and correct letter sequences) from 16.3% with a hidden Markov model baseline to 11.6% with the proposed semi-Markov model.
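To make the two central ideas of the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows a standard segmental (semi-Markov) Viterbi recursion, in which scores are attached to whole segments of frames rather than to single frames, and a letter error rate computed from the Levenshtein distance. The names `segmental_decode`, `seg_score`, `trans_score`, and `max_len` are hypothetical. The toy scoring functions stand in for the learned feature functions described in the abstract, and normalizing the edit distance by the reference length is an assumption.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[int, int, str]  # (start frame, end frame exclusive, letter)


def levenshtein(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Edit distance (insertions, deletions, substitutions) between two sequences."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            curr.append(min(prev[j] + 1,              # extra letter in hyp
                            curr[j - 1] + 1,          # letter missing from hyp
                            prev[j - 1] + (h != r)))  # match or substitution
        prev = curr
    return prev[-1]


def letter_error_rate(hyp: Sequence[str], ref: Sequence[str]) -> float:
    """Edit distance divided by reference length (assumed normalization)."""
    return levenshtein(hyp, ref) / len(ref)


def segmental_decode(n_frames: int,
                     labels: Sequence[str],
                     max_len: int,
                     seg_score: Callable[[int, int, str], float],
                     trans_score: Callable[[str, str], float]) -> List[Segment]:
    """Best-scoring segmentation and labeling of frames [0, n_frames).

    best[t][y] is the best total score of any segmentation of frames [0, t)
    whose last segment carries label y; segments span at most max_len frames.
    """
    neg_inf = float("-inf")
    best = [{y: neg_inf for y in labels} for _ in range(n_frames + 1)]
    back = [{y: None for y in labels} for _ in range(n_frames + 1)]
    for t in range(1, n_frames + 1):
        for d in range(1, min(max_len, t) + 1):
            s = t - d  # candidate start of the last segment
            for y in labels:
                emit = seg_score(s, t, y)
                if s == 0:
                    # First segment: no previous label to transition from.
                    if emit > best[t][y]:
                        best[t][y], back[t][y] = emit, (0, None)
                    continue
                for y_prev in labels:
                    cand = best[s][y_prev] + trans_score(y_prev, y) + emit
                    if cand > best[t][y]:
                        best[t][y], back[t][y] = cand, (s, y_prev)
    # Trace back from the best final label to recover the segments.
    y = max(labels, key=lambda label: best[n_frames][label])
    t, segments = n_frames, []
    while t > 0:
        s, y_prev = back[t][y]
        segments.append((s, t, y))
        t, y = s, y_prev
    segments.reverse()
    return segments


# Toy usage: each frame carries a per-frame letter guess (hypothetical data).
frames = "aaabbc"
toy_seg = lambda s, t, y: sum(1.0 if frames[i] == y else -1.0 for i in range(s, t))
toy_trans = lambda y_prev, y: -0.5  # constant penalty per segment boundary
segments = segmental_decode(len(frames), "abc", 4, toy_seg, toy_trans)
hypothesis = "".join(letter for _, _, letter in segments)
print(segments, letter_error_rate(hypothesis, "abc"))
```

In this toy setup the boundary penalty discourages splitting a run of identical frames into several segments, which is one simple way a segment-level score can express duration or boundary preferences that a frame-level model cannot.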
