Features for word spotting in historical manuscripts

T. Rath, R. Manmatha
Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.
Publication date: 2003-08-03
DOI: 10.1109/ICDAR.2003.1227662 (https://doi.org/10.1109/ICDAR.2003.1227662)
Citations: 255

Abstract

For the transition from traditional to digital libraries, the large number of handwritten manuscripts that exist pose a great challenge. Easy access to such collections requires an index, which is currently created manually at great cost. Because automatic handwriting recognizers fail on historical manuscripts, the word spotting technique has been developed: the words in a collection are matched as images and grouped into clusters which contain all instances of the same word. By annotating "interesting" clusters, an index that links words to the locations where they occur can be built automatically. Due to the noise in historical documents, selecting the right features for matching words is crucial. We analyzed a range of features suitable for matching words using dynamic time warping (DTW), which aligns and compares sets of features extracted from two images. Each feature's individual performance was measured on a test set. With an average precision of 72%, a combination of features outperforms competing techniques in speed and precision.
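
The DTW matching step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each word image has already been reduced to a sequence of feature vectors (for example, one vector per image column), uses squared Euclidean distance as the local cost, and omits any path-length normalization or warping-band constraint. The function name `dtw_distance` and the toy inputs are hypothetical.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[Sequence[float]],
                 b: Sequence[Sequence[float]]) -> float:
    """Cumulative DTW cost between two sequences of feature vectors.

    Assumption: the local cost is the squared Euclidean distance
    between feature vectors; the abstract does not specify it.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # cost[i][j] holds the cheapest alignment of a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1]))
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i][j] = local + min(cost[i - 1][j],      # advance in a only
                                     cost[i][j - 1],      # advance in b only
                                     cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


# Toy one-dimensional "feature sequences": the second repeats one sample,
# which DTW absorbs by stretching the alignment.
word_a = [[0.0], [1.0], [2.0]]
word_b = [[0.0], [1.0], [1.0], [2.0]]
print(dtw_distance(word_a, word_b))  # 0.0
```

Because the alignment may stretch or compress either sequence, two instances of the same word written at different widths can still receive a low cost, which is what makes this kind of matching useful for grouping word images into clusters.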