Automatic Synthesis of Historical Arabic Text for Word-Spotting

M. Kassis, Jihad El-Sana
Venue: 2016 12th IAPR Workshop on Document Analysis Systems (DAS)
Publication date: 2016-04-11
DOI: https://doi.org/10.1109/DAS.2016.9
Citations: 7

Abstract

We present a novel framework for the automatic and efficient synthesis of historical handwritten Arabic text. The main purpose of this framework is to assist word spotting and keyword searching in handwritten historical documents. The proposed framework consists of two main procedures: building a letter connectivity map and synthesizing words. A letter connectivity map includes multiple instances of the various shapes of each letter, since a letter in Arabic usually has multiple shapes depending on its position in the word. Each map represents one writer and encodes that writer's specific handwriting style. The letter connectivity map is used to guide the synthesis of any continuous Arabic subword, word, or sentence. The proposed framework automatically generates the letter connectivity map from several previously annotated historical pages. Once the letter connectivity map is available, our framework can synthesize the pictorial representation of any Arabic word or sentence from its text representation. The writing style of the synthesized text resembles the writing style of the input pages. The synthesized words can be used in word spotting and many other historical document processing applications. The proposed approach provides an intuitive and easy-to-use framework for searching for a keyword in the rest of the manuscript. Our experimental study shows that our approach yields accurate results when used with word-spotting algorithms.
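
The idea of a per-writer map keyed by letter shape can be illustrated with a minimal sketch. This is not the authors' implementation: the class name LetterConnectivityMap, the function positional_forms, the string glyph identifiers (standing in for image patches), and the reduced set of six non-connecting letters are all assumptions made for illustration. The sketch only shows how a letter's position in a word selects one of its shapes, and how a stored instance of that shape could then be picked for each letter.

```python
import random
from collections import defaultdict

# Letters that do not join to the following letter. Simplified set (assumption):
# alef, dal, thal, ra, zay, waw. Hamza variants and ligatures are ignored here.
NON_CONNECTING = set("\u0627\u062f\u0630\u0631\u0632\u0648")


def positional_forms(word):
    """Return (letter, form) pairs; form is isolated, initial, medial or final."""
    forms = []
    for i, ch in enumerate(word):
        joins_prev = i > 0 and word[i - 1] not in NON_CONNECTING
        joins_next = i < len(word) - 1 and ch not in NON_CONNECTING
        if joins_prev and joins_next:
            form = "medial"
        elif joins_prev:
            form = "final"
        elif joins_next:
            form = "initial"
        else:
            form = "isolated"
        forms.append((ch, form))
    return forms


class LetterConnectivityMap:
    """Hypothetical per-writer store: (letter, form) -> list of glyph instances."""

    def __init__(self):
        self._instances = defaultdict(list)

    def add(self, letter, form, glyph):
        # In a real system 'glyph' would be an image patch cut from an annotated page.
        self._instances[(letter, form)].append(glyph)

    def synthesize(self, word, rng=None):
        """Pick one stored instance per letter, in logical (reading) order."""
        rng = rng or random.Random()
        glyphs = []
        for letter, form in positional_forms(word):
            candidates = self._instances.get((letter, form))
            if not candidates:
                raise KeyError(f"no sample of {letter!r} in {form} form")
            glyphs.append(rng.choice(candidates))
        return glyphs


# Usage: the word kaf-ta-alef-ba ("kitab", book) written with escapes.
KITAB = "\u0643\u062a\u0627\u0628"
writer_map = LetterConnectivityMap()
for letter, form in positional_forms(KITAB):
    writer_map.add(letter, form, f"{form}-glyph-U+{ord(letter):04X}")
print(writer_map.synthesize(KITAB, random.Random(0)))
```

Because alef does not join to the letter after it, the final ba in this word takes its isolated shape and the word splits into two continuous subwords, which is the situation the abstract refers to when it mentions subwords. Stitching the chosen glyph images together along the baseline is outside the scope of this sketch.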