Using OCR to automate document conversion to LaTeX

Shashwat Pandey, Aditya Rohatgi
Published in: 2021 International Conference on Computational Intelligence and Computing Applications (ICCICA)
Publication date: 2021-11-26
DOI: 10.1109/iccica52458.2021.9697266 (https://doi.org/10.1109/iccica52458.2021.9697266)

Abstract

Converting a physical document into a digital version still leaves several parts of the process unaddressed. Few solutions offer end-to-end conversion of hard copies that contain images, graphs, tables, and other details into soft copies. To this end, we attempt to develop a computationally efficient algorithm that converts a document into its digital version through a LaTeX representation of the hard copy. Our research applies OCR techniques to the problem of converting an image of a typeset document into LaTeX. This work serves as a proof of concept that equation layouts can be learned and that individual character recognition is possible with relatively unsophisticated OCR techniques. Breaking the problem down step by step helped modularize and compartmentalize the tasks, so that each module can focus on the kinds of issues that arise at its own level of granularity.
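As an illustration of the layout step, here is a minimal sketch of how already-recognized characters could be assembled into LaTeX. It is not the authors' implementation: the `Glyph` type, the `glyphs_to_latex` function, and the `shift_ratio` threshold are hypothetical names chosen for this example. Character recognition is assumed to have run upstream, yielding one bounding box per character.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Glyph:
    """A recognized character with its bounding box (image coordinates, y grows downward)."""
    char: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    @property
    def center_y(self) -> float:
        return self.y + self.h / 2


def glyphs_to_latex(glyphs: List[Glyph], shift_ratio: float = 0.3) -> str:
    """Assemble one line of glyphs into an inline LaTeX math string.

    Assumption: a glyph that is smaller than the current baseline glyph and
    whose vertical center is displaced by more than shift_ratio * baseline
    height is a superscript (displaced upward) or subscript (displaced downward).
    """
    if not glyphs:
        return ""
    ordered = sorted(glyphs, key=lambda g: g.x)
    base = ordered[0]  # assume the leftmost glyph sits on the baseline
    parts = [("base", base.char)]
    for g in ordered[1:]:
        offset = g.center_y - base.center_y
        limit = shift_ratio * base.h
        if g.h < base.h and offset < -limit:
            kind = "sup"
        elif g.h < base.h and offset > limit:
            kind = "sub"
        else:
            kind = "base"
            base = g  # later scripts are measured against the newest baseline glyph
        if kind != "base" and parts[-1][0] == kind:
            # Merge consecutive script glyphs so we never emit x^{1}^{2}.
            parts[-1] = (kind, parts[-1][1] + g.char)
        else:
            parts.append((kind, g.char))
    pieces = []
    for kind, text in parts:
        if kind == "base":
            pieces.append(text)
        else:
            pieces.append(("^" if kind == "sup" else "_") + "{" + text + "}")
    return "$" + "".join(pieces) + "$"


# Hand-made boxes standing in for OCR output of "x² + yᵢ".
example = [
    Glyph("x", 0, 10, 10, 10),
    Glyph("2", 10, 5, 5, 6),
    Glyph("+", 18, 10, 10, 10),
    Glyph("y", 30, 10, 10, 10),
    Glyph("i", 41, 17, 5, 6),
]
print(glyphs_to_latex(example))  # prints $x^{2}+y_{i}$
```

Keeping recognition (which character is this?) separate from layout (where does it sit relative to its neighbours?) mirrors the modular decomposition the abstract describes. A geometric rule like this one covers only simple scripts; fractions, radicals, and nested structures would need a richer layout model.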