A Comparative Evaluation of PDF-to-HTML Conversion Tools

Pramodya Pathirana, Asini Silva, Thenuka Lawrence, T. Weerasinghe, Roshan Abeyweera
Published in: 2023 International Research Conference on Smart Computing and Systems Engineering (SCSE), 2023-06-29
DOI: 10.1109/SCSE59836.2023.10214989 (https://doi.org/10.1109/SCSE59836.2023.10214989)

Abstract

PDF (Portable Document Format) is a popular file format for sharing and storing documents across platforms. However, the content of a PDF document sometimes needs to be repurposed for online use, and PDF-to-HTML conversion is a common way to achieve this. This paper presents a comparative evaluation of existing PDF-to-HTML conversion tools, assessing their suitability for extracting text and images. The tools were tested on Sri Lankan school textbooks, which contain complex text formatting and non-textual elements. The evaluation criteria included the accuracy of the output and the handling of complex text formatting and non-textual elements, and the tools were compared on their performance against each criterion. The study offers useful insights for individuals and organizations looking to repurpose PDF content for online use in HTML format, particularly in the education sector.
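
The abstract does not say how output accuracy was measured. As an illustration only, the sketch below shows one way a converter's HTML output could be scored against a hand-prepared reference: a character-level similarity ratio for the extracted text and a simple recall figure for images. The names `TextExtractor`, `score_conversion`, `text_similarity`, and `image_recall` are hypothetical, and the metrics are assumptions rather than the authors' method.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and counts <img> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.image_count = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            self.image_count += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore CSS and JavaScript source; keep everything else.
        if self._skip_depth == 0:
            self.chunks.append(data)


def normalize(text):
    """Collapse all runs of whitespace so layout differences do not count."""
    return " ".join(text.split())


def score_conversion(html, reference_text, expected_images):
    """Score one converter output (assumed metrics, not the paper's)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    extracted = normalize(" ".join(parser.chunks))
    similarity = SequenceMatcher(None, normalize(reference_text), extracted).ratio()
    if expected_images:
        image_recall = min(parser.image_count, expected_images) / expected_images
    else:
        image_recall = 1.0
    return {"text_similarity": similarity, "image_recall": image_recall}


if __name__ == "__main__":
    sample = "<html><body><p>Hello <b>world</b></p><img src='a.png'></body></html>"
    print(score_conversion(sample, "Hello world", expected_images=1))
```

A character-level ratio is language-agnostic, so it would apply equally to textbooks in any script, but it does not capture formatting fidelity such as preserved headings or tables; those would need separate checks.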