Anonymizing and obfuscating PDF content while preserving document structure

Charlotte Curtis
{"title":"Anonymizing and obfuscating PDF content while preserving document structure","authors":"Charlotte Curtis","doi":"10.1145/3558100.3563849","DOIUrl":null,"url":null,"abstract":"The portable document format (PDF) is both versatile and complex, with a specification exceeding well over a thousand pages. For independent developers writing software that reads, displays, or transforms PDFs, it is difficult to comprehensively account for all of the potential variations that might exist in the wild. Compounding this problem are the usage agreements that often accompany purchased and proprietary PDFs, preventing end users from uploading a troublesome document as part of a bug report and limiting the set of test cases that can be made public for open source development. In this paper, pdf-mangler is presented as a solution to this problem. The goal of pdf-mangler is to remove information in the form of text, images, and vector graphics while retaining as much of the document structure and general visual appearance as possible. The intention is for pdf-mangler to be deployed as part of an automated bug reporting tool for PDF software.","PeriodicalId":146244,"journal":{"name":"Proceedings of the 22nd ACM Symposium on Document Engineering","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 22nd ACM Symposium on Document Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3558100.3563849","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The portable document format (PDF) is both versatile and complex, with a specification exceeding well over a thousand pages. For independent developers writing software that reads, displays, or transforms PDFs, it is difficult to comprehensively account for all of the potential variations that might exist in the wild. Compounding this problem are the usage agreements that often accompany purchased and proprietary PDFs, preventing end users from uploading a troublesome document as part of a bug report and limiting the set of test cases that can be made public for open source development. In this paper, pdf-mangler is presented as a solution to this problem. The goal of pdf-mangler is to remove information in the form of text, images, and vector graphics while retaining as much of the document structure and general visual appearance as possible. The intention is for pdf-mangler to be deployed as part of an automated bug reporting tool for PDF software.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
匿名化和模糊化PDF内容,同时保留文档结构
可移植文档格式(PDF)既通用又复杂,其规范远远超过一千页。对于编写读取、显示或转换pdf的软件的独立开发人员来说,很难全面考虑可能存在的所有潜在变化。使这个问题更加复杂的是,通常伴随着购买和专有pdf的使用协议,阻止最终用户上传麻烦的文档作为bug报告的一部分,并限制了可以公开用于开源开发的测试用例集。本文提出了pdf-mangler来解决这个问题。pdf-mangler的目标是删除文本、图像和矢量图形形式的信息,同时尽可能多地保留文档结构和一般视觉外观。其目的是将PDF -mangler部署为PDF软件的自动错误报告工具的一部分。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
How did dennis ritchie produce his PhD thesis?: a typographical mystery From print to online newspapers on small displays: a layout generation approach aimed at preserving entry points Binarization of photographed documents image quality, processing time and size assessment Tab this folder of documents: page stream segmentation of business documents Graphical document representation for french newsletters analysis
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1