Story Beyond the Eye: Glyph Positions Break PDF Text Redaction

Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium. Pub Date: 2023-07-01. DOI: 10.56553/popets-2023-0069

Maxwell Bland, Anushya Iyer, Kirill Levchenko


Citations: 2

Abstract

In this work, we find that many current redactions of PDF text are insecure because of non-redacted character positioning information. In particular, subpixel-sized horizontal shifts in redacted and non-redacted characters can be recovered and used to effectively deredact first and last names. Unfortunately, these findings affect even redactions where the text underneath the black box is removed from the PDF. We demonstrate these findings by performing a comprehensive vulnerability assessment of common PDF redaction types. We examine 11 popular PDF redaction tools, including Adobe Acrobat, and find that they leak information about redacted text. We also effectively deredact hundreds of real-world PDF redactions, including those found in OIG investigation reports and FOIA responses. To correct the problem, we have released open source algorithms to fix vulnerable redactions and reduce the amount of information leaked by nonexcising redactions (where the text underneath the redaction is copy-pastable). We have also notified the developers of the studied redaction tools, as well as the Office of Inspector General, the Free Law Project, PACER, Adobe, Microsoft, and the US Department of Justice. We are working with several of these groups to prevent our discoveries from being used for malicious purposes.
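To make the leak described in the abstract concrete, the following is a minimal sketch of the general idea only, not the authors' released algorithm. It assumes a hypothetical table of glyph advance widths (the numbers are illustrative and not taken from any real font or from the paper) and shows how the horizontal space left by a redaction can rule out candidate names, even names with the same number of letters. The names `ADVANCE`, `text_width`, and `matching_candidates` are invented for this illustration, and the sketch ignores kerning and tool-specific positioning shifts.

```python
from typing import Dict, List

# Hypothetical advance widths in 1/1000 em units. These values are
# illustrative assumptions, not measurements from a real font.
ADVANCE: Dict[str, int] = {
    "A": 722, "B": 667, "D": 722, "E": 611,
    "a": 444, "b": 500, "c": 444, "d": 500, "e": 444,
    "i": 278, "l": 278, "n": 500, "o": 500, "r": 333,
    "v": 500, "y": 500,
}


def text_width(text: str, font_size: float) -> float:
    """Return the rendered width of `text` in points, summing glyph advances."""
    units = sum(ADVANCE[ch] for ch in text)
    return units * font_size / 1000.0


def matching_candidates(
    candidates: List[str],
    gap_width: float,
    font_size: float,
    tolerance: float = 0.01,
) -> List[str]:
    """Keep only the candidates whose width fits the redacted gap.

    `gap_width` is the horizontal distance (in points) between the text
    before and after the redaction, as recovered from glyph positions.
    """
    return [
        name
        for name in candidates
        if abs(text_width(name, font_size) - gap_width) <= tolerance
    ]


if __name__ == "__main__":
    names = ["Alice", "Bobby", "David", "Erin"]
    # Three of these names have five letters, yet their widths differ.
    for name in names:
        print(name, text_width(name, 12.0))
    print(matching_candidates(names, gap_width=29.328, font_size=12.0))
```

Under these assumed widths, a gap of 29.328 points at 12 pt is consistent with only one of the four candidates, which illustrates why removing the text under the black box is not enough when the positions of the surrounding glyphs are preserved.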
