In 2019, we published attacks on PDF Signatures and PDF Encryption. During our research and studying the related work, we discovered a lot of blog posts, talks, and papers focusing on malicious PDFs causing some damage. However, there was no systematic analysis of all possible dangerous features supported by PDFs, but only isolated exploits and attack concepts.
We decided to fill this gap and systematize the possibilities to use legitimate PDF features and do bad stuff. We define four attack categories: Denial of Service, Information Disclosure, Data Manipulation, and Code Execution.
Our evaluation reveals 26 of 28 popular PDF processing applications are vulnerable to at least one attack. You can download all malicious PDFs here. You can also find more technical details in our NDSS'21 paper.
This is a joined work of Jens Müller, Dominik Noss, Christian Mainka, Vladislav Mladenov, and Jörg Schwenk.
Dangerous Paths: Overview
We identified four PDF objects which allow calling arbitrary actions (Page, Annotation, Field, and Catalog). Most objects offer multiple alternatives for this purpose. For example, the Catalog object, defines the OpenAction or additional actions (AA) events. Each event can launch any sequence of PDF actions, for example, Launch, Thread, etc. JavaScript actions can be embedded within documents. It opens a new area for attacks, for example, new annotations can be created that can have actions which once again lead to accessing file handles.
Denial of Service
Infinite Loop
This variant induces an endless loop causing the program execution to get stuck. The PDF standard allows various elements of the document structure to reference to themselves, or to other elements of the same type.
- Action loop: PDF actions allow to specify a Next action to be performed, thereby resulting in "action cycles".
- ObjStm loop: Object streams may extend other object streams allows the crafting of a document with cycles.
- Outline loop: PDF documents may contain an outline. Its entries, however, can refer to themselves or each other.
- Calculations: PDF defines "Type 4" calculator functions, for example, to transform colors. Processing hard-to-solve mathematical formulas may lead to high demands of CPU.
- JavaScript: Finally, in case the PDF application processes scripts within documents, infinite loops can be induced.
Deflate Bomb
Information Disclosure
URL Invocation
PDF documents that silently "phone home" should be considered as privacy-invasive. They can be used, for example, to deanonymize reviewers, journalists, or activists behind a shared mailbox. The attack's goal is to open a backchannel to an attacker-controlled server once the PDF file is opened by the victim.
Form Data Leakage
Local File Leakage
The PDF standard defines various methods to embed external files into a document or otherwise access files on the host's file system, as documented below.
- External streams: Documents can contain stream objects (e.g., images) to be included from external files on disk.
- Reference XObjects: This feature allows a document to import content from another (external) PDF document.
- Open Prepress Interface: Before printing a document, local files can be defined as low-resolution placeholders.
- Forms Data Format (FDF): Interactive form data can be stored in, and auto-imported from, external FDF files.
- JavaScript functions: The Adobe JavaScript reference enables documents to read data from or import local files.
If a malicious document managed to firstly read files from the victim’s disk and secondly, send them back to the attacker, such behavior would arguably be critical.
Credential Theft
Data Manipulation
Form Modification
File Write Access
Content Masking
The goal of this attack is to craft a document that renders differently, depending on the applied PDF interpreter. This can be used, for example, to show different content to different reviewers, to trick content filters (AI-based machines as well as human content moderators), plagiarism detection software, or search engines, which index a different text than the one shown to users when opening the document.
- Stream confusion: It is unclear how content streams are parsed if their Length value does not match the offset of the endstream marker, or if syntax errors are introduced.
- Object confusion: An object can overlay another object. The second object may not be processed if it has a duplicate object number, if it is not listed in the XRef table, or if other structural syntax errors are introduced.
- Document confusion: A PDF file can contain yet another document (e.g., as embedded file), multiple XRef tables, etc., which results in ambiguities on the structural level.
- PDF confusion: Objects before the PDF header or after an EOF marker may be processed by implementations, introducing ambiguities in the outer document structure.
Code Execution
Evaluation
Authors of this Post
Christian Mainka
Sources
[2] Aaron Spangler. WinNT/Win95 Automatic Authentication Vulnerability (IE Bug #4). https://insecure.org/sploits/winnt.automatic.authentication.html. Mar. 1997.
[3] Check Point Research. NTLM Credentials Theft via PDF Files. https://research.checkpoint.com/ntlm-credentials-theft-via-pdf-files/. 2018.