What Exactly is a PDF?

Explore the basics and key features of PDF files.

We often assume we know what a PDF is, but it is rarely explained in detail. This article provides a clear and simple overview of PDFs without getting too technical. We will cover the basics, including the internal structure of a PDF and why it is still such a popular format. Let’s get started.

PDF basics

Definition

PDF stands for Portable Document Format. It’s an electronic document format designed to look and function like paper documents. The term "portable" means that a PDF should appear the same no matter where or how it is viewed.

History

Adobe began developing PDF in 1991 and released the first version in 1993. Adobe published the specification openly, so anyone could develop tools for creating, editing, and viewing PDFs. In 2008, PDF became an ISO standard (ISO 32000-1), which further promoted its wide adoption.

Features

A key characteristic of a PDF is that it is self-contained: everything needed to display the document is included in the file. This makes PDFs easy to transfer, store, and archive. Adobe's own viewer, Adobe Reader (now Adobe Acrobat Reader), is free, which has also contributed to the format's widespread use. Understanding how PDFs are structured can help you use tools like Acrobat more effectively in your document projects.

How do PDFs work?

Simple PDF

At its core, a PDF is like a binder or folder containing individual pages. You can add pages to a PDF, split pages, and move pages from one PDF to another, similar to handling paper pages in a binder.

PDFs also contain data that applies to the whole document, known as document-level data. This includes security settings, metadata, and other document-wide properties.

Think of a paper binder with a lock on it and information written on the inside or outside of its cover. In an electronic PDF, the security settings play the role of the lock and the metadata the role of the cover notes.

PDF file
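The binder analogy can be sketched in a few lines of Python. The `Page` and `PdfDocument` classes below are hypothetical and not part of any real PDF library; they model only the idea that a PDF is an ordered list of pages plus one set of document-level data. Real tools work on the file's object structure instead.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """One sheet in the binder (hypothetical model, not a real library type)."""
    label: str


@dataclass
class PdfDocument:
    """A binder: an ordered list of pages plus document-level data."""
    pages: list[Page] = field(default_factory=list)
    # Document-level data applies to the whole file, not to any single page.
    doc_data: dict[str, str] = field(default_factory=dict)

    def add_page(self, page: Page) -> None:
        self.pages.append(page)

    def split(self, index: int) -> "PdfDocument":
        """Remove the pages from `index` onward and return them as a new document."""
        tail = PdfDocument(pages=self.pages[index:], doc_data=dict(self.doc_data))
        del self.pages[index:]
        return tail

    def move_page_to(self, index: int, other: "PdfDocument") -> None:
        """Take one page out of this binder and append it to another."""
        other.pages.append(self.pages.pop(index))


doc = PdfDocument(doc_data={"Title": "Report"})
for name in ("cover", "intro", "results", "appendix"):
    doc.add_page(Page(name))

appendix = doc.split(3)        # doc keeps cover, intro, results
doc.move_page_to(1, appendix)  # "intro" moves to the other binder
print([p.label for p in doc.pages])       # ['cover', 'results']
print([p.label for p in appendix.pages])  # ['appendix', 'intro']
```

In this sketch `split` gives the new document its own copy of the document-level data, on the assumption that each resulting file needs its own metadata; a real tool may handle that differently.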

More to a PDF

There is more to a PDF than that. Let’s take a closer look at the document level.

A PDF can contain:

  • Bookmarks: Bookmarks act as a navigation tool, similar to a table of contents.
  • Security data: This controls access to the document.
  • File attachments: Other files can be embedded in the PDF, so it can act as a container, much like a zip archive.
  • Document scripts: Scripts at the document level are triggered by document-level events, such as opening or printing the PDF.
  • Form fields and data: Even though users interact with form fields on the pages, the fields are stored at the document level. Each field is global to the entire document, while its widgets are the local appearance and user interface for that field on specific pages.
  • Document metadata: This includes information such as author, title, and keywords.
  • Various resources: These include fonts, color spaces, images, videos, and more, used in other parts of the document.
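In the file itself, most of these items hang off the document catalog (the root object of the PDF) or the trailer that points to it. The sketch below assumes the PDF has already been parsed into nested Python dicts, with PDF key names written without their leading slash. The `document_level_features` function is a hypothetical helper that checks only the common locations. Resources are left out because they are not a single catalog entry: they are attached to the pages and other objects that use them.

```python
def document_level_features(trailer: dict) -> list[str]:
    """Report which document-level features a parsed PDF contains (simplified)."""
    catalog = trailer.get("Root", {})  # the document catalog, root of the object tree
    names = catalog.get("Names", {})   # name trees, e.g. for attachments and scripts
    found = []
    if "Outlines" in catalog:          # bookmarks form the outline tree
        found.append("bookmarks")
    if "Encrypt" in trailer:           # encryption settings and permissions
        found.append("security")
    if "EmbeddedFiles" in names:
        found.append("file attachments")
    if "JavaScript" in names or "AA" in catalog:  # scripts / document event actions
        found.append("document scripts")
    if "AcroForm" in catalog:          # the form's master field list
        found.append("form fields")
    if "Metadata" in catalog or "Info" in trailer:  # XMP stream or Info dictionary
        found.append("metadata")
    return found


sample = {
    "Root": {
        "Type": "Catalog",
        "Pages": {"Type": "Pages", "Kids": [], "Count": 0},
        "Outlines": {"Type": "Outlines", "Count": 0},
        "AcroForm": {"Fields": []},
    },
    "Info": {"Title": "Quarterly report", "Author": "A. Writer"},
}
print(document_level_features(sample))  # ['bookmarks', 'form fields', 'metadata']
```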

The pages of a PDF are what the user sees and interacts with. A rendering engine draws each page's content, and to do so it needs resources such as fonts, color space definitions, and images. These resources are stored in the PDF itself, which is what makes the format portable. Fonts are the exception: they do not always have to be embedded in the PDF.

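When a font is embedded, its data is stored inside the PDF. When it is not, the viewer (Acrobat, for example) looks for the font on the user's system or substitutes a fallback font. A small set of standard fonts, such as Helvetica and Times, can be left out of the file because viewers are expected to supply them. In these cases, the PDF is not completely self-contained.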
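Whether a font is embedded can be read from its font dictionary: an embedded font program sits in the font descriptor under `FontFile`, `FontFile2`, or `FontFile3`, depending on the font format. The helpers below are a hypothetical sketch over PDF dictionaries already parsed into Python dicts; they ignore details such as indirect references and encodings.

```python
EMBED_KEYS = ("FontFile", "FontFile2", "FontFile3")  # font-program keys in a descriptor


def is_font_embedded(font: dict) -> bool:
    """True if the font dictionary carries its own font program (simplified)."""
    subtype = font.get("Subtype")
    if subtype == "Type3":
        # Type 3 glyphs are drawn with PDF operators stored in the file itself.
        return True
    if subtype == "Type0":
        # Composite fonts keep the font program on their single descendant font.
        font = font["DescendantFonts"][0]
    descriptor = font.get("FontDescriptor", {})
    return any(key in descriptor for key in EMBED_KEYS)


def non_embedded_fonts(resources: dict) -> list[str]:
    """Names of fonts in a page's resources that the viewer must find elsewhere."""
    return [
        font.get("BaseFont", resource_name)
        for resource_name, font in resources.get("Font", {}).items()
        if not is_font_embedded(font)
    ]


page_resources = {
    "Font": {
        "F1": {"Subtype": "Type1", "BaseFont": "Helvetica"},
        "F2": {"Subtype": "TrueType", "BaseFont": "Arial",
               "FontDescriptor": {"FontName": "Arial", "FontFile2": b"<font program>"}},
    }
}
print(non_embedded_fonts(page_resources))  # ['Helvetica']
```

Helvetica is one of the standard PDF fonts that viewers provide themselves, so a file can use it without embedding it.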

Types of elements

A page holds two types of elements: static page content and a list of annotations. The static content is the main body of the document: all the regular text, graphics, and images.

Annotations are special elements that the user can interact with, such as form field widgets, comments and markup, and multimedia. Unlike static content, annotations do not have to be visible. For example, a link is an annotation that occupies an area of the page but may have no visible appearance.

For example, a circle annotation might appear as a red circular line. Inside the PDF, both page content and annotations are described in the same vector graphics language. The rendering engine draws the page content first and then the annotations, in the order in which they are listed for the page. This layering makes annotations appear to float above the page content.
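The layering rule can be expressed as a short function. This is a simplified, hypothetical model: a page is a Python dict whose `Annots` list holds annotation dicts, and an annotation is drawn only if it has an appearance (`AP`) and its flags (`F`) do not mark it as hidden. Real renderers handle many more cases, such as generating appearances that are missing.

```python
HIDDEN = 1 << 1   # annotation flag bit 2 ("Hidden")
NO_VIEW = 1 << 5  # annotation flag bit 6 ("NoView")


def draw_order(page: dict) -> list[str]:
    """List what a renderer paints, bottom layer first (simplified model)."""
    steps = ["page content"]  # static content is always painted first
    for annot in page.get("Annots", []):  # then annotations, in list order
        if annot.get("F", 0) & (HIDDEN | NO_VIEW):
            continue  # flagged as not displayed on screen
        if "AP" not in annot:
            continue  # e.g. a link without appearance: clickable area, nothing drawn
        steps.append(f"{annot['Subtype']} annotation")
    return steps


page = {
    "Contents": b"...",
    "Annots": [
        {"Subtype": "Link", "Rect": [72, 700, 200, 720]},
        {"Subtype": "Circle", "Rect": [100, 100, 200, 200], "AP": {"N": b"..."}},
        {"Subtype": "Text", "Rect": [300, 300, 320, 320], "AP": {"N": b"..."}, "F": HIDDEN},
    ],
}
print(draw_order(page))  # ['page content', 'Circle annotation']
```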

PDF Static Content and Annotations

Annotations provide dynamic and interactive features to the PDF. They are the only elements on a page that respond to user actions, such as keystrokes and mouse clicks. For instance, a circle annotation can be selected, moved, and resized.

Different types of annotations offer different interactions. A note annotation prompts the user to enter text and can be moved but not resized. Each annotation type responds to user input in its own way, adding interactivity while appearing above the main page content.
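How each annotation type reacts is largely decided by the viewer application rather than written into the page content. The table below is a hypothetical interaction policy that mirrors the examples in the text; the action names and the `handle` function are illustrative only, while the keys are real annotation subtype names.

```python
# Hypothetical interaction policy keyed by annotation subtype.
# Real viewers implement their own rules; this only mirrors the text's examples.
INTERACTIONS = {
    "Circle": {"select", "move", "resize"},
    "Text": {"select", "move", "edit text"},  # note annotation: movable, fixed size
    "Link": {"follow"},
    "Widget": {"fill in"},
}


def handle(subtype: str, action: str) -> str:
    """Apply a user action if this annotation type supports it."""
    if action in INTERACTIONS.get(subtype, set()):
        return f"{subtype}: {action}"
    return f"{subtype}: '{action}' ignored"


print(handle("Circle", "resize"))  # Circle: resize
print(handle("Text", "resize"))    # Text: 'resize' ignored
```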

Editing PDFs

The page content in a PDF is meant to be static. In Adobe Reader, it cannot be changed, because the viewer has no editing tools; Adobe Acrobat, by contrast, lets you edit it directly. For substantial changes, however, it is best to edit the document in the application originally used to create it.

After making the changes, save the document as a PDF again. This method preserves the document's integrity and helps prevent issues with formatting and content accuracy.

TIP: For quick edits, PDF2Go offers an online solution with its PDF To Word Converter. This tool lets you convert your PDF into an editable Word document, making larger changes easier. Once your edits are complete, you can save the document back to PDF format.

Graphics Operators

Graphics operators are the key to rendering PDF content precisely. They form the core of the graphics language and control everything that appears on a PDF page, from static content such as text to dynamic elements such as annotations.

A vector graphic, the exact description of what is drawn, is built using these operators. They specify important details such as where a line begins and ends, its color, thickness, and other visual attributes. This detailed instruction set ensures that every graphical element in a PDF is accurately reproduced across different viewers and when printing.
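To see graphics operators at work, the Python sketch below writes a tiny one-page PDF whose content stream draws a red line and a line of text. The operators (`RG` for stroke color, `w` for line width, `m` and `l` for move-to and line-to, `S` for stroke, and `BT`/`Tf`/`Td`/`Tj`/`ET` for text) come from the PDF specification. The `build_pdf` helper and the file name `line.pdf` are illustrative assumptions, and the file is deliberately minimal: one page, no compression, and the standard Helvetica font.

```python
from pathlib import Path


def build_pdf(content: bytes) -> bytes:
    """Wrap one content stream in a minimal, uncompressed single-page PDF."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []  # byte offset of each object, needed for the cross-reference table
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is always the head of the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)


# The page content: nothing but graphics operators and their operands.
content = b"\n".join([
    b"1 0 0 RG",   # stroke color: red (RGB components from 0 to 1)
    b"4 w",        # line width: 4 units (1 unit = 1/72 inch by default)
    b"100 500 m",  # move to the line's start point (x=100, y=500)
    b"400 600 l",  # append a straight segment to the end point
    b"S",          # stroke the path with the current color and width
    b"BT /F1 24 Tf 100 650 Td (Hello, PDF) Tj ET",  # text in font F1 at 24 pt
])

pdf = build_pdf(content)
Path("line.pdf").write_bytes(pdf)
```

Every visual detail here (where the line starts and ends, its color and thickness, where the text goes) is spelled out by an operator, which is why any conforming viewer or printer reproduces the same result.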

PDF Structure

The internal structure of a PDF can be pictured as a tree. At the top is the document level (metadata, scripts, security information, and the AcroForm), and beneath it is the set of pages. Each page contains its static content, the resources used to render that content, and a list of annotations.
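The tree can be walked in code. The sketch below again assumes the PDF has been parsed into nested Python dicts. Pages hang from a page tree of `Pages` nodes with `Kids`, and some attributes, such as `Resources` and `MediaBox`, may be set on a parent node and inherited by every page below it. The `iter_pages` function is a hypothetical helper, not a library API.

```python
from typing import Iterator, Optional

INHERITABLE = ("Resources", "MediaBox", "CropBox", "Rotate")  # may come from a parent


def iter_pages(node: dict, inherited: Optional[dict] = None) -> Iterator[dict]:
    """Yield each leaf page in order, with inherited attributes filled in."""
    inherited = dict(inherited or {})
    for key in INHERITABLE:
        if key in node:
            inherited[key] = node[key]  # a closer node overrides its ancestors
    if node.get("Type") == "Pages":
        for kid in node.get("Kids", []):
            yield from iter_pages(kid, inherited)
    else:
        yield {**inherited, **node}


shared = {"Font": {"F1": {"Subtype": "Type1", "BaseFont": "Helvetica"}}}
catalog = {
    "Type": "Catalog",
    "AcroForm": {"Fields": []},
    "Pages": {
        "Type": "Pages",
        "Resources": shared,  # inherited by both pages below
        "MediaBox": [0, 0, 612, 792],
        "Kids": [
            {"Type": "Page", "Contents": b"...", "Annots": []},
            {"Type": "Page", "Contents": b"...",
             "Annots": [{"Subtype": "Circle", "Rect": [50, 50, 150, 150]}]},
        ],
        "Count": 2,
    },
}

for number, page in enumerate(iter_pages(catalog["Pages"]), start=1):
    print(number, list(page["Resources"]["Font"]), len(page["Annots"]), "annotation(s)")
# 1 ['F1'] 0 annotation(s)
# 2 ['F1'] 1 annotation(s)
```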

Annotations draw on these resources too. An annotation with a visual appearance is described in the same vector graphics language as the main page content, so it needs the same kinds of resources (fonts, color spaces, images) to render correctly.
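An annotation's visible form is stored as an appearance stream: a small form XObject with its own bounding box and `Resources`, drawn with the same operators as page content. The sketch below builds the stream for a red circle out of four cubic Bézier curves (the `c` operator), since PDF has no circle operator. The function names and the `Data` key, which stands in for the raw stream bytes, are hypothetical simplifications.

```python
import math

KAPPA = 4 * (math.sqrt(2) - 1) / 3  # control-point factor for a quarter-circle Bézier


def circle_appearance(cx: float, cy: float, r: float, width: float = 2.0) -> bytes:
    """Content stream that strokes a red circle, using ordinary page operators."""
    k = KAPPA * r

    def pt(x: float, y: float) -> str:
        return f"{x:.2f} {y:.2f}"

    ops = [
        "1 0 0 RG",               # red stroke color
        f"{width:.2f} w",         # line width
        f"{pt(cx + r, cy)} m",    # start at the rightmost point
        # four quarter-circle curves, counter-clockwise
        f"{pt(cx + r, cy + k)} {pt(cx + k, cy + r)} {pt(cx, cy + r)} c",
        f"{pt(cx - k, cy + r)} {pt(cx - r, cy + k)} {pt(cx - r, cy)} c",
        f"{pt(cx - r, cy - k)} {pt(cx - k, cy - r)} {pt(cx, cy - r)} c",
        f"{pt(cx + k, cy - r)} {pt(cx + r, cy - k)} {pt(cx + r, cy)} c",
        "S",                      # stroke the path
    ]
    return "\n".join(ops).encode("ascii")


def circle_annotation(cx: float, cy: float, r: float) -> dict:
    """A circle annotation whose normal appearance (AP/N) is a form XObject."""
    bbox = [cx - r - 2, cy - r - 2, cx + r + 2, cy + r + 2]  # padded for line width
    return {
        "Subtype": "Circle",
        "Rect": bbox,
        "C": [1, 0, 0],  # border color: red
        "AP": {"N": {"Type": "XObject", "Subtype": "Form", "BBox": bbox,
                     "Resources": {},  # no fonts or images needed for a plain line
                     "Data": circle_appearance(cx, cy, r)}},
    }


annot = circle_annotation(150, 150, 50)
print(annot["AP"]["N"]["Data"].decode())
```

An appearance that shows text would list its fonts in that `Resources` dictionary, exactly as a page does.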

AcroForm

An AcroForm is the master list of all form fields and their data for the entire document. Each field widget you see on a page is a visual representation of one entry in this list, and a single field can have several widgets, for example the same name field repeated on several pages. In the PDF structure, these widgets are listed among each page's annotations, alongside commenting and markup annotations.

To the rendering engine, all annotations, whether form field widgets or markup, are simply elements to draw. The difference between these types lies in how they behave interactively, not in how they are displayed.
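The field/widget relationship can be modeled with plain dicts as well. In this hypothetical sketch, a text field (`FT` of `Tx`) named by `T` keeps its value `V` once, at document level, while each widget annotation points back to it through `Parent` and sits in a page's `Annots` list next to an ordinary markup annotation. The `add_widget` and `widget_values` helpers are illustrative, not a real library API.

```python
# One text field ("Name") shown on two pages: one field, two widgets.
name_field = {"FT": "Tx", "T": "Name", "V": "Ada Lovelace", "Kids": []}
acroform = {"Fields": [name_field]}  # document-level master list of fields

page1 = {"Type": "Page", "Annots": [{"Subtype": "Circle", "Rect": [0, 0, 10, 10]}]}
page2 = {"Type": "Page"}


def add_widget(field: dict, page: dict, rect: list) -> dict:
    """Create a widget annotation for `field` on `page` (simplified model)."""
    widget = {"Subtype": "Widget", "Rect": rect, "Parent": field, "P": page}
    field["Kids"].append(widget)                    # field knows its widgets
    page.setdefault("Annots", []).append(widget)    # page lists it like any annotation
    return widget


def widget_values(page: dict) -> list[str]:
    """Values shown by the page's widgets, each read from its parent field."""
    return [a["Parent"]["V"] for a in page.get("Annots", []) if a["Subtype"] == "Widget"]


add_widget(name_field, page1, [72, 700, 300, 720])
add_widget(name_field, page2, [72, 50, 300, 70])

name_field["V"] = "Grace Hopper"  # change the value once, at document level
print(widget_values(page1), widget_values(page2))  # ['Grace Hopper'] ['Grace Hopper']
```

In this model a single change to the field updates what every widget shows, which is why form data belongs at the document level while widgets belong to pages.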

In Conclusion

Understanding the structure and capabilities of PDFs helps you make full use of them, whether you are creating forms, securing documents, or sharing information reliably. With reliable PDF tools, you can explore and use the powerful features of this common format.