Are your Portable Document Format files all that?
Tuesday, 29 September 2009
Like most people who are trying to archive reams of paper, the one reliable tool I always turn to is Adobe Portable Document Format.
I trust my digital life to PDF. Almost everything I scan and most documents I write eventually end up squirreled away somewhere as PDF documents.
Have you ever considered just how portable those documents really are?
What’s wrong with PDF?
It seems strange to question the portability of these files, doesn’t it?
For the past ten or fifteen years Adobe has been providing Acrobat Reader and singing the wonders of its new universal document format. And it seemed to be all that, too. Regardless of the audible groan we give when Acrobat launches after we click a link, isn’t it amazing that we can download press-ready copies of our income tax forms that are guaranteed to look exactly the same when you print them as when I print them? Read on to see what dangers lurk within.
What’s the Problem?
To understand the nature of the PDF portability issues, one need only look as far as the web browser for an analogy. Consider how the web browser went from a barebones tool that could display a simple language, HTML, in a neutral way, fitting the web content onto each user’s screen, to a memory-hogging behemoth that is an integral part of your operating system. It didn’t happen all at once; it has been death by a thousand cuts.
Mirroring the evolution of web browsers, the PDF document standard has adapted over the years to include many bells and whistles such as embedded audio, video, and JavaScript. It is these features that chip away at the core purpose and raison d’être of the PDF standard.
An example: Font Issues
A simple example of the weakness of these extended PDF features is the humble text font. When your application generates a PDF document, it can use the 14 standard PDF fonts, fonts installed on the local machine, or embedded TrueType (TTF) or PostScript fonts.
There are 14 standard fonts that should be available by default in each PDF reader. These fonts are Courier, Courier Bold, Courier Italic (Oblique), Courier Bold and Italic, Helvetica, Helvetica Bold, Helvetica Italic (Oblique), Helvetica Bold and Italic, Times Roman, Times Roman Bold, Times Roman Italic, Times Roman Bold and Italic, Symbol and ZapfDingBats® (source)
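To make the distinction concrete, here is a minimal sketch that sorts a list of font names into "safe by default" and "needs embedding". It assumes you already have the PostScript names of the fonts a document uses (the standard 14 are known by names such as Helvetica-Bold and Times-Roman). The function name fonts_needing_embedding and the sample name MonaLisaSolidITC are made up for illustration.

```python
# PostScript names of the 14 standard PDF fonts.
STANDARD_14 = frozenset({
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Symbol", "ZapfDingbats",
})


def fonts_needing_embedding(font_names):
    """Return the names that are not standard-14 fonts, sorted and de-duplicated."""
    return sorted(set(font_names) - STANDARD_14)


# "MonaLisaSolidITC" is a hypothetical PostScript name used only as sample input.
used = ["Helvetica", "MonaLisaSolidITC", "Times-Roman", "MonaLisaSolidITC"]
print(fonts_needing_embedding(used))
```

Anything this returns has to travel inside the file, or the reader will substitute whatever it has on hand.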
Guess what happens when you set your document in Mona Lisa Solid ITC and then print to PDF and send to all of your colleagues? Does your friend’s machine have a copy of this font? Maybe, and maybe not.
As I was writing this, I planned on putting together a cute demo by saving a document set in Mona Lisa Solid ITC as a PDF on my Mac and then opening it on a PC. Much to my surprise (and delight), I found that the default “Print to PDF” functionality on my Mac does, in fact, embed the font within the document.
Regardless, if you have always just trusted that the fonts would be identical across platforms, you could get quite a surprise when your friend tries to print your beautiful document.
PDF/A Standard
Some time back, Adobe recognized the need for a more tightly controlled standard for creating really portable documents, instead of merely portable ones. This standard, published in 2005 as ISO 19005-1, is referred to as PDF/A, where the A stands for Archive.
A key element to … reproducibility is the requirement for PDF/A documents to be 100 % self-contained. All of the information necessary for displaying the document in the same manner every time is embedded in the file. This includes, but is not limited to, all content (text, raster images and vector graphics), fonts, and color information. A PDF/A document is not permitted to be reliant on information from external sources (e.g. font programs and hyperlinks). (Wikipedia)
Basically PDF/A forbids all of the flashy stuff and sticks to the basics: good solid document rendering.
Banned features include:
- Audio and Video
- JavaScript
- Encryption
- Nonstandard metadata
- Transparent images
In addition to the loss of several features, PDF/A documents can be somewhat larger, due to the embedded fonts, and they might have rendering issues with images that depend on transparency.
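If you are curious whether an existing file leans on any of the banned features, a crude first pass is to scan the raw bytes for the PDF name tokens that usually accompany them. The sketch below is only a heuristic under stated assumptions: the marker table is my own illustrative selection, not a complete rule set, it can miss anything stored inside compressed object streams, and it can misfire on binary data that happens to contain the same bytes. It is no substitute for a real PDF/A validator.

```python
import re

# Raw PDF name tokens that hint at features PDF/A forbids.
# This table is an illustrative assumption, not a complete rule set.
BANNED_MARKERS = {
    "JavaScript": rb"/JavaScript\b",
    "Encryption": rb"/Encrypt\b",
    "Audio or video": rb"/(?:Sound|Movie)\b",
}


def find_banned_features(pdf_bytes):
    """Return the sorted labels of the banned-feature markers found in the raw bytes."""
    return sorted(
        label
        for label, pattern in BANNED_MARKERS.items()
        if re.search(pattern, pdf_bytes)
    )


# A tiny hand-written fragment standing in for a real file's contents.
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj << /S /JavaScript /JS (app.alert(1)) >> endobj\n"
    b"trailer << /Encrypt 5 0 R >>"
)
print(find_banned_features(sample))
```

For a real document you would pass in the result of open(path, "rb").read() instead of the sample bytes.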
With all that, it still sounds like an enticing concept. Many PDF tools speak fluent PDF/A. Check out your own toolkit and see if you can future-proof your documents a little more.
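One quick way to see what your toolkit actually produced: a PDF/A file declares its conformance in its XMP metadata, using the pdfaid:part and pdfaid:conformance properties. The sketch below looks for that declaration in the raw bytes. The function name claimed_pdfa_level is my own, and the check only reports what the file claims about itself; it does not verify that the file really follows the rules.

```python
import re

# Match both XMP spellings: pdfaid:part="1" and <pdfaid:part>1</pdfaid:part>.
_PART = re.compile(rb"pdfaid:part\s*(?:=\s*[\"']|>)\s*(\d)")
_CONF = re.compile(rb"pdfaid:conformance\s*(?:=\s*[\"']|>)\s*([A-Za-z])")


def claimed_pdfa_level(pdf_bytes):
    """Return a string such as 'PDF/A-1b' if the file claims PDF/A, else None."""
    part = _PART.search(pdf_bytes)
    if part is None:
        return None
    conf = _CONF.search(pdf_bytes)
    level = conf.group(1).decode("ascii").lower() if conf else ""
    return "PDF/A-" + part.group(1).decode("ascii") + level


# A hand-written XMP fragment standing in for a real file's metadata.
xmp = (
    b'<rdf:Description xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">'
    b"<pdfaid:part>1</pdfaid:part>"
    b"<pdfaid:conformance>B</pdfaid:conformance>"
    b"</rdf:Description>"
)
print(claimed_pdfa_level(xmp))
```

A result of None means the file makes no PDF/A claim at all, which is a good hint that your export settings need another look.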
Here’s more on PDF/A documents
Long-term digital archiving with PDF/A (The PDF Blog)
PDF/A (Wikipedia)
PDF/A – A new Standard for Long-Term Archiving (PDF/A Competence Center)



No. 1 — September 30th, 2009 at 6:56 am
Great blog to let folks know about PDF/A and the importance of embedding fonts (and other things) in their PDFs.
Just wanted to remind you and your readers that as of 2007, Adobe no longer owns PDF. We turned it over to the ISO (International Organization for Standardization) so that it is now a TRUE OPEN International Standard known as ISO 32000.
Leonard Rosenthol
PDF Standards Architect
Adobe Systems