Keeping Your Documents Readable for Years to Come
Monday, 13 July 2009
Whether you are a cube dweller sharing an electronic document with your next door neighbor or a homeowner attempting to catalogue your digital life, you will soon encounter resistance in the form of document incompatibility. What good is a byte-for-byte perfect duplicate of the original if you cannot open it in an application?
My own choice for document format is almost always Portable Document Format (PDF), but rather than just state this, I would like to consider some of the factors involved.
This is the first in a series of articles covering document formats. This article focuses specifically on the distinction between works in progress and finished products.
Two Kinds of Documents
In general, we can consider two broad categories of documents: working documents (works in progress) and archived documents. You can call these by many different names, but the fundamental distinction is still there.
Working documents are the ones that you are still writing. They share some characteristics:
- They must be retained in their original format, such as Microsoft Word.
- The formats are often very specialized. Quite often another tool can import such a document, but you usually lose something in the translation.
- You and your colleagues need to have the same editor software to view and modify the documents.
- They are often short-lived. This phase of a document’s life usually doesn’t last more than a few months (though a template document might be kept for many years).
- A good backup strategy will need a short window between backups; these documents change often, so they should be backed up frequently.
- You may want to consider a document versioning strategy, so you can see how the document appeared at different stages during its life.
Here are some examples:
- Microsoft Word documents
- Visio diagrams
- Photos that you are still retouching
- Audio files that you are in the process of curating (e.g. applying ID3 tags)
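The versioning idea from the list of characteristics above can be as simple as keeping timestamped copies next to the working file. The following is only a minimal sketch, not a replacement for a real version control or backup tool. The function name `snapshot`, the `versions` folder, and the naming pattern are all assumptions made for illustration.

```python
import shutil
from datetime import datetime
from pathlib import Path


def snapshot(document, versions_dir="versions", now=None):
    """Copy `document` into a sibling folder under a timestamped name.

    `versions_dir` and the file naming pattern are illustrative choices,
    not a convention from any particular tool.
    """
    source = Path(document)
    # Allow the caller to pass a fixed time; default to the current time.
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    target_dir = source.parent / versions_dir
    target_dir.mkdir(exist_ok=True)
    target = target_dir / f"{source.stem}-{stamp}{source.suffix}"
    # copy2 copies the contents and tries to preserve file metadata
    # such as the modification time.
    shutil.copy2(source, target)
    return target
```

Calling `snapshot("report.doc")` before each editing session would leave a trail of copies such as `versions/report-20090713-120000.doc`, so you can see how the document looked at each stage.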
Archived documents are read-only, meant to be viewed but never modified. They have their own characteristics:
- They often must be rendered in very precise ways, so each viewer sees the document as intended (consider a 1040 form from the IRS).
- They may be around for a long time.
- These documents should be less tightly bound to a particular software product: think PDF rather than MS Word, or JPG rather than Adobe Photoshop.
- They typically have a wider audience. You may share a work-in-progress with a co-worker or two, but a finished read-only document might be read by hundreds or thousands.
- Any user should be able to read these documents, with little effort.
- Your backup strategy is probably going to be more focused on longevity and less focused on frequency. These documents are in it for the long haul.
Why not start with a simple example?
Here is a screenshot of an application I use in my day job:
Just in case you did not recognize the unmistakable visage of this small molecule, I have labeled it appropriately.
This is an application called ChemDraw from CambridgeSoft, and unless you are a chemist you have probably never heard of it. My molecule is saved as caffeine.cdx in a format that only ChemDraw knows intimately (though there are other similar chemistry tools that can import this file format).
My point is simple: if your friend sent you a copy of caffeine.cdx, how exactly would you open it?
In contrast, here is a more accessible rendition of the same molecule in PDF format. Try it out; you should be able to view the molecule, and zoom in on details.
What if you had to show someone this document five years down the road? Do you want to have to chase down a possibly obsolete version of a very expensive application that might not even run on your operating system?
Some time back I was sifting through some files on an old server at work that apparently had been written by me. Fifteen years ago I was attending night classes and writing many of my English assignments on a VAX running VMS at work (over my lunch break!). I was using some anemic version of WordPerfect that had been ported to VMS. This arrangement saw me safely through college, but was not conducive to long term document storage.
Do you have any idea what VMS directory structures look like? Maybe, and maybe not. Are these files compatible with the contemporary DOS versions of WordPerfect? Maybe.
Could I open these files on a Windows Vista machine in 2009 using Microsoft Word? With luck. What about using Pages from Apple iWork on my Mac running OS X? Doubtful.
Not only do we need to be concerned with special applications that only a select few (with expensive licenses) have, but we also need to consider that the file format might be obsolete beyond hope.
For an exaggerated example, consider the image of punched paper tape at the top of this article. I would have no clue what to do if I were given a roll of this tape.
Which do you keep?
Look at the characteristics of the document types listed above and see which one fits your document best. Quite often you will find yourself keeping both the original document and a PDF rendition. Indeed, this is what many professional document databases do.
If you can’t easily choose one, keep both. In most cases, I have found that I only need the PDF rendition for the long term and I couldn’t care less about the source document.
In the world of the paperless home, much of what we do is store digital copies of old documents for searching and possible reprinting some time in the future. Don’t make the mistake of keeping all of your documents only in their original editable format; you might just find yourself with a digital file that cannot be viewed!
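If you want to check whether you have already made that mistake, a short script can list the editable files that have no PDF rendition beside them. This is a rough sketch under a few assumptions: the PDF sits in the same folder with the same base name, and the set of "working" file extensions below is only an illustrative guess that you would adjust to your own tools. The function name `missing_pdf_renditions` is hypothetical.

```python
from pathlib import Path

# Assumed set of "working" formats (Word, Visio, ChemDraw, Photoshop).
# Adjust this to the editors you actually use.
EDITABLE_SUFFIXES = {".doc", ".vsd", ".cdx", ".psd"}


def missing_pdf_renditions(folder, editable_suffixes=EDITABLE_SUFFIXES):
    """Return editable files under `folder` that have no same-named PDF.

    Assumes the rendition lives next to the original, e.g.
    caffeine.cdx alongside caffeine.pdf. On case-sensitive file
    systems a rendition named caffeine.PDF would not be detected.
    """
    missing = []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.suffix.lower() in editable_suffixes:
            if not path.with_suffix(".pdf").exists():
                missing.append(path)
    return missing
```

Running `missing_pdf_renditions("Documents")` returns a list of paths, which you can print and work through at your leisure, exporting a PDF for each file you care about keeping.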