It’s all about searching
Thursday, 5 February 2009
The first time I searched for a random scanned document on my computer (and found it), I realized that search is one of the greatest benefits of the paperless home. This is one area where traditional paper falls short. And with the free products available today, there’s no need to go without.
In this article, I describe two searching options, one for the Mac and one for the PC.
A few days ago I wrote a post about five minutes in the paperless life, where I described registering my wife’s car and dealing with the associated paperwork.
As I filled out the form, I needed my insurance information, but had no idea where it was. I simply hit ⌘-Space and typed in my insurance company’s name, immediately bringing up all documents that mention the company, including my recent policy renewal.
Until the past couple of years, such full-system document searching on a PC or Mac was hit-or-miss, slow and unreliable. Anyone serious about finding their documents needed special document indexing software.
Today, anyone can begin effortless full-system searching for free on either operating system.
Spotlight
If you are on a Macintosh running Tiger or Leopard, you have access to an excellent document indexing tool that is built right into your operating system. Just click the little magnifying glass in the upper-right corner of the screen and start typing; you will see all kinds of documents, files, applications, and so on that match your search.
This is especially nice because there is no sluggish indexing process running all the time. Mac OS X provides filesystem events that tell any interested application that “something has changed over here on the disk,” so there is no need to laboriously crawl the filesystem looking for changes.
Spotlight will always search filenames and any metadata you have added to the file using the Finder.
If your documents are PDFs that have been OCR’d, Spotlight will search the entire contents of every PDF on your system for the search string. This is good.
Of course, any word processing documents are indexed and searched as well.
It’s a thing of beauty when you type in some obscure account number and see five documents pop up that happen to mention the number.
If you prefer the keyboard to the mouse, you can hit ⌘-Space to bring up Spotlight.
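To make the idea concrete, here is a minimal Python sketch of the kind of index a desktop search tool might keep. This is not how Spotlight is actually implemented; the class name `TinyIndex`, its methods, and the sample file names and account number are all made up for illustration. It shows two things described above: file names and contents are both searchable, and when a "this file changed" notification arrives, only that one file is re-indexed instead of crawling the whole disk.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index (hypothetical): token -> set of paths containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # token -> {path, ...}
        self.tokens_by_path = {}          # path -> tokens currently indexed

    @staticmethod
    def tokenize(text):
        # Split into lowercase runs of letters and digits.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def remove(self, path):
        # Forget everything previously indexed for this path.
        for token in self.tokens_by_path.pop(path, set()):
            self.postings[token].discard(path)

    def update(self, path, text):
        """(Re)index a single file, e.g. after a change notification."""
        self.remove(path)
        # Index the file name as well as the (OCR'd) content.
        tokens = self.tokenize(path) | self.tokenize(text)
        self.tokens_by_path[path] = tokens
        for token in tokens:
            self.postings[token].add(path)

    def search(self, query):
        """Return the sorted paths that contain every token in the query."""
        tokens = self.tokenize(query)
        if not tokens:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        return sorted(hits)


# Made-up documents; the text stands in for what OCR would extract.
index = TinyIndex()
index.update("scans/policy_renewal.pdf", "Acme Insurance policy 44710093 renewal")
index.update("letters/address_change.doc", "Please update account 44710093")
index.update("scans/receipt.pdf", "Hardware store receipt")

by_account = index.search("44710093")   # both documents mentioning the number
by_name = index.search("acme renewal")  # matches on content and file name

# Simulate one "file changed" event: only that file is re-indexed.
index.update("letters/address_change.doc", "Please update my mailing address")
after_change = index.search("44710093")
```

A real indexer also has to extract text from PDFs and word processing files, which is where OCR comes in; the sketch simply assumes that text is already available.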
Google Desktop
On the PC (and the Mac) you can use Google Desktop. This is truly an amazing product, and it’s free. It does exactly what its name says: it puts Google on your desktop, in a very literal sense.
I use this tool regularly at work, where our machines all run Windows XP, and I love it. The only reason I don’t use it on my Mac is that Spotlight is more tightly integrated with the operating system.
Usage is simple: you type in a few words and are immediately presented with a dropdown list of possible document hits. If you want, you can see a full web page, appearing exactly like a Google search page (no surprise there), but with your local documents. In addition, when you do regular Google searches, you will see hits that have been found on your local machine. Very nice.
Just like Spotlight on the Macintosh, Google Desktop searches file names and file content. Again, if you have run OCR on your scanned PDF documents, the content will be searched.
As with Spotlight, there is a hotkey to launch the search. Just hit the Ctrl key twice.
Privacy Issues
It is important for you to understand that your data is easily searchable from your keyboard. For an illustration of what this means, try typing your Social Security number into the search field. If you have scanned in many private documents (e.g. tax returns), you might be surprised how many hits you find.
This is not necessarily a bad thing since quite often you are searching for sensitive documents. Besides, the search tool itself provides an easy way to smoke out these kinds of documents in order to protect them better.
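If you would rather hunt for any number that merely looks like a Social Security number, instead of typing your own into a search box, a small script can do a similar sweep. The following Python sketch is only an illustration under a few assumptions: it works on text you have already extracted (for example, OCR output), it only recognizes the dashed form 123-45-6789, and the function name `find_ssn_like` and the sample documents are invented. It is separate from Spotlight and Google Desktop, neither of which is being scripted here.

```python
import re

# Dashed form only (123-45-6789), and not embedded in a longer run of digits.
SSN_PATTERN = re.compile(r"(?<![\d-])\d{3}-\d{2}-\d{4}(?![\d-])")


def find_ssn_like(documents):
    """Return {name: [matches]} for documents containing SSN-like strings.

    `documents` maps a file name to its already-extracted text.
    """
    hits = {}
    for name, text in documents.items():
        matches = SSN_PATTERN.findall(text)
        if matches:
            hits[name] = matches
    return hits


# Made-up sample data; the number below is a placeholder, not a real SSN.
sample = {
    "tax_return_2007.txt": "Taxpayer SSN: 123-45-6789",
    "recipe.txt": "Bake at 350 for 45 minutes",
    "invoice.txt": "Order 1234-56-78901 shipped",
}
flagged = find_ssn_like(sample)
```

Anything the script flags is a candidate for moving to an encrypted or excluded location. A number written without dashes would slip through, so treat the result as a starting point and not a guarantee.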
Both tools provide the ability to tweak the kind of content searched as well as the ability to identify which portions of your hard drive are accessible and which are private.
If you are very serious about locking down confidential files, you need to carefully study the security concerns surrounding indexing tools. For example, here’s an article discussing how tools such as Google Desktop might circumvent the privacy of encrypted volumes offered by TrueCrypt (a popular open-source encryption product).
Key Points
Both products are free, and if you have never used either of them, you will be amazed by the results.
These search tools really shine when you run OCR on any scanned documents.
Remember that your data is now searchable by anyone sitting at your machine.
If you aren’t using Google Desktop or Spotlight right now, what are you waiting for?
[Update: I guess I should practice what I preach—a few minutes ago I typed my SSN into the Spotlight search field and was quite surprised to find a decade-old Word document. It was a letter I sent to a former employer giving my address change information. Oops!]