<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Paper Jammed &#187; PDF</title>
	<atom:link href="http://paperjammed.com/tag/pdf/feed/" rel="self" type="application/rss+xml" />
	<link>http://paperjammed.com</link>
	<description>Has paper taken over your life?</description>
	<lastBuildDate>Wed, 04 Apr 2012 00:42:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>I wish the hackers would leave PDF alone!</title>
		<link>http://paperjammed.com/2010/08/03/i-wish-the-hackers-would-leave-pdf-alone/</link>
		<comments>http://paperjammed.com/2010/08/03/i-wish-the-hackers-would-leave-pdf-alone/#comments</comments>
		<pubDate>Wed, 04 Aug 2010 03:15:59 +0000</pubDate>
		<dc:creator>Tad</dc:creator>
				<category><![CDATA[Paperless Life]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[PDF]]></category>
		<category><![CDATA[Rants]]></category>

		<guid isPermaLink="false">http://paperjammed.com/?p=1028</guid>
		<description><![CDATA[In case I haven&#8217;t made myself clear in other posts, I like PDF documents. I mean I Really Like PDF documents. And I want to be able to treat a PDF file exactly as I would a sheaf of printed pages. Then along comes someone who exploits yet another bug in someone&#8217;s PDF renderer. A [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-full wp-image-1029" title="20100804-50568_3739" src="http://paperjammed.com/wp-content/uploads/2010/08/20100804-50568_3739.png" alt="" width="300" height="133" />In case I haven&#8217;t made myself clear in other posts, I like PDF documents. I mean I Really Like PDF documents.</p>
<p>And I want to be able to treat a PDF file exactly as I would a sheaf of printed pages.</p>
<p>Then along comes someone who exploits yet another bug in someone&#8217;s PDF renderer. A few months ago Acrobat Reader was all over the news. Today I saw that all of the cool kids are <a href="http://www.engadget.com/2010/08/03/jailbreakme-using-pdf-exploit-to-hack-your-iphone-so-could-the/">jailbreaking their iPhones using a simple web site</a> that exploits a PDF defect in mobile Safari in iOS 4.</p>
<p>And if the slick website can inject code that does something as profound as jailbreaking your iPhone, it should be child&#8217;s play for a black hat to use the same thing to take over your iPhone and ring up millions of dollars of charges to some telephone extortion outfit in a remote part of Africa.</p>
<p>I guess all of the fancy PDF features are a double-edged sword. Recall that ActiveX controls and DDT were both amazing and powerful when they were introduced, but the improper use of each has sullied its good name. I just hope that the goal of a pure paper-replacement standard is not lost and that these events do not cause PDF to become a marginalized technology.</p>
]]></content:encoded>
			<wfw:commentRss>http://paperjammed.com/2010/08/03/i-wish-the-hackers-would-leave-pdf-alone/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Showin&#8217; your chops on those piles of sheet music</title>
		<link>http://paperjammed.com/2010/03/29/showin-your-chops-on-those-piles-of-sheet-music/</link>
		<comments>http://paperjammed.com/2010/03/29/showin-your-chops-on-those-piles-of-sheet-music/#comments</comments>
		<pubDate>Tue, 30 Mar 2010 00:34:23 +0000</pubDate>
		<dc:creator>Tad</dc:creator>
				<category><![CDATA[Paperless Life]]></category>
		<category><![CDATA[Scanning]]></category>
		<category><![CDATA[Searching and Indexing]]></category>
		<category><![CDATA[Files and Folders]]></category>
		<category><![CDATA[Music]]></category>
		<category><![CDATA[Organization]]></category>
		<category><![CDATA[PDF]]></category>

		<guid isPermaLink="false">http://paperjammed.com/?p=957</guid>
		<description><![CDATA[Show me a musician and I&#8217;ll show you someone who has at least a three foot stack of sheet music squirreled away somewhere. My situation is worse—both my wife and I are musicians, to one degree or another. Throw in the fact that she is a music teacher and you can imagine just how many [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-medium wp-image-959" title="Hollow Body" src="http://paperjammed.com/wp-content/uploads/2010/03/iStock_000000536065XSmall-300x257.jpg" alt="iStockphoto" width="300" height="257" />Show me a musician and I&#8217;ll show you someone who has at least a three foot stack of sheet music squirreled away somewhere.</p>
<p>My situation is worse—both my wife and I are musicians, to one degree or another. Throw in the fact that she is a music teacher and you can imagine just how many pages of sheet music there are filling bins and flexing cheap shelving in my house.</p>
<p><strong>What do I have and Where is it?</strong></p>
<p>The biggest problem we face is knowing what we have and where it is. I have hundreds and hundreds of pages of classical and jazz guitar sheet music, but if I need to find Villa-Lobos&#8217; <em>Ch&#244;ros No. 1</em>, where do I look?<span id="more-957"></span></p>
<p>Shortly after I bought my ScanSnap, I began scanning in all of my sheet music (I have left much of my wife&#8217;s collection untouched—I&#8217;m sure you&#8217;ll understand). In most cases, I simply hacked the spine off of the original book and fed the sheets through the scanner. Now, I have less paper in the house and my music is searchable.</p>
<p>In most cases I didn&#8217;t bother to run OCR on the documents since there is little in the way of printed words on most sheet music that is worth indexing. I did take care to name the files well.</p>
<p>If you ever hope to find your music on your computer, make sure you include at least the composer/artist and song title in the file name.</p>
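<p>As a rough illustration of that naming advice, here is a minimal Python sketch. The <code>Composer - Title.pdf</code> pattern and the function name <code>sheet_music_filename</code> are assumptions made for the example, not a convention from this post; any consistent pattern that carries both pieces of information would do.</p>

```python
import re


def sheet_music_filename(composer, title, extension="pdf"):
    """Build a 'Composer - Title.pdf' style name (the pattern is an assumption)."""

    def tidy(text):
        # Collapse runs of whitespace and trim the ends.
        return re.sub(r"\s+", " ", text).strip()

    return f"{tidy(composer)} - {tidy(title)}.{extension}"


print(sheet_music_filename("Miles Davis", "Four"))
```

<p>With the composer first, an ordinary alphabetical folder listing groups each composer&#8217;s pieces together, and a plain filename search for either the composer or the title will find the file.</p>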
<p><strong>Is this really cutting down on paper?</strong></p>
<p>Whenever I find what I&#8217;m looking for I might play it directly off of the computer screen, but it is more likely that I&#8217;ll print it out. Doesn&#8217;t this kind of negate the idea of removing paper from my home? Not really. Think about it: most sheet music is never played. We have books with hundreds of songs in them and we play only a handful. That&#8217;s just the way it is.</p>
<p>The fact that I print out five or ten pages in a month does not negate the many hundreds of pages that were scanned and then recycled.</p>
<p><strong>Great for Music Lessons</strong></p>
<p>I started taking jazz lessons again a month or two back, and my teacher gave me some lead sheets, with all kinds of useful annotations on them. As soon as I was home, I scanned those babies in, so I would not risk losing the valuable information. I also went through all of my notes from prior lessons and scanned them in as well. These kinds of things are precisely the sorts of paper that tend to get lost in some mishmash of unsorted music.</p>
<p>Now, I can type in &#8220;Four&#8221; in my favorite PDF library application and find the lead sheet for Miles Davis&#8217; <em>Four</em>.</p>
<p><img class="alignnone size-full wp-image-958" title="20100329-yep" src="http://paperjammed.com/wp-content/uploads/2010/03/20100329-yep.png" alt="" width="535" height="404" /></p>
<p>Maybe you don&#8217;t have that many notebooks full of music lesson notes, but when you have been trying (poorly) to learn for as many years as I have, those notebooks begin to proliferate. Just scan them all in, give them some good filenames, add some keywords to help, and you&#8217;re in business.</p>
<p><strong>What about copyright?</strong></p>
<p>It seems that the jury is still out on digitizing works you own. There&#8217;s <a href="http://www.wired.com/gadgetlab/2009/12/diy-book-scanner/">one fellow who made a right awesome device</a> for scanning in textbooks in minutes, by photographing the pages. That guy&#8217;s machine has spurred much debate about whether or not you have the right to digitize your own stuff.</p>
<p>On the one hand, you bought the book and paid for it, so it would seem that fair use covers this; on the other hand, publishers are eager to monetize digital media, reselling the same works to you if they can.</p>
<p>So, is Daniel Reetz&#8217;s butt-kickin&#8217; book scanner legal?</p>
<p style="padding-left: 30px;">That would depend on who you talk to, says Pamela Samuelson, a professor at University of California at Berkeley, who specializes in digital-copyright law. Trade publishers are almost certain to cry copyright infringement, she says, though it may not necessarily be the case.</p>
<p style="padding-left: 30px;">Google was recently forced to pay $125 million to settle with angry book publishers and authors who claimed copyright infringement as a result of the search giant’s book-scanning project.</p>
<p style="padding-left: 30px;">But not so individual users who already own the book, says Samuelson. If you scan a book that you have already purchased, it is “fine, and fair use,” she says. “Personal-use copying should be deemed to be fair, unless there is a demonstrable showing of harm to the market for the copyright at work,” says Samuelson.</p>
<p style="padding-left: 30px;">(<a href="http://www.wired.com/gadgetlab/2009/12/diy-book-scanner/">Source</a>: wired.com)</p>
<p>Here&#8217;s another take on this:</p>
<p style="padding-left: 30px;">Question</p>
<p style="padding-left: 60px;">I bought a book for school, can I make a copy of the book for my own use to write on so I don&#8217;t write in the book and can get my money back when I return the book to the campus store.</p>
<p style="padding-left: 30px;">Accepted Answer</p>
<p style="padding-left: 60px;">You have the right to make a copy of the book you purchased as long as you are using the copy for your personal use. The copyright laws merely prevent you from making copies to sell or distribute.</p>
<p style="padding-left: 30px;">(<a href="http://www.justanswer.com/questions/2heyq-i-bought-a-book-for-school-can-i-make-a-copy-of-the-book-for">Source</a>: justanswer.com)</p>
<p>Of course, if you go passing your PDF documents around to all of your friends, all bets are off.</p>
<p><strong>Final thoughts</strong><br />
Music is a hobby that seems to accumulate great stacks of paper, but these music sheets are peculiar in that you only need one or two out of every hundred. Why not digitize the whole lot and keep those book shelves from sagging?</p>
]]></content:encoded>
			<wfw:commentRss>http://paperjammed.com/2010/03/29/showin-your-chops-on-those-piles-of-sheet-music/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A handful of sweet freebie tools to save the day</title>
		<link>http://paperjammed.com/2010/03/16/a-handful-of-sweet-freebie-tools-to-save-the-day/</link>
		<comments>http://paperjammed.com/2010/03/16/a-handful-of-sweet-freebie-tools-to-save-the-day/#comments</comments>
		<pubDate>Wed, 17 Mar 2010 03:31:14 +0000</pubDate>
		<dc:creator>Tad</dc:creator>
				<category><![CDATA[Searching and Indexing]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Workflow]]></category>
		<category><![CDATA[Geeky]]></category>
		<category><![CDATA[Macros]]></category>
		<category><![CDATA[PDF]]></category>
		<category><![CDATA[Scripting]]></category>
		<category><![CDATA[Tips]]></category>
		<category><![CDATA[Windows]]></category>

		<guid isPermaLink="false">http://paperjammed.com/?p=930</guid>
		<description><![CDATA[It so happens that my employer has made a most welcome decision to replace the aging creaky old Novell GroupWise mail software with Microsoft Outlook, joining the rest of the modern corporate world. Now, there is little love in my heart for GroupWise, but it does have one feature that the new Outlook configuration will [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-medium wp-image-935" title="iStock_000000846660XSmall" src="http://paperjammed.com/wp-content/uploads/2010/03/iStock_000000846660XSmall-300x199.jpg" alt="" width="300" height="199" />It so happens that my employer has made a most welcome decision to replace the aging creaky old Novell GroupWise mail software with Microsoft Outlook, joining the rest of the modern corporate world. Now, there is little love in my heart for GroupWise, but it does have one feature that the new Outlook configuration will lack: you can keep as many emails as you want, just like Gmail.</p>
<p>The problem is this: with Outlook we will be limited to 1000 messages in our in-box; sadly, many of us have tens of thousands of emails in our old GroupWise mail. Even after a fairly rigorous slash and burn mission, hacking out all of the low hanging fruit, there will be many thousands remaining and I don&#8217;t want to lose that information. It might be useful to search and find how I set up a Zebra bar code printer in 2003, no?</p>
<p>A bundle of different freeware glue tools came to my rescue. Read on to hear about the toolset that has made it so I can keep those messages for years to come.<span id="more-930"></span></p>
<p><strong>Possible Solutions</strong></p>
<p>Right out of the gate, I began looking for ways to migrate messages from one mail client to the other. Some apps have this built right in, and if not, there are scripts and utilities out there to do this; but I was hampered by a few key facts:</p>
<ul>
<li>I have no control over the email clients and their configuration. Even if there is a menu option for exporting GroupWise messages from version 7.2, I&#8217;m stuck at 6.4 and cannot use that option.</li>
<li>GroupWise is a minor player in the email world. I&#8217;m not sure if Outlook would import from GroupWise, but I doubt it.</li>
<li>They are <em>replacing</em> the client in one shot. There will be no interim period where both GroupWise and Outlook will be available.</li>
<li>There is no getting around the hard limit of 1000 messages.</li>
<li>I don&#8217;t want to spend money on this.</li>
</ul>
<p>With these constraints in mind, I immediately thought about PDF documents. I then considered the following questions:</p>
<ul>
<li>How do I convert my email to PDF?</li>
<li>How can I do this automatically with thousands of emails?</li>
<li>Once I&#8217;m done, how do I search these documents?</li>
</ul>
<p>Here&#8217;s what I did:</p>
<p><strong>Conversion to PDF</strong></p>
<p>The first part was easy. I downloaded one of the many free print-to-PDF products available.</p>
<p>I chose <a href="http://sourceforge.net/projects/pdfcreator/">PDFCreator</a>, because I am familiar with its use and I know that it <a href="http://paperjammed.com/2009/10/27/dodged-the-corrupt-document-bullet-this-time-just-barely/">does not munge the fonts</a>.</p>
<p>Like many other PDF generation utilities, PDFCreator functions by providing a virtual printer to which any application can print. For example, to make a PDF of a web page, you use the Firefox <strong>Print</strong> menu and select <strong>PDFCreator</strong> from the drop-down list of available printers.</p>
<p>You are provided with a list of metadata fields that you can fill in, and these fields are used in the PDF generation.</p>
<p>Here&#8217;s what the PDFCreator screen looks like:</p>
<p><img class="alignnone size-full wp-image-931" title="20100316-pdfcreator1" src="http://paperjammed.com/wp-content/uploads/2010/03/20100316-pdfcreator1.gif" alt="" width="500" height="367" /></p>
<p><strong>A word of caution:</strong> PDFCreator is free, but you must be careful to deselect their spammy toolbar options in two different places during the installation process. I don&#8217;t like software that comes with preselected toolbars to install (even nice ones like Google&#8217;s) because I&#8217;m certain that 95% of the folks who actually install the toolbar would never have chosen to do so if it were unchecked by default.</p>
<p><strong>Running Everything Automatically</strong></p>
<p>This was the interesting bit. I work with Windows machines at work, so there was no AppleScript option available. So I did the next best thing: I used <a href="http://www.autoitscript.com/autoit3/index.shtml">AutoIT</a>.</p>
<p>I will warn you that AutoIT is pretty much the Windows analog of AppleScript, without the cutesy pseudo English syntax. In other words, you will need to roll up your sleeves and get your hands a little dirty in order to put together a decent AutoIT script.</p>
<p>The payoff comes when you finish your work and compile it into a tight executable that you can share with your friends, allowing them to automate some complex series of button clicks and copy/paste operations.</p>
<p>I walked through the manual process of exporting an email to PDF and listed each action:</p>
<ul>
<li>Get the date, sender, and subject</li>
<li>Create a filename based on date + sender + subject</li>
<li>Launch the <strong>Print</strong> dialog</li>
<li>Select <strong>PDFCreator</strong></li>
<li>Fill in the <strong>Document Title</strong>, <strong>Creation Date</strong>, and <strong>Subject</strong> in the PDFCreator dialog</li>
<li>Fill in the full file path in the Save dialog</li>
</ul>
<p>In addition, I wanted to make the script a little better by adding the following:</p>
<ul>
<li>Check that user has PDFCreator installed</li>
<li>Verify that GroupWise is running and that the user has selected one or more messages</li>
<li>Prompt the user for a target directory before processing the messages</li>
<li>Sanitize the filenames by replacing illegal characters with underscores and truncating to meet maximum filename and path length in Windows</li>
<li>Skip over files that have already been generated, quickly, so that one doesn&#8217;t need to worry about accidentally selecting messages that were already printed</li>
</ul>
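<p>The filename handling in that list can be sketched in a few lines. This is a Python illustration of the idea, not the AutoIT script itself; the function names, the 120-character cap, and the <code>date sender subject</code> layout are all assumptions chosen for the example.</p>

```python
import re
from pathlib import Path

# Characters Windows does not allow in file names, plus control characters.
ILLEGAL = r'[\\/:*?"<>|\x00-\x1f]'
MAX_NAME = 120  # hypothetical cap, chosen to leave room for the folder path


def sanitize(name, max_length=MAX_NAME):
    """Replace illegal characters with underscores, then truncate."""
    cleaned = re.sub(ILLEGAL, "_", name)
    # Windows also dislikes names that end in a space or a dot.
    return cleaned[:max_length].strip(" .")


def pdf_name(date, sender, subject):
    """Build the target file name from date + sender + subject."""
    return sanitize(f"{date} {sender} {subject}") + ".pdf"


def needs_export(target_dir, filename):
    """Return False when the PDF already exists, so it can be skipped quickly."""
    return not (Path(target_dir) / filename).exists()


print(pdf_name("2003-05-12", "J. Smith", "Zebra printer: setup"))
```

<p>Checking for an existing file before touching the print dialog is what makes it safe to reselect messages that were already printed: the expensive GUI steps only run for messages that have no PDF yet.</p>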
<p>There were other adjustments needed, but the process was the same: run the script, hit a problem, tweak the script a little to address the problem, and repeat.</p>
<p>Here&#8217;s a little bit of the AutoIT script:</p>
<p><img class="size-full wp-image-943 alignnone" title="20100316-autoit" src="http://paperjammed.com/wp-content/uploads/2010/03/20100316-autoit.gif" alt="" width="500" height="345" /></p>
<p>You can see that it is a bit more intense than AppleScript, but remember that the full script wasn&#8217;t written in one go. I had a short ten-line script that I kept tweaking as small problems cropped up until I had adjusted things to my liking.</p>
<p>Note that this is a GUI macro language. The machine starts clicking and typing away right in front of you and you probably shouldn&#8217;t interfere until your script finishes.</p>
<p>As of this afternoon, I have generated around 4,000 PDF documents for my email messages.</p>
<p><strong>Searching All of Those Documents</strong></p>
<p>This was the easiest part. These days there is an excellent tool available for searching documents on your desktop: <a href="http://desktop.google.com/">Google Desktop</a>. This product indexes every useful file on your desktop and provides a full Google search with a quick double-tap of the &lt;control&gt; key.</p>
<p>So you can enter a search like &#8220;Zebra bar code&#8221;</p>
<p><img class="alignnone size-full wp-image-944" title="20100316-google1" src="http://paperjammed.com/wp-content/uploads/2010/03/20100316-google1.gif" alt="" width="300" height="205" /></p>
<p>And the results look exactly like a Google web search, but it&#8217;s showing your desktop files. And you can see inline previews too.</p>
<p><img class="alignnone size-full wp-image-945" title="20100316-google2" src="http://paperjammed.com/wp-content/uploads/2010/03/20100316-google2.gif" alt="" width="500" height="443" /></p>
<p>Macintosh users can install Google Desktop as well, but all of these files should already be indexed and searchable by Spotlight.</p>
<p><strong>Closing Thoughts</strong></p>
<p>Whenever I reach for tools like this I feel a twinge of guilt—it&#8217;s outright hackery, isn&#8217;t it?</p>
<p>But there is a place for quick and dirty jobs in every workplace. I needed to get my files from one place to another, one time only. It just didn&#8217;t make sense to spend money or time on a more elegant solution.</p>
<p>Play around with each of these tools a little. Especially AutoIT—it&#8217;s a handy Swiss Army Knife to have at your disposal.</p>
]]></content:encoded>
			<wfw:commentRss>http://paperjammed.com/2010/03/16/a-handful-of-sweet-freebie-tools-to-save-the-day/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bring back the old-school way of managing computer folders and documents yourself!</title>
		<link>http://paperjammed.com/2010/01/24/bring-back-the-old-school-way-of-managing-computer-folders-and-documents-yourself/</link>
		<comments>http://paperjammed.com/2010/01/24/bring-back-the-old-school-way-of-managing-computer-folders-and-documents-yourself/#comments</comments>
		<pubDate>Mon, 25 Jan 2010 02:25:40 +0000</pubDate>
		<dc:creator>Tad</dc:creator>
				<category><![CDATA[Paperless Life]]></category>
		<category><![CDATA[Searching and Indexing]]></category>
		<category><![CDATA[Files and Folders]]></category>
		<category><![CDATA[Knowledge Management]]></category>
		<category><![CDATA[Organization]]></category>
		<category><![CDATA[PDF]]></category>
		<category><![CDATA[Photos]]></category>

		<guid isPermaLink="false">http://paperjammed.com/?p=857</guid>
		<description><![CDATA[One of my pet peeves in software is the black-box application that calmly sucks in all of your files and does everything for you, until the day you want to switch apps. This is the iTunes model, followed by many other products. I am of the opinion that rather than allowing an application to shuffle [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-medium wp-image-858" title="iStock_000010275242XSmall" src="http://paperjammed.com/wp-content/uploads/2010/01/iStock_000010275242XSmall-300x225.jpg" alt="" width="300" height="225" />One of my pet peeves in software is the <a href="http://paperjammed.com/2009/03/24/help-my-data-is-being-held-hostage/">black-box application that calmly sucks in all of your files and does everything for you</a>, until the day you want to switch apps. This is the iTunes model, followed by many other products.</p>
<p>I am of the opinion that rather than allowing an application to shuffle your life randomly, you are better off doing it the old-fashioned way and moving your documents into folders of your choosing.</p>
<p>This article discusses some of the advantages of old-school folder management and gives a few hints along the way.<span id="more-857"></span></p>
<p><strong>Why bother?</strong></p>
<p>By creating your own well thought out folder structure, you gain the following advantages:</p>
<ul>
<li>You can find something fairly easily without needing to launch the special app.</li>
<li>You can copy reasonable subsets of your document sets to friends or for backups.</li>
<li>Someone else can find something without needing the special app.</li>
<li>You can place files in a common network drive that others can see, from PC, Linux, or Mac.</li>
<li>You do not lose all of the metadata about your files if the document management app ceases to exist.</li>
</ul>
<p>People have been managing their documents this way for decades, so this is not anything new. What is new, however, is that folks don&#8217;t necessarily see what flexibility they give up when they allow the computer to squirrel things away on their behalf.</p>
<p><strong>What kind of folders?<br />
</strong></p>
<p>In short, pick some categories of documents that you will be filing, and optionally pick a timeframe by which to partition the folders. This mirrors what we do with paper folders, doesn&#8217;t it? We create dozens of manila folders with tabs, and optionally create subsets of these files by date (e.g. Receipts, 2009).</p>
<p>One key difference helps us: Computer folders enjoy one feature that their physical counterparts lack—they can be nested several layers deep.</p>
<p><strong>A few examples are probably in order&#8230;</strong></p>
<p><img class="alignright size-full wp-image-859" title="20100124-file-folders" src="http://paperjammed.com/wp-content/uploads/2010/01/20100124-file-folders.gif" alt="" width="342" height="233" />I like to keep several kinds of scanned documents relating to day to day home paperwork. Over time, it has become clear that I scan lots of receipts, health insurance papers, banking papers, bills, and &#8230; everything else.</p>
<p>As such, I created the following top-level folders: <strong>Banking</strong>, <strong>Bills</strong>, <strong>Health Insurance</strong>, <strong>Receipts</strong>, and <strong>Miscellaneous</strong>.</p>
<p>Over time, they start to get stuffed to the gills with things, especially the Bills and the Receipts folders. My answer to this was to split them out by date. Within each category folder I have subfolders by date, rather than date folders at the top level. This is because some categories need lots of years, while others might not need to be broken down by date at all.</p>
<p>Digital photos are a different creature: I feel that the date of the photo is the most important piece of information, and subject matter is secondary. For this reason, I store my photos in a series of top-level folders labeled with the years.</p>
<p>With photos I have a three-level system: <strong>Year</strong>, <strong>Month</strong>, and <strong>Subject</strong>. For example, within the <strong>Photos</strong> folder there is a <strong>2009</strong> folder. That contains a <strong>2009-02</strong> folder, and that one contains a folder called <strong>Cats</strong>. There are many ways to arrange these; this is the approach I have chosen.</p>
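<p>The Year, Month, Subject layout is regular enough to compute. The following Python sketch shows one way to derive the folder for a photo from its date; the function name <code>photo_folder</code> is made up for the example, and only the folder names (<strong>Photos</strong>, <strong>2009</strong>, <strong>2009-02</strong>, <strong>Cats</strong>) come from the description above.</p>

```python
from datetime import date
from pathlib import PurePosixPath


def photo_folder(root, taken, subject):
    """Return root/YYYY/YYYY-MM/Subject for a photo taken on `taken`."""
    year = f"{taken.year:04d}"
    month = f"{taken.year:04d}-{taken.month:02d}"
    return PurePosixPath(root) / year / month / subject


print(photo_folder("Photos", date(2009, 2, 14), "Cats"))
```

<p>Repeating the year inside the month folder name (<strong>2009-02</strong> rather than <strong>02</strong>) keeps each folder meaningful on its own if it is ever copied out of its parent.</p>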
<p>I like iPhoto as much as anyone, and I use it for my photos. The difference is that, for me, iPhoto only holds a copy of each photo—the original photos are all stored on a NAS using the file structure I describe above.</p>
<p>Put a little thought into it and come up with a system that works for you.</p>
<p><strong>Closing thoughts</strong></p>
<p>We are looking for ease of use here, as well as avoiding lock-in to some proprietary app. We also want it to be easy to back up specific bits of the data and share specific bits.</p>
<p>By looking at my example above, you can see how easy it would be to find a bill from 2009. By <a href="http://paperjammed.com/2009/02/07/pick-a-file-name-style-and-stick-with-it/">following a specific naming convention</a>, you can see that each document is fairly descriptive as well. You don&#8217;t need DEVONthink or its brethren to tell you how to find the Allstate bill from June of 2009. In addition, the folder names are now easily searchable by my operating system, as are the filenames.</p>
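<p>To make that concrete, here is a small Python sketch that finds a document using nothing but folder names and filenames. The sample filenames are hypothetical (the actual naming convention is described in the linked post), and <code>find_documents</code> is a name invented for the example.</p>

```python
from pathlib import PurePosixPath


def find_documents(paths, category, year, keyword):
    """Return paths under category/year whose file name contains keyword."""
    matches = []
    for raw in paths:
        path = PurePosixPath(raw)
        parts = path.parts
        # The last three parts are expected to be category, year, file name.
        in_folder = (
            len(parts) >= 3
            and parts[-3] == category
            and parts[-2] == str(year)
        )
        if in_folder and keyword.lower() in path.name.lower():
            matches.append(raw)
    return matches


# Hypothetical file names; the real naming convention may differ.
sample = [
    "Bills/2009/2009-06-15 Allstate.pdf",
    "Bills/2009/2009-07-02 Electric.pdf",
    "Bills/2008/2008-06-14 Allstate.pdf",
    "Receipts/2009/2009-06-20 Hardware Store.pdf",
]
print(find_documents(sample, "Bills", 2009, "allstate"))
```

<p>No index, database, or special application is involved: the folder structure and the filenames carry all of the information the search needs.</p>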
<p>This might create extra work for you in the beginning, but do you really want to be at the mercy of someone else&#8217;s application?</p>
<p>Oh, and about making those folders? There are applications out there that can generate a bunch of folders for you following your own chosen rules. One I use is <a href="http://www.publicspace.net/BigMeanFolderMachine/index.html">The Big Mean Folder Machine</a>.  I wouldn&#8217;t want to depend on an automatic system for daily use, but as a one-time jump start, tools like this can work wonders.</p>
<p>Don&#8217;t forget to back up your files!</p>
]]></content:encoded>
			<wfw:commentRss>http://paperjammed.com/2010/01/24/bring-back-the-old-school-way-of-managing-computer-folders-and-documents-yourself/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Automate ScanSnap OCR process on your Mac with AppleScript (Snow Leopard Edition)</title>
		<link>http://paperjammed.com/2010/01/04/automate-scansnap-ocr-process-on-your-mac-with-applescript-snow-leopard-edition/</link>
		<comments>http://paperjammed.com/2010/01/04/automate-scansnap-ocr-process-on-your-mac-with-applescript-snow-leopard-edition/#comments</comments>
		<pubDate>Tue, 05 Jan 2010 01:51:52 +0000</pubDate>
		<dc:creator>Tad</dc:creator>
				<category><![CDATA[Software]]></category>
		<category><![CDATA[Workflow]]></category>
		<category><![CDATA[Geeky]]></category>
		<category><![CDATA[PDF]]></category>
		<category><![CDATA[Scanning]]></category>
		<category><![CDATA[Scripting]]></category>
		<category><![CDATA[Searching and Indexing]]></category>

		<guid isPermaLink="false">http://paperjammed.com/?p=840</guid>
		<description><![CDATA[Some time back I published an AppleScript that allows one to automatically run OCR in the background on scanned files generated by your Fujitsu ScanSnap, while you continue scanning more files. ScanSnap owners should all be familiar with this: the out-of-the-box configuration of the ScanSnap Manager and Abbyy Finereader forces the scan and OCR [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://paperjammed.com/wp-content/uploads/2009/08/20090829-applescript.gif"><img class="alignright size-full wp-image-658" title="20090829-applescript" src="http://paperjammed.com/wp-content/uploads/2009/08/20090829-applescript.gif" alt="" width="128" height="128" /></a>Some time back I published an AppleScript that allows one to <a href="http://paperjammed.com/2009/08/29/automate-scansnap-ocr-process-on-your-mac-with-applescript/">automatically run OCR in the background on scanned files</a> generated by your Fujitsu ScanSnap, while you continue scanning more files. ScanSnap owners should all be familiar with this: the out-of-the-box configuration of the ScanSnap Manager and Abbyy Finereader forces the scan and OCR stages to run in lockstep: scan 1&#8230;OCR 1&#8230;scan 2&#8230;OCR 2&#8230; and so on. This script allowed you to scan regardless of the OCR processing going on.</p>
<p>As it turns out, my original script does not work in Snow Leopard, and I promised that I would one day clean up and publish my new and improved version.</p>
<p>Chris posted a comment today as a gentle reminder, so here is the new and improved version without further delay&#8230;<br />
<span id="more-840"></span><br />
<strong>The Details</strong></p>
<p>Unfortunately, Snow Leopard came around <a href="http://paperjammed.com/2009/09/07/when-migrating-to-a-new-operating-system-look-before-you-leap/">and caused some indigestion</a>. For starters, the ScanSnap Manager didn&#8217;t work correctly and Abbyy Finereader would not process anything made by the ScanSnap. A couple of months later <a href="http://paperjammed.com/2009/11/13/snow-leopard-update-for-scansnap/">they got everything straightened out</a> and delivered <a href="http://www.fujitsu.com/us/services/computing/peripherals/scanners/support/sl_download.html">new versions of each product</a>.</p>
<p>The new version of the Abbyy Finereader product does not play well with my original script.</p>
<p>Since I cannot do without this important functionality, I rolled up my sleeves and rewrote most of the script. The new version works quite nicely in Snow Leopard, with one small annoyance: the new Finereader version keeps bouncing the darned icon the whole time it is running, which is quite annoying to watch, so you really don&#8217;t want to use the machine for anything other than scanning or OCR while it is going.</p>
<p>Fortunately, I really don&#8217;t need to use my machine for anything else while it is chewing on the docs; I just wanted to be able to continue scanning at the same time!</p>
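<p>The difference between lockstep and background processing can be shown with a small queue. The Python sketch below is only an illustration of the general idea, not a translation of the AppleScript: scanned pages are dropped into a queue and a single worker runs OCR on them while scanning carries on. The names <code>run_pipeline</code> and <code>ocr_worker</code> are invented for the example, and <code>str.upper</code> stands in for the real OCR step.</p>

```python
import queue
import threading


def run_pipeline(pages, ocr):
    """Scan pages on the calling thread while a worker thread runs OCR."""
    pending = queue.Queue()
    results = []

    def ocr_worker():
        while True:
            page = pending.get()
            if page is None:  # sentinel: no more scans are coming
                break
            results.append(ocr(page))

    worker = threading.Thread(target=ocr_worker)
    worker.start()
    for page in pages:  # "scanning": hand each page off and move on
        pending.put(page)
    pending.put(None)
    worker.join()
    return results


print(run_pipeline(["scan1", "scan2"], str.upper))
```

<p>Because the scanning loop only puts items on the queue, it never waits for OCR to finish; a single worker also means the pages are processed in the order they were scanned.</p>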
<p><strong>Note: </strong>Before going forward, note that you will need to upgrade the ScanSnap Manager and Abbyy Finereader to the Snow Leopard versions first! Get the files <a href="http://www.fujitsu.com/us/services/computing/peripherals/scanners/support/sl_download.html">here</a>.</p>
<p>Here is a link to the <a href="http://paperjammed.com/wp-content/uploads/2010/01/Run-OCR-on-New-Folder-Items.scpt">new script</a>&#8230;</p>
<p>And here&#8217;s the code itself:</p>
<div class="codecolorer-container applescript default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;height:300px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="applescript codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #808080; font-style: italic;">(*<br />
<br />
NOTE: This script was written for Snow Leopard. It may work<br />
on Leopard, but I never tried it.<br />
<br />
This is a folder listener script that will act as a queue, receiving<br />
PDF files from the ScanSnap scanner and feeding them, one by one, to<br />
the Abbyy FineReader OCR software.<br />
<br />
This allows you to keep scanning while the OCR job runs in the background<br />
on all of the unprocessed files.<br />
<br />
Why do we want to do this?<br />
<br />
The ScanSnap Manager software does not support this by default, so<br />
when you scan in a file, it sends it to FineReader for OCR. You then<br />
must wait until FineReader finishes its work before scanning in another<br />
document.<br />
<br />
This script allows you to keep scanning without waiting for OCR.<br />
<br />
Installation:<br />
<br />
o &nbsp; Copy this script to:<br />
<br />
&nbsp; &nbsp; &lt;home&gt;/Library/Scripts/Folder Action Scripts<br />
<br />
&nbsp; &nbsp; You may have to create the &quot;Folder Action Scripts&quot; folder.<br />
<br />
o &nbsp; Open a Finder window and navigate to the parent folder<br />
&nbsp; of the scanned documents folder.<br />
<br />
o Right click (control-click) the scanned documents folder and<br />
&nbsp; choose:<br />
<br />
&nbsp; &nbsp; Folder Actions Setup...<br />
<br />
o At this point if folder actions are not enabled, you will<br />
&nbsp; likely have to enable them and add the script manually.<br />
&nbsp; &nbsp; - check &quot;Enable Folder Actions&quot;<br />
&nbsp; &nbsp; - Use the &quot;+&quot; buttons on the left and right sides to add the<br />
&nbsp; &nbsp; &nbsp; scan folder and then this script.<br />
&nbsp; &nbsp; <br />
o Otherwise, a list of scripts will come up. Choose this script<br />
&nbsp; from the &quot;Choose a Script to Attach&quot; dialog.<br />
<br />
o Close all windows.<br />
<br />
Copyright (C) 2010 Tad Harrison<br />
*)</span><br />
<span style="color: #ff0033; font-weight: bold;">property</span> ocrFileSuffix : <span style="color: #009900;">&quot; processed by FineReader.pdf&quot;</span><br />
<span style="color: #ff0033; font-weight: bold;">property</span> ocrApplicationName : <span style="color: #009900;">&quot;Scan to Searchable PDF&quot;</span><br />
<span style="color: #ff0033; font-weight: bold;">property</span> ocrApplicationWindow : <span style="color: #009900;">&quot;Converting the document&quot;</span><br />
<span style="color: #ff0033; font-weight: bold;">property</span> ocrLockFileName : <span style="color: #009900;">&quot;OCR in Progress&quot;</span><br />
<span style="color: #ff0033; font-weight: bold;">on</span> <span style="color: #0066ff;">adding</span> <span style="color: #0066ff;">folder</span> <span style="color: #0066ff;">items</span> <span style="color: #ff0033; font-weight: bold;">to</span> this_folder <span style="color: #ff0033;">after</span> <span style="color: #0066ff;">receiving</span> added_items<br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> lockFilePath <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #000000;">&#40;</span><span style="color: #0066ff;">POSIX path</span> <span style="color: #ff0033; font-weight: bold;">of</span> <span style="color: #000000;">&#40;</span><span style="color: #0066ff;">path to</span> <span style="color: #0066ff;">desktop</span> <span style="color: #0066ff;">folder</span> <span style="color: #ff0033;">as</span> <span style="color: #0066ff;">text</span><span style="color: #000000;">&#41;</span><span style="color: #000000;">&#41;</span> <span style="color: #000000;">&amp;</span> ocrLockFileName<br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">try</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; logEvent<span style="color: #000000;">&#40;</span><span style="color: #009900;">&quot;=== Run OCR on New Folder Items ===&quot;</span><span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;">-- Test for lockfile; exit if lockfile exists</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">tell</span> <span style="color: #0066ff;">application</span> <span style="color: #009900;">&quot;System Events&quot;</span> <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #ff0033; font-weight: bold;">set</span> lockFileExists <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #0066ff;">exists</span> <span style="color: #0066ff;">file</span> lockFilePath<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">if</span> lockFileExists <span style="color: #ff0033; font-weight: bold;">then</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; logEvent<span style="color: #000000;">&#40;</span><span style="color: #009900;">&quot;Other script running. Exiting...&quot;</span><span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">return</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">else</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #0066ff;">do shell script</span> <span style="color: #009900;">&quot;/usr/bin/touch &quot;</span> <span style="color: #000000;">&amp;</span> <span style="color: #0066ff;">quoted form</span> <span style="color: #ff0033; font-weight: bold;">of</span> lockFilePath<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">end</span> <span style="color: #ff0033; font-weight: bold;">if</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;">-- Main loop</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> moreWorkToDo <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #0066ff;">true</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">repeat</span> <span style="color: #ff0033; font-weight: bold;">while</span> moreWorkToDo<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> aFile <span style="color: #ff0033; font-weight: bold;">to</span> getNextFile<span style="color: #000000;">&#40;</span>this_folder<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">if</span> <span style="color: #ff0033;">not</span> aFile <span style="color: #000000;">=</span> <span style="color: #009900;">&quot;&quot;</span> <span style="color: #ff0033; font-weight: bold;">then</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; ocrFile<span style="color: #000000;">&#40;</span>aFile<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">else</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> moreWorkToDo <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #0066ff;">false</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">end</span> <span style="color: #ff0033; font-weight: bold;">if</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">end</span> <span style="color: #ff0033; font-weight: bold;">repeat</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; logEvent<span style="color: #000000;">&#40;</span><span style="color: #009900;">&quot;No more work.&quot;</span><span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; exitApp<span style="color: #000000;">&#40;</span>ocrApplicationName<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">on</span> <span style="color: #ff0033; font-weight: bold;">error</span> errorStr <span style="color: #0066ff;">number</span> errNum<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #0066ff;">display dialog</span> <span style="color: #009900;">&quot;Error &quot;</span> <span style="color: #000000;">&amp;</span> errNum <span style="color: #000000;">&amp;</span> <span style="color: #009900;">&quot; while running OCR: &quot;</span> <span style="color: #000000;">&amp;</span> errorStr<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> <span style="color: #ff0033; font-weight: bold;">my</span> isRunning <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #0066ff;">false</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">end</span> <span style="color: #ff0033; font-weight: bold;">try</span><br />
&nbsp; &nbsp; <span style="color: #808080; font-style: italic;">-- Get rid of the lockfile, ignoring any errors</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">try</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #0066ff;">do shell script</span> <span style="color: #009900;">&quot;/bin/rm &quot;</span> <span style="color: #000000;">&amp;</span> <span style="color: #0066ff;">quoted form</span> <span style="color: #ff0033; font-weight: bold;">of</span> lockFilePath<br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">end</span> <span style="color: #ff0033; font-weight: bold;">try</span><br />
<span style="color: #ff0033; font-weight: bold;">end</span> <span style="color: #0066ff;">adding</span> <span style="color: #0066ff;">folder</span> <span style="color: #0066ff;">items</span> <span style="color: #ff0033; font-weight: bold;">to</span><br />
<span style="color: #808080; font-style: italic;">(*<br />
Name: ocrFile<br />
Description: Runs OCR on the given file and waits for the OCR output file to appear<br />
Parameters:<br />
&nbsp; aFile - the file to be OCR'd<br />
*)</span><br />
<span style="color: #ff0033; font-weight: bold;">on</span> ocrFile<span style="color: #000000;">&#40;</span>aFile<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> posixFilePath <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #0066ff;">POSIX path</span> <span style="color: #ff0033; font-weight: bold;">of</span> aFile<br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> posixOcrFilePath <span style="color: #ff0033; font-weight: bold;">to</span> getPosixOcrFilePath<span style="color: #000000;">&#40;</span>posixFilePath<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; logEvent<span style="color: #000000;">&#40;</span><span style="color: #009900;">&quot;OCR: &quot;</span> <span style="color: #000000;">&amp;</span> posixFilePath<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">tell</span> <span style="color: #0066ff;">application</span> ocrApplicationName <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #0066ff;">open</span> aFile<br />
&nbsp; &nbsp; <span style="color: #808080; font-style: italic;">--</span><br />
&nbsp; &nbsp; <span style="color: #808080; font-style: italic;">-- Now sit in a loop checking once per second for the OCR file</span><br />
&nbsp; &nbsp; <span style="color: #808080; font-style: italic;">-- Give up after five minutes</span><br />
&nbsp; &nbsp; <span style="color: #808080; font-style: italic;">--</span><br />
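&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">with</span> <span style="color: #ff0033; font-weight: bold;">timeout</span> <span style="color: #ff0033; font-weight: bold;">of</span> <span style="color: #000000;">300</span> seconds<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;">-- &quot;with timeout&quot; only limits Apple events, so count the seconds ourselves</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> secondsWaited <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #000000;">0</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> ocrFileExists <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #0066ff;">false</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">repeat</span> <span style="color: #ff0033; font-weight: bold;">until</span> ocrFileExists<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> ocrFileExists <span style="color: #ff0033; font-weight: bold;">to</span> posixFileExists<span style="color: #000000;">&#40;</span>posixOcrFilePath<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">if</span> ocrFileExists <span style="color: #ff0033; font-weight: bold;">then</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; logEvent<span style="color: #000000;">&#40;</span><span style="color: #009900;">&quot;OCR file generated.&quot;</span><span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;">-- Wait 5 seconds even if the file was found, to let things settle</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; delay <span style="color: #000000;">5</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">else</span> <span style="color: #ff0033; font-weight: bold;">if</span> secondsWaited <span style="color: #000000;">&gt;=</span> <span style="color: #000000;">300</span> <span style="color: #ff0033; font-weight: bold;">then</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;">-- Give up; the main handler reports the error and removes the lockfile</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">error</span> <span style="color: #009900;">&quot;Timed out waiting for OCR output: &quot;</span> <span style="color: #000000;">&amp;</span> posixOcrFilePath<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">else</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;">-- Wait a second before checking again</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; delay <span style="color: #000000;">1</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> secondsWaited <span style="color: #ff0033; font-weight: bold;">to</span> secondsWaited <span style="color: #000000;">+</span> <span style="color: #000000;">1</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">end</span> <span style="color: #ff0033; font-weight: bold;">if</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">end</span> <span style="color: #ff0033; font-weight: bold;">repeat</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">end</span> <span style="color: #ff0033; font-weight: bold;">timeout</span><br />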
<span style="color: #ff0033; font-weight: bold;">end</span> ocrFile<br />
<span style="color: #808080; font-style: italic;">(*<br />
Name: appIsRunning<br />
Description: Determines if a particular application is running.<br />
Parameters:<br />
&nbsp; &nbsp; appName - the name of the application to be tested<br />
Returns: True if the application is running; otherwise False<br />
*)</span><br />
<span style="color: #ff0033; font-weight: bold;">on</span> appIsRunning<span style="color: #000000;">&#40;</span>appName<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">tell</span> <span style="color: #0066ff;">application</span> <span style="color: #009900;">&quot;System Events&quot;</span> <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #000000;">&#40;</span><span style="color: #0066ff;">name</span> <span style="color: #ff0033; font-weight: bold;">of</span> processes<span style="color: #000000;">&#41;</span> <span style="color: #ff0033;">contains</span> appName<br />
<span style="color: #ff0033; font-weight: bold;">end</span> appIsRunning<br />
<span style="color: #808080; font-style: italic;">(*<br />
Name: posixFileExists<br />
Description: Determines if a particular file exists.<br />
Parameters:<br />
&nbsp; &nbsp; posixFilePath - the POSIX path to the file<br />
Returns: True if the file exists; otherwise False<br />
*)</span><br />
<span style="color: #ff0033; font-weight: bold;">on</span> posixFileExists<span style="color: #000000;">&#40;</span>posixFilePath<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">tell</span> <span style="color: #0066ff;">application</span> <span style="color: #009900;">&quot;System Events&quot;</span> <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #0066ff;">exists</span> <span style="color: #0066ff;">file</span> posixFilePath<br />
<span style="color: #ff0033; font-weight: bold;">end</span> posixFileExists<br />
<span style="color: #808080; font-style: italic;">(*<br />
Name: exitApp<br />
Description: Exits the specified app if it is running.<br />
Parameters:<br />
&nbsp; &nbsp; appName - the application name<br />
*)</span><br />
<span style="color: #ff0033; font-weight: bold;">on</span> exitApp<span style="color: #000000;">&#40;</span>appName<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">if</span> appIsRunning<span style="color: #000000;">&#40;</span>appName<span style="color: #000000;">&#41;</span> <span style="color: #ff0033; font-weight: bold;">then</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">tell</span> <span style="color: #0066ff;">application</span> appName <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #0066ff;">quit</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">end</span> <span style="color: #ff0033; font-weight: bold;">if</span><br />
<span style="color: #ff0033; font-weight: bold;">end</span> exitApp<br />
<span style="color: #808080; font-style: italic;">(*<br />
Name: getPosixOcrFilePath<br />
Description: Gets the OCR output filename for a given input filename.<br />
Parameters:<br />
&nbsp; &nbsp; posixFilePath - the full path to the source file<br />
Return: the POSIX path of the OCR output file<br />
*)</span><br />
<span style="color: #ff0033; font-weight: bold;">on</span> getPosixOcrFilePath<span style="color: #000000;">&#40;</span>posixFilePath<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> posixBaseName <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #0066ff;">do shell script</span> ¬<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;">&quot;filename=&quot;</span> <span style="color: #000000;">&amp;</span> <span style="color: #0066ff;">quoted form</span> <span style="color: #ff0033; font-weight: bold;">of</span> posixFilePath <span style="color: #000000;">&amp;</span> <span style="color: #009900;">&quot;; echo ${filename%<span style="color: #000000; font-weight: bold;">\\</span>.*}&quot;</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> posixOcrFilePath <span style="color: #ff0033; font-weight: bold;">to</span> posixBaseName <span style="color: #000000;">&amp;</span> ocrFileSuffix<br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">return</span> posixOcrFilePath<br />
<span style="color: #ff0033; font-weight: bold;">end</span> getPosixOcrFilePath<br />
<span style="color: #808080; font-style: italic;">(*<br />
Name: getNextFile<br />
Description: Finds the next unprocessed ScanSnap PDF<br />
Return: the file or &quot;&quot;<br />
*)</span><br />
<span style="color: #ff0033; font-weight: bold;">on</span> getNextFile<span style="color: #000000;">&#40;</span>aFolder<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; logEvent<span style="color: #000000;">&#40;</span><span style="color: #009900;">&quot;Getting next file...&quot;</span><span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> masterFileList <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #0066ff;">list</span> <span style="color: #0066ff;">folder</span> aFolder ¬<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">without</span> <span style="color: #0066ff;">invisibles</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> posixPath <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #0066ff;">POSIX path</span> <span style="color: #ff0033; font-weight: bold;">of</span> aFolder<br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">repeat</span> <span style="color: #ff0033; font-weight: bold;">with</span> i <span style="color: #ff0033; font-weight: bold;">from</span> <span style="color: #000000;">1</span> <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #0066ff;">count</span> masterFileList<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> fileName <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #0066ff;">item</span> i <span style="color: #ff0033; font-weight: bold;">of</span> masterFileList<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> posixFilePath <span style="color: #ff0033; font-weight: bold;">to</span> posixPath <span style="color: #000000;">&amp;</span> fileName<br />
&nbsp; &nbsp; &nbsp; &nbsp; log posixFilePath<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;">--</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;">-- Construct a FineReader file name from our file</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;">--</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> posixOcrFilePath <span style="color: #ff0033; font-weight: bold;">to</span> getPosixOcrFilePath<span style="color: #000000;">&#40;</span>posixFilePath<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;">--</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;">-- See if the FineReader file we constructed exists</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;">--</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> ocrFileExists <span style="color: #ff0033; font-weight: bold;">to</span> posixFileExists<span style="color: #000000;">&#40;</span>posixOcrFilePath<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">tell</span> <span style="color: #0066ff;">me</span> <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #ff0033; font-weight: bold;">set</span> fileCreator <span style="color: #ff0033; font-weight: bold;">to</span> getSpotlightInfo for <span style="color: #009900;">&quot;kMDItemCreator&quot;</span> <span style="color: #ff0033; font-weight: bold;">from</span> posixFilePath<br />
&nbsp; &nbsp; &nbsp; &nbsp; log <span style="color: #000000;">&#40;</span><span style="color: #009900;">&quot;Creator: &quot;</span> <span style="color: #000000;">&amp;</span> fileCreator<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">if</span> <span style="color: #ff0033;">not</span> ocrFileExists <span style="color: #ff0033;">and</span> fileCreator <span style="color: #000000;">=</span> <span style="color: #009900;">&quot;ScanSnap Manager&quot;</span> <span style="color: #ff0033; font-weight: bold;">then</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">return</span> <span style="color: #0066ff;">POSIX file</span> posixFilePath<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">end</span> <span style="color: #ff0033; font-weight: bold;">if</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">end</span> <span style="color: #ff0033; font-weight: bold;">repeat</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">return</span> <span style="color: #009900;">&quot;&quot;</span><br />
<span style="color: #ff0033; font-weight: bold;">end</span> getNextFile<br />
<span style="color: #808080; font-style: italic;">(*<br />
Name: getSpotlightInfo<br />
Description: Gets a named attribute from metadata for a specific file.<br />
Parameters:<br />
&nbsp; &nbsp; for myattribute - the name of the attribute<br />
&nbsp; &nbsp; from myfile - the name of the file<br />
Returns: the attribute value or &quot;&quot; if none found<br />
*)</span><br />
<span style="color: #ff0033; font-weight: bold;">on</span> getSpotlightInfo for myattribute <span style="color: #ff0033; font-weight: bold;">from</span> myfile<br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">try</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> this_kMDItemResult <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #009900;">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">tell</span> <span style="color: #0066ff;">application</span> <span style="color: #009900;">&quot;Finder&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> this_item <span style="color: #ff0033; font-weight: bold;">to</span> myfile <span style="color: #ff0033;">as</span> <span style="color: #0066ff;">string</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> this_item <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #0066ff;">POSIX path</span> <span style="color: #ff0033; font-weight: bold;">of</span> this_item<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> this_kMDItem <span style="color: #ff0033; font-weight: bold;">to</span> myattribute<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> theResult <span style="color: #ff0033; font-weight: bold;">to</span> words <span style="color: #ff0033; font-weight: bold;">of</span> <span style="color: #000000;">&#40;</span><span style="color: #0066ff;">do shell script</span> <span style="color: #009900;">&quot;/usr/bin/mdls -name &quot;</span> <span style="color: #000000;">&amp;</span> this_kMDItem <span style="color: #000000;">&amp;</span> <span style="color: #009900;">&quot; -raw -nullMarker None &quot;</span> <span style="color: #000000;">&amp;</span> <span style="color: #0066ff;">quoted form</span> <span style="color: #ff0033; font-weight: bold;">of</span> this_item<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; log <span style="color: #009900;">&quot;Result: &quot;</span> <span style="color: #000000;">&amp;</span> theResult <span style="color: #ff0033;">as</span> <span style="color: #0066ff;">string</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">repeat</span> <span style="color: #ff0033; font-weight: bold;">with</span> j <span style="color: #ff0033; font-weight: bold;">from</span> <span style="color: #000000;">1</span> <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #0066ff;">number</span> <span style="color: #ff0033; font-weight: bold;">of</span> <span style="color: #0066ff;">items</span> <span style="color: #ff0033; font-weight: bold;">in</span> theResult<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> this_kMDItemResult <span style="color: #ff0033; font-weight: bold;">to</span> this_kMDItemResult <span style="color: #000000;">&amp;</span> <span style="color: #0066ff;">item</span> j <span style="color: #ff0033; font-weight: bold;">of</span> theResult <span style="color: #ff0033;">as</span> <span style="color: #0066ff;">string</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">if</span> j <span style="color: #000000;">&lt;</span> <span style="color: #0066ff;">number</span> <span style="color: #ff0033; font-weight: bold;">of</span> <span style="color: #0066ff;">items</span> <span style="color: #ff0033; font-weight: bold;">in</span> theResult <span style="color: #ff0033; font-weight: bold;">then</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> this_kMDItemResult <span style="color: #ff0033; font-weight: bold;">to</span> this_kMDItemResult <span style="color: #000000;">&amp;</span> <span style="color: #009900;">&quot; &quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">end</span> <span style="color: #ff0033; font-weight: bold;">if</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">end</span> <span style="color: #ff0033; font-weight: bold;">repeat</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">end</span> <span style="color: #ff0033; font-weight: bold;">tell</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">on</span> <span style="color: #ff0033; font-weight: bold;">error</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> this_kMDItemResult <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #009900;">&quot;&quot;</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">end</span> <span style="color: #ff0033; font-weight: bold;">try</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">return</span> this_kMDItemResult<br />
<span style="color: #ff0033; font-weight: bold;">end</span> getSpotlightInfo<br />
<span style="color: #808080; font-style: italic;">(*<br />
Name: logEvent<br />
Description: Write an event to an event log<br />
Parameters:<br />
&nbsp; &nbsp; themessage - the message to write to the log<br />
*)</span><br />
<span style="color: #ff0033; font-weight: bold;">on</span> logEvent<span style="color: #000000;">&#40;</span>themessage<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> theLine <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #000000;">&#40;</span><span style="color: #0066ff;">do shell script</span> ¬<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;">&quot;date &nbsp;+'%Y-%m-%d %H:%M:%S'&quot;</span> <span style="color: #ff0033;">as</span> <span style="color: #0066ff;">string</span><span style="color: #000000;">&#41;</span> ¬<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000000;">&amp;</span> <span style="color: #009900;">&quot; &quot;</span> <span style="color: #000000;">&amp;</span> themessage<br />
&nbsp; &nbsp; <span style="color: #0066ff;">do shell script</span> <span style="color: #009900;">&quot;echo &quot;</span> <span style="color: #000000;">&amp;</span> quoted form <span style="color: #ff0033; font-weight: bold;">of</span> theLine <span style="color: #000000;">&amp;</span> ¬<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;">&quot; &gt;&gt; ~/Library/Logs/AppleScript-events.log&quot;</span><br />
<span style="color: #ff0033; font-weight: bold;">end</span> logEvent</div></td></tr></tbody></table></div>
<p><strong>Installation</strong></p>
<ul>
<li>Use the Script Editor to save this script as <strong>Run OCR on New Folder Items</strong> under <strong><em>User Home</em>/Library/Scripts/Folder Action Scripts</strong><br />
You may have to create the <strong>Folder Action Scripts</strong> folder.</li>
<li>Now open a Finder window and navigate to the parent folder of your scanned documents folder.</li>
<li>Right click (control-click) the scanned documents folder and choose <strong>Folder Actions Setup&#8230;</strong></li>
<li>At this point if folder actions are not enabled, you will likely have to enable them and add the script manually.
<ul>
<li>Check <strong>Enable Folder Actions</strong>.</li>
<li>Use the &#8220;+&#8221; buttons on the left and right sides to add the scan folder and then this script.</li>
</ul>
</li>
<li>Otherwise, a list of scripts will come up. Choose this script from the <strong>Choose a Script to Attach</strong> dialog.</li>
<li>Close all windows.</li>
</ul>
<p>That&#8217;s it! The script will be invoked automatically every time a new file appears in your scanned documents folder.</p>
<p>Please let me know if you have any ideas for improving this script. I&#8217;m not an AppleScript guru, so someone out there might know how to keep that annoying FineReader icon from jumping.</p>
]]></content:encoded>
			<wfw:commentRss>http://paperjammed.com/2010/01/04/automate-scansnap-ocr-process-on-your-mac-with-applescript-snow-leopard-edition/feed/</wfw:commentRss>
		<slash:comments>25</slash:comments>
		</item>
		<item>
		<title>Don&#8217;t worry if you didn&#8217;t sanitize your documents—even the TSA forgets occasionally</title>
		<link>http://paperjammed.com/2009/12/08/dont-worry-if-you-didnt-sanitize-your-documents%e2%80%94even-the-tsa-forgets-occasionally/</link>
		<comments>http://paperjammed.com/2009/12/08/dont-worry-if-you-didnt-sanitize-your-documents%e2%80%94even-the-tsa-forgets-occasionally/#comments</comments>
		<pubDate>Tue, 08 Dec 2009 22:29:29 +0000</pubDate>
		<dc:creator>Tad</dc:creator>
				<category><![CDATA[Paperless Life]]></category>
		<category><![CDATA[Searching and Indexing]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Data Loss]]></category>
		<category><![CDATA[PDF]]></category>
		<category><![CDATA[Privacy]]></category>
		<category><![CDATA[Rants]]></category>
		<category><![CDATA[Shredding]]></category>

		<guid isPermaLink="false">http://paperjammed.com/?p=796</guid>
		<description><![CDATA[It&#8217;s too comical to be true. A few months back, when I wrote an article warning about inadequate attempts at sanitizing PDF documents, I thought that any organization serious about censoring documents would not make such a basic error. Especially not a government agency, after the military had been caught by this pitfall. Apparently this [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-full wp-image-797" title="20091208-redaction1" src="http://paperjammed.com/wp-content/uploads/2009/12/20091208-redaction1.gif" alt="20091208-redaction1" width="361" height="280" />It&#8217;s too comical to be true. A few months back, when I wrote an article <a href="http://paperjammed.com/2009/04/21/keeping-your-secrets-to-yourself—what-can-your-shared-documents-tell-others/">warning about inadequate attempts at sanitizing PDF documents</a>, I thought that any organization serious about censoring documents would not make such a basic error. Especially not a government agency, after the military <a href="http://www.schneier.com/blog/archives/2005/05/pdf_radacting_f.html">had been caught</a> by this pitfall.</p>
<p><a href="http://www.wanderingaramean.com/2009/12/tsa-makes-another-stupid-move.html">Apparently this is not the case</a>.</p>
<p>It seems that the TSA has leaked its official airport security guidelines. ABC News reports: <a href="http://abcnews.go.com/Blotter/massive-tsa-security-breach-agency-secrets/story?id=9280503">Online Posting Reveals a &#8220;How To&#8221; for Terrorists to Get Through Airport Security</a>.</p>
<p><span id="more-796"></span></p>
<p><strong>A Rookie Mistake</strong></p>
<p>Look at the screenshot of the document at the top of this post. Even though part of the document has been blacked out, it is still possible to select the text and copy and paste it to find out what is hidden behind the black rectangle.</p>
<p>What kinds of things are listed in this document?</p>
<ul>
<li>Photographs of all kinds of official ID cards. Ever wondered what a U.S. Senator&#8217;s ID card looks like?</li>
<li>Procedures for calibrating equipment, including where guns should be hidden during testing.</li>
<li>Guidelines for who gets searched and who doesn&#8217;t.</li>
<li>Guidelines for what objects get searched and which don&#8217;t.</li>
<li>And much much more!</li>
</ul>
<p>In other words, this was a most unfortunate event.</p>
<p>See for yourself—ABC News (and others) have <a href="http://a.abcnews.go.com/images/Blotter/ht_tsa_screening_2_091208.pdf">posted the document with redactions removed</a>.</p>
<p><strong>Easy as Pie</strong></p>
<p>Here&#8217;s a screenshot of the original document, opened in Adobe Acrobat Professional.</p>
<p><img class="alignnone size-full wp-image-801" title="20091208-redaction2" src="http://paperjammed.com/wp-content/uploads/2009/12/20091208-redaction2.gif" alt="20091208-redaction2" width="500" height="197" /></p>
<p>As you can see, it was a trivial matter to use the <strong>TouchUp Object</strong> tool to gently slide the black rectangle off of the secret stuff (I have blurred the text here, though you can read it from ABC News if you wish).</p>
<p>If you are working with confidential documents that could potentially cause disaster if leaked, <em>please</em> learn how to redact your documents correctly!</p>
]]></content:encoded>
			<wfw:commentRss>http://paperjammed.com/2009/12/08/dont-worry-if-you-didnt-sanitize-your-documents%e2%80%94even-the-tsa-forgets-occasionally/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Keeping your secrets to yourself—old changes lingering in your PDF files</title>
		<link>http://paperjammed.com/2009/11/23/keeping-your-secrets-to-yourself-old-changes-lingering-in-your-pdf-files/</link>
		<comments>http://paperjammed.com/2009/11/23/keeping-your-secrets-to-yourself-old-changes-lingering-in-your-pdf-files/#comments</comments>
		<pubDate>Tue, 24 Nov 2009 04:46:58 +0000</pubDate>
		<dc:creator>Tad</dc:creator>
				<category><![CDATA[Security]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Data Loss]]></category>
		<category><![CDATA[Geeky]]></category>
		<category><![CDATA[PDF]]></category>

		<guid isPermaLink="false">http://paperjammed.com/?p=781</guid>
		<description><![CDATA[A few months ago I wrote an article that touched upon the problems inherent in attempts to sanitize documents before sending them to the enemy—perhaps to remove competitor&#8217;s names or trade secrets. I was reading a post on a board I frequent where a person was describing exactly this kind of activity—removing sensitive information from [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-medium wp-image-791" title="Rusty trap" src="http://paperjammed.com/wp-content/uploads/2009/11/iStock_000011076402XSmall-300x225.jpg" alt="Rusty trap" width="300" height="225" />A few months ago I wrote an article that touched upon <a href="http://paperjammed.com/2009/04/21/keeping-your-secrets-to-yourself—what-can-your-shared-documents-tell-others/">the problems inherent in attempts to sanitize documents</a> before sending them to the enemy, perhaps to remove competitors&#8217; names or trade secrets.</p>
<p>I was reading a post on a board I frequent where a person was describing exactly this kind of activity—removing sensitive information from PDF documents. Several suggestions were made, but one individual suggested opening the file in Acrobat Pro and replacing the sensitive text with good old <a href="http://www.lipsum.com/">Lorem Ipsum</a>.</p>
<p>It was at that moment that I recalled a peculiar feature of the PDF file format: it is designed to support nondestructive updates, allowing people to make vast changes to a PDF document while still retaining the original document, fully intact. I did a few experiments and was surprised with the results.<span id="more-781"></span></p>
<p><strong>A Brief Note on the PDF File Format</strong></p>
<p>For the geeky types among us, one place to begin is this article:</p>
<p><a href="http://www.mactech.com/articles/mactech/Vol.15/15.09/PDFIntro/">Portable Document Format: An Introduction for Programmers</a></p>
<p>The key point to take from the article is that a PDF document consists of several distinct sections: a <strong>Header</strong>, a <strong>Body</strong>, an <strong>&#8220;xref&#8221; Table</strong>, and a <strong>Trailer</strong>. At the very end of the file you will find the character sequence <strong>%%EOF</strong>.</p>
<p>The PDF standard was designed to allow multiple updates to a document, while retaining the original version. This is accomplished by appending anything new to the end of the document, after the original <strong>EOF</strong> tag. The document will now have two <strong>EOF</strong> tags: one indicating where the original document ended, and a new <strong>EOF</strong> tag indicating where the new changes end.</p>
<p>If we wish to revert PDF changes, it should be a simple matter of opening the PDF file in a binary editor, searching for the first <strong>EOF</strong> tag, and deleting everything following.</p>
<p><strong>A Simple Experiment</strong></p>
<p>Let&#8217;s start with a proper secret document containing missile plans&#8230;</p>
<p><img class="alignnone size-full wp-image-785" title="20091123-missile-plans-1" src="http://paperjammed.com/wp-content/uploads/2009/11/20091123-missile-plans-1.gif" alt="20091123-missile-plans-1" width="439" height="418" /></p>
<p>Suppose we want to obscure some special information in paragraph 37. We can open the file in Acrobat Professional and use its text editing features to swap in the venerable <em>Lorem Ipsum</em> text.</p>
<p>Here&#8217;s what it looks like after the switch:</p>
<p><img class="alignnone size-full wp-image-786" title="20091123-lorem-ipsum" src="http://paperjammed.com/wp-content/uploads/2009/11/20091123-lorem-ipsum.gif" alt="20091123-lorem-ipsum" width="598" height="243" /></p>
<p>You can see here that the first seven lines of text starting at paragraph 37 have been replaced with suitably unreadable text.</p>
<p>Now, open the new PDF file in a binary editor (since PDF files contain a mix of text and binary, the editor must be a binary editor).</p>
<p><img class="alignnone size-full wp-image-787" title="20091123-binary-editor" src="http://paperjammed.com/wp-content/uploads/2009/11/20091123-binary-editor.gif" alt="20091123-binary-editor" width="693" height="633" /></p>
<p>Note the <strong>%%EOF</strong> character sequence embedded in the text. This is the first <strong>EOF</strong> tag, indicating where the original file ended. All we need to do is place the cursor to the right of the <strong>EOF</strong> and delete everything to the end of the file.</p>
<p>Once we have done so, it&#8217;s like magic:</p>
<p><img class="alignnone size-full wp-image-788" title="20091123-after-binary-editing" src="http://paperjammed.com/wp-content/uploads/2009/11/20091123-after-binary-editing.gif" alt="20091123-after-binary-editing" width="794" height="323" /></p>
<p>The edits that replaced lines of paragraph 37 with gibberish have neatly been undone!</p>
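<p>For those who would rather not poke around in a binary editor, the same trick can be sketched in a few lines of Python. This is only a rough illustration of the experiment above, not a forensic tool. The function names <strong>revision_ends</strong> and <strong>strip_updates</strong> are my own inventions, and the sketch assumes that every <strong>%%EOF</strong> marks the end of one saved revision. That is a simplification (a linearized PDF, for instance, can contain an extra marker without ever having been edited), so only try it on a copy.</p>

```python
EOF_MARKER = b"%%EOF"


def revision_ends(data):
    """Return the offset just past each %%EOF marker in the raw PDF bytes."""
    ends = []
    pos = data.find(EOF_MARKER)
    while pos != -1:
        ends.append(pos + len(EOF_MARKER))
        pos = data.find(EOF_MARKER, pos + 1)
    return ends


def strip_updates(data):
    """Keep everything up to the first %%EOF and drop whatever was appended."""
    ends = revision_ends(data)
    if not ends:
        raise ValueError("no %%EOF marker found")
    # Assumption: the first marker closes the original revision.
    return data[:ends[0]] + b"\n"
```

<p>Read the file in binary mode, check how many markers <strong>revision_ends</strong> reports, and write the result of <strong>strip_updates</strong> to a new file rather than over the original. If only one marker turns up, there is nothing to undo.</p>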
<p><strong>More Details</strong></p>
<p>From the <a href="http://www.mactech.com/articles/mactech/Vol.15/15.09/PDFIntro/">PDF Intro document</a> linked earlier:</p>
<p>&#8220;The trailer, it turns out, plays an important role in the way PDF implements incremental updating. The key concept to understand here is that a PDF file is never overwritten, only added to. That goes for all portions of the PDF file &#8211; even the trailer itself, and the end-of-file marker. In other words, a multiply-updated PDF document may contain multiple trailers &#8211; and multiple end-of-file markers! (There may be numerous occurrences of %%EOF.) Each time the file is edited, an addendum is written to the tail of the file, consisting of the content objects that have changed, a new xref section, and a new trailer containing all the information that was in the previous trailer, as well as a /Prev key specifying the byte offset (from the beginning of the file) of the previous xref section. The cross-reference info will then be distributed across more than one xref section. To access all of the cross-references, the reader must walk the list of /Prev keys in all the trailers, in reverse order.</p>
<p>Space doesn&#8217;t permit a detailed exploration of updates here, but you can find several examples in Appendix A of the PDF 1.3 specification (available at <a href="http://partners.adobe.com/asn/developer">http://partners.adobe.com/asn/developer</a>).&#8221;</p>
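<p>The walk through the <strong>/Prev</strong> keys that the quote describes can also be sketched in Python. Treat this as an illustration under some loud assumptions: the function name <strong>xref_chain</strong> is hypothetical, the regular expressions are a crude stand-in for a real PDF parser, and the sketch only understands the classic layout in which each xref table is followed by the <strong>trailer</strong> keyword. Newer files that store their cross-references in streams will not work with it.</p>

```python
import re


def xref_chain(data):
    """Return xref section offsets, newest first, by following /Prev keys."""
    starts = re.findall(rb"startxref\s+(\d+)", data)
    if not starts:
        raise ValueError("no startxref entry found")
    # The last startxref in the file points at the newest xref section.
    offset = int(starts[-1])
    chain = []
    seen = set()
    while offset not in seen:
        seen.add(offset)
        chain.append(offset)
        # Assumption: a classic trailer dictionary follows this xref table.
        trailer_at = data.find(b"trailer", offset)
        if trailer_at == -1:
            break
        trailer_end = data.find(b"startxref", trailer_at)
        if trailer_end == -1:
            trailer_end = len(data)
        prev = re.search(rb"/Prev\s+(\d+)", data[trailer_at:trailer_end])
        if prev is None:
            break  # reached the original, never-updated revision
        offset = int(prev.group(1))
    return chain
```

<p>A file that has never been updated should yield a single offset. Each additional offset in the list suggests one more appended revision still sitting in the file.</p>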
<p><strong>Summary</strong></p>
<p>It is important to understand that the PDF standard allows for appended updates to files that leave the original document intact, regardless of how drastic the changes are. If you are intent on redacting text from PDF documents, do not depend on simply deleting the secrets using a PDF editor—you must use a proper redaction tool that addresses these issues correctly.</p>
<p>That said, I did some experimenting with a few utilities (Apple Preview, PDFpen, and Adobe Acrobat Pro) and found that some write the file from scratch each time, with no lingering cruft from former versions, while others respect the original intent of the PDF standard. This means that you can&#8217;t trust that older revisions are being retained in your file and you can&#8217;t trust that they aren&#8217;t.</p>
<p>Be conservative: use a redaction tool for secrecy and proper backups for versioning.</p>
]]></content:encoded>
			<wfw:commentRss>http://paperjammed.com/2009/11/23/keeping-your-secrets-to-yourself-old-changes-lingering-in-your-pdf-files/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Snow Leopard Update for ScanSnap</title>
		<link>http://paperjammed.com/2009/11/13/snow-leopard-update-for-scansnap/</link>
		<comments>http://paperjammed.com/2009/11/13/snow-leopard-update-for-scansnap/#comments</comments>
		<pubDate>Sat, 14 Nov 2009 04:36:58 +0000</pubDate>
		<dc:creator>Tad</dc:creator>
				<category><![CDATA[Scanning]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Tools of the Trade]]></category>
		<category><![CDATA[Hardware]]></category>
		<category><![CDATA[Macintosh]]></category>
		<category><![CDATA[PDF]]></category>

		<guid isPermaLink="false">http://paperjammed.com/?p=770</guid>
		<description><![CDATA[This evening I opened my email and found a most welcome message: Fujitsu has released their patched version of the ScanSnap software for Snow Leopard. [UPDATE: I spoke too soon—they only delivered half of the goods. See below.] [UPDATE 2: Hurray! It's fixed! The birds are chirping and the sun is shining and life is [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-full wp-image-773" title="20091113-scansnap-update" src="http://paperjammed.com/wp-content/uploads/2009/11/20091113-scansnap-update.gif" alt="20091113-scansnap-update" width="371" height="228" />This evening I opened my email and found a most welcome message: <a href="http://www.fujitsu.com/us/services/computing/peripherals/scanners/support/sl_download.html">Fujitsu has released their patched version of the ScanSnap software for Snow Leopard</a>.</p>
<p>[UPDATE: I spoke too soon—they only delivered half of the goods. See below.]</p>
<p>[UPDATE 2: Hurray! It's fixed! The birds are chirping and the sun is shining and life is good!]</p>
<p>When Snow Leopard came out back in August, I ordered my copy the first week and was so excited that I installed it the day it arrived. <a href="http://paperjammed.com/2009/09/07/when-migrating-to-a-new-operating-system-look-before-you-leap/">My joy was short-lived</a>, however: the most important software package I use did not work!<span id="more-770"></span></p>
<p>I depend greatly on the OCR capabilities of the ABBYY FineReader software that comes with the ScanSnap scanners, and this was one of the many pieces of software that did not smoothly transition to Snow Leopard. I could scan documents, with limited functionality, but the OCR feature did not work.</p>
<p>Now that Fujitsu has released their official update, I will probably be installing Snow Leopard tomorrow evening. Now, do I do an upgrade or a full install? Hmmmm&#8230;</p>
<p>UPDATE</p>
<p>Well, they only delivered half of the goods. <img src='http://paperjammed.com/wp-includes/images/smilies/icon_sad.gif' alt=':-(' class='wp-smiley' /> </p>
<p>I was reading through the seven-step process for updating the ScanSnap drivers and I arrived at step seven:</p>
<blockquote><p><strong>Step 7:</strong> The download for FineReader for ScanSnap update to Snow Leopard will be hosted by ABBYY but is not yet available. If you have already subscribed to be notified by Fujitsu regarding the Snow Leopard updates, an email will be sent to you when it is posted.</p></blockquote>
<p>How displeasing. The only thing I really cared about was getting the OCR to work, and apparently ABBYY has not yet delivered their part (how hard can it be to update the part of your code that says &#8220;If the PDF metadata doesn&#8217;t match X then the document isn&#8217;t a ScanSnap doc&#8221;?).</p>
<p>UPDATE 2</p>
<p>Instead of making us wait another month or two, ABBYY has delivered their patch in record time <img src='http://paperjammed.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> . I quickly installed the update and tested my ScanSnap script&#8230;</p>
<p>There are a few kinks to work out, but it is pretty clear that I will have my same old workflow AppleScript folder action up and running in short order. I will post the newer script as soon as I have it running properly.</p>
<p>And, as a side note, I had already purchased <a href="http://www.smileonmymac.com/PDFpen/index.html">PDFpen</a> from <a href="http://www.smileonmymac.com/">SmileOnMyMac</a> as a backup plan. Their tool incorporates the <a href="http://www.nuance.com/imaging/omnipage/omnipage-professional.asp">OmniPage OCR engine</a>, an engine that rivals that of ABBYY. My script was already running with PDFpen, but there were some issues with tables in documents that I forwarded on to SmileOnMyMac. One of these days I&#8217;ll post my scripts for PDFpen.</p>
]]></content:encoded>
			<wfw:commentRss>http://paperjammed.com/2009/11/13/snow-leopard-update-for-scansnap/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Dodged the corrupt-document bullet this time, just barely&#8230;</title>
		<link>http://paperjammed.com/2009/10/27/dodged-the-corrupt-document-bullet-this-time-just-barely/</link>
		<comments>http://paperjammed.com/2009/10/27/dodged-the-corrupt-document-bullet-this-time-just-barely/#comments</comments>
		<pubDate>Tue, 27 Oct 2009 21:52:30 +0000</pubDate>
		<dc:creator>Tad</dc:creator>
				<category><![CDATA[Searching and Indexing]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Data Loss]]></category>
		<category><![CDATA[Geeky]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[Knowledge Management]]></category>
		<category><![CDATA[PDF]]></category>

		<guid isPermaLink="false">http://paperjammed.com/?p=750</guid>
		<description><![CDATA[A couple of weeks ago, a co-worker sent me a PDF document to look at. He said that he was having trouble copying and pasting from the document and was scratching his head about why this particular PDF would have such issues. As it would turn out, there were several thousand other documents on a [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-medium wp-image-751" title="gibberish document in a file folder" src="http://paperjammed.com/wp-content/uploads/2009/10/iStock_000006486654XSmall-300x199.jpg" alt="gibberish document in a file folder" width="300" height="199" />A couple of weeks ago, a co-worker sent me a PDF document to look at. He said that he was having trouble copying and pasting from the document and was scratching his head about why this particular PDF would have such issues.</p>
<p>As it would turn out, there were several thousand other documents on a file server that shared the same funny behavior. By the time we were done struggling with this problem I had gained new respect for PDF corruption issues and their prevention.<span id="more-750"></span></p>
<p><strong>The Problem</strong></p>
<p>We were looking to load a few thousand of these scientific reports into a fancy-schmancy new database, with linguistics searching and other bells and whistles. Much to our chagrin, these documents just weren&#8217;t loading, and we couldn&#8217;t understand why. They were text documents, with some embedded images, but mostly straightforward text.</p>
<p>Here is an excerpt:</p>
<p><img class="alignnone size-full wp-image-755" title="20091027-plaintext" src="http://paperjammed.com/wp-content/uploads/2009/10/20091027-plaintext.gif" alt="20091027-plaintext" width="521" height="93" /></p>
<p>And you can tell that it is right and proper text because when I blow it up all the way, the fonts are nice and smooth—this isn&#8217;t just an image of text.</p>
<p><img class="alignnone size-full wp-image-756" title="20091027-smooth-letter" src="http://paperjammed.com/wp-content/uploads/2009/10/20091027-smooth-letter.gif" alt="20091027-smooth-letter" width="258" height="295" /></p>
<p>But if I copy and paste that particular paragraph into any handy editor (Notepad, in this case), this is what I see:</p>
<p><img class="alignnone size-full wp-image-757" title="20091027-notepad" src="http://paperjammed.com/wp-content/uploads/2009/10/20091027-notepad.gif" alt="20091027-notepad" width="496" height="155" /></p>
<p>And as far as I know, at this point the actual text is beyond the reach of average folks like me. We tried, believe me, we tried.</p>
<p><strong>What went wrong?</strong></p>
<p>A quick Google of the subject led us to understand that many PDF generation tools embed subsets of fonts, with nonstandard mappings from the text to the font.</p>
<p>This fellow explains it nicely:</p>
<p>&#8220;The PDF file does not contain all the information to extract the text. The problem is that a character in a PDF file may not contain information what &#8220;real&#8221; character it relates to. Some PDF generators do a pretty bad job when they embed fonts into PDF files. They use a proprietary encoding mechanism (e.g. 1 is A, 2 is B, 3 is C, &#8230;) in both the embedded font and when they place glyphs on the page. Without a table that implements the reverse (e.g. character code 1 is &#8216;A&#8217;) you cannot extract text from such a file.</p>
<p>There is nothing you can do (besides to complain to whoever created the PDF file, and the author of the software that created this file).&#8221;<br />
— from <a href="http://www.experts-exchange.com/Web_Development/Document_Imaging/Adobe_Acrobat/Q_21426533.html">khkremer on experts-exchange.com</a></p>
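<p>To make that explanation concrete, here is a toy Python sketch of the problem. Everything in it is made up for illustration: the three-glyph &#8220;font&#8221;, the code numbers, and the function names. As far as I understand it, in a real PDF the reverse table would normally be a ToUnicode map attached to the font, and it is that map our garbled reports were missing.</p>

```python
# Hypothetical subset font: the generator numbered its glyphs 1, 2, 3
# instead of using standard character codes.
GLYPHS = {1: "A", 2: "B", 3: "C"}

# The reverse table a well-behaved generator would also store in the file.
REVERSE_TABLE = {1: "A", 2: "B", 3: "C"}

# What the page content stores for the visible word "CAB".
PAGE_CODES = [3, 1, 2]


def render(codes):
    """Drawing works: each code selects the right glyph shape in the font."""
    return "".join(GLYPHS[code] for code in codes)


def extract_text(codes, reverse_table=None):
    """Copy and paste needs a table from codes back to real characters."""
    if reverse_table is None:
        # No table in the file: all we can do is treat codes as characters.
        return "".join(chr(code) for code in codes)
    return "".join(reverse_table[code] for code in codes)


print(render(PAGE_CODES))                        # what you see on screen
print(repr(extract_text(PAGE_CODES)))            # what you paste without a table
print(extract_text(PAGE_CODES, REVERSE_TABLE))   # what you paste with one
```

<p>The page looks perfect because the viewer only needs the glyph shapes. Copying falls apart because nothing in the file says which real character each code stands for.</p>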
<p>As it would turn out, many of the reports had been generated by printing to Adobe Distiller from Microsoft Word. It would seem that the default settings used for Distiller included the &#8220;totally hose my document content&#8221; switch.</p>
<p><strong>The Solution</strong></p>
<p>We fretted over this quite a bit. These are important scientific reports, and there is no way to easily ungarble them. We finally ended up contacting the <a href="http://finereader.abbyy.com/">ABBYY FineReader</a> folks and trying out their OCR toolkit for Linux. Not only did this product make short work of running optical character recognition on the sample document, but once we had a script running, we blew through the 10,000 pages the trial license gave us in a day or two.</p>
<p><strong>Imperfect, at best</strong></p>
<p>I am happy that we were able to salvage the bulk of the electronic knowledge found within those thousands of files, but our work barely scratched the surface.</p>
<p>For example, most of these documents have rich bookmarking of sections and keywording, such as this (content tastefully blurred on purpose).</p>
<p><img class="alignnone size-full wp-image-760" title="20091027-doc-with-contents" src="http://paperjammed.com/wp-content/uploads/2009/10/20091027-doc-with-contents.gif" alt="20091027-doc-with-contents" width="500" height="348" /></p>
<p>In addition, scientific documents typically have loads of tables full of numbers. Though it is possible to mine this data with a good OCR tool (the FineReader API provides tools for just this purpose), the tables are far more difficult to extract correctly once the original text information is lost.</p>
<p><strong>Final thoughts</strong></p>
<p>I wrote a few weeks ago about document formats, <a href="http://paperjammed.com/2009/09/29/are-your-portable-document-format-files-all-that/">mentioning the PDF/A document standard</a>. This is worth investigating, regardless of what your document needs are.</p>
<p>If our thousands of files had been originally generated as PDF/A, it is certain that we would have been able to copy/paste from them without problem: PDF/A prohibits such font shenanigans as were perpetrated on our garbled reports.</p>
<p>In the end, our OCR sledgehammer approach worked like a charm, and is probably sufficient for our needs. Text mining is a pretty slushy business, so no-one will complain if there are a few typos on each page—if they find the doc in a search, they can print it and read it the old fashioned way.</p>
]]></content:encoded>
			<wfw:commentRss>http://paperjammed.com/2009/10/27/dodged-the-corrupt-document-bullet-this-time-just-barely/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Are your Portable Document Format files all that?</title>
		<link>http://paperjammed.com/2009/09/29/are-your-portable-document-format-files-all-that/</link>
		<comments>http://paperjammed.com/2009/09/29/are-your-portable-document-format-files-all-that/#comments</comments>
		<pubDate>Wed, 30 Sep 2009 00:41:36 +0000</pubDate>
		<dc:creator>Tad</dc:creator>
				<category><![CDATA[Green Living]]></category>
		<category><![CDATA[Paperless Life]]></category>
		<category><![CDATA[Scanning]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Data Loss]]></category>
		<category><![CDATA[PDF]]></category>
		<category><![CDATA[Printing]]></category>
		<category><![CDATA[Searching and Indexing]]></category>

		<guid isPermaLink="false">http://paperjammed.com/?p=692</guid>
		<description><![CDATA[Like most people who are trying to archive reams of paper, the one reliable tool I always turn to is Adobe Portable Document Format. I trust my digital life to PDF. Almost everything I scan and most documents I write eventually end up squirreled away somewhere as PDF documents. Have you ever considered just how [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-medium wp-image-696" src="http://paperjammed.com/wp-content/uploads/2009/09/iStock_000009658438XSmall-201x300.jpg" alt="Lost keys at the beach" width="201" height="300" />Like most people who are trying to archive reams of paper, the one reliable tool I always turn to is Adobe Portable Document Format.</p>
<p>I trust my digital life to PDF. Almost everything I scan and most documents I write eventually end up squirreled away somewhere as PDF documents.</p>
<p>Have you ever considered just how portable those documents really are?</p>
<p><strong>What&#8217;s wrong with PDF?</strong></p>
<p>It seems strange to question the portability of these files, doesn&#8217;t it?</p>
<p>For the past ten or fifteen years Adobe has been providing Acrobat Reader and singing the wonders of their new universal document format. And it seemed to be all that, too—regardless of the audible groan we give when Acrobat launches after we click a link, isn&#8217;t it amazing that we can download press-ready copies of our income tax forms, that are guaranteed to look exactly the same when you print them as when I print them? Read on to see what dangers lurk within.<span id="more-692"></span></p>
<p><strong>What&#8217;s the Problem?</strong></p>
<p>In order to understand the nature of the PDF portability issues, one need only look as far as the web browser for an analogy. Consider how the web browser went from a barebones tool that could display a simple language, HTML, in a neutral way, fitting the web content onto each user&#8217;s screen, to a memory-hogging behemoth that is an integral part of your operating system. It didn&#8217;t happen all at once; it has been death by a thousand cuts.</p>
<p>Mirroring the evolution of web browsers, the PDF document standard has adapted over the years to include many bells and whistles such as embedded audio, video, and JavaScript. It is these features that chip away at the core purpose and <em>raison d&#8217;être</em> of the PDF standard.</p>
<p><strong>An example: Font Issues</strong></p>
<p>A simple example of the weakness of these extended PDF features is the humble text font. When your application generates a PDF document, it has the option of using the 14 standard PDF fonts, local machine fonts, or embedded TTF or PostScript fonts.</p>
<blockquote><p>There are 14 standard fonts that should be available by default in each PDF reader. These fonts are Courier, Courier Bold, Courier Italic (Oblique), Courier Bold and Italic, Helvetica, Helvetica Bold, Helvetica Italic (Oblique), Helvetica Bold and Italic, Times Roman, Times Roman Bold, Times Roman Italic, Times Roman Bold and Italic, Symbol and ZapfDingBats® (<a href="http://itextdocs.lowagie.com/tutorial/fonts/index.php">source</a>)</p></blockquote>
<p>Guess what happens when you set your document in <em><a href="http://new.myfonts.com/fonts/linotype/itc-mona-lisa/">Mona Lisa Solid ITC</a></em>, print it to PDF, and send it to all of your colleagues? Do their machines have a copy of this font? Maybe, and maybe not. If the font is neither embedded in the file nor installed on the reader&#8217;s machine, the viewer has to substitute something else.</p>
<p>As I was writing this, I planned on putting together a cute demo by saving a document set in Mona Lisa Solid ITC in PDF from my Mac and then opening it on a PC. Much to my surprise (and delight), I found that the default &#8220;Print to PDF&#8221; functionality on my Mac does, in fact, embed the font within the document.</p>
<p>Regardless, if you have always just trusted that the fonts would be identical across platforms, you could get quite a surprise when your friend tries to print your beautiful document.</p>
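<p>If you would rather check than trust, here is a minimal Python sketch of one rough test. It assumes that an embedded font shows up in the raw file as a <code>/FontFile</code>, <code>/FontFile2</code>, or <code>/FontFile3</code> key (the entries a PDF font descriptor uses to point at an embedded font program). The function names are my own invention, and the scan is only a heuristic: newer PDFs can tuck these keys inside compressed object streams where a plain byte search will not see them, and a hit tells you that <em>some</em> font is embedded, not that <em>every</em> font is.</p>

```python
import re
from pathlib import Path

# Keys a font descriptor uses to reference an embedded font program:
# FontFile (Type 1), FontFile2 (TrueType), FontFile3 (other, e.g. CFF).
EMBEDDED_FONT_KEY = re.compile(rb"/FontFile[23]?\b")


def count_embedded_font_refs(pdf_bytes):
    """Count embedded-font keys visible in the raw, uncompressed PDF bytes."""
    return len(EMBEDDED_FONT_KEY.findall(pdf_bytes))


def looks_like_fonts_embedded(pdf_bytes):
    """Heuristic only: True if at least one embedded-font key is visible."""
    return EMBEDDED_FONT_KEY.search(pdf_bytes) is not None


def file_looks_like_fonts_embedded(path):
    """Convenience wrapper (hypothetical helper) that reads a file from disk."""
    return looks_like_fonts_embedded(Path(path).read_bytes())
```

<p>Calling <code>file_looks_like_fonts_embedded("my-document.pdf")</code> on a file of your own (the file name is just a placeholder) gives a quick yes or no. A &#8220;no&#8221; is worth a second look in a real PDF tool before you hit send.</p>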
<p><strong>PDF/A Standard</strong></p>
<p>Some time back, Adobe and the wider industry recognized the need for a more tightly controlled standard, one for creating <em>really portable</em> documents instead of merely <em>portable</em> documents. This standard, published by ISO in 2005 as ISO 19005-1, is referred to as <a href="http://en.wikipedia.org/wiki/PDF/A">PDF/A</a>, where the A stands for Archive.</p>
<blockquote><p>A key element to &#8230; reproducibility is the requirement for PDF/A documents to be 100 % self-contained. All of the information necessary for displaying the document in the same manner every time is embedded in the file. This includes, but is not limited to, all content (text, raster images and vector graphics), fonts, and color information. A PDF/A document is not permitted to be reliant on information from external sources (e.g. font programs and hyperlinks). (<a href="http://en.wikipedia.org/wiki/PDF/A#Description">Wikipedia</a>)</p></blockquote>
<p>Basically PDF/A forbids all of the flashy stuff and sticks to the basics: good solid document rendering.</p>
<p>Banned features include:</p>
<ul>
<li>Audio and Video</li>
<li>JavaScript</li>
<li>Encryption</li>
<li>Nonstandard metadata</li>
<li>Transparent images</li>
</ul>
<p>In addition to the loss of several features, PDF/A documents can be somewhat larger, due to the embedded fonts, and they might have rendering issues with images that depend on transparency.</p>
<p>With all that, it still sounds like an enticing concept. Many PDF tools speak fluent PDF/A. Check out your own toolkit and see if you can future-proof your documents a little more.</p>
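<p>To get a feel for which of your existing files lean on the banned features listed above, a crude first pass can be sketched in a few lines of Python. This is emphatically not a PDF/A validator. It only searches the raw bytes for a handful of PDF name tokens that I am assuming hint at each feature (<code>/JavaScript</code> and <code>/JS</code>, <code>/Encrypt</code>, <code>/Sound</code> and <code>/Movie</code>, <code>/SMask</code>). The marker table and function name are hypothetical, and tokens hidden inside compressed streams will be missed, so a clean result proves nothing.</p>

```python
# Hypothetical table: banned feature -> raw name tokens that hint at it.
SUSPECT_MARKERS = {
    "JavaScript": (b"/JavaScript", b"/JS"),
    "Encryption": (b"/Encrypt",),
    "Audio and video": (b"/Sound", b"/Movie"),
    "Transparency": (b"/SMask",),
}


def flag_suspect_features(pdf_bytes):
    """Return a sorted list of features whose marker tokens appear in the bytes."""
    found = set()
    for feature, markers in SUSPECT_MARKERS.items():
        # A plain substring test: cheap, but it can miss or over-report.
        if any(marker in pdf_bytes for marker in markers):
            found.add(feature)
    return sorted(found)
```

<p>Feed it the contents of a file opened in binary mode and treat anything it flags as a reason to run the document through a proper PDF/A conversion or preflight tool.</p>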
<p><strong>Here&#8217;s more on PDF/A documents</strong></p>
<p><a href="http://blog.nitropdf.com/index.php/2009/07/13/longterm-digital-archiving-pdfa/">Long-term digital archiving with PDF/A</a> (The PDF Blog)<br />
<a href="http://en.wikipedia.org/wiki/PDF/A">PDF/A</a> (Wikipedia)<br />
<a href="http://www.pdfa.org/doku.php?id=pdfa:en:pdfa_whitepaper">PDF/A &#8211; A new Standard for Long-Term Archiving</a> (PDF/A Competence Center)</p>
]]></content:encoded>
			<wfw:commentRss>http://paperjammed.com/2009/09/29/are-your-portable-document-format-files-all-that/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

