<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Paper Jammed &#187; Software</title>
	<atom:link href="http://paperjammed.com/category/software/feed/" rel="self" type="application/rss+xml" />
	<link>http://paperjammed.com</link>
	<description>Has paper taken over your life?</description>
	<lastBuildDate>Wed, 30 Jun 2010 02:14:53 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Sort out those disorganized thoughts with a Mind Map</title>
		<link>http://paperjammed.com/2010/05/27/sort-out-those-disorganized-thoughts-with-a-mind-map/</link>
		<comments>http://paperjammed.com/2010/05/27/sort-out-those-disorganized-thoughts-with-a-mind-map/#comments</comments>
		<pubDate>Fri, 28 May 2010 01:35:37 +0000</pubDate>
		<dc:creator>Tad</dc:creator>
				<category><![CDATA[Software]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[Knowledge Management]]></category>
		<category><![CDATA[Macintosh]]></category>
		<category><![CDATA[Tips]]></category>
		<category><![CDATA[Windows]]></category>

		<guid isPermaLink="false">http://paperjammed.com/?p=997</guid>
		<description><![CDATA[You know the feeling: you are involved in some intractable problem that has all kinds of weird angles and you just can’t get your head around it—perhaps you feel like you are inspecting an elephant, one square inch at a time, or maybe you simply feel like you are herding cats.
There are plenty of different [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-medium wp-image-999" src="http://paperjammed.com/wp-content/uploads/2010/05/iStock_000008990728XSmall-300x199.jpg" alt="iStockphoto" width="300" height="199" />You know the feeling: you are involved in some intractable problem that has all kinds of weird angles and you just can’t get your head around it—perhaps you feel like you are inspecting an elephant, one square inch at a time, or maybe you simply feel like you are herding cats.</p>
<p>There are plenty of different ways to catalog loosely associated knowledge of varying complexity—a few months back I discussed using a wiki for this—but some problems just don’t need that level of complexity and depth.</p>
<p>Some problems are more suited to random scribblings on a whiteboard, and that is where mind mapping software comes in.<span id="more-997"></span></p>
<p><strong>What&#8217;s a Mind Map?</strong></p>
<p>Imagine you and your family are going to fly to Rio de Janeiro this summer to visit distant relatives and you realize that there are about a thousand things to do in preparation but you just can&#8217;t sort it all out.</p>
<p>You know that you need passports and you need to verify that everyone&#8217;s visa is still valid. There is the monumental task of deciding what to pack. You might want to make a checklist of places you want to visit. And you want to go hang gliding down to the beach, but there&#8217;s something nagging at you about whether or not your health insurance would cover a broken leg in a foreign land.</p>
<p>The problem is that it is difficult to keep the whole thing in your mind—if you concentrate on the luggage, you forget about the international driver&#8217;s license.</p>
<p>Mind maps allow you to visualize the whole thing at once, and you can slide stuff around and get it looking nice and pretty.</p>
<p><img class="alignnone size-full wp-image-1004" src="http://paperjammed.com/wp-content/uploads/2010/05/20100527-mind-map-11.png" alt="" width="550" height="305" /></p>
<p>This is a relatively simple start at a mind map that represents the vacation. These maps are typically read from the top right going clockwise, though many have no specific sequence. In fact, you can do pretty much what you like with a mind map as long as it works for you.</p>
<p><strong>Just Another Outliner?</strong></p>
<p>At first glance these tools look like glorified outlining applications such as OmniOutliner, and they do serve admirably in this respect, but they are so much more. An outline gives you a very easy way to organize topics and thoughts, adding annotations and such along the way, but it isn&#8217;t nearly as easy to visualize and nonlinear concepts do not map well to an outline.</p>
<p>Consider the Rio Trip example above—you could put all of that information into an outline, but it would not be nearly as easy to process mentally.</p>
<p>And these mind maps look especially cool in presentations.</p>
<p><strong>Slick Document Generation</strong></p>
<p>Some of the commercial mind mapping products provide pretty good integration with Microsoft Office products.</p>
<p>Some time back, I needed to write up a set of style standards for Oracle&#8217;s PL/SQL programming language for our offshore team. Rather than just dive into Word and hope for the best, I used Mind Manager Pro from Mindjet to make the following map:</p>
<p><img class="alignnone size-full wp-image-1005" src="http://paperjammed.com/wp-content/uploads/2010/05/20100527-mind-map-21.png" alt="" width="550" height="331" /></p>
<p>This tool supported attaching rich text to nodes in the map, so I used these notes to handle the actual code examples.</p>
<p>I was then able to put together a nice Word template that matched our corporate documents and I clicked the <strong>Export to MS Word </strong>button and had an instant document:</p>
<p><img class="alignnone size-full wp-image-1006" src="http://paperjammed.com/wp-content/uploads/2010/05/20100527-mind-map-31.png" alt="" width="550" height="483" /></p>
<p>I have also used mind maps to auto-generate PowerPoint slide decks. These days I use Mind Manager on a daily basis at the office; it makes a great difference when I am trying to grasp complex topics with lots of strange dangly bits hanging off of the edges.</p>
<p><strong>Nothing New</strong></p>
<p>Mind maps have been around for a long time. A quick search of the &#8216;Net will show you that mind mapping software is quite plentiful and mature. There are good free products for PC and Mac available and there are many commercial products that take mind mapping a step further, often integrating with Microsoft Office.</p>
<p>Take a look at <a href="http://en.wikipedia.org/wiki/Mind_map">what Wikipedia has to say</a> about mind mapping. <a href="http://www.43folders.com/2006/09/17/mac-mind-mapping">Here&#8217;s a post</a> from 43 Folders on the same topic. Peter Russell also <a href="http://www.peterrussell.com/MindMaps/Uses.php">has useful information</a> about mind maps.</p>
<p>I learned about mind mapping from a friend at work who had been using mind maps for years. After seeing him make some quick notes during a meeting, I was sold.</p>
<p><strong>Closing Thoughts</strong></p>
<p>I made the Rio de Janeiro example using XMind, the free &#8220;lite&#8221; version of <a href="http://www.xmind.net/">XMindPro</a>. This application is pretty full featured for a free basic version—the Pro version adds enterprise features such as import/export and collaboration.</p>
<p>There are <a href="http://www.xmind.net/downloads/">XMind versions for Mac, PC, and Linux</a>.</p>
<p>If you are looking for more, all of the commercial products offer a trial period. I use MindJet&#8217;s <a href="http://www.mindjet.com/products/mindmanager-8-win/overview">Mind Manager</a>. Be warned, these tools are expensive, just like buying MS Office, but you might just find that they more than make up for their cost with your newly found productivity.</p>
]]></content:encoded>
			<wfw:commentRss>http://paperjammed.com/2010/05/27/sort-out-those-disorganized-thoughts-with-a-mind-map/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New life for an old PC—no geek card required</title>
		<link>http://paperjammed.com/2010/05/05/new-life-for-an-old-pc%e2%80%94no-geek-card-required/</link>
		<comments>http://paperjammed.com/2010/05/05/new-life-for-an-old-pc%e2%80%94no-geek-card-required/#comments</comments>
		<pubDate>Thu, 06 May 2010 01:52:22 +0000</pubDate>
		<dc:creator>Tad</dc:creator>
				<category><![CDATA[Paperless Life]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Backups]]></category>
		<category><![CDATA[Data Loss]]></category>
		<category><![CDATA[Geeky]]></category>
		<category><![CDATA[Good Sites]]></category>
		<category><![CDATA[Hardware]]></category>
		<category><![CDATA[Knowledge Management]]></category>
		<category><![CDATA[Networking]]></category>
		<category><![CDATA[Reviews]]></category>
		<category><![CDATA[Tips]]></category>

		<guid isPermaLink="false">http://paperjammed.com/?p=985</guid>
		<description><![CDATA[Do you still have an old machine kicking around in the basement or the back room, long forgotten?
For no cost and almost zero effort, you can set it up as a dedicated network appliance, using one of the many turnkey products from the open-source TurnKey Linux project.
I&#8217;m serious. You don&#8217;t need to know anything at [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-medium wp-image-986" src="http://paperjammed.com/wp-content/uploads/2010/05/iStock_000004973496XSmall-200x300.jpg" alt="istockphoto.com" width="200" height="300" />Do you still have an old machine kicking around in the basement or the back room, long forgotten?<br />
For no cost and almost zero effort, you can set it up as a dedicated network appliance, using one of the many turnkey products from the open-source TurnKey Linux project.</p>
<p>I&#8217;m serious. You don&#8217;t need to know anything at all about Linux to use one of these. Just download the image, install, and you suddenly have a full-featured NAS file server, or you might have a database or a source code repository.</p>
<p>Last year I wrote an article on <a href="http://paperjammed.com/2009/02/15/new-life-for-an-old-clunker/">how to set up a NAS device using Ubuntu Linux</a>. I have been a fan of Ubuntu since the start because it is a very easy distribution to install and configure. The downside of using Linux has always been the fairly steep learning curve. Before you can get around to using the server, you need to get down in the weeds with configuration files and other stuff.</p>
<p>TurnKey Linux changes all of that.<span id="more-985"></span></p>
<p><strong>Painless Installation</strong></p>
<p>A few weeks back, I was setting up an aging PC as a standalone wiki server for a small office—this machine was going to provide a place for the office staff to document their procedures, how-tos, and other things.</p>
<p>I was about to set up an Ubuntu server, as I have done before many times, and install MoinMoin, like I did <a href="http://paperjammed.com/2009/10/12/why-not-try-a-personal-wiki-for-some-of-your-more-amorphous-notes/">some months back</a>. I remembered that it was a bit of a pain to get everything tweaked just right, so I did a quick check to see what kind of standalone wiki options were available online.</p>
<p>This is how I found TurnKey Linux. This project is all about single-purpose preconfigured Ubuntu server images.</p>
<p>One of those preconfigured images happens to be a <a href="http://www.turnkeylinux.org/mediawiki">MediaWiki appliance</a>—the wiki engine behind Wikipedia—and I was in business.</p>
<p>The installation took about fifteen minutes, with very little user interaction. I answered a few basic questions and the installer took over from there. As soon as the install was done, the machine rebooted and displayed a message on the monitor listing the IP addresses that you can browse to from any other machine.</p>
<p><strong>Full Featured</strong></p>
<p>The work that has gone into these appliances is amazing. In fifteen minutes I had installed a complex configuration that includes Apache, PHP, MySQL, and the MediaWiki core, as well as maintenance utilities such as a neat tool that provides a <span style="text-decoration: line-through;">Flash-based</span> pure-AJAX-based SSH command line in a remote browser (i.e. your browser becomes a terminal). Even someone with Linux experience would have to spend quite a bit of time fiddling around with different packages and configuration options in order to provide the same functionality that TurnKey gives you out of the box.</p>
<p>As with most open source projects, the documentation is about 80% complete, with deep detail in some areas, but leaving others fairly sparsely documented. But don&#8217;t let this deter you: in most cases users know how to use the product they are installing (e.g. MediaWiki) but don&#8217;t want the hassle of configuring it on Linux. That&#8217;s where TurnKey shines.</p>
<p><strong>Some Examples</strong></p>
<p>In minutes, you can set up a <a href="http://www.turnkeylinux.org/fileserver">NAS device</a>. If you want to try advanced content management in your office, try <a href="http://www.turnkeylinux.org/joomla">Joomla</a> or <a href="http://www.turnkeylinux.org/drupal6">Drupal</a>.</p>
<p>If you are working on a small project team and want to protect your source code, try <a href="http://www.turnkeylinux.org/redmine">Redmine</a> or <a href="http://www.turnkeylinux.org/trac">Trac</a> and do your bug tracking using <a href="http://www.turnkeylinux.org/bugzilla">Bugzilla</a>.</p>
<p>And while you are at it, you can document your organization&#8217;s working practices using a wiki such as <a href="http://www.turnkeylinux.org/moinmoin">MoinMoin</a> or <a href="http://www.turnkeylinux.org/mediawiki">MediaWiki</a>.</p>
<p><strong>Don&#8217;t forget to back it up!</strong></p>
<p>As with any computer, you should include your new TurnKey appliance in your backup strategy. The nice thing is that you don&#8217;t really need to care at all about backing up Linux or the other software; just back up the data. I don&#8217;t need to back up my entire MediaWiki machine; I just need to back up the database and image files. If anything goes wrong, you can rebuild the TurnKey appliance from scratch in minutes and then restore your data.</p>
<p>To save yourself some pain, keep notes on any small tweaks you made to the configuration.</p>
<p><strong>One Machine, One Purpose</strong></p>
<p>These disk images share common Ubuntu underpinnings, but they are referred to as Appliances because they turn your PC into a purpose-built appliance.</p>
<p>This means that if you want a content management system and you also want a ticket management system, you will need two old computers—not a rare commodity these days.</p>
<p>Take a look at <a href="http://www.turnkeylinux.org/">what they have to offer</a> and give TurnKey a shot—specialized software used in corporate environments is now within reach of small offices at the right price.</p>
]]></content:encoded>
			<wfw:commentRss>http://paperjammed.com/2010/05/05/new-life-for-an-old-pc%e2%80%94no-geek-card-required/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>A handful of sweet freebie tools to save the day</title>
		<link>http://paperjammed.com/2010/03/16/a-handful-of-sweet-freebie-tools-to-save-the-day/</link>
		<comments>http://paperjammed.com/2010/03/16/a-handful-of-sweet-freebie-tools-to-save-the-day/#comments</comments>
		<pubDate>Wed, 17 Mar 2010 03:31:14 +0000</pubDate>
		<dc:creator>Tad</dc:creator>
				<category><![CDATA[Searching and Indexing]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Workflow]]></category>
		<category><![CDATA[Geeky]]></category>
		<category><![CDATA[Macros]]></category>
		<category><![CDATA[PDF]]></category>
		<category><![CDATA[Scripting]]></category>
		<category><![CDATA[Tips]]></category>
		<category><![CDATA[Windows]]></category>

		<guid isPermaLink="false">http://paperjammed.com/?p=930</guid>
		<description><![CDATA[It so happens that my employer has made a most welcome decision to replace the aging creaky old Novell GroupWise mail software with Microsoft Outlook, joining the rest of the modern corporate world. Now, there is little love in my heart for GroupWise, but it does have one feature that the new Outlook configuration will [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-medium wp-image-935" title="iStock_000000846660XSmall" src="http://paperjammed.com/wp-content/uploads/2010/03/iStock_000000846660XSmall-300x199.jpg" alt="" width="300" height="199" />It so happens that my employer has made a most welcome decision to replace the aging creaky old Novell GroupWise mail software with Microsoft Outlook, joining the rest of the modern corporate world. Now, there is little love in my heart for GroupWise, but it does have one feature that the new Outlook configuration will lack: you can keep as many emails as you want, just like Gmail.</p>
<p>The problem is this: with Outlook we will be limited to 1000 messages in our in-box; sadly, many of us have tens of thousands of emails in our old GroupWise mail. Even after a fairly rigorous slash and burn mission, hacking out all of the low hanging fruit, there will be many thousands remaining and I don&#8217;t want to lose that information. It might be useful to search and find how I set up a Zebra bar code printer in 2003, no?</p>
<p>A bundle of different freeware glue tools came to my rescue. Read on to hear about the toolset that has made it so I can keep those messages for years to come.<span id="more-930"></span></p>
<p><strong>Possible Solutions</strong></p>
<p>Right out of the gate, I began looking for ways to migrate messages from one mail client to the other. Some apps have this built right in, and if not, there are scripts and utilities out there to do this; but I was hampered by a few key facts:</p>
<ul>
<li>I have no control over the email clients and their configuration. Even if there is a menu option for exporting GroupWise messages from version 7.2, I&#8217;m stuck at 6.4 and cannot use that option.</li>
<li>GroupWise is a minor player in the email world. I&#8217;m not sure if Outlook would import from GroupWise, but I doubt it.</li>
<li>They are <em>replacing</em> the client in one shot. There will be no interim period where both GroupWise and Outlook will be available.</li>
<li>There is no getting around the hard limit of 1000 messages.</li>
<li>I don&#8217;t want to spend money on this.</li>
</ul>
<p>With these constraints in mind, I immediately thought about PDF documents. I then considered the following questions:</p>
<ul>
<li>How do I convert my email to PDF?</li>
<li>How can I do this automatically with thousands of emails?</li>
<li>Once I&#8217;m done, how do I search these documents?</li>
</ul>
<p>Here&#8217;s what I did:</p>
<p><strong>Conversion to PDF</strong></p>
<p>The first part was easy. I downloaded one of the many free print-to-PDF products available.</p>
<p>I chose <a href="http://sourceforge.net/projects/pdfcreator/">PDFCreator</a>, because I am familiar with its use and I know that it <a href="http://paperjammed.com/2009/10/27/dodged-the-corrupt-document-bullet-this-time-just-barely/">does not munge the fonts</a>.</p>
<p>Like many other PDF generation utilities, PDFCreator functions by providing a virtual printer to which any application can print. For example, to make a PDF of a web page, you use the Firefox <strong>Print</strong> menu and select <strong>PDFCreator</strong> from the drop-down list of available printers.</p>
<p>You are provided with a list of metadata fields that you can fill in, and these fields are used in the PDF generation.</p>
<p>Here&#8217;s what the PDFCreator screen looks like:</p>
<p><img class="alignnone size-full wp-image-931" title="20100316-pdfcreator1" src="http://paperjammed.com/wp-content/uploads/2010/03/20100316-pdfcreator1.gif" alt="" width="500" height="367" /></p>
<p><strong>A word of caution:</strong> PDFCreator is free, but you must be careful to deselect their spammy toolbar options in two different places during the installation process. I don&#8217;t like software that comes with preselected toolbars to install (even nice ones like Google&#8217;s) because I&#8217;m certain that 95% of the folks who actually install the toolbar would never have chosen to do so if it were unchecked by default.</p>
<p><strong>Running Everything Automatically</strong></p>
<p>This was the interesting bit. I use Windows machines at work, so there was no AppleScript option available. So I did the next best thing: I used <a href="http://www.autoitscript.com/autoit3/index.shtml">AutoIT</a>.</p>
<p>I will warn you that AutoIT is pretty much the Windows analog of AppleScript, without the cutesy pseudo-English syntax. In other words, you will need to roll up your sleeves and get your hands a little dirty in order to put together a decent AutoIT script.</p>
<p>The payoff comes when you finish your work and compile it into a tight executable that you can share with your friends, allowing them to automate some complex series of button clicks and copy/paste operations.</p>
<p>I walked through the manual process of exporting an email to PDF and listed each action:</p>
<ul>
<li>Get the date, sender, and subject</li>
<li>Create a filename based on date + sender + subject</li>
<li>Launch the <strong>Print</strong> dialog</li>
<li>Select <strong>PDFCreator</strong></li>
<li>Fill in the <strong>Document Title</strong>, <strong>Creation Date</strong>, and <strong>Subject</strong> in the PDFCreator dialog</li>
<li>Fill in the full file path in the Save dialog</li>
</ul>
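<p>As a rough illustration of the first two steps, here is a minimal Python sketch (not the AutoIT script itself) that builds a filename from date + sender + subject. The function name, the date format, and the separator are assumptions made for this example, and the values in the last line are sample data.</p>

```python
from datetime import datetime


def build_pdf_name(sent: datetime, sender: str, subject: str) -> str:
    """Combine date, sender, and subject into one PDF filename.

    Hypothetical helper: the date format and the " - " separator are
    illustrative choices, not taken from the original script.
    """
    # A sortable date prefix keeps the files in chronological order.
    stamp = sent.strftime("%Y-%m-%d %H%M")
    return f"{stamp} - {sender.strip()} - {subject.strip()}.pdf"


# Sample values only, to show the shape of the result.
print(build_pdf_name(datetime(2003, 4, 1, 9, 30), "Tad", "Zebra bar code printer setup"))
```

<p>Putting the date first means a plain alphabetical sort of the folder is also a chronological one.</p>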
<p>In addition, I wanted to make the script a little better by adding the following:</p>
<ul>
<li>Check that user has PDFCreator installed</li>
<li>Verify that GroupWise is running and that the user has selected one or more messages</li>
<li>Prompt the user for a target directory before processing the messages</li>
<li>Sanitize the filenames by replacing illegal characters with underscores and truncating to meet maximum filename and path length in Windows</li>
<li>Skip over files that have already been generated, quickly, so that one doesn&#8217;t need to worry about accidentally selecting messages that were already printed</li>
</ul>
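<p>The filename clean-up and the skip-if-already-done check from the list above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the AutoIT code: the helper names are hypothetical, and the 259-character budget assumes the classic Windows path limit.</p>

```python
import re
from pathlib import Path

# Characters Windows does not allow in filenames, plus control characters.
ILLEGAL_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

# Assumption: the classic Windows MAX_PATH of 260 includes a terminating
# null, which leaves 259 usable characters for the full path.
MAX_PATH_LENGTH = 259


def sanitize_filename(name: str, target_dir: Path, suffix: str = ".pdf") -> str:
    """Replace illegal characters with underscores and truncate to fit."""
    cleaned = ILLEGAL_CHARS.sub("_", name)
    # Leave room for the directory, one path separator, and the extension.
    room = MAX_PATH_LENGTH - len(str(target_dir)) - 1 - len(suffix)
    return cleaned[:max(room, 1)] + suffix


def needs_export(name: str, target_dir: Path) -> bool:
    """Return False when the PDF already exists, so reruns stay quick."""
    return not (target_dir / sanitize_filename(name, target_dir)).exists()
```

<p>Checking for the finished file before doing any work is what makes it safe to select a batch of messages that overlaps with an earlier run.</p>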
<p>There were other adjustments needed, but the process was the same: run the script, hit a problem, tweak the script a little to address the problem, and repeat.</p>
<p>Here&#8217;s a little bit of the AutoIT script:</p>
<p><img class="size-full wp-image-943 alignnone" title="20100316-autoit" src="http://paperjammed.com/wp-content/uploads/2010/03/20100316-autoit.gif" alt="" width="500" height="345" /></p>
<p>You can see that it is a bit more intense than AppleScript, but remember that the full script wasn&#8217;t written in one go. I started with a short ten-line script and kept tweaking it as small problems cropped up until I had adjusted things to my liking.</p>
<p>Note that this is a GUI macro language. The machine starts clicking and typing away right in front of you and you probably shouldn&#8217;t interfere until your script finishes.</p>
<p>As of this afternoon, I have generated around 4,000 PDF documents for my email messages.</p>
<p><strong>Searching All of Those Documents</strong></p>
<p>This was the easiest part. These days there is an excellent tool available for searching documents on your desktop: <a href="http://desktop.google.com/">Google Desktop</a>. This product indexes every useful file on your desktop and provides a full Google search with a quick double-tap of the &lt;control&gt; key.</p>
<p>So you can enter a search like &#8220;Zebra bar code&#8221;</p>
<p><img class="alignnone size-full wp-image-944" title="20100316-google1" src="http://paperjammed.com/wp-content/uploads/2010/03/20100316-google1.gif" alt="" width="300" height="205" /></p>
<p>And the results look exactly like a Google web search, but it&#8217;s showing your desktop files. And you can see inline previews too.</p>
<p><img class="alignnone size-full wp-image-945" title="20100316-google2" src="http://paperjammed.com/wp-content/uploads/2010/03/20100316-google2.gif" alt="" width="500" height="443" /></p>
<p>Macintosh users can install Google Desktop as well, but all of these files should already be indexed and searchable by Spotlight.</p>
<p><strong>Closing Thoughts</strong></p>
<p>Whenever I reach for tools like this I feel a twinge of guilt—it&#8217;s outright hackery, isn&#8217;t it?</p>
<p>But there is a place for quick and dirty jobs in every workplace. I needed to get my files from one place to another, one time only. It just didn&#8217;t make sense to spend money or time on a more elegant solution.</p>
<p>Play around with each of these tools a little. Especially AutoIT—it&#8217;s a handy Swiss Army Knife to have at your disposal.</p>
]]></content:encoded>
			<wfw:commentRss>http://paperjammed.com/2010/03/16/a-handful-of-sweet-freebie-tools-to-save-the-day/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Don&#8217;t punish your family with stacks of photos!</title>
		<link>http://paperjammed.com/2010/02/24/dont-punish-your-family-with-stacks-of-photos/</link>
		<comments>http://paperjammed.com/2010/02/24/dont-punish-your-family-with-stacks-of-photos/#comments</comments>
		<pubDate>Thu, 25 Feb 2010 04:18:47 +0000</pubDate>
		<dc:creator>Tad</dc:creator>
				<category><![CDATA[Paperless Life]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Buying guide]]></category>
		<category><![CDATA[Online Services]]></category>
		<category><![CDATA[Photos]]></category>
		<category><![CDATA[Printing]]></category>
		<category><![CDATA[Reviews]]></category>

		<guid isPermaLink="false">http://paperjammed.com/?p=890</guid>
		<description><![CDATA[A while back I had a rare opportunity indeed: I went on a business trip to India to visit our offshore team. We knew it was a once in a lifetime trip, so four of us took some vacation days and paid our own way on a side trip to some of the great cities [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-medium wp-image-897" title="iStock_000000110397XSmall" src="http://paperjammed.com/wp-content/uploads/2010/02/iStock_000000110397XSmall-225x300.jpg" alt="iStockphoto" width="225" height="300" />A while back I had a rare opportunity indeed: I went on a business trip to India to visit our offshore team. We knew it was a once in a lifetime trip, so four of us took some vacation days and paid our own way on a side trip to some of the great cities of India after the business was done. When we finally sat down to pool our collection on layover in Frankfurt, there were over 1,500 photos.</p>
<p>What do you do with 1,500 photographs?</p>
<p>In hope of sparing some folks hours of boredom I&#8217;d like to share my ideas on this topic here.<span id="more-890"></span></p>
<p><strong>The Endless Stack of Photos</strong></p>
<p>We have all been there. A friend or family member brandishes a stack of photos, saying &#8220;Let me show you photos of our trip to Ecuador&#8230;&#8221; (oh no&#8230; here it comes&#8230;). At this point, you reach for the photographs, but they hold the stack out of reach. They then turn each one over slowly, telling a long tale about every single image. &#8220;Oh, look at this monkey, it was so cute when he stole the candy out of little Billy&#8217;s hand and spit it into Aunt Sally&#8217;s hair.&#8221; and so on and so on.</p>
<p>You begin to look at the size of the stack and estimate how long this process will take.</p>
<p>Everyone has been on the receiving end of this treatment, but have you ever been the perpetrator?</p>
<p>It&#8217;s an easy trap to fall into. To be honest, when you are showing photos to a friend, each image brings back a wave of pleasant memories and it is tempting to bask in the enjoyment of the memory, talking about how you felt at the time, as your friend&#8217;s eyes begin to acquire a glossy sheen.</p>
<p><strong>Don&#8217;t Be <em>That Person</em></strong></p>
<p>Remember how it felt the last time you endured a four hour photo flipping marathon and have pity on those around you.</p>
<p>Here&#8217;s my strategy for pleasant photo sharing:</p>
<ul>
<li>Pare down the photo collection. Substantially.</li>
<li>Create an attractive photo album with the finest photos of the lot.</li>
<li>Hand the album to your friend and <em>let them turn the pages</em>.</li>
</ul>
<p>Back to those 1,500 photos from India&#8230;</p>
<p>Here is one of the pages from the album I made from that trip.</p>
<p><img class="alignnone size-full wp-image-903" src="http://paperjammed.com/wp-content/uploads/2010/02/20100224-album1.jpg" alt="" width="535" height="413" /></p>
<p>As you can see, the end result is fairly simple, with a few very nice photos.</p>
<p><strong>Paring Down the Stack</strong></p>
<p>Even the most avid photographer understands that <em>nobody</em> wants to see a thousand photos. And don&#8217;t think that just because you made a slideshow with music instead of printing the photos that you are exempt. You can afford to cut the number down quite a bit.</p>
<p>Consider that a typical Hollywood motion picture contains less than 10 percent of the total footage filmed. Stanley Kubrick, ever the perfectionist, took this to the extreme with shooting ratios around 100 to 1. Following the analogy, in photography it is quite reasonable to take dozens of photos for every single picture that you might share with others.</p>
<p>The real trick is deciding exactly how far to go with the selection process.</p>
<p>In my experience, you can weed out the bad photos for hours, and when you think the job is done, you can still go back and toss out a few dozen more.</p>
<p>I filter my photos in three major phases, using the five-star rating tool of my photo library software to help keep things in order. I personally use iPhoto, but any other good photo library suite should offer ratings and smart folders.</p>
<p><strong>Phase 1: Removing obviously bad photos</strong></p>
<p>This is a very quick pass through the whole collection. I start by selecting everything and marking all photos with a neutral rating of three stars.</p>
<p>I then find any photos that are underexposed or blurry and give them a single star. Along the way, any photos that obviously have no useful content get the same treatment.</p>
<p><strong>Phase 2: Identifying decent photos</strong></p>
<p>I use a smart folder to show all photos with three stars or greater. This hides all of the junk from the first pass.</p>
<p>Now, I go through each photo and give it a deeper look. I sort them into three different stacks, giving two stars to anything that has useless or boring content and giving four stars to photos that I think are worth showing to people. Photos that don&#8217;t fit either description retain the neutral three-star rating. These are often repeats of the one good photo I tossed in the four-star stack.</p>
<p><strong>Phase 3: Identifying the best of the best</strong></p>
<p>I use a new smart folder to show only photos with four stars or better.</p>
<p>This is the hardest part. I go through the photos and try to find the absolute best photo that expresses each experience or thought, and I promote that one to five stars.</p>
<ul>
<li>If the photos are of places, then it makes sense to check that you have at least one photo of each important place you visited.</li>
<li>If these are snapshots of friends and family, then you probably should verify that each person shows up in at least one of the pictures.</li>
</ul>
<p>I often have a difficult time working through photo collections from visits to my wife&#8217;s family in Brazil: there are hundreds of people in the pictures and I often have doubts over who is family and who isn&#8217;t. Fortunately, my wife sits patiently with me and helps at this stage.</p>
<p>Look again at the photos of the elephant ride and the snake charmer. I probably have two dozen different shots of the snake charmers, while the elephant shot was a single dodgy photo taken by the tour guide. I was able to pick the very best snake charmer photo, but I had little choice with the other—there was no way I was going to omit a picture of me on an elephant so I used it. These are the kinds of tradeoffs we are dealing with.</p>
<p><img class="alignnone size-full wp-image-896" src="http://paperjammed.com/wp-content/uploads/2010/02/20100224-rating-photos.jpg" alt="" width="535" height="350" /></p>
<p>You can see in this screenshot how I flagged most of the photos as three stars (I have already hidden all of the one-star dreck). There is one photo that has four stars, while the one next to it had an unappealing composition in my opinion, so it got two.</p>
<p><strong>How far should you go?</strong></p>
<p>I suppose this comes down to personal preference, but I like to keep things down to thirty or forty photos—from a starting point in the thousands. In a photo album, you can represent your whole trip in fifteen or twenty pages. This is far less intimidating than a big thick stack of photographs.</p>
<p>Of course, context is important. I will go out on a limb here and say that a new baby can be shown to co-workers in five photos.</p>
<p>In the end, I create a smart folder that shows only photos with five stars. And boy do they look good!</p>
<p><strong>Tweak the best photos</strong></p>
<p>My favorite tool of all for tweaking photos is the crop tool. A good crop can dramatically change the composition of a shot while still retaining the purity of the photo.</p>
<p>I will also straighten any slanting horizons and possibly fix funky light balance at this point. The tool set provided in iPhoto is quite adequate for these simple tasks.</p>
<p>Now create a slick photo album using any of the great tools available online.</p>
<p><strong>Make the Photo Album</strong></p>
<p>Again, I like using iPhoto. It allows you to easily create fancy albums using templates and so forth. Once you are done, you can buy a finished album with a few clicks.</p>
<p>Once you have your short list of photos, use a five-star smart folder as the source for the photo album. You can then spend a pleasant evening or two playing around with the layouts and composition and adding captions to your photos.</p>
<ul>
<li>Create a smart folder that shows only five-star photos</li>
<li>Create a new photo album based on that smart folder</li>
<li>Choose a pleasing layout</li>
<li>Add your photos in varying page styles to the book</li>
<li>Write some informative and/or witty captions for the photos</li>
</ul>
<p><img class="alignnone size-full wp-image-906" src="http://paperjammed.com/wp-content/uploads/2010/02/20100224-iphoto-edit2.jpg" alt="" width="535" height="306" /></p>
<p>This is a screenshot of the iPhoto application, with the A-list photos along the top and the snake charmer page in the editing window.</p>
<p>You can even make albums like this online, without any editing software whatsoever&#8230;</p>
<p><img class="alignnone size-full wp-image-907" src="http://paperjammed.com/wp-content/uploads/2010/02/20100224-winkflash.jpg" alt="" width="535" height="540" /></p>
<p>This is a similar photo album creation tool that is run completely from the <a href="http://www.winkflash.com/">Winkflash</a> website.</p>
<p>Now, finish picking out the features you want on your album (e.g. cover style) and place your order.<br />
In my opinion, these albums come with the optimal number of pages (usually 20). Any more pages could make it tedious and boring. Fit your vacation into those 20 pages.</p>
<p>These books usually cost around forty bucks, but they are worth every penny.</p>
<p><strong>The Finished Product</strong></p>
<p>When everything is done, you will have a beautiful printed photo album that looks like you bought it at a book store.</p>
<p>I have seen photo albums from three different outfits up close.</p>
<p><em>MyPublisher</em></p>
<p>I have a couple of albums from these guys and they are near perfect. The pages look like thick magazine pages, with magazine-quality photos. I found that my white-on-black text bled a little.<br />
Note that the leather used on the leather-bound books is paper thin.</p>
<p>These folks are always sending me coupons for 40% off, so it seems that you really don&#8217;t ever have to pay full price for their wares.</p>
<p><em>Apple iPhoto</em></p>
<p>These are identical to the MyPublisher books. Indeed, at one time iPhoto could send its output to either MyPublisher or Apple. I heard a rumor that MyPublisher did the books for Apple at one point.</p>
<p>The only downside to the Apple books is that you pay high shipping costs. Otherwise, the books are perfect.</p>
<p><em>Winkflash</em></p>
<p>A friend shared one of these with me. I happened to have my India album nearby, so we compared them. On the one hand, Winkflash is much cheaper; however, the image quality is not nearly as nice as the other books. Perhaps I was looking at a lower-end book from them, but the ink dots were a little coarse for my taste.</p>
<p><strong>Summary</strong></p>
<p>By putting a little effort into aggressive photo selection and basic image tweaks, and then taking advantage of the many photo book tools out there, you can create a beautiful album that is a pleasure to leaf through.</p>
<p>These books have been available for several years; even so, whenever I hand one to a friend, they page through it, transfixed. People really love these albums and they actually enjoy looking through them.</p>
<p>Oh, and they make great gifts too!</p>
<p>What does this topic have to do with reducing paper in our lives? Believe me, printing one of these books is so much neater and cleaner than printing hundreds of loose photos. And you will enjoy them more.</p>
]]></content:encoded>
			<wfw:commentRss>http://paperjammed.com/2010/02/24/dont-punish-your-family-with-stacks-of-photos/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Automate ScanSnap OCR process on your Mac with AppleScript (Snow Leopard Edition)</title>
		<link>http://paperjammed.com/2010/01/04/automate-scansnap-ocr-process-on-your-mac-with-applescript-snow-leopard-edition/</link>
		<comments>http://paperjammed.com/2010/01/04/automate-scansnap-ocr-process-on-your-mac-with-applescript-snow-leopard-edition/#comments</comments>
		<pubDate>Tue, 05 Jan 2010 01:51:52 +0000</pubDate>
		<dc:creator>Tad</dc:creator>
				<category><![CDATA[Software]]></category>
		<category><![CDATA[Workflow]]></category>
		<category><![CDATA[Geeky]]></category>
		<category><![CDATA[PDF]]></category>
		<category><![CDATA[Scanning]]></category>
		<category><![CDATA[Scripting]]></category>
		<category><![CDATA[Searching and Indexing]]></category>

		<guid isPermaLink="false">http://paperjammed.com/?p=840</guid>
		<description><![CDATA[Some time back I published an AppleScript that allows one to automatically run OCR in the background on scanned files generated by your Fujitsu ScanSnap, while you continue scanning more files. ScanSnap owners should all be familiar with this: the out-of-the-box configuration of the ScanSnap Manager and Abbyy Finereader forces the scan and OCR [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://paperjammed.com/wp-content/uploads/2009/08/20090829-applescript.gif"><img class="alignright size-full wp-image-658" title="20090829-applescript" src="http://paperjammed.com/wp-content/uploads/2009/08/20090829-applescript.gif" alt="" width="128" height="128" /></a>Some time back I published an AppleScript that allows one to <a href="http://paperjammed.com/2009/08/29/automate-scansnap-ocr-process-on-your-mac-with-applescript/">automatically run OCR in the background on scanned files</a> generated by your Fujitsu ScanSnap, while you continue scanning more files. ScanSnap owners should all be familiar with this: the out-of-the-box configuration of the ScanSnap Manager and Abbyy Finereader forces the scan and OCR stages to run in lockstep: scan 1&#8230;OCR 1&#8230;scan 2&#8230;OCR 2&#8230; and so on. This script allowed you to scan regardless of the OCR processing going on.</p>
<p>As it turns out, my original script does not work in Snow Leopard, and I promised that I would one day clean up and publish my new and improved version.</p>
<p>Chris posted a comment today as a gentle reminder, so here is the new and improved version without further delay&#8230;<br />
<span id="more-840"></span><br />
<strong>The Details</strong></p>
<p>Unfortunately, Snow Leopard came around <a href="http://paperjammed.com/2009/09/07/when-migrating-to-a-new-operating-system-look-before-you-leap/">and caused some indigestion</a>. For starters, the ScanSnap Manager didn&#8217;t work correctly and Abbyy Finereader would not process anything made by the ScanSnap. A couple of months later <a href="http://paperjammed.com/2009/11/13/snow-leopard-update-for-scansnap/">they got everything straightened out</a> and delivered <a href="http://www.fujitsu.com/us/services/computing/peripherals/scanners/support/sl_download.html">new versions of each product</a>.</p>
<p>The new version of the Abbyy Finereader product does not play well with my original script.</p>
<p>Since I cannot do without this important functionality, I rolled up my sleeves and rewrote most of the script. The new version works quite nicely in Snow Leopard, with one small annoyance: the new Finereader version keeps bouncing its darned icon the whole time it is running, which is quite annoying to watch. You really don&#8217;t want to use the machine for anything other than scanning or OCR while a job is going.</p>
<p>Fortunately, I really don&#8217;t need to use my machine for anything else while it is chewing on the docs; I just wanted to be able to continue scanning at the same time!</p>
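<p>For readers who want the gist before wading into the full listing, here is a minimal sketch of the lockfile guard that lets only one copy of the script feed files to the OCR software at a time. It is an illustration under assumptions, not the published script: the handler name runExclusively and the lock name &quot;Demo OCR Lock&quot; are made up for this example, and the one-second delay merely stands in for the real work.</p>
<pre>
-- Illustrative sketch only: runExclusively and demoLockFileName are hypothetical names
property demoLockFileName : &quot;Demo OCR Lock&quot;

on runExclusively()
    set lockFilePath to (POSIX path of (path to desktop folder)) &amp; demoLockFileName
    tell application &quot;System Events&quot; to set lockFileExists to exists file lockFilePath
    -- Another run already owns the queue, so leave the new files to it
    if lockFileExists then return false
    do shell script &quot;/usr/bin/touch &quot; &amp; quoted form of lockFilePath
    try
        -- Stand-in for the real work: process the queued files one at a time
        delay 1
    end try
    -- Always release the lock, even if the work above failed
    do shell script &quot;/bin/rm -f &quot; &amp; quoted form of lockFilePath
    return true
end runExclusively
</pre>
<p>The point of the design is that every new scan fires the folder action, but only the first run does any work; later runs see the lockfile and exit, and the first run picks up their files on its next trip through the loop.</p>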
<p><strong>Note: </strong>Before going forward, you will need to upgrade the ScanSnap Manager and Abbyy Finereader to the Snow Leopard versions! Get the files <a href="http://www.fujitsu.com/us/services/computing/peripherals/scanners/support/sl_download.html">here</a>.</p>
<p>Here is a link to the <a href="http://paperjammed.com/wp-content/uploads/2010/01/Run-OCR-on-New-Folder-Items.scpt">new script</a>&#8230;</p>
<p>And here&#8217;s the code itself:</p>
<div class="codecolorer-container applescript default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;height:300px;"><table cellspacing="0" cellpadding="0"><tbody><tr><td><div class="applescript codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #808080; font-style: italic;">(*<br />
<br />
NOTE: This script was written for Snow Leopard. It may work<br />
on Leopard, but I never tried it.<br />
<br />
This is a folder listener script that will act as a queue, receiving<br />
PDF files from the ScanSnap scanner and feeding them, one by one, to<br />
the Abbyy FineReader OCR software.<br />
<br />
This allows you to keep scanning while the OCR job runs in the background<br />
on all of the unprocessed files.<br />
<br />
Why do we want to do this?<br />
<br />
The ScanSnap Manager software does not support this by default, so<br />
when you scan in a file, it sends it to FineReader for OCR. You then<br />
must wait until FineReader finishes its work before scanning in another<br />
document.<br />
<br />
This script allows you to keep scanning without waiting for OCR.<br />
<br />
Installation:<br />
<br />
o &nbsp; Copy this script to:<br />
<br />
&nbsp; &nbsp; &lt;home&gt;/Library/Scripts/Folder Action Scripts<br />
<br />
&nbsp; &nbsp; You may have to create the &quot;Folder Action Scripts&quot; folder.<br />
<br />
o &nbsp; Open a Finder window and navigate to the parent folder<br />
&nbsp; of the scanned documents folder.<br />
<br />
o Right click (control-click) the scanned documents folder and<br />
&nbsp; choose:<br />
<br />
&nbsp; &nbsp; Folder Actions Setup...<br />
<br />
o At this point if folder actions are not enabled, you will<br />
&nbsp; likely have to enable them and add the script manually.<br />
&nbsp; &nbsp; - check &quot;Enable Folder Actions&quot;<br />
&nbsp; &nbsp; - Use the &quot;+&quot; buttons on the left and right sides to add the<br />
&nbsp; &nbsp; &nbsp; scan folder and then this script.<br />
&nbsp; &nbsp; <br />
o Otherwise, a list of scripts will come up. Choose this script<br />
&nbsp; from the &quot;Choose a Script to Attach&quot; dialog.<br />
<br />
o Close all windows.<br />
<br />
Copyright (C) 2010 Tad Harrison<br />
*)</span><br />
<span style="color: #ff0033; font-weight: bold;">property</span> ocrFileSuffix : <span style="color: #009900;">&quot; processed by FineReader.pdf&quot;</span><br />
<span style="color: #ff0033; font-weight: bold;">property</span> ocrApplicationName : <span style="color: #009900;">&quot;Scan to Searchable PDF&quot;</span><br />
<span style="color: #ff0033; font-weight: bold;">property</span> ocrApplicationWindow : <span style="color: #009900;">&quot;Converting the document&quot;</span><br />
<span style="color: #ff0033; font-weight: bold;">property</span> ocrLockFileName : <span style="color: #009900;">&quot;OCR in Progress&quot;</span><br />
<span style="color: #ff0033; font-weight: bold;">on</span> <span style="color: #0066ff;">adding</span> <span style="color: #0066ff;">folder</span> <span style="color: #0066ff;">items</span> <span style="color: #ff0033; font-weight: bold;">to</span> this_folder <span style="color: #ff0033;">after</span> <span style="color: #0066ff;">receiving</span> added_items<br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> lockFilePath <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #000000;">&#40;</span><span style="color: #0066ff;">POSIX path</span> <span style="color: #ff0033; font-weight: bold;">of</span> <span style="color: #000000;">&#40;</span><span style="color: #0066ff;">path to</span> <span style="color: #0066ff;">desktop</span> <span style="color: #0066ff;">folder</span> <span style="color: #ff0033;">as</span> <span style="color: #0066ff;">text</span><span style="color: #000000;">&#41;</span><span style="color: #000000;">&#41;</span> <span style="color: #000000;">&amp;</span> ocrLockFileName<br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">try</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; logEvent<span style="color: #000000;">&#40;</span><span style="color: #009900;">&quot;=== Run OCR on New Folder Items ===&quot;</span><span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;">-- Test for lockfile; exit if lockfile exists</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">tell</span> <span style="color: #0066ff;">application</span> <span style="color: #009900;">&quot;System Events&quot;</span> <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #ff0033; font-weight: bold;">set</span> lockFileExists <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #0066ff;">exists</span> <span style="color: #0066ff;">file</span> lockFilePath<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">if</span> lockFileExists <span style="color: #ff0033; font-weight: bold;">then</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; logEvent<span style="color: #000000;">&#40;</span><span style="color: #009900;">&quot;Other script running. Exiting...&quot;</span><span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">return</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">else</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #0066ff;">do shell script</span> <span style="color: #009900;">&quot;/usr/bin/touch <span style="color: #000000; font-weight: bold;">\&quot;</span>&quot;</span> <span style="color: #000000;">&amp;</span> lockFilePath <span style="color: #000000;">&amp;</span> <span style="color: #009900;">&quot;<span style="color: #000000; font-weight: bold;">\&quot;</span>&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">end</span> <span style="color: #ff0033; font-weight: bold;">if</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;">-- Main loop</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> moreWorkToDo <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #0066ff;">true</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">repeat</span> <span style="color: #ff0033; font-weight: bold;">while</span> moreWorkToDo<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> aFile <span style="color: #ff0033; font-weight: bold;">to</span> getNextFile<span style="color: #000000;">&#40;</span>this_folder<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">if</span> <span style="color: #ff0033;">not</span> aFile <span style="color: #000000;">=</span> <span style="color: #009900;">&quot;&quot;</span> <span style="color: #ff0033; font-weight: bold;">then</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; ocrFile<span style="color: #000000;">&#40;</span>aFile<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">else</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> moreWorkToDo <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #0066ff;">false</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">end</span> <span style="color: #ff0033; font-weight: bold;">if</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">end</span> <span style="color: #ff0033; font-weight: bold;">repeat</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; logEvent<span style="color: #000000;">&#40;</span><span style="color: #009900;">&quot;No more work.&quot;</span><span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; exitApp<span style="color: #000000;">&#40;</span>ocrApplicationName<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">on</span> <span style="color: #ff0033; font-weight: bold;">error</span> errorStr <span style="color: #0066ff;">number</span> errNum<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #0066ff;">display dialog</span> <span style="color: #009900;">&quot;Error &quot;</span> <span style="color: #000000;">&amp;</span> errNum <span style="color: #000000;">&amp;</span> <span style="color: #009900;">&quot; while running OCR: &quot;</span> <span style="color: #000000;">&amp;</span> errorStr<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;">-- No script state to reset here; the lockfile is removed below</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">end</span> <span style="color: #ff0033; font-weight: bold;">try</span><br />
&nbsp; &nbsp; <span style="color: #808080; font-style: italic;">-- Get rid of the lockfile, ignoring any errors</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">try</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #0066ff;">do shell script</span> <span style="color: #009900;">&quot;/bin/rm <span style="color: #000000; font-weight: bold;">\&quot;</span>&quot;</span> <span style="color: #000000;">&amp;</span> lockFilePath <span style="color: #000000;">&amp;</span> <span style="color: #009900;">&quot;<span style="color: #000000; font-weight: bold;">\&quot;</span>&quot;</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">end</span> <span style="color: #ff0033; font-weight: bold;">try</span><br />
<span style="color: #ff0033; font-weight: bold;">end</span> <span style="color: #0066ff;">adding</span> <span style="color: #0066ff;">folder</span> <span style="color: #0066ff;">items</span> <span style="color: #ff0033; font-weight: bold;">to</span><br />
<span style="color: #808080; font-style: italic;">(*<br />
Name: ocrFile<br />
Description: Runs OCR on the given file and waits for the OCR output to appear<br />
Parameters:<br />
&nbsp; aFile - the file to be OCR'd<br />
*)</span><br />
<span style="color: #ff0033; font-weight: bold;">on</span> ocrFile<span style="color: #000000;">&#40;</span>aFile<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> posixFilePath <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #0066ff;">POSIX path</span> <span style="color: #ff0033; font-weight: bold;">of</span> aFile<br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> posixOcrFilePath <span style="color: #ff0033; font-weight: bold;">to</span> getPosixOcrFilePath<span style="color: #000000;">&#40;</span>posixFilePath<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; logEvent<span style="color: #000000;">&#40;</span><span style="color: #009900;">&quot;OCR: &quot;</span> <span style="color: #000000;">&amp;</span> posixFilePath<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">tell</span> <span style="color: #0066ff;">application</span> ocrApplicationName <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #0066ff;">open</span> aFile<br />
&nbsp; &nbsp; <span style="color: #808080; font-style: italic;">--</span><br />
&nbsp; &nbsp; <span style="color: #808080; font-style: italic;">-- Now check once per second for the OCR file</span><br />
&nbsp; &nbsp; <span style="color: #808080; font-style: italic;">-- Give up after five minutes (300 checks, one second apart)</span><br />
&nbsp; &nbsp; <span style="color: #808080; font-style: italic;">--</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> ocrFileExists <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #0066ff;">false</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">repeat</span> <span style="color: #000000;">300</span> <span style="color: #ff0033; font-weight: bold;">times</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> ocrFileExists <span style="color: #ff0033; font-weight: bold;">to</span> posixFileExists<span style="color: #000000;">&#40;</span>posixOcrFilePath<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">if</span> ocrFileExists <span style="color: #ff0033; font-weight: bold;">then</span> <span style="color: #ff0033; font-weight: bold;">exit</span> <span style="color: #ff0033; font-weight: bold;">repeat</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;">-- Wait a second before checking again</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; delay <span style="color: #000000;">1</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">end</span> <span style="color: #ff0033; font-weight: bold;">repeat</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">if</span> ocrFileExists <span style="color: #ff0033; font-weight: bold;">then</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; logEvent<span style="color: #000000;">&#40;</span><span style="color: #009900;">&quot;OCR file generated.&quot;</span><span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;">-- Wait 5 even if the file was found, to let things settle</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; delay <span style="color: #000000;">5</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">else</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">error</span> <span style="color: #009900;">&quot;Timed out waiting for OCR output: &quot;</span> <span style="color: #000000;">&amp;</span> posixOcrFilePath<br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">end</span> <span style="color: #ff0033; font-weight: bold;">if</span><br />
<span style="color: #ff0033; font-weight: bold;">end</span> ocrFile<br />
<span style="color: #808080; font-style: italic;">(*<br />
Name: appIsRunning<br />
Description: Determines if a particular application is running.<br />
Parameters:<br />
&nbsp; &nbsp; appName - the name of the application to be tested<br />
Returns: True if the application is running; otherwise False<br />
*)</span><br />
<span style="color: #ff0033; font-weight: bold;">on</span> appIsRunning<span style="color: #000000;">&#40;</span>appName<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">tell</span> <span style="color: #0066ff;">application</span> <span style="color: #009900;">&quot;System Events&quot;</span> <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #000000;">&#40;</span><span style="color: #0066ff;">name</span> <span style="color: #ff0033; font-weight: bold;">of</span> processes<span style="color: #000000;">&#41;</span> <span style="color: #ff0033;">contains</span> appName<br />
<span style="color: #ff0033; font-weight: bold;">end</span> appIsRunning<br />
<span style="color: #808080; font-style: italic;">(*<br />
Name: posixFileExists<br />
Description: Determines if a particular file exists.<br />
Parameters:<br />
&nbsp; &nbsp; posixFilePath - the POSIX path to the file<br />
Returns: True if the file exists; otherwise False<br />
*)</span><br />
<span style="color: #ff0033; font-weight: bold;">on</span> posixFileExists<span style="color: #000000;">&#40;</span>posixFilePath<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">tell</span> <span style="color: #0066ff;">application</span> <span style="color: #009900;">&quot;System Events&quot;</span> <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #0066ff;">exists</span> <span style="color: #0066ff;">file</span> posixFilePath<br />
<span style="color: #ff0033; font-weight: bold;">end</span> posixFileExists<br />
<span style="color: #808080; font-style: italic;">(*<br />
Name: exitApp<br />
Description: Exits the specified app if it is running.<br />
Parameters:<br />
&nbsp; &nbsp; appName - the application name<br />
*)</span><br />
<span style="color: #ff0033; font-weight: bold;">on</span> exitApp<span style="color: #000000;">&#40;</span>appName<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">if</span> appIsRunning<span style="color: #000000;">&#40;</span>appName<span style="color: #000000;">&#41;</span> <span style="color: #ff0033; font-weight: bold;">then</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">tell</span> <span style="color: #0066ff;">application</span> appName <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #0066ff;">quit</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">end</span> <span style="color: #ff0033; font-weight: bold;">if</span><br />
<span style="color: #ff0033; font-weight: bold;">end</span> exitApp<br />
<span style="color: #808080; font-style: italic;">(*<br />
Name: getPosixOcrFilePath<br />
Description: Gets the OCR output filename for a given input filename.<br />
Parameters:<br />
&nbsp; &nbsp; posixFilePath - the full path to the source file<br />
Returns: the POSIX path of the OCR output file<br />
*)</span><br />
<span style="color: #ff0033; font-weight: bold;">on</span> getPosixOcrFilePath<span style="color: #000000;">&#40;</span>posixFilePath<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> posixBaseName <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #0066ff;">do shell script</span> ¬<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;">&quot;filename=&quot;</span> <span style="color: #000000;">&amp;</span> <span style="color: #0066ff;">quoted form</span> <span style="color: #ff0033; font-weight: bold;">of</span> posixFilePath <span style="color: #000000;">&amp;</span> <span style="color: #009900;">&quot;; echo ${filename%<span style="color: #000000; font-weight: bold;">\\</span>.*}&quot;</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> posixOcrFilePath <span style="color: #ff0033; font-weight: bold;">to</span> posixBaseName <span style="color: #000000;">&amp;</span> ocrFileSuffix<br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">return</span> posixOcrFilePath<br />
<span style="color: #ff0033; font-weight: bold;">end</span> getPosixOcrFilePath<br />
<span style="color: #808080; font-style: italic;">(*<br />
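Name: getNextFile<br />
Description: Finds the next unprocessed ScanSnap PDF in a folder.<br />
Parameters:<br />
&nbsp; &nbsp; aFolder - the folder to search<br />
Returns: the file, or &quot;&quot; if none is found<br />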
*)</span><br />
<span style="color: #ff0033; font-weight: bold;">on</span> getNextFile<span style="color: #000000;">&#40;</span>aFolder<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; logEvent<span style="color: #000000;">&#40;</span><span style="color: #009900;">&quot;Getting next file...&quot;</span><span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> masterFileList <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #0066ff;">list</span> <span style="color: #0066ff;">folder</span> aFolder ¬<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">without</span> <span style="color: #0066ff;">invisibles</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> posixPath <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #0066ff;">POSIX path</span> <span style="color: #ff0033; font-weight: bold;">of</span> aFolder<br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">repeat</span> <span style="color: #ff0033; font-weight: bold;">with</span> i <span style="color: #ff0033; font-weight: bold;">from</span> <span style="color: #000000;">1</span> <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #0066ff;">count</span> masterFileList<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> fileName <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #0066ff;">item</span> i <span style="color: #ff0033; font-weight: bold;">of</span> masterFileList<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> posixFilePath <span style="color: #ff0033; font-weight: bold;">to</span> posixPath <span style="color: #000000;">&amp;</span> fileName<br />
&nbsp; &nbsp; &nbsp; &nbsp; log posixFilePath<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;">--</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;">-- Construct a FineReader file name from our file</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;">--</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> posixOcrFilePath <span style="color: #ff0033; font-weight: bold;">to</span> getPosixOcrFilePath<span style="color: #000000;">&#40;</span>posixFilePath<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;">--</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;">-- See if the FineReader file we constructed exists</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;">--</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> ocrFileExists <span style="color: #ff0033; font-weight: bold;">to</span> posixFileExists<span style="color: #000000;">&#40;</span>posixOcrFilePath<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">tell</span> <span style="color: #0066ff;">me</span> <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #ff0033; font-weight: bold;">set</span> fileCreator <span style="color: #ff0033; font-weight: bold;">to</span> getSpotlightInfo for <span style="color: #009900;">&quot;kMDItemCreator&quot;</span> <span style="color: #ff0033; font-weight: bold;">from</span> posixFilePath<br />
&nbsp; &nbsp; &nbsp; &nbsp; log <span style="color: #000000;">&#40;</span><span style="color: #009900;">&quot;Creator: &quot;</span> <span style="color: #000000;">&amp;</span> fileCreator<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">if</span> <span style="color: #ff0033;">not</span> ocrFileExists <span style="color: #ff0033;">and</span> fileCreator <span style="color: #000000;">=</span> <span style="color: #009900;">&quot;ScanSnap Manager&quot;</span> <span style="color: #ff0033; font-weight: bold;">then</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">return</span> <span style="color: #0066ff;">POSIX file</span> posixFilePath<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">end</span> <span style="color: #ff0033; font-weight: bold;">if</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">end</span> <span style="color: #ff0033; font-weight: bold;">repeat</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">return</span> <span style="color: #009900;">&quot;&quot;</span><br />
<span style="color: #ff0033; font-weight: bold;">end</span> getNextFile<br />
<span style="color: #808080; font-style: italic;">(*<br />
Name: getSpotlightInfo<br />
Description: Gets a named attribute from metadata for a specific file.<br />
Parameters:<br />
&nbsp; &nbsp; for myattribute - the name of the attribute<br />
&nbsp; &nbsp; from myfile - the name of the file<br />
Returns: the attribute value or &quot;&quot; if none found<br />
*)</span><br />
<span style="color: #ff0033; font-weight: bold;">on</span> getSpotlightInfo for myattribute <span style="color: #ff0033; font-weight: bold;">from</span> myfile<br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">try</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> this_kMDItemResult <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #009900;">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">tell</span> <span style="color: #0066ff;">application</span> <span style="color: #009900;">&quot;Finder&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> this_item <span style="color: #ff0033; font-weight: bold;">to</span> myfile <span style="color: #ff0033;">as</span> <span style="color: #0066ff;">string</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> this_item <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #0066ff;">POSIX path</span> <span style="color: #ff0033; font-weight: bold;">of</span> this_item<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> this_kMDItem <span style="color: #ff0033; font-weight: bold;">to</span> myattribute<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> theResult <span style="color: #ff0033; font-weight: bold;">to</span> words <span style="color: #ff0033; font-weight: bold;">of</span> <span style="color: #000000;">&#40;</span><span style="color: #0066ff;">do shell script</span> <span style="color: #009900;">&quot;/usr/bin/mdls -name &quot;</span> <span style="color: #000000;">&amp;</span> this_kMDItem <span style="color: #000000;">&amp;</span> <span style="color: #009900;">&quot; -raw -nullMarker None &quot;</span> <span style="color: #000000;">&amp;</span> <span style="color: #0066ff;">quoted form</span> <span style="color: #ff0033; font-weight: bold;">of</span> this_item<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; log <span style="color: #009900;">&quot;Result: &quot;</span> <span style="color: #000000;">&amp;</span> theResult <span style="color: #ff0033;">as</span> <span style="color: #0066ff;">string</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">repeat</span> <span style="color: #ff0033; font-weight: bold;">with</span> j <span style="color: #ff0033; font-weight: bold;">from</span> <span style="color: #000000;">1</span> <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #0066ff;">number</span> <span style="color: #ff0033; font-weight: bold;">of</span> <span style="color: #0066ff;">items</span> <span style="color: #ff0033; font-weight: bold;">in</span> theResult<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> this_kMDItemResult <span style="color: #ff0033; font-weight: bold;">to</span> this_kMDItemResult <span style="color: #000000;">&amp;</span> <span style="color: #0066ff;">item</span> j <span style="color: #ff0033; font-weight: bold;">of</span> theResult <span style="color: #ff0033;">as</span> <span style="color: #0066ff;">string</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">if</span> j <span style="color: #000000;">&lt;</span> <span style="color: #0066ff;">number</span> <span style="color: #ff0033; font-weight: bold;">of</span> <span style="color: #0066ff;">items</span> <span style="color: #ff0033; font-weight: bold;">in</span> theResult <span style="color: #ff0033; font-weight: bold;">then</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> this_kMDItemResult <span style="color: #ff0033; font-weight: bold;">to</span> this_kMDItemResult <span style="color: #000000;">&amp;</span> <span style="color: #009900;">&quot; &quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">end</span> <span style="color: #ff0033; font-weight: bold;">if</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">end</span> <span style="color: #ff0033; font-weight: bold;">repeat</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">end</span> <span style="color: #ff0033; font-weight: bold;">tell</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">on</span> <span style="color: #ff0033; font-weight: bold;">error</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> this_kMDItemResult <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #009900;">&quot;&quot;</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">end</span> <span style="color: #ff0033; font-weight: bold;">try</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">return</span> this_kMDItemResult<br />
<span style="color: #ff0033; font-weight: bold;">end</span> getSpotlightInfo<br />
<span style="color: #808080; font-style: italic;">(*<br />
Name: logEvent<br />
Description: Write an event to an event log<br />
Parameters:<br />
&nbsp; &nbsp; themessage - the message to write to the log<br />
*)</span><br />
<span style="color: #ff0033; font-weight: bold;">on</span> logEvent<span style="color: #000000;">&#40;</span>themessage<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff0033; font-weight: bold;">set</span> theLine <span style="color: #ff0033; font-weight: bold;">to</span> <span style="color: #000000;">&#40;</span><span style="color: #0066ff;">do shell script</span> ¬<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;">&quot;date +'%Y-%m-%d %H:%M:%S'&quot;</span> <span style="color: #ff0033;">as</span> <span style="color: #0066ff;">string</span><span style="color: #000000;">&#41;</span> ¬<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #000000;">&amp;</span> <span style="color: #009900;">&quot; &quot;</span> <span style="color: #000000;">&amp;</span> themessage<br />
&nbsp; &nbsp; <span style="color: #0066ff;">do shell script</span> <span style="color: #009900;">&quot;echo &quot;</span> <span style="color: #000000;">&amp;</span> <span style="color: #0066ff;">quoted form</span> <span style="color: #ff0033; font-weight: bold;">of</span> theLine <span style="color: #000000;">&amp;</span> ¬<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #009900;">&quot; &gt;&gt; ~/Library/Logs/AppleScript-events.log&quot;</span><br />
<span style="color: #ff0033; font-weight: bold;">end</span> logEvent</div></td></tr></tbody></table></div>
<p><strong>Installation</strong></p>
<ul>
<li>Use the Script Editor to save this script as <strong>Run OCR on New Folder Items</strong> under <strong><em>User Home</em>/Library/Scripts/Folder Action Scripts</strong><br />
You may have to create the <strong>Folder Action Scripts</strong> folder.</li>
<li>Now open a Finder window and navigate to the parent folder of your scanned documents folder.</li>
<li>Right-click (or Control-click) the scanned documents folder and choose <strong>Folder Actions Setup&#8230;</strong></li>
<li>At this point, if folder actions are not enabled, you will likely have to enable them and add the script manually.
<ul>
<li> Check <strong>Enable Folder Actions</strong></li>
<li>Use the &#8220;+&#8221; buttons on the left and right sides to add the scan folder and then this script.</li>
</ul>
</li>
<li>Otherwise, a list of scripts will come up. Choose this script from the <strong>Choose a Script to Attach</strong> dialog.</li>
<li>Close all windows.</li>
</ul>
<p>That&#8217;s it! The script will be invoked automatically every time a new file appears in your scanned documents folder.</p>
<p>Please let me know if you have any ideas that can improve this script. I&#8217;m not an AppleScript guru, so someone might just know how to keep that annoying FineReader icon from jumping.</p>
]]></content:encoded>
			<wfw:commentRss>http://paperjammed.com/2010/01/04/automate-scansnap-ocr-process-on-your-mac-with-applescript-snow-leopard-edition/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Don&#8217;t worry if you didn&#8217;t sanitize your documents—even the TSA forgets occasionally</title>
		<link>http://paperjammed.com/2009/12/08/dont-worry-if-you-didnt-sanitize-your-documents%e2%80%94even-the-tsa-forgets-occasionally/</link>
		<comments>http://paperjammed.com/2009/12/08/dont-worry-if-you-didnt-sanitize-your-documents%e2%80%94even-the-tsa-forgets-occasionally/#comments</comments>
		<pubDate>Tue, 08 Dec 2009 22:29:29 +0000</pubDate>
		<dc:creator>Tad</dc:creator>
				<category><![CDATA[Paperless Life]]></category>
		<category><![CDATA[Searching and Indexing]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Data Loss]]></category>
		<category><![CDATA[PDF]]></category>
		<category><![CDATA[Privacy]]></category>
		<category><![CDATA[Rants]]></category>
		<category><![CDATA[Shredding]]></category>

		<guid isPermaLink="false">http://paperjammed.com/?p=796</guid>
		<description><![CDATA[It&#8217;s too comical to be true. A few months back, when I wrote an article warning about inadequate attempts at sanitizing PDF documents, I thought that any organization serious about censoring documents would not make such a basic error. Especially not a government agency, after the military had been caught by this pitfall.
Apparently this is [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-full wp-image-797" title="20091208-redaction1" src="http://paperjammed.com/wp-content/uploads/2009/12/20091208-redaction1.gif" alt="20091208-redaction1" width="361" height="280" />It&#8217;s too comical to be true. A few months back, when I wrote an article <a href="http://paperjammed.com/2009/04/21/keeping-your-secrets-to-yourself—what-can-your-shared-documents-tell-others/">warning about inadequate attempts at sanitizing PDF documents</a>, I thought that any organization serious about censoring documents would not make such a basic error. Especially not a government agency, after the military <a href="http://www.schneier.com/blog/archives/2005/05/pdf_radacting_f.html">had been caught</a> by this pitfall.</p>
<p><a href="http://www.wanderingaramean.com/2009/12/tsa-makes-another-stupid-move.html">Apparently this is not the case</a>.</p>
<p>It seems that the TSA has leaked its official airport security guidelines. ABC News reports: <a href="http://abcnews.go.com/Blotter/massive-tsa-security-breach-agency-secrets/story?id=9280503">Online Posting Reveals a &#8220;How To&#8221; for Terrorists to Get Through Airport Security</a>.</p>
<p><span id="more-796"></span></p>
<p><strong>A Rookie Mistake</strong></p>
<p>Look at the screenshot of the document at the top of this post. Even though part of the document has been blacked out, it is still possible to select the text and copy and paste it to reveal what is hidden behind the black box.</p>
<p>What kinds of things are listed in this document?</p>
<ul>
<li>Photographs of all kinds of official ID cards. Ever wondered what a U.S. Senator&#8217;s ID card looks like?</li>
<li>Procedures for calibrating equipment, including where guns should be hidden during testing.</li>
<li>Guidelines for who gets searched and who doesn&#8217;t.</li>
<li>Guidelines for what objects get searched and which don&#8217;t.</li>
<li>And much, much more!</li>
</ul>
<p>In other words, this was a most unfortunate event.</p>
<p>See for yourself—ABC News (and others) have <a href="http://a.abcnews.go.com/images/Blotter/ht_tsa_screening_2_091208.pdf">posted the document with redactions removed</a>.</p>
<p><strong>Easy as Pie</strong></p>
<p>Here&#8217;s a screenshot of the original document, opened in Adobe Acrobat Professional.</p>
<p><img class="alignnone size-full wp-image-801" title="20091208-redaction2" src="http://paperjammed.com/wp-content/uploads/2009/12/20091208-redaction2.gif" alt="20091208-redaction2" width="500" height="197" /></p>
<p>As you can see, it was a trivial matter to use the <strong>TouchUp Object</strong> tool to gently slide the black rectangle off the secret stuff (I have blurred the text here, though you can read it from ABC News if you wish).</p>
<p>If you are working with confidential documents that could potentially cause disaster if leaked, <em>please</em> learn how to redact your documents correctly!</p>
]]></content:encoded>
			<wfw:commentRss>http://paperjammed.com/2009/12/08/dont-worry-if-you-didnt-sanitize-your-documents%e2%80%94even-the-tsa-forgets-occasionally/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Keeping your secrets to yourself—old changes lingering in your PDF files</title>
		<link>http://paperjammed.com/2009/11/23/keeping-your-secrets-to-yourself-old-changes-lingering-in-your-pdf-files/</link>
		<comments>http://paperjammed.com/2009/11/23/keeping-your-secrets-to-yourself-old-changes-lingering-in-your-pdf-files/#comments</comments>
		<pubDate>Tue, 24 Nov 2009 04:46:58 +0000</pubDate>
		<dc:creator>Tad</dc:creator>
				<category><![CDATA[Security]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Data Loss]]></category>
		<category><![CDATA[Geeky]]></category>
		<category><![CDATA[PDF]]></category>

		<guid isPermaLink="false">http://paperjammed.com/?p=781</guid>
		<description><![CDATA[A few months ago I wrote an article that touched upon the problems inherent in attempts to sanitize documents before sending them to the enemy, perhaps to remove competitors&#8217; names or trade secrets.
I was reading a post on a board I frequent where a person was describing exactly this kind of activity—removing sensitive information from PDF [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-medium wp-image-791" title="Rusty trap" src="http://paperjammed.com/wp-content/uploads/2009/11/iStock_000011076402XSmall-300x225.jpg" alt="Rusty trap" width="300" height="225" />A few months ago I wrote an article that touched upon <a href="http://paperjammed.com/2009/04/21/keeping-your-secrets-to-yourself—what-can-your-shared-documents-tell-others/">the problems inherent in attempts to sanitize documents</a> before sending them to the enemy, perhaps to remove competitors&#8217; names or trade secrets.</p>
<p>I was reading a post on a board I frequent where a person was describing exactly this kind of activity—removing sensitive information from PDF documents. Several suggestions were made, but one individual suggested opening the file in Acrobat Pro and replacing the sensitive text with good old <a href="http://www.lipsum.com/">Lorem Ipsum</a>.</p>
<p>It was at that moment that I recalled a peculiar feature of the PDF file format: it is designed to support nondestructive updates, allowing people to make vast changes to a PDF document while still retaining the original document, fully intact. I did a few experiments and was surprised by the results.<span id="more-781"></span></p>
<p><strong>A Brief Note on the PDF File Format</strong></p>
<p>For the geeky types among us, one place to begin is this article:</p>
<p><a href="http://www.mactech.com/articles/mactech/Vol.15/15.09/PDFIntro/">Portable Document Format: An Introduction for Programmers</a></p>
<p>The key point to take from the article is this: a PDF document consists of four distinct sections: a <strong>Header</strong>, a <strong>Body</strong>, an <strong>&#8220;xref&#8221; Table</strong>, and a <strong>Trailer</strong>. At the very end of the file you will find the character sequence <strong>%%EOF</strong>.</p>
<p>The PDF standard was designed to allow multiple updates to a document, while retaining the original version. This is accomplished by appending anything new to the end of the document, after the original <strong>EOF</strong> tag. The document will now have two <strong>EOF</strong> tags: one indicating where the original document ended, and a new <strong>EOF</strong> tag indicating where the new changes end.</p>
<p>If we wish to revert PDF changes, it should be a simple matter of opening the PDF file in a binary editor, searching for the first <strong>EOF</strong> tag, and deleting everything that follows it.</p>
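<p>For those who would rather script this than reach for a binary editor, here is a minimal Python sketch of the same idea. The function names (<strong>strip_appended_updates</strong>, <strong>revert_pdf</strong>) are my own inventions for illustration, and the sketch assumes a simple file in which the first <strong>%%EOF</strong> really does mark the end of the original document; other file layouts may place an earlier marker elsewhere, so treat it as an experiment rather than a recovery tool.</p>

```python
EOF_MARKER = b"%%EOF"


def strip_appended_updates(data: bytes) -> bytes:
    """Return the bytes up to and including the first %%EOF marker."""
    pos = data.find(EOF_MARKER)
    if pos == -1:
        raise ValueError("no %%EOF marker found; is this a PDF file?")
    end = pos + len(EOF_MARKER)
    # Keep one trailing line ending (CRLF, CR, or LF) if there is one.
    if data[end:end + 2] == b"\r\n":
        end += 2
    elif data[end:end + 1] in (b"\r", b"\n"):
        end += 1
    return data[:end]


def revert_pdf(src_path: str, dst_path: str) -> None:
    """Write a copy of src_path with everything after the first %%EOF removed."""
    with open(src_path, "rb") as src:
        original = strip_appended_updates(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(original)
```

<p>Calling <strong>revert_pdf("edited.pdf", "reverted.pdf")</strong> (both file names are placeholders) writes the truncated copy to a new file and leaves the edited file untouched, which seems the safer choice when experimenting.</p>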
<p><strong>A Simple Experiment</strong></p>
<p>Let&#8217;s start with a proper secret document containing missile plans&#8230;</p>
<p><img class="alignnone size-full wp-image-785" title="20091123-missile-plans-1" src="http://paperjammed.com/wp-content/uploads/2009/11/20091123-missile-plans-1.gif" alt="20091123-missile-plans-1" width="439" height="418" /></p>
<p>Suppose we want to obscure some special information in paragraph 37. We can open the file in Acrobat Professional and use its text editing features to swap in the venerable <em>Lorem Ipsum</em> text.</p>
<p>Here&#8217;s what it looks like after the switch:</p>
<p><img class="alignnone size-full wp-image-786" title="20091123-lorem-ipsum" src="http://paperjammed.com/wp-content/uploads/2009/11/20091123-lorem-ipsum.gif" alt="20091123-lorem-ipsum" width="598" height="243" /></p>
<p>You can see here that the first seven lines of text starting at paragraph 37 have been replaced with appropriately unreadable text.</p>
<p>Now, open the new PDF file in a binary editor (since PDF files contain a mix of text and binary, the editor must be a binary editor).</p>
<p><img class="alignnone size-full wp-image-787" title="20091123-binary-editor" src="http://paperjammed.com/wp-content/uploads/2009/11/20091123-binary-editor.gif" alt="20091123-binary-editor" width="693" height="633" /></p>
<p>Note the <strong>%%EOF</strong> character sequence embedded in the text. This is the first <strong>EOF</strong> tag, indicating where the original file ended. All we need to do is place the cursor to the right of the <strong>EOF</strong> and delete everything to the end of the file.</p>
<p>Once we have done so, it&#8217;s like magic:</p>
<p><img class="alignnone size-full wp-image-788" title="20091123-after-binary-editing" src="http://paperjammed.com/wp-content/uploads/2009/11/20091123-after-binary-editing.gif" alt="20091123-after-binary-editing" width="794" height="323" /></p>
<p>The edits that replaced lines of paragraph 37 with gibberish have neatly been undone!</p>
<p><strong>More Details</strong></p>
<p>From the <a href="http://www.mactech.com/articles/mactech/Vol.15/15.09/PDFIntro/">PDF Intro document</a> linked earlier:</p>
<p>&#8220;The trailer, it turns out, plays an important role in the way PDF implements incremental updating. The key concept to understand here is that a PDF file is never overwritten, only added to. That goes for all portions of the PDF file &#8211; even the trailer itself, and the end-of-file marker. In other words, a multiply-updated PDF document may contain multiple trailers &#8211; and multiple end-of-file markers! (There may be numerous occurrences of %%EOF.) Each time the file is edited, an addendum is written to the tail of the file, consisting of the content objects that have changed, a new xref section, and a new trailer containing all the information that was in the previous trailer, as well as a /Prev key specifying the byte offset (from the beginning of the file) of the previous xref section. The cross-reference info will then be distributed across more than one xref section. To access all of the cross-references, the reader must walk the list of /Prev keys in all the trailers, in reverse order.</p>
<p>Space doesn&#8217;t permit a detailed exploration of updates here, but you can find several examples in Appendix A of the PDF 1.3 specification (available at <a href="http://partners.adobe.com/asn/developer">http://partners.adobe.com/asn/developer</a>).&#8221;</p>
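<p>The walk through the <strong>/Prev</strong> keys described in that passage can be sketched in a few lines of Python. This is only an illustration under some loud assumptions: the function name <strong>xref_offsets</strong> is hypothetical, the file is assumed to use classic <strong>xref</strong> tables followed by a <strong>trailer</strong> keyword (not the cross-reference streams of later PDF versions), and nothing is validated.</p>

```python
import re


def xref_offsets(data: bytes) -> list[int]:
    """Return xref section offsets, newest first, by following /Prev keys."""
    starts = re.findall(rb"startxref\s+(\d+)", data)
    if not starts:
        return []
    offsets = []
    seen = set()
    # The last startxref in the file points at the newest xref section.
    offset = int(starts[-1])
    while offset not in seen:
        seen.add(offset)
        offsets.append(offset)
        trailer_at = data.find(b"trailer", offset)
        if trailer_at == -1:
            break
        # Only look for /Prev between this trailer and its own %%EOF.
        end = data.find(b"%%EOF", trailer_at)
        chunk = data[trailer_at:] if end == -1 else data[trailer_at:end]
        prev = re.search(rb"/Prev\s+(\d+)", chunk)
        if prev is None:
            break  # reached the original trailer
        offset = int(prev.group(1))
    return offsets
```

<p>A file that has never been updated should yield a single offset; each appended update adds one more link to the chain.</p>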
<p><strong>Summary</strong></p>
<p>It is important to understand that the PDF standard allows for appended updates to files that leave the original document intact, regardless of how drastic the changes are. If you are intent on redacting text from PDF documents, do not depend on simply deleting the secrets using a PDF editor—you must use a proper redaction tool that addresses these issues correctly.</p>
<p>That said, I did some experimenting with a few utilities (Apple Preview, PDFpen, and Adobe Acrobat Pro) and found that some write the file from scratch each time, with no lingering cruft from former versions, while others respect the original intent of the PDF standard. This means that you can&#8217;t trust that older revisions are being retained in your file and you can&#8217;t trust that they aren&#8217;t.</p>
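<p>One rough way to see which kind of file you are holding is to count the <strong>%%EOF</strong> markers. The Python sketch below does just that; the function names are hypothetical, and the count is only a hint, since the same bytes could in principle appear elsewhere in a file for unrelated reasons.</p>

```python
def count_eof_markers(data: bytes) -> int:
    """Count occurrences of the %%EOF marker in raw PDF bytes."""
    return data.count(b"%%EOF")


def has_appended_updates(path: str) -> bool:
    """Guess whether the PDF at path carries appended updates."""
    with open(path, "rb") as pdf:
        # More than one marker suggests at least one appended update.
        return count_eof_markers(pdf.read()) > 1
```

<p>A result of <strong>True</strong> is a reason to look more closely, not proof that older text is recoverable.</p>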
<p>Be conservative: use a redaction tool for secrecy and proper backups for versioning.</p>
]]></content:encoded>
			<wfw:commentRss>http://paperjammed.com/2009/11/23/keeping-your-secrets-to-yourself-old-changes-lingering-in-your-pdf-files/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Snow Leopard Update for ScanSnap</title>
		<link>http://paperjammed.com/2009/11/13/snow-leopard-update-for-scansnap/</link>
		<comments>http://paperjammed.com/2009/11/13/snow-leopard-update-for-scansnap/#comments</comments>
		<pubDate>Sat, 14 Nov 2009 04:36:58 +0000</pubDate>
		<dc:creator>Tad</dc:creator>
				<category><![CDATA[Scanning]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Tools of the Trade]]></category>
		<category><![CDATA[Hardware]]></category>
		<category><![CDATA[Macintosh]]></category>
		<category><![CDATA[PDF]]></category>

		<guid isPermaLink="false">http://paperjammed.com/?p=770</guid>
		<description><![CDATA[This evening I opened my email and found a most welcome message: Fujitsu has released their patched version of the ScanSnap software for Snow Leopard.
[UPDATE: I spoke too soon—they only delivered half of the goods. See below.]
[UPDATE 2: Hurray! It's fixed! The birds are chirping and the sun is shining and life is good!]
When Snow [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-full wp-image-773" title="20091113-scansnap-update" src="http://paperjammed.com/wp-content/uploads/2009/11/20091113-scansnap-update.gif" alt="20091113-scansnap-update" width="371" height="228" />This evening I opened my email and found a most welcome message: <a href="http://www.fujitsu.com/us/services/computing/peripherals/scanners/support/sl_download.html">Fujitsu has released their patched version of the ScanSnap software for Snow Leopard</a>.</p>
<p>[UPDATE: I spoke too soon—they only delivered half of the goods. See below.]</p>
<p>[UPDATE 2: Hurray! It's fixed! The birds are chirping and the sun is shining and life is good!]</p>
<p>When Snow Leopard came out back in August, I ordered my copy the first week and was so excited that I installed it the day it arrived. <a href="http://paperjammed.com/2009/09/07/when-migrating-to-a-new-operating-system-look-before-you-leap/">My joy was short-lived</a>, however: the most important software package I use did not work!<span id="more-770"></span></p>
<p>I depend greatly on the OCR capabilities of the ABBYY FineReader software that comes with the ScanSnap scanners, and this was one of the many pieces of software that did not smoothly transition to Snow Leopard. I could scan documents, with limited functionality, but the OCR feature did not work.</p>
<p>Now that Fujitsu has released their official update, I will probably be installing Snow Leopard tomorrow evening. Now, do I do an upgrade or a full install? Hmmmm&#8230;</p>
<p>UPDATE</p>
<p>Well, they only delivered half of the goods. <img src='http://paperjammed.com/wp-includes/images/smilies/icon_sad.gif' alt=':-(' class='wp-smiley' /> </p>
<p>I was reading through the seven-step process for updating the ScanSnap drivers and I arrived at step seven:</p>
<blockquote><p><strong>Step 7:</strong> The download for FineReader for ScanSnap update to Snow Leopard will be hosted by ABBYY but is not yet available. If you have already subscribed to be notified by Fujitsu regarding the Snow Leopard updates, an email will be sent to you when it is posted.</p></blockquote>
<p>How displeasing. The only thing I really cared about was getting the OCR to work, and apparently ABBYY has not yet delivered their part. (How hard can it be to update the part in your code that says &#8220;If the PDF metadata doesn&#8217;t match X then the document isn&#8217;t a ScanSnap doc&#8221;?)</p>
<p>UPDATE 2</p>
<p>Instead of making us wait another month or two, ABBYY has delivered their patch in record time <img src='http://paperjammed.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> . I quickly installed the update and tested my ScanSnap script&#8230;</p>
<p>There are a few kinks to work out, but it is pretty clear that I will have my same old workflow AppleScript folder action up and running in short order. I will post the newer script as soon as I have it running properly.</p>
<p>And, as a side note, I had already purchased <a href="http://www.smileonmymac.com/PDFpen/index.html">PDFpen</a> from <a href="http://www.smileonmymac.com/">SmileOnMyMac</a> as a backup plan. Their tool incorporates the <a href="http://www.nuance.com/imaging/omnipage/omnipage-professional.asp">OmniPage OCR engine</a>, an engine that rivals that of ABBYY. My script was already running with PDFpen, but there were some issues with tables in documents that I forwarded on to SmileOnMyMac. One of these days I&#8217;ll post my scripts for PDFpen.</p>
]]></content:encoded>
			<wfw:commentRss>http://paperjammed.com/2009/11/13/snow-leopard-update-for-scansnap/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Dodged the corrupt-document bullet this time, just barely&#8230;</title>
		<link>http://paperjammed.com/2009/10/27/dodged-the-corrupt-document-bullet-this-time-just-barely/</link>
		<comments>http://paperjammed.com/2009/10/27/dodged-the-corrupt-document-bullet-this-time-just-barely/#comments</comments>
		<pubDate>Tue, 27 Oct 2009 21:52:30 +0000</pubDate>
		<dc:creator>Tad</dc:creator>
				<category><![CDATA[Searching and Indexing]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Data Loss]]></category>
		<category><![CDATA[Geeky]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[Knowledge Management]]></category>
		<category><![CDATA[PDF]]></category>

		<guid isPermaLink="false">http://paperjammed.com/?p=750</guid>
		<description><![CDATA[A couple of weeks ago, a co-worker sent me a PDF document to look at. He said that he was having trouble copying and pasting from the document and was scratching his head about why this particular PDF would have such issues.
As it would turn out, there were several thousand other documents on a file [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-medium wp-image-751" title="gibberish document in a file folder" src="http://paperjammed.com/wp-content/uploads/2009/10/iStock_000006486654XSmall-300x199.jpg" alt="gibberish document in a file folder" width="300" height="199" />A couple of weeks ago, a co-worker sent me a PDF document to look at. He said that he was having trouble copying and pasting from the document and was scratching his head about why this particular PDF would have such issues.</p>
<p>As it turned out, there were several thousand other documents on a file server that shared the same funny behavior. By the time we were done struggling with this problem I had gained new respect for PDF corruption issues and their prevention.<span id="more-750"></span></p>
<p><strong>The Problem</strong></p>
<p>We were looking to load a few thousand of these scientific reports into a fancy-schmancy new database, with linguistics searching and other bells and whistles. Much to our chagrin, these documents just weren&#8217;t loading, and we couldn&#8217;t understand why. They were text documents, with some embedded images, but mostly straightforward text.</p>
<p>Here is an excerpt:</p>
<p><img class="alignnone size-full wp-image-755" title="20091027-plaintext" src="http://paperjammed.com/wp-content/uploads/2009/10/20091027-plaintext.gif" alt="20091027-plaintext" width="521" height="93" /></p>
<p>And you can tell that it is right and proper text because when I blow it up all the way, the fonts are nice and smooth—this isn&#8217;t just an image of text.</p>
<p><img class="alignnone size-full wp-image-756" title="20091027-smooth-letter" src="http://paperjammed.com/wp-content/uploads/2009/10/20091027-smooth-letter.gif" alt="20091027-smooth-letter" width="258" height="295" /></p>
<p>But if I copy and paste that particular paragraph into any handy editor (Notepad, in this case), this is what I see:</p>
<p><img class="alignnone size-full wp-image-757" title="20091027-notepad" src="http://paperjammed.com/wp-content/uploads/2009/10/20091027-notepad.gif" alt="20091027-notepad" width="496" height="155" /></p>
<p>And as far as I know, at this point the actual text is beyond the reach of average folks like me. We tried, believe me, we tried.</p>
<p><strong>What went wrong?</strong></p>
<p>A quick Google of the subject led us to understand that many PDF generation tools embed subsets of fonts, with nonstandard mappings from the text to the font.</p>
<p>This fellow explains it nicely:</p>
<p>&#8220;The PDF file does not contain all the information to extract the text. The problem is that a character in a PDF file may not contain information what &#8220;real&#8221; character it relates to. Some PDF generators do a pretty bad job when they embed fonts into PDF files. They use a proprietary encoding mechanism (e.g. 1 is A, 2 is B, 3 is C, &#8230;) in both the embedded font and when they place glyphs on the page. Without a table that implements the reverse (e.g. character code 1 is &#8216;A&#8217;) you cannot extract text from such a file.</p>
<p>There is nothing you can do (besides to complain to whoever created the PDF file, and the author of the software that created this file).&#8221;<br />
— from <a href="http://www.experts-exchange.com/Web_Development/Document_Imaging/Adobe_Acrobat/Q_21426533.html">khkremer on experts-exchange.com</a></p>
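<p>To make that concrete, here is a minimal Python sketch of the idea, not of how any real PDF tool works. It assumes a made-up font subset that numbers its glyphs in order of first use; the function names are hypothetical. In a real PDF, the role of the reverse table is played by the font&#8217;s ToUnicode mapping.</p>

```python
# Hypothetical illustration: a subsetted font that numbers its glyphs in
# order of first use, the way the quote describes (1 is A, 2 is B, ...).

def build_subset_encoding(text):
    """Assign each distinct character a private code in order of first use."""
    encoding = {}
    for ch in text:
        if ch not in encoding:
            encoding[ch] = len(encoding) + 1
    return encoding


def encode(text, encoding):
    """What lands on the page: a list of private glyph codes."""
    return [encoding[ch] for ch in text]


def extract(codes, to_unicode=None):
    """Recover text from glyph codes.

    With a reverse table we get the original back; without one, all we can
    do is treat each code as a raw character value, which yields gibberish.
    """
    if to_unicode is None:
        return "".join(chr(code) for code in codes)
    return "".join(to_unicode[code] for code in codes)


original = "Scientific report"
encoding = build_subset_encoding(original)
codes = encode(original, encoding)

# The reverse table is exactly what our garbled reports were missing.
reverse = {code: ch for ch, code in encoding.items()}

recovered = extract(codes, reverse)
garbled = extract(codes)
print(repr(recovered))
print(repr(garbled))
```

<p>The page still displays correctly in both cases, because the viewer only needs the glyph shapes. It is copy/paste, search, and indexing that depend on the reverse table.</p>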
<p>As it turned out, many of the reports had been generated by printing to Adobe Distiller from Microsoft Word. It would seem that the default settings used for Distiller included the &#8220;totally hose my document content&#8221; switch.</p>
<p><strong>The Solution</strong></p>
<p>We fretted over this quite a bit. These are important scientific reports, and there is no easy way to ungarble them. We finally ended up contacting the <a href="http://finereader.abbyy.com/">ABBYY FineReader</a> folks and trying out their OCR toolkit for Linux. Not only did this product make fast work of running optical character recognition on the sample document, but once we had a script running, we blew through the 10,000 pages the trial license gave us in a day or two.</p>
<p><strong>Imperfect, at best</strong></p>
<p>I am happy that we were able to salvage the bulk of the electronic knowledge found within those thousands of files, but our work barely scratched the surface.</p>
<p>For example, most of these documents have rich bookmarking of sections and keywording, such as this (content tastefully blurred on purpose).</p>
<p><img class="alignnone size-full wp-image-760" title="20091027-doc-with-contents" src="http://paperjammed.com/wp-content/uploads/2009/10/20091027-doc-with-contents.gif" alt="20091027-doc-with-contents" width="500" height="348" /></p>
<p>In addition, scientific documents typically have loads of tables full of numbers. Though it is possible to mine this data with a good OCR tool (the FineReader API provides tools for just this purpose), the tables are far more difficult to extract correctly once the original text information is lost.</p>
<p><strong>Final thoughts</strong></p>
<p>I wrote a few weeks ago about document formats, <a href="http://paperjammed.com/2009/09/29/are-your-portable-document-format-files-all-that/">mentioning the PDF/A document standard</a>. This is worth investigating, regardless of what your document needs are.</p>
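<p>If our thousands of files had been originally generated as PDF/A, it is very likely that we would have been able to copy/paste from them without problem: PDF/A requires fonts to be embedded, and its stricter conformance levels (such as PDF/A-1a) also require text to map to Unicode, which rules out such font shenanigans as were perpetrated on our garbled reports.</p>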
<p>In the end, our OCR sledgehammer approach worked like a charm, and is probably sufficient for our needs. Text mining is a pretty slushy business, so no one will complain if there are a few typos on each page: if they find the doc in a search, they can print it and read it the old-fashioned way.</p>
]]></content:encoded>
			<wfw:commentRss>http://paperjammed.com/2009/10/27/dodged-the-corrupt-document-bullet-this-time-just-barely/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Why not try a personal Wiki for some of your more amorphous notes?</title>
		<link>http://paperjammed.com/2009/10/12/why-not-try-a-personal-wiki-for-some-of-your-more-amorphous-notes/</link>
		<comments>http://paperjammed.com/2009/10/12/why-not-try-a-personal-wiki-for-some-of-your-more-amorphous-notes/#comments</comments>
		<pubDate>Tue, 13 Oct 2009 03:59:04 +0000</pubDate>
		<dc:creator>Tad</dc:creator>
				<category><![CDATA[Paperless Life]]></category>
		<category><![CDATA[Searching and Indexing]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Geeky]]></category>
		<category><![CDATA[Knowledge Management]]></category>
		<category><![CDATA[Media]]></category>
		<category><![CDATA[Networking]]></category>
		<category><![CDATA[Tools of the Trade]]></category>
		<category><![CDATA[Windows]]></category>

		<guid isPermaLink="false">http://paperjammed.com/?p=706</guid>
		<description><![CDATA[In my evenings, I sometimes find myself performing the role of &#8220;Resident Geek&#8221; at my nephew&#8217;s school, tending to network issues, computer problems, and my favorite, &#8220;The Internet is down!&#8221;
Over the past couple of years I have considered several different approaches for keeping a grip on which computers had which service patch, which router is [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-medium wp-image-736" src="http://paperjammed.com/wp-content/uploads/2009/10/iStock_000008986250XSmall-300x199.jpg" alt="" width="300" height="199" />In my evenings, I sometimes find myself performing the role of &#8220;Resident Geek&#8221; at my nephew&#8217;s school, tending to network issues, computer problems, and my favorite, &#8220;The Internet is down!&#8221;</p>
<p>Over the past couple of years I have considered several different approaches for keeping a grip on which computers have which service patch, which router is getting flaky, and which cable connects the library to the classroom at the end of the hall.</p>
<p>I have tried Excel spreadsheets, an Access database, even a spiral-bound notebook—none of them made the job any easier. A few weeks ago I thought about trying a <a href="http://en.wikipedia.org/wiki/Wiki">Wiki</a> and this has turned out to be a perfect fit!</p>
<p>If you are looking to keep a loose scrapbook of notes with lots of arbitrary categories and relationships between them, a wiki might do the trick. In this article I&#8217;ll cover two simple freeware wikis you can carry around on a thumb drive.<span id="more-706"></span></p>
<p><strong>What&#8217;s in a Wiki?</strong></p>
<p>All of us have used Wikipedia at one time or another, and though it may be regarded with disdain by high school teachers, when you consider how it works, Wikipedia is an amazing achievement. But what is the nature of a wiki?</p>
<p>One of the key features is that any page can be easily edited at any time (of course this can be limited by permissions). Another attribute is the ability to breathe life into a new page just by calling its name.</p>
<p>Between these two features, you get the essence of wiki-ness.</p>
<p>For example, if I have a page that discusses North American bears, I can type in a list of bears in a special format, often in jammed-together <a href="http://en.wikipedia.org/wiki/CamelCase">Wiki Words</a>, like this:</p>
<ul>
<li><span style="color: #3366ff;"><strong>GrizzlyBear</strong></span></li>
<li><span style="color: #3366ff;"><strong>BlackBear</strong></span></li>
<li><span style="color: #3366ff;"><strong>BrownBear</strong></span></li>
</ul>
<p>As soon as I save the page, those bear names become hyperlinks. Even though I haven&#8217;t written any pages about the individual bears, whenever it finally suits me, I can click on <span style="color: #3366ff;"><strong>BlackBear</strong></span> and accept the invitation to &#8220;Create a new page called <span style="color: #3366ff;"><strong>BlackBear</strong></span>.&#8221;</p>
<p>Better still, a friend who knows about black bears might click on <span style="color: #3366ff;"><strong>BlackBear</strong></span> and write a beautiful page about the animals.</p>
<p>That&#8217;s what wikis are all about.</p>
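<p>If you are curious how a wiki might pull this off, here is a rough Python sketch. It is only an illustration under my own assumptions: the WikiWord pattern and the bracketed <tt>view:</tt> and <tt>create:</tt> placeholders are made up for this example, and every real wiki engine has its own rules and its own link markup.</p>

```python
import re

# Assumed rule for this sketch: a WikiWord is two or more capitalized
# chunks jammed together, e.g. BlackBear. Real engines differ.
WIKI_WORD = re.compile(r"\b(?:[A-Z][a-z0-9]+){2,}\b")


def render(text, existing_pages):
    """Replace each WikiWord with a view link or a create-page link.

    The bracketed output is placeholder markup standing in for whatever
    hyperlink format a real wiki would emit.
    """
    def link(match):
        name = match.group(0)
        if name in existing_pages:
            return "[view:" + name + "]"
        return "[create:" + name + "]"

    return WIKI_WORD.sub(link, text)


# Only one bear has a page so far; the others get "create" invitations.
pages = {"BrownBear"}
rendered = render("GrizzlyBear, BlackBear and BrownBear live here.", pages)
print(rendered)
```

<p>The point of the design is that a link to a missing page is not an error. It is an invitation to write that page.</p>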
<p><strong>Back to the School Computers</strong></p>
<p>In a matter of minutes I was able to make a page that described the building and listed the various rooms in the building. I was then able to click on each room and &#8220;auto-vivify&#8221; a page for the room.</p>
<p>From that point, it was easy to create custom pages for each computer in the building, with each page listing the machine&#8217;s stats. I also created pages for each network switch or router.</p>
<p>In a matter of two or three evenings I had the skeleton of a solid knowledge base populated—it&#8217;s a pretty fancy looking web site with dozens of pages that took little effort to put together.</p>
<p>Last night I noticed that one of the machines wasn&#8217;t connecting to the Internet, though it connects fine to internal servers. I popped open its page on the wiki and added a simple note at the bottom of the page:</p>
<p><tt>2009-10-11 - This machine isn't able to connect to the Internet. Not sure why. It connects fine to internal servers.</tt></p>
<p>A few weeks ago I replaced a fan in a network switch. That, too, became an easy annotation on the wiki page for that device.</p>
<p><strong>Personal Wikis</strong></p>
<p>There are many uses for personal wikis, mostly centered around <a href="http://en.wikipedia.org/wiki/Personal_knowledge_management">personal knowledge management</a> and <a href="http://en.wikipedia.org/wiki/Personal_information_management">personal information management</a>. People use wikis as a replacement for time and task management tools, as a place for gathering thoughts, as a sort of amorphous database, and many other things.</p>
<p>There are many different personal wikis available—here&#8217;s a <a href="http://en.wikipedia.org/wiki/Personal_wiki#Free_software">short list of free ones</a>. One nice simple wiki to try is <a href="http://en.wikipedia.org/wiki/TiddlyWiki">TiddlyWiki</a>. If you are looking for something with a bit more substance, you can try a portable version of <a href="http://en.wikipedia.org/wiki/MediaWiki">MediaWiki</a>—the engine behind Wikipedia—that runs off your thumb drive.</p>
<p><strong>TiddlyWiki</strong></p>
<p>This afternoon I downloaded the flyweight portable wiki called TiddlyWiki. This is an amazingly tight little application: it comes in the form of a single fat web page that you copy to your thumb drive. As you make edits to your TiddlyWiki, the single HTML page is saved with your changes. Since it&#8217;s a single fancy file, backups are dead easy.</p>
<p>Here&#8217;s what it looks like when you first launch the &#8220;empty.html&#8221; file:</p>
<p><img class="alignnone size-medium wp-image-718" src="http://paperjammed.com/wp-content/uploads/2009/10/20091012-tiddly1-300x161.png" alt="" width="300" height="161" /></p>
<p>After a half hour of twiddling around, I had thrown together this basic set of &#8220;Tiddlers&#8221;:</p>
<p><img class="alignnone size-full wp-image-720" src="http://paperjammed.com/wp-content/uploads/2009/10/20091012-tiddly2.png" alt="" width="626" height="720" /></p>
<p>In this screenshot you can see that there are now links that bring up custom &#8220;Tiddlers&#8221; for each computer and for each room. I have opened one of the little pages for <span style="color: #3366ff;"><strong>Computer21</strong></span>.</p>
<p>They describe these pages as being comparable to note cards. All in all, it is tight and easy to use.</p>
<p>Want to give it a try? Download it from the <a href="http://www.tiddlywiki.com/">TiddlyWiki</a> site. You really need to play with it to get a feel for what it can do!</p>
<p><strong>MediaWiki</strong></p>
<p>If you are looking for something with a little more meat on it, you can run the Wikipedia engine on your USB drive.</p>
<p>The easiest way to set this up is to let <a href="http://www.chsoftware.net/en/useware/mowes/mowes.htm">MoWeS</a> do everything for you. <strong>MoWeS</strong> stands for <strong>Mo</strong>dular <strong>We</strong>bserver <strong>S</strong>ystem. It&#8217;s a free product that you can configure as a self-contained Apache web server with a variety of cool apps like MediaWiki, running off a thumb drive.</p>
<p>Here&#8217;s how to set up MediaWiki in five minutes:</p>
<ul>
<li>Go to the <a href="http://www.chsoftware.net/en/useware/mowes/download.htm">MoWeS Mixer</a></li>
<li>The first time around choose &#8220;I do not have a <strong>MoWeS Portable II</strong> Package and want to obtain a new package&#8221; when prompted and click <strong>Go</strong>.</li>
<li>In the software lists, check <strong>Apache2</strong>, <strong>MySQL5</strong>, <strong>PHP5</strong>, and <strong>MediaWiki</strong></li>
<li>Click <strong>Download Now</strong></li>
<li>At this point they ask you some kind of question <em>in German</em>, to filter spambots, but it seems to be a simple math problem. Fill in the answer and click <strong>Submit Query</strong><br />
(&#8220;<em>Zum Schutz vor Downloadrobotern geben Sie bitte das Ergebnis dieser Aufgabe ein: 5 + 8 = ?</em>&#8221;)</li>
<li>Unzip the downloaded zip file, <strong>mowes_portable.zip</strong>, and copy the files to your USB drive</li>
<li>Open your thumb drive and double-click <strong>mowes.exe</strong></li>
<li>Select your language and accept the license</li>
<li>Click <strong>install</strong>, and confirm when prompted</li>
</ul>
<p>The installation process may take several minutes, but rest assured that it isn&#8217;t installing anything on your computer.</p>
<p><strong>Note: </strong>I received two or three firewall warnings for the Apache web server and the MySQL database. I had to click the &#8220;Unblock&#8221; button for all of them before my new MediaWiki-on-a-stick would work correctly.</p>
<p>After all of the dust settled, I had this little window on my screen:</p>
<p><img class="alignnone size-medium wp-image-725" title="20091012-MoWeS1" src="http://paperjammed.com/wp-content/uploads/2009/10/20091012-MoWeS1-300x209.png" alt="20091012-MoWeS1" width="300" height="209" /></p>
<p>In order to shut down and close out, just click the <strong>End</strong> button.</p>
<p>Once your MediaWiki USB key is running, you can go to this web page:</p>
<p><span style="color: #3366ff;">http://127.0.0.1/mediawiki/index.php/Main_Page</span></p>
<p><img class="alignnone size-full wp-image-726" src="http://paperjammed.com/wp-content/uploads/2009/10/20091012-MoWeS2.png" alt="" width="593" height="524" /></p>
<p>It looks just like Wikipedia, doesn&#8217;t it?</p>
<p>What a truly amazing thing: you can carry around your own Wikipedia server on a USB key, plug it into any random machine, and start it up.</p>
<p><strong>Different Wiki Features</strong></p>
<p>As you try out different wiki software, you will notice that there are plenty of differences in the features they support:</p>
<ul>
<li>Each wiki has a different kind of editor. Some are visual; others are simple text editors.</li>
<li>The markup syntax you use for pages is different from wiki to wiki.</li>
<li>Most wikis support features such as &#8220;category pages&#8221; that find all pages tagged with a category.</li>
<li>Some support adding images and other content; others don&#8217;t. TiddlyWiki probably has some means of embedding images, but I couldn&#8217;t find it.</li>
<li>A quick glance at the MediaWiki screenshot above shows extended features such as the Discussion tab and the History tab.</li>
<li>Some use the filesystem for their pages; others use a database.</li>
</ul>
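<p>The &#8220;category page&#8221; idea is simple enough to sketch in a few lines of Python. This is a toy model under my own assumptions, not how any particular wiki stores its data: pages are a plain dictionary of titles and tags, and apart from Computer21 the page names are invented for the example.</p>

```python
from collections import defaultdict


def build_category_index(pages):
    """Map each category to the alphabetized list of pages tagged with it."""
    index = defaultdict(list)
    for title, categories in sorted(pages.items()):
        for category in categories:
            index[category].append(title)
    return dict(index)


# Hypothetical pages and tags, loosely modeled on the school wiki.
pages = {
    "Computer21": ["Computers", "Library"],
    "Switch3": ["Network"],
    "Computer07": ["Computers"],
}

index = build_category_index(pages)
for category in sorted(index):
    print(category + ": " + ", ".join(index[category]))
```

<p>A real wiki would rebuild or update this index whenever a page is saved, so the category page always lists whatever is currently tagged.</p>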
<p>Since I wanted a central wiki for the whole school, I chose a different product from the portable wikis I discussed here: I decided to run <a href="http://moinmo.in/">MoinMoin</a> on an <a href="http://www.ubuntu.com/">Ubuntu</a> installation on an aging Gateway desktop machine. Nevertheless, the basic idea is still the same.</p>
<p>Once that arrangement becomes a little more stable I&#8217;ll write up a howto document, like the <a href="http://paperjammed.com/2009/02/15/new-life-for-an-old-clunker/">Linux NAS</a> one from a few months back.</p>
<p><strong>Other Sources</strong></p>
<p>There are loads of different personal wiki options out there and many people have written how-to documents and tutorials. Here are a few:</p>
<ul>
<li><a href="http://lifehacker.com/354005/run-your-personal-wikipedia-from-a-usb-stick">Run Your Personal Wikipedia from a USB Stick</a> (Lifehacker.com)</li>
<li><a href="http://lifehacker.com/163707/geek-to-live--set-up-your-personal-wikipedia">Geek to Live: Set up your personal Wikipedia</a> (Lifehacker.com)</li>
<li><a href="http://www.pmwiki.org/wiki/Cookbook/WikiOnAStick">Wiki On A Stick</a> (PmWiki.org)</li>
<li><a href="http://cplus.about.com/od/thebusinessofsoftware/ss/woas.htm">Getting Started with Wiki on a Stick</a> (About.com)</li>
<li><a href="http://www.giffmex.org/twfortherestofus.html">TiddlyWiki for the rest of us</a> (giffmex)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://paperjammed.com/2009/10/12/why-not-try-a-personal-wiki-for-some-of-your-more-amorphous-notes/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
