<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Paper Jammed &#187; Data Loss</title>
	<atom:link href="http://paperjammed.com/tag/data-loss/feed/" rel="self" type="application/rss+xml" />
	<link>http://paperjammed.com</link>
	<description>Has paper taken over your life?</description>
	<lastBuildDate>Wed, 30 Jun 2010 02:14:53 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>New life for an old PC—no geek card required</title>
		<link>http://paperjammed.com/2010/05/05/new-life-for-an-old-pc%e2%80%94no-geek-card-required/</link>
		<comments>http://paperjammed.com/2010/05/05/new-life-for-an-old-pc%e2%80%94no-geek-card-required/#comments</comments>
		<pubDate>Thu, 06 May 2010 01:52:22 +0000</pubDate>
		<dc:creator>Tad</dc:creator>
				<category><![CDATA[Paperless Life]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Backups]]></category>
		<category><![CDATA[Data Loss]]></category>
		<category><![CDATA[Geeky]]></category>
		<category><![CDATA[Good Sites]]></category>
		<category><![CDATA[Hardware]]></category>
		<category><![CDATA[Knowledge Management]]></category>
		<category><![CDATA[Networking]]></category>
		<category><![CDATA[Reviews]]></category>
		<category><![CDATA[Tips]]></category>

		<guid isPermaLink="false">http://paperjammed.com/?p=985</guid>
		<description><![CDATA[Do you still have an old machine kicking around in the basement or the back room, long forgotten?
For no cost and almost zero effort, you can set it up as a dedicated network appliance, using one of the many turnkey products from the open-source TurnKey Linux project.
I&#8217;m serious. You don&#8217;t need to know anything at [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-medium wp-image-986" src="http://paperjammed.com/wp-content/uploads/2010/05/iStock_000004973496XSmall-200x300.jpg" alt="istockphoto.com" width="200" height="300" />Do you still have an old machine kicking around in the basement or the back room, long forgotten?<br />
For no cost and almost zero effort, you can set it up as a dedicated network appliance, using one of the many turnkey products from the open-source TurnKey Linux project.</p>
<p>I&#8217;m serious. You don&#8217;t need to know anything at all about Linux to use one of these. Just download the image and install it, and you have a full-featured NAS file server, a database, or a source code repository.</p>
<p>Last year I wrote an article on <a href="http://paperjammed.com/2009/02/15/new-life-for-an-old-clunker/">how to set up a NAS device using Ubuntu Linux</a>. I have been a fan of Ubuntu from the start because it is a very easy distribution to install and configure. The downside of using Linux has always been its fairly steep learning curve: before you can get around to using the server, you have to get down in the weeds with configuration files and other stuff.</p>
<p>TurnKey Linux changes all of that.<span id="more-985"></span></p>
<p><strong>Painless Installation</strong></p>
<p>A few weeks back, I was setting up an aging PC as a standalone wiki server for a small office. The machine would give the office staff a place to document their procedures, how-tos, and other things.</p>
<p>I was about to set up an Ubuntu server, as I have done many times before, and install MoinMoin, as I did <a href="http://paperjammed.com/2009/10/12/why-not-try-a-personal-wiki-for-some-of-your-more-amorphous-notes/">some months back</a>. I remembered that it was a bit of a pain to get everything tweaked just right, so I did a quick check to see what kind of standalone wiki options were available online.</p>
<p>This is how I found TurnKey Linux, a project devoted to single-purpose, preconfigured Ubuntu server images.</p>
<p>One of those preconfigured images happens to be a <a href="http://www.turnkeylinux.org/mediawiki">MediaWiki appliance</a> (MediaWiki is the wiki engine behind Wikipedia), and I was in business.</p>
<p>The installation took about fifteen minutes, with very little user interaction. I answered a few basic questions and the installer took over from there. As soon as the install was done, the machine rebooted and displayed a message on the monitor listing the IP addresses you can browse to from any other machine on the network.</p>
<p><strong>Full Featured</strong></p>
<p>The work that has gone into these appliances is amazing. In fifteen minutes I had installed a complex configuration that includes Apache, PHP, MySQL, and the MediaWiki core, as well as maintenance utilities such as a neat tool that provides a <span style="text-decoration: line-through;">Flash-based</span> pure-AJAX-based SSH command line in a remote browser (i.e. your browser becomes a terminal). Even someone with Linux experience would have to spend quite a bit of time fiddling with different packages and configuration options in order to provide the same functionality that TurnKey gives you out of the box.</p>
<p>As with most open source projects, the documentation is about 80% complete, with deep detail in some areas, but leaving others fairly sparsely documented. But don&#8217;t let this deter you: in most cases users know how to use the product they are installing (e.g. MediaWiki) but don&#8217;t want the hassle of configuring it on Linux. That&#8217;s where TurnKey shines.</p>
<p><strong>Some Examples</strong></p>
<p>In minutes, you can set up a <a href="http://www.turnkeylinux.org/fileserver">NAS device</a>. If you want to try advanced content management in your office, try <a href="http://www.turnkeylinux.org/joomla">Joomla</a> or <a href="http://www.turnkeylinux.org/drupal6">Drupal</a>.</p>
<p>If you are working on a small project team and want to protect your source code, try <a href="http://www.turnkeylinux.org/redmine">Redmine</a> or <a href="http://www.turnkeylinux.org/trac">Trac</a> and do your bug tracking using <a href="http://www.turnkeylinux.org/bugzilla">Bugzilla</a>.</p>
<p>And while you are at it, you can document your organization&#8217;s working practices using a wiki such as <a href="http://www.turnkeylinux.org/moinmoin">MoinMoin</a> or <a href="http://www.turnkeylinux.org/mediawiki">MediaWiki</a>.</p>
<p><strong>Don&#8217;t forget to back it up!</strong></p>
<p>As with any computer, you should include your new TurnKey appliance in your backup strategy. The nice thing is that you don&#8217;t really need to care at all about backing up Linux or the other software; just back up the data. I don&#8217;t need to back up my entire MediaWiki machine; I just need to back up the database and image files. If anything goes wrong, you can rebuild the TurnKey appliance from scratch in minutes and then restore your data.</p>
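<p>As a rough sketch of &#8220;just back up the data,&#8221; the Python below dumps a MySQL database with the standard <code>mysqldump</code> tool and packs an image directory into a timestamped archive. The database name, the directory paths, and the assumption that MySQL credentials come from <code>~/.my.cnf</code> are all hypothetical; check where your own appliance actually keeps its data before relying on anything like this.</p>

```python
import subprocess
import tarfile
import time
from pathlib import Path


def archive_directory(src: Path, dest_dir: Path) -> Path:
    """Pack src into a timestamped .tar.gz inside dest_dir and return the archive path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = dest_dir / f"{src.name}-{stamp}.tar.gz"
    with tarfile.open(target, "w:gz") as tar:
        # arcname stores only the directory name, not the full absolute path.
        tar.add(src, arcname=src.name)
    return target


def dump_database(db_name: str, out_file: Path) -> None:
    """Write a SQL dump of db_name by running the mysqldump command-line tool."""
    out_file.parent.mkdir(parents=True, exist_ok=True)
    with open(out_file, "wb") as f:
        # --single-transaction takes a consistent snapshot of InnoDB tables
        # without locking them; credentials are assumed to come from ~/.my.cnf.
        subprocess.run(["mysqldump", "--single-transaction", db_name], stdout=f, check=True)


# Example usage (hypothetical database name and paths; check your own appliance):
#   dump_database("wikidb", Path("/mnt/backup/wiki/wikidb.sql"))
#   archive_directory(Path("/var/www/mediawiki/images"), Path("/mnt/backup/wiki"))
```

<p>Copy the resulting files somewhere other than the appliance itself; a backup that lives on the same disk won&#8217;t help if that disk dies.</p>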
<p>To save yourself some pain, keep notes on any small tweaks you made to the configuration.</p>
<p><strong>One Machine, One Purpose</strong></p>
<p>These disk images share common Ubuntu underpinnings, but they are called appliances because each one turns your PC into a purpose-built appliance.</p>
<p>This means that if you want a content management system and you also want a ticket management system, you will need two old computers—not a rare commodity these days.</p>
<p>Take a look at <a href="http://www.turnkeylinux.org/">what they have to offer</a> and give TurnKey a shot—specialized software used in corporate environments is now within reach of small offices at the right price.</p>
]]></content:encoded>
			<wfw:commentRss>http://paperjammed.com/2010/05/05/new-life-for-an-old-pc%e2%80%94no-geek-card-required/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Another good checklist for going paperless</title>
		<link>http://paperjammed.com/2010/03/02/another-good-checklist-for-going-paperless/</link>
		<comments>http://paperjammed.com/2010/03/02/another-good-checklist-for-going-paperless/#comments</comments>
		<pubDate>Tue, 02 Mar 2010 19:36:36 +0000</pubDate>
		<dc:creator>Tad</dc:creator>
				<category><![CDATA[Paperless Life]]></category>
		<category><![CDATA[Backups]]></category>
		<category><![CDATA[Data Loss]]></category>
		<category><![CDATA[Good Sites]]></category>
		<category><![CDATA[Green Living]]></category>
		<category><![CDATA[Scanning]]></category>
		<category><![CDATA[Searching and Indexing]]></category>
		<category><![CDATA[Shredding]]></category>
		<category><![CDATA[Tips]]></category>
		<category><![CDATA[Workflow]]></category>

		<guid isPermaLink="false">http://paperjammed.com/?p=924</guid>
		<description><![CDATA[Jim Robinson over at Money Talks News has put together a nice article giving five basic steps for getting a jump start on your paperless life.
Among other things he discusses options for prioritizing and cutting down on the total volume of stuff you plan on keeping, digital or otherwise.
&#8220;Backup, backup, backup&#8221; made number four on [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-full wp-image-925" src="http://paperjammed.com/wp-content/uploads/2010/03/20100302-moneytalksnews.gif" alt="" width="300" height="314" />Jim Robinson over at <strong>Money Talks News</strong> has put together a nice article giving five basic steps for getting a jump start on your paperless life.</p>
<p>Among other things he discusses options for prioritizing and cutting down on the total volume of stuff you plan on keeping, digital or otherwise.</p>
<p>&#8220;Backup, backup, backup&#8221; made number four on his list.</p>
<p>And finally, he provides a few notes on some helpful free organizing software. I think I&#8217;m going to check out that <a href="http://www.knowyourstuff.org/iii/login.html">Know Your Stuff</a> application he mentioned.</p>
<p><a href="http://www.moneytalksnews.com/2010/03/02/papers-we-dont-need-no-stinkin-papers/">Five Tips to Paperless Finances</a> (moneytalksnews.com)</p>
]]></content:encoded>
			<wfw:commentRss>http://paperjammed.com/2010/03/02/another-good-checklist-for-going-paperless/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Could your family access your secrets in an emergency?</title>
		<link>http://paperjammed.com/2010/01/10/could-your-family-access-your-secrets-in-an-emergency/</link>
		<comments>http://paperjammed.com/2010/01/10/could-your-family-access-your-secrets-in-an-emergency/#comments</comments>
		<pubDate>Sun, 10 Jan 2010 18:59:10 +0000</pubDate>
		<dc:creator>Tad</dc:creator>
				<category><![CDATA[Green Living]]></category>
		<category><![CDATA[Paperless Life]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[Data Loss]]></category>
		<category><![CDATA[Knowledge Management]]></category>
		<category><![CDATA[Tips]]></category>

		<guid isPermaLink="false">http://paperjammed.com/?p=851</guid>
		<description><![CDATA[Several weeks ago I was sitting at the dining room table with a family friend going through a stack of documents and letters. Her husband had passed away suddenly some weeks before, and I was doing the best I could to help her untangle the paperwork and understand what was what. This unfortunate scene made [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-medium wp-image-853" title="Keys on a keyboard" src="http://paperjammed.com/wp-content/uploads/2010/01/iStock_000008796911XSmall-225x300.jpg" alt="" width="225" height="300" />Several weeks ago I was sitting at the dining room table with a family friend going through a stack of documents and letters. Her husband had passed away suddenly some weeks before, and I was doing the best I could to help her untangle the paperwork and understand what was what. This unfortunate scene made it clear to me that sudden illness or death of a family member may require us to access files that they have, for many reasons.</p>
<p>Imagine that you were to become temporarily incapacitated for whatever reason&#8230;</p>
<ul>
<li>Can a family member log in to your computer as you in order to access your files?</li>
<li>Can your spouse access your online banking details so the bills can be paid?</li>
<li>Can your family find your insurance information that you scanned and filed away?</li>
<li>Is there someone who can log in to any online accounts that need care and feeding?</li>
</ul>
<p>Not a pleasant subject, indeed, but one that worries me from time to time.</p>
<p>One way to address these needs is to keep all of your passwords and similar details in one special place, using a password safe application, and make sure someone else has the access code. For example, you can use a tool such as <a href="http://agilewebsolutions.com/products/1Password">1Password</a> or <a href="http://www.splashdata.com/splashid/index.asp">SplashId</a> to store the hundreds of secret bits that you use all the time and that your family might need.</p>
<p>You might consider writing down the master passwords that control your life and sealing them in an envelope that you give to a trusted family member. Because such a note is a great security risk if it falls into the wrong hands, you might want to leave any identifying information off it. Impress upon them the need to keep the document well secured.</p>
<p>Perhaps you and your spouse can agree on the same master passwords, with one relatively short password locking your computer and a long, secure password locking your password safe application.</p>
<p>Regardless of how you address these issues, sit down with your better half (or trusted family member) and review where documents are and how to access them.</p>
]]></content:encoded>
			<wfw:commentRss>http://paperjammed.com/2010/01/10/could-your-family-access-your-secrets-in-an-emergency/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Don&#8217;t worry if you didn&#8217;t sanitize your documents—even the TSA forgets occasionally</title>
		<link>http://paperjammed.com/2009/12/08/dont-worry-if-you-didnt-sanitize-your-documents%e2%80%94even-the-tsa-forgets-occasionally/</link>
		<comments>http://paperjammed.com/2009/12/08/dont-worry-if-you-didnt-sanitize-your-documents%e2%80%94even-the-tsa-forgets-occasionally/#comments</comments>
		<pubDate>Tue, 08 Dec 2009 22:29:29 +0000</pubDate>
		<dc:creator>Tad</dc:creator>
				<category><![CDATA[Paperless Life]]></category>
		<category><![CDATA[Searching and Indexing]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Data Loss]]></category>
		<category><![CDATA[PDF]]></category>
		<category><![CDATA[Privacy]]></category>
		<category><![CDATA[Rants]]></category>
		<category><![CDATA[Shredding]]></category>

		<guid isPermaLink="false">http://paperjammed.com/?p=796</guid>
		<description><![CDATA[It&#8217;s too comical to be true. A few months back, when I wrote an article warning about inadequate attempts at sanitizing PDF documents, I thought that any organization serious about censoring documents would not make such a basic error. Especially not a government agency, after the military had been caught by this pitfall.
Apparently this is [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-full wp-image-797" title="20091208-redaction1" src="http://paperjammed.com/wp-content/uploads/2009/12/20091208-redaction1.gif" alt="20091208-redaction1" width="361" height="280" />It&#8217;s too comical to be true. A few months back, when I wrote an article <a href="http://paperjammed.com/2009/04/21/keeping-your-secrets-to-yourself—what-can-your-shared-documents-tell-others/">warning about inadequate attempts at sanitizing PDF documents</a>, I thought that any organization serious about censoring documents would not make such a basic error. Especially not a government agency, after the military <a href="http://www.schneier.com/blog/archives/2005/05/pdf_radacting_f.html">had been caught</a> by this pitfall.</p>
<p><a href="http://www.wanderingaramean.com/2009/12/tsa-makes-another-stupid-move.html">Apparently this is not the case</a>.</p>
<p>It seems that the TSA has leaked its own official airport security screening guidelines. ABC News reports: <a href="http://abcnews.go.com/Blotter/massive-tsa-security-breach-agency-secrets/story?id=9280503">Online Posting Reveals a &#8220;How To&#8221; for Terrorists to Get Through Airport Security</a>.</p>
<p><span id="more-796"></span></p>
<p><strong>A Rookie Mistake</strong></p>
<p>Look at the screenshot of the document at the top of this post. Even though part of the document has been blacked out, you can still select the text underneath and copy and paste it to find out what is hidden behind the black boxes.</p>
<p>What kinds of things are listed in this document?</p>
<ul>
<li>Photographs of all kinds of official ID cards. Ever wondered what a U.S. Senator&#8217;s ID card looks like?</li>
<li>Procedures for calibrating equipment, such as where guns should be hidden during testing.</li>
<li>Guidelines for who gets searched and who doesn&#8217;t.</li>
<li>Guidelines for what objects get searched and which don&#8217;t.</li>
<li>And much, much more!</li>
</ul>
<p>In other words, this was a most unfortunate event.</p>
<p>See for yourself—ABC News (and others) have <a href="http://a.abcnews.go.com/images/Blotter/ht_tsa_screening_2_091208.pdf">posted the document with redactions removed</a>.</p>
<p><strong>Easy as Pie</strong></p>
<p>Here&#8217;s a screenshot of the original document, opened in Adobe Acrobat Professional.</p>
<p><img class="alignnone size-full wp-image-801" title="20091208-redaction2" src="http://paperjammed.com/wp-content/uploads/2009/12/20091208-redaction2.gif" alt="20091208-redaction2" width="500" height="197" /></p>
<p>As you can see, it was a trivial matter to use the <strong>TouchUp Object</strong> tool to gently slide the black rectangle off of the secret stuff (I have blurred the text here, though you can read it from ABC News if you wish).</p>
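<p>To see why sliding the box away works, it helps to know that a PDF page is drawn by a stream of operators, and a naive &#8220;black out&#8221; like this one is just a filled rectangle painted on top of text that is still there. The Python sketch below uses a hand-written, hypothetical content stream (text shown with the <code>Tj</code> operator, then a black rectangle filled with <code>re f</code>) and pulls the hidden string back out with a deliberately simplified regular expression. Real files usually compress their streams, so treat this as an illustration rather than a tool.</p>

```python
import re

# A hypothetical page content stream: the text is drawn first, then a filled
# black rectangle ("re f") is painted on top of it. The rectangle hides the
# text visually, but the text-showing operator is still in the stream.
PAGE_STREAM = b"""BT /F1 12 Tf 72 700 Td (CONFIDENTIAL LINE) Tj ET
0 0 0 rg 70 696 220 16 re f
"""


def shown_strings(stream: bytes) -> list[str]:
    """Return the literal strings passed to the Tj (show text) operator."""
    # Simplified: ignores escaped parentheses, TJ arrays, and compressed streams.
    pattern = rb"\(([^()]*)\)\s*Tj"
    return [m.group(1).decode("latin-1") for m in re.finditer(pattern, stream)]


print(shown_strings(PAGE_STREAM))  # ['CONFIDENTIAL LINE']
```

<p>A proper redaction tool removes the text operators themselves, not just their visibility.</p>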
<p>If you are working with confidential documents that could potentially cause disaster if leaked, <em>please</em> learn how to redact your documents correctly!</p>
]]></content:encoded>
			<wfw:commentRss>http://paperjammed.com/2009/12/08/dont-worry-if-you-didnt-sanitize-your-documents%e2%80%94even-the-tsa-forgets-occasionally/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Keeping your secrets to yourself—old changes lingering in your PDF files</title>
		<link>http://paperjammed.com/2009/11/23/keeping-your-secrets-to-yourself-old-changes-lingering-in-your-pdf-files/</link>
		<comments>http://paperjammed.com/2009/11/23/keeping-your-secrets-to-yourself-old-changes-lingering-in-your-pdf-files/#comments</comments>
		<pubDate>Tue, 24 Nov 2009 04:46:58 +0000</pubDate>
		<dc:creator>Tad</dc:creator>
				<category><![CDATA[Security]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Data Loss]]></category>
		<category><![CDATA[Geeky]]></category>
		<category><![CDATA[PDF]]></category>

		<guid isPermaLink="false">http://paperjammed.com/?p=781</guid>
		<description><![CDATA[A few months ago I wrote an article that touched upon the problems inherent in attempts to sanitize documents before sending them to the enemy—perhaps to remove competitor&#8217;s names or trade secrets.
I was reading a post on a board I frequent where a person was describing exactly this kind of activity—removing sensitive information from PDF [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-medium wp-image-791" title="Rusty trap" src="http://paperjammed.com/wp-content/uploads/2009/11/iStock_000011076402XSmall-300x225.jpg" alt="Rusty trap" width="300" height="225" />A few months ago I wrote an article that touched upon <a href="http://paperjammed.com/2009/04/21/keeping-your-secrets-to-yourself—what-can-your-shared-documents-tell-others/">the problems inherent in attempts to sanitize documents</a> before sending them to the enemy—perhaps to remove competitor&#8217;s names or trade secrets.</p>
<p>I was reading a post on a board I frequent where a person was describing exactly this kind of activity—removing sensitive information from PDF documents. Several suggestions were made, but one individual suggested opening the file in Acrobat Pro and replacing the sensitive text with good old <a href="http://www.lipsum.com/">Lorem Ipsum</a>.</p>
<p>It was at that moment that I recalled a peculiar feature of the PDF file format: it is designed to support nondestructive updates, allowing people to make vast changes to a PDF document while still retaining the original document, fully intact. I did a few experiments and was surprised with the results.<span id="more-781"></span></p>
<p><strong>A Brief Note on the PDF File Format</strong></p>
<p>For the geeky types among us, one place to begin is this article:</p>
<p><a href="http://www.mactech.com/articles/mactech/Vol.15/15.09/PDFIntro/">Portable Document Format: An Introduction for Programmers</a></p>
<p>The key point to take from the article is this: a PDF document consists of several distinct sections, namely a <strong>Header</strong>, a <strong>Body</strong>, an <strong>&#8220;xref&#8221; Table</strong>, and a <strong>Trailer</strong>. At the very end of the file you will find the character sequence <strong>%%EOF</strong>.</p>
<p>The PDF standard was designed to allow multiple updates to a document, while retaining the original version. This is accomplished by appending anything new to the end of the document, after the original <strong>EOF</strong> tag. The document will now have two <strong>EOF</strong> tags: one indicating where the original document ended, and a new <strong>EOF</strong> tag indicating where the new changes end.</p>
<p>If we wish to revert PDF changes, it should be a simple matter of opening the PDF file in a binary editor, searching for the first <strong>EOF</strong> tag, and deleting everything following.</p>
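<p>The same steps can be sketched in a few lines of Python instead of a binary editor. This is a minimal illustration, not a recovery tool: it assumes the first <strong>%%EOF</strong> it finds really marks the end of the original revision, and those bytes could in principle also turn up inside binary stream data. Work on a copy, and open the result in a viewer to see what you actually got.</p>

```python
from pathlib import Path

EOF_MARKER = b"%%EOF"


def count_revisions(data: bytes) -> int:
    """Count %%EOF markers; each incremental update normally adds one."""
    return data.count(EOF_MARKER)


def first_revision(data: bytes) -> bytes:
    """Return the bytes up to and including the first %%EOF marker and its line ending."""
    idx = data.find(EOF_MARKER)
    if idx == -1:
        raise ValueError("no %%EOF marker found")
    end = idx + len(EOF_MARKER)
    # Keep the end-of-line that follows the marker (CRLF, CR or LF), if any.
    if data[end:end + 2] == b"\r\n":
        end += 2
    elif data[end:end + 1] in (b"\r", b"\n"):
        end += 1
    return data[:end]


def save_first_revision(src: Path, dest: Path) -> None:
    """Write the original revision of src to dest, leaving src untouched."""
    dest.write_bytes(first_revision(src.read_bytes()))
```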
<p><strong>A Simple Experiment</strong></p>
<p>Let&#8217;s start with a proper secret document containing missile plans&#8230;</p>
<p><img class="alignnone size-full wp-image-785" title="20091123-missile-plans-1" src="http://paperjammed.com/wp-content/uploads/2009/11/20091123-missile-plans-1.gif" alt="20091123-missile-plans-1" width="439" height="418" /></p>
<p>Suppose we want to obscure some special information in paragraph 37. We can open the file in Acrobat Professional and use its text editing features to swap in the venerable <em>Lorem Ipsum</em> text.</p>
<p>Here&#8217;s what it looks like after the switch:</p>
<p><img class="alignnone size-full wp-image-786" title="20091123-lorem-ipsum" src="http://paperjammed.com/wp-content/uploads/2009/11/20091123-lorem-ipsum.gif" alt="20091123-lorem-ipsum" width="598" height="243" /></p>
<p>You can see here that the first seven lines of paragraph 37 have been replaced with suitably unreadable text.</p>
<p>Now, open the new PDF file in a binary editor (since PDF files contain a mix of text and binary, the editor must be a binary editor).</p>
<p><img class="alignnone size-full wp-image-787" title="20091123-binary-editor" src="http://paperjammed.com/wp-content/uploads/2009/11/20091123-binary-editor.gif" alt="20091123-binary-editor" width="693" height="633" /></p>
<p>Note the <strong>%%EOF</strong> character sequence embedded in the text. This is the first <strong>EOF</strong> tag, indicating where the original file ended. All we need to do is place the cursor to the right of the <strong>EOF</strong> and delete everything to the end of the file.</p>
<p>Once we have done so, it&#8217;s like magic:</p>
<p><img class="alignnone size-full wp-image-788" title="20091123-after-binary-editing" src="http://paperjammed.com/wp-content/uploads/2009/11/20091123-after-binary-editing.gif" alt="20091123-after-binary-editing" width="794" height="323" /></p>
<p>The edits that replaced lines of paragraph 37 with gibberish have neatly been undone!</p>
<p><strong>More Details</strong></p>
<p>From the <a href="http://www.mactech.com/articles/mactech/Vol.15/15.09/PDFIntro/">PDF Intro document</a> linked earlier:</p>
<p>&#8220;The trailer, it turns out, plays an important role in the way PDF implements incremental updating. The key concept to understand here is that a PDF file is never overwritten, only added to. That goes for all portions of the PDF file &#8211; even the trailer itself, and the end-of-file marker. In other words, a multiply-updated PDF document may contain multiple trailers &#8211; and multiple end-of-file markers! (There may be numerous occurrences of %%EOF.) Each time the file is edited, an addendum is written to the tail of the file, consisting of the content objects that have changed, a new xref section, and a new trailer containing all the information that was in the previous trailer, as well as a /Prev key specifying the byte offset (from the beginning of the file) of the previous xref section. The cross-reference info will then be distributed across more than one xref section. To access all of the cross-references, the reader must walk the list of /Prev keys in all the trailers, in reverse order.</p>
<p>Space doesn&#8217;t permit a detailed exploration of updates here, but you can find several examples in Appendix A of the PDF 1.3 specification (available at <a href="http://partners.adobe.com/asn/developer">http://partners.adobe.com/asn/developer</a>).&#8221;</p>
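<p>The &#8220;walk the list of /Prev keys&#8221; idea from the quote can be sketched as follows. The Python assumes classic <strong>xref</strong> tables with plain-text <strong>trailer</strong> dictionaries; files that use the cross-reference streams introduced in PDF 1.5 keep this information inside a stream instead, and this sketch will not find it. It returns the byte offsets of the xref sections from the newest revision back to the oldest, so a list longer than one is a hint that earlier versions may still be in the file.</p>

```python
import re


def xref_chain(data: bytes) -> list[int]:
    """Follow startxref and /Prev offsets from the newest xref section to the oldest."""
    starts = re.findall(rb"startxref\s+(\d+)", data)
    if not starts:
        raise ValueError("no startxref keyword found")
    # The last startxref in the file points at the newest xref section.
    offset = int(starts[-1])
    chain: list[int] = []
    while offset is not None and offset not in chain:  # guard against loops
        chain.append(offset)
        trailer = data.find(b"trailer", offset)
        if trailer == -1:
            break
        # Limit the search to this trailer dictionary (up to its startxref).
        end = data.find(b"startxref", trailer)
        trailer_dict = data[trailer:(end if end != -1 else len(data))]
        prev = re.search(rb"/Prev\s+(\d+)", trailer_dict)
        offset = int(prev.group(1)) if prev else None
    return chain
```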
<p><strong>Summary</strong></p>
<p>It is important to understand that the PDF standard allows for appended updates to files that leave the original document intact, regardless of how drastic the changes are. If you are intent on redacting text from PDF documents, do not depend on simply deleting the secrets using a PDF editor—you must use a proper redaction tool that addresses these issues correctly.</p>
<p>That said, I did some experimenting with a few utilities (Apple Preview, PDFpen, and Adobe Acrobat Pro) and found that some write the file from scratch each time, with no lingering cruft from former versions, while others respect the original intent of the PDF standard. This means that you can&#8217;t trust that older revisions are being retained in your file and you can&#8217;t trust that they aren&#8217;t.</p>
<p>Be conservative: use a redaction tool for secrecy and proper backups for versioning.</p>
]]></content:encoded>
			<wfw:commentRss>http://paperjammed.com/2009/11/23/keeping-your-secrets-to-yourself-old-changes-lingering-in-your-pdf-files/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Dodged the corrupt-document bullet this time, just barely&#8230;</title>
		<link>http://paperjammed.com/2009/10/27/dodged-the-corrupt-document-bullet-this-time-just-barely/</link>
		<comments>http://paperjammed.com/2009/10/27/dodged-the-corrupt-document-bullet-this-time-just-barely/#comments</comments>
		<pubDate>Tue, 27 Oct 2009 21:52:30 +0000</pubDate>
		<dc:creator>Tad</dc:creator>
				<category><![CDATA[Searching and Indexing]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Data Loss]]></category>
		<category><![CDATA[Geeky]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[Knowledge Management]]></category>
		<category><![CDATA[PDF]]></category>

		<guid isPermaLink="false">http://paperjammed.com/?p=750</guid>
		<description><![CDATA[A couple of weeks ago, a co-worker sent me a PDF document to look at. He said that he was having trouble copying and pasting from the document and was scratching his head about why this particular PDF would have such issues.
As it would turn out, there were several thousand other documents on a file [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-medium wp-image-751" title="gibberish document in a file folder" src="http://paperjammed.com/wp-content/uploads/2009/10/iStock_000006486654XSmall-300x199.jpg" alt="gibberish document in a file folder" width="300" height="199" />A couple of weeks ago, a co-worker sent me a PDF document to look at. He said that he was having trouble copying and pasting from the document and was scratching his head about why this particular PDF would have such issues.</p>
<p>As it turned out, there were several thousand other documents on a file server that shared the same funny behavior. By the time we were done struggling with this problem, I had gained new respect for PDF corruption issues and their prevention.<span id="more-750"></span></p>
<p><strong>The Problem</strong></p>
<p>We were looking to load a few thousand of these scientific reports into a fancy-schmancy new database, with linguistic search and other bells and whistles. Much to our chagrin, these documents just weren&#8217;t loading, and we couldn&#8217;t understand why. They were text documents with some embedded images, but mostly straightforward text.</p>
<p>Here is an excerpt:</p>
<p><img class="alignnone size-full wp-image-755" title="20091027-plaintext" src="http://paperjammed.com/wp-content/uploads/2009/10/20091027-plaintext.gif" alt="20091027-plaintext" width="521" height="93" /></p>
<p>And you can tell that it is right and proper text because when I zoom all the way in, the letters stay nice and smooth; this isn&#8217;t just an image of text.</p>
<p><img class="alignnone size-full wp-image-756" title="20091027-smooth-letter" src="http://paperjammed.com/wp-content/uploads/2009/10/20091027-smooth-letter.gif" alt="20091027-smooth-letter" width="258" height="295" /></p>
<p>But if I copy and paste that particular paragraph into any handy editor (Notepad, in this case), this is what I see:</p>
<p><img class="alignnone size-full wp-image-757" title="20091027-notepad" src="http://paperjammed.com/wp-content/uploads/2009/10/20091027-notepad.gif" alt="20091027-notepad" width="496" height="155" /></p>
<p>And as far as I know, at this point the actual text is beyond the reach of average folks like me. We tried; believe me, we tried.</p>
<p><strong>What went wrong?</strong></p>
<p>A quick Google search on the subject showed us that many PDF generation tools embed subsets of fonts, with nonstandard mappings from character codes to glyphs.</p>
<p>This fellow explains it nicely:</p>
<p>&#8220;The PDF file does not contain all the information to extract the text. The problem is that a character in a PDF file may not contain information what &#8220;real&#8221; character it relates to. Some PDF generators do a pretty bad job when they embed fonts into PDF files. They use a proprietary encoding mechanism (e.g. 1 is A, 2 is B, 3 is C, &#8230;) in both the embedded font and when they place glyphs on the page. Without a table that implements the reverse (e.g. character code 1 is &#8216;A&#8217;) you cannot extract text from such a file.</p>
<p>There is nothing you can do (besides to complain to whoever created the PDF file, and the author of the software that created this file).&#8221;<br />
— from <a href="http://www.experts-exchange.com/Web_Development/Document_Imaging/Adobe_Acrobat/Q_21426533.html">khkremer on experts-exchange.com</a></p>
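<p>The &#8220;table that implements the reverse&#8221; mentioned above is what the PDF format calls a <strong>ToUnicode</strong> CMap. The Python sketch below parses the simple <code>bfchar</code> entries of such a table and uses them to turn character codes back into text; when a code has no entry there is nothing to go on, and the sketch can only emit a replacement character. It assumes one-byte character codes and ignores <code>bfrange</code> blocks, so it is an illustration, not a general text extractor.</p>

```python
import re


def parse_bfchar(cmap: str) -> dict[int, str]:
    """Read code-to-Unicode pairs from the bfchar blocks of a ToUnicode CMap."""
    mapping: dict[int, str] = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap, re.DOTALL):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block):
            # The destination is UTF-16BE written in hex, e.g. 0041 for "A".
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def decode_codes(codes: bytes, mapping: dict[int, str]) -> str:
    """Map one-byte character codes to text, using U+FFFD where no entry exists."""
    return "".join(mapping.get(code, "\ufffd") for code in codes)


# A made-up font that numbers its glyphs 1, 2, 3, ... as in the quote above.
cmap = "2 beginbfchar\n<01> <0041>\n<02> <0042>\nendbfchar"
print(decode_codes(b"\x01\x02\x01", parse_bfchar(cmap)))  # ABA
```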
<p>As it turned out, many of the reports had been generated by printing to Adobe Distiller from Microsoft Word. It would seem that the default settings used for Distiller included the &#8220;totally hose my document content&#8221; switch.</p>
<p><strong>The Solution</strong></p>
<p>We fretted over this quite a bit. These are important scientific reports, and there is no easy way to ungarble them. We finally ended up contacting the <a href="http://finereader.abbyy.com/">Abbyy FineReader</a> folks and trying out their OCR toolkit for Linux. Not only did this product make short work of running optical character recognition on the sample document, but once we had a script running, we blew through the 10,000 pages the trial license gave us in a day or two.</p>
<p><strong>Imperfect, at best</strong></p>
<p>I am happy that we were able to salvage the bulk of the electronic knowledge found within those thousands of files, but our work barely scratched the surface.</p>
<p>For example, most of these documents have rich section bookmarks and keywords, like this one (content tastefully blurred on purpose).</p>
<p><img class="alignnone size-full wp-image-760" title="20091027-doc-with-contents" src="http://paperjammed.com/wp-content/uploads/2009/10/20091027-doc-with-contents.gif" alt="20091027-doc-with-contents" width="500" height="348" /></p>
<p>In addition, scientific documents typically have loads of tables full of numbers. Though it is possible to mine this data with a good OCR tool (the FineReader API provides tools for just this purpose), the tables are far more difficult to extract correctly once the original text information is lost.</p>
<p><strong>Final thoughts</strong></p>
<p>I wrote a few weeks ago about document formats, <a href="http://paperjammed.com/2009/09/29/are-your-portable-document-format-files-all-that/">mentioning the PDF/A document standard</a>. This is worth investigating, regardless of what your document needs are.</p>
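<p>If our thousands of files had originally been generated as PDF/A (specifically the stricter Level A conformance, which requires every character to be mappable to Unicode), we would almost certainly have been able to copy and paste from them without trouble: that requirement rules out the kind of font shenanigans that were perpetrated on our garbled reports.</p>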
<p>In the end, our OCR sledgehammer approach worked like a charm, and is probably sufficient for our needs. Text mining is a pretty slushy business, so no one will complain about a few typos on each page: if they find the doc in a search, they can print it and read it the old-fashioned way.</p>
]]></content:encoded>
			<wfw:commentRss>http://paperjammed.com/2009/10/27/dodged-the-corrupt-document-bullet-this-time-just-barely/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Are your Portable Document Format files all that?</title>
		<link>http://paperjammed.com/2009/09/29/are-your-portable-document-format-files-all-that/</link>
		<comments>http://paperjammed.com/2009/09/29/are-your-portable-document-format-files-all-that/#comments</comments>
		<pubDate>Wed, 30 Sep 2009 00:41:36 +0000</pubDate>
		<dc:creator>Tad</dc:creator>
				<category><![CDATA[Green Living]]></category>
		<category><![CDATA[Paperless Life]]></category>
		<category><![CDATA[Scanning]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Data Loss]]></category>
		<category><![CDATA[PDF]]></category>
		<category><![CDATA[Printing]]></category>
		<category><![CDATA[Searching and Indexing]]></category>

		<guid isPermaLink="false">http://paperjammed.com/?p=692</guid>
		<description><![CDATA[Like most people who are trying to archive reams of paper, the one reliable tool I always turn to is Adobe Portable Document Format.
I trust my digital life to PDF. Almost everything I scan and most documents I write eventually end up squirreled away somewhere as PDF documents.
Have you ever considered just how portable those [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-medium wp-image-696" src="http://paperjammed.com/wp-content/uploads/2009/09/iStock_000009658438XSmall-201x300.jpg" alt="Lost keys at the beach" width="201" height="300" />Like most people who are trying to archive reams of paper, the one reliable tool I always turn to is Adobe Portable Document Format.</p>
<p>I trust my digital life to PDF. Almost everything I scan and most documents I write eventually end up squirreled away somewhere as PDF documents.</p>
<p>Have you ever considered just how portable those documents really are?</p>
<p><strong>What&#8217;s wrong with PDF?</strong></p>
<p>It seems strange to question the portability of these files, doesn&#8217;t it?</p>
<p>For the past ten or fifteen years, Adobe has been providing Acrobat Reader and singing the wonders of its universal document format. And it seemed to be all that, too. Regardless of the audible groan we give when Acrobat launches after we click a link, isn&#8217;t it amazing that we can download press-ready copies of our income tax forms that are guaranteed to look exactly the same when you print them as when I print them? Read on to see what dangers lurk within.<span id="more-692"></span></p>
<p><strong>What&#8217;s the Problem?</strong></p>
<p>To understand the nature of the PDF portability issues, one need only look as far as the web browser for an analogy. Consider how the web browser went from a barebones tool that displayed a simple language, HTML, in a neutral way, fitting the web content onto each user&#8217;s screen, to a memory-hogging behemoth that is an integral part of your operating system. It didn&#8217;t happen all at once; it has been death by a thousand cuts.</p>
<p>Mirroring the evolution of web browsers, the PDF document standard has adapted over the years to include many bells and whistles such as embedded audio, video, and JavaScript. It is these features that chip away at the core purpose and <em>raison d&#8217;être</em> of the PDF standard.</p>
<p><strong>An example: Font Issues</strong></p>
<p>A simple example of the weakness of these extended PDF features is the humble text font. When your application generates a PDF document, it can use the 14 standard PDF fonts, refer to fonts installed on the local machine without embedding them, or embed TrueType or PostScript fonts in the file itself.</p>
<blockquote><p>There are 14 standard fonts that should be available by default in each PDF reader. These fonts are Courier, Courier Bold, Courier Italic (Oblique), Courier Bold and Italic, Helvetica, Helvetica Bold, Helvetica Italic (Oblique), Helvetica Bold and Italic, Times Roman, Times Roman Bold, Times Roman Italic, Times Roman Bold and Italic, Symbol and ZapfDingBats® (<a href="http://itextdocs.lowagie.com/tutorial/fonts/index.php">source</a>)</p></blockquote>
<p>Guess what happens when you set your document in <em><a href="http://new.myfonts.com/fonts/linotype/itc-mona-lisa/">Mona Lisa Solid ITC</a></em> and then print to PDF and send to all of your colleagues? Does your friend&#8217;s machine have a copy of this font? Maybe, and maybe not.</p>
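<p>One way to answer that question for a given PDF: each font resource names its font in a <code>/BaseFont</code> entry, and an embedded font&#8217;s descriptor carries a <code>/FontFile</code>, <code>/FontFile2</code>, or <code>/FontFile3</code> entry. The sketch below assumes you have already pulled those two facts out with a PDF library of your choice, and simply classifies the result. Note that the standard 14 go by PostScript names such as <code>Times-Roman</code> and <code>ZapfDingbats</code>; the function name is made up for this example.</p>

```python
import re

# PostScript names of the 14 standard fonts every PDF viewer must supply.
STANDARD_14 = {
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Symbol", "ZapfDingbats",
}

# Embedded font subsets are conventionally tagged with six capital letters
# and a plus sign, e.g. "ABCDEF+SomeFont".
_SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def font_risk(base_font: str, has_font_file: bool) -> str:
    """Classify a PDF font by how safely it will travel to another machine."""
    if has_font_file or _SUBSET_TAG.match(base_font):
        return "embedded"  # the font program travels inside the file
    if base_font in STANDARD_14:
        return "standard"  # every conforming viewer must provide it
    return "at risk"  # depends on the reader having the font installed
```

<p>For example, <code>font_risk("MonaLisaSolidITC", False)</code> returns <code>"at risk"</code> (that PostScript name is a placeholder, not the font&#8217;s real one), while the same font with an embedded font file comes back <code>"embedded"</code>.</p>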
<p>As I was writing this, I planned on putting together a cute demo by saving a document set in Mona Lisa Solid ITC in PDF from my Mac and then opening it on a PC. Much to my surprise (and delight), I found that the default &#8220;Print to PDF&#8221; functionality on my Mac does, in fact, embed the font within the document.</p>
<p>Regardless, if you have always just trusted that the fonts would be identical across platforms, you could get quite a surprise when your friend tries to print your beautiful document.</p>
<p><strong>PDF/A Standard</strong></p>
<p>Some time back, the industry (Adobe included) recognized the need for a more tightly controlled standard for creating <em>really portable</em> documents, instead of merely <em>portable</em> documents. This ISO standard (ISO 19005-1), dating from 2005, is referred to as <a href="http://en.wikipedia.org/wiki/PDF/A">PDF/A</a>, where the A stands for Archive.</p>
<blockquote><p>A key element to &#8230; reproducibility is the requirement for PDF/A documents to be 100 % self-contained. All of the information necessary for displaying the document in the same manner every time is embedded in the file. This includes, but is not limited to, all content (text, raster images and vector graphics), fonts, and color information. A PDF/A document is not permitted to be reliant on information from external sources (e.g. font programs and hyperlinks). (<a href="http://en.wikipedia.org/wiki/PDF/A#Description">Wikipedia</a>)</p></blockquote>
<p>Basically PDF/A forbids all of the flashy stuff and sticks to the basics: good solid document rendering.</p>
<p>Banned features include:</p>
<ul>
<li>Audio and Video</li>
<li>JavaScript</li>
<li>Encryption</li>
<li>Nonstandard metadata</li>
<li>Transparent images</li>
</ul>
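<p>For a quick, unofficial look at whether an existing file uses some of these features, you can search its raw bytes for the PDF keys that typically switch them on. This is a rough heuristic sketch and no substitute for a real PDF/A validator: keys hidden inside compressed object streams will be missed, and the metadata rule can&#8217;t be checked this way at all. The function name is hypothetical.</p>

```python
import re

# A PDF name ends at whitespace, a delimiter character, or end of data.
_END = rb"(?=[\s/<>\[\]()%{}]|$)"

# Raw-byte patterns for keys that usually signal features PDF/A-1 forbids.
RED_FLAGS = {
    "JavaScript": re.compile(rb"/(?:JavaScript|JS)" + _END),
    "encryption": re.compile(rb"/Encrypt" + _END),
    "audio/video": re.compile(rb"/(?:Sound|Movie|RichMedia)" + _END),
    # "/SMask /None" turns soft masking off and is allowed; anything else
    # points at a transparency mask.
    "transparency": re.compile(rb"/SMask(?!\s*/None)" + _END),
}


def pdfa_red_flags(data: bytes) -> set[str]:
    """Return the features whose telltale keys appear in the raw PDF bytes."""
    return {name for name, pattern in RED_FLAGS.items() if pattern.search(data)}
```

<p>Feed it the output of <code>Path("some.pdf").read_bytes()</code>. An empty set means only that nothing obvious turned up, not that the file conforms.</p>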
<p>In addition to the loss of several features, PDF/A documents can be somewhat larger, due to the embedded fonts, and they might have rendering issues with images that depend on transparency.</p>
<p>With all that, it still sounds like an enticing concept. Many PDF tools speak fluent PDF/A. Check out your own toolkit and see if you can future-proof your documents a little more.</p>
<p><strong>Here&#8217;s more on PDF/A documents</strong></p>
<p><a href="http://blog.nitropdf.com/index.php/2009/07/13/longterm-digital-archiving-pdfa/">Long-term digital archiving with PDF/A</a> (The PDF Blog)<br />
<a href="http://en.wikipedia.org/wiki/PDF/A">PDF/A</a> (Wikipedia)<br />
<a href="http://www.pdfa.org/doku.php?id=pdfa:en:pdfa_whitepaper">PDF/A &#8211; A new Standard for Long-Term Archiving</a> (PDF/A Competence Center)</p>
]]></content:encoded>
			<wfw:commentRss>http://paperjammed.com/2009/09/29/are-your-portable-document-format-files-all-that/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>When migrating to a new operating system, Look Before You Leap!</title>
		<link>http://paperjammed.com/2009/09/07/when-migrating-to-a-new-operating-system-look-before-you-leap/</link>
		<comments>http://paperjammed.com/2009/09/07/when-migrating-to-a-new-operating-system-look-before-you-leap/#comments</comments>
		<pubDate>Tue, 08 Sep 2009 02:56:21 +0000</pubDate>
		<dc:creator>Tad</dc:creator>
				<category><![CDATA[Backups]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Data Loss]]></category>
		<category><![CDATA[Geeky]]></category>
		<category><![CDATA[Macintosh]]></category>
		<category><![CDATA[Tips]]></category>

		<guid isPermaLink="false">http://paperjammed.com/?p=676</guid>
		<description><![CDATA[I can&#8217;t help it. As soon as I hear of a new version of anything, whether it&#8217;s an application or the entire operating system, I have to install it.
Now prudence would lead one to take careful steps and wait until all of the wrinkles are ironed out before starting. I was almost not prudent enough [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-medium wp-image-685" src="http://paperjammed.com/wp-content/uploads/2009/09/iStock_000005873765XSmall-241x300.jpg" alt="" width="241" height="300" />I can&#8217;t help it. As soon as I hear of a new version of <em>anything</em>, whether it&#8217;s an application or the entire operating system, I have to install it.</p>
<p>Now prudence would lead one to take careful steps and wait until all of the wrinkles are ironed out before starting. I was almost not prudent enough this week.</p>
<p><strong>Mac OS X Snow Leopard</strong></p>
<p>So folks have been talking about the new <a href="http://en.wikipedia.org/wiki/Mac_OS_X_v10.6">Snow Leopard</a> operating system for Mac. Over the past year, Apple has been positioning this version as more of an &#8220;under the hood&#8221; upgrade that tightens things up than a glitzy overhaul of the user interface. No matter what they said it was, I figured that it was newer, and therefore better, than the current OS (Leopard), and I had to have it.</p>
<p>I ordered my copy last week on Amazon and sat down with a smile as I awaited its arrival. And then I thought about doing a few quick Googles to see how other people have been making out with Snow Leopard. I immediately happened upon a few upgrade guides <a href="http://www.cultofmac.com/how-to-upgrade-to-snow-leopard-the-right-way/15141">like this one</a>, providing sage advice about the upgrade process. They recommended the &#8220;slash and burn&#8221; method, starting from a clean hard drive, and I felt that was a good idea. Nothing better than a wipe and fresh install to make your machine zip along twice as fast. And therein lies a tale.<span id="more-676"></span></p>
<p><strong>The first sign of trouble</strong></p>
<p>As I was reading up on the Snow Leopard upgrade process, I happened upon lists of &#8220;unsupported software&#8221; and casually glanced at the lists, expecting esoteric tools only used by three über geeks in the audio recording industry or perhaps some exotic ray-tracing software. Much to my surprise, I saw two of my favorite applications, in a very very short list of troublesome apps: <a href="http://en.wikipedia.org/wiki/Parallels_Desktop_for_Mac">Parallels</a> and <a href="http://www.elgato.com/elgato/na/mainmenu/home/what-is-eyetv.en.html">EyeTV</a>.</p>
<p>I immediately checked the versions and breathed a sigh of relief when I saw that my EyeTV version was safe. But, Parallels was another story&#8230; They have no plans for patching Parallels 3 to work with Snow Leopard, and why should they, when they can sell us Parallels 4!</p>
<p>So I ordered a fresh copy of Parallels 4 from Amazon, with a twenty-dollar rebate. When it arrived, I spent an evening upgrading Parallels, and I thought I was all set for Snow Leopard.</p>
<p><strong>Preventative Measures</strong></p>
<p>Following the advice of the upgrade websites, and my own prior experience, I used <a href="http://www.bombich.com/software/ccc.html">Carbon Copy Cloner</a> to make a full backup of my hard drive on a spare external drive. On a hunch, I turned on the drive that I use for <a href="http://en.wikipedia.org/wiki/Time_Machine_(Apple_software)">Time Machine</a> and had it do one final &#8220;Time Machine&#8221; sweep through the system before bidding <em>adieu</em> to Leopard.</p>
<p>I knew that I had all of my installation media for stuff like iLife and Photoshop Elements, and I had all of my license keys in electronic form. It would be a simple matter of mounting the backup drive, copying over my loads of documents, and peering into them to find keys.</p>
<p><strong>The first attempt</strong></p>
<p>I boldly inserted the Snow Leopard disk and booted from the DVD drive, selecting the &#8220;Slash and Burn&#8221; method of installation. I reformatted the hard drive and went off for dinner while Snow Leopard installed.</p>
<p><strong>Trouble</strong></p>
<p>When I got home that evening, I started the lengthy process of installing stuff. I suddenly realized that it was not as easy as I had hoped: it&#8217;s one thing to reinstall something like Microsoft Office, but there seemed to be more loose ends than I had considered:</p>
<ul>
<li>How would I migrate my Mail settings from the old image to the new?</li>
<li>What was the best way to migrate the Address Book contents?</li>
<li>iTunes is great, but it has tendrils in everything. Can I simply copy my old library to the new without messing up my iPhone, Address Book, or other linked stuff?</li>
<li>How about those nice password tools such as 1Password and SplashID that keep your passwords safe and sound? I had no clue how to get their contents from the backup. I wasn&#8217;t sure if it was even possible to do so—perhaps I was supposed to have exported the data beforehand.</li>
</ul>
<p>It was becoming clearer to me that I had not done my homework at all.</p>
<p><strong>More trouble</strong></p>
<p>My initial shock at the depth of the upgrade process led me to start making a list of applications and looking at what I needed for each one. I soon found out that Snow Leopard support is somewhat spotty in many applications. In particular, the FineReader for ScanSnap software that I depend on so much for my scanning workflow is <a href="http://www.documentsnap.com/abbyy-finereader-and-snow-leopard-file-not-created-with-scansnap/">not fully supported</a>. Fujitsu says that it will have an update soon and advises customers to keep checking its web site.</p>
<p>My password tool, 1Password, is <a href="http://www.switchersblog.com/2009/08/update-1password-on-snow-leopard.html">another problem child</a>. It works only on 32-bit Safari, and Snow Leopard now runs Safari in 64-bit mode. Of course, a new version is coming, and I will probably have to pay for it, but it is still in beta.</p>
<p>There was <a href="http://graphicssoft.about.com/b/2009/08/28/what-about-photoshop-elements-6-in-snow-leopard.htm">quite a bit of chatter</a> on the Web about whether Adobe Photoshop Elements would work on Snow Leopard, and the responses seem split fifty-fifty for now.</p>
<p>Three very important tools were in danger of running in limited mode or not running at all, so I had to throw in the towel.</p>
<p><strong>Time Machine saves the day!</strong></p>
<p>As I sat, humbled, before my vanilla install of Snow Leopard, I admitted defeat. I slipped the Snow Leopard DVD back in the drive and rebooted from the DVD. This time, I selected the &#8220;Restore from Time Machine&#8221; option and turned on my Time Machine drive.</p>
<p>Guess what? It worked perfectly! Unlike many software products, Time Machine does exactly what it promises.</p>
<p>Within a few hours, my machine was fully restored to the way it looked seconds before I made my first attempt at Snow Leopard.</p>
<p><strong>A Final Word</strong></p>
<p>Learn from my mistakes, and from my rescue by a full backup. As much as you can&#8217;t wait to upgrade, please do the following:</p>
<ul>
<li>Inventory all of your applications that you really need.</li>
<li>Obtain the installation media (download or CD) for every single one.</li>
<li>Obtain the keys for every single one.</li>
<li>Investigate whether you need to export data from any of them, and make a checklist for these exports prior to upgrade.</li>
<li>Check the &#8220;Unsupported Software&#8221; lists that are out there for any red flags.</li>
<li>Check the web sites of your most important apps for their official word.</li>
<li>And finally, do a complete backup!</li>
</ul>
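<p>For the inventory step on a Mac, a short script can at least list what lives in <code>/Applications</code>. The sketch below assumes the standard bundle layout, where each <code>.app</code> has a <code>Contents/Info.plist</code> carrying a <code>CFBundleShortVersionString</code> version key. It won&#8217;t find command-line tools or odd little utilities like a custom-built rsync, so those still need a manual look. The function name is made up for this example.</p>

```python
import plistlib
from pathlib import Path
from xml.parsers.expat import ExpatError


def inventory_apps(apps_dir: Path = Path("/Applications")) -> dict[str, str]:
    """Map each .app bundle in apps_dir to its version string.

    Reads CFBundleShortVersionString from each bundle's Contents/Info.plist;
    bundles with a missing or unreadable plist are listed as "unknown".
    """
    inventory: dict[str, str] = {}
    for app in sorted(apps_dir.glob("*.app")):
        version = "unknown"
        try:
            with (app / "Contents" / "Info.plist").open("rb") as fh:
                info = plistlib.load(fh)  # handles XML and binary plists
            version = str(info.get("CFBundleShortVersionString", "unknown"))
        except (OSError, ValueError, ExpatError):
            pass  # keep "unknown" and move on to the next bundle
        inventory[app.stem] = version
    return inventory


if __name__ == "__main__":
    for name, version in inventory_apps().items():
        print(f"{name}\t{version}")
```

<p>Run it before the upgrade and save the output somewhere safe, such as the backup drive, so you have a list of versions to check against the compatibility lists.</p>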
<p>It&#8217;s amazing how many applications and weird little utilities we forget we have. How could I have possibly remembered that I compiled a custom copy of the &#8220;rsync&#8221; executable for my backup workflow? I would have lost that and had to figure out how to rebuild it on Snow Leopard.</p>
<p>And I haven&#8217;t even talked about making sure your documents make it safely onto the new machine. That&#8217;s a whole &#8216;nother story.</p>
<p>In case I forgot to say it, please make a full backup.</p>
<p><strong>[Update: I'm giving Snow Leopard a rest for a few months]</strong></p>
<p>It has been said that Time Machine allows you to do a full restore from bare metal, and I&#8217;m living proof: I have done exactly that twice in the past week, with astounding success.</p>
<p>Encouraged by an episode of the <a href="http://www.macobserver.com/tmo/features/mac_geek_gab/">Mac Geek Gab</a> where they talked about their experiences upgrading their existing systems to Snow Leopard, I decided I would give the upgrade-in-place option a try. I expected some things to not work well and others to be quirky, but here&#8217;s what happened&#8230;</p>
<p>The actual install was painless, taking an hour or so to complete. I then began to kick the tires to see what was broken.</p>
<p>It was clear where those 64 bits went: apps like Safari were positively zippy, and I was pleasantly surprised with each new application I launched. All of my special settings seemed to make it through alive, including my password manager, though I did have to re-enter some of my registration keys. All of my mail and contacts made it through well. I was able to sync my iPhone without incident.</p>
<p>I found a few apps that weren&#8217;t working correctly and I looked for newer 10.6-compatible versions. I found newer versions of <a href="http://www.ironicsoftware.com/yep/">Yep</a> and <a href="http://alum.hampshire.edu/~bjk02/xGestures/">xGestures</a>.</p>
<p>I did note that there is currently no ad blocker available for Safari that runs in 64-bit mode. This is disappointing because even though I understand that Apple wants us to see <em>their</em> ads, I can&#8217;t imagine that they really want us to suffer from the flickering jumping dreck that should have ended with the hated &#8220;punch the monkey&#8221; banners of years gone by. The fact of the matter is, if I want that 64-bit speed and snap, I guess I have to watch ads.</p>
<p><strong>The Showstopper</strong></p>
<p>I decided to scan a document to see just how difficult it would be to get my workflow going again. Michael F, below, wrote the truth about the situation: the scanner works fine in certain modes, but the OCR software doesn&#8217;t.</p>
<p>He pointed out that it was a problem of the FineReader software looking for a specific bit of metadata in the PDF identifying it as a ScanSnap PDF. Sadly, that metadata string changed.</p>
<blockquote><p>The Finereader software is looking for “Mac OS X 10.5.8 Quartz PDFContext”, but under Snow Leopard, the string is set to “Mac OS X 10.6 Quartz PDFContext” instead.</p></blockquote>
<p>There are ways to tweak PDF metadata, and one of them is by using <a href="http://www.accesspdf.com/pdftk/">pdftk</a>.</p>
<p>I went to the pdftk site, all ready to download it and start OCRing my PDFs. I was greeted with less than optimal news: they have a version compiled for Panther, a version of OS X from several years ago.</p>
<p>I knew it wouldn&#8217;t work, but I gave it a try anyway: the app told me it needed Rosetta to run. I could have installed Rosetta at that point, but I figured I wanted a <em>proper</em> compiled version.</p>
<p>From there, I looked into compiling the app on OS X 10.6. I should have remembered my struggles with this several months ago on a Solaris Unix box when I found that pdftk depends on a monster called GCJ that required about forty other software packages to compile—it seemed a gargantuan task that I wasn&#8217;t ready to begin.</p>
<p>On a hunch, I inspected the content of a <em>new</em> PDF and an <em>old</em> PDF, the latter still acceptable to FineReader. Though much of the file was raw binary, the metadata was stored as text at the end. A short <a href="http://en.wikipedia.org/wiki/Sed">sed</a> script was all it took to replace the offending 10.6 string with the 10.5.8 string FineReader expects.</p>
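<p>The sed script itself isn&#8217;t reproduced in the post. A Python equivalent of the substitution, working directly on the file&#8217;s bytes, might look like the sketch below (the function name is hypothetical). It also reports how much the file length changed, and that matters: the 10.5.8 string is two bytes longer than the 10.6 one, and a PDF&#8217;s cross-reference table records the byte offset of every object. Anything stored after the edited text, including the cross-reference table itself, moves, so those recorded offsets go stale. Many readers quietly repair this; a stricter consumer may not. Whether that is what tripped up FineReader is only a guess.</p>

```python
OLD = b"Mac OS X 10.6 Quartz PDFContext"
NEW = b"Mac OS X 10.5.8 Quartz PDFContext"


def swap_producer(data: bytes, old: bytes = OLD, new: bytes = NEW) -> tuple[bytes, int, int]:
    """Replace every occurrence of `old` with `new` in raw PDF bytes.

    Returns the patched bytes, the number of replacements made, and the
    change in file length. A nonzero length change means byte offsets
    recorded later in the file (cross-reference table, startxref) are stale.
    """
    count = data.count(old)
    patched = data.replace(old, new)
    return patched, count, len(patched) - len(data)
```

<p>Usage, with a placeholder file name: <code>patched, n, delta = swap_producer(Path("scan.pdf").read_bytes())</code>. Every copy of the string is replaced, which covers both the document information dictionary and any XMP metadata that repeats it.</p>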
<p>In spite of my best efforts, FineReader still rejected my hand-tooled PDF file. It knew that it was a bogus file.</p>
<p>I have dealt with Abbyy&#8217;s FineReader support several times before, as well as Fujitsu&#8217;s ScanSnap support, and was unimpressed. For two vendors whose products are at the top of their class (FineReader is arguably the best OCR you can get for Mac, and ScanSnap is the best document scanner for the common man), they sure do have miserable customer support.</p>
<p>It is as if neither company cares a whit about the Macintosh platform or their customers. While most other vendors are busily patching their products and giving hourly updates on their Snow Leopard compatibility progress, Abbyy and Fujitsu just don&#8217;t seem to care that their best-of-breed combo suddenly doesn&#8217;t work on Mac.</p>
<p>Once they get this sorted out (hopefully in the next few months) I&#8217;ll give Snow Leopard another try. In the meantime, I&#8217;m sticking with good old Leopard.</p>
]]></content:encoded>
			<wfw:commentRss>http://paperjammed.com/2009/09/07/when-migrating-to-a-new-operating-system-look-before-you-leap/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Keeping Your Documents Readable for Years to Come</title>
		<link>http://paperjammed.com/2009/07/13/keeping-your-documents-readable-for-years-to-come/</link>
		<comments>http://paperjammed.com/2009/07/13/keeping-your-documents-readable-for-years-to-come/#comments</comments>
		<pubDate>Tue, 14 Jul 2009 02:46:59 +0000</pubDate>
		<dc:creator>Tad</dc:creator>
				<category><![CDATA[Backups]]></category>
		<category><![CDATA[Paperless Life]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Data Loss]]></category>
		<category><![CDATA[Macintosh]]></category>
		<category><![CDATA[Media]]></category>
		<category><![CDATA[PDF]]></category>
		<category><![CDATA[Printing]]></category>
		<category><![CDATA[Windows]]></category>

		<guid isPermaLink="false">http://paperjammed.com/?p=601</guid>
		<description><![CDATA[Whether you are a cube dweller sharing an electronic document with your next door neighbor or a homeowner attempting to catalogue your digital life, you will soon encounter resistance in the form of document incompatibility. What good is a byte-for-byte perfect duplicate of the original if you cannot open it in an application?
My own choice [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-medium wp-image-607" src="http://paperjammed.com/wp-content/uploads/2009/07/iStock_000000498634XSmall-300x199.jpg" alt="" width="300" height="199" />Whether you are a cube dweller sharing an electronic document with your next door neighbor or a homeowner attempting to catalogue your digital life, you will soon encounter resistance in the form of document incompatibility. What good is a byte-for-byte perfect duplicate of the original if you cannot open it in an application?</p>
<p>My own choice for document format is almost always Portable Document Format (PDF), but rather than just state this, I would like to consider some of the factors involved.</p>
<p>This is the first of a series of articles covering document formats. This article focuses specifically on the distinction between works in progress and finished product.<span id="more-601"></span></p>
<p><strong>Two Kinds of Documents</strong></p>
<p>In general, we can consider two broad categories of documents: working documents (works in progress) and archived documents. You can call these by many different names, but the fundamental distinction is still there.</p>
<p><strong>Working Documents</strong></p>
<p>These are documents that you are still writing. They share some characteristics:</p>
<ul>
<li>They must be retained in their original format, such as Microsoft Word.</li>
<li>The formats are often very specialized. Quite often another tool can import such a document, but you usually lose something in the translation.</li>
<li>You and your colleagues need to have the same editor software to view and modify the documents.</li>
<li>They are often short-lived. This phase of a document&#8217;s life usually doesn&#8217;t last more than a few months (though a template document might be kept for many years).</li>
<li>A good backup strategy will need a short window between backups; these documents change often, so they should be backed up frequently.</li>
<li>You may want to consider a document versioning strategy, so you can see how the document appeared at different stages during its life.</li>
</ul>
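<p>The simplest possible versioning strategy is a timestamped copy taken each time a document reaches a milestone. A minimal sketch, assuming a separate versions folder of your choosing and a made-up function name (real version control or a backup tool that keeps history will do this far better):</p>

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def snapshot(doc: Path, versions_dir: Path, when: Optional[datetime] = None) -> Path:
    """Copy `doc` into `versions_dir` under a timestamped name.

    For example, report.docx becomes report.20090713-214659.docx.
    """
    when = when or datetime.now()
    versions_dir.mkdir(parents=True, exist_ok=True)
    target = versions_dir / f"{doc.stem}.{when:%Y%m%d-%H%M%S}{doc.suffix}"
    shutil.copy2(doc, target)  # copy2 also preserves the modification time
    return target
```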
<p>Here are some examples:</p>
<ul>
<li>Microsoft Word documents</li>
<li>Visio diagrams</li>
<li>Photos that you are still retouching</li>
<li>Audio files that you are in the process of curating (e.g. applying ID3 tags)</li>
</ul>
<p><strong>Archived Documents</strong></p>
<p>These are documents that are read-only, meant to be viewed but never modified.</p>
<ul>
<li>They often must be rendered in very precise ways, so each viewer sees the document as intended (consider a 1040 form from the IRS)</li>
<li>They may be around for a long time.</li>
<li>These documents should be less tightly bound to a particular software product: think PDF rather than MS Word, or JPG rather than an Adobe Photoshop file.</li>
<li>They typically have a wider audience. You may share a work-in-progress with a co-worker or two, but a finished read-only document might be read by hundreds or thousands.</li>
<li>Any user should be able to read these documents, with little effort.</li>
<li>Your backup strategy is probably going to be more focused on longevity and less focused on frequency. These documents are in it for the long haul.</li>
</ul>
<p><strong>Why not start with a simple example?</strong></p>
<p>Here is a screenshot of an application I use in my day job:</p>
<p><img class="size-full wp-image-602 alignnone" src="http://paperjammed.com/wp-content/uploads/2009/07/20090713-caffeine.gif" alt="" width="528" height="474" /></p>
<p>Just in case you did not recognize the unmistakable visage of this small molecule, I have labeled it appropriately.</p>
<p>This is an application called <a href="http://en.wikipedia.org/wiki/ChemDraw">ChemDraw</a> from <a href="http://www.cambridgesoft.com/">Cambridgesoft</a>, and unless you are a chemist you have probably never heard of it. My molecule is saved as <strong>caffeine.cdx</strong> in a format that only ChemDraw knows intimately (though there are other similar chemistry tools that can import this file format).</p>
<p>My point is simple: if your friend sent you a copy of <strong>caffeine.cdx</strong>, how exactly would you open it?</p>
<p>In contrast, <a href="http://paperjammed.com/wp-content/uploads/2009/07/20090713-caffeine.pdf">here is a more accessible rendition</a> of the same molecule in PDF format. Try it out; you should be able to view the molecule, and zoom in on details.</p>
<p>What if you had to show someone this document five years down the road? Do you want to have to chase down a possibly obsolete version of a very expensive application that might not even run on your operating system?</p>
<p><strong>Obsolescence</strong></p>
<p>Some time back I was sifting through some files on an old server at work that apparently had been written by me. Fifteen years ago I was attending night classes and writing many of my English assignments on a <a href="http://en.wikipedia.org/wiki/VAX">VAX</a> running <a href="http://en.wikipedia.org/wiki/OpenVMS">VMS</a> at work (over my lunch break!). I was using some anemic version of WordPerfect that had been ported to VMS. This arrangement saw me safely through college, but was not conducive to long term document storage.</p>
<p>Do you have any idea what VMS directory structures look like? Maybe, and maybe not. Are these files compatible with the contemporary DOS versions of WordPerfect? Maybe.</p>
<p>Could I open these files on a Windows Vista machine in 2009 using Microsoft Word? <a href="http://cjis.ci.lincoln.ne.us:8080/aiug/msg00586.html">With luck</a>. What about using Pages from Apple iWork on my Mac running OS X? Doubtful.</p>
<p>Not only do we need to be concerned with special applications that only a select few (with expensive licenses) have, but we also need to consider that the file format might be obsolete beyond hope.</p>
<p>For an exaggerated example, consider the image of <a href="http://en.wikipedia.org/wiki/Punched_tape">punched paper tape</a> at the top of this article. I would have no clue what to do if I were given a roll of this tape.</p>
<p><strong>Which do you keep?</strong></p>
<p>Look at the characteristics of the document types listed above and see which one fits your document best. Quite often you will find yourself keeping both the original document and a PDF rendition. Indeed, this is what many professional document databases do.</p>
<p>If you can&#8217;t easily choose one, keep both. In most cases, I have found that I only need the PDF rendition for the long term and I couldn&#8217;t care less about the source document.</p>
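<p>If you do keep both, it&#8217;s easy to lose track of which source files never got their PDF rendition. A small sketch that flags them, assuming renditions sit next to their sources with the same base name (<code>caffeine.cdx</code> and <code>caffeine.pdf</code>) and that the listed extensions are the ones you treat as working formats; the extension set and function name are illustrative, not prescriptive.</p>

```python
from pathlib import Path

# Extensions treated as "working document" formats (adjust to taste).
WORKING_EXTS = {".doc", ".docx", ".vsd", ".cdx"}


def missing_renditions(folder: Path) -> list[Path]:
    """List working documents under `folder` that have no PDF beside them.

    A PDF counts as the rendition if it shares the document's folder and
    base name, e.g. caffeine.cdx -> caffeine.pdf.
    """
    return sorted(
        p for p in folder.rglob("*")
        if p.is_file()
        and p.suffix.lower() in WORKING_EXTS
        and not p.with_suffix(".pdf").exists()
    )
```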
<p><strong>Summary</strong></p>
<p>In the world of the paperless home, much of what we do is store digital copies of old documents for searching and possible reprinting some time in the future. Don&#8217;t make the mistake of keeping all of your documents only in their original editable format; you might just find yourself with a digital file that cannot be viewed!</p>
]]></content:encoded>
			<wfw:commentRss>http://paperjammed.com/2009/07/13/keeping-your-documents-readable-for-years-to-come/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Banish the kids to their own network!</title>
		<link>http://paperjammed.com/2009/06/02/banish-the-kids-to-their-own-network/</link>
		<comments>http://paperjammed.com/2009/06/02/banish-the-kids-to-their-own-network/#comments</comments>
		<pubDate>Wed, 03 Jun 2009 00:16:43 +0000</pubDate>
		<dc:creator>Tad</dc:creator>
				<category><![CDATA[Security]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Data Loss]]></category>
		<category><![CDATA[Geeky]]></category>
		<category><![CDATA[Networking]]></category>
		<category><![CDATA[Portable Devices]]></category>
		<category><![CDATA[Privacy]]></category>
		<category><![CDATA[Windows]]></category>

		<guid isPermaLink="false">http://paperjammed.com/?p=557</guid>
		<description><![CDATA[A nastygram from my ISP let me know that I needed to take action to lock down my home network. In this article I discuss using a spare router in a somewhat unusual daisy chain configuration in order to banish the teenagers and all of their wifi devices to their own network.]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-medium wp-image-560" src="http://paperjammed.com/wp-content/uploads/2009/06/istock_000006562749xsmall-300x210.jpg" alt="" width="300" height="210" />A few weeks ago I received an unpleasant bit of email from my Internet provider. At first, I thought it was yet another lame spammer or phisher sending me some official-looking notice, but after a moment&#8217;s inspection I realized that this was a real <em>bona-fide </em>official notice.</p>
<p>Their network security department very kindly (and politely) informed me that they had received a &#8220;cease and desist&#8221; order from a particular game publisher. They had included the game publisher&#8217;s email, complete with the incriminating evidence.</p>
<p>There it was: logs showing the MAC address of my cable modem being involved in suspicious <a href="http://en.wikipedia.org/wiki/BitTorrent_(protocol)">BitTorrent</a> activities.</p>
<p>Considering that at any time during the week there can be from two to six or seven different teenagers hanging out in my humble abode, carrying virus-ridden machines, the message was clear: I had to get serious about locking down network access.<span id="more-557"></span></p>
<p><strong>The Problem</strong></p>
<p>I would have liked to buy some net-filtering software, slap it on the offending machine, and be done with it, but I knew that this would not be enough.</p>
<p>Even if this one event could be traced to a youthful source, a more ominous danger comes from the inevitable malware and viruses that teenagers collect on their machines as they swap cool stuff with their friends.</p>
<p>Complicating things, there are many devices on our home network: Besides their school laptops, the kids have video game consoles and one has an iPod touch, all with wifi access. Think about how many different gadgets are on <em>your</em> home network.</p>
<p>And shutting off access altogether was not an option—there is still schoolwork to be done!</p>
<p><strong>The answer: A Private Network for the Kids</strong></p>
<p>My solution was to put together an unusual network configuration using a second wireless router; I wanted the ability to manage every single kid-owned device at the flip of a switch, while leaving the grownups untouched.</p>
<p><img class="aligncenter size-full wp-image-568" src="http://paperjammed.com/wp-content/uploads/2009/06/20090602-network-devices.gif" alt="" width="600" height="550" /></p>
<p>I hooked the cable modem (<strong>red</strong>) to the main router, shown in <strong>green</strong>. I then plugged a second wireless router, shown in <strong>blue</strong>, into the first.</p>
<p>With this arrangement, there is <em>one single wire</em> connecting the entire <strong>blue</strong> network (the kids) to the <strong>green</strong> network. It was then trivial to configure the green router with appropriate access control and filtering for that one single device: the blue router.</p>
<p><strong>Some quirky details</strong></p>
<p>Home routers like these are, by default, configured with a <a href="http://en.wikipedia.org/wiki/Network_address_translation">NAT</a> firewall. They work sort of like one-way mirrors: devices on the inside can see out, but nobody outside can see in, because unsolicited incoming connections are simply dropped. As a result, the kids (<strong>blue</strong> devices) can see any device on the main router (<strong>green</strong> devices), such as our print server and the NAS device, but no one can see <em>into</em> the kids&#8217; network.</p>
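<p>To make the one-way mirror a little more concrete, here is a toy Python model of a NAT router&#8217;s translation table. It is a deliberate simplification, not how any real router firmware is written: an outgoing connection creates a mapping from an outside port back to the inside device, and an incoming packet is delivered only if it matches a mapping created earlier. The class name <code>ToyNat</code>, the starting port 40000, and the laptop address 192.168.2.100 are all made up for illustration; 192.168.1.2 is the example WAN address used in the setup steps.</p>

```python
from dataclasses import dataclass, field


@dataclass
class ToyNat:
    """Simplified NAT: many inside hosts share one outside address (illustrative only)."""
    outside_ip: str
    next_port: int = 40000
    # Maps an outside port to the (inside_ip, inside_port) that opened it.
    table: dict = field(default_factory=dict)

    def outbound(self, inside_ip: str, inside_port: int) -> tuple[str, int]:
        """Rewrite an outgoing packet's source and remember the mapping."""
        port = self.next_port
        self.next_port += 1
        self.table[port] = (inside_ip, inside_port)
        return (self.outside_ip, port)

    def inbound(self, outside_port: int):
        """Return the inside destination for a reply, or None (dropped) if unsolicited."""
        return self.table.get(outside_port)


# The kids' (blue) router, whose WAN side sits at 192.168.1.2 on the main network.
blue = ToyNat(outside_ip="192.168.1.2")
src = blue.outbound("192.168.2.100", 51515)  # a kid's laptop opens a connection
print(src)                   # -> ('192.168.1.2', 40000): all the green side ever sees
print(blue.inbound(src[1]))  # -> ('192.168.2.100', 51515): the reply gets back in
print(blue.inbound(12345))   # -> None: an unsolicited packet from the green side is dropped
```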
<p>As paradoxical as it seems, this is exactly what I wanted. By making the kids&#8217; network a private network, it appears to the green router as a single device. When I am configuring access restrictions, I only need to control access for the blue router&#8217;s IP address or MAC address.</p>
<p>Many consumer-grade routers have flaky firmware that doesn&#8217;t behave well once you start doing things like turning on filtering for multiple machines. I simplified things by bringing the number of controlled devices down to <em>one</em>. In addition, filtering on the IP or MAC addresses of individual machines is easily defeated: anyone can manually change a machine&#8217;s IP address or MAC address. With my configuration, the MAC address being filtered belongs to the blue router, which is locked away safely.</p>
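<p>Here is a rough Python sketch of why the single point of control holds up. The blue router routes the kids&#8217; traffic, so every frame it passes to the green router carries the blue router&#8217;s own WAN-side MAC address, whatever device it came from. The blocklists and the MAC addresses below are hypothetical and do not reflect any real router&#8217;s configuration format.</p>

```python
BLUE_ROUTER_MAC = "00:11:22:33:44:55"  # hypothetical WAN-side MAC of the kids' router


def source_seen_by_green(device_mac: str, behind_blue_router: bool) -> str:
    """Source MAC the main (green) router sees for a frame from this device."""
    # A device plugged straight into the green router shows its own MAC;
    # anything behind the blue router arrives with the blue router's WAN MAC.
    return BLUE_ROUTER_MAC if behind_blue_router else device_mac


def allowed(source_mac: str, blocklist: set[str]) -> bool:
    """Return True if the green router would let this source through."""
    return source_mac not in blocklist


# Per-device filtering: one entry per gadget, defeated by changing the MAC.
per_device = {"aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02"}
spoofed = "de:ad:be:ef:00:01"
print(allowed(source_seen_by_green(spoofed, behind_blue_router=False), per_device))  # True

# A single rule on the blue router: the same spoofed device is still caught.
single_rule = {BLUE_ROUTER_MAC}
print(allowed(source_seen_by_green(spoofed, behind_blue_router=True), single_rule))  # False
```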
<p><strong>The Finer Points</strong></p>
<p>If you want to set up a network like this, do the following:</p>
<ul>
<li>(Recommended) Reset the kids&#8217; router: hold in the hard-reset button while you turn on the power, and keep holding it for 15 seconds or so.</li>
<li>Hook the kids&#8217; router up to a spare laptop using an Ethernet cable. (Turn off the laptop&#8217;s wireless for the time being.)</li>
<li>Use the laptop to navigate to the configuration web page (usually 192.168.1.1).</li>
<li>Set the router&#8217;s own address to a <em>different</em> network from the main network, such as 192.168.<strong>2</strong>.1. <em>This is critical</em>.</li>
<li>Configure the router&#8217;s gateway and DHCP server entries to point to the <em>main</em> router (192.168.1.1). This tells the kids&#8217; router to use the main router as the source for its DHCP lookups and such, rather than going to the cable modem.</li>
<li>Navigate to the configuration web page at the new address (192.168.2.1). You may need to close the browser and replug the Ethernet cable.</li>
<li>Set up your wireless security for the kids however you like. Make sure to choose a different channel and SSID from your main router.</li>
<li>Remove the laptop and plug the WAN port of the kids&#8217; router into one of the LAN ports of the main router. Restart everything.</li>
<li>Test both networks to make sure things work the way you think they should.</li>
<li>(Optional) You might want to connect to the kids&#8217; router and set its external (WAN) IP address statically. Make sure that this is an address on the home network (e.g. 192.168.1.2) that the main router will not also hand out to another device through DHCP.</li>
</ul>
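<p>The &#8220;critical&#8221; step, putting the kids&#8217; router on a different network, is easy to sanity-check before you plug anything in. This short Python sketch uses the standard <code>ipaddress</code> module and assumes both routers use the common /24 (255.255.255.0) networks from the steps above, plus the optional static WAN address 192.168.1.2. The function name <code>check_layout</code> is my own; substitute your real addresses and masks.</p>

```python
import ipaddress

# Assumed addressing from the steps above; adjust to match your routers.
MAIN_LAN = ipaddress.ip_network("192.168.1.0/24")   # green (main) router
KIDS_LAN = ipaddress.ip_network("192.168.2.0/24")   # blue (kids') router
KIDS_WAN_IP = ipaddress.ip_address("192.168.1.2")   # optional static WAN address of the kids' router


def check_layout(main_lan, kids_lan, kids_wan_ip):
    """Return a list of problems with the two-router layout (an empty list means it looks sane)."""
    problems = []
    if main_lan.overlaps(kids_lan):
        problems.append(f"{kids_lan} overlaps {main_lan}: give the kids' router a different network")
    if kids_wan_ip not in main_lan:
        problems.append(f"WAN address {kids_wan_ip} is not on the main network {main_lan}")
    elif kids_wan_ip in (main_lan.network_address, main_lan.broadcast_address):
        problems.append(f"WAN address {kids_wan_ip} is the network or broadcast address")
    return problems


print(check_layout(MAIN_LAN, KIDS_LAN, KIDS_WAN_IP))  # -> []
# The classic mistake: leaving both routers on the factory-default 192.168.1.x network.
print(check_layout(MAIN_LAN, ipaddress.ip_network("192.168.1.0/24"), KIDS_WAN_IP))
```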
<p>Some notes:</p>
<ul>
<li>You can only manage the kids&#8217; router from a machine connected to the kids&#8217; network; the home network cannot see its management screens. If you wish, you could enable remote management on the kids&#8217; router alone, since the main home router still protects the whole network from intruders on the Internet.</li>
<li>Computers on the kids&#8217; network can reach all the devices on the main network, but they aren&#8217;t on the same subnet. This means that network printers and NAS devices are accessible, but you will have to attach to them using IP addresses rather than relying on automatic discovery. I was easily able to set up the machines on the 192.168.2.x network to use a print server at 192.168.1.100.</li>
<li>For machines that should have full access (a.k.a. <em>yours</em>), make sure that you either give the <strong>green</strong> network a higher priority or remove the <strong>blue</strong> network&#8217;s SSID from the machine&#8217;s list of known networks altogether. I found out the hard way that my iMac would pick green or blue seemingly at random, depending on which one it saw first when it woke up.</li>
<li>This does <em>not</em> wall off your main network; it simply provides a single point of control over the entire kids&#8217; network. In other words, don&#8217;t depend on this setup to prevent malware on the kids&#8217; machines from seeing your machine. You can, however, set up your PC not to trust the kids&#8217; network.</li>
</ul>
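<p>Why can a laptop on the kids&#8217; network print to 192.168.1.100 at all? A host first checks whether the destination is on its own subnet; if it isn&#8217;t, the host hands the packet to its default gateway (here, the kids&#8217; router), which forwards it out of its WAN port onto the main network. The Python sketch below models just that decision with the standard <code>ipaddress</code> module. The laptop address 192.168.2.100, the /24 mask, and the function name <code>next_hop</code> are assumptions for illustration.</p>

```python
import ipaddress

# Assumed settings for a laptop that got its address from the kids' (blue) router.
LAPTOP = ipaddress.ip_interface("192.168.2.100/24")
DEFAULT_GATEWAY = ipaddress.ip_address("192.168.2.1")  # the kids' router


def next_hop(destination: str) -> str:
    """Return where the laptop sends a packet: straight to the destination or via the gateway."""
    dest = ipaddress.ip_address(destination)
    if dest in LAPTOP.network:
        return f"deliver directly to {dest} (same subnet)"
    return f"send to gateway {DEFAULT_GATEWAY}, which forwards it toward {dest}"


print(next_hop("192.168.2.57"))   # another kid's device on the same subnet
print(next_hop("192.168.1.100"))  # the print server on the main network, reached via the kids' router
```

<p>The same logic is why the last note above matters: nothing in this path stops a machine on the kids&#8217; network from reaching your own PC on the main network.</p>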
<p><strong>Wireless Network Security</strong></p>
<p>Regardless of how you set up your network, make sure you use at least WPA encryption (never use WEP!), and make sure your passwords are solid.</p>
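<p>For a solid password, a long random passphrase is hard to beat. WPA personal mode accepts a passphrase of 8 to 63 printable ASCII characters; the Python sketch below uses the standard <code>secrets</code> module to generate one. The 24-character default and the letters-and-digits character set (chosen so it&#8217;s easy to type on a game console) are my assumptions, and <code>wifi_passphrase</code> is just an illustrative name.</p>

```python
import secrets
import string

# Letters and digits only (an assumption): easier to type on consoles and phones.
ALPHABET = string.ascii_letters + string.digits


def wifi_passphrase(length: int = 24) -> str:
    """Generate a random WPA passphrase; WPA personal mode allows 8 to 63 characters."""
    if not 8 <= length <= 63:
        raise ValueError("WPA passphrases must be 8 to 63 characters long")
    # secrets.choice draws from the operating system's cryptographically secure random source.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(wifi_passphrase())
```

<p>With 62 possible characters in each of 24 positions, that works out to roughly 143 bits of randomness, far beyond what anyone could guess.</p>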
<p><strong>Using DD-WRT on my new wireless router</strong></p>
<p>In addition to the new network configuration, I went one step further and chose a main router that lends itself well to the installation of open-source firmware. I ordered a <a href="http://www.amazon.com/Linksys-Cisco-WRT54GL-Wireless-G-Broadband-Compatible/dp/B000BTL0OA/ref=sr_1_1?ie=UTF8&amp;s=electronics&amp;qid=1243905597&amp;sr=8-1">Linksys WRT54GL</a> from Amazon for a little over fifty bucks. I chose this one because, as a direct descendant of the venerable <a href="http://en.wikipedia.org/wiki/WRT54G">WRT54G</a>, this router is very well suited for running alternative firmware such as <a href="http://en.wikipedia.org/wiki/Dd-wrt">DD-WRT</a>, giving substantial control over things like, say, access control&#8230;</p>
<p>Within half an hour of my new router&#8217;s arrival, I had gone to the <a href="http://www.dd-wrt.com/dd-wrtv3/dd-wrt/hardware.html">Supported Hardware</a> page, obtained the latest build of DD-WRT, and replaced the Linksys firmware with the far better open-source code.</p>
<p>I won&#8217;t go into the specifics of installation here, but it isn&#8217;t very challenging. Check out the <a href="http://www.dd-wrt.com/dd-wrtv3/index.php">DD-WRT site</a> for details.</p>
<p><strong>Closing Thoughts</strong></p>
<p>Make no mistake: we are responsible for whatever goes on our home networks. It&#8217;s just like your home telephone: if someone dials up some 900 number and rings up a thousand-dollar phone bill, the phone company won&#8217;t care a whit who did it; you will still pay. Likewise, regardless of who did the BitTorrent download, the homeowner bears a certain degree of responsibility for locking down the network.</p>
<p>Another point: without some degree of personal responsibility on the part of the kids in the house, this sort of effort would simply become an arms race of filtering and blocking versus hacking. My goal is to help keep the honest people honest and to make life more difficult for viruses and malware.</p>
]]></content:encoded>
			<wfw:commentRss>http://paperjammed.com/2009/06/02/banish-the-kids-to-their-own-network/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
