<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Automate ScanSnap OCR process on your Mac with AppleScript (Snow Leopard Edition)</title>
	<atom:link href="http://paperjammed.com/2010/01/04/automate-scansnap-ocr-process-on-your-mac-with-applescript-snow-leopard-edition/feed/" rel="self" type="application/rss+xml" />
	<link>http://paperjammed.com/2010/01/04/automate-scansnap-ocr-process-on-your-mac-with-applescript-snow-leopard-edition/</link>
	<description>Has paper taken over your life?</description>
	<lastBuildDate>Thu, 01 Jul 2010 19:23:15 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Tad</title>
		<link>http://paperjammed.com/2010/01/04/automate-scansnap-ocr-process-on-your-mac-with-applescript-snow-leopard-edition/comment-page-1/#comment-10258</link>
		<dc:creator>Tad</dc:creator>
		<pubDate>Thu, 01 Jul 2010 19:23:15 +0000</pubDate>
		<guid isPermaLink="false">http://paperjammed.com/?p=840#comment-10258</guid>
		<description>Glad to hear you found a solution. As I said further up, for one situation at work we ended up licensing the ABBYY engine for Linux and running everything in batch mode.

There wasn&#039;t much coding work on our part: we simply compiled their command-line example and wrapped it in a bash script.

Some stuff works fine in a desktop scripting model, but there is a threshold beyond which a more batch-level solution is best.</description>
		<content:encoded><![CDATA[<p>Glad to hear you found a solution. As I said further up, for one situation at work we ended up licensing the ABBYY engine for Linux and running everything in batch mode.</p>
<p>There wasn&#8217;t much coding work on our part: we simply compiled their command-line example and wrapped it in a bash script.</p>
<p>Some stuff works fine in a desktop scripting model, but there is a threshold beyond which a more batch-level solution is best.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: M</title>
		<link>http://paperjammed.com/2010/01/04/automate-scansnap-ocr-process-on-your-mac-with-applescript-snow-leopard-edition/comment-page-1/#comment-10255</link>
		<dc:creator>M</dc:creator>
		<pubDate>Thu, 01 Jul 2010 16:10:25 +0000</pubDate>
		<guid isPermaLink="false">http://paperjammed.com/?p=840#comment-10255</guid>
		<description>Hi Tad,

Yes, that didn&#039;t work out for me. I&#039;ve now bought ABBYY&#039;s Linux engine (150 EUR for 12,000 pages a year) and written a wrapper around it that walks my directory tree recursively, unattended, and runs the engine on every document that has not been processed yet. I&#039;ve made it open source:

http://www.mnsoft.org/547.0.html?&amp;cHash=9120c122ed&amp;tx_ttnews[backPid]=544&amp;tx_ttnews[tt_news]=30

and

pdfocrwrapper.sourceforge.net.

Tested on thousands of pages, works perfectly.

HTH,

M</description>
		<content:encoded><![CDATA[<p>Hi Tad,</p>
<p>Yes, that didn&#8217;t work out for me. I&#8217;ve now bought ABBYY&#8217;s Linux engine (150 EUR for 12,000 pages a year) and written a wrapper around it that walks my directory tree recursively, unattended, and runs the engine on every document that has not been processed yet. I&#8217;ve made it open source:</p>
<p><a href="http://www.mnsoft.org/547.0.html?&amp;cHash=9120c122ed&amp;tx_ttnews[backPid]=544&amp;tx_ttnews[tt_news]=30" rel="nofollow">http://www.mnsoft.org/547.0.html?&amp;cHash=9120c122ed&amp;tx_ttnews[backPid]=544&amp;tx_ttnews[tt_news]=30</a></p>
<p>and</p>
<p>pdfocrwrapper.sourceforge.net.</p>
<p>Tested on thousands of pages, works perfectly.</p>
<p>HTH,</p>
<p>M</p>
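<p>The core of such a wrapper is deciding which documents still need OCR. The following is a minimal sketch, not M&#8217;s actual pdfocrwrapper code, and it assumes a hypothetical convention in which the OCR output for <code>scan.pdf</code> is saved next to it as <code>scan.ocr.pdf</code>:</p>

```python
import os

# Hypothetical convention: OCR output for "scan.pdf" is written next to it as "scan.ocr.pdf".
OUTPUT_SUFFIX = ".ocr.pdf"


def find_untreated(root):
    """Walk root recursively and return PDFs that have no OCR output yet."""
    pending = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        names = set(filenames)
        for name in sorted(filenames):
            lower = name.lower()
            # Skip non-PDFs and files that are themselves OCR output.
            if not lower.endswith(".pdf") or lower.endswith(OUTPUT_SUFFIX):
                continue
            output = name[: -len(".pdf")] + OUTPUT_SUFFIX
            if output not in names:
                pending.append(os.path.join(dirpath, name))
    return pending
```

<p>Because the &#8220;already treated&#8221; state lives in the file system itself, the scan can be rerun at any time and only picks up new documents.</p>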
]]></content:encoded>
	</item>
	<item>
		<title>By: Tad</title>
		<link>http://paperjammed.com/2010/01/04/automate-scansnap-ocr-process-on-your-mac-with-applescript-snow-leopard-edition/comment-page-1/#comment-10205</link>
		<dc:creator>Tad</dc:creator>
		<pubDate>Wed, 30 Jun 2010 02:24:09 +0000</pubDate>
		<guid isPermaLink="false">http://paperjammed.com/?p=840#comment-10205</guid>
		<description>Hi M,

This sounds like a job for plain AppleScript, doesn&#039;t it? Even if the apps don&#039;t provide any explicit AppleScript support, you can still have AppleScript drive the keyboard for you and automate the process.

The goal behind my own script was to allow FineReader to run unattended while I was sipping tea somewhere else, and it does that just swimmingly.

One point of difference: you are using FineReader Express 8, while I am working with FineReader for ScanSnap. If anything, I would assume yours provides more functionality.</description>
		<content:encoded><![CDATA[<p>Hi M,</p>
<p>This sounds like a job for plain AppleScript, doesn&#8217;t it? Even if the apps don&#8217;t provide any explicit AppleScript support, you can still have AppleScript drive the keyboard for you and automate the process.</p>
<p>The goal behind my own script was to allow FineReader to run unattended while I was sipping tea somewhere else, and it does that just swimmingly.</p>
<p>One point of difference: you are using FineReader Express 8, while I am working with FineReader for ScanSnap. If anything, I would assume yours provides more functionality.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: M</title>
		<link>http://paperjammed.com/2010/01/04/automate-scansnap-ocr-process-on-your-mac-with-applescript-snow-leopard-edition/comment-page-1/#comment-10054</link>
		<dc:creator>M</dc:creator>
		<pubDate>Fri, 25 Jun 2010 07:04:47 +0000</pubDate>
		<guid isPermaLink="false">http://paperjammed.com/?p=840#comment-10054</guid>
		<description>Hi Tad,

I am trying to make this work with ABBYY FineReader Express 8 for Mac. I&#039;m making some progress: I can hand the program the PDF I want to work on, but then I still have to tell it &quot;do it&quot;, then &quot;save&quot;, and then move on to the next file.

Do you think there&#039;s a way to remote-control the program entirely? I always want to use the same settings, but unattended.

Thanks,

M</description>
		<content:encoded><![CDATA[<p>Hi Tad,</p>
<p>I am trying to make this work with ABBYY FineReader Express 8 for Mac. I&#8217;m making some progress: I can hand the program the PDF I want to work on, but then I still have to tell it &#8220;do it&#8221;, then &#8220;save&#8221;, and then move on to the next file.</p>
<p>Do you think there&#8217;s a way to remote-control the program entirely? I always want to use the same settings, but unattended.</p>
<p>Thanks,</p>
<p>M</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tad</title>
		<link>http://paperjammed.com/2010/01/04/automate-scansnap-ocr-process-on-your-mac-with-applescript-snow-leopard-edition/comment-page-1/#comment-8388</link>
		<dc:creator>Tad</dc:creator>
		<pubDate>Fri, 16 Apr 2010 01:20:46 +0000</pubDate>
		<guid isPermaLink="false">http://paperjammed.com/?p=840#comment-8388</guid>
		<description>Hi Mark,

No, in my home life the docs are few enough that I name and file them by hand. I imagine that some of the heavy hitters in the document management field will automatically populate keywords based on content, but then they also have an annoying habit of squirreling your documents away into the deep recesses of their black box.

With the available command-line tools it ought to be pretty easy to run some regular expression matches against the content to categorize and then name a document.
My main concern would be the inherent slushiness of the OCR process. How do you write a simple app that looks for errors like &quot;Citihank&quot; and &quot;Ciflbank&quot; and correctly interprets them as &quot;Citibank&quot;?

I guess if I were writing such an app, it would display thumbnails of documents along with their proposed names, and a list of &quot;undetermined&quot; docs. Then you could go down the list and approve or reject the filename changes. Most would certainly be correct, but one or two would be misnamed.
		<content:encoded><![CDATA[<p>Hi Mark,</p>
<p>No, in my home life the docs are few enough that I name and file them by hand. I imagine that some of the heavy hitters in the document management field will automatically populate keywords based on content, but then they also have an annoying habit of squirreling your documents away into the deep recesses of their black box.</p>
<p>With the available command-line tools it ought to be pretty easy to run some regular expression matches against the content to categorize and then name a document.<br />
My main concern would be the inherent slushiness of the OCR process. How do you write a simple app that looks for errors like &#8220;Citihank&#8221; and &#8220;Ciflbank&#8221; and correctly interprets them as &#8220;Citibank&#8221;?</p>
<p>I guess if I were writing such an app, it would display thumbnails of documents along with their proposed names, and a list of &#8220;undetermined&#8221; docs. Then you could go down the list and approve or reject the filename changes. Most would certainly be correct, but one or two would be misnamed.</p>
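<p>One way to cope with misreads like &#8220;Citihank&#8221; and &#8220;Ciflbank&#8221; is fuzzy string matching against a short list of terms you care about. A minimal sketch, assuming Python&#8217;s standard <code>difflib</code> module; the term list and the 0.7 similarity cutoff are illustrative guesses, not tuned values:</p>

```python
import difflib
import re

# Hypothetical vocabulary of terms worth recognizing; extend as needed.
KNOWN_TERMS = ["citibank", "statement", "invoice", "receipt"]


def normalize_ocr_words(text, known=KNOWN_TERMS, cutoff=0.7):
    """Map each OCR'd word to the closest known term, or keep it unchanged."""
    words = re.findall(r"[A-Za-z]+", text.lower())
    fixed = []
    for word in words:
        # get_close_matches returns at most n candidates scoring >= cutoff.
        match = difflib.get_close_matches(word, known, n=1, cutoff=cutoff)
        fixed.append(match[0] if match else word)
    return fixed
```

<p>With this cutoff both misreads map to <code>citibank</code>, while words far from every known term pass through unchanged. A looser cutoff catches more OCR errors but also risks merging genuinely different words, which is one reason to keep a human approve/reject step.</p>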
]]></content:encoded>
	</item>
	<item>
		<title>By: mark</title>
		<link>http://paperjammed.com/2010/01/04/automate-scansnap-ocr-process-on-your-mac-with-applescript-snow-leopard-edition/comment-page-1/#comment-8364</link>
		<dc:creator>mark</dc:creator>
		<pubDate>Wed, 14 Apr 2010 21:43:03 +0000</pubDate>
		<guid isPermaLink="false">http://paperjammed.com/?p=840#comment-8364</guid>
		<description>Hi. Interesting post. I was wondering whether you then go on to name the files or just save them and subsequently find them using the OCR search. I&#039;m trying to come up with a method whereby after the file has been through the ocr process it is automatically named based on whether certain terms appear in the file itself. For example if a scan of a citibank bank statement contains the words citibank + statement, then the file would be saved into a particular folder as citibank_statement_todays date hence removing a lot lof the tedious manual naming.
I&#039;d appreciate any thoughts you have on the subject.
Thanks
Mark</description>
		<content:encoded><![CDATA[<p>Hi. Interesting post. I was wondering whether you then go on to name the files or just save them and subsequently find them using the OCR search. I&#8217;m trying to come up with a method whereby after the file has been through the ocr process it is automatically named based on whether certain terms appear in the file itself. For example if a scan of a citibank bank statement contains the words citibank + statement, then the file would be saved into a particular folder as citibank_statement_todays date hence removing a lot lof the tedious manual naming.<br />
I&#8217;d appreciate any thoughts you have on the subject.<br />
Thanks<br />
Mark</p>
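<p>The naming rule described here can be expressed as a small table of keyword sets. A minimal sketch, assuming the OCR text has already been extracted from the PDF; the rules, folder names and ISO date format are hypothetical examples:</p>

```python
import datetime
import re

# Hypothetical rules: (words that must all appear, target folder, filename prefix).
RULES = [
    ({"citibank", "statement"}, "Banking", "citibank_statement"),
    ({"invoice"}, "Invoices", "invoice"),
]


def propose_name(text, scan_date, rules=RULES):
    """Return (folder, filename) for the first matching rule, or None if undetermined."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    for required, folder, prefix in rules:
        if required.issubset(words):
            return folder, f"{prefix}_{scan_date.isoformat()}.pdf"
    return None
```

<p>Returning <code>None</code> rather than guessing leaves unmatched scans for manual filing.</p>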
]]></content:encoded>
	</item>
	<item>
		<title>By: Tad</title>
		<link>http://paperjammed.com/2010/01/04/automate-scansnap-ocr-process-on-your-mac-with-applescript-snow-leopard-edition/comment-page-1/#comment-8168</link>
		<dc:creator>Tad</dc:creator>
		<pubDate>Fri, 02 Apr 2010 17:29:01 +0000</pubDate>
		<guid isPermaLink="false">http://paperjammed.com/?p=840#comment-8168</guid>
		<description>Actually, for low throughput purposes, the AppleScript that I put together to send docs to the ABBYY app works just fine. You don&#039;t have fine grained control over the OCR settings on a per doc basis, but it is definitely possible to have AppleScript feed documents to ABBYY.

If you are looking for a more substantial solution for a small organization or business, you might be interested in talking with ABBYY about licensing their engine. This is what we did at work for a particular job: we bought a license to use their engine on a single Linux server for a specific number of pages per year.

What they provided was their SDK along with platform-specific binaries. We were able to compile a command line client to the SDK that worked wonders. We then used a bash script on a cron job to wrap the command line client, periodically checking a source directory for files and feeding them to the CLI.</description>
		<content:encoded><![CDATA[<p>Actually, for low throughput purposes, the AppleScript that I put together to send docs to the ABBYY app works just fine. You don&#8217;t have fine grained control over the OCR settings on a per doc basis, but it is definitely possible to have AppleScript feed documents to ABBYY.</p>
<p>If you are looking into a more substantial solution, for a small organization or business, you might be interested in talking with ABBYY about licensing their engine. This is what we did at work for a particular job: we bought a license to use their engine on a single Linux server for a specific number of pages per year.</p>
<p>What they provided was their SDK along with platform-specific binaries. We were able to compile a command line client to the SDK that worked wonders. We then used a bash script on a cron job to wrap the command line client, periodically checking a source directory for files and feeding them to the CLI.</p>
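<p>The polling wrapper described here was a bash script; the same idea is sketched below in Python for illustration. It assumes an inbox/done/failed directory layout and treats the compiled CLI as an opaque command whose name and arguments are hypothetical:</p>

```python
import os
import shutil
import subprocess
import sys


def process_inbox(inbox, done, failed, command):
    """Run command + [path] on each file in inbox, then move it to done or failed.

    `command` stands in for the CLI compiled from the SDK; its arguments
    are placeholders, not the SDK's real interface.
    """
    for directory in (done, failed):
        os.makedirs(directory, exist_ok=True)
    results = {}
    for name in sorted(os.listdir(inbox)):
        path = os.path.join(inbox, name)
        if not os.path.isfile(path):
            continue
        # A zero exit status is taken to mean the OCR run succeeded.
        ok = subprocess.run(command + [path]).returncode == 0
        shutil.move(path, os.path.join(done if ok else failed, name))
        results[name] = ok
    return results


if __name__ == "__main__" and len(sys.argv) >= 5:
    # Usage: process_inbox.py INBOX DONE FAILED CLI [CLI_ARGS...]
    process_inbox(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4:])
```

<p>Scheduling it from cron, for example every five minutes with a crontab line such as <code>*/5 * * * * python3 /path/to/process_inbox.py ...</code>, gives the periodic check; moving each file out of the inbox keeps it from being processed twice.</p>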
]]></content:encoded>
	</item>
	<item>
		<title>By: Tjebbe van Tijen</title>
		<link>http://paperjammed.com/2010/01/04/automate-scansnap-ocr-process-on-your-mac-with-applescript-snow-leopard-edition/comment-page-1/#comment-8167</link>
		<dc:creator>Tjebbe van Tijen</dc:creator>
		<pubDate>Fri, 02 Apr 2010 16:18:44 +0000</pubDate>
		<guid isPermaLink="false">http://paperjammed.com/?p=840#comment-8167</guid>
		<description>I am considering buying the ScanSnap for the Mac. It bundles ABBYY FineReader, and I am currently testing that software for another workflow, in which I want to use OCR on photographs of book pages and large screenshots of online digital facsimiles (like Google Books). As I have a 30-inch monitor, my screenshots of book pages can be rather good. The whole process is managed by FileMaker Pro 10 with several plug-ins. OCR from screenshots with ABBYY works surprisingly well, so now I want to script the steps of the process. ABBYY has no AppleScript support, but it seems to have a scripting engine (ABBYY FineReader Engine); I was wondering whether you have any experience with this... I also saw some mention of a command line interface (CLI). Could such commands be called from AppleScript and run in UNIX? Let me know if these are realistic tracks to follow... Thanks in advance</description>
		<content:encoded><![CDATA[<p>I am considering buying the ScanSnap for the Mac. It bundles ABBYY FineReader, and I am currently testing that software for another workflow, in which I want to use OCR on photographs of book pages and large screenshots of online digital facsimiles (like Google Books). As I have a 30-inch monitor, my screenshots of book pages can be rather good. The whole process is managed by FileMaker Pro 10 with several plug-ins. OCR from screenshots with ABBYY works surprisingly well, so now I want to script the steps of the process. ABBYY has no AppleScript support, but it seems to have a scripting engine (ABBYY FineReader Engine); I was wondering whether you have any experience with this&#8230; I also saw some mention of a command line interface (CLI). Could such commands be called from AppleScript and run in UNIX? Let me know if these are realistic tracks to follow&#8230; Thanks in advance</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tad</title>
		<link>http://paperjammed.com/2010/01/04/automate-scansnap-ocr-process-on-your-mac-with-applescript-snow-leopard-edition/comment-page-1/#comment-8029</link>
		<dc:creator>Tad</dc:creator>
		<pubDate>Thu, 25 Mar 2010 23:28:16 +0000</pubDate>
		<guid isPermaLink="false">http://paperjammed.com/?p=840#comment-8029</guid>
		<description>I guess I wasn&#039;t searching for the right thing when I tried to stop the icon from bouncing.
Anyway, many sites out there had the following instructions:

To stop all dock bouncing forever, type the following in a terminal window:

&lt;b&gt;defaults write com.apple.dock no-bouncing -bool TRUE&lt;/b&gt;

and then

&lt;b&gt;killall Dock&lt;/b&gt;

To reverse the process, do the same thing, but use &lt;b&gt;FALSE&lt;/b&gt; instead of &lt;b&gt;TRUE&lt;/b&gt;.</description>
		<content:encoded><![CDATA[<p>I guess I wasn&#8217;t searching for the right thing when I tried to stop the icon from bouncing.<br />
Anyway, many sites out there had the following instructions:</p>
<p>To stop all dock bouncing forever, type the following in a terminal window:</p>
<p><b>defaults write com.apple.dock no-bouncing -bool TRUE</b></p>
<p>and then</p>
<p><b>killall Dock</b></p>
<p>To reverse the process, do the same thing, but use <b>FALSE</b> instead of <b>TRUE</b>.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tad</title>
		<link>http://paperjammed.com/2010/01/04/automate-scansnap-ocr-process-on-your-mac-with-applescript-snow-leopard-edition/comment-page-1/#comment-6303</link>
		<dc:creator>Tad</dc:creator>
		<pubDate>Fri, 08 Jan 2010 22:10:57 +0000</pubDate>
		<guid isPermaLink="false">http://paperjammed.com/?p=840#comment-6303</guid>
		<description>Glad to hear it works for you!
And yes, I would expect that a couple of text strings might need to be tweaked to go from one ScanSnap to the other.

As for the quality, this is likely a consequence of settings in FineReader.
I launched &lt;strong&gt;FineReader for ScanSnap Preferences&lt;/strong&gt; and clicked on the &lt;strong&gt;Scan to Searchable PDF&lt;/strong&gt; tab.

On this tab I have selected the following:

Save mode: &lt;strong&gt;Text under page image&lt;/strong&gt;
Quality: &lt;strong&gt;High (for printing)&lt;/strong&gt;
Format: &lt;strong&gt;Automatic&lt;/strong&gt;

I imagine your Quality setting might be on one of the lower settings right now.</description>
		<content:encoded><![CDATA[<p>Glad to hear it works for you!<br />
And yes, I would expect that a couple of text strings might need to be tweaked to go from one ScanSnap to the other.</p>
<p>As for the quality, this is likely a consequence of settings in FineReader.<br />
I launched <strong>FineReader for ScanSnap Preferences</strong> and clicked on the <strong>Scan to Searchable PDF</strong> tab.</p>
<p>On this tab I have selected the following:</p>
<p>Save mode: <strong>Text under page image</strong><br />
Quality: <strong>High (for printing)</strong><br />
Format: <strong>Automatic</strong></p>
<p>I imagine your Quality setting might be on one of the lower settings right now.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
