Automate ScanSnap OCR process on your Mac with AppleScript (Snow Leopard Edition)
Monday, 4 January 2010
Some time back I published an AppleScript that automatically runs OCR in the background on scanned files generated by your Fujitsu ScanSnap while you continue scanning more files. ScanSnap owners should all be familiar with this: the out-of-the-box configuration of the ScanSnap Manager and Abbyy FineReader forces the scan and OCR stages to run in lockstep: scan 1…OCR 1…scan 2…OCR 2… and so on. The script lets you keep scanning regardless of the OCR processing going on.
As it turns out, my original script does not work in Snow Leopard, and I promised that I would one day clean up and publish my new and improved version.
Chris posted a comment today as a gentle reminder, so here is the new and improved version without further delay…
The Details
Unfortunately, Snow Leopard came around and caused some indigestion. For starters, the ScanSnap Manager didn’t work correctly and Abbyy Finereader would not process anything made by the ScanSnap. A couple of months later they got everything straightened out and delivered new versions of each product.
The new version of the Abbyy Finereader product does not play well with my original script.
Since I cannot do without this important functionality, I rolled up my sleeves and rewrote most of the script. The new version works in Snow Leopard quite nicely, with one small annoyance: the new FineReader version keeps bouncing its darned Dock icon the whole time it is running. That is quite annoying to watch, so you really don’t want to use the machine for anything other than scanning or OCR while a job is going.
Fortunately, I really don’t need to use my machine for anything else while it is chewing on the docs; I just wanted to be able to continue scanning at the same time!
Note: Before going any further, you will need to upgrade the ScanSnap Manager and Abbyy FineReader to the Snow Leopard versions! Get the files here.
Here is a link to the new script…
And here’s the code itself:
(*
NOTE: This script was written for Snow Leopard. It may work on Leopard, but I never tried it.

This is a folder listener script that will act as a queue, receiving PDF files from the ScanSnap scanner and feeding them, one by one, to the Abbyy FineReader OCR software. This allows you to keep scanning while the OCR job runs in the background on all of the unprocessed files.

Why do we want to do this? The ScanSnap Manager software does not support this by default, so when you scan in a file, it sends it to FineReader for OCR. You then must wait until FineReader finishes its work before scanning in another document. This script allows you to keep scanning without waiting for OCR.

Installation:
 o Copy this script to:
     <home>/Library/Scripts/Folder Action Scripts
   You may have to create the "Folder Action Scripts" folder.
 o Open a Finder window and navigate to the parent folder of the scanned documents folder.
 o Right click (control-click) the scanned documents folder and choose:
     Folder Actions Setup...
 o At this point if folder actions are not enabled, you will likely have to enable them and add the script manually.
     - check "Enable Folder Actions"
     - Use the "+" buttons on the left and right sides to add the scan folder and then this script.
 o Otherwise, a list of scripts will come up. Choose this script from the "Choose a Script to Attach" dialog.
 o Close all windows.

Copyright (C) 2010 Tad Harrison
*)

property ocrFileSuffix : " processed by FineReader.pdf"
property ocrApplicationName : "Scan to Searchable PDF"
property ocrApplicationWindow : "Converting the document"
property ocrLockFileName : "OCR in Progress"

on adding folder items to this_folder after receiving added_items
    set lockFilePath to (POSIX path of (path to desktop folder as text)) & ocrLockFileName
    try
        logEvent("=== Run OCR on New Folder Items ===")

        -- Test for lockfile; exit if lockfile exists
        tell application "System Events" to set lockFileExists to exists file lockFilePath
        if lockFileExists then
            logEvent("Other script running. Exiting...")
            return
        else
            do shell script "/usr/bin/touch \"" & lockFilePath & "\""
        end if

        -- Main loop
        set moreWorkToDo to true
        repeat while moreWorkToDo
            set aFile to getNextFile(this_folder)
            if not aFile = "" then
                ocrFile(aFile)
            else
                set moreWorkToDo to false
            end if
        end repeat
        logEvent("No more work.")
        exitApp(ocrApplicationName)
    on error errorStr number errNum
        -- Only report the error here; anything that fails inside this
        -- block would skip the lockfile cleanup below
        display dialog "Error " & errNum & " while running OCR: " & errorStr
    end try

    -- Get rid of the lockfile, ignoring any errors
    try
        do shell script "/bin/rm \"" & lockFilePath & "\""
    end try
end adding folder items to

(*
Name: ocrFile
Description: Runs OCR on the next un-OCR'd file
Parameters:
    aFile - the file to be OCR'd
*)
on ocrFile(aFile)
    set posixFilePath to POSIX path of aFile
    set posixOcrFilePath to getPosixOcrFilePath(posixFilePath)
    logEvent("OCR: " & posixFilePath)
    tell application ocrApplicationName to open aFile
    --
    -- Now sit in a loop checking once per second for the OCR file
    -- Give up after five minutes
    --
    with timeout of 300 seconds
        set ocrFileExists to false
        repeat until ocrFileExists
            set ocrFileExists to posixFileExists(posixOcrFilePath)
            if ocrFileExists then
                logEvent("OCR file generated.")
                -- Wait 5 even if the file was found, to let things settle
                delay 5
            else
                -- Wait a second before checking again
                delay 1
            end if
        end repeat
    end timeout
end ocrFile

(*
Name: appIsRunning
Description: Determines if a particular application is running.
Parameters:
    appName - the name of the application to be tested
Returns: True if the application is running; otherwise False
*)
on appIsRunning(appName)
    tell application "System Events" to (name of processes) contains appName
end appIsRunning

(*
Name: posixFileExists
Description: Determines if a particular file exists.
Parameters:
    posixFilePath - the POSIX path to the file
Returns: True if the file exists; otherwise False
*)
on posixFileExists(posixFilePath)
    tell application "System Events" to exists file posixFilePath
end posixFileExists

(*
Name: exitApp
Description: Exits the specified app if it is running.
Parameters:
    appName - the application name
*)
on exitApp(appName)
    if appIsRunning(appName) then
        tell application appName to quit
    end if
end exitApp

(*
Name: getPosixOcrFilePath
Description: Gets the OCR output filename for a given input filename.
Parameters:
    posixFilePath - the full path to the source file
Return: the POSIX path of the OCR output file
*)
on getPosixOcrFilePath(posixFilePath)
    set posixBaseName to do shell script ¬
        "filename=" & quoted form of posixFilePath & "; echo ${filename%\\.*}"
    set posixOcrFilePath to posixBaseName & ocrFileSuffix
    return posixOcrFilePath
end getPosixOcrFilePath

(*
Name: getNextFile
Description: Finds the next unprocessed ScanSnap PDF
Return: the file or ""
*)
on getNextFile(aFolder)
    logEvent("Getting next file...")
    set masterFileList to list folder aFolder ¬
        without invisibles
    set posixPath to POSIX path of aFolder
    repeat with i from 1 to count masterFileList
        set fileName to item i of masterFileList
        set posixFilePath to posixPath & fileName
        log posixFilePath
        --
        -- Construct a FineReader file name from our file
        --
        set posixOcrFilePath to getPosixOcrFilePath(posixFilePath)
        --
        -- See if the FineReader file we constructed exists
        --
        set ocrFileExists to posixFileExists(posixOcrFilePath)
        tell me to set fileCreator to getSpotlightInfo for "kMDItemCreator" from posixFilePath
        log ("Creator: " & fileCreator)
        if not ocrFileExists and fileCreator = "ScanSnap Manager" then
            return POSIX file posixFilePath
        end if
    end repeat
    return ""
end getNextFile

(*
Name: getSpotlightInfo
Description: Gets a named attribute from metadata for a specific file.
Parameters:
    for myattribute - the name of the attribute
    from myfile - the name of the file
Returns: the attribute value or "" if none found
*)
on getSpotlightInfo for myattribute from myfile
    try
        set this_kMDItemResult to ""
        tell application "Finder"
            set this_item to myfile as string
            set this_item to POSIX path of this_item
            set this_kMDItem to myattribute
            set theResult to words of (do shell script "/usr/bin/mdls -name " & this_kMDItem & " -raw -nullMarker None " & quoted form of this_item)
            log "Result: " & theResult as string
            repeat with j from 1 to number of items in theResult
                set this_kMDItemResult to this_kMDItemResult & item j of theResult as string
                if j < number of items in theResult then
                    set this_kMDItemResult to this_kMDItemResult & " "
                end if
            end repeat
        end tell
    on error
        set this_kMDItemResult to ""
    end try
    return this_kMDItemResult
end getSpotlightInfo

(*
Name: logEvent
Description: Write an event to an event log
Parameters:
    themessage - the message to write to the log
*)
on logEvent(themessage)
    set theLine to (do shell script ¬
        "date +'%Y-%m-%d %H:%M:%S'" as string) ¬
        & " " & themessage
    -- Quote the line so paths with spaces, quotes or parentheses
    -- cannot break the shell command
    do shell script "echo " & quoted form of theLine & ¬
        " >> ~/Library/Logs/AppleScript-events.log"
end logEvent
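One practical consequence of the lock file: if the script is interrupted before it reaches its cleanup step (a crash or a forced logout, say), the "OCR in Progress" file stays on the Desktop and every later run exits with "Other script running." A minimal sketch for clearing a stale lock from the Script Editor, assuming the lock file name and Desktop location used in the script above:

```applescript
-- Assumes the same lock file name and location as the script above.
-- Only run this when no OCR job is actually in progress.
set lockFilePath to (POSIX path of (path to desktop folder)) & "OCR in Progress"

-- rm -f succeeds quietly even when the file is not there
do shell script "/bin/rm -f " & quoted form of lockFilePath
```

Dragging the file to the Trash in the Finder does the same job.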
Installation
- Use the Script Editor to save this script as Run OCR on New Folder Items under User Home/Library/Scripts/Folder Action Scripts. You may have to create the Folder Action Scripts folder.
- Now open a Finder window and navigate to the parent folder of your scanned documents folder.
- Right click (control-click) the scanned documents folder and choose Folder Actions Setup…
- At this point, if folder actions are not enabled, you will likely have to enable them and add the script manually:
  - Check Enable Folder Actions
  - Use the “+” buttons on the left and right sides to add the scan folder and then this script.
- Otherwise, a list of scripts will come up. Choose this script from the Choose a Script to Attach dialog.
- Close all windows.
That’s it! The script will be invoked automatically every time a new file appears in your scanned documents folder.
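If nothing seems to happen after a scan, it can help to check that folder actions fire at all before suspecting the OCR script. Here is a minimal sketch of a test script that only logs the name of each new file. The handler parameter names are my own, and it assumes the same log file the main script writes to. Attach it to the scan folder in place of the real script, scan a page, then look at the log in Console.

```applescript
-- Minimal test folder action: logs each newly added item and does nothing else.
on adding folder items to thisFolder after receiving addedItems
	repeat with anItem in addedItems
		-- addedItems is a list of aliases; dereference each one to get its path
		set itemPath to POSIX path of (contents of anItem)
		do shell script "echo " & quoted form of ("Added: " & itemPath) & ¬
			" >> ~/Library/Logs/AppleScript-events.log"
	end repeat
end adding folder items to
```

Once entries show up in the log, swap the real script back in.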
Please let me know if you have any ideas that can improve this script. I’m not an AppleScript guru, so someone might just know how to keep that annoying Finereader icon from jumping.



No. 1 — January 4th, 2010 at 8:55 pm
[...] Update: The script on this page works only with Leopard (10.5). Get the Snow Leopard version here [...]
No. 2 — January 7th, 2010 at 2:32 am
Excellent script!
I had to change the string “ScanSnap Manager” to “ScanSnap Manager S1500M” to get it working, but now it is working like a charm.
The only thing I noticed is the quality of the PDFs that FineReader outputs. They don’t look as good as the original scan. Do you have any ideas how to improve that?
No. 3 — January 8th, 2010 at 5:10 pm
Glad to hear it works for you!
And yes, I would expect that a couple of text strings might need to be tweaked to go from one ScanSnap model to another.
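To see which creator string your own scanner writes, you can ask Spotlight directly. The sketch below does that, and also shows one way to make the check tolerant of model suffixes such as "S1500M". The property and handler names here are hypothetical and are not part of the published script.

```applescript
-- Hypothetical property: the part of the creator string all models share
property scanCreatorPrefix : "ScanSnap Manager"

-- Hypothetical helper: true when the creator begins with the shared prefix,
-- so "ScanSnap Manager S1500M" matches as well as plain "ScanSnap Manager"
on isScanSnapCreator(fileCreator)
	return fileCreator starts with scanCreatorPrefix
end isScanSnapCreator

-- Pick one of your scanned PDFs and show the creator Spotlight has recorded
set scanPath to POSIX path of (choose file with prompt "Pick a scanned PDF:")
set fileCreator to do shell script ¬
	"/usr/bin/mdls -name kMDItemCreator -raw " & quoted form of scanPath
display dialog "Creator: " & fileCreator & return & ¬
	"Matches prefix: " & isScanSnapCreator(fileCreator)
```

In getNextFile, the test fileCreator = "ScanSnap Manager" could then become isScanSnapCreator(fileCreator), assuming the helper and property are pasted into the same script.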
As for the quality, this is likely a consequence of settings in FineReader.
I launched FineReader for ScanSnap Preferences and clicked on the Scan to Searchable PDF tab.
On this tab I have selected the following:
Save mode: Text under page image
Quality: High (for printing)
Format: Automatic
I imagine your Quality setting might be on one of the lower settings right now.