Automate ScanSnap OCR process on your Mac with AppleScript (Snow Leopard Edition)

Some time back I published an AppleScript that automatically runs OCR in the background on scanned files generated by your Fujitsu ScanSnap while you continue scanning more files. ScanSnap owners should all be familiar with the problem: the out-of-the-box configuration of ScanSnap Manager and Abbyy FineReader forces the scan and OCR stages to run in lockstep: scan 1…OCR 1…scan 2…OCR 2… and so on. The script let you keep scanning regardless of any OCR processing in progress.

As it turns out, my original script does not work in Snow Leopard, and I promised that I would one day clean up and publish my new and improved version.

Chris posted a comment today as a gentle reminder, so here is the new and improved version without further delay…

The Details

Unfortunately, Snow Leopard came around and caused some indigestion. For starters, ScanSnap Manager didn’t work correctly and Abbyy FineReader would not process anything made by the ScanSnap. A couple of months later the vendors got everything straightened out and delivered new versions of each product.

The new version of the Abbyy Finereader product does not play well with my original script.

Since I cannot do without this functionality, I rolled up my sleeves and rewrote most of the script. The new version works quite nicely in Snow Leopard, with one small annoyance: the new FineReader version keeps bouncing its darned icon the whole time it is running, which is quite annoying to watch. You really don’t want to use the machine for anything other than scanning or OCR while a job is in progress.

Fortunately, I really don’t need to use my machine for anything else while it is chewing on the docs; I just wanted to be able to continue scanning at the same time!

Note: Before going any further, you will need to upgrade ScanSnap Manager and Abbyy FineReader to their Snow Leopard versions. Get the files here.

Here is a link to the new script

And here’s the code itself:

(*

NOTE: This script was written for Snow Leopard. It may work
on Leopard, but I never tried it.

This is a folder listener script that will act as a queue, receiving
PDF files from the ScanSnap scanner and feeding them, one by one, to
the Abbyy FineReader OCR software.

This allows you to keep scanning while the OCR job runs in the background
on all of the unprocessed files.

Why do we want to do this?

The ScanSnap Manager software does not support this by default, so
when you scan in a file, it sends it to FineReader for OCR. You then
must wait until FineReader finishes its work before scanning in another
document.

This script allows you to keep scanning without waiting for OCR.

Installation:

o   Copy this script to:

    <home>/Library/Scripts/Folder Action Scripts

    You may have to create the "Folder Action Scripts" folder.

o   Open a Finder window and navigate to the parent folder
  of the scanned documents folder.

o Right click (control-click) the scanned documents folder and
  choose:

    Folder Actions Setup...

o At this point if folder actions are not enabled, you will
  likely have to enable them and add the script manually.
    - check "Enable Folder Actions"
    - Use the "+" buttons on the left and right sides to add the
      scan folder and then this script.
   
o Otherwise, a list of scripts will come up. Choose this script
  from the "Choose a Script to Attach" dialog.

o Close all windows.

Copyright (C) 2010 Tad Harrison
*)

property ocrFileSuffix : " processed by FineReader.pdf"
property ocrApplicationName : "Scan to Searchable PDF"
property ocrApplicationWindow : "Converting the document"
property ocrLockFileName : "OCR in Progress"
on adding folder items to this_folder after receiving added_items
    set lockFilePath to (POSIX path of (path to desktop folder as text)) & ocrLockFileName
    try
        logEvent("=== Run OCR on New Folder Items ===")
        -- Test for lockfile; exit if lockfile exists
        tell application "System Events" to set lockFileExists to exists file lockFilePath
        if lockFileExists then
            logEvent("Other script running. Exiting...")
            return
        else
            do shell script "/usr/bin/touch \"" & lockFilePath & "\""
        end if
        -- Main loop
        set moreWorkToDo to true
        repeat while moreWorkToDo
            set aFile to getNextFile(this_folder)
            if not aFile = "" then
                ocrFile(aFile)
            else
                set moreWorkToDo to false
            end if
        end repeat
        logEvent("No more work.")
        exitApp(ocrApplicationName)
    on error errorStr number errNum
        -- Report the failure; the lockfile is removed below either way
        display dialog "Error " & errNum & " while running OCR: " & errorStr
    end try
    -- Get rid of the lockfile, ignoring any errors
    try
        do shell script "/bin/rm \"" & lockFilePath & "\""
    end try
end adding folder items to
(*
Name: ocrFile
Description: Runs OCR on the next un-OCR'd file
Parameters:
  aFile - the file to be OCR'd
*)

on ocrFile(aFile)
    set posixFilePath to POSIX path of aFile
    set posixOcrFilePath to getPosixOcrFilePath(posixFilePath)
    logEvent("OCR: " & posixFilePath)
    tell application ocrApplicationName to open aFile
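    --
    -- Now sit in a loop checking once per second for the OCR file.
    -- "with timeout" only limits Apple events sent to applications, not
    -- a local repeat loop, so count the seconds explicitly and give up
    -- after five minutes.
    --
    set secondsWaited to 0
    set ocrFileExists to false
    repeat until ocrFileExists
        set ocrFileExists to posixFileExists(posixOcrFilePath)
        if ocrFileExists then
            logEvent("OCR file generated.")
            -- Wait 5 even if the file was found, to let things settle
            delay 5
        else if secondsWaited >= 300 then
            -- Caught by the error handler in the folder action above
            error "Timed out waiting for OCR output: " & posixOcrFilePath
        else
            -- Wait a second before checking again
            delay 1
            set secondsWaited to secondsWaited + 1
        end if
    end repeat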
end ocrFile
(*
Name: appIsRunning
Description: Determines if a particular application is running.
Parameters:
    appName - the name of the application to be tested
Returns: True if the application is running; otherwise False
*)

on appIsRunning(appName)
    tell application "System Events" to (name of processes) contains appName
end appIsRunning
(*
Name: posixFileExists
Description: Determines if a particular file exists.
Parameters:
    posixFilePath - the POSIX path to the file
Returns: True if the file exists; otherwise False
*)

on posixFileExists(posixFilePath)
    tell application "System Events" to exists file posixFilePath
end posixFileExists
(*
Name: exitApp
Description: Exits the specified app if it is running.
Parameters:
    appName - the application name
*)

on exitApp(appName)
    if appIsRunning(appName) then
        tell application appName to quit
    end if
end exitApp
(*
Name: getPosixOcrFilePath
Description: Gets the OCR output filename for a given input filename.
Parameters:
    posixFilePath - the full path to the source file
Return: the POSIX path of the OCR output file
*)

on getPosixOcrFilePath(posixFilePath)
    set posixBaseName to do shell script ¬
        "filename=" & quoted form of posixFilePath & "; echo ${filename%\\.*}"
    set posixOcrFilePath to posixBaseName & ocrFileSuffix
    return posixOcrFilePath
end getPosixOcrFilePath
(*
Name: getNextFile
Description: Finds the next unprocessed ScanSnap PDF
Return: the file or ""
*)

on getNextFile(aFolder)
    logEvent("Getting next file...")
    set masterFileList to list folder aFolder ¬
        without invisibles
    set posixPath to POSIX path of aFolder
    repeat with i from 1 to count masterFileList
        set fileName to item i of masterFileList
        set posixFilePath to posixPath & fileName
        log posixFilePath
        --
        -- Construct a FineReader file name from our file
        --
        set posixOcrFilePath to getPosixOcrFilePath(posixFilePath)
        --
        -- See if the FineReader file we constructed exists
        --
        set ocrFileExists to posixFileExists(posixOcrFilePath)
        tell me to set fileCreator to getSpotlightInfo for "kMDItemCreator" from posixFilePath
        log ("Creator: " & fileCreator)
        if not ocrFileExists and fileCreator = "ScanSnap Manager" then
            return POSIX file posixFilePath
        end if
    end repeat
    return ""
end getNextFile
(*
Name: getSpotlightInfo
Description: Gets a named attribute from metadata for a specific file.
Parameters:
    for myattribute - the name of the attribute
    from myfile - the name of the file
Returns: the attribute value or "" if none found
*)

on getSpotlightInfo for myattribute from myfile
    try
        set this_kMDItemResult to ""
       
        tell application "Finder"
            set this_item to myfile as string
            set this_item to POSIX path of this_item
            set this_kMDItem to myattribute
            set theResult to words of (do shell script "/usr/bin/mdls -name " & this_kMDItem & " -raw -nullMarker None " & quoted form of this_item)
            log "Result: " & theResult as string
            repeat with j from 1 to number of items in theResult
                set this_kMDItemResult to this_kMDItemResult & item j of theResult as string
                if j < number of items in theResult then
                    set this_kMDItemResult to this_kMDItemResult & " "
                end if
            end repeat
        end tell
    on error
        set this_kMDItemResult to ""
    end try
    return this_kMDItemResult
end getSpotlightInfo
(*
Name: logEvent
Description: Write an event to an event log
Parameters:
    themessage - the message to write to the log
*)

on logEvent(themessage)
    set theLine to (do shell script ¬
        "date  +'%Y-%m-%d %H:%M:%S'" as string) ¬
        & " " & themessage
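    -- Quote the line so spaces, apostrophes, or parentheses in file
    -- names cannot break the shell command
    do shell script "echo " & quoted form of theLine & ¬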
        " >> ~/Library/Logs/AppleScript-events.log"
end logEvent

Installation

  • Use the Script Editor to save this script as Run OCR on New Folder Items under User Home/Library/Scripts/Folder Action Scripts
    You may have to create the Folder Action Scripts folder.
  • Now open a Finder window and navigate to the parent folder of your scanned documents folder.
  • Right click (control-click) the scanned documents folder and choose Folder Actions Setup…
  • At this point if folder actions are not enabled, you will likely have to enable them and add the script manually.
    • Check Enable Folder Actions
    • Use the “+” buttons on the left and right sides to add the scan folder and then this script.
  • Otherwise, a list of scripts will come up. Choose this script from the Choose a Script to Attach dialog.
  • Close all windows.

That’s it! The script will be invoked automatically every time a new file appears in your scanned documents folder.

Please let me know if you have any ideas that can improve this script. I’m not an AppleScript guru, so someone might just know how to keep that annoying FineReader icon from jumping.
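One thing to check if the script seems to ignore your scans: getNextFile only picks up files whose Spotlight creator is exactly "ScanSnap Manager", and a commenter below reports that one scanner model writes "ScanSnap Manager S1500M" instead. The snippet below is a rough sketch, not part of the script above, that you can run in Script Editor to see what your own scanner writes. The variable names scannedFile and creatorName are made up for illustration; the mdls call uses the same attribute the script reads.

```applescript
-- Hypothetical helper, not part of the original script: show the
-- Spotlight creator of a PDF you pick, so you can compare it with the
-- string tested in getNextFile.
set scannedFile to choose file with prompt "Pick a PDF made by your ScanSnap:"
set creatorName to do shell script ¬
    "/usr/bin/mdls -name kMDItemCreator -raw " & ¬
    quoted form of (POSIX path of scannedFile)
display dialog "kMDItemCreator: " & creatorName
```

If the dialog shows something other than "ScanSnap Manager", edit the comparison in getNextFile to match. Changing the test to fileCreator starts with "ScanSnap Manager" might cover several models at once, though I have only tried the exact match myself.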

3 Responses to “Automate ScanSnap OCR process on your Mac with AppleScript (Snow Leopard Edition)”

  1. Automate ScanSnap OCR process on your Mac with AppleScript | Paper Jammed writes:

    [...] Update: The script on this page works only with Leopard (10.5). Get the Snow Leopard version here [...]

  2. Rene Nederhand writes:

    Excellent script!

    I had to change the string “ScanSnap Manager” to “ScanSnap Manager S1500M” to get it working, but now it is working like a charm.

    The only thing I noticed is the quality of the output of the PDF’s from Finereader. It doesn’t look as good as the original scan. Do you have any ideas how to improve that?

  3. Tad writes:

    Glad to hear it works for you!
    And yes, I would expect that a couple text strings might need to be tweaked to go from one ScanSnap to the other.

    As far as the quality, this is likely a consequence of settings in Finereader.
    I launched FineReader for ScanSnap Preferences and clicked on the Scan to Searchable PDF tab.

    On this tab I have selected the following:

    Save mode: Text under page image
    Quality: High (for printing)
    Format: Automatic

    I imagine your Quality setting might be on one of the lower settings right now.
