OT: Help getting PDF to OCR or searchable form

Cesar Baquerizo ces at cescom.com
Mon Sep 16 09:42:41 PDT 2019


Good to hear. Glad I could give back.

On 9/16/2019 11:27 AM, Laura Brody wrote:
> Just an update.... I got it all working on a Raspberry Pi 3 B+ with a 
> 32 GB micro SD chip. Debian Linux. I wrote a script to upload the 
> processed file to Dropbox automatically. It is happily working on 
> files and will probably be done in a week or less (it has about 55 
> files to process, some with 2-6 pages, but most with 50-130 pages).
>
> All of the software was free. I already had a few Raspberry Pi boards, 
> so my only investment was my time.
>
> Thank you so much for pointing me in the right direction. Left to my 
> own devices, I would still be researching how to tackle this project.
>
> Laura Brody
>
> On Mon, Sep 9, 2019 at 10:44 PM Laura Brody <laura.k.brody at gmail.com 
> <mailto:laura.k.brody at gmail.com>> wrote:
>
>     Yes, I see that. Now that I know that pdfsandwich and tesseract
>     will run on the Raspberry Pi and do what I need, I have a clear
>     idea what I need to do to get searchable PDFs out of the files
>     that I have. Thank you for pointing me in the right direction. You
>     saved me a boatload of time and aggravation.
>
>     Laura Brody
>
>     On Mon, Sep 9, 2019 at 10:38 PM Cesar Baquerizo <ces at cescom.com
>     <mailto:ces at cescom.com>> wrote:
>
>         You’re welcome. You’ll also need Tesseract. They are two
>         different programs. Let me know how it goes.
>
>         ------------------------------------------------------------------------
>         *From:* Laura Brody <laura.k.brody at gmail.com
>         <mailto:laura.k.brody at gmail.com>>
>         *Sent:* Monday, September 9, 2019 10:35 PM
>         *To:* Cesar Baquerizo; Filepro_List
>         *Subject:* Re: OT: Help getting PDF to OCR or searchable form
>         I found a list of Linux flavors that pdfsandwich has been
>         ported to and Raspbian was on the list!
>
>         I will be working on this project tomorrow. Thank you so
>         much for this lead. I don't think I would have found it by myself.
>
>         Laura Brody
>
>         On Mon, Sep 9, 2019 at 10:27 PM Laura Brody
>         <laura.k.brody at gmail.com <mailto:laura.k.brody at gmail.com>> wrote:
>
>             This is very interesting.
>
>             The only Linux box I have running at the moment is a
>             Raspberry Pi 3 B+. I have a 64 GB SD card available, so
>             space isn't an issue. Any idea if it will work on it?
>
>             Laura Brody
>
>             On Mon, Sep 9, 2019 at 9:54 PM Cesar Baquerizo
>             <ces at cescom.com <mailto:ces at cescom.com>> wrote:
>
>                 Look up Tesseract and pdfsandwich. They may help you.
>
>                 ------------------------------------------------------------------------
>                 *From:* Filepro-list
>                 <filepro-list-bounces+ces=cescom.com at lists.celestial.com
>                 <mailto:cescom.com at lists.celestial.com>> on behalf of
>                 Laura Brody via Filepro-list
>                 <filepro-list at lists.celestial.com
>                 <mailto:filepro-list at lists.celestial.com>>
>                 *Sent:* Monday, September 9, 2019 9:50 PM
>                 *To:* Filepro_List
>                 *Cc:* Laura Brody
>                 *Subject:* Re: OT: Help getting PDF to OCR or
>                 searchable form
>                 Additional information....
>
>                 I talked to the user and got some history...
>
>                 The user scanned in legal documents and saved the images
>                 as pages in a PDF. That is why I can't search on keywords
>                 for most of the files. A few files were typed up and then
>                 exported as PDF, but most are images of the pages. That
>                 means that OCR has to be part of the solution.
>
>                 I discovered that Adobe Acrobat Reader has a setting to
>                 search all PDFs in a directory for keywords. The problem
>                 is that these files don't contain text. They contain
>                 images of text, and Adobe can't search images for keywords.
>
>                 Laura Brody
>
>                 On Mon, Sep 9, 2019 at 8:03 PM Laura Brody
>                 <laura.k.brody at gmail.com
>                 <mailto:laura.k.brody at gmail.com>> wrote:
>
>                 > I am hoping that one of you has solved this problem before.....
>                 >
>                 > I have over a thousand pages of text in a dozen or so PDF
>                 > files. Most files are "read-only" and I can not do Ctrl-F
>                 > to search for keywords. I would like to be able to OCR the
>                 > files and put everything into one file that is searchable.
>                 > Or is there a utility that will search all of the PDFs in
>                 > a directory for a keyword?
>                 >
>                 > Suggestions anyone?
>                 >
>                 > Laura Brody
>                 >
>                 _______________________________________________
>                 Filepro-list mailing list
>                 Filepro-list at lists.celestial.com
>                 <mailto:Filepro-list at lists.celestial.com>
>                 Subscribe/Unsubscribe/Subscription Changes
>                 http://mailman.celestial.com/mailman/listinfo/filepro-list
>
>
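For anyone who finds this thread later: the batch step described in the update above (running pdfsandwich over a directory of scanned PDFs) could be sketched roughly like this in Python. This is only an illustration, not the script Laura actually wrote. It assumes pdfsandwich and Tesseract are installed and on the PATH, and that pdfsandwich writes its result next to the input as <name>_ocr.pdf, which is its default behavior as far as I know. The function names and the "scans" directory are made up for the example, and the Dropbox upload is left out.

```python
import subprocess
from pathlib import Path


def build_ocr_commands(directory):
    """Return one pdfsandwich command per PDF that still needs OCR."""
    commands = []
    for pdf in sorted(Path(directory).glob("*.pdf")):
        # Assumption: pdfsandwich names its output <name>_ocr.pdf, so skip
        # files that are themselves outputs of an earlier run...
        if pdf.stem.endswith("_ocr"):
            continue
        # ...and inputs whose output already exists.
        if pdf.with_name(pdf.stem + "_ocr.pdf").exists():
            continue
        commands.append(["pdfsandwich", str(pdf)])
    return commands


def run_all(directory):
    """Run the commands one after another and stop on the first failure."""
    for command in build_ocr_commands(directory):
        subprocess.run(command, check=True)
```

Calling run_all("scans") would then work through every unprocessed PDF in a (hypothetical) scans directory. Because files that already have an _ocr.pdf next to them are skipped, the job can be restarted after an interruption without redoing finished files.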