OT: Help getting PDF to OCR or searchable form

Laura Brody laura.k.brody at gmail.com
Mon Sep 16 08:27:59 PDT 2019


Just an update: I got it all working on a Raspberry Pi 3 B+ with a 32 GB
microSD card, running Debian Linux. I wrote a script to upload each processed
file to Dropbox automatically. It is happily working through the files and
will probably be done in a week or less (there are about 55 files to process;
a few have 2-6 pages, but most have 50-130 pages).

All of the software was free. I already had a few Raspberry Pi boards, so
my only investment was my time.
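The original question further down the thread also asked about searching all the PDFs in a directory for a keyword. Once the files have a text layer, that can be scripted. The sketch below assumes poppler-utils' `pdftotext` is installed. A dedicated tool, `pdfgrep`, also exists. The sketch relies on `pdftotext` writing to stdout when the output file is `-`, and on it separating pages with form feeds, which gives page numbers. Function names are illustrative only.

```python
"""Search every PDF in a folder for a keyword (a hedged sketch).

Assumes poppler-utils' `pdftotext` is installed and that the PDFs already
have a text layer (scanned files need OCR first).
"""
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def pdf_text(path: Path) -> str:
    """Extract the text layer of a PDF; "-" sends pdftotext output to stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", str(path), "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def find_keyword(text: str, keyword: str) -> list[tuple[int, int, str]]:
    """Return (page, line, text) for each line containing keyword, ignoring case.

    pdftotext ends each page with a form feed character, used here for paging.
    """
    hits = []
    needle = keyword.lower()
    for page_no, page in enumerate(text.split("\f"), start=1):
        for line_no, line in enumerate(page.splitlines(), start=1):
            if needle in line.lower():
                hits.append((page_no, line_no, line.strip()))
    return hits


def search_folder(folder: str, keyword: str) -> None:
    for pdf in sorted(Path(folder).glob("*.pdf")):
        for page_no, line_no, line in find_keyword(pdf_text(pdf), keyword):
            print(f"{pdf.name} p.{page_no} l.{line_no}: {line}")


if __name__ == "__main__" and len(sys.argv) > 2:
    search_folder(sys.argv[1], sys.argv[2])
```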

Thank you so much for pointing me in the right direction. Left to my own
devices, I would still be researching how to tackle this project.

Laura Brody

On Mon, Sep 9, 2019 at 10:44 PM Laura Brody <laura.k.brody at gmail.com> wrote:

> Yes, I see that. Now that I know that pdfsandwich and Tesseract will run
> on the Raspberry Pi and do what I need, I have a clear idea of what I need
> to do to get searchable PDFs out of the files that I have. Thank you for
> pointing me in the right direction. You saved me a boatload of time and
> aggravation.
>
> Laura Brody
>
> On Mon, Sep 9, 2019 at 10:38 PM Cesar Baquerizo <ces at cescom.com> wrote:
>
>> You're welcome. You'll also need Tesseract; they are two different pieces
>> of software. Let me know how it goes.
>>
>> ------------------------------
>> *From:* Laura Brody <laura.k.brody at gmail.com>
>> *Sent:* Monday, September 9, 2019 10:35 PM
>> *To:* Cesar Baquerizo; Filepro_List
>> *Subject:* Re: OT: Help getting PDF to OCR or searchable form
>>
>> I found a list of Linux flavors that pdfsandwich has been ported to, and
>> Raspbian was on the list!
>>
>> I will be working on this project tomorrow. Thank you so much for this
>> lead. I don't think I would have found it by myself.
>>
>> Laura Brody
>>
>> On Mon, Sep 9, 2019 at 10:27 PM Laura Brody <laura.k.brody at gmail.com> wrote:
>>
>>> This is very interesting.
>>>
>>> The only Linux box I have running at the moment is a Raspberry Pi 3 B+.
>>> I have a 64 GB SD card available, so space isn't an issue. Any idea
>>> whether it will work on it?
>>>
>>> Laura Brody
>>>
>>> On Mon, Sep 9, 2019 at 9:54 PM Cesar Baquerizo <ces at cescom.com> wrote:
>>>
>>>> Look up Tesseract and pdfsandwich. They may help you.
>>>>
>>>> ------------------------------
>>>> *From:* Filepro-list <filepro-list-bounces+ces=cescom.com at lists.celestial.com>
>>>> on behalf of Laura Brody via Filepro-list <filepro-list at lists.celestial.com>
>>>> *Sent:* Monday, September 9, 2019 9:50 PM
>>>> *To:* Filepro_List
>>>> *Cc:* Laura Brody
>>>> *Subject:* Re: OT: Help getting PDF to OCR or searchable form
>>>>
>>>> Additional information...
>>>>
>>>> I talked to the user and got some history.
>>>>
>>>> The user scanned in legal documents and saved the images as pages in a
>>>> PDF. That is why I can't search on keywords in most of the files. A few
>>>> files were typed up and then exported as PDF, but most are images of the
>>>> pages. That means that OCR has to be part of the solution.
>>>>
>>>> I discovered that Adobe Acrobat Reader has a setting to search all PDFs
>>>> in a directory for keywords. The problem is that these files don't
>>>> contain text; they contain images of text, and Adobe can't find keywords
>>>> in images.
>>>>
>>>> Laura Brody
>>>>
>>>> On Mon, Sep 9, 2019 at 8:03 PM Laura Brody <laura.k.brody at gmail.com> wrote:
>>>>
>>>> > I am hoping that one of you has solved this problem before...
>>>> >
>>>> > I have over a thousand pages of text in a dozen or so PDF files. Most
>>>> > files are "read-only" and I cannot use Ctrl-F to search for keywords.
>>>> > I would like to OCR the files and put everything into one file that is
>>>> > searchable. Or is there a utility that will search all of the PDFs in
>>>> > a directory for a keyword?
>>>> >
>>>> > Suggestions anyone?
>>>> >
>>>> > Laura Brody
>>>> >
>>>>
>>>