OT: Help getting PDF to OCR or searchable form

Laura Brody laura.k.brody at gmail.com
Mon Sep 9 19:44:44 PDT 2019


Yes, I see that. Now that I know that pdfsandwich and Tesseract will run
on the Raspberry Pi and do what I need, I have a clear idea of what I need to
do to get searchable PDFs out of the files that I have. Thank you for
pointing me in the right direction. You saved me a boatload of time and
aggravation.

Laura Brody
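
For anyone finding this thread later, the batch step described above (running pdfsandwich over a folder of scanned PDFs) might look roughly like the sketch below. It is only an illustration, not what was actually run: the function names are made up for this example, it assumes pdfsandwich and Tesseract are already installed and on the PATH, and it assumes pdfsandwich's `-lang` and `-o` options behave as described in its manual page.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def ocr_output_path(pdf: Path) -> Path:
    """Name the searchable copy next to the original, e.g. deed.pdf -> deed_ocr.pdf."""
    return pdf.with_name(pdf.stem + "_ocr.pdf")


def build_ocr_command(pdf: Path, lang: str = "eng") -> List[str]:
    """Build the pdfsandwich command line for one scanned PDF.

    Assumption: "-lang" selects the Tesseract language and "-o" names the output file.
    """
    return ["pdfsandwich", "-lang", lang, "-o", str(ocr_output_path(pdf)), str(pdf)]


def pending_pdfs(directory: Path) -> List[Path]:
    """List PDFs in the directory that do not yet have a searchable copy."""
    return sorted(
        pdf
        for pdf in directory.glob("*.pdf")
        if not pdf.stem.endswith("_ocr") and not ocr_output_path(pdf).exists()
    )


def ocr_directory(directory: Path, lang: str = "eng") -> None:
    """Run pdfsandwich on every pending PDF, stopping at the first failure."""
    if shutil.which("pdfsandwich") is None:
        raise RuntimeError("pdfsandwich not found on PATH")
    for pdf in pending_pdfs(directory):
        subprocess.run(build_ocr_command(pdf, lang), check=True)
```

Calling `ocr_directory(Path("scans"))`, where "scans" is a placeholder directory name, would leave a `*_ocr.pdf` copy beside each original. Files that already have a searchable copy are skipped, so an interrupted run can simply be restarted, which may matter on a Raspberry Pi where OCR of a thousand pages is likely to be slow.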

On Mon, Sep 9, 2019 at 10:38 PM Cesar Baquerizo <ces at cescom.com> wrote:

> You're welcome. You’ll also need Tesseract. They are two different pieces
> of software. Let me know how it goes.
>
> ------------------------------
> *From:* Laura Brody <laura.k.brody at gmail.com>
> *Sent:* Monday, September 9, 2019 10:35 PM
> *To:* Cesar Baquerizo; Filepro_List
> *Subject:* Re: OT: Help getting PDF to OCR or searchable form
>
> I found a list of Linux flavors that pdfsandwich has been ported to, and
> Raspbian was on the list!
>
> I will be working on this project tomorrow. Thank you so much for this
> lead. I don't think I would have found it by myself.
>
> Laura Brody
>
> On Mon, Sep 9, 2019 at 10:27 PM Laura Brody <laura.k.brody at gmail.com>
> wrote:
>
>> This is very interesting.
>>
>> The only Linux box I have running at the moment is Raspberry Pi 3 B+. I
>> have 64GB SD card available, so space isn't an issue. Any idea if it will
>> work on it?
>>
>> Laura Brody
>>
>> On Mon, Sep 9, 2019 at 9:54 PM Cesar Baquerizo <ces at cescom.com> wrote:
>>
>>> Look up Tesseract and pdfsandwich. They may help you.
>>>
>>> ------------------------------
>>> *From:* Filepro-list <filepro-list-bounces+ces=
>>> cescom.com at lists.celestial.com> on behalf of Laura Brody via
>>> Filepro-list <filepro-list at lists.celestial.com>
>>> *Sent:* Monday, September 9, 2019 9:50 PM
>>> *To:* Filepro_List
>>> *Cc:* Laura Brody
>>> *Subject:* Re: OT: Help getting PDF to OCR or searchable form
>>>
>>> Additional information....
>>>
>>> I talked to the user and got some history...
>>>
>>> The user scanned in legal documents and saved the images as pages in a PDF.
>>> That is why I can't search on keywords for most of the files. A few files
>>> were typed up and then exported as PDF; most are images of the pages. That
>>> means that OCR has to be part of the solution.
>>>
>>> I discovered that Adobe Acrobat Reader has a setting to search all PDFs in a
>>> directory for keywords. The problem is that these files don't contain text.
>>> They contain images of text. Adobe can't search images and find keywords.
>>>
>>> Laura Brody
>>>
>>> On Mon, Sep 9, 2019 at 8:03 PM Laura Brody <laura.k.brody at gmail.com>
>>> wrote:
>>>
>>> > I am hoping that one of you has solved this problem before.....
>>> >
>>> > I have over a thousand pages of text in a dozen or so PDF files. Most
>>> > files are "read-only" and I cannot use Ctrl-F to search for keywords. I
>>> > would like to be able to OCR the files and put everything into one file
>>> > that is searchable. Or is there a utility that will search all of the PDFs
>>> > in a directory for a keyword?
>>> >
>>> > Suggestions anyone?
>>> >
>>> > Laura Brody
>>> >
>>