OT: Help getting PDF to OCR or searchable form

Cesar Baquerizo ces at cescom.com
Mon Sep 9 19:33:06 PDT 2019


It should work. Review the Alice.pdf example at the pdfsandwich site; it may answer your question, and I think it matches your use case. pdfsandwich takes a scanned image, OCRs it, and adds the recognized text so the PDF can be searched.
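
A minimal sketch of batch-converting a directory of scanned PDFs in this way, assuming pdfsandwich is installed and on the PATH. It uses the simplest invocation (one input file, no options) and assumes the output is written next to the input with an `_ocr` suffix; check that against your installed version. The helper names `ocr_name`, `select_pending`, `build_command`, and `ocr_directory` are hypothetical and exist only for illustration.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Iterable, List


def ocr_name(pdf: Path) -> Path:
    """Assumed pdfsandwich output name: deed.pdf -> deed_ocr.pdf."""
    return pdf.with_name(pdf.stem + "_ocr.pdf")


def select_pending(pdfs: Iterable[Path]) -> List[Path]:
    """Return PDFs that are not OCR outputs and have no OCR output yet."""
    known = set(pdfs)
    return [
        p for p in sorted(known)
        if not p.stem.endswith("_ocr") and ocr_name(p) not in known
    ]


def build_command(pdf: Path) -> List[str]:
    """Argument list for the simplest call: one input file, no options."""
    return ["pdfsandwich", str(pdf)]


def ocr_directory(directory: Path) -> None:
    """Run pdfsandwich on every pending PDF in `directory`."""
    if shutil.which("pdfsandwich") is None:
        raise RuntimeError("pdfsandwich is not installed or not on PATH")
    for pdf in select_pending(directory.glob("*.pdf")):
        # check=True stops at the first file that fails to convert.
        subprocess.run(build_command(pdf), check=True)
```

Calling `ocr_directory(Path("scans"))` (where `scans` is a placeholder directory name) would process each scanned file once and skip any that already have an `_ocr` copy, so the run can be restarted after an interruption.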


________________________________
From: Laura Brody <laura.k.brody at gmail.com>
Sent: Monday, September 9, 2019 10:27 PM
To: Cesar Baquerizo
Cc: laura at hvcomputer.com; Filepro_List
Subject: Re: OT: Help getting PDF to OCR or searchable form

This is very interesting.

The only Linux box I have running at the moment is Raspberry Pi 3 B+. I have 64GB SD card available, so space isn't an issue. Any idea if it will work on it?

Laura Brody

On Mon, Sep 9, 2019 at 9:54 PM Cesar Baquerizo <ces at cescom.com> wrote:
Look up Tesseract and pdfsandwich. They may help you.


________________________________
From: Filepro-list <filepro-list-bounces+ces=cescom.com at lists.celestial.com> on behalf of Laura Brody via Filepro-list <filepro-list at lists.celestial.com>
Sent: Monday, September 9, 2019 9:50 PM
To: Filepro_List
Cc: Laura Brody
Subject: Re: OT: Help getting PDF to OCR or searchable form

Additional information....

I talked to the user and got some history...

The user scanned in legal documents and saved the images as pages in a PDF.
That is why I can't search on keywords for most of the files. A few files
were typed up and then exported as PDF, but most are images of the pages. That
means that OCR has to be part of the solution.

I discovered that Adobe Acrobat Reader has a setting to search all PDFs in a
directory for keywords. The problem is that these files don't contain text.
They contain images of text, and Adobe can't search images for keywords.

Laura Brody
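
Once the files carry a text layer, searching a whole directory for a keyword can be sketched as below. This assumes the `pdftotext` tool from poppler-utils is installed; it is called with `-` as the output name so the text goes to standard output, and its output is assumed to separate pages with form-feed characters. The function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Dict, List


def pages_with_keyword(text: str, keyword: str) -> List[int]:
    """Return 1-based page numbers whose text contains `keyword`, ignoring case.

    Pages are assumed to be separated by form feeds, as in pdftotext output.
    """
    needle = keyword.lower()
    pages = text.split("\f")
    return [n for n, page in enumerate(pages, start=1) if needle in page.lower()]


def extract_text(pdf: Path) -> str:
    """Extract the text layer of one PDF with pdftotext ("-" means stdout)."""
    result = subprocess.run(
        ["pdftotext", str(pdf), "-"],
        check=True, capture_output=True, encoding="utf-8", errors="replace",
    )
    return result.stdout


def search_directory(directory: Path, keyword: str) -> Dict[str, List[int]]:
    """Map each PDF file name in `directory` to the pages mentioning `keyword`."""
    hits = {}
    for pdf in sorted(directory.glob("*.pdf")):
        pages = pages_with_keyword(extract_text(pdf), keyword)
        if pages:
            hits[pdf.name] = pages
    return hits
```

A file that is still only page images yields no extractable text, so it would simply never appear in the results; that is a sign it has not been through OCR yet.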

On Mon, Sep 9, 2019 at 8:03 PM Laura Brody <laura.k.brody at gmail.com> wrote:

> I am hoping that one of you has solved this problem before.....
>
> I have over a thousand pages of text in a dozen or so PDF files. Most
> files are "read-only" and I cannot do Ctrl-F to search for keywords. I
> would like to be able to OCR the files and put everything into one file
> that is searchable. Or is there a utility that will search all of the PDFs
> in a directory for a keyword?
>
> Suggestions anyone?
>
> Laura Brody
>
_______________________________________________
Filepro-list mailing list
Filepro-list at lists.celestial.com
Subscribe/Unsubscribe/Subscription Changes
http://mailman.celestial.com/mailman/listinfo/filepro-list