OT: Help getting PDF to OCR or searchable form
Cesar Baquerizo
ces at cescom.com
Mon Sep 16 09:42:41 PDT 2019
Good to hear. Glad I could give back.
On 9/16/2019 11:27 AM, Laura Brody wrote:
> Just an update.... I got it all working on a Raspberry Pi 3 B+ with a
> 32 GB microSD card, running Debian Linux. I wrote a script to upload the
> processed file to Dropbox automatically. It is happily working on
> files and will probably be done in a week or less (it has about 55
> files to process, some with 2-6 pages, but most with 50-130 pages).
>
> All of the software was free. I already had a few Raspberry Pi boards,
> so my only investment was my time.
>
> Thank you so much for pointing me in the right direction. Left to my
> own devices, I would still be researching how to tackle this project.
>
> Laura Brody
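
The batch run described above could be driven by something like the following Python sketch. It is not the script Laura wrote, which the thread does not show. It assumes pdfsandwich and Tesseract are installed and on the PATH, that the scans sit in a single directory, and that each searchable copy is written next to its input with an "_ocr" suffix. The function names are made up for illustration, and the Dropbox upload step is left out because the thread gives no details about it.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def ocr_output_path(pdf: Path) -> Path:
    """Name the searchable copy next to the input: scan.pdf -> scan_ocr.pdf."""
    return pdf.with_name(pdf.stem + "_ocr.pdf")


def build_command(pdf: Path, lang: str = "eng") -> list[str]:
    """Build the pdfsandwich command line for one input file.

    -lang is passed through to Tesseract; -o names the output file.
    """
    return ["pdfsandwich", "-lang", lang, "-o", str(ocr_output_path(pdf)), str(pdf)]


def pending_pdfs(folder: Path) -> list[Path]:
    """List PDFs with no OCR'd copy yet, skipping the OCR'd copies themselves."""
    return sorted(
        p for p in folder.glob("*.pdf")
        if not p.stem.endswith("_ocr") and not ocr_output_path(p).exists()
    )


def ocr_folder(folder: Path, lang: str = "eng") -> None:
    """Run pdfsandwich on every pending PDF, one file at a time."""
    for pdf in pending_pdfs(folder):
        print(f"OCR: {pdf.name}")
        # check=True stops the batch if pdfsandwich reports an error
        subprocess.run(build_command(pdf, lang), check=True)
```

Calling ocr_folder(Path("scans")), where "scans" is a placeholder directory name, processes whatever has not been done yet, so an interrupted run on a slow board can simply be started again.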
>
> On Mon, Sep 9, 2019 at 10:44 PM Laura Brody <laura.k.brody at gmail.com> wrote:
>
> Yes, I see that. Now that I know that pdfsandwich and Tesseract
> will run on the Raspberry Pi and do what I need, I have a clear
> idea what I need to do to get searchable PDFs out of the files
> that I have. Thank you for pointing me in the right direction. You
> saved me a boatload of time and aggravation.
>
> Laura Brody
>
> On Mon, Sep 9, 2019 at 10:38 PM Cesar Baquerizo <ces at cescom.com> wrote:
>
> You're welcome. You'll also need Tesseract. They are two separate
> pieces of software. Let me know how it goes.
>
> ------------------------------------------------------------------------
> *From:* Laura Brody <laura.k.brody at gmail.com>
> *Sent:* Monday, September 9, 2019 10:35 PM
> *To:* Cesar Baquerizo; Filepro_List
> *Subject:* Re: OT: Help getting PDF to OCR or searchable form
> I found a list of Linux flavors that pdfsandwich has been
> ported to, and Raspbian was on the list!
>
> I will be working on this project tomorrow. Thank you so
> much for this lead. I don't think I would have found it by myself.
>
> Laura Brody
>
> On Mon, Sep 9, 2019 at 10:27 PM Laura Brody <laura.k.brody at gmail.com> wrote:
>
> This is very interesting.
>
> The only Linux box I have running at the moment is a
> Raspberry Pi 3 B+. I have a 64GB SD card available, so space
> isn't an issue. Any idea if it will work on it?
>
> Laura Brody
>
> On Mon, Sep 9, 2019 at 9:54 PM Cesar Baquerizo <ces at cescom.com> wrote:
>
> Look up Tesseract and pdfsandwich. They may help you.
>
> ------------------------------------------------------------------------
> *From:* Filepro-list
> <filepro-list-bounces+ces=cescom.com at lists.celestial.com> on behalf of
> Laura Brody via Filepro-list <filepro-list at lists.celestial.com>
> *Sent:* Monday, September 9, 2019 9:50 PM
> *To:* Filepro_List
> *Cc:* Laura Brody
> *Subject:* Re: OT: Help getting PDF to OCR or searchable form
> Additional information....
>
> I talked to the user and got some history...
>
> The user scanned in legal documents and saved the images as pages
> in a PDF. That is why I can't search on keywords for most of the
> files. A few files were typed up and then exported as PDF, but most
> are images of the pages. That means that OCR has to be part of the
> solution.
>
> I discovered that Adobe Acrobat Reader has a setting to search all
> PDFs in a directory for keywords. The problem is that these files
> don't contain text. They contain images of text. Adobe can't search
> images and find keywords.
>
> Laura Brody
>
> On Mon, Sep 9, 2019 at 8:03 PM Laura Brody <laura.k.brody at gmail.com> wrote:
>
> > I am hoping that one of you has solved this problem before.....
> >
> > I have over a thousand pages of text in a dozen or so PDF files. Most
> > files are "read-only" and I cannot do Ctrl-F to search for keywords. I
> > would like to be able to OCR the files and put everything into one file
> > that is searchable. Or is there a utility that will search all of the PDFs
> > in a directory for a keyword?
> >
> > Suggestions anyone?
> >
> > Laura Brody
> >
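
Once the PDFs carry a text layer (for example after OCR), the directory-wide keyword search asked about above could be sketched roughly as follows. This is only an illustration, not something proposed in the thread. It assumes the pdftotext tool from poppler-utils is installed, and the function names are hypothetical. It will find nothing in image-only scans, which is the problem the original question ran into.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def pdf_text(pdf: Path) -> str:
    """Extract the text layer with pdftotext; "-" sends the text to stdout."""
    result = subprocess.run(
        ["pdftotext", str(pdf), "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def matching_lines(text: str, keyword: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that contain keyword, ignoring case.

    Line numbers count lines of the extracted text, not PDF pages.
    """
    needle = keyword.lower()
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in line.lower()
    ]


def search_folder(folder: Path, keyword: str) -> None:
    """Print every matching line from every PDF in folder."""
    for pdf in sorted(folder.glob("*.pdf")):
        for number, line in matching_lines(pdf_text(pdf), keyword):
            print(f"{pdf.name}:{number}: {line}")
```

The matching is a plain case-insensitive substring test, so OCR misreads in the text layer will cause missed hits.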
> _______________________________________________
> Filepro-list mailing list
> Filepro-list at lists.celestial.com
> Subscribe/Unsubscribe/Subscription Changes
> http://mailman.celestial.com/mailman/listinfo/filepro-list
>
>