OT: OCR / PDF Parsing
Laura Brody
laura.k.brody at gmail.com
Thu Jan 13 08:44:45 PST 2022
I used pdfsandwich in a project. I set up a Raspberry Pi, installed all of
pdfsandwich's dependencies, and wrote a script to automate the work (I had
over 200 files to process). The script ran OCR on each file I put in the
"original" directory and added the recognized text to the PDF. If a file
was processed without errors, it was moved to a "Done" directory; otherwise
it went to a "Fail" directory. The results weren't perfect, but they were
close enough for my needs.
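
The workflow described above can be sketched in Python roughly as follows.
This is a minimal sketch, not the original script: the function names
(run_pdfsandwich, ocr_batch) and the "_ocr.pdf" output naming are
hypothetical, and it assumes pdfsandwich is on the PATH and accepts
"-o <file>" for the output name (check the man page for your version).

```python
import shutil
import subprocess
from pathlib import Path


def run_pdfsandwich(src: Path, dst: Path) -> bool:
    """Run pdfsandwich on src and write the OCR'd PDF to dst.

    Assumption: pdfsandwich is installed and takes "-o <file>" for the
    output file. Returns True only if it exits cleanly and dst exists.
    """
    try:
        proc = subprocess.run(["pdfsandwich", str(src), "-o", str(dst)])
    except FileNotFoundError:
        # pdfsandwich is not installed or not on PATH
        return False
    return proc.returncode == 0 and dst.exists()


def ocr_batch(base_dir, run_ocr=run_pdfsandwich):
    """OCR every PDF in <base_dir>/original and sort it into Done or Fail.

    run_ocr is any callable (src, dst) -> bool, so the OCR step can be
    swapped out or faked for a dry run.
    """
    base = Path(base_dir)
    original, done, fail = base / "original", base / "Done", base / "Fail"
    done.mkdir(parents=True, exist_ok=True)
    fail.mkdir(parents=True, exist_ok=True)

    results = {}
    for pdf in sorted(original.glob("*.pdf")):
        ocr_out = done / f"{pdf.stem}_ocr.pdf"  # hypothetical naming scheme
        ok = run_ocr(pdf, ocr_out)
        if not ok:
            # Drop any partial output left behind by a failed run
            ocr_out.unlink(missing_ok=True)  # needs Python 3.8+
        # Move the source file out of "original" either way
        shutil.move(str(pdf), str((done if ok else fail) / pdf.name))
        results[pdf.name] = "Done" if ok else "Fail"
    return results
```

Calling ocr_batch("/path/to/project") processes everything in the
"original" subdirectory and returns a dict mapping each file name to
"Done" or "Fail", which makes it easy to see what needs a second look.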
Laura Brody
On Wed, Jan 12, 2022 at 8:38 PM Cesar Baquerizo via Filepro-list <
filepro-list at lists.celestial.com> wrote:
> Look for pdfsandwich. That should do what you want. Lots of info at the
> site.
>
> Regards
>
> > On Jan 12, 2022, at 8:28 PM, Jose Lerebours via Filepro-list <filepro-list at lists.celestial.com> wrote:
> >
> > I have a GSA that wants data extracted from PDF documents, most of
> > which are scanned documents saved as PDF, which in essence makes them
> > images saved as PDF.
> >
> > I have written code in PHP to convert the PDF to PNG and extract text
> > from the PNG, but this is not proving to be reliable since lots of
> > characters are read wrong or not read at all.
> >
> > It is like pulling teeth: an "I want this done, but do not ask me to
> > get you 'true' PDFs, the scanned documents are all I can get" type of
> > scenario.
> >
> > So, my question is: is anyone here successfully extracting data from
> > scanned documents and, if so, what are you using?
> >
> > Regards,
> >
> >
> > --
> > Jose Lerebours
> > 954-559-7186
> > https://www.asisuites.com
> > Accounting - Retail - Wholesale - Distribution
> > Manufacturing - Warehousing - Transportation - eCommerce - Web Development
> >
> > _______________________________________________
> > Filepro-list mailing list
> > Filepro-list at lists.celestial.com
> > Subscribe/Unsubscribe/Subscription Changes
> > http://mailman.celestial.com/mailman/listinfo/filepro-list