OT: OCR / PDF Parsing
Cesar Baquerizo
ces at cescom.com
Wed Jan 12 17:37:26 PST 2022
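Look into pdfsandwich. It runs OCR on a scanned PDF and adds a searchable text layer behind the page images, which should do what you want. There is plenty of information on the project site.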
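As a rough illustration of how that could fit into a script, here is a minimal Python sketch that shells out to pdfsandwich and then reads the resulting text layer. It assumes pdfsandwich (with Tesseract) is installed and on the PATH. The use of pdftotext from Poppler for the second step is my own assumption, not something the pdfsandwich suggestion depends on, and the function names are made up for the example. Accuracy will still depend on scan quality.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def ocr_output_path(pdf: Path) -> Path:
    """Name the OCR'd copy next to the input, e.g. scan.pdf -> scan_ocr.pdf."""
    return pdf.with_name(pdf.stem + "_ocr.pdf")


def build_commands(pdf: Path, lang: str = "eng") -> List[List[str]]:
    """Return the two command lines: OCR the scan, then dump its text layer."""
    ocr_pdf = ocr_output_path(pdf)
    txt = ocr_pdf.with_suffix(".txt")
    return [
        # pdfsandwich: -lang is passed through to Tesseract, -o names the output PDF.
        ["pdfsandwich", "-lang", lang, "-o", str(ocr_pdf), str(pdf)],
        # pdftotext (Poppler, assumed installed): -layout keeps columns roughly aligned.
        ["pdftotext", "-layout", str(ocr_pdf), str(txt)],
    ]


def extract_text(pdf: Path, lang: str = "eng") -> str:
    """Run both tools in order and return the extracted text."""
    for cmd in build_commands(pdf, lang):
        if shutil.which(cmd[0]) is None:
            raise RuntimeError(f"{cmd[0]} is not installed or not on PATH")
        subprocess.run(cmd, check=True)
    return ocr_output_path(pdf).with_suffix(".txt").read_text(errors="replace")
```

Calling extract_text(Path("scan.pdf")) would leave scan_ocr.pdf and scan_ocr.txt next to the original and return the text. Building the command lists separately from running them makes it easy to log or inspect exactly what gets executed.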
Regards
> On Jan 12, 2022, at 8:28 PM, Jose Lerebours via Filepro-list <filepro-list at lists.celestial.com> wrote:
>
> I have a GSA that wants data extracted from PDF documents, most of which are scanned
> documents saved as PDF, which in essence makes them images wrapped in a PDF.
>
> I have written code in PHP to convert the PDF to PNG and extract text from the PNG, but this is not
> proving reliable, since many characters are misread or not read at all.
>
> It is like pulling teeth: I want this done, but do not ask me to get you "true" PDFs, the scanned
> documents are all I can get ... type of scenario.
>
> So, my question is: is anyone here successfully extracting data from scanned documents and if so,
> what are you using?
>
> Regards,
>
>
> --
> Jose Lerebours
> 954-559-7186
> https://www.asisuites.com
> Accounting - Retail - Wholesale - Distribution
> Manufacturing - Warehousing - Transportation - eCommerce - Web Development