OT: OCR / PDF Parsing
Bob Rasmussen
ras at anzio.com
Wed Jan 12 17:50:04 PST 2022
Your results will depend on the quality of your OCR library and the
quality (mainly the resolution, in dots per inch) of the scanned PDF.
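As a rough illustration of the resolution point, the sketch below estimates the effective dots per inch of a rasterized page from its pixel width and physical page width. The function names and the 300 DPI threshold are assumptions made for this example (300 DPI is a commonly quoted minimum for OCR, not a hard limit), and nothing here comes from a particular OCR library.

```python
# Minimal sketch: estimate scan resolution before attempting OCR.
# Helper names and the 300 DPI threshold are illustrative assumptions.

MIN_OCR_DPI = 300  # assumed rule-of-thumb minimum, not a hard limit


def effective_dpi(pixel_width: int, page_width_inches: float) -> float:
    """Return dots per inch given the image width and the paper width."""
    if page_width_inches <= 0:
        raise ValueError("page_width_inches must be positive")
    return pixel_width / page_width_inches


def is_adequate_for_ocr(pixel_width: int, page_width_inches: float,
                        min_dpi: float = MIN_OCR_DPI) -> bool:
    """True when the estimated resolution meets the assumed minimum."""
    return effective_dpi(pixel_width, page_width_inches) >= min_dpi


# A US Letter page is 8.5 inches wide.
print(effective_dpi(2550, 8.5))        # 300.0
print(is_adequate_for_ocr(1275, 8.5))  # False (only 150 DPI)
```

If a page falls well below the threshold, rescanning or rasterizing at a higher resolution is likely to help more than switching OCR engines.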
Try Adobe Acrobat (not Reader). It can take an image-only PDF and OCR
it with good accuracy. Then you can use Save As to write a new PDF, and/or
export it to plain text (or other formats).
I don't know whether Acrobat's OCR can be scripted.
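If a scriptable route is needed, one possibility (an assumption on my part, not something tested against these documents) is to drive command-line tools instead of Acrobat. The sketch below only builds the command lines for `pdftoppm` (from Poppler) and `tesseract`; both tools are assumed to be installed separately, and the function names are made up for this example.

```python
# Minimal sketch: build command lines for a scripted PDF -> PNG -> text run.
# Assumes the pdftoppm (Poppler) and tesseract CLIs are installed separately;
# the function names are hypothetical and the commands are not executed here.
from __future__ import annotations

from pathlib import Path


def rasterize_command(pdf_path: str, out_prefix: str, dpi: int = 300) -> list[str]:
    """Command that renders each PDF page to a PNG at the given resolution."""
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, out_prefix]


def ocr_command(image_path: str, lang: str = "eng", dpi: int = 300) -> list[str]:
    """Command that OCRs one image; tesseract appends .txt to the output base."""
    out_base = str(Path(image_path).with_suffix(""))
    return ["tesseract", image_path, out_base, "-l", lang, "--dpi", str(dpi)]


print(rasterize_command("scan.pdf", "page"))
print(ocr_command("page-1.png"))
```

Each list can be handed to `subprocess.run(cmd, check=True)` once the tools are available. Accuracy will still depend on the scan quality noted above.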
On Wed, 12 Jan 2022, Jose Lerebours via Filepro-list wrote:
> I have a GSA that wants data extracted from PDF documents, most of which are
> scanned documents saved as PDF, which in essence makes them images saved as
> PDF.
>
> I have written code in PHP to save the PDF to PNG and extract TEXT from PNG,
> but this is not proving to be reliable since lots of characters are read
> wrong or not read at all.
>
> It is like pulling teeth: an "I want this done, but do not ask me to get you
> 'true' PDFs; the scanned documents are all I can get" type of scenario.
>
> So, my question is: is anyone here successfully extracting data from scanned
> documents and, if so, what are you using?
>
> Regards,
>
>
> --
> Jose Lerebours
> 954-559-7186
> https://www.asisuites.com
> Accounting - Retail - Wholesale - Distribution
> Manufacturing - Warehousing - Transportation - eCommerce - Web Development
>
Regards,
....Bob Rasmussen, President, Rasmussen Software, Inc.
personal e-mail: ras at anzio.com
company e-mail: rsi at anzio.com
voice: (US) 503-624-0360 (9:00-6:00 Pacific Time)
fax: (US) 503-624-0760
web: http://www.anzio.com
street address: Rasmussen Software, Inc. NEW ADDRESS AS OF AUGUST 1, 2020
8835 SW Canyon Lane, Suite 401
Portland, OR 97225 USA