OT: OCR / PDF Parsing
Jose Lerebours
fpgroups at gmail.com
Wed Jan 12 19:07:25 PST 2022
Will look at Adobe - found this:
adobe.io/apis/documentcloud/dcsdk
Even if not for this particular problem, there may be something of value here!
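On the dot-density point Bob raises below, here is a rough sketch in Python of how the resolution chosen for the PDF-to-image step sets the pixel size of each page and of the text on it. The US Letter page size, the 10 pt text size and the dpi values are assumptions for illustration only, and the function names are made up for this sketch. Nothing here comes from an actual OCR library.

```python
# Sketch: how dot density (dpi) translates into pixels available to the OCR step.
# Page size, text size and dpi values below are illustrative assumptions.

def page_pixels(width_in, height_in, dpi):
    """Return (width, height) in pixels for a page rendered at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


def char_height_pixels(point_size, dpi):
    """Approximate pixel height of text of `point_size` (1 pt = 1/72 inch)."""
    return point_size / 72 * dpi


if __name__ == "__main__":
    # Compare a screen-like resolution with two common scan resolutions.
    for dpi in (72, 150, 300):
        w, h = page_pixels(8.5, 11, dpi)
        glyph = char_height_pixels(10, dpi)
        print(f"{dpi:>3} dpi: {w} x {h} px, 10 pt text is about {glyph:.1f} px tall")
```

At 300 dpi a letter-size page comes out at 2550 x 3300 pixels, so if the PNGs are much smaller than that, the conversion resolution may be one reason characters are misread.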
Thanks,
On 1/12/22 8:50 PM, Bob Rasmussen wrote:
> Your results will depend on the quality of your OCR library and the
> quality (mainly dot density) of the scan-to-PDF.
>
> Try Adobe Acrobat (not Reader). It can take an image-only PDF and OCR
> it with good accuracy. Then you can use Save As to write a new PDF,
> and/or export it to plain text (or other formats).
>
> I don't know about programmability.
>
> On Wed, 12 Jan 2022, Jose Lerebours via Filepro-list wrote:
>
>> I have a GSA that wants data extracted from PDF documents, most of
>> which are scanned
>> documents saved as PDF, which in essence makes them images saved as PDF.
>>
>> I have written code in PHP to convert the PDF to PNG and extract text
>> from the PNG, but this is not proving
>> to be reliable, since lots of characters are misread or not read at
>> all.
>>
>> It is like pulling teeth: an "I want this done, but do not ask me to get
>> you 'true' PDFs; the scanned
>> documents are all I can get" type of scenario.
>>
>> So, my question is: is anyone here successfully extracting data from
>> scanned documents and if so,
>> what are you using?
>>
>> Regards,
>>
>>
>> --
>> Jose Lerebours
>> 954-559-7186
>> https://www.asisuites.com
>> Accounting - Retail - Wholesale - Distribution
>> Manufacturing - Warehousing - Transportation - eCommerce - Web
>> Development
>>
>> _______________________________________________
>> Filepro-list mailing list
>> Filepro-list at lists.celestial.com
>> Subscribe/Unsubscribe/Subscription Changes
>> http://mailman.celestial.com/mailman/listinfo/filepro-list
>>
>
> Regards,
> ....Bob Rasmussen, President, Rasmussen Software, Inc.
>
> personal e-mail: ras at anzio.com
> company e-mail: rsi at anzio.com
> voice: (US) 503-624-0360 (9:00-6:00 Pacific Time)
> fax: (US) 503-624-0760
> web: http://www.anzio.com
> street address: Rasmussen Software, Inc. NEW ADDRESS AS OF
> AUGUST 1, 2020
> 8835 SW Canyon Lane, Suite 401
> Portland, OR 97225 USA
--
Jose Lerebours
954-559-7186
https://www.asisuites.com
Accounting - Retail - Wholesale - Distribution
Manufacturing - Warehousing - Transportation - eCommerce - Web Development