OT: OCR / PDF Parsing

Jose Lerebours fpgroups at gmail.com
Wed Jan 12 19:07:25 PST 2022


Will look at Adobe - found this:

adobe.io/apis/documentcloud/dcsdk


Even if not for this particular problem, there may be something of value here!


Thanks,



On 1/12/22 8:50 PM, Bob Rasmussen wrote:
> Your results will depend on the quality of your OCR library and the 
> quality (mainly dot density) of the scan-to-PDF.
>
> Try Adobe Acrobat (not Reader). It can take an image-only PDF and OCR
> it with good accuracy. Then you can use Save As to write a new PDF,
> and/or export it to plain text (or other formats).
>
> I don't know about programmability.
>
> On Wed, 12 Jan 2022, Jose Lerebours via Filepro-list wrote:
>
>> I have a GSA that wants data extracted from PDF documents, most of
>> which are scanned documents saved as PDF, which in essence makes them
>> images saved as PDF.
>>
>> I have written code in PHP to convert the PDF to PNG and extract text
>> from the PNG, but this is not proving to be reliable, since lots of
>> characters are read wrong or not read at all.
>>
>> It is like pulling teeth: an "I want this done, but do not ask me to
>> get you 'true' PDFs; the scanned documents are all I can get" type of
>> scenario.
>>
>> So, my question is: is anyone here successfully extracting data from
>> scanned documents and, if so, what are you using?
>>
>> Regards,
>>
>>
>> -- 
>> Jose Lerebours
>> 954-559-7186
>> https://www.asisuites.com
>> Accounting - Retail - Wholesale - Distribution
>> Manufacturing - Warehousing - Transportation - eCommerce - Web Development
>
> Regards,
> ....Bob Rasmussen,   President,   Rasmussen Software, Inc.
>
> personal e-mail: ras at anzio.com
>  company e-mail: rsi at anzio.com
>           voice: (US) 503-624-0360 (9:00-6:00 Pacific Time)
>             fax: (US) 503-624-0760
>             web: http://www.anzio.com
>  street address: Rasmussen Software, Inc.         NEW ADDRESS AS OF AUGUST 1, 2020
>                  8835 SW Canyon Lane, Suite 401
>                  Portland, OR  97225  USA
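
On the dot-density point above, here is a minimal sketch of the PDF-to-PNG-to-text
pipeline with the render resolution made explicit. It is in Python rather than
the PHP mentioned in the thread, and it assumes two command-line tools that
nobody in the thread named: Poppler's pdftoppm for rasterizing and Tesseract
for OCR. The function names, the "page" file prefix, and the 300 DPI default
are all illustrative choices, not anything from the original code. Note that
rendering above the resolution of the original scan cannot recover detail
that was never captured.

```python
import subprocess
from pathlib import Path


def rasterize_cmd(pdf, out_prefix, dpi=300):
    """Build a pdftoppm command that renders each page to a grayscale PNG."""
    return ["pdftoppm", "-r", str(dpi), "-gray", "-png", str(pdf), str(out_prefix)]


def ocr_cmd(image, lang="eng"):
    """Build a tesseract command that prints the recognized text to stdout."""
    return ["tesseract", str(image), "stdout", "-l", lang]


def pixel_size(width_in, height_in, dpi):
    """Pixel dimensions of a page of the given size (inches) at a given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def ocr_pdf(pdf, workdir, dpi=300, lang="eng"):
    """Rasterize every page of `pdf` into `workdir`, then OCR each page.

    Returns a list with one string of recognized text per page.
    Assumes pdftoppm and tesseract are installed and on PATH.
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    prefix = workdir / "page"  # hypothetical prefix; pdftoppm appends -<n>.png
    subprocess.run(rasterize_cmd(pdf, prefix, dpi), check=True)

    pages = []
    # pdftoppm zero-pads page numbers, so a plain name sort keeps page order.
    for png in sorted(workdir.glob("page-*.png")):
        result = subprocess.run(
            ocr_cmd(png, lang), check=True, capture_output=True, text=True
        )
        pages.append(result.stdout)
    return pages
```

For scale, pixel_size(8.5, 11, 300) gives a 2550 x 3300 pixel image for a
US Letter page. Calling ocr_pdf("scan.pdf", "work") with a hypothetical input
file would return the text page by page, which makes it easy to compare
results at different dpi values on the same document.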

-- 
Jose Lerebours
954-559-7186
https://www.asisuites.com
Accounting - Retail - Wholesale - Distribution
Manufacturing - Warehousing - Transportation - eCommerce - Web Development