filePro help needed...

Brian K. White brian at aljex.com
Fri Dec 18 00:10:17 PST 2009


Bob Rasmussen wrote:
> On Sun, 6 Dec 2009, Brian K. White wrote:
> 
>> (clipping a lot) 
>>
>> It does put a heavy new workload on your servers though. Image 
>> processing is heavy cpu work compared to anything you normally do in 
>> filepro, and it quickly fills up hard drives.
>> At low volumes and low user counts you might not notice at first, 
>> because you almost can't buy a server today that isn't over-spec for 5 
>> or 10 users doing just ordinary filepro work, so you can add some more 
>> work without noticing.
>> But dealing with scanned images quickly adds up to tax a server once you 
>> actually start using it a little.
> 
> While I can understand the heavy storage requirements on the server, I'm 
> curious about the CPU-intensive part. What do you do with the image files 
> on the server other than store them and feed them back?

Depending what format you use to collect images, you have to convert 
them to something else, sometimes more than once, to print them and/or 
fax them and/or email them and/or display them in a browser.
Actually I should have said "No matter what format" not "Depending what 
format". Some formats make some jobs cheap and other formats make other 
jobs cheap, but no single format is efficient for all uses of the image 
data. If I scan to tiffg3, I can fax for free, but everything else costs 
work, even viewing, because I do not accept requiring the user to 
install a browser plugin just to view images when so many image formats 
exist that the browser can display natively. If I scan to png or gif I 
can view and email for free but it costs to fax or print or pdf.
If I scan to pdf it costs to fax and print, email is free, but viewing 
is less immediate on the client since they must have Acrobat installed 
and they have to wait for it to load up, every, time....

I used to collect images as tiff and then convert to equivalent png on 
the server because my pc scanner util of the day couldn't save as png or 
gif natively. (jpg was out the door as too inefficient)

Then later and for most users still today, I collected them as png.

In both cases, I have to use ghostscript to print and to fax.
(I wasn't keeping the original tiffs because of disk space and, more 
importantly, tape drive space and backup time windows)

I tried for a long time to get users to accept emails that were just 
html with the images as img tags, but they just insist on pdf.

So, for customers that scan to png, I have to run ghostpdl a lot to 
generate pdfs of emailed invoices which contain a front page in pcl 
generated from filepro, followed by the associated scanned images, all 
in one neat pdf. Great for the end user. Hard on the server.

My latest version of our scanning scans directly to pdf on the pc.
And I have pdftk which seems to be slicing and dicing pdfs at a high 
level (without actually re-rasterizing, if that's even a word). That's 
less work but not no work. And ghostscript still has to work to print 
them and hylafax/vsifax still have to work to convert to tiffg3 to fax them.
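
A merge of that kind can be sketched roughly as below. This is only a 
minimal sketch, not the setup described above: the file names are made 
up, and it assumes pdftk's "cat" operation is what joins a front page 
to the scanned pages.

```python
import subprocess

def pdftk_cat_command(inputs, output):
    """Build the argv for: pdftk in1.pdf in2.pdf ... cat output out.pdf"""
    if not inputs:
        raise ValueError("need at least one input pdf")
    return ["pdftk", *inputs, "cat", "output", output]

def merge_pdfs(inputs, output):
    """Run pdftk; raises CalledProcessError if pdftk reports a failure."""
    subprocess.run(pdftk_cat_command(inputs, output), check=True)

# Hypothetical file names, for illustration only.
cmd = pdftk_cat_command(["front_page.pdf", "scan_0001.pdf"], "invoice.pdf")
print(" ".join(cmd))
```

Building the argument list separately from running it keeps the command 
easy to log and to check without pdftk installed.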

And in all of these cases I have to generate a large enough thumbnail 
view of every new file that the user can get some sense of what the file 
is from the thumbnail. That's a lot of work right there. I measure "lot 
of work" by the fact that it takes the server a few solid seconds to do 
it. Which is an eternity considering that's just one person doing one 
thing on a box that otherwise supports 200 concurrent people just fine.

Actually in no case is emailing really free, I must always base64 it at 
least, since sending embedded links in emails is usually blocked these 
days, and I decided a while back it wasn't good practise. I should be 
sending the end user their actual statement (or whatever) not a mere 
link that might break in the future. base64 is a pretty cheap process 
but not as cheap as merely handling/copying.
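
To put a rough number on the base64 step, here is a minimal sketch 
using only the Python standard library. The addresses, subject and 
placeholder bytes are invented for illustration; the point is just that 
the attachment travels inside the message and grows by about a third 
when encoded.

```python
import base64
from email.message import EmailMessage

def build_statement_email(pdf_bytes, filename):
    """Attach the pdf itself (not a link); bytes attachments are
    base64-encoded by the email package by default."""
    msg = EmailMessage()
    msg["Subject"] = "Your statement"
    msg["From"] = "billing@example.com"   # hypothetical address
    msg["To"] = "customer@example.com"    # hypothetical address
    msg.set_content("Statement attached.")
    msg.add_attachment(pdf_bytes, maintype="application",
                       subtype="pdf", filename=filename)
    return msg

# Placeholder bytes standing in for a real pdf.
pdf_bytes = b"%PDF-1.4 placeholder, not a real document\n" * 1000
encoded = base64.b64encode(pdf_bytes)
print("raw bytes:", len(pdf_bytes), "base64 bytes:", len(encoded))

msg = build_statement_email(pdf_bytes, "statement.pdf")
attachment = next(msg.iter_attachments())
print(attachment["Content-Transfer-Encoding"])
```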

Further... say I scan at fax quality settings
1bpp 200x200, well faxes are really 204x196 and hylafax/vsifax actually 
have to resample or resize that internally, and pcl printers mostly don't 
actually print at 200dpi even if you ask, they have to actually use 
600dpi internally to print 200dpi, some can do this transparently, some 
require me to do it on the server with ghostscript, which means really I 
just have to do it on the server all the time.
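
The resolution mismatch can be worked out as simple arithmetic. The 
sketch below assumes the nominal "fine" fax grid of 204 dots per inch 
across by 196 lines per inch down; the function names are made up for 
illustration. A 200dpi scan needs a fractional rescale for fax but a 
clean 3x pixel replication for a 600dpi printer.

```python
from fractions import Fraction

def resample_factors(src_dpi, dst_dpi):
    """Per-axis scale factors (x, y) to go from src to dst resolution."""
    return (Fraction(dst_dpi[0], src_dpi[0]),
            Fraction(dst_dpi[1], src_dpi[1]))

def is_whole_pixel_scale(factors):
    """True when each source pixel maps to a whole number of output pixels."""
    return all(f.denominator == 1 for f in factors)

scan = (200, 200)
fax_fine = (204, 196)   # assumed nominal fine-mode fax grid
printer = (600, 600)

to_fax = resample_factors(scan, fax_fine)
to_printer = resample_factors(scan, printer)
print("scan -> fax:", to_fax, is_whole_pixel_scale(to_fax))
print("scan -> printer:", to_printer, is_whole_pixel_scale(to_printer))
```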

But some people deal with documents that can not really be imaged at 
1bpp. They require at _least_ 4bit greyscale. Which really ends up being 
8bit greyscale or 8bit indexed color where all the colors are shades of 
grey. That explodes the file size up so hugely that I must drop the 
resolution down to 100dpi and use lossy jpeg compression, and still the 
file sizes are 2 to 4 times larger (which adds up fast at thousands of 
pages per day). A 100dpi 8bit greyscale image looks pretty garbagy when 
faxed, looks ok on screen, and only looks ok printed if you print at 600 
or higher dpi so that the printer can emulate greyscale with dithering.
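
The raw pixel arithmetic behind that trade-off can be sketched as below. 
This assumes a US Letter page (8.5 x 11 inches), which is not stated 
above, and it counts uncompressed bytes only. Real files are compressed 
(fax-style compression for 1bpp, jpeg for greyscale), so actual sizes 
and ratios will differ.

```python
def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed bytes for one page image, rows padded to whole bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px

# Assumed page size: US Letter, 8.5 x 11 inches.
bilevel_200 = raw_page_bytes(8.5, 11, 200, 1)   # 1bpp at 200dpi
grey_200 = raw_page_bytes(8.5, 11, 200, 8)      # 8bit grey at 200dpi
grey_100 = raw_page_bytes(8.5, 11, 100, 8)      # 8bit grey at 100dpi

print("1bpp 200dpi:", bilevel_200)
print("8bit 200dpi:", grey_200, "ratio", round(grey_200 / bilevel_200, 2))
print("8bit 100dpi:", grey_100, "ratio", round(grey_100 / bilevel_200, 2))
```

Even before compression, halving the resolution only brings 8bit 
greyscale down to roughly twice the raw size of the 1bpp page.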

There's definitely a lot more to document imaging, and to integrating it 
in a useful way with your application, than merely handling the files and 
passing them around without otherwise touching them.

-- 
bkw



More information about the Filepro-list mailing list