filePro help needed...
Brian K. White
brian at aljex.com
Fri Dec 18 00:10:17 PST 2009
Bob Rasmussen wrote:
> On Sun, 6 Dec 2009, Brian K. White wrote:
>
>> (clipping a lot)
>>
>> It does put a heavy new workload on your servers though. Image
>> processing is heavy cpu work compared to anything you normally do in
>> filepro, and it quickly fills up hard drives.
>> At low volumes and low user counts you might not notice at first,
>> because you almost can't buy a server today that isn't over-spec for 5
>> or 10 users doing just ordinary filepro work, so you can add some more
>> work without noticing.
>> But dealing with scanned images quickly adds up to tax a server once you
>> actually start using it a little.
>
> While I can understand the heavy storage requirements on the server, I'm
> curious about the CPU-intensive part. What do you do with the image files
> on the server other than store them and feed them back?
Depending what format you use to collect images, you have to convert
them to something else, sometimes more than once, to print them and/or
fax them and/or email them and/or display them in a browser.
Actually I should have said "No matter what format" not "Depending what
format". Some formats make some jobs more efficient and others others,
but no single format provides an efficient path to all uses of the image
data. If I scan to tiffg3, I can fax for free, but everything else costs
work, even viewing, because I do not accept requiring the user to
install a browser plugin just to view images when so many image formats
exist that the browser can display natively. If I scan to png or gif I
can view and email for free but it costs to fax or print or pdf.
If I scan to pdf it costs to fax and print, email is free, but viewing
is less immediate on the client since they must have Acrobat installed
and they have to wait for it to load up, every, time....
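
To make the trade-off concrete, here is a rough sketch that tabulates the paragraph above as a lookup table. The job names and the FREE_JOBS layout are made up for illustration, and counting pdf viewing as a cost (because of the client-side wait) is a judgment call, not something measured.

```python
# Jobs that need no server-side conversion for each scan format,
# taken at face value from the description above (illustrative only).
ALL_JOBS = {"fax", "print", "view", "email", "pdf"}

FREE_JOBS = {
    "tiffg3": {"fax"},           # fax-ready, everything else costs
    "png": {"view", "email"},    # browser displays it natively
    "gif": {"view", "email"},
    "pdf": {"email", "pdf"},     # viewing works but is slow on the client,
                                 # so it is counted as a cost here
}


def jobs_needing_conversion(fmt):
    """Return the sorted list of jobs that cost work for a scan format."""
    return sorted(ALL_JOBS - FREE_JOBS[fmt])


for fmt in FREE_JOBS:
    print(fmt, "->", ", ".join(jobs_needing_conversion(fmt)))
```

Whichever format is picked, the list of jobs needing conversion is never empty, which is the point being made.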
I used to collect images as tiff and then convert to equivalent png on
the server because my pc scanner util of the day couldn't save as png or
gif natively. (jpg was out the door as too inefficient)
Then later and for most users still today, I collected them as png.
In both cases, I have to use ghostscript to print and to fax.
(I wasn't keeping the original tiffs because of disk space, more
importantly, tape drive space and backup time windows)
I tried for a long time to get users to accept emails that were just
html with the images as img tags, but they just insist on pdf,
so, for customers that scan to png, I have to run ghostpdl a lot to
generate pdfs of emailed invoices which contain a front page in pcl
generated from filepro, followed by the associated scanned images, all
in one neat pdf. Great for the end user. Hard on the server.
My latest version of our scanning scans directly to pdf on the pc.
And I have pdftk which seems to be slicing and dicing pdfs at a high
level (without actually re-rasterizing, if that's even a word). That's
less work but not no work. And ghostscript still has to work to print
them and hylafax/vsifax still have to work to convert to tiffg3 to fax them.
And in all of these cases I have to generate a large enough thumbnail
view of every new file that the user can get some sense of what the file
is from the thumbnail. That's a lot of work right there. I measure "lot
of work" by the fact that, it takes the server a few solid seconds to do
it. Which is an eternity considering that's just one person doing one
thing on a box that otherwise supports 200 concurrent people just fine.
Actually in no case is emailing really free, I must always base64 it at
least, since sending embedded links in emails is usually blocked these
days, and I decided a while back it wasn't good practise. I should be
sending the end user their actual statement (or whatever) not a mere
link that might break in the future. base64 is a pretty cheap process
but not as cheap as merely handling/copying.
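
As a small illustration of the base64 cost, the sketch below uses Python's standard base64 module. The 300,000-byte payload is a made-up stand-in for a scanned attachment, and the size formula ignores the line breaks a MIME encoder would add on top.

```python
import base64


def base64_size(n_bytes):
    """Length of standard padded base64 output for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


# Hypothetical attachment: 300,000 zero bytes standing in for a scan.
payload = bytes(300_000)
encoded = base64.b64encode(payload)

# Every 3 input bytes become 4 output characters, so roughly a third larger.
print(len(payload), "->", len(encoded))
```

So besides the encoding pass itself, every emailed image also grows by about a third on the wire and in the mail spool.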
Further... say I scan at fax quality settings
1bpp 200x200, well faxes are really 196x204 and hylafax/vsifax actually
has to resample or resize that internally, and pcl printers mostly don't
actually print at 200dpi even if you ask, they have to actually use
600dpi internally to print 200dpi, some can do this transparently, some
require me to do it on the server with ghostscript, which means really I
just have to do it on the server all the time.
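
A minimal sketch of the arithmetic behind that, assuming a US letter page (8.5 x 11 inches, which is an assumption, not something stated above). Going from 200dpi to 600dpi is a clean factor of 3, so each pixel simply becomes a 3x3 block; the fax grid quoted above is not a whole-number ratio of 200x200, which is why that step needs real resampling. The function names are made up for illustration, and this is not how ghostscript or any fax package does it internally.

```python
def page_pixels(width_in, height_in, dpi):
    """Pixel dimensions of a page at a given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def replicate(rows, factor):
    """Nearest-neighbour upscale: each pixel becomes a factor x factor block."""
    out = []
    for row in rows:
        wide = [px for px in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


print(page_pixels(8.5, 11, 200))   # scan resolution
print(page_pixels(8.5, 11, 600))   # what the printer works at internally
print(replicate([[0, 1]], 3))      # one 2-pixel row scaled by 3

# Ratios between a 200x200 scan and the 196x204 figure quoted above:
print(196 / 200, 204 / 200)
```

Even the "easy" factor-of-3 case means pushing nine times as many pixels per page through the server.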
But some people deal with documents that can not really be imaged at
1bpp. They require at _least_ 4bit greyscale. Which really ends up being
8bit greyscale or 8bit indexed color where all the colors are shades of
grey. That explodes the file size up so hugely that I must drop the
resolution down to 100dpi and use lossy jpeg compression and still the
file sizes are 2 to 4 times larger (which adds up fast at thousands of
pages per day). A 100dpi 8bit greyscale image looks pretty garbagy when faxed,
looks ok on screen, and only looks ok printed if you print at 600 or
higher dpi so that the printer can emulate greyscale with dithering.
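
For a feel of the numbers, here is a back-of-the-envelope sketch of uncompressed page sizes, again assuming a US letter page (an assumption). It only covers raw pixel data; the compressed sizes in practice depend on the codec and on what is on the page, so this does not reproduce the 2 to 4 times figure above, it only shows that halving the resolution does not pay for the jump to 8 bits per pixel.

```python
def raw_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed image size in bytes, ignoring any row padding."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return (width_px * height_px * bits_per_pixel + 7) // 8


bilevel = raw_bytes(8.5, 11, 200, 1)   # 1bpp at 200dpi
grey = raw_bytes(8.5, 11, 100, 8)      # 8bit greyscale at 100dpi

print(bilevel, grey, grey / bilevel)
```

Before any compression the greyscale page is already twice the size of the bilevel one, at a quarter of the pixel count.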
There's definitely a lot more to document imaging and integrating in a
useful way with your application than merely handling the files and
passing them around without otherwise touching them.
--
bkw