filepro and multiple servers

Fairlight fairlite at fairlite.com
Fri May 23 19:53:28 PDT 2008


On Fri, May 23, 2008 at 04:24:45PM -0500, Richard D. Williams may or may
not have proven themselves an utter git by pronouncing:
> Has anyone used multiple servers in combination with a single large data
> server with filepro?

Yes, but not in a configuration I'd recommend.  The architecture was not my
recommendation, though.

> A general overview of what I am thinking:
>
> Filepro is installed on each of 4 servers (A,B,C,D) and via NFS, or some
> other connectivity, the data is stored on server Z.  A proxy server is in
> front of the 4 servers to do load balancing.  Each server has a 200 user
> license.

fP over NFS works, but I wouldn't recommend it--especially for large files.
Even with sync -off- (which is a calculated risk), say you have something
that has to scan through an index, and the index is huge...  Here's a
real-life example:

-rw------- 1 filepro 502 101963776 May 23 01:05 index.A

That's 100MB of index, and that's also only -one- index.  If something
needs to read through the index for a report, that's automatically 100MB of
data going over the network.  Actually more, since you have IP overhead,
TCP overhead, and NFS overhead--even with 8K packet sizes.
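
To put rough numbers on that--a back-of-envelope sketch, where the
index size comes from the ls output above but the link speeds and the
20% protocol-overhead figure are my own guesses:

    # Rough time to drag one full index scan across NFS.
    INDEX_BYTES = 101_963_776   # index.A from the ls -l above
    OVERHEAD = 1.20             # guessed IP/TCP/NFS overhead factor

    for name, mbit in [("100mbit", 100), ("gigabit", 1000)]:
        bytes_per_sec = mbit * 1_000_000 / 8
        secs = INDEX_BYTES * OVERHEAD / bytes_per_sec
        print(f"{name}: ~{secs:.1f}s per full index scan")

Call it ten seconds per scan on 100mbit, or about a second on
gigabit--per user doing one.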

NFS was never designed to be speedy.  Even on gigabit, I don't recommend it
for the volumes you're talking about.  I can see this going very poorly
from the start.

Locking can be odd on NFS, depending on the platform.  I've seen it work
fine on Linux.  I've seen it refuse to work correctly on SCO.  Solaris is
probably going to work fine based on some other locking I've seen work from
a different application (that also used fcntl() locks).
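
For reference, the locking in question is plain POSIX byte-range
locking.  Here's a minimal sketch of the style of lock involved--
illustrative only, in Python rather than anybody's actual code, with a
made-up filename and record size:

    import fcntl, os

    fd = os.open("key", os.O_RDWR)  # hypothetical data file
    RECLEN = 512                    # assumed record length
    recno = 3

    # Exclusive fcntl() lock on one record's byte range.  Whether this
    # behaves over NFS depends on the client and server lockd
    # implementations--exactly the platform-dependence noted above.
    fcntl.lockf(fd, fcntl.LOCK_EX, RECLEN, recno * RECLEN, os.SEEK_SET)
    try:
        pass  # read/modify/write the record here
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN, RECLEN, recno * RECLEN, os.SEEK_SET)
        os.close(fd)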

> We do not want to run a single server with 800 users because we
> do not get any redundancy.  We can also run smaller servers and balance
> the load.  There will be a redundant Y server to the Z server.

Okay, salient question:  what are you doing that you have 800 users?  Is
this a web-based something?  I'm assuming so, as you're talking about a
proxy.  If a site is getting hit to the tune of 800 concurrent requests
at a time...well, figuring 2 seconds per request, that's 400 requests a
second--about 24,000 hits a minute, or 1.44 million an hour.  Do you
actually consider that likely?

Additionally, replication and failover with filePro are not a pretty picture.
Bear in mind that I expressly state NFS is a Bad Idea[tm] for use with fP.
Ignore that advice at your own peril.  Others have.

Consider a situation where you transport records to and from live/backup
servers:

Server A: normal live server
Server B: hot spare failover server

Now...  *BREAK*  ...Server A just went down.  Role reversal:

Server A: [dead] spare
Server B: live failover

Okay, all's well, right?

Now...  *FIX*  ...Server A is back.  Role reversal to original roles.

Okay, at this point you have records that are on A that may not have made
it to B before failure (or after restoration).  You have records on B that
aren't on A from the failover period.  When you reverse the second time,
you need to get all records from both boxes to the respective boxes that
don't have them.  There's "missing" data on both.  

In other words, in this scenario, you have to design your architecture to
handle something like this.  It won't be light on its feet, and it
certainly won't be easy with four servers and maybe a spare in the mix.
Once you get past two, adding servers doesn't add much more difficulty
if you code it right--it should scale up reasonably.  But even handling
two is bad enough.
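
If you're determined to build it anyway, the failback reconciliation
ends up looking something like this sketch.  None of it exists in
filePro--unique record IDs, last-modified timestamps, and the export
format are all things you'd have to bolt on yourself:

    # Two-way reconciliation after a failover/failback cycle.
    # a_recs/b_recs: dicts mapping record ID -> (mtime, data).
    def reconcile(a_recs, b_recs):
        for rid in a_recs.keys() | b_recs.keys():
            in_a, in_b = a_recs.get(rid), b_recs.get(rid)
            if in_a is None:
                a_recs[rid] = in_b        # created on B during failover
            elif in_b is None:
                b_recs[rid] = in_a        # never made it to B
            elif in_a[0] != in_b[0]:
                newest = max(in_a, in_b)  # last-writer-wins: crude, and
                a_recs[rid] = newest      # it silently drops the older
                b_recs[rid] = newest      # of two real edits

Note that last case: if the same record changed on both sides,
somebody's edit loses silently.  Handling that properly means conflict
logging and human review, and that's where the real cost lives.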

fP has no concept of replication.  In this respect, you'd be better off
going with an SQL solution that can handle that automagically.  Actually,
with the loads you're talking about, you're better off going with something
other than fP.  If you're actually doing this with -any- CGI type solution,
you've got...let's just assume 200 concurrent requests on one system:

an unknown number of apache forks, since it's tunable...estimate at least 10
200 instances of some CGI program (fpcgi, onegate, whatever)
200 shells sitting between the CGI and fP
200 clerk or report binaries

That's 600+ processes at once, per server.

Change those 200's to 800's if you single-servered it.  Good luck with
that.  :)

That is a -cartload- of RAM and CPU.  We won't even get into the disk
contention--which I -certainly- would not want to put across NFS.  
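
Back-of-envelope, with per-process memory sizes that are pure guesswork
on my part--check ps on a real system before believing any of it:

    # name: (process count, assumed MB resident each)
    procs = {"apache forks": (10, 8), "CGI instances": (200, 2),
             "shells": (200, 1), "clerk/report": (200, 4)}
    total = sum(n for n, _ in procs.values())
    ram = sum(n * mb for n, mb in procs.values())
    print(f"{total} processes, ~{ram}MB RAM per 200-user server")

That guesswork lands around 1.5GB per 200-user box--before the OS,
buffers, or anything else gets a byte.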

Let's revisit the NFS proposal...  If you tried NFS for 800 simultaneous
users, and each request only needed a 1MB index, that's 800MB of data
inside a couple of seconds.  Multiply by 8 to convert megabytes to
megabits and you're moving 6400 megabits--6.4 gigabits--to accommodate
that (without IP/TCP/NFS overhead, mind you).  Even if you're allowed to
spread it over 3 seconds per request, that's still about 2.1 gigabit per
second.  I don't see that as viable.  You're saturated or worse on
gigabit ethernet.
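
The same arithmetic, spelled out--the user count and index size are
from the scenario above, and it's all best-case with zero protocol
overhead:

    # Sustained throughput if every concurrent request drags a 1MB
    # index across NFS.  Best case: no IP/TCP/NFS overhead counted.
    USERS, INDEX_MB = 800, 1
    for secs in (1, 2, 3):
        mbit = USERS * INDEX_MB * 8 / secs
        print(f"over {secs}s: {mbit:.0f} mbit/s ({mbit/1000:.1f} gbit/s)")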

And that's with tiny 1MB indexes floating around, never mind actual key
file segments.  To be fair, an indexed lookup itself is survivable: NFS
reads are ranged, so seeking 1.75GB into a 2GB key file on a share and
reading one record only pulls that record's blocks across the wire.  But
anything that scans the key sequentially--a selection with no usable
index, a full report pass--transfers the entire 2GB (plus overhead!) to
the remote machine's incarnation of clerk or report.  And if you do a
lot of updating, client-side caching gets invalidated too often to help.
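
The difference between the two access patterns, in miniature--
hypothetical file and record size:

    import os

    RECLEN = 512                      # assumed record length
    fd = os.open("key", os.O_RDONLY)  # hypothetical 2GB key file on NFS

    # Indexed access: one positioned read.  Only RECLEN bytes (plus
    # overhead) cross the wire, no matter how deep the offset.
    rec = os.pread(fd, RECLEN, 1_750_000_000)

    # Sequential scan: every byte of the file crosses the wire--this
    # is the 2GB-over-NFS case.
    os.lseek(fd, 0, os.SEEK_SET)
    while os.read(fd, 65536):
        pass
    os.close(fd)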

I trust the case against using NFS for heavy lifting is now apparent.

Incidentally, using a proxy server is a bad idea.  A load balancer, sure.
A proxy for data that changes dynamically?  No.

> I have bounced this off some people in FP and there seems to be a
> consensus that it will work.

They can say what they like.  I find it laughable, and I wouldn't recommend
the proposed solution at all.  Assuming your numbers are even close to an
accurate representation of the load you expect, you definitely don't want
to do this with NFS, and you may not even want to do it with fP.  If you
do, it's going to cost you in terms of infrastructure development to get it
working respectably some other way.

> Here is a dangerous statement...... Any thoughts?

Many.  Like, "What the hell was someone thinking when they said it should
work?"  The math just doesn't bear that out given your estimated load, and
that's assuming 100% optimal performance with -nothing- else going on.  And
1MB indexes may be on the small side, to boot...that was an example.

mark->
-- 
"Moral cowardice will surely be written as the cause on the death
certificate of what used to be Western Civilization." --James P. Hogan

