fP transaction-based clustering - viability?

Fairlight fairlite at fairlite.com
Fri Nov 5 15:37:03 PST 2004


Simon--er, no...it was Brian K. White--said:
> 
> Just a little idea about the rsync...
> If you rsync often enough, and/or optimize the job enough that it just 
> includes the necessary files which might be a lot less than everything in a 
> filepro tree, then the job can go fast enough that it's OK for the web 
> service to be unresponsive for that time.

I disagree, Brian, with all due respect.  And I'm not talking about -just-
web-based systems.  I'm also encompassing systems where you may have
multiple branches or departments that need the same data but don't work off
the same central server--you want them to be as autonomous as possible, but
to have the most current set of data that concurrent availability allows at
all times.

Even narrowing it down to -one- transaction, it -is- possible to hit the
files in such an order that, even if you did only one filepro "file" (i.e.,
table) at a time, you could rsync the key while a transaction went through
that renders the index invalid for the key you just copied.

You can't do it without a locking mechanism.

Now, if all your stuff is CGI-based and that's your -only- consideration, I
would agree that a limited rsync job in a wrapper that generates a lockfile,
which all CGI programming checks before proceeding with a transaction, is
more or less sufficient.
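As a rough illustration of that check-before-proceeding guard (the lockfile
path, timeout values, and the 503 fallback here are all hypothetical, not
anything fP or OneGate actually provides):

```python
import os
import sys
import tempfile
import time

# Hypothetical path the rsync wrapper creates before syncing and removes after.
SYNC_LOCK = os.path.join(tempfile.gettempdir(), "fp_sync.lock")

def wait_for_sync(timeout=30, poll=0.5):
    """Block until the sync lockfile is gone, or give up after `timeout` seconds."""
    deadline = time.time() + timeout
    while os.path.exists(SYNC_LOCK):
        if time.time() > deadline:
            return False  # sync still running; caller should refuse the transaction
        time.sleep(poll)
    return True

# In each CGI entry point, before touching any filePro tables:
if __name__ == "__main__":
    if not wait_for_sync():
        print("Status: 503 Service Unavailable\n")
        sys.exit(1)
    # ... proceed with the filePro transaction ...
```

The point being that the guard lives in every CGI entry point, not just the
sync job--miss one and the race is right back.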

It is not, however, very robust.  The main sticking point is that you're
thinking in terms of only one system's data being current at any given
point.  You're technically not talking about a clustered situation, you're
talking about a hot spare.

If you use rsync, you're flat-out wiping all changes in the same way that
five people starting to edit the same file from the same starting copy in
a text editor will result in only the last person's changes being saved
to disk--the rest discarded by overwrites.  I'm talking about two or more
servers that are autonomous, each able to accept transactions in their own
right, syncing their data between each other in as close to realtime as
possible (realtime if they're up, deferred for one if that one was down for
any reason), preserving -all- changes from -all- machines.
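To make that concrete, here's a toy sketch of the journal-merge idea (the
entry format and function names are invented for illustration; real filePro
tables obviously aren't Python dicts): each server appends its committed
changes to its own log, and replaying all logs in timestamp order keeps
every machine's changes instead of overwriting whole files.

```python
from typing import Dict, List, Tuple

# One journal entry per committed transaction: (timestamp, record_key, new_value).
Entry = Tuple[float, str, str]

def merge_journals(*journals: List[Entry]) -> Dict[str, str]:
    """Replay every server's journal in timestamp order.  A later write to the
    same key still wins, but no server's changes are discarded wholesale."""
    merged = sorted(e for j in journals for e in j)
    state: Dict[str, str] = {}
    for _, key, value in merged:
        state[key] = value
    return state

# Server A updated invoice 100, server B updated invoice 200 in the same window;
# both changes survive the merge:
a_log = [(1.0, "inv:100", "paid")]
b_log = [(2.0, "inv:200", "shipped")]
assert merge_journals(a_log, b_log) == {"inv:100": "paid", "inv:200": "shipped"}
```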

> You could have the various servers agree upon a time to rsync and/or have a 
> master server throw lockfiles up on the others or send a special cgi request 
> that tells the others to finish up what they're doing and que further cgi 
> requests but don't access fp any more. And wait for corresponding "ok I'm 
> done with my pending tasks" files (lock-acknowledge-files)?

No, you can't.  The second you have more than two servers, rsync is no
longer a viable option, as it acts like diff/patch--server #2 will force #1
to look exactly like #2.  If #3 then makes #1 look like #3, #2's changes
are entirely lost.
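You can see the overwrite problem in miniature if you model a whole-tree
mirror (a stand-in for an `rsync --delete` run, nothing more) as simple
replacement:

```python
def rsync_mirror(source: dict, dest: dict) -> dict:
    """Whole-tree mirror: afterwards, dest is identical to source.
    A toy stand-in for `rsync --delete source/ dest/`."""
    return dict(source)

# All three servers start from the same snapshot.
base = {"cust": "v1", "inv": "v1"}
s1, s2, s3 = dict(base), dict(base), dict(base)

s2["cust"] = "v2-edit"   # change made on server #2
s3["inv"] = "v3-edit"    # independent change made on server #3

s1 = rsync_mirror(s2, s1)  # #2 pushes to #1: #1 now carries #2's edit
s1 = rsync_mirror(s3, s1)  # #3 pushes to #1: #2's edit is silently wiped out

assert s1["cust"] == "v1"  # server #2's change is gone
assert s1["inv"] == "v3-edit"
```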

> Once all the other servers have acknowledged the lock, run the rsyncs, 
> release the locks.

Locking isn't the problem.  Locking is the -easy- part.

> That will keep the data uncorrupted, a little old at worst few minutes to 

One entry point's data, sure.  Not multiple entry points', though.  All but
one's changes would get discarded.

> And even then it only works as long as the sub-servers are just serving up 
> data sort of read-only. If the sub servers need to make data, then I don't 
> even know of a way for rsync to do it unless you do basically the same 
> programming or even more, as it would take to do transactions anyways.

You see what I get for inline replying?  :)  Okay, you take my point, but
I'll let it stand for anyone who may actually misunderstand or overlook
the underlying principles.

> This is not a real solution of course, you are absolutely correct in that we 
> need transactions. I have toyed with a simple cgi script a couple years ago 

Yes, and they must cover three separate cases:  add, update, and delete.
Of the three, delete will be the trickiest, as presumably the developer will 
have to do things to lock down deletion from screens, or at least capture
@sk or whatever it was, and @bk from browses, maintain state of what @rn
was in what table, and pass that information along to a table to do
deletions.  Add and update aren't as tricky as delete, IMHO.
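A toy sketch of what a replayable transaction record might look like (the
field names are invented for illustration, and real filePro tables are
obviously not Python dicts), showing why delete hinges on having captured
which @rn in which table was removed:

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Txn:
    op: str                      # "add" | "update" | "delete"
    table: str                   # which filePro table was touched
    rn: int                      # the @rn captured at commit time
    data: Optional[dict] = None  # field values for add/update; None for delete

def apply_txn(store: Dict[str, Dict[int, dict]], t: Txn) -> None:
    """Replay one journaled transaction on a replica.  Delete only works
    because the @rn-in-table state was captured when the deletion happened."""
    records = store.setdefault(t.table, {})
    if t.op in ("add", "update"):
        records[t.rn] = t.data
    elif t.op == "delete":
        records.pop(t.rn, None)  # tolerate replaying the same delete twice

store: Dict[str, Dict[int, dict]] = {}
apply_txn(store, Txn("add", "orders", 7, {"status": "new"}))
apply_txn(store, Txn("update", "orders", 7, {"status": "shipped"}))
apply_txn(store, Txn("delete", "orders", 7))
assert store["orders"] == {}
```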

> Then I figured I might be reinventing sql or something and in any event the 
> customer who wanted it stopped wanting it and so it didn't get any further 
> than the read-only, one-file, + map of future development stage.

Yes, well...people don't want to move to SQL, they want to stay with
fP.  They have a lot of development time invested in their current
solutions--they just need a bit more robust communication.  

If the mountain won't come to Mohammed...

So yeah, reinventing a purpose-specific wheel may indeed turn out to
be necessary unless fP suddenly sprouts network-awareness.  Even with
what I see of the language-layer wrappers for the network stack that
I believe are implemented (Ken never said a word in response, you may
have noticed), I note the distinct lack of SELECT().  Without that, any
multi-connection server is doomed.  Not that fP's language is really
suitable for async communications models in the first place IMHO, so either
way it's likely dead in the water as a direct fP-language extension.  It's
going to require some external communications middleware, as well as fP
CALLable subroutines.
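For anyone who hasn't run into it, this is what SELECT()-style readiness
notification buys a multi-connection server--sketched here with Python's
wrapper around the OS select(2) call, which is the operating-system
facility in question, not anything fP exposes:

```python
import select
import socket

# A pair of connected sockets stands in for two client connections.
a, b = socket.socketpair()

# With nothing pending, select() reports no readable connections, so one
# loop can serve many clients instead of blocking on a single read.
readable, _, _ = select.select([a, b], [], [], 0)
assert readable == []

b.sendall(b"txn: update inv 100\n")
readable, _, _ = select.select([a, b], [], [], 1.0)
assert readable == [a]  # only the connection with data pending is reported

msg = a.recv(64)
assert msg == b"txn: update inv 100\n"
a.close(); b.close()
```

Without that primitive, every connection needs its own blocking process--
workable for CGI, hopeless for peer-to-peer replication traffic.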

mark->
-- 
Bring the web-enabling power of OneGate to -your- filePro applications today!

Try the live filePro-based, OneGate-enabled demo at the following URL:
               http://www2.onnik.com/~fairlite/flfssindex.html