fP, RSS, and near-realtime synchronisation potential
Fairlight
fairlite at fairlite.com
Tue Jul 20 20:59:47 PDT 2004
Hi all,
Only one paragraph of brief background...then the fP-centric stuff starts.
To make a long story short, I started using a multi-protocol IM client
a while back and just explored its weblogging (I hate the term 'blog'
even more than I dislike "googling" as a verb) capabilities, as it has
built-in support for LiveJournal. Thing is, it also has support for RSS
feeds built in, and I had first tested that to see if the LiveJournal
weblogging worked the way I thought it did. Then I fired it up on the
BBC's feed and liked what I saw of how RSS feeds work. So I've done more
than a bit of reading on the subject since last night, collating my
thoughts.
For those who, like myself, were unaware of RSS, it stands for
[ostensibly--there are at least three differing definitions] Really Simple
Syndication. Essentially, it's nothing more than sticking a specially
formatted XML file in a location on a server, and letting a program fetch
it and get what data it needs from the XML. There are mechanisms in place
for timestamping, etc.
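Just so we're all picturing the same thing, a bare-bones RSS 2.0 feed is
nothing more than something like this (a made-up example, not any real
site's feed):

    <?xml version="1.0"?>
    <rss version="2.0">
      <channel>
        <title>Example feed</title>
        <link>http://www.example.com/</link>
        <description>Recent items</description>
        <item>
          <title>Something new happened</title>
          <link>http://www.example.com/item/1</link>
          <pubDate>Tue, 20 Jul 2004 20:00:00 -0700</pubDate>
        </item>
      </channel>
    </rss>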
I got to thinking about fP web clusters with secure, firewalled, centralised
fP data servers, and "transactional" fP servers on the public side, which
may comprise a whole cluster of load-balanced servers, none of which has
all the data at any given point. I've dealt with this on numerous
occasions. Let me make it clear right now that web-based scenarios are
-not- the only ones that I believe can benefit from this; although the
technology to make it work requires web servers, they could be private
ones, and any applications in question need not be public "web enabled"
projects. This has a potential bearing on any multi-server environment.
I'm simply going to explain it in terms of how it would work for a scenario
that I've dealt with repeatedly, but the application is not limited to that
scenario by any means.
I'm thinking that if you have multiple machines--let us say public web
servers whose data changes--connected to a private machine which is a
central data clearing house, a secured RSS feed may very well be a good
potential mechanism for publishing the data obtained on web servers back to
the main server. You only need something like wget, curl, or even lynx to
get the RSS data, and some processing to parse the feed and do what you
would with it. RSS feeds being inherently XML, they lend themselves
readily to what some of us are already doing as far as importing data into
filePro from XML, not to mention the presumed forthcoming built-in XML
functionality of fP.
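Just to make the fetch-and-parse step concrete, here's a rough sketch in
Python (the feed URL, field names, and the idea of writing a pipe-delimited
file for a filePro import to pick up are all assumptions on my part; wget or
curl plus whatever processing you prefer would do the same job):

    import urllib.request
    import xml.etree.ElementTree as ET

    # Hypothetical secured feed published by one of the public web servers.
    FEED_URL = "https://public1.example.com/private/feed.xml"

    # Fetch the raw XML of the feed.
    with urllib.request.urlopen(FEED_URL) as resp:
        xml_data = resp.read()

    # Walk the <item> entries and flatten them into delimited lines
    # that a filePro import process could read.
    root = ET.fromstring(xml_data)
    with open("/tmp/incoming_records.txt", "w") as out:
        for item in root.iter("item"):
            guid = item.findtext("guid", default="")
            pubdate = item.findtext("pubDate", default="")
            title = item.findtext("title", default="")
            out.write("|".join([guid, pubdate, title]) + "\n")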
And I really see no reason why it can't be done in the reverse direction
as well (step three below):
Public HTTP 1     - Receives new data. Updates RSS feed for private server.
Private Server    - Fetches RSS feed, imports any new data. Exports new
                    record(s) in its own RSS feed for the public servers
                    to read.
Public HTTP 2..n  - Fetches RSS feed from private server. Populates local
                    copy of database so that it's current, even though the
                    data was obtained on a different server entirely.
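One detail the loop above implies but doesn't spell out is how each machine
knows which items it has already imported. A simple approach (again, just a
sketch with made-up paths; any unique per-record key in the feed would do) is
to remember the GUIDs you've already seen:

    import os

    SEEN_FILE = "/usr/local/fp_sync/seen_guids.txt"

    def load_seen():
        # GUIDs of items already imported on this machine.
        if not os.path.exists(SEEN_FILE):
            return set()
        with open(SEEN_FILE) as f:
            return set(line.strip() for line in f)

    def mark_seen(guids):
        # Append newly imported GUIDs so they're skipped on the next poll.
        with open(SEEN_FILE, "a") as f:
            for g in guids:
                f.write(g + "\n")

    # Inside the poll loop you'd do roughly:
    #   seen = load_seen()
    #   new_items = [i for i in items if i["guid"] not in seen]
    #   ...import new_items into filePro...
    #   mark_seen(i["guid"] for i in new_items)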
In the past, I've used what I call "query brokering", where you have two
CGI setups, one public, one private: you get the request on a public
server, reformat the query, pass it to the private server, get the results,
and hand them back along the chain as the response for the browser. You can
do posting this way as well, so it's pure realtime. It's also a bit of a
PITA to do, though, and it relies on the endpoint license being up to
handling as many requests as the external/other servers might be tuned to
throw at it.
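For what it's worth, the public half of that brokering arrangement can be as
small as a CGI that re-issues the query inward. Something like this sketch
(the internal hostname and the pass-everything-through approach are just
assumptions for illustration):

    #!/usr/bin/env python3
    # Public-side CGI: forward the incoming query string to the private
    # server's CGI and relay whatever it returns back to the browser.
    import os
    import sys
    import urllib.request

    PRIVATE_CGI = "http://private.example.internal/cgi-bin/fp-query"

    query = os.environ.get("QUERY_STRING", "")
    with urllib.request.urlopen(PRIVATE_CGI + "?" + query) as resp:
        body = resp.read()

    sys.stdout.write("Content-Type: text/html\r\n\r\n")
    sys.stdout.flush()
    sys.stdout.buffer.write(body)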
However, if you had all servers doing RSS feeds with reasonable tolerances
built in for downtimes, etc., you could maintain enough data that any
server should be able to get all data necessary at any given point.
You could even maintain -two- RSS feeds: one short-term feed for
performance, say 200-1000 records' worth of transactions, and a daily one
that would act as a master catch-up feed, in case a machine goes down and
needs to recover entries that have already rolled off the short feed.
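The consumer side of that two-feed idea could be as simple as checking how
long it has been since the last successful poll (the window length, URLs,
and state file here are all invented for the sake of the example):

    import os
    import time

    STAMP = "/usr/local/fp_sync/last_poll"       # touched after each good poll
    SHORT_FEED = "https://private.example.internal/private/recent.xml"
    DAILY_FEED = "https://private.example.internal/private/daily.xml"
    SHORT_WINDOW = 60 * 60                       # assume short feed covers ~1 hour

    def pick_feed():
        # If we've been down longer than the short feed covers, fall back
        # to the daily catch-up feed.
        try:
            age = time.time() - os.path.getmtime(STAMP)
        except OSError:
            return DAILY_FEED
        return SHORT_FEED if age < SHORT_WINDOW else DAILY_FEED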
Constructing the feeds seems straightforward enough. When a new record comes
in, you simply rebuild the feed into a fresh file from the most recent 'X'
entries (however many you defined in processing), working backwards from the
latest activity, and at the last second re-point the published symlink at
the new file. You could alternate between several files so you had
backups/failsafes.
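Here's a sketch of that rebuild-then-swap step (the paths and the idea of a
generation counter are placeholders; the point is that the symlink rename is
atomic, so readers never see a half-written feed):

    import os

    FEED_DIR = "/var/www/private"                    # secured area of DocumentRoot
    PUBLISHED = os.path.join(FEED_DIR, "feed.xml")   # the symlink readers fetch

    def publish(xml_text, generation):
        # Write the rebuilt feed to a fresh file...
        new_file = os.path.join(FEED_DIR, "feed.%d.xml" % generation)
        with open(new_file, "w") as f:
            f.write(xml_text)
        # ...then swap the published symlink in one atomic rename.
        tmp_link = PUBLISHED + ".tmp"
        if os.path.lexists(tmp_link):
            os.remove(tmp_link)
        os.symlink(new_file, tmp_link)
        os.replace(tmp_link, PUBLISHED)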
In effect, you could keep a cluster of 2..n web servers populated with the
latest data from each other, and sync'd with the main data storage server
constantly, within a threshold of 'y' minutes. The only problem I see is
this: until you hit that syndication refresh time (the loop between server
1, data server, and server 2), how do you keep the browser from hitting any
other machine in the cluster that doesn't -have- that data yet? Of course,
if one does sessioning, one can basically have any subsequent transactions
redirected to the server that initiated the process, and carry on down the
line until the transaction no longer depends on the most recently added
data. This assumes public IP#'s for all the public servers, rather than a
load-balanced NAT situation, but that tends to be the case more often than
not (at least in my experience).
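That session-affinity workaround could be as crude as bouncing the browser
back to the originating host until the data has had time to propagate (the
hostname, session fields, and propagation window are all invented for this
sketch):

    import time

    SYNC_WINDOW = 5 * 60   # assume a 5-minute worst-case propagation time
    THIS_HOST = "public2.example.com"

    def maybe_redirect(session):
        # session["origin_host"] / session["last_write"] would be set by
        # whichever public server accepted the original update.
        origin = session.get("origin_host")
        last_write = session.get("last_write", 0)
        if origin and origin != THIS_HOST and time.time() - last_write < SYNC_WINDOW:
            # Send the browser back to the server that already has the data.
            return "Location: http://%s/\r\n\r\n" % origin
        return None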
Haven't worked out -all- the fine points in my head yet, but it seems like
this should be something majorly useful, in theory.
So, anyone want to tell me if I'm barking up the wrong tree? I personally
think it can work, and work well. I know of fP sites that update each
server only nightly, so this would be a radical improvement on that model.
This is pretty much as close to a transactional model as I can conceive of
within the framework of fP's way of doing things. It certainly beats a
potential "corruption" downtime when you're doing an rsync or something.
Essentially it should (theoretically) eliminate the need for any downtime
at all for synchronisation. It should be perpetually self-maintaining.
The nice thing is that it should only require one seat per server to
maintain the scheme, thus maximising cost:benefit ratios for the end
customers. (No offense to fP-Tech...customers tend to look at that sort of
thing though, and the farther you can stretch a license by being efficient,
the better, in their eyes.)
The most it really requires is cron (or a 'doze equivalent), something to
fetch URL contents with, a secured area of the DocumentRoot on every server
(trivial), and some processing.
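For the scheduling piece, the poll script could just run from cron, with the
feed living in a password-protected directory. A minimal fetch along those
lines (the URL, credentials, and five-minute interval are purely
illustrative):

    # crontab entry (assumed five-minute poll):
    #   */5 * * * * /usr/local/fp_sync/poll_feed.py

    import urllib.request

    FEED_URL = "https://private.example.internal/private/feed.xml"

    # Basic auth for the secured area of the DocumentRoot.
    mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    mgr.add_password(None, FEED_URL, "syncuser", "secret")
    opener = urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(mgr))

    with opener.open(FEED_URL) as resp, \
         open("/usr/local/fp_sync/latest_feed.xml", "wb") as out:
        out.write(resp.read())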
I may be missing something. I may be entirely going in the wrong
direction, though I currently think I'm mostly on-track. I don't know...I
discovered this marvelously "old" technology that I'd never heard of
until last night, and it just dawned on me that there might be a serious
relevance and use for it as pertains to fP. And web serving in these
conditions is only one instance. I can easily see point-of-sale systems
at satellite locations, redundant backup systems, etc., using this kind of
mechanism. You'd stay as current as your defined refresh times. If you
polled every five minutes (pretty conservative given the LAN bandwidth
and server power available these days...you could most likely poll every
minute if you wanted), you'd only ever be out of sync by that long, rather
than losing a day's worth of transactions due to a server going down.
Actually, in theory, you wouldn't even need to adhere to the RSS format
specification. It would in fact be far simpler to use the -model- of RSS,
but not the specification itself, since a leaner format would be easier to
parse and thus to import/export. I think I'd approach it as an RSS-style
model based on a custom-designed, fP-centric XML DTD that handled the same
concerns in a different manner, strictly because it would let you do the
same data passing without having to encode your data for special entities,
and because you don't -need- links and such. You could just define an XML
format that was specific to fP's needs and work from there.
So long as you handle the "publication" time reasonably so that the model
is maintained, the official data structure could be tossed in favour of
something far more appropriate to fP. You'd end up with something like
fPRSS or the like. I imagine several of us here could come up with a draft
standard for a DTD that makes sense, in short order. But the important
thing is to hold to the actual distribution model, regardless of any format
customisations that could be done. The concept is the driving force behind
my thoughts here, not the hard and fast specification of the RSS format
(any version). I really think straight XML would serve better as a format
for use with fP, and just allow for the serialisation that is inherent in
RSS's model.
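Just to illustrate the direction I'm imagining (this is purely made up on
the spot, not a proposed standard), an fP-centric item might look something
more like:

    <?xml version="1.0"?>
    <fprss version="0.1">
      <source server="public1" published="2004-07-20T20:45:00-0700"/>
      <record file="orders" seq="104231" action="add">
        <field name="custno">001234</field>
        <field name="amount">49.95</field>
      </record>
    </fprss>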
Just some thoughts on the potential for mixing the two technologies.
Comments welcomed...
mark->
--
Fairlight-> ||| "You can't take a ferret to a | Fairlight Consulting
__/\__ ||| funeral!" --Norman Clegg, "Last of |
<__<>__> ||| the Summer Wine" | http://www.fairlite.com
\/ ||| | info at fairlite.com