fP and data integrity...how to maintain on non-quiescent systems?

Bill Campbell bill at celestial.com
Sat Feb 17 19:11:20 PST 2007


On Sat, Feb 17, 2007, Fairlight wrote:
>Okay, scenario:
>
>Company X is a 24/7 business running fP.
>
>Let's assume about 80 tables or so.  Figure about 10 indexes on average.
>
>Data goes back, say, 9 years or so, and no archiving has been done, which
>exacerbates a problem that creeps up even if the data were kept lean.
>
....
>Let's look at the time it takes (figure about 20-30min) to rsync or
>tape-backup all the data between tables A and Q.  In that time, both tables
>may have been updated, so the detail records for headers that were already
>backed up will be missing from your backup.
>
>This leaves you with a non-viable backup.
>
>Okay, some of you folks have been doing fP for 25 years now.  How do you
>handle this kind of situation?  Generally speaking, backups should be run
>when a system is quiescent.  I know with absolute certainty that there
>are businesses out there using it that are 24/7 and never have sufficient
>inactivity to avoid this kind of issue.
>
>I'm very interested in how anyone avoids this on an rsync.  For that
>matter, rsync aside, how do you avoid it even with Edge, tar, cpio, or
>anything else?  I'm coming up blank right now, so I ask--how does one
>avoid this kind of problem?  This kind of thing would make even the best
>bit-level verified tape/disk/whatever backup almost useless in many, many
>circumstances.  I can just see a freight system with a tracking number and
>scant partial data in the header record, but all of the supplemental data
>gone missing.  NOT a good scenario.

You might look at the way that MySQL and OpenLDAP handle database
replication.  They keep a log file of all transactions updating the
database(s), which is then used by the secondary machine(s) to update their
copies of the data.
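
To make that concrete, the transaction log amounts to an ordered stream of
SQL statements, something like this (the table and column names here are
invented for illustration):

    -- appended to the log as each statement is applied to the master
    INSERT INTO freight_hdr (tracking_no, ship_date)
        VALUES ('Z12345', '2007-02-17');
    INSERT INTO freight_dtl (tracking_no, line_no, descr)
        VALUES ('Z12345', 1, 'pallet, 40 cartons');
    UPDATE freight_hdr SET status = 'shipped' WHERE tracking_no = 'Z12345';

The secondary just replays the stream in order, so it always holds a
consistent, if slightly stale, copy of the data.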

MySQL handles this by making a snapshot of the database and noting the
offset within the current log file.  The secondary system is created from
that snapshot, then given that offset in the master's log file; it
effectively seeks to that point in the file and proceeds to continuously
update its databases, keeping track of its current position in the log as
it goes.
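
Roughly, the setup looks like this (the hostnames, accounts, and log
coordinates below are placeholders; check the MySQL manual for the exact
syntax on your version):

    -- on the master: quiesce writes long enough to snapshot and note
    -- the current log coordinates
    FLUSH TABLES WITH READ LOCK;
    SHOW MASTER STATUS;   -- e.g. File: mysql-bin.000042, Position: 10667
    -- take the snapshot (mysqldump, LVM snapshot, etc.), then
    UNLOCK TABLES;

    -- on the secondary: load the snapshot, then point it at the master
    CHANGE MASTER TO
        MASTER_HOST='master.example.com',
        MASTER_USER='repl',
        MASTER_PASSWORD='secret',
        MASTER_LOG_FILE='mysql-bin.000042',
        MASTER_LOG_POS=10667;
    START SLAVE;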

We've been using this for several months with MySQL to maintain and
distribute SpamAssassin Bayesian filter information for clusters of mail
servers at multiple ISPs, with no problems so far.

This is a bit more complex than simple rsyncs, but it has the advantage
that it works with live systems, and because the secondary is always
current, switching servers takes almost zero time.

Given that FilePro doesn't have this type of logging, it would require
writing your own logging routines to generate the transaction logs.  As a
first cut, I would probably write out the appropriate SQL commands to
perform the operations.  One might even be able to hack SQLite to work
with FilePro's binary data formats.
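
As a sketch of that first cut (the file and field names are invented, and
the real work would be mapping FilePro maps to SQL schemas), a logging
routine hooked into your update processing might append statements like
these, which a secondary could then replay with something like
sqlite3 copy.db < translog.sql:

    -- one transaction per logical FilePro update, so the header and its
    -- detail records always land together on the secondary
    BEGIN TRANSACTION;
    UPDATE freight_hdr SET status = 'DLV', dlv_date = '2007-02-17'
        WHERE tracking_no = 'Z12345';
    INSERT INTO freight_dtl (tracking_no, line_no, descr)
        VALUES ('Z12345', 3, 'delivered, signature on file');
    COMMIT;

Grouping the header and detail writes in one transaction is exactly what
fixes the torn-backup problem you described: the secondary never sees a
header without its corresponding detail.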

Bill
--
INTERNET:   bill at Celestial.COM  Bill Campbell; Celestial Software, LLC
URL: http://www.celestial.com/  PO Box 820; 6641 E. Mercer Way
FAX:            (206) 232-9186  Mercer Island, WA 98040-0820; (206) 236-1676

If you want government to intervene domestically, you're a liberal.  If you
want government to intervene overseas, you're a conservative.  If you want
government to intervene everywhere, you're a moderate.  If you don't want
government to intervene anywhere, you're an extremist -- Joseph Sobran

