fP and data integrity...how to maintain on non-quiescent systems?

Keith Weatherhead keithw at ddltd.com
Sat Feb 17 19:45:58 PST 2007


Fairlight wrote:

***WARNING***  This is long and not for the "faint of heart"... -kfw

> Okay, scenario:
> 
> Company X is a 24/7 business running fP.
Probably not an uncommon thing for any reasonably sized FP shop 
anymore...

> Let's assume about 80 tables or so.  Figure about 10 indexes on average.
> 
> Data goes back, say, 9 years or so, no archiving has been done, which fact
> exacerbates a problem that creeps up even if the data was kept lean.
Archiving... storage is cheap, so why archive? (Yes, I meant that 
tongue-in-cheek.)  Few sites "properly" archive, and even those 
that attempt to sometimes get it wrong...

> Now, let's assume they want backups.  Tape, rsync to another system,
> whatever.
> 
> "table_A" is about 30 tables in sequential directory-traversal order from
> "table_Q", and comes first in traversal order.
> 
> "table_A" is the detail file for the header information in "table_Q".
> 
> Let's nevermind that, in theory (actually, probably in practise), you could
> get indices and a key that are mismatched while backing up just "table_A".
> Let's just assume that's pretty rare, although I feel it's a bad
> assumption.
If you can't dry the system activity up, you most certainly will end 
up with indexes out of sync with data files on the archive medium. 
Worse, your database as a whole can be out of sync (you mention 
that below); a re-index can fix the indexes, but nothing can 
truly fix data table updates that were not captured along with the 
other files...

> Let's look at the time it takes (figure about 20-30min) to rsync or
> tape-backup all the data between tables A and Q.  In that time, both tables
> may have been updated, but the detail for things that got backed up in the
> headers will be missing in your backup.
> 
> This leaves you with a non-viable backup.
YEP, and until one tries to restore and actually use the backup, one 
may not comprehend the disaster awaiting...

> Okay, some of you folks have been doing fP for 25 years now.  How do you
> handle this kind of situation?  Generally speaking, backups should be run
> when a system is quiescent.  I know with absolute certainty that there
> are businesses out there using it that are 24/7 and never have sufficient
> inactivity to avoid this kind of issue.
There are ways, but they involve some foresight in design...

> I'm very interested in how anyone avoids this on an rsync.  For that
> matter, rsync aside, how do you avoid it even with Edge, tar, cpio, or
> anything else?  I'm coming up blank right now, so I ask--how does one
> avoid this kind of problem?  This kind of thing would make even the best
> bit-level verified tape/disk/whatever backup almost useless in many, many
> circumstances.  I can just see a freight system with a tracking number and
> scant partial data in the header record, but all of the supplemental data
> gone missing.  NOT a good scenario.
> 
> Interested in hearing feedback on how to avoid/defeat this problem.
> 
> mark->

Mark, this goes back a short period of time, to when you were 
putting out a thought about sync'ing multiple systems and doing 
transactional processing.  It is VERY hard to avoid needing 
transactional processing and data recovery/rollback capability if 
you want to protect against this scenario.  Here is how I did a 
project; it would not necessarily be easy to retrofit, as most 
mature systems not designed this way would be tough to convert.

You basically design your system to be run in two states.  This 
actually can help eliminate some of the archiving -OR- make the 
archiving easier once the process is in place.

The first state is the historical state from the prior checkpoint.
These are the base "historical" files that have everything posted 
prior to that checkpoint.

All of your "new" transactions (read: updates posted since the last 
checkpoint) go into another file (or a couple of files, at most).
Your historical base is READ-ONLY for all practical purposes!!

When you access data in the historical base, you look in the 
"updates" area and make the required alterations to the "historical" 
data to reflect the current state when presenting data to the user.
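
To make that read path concrete, here is a rough sketch in Python 
(purely illustrative -- the real logic would live in your filePro 
processing tables, and every name here is made up):

# Read path: overlay pending amendments from the update set(s)
# onto the read-only historical record.  All names hypothetical.
def read_record(key, historical, update_sets):
    record = dict(historical.get(key, {}))   # copy; base stays read-only
    for updates in update_sets:              # oldest set first
        for amendment in updates.get(key, []):
            record.update(amendment)         # apply field-level changes
    return record or None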

The update stage is the second state, and to really make the system 
robust you would have two "update" stages; call them Update Set-A 
and Update Set-B.  With only one Update Set you cannot be truly 
24/7!!

Where does this get you if you have set up your application this way?

Let's say you are running on your historical files with Update Set-A 
and you want to do your backups. You set a switch in the application 
to say that "NEW" updates are going to Update Set-B.  YES, that 
means that when you reference any historical data, you then look in 
Set-A updates for amendments and then in Set-B updates before you 
present it to the user.
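
The switch itself can be a single flag the whole application honors. 
Sketched the same way (hypothetical names, not filePro code):

# Flip which set receives NEW updates; readers consult
# historical, then Set-A, then Set-B, in that order.
def begin_checkpoint(state):
    state["active_set"] = "B" if state["active_set"] == "A" else "A"

def post_update(state, update_sets, key, amendment):
    # Writes only ever land in the currently active set.
    update_sets[state["active_set"]].setdefault(key, []).append(amendment)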

For system backups, you back up the historical data and the Set-A 
updates BUT NOT THE SET-B (current running) UPDATES.  This leaves 
the historical files and the set of updates that are to be applied 
to them ALL IN SYNC, TOGETHER, ON THE SAME BACKUP SET.
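
With rsync that is just an exclusion -- something like this sketch 
(paths and host names are made up):

import subprocess

# Back up the historical base plus the quiesced Set-A updates,
# explicitly excluding the live Set-B files.
subprocess.run([
    "rsync", "-a",
    "--exclude=updates_B/",        # the set still receiving writes
    "/appl/data/",                 # historical base + updates_A
    "backuphost:/backups/appl/",
], check=True)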

Once the backup set is complete, you POST all the data in the 
Update Set-A file(s) into the historical files and REMOVE it from 
Set-A.  That way, if a lookup gets data from the historical file 
and then goes to the previous-cycle update file, there will not be 
an entry there (as it has been merged into the historical base); it 
then checks the current Update Set (B at this point in time), makes 
any required amendments, and presents the data to the user.
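
Sketching that merge the same way -- the key property (see the next 
paragraph) is that it can be killed and re-run from the top safely:

# Post the quiesced set into the historical base.  Each key is
# removed only after its amendments are applied, so a restart
# simply re-applies the same field-level changes (harmless).
def merge_quiesced_set(historical, update_sets, set_name):
    updates = update_sets[set_name]
    for key in list(updates):
        record = historical.setdefault(key, {})
        for amendment in updates[key]:
            record.update(amendment)
        del updates[key]           # gone once safely posted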

Even if your system "burps" during this "merge" process, the system 
can simply be brought back up and the "merge" restarted, any number 
of times, until the "merge" is complete and a completely new 
historical base has been created.

If a recovery of the files were needed, restoring the historical 
base and the Update Set would get you back to the same exact point 
in time from your backup set.

Now the next checkpoint/backup time would switch you back to Update 
Set-A, which would be excluded on the next backup and when the 
backup is complete the Update Set-B files would then be merged into 
your historical base.

NOW, if you have managed to stay with me, you have a system capable 
of running 24hrs a day, 365/366 days/year, with perfect in-sync 
backups... all you have to do is have hardware and operating 
software that can give you those kinds of actual uptime statistics.

Archiving is then simply removing what you no longer want from the 
historical file set.  Maybe you have a second-level set of 
historical data that contains data over "n" years old and only gets 
backed up once a week; or, after you archive to these files, you 
make 2 or 3 sets of media and throw them in a safe, as you are only 
going to change those files once a year???
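
Again as a hypothetical sketch (the "year" field and the cutoff are 
made up for illustration):

# Move records older than the cutoff into a second-level
# historical set that is backed up on its own slow cycle.
def archive_old_records(historical, second_level, cutoff_year):
    for key in list(historical):
        if historical[key].get("year", cutoff_year) < cutoff_year:
            second_level[key] = historical.pop(key)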

A lot of work, YES... but it is about as bulletproof as you can 
build an online system.  If you had a requirement for a "hot" 
spare system, you could simply insert the step of sending the 
cycled update file(s) to the spare either before or after the 
backup operation is complete, BUT prior to doing the merge.

Actually, you could send the Update Set to the hot spare and run the 
merge there while the backups are running on the production system. 
Rsync'ing these update files between multiple systems would only 
require freezing things for seconds to a few minutes at most, 
because the "merge" process time DOES NOT ever have to be figured 
into the backup or recovery time... it is a parallel process that 
can occur while the system is in operation at ANY TIME!!
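
One last sketch of that shipping step (host names, paths, and the 
remote merge command are all made up):

import subprocess

# Ship the quiesced set to the hot spare, then have the spare run
# the same merge against its own copy of the historical base while
# production is still being backed up.
def ship_and_merge_on_spare(set_name):
    subprocess.run(
        ["rsync", "-a", f"/appl/data/updates_{set_name}/",
         f"spare:/appl/data/updates_{set_name}/"],
        check=True)
    subprocess.run(
        ["ssh", "spare", "merge_updates", set_name],   # hypothetical
        check=True)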

Starting back in v4.5+, I asked for some things to make a library 
of functions that would make this complete process available to any 
developer, and I have never been able to get the functionality added 
to FP.

Unfortunately, if you truly want this kind of functionality, 
you can take the above template, but there is a ton of work left up 
to you... I wish the supporting functions were available... can 
you imagine what FP would be like if this were built into the actual 
FP engine rather than having to be programmed by each developer?


Regards,
Keith
-- 
- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Keith F. Weatherhead                   keithw at ddltd.com

Discus Data, LTD                  Voice: (815) 237-8467
3465 S Carbon Hill Rd               Fax: (815) 237-8641
Braceville, IL  60407       Nat'l_Pager: (815) 768-8098
- - - - - - - - - - - - - - - - - - - - - - - - - - - -


