testing for corruption

Mon Feb 4 14:25:13 PST 2008

Only Bob Stockler would say something like:
> Jeff Harrison wrote (on Mon, Feb 04, 2008 at 11:01:16AM -0800):
> [snip]
> 
> | I wonder if there would be any demand for a filepro
> | corruption detector (written in filepro of course). 
> | This routine would just prompt the user for the name
> | of the filepro file to check (or it could be passed on
> | the command line) and it could output the record
> | numbers where corruption appears, and perhaps a
> | description as to what kind of corruption it is.

Define "corruption" in this context.

A record's 20-byte header at a wrong/unexpected offset?  Possibly
detectable, since the header bytes would be outside the printable
character range.

Detection of non-printable characters in data areas?  Possibly detectable,
although it's going to be hard to tell programmatically whether this is the
case or if the header is just misaligned as in the first point.  I suppose
if you see 20 bytes in a row that are outside the printable range when you
find one where you don't expect it, that might be an indicator.  There's no
guarantee that it's what you think it is, though.
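
To make that heuristic concrete, here is a rough sketch in Python (not
filePro) that scans a buffer for runs of bytes outside the printable ASCII
range and labels any run of 20 or more as a possible misplaced header.  The
function names, the 0x20-0x7E definition of "printable", and the idea that
header bytes are all non-printable are assumptions for illustration only,
not anything fP documents.

```python
HEADER_LEN = 20  # header size as stated above


def nonprintable_runs(data: bytes):
    """Return (offset, length) for each run of bytes outside 0x20-0x7E."""
    runs = []
    start = None
    for i, b in enumerate(data):
        printable = 0x20 <= b <= 0x7E
        if not printable and start is None:
            start = i                      # a new run begins here
        elif printable and start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:                  # run reaches the end of the buffer
        runs.append((start, len(data) - start))
    return runs


def classify_runs(data: bytes):
    """Label each run; a long run is only a hint, never proof."""
    return [
        (off, length,
         "possible header" if length >= HEADER_LEN else "stray bytes")
        for off, length in nonprintable_runs(data)
    ]
```

As noted above, a "possible header" label is only an indicator; nothing
here can tell a shifted header from 20 bytes of genuine garbage.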

Further...when you find such a misalignment, what is the procedure for
continuing reporting?  Reset the record start offset to where you -think-
the record actually starts?  Continue onwards as if nothing happened,
assuming the other offsets should be correct?  That's got a 50/50 chance of
being wrong no matter which way you go.

And technically speaking, both of those scenarios are actually the least
likely forms of corruption unless you're physically losing a drive's
integrity.  It's the same reason it's better to create a file far larger
than you need in advance as empty records, extending it outwards, and then
let it fill up: no segment of the file is ever actually physically deleted,
so the sizes and offsets should remain constant and consistent.  The file
isn't moved/unlinked/copied; it's all handled in place, and at most
appended to.  The only thing that should mangle a key/data segment in terms
of changing the offsets would be a buggy third-party program, or a bug in
fP itself.  I doubt the latter would exist at this late date, at least as
far as key/data--blobs may be an entirely different story.

So really, you get into the technical question of, "What is valid data?"
And you want to check it field by field.  Since we're talking fixed-width
fields, and spaces are valid data, about the only thing you can say is that
non-printable characters shouldn't be in data areas, if you were checking
whole records at a time.
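
A whole-record check of that kind might look something like the Python
sketch below.  It assumes, purely for illustration, that the file is a flat
sequence of records, each a 20-byte header followed by a fixed-length data
area, and that records are numbered from 1; the function name and the
`data_len` parameter are hypothetical.

```python
HEADER_LEN = 20  # header size as stated earlier in this post


def suspicious_records(raw: bytes, data_len: int):
    """Return (record_number, bad_offsets) for records whose data area
    contains bytes outside printable ASCII (0x20-0x7E)."""
    stride = HEADER_LEN + data_len
    hits = []
    # Walk whole records only; a trailing partial record is ignored.
    starts = range(0, len(raw) - stride + 1, stride)
    for recno, start in enumerate(starts, start=1):
        body = raw[start + HEADER_LEN:start + stride]
        bad = [i for i, b in enumerate(body) if not 0x20 <= b <= 0x7E]
        if bad:
            hits.append((recno, bad))
    return hits
```

An all-blank data area passes, since spaces are valid data; only the
offsets of non-printable bytes within the data area are reported.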

The only thing I can think of to do is to run a process that creates a
processing table with a variable for each map entry, runs that table, and
has the table check that each field actually fits its edit type.
Unfortunately, to the best of my knowledge, an edit failure outside of
dclerk's actual screen manipulation just silently either discards data or
tries to make it fit.  It doesn't throw an exception, and the failure isn't
something you can even detect.

So...what is it, exactly, you're going to test for?  And if you -are- going
to test for something, you'd pretty much have to rewrite the edit system in
a fault-detectable environment, as fP won't (TTBOMK) do it itself.
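
For the sake of argument, a "fault-detectable" edit check outside fP could
be sketched in Python like this.  The edit names and patterns below are
made up for illustration and are NOT filePro's real edit definitions; the
point is only that a failed edit raises an error that can be reported
instead of being silently swallowed.

```python
import re

# Hypothetical edit names and patterns, not filePro's actual edits.
EDITS = {
    "digits": re.compile(r" *[0-9]* *"),               # blank-padded digits
    "mdy_date": re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{2}"),
    "any": re.compile(r"[ -~]*"),                      # printable ASCII only
}


class EditFailure(ValueError):
    """Raised when a field value does not fit its edit."""


def check_field(value: str, edit: str) -> None:
    if EDITS[edit].fullmatch(value) is None:
        raise EditFailure(f"{value!r} does not fit edit {edit!r}")


def failed_fields(values, edits):
    """Return (field_number, message) for every field that fails its edit."""
    failures = []
    for index, (value, edit) in enumerate(zip(values, edits), start=1):
        try:
            check_field(value, edit)
        except EditFailure as exc:
            failures.append((index, str(exc)))
    return failures
```

Even then, somebody has to decide what each pattern should be, which is the
same "what is valid data?" question all over again.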

And even then, an edit would probably allow and simply discard the
beginning whitespace on a mis-offset field.  It would take "  Name" and
simply make it "Name" padded out the other side.  So really you'd have to
define additional constraints that fP doesn't even enforce itself.
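
One such additional constraint, sketched in Python under the assumption
that you already know the field widths from the map (the `field_widths`
list here is hypothetical): flag any non-blank field that starts with a
space, which is what a mis-offset "  Name" would look like.

```python
def leading_space_fields(record: bytes, field_widths):
    """Return (field_number, value) for non-blank fields that begin
    with a space.  field_widths is an assumed list of fixed widths."""
    flagged = []
    pos = 0
    for index, width in enumerate(field_widths, start=1):
        value = record[pos:pos + width].decode("ascii", errors="replace")
        pos += width
        # All-blank fields are valid data, so only non-blank ones count.
        if value.strip() and value[0] == " ":
            flagged.append((index, value))
    return flagged
```

This would also flag anything legitimately right-justified or entered with
leading spaces, so it is a hint to look closer, not a verdict.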

Again, define "corrupted".  Without using "government" in the same
sentence, please...no recursive definitions.  But you have to know what
you're testing for before you can test for it.  Half of the tests I've
thought of are next to useless on hardware that isn't physically dying (and
if you don't have a backup solution, that's your own fault), and the other
half are rather fluid and ambiguous tests in general.  fP -shouldn't- need
an fsck-type program.  As for the rest...it's very nebulous, and very
difficult to nail to the wall in a meaningful way.

mark->