testing for corruption

Tue Feb 5 23:47:09 PST 2008

----- Original Message ----- 
From: "Jeff Harrison" <jeffaharrison at yahoo.com>
To: "Nancy Palmquist" <nlp at vss3.com>
Cc: <filepro-list at lists.celestial.com>
Sent: Tuesday, February 05, 2008 10:59 PM
Subject: Re: testing for corruption

>
> --- Nancy Palmquist <nlp at vss3.com> wrote:
>
>> Jeff,
>>
>> I have seen corruption that falls into a few
>> categories.
>>
>> 1) a block of records is just bad, usually caused by
>> a bad hard drive sector or
>> a FAT table failure that pointed to sector
>> incorrectly.  In this case, the
>> records before and after the sector are just fine
>> but the records that did live
>> in the sector are ruined.  They all need to be
>> erased and returned to the free list.
>>
>> 2) Corruption during a shuffle - Part of the file is
>> structured by one map and
>> part by a second map.  I have seen this in more than
>> one form.  A chunk of
>> records in the middle (usually when someone updates
>> the data during a shuffle)
>> or half right and half wrong.  When the file is
>> increased, the shuffle will make
>> the new file in a new place.  When the shuffle is
>> making the file smaller, it
>> writes over the existing file, from the beginning.
>> This crashes part way thru
>> and the data file is trashed.
>>
>> 3)  Mismatches between key and data files as to the
>> number of records each counts.
>>
>> 4) this can also apply to the key,  keyx1, keyx2,
>> keyx3 types of file layouts.
>>
>> I have also seen cases where you can see the data on
>> the screen but the free
>> record flag is set to free so the data is not seen
>> by filePro.  That was a
>> bugger to fix.  Can not imagine how that happened in
>> the first place, but I saw
>> it at least twice.
>>
>> Sounds like a great tool, but corruption can come in
>> a variety of forms.  Might
>> be tricky to figure out what is exactly needed.
>>
>> Nancy
>>
>
> Hi Nancy.  Thanks.  That is still 2 votes to 1
> against my developing this though - wait make that 2
> to 2 as I would vote yes :-)
>
> Yes, the utility that I am proposing would detect the
> corruption from your #1, and #2 scenarios, as I would
> expect to see non-ASCII data in filepro's data fields.
>
> As for #3, mismatches between # of records in key and
> data - I don't think a utility is needed for that as
> filepro will detect this for you.  dexpand will even
> fix it for you as I'm sure you are aware.
>
> Good point on #4 keyx1 datax1 x2, etc.  It would also
> need to handle qualifiers I guess.
>
> I have also heard of cases where the record is marked
> as deleted, but there was still data there - although
> I don't know that I have ever come across that myself.
> This utility would check for that.
>
> Thanks for your comments.
>
> Jeff Harrison
> jeffaharrison at yahoo.com
>
> Author of JHImport and JHExport. The fastest and
> easiest ways to import and export with filepro.

Why can't a record be marked for deletion and have old data in it?
How do you know if the 20-byte header is wrong or the data is wrong?

Corruption can really only be recognized, let alone fixed, by the designer
of an _application_, and except in a sense so broad as to be not useful,
filepro itself doesn't count as an application. Filepro IS an application
in that broad sense, and COULD detect some insanity within its own data
structures, such as the file format of a screen or a menu. Filepro is fit
to automatically say things like "if ever there is a byte foo at this
location, or if ever there fails to be this repeating pattern from here to
here, then it is OK to throw it out, zero-fill it, replace it with a valid
skeleton format, whatever."
But the data in a filepro file is just too user-specifiable.

You *may* detect that the data doesn't line up with the map, *if* you happen
to be lucky and the file uses any edits that guarantee certain minimum data
must exist, but even in that lucky case, how do you know which is
corrupt, the data or the map? The application author can say that. No one
else can. They know what they wanted. Which, btw, is no guarantee that's what
they told filepro to do. How do you detect corruption that is the result of
filepro correctly doing exactly what was asked, but which was a mistake made
by the developer? It's not actually corrupt at all, except to the developer
who knows he doesn't want it.

I'm not even sure you can safely detect the 20-byte header and say that it
is or isn't falling every <defined record length>. I was not under the
impression that _any_ bytes were illegal in filepro data, as long as they
don't conflict with an edit, and even then, edits take effect at data entry;
they don't necessarily mean that data which doesn't match the edit is
unintentional. Suppose I purposely put edit-breaking data in a field as
part of my logic? Say, for whatever reason, I want to know the difference
between emptied-out, never been filled yet, filled but invalid, and filled
valid?
Or suppose I count on /ov to alert me that some other step has gone out of
bounds? The absolute last thing I would want then is for my error detection
to all be erased and made "valid" and leave me never suspecting I had
errors.
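
To make the record-boundary question concrete, here is a minimal Python
sketch. It assumes, purely for illustration, that a key file is nothing more
than back-to-back records, each a 20-byte header followed by a fixed number
of data bytes. The function name, the data_len parameter, and the idea of
reporting leftover bytes are all hypothetical; nothing here reads a real map
or interprets the header contents.

```python
HEADER_LEN = 20  # header size mentioned in this thread; its internal layout is NOT assumed


def split_records(raw: bytes, data_len: int):
    """Slice raw bytes into (header, data) pairs plus any leftover bytes.

    data_len is a hypothetical per-record data length that would come
    from the map; this sketch does not parse a map itself.
    """
    if data_len <= 0:
        raise ValueError("data_len must be positive")
    rec_len = HEADER_LEN + data_len
    whole = len(raw) // rec_len  # number of complete records
    records = [
        (raw[i * rec_len : i * rec_len + HEADER_LEN],
         raw[i * rec_len + HEADER_LEN : (i + 1) * rec_len])
        for i in range(whole)
    ]
    leftover = raw[whole * rec_len :]  # bytes that do not fill a whole record
    return records, leftover
```

A non-empty leftover only says that the file size and the assumed record
length disagree. It does not say whether the data or the map is the one that
is wrong.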

I think there is no end of possibilities like this.

Any util that claims to detect corruption should probably do nothing but 
detect, offer the evidence to the user and ask what to do _always_ and make 
_no_ assumptions.
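
A detect-only pass along those lines might look like the following Python
sketch. It assumes the caller has already split the file into per-record
data byte strings, and it treats "non-printable ASCII" as merely something
worth showing to the user, not as proof of corruption. The function name and
the tuple layout of the findings are hypothetical.

```python
import string

# Byte values considered unremarkable: printable ASCII, including whitespace.
PRINTABLE = set(string.printable.encode("ascii"))


def find_suspect_records(records):
    """Return (record_index, byte_offset, byte_value) for each non-printable byte.

    Nothing is modified; the caller decides what, if anything, is corrupt.
    """
    findings = []
    for idx, data in enumerate(records):
        for off, b in enumerate(data):
            if b not in PRINTABLE:
                findings.append((idx, off, b))
    return findings
```

The function only collects evidence. Whether a flagged byte is garbage or a
deliberate marker is left for the person who built the application to answer.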

I think it might as well be attempted, though, to see what happens. Maybe
there is some, however minimal, useful job it can do. Maybe it can be done
in such a way that it starts every time by asking the user 57 questions in
place of all those assumptions. That would probably be really useful then. It
could handle every freak custom arrangement we all love to pervert filepro
into performing, without you having to somehow think of everything
that's possible.

I think detecting data corruption in a general-purpose tool like filepro
would be about like detecting invalid use of the English language. What
string of characters can you really say is automatically invalid? Even
setting aside the infinite valid possibilities allowed by simply claiming
"art!", even in plebeian prose, you may know that a comma doesn't come after
or before some word or phrase, but how do you know what is the correct
correction? Remove the comma? End the sentence and start the next there? Or
was some word, or mere letter, lost or scrambled, right there or
somewhere else, that would have made it all correct?

On the other hand
poetry's application
gives enough format

-- 
Brian K. White    brian at aljex.com    http://www.myspace.com/KEYofR
+++++[>+++[>+++++>+++++++<<-]<-]>>+.>.+++++.+++++++.-.[>+<---]>++.
filePro  BBx    Linux  SCO  FreeBSD    #callahans  Satriani  Filk!