FW: filePro speed, and indexes (GRX)

John Esak john at valar.com
Thu Mar 29 16:26:35 PDT 2007


> Thanks for the input John.  As far as the filling in of the field, I
> actually did not use a date but an X and the record number concatenated
> ("X"{@RN) together.  The field was an already existing field of
> 10,allup, so rather than make a change to the database structure, we
> just left it.  The actual information that we are putting in it now, is
> just a sequential number.
>
> Again, thanks for your advice.
>
> Christopher Sellitto


Ah, I see. I thought it was a date... but using the X and @RN is a very good
idea. I had neglected to mention something like that, and I wanted to. I was
skirting the issue of duplicates and didn't nail it right on the head. Just
replacing all the blanks with the same string (index key) would be no
better, since it would present the same duplicates problem. By using @RN
in the makeup of the field, you give the index routine the best chance of
not having to deal with duplicates.
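Just to make the idea concrete, here is a minimal sketch in Python rather
than filePro processing code (the 10-character width comes from your field;
the function name is made up for illustration). Appending the record number
to a constant prefix guarantees every key is distinct, so the index never
has to chase duplicate chains:

    # Sketch in Python, not filePro processing code; make_key is a
    # hypothetical name. Same idea as the "X"{@RN assignment.
    def make_key(record_number: int) -> str:
        """Build a filler value like "X12345" for an empty 10-char field."""
        key = "X" + str(record_number)
        return key.ljust(10)[:10]  # pad/trim to the field's 10-char width

    # Every record gets a distinct key, so the index sees no duplicates:
    keys = [make_key(rn) for rn in range(1, 6)]
    assert len(set(keys)) == len(keys)  # all unique
    print(keys)  # ['X1        ', 'X2        ', ...]

That's all the one line of processing, "X"{@RN, is doing for you.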

By the way, for anyone not understanding what I mean by profiling for
duplicates, think of it this way. Let's say, as an analogous situation, you
have a routine that compresses a file to make it smaller. You might do
something like the following... very simplified, of course.

In the file you are compressing, you come across 89,523 spaces all in a row.
In your compressed version of the file, you might write a special character
or couple of characters to indicate that a compression notation is coming up,
say ESCAPE-N, and then after that you would write the digits 89523 followed
by a space. This depicts the run of 89,523 spaces, but you've done it in only
8 characters where the original file used 89,523 actual characters. When your
decompression routine sees the ESCAPE-N followed by 89523 and a space, it
simply converts it back into 89,523 actual spaces all in a row. (What always
blows my mind is that compression/decompression routines like this [winzip,
gzip, whatever] always do this stuff so blindingly fast... it really is
amazing if you ever stop to think about it.)
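If you want to see the mechanics, here's a toy run-length encoder/decoder in
Python along the lines of the ESCAPE-N notation above. It's just an
illustration of the idea, not how gzip or winzip actually work (they use far
more sophisticated schemes):

    # Toy run-length coding; breaks if ESC appears literally in the input.
    ESC = "\x1b"  # escape character marking a compression notation

    def compress(text: str, threshold: int = 8) -> str:
        out = []
        i = 0
        while i < len(text):
            j = i
            while j < len(text) and text[j] == text[i]:
                j += 1
            run = j - i
            if run >= threshold:
                # e.g. 89,523 spaces become ESC + "N" + "89523" + one space
                out.append(f"{ESC}N{run}{text[i]}")
            else:
                out.append(text[i] * run)  # short runs stay literal
            i = j
        return "".join(out)

    def decompress(text: str) -> str:
        out = []
        i = 0
        while i < len(text):
            if text[i] == ESC and text[i + 1] == "N":
                j = i + 2
                while text[j].isdigit():
                    j += 1
                out.append(text[j] * int(text[i + 2:j]))  # expand the run
                i = j + 1
            else:
                out.append(text[i])
                i += 1
        return "".join(out)

    data = "abc" + " " * 89523 + "xyz"
    packed = compress(data)
    assert decompress(packed) == data
    print(len(data), "->", len(packed))  # 89529 -> 14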

Anyway, that is essentially what the filePro indexing routine is always
trying to do. It's always shooting for smaller index files and better
indexing algorithms, so the final indexes are as fast and efficient as
possible. If there are 89,523 records with exactly the same value for an
indexed field... why write out 89,523 instances of that information in the
index file when you can do it with equally tremendous savings, as in the
example above? With indexes, this kind of compression notation saves not
only space but also time reading and writing the index file itself. Again,
the actual index file and its contents are vastly different from my analogy,
but you get the idea. I'd rather have the streamlined, more efficient,
faster compression routines than big, slow, lumbering indexes. It's worth
the small hassles now and again for all of us to help work out the solution
when small problems show up. We end up with better and better indexes.
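To give a rough picture of the duplicates savings (again, just an analogy in
Python, nothing like filePro's real index format): store each key value once
with the list of record numbers that share it, instead of one index entry
per record.

    # Hedged sketch: an inverted map from field value to record numbers.
    # Illustrates the principle only, not filePro's actual index layout.
    from collections import defaultdict

    def build_index(records: dict[int, str]) -> dict[str, list[int]]:
        index = defaultdict(list)
        for record_number, field_value in records.items():
            index[field_value].append(record_number)
        return dict(index)

    records = {rn: "SAME VALUE" for rn in range(1, 89524)}
    index = build_index(records)
    # One key entry instead of 89,523 repetitions of "SAME VALUE":
    print(len(index))                # 1
    print(len(index["SAME VALUE"]))  # 89523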

John



