Best way to import XML Files
Fairlight
fairlite at fairlite.com
Sun Jun 20 09:21:25 PDT 2004
Four score and seven years--eh, screw that!
At about Sun, Jun 20, 2004 at 06:27:12AM -0400,
George Simon blabbed on about:
>
> My concern with reading the data in chunks of xxxx number of bytes and
> stuffing that into a dummy and then reading the file again and stuffing it
> into another dummy, etc., etc. until you reach the end of the file is... how
> do you know where the first chunk of data ended? Maybe you cut it off in
> the middle of the data or in the middle of a tag. I'm sure there must be
> ways around this, but it seems to me that's a lot of things to worry about
> _if_ you don't have to. Is there any easy solution to this problem?
Rough pseudocode:
Grab the size of the file.
Read 8192 bytes at a time (or whatever...31K if you want) until less than one
whole chunk remains, then read the balance to EOF (computed from the size).
While you're reading in, parse as you go, discarding each piece of the buffer
as you parse it so that only the unparsed remainder is left at any given
atomic parsing step. If a chunk ends in the middle of data or before a tag
closes, you just retain the buffer's trailing contents, append the next input
to the buffer, and go back to parsing.
Rinse, repeat.
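A rough Python sketch of that loop, for illustration only (the list is about filePro, and names like `read_chunks`, `tokens` and `CHUNK_SIZE` are hypothetical). It reads whole chunks by file size, then the balance, and carries any unfinished tag or text over into the next round. It's a deliberately naive tokenizer: it assumes there's no `>` inside attribute values, comments or CDATA.

```python
import os

CHUNK_SIZE = 8192  # the size suggested in the post; hypothetical constant name


def read_chunks(path, chunk_size=CHUNK_SIZE):
    """Read whole chunks while they fit, then the balance to EOF, using the file size."""
    size = os.path.getsize(path)
    full_chunks, balance = divmod(size, chunk_size)
    with open(path, "rb") as f:
        for _ in range(full_chunks):
            yield f.read(chunk_size)
        if balance:
            yield f.read(balance)


def tokens(path, chunk_size=CHUNK_SIZE):
    """Yield complete tags and text runs as bytes, carrying partial input forward."""
    buffer = b""
    for chunk in read_chunks(path, chunk_size):
        buffer += chunk  # append new input to whatever was left unparsed
        while buffer:
            if buffer.startswith(b"<"):
                end = buffer.find(b">")
                if end == -1:
                    break  # chunk ended mid-tag: keep it and read more
                yield buffer[: end + 1]
                buffer = buffer[end + 1 :]  # drop the part just parsed
            else:
                start = buffer.find(b"<")
                if start == -1:
                    break  # text may continue in the next chunk
                yield buffer[:start]
                buffer = buffer[start:]
    if buffer:
        yield buffer  # trailing text after the last tag
```

With a tiny chunk size (say 5 bytes) a tag like `<a x="1">` gets split across reads, but it still comes out whole because the partial piece stays in the buffer until the rest arrives.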
If you have data that exceeds the max size of a dummy variable (say, a
base64-encoded file embedded in the XML), then write it out to an external
file instead, provided you know to expect that kind of scenario.
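For the oversized-data case, a sketch (again Python, hypothetical names) of copying one element's contents straight out through a write callable instead of holding them in a single variable. It assumes the element appears once, has no attributes, and contains no nested markup, as with a base64 blob. The only thing kept between chunks is a short tail that might be the start of the closing tag.

```python
def spill_element(chunks, tag, write):
    """Stream the text inside <tag>...</tag> to `write`; return bytes written.

    `chunks` is any iterable of bytes; `tag` is bytes, e.g. b"blob".
    """
    open_tag = b"<" + tag + b">"
    close_tag = b"</" + tag + b">"
    buffer = b""
    inside = False
    written = 0
    for chunk in chunks:
        buffer += chunk
        if not inside:
            start = buffer.find(open_tag)
            if start == -1:
                # keep just enough to catch an opening tag split across chunks
                buffer = buffer[-(len(open_tag) - 1):]
                continue
            buffer = buffer[start + len(open_tag):]
            inside = True
        end = buffer.find(close_tag)
        if end != -1:
            write(buffer[:end])
            return written + end
        # write all but a tail that could be the beginning of the close tag
        keep = len(close_tag) - 1
        if len(buffer) > keep:
            write(buffer[:-keep])
            written += len(buffer) - keep
            buffer = buffer[-keep:]
    raise ValueError("closing tag not found")
```

Passing `f.write` for a file opened in binary mode sends the data to the external file; the buffer never grows much beyond one chunk plus the closing-tag length.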
A good reason to use a chunk size under 16K is that you -can- then keep a
decent-sized buffer of leftover data and still have room to append at least
one more chunk and try again. I'd personally probably use 8192, just because
it's relatively standard and efficient.
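The sizing argument is just arithmetic: whatever you carry over plus the next chunk has to fit in the buffer. A sketch, assuming a hypothetical 32K buffer limit (the post doesn't state the actual limit; `max_carryover` and `ASSUMED_CAPACITY` are made-up names):

```python
def max_carryover(capacity, chunk_size):
    """Largest leftover that still leaves room to append one more chunk."""
    if chunk_size > capacity:
        raise ValueError("chunk larger than buffer capacity")
    return capacity - chunk_size


ASSUMED_CAPACITY = 32 * 1024  # hypothetical limit, not taken from the post

for size in (8192, 16 * 1024, 31 * 1024):
    print(size, max_carryover(ASSUMED_CAPACITY, size))
```

Under that assumption, 8192-byte chunks leave 24576 bytes for carry-over, 16K chunks leave 16384, and 31K chunks leave only 1024, which is why the smaller size is the safer choice.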
mark->
--
Bring the web-enabling power of OneGate to -your- filePro applications today!
Try the live filePro-based, OneGate-enabled demo at the following URL:
http://www2.onnik.com/~fairlite/flfssindex.html