import/export dif

Brian K. White brian at aljex.com
Tue Feb 20 19:08:13 PST 2007


We just discovered the hard way today that fp's dif import and export both 
don't handle quotes properly.

According to specs on line, when a " appears in a data field, it needs to be 
represented as "" in the raw dif file.

We wrote a new export to dif and excel was importing it completely horribly, 
and it was obviously failing to parse the "'s in the data.

When we saw the specs for dif and looked at the file fp created, we saw that 
fp was not doubling up the quotes.
This has nothing to do with the quotes that go around the entire field as 
part of the dif format btw.

When we added a gosub to the export process to scan every character of every 
field one at a time and double up any quotes in the output, the resulting 
file looked like the specs describe, and more importantly, excel imported 
the file flawlessly. the double-double-quotes in the raw dif file were 
rendered as only one double-quote in the data in excel.

Thats all fine and the gosub in the export is yucky but not intolerable.

But we need to re-import the dif file and not surprizingly fp (import dif 
...) also doesn't handle the quotes correctly and imports the correct dif 
data about as badly as excel imported the incorrect dif data. But there is 
no simple work-around on import the way there was for export. the import 
command has already munged the data before we can see it in processing to 
fix it up.
That is, we can't write a gosub that looks for 2 consecutive quotes and 
turns them back to one quote, because the field has already been chopped off 
by the import command not parsing the quotes correctly and the data isn't 
there to fix.

So of course the simple answer is switch to pipe or tab delimiter or some 
other safe delimiter instead of dif, which we are doing and the data is not 
so crazy that this won't work fine.

Just wanted to vent about two programmers wasting almost a couple hours each 
figuring out whats wrong and re-writing something all because we tried to 
use a feature that turned out not to do what it claims.

If this is a known and fix bug I'd like to hear about it.
If this an unknown bug or a new one that didn't used to exist, then here is 
the report of it.

In either event, public service announcement:
Don't use export or import dif unless if your data has (or ever might have) 
even a single " anywhere in the entire file.

Why were we even trying to use that odd format in the first place?
Because the data does have lots of "'s, and commas, and other stuff that 
confuses excel when it's in csv format.
(other stuff = data that excel thinks is equations and fuctions and cell 
references when really it's just supposed to be litteral text)
And excel was failing to import csv that really wasn't ambiguous according 
to the rules. fp would have imported it correctly, openoffice imported it 
correctly, some installations of excel even imported it correctly.
And it was a good tactic at first, since excel did correctly import the dif 
version of the same data, after we added the gosub to fix the "'s on the way 
out. (it still misinterpreted some of the data as cell references and stuff 
but little enough that it was tolerable to just manually fix it after 
opening in excel)

Brian K. White  --  brian at aljex.com  --  http://www.aljex.com/bkw/
+++++[>+++[>+++++>+++++++<<-]<-]>>+.>.+++++.+++++++.-.[>+<---]>++.
filePro  BBx    Linux  SCO  FreeBSD    #callahans  Satriani  Filk!



More information about the Filepro-list mailing list