Importing/exporting data with carriage returns (I Think)
Brian K. White
brian at aljex.com
Tue Aug 29 14:54:34 PDT 2017
On 8/29/2017 2:31 PM, Mike Fedkiw via Filepro-list wrote:
> I'm using DIF files for importing and when the import file has line
> feeds it tells me that it's not a valid DIF file. I'm pretty sure it's
> line feeds and not carriage returns causing the issues for me because
> when I did a search for Ctrl+J and it didn't find anything.
>
> Just to test it out, I used excels CLEAN and remove all of the CR's and
> LF's before importing the data which worked and I didn't get any errors.
> The problem is now everything is lumped together into one big paragraph
> which is was trying to avoid in the first place.
>
> If I added an "~" or something unique to the beginning and the end of
> the cells with the line feeds before saving it as a DIF for importing,
> and I'm telling filepro that the ("~" or whatever) is the field marker,
> is it still going to give me the not a valid DIF file error if that
> field still contains line feeds. And if it that actually works, will the
> line feeds actually show in filepro after the import.
>
> If this enables me to import the data into the memo with all of the line
> feeds intact that would be great although I'm sure I'll be needing to
> change them to something else before exporting if filepro is going to be
> stopping the export after the first LF because it'll recognize it as an
> end of field marker similar to CR's. *Robert Helbing* posted a nice
> example of how he goes about doing that though so that'll be a big help, :)
>
>
> * Or maybe pre-processing the data is the most practical way to go after
> all.
>
> I'm always pre-processing the data in the worksheets I receive from my
> vendors anyway so one more step isn't going to be an issue.
>
> * If there are CR's or LF's or both within the cells, then how ARE the
> records delimited, if not by CR? Maybe you don't need to do anything but
> adjust the record delimiter option on the import command line in
> processing. That would be much better than cobbling together some
> external pre-processing steps.
>
> I just save the XLS file as a DIF after I sort out and arrange the
> fields for the import. I always know how many columns are in the import
> file so I just do an END after the last one when doing the import, I'm
> not actually looking for delimiters.
fp is looking for a delimiter.
I never use dif so I don't know whether lf or cr are supposed to be
valid within a field, and the format incorporates some special way of
delimiting the records, or if it internally encodes the conflicting data
within cells, and maybe fp is not importing fully correctly, or if fp is
importing correctly and it's excel that's creating a technically invalid
dif export.
What I do know is that if you are controlling the export, then you can
arrange it so that the the cells contain only LF's for new lines
embedded within cells, and then either CRLF or only CR are for the
record delimiter on export, save as csv, and then in fp you can use
import ascii foo=/path/to/foo.csv r=\r f=, o=" c="
and you will have embedded newlines in cells no problem.
Given a sample csv file like this (the ^M are really CR's, and the line
breaks are LF's, the records end with CRLF, the LF are "invisible" here
except for the fact that they cause line breaks):
This csv file was exported from LibreOffice on Windows, but I only have
filepro on linux, so the code may need adjusting for filepro on windows.
/tmp/crlf_exported.csv:
"row1 field1 line1
row1 field1 line2
row1 field1 line3","row1 field2 line1","row1 field3 line1"^M
"row2 field1 line1","row2 field2 line1
row2 field2 line2","row2 field3 line1"^M
@keyi If: '*** import csv w/ embedded linefeeds ***
Then: r = "0"
getcr If:
Then: import ascii csv=/tmp/crlf_exported.csv r=\r f=, o=" c="
If: not csv
Then: close csv ;end
If:
Then: r = r + "1" ;msgbox "row"<r<"cell 1 = \""&csv(1)&"\""
If:
Then: goto getcr
┌───────────────────────────────────┐
│ row 1 cell 1 = "row1 field1 line1 │
│ row1 field1 line2 │
│ row1 field1 line3" │
└─────────────────── Press Enter ┘
┌─────────────────────────────────────┐
│ row 2 cell 1 = "row2 field1 line1"" │
└───────────────────── Press Enter ┘
There is still a little bug to figure out. In the 2nd row an extra "
appears in the 1st field. I don't know why but if it were me this would
just be an ordinary sort of thing to track down by a little more trial &
error.
The basic functionality is proved, that:
* A field can be imported with embedded LF's properly, by the fact the
the msgbox for the first row has the right contents for the 1st field.
It didn't chop off at the first lf, nor did it go beyond the field
delimiter into the next field.
* The entire record breaks at the right place, by the fact that the
msgbox for the 2nd row has -generally- the right contents for the first
field of the 2nd row. There is an extra quote in there, but it still
didn't break the record on any previous linefeeds.
The extra quote character might be from the extra LF in the CRLF line
endings? It was saved with normal windows CRLF line endings, but in fp
by specifying r=\r, I told fp to only look for a CR, and so the LF might
be being ignored automatically by fp, or it might be appearing as a
character in the last cell, or who knows what.
That's just a detail still to be worked out, and I'm sure it works fine
once you clear up that last detail one way or another.
Having to work out things like this by systematic experiment and
elimination is just what it takes to program anything at all. I had to
do several things just to get this example this far.
The first time, I tried import word instead of import ascii, because
generally for csv, it's slightly more convenient because you don't have
to specify the r= f= o= c=. It's like it should have just been named
"import csv". But that didn't work. It broke the records on the
linefeeds, because I'm on linux and that is the default record delimiter
on any of the unix variants.
Then, my file was in /tmp/crlf_exported.csv, and I had that path in
quotes on the import line, and that didn't work. It tried to import from
/u/global/appl/fpmerge/tmp/csv_exported.csv. I found in the help files
that the examples showed the filename without quotes. Sure enough
without the quotes it worked.
That's just what it takes to get any job done ever anywhere.
Maybe DIF can't handle embeddded linefeeds while csv can, or maybe you
could try SLK instead of DIF, or maybe DIF works equally well as csv,
and you just need to figure out the detail the same way I just did for csv.
--
bkw
More information about the Filepro-list
mailing list