Importing/exporting data with carriage returns (I Think)

Tue Aug 29 14:54:34 PDT 2017

On 8/29/2017 2:31 PM, Mike Fedkiw via Filepro-list wrote:
> I'm using DIF files for importing and when the import file has line 
> feeds it tells me that it's not a valid DIF file. I'm pretty sure it's 
> line feeds and not carriage returns causing the issues for me because 
> when I did a search for Ctrl+J and it didn't find anything.
> 
> Just to test it out, I used excels CLEAN and remove all of the CR's and 
> LF's before importing the data which worked and I didn't get any errors. 
> The problem is now everything is lumped together into one big paragraph 
> which is was trying to avoid in the first place.
> 
> If I  added an "~" or something unique to the beginning and the end of 
> the cells with the line feeds before saving it as a DIF for importing, 
> and I'm telling filepro that the ("~" or whatever) is the field marker, 
> is it still going to give me the not a valid DIF file error if that 
> field still contains line feeds. And if it that actually works, will the 
> line feeds actually show in filepro after the import.
> 
> If this enables me to import the data into the memo with all of the line 
> feeds intact that would be great although I'm sure I'll be needing to 
> change them to something else before exporting if filepro is going to be 
> stopping the export after the first LF because it'll recognize it as an 
> end of field marker similar to CR's. *Robert Helbing* posted a nice 
> example of how he goes about doing that though so that'll be a big help, :)
> 
> 
> * Or maybe pre-processing the data is the most practical way to go after 
> all.
> 
> I'm always pre-processing the data in the worksheets I receive from my 
> vendors anyway so one more step isn't going to be an issue.
> 
> * If there are CR's or LF's or both within the cells, then how ARE the 
> records delimited, if not by CR? Maybe you don't need to do anything but 
> adjust the record delimiter option on the import command line in 
> processing. That would be much better than cobbling together some 
> external pre-processing steps.
> 
> I just save the XLS file as a DIF after I sort out and arrange the 
> fields for the import. I always know how many columns are in the import 
> file so I just do an END after the last one when doing the import, I'm 
> not actually looking for delimiters.

fp is looking for a delimiter.

I never use dif so I don't know whether lf or cr are supposed to be 
valid within a field, and the format incorporates some special way of 
delimiting the records, or if it internally encodes the conflicting data 
within cells, and maybe fp is not importing fully correctly, or if fp is 
importing correctly and it's excel that's creating a technically invalid 
dif export.

What I do know is that if you are controlling the export, then you can 
arrange it so that the the cells contain only LF's for new lines 
embedded within cells, and then either CRLF or only CR are for the 
record delimiter on export, save as csv, and then in fp you can use
import ascii foo=/path/to/foo.csv r=\r f=, o=" c="
and you will have embedded newlines in cells no problem.

Given a sample csv file like this (the ^M are really CR's, and the line 
breaks are LF's, the records end with CRLF, the LF are "invisible" here 
except for the fact that they cause line breaks):

This csv file was exported from LibreOffice on Windows, but I only have 
filepro on linux, so the code may need adjusting for filepro on windows.

/tmp/crlf_exported.csv:
"row1 field1 line1
row1 field1 line2
row1 field1 line3","row1 field2 line1","row1 field3 line1"^M
"row2 field1 line1","row2 field2 line1
row2 field2 line2","row2 field3 line1"^M

@keyi    If: '*** import csv w/ embedded linefeeds ***
        Then: r = "0"
getcr    If:
        Then: import ascii csv=/tmp/crlf_exported.csv r=\r f=, o=" c="
          If: not csv
        Then: close csv ;end
          If:
        Then: r = r + "1" ;msgbox "row"<r<"cell 1 = \""&csv(1)&"\""
          If:
        Then: goto getcr

                      ┌───────────────────────────────────┐
                      │ row 1 cell 1 = "row1 field1 line1 │
                      │ row1 field1 line2                 │
                      │ row1 field1 line3"                │
                      └─────────────────── Press  Enter   ┘

                     ┌─────────────────────────────────────┐
                     │ row 2 cell 1 = "row2 field1 line1"" │
                     └───────────────────── Press  Enter   ┘

There is still a little bug to figure out. In the 2nd row an extra " 
appears in the 1st field. I don't know why but if it were me this would 
just be an ordinary sort of thing to track down by a little more trial & 
error.

The basic functionality is proved, that:

* A field can be imported with embedded LF's properly, by the fact the 
the msgbox for the first row has the right contents for the 1st field. 
It didn't chop off at the first lf, nor did it go beyond the field 
delimiter into the next field.

* The entire record breaks at the right place, by the fact that the 
msgbox for the 2nd row has -generally- the right contents for the first 
field of the 2nd row. There is an extra quote in there, but it still 
didn't break the record on any previous linefeeds.

The extra quote character might be from the extra LF in the CRLF line 
endings? It was saved with normal windows CRLF line endings, but in fp 
by specifying r=\r, I told fp to only look for a CR, and so the LF might 
be being ignored automatically by fp, or it might be appearing as a 
character in the last cell, or who knows what.

That's just a detail still to be worked out, and I'm sure it works fine 
once you clear up that last detail one way or another.

Having to work out things like this by systematic experiment and 
elimination is just what it takes to program anything at all. I had to 
do several things just to get this example this far.

The first time, I tried import word instead of import ascii, because 
generally for csv, it's slightly more convenient because you don't have 
to specify the r= f= o= c=. It's like it should have just been named 
"import csv". But that didn't work. It broke the records on the 
linefeeds, because I'm on linux and that is the default record delimiter 
on any of the unix variants.

Then, my file was in /tmp/crlf_exported.csv, and I had that path in 
quotes on the import line, and that didn't work. It tried to import from 
/u/global/appl/fpmerge/tmp/csv_exported.csv. I found in the help files 
that the examples showed the filename without quotes. Sure enough 
without the quotes it worked.

That's just what it takes to get any job done ever anywhere.

Maybe DIF can't handle embeddded linefeeds while csv can, or maybe you 
could try SLK instead of DIF, or maybe DIF works equally well as csv, 
and you just need to figure out the detail the same way I just did for csv.

-- 
bkw