Importing/exporting data with carriage returns (I Think)

Wed Aug 30 14:29:21 PDT 2017

The errors I was seeing when trying to import the DIF file was caused by 
the line feeds without the carriage returns along with them it seems. 
When I looked at the data in Notepad++ there was a CR,LF at the each 
cells data except for the long description that had a couple of LF's by 
themselves in the middle of it. Once I changed those to a ~ using the 
"replace" in excel the errors went away. That solved my biggest problem 
which was just getting the data into filepro so I could manipulate it.

Now that the data is imported I'm seeing something weird in a few places 
that wasn't showing up, or at least I didn't see it before the import 
that looks like this "─". When I look at the data in Notepad++ and I 
select show all characters I can see all of the LF's, CR's and 
spaces(which look like little orange dots I guess) but if I look at the 
space where the "_" is showing up in the imported data it's just shows a 
totally blank space in Notepad++. I know nothing at all about Base64 but 
just playing around I converted a space to Base64 and got this "IA==" 
and when I converted one of the spaces that wound up turning into the 
"_" I got this "oA==" if that means anything at all.

This the data before the import...
"Keep your fish healthy and thriving by testing your aquarium water's 
NO3 levels regularly with this easy-to-use API NITRATE TEST KIT for 
freshwater and saltwater aquariums. Testing water parameters weekly 
helps prevent invisible water problems that can be harmful to 
*fish. When* left uncorrected, high nitrate in aquarium water can lead 
to a build-up of organic pollutants and poor fish health. Testing is 
fast, easy, and *accurate. The* API NITRATE TEST KIT for freshwater and 
saltwater aquariums tests for harmful nitrate levels from 0-160 ppm. 
This product comes with one nitrate bottle, one capped glass test tube, 
one instruction manual and one color card. The API NITRATE TEST KIT for 
freshwater and saltwater aquariums Test Kit can be used in freshwater 
and saltwater aquariums."

this the data after the import...
"Keep your fish healthy and thriving by testing your aquarium water's 
NO3 levels regularly with this easy-to-use API NITRATE TEST KIT for 
freshwater and saltwater aquariums. Testing water parameters weekly 
helps prevent invisible water problems that can be harmful to 
*fish.─When* left uncorrected, high nitrate in aquarium water can lead 
to a build-up of organic pollutants and poor fish health. Testing is 
fast, easy, and *accurate.─The* API NITRATE TEST KIT for freshwater and 
saltwater aquariums tests for harmful nitrate levels from 0-160 ppm. 
This product comes with one nitrate bottle, one capped glass test tube, 
one instruction manual and one color card. The API NITRATE TEST KIT for 
freshwater and saltwater aquariums Test Kit can be used in freshwater 
and saltwater aquariums."

If I had a clue what those spaces were that are being converted into 
"_"' I could just replace them with actual spaces before importing the 
data I think. Maybe what I should try now is first changing the LF's to 
the "~" then just use the CLEAN in excel to see if that removes them but 
I'd still be curious at to what they actually are.

Mike

On 8/29/2017 5:54 PM, Brian K. White wrote:
> On 8/29/2017 2:31 PM, Mike Fedkiw via Filepro-list wrote:
>> I'm using DIF files for importing and when the import file has line 
>> feeds it tells me that it's not a valid DIF file. I'm pretty sure 
>> it's line feeds and not carriage returns causing the issues for me 
>> because when I did a search for Ctrl+J and it didn't find anything.
>>
>> Just to test it out, I used excels CLEAN and remove all of the CR's 
>> and LF's before importing the data which worked and I didn't get any 
>> errors. The problem is now everything is lumped together into one big 
>> paragraph which is was trying to avoid in the first place.
>>
>> If I  added an "~" or something unique to the beginning and the end 
>> of the cells with the line feeds before saving it as a DIF for 
>> importing, and I'm telling filepro that the ("~" or whatever) is the 
>> field marker, is it still going to give me the not a valid DIF file 
>> error if that field still contains line feeds. And if it that 
>> actually works, will the line feeds actually show in filepro after 
>> the import.
>>
>> If this enables me to import the data into the memo with all of the 
>> line feeds intact that would be great although I'm sure I'll be 
>> needing to change them to something else before exporting if filepro 
>> is going to be stopping the export after the first LF because it'll 
>> recognize it as an end of field marker similar to CR's. *Robert 
>> Helbing* posted a nice example of how he goes about doing that though 
>> so that'll be a big help, :)
>>
>>
>> * Or maybe pre-processing the data is the most practical way to go 
>> after all.
>>
>> I'm always pre-processing the data in the worksheets I receive from 
>> my vendors anyway so one more step isn't going to be an issue.
>>
>> * If there are CR's or LF's or both within the cells, then how ARE 
>> the records delimited, if not by CR? Maybe you don't need to do 
>> anything but adjust the record delimiter option on the import command 
>> line in processing. That would be much better than cobbling together 
>> some external pre-processing steps.
>>
>> I just save the XLS file as a DIF after I sort out and arrange the 
>> fields for the import. I always know how many columns are in the 
>> import file so I just do an END after the last one when doing the 
>> import, I'm not actually looking for delimiters.
>
> fp is looking for a delimiter.
>
> I never use dif so I don't know whether lf or cr are supposed to be 
> valid within a field, and the format incorporates some special way of 
> delimiting the records, or if it internally encodes the conflicting 
> data within cells, and maybe fp is not importing fully correctly, or 
> if fp is importing correctly and it's excel that's creating a 
> technically invalid dif export.
>
> What I do know is that if you are controlling the export, then you can 
> arrange it so that the the cells contain only LF's for new lines 
> embedded within cells, and then either CRLF or only CR are for the 
> record delimiter on export, save as csv, and then in fp you can use
> import ascii foo=/path/to/foo.csv r=\r f=, o=" c="
> and you will have embedded newlines in cells no problem.
>
> Given a sample csv file like this (the ^M are really CR's, and the 
> line breaks are LF's, the records end with CRLF, the LF are 
> "invisible" here except for the fact that they cause line breaks):
>
> This csv file was exported from LibreOffice on Windows, but I only 
> have filepro on linux, so the code may need adjusting for filepro on 
> windows.
>
> /tmp/crlf_exported.csv:
> "row1 field1 line1
> row1 field1 line2
> row1 field1 line3","row1 field2 line1","row1 field3 line1"^M
> "row2 field1 line1","row2 field2 line1
> row2 field2 line2","row2 field3 line1"^M
>
>
>
> @keyi    If: '*** import csv w/ embedded linefeeds ***
>        Then: r = "0"
> getcr    If:
>        Then: import ascii csv=/tmp/crlf_exported.csv r=\r f=, o=" c="
>          If: not csv
>        Then: close csv ;end
>          If:
>        Then: r = r + "1" ;msgbox "row"<r<"cell 1 = \""&csv(1)&"\""
>          If:
>        Then: goto getcr
>
>
>                      ┌───────────────────────────────────┐
>                      │ row 1 cell 1 = "row1 field1 line1 │
>                      │ row1 field1 line2                 │
>                      │ row1 field1 line3"                │
>                      └─────────────────── Press  Enter   ┘
>
>
>                     ┌─────────────────────────────────────┐
>                     │ row 2 cell 1 = "row2 field1 line1"" │
>                     └───────────────────── Press  Enter   ┘
>
> There is still a little bug to figure out. In the 2nd row an extra " 
> appears in the 1st field. I don't know why but if it were me this 
> would just be an ordinary sort of thing to track down by a little more 
> trial & error.
>
> The basic functionality is proved, that:
>
> * A field can be imported with embedded LF's properly, by the fact the 
> the msgbox for the first row has the right contents for the 1st field. 
> It didn't chop off at the first lf, nor did it go beyond the field 
> delimiter into the next field.
>
> * The entire record breaks at the right place, by the fact that the 
> msgbox for the 2nd row has -generally- the right contents for the 
> first field of the 2nd row. There is an extra quote in there, but it 
> still didn't break the record on any previous linefeeds.
>
> The extra quote character might be from the extra LF in the CRLF line 
> endings? It was saved with normal windows CRLF line endings, but in fp 
> by specifying r=\r, I told fp to only look for a CR, and so the LF 
> might be being ignored automatically by fp, or it might be appearing 
> as a character in the last cell, or who knows what.
>
> That's just a detail still to be worked out, and I'm sure it works 
> fine once you clear up that last detail one way or another.
>
> Having to work out things like this by systematic experiment and 
> elimination is just what it takes to program anything at all. I had to 
> do several things just to get this example this far.
>
> The first time, I tried import word instead of import ascii, because 
> generally for csv, it's slightly more convenient because you don't 
> have to specify the r= f= o= c=. It's like it should have just been 
> named "import csv". But that didn't work. It broke the records on the 
> linefeeds, because I'm on linux and that is the default record 
> delimiter on any of the unix variants.
>
> Then, my file was in /tmp/crlf_exported.csv, and I had that path in 
> quotes on the import line, and that didn't work. It tried to import 
> from /u/global/appl/fpmerge/tmp/csv_exported.csv. I found in the help 
> files that the examples showed the filename without quotes. Sure 
> enough without the quotes it worked.
>
> That's just what it takes to get any job done ever anywhere.
>
> Maybe DIF can't handle embeddded linefeeds while csv can, or maybe you 
> could try SLK instead of DIF, or maybe DIF works equally well as csv, 
> and you just need to figure out the detail the same way I just did for 
> csv.
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.celestial.com/pipermail/filepro-list/attachments/20170830/0b1ff6b9/attachment.html>