Data Import

Brian K. White brian at aljex.com
Tue Mar 12 15:58:23 PDT 2013


It does screw up fp but that's fp's flaw I think. I export lf and crlf 
inside of fields all the time, in CSV format, and other stuff handles it 
properly. For instance the notes on a contacs/addressbook record in PIM 
apps like Outlook and Palm Desktop (not that anyone uses that anymore) 
are multiple lines all in a single field. And that wasn't just my bad 
idea to try that and luckily it worked. When you export the address book 
to csv from those different PIM apps, that is how they export it.

A record delimiter inside of a *quoted* field is simply not supposed to 
be taken as a record delimiter. Excel and libre/open-office and PIM apps 
all handle that no problem

To clean up csv data that contains record delimiters inside of fields so 
that it won't break fp would basically require a csv parser.

No import word or import ascii or import anything. Plain read(). You'd 
have to read every character one at a time and keep track of your state 
as you go along whether you are inside of a quoted field or not at any 
given time, and apply the same precedence rules everything else does 
instead of the way "import word" does. IE that quoting supersedes record 
delimiting.

You could do it right in filepro processing. It would actually not be 
all that many lines of fp code, or all that hard to use once written, 
and could be a reusable call table or a gosub. It might be slow since it 
would be a processing loop that runs at least a couple lines of 
processing on every single character in the file. Then again fp's 
processing speed has always surprised me. Or else most real world jobs 
are just so small in terms of amount of data that it doesn't matter on 
todays hardware. I have other things that chew one character at a time 
like that and they've never been the speed problem I expected.


Xml shouldn't care at all. You may place whitespace, including lf and 
crlf anywhere you want outside of elements and they are ignored. Inside 
of elements they are data. Importing xml directly with filepro, since it 
has no xml parser, requires you to write your own xml parser. Expecting 
line endings anywhere in xml data is your fault not the datas fault. A 5 
meg file with not a single lf or crlf, or even any whitespace at all 
could still be perfectly valid xml.

I've used xmlstarlet for some jobs to read xml data reliably. It's a 
command line xml parser. Not the most efficient since you have to run it 
at least a few times to get the data you want.

I think Mark (Fairlite) has xml-parsing fp code, at least to some 
degree. Or maybe I'm thinking of a stand alone perl util? xml2csv?

-- 
bkw


On 3/12/2013 9:32 AM, Nancy Palmquist wrote:
> I do imports for all kinds of things and the worst character people can
> put into a data field is the CRLF.
>
> There is no way around this at all.
>
> I get XML data and between tags, I will get fields that include CRLF.
> As you all note, the CSV files are also riddled with issues because the
> source is not as controlled as we would like.  One advantage of XML is
> that it does not need any CRLF codes to work, so you can strip them out
> without causing any data loss or other issues.  The other issue is any
> characters that would be changed by TOHTML() function.  There is no
> reverse function so each code must be searched out and reversed to the
> appropriate code manually. (This is a pain but necessary)
>
> People put quotes - inside of quoted strings.  They put CRLF inside of
> quoted strings.  When CRLF is also the record delimiter this ruins
> everything.  FilePro should understand the difference but it does not.
>
> I have had files so bad that the only way to read them is byte by byte.
>
> It is important to look at a sample or small subset of samples and see
> exactly what the data will include to try to figure out what to do.
>
> Always use a HEX editor to do this looking.  It is more informative.
>
> I will resort to OPEN() READLINE() or READ() when IMPORT is just not
> working to load a file of data.  The rules are different.
> Good luck Ken.
>
> Nancy
>
>
>
> On 3/12/2013 8:43 AM, scooter6 at gmail.com wrote:
>>     Or you can run a simple sed script on the file first.....to remove the "
>>     sed -e 's/"//g'
>>
>>
>>
>> On Mon, Mar 11, 2013 at 10:45 AM, Kenneth Brody <kenbrody at spamcop.net>wrote:
>>
>>> On 3/10/2013 10:05 PM, Scott Walker wrote:
>>>> Ken,
>>>>
>>>> Thanks for the input.  It definitely handles the field surrounded by the
>>> "
>>>> characters.
>>> FYI - The other option is to use the O= and C= flags.  ("Import word" just
>>> sets the defaults to one of the common formats, CSV with quotes around
>>> [some] values.)
>>>
>>>> It definitely does not handle the field with the " in the actual data.
>>> Can you give a specific example that "doesn't work"?
>>>
>>>> This is what I always seem to find with import.  One way handles some
>>> issues
>>>> but not others.  Change it...you fix some problems, make others.
>>>>
>>>> I don't see this as a problem with fp import command.  I think the data I
>>>> get is FUBAR.
>>>>
>>>> Thanks again for taking the time.
>>>>
>>>> Regards,
>>>>
>>>> Scott
>>>>
>>>>> -----Original Message-----
>>>>> From: filepro-list-
>>>>> bounces+scottwalker=ramsystemscorp.com at lists.celestial.com
>>>>> [mailto:filepro-list-
>>>>> bounces+scottwalker=ramsystemscorp.com at lists.celestial.com] On Behalf
>>> Of
>>>>> Ken Cole
>>>>> Sent: Sunday, March 10, 2013 9:09 PM
>>>>> To: Scott Walker
>>>>> Cc: filePro Mailing List
>>>>> Subject: Re: Data Import
>>>>>
>>>>> "Import word" does it automatically.
>>>>>
>>>>> Ken
>>>>>
>>>>> On Monday, 11 March 2013, Scott Walker wrote:
>>>>>
>>>>>> I am importing from an .csv file.
>>>>>>
>>>>>> Data is separated by commas.
>>>>>>
>>>>>> If a field contains a comma in it the field is enclosed in "
>>>>>>
>>>>>> Example:
>>>>>>
>>>>>> Scott, Walker,"112 Main Street, Suite 12",Charlotte,NC,28212
>>>>>>
>>>>>>
>>>>>> What's the best way to import into fp so I get the 112 Main Street,
>>>>>> Suite
>>>>>> 12
>>>>>> in the field without the ".
>>>>>>
>>>>>> Sorry for this basic question, but imports always seem to befuddle me.
>>>>>>
>>>>>> Regards,
>>>>>> Scott
>>>>>>
>>>>>>
>>>>>> Scott Walker
>>>>>> RAM Systems Corp
>>>>>> (704) 896-6549
>>>>>> Scott.Walker at RAMSystemsCorp.com
>>>>>>
>>>>>> www.RAMSystemsCorp.com
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Filepro-list mailing list
>>>>>> Filepro-list at lists.celestial.com <javascript:;>
>>>>>> Subscribe/Unsubscribe/Subscription Changes
>>>>>> http://mailman.celestial.com/mailman/listinfo/filepro-list
>>>>>>
>>>>> -------------- next part --------------
>>>>> An HTML attachment was scrubbed...
>>>>> URL: http://mailman.celestial.com/pipermail/filepro-
>>>>> list/attachments/20130311/021ace82/attachment.html
>>>>> _______________________________________________
>>>>> Filepro-list mailing list
>>>>> Filepro-list at lists.celestial.com
>>>>> Subscribe/Unsubscribe/Subscription Changes
>>>>> http://mailman.celestial.com/mailman/listinfo/filepro-list
>>>>
>>>> _______________________________________________
>>>> Filepro-list mailing list
>>>> Filepro-list at lists.celestial.com
>>>> Subscribe/Unsubscribe/Subscription Changes
>>>> http://mailman.celestial.com/mailman/listinfo/filepro-list
>>>>
>>>>
>>> --
>>> Kenneth Brody
>>> _______________________________________________
>>> Filepro-list mailing list
>>> Filepro-list at lists.celestial.com
>>> Subscribe/Unsubscribe/Subscription Changes
>>> http://mailman.celestial.com/mailman/listinfo/filepro-list
>>>
>> -------------- next part --------------
>> An HTML attachment was scrubbed...
>> URL: http://mailman.celestial.com/pipermail/filepro-list/attachments/20130312/8f9aab15/attachment.html
>> _______________________________________________
>> Filepro-list mailing list
>> Filepro-list at lists.celestial.com
>> Subscribe/Unsubscribe/Subscription Changes
>> http://mailman.celestial.com/mailman/listinfo/filepro-list
>>
>



More information about the Filepro-list mailing list