Importing XML Generically: A Proposed Project
Fairlight
fairlite at fairlite.com
Tue Dec 18 08:55:16 PST 2007
Hi All,
I've been considering this for a while, and within the last week I got my
head around something that I think makes it tenable now. Before I take the
time to develop it, however, I'm interested in finding out whether it's
worth my while.
Basically, fP has no graceful way of handling XML import unless you rewrite
XML parsers by hand--over and over and over for each different DTD or
Schema--in processing.
However, it does have the ability to neatly import CSV format files.
Assuming one could specify a configuration file (with -optional- comments)
in the format:
# Order Date.
root.order.date
# Order Number.
root.order.number
# Customer Name.
root.order.customer.first_name
root.order.customer.last_name
# Order Status (attribute status="yes|no" on the <order> element).
root.order#status
# Billing Amount.
root.order.billing.amount
# Billing Currency as attribute to element <amount>.
root.order.billing.amount#currency
# Billing Date.
root.order.billing.date
...and assuming the command was simply:
xml2csv -c ord.conf [-n namespace] [-o out_file] -r order 20071218113903.xml
...and assuming you'd get as output, line after line in the format:
"12/11/2007","C2987197D25","Mark","Luljak","Yes","995.00","USD","12/12/2007"
...Well, then you'd have valid XML translated to a usable CSV format that fP
can use with IMPORT. Fields would be presented in the order defined in the
config file. You can specify different config files for different data
files, making this generic.
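
For illustration only, here is a minimal Python sketch of the idea above.
Nothing like xml2csv exists yet, so every name in it (parse_config,
field_value, xml_to_csv) is hypothetical, and the sample XML is invented to
match the example output line. It assumes that a line beginning with "#" is
a comment, that a "#" later in a line introduces an attribute name, and that
each path is resolved relative to the record element named by -r.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The field list from the post, used here as the config file contents.
CONFIG = """\
# Order Date.
root.order.date
# Order Number.
root.order.number
# Customer Name.
root.order.customer.first_name
root.order.customer.last_name
# Order Status (attribute on the <order> element).
root.order#status
# Billing Amount.
root.order.billing.amount
# Billing Currency as attribute to element <amount>.
root.order.billing.amount#currency
# Billing Date.
root.order.billing.date
"""

# Invented sample document; the real input layout is an assumption.
SAMPLE_XML = """\
<root>
  <order status="Yes">
    <date>12/11/2007</date>
    <number>C2987197D25</number>
    <customer>
      <first_name>Mark</first_name>
      <last_name>Luljak</last_name>
    </customer>
    <billing>
      <amount currency="USD">995.00</amount>
      <date>12/12/2007</date>
    </billing>
  </order>
</root>
"""


def parse_config(text):
    """Return the field paths in file order, skipping blanks and comments."""
    paths = []
    for line in text.splitlines():
        line = line.strip()
        # Only a leading '#' marks a comment; root.order#status is a field.
        if line and not line.startswith("#"):
            paths.append(line)
    return paths


def field_value(record, path, record_name):
    """Resolve one dotted path (optionally ending in #attr) below a record."""
    elem_path, _, attr = path.partition("#")
    parts = elem_path.split(".")
    # Keep only the steps below the record element itself.
    steps = parts[parts.index(record_name) + 1:]
    node = record
    for step in steps:
        node = node.find(step)
        if node is None:
            return ""  # missing element: emit an empty field
    if attr:
        return node.get(attr, "")
    return (node.text or "").strip()


def xml_to_csv(xml_text, config_text, record_name):
    """Emit one fully quoted CSV line per record element."""
    paths = parse_config(config_text)
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for record in ET.fromstring(xml_text).iter(record_name):
        writer.writerow([field_value(record, p, record_name) for p in paths])
    return out.getvalue()


print(xml_to_csv(SAMPLE_XML, CONFIG, "order"), end="")
```

Swapping in a different CONFIG and record name is all it takes to handle a
different XML layout, which is the point of keeping the field list outside
the program.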
I'm still deciding how to handle multiple entries in a one-to-many
relationship. If you specified "root.order.items.item_number" in the
definition, I don't know if it should do multiple complete lines per unique
item number, or just comma-separate them in one line (i.e., "23115A,25907G").
I think it should be comma-separated, as you could theoretically have more
than a single one-to-many relationship in the same association, and
repetitive lines really do give you too many permutation possibilities to
be useful if you end up with a 4-way matrix or something. Unlikely, but
possible, so I do think internally separate but inline is the way to go.
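
As a rough sketch of the comma-separated option, the Python below collects
every match for a repeated element and joins the values into a single CSV
field. The helper names (joined_values, order_line) and the sample XML are
hypothetical, and the path is assumed to be already split into steps
relative to the record element.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Invented sample: one order with two item numbers.
SAMPLE_XML = """\
<root>
  <order>
    <number>C2987197D25</number>
    <items>
      <item_number>23115A</item_number>
      <item_number>25907G</item_number>
    </items>
  </order>
</root>
"""


def joined_values(record, steps, sep=","):
    """Follow steps below record, keeping every match, and join the texts."""
    nodes = [record]
    for step in steps:
        # findall keeps all children with this tag, not just the first one.
        nodes = [child for node in nodes for child in node.findall(step)]
    return sep.join((node.text or "").strip() for node in nodes)


def order_line(record):
    """Build one fully quoted CSV line: order number, then joined items."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow([
        joined_values(record, ["number"]),
        joined_values(record, ["items", "item_number"]),
    ])
    return out.getvalue()


order = ET.fromstring(SAMPLE_XML).find("order")
print(order_line(order), end="")
```

Because every field is quoted, the embedded comma stays inside one CSV
field. The separator is a parameter in this sketch because a value that
itself contains a comma would otherwise be ambiguous.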
At any rate, that's the theoretical plan of how it would work. Then you
could just IMPORT straight into fP, problem solved. One product, multiple
imports of different XML formats. You'd just tell it which element is
the "record separator" element, similar to "r=" in IMPORT. You'd supply
the configuration information for fields and attributes from that element
forward. You'd be able to specify the namespace to use, so you could throw
raw SOAP data at it and totally and wholly ignore the envelope and such,
paying attention only to the desired encapsulated data. Pretty versatile.
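
One possible way to get that behaviour, sketched in Python: search the whole
document for the record element qualified by the chosen namespace, so that
the SOAP envelope and body are never looked at. The function names
(qualified, find_records, child_text) and the data namespace
urn:example:orders are made up for the example; only the SOAP 1.1 envelope
namespace is real.

```python
import xml.etree.ElementTree as ET

# Invented payload wrapped in a SOAP 1.1 envelope.
SOAP_XML = """\
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <o:root xmlns:o="urn:example:orders">
      <o:order status="Yes">
        <o:number>C2987197D25</o:number>
      </o:order>
    </o:root>
  </soap:Body>
</soap:Envelope>
"""


def qualified(name, namespace=None):
    """Return the tag in ElementTree's {namespace}name form."""
    return f"{{{namespace}}}{name}" if namespace else name


def find_records(xml_text, record_name, namespace=None):
    """Find record elements at any depth, skipping whatever wraps them."""
    root = ET.fromstring(xml_text)
    return list(root.iter(qualified(record_name, namespace)))


def child_text(record, name, namespace=None):
    """Text of a direct child in the same namespace, or '' if absent."""
    node = record.find(qualified(name, namespace))
    return "" if node is None else (node.text or "").strip()


NS = "urn:example:orders"  # hypothetical value for the -n option
records = find_records(SOAP_XML, "order", NS)
print(len(records), child_text(records[0], "number", NS), records[0].get("status"))
```

Unprefixed attributes such as status carry no namespace in XML, so they can
be read by their plain name even when the elements are namespaced.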
So, assuming such a creature was written, would you be interested in it,
and at what price point? I know what I have in mind, but I'm interested to
see what the untainted responses are.
I'd appreciate any and all responses, on-list or off, doesn't matter to me,
I'll read them all.
Thanks,
mark->
--
Fairlight-> ||| "The road to truth is long, and | Fairlight Consulting
__/\__ ||| lined the entire way with annoying |
<__<>__> ||| bastards." --Alexander Jablokov | http://www.fairlite.com
\/ ||| | info at fairlite.com