Importing XML Generically: A Proposed Project
Bill Campbell
bill at celestial.com
Tue Dec 18 10:29:29 PST 2007
On Tue, Dec 18, 2007, Fairlight wrote:
>Hi All,
>
>Been considering this for a while, and just got my head around something
>within the last week that I think makes it tenable now. I'm interested,
>however, in finding out whether it's worth my while to take the time to
>develop it.
>
>Basically, fP has no graceful way of handling XML import unless you
>hand-write an XML parser in processing, over and over again, for each
>different DTD or Schema.
>
>However, it does have the ability to neatly import CSV format files.
>
>Assuming one could specify a configuration file (with -optional- comments)
>in the format:
>
># Order Date.
>root.order.date
># Order Number.
>root.order.number
># Customer Name.
>root.order.customer.first_name
>root.order.customer.last_name
># Order Status (attribute status="yes|no" on the <order> element).
>root.order#status
># Billing Amount.
>root.order.billing.amount
># Billing Currency as attribute to element <amount>.
>root.order.billing.amount#currency
># Billing Date.
>root.order.billing.date
>
>...and assuming the command was simply:
>
>xml2csv -c ord.conf [-n namespace] [-o out_file] -r order 20071218113903.xml
>
>...and assuming you'd get as output, line after line in the format:
>
>"12/11/2007","C2987197D25","Mark","Luljak","Yes","995.00","USD","12/12/2007"
>
>...Well, then you'd have valid XML translated into a usable CSV format that
>fP can use with IMPORT. Fields would be presented in the order defined in
>the config file. You could specify different config files for different
>data files, making this generic.
>
>I'm still deciding how to handle multiple entries in a one-to-many
>relationship. If you specified "root.order.items.item_number" in the
>definition, I don't know whether it should produce multiple complete lines,
>one per unique item number, or just comma-separate them in one field
>(i.e., "23115A,25907G").
>
>I think it should be comma-separated, as you could theoretically have more
>than a single one-to-many relationship in the same association, and
>repeated lines give you too many permutations to be useful if you end up
>with a 4-way matrix or something. Unlikely, but possible, so I do think
>keeping the values distinct but inline, in a single field, is the way to go.
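A minimal Python sketch of the proposed xml2csv conversion, using only the standard library's xml.etree.ElementTree and csv modules. xml2csv is a proposal here, not an existing tool, and the function names load_config, field_value and xml2csv are hypothetical. The sketch assumes every config path passes through the record element named by -r, that "#" separates an element path from an attribute name, and that repeated matches are comma-joined inside one field as proposed; the -n and -o options are not implemented.

```python
import csv
import io
import xml.etree.ElementTree as ET


def load_config(text):
    """Return (path_segments, attribute) pairs, skipping blank and '#' comment lines."""
    fields = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        path, _, attr = line.partition("#")  # "root.order#status" -> attribute "status"
        fields.append((path.split("."), attr or None))
    return fields


def field_value(record, steps, attr):
    """Resolve steps below the record element; join repeated matches with commas."""
    nodes = record.findall("/".join(steps)) if steps else [record]
    if attr:
        values = [n.get(attr, "") for n in nodes]
    else:
        values = [(n.text or "").strip() for n in nodes]
    return ",".join(values)


def xml2csv(config_text, xml_text, record_tag):
    """Emit one fully quoted CSV line per record element (the hypothetical -r option)."""
    fields = load_config(config_text)
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for record in root.iter(record_tag):
        row = []
        for path, attr in fields:
            # Assumption: the record tag appears in every path; drop it and its ancestors.
            steps = path[path.index(record_tag) + 1:]
            row.append(field_value(record, steps, attr))
        writer.writerow(row)
    return out.getvalue()
```

Because the CSV writer quotes every field, a comma-joined one-to-many value such as 23115A,25907G stays inside a single field on import.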
I'm hardly an expert at processing XML; I mostly handle it case by case,
where I have to.
The fundamental problem is that XML is essentially a hierarchical, object
oriented structure while FilePro is similar to a relational database (that
should get some flames going :-). One may well have to import multiple
tables from a single XML file.
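To illustrate importing multiple tables from one XML file, here is a minimal Python sketch that assumes the same hypothetical order layout as the quoted config (number, date and item_number elements plus a status attribute on order). It writes one table with a row per order and a second with a row per item, each item row carrying its parent order number as the key. This is the relational alternative to comma-joining repeated values in one field; split_orders is a hypothetical name.

```python
import csv
import io
import xml.etree.ElementTree as ET


def split_orders(xml_text):
    """Flatten <order> elements into two CSV tables: orders and their items."""
    root = ET.fromstring(xml_text)
    orders, items = io.StringIO(), io.StringIO()
    order_writer = csv.writer(orders, lineterminator="\n")
    item_writer = csv.writer(items, lineterminator="\n")
    for order in root.iter("order"):
        number = order.findtext("number", default="")
        order_writer.writerow(
            [number, order.findtext("date", default=""), order.get("status", "")]
        )
        # Each repeated child becomes its own row, keyed back to the parent
        # order number, the way a relational child table points at its parent.
        for item in order.iter("item_number"):
            item_writer.writerow([number, (item.text or "").strip()])
    return orders.getvalue(), items.getvalue()
```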
I generally dig into XML files by writing a Python module with a class
structure that models the XML. Each class then has a method to generate
CSV or tab-delimited output as necessary.
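A minimal sketch of that class-per-element approach, using Python dataclasses and the standard library's xml.etree.ElementTree. The Order and Item classes, their fields, and the method names are hypothetical, modeled on the quoted order example rather than on any real schema. Each class builds itself from its element and produces its own tab-delimited lines.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Item:
    number: str

    @classmethod
    def from_element(cls, elem):
        return cls(number=(elem.text or "").strip())


@dataclass
class Order:
    number: str
    date: str
    items: list = field(default_factory=list)

    @classmethod
    def from_element(cls, elem):
        """Build an Order, and its child Items, from an <order> element."""
        return cls(
            number=elem.findtext("number", default=""),
            date=elem.findtext("date", default=""),
            items=[Item.from_element(e) for e in elem.iter("item_number")],
        )

    # No escaping is done: this assumes the data contains no tabs or newlines.
    def order_line(self):
        """Tab-delimited line for the parent (order) table."""
        return "\t".join([self.number, self.date])

    def item_lines(self):
        """Tab-delimited lines for the child (item) table, keyed by order number."""
        return ["\t".join([self.number, item.number]) for item in self.items]
```

Usage: `orders = [Order.from_element(e) for e in ET.fromstring(xml_text).iter("order")]`, then write each order's `order_line()` to one file and its `item_lines()` to another.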
I suspect that XSLT might be useful in breaking an XML file into these
multiple imports.
My project today is to take tab-delimited files exported from multiple
FilePro tables, and import them into a Plone product.
Bill
--
INTERNET: bill at celestial.com Bill Campbell; Celestial Software LLC
URL: http://www.celestial.com/ PO Box 820; 6641 E. Mercer Way
FAX: (206) 232-9186 Mercer Island, WA 98040-0820; (206) 236-1676
It's time to feed the hogs
-- Unintended Consequences