extended characters
Brian K. White
brian at aljex.com
Tue Sep 11 12:38:17 PDT 2007
Do you need to preserve this data as UTF-8, or do you merely need to display
it, so that it's OK to transform it into the closest approximation your
terminal's current font and encoding can manage?
iconv is the most common utility for converting between encodings. There is
also one called recode. Both are open source, and there are probably others
as well.
iconv is probably already installed. Source and a SCO binary for recode are
available at http://www.aljex.com/bkw/sco/#recode and Linux packages must be
out there.
Assuming input.txt is a text file whose contents are encoded in UTF-8:
iconv -f utf8 <input.txt >output.txt
That will use the system's current locale for the output encoding, which is
probably best. Or you can specify the output encoding explicitly:
iconv -f utf8 -t cp437 <input.txt >output.txt
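To make the byte values from the question concrete, here is a minimal sketch
of the same transformation in Python (my choice for illustration, nothing in
this setup requires it). The pair 195, 169 is the two-byte UTF-8 encoding of
e-acute (U+00E9), which cp437 stores as the single byte 130. The function
name is hypothetical, and cp437 is assumed as the target only because it is
the one used in the command above.

```python
def utf8_to_codepage(data: bytes, target: str = "cp437") -> bytes:
    """Decode UTF-8 bytes and re-encode them in a single-byte code page.

    Characters the target code page cannot represent are replaced
    with '?' instead of raising an error.
    """
    text = data.decode("utf-8")
    return text.encode(target, errors="replace")


# "Président" as UTF-8: the third character is the byte pair 195, 169.
sample = bytes([80, 114, 195, 169, 115, 105, 100, 101, 110, 116])
converted = utf8_to_codepage(sample)
print(list(converted))  # -> [80, 114, 130, 115, 105, 100, 101, 110, 116]
```

Note that names in scripts cp437 does not cover at all (Arabic, for example)
come out as question marks here, so a single 8-bit target encoding is
probably not enough for the whole data set.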
If you are not free to just use system() to convert the whole file in one
shot and then use the converted file in filePro, but instead need to convert
many small random strings, it's probably possible to use iconv as a user()
command. The only difficulty is that, as ever with user(), it may be tricky
and prone to getting out of step and locking up.
The command would just be "user iconv = iconv -f utf8", unless you wanted to
write a wrapper shell script around it to try to ensure the script and
filePro stay in sync.
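A rough sketch of what such a wrapper might look like, written in Python
rather than shell purely for illustration. It assumes the exchange is
strictly one line in, one line out, which is my assumption about how the
user() side would be coded, not documented filePro behaviour. The function
names are hypothetical. The point is only that every request gets exactly
one reply and that the reply is flushed immediately, so neither side sits
waiting on a buffer.

```python
import io


def convert_line(raw: bytes, target: str = "cp437") -> bytes:
    """Convert one UTF-8 line (without its newline) to the target code page."""
    # Malformed input and unrepresentable characters both end up as '?'
    # so that a bad string can never abort the filter mid-conversation.
    text = raw.decode("utf-8", errors="replace")
    return text.encode(target, errors="replace")


def serve(inp, out, target: str = "cp437") -> None:
    """Answer every input line with exactly one output line, flushed at once."""
    for raw in iter(inp.readline, b""):
        body = raw.rstrip(b"\r\n")
        out.write(convert_line(body, target) + b"\n")
        out.flush()  # never leave a reply sitting in a buffer


# As a standalone filter this would be started with:
#   serve(sys.stdin.buffer, sys.stdout.buffer)
# Here it is exercised on in-memory streams instead.
demo_in = io.BytesIO("Pr\u00e9sident\nZo\u00eb\n".encode("utf-8"))
demo_out = io.BytesIO()
serve(demo_in, demo_out)
print(list(demo_out.getvalue()))
```

Replacing bad input with '?' instead of failing is a deliberate choice here:
an error exit in the middle of a conversation is exactly the kind of thing
that leaves the two sides out of step.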
Finally, since this is XML, I believe it's possible for individual fields to
have their own encoding, unrelated to that of the XML file itself (the tags,
definitions of the tags, field metadata, everything in the file that isn't
actual field content/data). So converting the whole file with one
transformation may not be a technically correct handling of the data. You
may really need an XML parser that handles each field
correctly/individually.
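As a minimal sketch of the parser route, Python's standard
xml.etree.ElementTree reads the encoding declaration itself and hands back
each field as decoded text, which can then be converted field by field. The
sample document, its tag names, and the function name are all made up for
illustration. I have not seen the real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for the real file.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<people><person>"
    "<title>Pr\u00e9sident du commandement</title>"
    "</person></people>"
).encode("utf-8")


def field_values(xml_bytes: bytes, tag: str, target: str = "cp437"):
    """Return the text of every <tag> element, re-encoded for the target."""
    # The parser honours the encoding named in the XML declaration.
    root = ET.fromstring(xml_bytes)
    return [
        (el.text or "").encode(target, errors="replace")
        for el in root.iter(tag)
    ]


titles = field_values(SAMPLE, "title")
print(titles)  # -> [b'Pr\x82sident du commandement']
```

This way the markup never passes through the conversion at all, only the
field contents do.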
Brian K. White brian at aljex.com http://www.myspace.com/KEYofR
+++++[>+++[>+++++>+++++++<<-]<-]>>+.>.+++++.+++++++.-.[>+<---]>++.
filePro BBx Linux SCO FreeBSD #callahans Satriani Filk!
----- Original Message -----
From: "Bruce Easton" <bruce at stn.com>
To: "FilePro Mailing List" <filepro-list at lists.celestial.com>
Sent: Tuesday, September 11, 2007 2:17 PM
Subject: extended characters
>I recently imported a file (via readline command in clerk)
> from an XML file that states at the end of the top line:
> encoding="UTF-8". I then write out the line as-is to
> a record.
>
> In filePro (SCO 5.14), the line is stored with the funny
> characters here and there. (The data has names in several
> different languages including eastern & western european
> and middle eastern as well.)
>
> I coded an errorbox to come up on @key that will scan
> each line character by character so that I can see the
> decimal values of each character that is above 127.
>
> When I am on, for instance, the third character in the
> expression "President du commandement" which I'm thinking
> must be French, and therefore an e with an acute accent
> (avec accent aigu), my errorbox is telling me [via my
> code: asc(mid(xx,currpos,"1"))] for two characters in
> a row that filePro is storing this as decimal 195 followed
> by decimal 169.
>
> I don't see how these decimal numbers correlate to any
> common character set. Am I missing something obvious?
>
> (The only funny part of this was that when I'd scroll
> down thru the data, it would hit some combination of
> the extended chars and send a print job to the system
> printer. Who knows what else it did on the system.)
>
> Bruce
>
> Bruce Easton
> STN, Inc.