extended characters

Brian K. White brian at aljex.com
Tue Sep 11 12:38:17 PDT 2007


Do you need to preserve this data as UTF-8? Or do you merely need to display it, so it's OK
to transform it into the closest approximation your terminal's current font
and encoding can manage?
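As it happens, the two bytes you found (195 then 169) are exactly what UTF-8 produces for é (Unicode U+00E9, decimal 233). A two-byte UTF-8 sequence carries 5 bits of the character in the lead byte (110xxxxx) and 6 bits in the continuation byte (10xxxxxx). The small function below is only an illustration of that arithmetic, not part of iconv or filePro, and its name is made up for this example.

```sh
#!/bin/sh
# utf8_pair_to_codepoint (hypothetical helper): decode a two-byte UTF-8
# sequence, given as decimal byte values like asc() reports, into its
# Unicode code point.
#   lead byte         110xxxxx -> keep the low 5 bits (& 31)
#   continuation byte 10xxxxxx -> keep the low 6 bits (& 63)
utf8_pair_to_codepoint() {
    lead=$1
    cont=$2
    echo $(( ((lead & 31) << 6) | (cont & 63) ))
}

utf8_pair_to_codepoint 195 169   # prints 233, i.e. U+00E9, e with acute accent
```

Byte 233 is also é in ISO-8859-1 (Latin-1), which is why converting such a file to Latin-1, or to a PC code page, turns each of these pairs back into a single byte.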

iconv is the most common utility for converting between encodings. There is
also one called recode; both are open source, and there are probably others
as well.
iconv is probably already installed. Source and an SCO binary for recode are
available at http://www.aljex.com/bkw/sco/#recode, and Linux packages must be
out there.

Assuming input.txt is a text file encoded in UTF-8:

iconv -f utf8 <input.txt >output.txt

That will use the system's current locale for the output encoding, which is
probably best, or you can specify the output encoding explicitly:

iconv -f utf8 -t cp437 <input.txt >output.txt
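To see what iconv does with the exact bytes in question, you can feed them in with printf (octal 303 251 is decimal 195 169) and dump the result with od. This is just a quick check and assumes an iconv that accepts the names UTF-8, ISO-8859-1 and CP437, as GNU iconv does; other implementations may spell the encoding names differently. The function name is hypothetical.

```sh
#!/bin/sh
# show_bytes_as (hypothetical helper): convert the UTF-8 bytes 195 169
# (octal \303\251, "e" with acute accent) to the target encoding given as $1
# and print the resulting byte value(s) in decimal.
show_bytes_as() {
    printf '\303\251' | iconv -f UTF-8 -t "$1" | od -An -tu1 | tr -d ' \n'
    echo
}

show_bytes_as ISO-8859-1   # 233: Latin-1 e-acute
show_bytes_as CP437        # 130: e-acute in the IBM PC code page
```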

If you are not free to just use system() to convert the whole file in one
shot and then use the converted file in filePro, but instead need to convert
many small, random strings, it's probably possible to use iconv as a user()
command. The only difficulty is that, as ever with user(), it may be tricky
and prone to getting out of step and locking up.
The command would just be "user iconv = iconv -f utf8", unless you wanted to
write a wrapper shell script around it to try to ensure the script and filePro
stay in sync.
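A wrapper along those lines might look like the sketch below. It has not been tried against filePro, and it rests on an assumption about how user() talks to the command: that filePro writes one newline-terminated string per call and then waits for exactly one line back. A bare iconv left running may hold its output until it sees end of file, which is one way the two sides can get out of step; starting a fresh iconv per line avoids that, at the cost of one process per string. The name fp_utf8conv and its optional target-encoding argument are hypothetical.

```sh
#!/bin/sh
# fp_utf8conv (hypothetical name): a line-at-a-time UTF-8 converter meant
# to sit behind a filePro user() command.
# Assumption: filePro writes one newline-terminated string per request and
# then reads back exactly one line.
fp_utf8conv() {
    to=${1:-CP437}    # target encoding; the CP437 default is just an example
    # IFS= keeps leading/trailing blanks, -r keeps backslashes, and the
    # [ -n ... ] test still processes a final line that lacks a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # One iconv per line: it exits, and so flushes its output, before
        # the next request is read. If a character cannot be converted,
        # whatever iconv did write is still followed by a single newline,
        # so every request gets exactly one reply line.
        printf '%s\n' "$line" | iconv -f UTF-8 -t "$to" 2>/dev/null || printf '\n'
    done
}

# Run only when executed as a script named fp_utf8conv, so the function can
# also be sourced (or tested) without it waiting on standard input.
if [ "${0##*/}" = fp_utf8conv ]; then
    fp_utf8conv "$@"
fi
```

Saved as fp_utf8conv somewhere on the PATH, the filePro side would presumably become "user iconv = fp_utf8conv" (or "fp_utf8conv ISO-8859-1" for a different target).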

Finally, since this is XML, I believe it's possible for individual fields to
have their own encoding, unrelated to that of the XML file itself (the tags,
the definitions of the tags, field metadata, everything in the file that isn't
actual field content/data), so converting the whole file with one
transformation may not be technically correct handling of the data. You
may really need an XML parser that handles each field correctly and
individually.
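Short of a full parser, it is at least worth checking what encoding the file declares before choosing the -f argument for iconv. The function below (name hypothetical) is a rough sketch: it only looks for an encoding= attribute on the first line, the way your file has it, and otherwise falls back to UTF-8, which is what the XML 1.0 specification assumes when there is no encoding declaration and no byte-order mark. It is not an XML parser and does not handle a declaration split across lines or UTF-16 files.

```sh
#!/bin/sh
# xml_declared_encoding (hypothetical helper): print the encoding named in
# the XML declaration on line 1 of file $1, or UTF-8 if there is none.
# Rough sketch: single- or double-quoted values, optional blanks around "=".
xml_declared_encoding() {
    enc=$(sed -n "1s/.*encoding[[:space:]]*=[[:space:]]*[\"']\([^\"']*\)[\"'].*/\1/p" "$1")
    printf '%s\n' "${enc:-UTF-8}"
}
```

When the whole file really is in one encoding, the result can go straight to iconv, for example: iconv -f "$(xml_declared_encoding input.xml)" -t CP437 <input.xml >output.txt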

Brian K. White    brian at aljex.com    http://www.myspace.com/KEYofR
+++++[>+++[>+++++>+++++++<<-]<-]>>+.>.+++++.+++++++.-.[>+<---]>++.
filePro  BBx    Linux  SCO  FreeBSD    #callahans  Satriani  Filk!


----- Original Message ----- 
From: "Bruce Easton" <bruce at stn.com>
To: "FilePro Mailing List" <filepro-list at lists.celestial.com>
Sent: Tuesday, September 11, 2007 2:17 PM
Subject: extended characters


>I recently imported a file (via readline command in clerk)
> from an XML file that states at the end of the top line:
> encoding="UTF-8".  I then write out the line as-is to
> a record.
>
> In filePro (SCO 5.14), the line is stored with the funny
> characters here and there.  (The data has names in several
> different languages including eastern & western european
> and middle eastern as well.)
>
> I coded an errorbox to come up on @key that will scan
> each line character by character so that I can see the
> decimal values of each character that is above 127.
>
> When I am on, for instance, the third character in the
> expression "President du commandement" which I'm thinking
> must be French, and therefore an e with an acute accent
> (avec accent aigu), my errorbox is telling me [via my
> code:  asc(mid(xx,currpos,"1"))] for two characters in
> a row that filePro is storing this as decimal 195 followed
> by decimal 169.
>
> I don't see how these decimal numbers correlate to any
> common character set.  Am I missing something obvious?
>
> (The only funny part of this was that when I'd scroll
> down thru the data, it would hit some combination of
> the extended chars and send a print job to the system
> printer.  Who knows what else it did on the system.)
>
> Bruce
>
> Bruce Easton
> STN, Inc.
>
> _______________________________________________
> Filepro-list mailing list
> Filepro-list at lists.celestial.com
> http://mailman.celestial.com/mailman/listinfo/filepro-list
> 


