Xml special characters and xlate

Walter Vaughan wvaughan at steelerubber.com
Tue Oct 2 13:46:08 PDT 2007


Fairlight wrote:

> 1) HTML and XML follow the same rules.  They don't.  HTML is far looser,
> and while many entities may be held in common, they aren't guaranteed to be
> identical lists.  How well is TOHTML documented in what it alters?

UTF-8 characters are the only thing *I* have found where TOHTML's output doesn't 
keep xmllint or the Java import libraries happy.

And we solve that with this bloated code that works, but probably could be done 
in one line...

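use strict;
use warnings;

# Usage: perl <this script> INFILE OUTFILE
# No encoding layer is set, so the loop sees raw bytes and each byte of a
# multi-byte sequence gets its own reference. If the input is UTF-8, open
# it with '<:encoding(UTF-8)' so ord() returns code points instead.
open(my $in,  '<', $ARGV[0]) or die "Cannot read $ARGV[0]: $!";
open(my $out, '>', $ARGV[1]) or die "Cannot write $ARGV[1]: $!";

while (<$in>) {
        # Turn each character in 0x80-0xFFFF into a numeric character reference.
        s/([\x{80}-\x{FFFF}])/'&#' . ord($1) . ';'/ge;
        # Replace each run of backslashes with a single space.
        s/\\+/ /g;
        print $out $_;
}

close $in;
close $out;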


