9 January 2013
Some people have reported that when they process the PBS XML using Microsoft's MSXSL.EXE processor, the resulting text file has many extra NUL bytes.
This is caused by the character encoding of the output file. Every XSLT processor writes its output in a particular character encoding, even when the output is plain text.
When an XSLT processor creates an output file, it uses the default character encoding for the platform. The XSL stylesheet can override this default with the xsl:output element. A processor may also provide its own means of changing the default (e.g. a command-line parameter).
In the case of MSXSL.EXE running on a Microsoft Windows operating system, the default character encoding is UTF-16, which uses at least 2 bytes (16 bits) for every character. For characters in the 7-bit ASCII repertoire, one of those two bytes is always NUL (zero), so every ASCII character in the output is paired with an extra NUL byte.
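To see where the NUL bytes come from, the following Python sketch encodes a short ASCII string both ways. Python is used here only for illustration and is not part of the PBS tooling. It assumes little-endian UTF-16 (the byte order normally used on Windows) and leaves out the byte order mark that a UTF-16 file usually begins with.

```python
# Illustration only: compare how the same ASCII text is laid out in
# UTF-16 (little-endian, no byte order mark) and in UTF-8.
text = "PBS"

utf16_bytes = text.encode("utf-16-le")
utf8_bytes = text.encode("utf-8")

print(utf16_bytes.hex(" "))  # 50 00 42 00 53 00
print(utf8_bytes.hex(" "))   # 50 42 53

# Count the NUL (0x00) bytes in each encoding of this ASCII-only text.
print(utf16_bytes.count(0))  # 3
print(utf8_bytes.count(0))   # 0
```

Each ASCII character occupies one byte in UTF-8 but two in UTF-16, and the extra byte is always zero, which is why a UTF-16 text file viewed as single bytes appears to contain a NUL after every character.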
The XSL stylesheets provided on this site do not explicitly set the character encoding of the result document. This is because, in most cases, the XSLT processor will automatically select the appropriate character encoding for the platform.
To force the processor to use a particular character encoding, a simple solution is to write a small XSL stylesheet that imports the original XSL stylesheet using the xsl:import element and sets the character encoding using the xsl:output element. Because the importing stylesheet has higher import precedence than the stylesheet it imports, its xsl:output settings take priority.
For example:
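<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Reuse all of the templates from the original stylesheet -->
  <xsl:import href="drug.xsl"/>
  <!-- Write plain text encoded as UTF-8 instead of the platform default -->
  <xsl:output method="text" encoding="utf-8"/>
</xsl:stylesheet>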
© Commonwealth of Australia | Department of Health and Aged Care