3.2 Handling white-space
SGML2PL has five modes for handling white-space. The initial mode can
be switched using the space(SpaceMode) option to load_structure/3
and set_sgml_parser/2.
In XML mode, the mode is further controlled by the xml:space
attribute, which may be specified both in the DTD and in the document.
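
As a minimal sketch of how the space(SpaceMode) option might be passed, the two predicates below parse a text in the XML dialect with a chosen mode, once through load_structure/3 and once through set_sgml_parser/2. The helper names parse_with_space/3 and parse_with_parser/3 are hypothetical and exist only for this illustration; reading from a string via open_string/2 is likewise an assumption made to keep the example self-contained.

```prolog
:- use_module(library(sgml)).

%!  parse_with_space(+Text, +SpaceMode, -DOM) is det.
%
%   Hypothetical helper: parse Text as XML, selecting the initial
%   white-space mode with the space(SpaceMode) option of
%   load_structure/3.
parse_with_space(Text, SpaceMode, DOM) :-
    setup_call_cleanup(
        open_string(Text, In),
        load_structure(stream(In), DOM,
                       [ dialect(xml),
                         space(SpaceMode)
                       ]),
        close(In)).

%!  parse_with_parser(+Text, +SpaceMode, -DOM) is det.
%
%   Hypothetical helper: the same parse, but the mode is set on an
%   explicit parser object using set_sgml_parser/2.
parse_with_parser(Text, SpaceMode, DOM) :-
    new_sgml_parser(Parser, []),
    setup_call_cleanup(
        open_string(Text, In),
        ( set_sgml_parser(Parser, dialect(xml)),
          set_sgml_parser(Parser, space(SpaceMode)),
          sgml_parse(Parser, [source(In), document(DOM)])
        ),
        ( close(In),
          free_sgml_parser(Parser)
        )).
```

With these definitions, a query such as `parse_with_space("<a>  hello  </a>", remove, DOM)` should yield a DOM in which the content of `a` has lost its leading and trailing blanks, whereas the same query with `preserve` should keep them.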
The defined modes are:
- space(sgml)
- In SGML, newlines at the start and end of an element are removed. (In addition, newlines at the end of lines containing only markup should be deleted; this is not yet implemented.) This is the default mode for the SGML dialect.
- space(preserve)
- White space is passed literally to the application. This mode leaves
most white space handling to the application. This is the default mode
for the XML dialect. Note that \r\n is still translated to \n. To
preserve white space exactly, use space(strict) (see below).
- space(strict)
- White space is passed strictly to the application. This mode leaves all white space handling to the application. This is useful for producing and verifying XML signatures.
- space(default)
- In addition to sgml space-mode, all consecutive white-space is reduced to a single space character. This mode canonicalises all white space.
- space(remove)
- In addition to default space-mode, all leading and trailing white-space is removed from CDATA objects. If, as a result, the CDATA becomes empty, nothing is passed to the application. This mode is especially handy for processing "data-oriented" documents, such as RDF. It is not suitable for normal text documents. Consider the HTML fragment below. When processed in this mode, the spaces between the three modified words are lost. This mode is not part of any standard; XML 1.0 allows only default and preserve.

    Consider adjacent <b>bold</b> <ul>and</ul> <it>italic</it> words.
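
The effect on the fragment can be explored with a small sketch like the one below. It makes several assumptions that are not part of the text above: the fragment is wrapped in a `<p>` root element, it is parsed in the XML dialect rather than as HTML (so no DTD is needed), and fragment_children/2 is a hypothetical helper name.

```prolog
:- use_module(library(sgml)).

%!  fragment_children(+SpaceMode, -Children) is det.
%
%   Hypothetical helper: parse the example fragment, wrapped in an
%   assumed <p> root element, as XML using SpaceMode and return the
%   content list of <p>.
fragment_children(SpaceMode, Children) :-
    Text = "<p>Consider adjacent <b>bold</b> <ul>and</ul> <it>italic</it> words.</p>",
    setup_call_cleanup(
        open_string(Text, In),
        load_structure(stream(In), DOM,
                       [ dialect(xml),
                         space(SpaceMode)
                       ]),
        close(In)),
    DOM = [element(p, _, Children)].
```

Comparing `fragment_children(preserve, C)` with `fragment_children(remove, C)` should show the difference: in preserve mode the blanks between the `b`, `ul` and `it` elements are expected to appear as separate white-space atoms in the content list, while in remove mode those atoms are dropped, so an application that concatenates the text would run the three words together.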