There are two parser types: one that optimises one-time access to
the DOM tree and one that creates a persistent DOM tree.
-
SGML_PARSER_STREAM
-
The first one will simply push nodes on the stack, not building a
DOM tree. This interface is similar to that of SAX (Simple API for
XML) where events are fired when nodes are entered and exited. It is
useful when you are not actually interested in the DOM tree, but can
do all processing in a stream-like manner, such as when highlighting
HTML code.
-
SGML_PARSER_TREE
-
The second one is a DOM tree builder that builds a persistent DOM
tree. When using this type, it is possible to do even more
(pre)processing than for parser streams. For example, you can sort
element child nodes, or purge various nodes such as text nodes that
only contain space characters.
These flags control how the parser behaves.
-
SGML_PARSER_COUNT_LINES
-
Make line numbers available.
-
SGML_PARSER_COMPLETE
-
Used internally during incremental parsing.
-
SGML_PARSER_INCREMENTAL
-
Parse chunks of input.
-
SGML_PARSER_DETECT_ERRORS
-
Report errors.
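The flags above are meant to be combined. The following is a minimal sketch of combining and testing them with bitwise operators. The numeric values and the helpers make_incremental_flags and has_flag are assumptions made for illustration; only the flag names come from this documentation.

```c
#include <assert.h>

/* Assumed bit values, chosen so each flag occupies its own bit.
 * The real header may assign different values. */
enum sgml_parser_flag {
	SGML_PARSER_COUNT_LINES   = 1,
	SGML_PARSER_COMPLETE      = 2,
	SGML_PARSER_INCREMENTAL   = 4,
	SGML_PARSER_DETECT_ERRORS = 8
};

/* Hypothetical helper: flags for a caller that feeds the parser in
 * chunks and wants errors reported together with line numbers.
 * SGML_PARSER_COMPLETE is left out since it is used internally. */
static int
make_incremental_flags(void)
{
	return SGML_PARSER_INCREMENTAL
	     | SGML_PARSER_DETECT_ERRORS
	     | SGML_PARSER_COUNT_LINES;
}

/* Hypothetical helper: test whether a single flag is set. */
static int
has_flag(int flags, enum sgml_parser_flag flag)
{
	assert(flag != 0);
	return (flags & flag) != 0;
}
```

A caller would pass the combined value as the flags argument when initialising the parser.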
The SGML parser keeps only a little state.
-
sgml_parser_state.info
-
Info about the properties of the node contained by state.
This is only meaningful to element and attribute nodes. For
unknown nodes it points to the common unknown node info.
-
sgml_parser_state.end_token
-
This is used by the DOM source renderer for highlighting the
end-tag of an element.
These enum values are used for return codes.
-
SGML_PARSER_CODE_OK
-
The parsing was successful
-
SGML_PARSER_CODE_INCOMPLETE
-
The parsing could not be completed
-
SGML_PARSER_CODE_MEM_ALLOC
-
Failed to allocate memory
-
SGML_PARSER_CODE_ERROR
-
FIXME: For when we will add support for requiring stricter parsing
or even a validator.
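As a rough illustration of how a caller might handle these return codes, the sketch below maps each code to a short description and classifies anything other than SGML_PARSER_CODE_OK as a failure. The enum ordering and both helper functions are assumptions, not part of the documented API.

```c
#include <assert.h>
#include <stddef.h>

/* Code names are taken from the list above; the numeric ordering is
 * an assumption. */
enum sgml_parser_code {
	SGML_PARSER_CODE_OK,
	SGML_PARSER_CODE_INCOMPLETE,
	SGML_PARSER_CODE_MEM_ALLOC,
	SGML_PARSER_CODE_ERROR
};

/* Hypothetical helper: human-readable description of a return code. */
static const char *
sgml_parser_code_name(enum sgml_parser_code code)
{
	switch (code) {
	case SGML_PARSER_CODE_OK:         return "ok";
	case SGML_PARSER_CODE_INCOMPLETE: return "incomplete input";
	case SGML_PARSER_CODE_MEM_ALLOC:  return "out of memory";
	case SGML_PARSER_CODE_ERROR:      return "parse error";
	}
	return "unknown code";
}

/* Hypothetical helper: anything other than OK counts as a failure. */
static int
sgml_parser_code_is_failure(enum sgml_parser_code code)
{
	const char *name = sgml_parser_code_name(code);

	assert(name != NULL); /* every code maps to some description */
	return code != SGML_PARSER_CODE_OK;
}
```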
Called by the SGML parser when a parsing error has occurred.
If the return code is not SGML_PARSER_CODE_OK the parsing will be
ended and that code will be returned.
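The contract described above (a non-OK return value ends parsing and is passed back to the caller) can be sketched with a callback that tolerates a fixed number of errors before aborting. The callback signature, struct error_budget and the report_errors driver are all hypothetical stand-ins; the real error function type may take different arguments.

```c
#include <assert.h>
#include <stddef.h>

enum sgml_parser_code {
	SGML_PARSER_CODE_OK,
	SGML_PARSER_CODE_INCOMPLETE,
	SGML_PARSER_CODE_MEM_ALLOC,
	SGML_PARSER_CODE_ERROR
};

/* Hypothetical callback type, for illustration only. */
typedef enum sgml_parser_code (*mock_error_func_T)(void *data,
						   const char *message);

/* Hypothetical caller state: how many errors were seen and how many
 * are tolerated before parsing is aborted. */
struct error_budget {
	int seen;
	int max;
};

/* Returns OK while within budget, so parsing continues; returns
 * SGML_PARSER_CODE_ERROR once the budget is exceeded. */
static enum sgml_parser_code
count_errors(void *data, const char *message)
{
	struct error_budget *budget = data;

	(void) message;
	assert(budget != NULL);
	budget->seen++;
	return budget->seen > budget->max
	     ? SGML_PARSER_CODE_ERROR : SGML_PARSER_CODE_OK;
}

/* Stand-in for the parser's error dispatch: report each error and
 * stop at the first non-OK code, returning that code. */
static enum sgml_parser_code
report_errors(mock_error_func_T func, void *data, int errors)
{
	int i;

	for (i = 0; i < errors; i++) {
		enum sgml_parser_code code = func(data, "mock error");

		if (code != SGML_PARSER_CODE_OK)
			return code;
	}
	return SGML_PARSER_CODE_OK;
}
```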
This struct holds info used while parsing SGML data.
-
sgml_parser.type
-
Stream or tree
-
sgml_parser.flags
-
Flags that control the behaviour
-
sgml_parser.info
-
Backend dependent info
-
sgml_parser.uri
-
The URI of the DOM document
-
sgml_parser.root
-
The document root node
-
sgml_parser.error_func
-
Called for detected errors
-
sgml_parser.stack
-
A stack for tracking parsed nodes
-
sgml_parser.parsing
-
Used for tracking parsing states
Initialise an SGML parser with the given properties.
|
type
|
Stream or tree; one-time or persistent.
|
|
doctype
|
The document type; this affects what sub type nodes are given.
|
|
uri
|
The URI of the document root.
|
|
flags
|
Flags controlling the behaviour of the parser.
|
Returns the created parser or NULL.
Deallocates all resources, _except_ the root node.
|
parser
|
The parser being released.
|
Parses the given buffer. For incremental rendering the last buffer can be
signalled through the complete parameter.
|
parser
|
A parser created with init_sgml_parser.
|
|
buffer
|
A string containing the chunk to parse.
|
|
complete
|
Whether this is the last chunk to parse.
|
The returned code is SGML_PARSER_CODE_OK if the buffer was
successfully parsed, else a code hinting at the error.
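The chunked calling pattern can be sketched as follows: each chunk is passed with complete set to zero, except the final one. Since the real parser is not available here, mock_parse is a hypothetical stand-in that only tracks whether a tag is still open, and its explicit length argument is an assumption. feed_in_chunks is likewise illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum sgml_parser_code {
	SGML_PARSER_CODE_OK,
	SGML_PARSER_CODE_INCOMPLETE,
	SGML_PARSER_CODE_MEM_ALLOC,
	SGML_PARSER_CODE_ERROR
};

/* Mock parser state: whether we are inside '<' ... '>' and how many
 * tags have been closed so far. */
struct mock_parser {
	int in_tag;
	int tags;
};

/* Mock of the parse call. State carries over between chunks, and an
 * unterminated tag is only an error once the input is complete. */
static enum sgml_parser_code
mock_parse(struct mock_parser *parser, const char *buf, size_t len,
	   int complete)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if (buf[i] == '<') {
			parser->in_tag = 1;
		} else if (buf[i] == '>' && parser->in_tag) {
			parser->in_tag = 0;
			parser->tags++;
		}
	}
	if (complete && parser->in_tag)
		return SGML_PARSER_CODE_INCOMPLETE;
	return SGML_PARSER_CODE_OK;
}

/* Feed a document in fixed-size chunks, flagging the last one. */
static enum sgml_parser_code
feed_in_chunks(struct mock_parser *parser, const char *doc, size_t chunk)
{
	size_t len = strlen(doc);
	size_t off = 0;

	assert(chunk > 0);
	do {
		size_t n = len - off < chunk ? len - off : chunk;
		int complete = (off + n == len);
		enum sgml_parser_code code;

		code = mock_parse(parser, doc + off, n, complete);
		if (code != SGML_PARSER_CODE_OK)
			return code;
		off += n;
	} while (off < len);

	return SGML_PARSER_CODE_OK;
}
```

Note that a tag split across two chunks is handled because the mock keeps its state between calls, which is the point of passing complete only with the last chunk.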
Returns what line number the parser is currently at or zero if there has
been no parsing yet.
|
Note
|
Line numbers are recorded in the scanner tokens. |
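The "zero before any parsing" convention can be illustrated with a small mock that counts newlines as it scans. The struct and both functions are hypothetical; the real parser reads the line number from its scanner tokens rather than counting characters like this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical line tracker, standing in for the scanner. */
struct mock_line_counter {
	int started;         /* non-zero once any input was scanned */
	unsigned int lineno; /* current line, 1-based once started */
};

/* Scan a NUL-terminated chunk, advancing the line number on '\n'. */
static void
mock_scan(struct mock_line_counter *counter, const char *buf)
{
	assert(counter != NULL && buf != NULL);
	if (!counter->started) {
		counter->started = 1;
		counter->lineno = 1;
	}
	for (; *buf; buf++) {
		if (*buf == '\n')
			counter->lineno++;
	}
}

/* Current line number, or zero if nothing has been scanned yet. */
static unsigned int
mock_get_line_number(const struct mock_line_counter *counter)
{
	return counter->started ? counter->lineno : 0;
}
```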