Waterken™ Doc
XML Surface Syntax
2003-01-24
This specification describes the mapping of the Waterken™ Doc document model onto the XML surface syntax. The mapping provides interoperation between Waterken™ Doc-based tools and XML-based tools.
The Waterken™ Doc document model is mapped onto a defined subset of the XML grammar.
- Waterken™ Doc-based tools generate output that existing XML-based tools can manipulate.
- Existing XML-based tools can generate output destined for an application that uses Waterken™ Doc-based tools.
- The shared syntax is a usable textual encoding of the Waterken™ Doc document model.
Each Branch is represented as an XML element. The Branch Name is used as the XML element Name. The Branch Annotation is encoded as XML CharData preceding the element start-tag. The Branch child Node is encoded as the XML element content. The child Node's Schema is encoded in the XML element as an attribute with name 'schema'. The child Node's Annotation is encoded as XML CharData following the end-tag of the last child element in the XML element content. If the content has no child elements, its CharData is the Annotation.
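The mapping above can be sketched in Python with the standard xml.etree.ElementTree module. This is only an illustration under stated assumptions, not part of the specification: the Node and Branch classes and the encode_node function are hypothetical names, and an implicit Schema is modelled as None. ElementTree stores the CharData before the first child element in the parent's text and the CharData after any child element in that child's tail, which is why the Annotations are assigned the way they are.

```python
from dataclasses import dataclass, field
from typing import List, Optional
import xml.etree.ElementTree as ET


@dataclass
class Node:  # hypothetical model class, not defined by the specification
    schema: Optional[str] = None  # None stands for an implicit Schema (assumption)
    annotation: str = ""
    branches: List["Branch"] = field(default_factory=list)


@dataclass
class Branch:  # hypothetical model class, not defined by the specification
    name: str
    annotation: str
    child: Node


def encode_node(name: str, node: Node) -> ET.Element:
    """Encode a Node as the element of the Branch called `name`."""
    elem = ET.Element(name)
    if node.schema is not None:
        elem.set("schema", node.schema)
    last = None
    for branch in node.branches:
        # Branch Annotation: CharData preceding the child element's start-tag.
        if last is None:
            elem.text = branch.annotation
        else:
            last.tail = branch.annotation
        last = encode_node(branch.name, branch.child)
        elem.append(last)
    # Node Annotation: CharData following the last child element's end-tag,
    # or the whole content when there are no child elements.
    if last is None:
        elem.text = node.annotation
    else:
        last.tail = node.annotation
    return elem
```

A root Node would be passed to encode_node with the name 'doc', and the result serialized with ET.tostring(element, encoding="unicode").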
For each root Node in a Waterken™ Doc Document, a synthetic Branch is created for the XML representation. These synthetic root Branches use Name 'doc' and have an empty Annotation. Upon parsing of the XML representation, the root 'doc' elements are stripped, yielding the root Nodes.
The grammar for the XML subset is:
document ::= element
element ::= EmptyElemTag | (STag content ETag)
EmptyElemTag ::= '<' Name (S SchemaAttribute)? S? '/>'
STag ::= '<' Name (S SchemaAttribute)? S? '>'
ETag ::= '</' Name S? '>'
content ::= CharData? ((element | Reference) CharData?)*
SchemaAttribute ::= 'schema' Eq AttValue
document
An XML document MUST contain a single element, whereas a Waterken™ Doc document can contain a list of zero or more Nodes. If the Document does not contain exactly one Node, the XML representation MUST be wrapped in a synthetic 'list' element. Upon parsing of the XML representation, the 'list' element is stripped.
EmptyElemTag
This is semantically the same as an STag followed immediately by an
ETag.
STag
Carries the Branch Name and the child Node's Schema. A root Branch has Name 'doc'.
content
Encodes the list of the Node's Branches, followed by the Node's Annotation. The CharData preceding each child element is the Annotation of the corresponding Branch.
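The notes above suggest a simple reading step, sketched here in Python with xml.etree.ElementTree. The function name root_elements is hypothetical, and the sketch assumes that the children of a 'list' element are the root 'doc' elements, which the text implies but does not state outright. It checks only two constraints of the subset (no attribute other than 'schema', and root elements named 'doc'). ElementTree accepts other XML constructs, such as comments and processing instructions, without complaint, so this is a partial check rather than a validator for the grammar.

```python
import xml.etree.ElementTree as ET


def root_elements(xml_text):
    """Parse xml_text and return the list of root 'doc' elements."""
    top = ET.fromstring(xml_text)
    # The subset allows at most one attribute per element, named 'schema'.
    for elem in top.iter():
        extra = set(elem.attrib) - {"schema"}
        if extra:
            raise ValueError(
                f"<{elem.tag}> has attributes outside the subset: {sorted(extra)}"
            )
    # Strip the synthetic 'list' wrapper, if present (assumed to hold the roots).
    roots = list(top) if top.tag == "list" else [top]
    for root in roots:
        if root.tag != "doc":
            raise ValueError(f"root element must be 'doc', found <{root.tag}>")
    return roots
```

An empty Document would then be written as an empty 'list' element, for which the function returns an empty list.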
Below is a simple example of an XML-encoded representation of a Waterken™ Doc document.
<doc schema="http://example.com/project/NodeSchema">
First branch comment <branch_name>child node data</branch_name>
Root node comment
</doc>
The document has a single root Node. The root Node's Schema is 'http://example.com/project/NodeSchema'. The root Node's Annotation is '\nRoot node comment\n'. The root Node has a single Branch named 'branch_name', with Annotation '\nFirst branch comment '. This Branch points to a child Node with Annotation 'child node data'. The child Node has an implicit Schema and no Branches.
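The reading of the example given above can be reproduced with a short Python sketch using xml.etree.ElementTree. The decode_node function and the dictionary layout are hypothetical choices made for illustration, and an implicit Schema is represented as None by assumption. The sketch walks the element content, treating the CharData before each child element as that Branch's Annotation and the remaining CharData as the Node's Annotation.

```python
import xml.etree.ElementTree as ET

EXAMPLE = (
    '<doc schema="http://example.com/project/NodeSchema">\n'
    'First branch comment <branch_name>child node data</branch_name>\n'
    'Root node comment\n'
    '</doc>'
)


def decode_node(elem):
    """Decode the Node carried by an element (hypothetical dict layout)."""
    branches = []
    pending = elem.text or ""  # CharData before the first child element
    for child in elem:
        branches.append({
            "name": child.tag,
            "annotation": pending,  # CharData preceding this child's start-tag
            "child": decode_node(child),
        })
        pending = child.tail or ""  # CharData following this child's end-tag
    return {
        "schema": elem.get("schema"),  # None when the Schema is implicit (assumption)
        "annotation": pending,  # CharData after the last child, or all of it
        "branches": branches,
    }


# Strip the synthetic 'doc' root Branch by decoding its content as the root Node.
root = decode_node(ET.fromstring(EXAMPLE))
```

Here root["annotation"] is the string '\nRoot node comment\n' and root["branches"][0]["annotation"] is '\nFirst branch comment ', matching the description of the example.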