Waterken™ Web
Object Serialization
2003-12-22
This specification defines a serialization mechanism for transporting object state
between disparate computing systems. [code]
A <Schema> is generated for
each object class. The schema identifier is generated based on
the fully qualified class name. Each object member is represented by a separate
Branch.
- The representation of object state is independent of the programming language object model.
- The consistency of transported object state is guaranteed.
- Upgrade of a serialized object class is supported.
- Object state is represented as a Waterken™ Doc document.
In Internet-scale applications, peers will be implemented in a variety of different programming
languages. Supporting this application environment requires an exchanged state representation that is
independent of programming language.
The serialization mechanism guarantees that deserializing a serialized object yields an object
equivalent to the original.
As an application develops, some serialized object classes may require upgrading. An object class
upgrade may mean: adding additional object members; deleting existing object members; changing the
static type of existing object members; and/or changing the meaning of existing object members.
When an object class is upgraded, propagating the update to all users of the object class or
upgrading all serialized instances of the old object class may not be possible. The serialization
mechanism provides a well defined means for handling on-the-fly upgrading.
Waterken™ Doc provides a simple and extensible model for representing
state. Both binary and textual syntaxes are supported.
An object graph is serialized as one or more Waterken™ Doc
documents. Each object is represented by a
Node. The schema for a
Node is generated based on the object class. This
specification describes how the
<Schema> is generated from
the object class.
The Waterken™ Doc Document Schema
specification lists several predefined schemas. An
object class that can be represented by a predefined schema SHOULD be encoded using the predefined
schema. The serialization mechanism MUST maintain a mapping from predefined schema to equivalent local
implementation class.
The set of predefined schemas that an application uses is not limited to primitive types; it
SHOULD include schemas defined by existing applications with which the new application wishes to be
interoperable. Schemas from existing applications can be integrated in the same way that
schemas for primitive types are.
When a pass-by-copy object with no predefined schema is transported, a corresponding
schema is generated. [reify] The general object encoder is
coded to obey the rules of this schema.
For each generated schema, a globally unique URI MUST be generated for the
schema identifier. To facilitate discussion between human
programmers, the generated URI SHOULD be human memorable.
For programming environments where the fully qualified name of an object class incorporates a DNS
hostname, an http URI SHOULD be generated for the schema identifier.
The generated http URI uses the hostname specified in the fully qualified class name.
The remaining parts of the fully qualified class name are encoded in the http path,
each part separated by the '/' path segment delimiter.
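The mapping from class name to schema identifier can be sketched in Java as follows. This is an illustration only, not the normative algorithm: it assumes that the first two dot-separated parts of the class name form a reversed DNS name (as in the Java package convention), and the name SchemaIds is hypothetical.

```java
import java.util.Arrays;

final class SchemaIds {
    private SchemaIds() {}

    /**
     * Builds an http URI from a fully qualified class name.
     * Assumption: the first two dot-separated parts are a reversed DNS
     * hostname, for example "com.waterken" for waterken.com.
     */
    static String forClassName(final String className) {
        final String[] parts = className.split("\\.");
        if (parts.length < 3) {
            throw new IllegalArgumentException("no DNS hostname in: " + className);
        }
        // "com", "waterken" becomes "waterken.com"
        final String host = parts[1] + "." + parts[0];
        // The remaining parts become path segments separated by '/'.
        final String path =
            String.join("/", Arrays.copyOfRange(parts, 2, parts.length));
        return "http://" + host + "/" + path;
    }
}
```

Under these assumptions, the class name com.waterken.doc.pointer.Embed maps to http://waterken.com/doc/pointer/Embed. Hostnames with more than two labels would need a different rule for deciding where the hostname ends.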
For each declared member of the object class, the generated
<Schema>
declares a corresponding
'child' branch.
The member name is the
<Branch>
'name'.
If the static member type is an array type, the
<Branch>
'arity' is
<Many>. The array's static
component type is used to generate the
'implicit'
schema identifier. Each element of the array is output as an
occurrence of the defined Branch.
For all other members, the
<Branch>
'arity' is
<Once>. The static member
type is used to generate the
'implicit'
schema identifier.
For each direct object superclass, the generated
<Schema> declares a
corresponding 'child' branch
with 'name'
'super'. The superclass type is used to generate the
'implicit'
schema identifier.
Conceptually, this approach transforms an inheritance hierarchy into an aggregation
model. The superclass becomes a synthetic object member named 'super'.
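The member and superclass rules above can be sketched with Java reflection. This is a minimal sketch under stated assumptions: the types SchemaSketch, Branch and Arity are hypothetical stand-ins for the Doc model, static and compiler-generated fields are assumed not to be object state, the root class Object is assumed to contribute no 'super' branch, and the 'super' branch is assumed to have arity Once. The sample classes exist only to exercise the sketch.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

final class SchemaSketch {
    private SchemaSketch() {}

    enum Arity { ONCE, MANY }

    /** One 'child' branch of a generated schema. */
    static final class Branch {
        final String name;           // the Branch 'name'
        final Arity arity;           // the Branch 'arity'
        final Class<?> implicitType; // type behind the 'implicit' schema identifier

        Branch(final String name, final Arity arity, final Class<?> implicitType) {
            this.name = name;
            this.arity = arity;
            this.implicitType = implicitType;
        }

        @Override public String toString() {
            return name + ":" + arity + ":" + implicitType.getSimpleName();
        }
    }

    /** Lists the branches a generated schema would declare for the given class. */
    static List<Branch> branchesFor(final Class<?> type) {
        final List<Branch> branches = new ArrayList<>();
        final Class<?> parent = type.getSuperclass();
        if (parent != null && parent != Object.class) {
            // Assumption: Object contributes no 'super' branch; arity is Once.
            branches.add(new Branch("super", Arity.ONCE, parent));
        }
        for (final Field field : type.getDeclaredFields()) {
            // Assumption: static and synthetic fields are not object state.
            if (field.isSynthetic() || Modifier.isStatic(field.getModifiers())) {
                continue;
            }
            final Class<?> declared = field.getType();
            if (declared.isArray()) {
                // Array member: arity Many, schema from the component type.
                branches.add(new Branch(field.getName(), Arity.MANY,
                                        declared.getComponentType()));
            } else {
                // Any other member: arity Once, schema from the static type.
                branches.add(new Branch(field.getName(), Arity.ONCE, declared));
            }
        }
        return branches;
    }

    // Hypothetical sample classes used only to exercise the sketch.
    static class Point { int x; int y; }
    static class Shape { String label; }
    static class Polygon extends Shape { Point[] vertices; boolean closed; }
}
```

For Polygon, the sketch yields a 'super' branch typed by Shape, a 'vertices' branch with arity Many typed by Point, and a 'closed' branch with arity Once. Reflection does not guarantee the order in which declared fields are returned, so a real encoder would need its own stable ordering rule.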
A pass-by-reference object is encoded as a
<http://waterken.com/doc/pointer/Embed>
Node. The
'target' is the URI for the
referenced object.
The receiving object deserialization code recognizes the
<Embed>
Node and establishes a connection to the referenced
object for delivery of messages.
Object state that is pass-by-copy is immutable by definition. Given only immutable object
types, constructing an object graph that contains a cycle is impossible. If the transported
object graph consists solely of immutable objects, handling object graph cycles is unnecessary.
Many programming languages enable immutable object graph cycles by supporting "promises" in either
a pure or degenerate form. A promise is a reference that is not yet bound to a target object.
A "pure" promise queues received method invocations until the reference is resolved to a
target object. Once resolved, the promise delivers the queued method invocations to the target
object. Some programming languages, such as Java and C++, support a form of degenerate promise in
which method invocations are not queued, but are instead delivered to the partially constructed
object. The degenerate promise is a 'this' pointer that is passed to another object before the
constructor of the referred-to object finishes executing.
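The degenerate form can be illustrated with a short Java sketch. The class names CycleSketch, Parent and Child are hypothetical. Both classes have only final fields, yet the resulting graph contains a cycle because the parent hands out its 'this' pointer before its own constructor has finished.

```java
final class CycleSketch {
    private CycleSketch() {}

    static final class Parent {
        final Child child;

        Parent() {
            // 'this' escapes here, before the constructor has finished.
            this.child = new Child(this);
        }
    }

    static final class Child {
        final Parent parent;
        final boolean sawUnfinishedParent;

        Child(final Parent parent) {
            this.parent = parent;
            // The parent's final field is still unassigned at this point,
            // so it reads as null: the parent is only partially constructed.
            this.sawUnfinishedParent = parent.child == null;
        }
    }
}
```

After `new CycleSketch.Parent()` returns, the parent refers to the child and the child refers back to the parent, even though neither object can be mutated afterwards.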
Immutable object graph cycles are encoded by breaking the cycles at the promise objects. Each
resolved promise is encoded as though it were still unresolved. This treatment breaks the cycle, creating separate
acyclic sub-graphs. The receiving object deserialization code is then responsible for re-resolving
the promise once it receives all of the immutable acyclic sub-graphs of the overall cyclic
object graph. Each immutable acyclic sub-graph is received as a separate
Waterken™ Doc document.
An unresolved promise is also encoded as an
<Embed>. The receiving
object deserialization code recognizes the
<Embed> and fetches the
promised document.
Once published, a schema MUST be considered immutable. The following compatibility mechanism
supports evolution of the corresponding object class.
When serializing an object, the schema identifier of any
Node referred to by a 'super'
Branch MUST be explicitly specified. The
implicit child node schema identifier MUST NOT be relied upon.
When a site deserializing an object graph encounters a
Node with an unknown
schema identifier, it will check the
Node for a 'super'
Branch. If a 'super'
Branch exists, the site will attempt
to deserialize the indicated object in place of the unrecognized object. This process will continue
recursively until either a recognized object class version is found, or the deserialization fails
because there is no recognized and compatible object class version. If a
Node has multiple 'super'
Branches, they are searched in a pre-order
traversal until a recognized schema identifier is found.
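The fallback search can be sketched as a recursive pre-order walk. This is a minimal sketch, not the normative procedure: the Node class here is a hypothetical stand-in that keeps only a schema identifier and its 'super' branches in document order, and the set of known identifiers stands in for whatever registry a real implementation maintains.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

final class UpgradeFallback {
    private UpgradeFallback() {}

    /** Hypothetical stand-in for a Node: schema identifier plus 'super' branches. */
    static final class Node {
        final String schema;
        final List<Node> supers;

        Node(final String schema, final Node... supers) {
            this.schema = schema;
            this.supers = Collections.unmodifiableList(Arrays.asList(supers));
        }
    }

    /**
     * Returns the first node with a recognized schema identifier, visiting
     * the node itself and then each 'super' branch in pre-order.
     * Returns null when nothing is recognized; the caller must then fail
     * the deserialization.
     */
    static Node resolve(final Node node, final Set<String> known) {
        if (known.contains(node.schema)) {
            return node;
        }
        for (final Node parent : node.supers) {
            final Node found = resolve(parent, known);
            if (found != null) {
                return found;
            }
        }
        return null;
    }
}
```

A site that knows only an older class version would call resolve on the received Node and deserialize the returned Node in place of the unrecognized one.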
The compatibility mechanism overloads the inheritance hierarchy to
additionally represent the object class version compatibility chain.
[catch] As a result, new subclasses are created more frequently than
might otherwise be the case. If evolution of an existing object class requires a change to the
corresponding serialization schema, a new subclass MUST be created instead of modifying
the existing class. The changes required for object class evolution are implemented in the
new subclass.
Class changes that require a new subclass
- Adding a new member
- Deleting an existing member
- Changing the static type of an existing member
- Changing the meaning of an existing member
Class changes that do not require a new subclass
- Adding a new method
- Deleting an existing method
- Changing the implementation of an existing method
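As an illustration of the first list, adding a member might look like the following Java sketch. The class names EvolutionSketch, Account and Account2 are hypothetical. The published class is left untouched, and the new member lives in a subclass, so a site that does not know Account2 can still fall back to Account through the 'super' branch.

```java
final class EvolutionSketch {
    private EvolutionSketch() {}

    /** Published class: its schema is immutable, so its members never change. */
    static class Account {
        final String owner;

        Account(final String owner) {
            this.owner = owner;
        }
    }

    /** Upgraded version: the added member is declared in a new subclass. */
    static class Account2 extends Account {
        final String currency;

        Account2(final String owner, final String currency) {
            super(owner);
            this.currency = currency;
        }
    }
}
```

Adding or changing a method on Account, by contrast, leaves its members and therefore its schema unchanged, so no new subclass is needed.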
If a site deserializing an object graph encounters a
Node with a missing
Branch, the condition MUST be treated as an
unrecoverable error. The deserializing site MUST NOT assign a default value to an object member,
unless the schema specifies an
<http://waterken.com/doc/schema/Implied>
value.
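The missing-branch rule can be sketched as follows. This is an illustration under loud assumptions: a received Node is modeled as a plain map from branch name to value, the schema's Implied values are modeled as a second map, and the names MemberReader and read are hypothetical. How Implied values are actually declared is defined by the Doc schema specification, not here.

```java
import java.util.Map;

final class MemberReader {
    private MemberReader() {}

    /**
     * Reads one member from a received node.
     * Assumption: 'node' maps branch names to decoded values, and 'implied'
     * holds only the values that the schema explicitly declares as Implied.
     */
    static Object read(final String name,
                       final Map<String, Object> node,
                       final Map<String, Object> implied) {
        if (node.containsKey(name)) {
            return node.get(name);
        }
        if (implied.containsKey(name)) {
            // Permitted: the schema itself specifies this value.
            return implied.get(name);
        }
        // No invented default: a missing branch is an unrecoverable error.
        throw new IllegalStateException("missing branch: " + name);
    }
}
```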
All of the predefined schema types in the
Waterken™ Doc Document Schema
specification were generated based on the generic object serialization mechanism defined in this
specification.
[code] This specification does not address transporting object
behavior, such as program code.
[reify] This schema need not be reified, as it only represents
the encoding rules that a general object encoder MUST obey.
[catch] The semantics of the compatibility mechanism are very
similar to those used in the exception handling logic of many popular programming languages. In the Java
programming language, a catch clause catches all exceptions of an indicated class and any
subclass. The implicit assumption is that the subclass has a meaning which is compatible with the
base-class. The compatibility mechanism defined here extends these semantics to also solve the upgrade
problem.