web-calculus
Object Serialization
2005-04-08
This specification defines a serialization
mechanism for transporting object state between disparate
computing systems. [code]
A <Schema> is generated for each object class. The schema
identifier is generated based on the fully qualified class name.
Each object member is represented by a separate Branch.
- The representation of object state is independent of the
programming language object model.
- The consistency of transported object state is
guaranteed.
- Upgrade of a serialized object class is supported.
- Object state is represented as a web-calculus
document.
In Internet-scale applications, peers will be implemented in a
variety of different programming languages. Supporting this
application environment requires an exchanged state
representation that is independent of programming language.
The serialization mechanism guarantees that deserializing a
serialized object yields an object equivalent to the original.
As an application develops, some serialized object classes may
require upgrading. An object class upgrade may mean: adding
new object members; deleting existing object members;
changing the static type of existing object members; and/or
changing the meaning of existing object members. When an object
class is upgraded, propagating the update to all users of the
object class or upgrading all serialized instances of the old
object class may not be possible. The serialization mechanism
provides a well defined means for handling on-the-fly upgrading.
The web-calculus document model provides a simple and extensible
model for representing state. Both binary and textual syntaxes
are supported.
An object graph is serialized as one or more web-calculus
documents. Each object is represented by a Node. The Node schema
is generated based on the object class. This specification
describes how the <Schema> is generated from the object class.
The web-calculus Document Schema
specification lists several predefined schemas. An object class
that can be represented by a predefined schema SHOULD be encoded
using the predefined schema. The serialization mechanism MUST
maintain a mapping from predefined schema to equivalent local
implementation class.
The set of predefined schemas that an application uses is not
limited to primitive types; it SHOULD include schemas defined by
existing applications with which the new application wishes to
be interoperable. Schemas from existing applications can be
integrated in the same way that schemas for primitive types are.
When a pass-by-copy object with no predefined
schema is transported, a corresponding schema is generated.
[reify] The general object encoder is
coded to obey the rules of this schema.
For each generated schema, a globally unique URI MUST be
generated for the schema identifier. To facilitate discussion
between human programmers, the generated URI SHOULD be human
memorable.
For programming environments where the fully qualified name of
an object class incorporates a DNS hostname, an
http URI SHOULD be generated for the schema
identifier. The generated http URI uses the
hostname specified in the fully qualified class name. The
remaining parts of the fully qualified class name are encoded in
the http path, each part separated by the
'/' path segment delimiter.
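As a rough illustration of this mapping, the following Python
sketch assumes a Java-style class name whose leading labels are a
reversed DNS hostname. The function name schema_uri and the
host_labels parameter are hypothetical, introduced only for this
example; this specification does not say how the hostname is
delimited within the class name.

```python
from urllib.parse import quote


def schema_uri(class_name: str, host_labels: int = 2) -> str:
    """Sketch: derive an http schema identifier from a class name."""
    # Assumption: the first `host_labels` labels are a reversed DNS
    # hostname, as in Java package names (org.example -> example.org).
    parts = class_name.split(".")
    if len(parts) <= host_labels:
        raise ValueError("no class name parts remain after the hostname")
    host = ".".join(reversed(parts[:host_labels]))
    # The remaining parts become path segments separated by '/'.
    path = "/".join(quote(part, safe="") for part in parts[host_labels:])
    return f"http://{host}/{path}"


print(schema_uri("org.example.shop.Order"))  # http://example.org/shop/Order
```

An implementation for another language would need its own rule
for locating the hostname inside the class name.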
For each declared member of the object class, the generated
<Schema> declares a corresponding 'child' branch. The member name
is the <Branch> 'name'.
If the static member type is an array type, the <Branch> 'arity'
is <Many>. The array's static component type is used to generate
the 'expected' schema identifier. Each element of the array is
output as an occurrence of the defined Branch.
For all other members, the <Branch> 'arity' is <Once>. The static
member type is used to generate the 'expected' schema identifier.
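The member rules above can be sketched in Python as follows. This
is only an illustration under stated assumptions: a dataclass
stands in for the object class, list[T] stands in for an array
type, and type_id is a hypothetical placeholder for whatever
procedure produces the real schema identifier of a static type.
The dictionary layout of a branch is likewise invented for the
example.

```python
from dataclasses import dataclass, fields
from typing import get_args, get_origin, get_type_hints


def type_id(tp) -> str:
    # Hypothetical stand-in for a real schema identifier.
    return f"urn:example:{tp.__name__}"


def member_branches(cls) -> list:
    """Sketch: one 'child' branch description per member."""
    hints = get_type_hints(cls)
    branches = []
    for f in fields(cls):
        tp = hints[f.name]
        if get_origin(tp) is list:
            # Array type: arity Many, 'expected' from the component type.
            (component,) = get_args(tp)
            arity, expected = "Many", type_id(component)
        else:
            # Any other member: arity Once, 'expected' from the member type.
            arity, expected = "Once", type_id(tp)
        branches.append({"name": f.name, "arity": arity, "expected": expected})
    return branches


@dataclass(frozen=True)
class Order:
    customer: str
    quantities: list[int]


for branch in member_branches(Order):
    print(branch)
```

Note that dataclasses.fields also reports inherited fields, so a
faithful generator would restrict itself to the members declared
on the class itself.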
For each direct object superclass, the generated <Schema>
declares a corresponding 'child' branch with 'name' 'super'. The
superclass type is used to generate the 'expected' schema
identifier.
Conceptually, this approach transforms an inheritance hierarchy
into an aggregation model. The superclass becomes a synthetic
object member named 'super'.
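A minimal Python sketch of this transformation follows. The
MEMBERS tuple, the encode function, and the dictionary shape of a
node are all hypothetical conventions chosen for the example, and
the bare class name is used where a real encoder would place the
generated schema identifier.

```python
class Shape:
    MEMBERS = ("label",)  # hypothetical list of declared members

    def __init__(self, label):
        self.label = label


class Circle(Shape):
    MEMBERS = ("radius",)

    def __init__(self, label, radius):
        super().__init__(label)
        self.radius = radius


def encode(obj, cls=None) -> dict:
    """Sketch: encode obj as seen through class cls (default: its own)."""
    cls = cls or type(obj)
    # Members declared directly on this class become ordinary branches.
    branches = [(m, getattr(obj, m)) for m in cls.__dict__.get("MEMBERS", ())]
    # Each direct superclass becomes a synthetic 'super' child node.
    for base in cls.__bases__:
        if base is not object:
            branches.append(("super", encode(obj, base)))
    return {"schema": cls.__name__, "branches": branches}


node = encode(Circle("unit", 1.0))
print(node)
```

Branches are kept as a list of pairs rather than a dictionary so
that a class with several direct superclasses can carry several
'super' entries.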
A pass-by-reference object is encoded as an
<http://web-calculus.org/pointer/Embed> Node. The 'target' is the
URI for the referenced object.
The receiving object deserialization code recognizes the <Embed>
Node and establishes a connection to the referenced object for
delivery of messages.
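The exchange might look roughly like the Python sketch below. The
schema identifier and the 'target' name come from the text above;
the dictionary node layout, the function names, the example URI,
and the connect callback are assumptions made for illustration.

```python
EMBED = "http://web-calculus.org/pointer/Embed"


def encode_reference(target_uri: str) -> dict:
    """Sketch: encode a pass-by-reference object as an Embed node."""
    return {"schema": EMBED, "branches": [("target", target_uri)]}


def decode(node: dict, connect):
    """Sketch: the receiver recognizes an Embed and connects to its target."""
    if node["schema"] == EMBED:
        target = dict(node["branches"])["target"]
        # `connect` is a hypothetical callback that returns a proxy
        # through which messages can be delivered.
        return connect(target)
    raise NotImplementedError("pass-by-copy decoding is out of scope here")


wire = encode_reference("http://example.org/objects/42")
proxy = decode(wire, lambda uri: ("proxy", uri))
print(proxy)
```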
Object state that is pass-by-copy is immutable by definition.
Given only immutable object types, constructing an object graph
that contains a cycle is impossible. If the transported object
graph consists solely of immutable objects, handling object
graph cycles is unnecessary.
Many programming languages enable immutable object graph cycles
by supporting "promises" in either a pure or degenerate form. A
promise is a reference that is not yet bound to a target object.
A "pure" promise queues received method invocations until the
reference is resolved to a target object. Once resolved, the
promise delivers the queued method invocations to the target
object. Some programming languages, such as Java and C++,
support a form of degenerate promise in which method invocations
are not queued, but are instead delivered to the partially
constructed object. The degenerate promise is a
this pointer passed to another object. The
this pointer is passed before the constructor of
the referred-to object finishes executing.
Immutable object graph cycles are encoded by breaking the cycles
at the promise objects. For encoding purposes, a resolved promise
is treated as if it were unresolved. This treatment breaks the
cycle, creating separate
acyclic sub-graphs. The receiving object deserialization code
is then responsible for re-resolving the promise once it
receives all of the immutable acyclic sub-graphs of the overall
cyclic object graph. Each immutable acyclic sub-graph is
received as a separate web-calculus document.
An unresolved promise is also encoded as an <Embed>. The
receiving object deserialization code recognizes the <Embed> and
fetches the promised document.
Once published, a schema MUST be considered immutable. The
following compatibility mechanism supports evolution of the
corresponding object class.
When serializing an object, the schema identifier of any Node
referred to by a 'super' Branch MUST be explicitly specified. The
implicit child node schema identifier MUST NOT be relied upon.
When a site deserializing an object graph encounters a Node with
an unknown schema identifier, it will check the Node for a
'super' Branch. If a 'super' Branch exists, the site will attempt
to deserialize the indicated object in place of the unrecognized
object. This process will continue recursively until either a
recognized object class version is found, or the deserialization
fails because there is no recognized and compatible object class
version. If a Node has multiple 'super' Branches, they are
searched in a pre-order traversal until a recognized schema
identifier is found.
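The fallback search can be sketched in Python as shown below. The
dictionary node layout, the function name resolve, the known set
of recognized identifiers, and the urn:example identifiers are
assumptions made for the example, not part of this specification.

```python
from typing import Optional


def resolve(node: dict, known: set) -> Optional[dict]:
    """Sketch: pre-order search for a node with a recognized schema."""
    # Visit the node itself first.
    if node["schema"] in known:
        return node
    # Then descend into each 'super' branch, in document order.
    for name, child in node["branches"]:
        if name == "super":
            found = resolve(child, known)
            if found is not None:
                return found
    # No recognized and compatible version: the caller must fail.
    return None


v1 = {"schema": "urn:example:OrderV1", "branches": [("id", 7)]}
v2 = {"schema": "urn:example:OrderV2",
      "branches": [("note", "gift"), ("super", v1)]}

# A site that only knows the first version falls back to it.
print(resolve(v2, {"urn:example:OrderV1"}) is v1)  # True
```

A site that recognizes neither version gets None back and would
report a deserialization failure.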
The compatibility mechanism overloads the
inheritance hierarchy to additionally represent the object class
version compatibility chain. [catch]
As a result, new subclasses are created more frequently than
might otherwise be the case. If evolution of an existing object
class requires a change to the corresponding serialization
schema, a new subclass MUST be created instead of modifying the
existing class. The changes required for object class evolution
are implemented in the new subclass.
Class changes that require a new subclass
- Adding a new member
- Deleting an existing member
- Changing the static type of an existing member
- Changing the meaning of an existing member
Class changes that do not require a new subclass
- Adding a new method
- Deleting an existing method
- Changing the implementation of an existing method
If a site deserializing an object graph encounters a Node with a
missing Branch, the condition MUST be treated as an unrecoverable
error. The deserializing site MUST NOT assign a default value to
an object member.
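A decoder following this rule might check for missing branches
along the lines of the Python sketch below. The exception class,
the function name, and the dictionary node layout are hypothetical,
and every declared member is assumed to have 'arity' <Once>.

```python
class MissingBranchError(Exception):
    """Raised when a declared Branch is absent from a Node."""


def read_members(node: dict, declared: list) -> dict:
    """Sketch: collect member values, failing on any missing branch."""
    present = {name: value for name, value in node["branches"]}
    values = {}
    for name in declared:
        if name not in present:
            # No default value is substituted; the whole decode fails.
            raise MissingBranchError(
                f"missing branch {name!r} in {node['schema']}")
        values[name] = present[name]
    return values


node = {"schema": "urn:example:Order", "branches": [("customer", "ann")]}
print(read_members(node, ["customer"]))
```

Asking the same node for a member it does not carry, for example
read_members(node, ["customer", "total"]), raises
MissingBranchError instead of filling in a value.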
All of the predefined schemas in the web-calculus Document Schema
specification were generated based on the generic object
serialization mechanism defined in this specification.
[code] This specification does
not address transporting object behavior, such as program code.
[reify] This schema need not
be reified, as it only represents the encoding rules that a general
object encoder MUST obey.
[catch] The semantics of the
compatibility mechanism are very similar to those used in the
exception handling logic of many popular programming languages. In
the Java programming language, a catch clause catches
all exceptions of an indicated class and any subclass. The implicit
assumption is that the subclass has a meaning which is compatible
with the base-class. The compatibility mechanism defined here
extends these semantics to also solve the upgrade problem.