

web-calculus

Object Serialization

2005-04-08

This specification defines a serialization mechanism for transporting object state between disparate computing systems. [code]

Abstract

A <Schema> is generated for each object class. The schema identifier is generated based on the fully qualified class name. Each object member is represented by a separate Branch.

Overview

Design goals

  1. The representation of object state is independent of the programming language object model.
  2. The consistency of transported object state is guaranteed.
  3. Upgrade of a serialized object class is supported.
  4. Object state is represented as a web-calculus document.

Programming language independence

In Internet-scale applications, peers will be implemented in a variety of different programming languages. Supporting this application environment requires an exchanged state representation that is independent of programming language.

Guaranteeing consistency

The serialization mechanism guarantees that deserializing a serialized object yields an object equivalent to the original.

Supporting object class upgrade

As an application develops, some serialized object classes may require upgrading. An object class upgrade may mean any combination of: adding object members; deleting existing object members; changing the static type of existing object members; or changing the meaning of existing object members. When an object class is upgraded, it may not be possible to propagate the update to all users of the object class, or to upgrade all serialized instances of the old object class. The serialization mechanism provides a well-defined means of handling on-the-fly upgrading.

web-calculus based

The web-calculus document model provides a simple and extensible model for representing state. Both binary and textual syntaxes are supported.

Description

An object graph is serialized as one or more web-calculus documents. Each object is represented by a Node. The Node schema is generated based on the object class. This specification describes how the <Schema> is generated from the object class.

Encoding an object with a predefined schema

The web-calculus Document Schema specification lists several predefined schemas. An object class that can be represented by a predefined schema SHOULD be encoded using the predefined schema. The serialization mechanism MUST maintain a mapping from predefined schema to equivalent local implementation class.

The set of predefined schemas that an application uses is not limited to primitive types; it SHOULD include schemas defined by existing applications with which the new application wishes to be interoperable. Schemas from existing applications can be integrated in the same way that schemas for primitive types are.
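The mapping between predefined schemas and local implementation classes can be pictured as a two-way lookup table. The sketch below is illustrative only: the schema URIs under example.org are hypothetical placeholders, not identifiers from the Document Schema specification, and the names PREDEFINED, schema_for and class_for are assumptions made for this example.

```python
# Hypothetical schema identifiers; a real application would use the
# identifiers listed in the Document Schema specification or published
# by the applications it wants to interoperate with.
PREDEFINED = {
    "http://example.org/schema/String": str,
    "http://example.org/schema/Integer": int,
}

# Reverse mapping, used when encoding a local object.
LOCAL_TO_SCHEMA = {cls: uri for uri, cls in PREDEFINED.items()}


def schema_for(value):
    """Return the predefined schema identifier for a value, or None."""
    return LOCAL_TO_SCHEMA.get(type(value))


def class_for(schema_uri):
    """Return the local implementation class for a schema, or None."""
    return PREDEFINED.get(schema_uri)
```

An encoder would consult schema_for first and fall back to schema generation only when it returns None.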

Generating a schema for a pass-by-copy object with no predefined schema

When a pass-by-copy object with no predefined schema is transported, a corresponding schema is generated. [reify] The general object encoder is coded to obey the rules of this schema.

Generating a schema identifier

For each generated schema, a globally unique URI MUST be generated for the schema identifier. To facilitate discussion between human programmers, the generated URI SHOULD be human memorable.

For programming environments where the fully qualified name of an object class incorporates a DNS hostname, an http URI SHOULD be generated for the schema identifier. The generated http URI uses the hostname specified in the fully qualified class name. The remaining parts of the fully qualified class name are encoded in the http path, each part separated by the '/' path segment delimiter.
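As a rough illustration of the rule above, the following sketch derives an http URI from a dotted class name. It assumes Java-style naming, in which the leading name parts are a reversed DNS hostname; the number of parts that make up the hostname (host_parts, defaulting to 2) is an assumption of this example, as is the function name schema_identifier. Escaping of characters that are not valid in a path segment is not handled.

```python
def schema_identifier(class_name: str, host_parts: int = 2) -> str:
    """Derive an http URI from a dotted, fully qualified class name.

    Assumption: the first `host_parts` name parts are a reversed DNS
    hostname, for example org.example -> example.org.
    """
    parts = class_name.split(".")
    if len(parts) <= host_parts:
        raise ValueError("class name has no parts after the hostname")
    hostname = ".".join(reversed(parts[:host_parts]))
    # The remaining parts become path segments separated by '/'.
    path = "/".join(parts[host_parts:])
    return f"http://{hostname}/{path}"


print(schema_identifier("org.example.bank.Account"))
# http://example.org/bank/Account
```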

Representing object members

For each declared member of the object class, the generated <Schema> declares a corresponding 'child' branch. The member name is the <Branch> 'name'.

If the static member type is an array type, the <Branch> 'arity' is <Many>. The array's static component type is used to generate the 'expected' schema identifier. Each element of the array is output as an occurrence of the defined Branch.

For all other members, the <Branch> 'arity' is <Once>. The static member type is used to generate the 'expected' schema identifier.
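The two arity rules can be sketched with Python dataclasses, whose type annotations play the role of static member types. The classes Point and Path are hypothetical, the Branch record is a simplification of the <Branch> declaration, and the 'expected' value here is just the type name standing in for a generated schema identifier. Python 3.9 or later is assumed for the list[Point] annotation.

```python
from dataclasses import dataclass, fields
from typing import get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class Branch:
    name: str      # the member name
    arity: str     # "Once" or "Many"
    expected: str  # stand-in for the expected schema identifier


@dataclass(frozen=True)
class Point:       # hypothetical pass-by-copy class
    x: int
    y: int


@dataclass(frozen=True)
class Path:        # hypothetical class with an array-typed member
    label: str
    points: list[Point]


def member_branches(cls) -> list[Branch]:
    """Declare one Branch per member of a dataclass."""
    hints = get_type_hints(cls)
    branches = []
    for member in fields(cls):
        static_type = hints[member.name]
        if get_origin(static_type) is list:
            # Array type: arity Many, expected from the component type.
            component = get_args(static_type)[0]
            branches.append(Branch(member.name, "Many", component.__name__))
        else:
            # Any other type: arity Once, expected from the member type.
            branches.append(Branch(member.name, "Once", static_type.__name__))
    return branches
```

Calling member_branches(Path) yields a 'label' Branch with arity Once and a 'points' Branch with arity Many whose expected type is Point.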

Representing an inheritance hierarchy

For each direct object superclass, the generated <Schema> declares a corresponding 'child' branch with 'name' 'super'. The superclass type is used to generate the 'expected' schema identifier.

Conceptually, this approach transforms an inheritance hierarchy into an aggregation model. The superclass becomes a synthetic object member named 'super'.
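To make the aggregation view concrete, the sketch below encodes an instance of a subclass as a nested structure in which the superclass state sits under 'super'. Plain Python dicts stand in for web-calculus Nodes, the class name stands in for the schema identifier, and the per-class members tuple is a convention invented for this example to list the members each class declares itself.

```python
class Account:                      # hypothetical base class
    members = ("owner",)

    def __init__(self, owner):
        self.owner = owner


class SavingsAccount(Account):      # hypothetical subclass
    members = ("rate",)

    def __init__(self, owner, rate):
        super().__init__(owner)
        self.rate = rate


def encode(obj, cls=None):
    """Encode obj as seen through cls, nesting each direct superclass."""
    cls = type(obj) if cls is None else cls
    node = {"schema": cls.__name__}
    # Only the members declared by this class, not inherited ones.
    for name in cls.__dict__.get("members", ()):
        node[name] = getattr(obj, name)
    # One 'super' entry per direct superclass.
    supers = [encode(obj, base) for base in cls.__bases__ if base is not object]
    if supers:
        node["super"] = supers
    return node


print(encode(SavingsAccount("alice", 0.02)))
```

The SavingsAccount node carries only 'rate'; the 'owner' member appears inside the nested Account node under 'super'.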

Encoding a pass-by-reference object

A pass-by-reference object is encoded as an <http://web-calculus.org/pointer/Embed> Node. The 'target' is the URI for the referenced object.

The receiving object deserialization code recognizes the <Embed> Node and establishes a connection to the referenced object for delivery of messages.
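A minimal sketch of both sides of this exchange follows. Dicts stand in for Nodes, RemoteReference is a hypothetical local proxy class, and the target URI in the usage lines is a placeholder. Establishing the actual connection for message delivery is outside the scope of the sketch.

```python
EMBED = "http://web-calculus.org/pointer/Embed"


class RemoteReference:
    """Hypothetical local stand-in for an object hosted at another site."""

    def __init__(self, target):
        self.target = target  # URI of the referenced object


def encode_reference(ref):
    """Encode a pass-by-reference object as an Embed node."""
    return {"schema": EMBED, "target": ref.target}


def decode_node(node):
    """Turn an Embed node back into a reference; pass other nodes through."""
    if node.get("schema") == EMBED:
        return RemoteReference(node["target"])
    return node


node = encode_reference(RemoteReference("http://example.org/accounts/42"))
restored = decode_node(node)
print(restored.target)
```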

Encoding object graph cycles

Object state that is pass-by-copy is immutable by definition. Given only immutable object types, constructing an object graph that contains a cycle is impossible. If the transported object graph consists solely of immutable objects, handling object graph cycles is unnecessary.

Many programming languages enable immutable object graph cycles by supporting "promises" in either a pure or degenerate form. A promise is a reference that is not yet bound to a target object. A "pure" promise queues received method invocations until the reference is resolved to a target object. Once resolved, the promise delivers the queued method invocations to the target object. Some programming languages, such as Java and C++, support a form of degenerate promise in which method invocations are not queued, but are instead delivered to the partially constructed object. The degenerate promise is a 'this' pointer that is passed to another object before the constructor of the referred-to object finishes executing.

Immutable object graph cycles are encoded by breaking the cycles at the promise objects. For encoding purposes, each resolved promise is treated as if it were unresolved. This treatment breaks the cycle, creating separate acyclic sub-graphs. Each immutable acyclic sub-graph is sent as a separate web-calculus document. The receiving object deserialization code is then responsible for re-resolving the promise once it has received all of the acyclic sub-graphs that make up the overall cyclic object graph.
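The break-and-re-resolve procedure might be sketched as follows. This is an illustration under several assumptions: dicts and lists stand in for Nodes, the Promise class and the urn:example: document URIs are hypothetical, and every promise target is assumed to be the root of its own document.

```python
EMBED = "http://web-calculus.org/pointer/Embed"


class Promise:
    """A reference that may be bound to its target after construction."""

    def __init__(self):
        self.target = None


def encode_graph(roots):
    """Encode each root as its own acyclic document, keyed by URI."""
    uri_of = {id(value): uri for uri, value in roots.items()}

    def encode(value):
        if isinstance(value, Promise):
            # Break the cycle: emit a pointer, do not follow the target.
            return {"schema": EMBED, "target": uri_of[id(value.target)]}
        if isinstance(value, dict):
            return {key: encode(item) for key, item in value.items()}
        if isinstance(value, list):
            return [encode(item) for item in value]
        return value

    return {uri: encode(value) for uri, value in roots.items()}


def decode_graph(docs):
    """Rebuild every document, then re-resolve the promises."""
    pending = []

    def decode(node):
        if isinstance(node, dict) and node.get("schema") == EMBED:
            promise = Promise()
            pending.append((promise, node["target"]))
            return promise
        if isinstance(node, dict):
            return {key: decode(item) for key, item in node.items()}
        if isinstance(node, list):
            return [decode(item) for item in node]
        return node

    objects = {uri: decode(doc) for uri, doc in docs.items()}
    # All sub-graphs have arrived, so the promises can now be bound.
    for promise, target in pending:
        promise.target = objects[target]
    return objects


# Two objects that refer to each other through promises.
a = {"name": "a", "peer": Promise()}
b = {"name": "b", "peer": Promise()}
a["peer"].target = b
b["peer"].target = a

docs = encode_graph({"urn:example:a": a, "urn:example:b": b})
restored = decode_graph(docs)
```

Each entry in docs is a tree with no cycles; the cycle exists again only after decode_graph binds the promises in its final loop.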

Encoding an unresolved promise

An unresolved promise is also encoded as an <Embed>. The receiving object deserialization code recognizes the <Embed> and fetches the promised document.

Schema upgrade

Once published, a schema MUST be considered immutable. The following compatibility mechanism supports evolution of the corresponding object class.

When serializing an object, the schema identifier of any Node referred to by a 'super' Branch MUST be explicitly specified. The implicit child node schema identifier MUST NOT be relied upon.

When a site deserializing an object graph encounters a Node with an unknown schema identifier, it will check the Node for a 'super' Branch. If a 'super' Branch exists, the site will attempt to deserialize the indicated object in place of the unrecognized object. This process will continue recursively until either a recognized object class version is found, or the deserialization fails because there is no recognized and compatible object class version. If a Node has multiple 'super' Branches, they are searched in a pre-order traversal until a recognized schema identifier is found.
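The fallback search described above can be sketched as a short recursive function. Dicts stand in for Nodes, a list under 'super' stands in for one or more 'super' Branches, and the schema identifiers under example.org are hypothetical. The function name find_recognized and the known set are assumptions of this example.

```python
def find_recognized(node, known):
    """Return the first node, in pre-order, whose schema is recognized.

    Returns None when neither the node nor any node reachable through
    'super' branches has a schema identifier in `known`.
    """
    if node["schema"] in known:
        return node
    for parent in node.get("super", []):
        found = find_recognized(parent, known)
        if found is not None:
            return found
    return None


# A newer class version that this site does not recognize.
node = {
    "schema": "http://example.org/bank/AccountV2",
    "bonus": 5,
    "super": [
        {"schema": "http://example.org/bank/Account", "owner": "alice"},
    ],
}
known = {"http://example.org/bank/Account"}

print(find_recognized(node, known))
```

Here the site falls back to the Account node. A result of None corresponds to the failure case, in which no recognized and compatible object class version exists.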

The compatibility mechanism overloads the inheritance hierarchy to additionally represent the object class version compatibility chain. [catch] As a result, new subclasses are created more frequently than might otherwise be the case. If evolution of an existing object class requires a change to the corresponding serialization schema, a new subclass MUST be created instead of modifying the existing class. The changes required for object class evolution are implemented in the new subclass.

Class changes that require a new subclass:
  • Adding a new member
  • Deleting an existing member
  • Changing the static type of an existing member
  • Changing the meaning of an existing member

Class changes that do not require a new subclass:
  • Adding a new method
  • Deleting an existing method
  • Changing the implementation of an existing method

If a site deserializing an object graph encounters a Node with a missing Branch, it MUST be treated as an unrecoverable error. The deserializing site MUST NOT assign a default value to an object member.
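A deserializer that follows this rule fails outright instead of filling in a default. The sketch below uses a dict as a stand-in for a Node; MissingBranchError and read_members are hypothetical names chosen for this example.

```python
class MissingBranchError(Exception):
    """Raised when a Node lacks a Branch that the schema declares."""


def read_members(node, expected_names):
    """Collect the declared members of a node, never assigning defaults."""
    values = {}
    for name in expected_names:
        if name not in node:
            # Unrecoverable: do not substitute None, 0, or any other default.
            raise MissingBranchError(f"missing branch: {name}")
        values[name] = node[name]
    return values
```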

Examples

All of the predefined schemas in the web-calculus Document Schema specification were generated based on the generic object serialization mechanism defined in this specification.

Footnotes

[code] This specification does not address transporting object behavior, such as program code.

[reify] This schema need not be reified, as it only represents the encoding rules that a general object encoder MUST obey.

[catch] The semantics of the compatibility mechanism are very similar to those used in the exception handling logic of many popular programming languages. In the Java programming language, a catch clause catches all exceptions of an indicated class and any subclass. The implicit assumption is that the subclass has a meaning which is compatible with the base-class. The compatibility mechanism defined here extends these semantics to also solve the upgrade problem.


Copyright 2002 - 2005 Waterken Inc. All rights reserved.
