Waterken™ Web
Abstract Messaging Protocol
2003-11-16
This specification defines an abstract messaging protocol, the web-amp, for a distributed implementation
of the web-calculus.
A remote edge in the Waterken™ Web
calculus is represented by an unguessable URI. An operation message sent
using such a URI additionally specifies an unguessable URI for a return continuation on the client host.
The return URI is used to implement reliable messaging between remote hosts.
- The capability semantics of edges is preserved.
- A secure model for handling transmission failures is supported.
- Features that magnify a denial-of-service attack are not required.
- The protocol is easy to understand and use.
- A simple implementation of the protocol is possible.
- Interoperation in a heterogeneous network environment is supported.
The edges in a Waterken™ Web have
capability semantics. The remote referencing
architecture must preserve these semantics.
The protocol is designed to be used over unreliable networks. "Unreliable" means that messages might be
lost or be delivered to a host more than once. The protocol is designed to ensure that a sent
message will eventually be processed and that it will be processed at most once. This guarantee survives
failure of network connections and temporary failure of both the client and server computers. Given
this guarantee, client code need only consider two possible outcomes of a message send: the message is
processed once; or the message is never processed because the target does not exist.
A "denial-of-service" attack occurs when a user consumes a large enough portion of a service's resources
that other users are prevented from using the service. The attacker may generate a volume of service
requests that makes up a significant portion of the service's maximum throughput. The attacker may send
requests that consume a disproportionate amount of the service's resources as compared to "normal"
requests. The first attack is a brute force attack upon the service's resource management. The second
attack is an exploit of a flaw in the service's resource management.
Consonant with its design goals, the Waterken™ Web protocol
excludes features whose implementations require exploitable flaws in their resource management.
Accomplishing this goal requires limiting both the amount of resources consumed by a request and the
ability of the user to schedule the consumption of the resources. If a service serves multiple
distinct users, every message processed by the service should consume a similar amount of
resources. A user must not be able to schedule delayed processing of his requests to force sequential
processing of a large group of requests. To ensure that these limitations can be met, the protocol is
designed to consume a fixed amount of resources for each processed message. The protocol also gives a
service great freedom in scheduling message processing.
With resource management flaws eliminated, a denial-of-service attack can succeed only if
the attacker has resources comparable in size to those of the attacked service.
[ddos]
The Waterken™ Web calculus defines a generic interface that wraps the
native interface of a service. If the native interface is easier to use than the generic one,
programmers will prefer the native interface. To prevent this phenomenon, the protocol's usability
must be great enough that the native protocol cannot compete based on ease of use.
The Waterken™ Web calculus defines a generic interface for accessing
any type of service. As predicted by Metcalfe's Law, the value of such an interface grows with the number
of services that implement the interface. Simplifying implementation of the interface facilitates this
phenomenon.
A variety of network protocols exist for communicating between hosts. Some of these network protocols
may be preferable in some situations and not in others. Some hosts may only support a subset of these
protocols. To support interoperation in such a heterogeneous network environment, this messaging protocol
is designed to be independent of the underlying network protocol. Protocol independence is achieved by
specifying both the information to transfer and how to act upon that information.
Other Waterken™ specifications
specify how this information is transferred using a particular network protocol. These
specifications handle issues such as connection negotiation and management.
Implementing the Waterken™ Web calculus
across multiple hosts requires: a mechanism for implementing an edge that crosses the boundary from a
client host to a server host; and a mechanism for reliably sending an operation along such an edge and
receiving the return value. The implementation of a cross-host edge is described first. An explanation
of reliable operation transmission follows.
A cross-host Waterken™ Web edge is implemented by a capability URI.
A capability URI must provide:
The URI scheme may identify the host by directly giving its location or by specifying
a locating service which can be consulted to ascertain the host's current location. The use of a
locating service is encouraged. Any URI scheme which meets these requirements can be used by this
protocol.
A cross-host edge does not prevent the target node from being garbage collected by its host. A host is
only required to guarantee that the node lifetime is continuous. Once the host reports a node as
deleted, it must forever report the same status. Deletion of a particular node is typically determined
by the host's application logic.
A common design pattern is to treat each host as a separate space bank. All URIs exported by the space
bank remain valid during the lifetime of the space bank. When the space bank is destroyed, so are all
nodes in the space bank. The overall application design ensures that this large-grained resource
management happens as a natural consequence of application interaction.
A client sends an operation to a server by first reifying the operation in a message. A message is an
<http://waterken.com/amp/Envelope>
which specifies: the URI for the source node;
the path to the target node; the
operation; the
argument list; and the URI for the
return continuation.
The message is sent to the server using the URI for the
source node.
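The five parts of a message listed above can be pictured as a simple record. The following is a minimal sketch only: the member names 'src', 'path' and 'reply_to' appear in this specification, while 'op' and 'args', the class itself, and the example URIs are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass(frozen=True)
class Envelope:
    """Hypothetical in-memory form of an <http://waterken.com/amp/Envelope>."""

    src: str                                        # URI for the source node
    path: List[str]                                 # path from the source node to the target node
    op: str                                         # operation name (assumed member name)
    args: List[Any] = field(default_factory=list)   # argument list (assumed member name)
    reply_to: Optional[str] = None                  # URI for the return continuation


# Example URIs are placeholders, not real capability URIs.
request = Envelope(
    src="https://server.example/cap/abc",
    path=["accounts", "42"],
    op="POST",
    args=["deposit", 10],
    reply_to="https://client.example/cap/r1",
)
```

The message would then be delivered to the host identified by `request.src`, using whichever network binding is in effect.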
For each request message, the client host SHOULD create a continuation for
receiving the operation return value. The URI for this node is specified by the
'reply_to' member. If specified, the
'reply_to' URI MUST be unique
to the message. [REPLY-TO]
To support message pipelining, providing a
return continuation implicitly creates a
promise for the operation return value. The client can use this promise in the
construction of subsequent messages. [E]
A
<http://waterken.com/doc/pointer/Pipeline>
is a promise for an operation return value. The return value is identified by two URIs: one that
specifies the eventual location of the return value on the server host; and another that specifies the
eventual location of the return value on the client host. The server URI is specified by the
'promise' member. The
client URI is contained in the
'super' member. Both
URIs contain the same GUID. The promise GUID is generated according to the formula:
promise_guid = to_base32(sha1(to_ascii(source_guid + reply_to_guid)))
The input to the SHA-1 hash function is the concatenation of: the ASCII bytes representing
the GUID in the source URI; and the ASCII bytes
representing the 'reply_to'
GUID, generated by the client.
The binary output of the SHA-1 hash function is used to generate the promise GUID.
The promise GUID is the base32 encoding of the computed hash.
[base32]
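The formula above can be sketched in a few lines. This sketch assumes that the Waterken base32 encoding has the same bit layout as the standard base32 encoding with a lowercase alphabet, which this specification does not confirm; the function name and the sample GUIDs are illustrative only.

```python
import base64
import hashlib


def promise_guid(source_guid: str, reply_to_guid: str) -> str:
    """Derive the promise GUID from the source GUID and the 'reply_to' GUID."""
    # Concatenate the two GUIDs and take their ASCII bytes.
    data = (source_guid + reply_to_guid).encode("ascii")
    # SHA-1 yields 20 bytes (160 bits).
    digest = hashlib.sha1(data).digest()
    # 160 bits encode to exactly 32 base32 characters, so no padding appears.
    # Lowercasing gives the alphabet { a-z, 2-7 } (assumed equivalent encoding).
    return base64.b32encode(digest).decode("ascii").lower()


guid = promise_guid("sourceguid", "replytoguid")
```

Because both hosts can compute the same value from information they already hold, the client can address the promise before the server has responded.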
A server MUST immediately respond to a received request. The server may execute the operation and
respond with the operation return value, or the server may respond with a promise for the return value.
The promise is an
<http://waterken.com/doc/pointer/Embed>
containing the URI that will eventually be assigned to the return value.
In either case, the response is a POST operation. The
operation source is the
return continuation from the request
<http://waterken.com/amp/Envelope>.
The operation path is empty. The sole argument
in the argument list is the return value.
After processing a request, the server SHOULD bind the implicit promise GUID, generated according to the
procedure specified in the Generating the promise URI section,
to a locally held instance of the return value. The server MAY skip this step, but in this case, the
benefits of message pipelining will be lost.
A server can redirect a request by returning a
<http://waterken.com/amp/Redirect>.
The client SHOULD re-dispatch the operation, using the specified
'src' node and
'path'.
If a client does not receive a response from a server, the client SHOULD simply resend the request
<http://waterken.com/amp/Envelope>.
The client MUST obey any backoff requirements that the underlying network protocol specifies for
retrying a connection.
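A client-side resend loop might look like the following sketch. The `send` and `backoff_seconds` callables are hypothetical stand-ins for the network binding and its backoff rules; neither is defined by this specification. The important point is that the same request, with the same 'reply_to' URI, is sent on every attempt.

```python
import time
from typing import Any, Callable, Optional


def send_until_answered(
    request: Any,
    send: Callable[[Any], Optional[Any]],
    backoff_seconds: Callable[[int], float],
    sleep: Callable[[float], None] = time.sleep,
    max_attempts: Optional[int] = None,
) -> Optional[Any]:
    """Resend the identical request until a response arrives.

    `send` returns None when no response was received (assumed convention).
    `backoff_seconds(n)` gives the delay required after the n-th failed attempt.
    """
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        response = send(request)
        if response is not None:
            return response
        # Obey the backoff required by the underlying network protocol.
        sleep(backoff_seconds(attempt))
        attempt += 1
    return None
```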
The GET, EXPECT, and SETTLE operations are idempotent. If a server receives a duplicate
request for one of these operations, it can simply process the request as if it were the first.
A POST operation might not be
idempotent. A server MUST NOT appear to process a POST
operation more than once. A POST request is uniquely
identified by its 'reply_to'
URI. If a server receives a duplicate POST request, it MUST
respond to the request using the previously calculated return value, rather than executing the
invocation again. The server MUST maintain this ability until the client confirms receipt of the return
value. If the client has confirmed receipt of the return value, the server can ignore future duplicate
requests. [idempotent]
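The duplicate handling described above can be sketched as a table keyed by the 'reply_to' URI. This is an illustrative in-memory model only: the class, its method names, and the `IGNORED` marker are assumptions, and a real server would need to keep this state in durable storage for the guarantee to survive a restart.

```python
from typing import Any, Callable, Dict, Set

IGNORED = object()  # hypothetical marker: duplicate arrived after receipt was confirmed


class PostDeduplicator:
    """Remembers POST return values by 'reply_to' URI until the client confirms receipt."""

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}
        self._confirmed: Set[str] = set()

    def handle(self, reply_to: str, execute: Callable[[], Any]) -> Any:
        if reply_to in self._confirmed:
            # Receipt already confirmed: the duplicate can be ignored.
            return IGNORED
        if reply_to in self._results:
            # Duplicate request: answer with the previously calculated value.
            return self._results[reply_to]
        result = execute()  # first arrival: execute the invocation exactly once
        self._results[reply_to] = result
        return result

    def confirm_receipt(self, reply_to: str) -> None:
        # The stored return value is no longer needed once the client has it.
        self._results.pop(reply_to, None)
        self._confirmed.add(reply_to)
```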
If the source node of a
POST request is not a local pass-by-reference node, the
server MUST NOT process the request. The server MUST respond to the request by returning a
<http://waterken.com/amp/Redirect>.
If the source node exists on more than one host,
the server cannot guarantee that the request will be processed only once. A
POST initiated from a pass-by-copy node can only be
processed by the host that originated the operation.
A server may refuse to pipeline some or all requests. In this case, the server will respond to requests
on the promise URI by returning a
<http://waterken.com/amp/NotFound>.
The client SHOULD handle this case by waiting for the promise to settle
and then re-dispatching the operation on the resolved value of the promise.
[multiple connections]
The above recovery procedure covers two possible cases: the server refused to pipeline the request,
meaning that the promise URI was never created; or the promised node was deleted. The recovery procedure
correctly handles both of these cases.
If the promise URI was never created, the request will never be processed. The re-dispatched operation
takes the place of the rejected request.
In the other possible case, the promise URI was created, but the promised node was deleted before the
request was processed. This situation may occur for multiple reasons, such as: the application logic
for resource management is faulty, causing the promised node to be prematurely deleted; or a
man-in-the-middle attacker intercepted the request response and then later resent the request, after the
promised node was deleted. This case itself has two possible sub-cases: the promised node was
pass-by-reference; or the promised node was pass-by-copy. If the promised node was pass-by-reference, the
re-dispatched operation will be rejected just as the original operation was. If the promised node was
pass-by-copy, the original request is guaranteed to produce no side-effects.
In all cases, re-dispatching the operation can have no visible side-effects.
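The client side of this recovery procedure can be sketched as follows. Modelling the NotFound response as an exception, and the `dispatch` and `settle` callables, are assumptions made for illustration; they are not part of the protocol.

```python
from typing import Any, Callable


class NotFound(Exception):
    """Stand-in for an <http://waterken.com/amp/NotFound> response."""


def invoke_on_promise(
    promise_uri: str,
    dispatch: Callable[[str], Any],
    settle: Callable[[], str],
) -> Any:
    """Send an operation on a promise URI, recovering if the pipeline is broken.

    `dispatch(uri)` sends the operation to the given URI and returns the response.
    `settle()` blocks until the promise settles and returns the resolved URI.
    """
    try:
        return dispatch(promise_uri)
    except NotFound:
        # The server refused to pipeline, or the promised node was deleted.
        resolved_uri = settle()
        # Re-dispatch the operation on the resolved value of the promise.
        return dispatch(resolved_uri)
```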
A client host may pass a generated, or received, promise to a server host; however, the server host MAY
reject the promise. If the server host is unwilling to accept the promise, it MUST respond to the
request by returning a
<http://waterken.com/amp/Rejection>
that indicates the rejected promise. The
client host SHOULD handle this case by waiting for the specified promise to settle. After the promise
has settled, the client SHOULD rewrite the request, replacing the promise with its settled value. The
rewritten request can then be retried.
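The rewrite step can be sketched as follows. The `Rejection` record with a `promise` member, the flat argument list, and the `send` and `settle` callables are all assumptions for illustration; the actual structure of an <http://waterken.com/amp/Rejection> is not described here.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Rejection:
    """Stand-in for an <http://waterken.com/amp/Rejection> response."""

    promise: str  # the rejected promise (assumed member name)


def send_with_promise_fallback(
    args: List[Any],
    send: Callable[[List[Any]], Any],
    settle: Callable[[str], Any],
) -> Any:
    """Send a request, replacing any promise the server rejects with its settled value."""
    response = send(args)
    while isinstance(response, Rejection):
        if response.promise not in args:
            raise ValueError("server rejected a promise that was not sent")
        value = settle(response.promise)  # wait for the promise to settle
        # Rewrite the request, replacing the promise with its settled value.
        args = [value if a == response.promise else a for a in args]
        response = send(args)
    return response
```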
Sending a request on a received promise creates a race condition. The sent request may arrive at the
server host before the request that generates the promise URI. In this case, the server will reject the
request by returning a
<http://waterken.com/amp/NotFound>.
The client host SHOULD handle this case by waiting until the promise is settled, and then resending the
request to the server. If the server again rejects the request, the client SHOULD re-dispatch the
operation on the settled value of the promise.
As with Recovering from a broken pipeline, the above
recovery procedure covers multiple cases. From the point after the request is repeated, the logic is the
same as before. The preceding step of resending the request, before starting the main recovery
procedure, is necessary in order to prevent possible duplication of an operation.
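The ordering of steps in this procedure can be sketched as follows. As before, modelling NotFound as an exception and the `dispatch` and `settle` callables are illustrative assumptions. The only difference from the broken-pipeline procedure is the extra resend to the promise URI after the promise has settled.

```python
from typing import Any, Callable


class NotFound(Exception):
    """Stand-in for an <http://waterken.com/amp/NotFound> response."""


def invoke_with_race_recovery(
    promise_uri: str,
    dispatch: Callable[[str], Any],
    settle: Callable[[], str],
) -> Any:
    """Send on a received promise, tolerating arrival before the promise URI exists."""
    try:
        return dispatch(promise_uri)
    except NotFound:
        pass
    # Wait until the promise is settled before trying again.
    resolved_uri = settle()
    try:
        # First resend the same request; the promise URI may exist by now.
        return dispatch(promise_uri)
    except NotFound:
        # Still rejected: re-dispatch on the settled value of the promise.
        return dispatch(resolved_uri)
```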
A network protocol that provides a binding for the Waterken™ Web
abstract messaging protocol must support:
- a mechanism for locating the server host based on the
source URI
- a mechanism for authenticating the server host
- a mechanism to protect the secrecy of a message while in transit between the
client and server
- a mechanism to send a message and receive the response
[ddos] Given the current state of the internet, this advantage
is largely academic. Almost all computers on the internet are running operating systems that allow a
remote attacker to take control of the computer. This means that a user can mount a
distributed-denial-of-service attack on a service by using the computing resources of other users.
In this way, it is easy to acquire the resources needed to mount a brute force denial-of-service
attack on any service. Hopefully someday we will live in a world where script kiddies cannot
unilaterally appropriate vast portions of the world's computing resources.
[REPLY-TO] The
'reply_to' request member is
similar in purpose to the combination of the REPLY-TO and complaint
continuations used in
ACT 1.
[E] The message pipelining feature is inspired by the one in
E. See the E
description of message pipelining.
[base32] To ensure compatibility with existing protocols and
filesystems, the generated names must not rely on case-sensitivity. It is also desirable to keep
the URI length as short as possible. The base32 encoding uses the alphabet { a-z, 2-7 }. See:
<http://www.waterken.com/dev/Enc/base32/>.
[idempotent] If a server can prove that a given
invocation is idempotent, it need not guard against receiving duplicates.
[multiple connections] The recovery
procedure assumes that all requests are sent on a single connection from the client to the server. If
multiple connections are used to send requests from the client to the server, the first request that
generates the pipeline may arrive after the request sent on the pipeline. In this case, the client MUST
use the recovery procedure specified in
Solving the race condition.