web-calculus
Abstract Messaging Protocol
2006-02-03
This specification defines an abstract messaging protocol, the
web-amp, for a distributed implementation of the
web-calculus.
A remote edge in the web-calculus is represented by an
unguessable URL. Non-idempotent operations additionally provide
a message identifier used to implement reliable messaging
between hosts.
- The capability semantics of edges is preserved.
- A secure model for handling transmission failures is supported.
- Features that magnify a denial-of-service attack are not required.
- The protocol is easy to understand and use.
- A simple implementation of the protocol is possible.
- Interoperation in a heterogeneous network environment is supported.
The edges in the web-calculus have capability semantics. The remote
referencing architecture must preserve these semantics.
The protocol is designed to be used over unreliable networks.
"Unreliable" means that messages might be lost or be delivered
to a host more than once. The protocol is designed to ensure
that, so long as the target exists, a sent message will
eventually be processed and that it will be processed at most
once. This guarantee survives failure of network connections
and temporary failure of both the client and server computers.
Given this guarantee, client code need only consider two
possible outcomes of a message send: the message is processed
once; or the message is never processed because the target does
not exist.
A "denial-of-service" attack occurs when a user consumes a large
enough portion of a service's resources that other users are
prevented from using the service. The attacker may generate a
volume of service requests that makes up a significant portion
of the service's maximum throughput. The attacker may send
requests that consume a disproportionate amount of the service's
resources as compared to "normal" requests. The first attack is
a brute force attack upon the service's resource management.
The second attack is an exploit of a flaw in the service's
resource management.
Consonant with its design goals, the web-amp excludes features
whose implementations require exploitable flaws in their
resource management. Accomplishing this goal requires limiting
both the amount of resources consumed by a request and the
ability of the user to schedule the consumption of the
resources. If a service serves multiple distinct users, every
message processed by the service should consume a similar amount
of resources. A user must not be able to schedule delayed
processing of his requests to force sequential processing of a
large group of requests.
Once resource management flaws are eliminated, a denial-of-service
attack can succeed only if the attacker has resources comparable in
size to those of the attacked service.
[ddos]
The web-calculus defines a generic interface that wraps the
native interface of a service. If the native interface is
easier to use than the generic one, programmers will prefer the
native interface. To prevent this, the protocol's usability must
be great enough that the native interface cannot compete on ease
of use.
The web-calculus defines a generic interface for accessing any
type of service. As predicted by Metcalfe's Law, the value of
such an interface grows with the number of services that
implement the interface. Making the interface simple to implement
encourages this growth.
A variety of network protocols exist for communicating between
hosts. Some of these network protocols may be preferable in some
situations and not in others. Some hosts may only support a
subset of these protocols. To support interoperation in such a
heterogeneous network environment, this messaging protocol is
designed to be independent of the underlying network protocol.
Protocol independence is achieved by specifying only the
information to transfer and how to act upon that information.
Separate specifications define how this information is transferred
using a particular network protocol. These specifications handle
issues such as connection negotiation and management.
Implementing the web-calculus across multiple hosts requires: a
mechanism for implementing an edge that crosses the boundary
from a client host to a server host; and a mechanism for
reliably sending an operation along such an edge and receiving
the return value. The implementation of a cross-host edge is
described first. An explanation of reliable operation
transmission follows.
A cross-host web-calculus edge is implemented by a capability
URL. A capability URL
must provide:
Although any URL scheme that meets these requirements can be used by
this protocol, use of a YURL scheme is recommended.
A capability URL does not prevent the target edge from being
garbage collected by its host. A host is only required to
guarantee that the edge lifetime is continuous. Once the host
reports an edge as deleted, it MUST forever report the same
status. Deletion of a particular edge is typically determined by
the host's application logic.
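One way for a host to keep an edge's lifetime continuous is to remember every GUID it has deleted and refuse to bind it again. The Python sketch below illustrates this under stated assumptions: the `EdgeTable` class, its method names, the `NotFound` exception, and the 128-bit random GUID size are all hypothetical, not part of this specification.

```python
import secrets


class NotFound(Exception):
    """Hypothetical stand-in for a NotFound response."""


class EdgeTable:
    """Hypothetical host-side table of exported edges.

    Deleted GUIDs are remembered, so the host keeps reporting them as
    deleted and never binds such a GUID to a new value.
    """

    def __init__(self):
        self._live = {}        # guid -> target value
        self._deleted = set()  # guids that must stay deleted forever

    def bind(self, guid, value):
        if guid in self._live or guid in self._deleted:
            raise ValueError(f"GUID already used: {guid}")
        self._live[guid] = value

    def export(self, value):
        # 128 random bits make the GUID unguessable (an assumed size).
        guid = secrets.token_hex(16)
        self.bind(guid, value)
        return guid

    def delete(self, guid):
        if guid in self._live:
            del self._live[guid]
            self._deleted.add(guid)

    def lookup(self, guid):
        if guid in self._live:
            return self._live[guid]
        # Unknown and deleted GUIDs get the same, permanent answer.
        raise NotFound(guid)
```

Because exported GUIDs are random, the tombstone set matters mostly for GUIDs a host binds by name, such as pipeline GUIDs derived from a message identifier.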
A common design pattern is to treat each host as a separate
space bank. All URLs exported by the space bank remain valid
during the lifetime of the space bank. When the space bank is
destroyed, so are all edges in the space bank. The overall
application design ensures that this large-grained resource
management happens as a natural consequence of application
interaction.
An <http://web-calculus.org/pointer/Embed> is an edge that will
eventually refer to a promised value. This value can be fetched
using a GET operation on the edge; this process is called
"settling" the promise. The value returned by the GET operation is
the "settled value" of the promise. If the promise has not yet
settled, the GET operation returns another promise.
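Because an unsettled promise answers a GET with another promise, a client settles a promise by following GETs until it receives a non-promise value. A minimal Python sketch, assuming a hypothetical `get(url)` function and a hypothetical `Promise` handle class; the `max_gets` bound is an added safeguard, and a real client would wait between GETs rather than loop immediately.

```python
class Promise:
    """Hypothetical client-side handle for a promise edge."""

    def __init__(self, url):
        self.url = url


def settle(get, promise, max_gets=100):
    """Follow a promise until GET yields a value that is not a promise.

    `get` is a hypothetical function performing a GET on a URL. A GET on
    an unsettled promise returns another promise, so keep following it.
    """
    value = get(promise.url)
    gets = 1
    while isinstance(value, Promise):
        if gets >= max_gets:
            raise TimeoutError(f"promise not settled after {gets} GETs")
        value = get(value.url)
        gets += 1
    return value
```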
A client sends an operation to a host by first reifying the
operation in a message. A message is an
<http://web-calculus.org/amp/Envelope> which specifies: the URL of
the target edge; the operation; and the argument list.
The message is routed to the host using information provided by
the target URL.
For a non-idempotent operation, the client host SHOULD include an
unguessable message identifier. If a message identifier is not
provided, the client MUST have alternate means for preventing
message replay attacks.
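The message contents described above can be sketched as a small Python data structure. The `Envelope` field names, the `make_message` and `new_mid` helpers, and the identifier format (128 random bits, URL-safe ASCII) are illustrative assumptions; this specification does not define an in-memory representation.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Envelope:
    """Hypothetical in-memory form of an Envelope message."""
    target: str                 # capability URL of the target edge
    operation: str              # e.g. "GET" or "POST"
    arguments: tuple = ()       # the argument list
    mid: Optional[str] = None   # message identifier, if any


def new_mid() -> str:
    # Unguessable identifier: 128 random bits as URL-safe ASCII (assumed format).
    return secrets.token_urlsafe(16)


def make_message(target: str, operation: str, arguments=(),
                 idempotent: bool = False) -> Envelope:
    # Only non-idempotent operations need a message identifier.
    mid = None if idempotent else new_mid()
    return Envelope(target, operation, tuple(arguments), mid)
```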
To support message pipelining, providing a message identifier
implicitly creates a pipeline promise for the invocation return
value. The client can use this promise when constructing subsequent
messages, passing the promise as an invocation argument in place of
the return value it stands for. [E]
A <http://web-calculus.org/pointer/Pipeline> is a promise for an
invocation return value. The 'super' member identifies a promise on
the client host that will eventually settle to the invocation return
value.
The server host may also hold a copy of the return value. The
'pipeline' member of a <Pipeline> is a reference to the return value
held on the server host. The client generates the GUID for this
pipeline URL according to the formula:
promise_guid = to_base32(sha1(to_ascii(mid)))
The input to the SHA-1 hash function is the ASCII encoding of the
'mid' (message identifier) generated by the client. The pipeline
GUID is the base32 encoding of the binary output of the SHA-1 hash
function. [base32]
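The formula can be sketched in Python with the standard `hashlib` and `base64` modules. This assumes the base32 variant described in the [base32] note uses the same symbol order and bit order as RFC 4648, written in lower case; that correspondence is an assumption, not something this specification states. A 160-bit SHA-1 digest encodes to exactly 32 base32 characters, so no padding appears.

```python
import base64
import hashlib


def promise_guid(mid: str) -> str:
    """promise_guid = to_base32(sha1(to_ascii(mid)))

    Assumes RFC 4648 base32, lowercased to the { a-z, 2-7 } alphabet.
    Raises UnicodeEncodeError if `mid` is not pure ASCII.
    """
    digest = hashlib.sha1(mid.encode("ascii")).digest()   # 20 bytes
    return base64.b32encode(digest).decode("ascii").lower()
```

The client and the server both run this same calculation, so no extra round trip is needed to agree on the pipeline URL.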
After processing a request, the server SHOULD also perform this
calculation and bind the pipeline GUID to a locally held
instance of the return value. If the server skips this step, the
benefits of message pipelining are lost.
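On the server side, the binding step amounts to storing each return value under the GUID derived from the request's message identifier. The Python sketch below is hypothetical throughout: the `PipelineServer` class, the `handler` callback, and the use of `None` in place of a NotFound response are illustrative choices, and the base32 variant is assumed to be lowercased RFC 4648.

```python
import base64
import hashlib


def promise_guid(mid: str) -> str:
    # Same formula the client uses (base32 assumed to be RFC 4648, lowercased).
    digest = hashlib.sha1(mid.encode("ascii")).digest()
    return base64.b32encode(digest).decode("ascii").lower()


class PipelineServer:
    """Hypothetical server that binds each return value to its pipeline GUID."""

    def __init__(self, handler):
        self._handler = handler   # performs the actual operation
        self._pipelines = {}      # pipeline GUID -> return value

    def process(self, operation, arguments, mid=None):
        result = self._handler(operation, arguments)
        if mid is not None:
            # Later requests addressed to the pipeline URL can find the result.
            self._pipelines[promise_guid(mid)] = result
        return result

    def lookup_pipeline(self, guid):
        # None stands in for a NotFound response in this sketch.
        return self._pipelines.get(guid)
```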
If a client does not receive a response from a server, the
client MUST resend the request. The client MUST obey any
backoff requirements that the underlying network protocol
specifies for retrying a connection.
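A client's resend loop might look like the Python sketch below. The `transport` function is hypothetical, and the doubling, capped delay is only an assumed placeholder policy; a real client uses whatever backoff rules its underlying network protocol specifies. The key point is that the identical message, with the same message identifier, is resent each time so the server can recognize duplicates.

```python
import time


def send_reliably(transport, message, initial_delay=0.5, max_delay=30.0,
                  sleep=time.sleep, max_attempts=None):
    """Resend `message` until a response arrives.

    `transport` is a hypothetical function that sends one message and
    either returns the response or raises TimeoutError/ConnectionError.
    With max_attempts=None the loop never gives up.
    """
    delay = initial_delay
    attempts = 0
    while True:
        attempts += 1
        try:
            # Always the same message object, hence the same message identifier.
            return transport(message)
        except (TimeoutError, ConnectionError):
            if max_attempts is not None and attempts >= max_attempts:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)   # assumed backoff policy
```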
The GET operation is idempotent. If a server receives a duplicate
GET request, it can simply process the request as if it were the
first.
A POST operation might not be idempotent. Two POST requests are
duplicates if they specify the same message identifier. If a server
receives a duplicate POST request, it MUST respond to the request
without causing any side-effects.
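A server can meet this requirement by remembering the response it sent for each message identifier. In the Python sketch below, the `DedupServer` class and `handler` callback are hypothetical, and replaying the original response to a duplicate is an assumed policy; the text above only requires that the duplicate cause no side-effects. Since the delivery guarantee must survive server restarts, a real table would need durable storage and a retention policy rather than this in-memory dict.

```python
class DedupServer:
    """Hypothetical server that answers duplicate POSTs without re-executing them."""

    def __init__(self, handler):
        self._handler = handler   # performs the operation, with side effects
        self._responses = {}      # mid -> response already sent

    def receive(self, operation, arguments, mid=None):
        if operation == "GET" or mid is None:
            # GET is idempotent; a POST without a mid relies on some other
            # replay protection, so it is simply processed.
            return self._handler(operation, arguments)
        if mid in self._responses:
            return self._responses[mid]   # duplicate: no side effects
        response = self._handler(operation, arguments)
        self._responses[mid] = response
        return response
```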
A server may refuse to pipeline some or all requests. In this case,
the server will reject requests on the pipeline URL by returning a
<http://web-calculus.org/amp/NotFound>. The client MUST handle this
case by re-dispatching the operation on the settled value of the
promise.
[multiple connections]
The above recovery procedure covers two possible cases: the server
refused to pipeline the request, meaning that the pipeline URL was
not created; or the promised edge was deleted. The procedure handles
both cases correctly.
If the pipeline URL was not created, the rejected request will never
be processed. The re-dispatched operation takes the place of the
rejected request.
If the promised edge was deleted before the rejected request was
processed, the re-dispatched operation will also be rejected.
In either case, re-dispatching the operation has no visible
side-effects beyond a single execution of the operation.
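The client side of this recovery procedure fits in a few lines of Python. The `send(url, operation, arguments)` and `settle()` helpers and the `NotFound` exception are hypothetical stand-ins: `send` is assumed to raise `NotFound` when the target does not exist, and `settle` to wait for the promise and return the URL of its settled value.

```python
class NotFound(Exception):
    """Hypothetical stand-in for a NotFound response."""


def call_via_pipeline(send, settle, pipeline_url, operation, arguments=()):
    """Send on a pipeline URL, re-dispatching if the server rejects it."""
    try:
        return send(pipeline_url, operation, arguments)
    except NotFound:
        # Either the pipeline was never created or the promised edge was
        # deleted. Re-dispatch on the settled value; if the edge is gone,
        # this raises NotFound again, which is the correct outcome.
        return send(settle(), operation, arguments)
```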
A host may pass a promise it generated or received to another host;
however, the recipient host MAY reject the promise. If a host is
unwilling to accept a promise, it MUST respond to the request by
returning a <http://web-calculus.org/amp/Rejection> that indicates
the rejected promise.
The sender host MUST handle this case by resending the request using
the settled value of the promise instead of the promise.
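The sender's handling of a Rejection can be sketched in Python as follows. The `Rejection` exception, the `send(target, operation, arguments)` helper, and the `settle(promise)` helper are hypothetical; `send` is assumed to raise `Rejection` naming the refused promise, and `settle` to wait for and return that promise's settled value. How the message identifier is treated on the resend is left to the hypothetical `send`.

```python
class Rejection(Exception):
    """Hypothetical stand-in for a Rejection naming the refused promise."""

    def __init__(self, promise):
        super().__init__(promise)
        self.promise = promise


def send_with_promises(send, settle, target, operation, arguments):
    """Send a request, replacing any promise argument the recipient rejects."""
    arguments = list(arguments)
    while True:
        try:
            return send(target, operation, tuple(arguments))
        except Rejection as rejected:
            if not any(a is rejected.promise for a in arguments):
                raise  # the rejected promise is not one this request sent
            value = settle(rejected.promise)
            # Resend with the settled value in place of every copy of the promise.
            arguments = [value if a is rejected.promise else a for a in arguments]
```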
Sending a request using a received pipeline URL creates a race
condition. The sent request may arrive at the server host before the
request that generates the pipeline URL. In this case, the server
will reject the request by returning a
<http://web-calculus.org/amp/NotFound>.
The client host MUST handle this case by waiting until the promise
is settled, and then resending the request to the server. If the
server again rejects the request, the client MUST re-dispatch the
operation on the settled value of the promise.
As with "Recovering from a broken pipeline", the above recovery
procedure covers multiple cases. Once the request has been resent,
the logic is the same as in that procedure. The preceding step of
resending the request, before starting the main recovery procedure,
is necessary to prevent possible duplication of an operation. The
client host can skip this step if the underlying network protocol
prevents message replay attacks.
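The race-condition procedure adds one step, resending the identical request after the promise settles, before the broken-pipeline recovery. A Python sketch, where `send`, `settle`, and `NotFound` are the same kind of hypothetical stand-ins as before: `send` raises `NotFound` for a missing target and `settle` waits and returns the settled value's URL. In a full client the resent request would carry the same message identifier as the first.

```python
class NotFound(Exception):
    """Hypothetical stand-in for a NotFound response."""


def call_on_received_pipeline(send, settle, pipeline_url, operation, arguments=()):
    """Send on a pipeline URL that was received from another host."""
    try:
        return send(pipeline_url, operation, arguments)
    except NotFound:
        # The request may have overtaken the one that creates the pipeline.
        settled_url = settle()   # wait until the promise is settled
    try:
        # Resend the identical request now that the pipeline must exist if
        # the server accepted it.
        return send(pipeline_url, operation, arguments)
    except NotFound:
        # Rejected again: fall back to recovering from a broken pipeline.
        return send(settled_url, operation, arguments)
```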
[ddos] Given the current
state of the Internet, this advantage is largely academic.
Almost all computers on the Internet are running operating
systems that allow a remote attacker to take control of the
computer. This means that a user can mount a
distributed-denial-of-service attack on a service by using the
computing resources of other users. In this way, it is easy to
acquire the resources needed to mount a brute force
denial-of-service attack on any service. Hopefully someday we
will live in a world where script kiddies cannot unilaterally
appropriate vast portions of the world's computing resources.
[E] The message pipelining feature is inspired by that of E.
See the E description of message pipelining.
[base32] To ensure
compatibility with existing protocols and filesystems, the
generated names must not rely on case-sensitivity. It is also
desirable to keep the URL length as short as possible. The
base32 encoding uses the alphabet { a-z, 2-7 }.
See:
<http://www.waterken.com/dev/Enc/base32/>.
[idempotent] If
a server can prove that an operation is idempotent, it need not
guard against receiving duplicates.
[multiple connections] The
recovery procedure assumes that all requests are sent on a
single connection from the client to the server. If multiple
connections are used to send requests from the client to the
server, the request that generates the pipeline may arrive after
the request sent on the pipeline. In this case, the client MUST
use the recovery procedure specified in
Solving the race condition.