
Waterken™ Enc

base32 Encoding

2003-06-27

The base32 encoding is designed to represent arbitrary sequences of octets in a form that is suitable for inclusion in a URI or filename.

Abstract

Each group of 5 input bits is encoded as one character from the alphabet { 'a'-'z', '2'-'7' }. Bits are encoded in big-endian order, and no padding character is used. During decoding, differences in case are ignored; however, any other character outside the alphabet is treated as an unrecoverable error.

Overview

Design goals

  1. An arbitrary sequence of octets can be represented.
  2. An encoding can be included in a URI without character escaping.
  3. The encoding does not depend on case being preserved.
  4. The encoding is compact.
  5. Encoding and decoding are fast.

An arbitrary sequence of octets

The encoded data may be a cryptographic hash or a nonce.

URI inclusion

The purpose of this encoding is to include octet sequences in a URI. The encoding alphabet is restricted to characters that RFC 2396 [URI] defines as unreserved, so an encoded string never needs to be escaped.
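A minimal Python check of this property, offered as an illustration rather than as part of the specification: it writes out the RFC 2396 unreserved set (alphanumerics plus the marks -_.!~*'()) and confirms that every character of the encoding alphabet ('a'-'z' and '2'-'7') falls inside it. The names ALPHABET, RFC2396_UNRESERVED and is_uri_safe are hypothetical, chosen for this sketch.

```python
import string

# The encoding alphabet: 'a'-'z' followed by '2'-'7' (32 characters).
ALPHABET = string.ascii_lowercase + "234567"

# RFC 2396 unreserved characters: unreserved = alphanum | mark, where
# mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")".
RFC2396_UNRESERVED = frozenset(string.ascii_letters + string.digits + "-_.!~*'()")


def is_uri_safe(chars: str) -> bool:
    """Return True if every character is unreserved, so it can appear in a URI unescaped."""
    return set(chars) <= RFC2396_UNRESERVED


print(len(ALPHABET), is_uri_safe(ALPHABET))  # prints: 32 True
```

By contrast, alphabets that use '+', '/' or '=' would fail this check, since those are reserved characters in RFC 2396.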

Case-insensitive

In some situations, the transport does not guarantee that case will be preserved during transmission. For example, case is not preserved in DNS names, or in filenames on some filesystems. Non-compliant SMTP software may also fail to preserve case in mailbox names.

Compact encoding

A URI or filename is easier to use when it is short.

Fast encoding

Encoding and decoding time is often part of the latency of a lookup operation, so both should be fast.

Description

A 32-character subset of US-ASCII is used, enabling 5 bits to be represented per printable character.
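The length of an encoding follows directly from this. Because 2^5 = 32, each character carries 5 bits, while n octets supply 8n bits, rounded up to a whole character by the padding rule described below. A worked version of that arithmetic (derived here, not stated in the original text):

```latex
\[
  2^5 = 32 \;\Rightarrow\; 5 \text{ bits per character}, \qquad
  L(n) = \left\lceil \frac{8n}{5} \right\rceil \text{ characters for } n \text{ octets}
\]
\[
  L(1) = 2, \quad L(5) = 8, \quad L(20) = 32, \quad L(32) = 52
\]
```

So every 5 octets become exactly 8 characters, an expansion factor of 8/5 = 1.6; a 20-octet cryptographic hash, for instance, encodes to 32 characters.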

Octets are encoded from first to last, and the bits of each octet are taken in big-endian order (most significant bit first). The resulting bit stream is divided into 5-bit groups, and each group is used as an index into an array of 32 printable characters; the character at that index is appended to the output string. The 32 characters, listed in Table 1 below, are the lowercase US-ASCII letters 'a' through 'z' and the digits '2' through '7'.

Table 1: The base32 alphabet

index  char     index  char     index  char     index  char
  0    'a'        8    'i'       16    'q'       24    'y'
  1    'b'        9    'j'       17    'r'       25    'z'
  2    'c'       10    'k'       18    's'       26    '2'
  3    'd'       11    'l'       19    't'       27    '3'
  4    'e'       12    'm'       20    'u'       28    '4'
  5    'f'       13    'n'       21    'v'       29    '5'
  6    'g'       14    'o'       22    'w'       30    '6'
  7    'h'       15    'p'       23    'x'       31    '7'

If the length of the input bit stream is not a multiple of 5, the stream is padded at the end with 0 bits up to the next multiple of 5, so the final character encodes the remaining bits followed by zeros.
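The encoding steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Waterken's implementation: base32_encode is a hypothetical name, input is taken as bytes, and output is a lowercase str. A bit buffer accumulates octets most significant bit first, emits a character whenever 5 bits are available, and pads the last group with 0 bits.

```python
# Table 1: indices 0-25 map to 'a'-'z', indices 26-31 map to '2'-'7'.
ALPHABET = "abcdefghijklmnopqrstuvwxyz234567"


def base32_encode(data: bytes) -> str:
    """Hypothetical encoder: unpadded, lowercase base32 in big-endian bit order."""
    out = []
    buffer = 0  # pending bits, most significant bit first
    bits = 0    # number of pending bits held in buffer
    for octet in data:
        buffer = (buffer << 8) | octet
        bits += 8
        # Emit one character for every complete 5-bit group.
        while bits >= 5:
            bits -= 5
            out.append(ALPHABET[(buffer >> bits) & 0x1F])
        buffer &= (1 << bits) - 1  # keep only the bits not yet consumed
    if bits > 0:
        # Pad the final group on the right with 0 bits to make 5 bits.
        out.append(ALPHABET[(buffer << (5 - bits)) & 0x1F])
    return "".join(out)


print(base32_encode(b"f"))  # prints: my
```

For example, the single octet 0x66 (ASCII 'f', bits 01100110) splits into 01100 and 110, which is padded to 11000, giving indices 12 and 24 and hence "my". As a cross-check, Python's standard base64.b32encode uses the same 32-symbol ordering in uppercase with '=' padding, so lowercasing its output and stripping the '=' characters should give the same string.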

During decoding, differences in case are ignored; however, any character outside the case-insensitive encoding alphabet is treated as an unrecoverable decoding error. The decoder MUST also verify that the encoding represents an integral number of octets and that any padding bits are 0 bits.
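A matching decoder sketch in Python, under the same assumptions (hypothetical name base32_decode, str in and bytes out, errors reported as ValueError). It accepts either case, rejects any character outside the alphabet, and applies both MUST checks: a trailing remainder of 5 or more bits would mean a character that carries no octet data, and any leftover padding bits must be 0.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz234567"

# Map both lowercase and uppercase forms to their 5-bit values, so case is ignored.
DECODE = {c: i for i, c in enumerate(ALPHABET)}
DECODE.update({c.upper(): i for i, c in enumerate(ALPHABET)})


def base32_decode(text: str) -> bytes:
    """Hypothetical strict decoder; raises ValueError on any invalid input."""
    out = bytearray()
    buffer = 0  # pending bits, most significant bit first
    bits = 0    # number of pending bits held in buffer
    for ch in text:
        value = DECODE.get(ch)
        if value is None:
            raise ValueError(f"character {ch!r} is outside the alphabet")
        buffer = (buffer << 5) | value
        bits += 5
        # At most one octet completes per character, since bits < 8 beforehand.
        if bits >= 8:
            bits -= 8
            out.append((buffer >> bits) & 0xFF)
            buffer &= (1 << bits) - 1
    # Leftover bits are padding: fewer than 5 of them, and all zero.
    if bits >= 5:
        raise ValueError("encoding does not represent an integral number of octets")
    if buffer != 0:
        raise ValueError("padding bits are not 0")
    return bytes(out)


print(base32_decode("MY"))  # prints: b'f'
```

One reason the padding check matters: without it, "mz" would also decode to the octet 0x66 (its trailing bits are 01), so the same octets would have more than one valid encoding. With both checks, each octet sequence has exactly one encoding apart from case.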

References

[URI] T. Berners-Lee, R. Fielding, and L. Masinter, "Uniform Resource Identifiers (URI): Generic Syntax", RFC 2396, August 1998.


Copyright 2002 - 2003 Waterken Inc. All rights reserved.
