
base32 Encoding


The base32 encoding is designed to represent arbitrary sequences of octets in a form that is suitable for inclusion in a URI or filename.


Each 5-bit group of input is encoded as one character from the alphabet { 'a'-'z', '2'-'7' }. Bits are encoded in big-endian order. No padding character is used. During decoding, differences in case are ignored; however, any character outside the alphabet is treated as an unrecoverable error.
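As a worked illustration (not part of the original text), consider the single octet 0x66 (ASCII 'f'). Its 8 bits are read most significant first and split into 5-bit groups; the short final group is filled with zero bits, per the padding rule given later in this page (the two underlined bits). Looking the two group values up in Table 1 gives 'm' and 'y', which agrees with the RFC 3548 base32 result "MY======" once the '=' padding is dropped and the letters are lowercased.

```latex
\texttt{0x66} = 01100110_2
\;\longrightarrow\; 01100 \mid 110\,\underline{00}
\;\longrightarrow\; (12,\ 24)
\;\longrightarrow\; \texttt{my}
```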


Design goals

  1. An arbitrary sequence of octets can be represented.
  2. An encoding can be included in a URI without character escaping.
  3. The encoding does not depend on case being preserved.
  4. The encoding is compact.
  5. Encoding and decoding are fast.

An arbitrary sequence of octets

The encoded data may be, for example, a cryptographic hash or a nonce, so every possible octet value must be representable.

URI inclusion

The purpose of this encoding is to include octet sequences in a URI. The encoding alphabet is restricted to the unreserved characters defined by RFC 2396 [URI], so an encoded value never needs to be escaped.
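One quick way to check the URI-safety claim, sketched here under the assumption that Python's `urllib.parse.quote` is an acceptable stand-in for a URI escaper: `quote` leaves ASCII letters and digits untouched, so every character of the alphabet passes through unchanged, while RFC 3548's '=' padding character (reserved in RFC 2396) is percent-escaped. The helper name `is_uri_safe` is hypothetical.

```python
from urllib.parse import quote

# The base32 alphabet from Table 1: 'a'-'z' followed by '2'-'7'.
ALPHABET = "abcdefghijklmnopqrstuvwxyz234567"


def is_uri_safe(s: str) -> bool:
    """Return True if percent-escaping leaves every character of s unchanged."""
    # safe="" means even '/' would be escaped; letters and digits never are.
    return quote(s, safe="") == s


print(is_uri_safe(ALPHABET))    # every alphabet character survives unescaped
print(is_uri_safe("my======"))  # '=' padding would be escaped as %3D
```

The first call prints `True` and the second `False`, which is why this encoding omits '=' padding entirely.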


Case insensitivity

In some situations, the transport does not guarantee that case will be preserved during transmission. For example, case is not preserved in DNS names, or in filenames on some filesystems. Non-compliant SMTP software may also fail to preserve case in mailbox names.

Compact encoding

A URI or filename is easier to use when it is short.

Fast encoding

Encoding and decoding time is often part of the latency of lookup operations.


Specification

A 32-character subset of US-ASCII is used, enabling 5 bits to be represented per printable character.
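Since each character carries 5 bits, n octets (8n bits) need ceil(8n/5) characters and, because no '=' padding is added, nothing more. The hypothetical helper `encoded_length` below is a minimal sketch of that arithmetic, not something the specification defines.

```python
def encoded_length(n_octets: int) -> int:
    """Characters needed to encode n_octets: ceil(8 * n / 5), no '=' padding."""
    # Integer ceiling division avoids floating-point rounding.
    return (8 * n_octets + 4) // 5


print([encoded_length(n) for n in range(7)])  # lengths for 0..6 octets
print(encoded_length(20))  # e.g. a 20-octet SHA-1 hash
```

This prints `[0, 2, 4, 5, 7, 8, 10]` and `32`. For comparison, RFC 3548 rounds every encoding up to a multiple of 8 characters with '=', so a 32-octet value takes 56 characters there but 52 here.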

Octets are encoded from first to last, and the bits of each octet are processed in big-endian order (most significant bit first). The resulting bit stream is divided into 5-bit groups, each of which is translated into a single character of the base32 alphabet: the group's value is used as an index into an array of 32 printable characters, and the character at that index is appended to the output string. These characters, listed in Table 1 below, are drawn from the US-ASCII lowercase letters and digits.

Table 1: The base32 alphabet
index  char   index  char   index  char   index  char
  0    'a'      8    'i'     16    'q'     24    'y'
  1    'b'      9    'j'     17    'r'     25    'z'
  2    'c'     10    'k'     18    's'     26    '2'
  3    'd'     11    'l'     19    't'     27    '3'
  4    'e'     12    'm'     20    'u'     28    '4'
  5    'f'     13    'n'     21    'v'     29    '5'
  6    'g'     14    'o'     22    'w'     30    '6'
  7    'h'     15    'p'     23    'x'     31    '7'

If the length of the input bit stream, in bits, is not a multiple of 5, the stream is padded with 0 bits up to the next multiple of 5.
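The encoding steps above can be sketched in Python as follows. This is an illustrative implementation under our own naming (`ALPHABET`, `encode` are hypothetical), not the reference Java implementation. It keeps a small bit buffer, emits one character per complete 5-bit group, and zero-pads the final partial group.

```python
# Table 1: index i maps to ALPHABET[i].
ALPHABET = "abcdefghijklmnopqrstuvwxyz234567"


def encode(data: bytes) -> str:
    """Encode octets as canonical lowercase base32 with no '=' padding."""
    out = []
    buffer = 0  # pending bits, most significant first
    bits = 0    # number of valid bits currently held in buffer
    for octet in data:  # octets are processed first to last
        buffer = (buffer << 8) | octet  # append the octet's bits, big-endian
        bits += 8
        while bits >= 5:
            bits -= 5
            # The top 5 pending bits form the next index into the alphabet.
            out.append(ALPHABET[(buffer >> bits) & 0x1F])
        buffer &= (1 << bits) - 1  # keep only the bits not yet emitted
    if bits > 0:
        # Pad the final partial group with 0 bits up to a full 5-bit group.
        out.append(ALPHABET[(buffer << (5 - bits)) & 0x1F])
    return "".join(out)


for word in [b"f", b"fo", b"foo", b"foobar"]:
    print(word, encode(word))
```

The outputs are `my`, `mzxq`, `mzxw6` and `mzxw6ytboi`, matching the published RFC 3548/4648 base32 test vectors with '=' removed and letters lowercased.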

During decoding, differences in case are ignored; however, any character outside the case-insensitive encoding alphabet is treated as an unrecoverable decoding error. The decoder MUST also verify that the encoding represents an integral number of octets and that any padding bits are 0 bits, so that, apart from case, each octet sequence has exactly one accepted encoding.
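A matching decoder sketch, again with hypothetical names (`INDEX`, `decode`), shows where each required check fits. Case folding is done through an explicit ASCII-only lookup table rather than `str.lower()`, because `str.lower()` maps some non-ASCII characters (for example the Kelvin sign, U+212A) to ASCII letters, which would let characters from outside the alphabet slip through.

```python
# Table 1 alphabet plus a reverse lookup accepting both ASCII cases.
ALPHABET = "abcdefghijklmnopqrstuvwxyz234567"
INDEX = {c: i for i, c in enumerate(ALPHABET)}
INDEX.update({c.upper(): i for i, c in enumerate(ALPHABET)})


def decode(text: str) -> bytes:
    """Decode base32 text, rejecting foreign characters and non-canonical input."""
    out = bytearray()
    buffer = 0  # pending bits, most significant first
    bits = 0    # number of valid bits in buffer (always < 8 between steps)
    for ch in text:
        value = INDEX.get(ch)
        if value is None:
            raise ValueError(f"invalid base32 character: {ch!r}")
        buffer = (buffer << 5) | value
        bits += 5
        if bits >= 8:
            bits -= 8
            out.append((buffer >> bits) & 0xFF)  # emit one complete octet
            buffer &= (1 << bits) - 1            # keep the leftover bits
    # Valid input leaves fewer than 5 leftover bits; 5 or more means a
    # whole extra character that cannot form an octet.
    if bits >= 5:
        raise ValueError("encoding does not represent an integral number of octets")
    # The leftover bits are padding and must all be 0.
    if buffer != 0:
        raise ValueError("padding bits are not 0")
    return bytes(out)


print(decode("MZXW6YTBOI"))  # case is ignored
```

This prints `b'foobar'`. Inputs such as `"m"` (too short), `"mz"` (nonzero padding bits) or `"my="` (character outside the alphabet) raise `ValueError`.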

Relationship to RFC 3548

The base32 encoding defined by this specification is derived from the base32 encoding defined by RFC 3548 [RFC 3548]. Both use the same 32 symbols in the same order, apart from case. The differences between the base32 encoding defined here and that in RFC 3548 are:

  • '=' padding MUST NOT be added
  • case MUST NOT be significant
  • an encoding MUST NOT contain characters outside the case-insensitive encoding alphabet
  • the canonical encoding MUST use lowercase letters
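Because the alphabet and bit ordering match, one possible cross-check (our assumption, not something this specification prescribes) is to derive the encoding from Python's standard `base64.b32encode`, which implements the RFC 3548 base32 encoding (RFC 3548 has since been obsoleted by RFC 4648): strip the '=' padding and lowercase the result. The helper name `encode_via_stdlib` is hypothetical. This covers encoding only; a decoder still needs the stricter checks listed above.

```python
import base64


def encode_via_stdlib(data: bytes) -> str:
    """Derive this page's base32 from RFC 3548 base32 output."""
    rfc3548 = base64.b32encode(data).decode("ascii")  # e.g. "MY======"
    # Drop '=' padding and use the canonical lowercase form.
    return rfc3548.rstrip("=").lower()


print(encode_via_stdlib(b"foobar"))
```

This prints `mzxw6ytboi`, the same string a direct implementation of the rules above produces.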


[URI] T. Berners-Lee, R. Fielding and L. Masinter; "Uniform Resource Identifiers (URI): Generic Syntax"; RFC 2396; August 1998.

[RFC 3548] S. Josefsson; "The Base16, Base32, and Base64 Data Encodings"; RFC 3548; July 2003.


Copyright 2002 - 2003 Waterken Inc. All rights reserved.
