The Missing Bit

Introducing Base24

TL;DR:
Base24 is a binary-to-text encoding aimed at short keys (32-512 bits) meant for human use.

Update:
I somehow managed to publish the wrong alphabet in the initial version of this article. This has now been corrected.

Update 2:
Implementations are now available for multiple languages. The list is at the end of the article.

Update 3:
I was surprised by how well my idea was received, and I am very grateful for your interest.


I am working on a project where I need to give users the possibility to recover their account with recovery codes. I generated a few codes and tried different encodings, but none of them gave a satisfying result.

Comparison of some existing solutions

Plain numbers (base 10)

Pros:

Cons:

Hex (base 16)

Pros:

Cons:

Base32

Pros:

Cons:

Base64

Pros:

Cons:

How Base24 came together

The goal is to provide a way to encode and decode binary keys of cryptographic length (32-512 bits). Short keys can be used as recovery codes with key derivation, while longer keys can be used directly.

To give an idea of the size of the numbers, here are a few numbers in base 10:

Those numbers are really hard to read directly.
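To put a figure on this, the following Python sketch (an illustration only, not part of Base24) counts the decimal digits of the largest value for each key size in the 32-512 bit range. The helper name decimal_digits is hypothetical.

```python
# Number of decimal digits in the largest n-bit value (2**n - 1)
# for the key sizes discussed in this article.
KEY_SIZES = [32, 64, 128, 256, 512]

def decimal_digits(bits: int) -> int:
    """Return how many base-10 digits the largest `bits`-bit integer has."""
    return len(str(2**bits - 1))

for bits in KEY_SIZES:
    print(f"{bits:>3} bits -> {decimal_digits(bits)} decimal digits")
```

It prints 10, 20, 39, 78 and 155 digits respectively, so even a 128-bit key is already a 39-digit number.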

The codes might be dictated over the phone or written down. As seen with credit card numbers, this can be cumbersome. While a typo in a credit card number will at worst lead to a failed transaction, a typo in a cryptographic key will make the data unreadable. Of course, technical users can "brute force" the typo, but that could be hard or even impossible for normal users, which would lead to data loss.

The chosen encoding alphabet must be absolutely unambiguous: no character in it may be confusable with another character, whether or not that other character is part of the alphabet. The alphabet size is the smallest one that can store 32 bits in 7 characters (instead of 8 for hex), which is 24. The alphabet must also be case insensitive.
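The size of 24 follows from simple arithmetic: 23^7 = 3,404,825,447 falls short of 2^32 = 4,294,967,296, while 24^7 = 4,586,471,424 exceeds it. This Python sketch searches for the smallest base that fits; min_alphabet_size is a hypothetical helper written for illustration.

```python
def min_alphabet_size(bits: int, chars: int) -> int:
    """Smallest base b such that `chars` base-b digits can hold every `bits`-bit value."""
    b = 2
    # b**chars distinct strings are needed to cover all 2**bits values.
    while b**chars < 2**bits:
        b += 1
    return b

print(min_alphabet_size(32, 7))  # 24
print(min_alphabet_size(32, 8))  # 16, i.e. hex
```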

A list of ambiguous characters (both cases are displayed for letters):

Ambiguity in handwriting is also taken into account.

The final alphabet I came up with is ZAC2B3EF4GH5TK67P8RS9WXY. As I needed 24 characters, I kept G and 6, which are the least ambiguous characters in the list. The order of the characters is arbitrary; I just made sure they were not sorted, so that the string stands out and is not mistaken for a full alphabet. I put the Z first so that a run of zeros encodes as ZZZ..., which looks like snoring/sleeping and made me smile. This is of course technically irrelevant, but computers are made for people.
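As a quick sanity check of the alphabet, this Python sketch confirms it has 24 distinct symbols, builds a case-insensitive lookup table (the name DECODE is illustrative, not taken from any implementation), and lists which of the 36 digits and upper-case letters were left out.

```python
import string

ALPHABET = "ZAC2B3EF4GH5TK67P8RS9WXY"

# Exactly 24 symbols, none repeated.
assert len(ALPHABET) == 24 and len(set(ALPHABET)) == 24

# Case-insensitive lookup: both cases of a letter map to the same digit value.
DECODE = {}
for value, ch in enumerate(ALPHABET):
    DECODE[ch] = value
    DECODE[ch.lower()] = value

# Z is digit 0, so an all-zero chunk reads as "ZZZZZZZ".
print(DECODE["Z"], DECODE["z"])  # 0 0

# Digits and upper-case letters that were excluded from the alphabet.
excluded = sorted(set(string.digits + string.ascii_uppercase) - set(ALPHABET))
print("".join(excluded))  # 01DIJLMNOQUV
```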

The data length must be a multiple of 32 bits. There is no padding mechanism in the encoder.

Example

Let's take 128 bits of data:

Or 64 bits, which is reasonable for a recovery code when used with key derivation:

As we can see, this is manageable for a human copying it onto paper, with a low risk of error.

Implementations

Licensing

If this is ever used by anybody, consider it public domain.