Behind GitHub’s new authentication token formats

We’re excited to share a deep dive into how our new authentication token formats are built and how these improvements are keeping your tokens more secure. As we continue to…

Indigo K·@indigok

April 5, 2021 | Updated May 10, 2023

| 4 minutes

We’re excited to share a deep dive into how our new authentication token formats are built and how these improvements are keeping your tokens more secure. As we continue to focus on the security of our platform and services across the web, this update shows how big an impact simple changes can have.

Many of our old authentication token formats are hex-encoded 40 character strings that are indistinguishable from other encoded data like SHA hashes. These have several limitations, such as inefficient or even inaccurate detection of compromised tokens for our secret scanning feature. We continually strive for security excellence, so we knew that token detection was something we wanted to improve. How could we make our tokens easier to identify and more secure?

Without further ado, here are the design decisions behind our new authentication token formats that let us meet both goals.

Identifiable prefixes

As we see across the industry from companies like Slack and Stripe, token prefixes are a clear way to make tokens identifiable. We are including specific 3 letter prefixes to represent each token, starting with a company signifier, gh, and the first letter of the token type. The results are:

ghp for GitHub personal access tokens
gho for OAuth access tokens
ghu for GitHub user-to-server tokens
ghs for GitHub server-to-server tokens
ghr for refresh tokens

Additionally, we want to make these prefixes clearly distinguishable within the token to improve readability. Thus, we are adding a separator: _. An underscore is not a Base64 character which helps ensure that our tokens cannot be accidentally duplicated by randomly generated strings like SHAs.

One other neat thing about _ is it will reliably select the whole token when you double click on it. Other characters we considered are sometimes included in application word separators and thus will stop highlighting at that character. Try out double clicking this-random-text versus this_random_text!

With this prefix alone, we anticipate the false positive rate for secret scanning will be down to 0.5%.⚡

Checksum

Identifiable prefixes are great, but let’s go one step further. A checksum virtually eliminates false positives for secret scanning offline. We can check the token input matches the checksum and eliminate fake tokens without having to hit our database.

A 32 bit checksum in the last 6 digits of each token strikes the optimal balance between keeping the random token portion at a consistent entropy and enough confidence in the checksum. We start the implementation with a CRC32 algorithm, a standard checksum algorithm. We then encode the result with a Base62 implementation, using leading zeros for padding as needed.

Token entropy

We of course can’t forget about token entropy. Entropy is a logarithmic measure of information or uncertainty inherent in the possible token combinations. We use it as a representation of uniqueness for a given pattern and it’s important to maintain for the vast number of tokens we generate everyday. For personal access tokens alone, we create over 10k on a slow day and upwards of 18k on peak days. With our new formats, not only did we maintain our previous levels — we increased them!

Previously, our implementation for OAuth access tokens had an entropy of 160:

Math.log(((“a”..“f”).to_a + (0..9).to_a).length)/Math.log(2) * 40 = 160

Our implementation for OAuth access tokens are now 178:

Math.log(((“a”..“z”).to_a + (“A”..“Z”).to_a + (0..9).to_a).length)/Math.log(2) * 30 = 178

As we continue to grow and move forward, we will increase this entropy even more. But for now, we are thrilled our tokens have increased identifiability, security, and entropy — all without changing the token length.

What does this mean for you?

As a GitHub user…

We strongly encourage you to reset any personal access tokens and OAuth tokens you have. These improvements help secret scanning detection and will help you mitigate any risk to compromised tokens. You can reset your personal access tokens by going to developer settings and your OAuth tokens with our API.

As a service provider…

If you issue tokens as part of your platform and aren’t part of our secret scanning feature, we encourage you to follow the guidelines we outline here for your own tokens and join our secret scanning program so we can keep your tokens secure too.

We thank you for helping us make our platform and services the best and most secure they can be.✨

Written by

Engineering

Behind GitHub’s new authentication token formats

Identifiable prefixes

Checksum

Token entropy

What does this mean for you?

Tags:

Written by

Indigo K

Related posts

How GitHub engineers tackle platform problems

GitHub Issues search now supports nested queries and boolean operators: Here’s how we (re)built it

Design system annotations, part 2: Advanced methods of annotating components

Tags:

Written by

Related posts

We do newsletters, too