Learn how researchers and security experts at GitHub, Microsoft, and Santander came together to address the challenges presented by the post-quantum cryptography world.
When you hear the words “quantum computing,” it sounds like something out of a science fiction movie. Yet in recent years, quantum computing has become a hot topic, especially in the world of cryptography. Post-quantum cryptography raises many questions and challenges, and a group of researchers and security experts across GitHub, Santander, and Microsoft came together to tackle them. They started with a question: how do you understand how cryptography is used and implemented, whether on-prem or in the cloud, across hundreds of thousands, if not millions, of lines of code?
To tackle this initial problem, the team combined a number of building blocks to create queries and run them at scale. One of these tools was CodeQL, the static analysis engine that powers GitHub’s code scanning. CodeQL lets you model an application as data and then run queries against that data. Once the team had the queries they wanted to run, they needed a way to scale them across thousands of repositories. This is where multi-repository variant analysis (MRVA) came into play. MRVA allows you to scale your threat hunting by running a single CodeQL query across up to a thousand repositories.
Below, we’ll dive more into post-quantum cryptography, what organizations can do to prepare themselves, and how they can leverage CodeQL to help.
What is post-quantum cryptography?
In its simplest form, quantum computing harnesses the principles of quantum mechanics to process information and perform computations, giving it the ability to solve complex problems beyond the capabilities of traditional computers. If a traditional computer solves problems by doing steps one after the other, quantum computers can solve problems by doing multiple steps at the same time.
Meanwhile, cryptography aims to secure information through the use of mathematical techniques, ensuring that only authorized individuals can access and understand it. Think of it like a secret code: a message that only those with the key can understand, and that looks like a jumble of letters or numbers to anyone else.
With the potential to unravel the foundation of conventional security measures, the looming advent of large-scale quantum machines poses a formidable challenge to the security of digital communications. The urgent need for safeguarding our online privacy and data integrity has birthed the concept of post-quantum cryptography. Also known as quantum-resistant cryptography, this innovative field aims to construct cryptographic systems resilient against the prowess of both quantum and classical computers while seamlessly integrating with existing communication networks.
Generating a Cryptography Bill of Materials
To meet the demands of this new post-quantum cryptographic world, organizations will need to create a Cryptography Bill of Materials, or CBOM. A CBOM is a record containing the details of the various cryptographic components used in a software system. There are a number of challenges, however, when trying to generate a CBOM, including API variability, data flow complexity, API modeling scope, and abstraction unification.
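To make the idea of a CBOM record more concrete, here is a minimal sketch in Python. The field names (`algorithm`, `primitive`, `key_size_bits`, `mode`, `location`), the `build_cbom` helper, and the `payments-service` component are all hypothetical choices for illustration; they are not taken from the team's queries or from any standard CBOM schema.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class CryptoAsset:
    """One cryptographic usage found in a codebase (hypothetical fields)."""
    algorithm: str                 # e.g. "RSA", "AES"
    primitive: str                 # e.g. "signature", "block-cipher"
    key_size_bits: Optional[int]   # None when the size could not be determined
    mode: Optional[str]            # e.g. "GCM"; None when not applicable
    location: str                  # "path/to/file:line" of the usage


def build_cbom(component: str, assets: list) -> dict:
    """Assemble a CBOM-like record for one software component."""
    ordered = sorted(assets, key=lambda asset: asset.location)
    return {
        "component": component,
        "cryptoAssets": [asdict(asset) for asset in ordered],
    }


# Made-up findings for a made-up component.
assets = [
    CryptoAsset("AES", "block-cipher", 256, "GCM", "src/store.py:17"),
    CryptoAsset("RSA", "signature", 2048, None, "src/sign.py:42"),
]
cbom = build_cbom("payments-service", assets)
print(json.dumps(cbom, indent=2))
```

Keeping the location next to each algorithm is what turns a list of algorithm names into something you can act on later.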
Taking the example of data flow complexity, the challenge lies in having to trace sources of data to sinks of configuration. The configuration might be an initialization vector, a key size, or the way an algorithm is specified. These data flow challenges also have to be solved interprocedurally, across function boundaries, on large programs. This is where CodeQL can help.
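A small, contrived Python example shows why this is hard. The algorithm name below starts in a configuration dictionary (the source) and only reaches the call that selects the algorithm (the sink) after passing through two other functions. The function names and the `CONFIG` dictionary are invented for illustration; only `hashlib.new` is a real standard library call.

```python
import hashlib

# Source: the algorithm name enters the program as configuration data.
CONFIG = {"digest": "sha256"}


def load_digest_name(config: dict) -> str:
    """First hop: the value leaves the configuration."""
    return config["digest"]


def make_hasher(name: str):
    """Sink: the value finally selects the algorithm."""
    return hashlib.new(name)


def fingerprint(data: bytes, config: dict) -> str:
    """Second hop: the value is handed on to another function."""
    hasher = make_hasher(load_digest_name(config))
    hasher.update(data)
    return hasher.hexdigest()


digest = fingerprint(b"hello", CONFIG)
print(digest)
```

Nothing at the `hashlib.new(name)` call site says which algorithm is in use. An analysis has to follow `name` back through `fingerprint` and `load_digest_name` to `CONFIG` to find out, and real programs have many more hops than this.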
Leveraging CodeQL for CBOM generation
The ability to write custom queries for CodeQL gives users a great deal of flexibility in what they want the output of their analysis to be. The team leveraged custom queries for CBOM generation by writing simple, informative queries. Once they’d written the queries, they had to define an open-source crypto abstraction model in CodeQL. They used abstract classes to represent cryptography concepts, then extended those abstractions for specific crypto APIs. This produced the CBOM results, which improved over time as more APIs were modeled. They made all of this work open source, so you can leverage these queries in your own organization.
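The pattern of abstract concepts extended per API can be sketched in Python as an analogy. This is not the team's CodeQL model: the class names are invented, the string parsing stands in for real program analysis, and `Digest.create` is a made-up API used only to show a second extension.

```python
from abc import ABC, abstractmethod


class HashOperation(ABC):
    """Abstract concept: some call that computes a hash."""

    def __init__(self, call_text: str):
        self.call_text = call_text

    @abstractmethod
    def algorithm(self) -> str:
        """Return a normalized algorithm name such as "SHA256"."""


class HashlibCall(HashOperation):
    """Extension for calls shaped like hashlib.sha256(data)."""

    def algorithm(self) -> str:
        name = self.call_text.split(".", 1)[1].split("(", 1)[0]
        return name.upper()


class DigestCreateCall(HashOperation):
    """Extension for a made-up API shaped like Digest.create("sha-512")."""

    def algorithm(self) -> str:
        name = self.call_text.split('"')[1]
        return name.replace("-", "").upper()


def report(operations) -> list:
    """The 'query': written once against the abstract concept."""
    return sorted({op.algorithm() for op in operations})


found = report([
    HashlibCall("hashlib.sha256(data)"),
    DigestCreateCall('Digest.create("sha-512")'),
])
print(found)
```

The `report` function never mentions a specific library. Supporting a new API means adding one more subclass, which mirrors how results can improve as more APIs are modeled.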
Applying variant analysis to CBOMs
When we start thinking about understanding a CBOM, just looking at a top-level application or a single package isn’t enough. What we see at GitHub is that the majority of a typical enterprise application is made up of open source code. As you look at your software supply chain and dependency tree, there is a massive number of dependencies that help make your application function. This means that, when thinking about a CBOM, there’s a lot of additional code we need to make sure we’re looking at.
This is where we can start to use techniques like variant analysis. Variant analysis is a technique that allows us to search for variants of specific issues or vulnerabilities. Multi-repository variant analysis scales CodeQL’s variant analysis capabilities across thousands of repositories. This is critical for organizations that have hundreds of thousands of different repositories that may need to be scanned.
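The shape of "one query, many repositories" can be sketched in a few lines of Python. Everything here is a stand-in: the regular expression is a crude substitute for a real CodeQL query, and the repository names and source snippets are made up and held in memory.

```python
import re
from collections import Counter

# Stand-in for a single query: find direct hashlib constructor calls.
QUERY = re.compile(r"hashlib\.(md5|sha1|sha256|sha512)\(")

# Made-up repositories, each reduced to a string of source code.
REPOS = {
    "org/billing": "h = hashlib.md5(data)\nk = hashlib.sha256(key)\n",
    "org/auth": "t = hashlib.sha256(token)\n",
    "org/docs": "print('no crypto here')\n",
}


def run_query(source: str) -> list:
    """Run the one query against one repository's source."""
    return QUERY.findall(source)


def run_across_repos(repos: dict) -> dict:
    """Run the same query everywhere and keep per-repository counts."""
    return {name: Counter(run_query(src)) for name, src in repos.items()}


results = run_across_repos(REPOS)
totals = sum(results.values(), Counter())
print(results)
print(totals)
```

The per-repository results tell you where to look, while the totals give an organization-wide picture of which algorithms are in use.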
Tying this back to CBOMs, you can combine these existing technologies into a workflow for generating a CBOM. Above, you can see an example of what this workflow would look like and the steps you’d take to ultimately generate your own CBOM.
Now that we have our CBOM, it’s time to use that information to drive action. This will be a growing area of knowledge over the years, but for now, this is a great place to use tools like generative AI to help us identify and improve the quality of our code to address the issues that have been identified. Specifically, features like GitHub Copilot Chat can give guidance on the best paths to take to understand the issue and recommend updated algorithms.
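As a rough sketch of turning a CBOM into action, the Python below flags entries that use public-key algorithms known to be breakable by a large-scale quantum computer and attaches a starting point for review. The entry format and the `triage` helper are hypothetical. The suggestions name the NIST post-quantum standards ML-KEM (FIPS 203) and ML-DSA (FIPS 204), but they are only prompts for a human review, not migration advice.

```python
# Public-key algorithms whose security rests on problems that
# Shor's algorithm solves efficiently on a large quantum computer.
QUANTUM_VULNERABLE = {"RSA", "DSA", "DH", "ECDH", "ECDSA"}

# Starting points for review, keyed by a hypothetical "primitive" field.
SUGGESTED = {
    "key-establishment": "ML-KEM (FIPS 203)",
    "signature": "ML-DSA (FIPS 204)",
}


def triage(entries: list) -> list:
    """Return the vulnerable entries, each with a review suggestion."""
    findings = []
    for entry in entries:
        if entry["algorithm"].upper() in QUANTUM_VULNERABLE:
            suggestion = SUGGESTED.get(entry["primitive"], "needs manual review")
            findings.append({**entry, "suggestion": suggestion})
    return sorted(findings, key=lambda finding: finding["location"])


# Made-up CBOM entries.
entries = [
    {"algorithm": "RSA", "primitive": "signature", "location": "src/sign.py:42"},
    {"algorithm": "AES", "primitive": "block-cipher", "location": "src/store.py:17"},
    {"algorithm": "ECDH", "primitive": "key-establishment", "location": "src/tls.py:8"},
]
findings = triage(entries)
for finding in findings:
    print(finding["location"], finding["algorithm"], "->", finding["suggestion"])
```

A list like this is a natural input to a conversation with a tool such as GitHub Copilot Chat about each flagged location.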
Looking ahead: instilling cryptographic agility in your organization
Post-quantum cryptography has spurred us to gain a better understanding of how and where we’re using cryptography. We need to drastically rethink the use of cryptography in our projects and how it affects our software supply chain. This requires new approaches, such as rethinking how and where cryptographic standards are used.
So, how do we prepare for the post-quantum cryptographic world? The key is cryptographic agility: understanding what you have, where it is, and how it’s used. Once you’ve understood the risks, located and assessed your software supply chain, generated your CBOM, and scaled your efforts across your organization, you can instill this cryptographic agility.
How can I use these methods in my organization?
To get started using these methods in your own organization, you can leverage a GitHub Action that has been created to produce a report on both first-party code and third-party dependencies used in a repository. The instructions to set that up can be found here, and you can see a demo of these steps below.
Additional options are available to control analysis of third-party dependencies, the autobuild of source code for CodeQL analysis, and the threshold for the inclusion of source code bytes. Any open source dependencies that cannot be analyzed during a workflow run will be reported to GitHub here for out-of-band analysis, with results cached publicly for any future workflow run.