Using GitHub code scanning and CodeQL to detect traces of Solorigate and other backdoors

Image of Bas van Schaik

Last month, a member of the CodeQL security community contributed multiple CodeQL queries for C# codebases that can help organizations assess whether they are affected by the SolarWinds nation-state attack on various parts of critical network infrastructure around the world. This attack is also referred to as Solorigate (by Microsoft), or Sunburst (by FireEye). In this blog post, we’ll explain how GitHub Advanced Security customers can use these CodeQL queries to establish whether their build infrastructure is infected with the malware.

What happened?

In early December 2020, a security consultancy firm, FireEye, published details of a nation-state attack on SolarWinds, a company that provides network monitoring tools to various organizations, including the US government. As part of the attack, the hackers succeeded in backdooring SolarWinds’ Orion network monitoring product, which was shipped to a large number of their customers. The attackers subsequently gained access to networks in which the Orion product was deployed.

Over the past few years, Microsoft has been using CodeQL to investigate vulnerabilities and data breaches. The CodeQL query contributions were a major element in their response to this attack, as well as in past investigations.

What is build hijacking?

The malware spreads by backdooring build systems in order to inject malicious code into product releases, and in turn compromise the users of a shipped release. In particular, it monitors for invocations of msbuild.exe (Microsoft Build Engine) processes. By giving itself debugging privileges, it injects additional malicious code into the build process. This means that while the codebases themselves do not contain any malicious commits or other traces of the malware, the products that are built from those codebases do contain the malware. This process of “build hijacking” is explained in more detail in this technical analysis from CrowdStrike.

CodeQL security analysis

GitHub CodeQL is a semantic code analysis engine that uses queries to analyze source code and find unwanted patterns. For example, CodeQL can track data from an untrusted source (e.g., an HTTP request) that ends up in a potentially dangerous place (e.g., a string concatenation inside a SQL statement resulting in a SQL injection vulnerability).
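To make the source-to-sink pattern concrete, here is a minimal Python sketch of the kind of code such a query would flag, next to a safer variant. It is an illustration only: the function names and the in-memory `users` table are hypothetical, and the Solorigate queries themselves target C#.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Sink: the untrusted value is concatenated straight into the SQL text,
    # so quotes inside `name` can change the meaning of the statement.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


def find_user_safe(conn, name):
    # The value is passed as a bound parameter and is never parsed as SQL.
    query = "SELECT name FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


def make_demo_db():
    # Hypothetical in-memory table used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])
    return conn
```

Passing a value such as `x' OR '1'='1` to `find_user_unsafe` returns every row, while `find_user_safe` treats the same value as a plain string and matches nothing.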

CodeQL queries can be run on source code databases that CodeQL generates during the build process (for compiled languages). To do so, CodeQL closely observes the build process and subsequently extracts the relevant parts of the source code that is used to build a binary. The output of the extraction process is a structured representation of the source code in relational form: a CodeQL database.

Using CodeQL to detect traces of Solorigate

If a build server is backdoored with the build hijacking component of the Solorigate malware campaign, the malware will inject additional source code at compilation time. If CodeQL is observing the build process on the infected server, it will extract the injected malicious source code together with the genuine source code. The resulting CodeQL database will therefore contain traces of the malicious Solorigate source code. Note that if your CodeQL database is generated on a machine that is not infected, the database will not contain any injected source code.

Diagram showing code scanning workflow described in blog post

The CodeQL queries that were contributed by the Microsoft team will detect patterns of malicious C# code injected by the malware. The best way to run these queries is by manually creating a CodeQL database on the potentially affected server(s), and analyzing that database with the CodeQL extension for Visual Studio Code.

Alternatively, you can generate the CodeQL database and run the queries through a CI/CD pipeline. This could detect build injection on the systems that run your CI/CD jobs (and that may be used to build your release artifacts).
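As a rough sketch of what a pipeline check could look like, assume the CI job has already run the queries with the CodeQL CLI and written the results as a SARIF file. The helper names below are hypothetical; the field names follow the SARIF 2.1.0 layout (`runs`, `results`, `ruleId`, `locations`).

```python
import json


def summarize_sarif(sarif):
    """Return a (rule id, file, line) tuple for every result in a SARIF log."""
    findings = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # A result may carry no location; fall back to placeholders.
            locations = result.get("locations") or [{}]
            physical = locations[0].get("physicalLocation", {})
            uri = physical.get("artifactLocation", {}).get("uri", "<unknown>")
            line = physical.get("region", {}).get("startLine")
            findings.append((result.get("ruleId", "<unknown>"), uri, line))
    return findings


def exit_code_for(findings):
    """Non-zero when anything was flagged, so the CI job can fail on it."""
    return 1 if findings else 0


def load_sarif(path):
    """Read a SARIF file from disk (the path is supplied by the caller)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

A job could call `summarize_sarif(load_sarif(path))`, print the findings, and exit with `exit_code_for(findings)` so that any flagged pattern stops the pipeline for manual review.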

Running the CodeQL queries using Visual Studio Code

  1. Install the VS Code plugin for CodeQL, and follow the Quick start guide to set up the starter workspace.
  2. Generate a CodeQL database by building your C# source code on a potentially infected build server.
  3. Transfer the CodeQL database to your machine.
    Note: the CodeQL database itself does not contain any (potentially dangerous) compilation artifacts or infected executables. It contains (1) a plaintext copy of the source code that was compiled, and (2) a relational representation of that code.
  4. Load the potentially affected CodeQL database into VS Code.
  5. Navigate to ql/csharp/ql/src/codeql-suites, where you’ll find the solorigate.qls CodeQL query suite file. Right-click on the file, and select CodeQL: Run Queries in Selected Files.

UI screenshot that shows how to run a CodeQL query

Repeat steps 2-5 for every codebase that is potentially affected.
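Steps 2 and 3 can be scripted with the CodeQL CLI. The following is a minimal sketch that only assembles the command lines, assuming the `codeql` executable is on the PATH of the build server; the function names, the database directory, and the build command are placeholders to adapt to your own build.

```python
def create_database_cmd(db_dir, source_root, build_command):
    """Command that builds the code while CodeQL observes the build (step 2)."""
    return [
        "codeql", "database", "create", db_dir,
        "--language=csharp",
        f"--source-root={source_root}",
        f"--command={build_command}",
    ]


def bundle_database_cmd(db_dir, archive_path):
    """Command that packs the database into one archive for transfer (step 3)."""
    return ["codeql", "database", "bundle", f"--output={archive_path}", "--", db_dir]
```

Each list can be passed to `subprocess.run(cmd, check=True)` on the build server. The resulting archive is what you would copy to your own machine and load into VS Code.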

Running the CodeQL queries in GitHub Code Scanning

In order to run the additional CodeQL queries on a C# codebase in GitHub Code Scanning, create a file .github/codeql/solorigate.qls in the repository you would like to analyze:

- import: codeql-suites/solorigate.qls
  from: codeql-csharp

Next, set up a default CodeQL workflow (or edit an existing workflow) and amend the “Initialize CodeQL” section of the template as follows:

- name: Initialize CodeQL
  uses: github/codeql-action/init@v1
  with:
    languages: csharp
    queries: ./.github/codeql/solorigate.qls

If your code requires a special build command to compile, please refer to the documentation on customizing the CodeQL Code Scanning analysis.

With the above configuration, the additional CodeQL queries will be run. If CodeQL detects any malware indicators (Solorigate or otherwise) in your source code, it will produce an alert in the GitHub Code Scanning web interface.

Screenshot of code scanning alert

For more information and configuration examples, please refer to the documentation for running custom CodeQL queries in GitHub Code Scanning.

Next steps

If CodeQL flags up suspicious elements in a product or codebase, you should conduct a careful manual code review of the affected area. In particular, we suggest that you compare the code that was seen by CodeQL to the original source code.
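One way to start that comparison is to diff the plaintext source copy stored in the CodeQL database against a trusted checkout. The sketch below assumes that copy is available as a zip archive (typically `src.zip` in the database directory, but treat the file name and its internal layout as assumptions to verify). The function name and the `archive_prefix` parameter are hypothetical; the prefix stands for the build machine's source directory as it appears inside the archive.

```python
import hashlib
import zipfile
from pathlib import Path


def _digest(data):
    return hashlib.sha256(data).hexdigest()


def compare_archive_to_checkout(archive_path, checkout_dir, archive_prefix=""):
    """Compare files in a source archive with a trusted checkout.

    Returns (changed, missing): archive entries whose bytes differ from the
    checkout, and archive entries that have no counterpart in the checkout.
    """
    checkout = Path(checkout_dir)
    prefix = archive_prefix.strip("/")
    changed, missing = [], []
    with zipfile.ZipFile(archive_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            name = info.filename
            if prefix:
                # Skip entries outside the source tree we are interested in.
                if not name.startswith(prefix + "/"):
                    continue
                relative = name[len(prefix) + 1:]
            else:
                relative = name
            local = checkout.joinpath(*relative.split("/"))
            if not local.is_file():
                missing.append(relative)
            elif _digest(local.read_bytes()) != _digest(archive.read(info)):
                changed.append(relative)
    return sorted(changed), sorted(missing)
```

Entries reported here are candidates for review rather than proof of tampering: files generated during the build will show up as missing, and differing line endings will show up as changes.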

The queries contributed by Microsoft’s Solorigate response team serve as a heuristic for detecting backdoors, like the one involved in the Solorigate attack. A negative result does not necessarily rule out that a system or network is compromised. Analyzing codebases using CodeQL should be considered just one part in a mosaic of techniques to audit for compromise. For more information on the attack and advice on other auditing techniques, please refer to the Microsoft Solorigate Resource Center.

If you have any questions related to CodeQL and Solorigate, please contact your GitHub Advanced Security representative. If you are not currently a GitHub customer, please contact us through this form, and we’ll be happy to assist further.

Further reading

If you’d like to know more about the technical background of the Solorigate queries, please refer to this post on the Microsoft Blog.