code-scanning

We have started creating and storing CodeQL databases for the most popular open-source projects on GitHub.com. If you use CodeQL for security research, you can now obtain these databases easily and directly through the CodeQL extension for Visual Studio Code, which makes it much easier to write and run your own custom CodeQL queries.

Using CodeQL for security research

The CodeQL engine powers GitHub code scanning: it analyses source code and flags up potential security problems (for example, in pull requests). By default, code scanning runs a large set of open source queries that are able to identify the most important and common security problems.

CodeQL is also a powerful tool for variant analysis and other types of security research. CodeQL treats source code as data, and anyone can write custom CodeQL queries to explore a codebase and identify vulnerabilities. Like code search on steroids!

The first step of any CodeQL analysis is extracting the source code into a CodeQL database. This database contains a relational representation of the source code — including elements like the abstract syntax tree, the data flow graph, and the control flow graph. You can create CodeQL databases yourself using the CodeQL CLI, but with the feature we shipped today, it's much quicker to get started: you can download a ready-built CodeQL database from GitHub.com.

Downloading CodeQL databases from GitHub.com in VS Code

To download a CodeQL database for use in the CodeQL extension in VS Code:

  1. Make sure you have set up the CodeQL extension for VS Code. For more information, see Setting up CodeQL in Visual Studio Code.
  2. Open the CodeQL databases view in the extension.
  3. Hover over the sidebar, click the GitHub icon, and specify the owner/repo identifier of the public repository you'd like to analyze.

Once you've downloaded a CodeQL database, you're ready to start your research. Find more information in the CodeQL documentation.

FAQs

How many CodeQL databases are available?

We currently store databases for over 200,000 repositories on GitHub.com. That list is constantly growing and evolving to make sure that it includes the most interesting codebases for security research.

Which languages can you download CodeQL databases for?

We create and store databases for all of the languages that we support in CodeQL code scanning. For more information, see About code scanning with CodeQL.

Can I download CodeQL databases outside VS Code?

Yes, you can also download CodeQL databases using the GitHub REST API. For more information, see Downloading databases from GitHub.com in the CodeQL CLI documentation.
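As a rough illustration of the REST route, here is a minimal Python sketch that only builds the request with the standard library, so nothing is sent. It assumes the endpoint GET /repos/{owner}/{repo}/code-scanning/codeql/databases/{language}, and that an Accept header of application/zip asks for the database archive rather than JSON metadata. The helper name codeql_database_request, the optional token parameter, and the octo-org/octo-repo repository are hypothetical placeholders.

```python
import urllib.request
from typing import Optional

API_ROOT = "https://api.github.com"


def codeql_database_request(
    owner: str, repo: str, language: str, token: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a request for a stored CodeQL database.

    codeql_database_request is a hypothetical helper, not part of any library.
    """
    url = f"{API_ROOT}/repos/{owner}/{repo}/code-scanning/codeql/databases/{language}"
    # application/zip requests the database archive itself, not its metadata.
    headers = {"Accept": "application/zip"}
    if token:
        # Assumption: the caller supplies a token that can read the repository.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers, method="GET")


req = codeql_database_request("octo-org", "octo-repo", "javascript")
print(req.get_method(), req.full_url)
```

To actually download the database, you would pass the request to urllib.request.urlopen and write the response body to a file such as database.zip.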

Why is there no CodeQL database available for my favourite open source repository?

If there is a repository that you'd like to analyze, but a CodeQL database is not available yet, then you can trigger the creation (and storing) of a database by enabling GitHub code scanning with the CodeQL engine. Alternatively, you could fork the repository and enable code scanning on the fork. For more information, see the code scanning documentation.

The default code scanning query suites include checks for the most important security vulnerabilities for each supported language, so that any potential problems can be surfaced to developers before they are committed to their repository. However, in some situations a particular check is not relevant for a codebase and you might prefer not to run that CodeQL query. You can now easily exclude queries using code scanning query filters.

Query filters use the same syntax as CodeQL query suites and you can filter on any CodeQL query metadata property. Query filters must be specified in a custom code scanning configuration file, which you refer to from your code scanning analysis workflow file.

In your code scanning workflow file, use the config-file parameter of the init action to specify the path to the configuration file you want to use:

- uses: github/codeql-action/init@v2
  with:
    config-file: path/to/config/file.yml

In your configuration file, specify the query filters you want to use. For example, to exclude the "Unsafe HTML constructed from library input" query (id js/html-constructed-from-input) from the default code scanning query suite for JavaScript, specify its id in an exclude block:

name: "My code scanning CodeQL config"

query-filters:
- exclude:
     id: js/html-constructed-from-input
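To make the effect of an exclude block concrete, here is a toy model in Python. It is not how CodeQL evaluates query suites: it only does exact-value matching on metadata properties (real filters are richer, for example accepting lists of values), and the query metadata below is a hypothetical, abbreviated sample.

```python
from typing import Dict, List

# Hypothetical, abbreviated metadata for three JavaScript queries.
QUERIES: List[Dict[str, str]] = [
    {"id": "js/xss", "kind": "path-problem"},
    {"id": "js/sql-injection", "kind": "path-problem"},
    {"id": "js/html-constructed-from-input", "kind": "path-problem"},
]


def matches(query: Dict[str, str], criteria: Dict[str, str]) -> bool:
    """True if every metadata property named in the filter has exactly that value."""
    return all(query.get(key) == value for key, value in criteria.items())


def apply_exclude(
    queries: List[Dict[str, str]], criteria: Dict[str, str]
) -> List[Dict[str, str]]:
    """Keep only the queries that the exclude block does not match."""
    return [q for q in queries if not matches(q, criteria)]


remaining = apply_exclude(QUERIES, {"id": "js/html-constructed-from-input"})
print([q["id"] for q in remaining])
```

Because the filter can name any metadata property, excluding on a broader property (such as kind in this sample) removes every query that shares that value.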

For more information about how to use query filters, see Configuring code scanning in the code scanning documentation.

It's now easier to debug CodeQL analysis problems in code scanning: click Re-run jobs from the GitHub Actions workflow run page, check the Enable debug logging box, and hit the Re-run jobs button.

The data will be uploaded as an Actions artifact named debug-artifacts, attached to the workflow run. Such artifacts contain CodeQL logs, CodeQL databases, and the SARIF files that were produced.

These artifacts will help you when you're debugging problems with CodeQL code scanning. When contacting GitHub support, you might be asked for this data.

As part of the analysis, CodeQL extracts your source code into a relational database format. The debug artifacts include more detailed information about CodeQL extraction errors and warnings that occurred during database creation. If you want to permanently enable debug logging for the CodeQL analysis, or would like more information about troubleshooting CodeQL, see the troubleshooting guidance in the code scanning documentation.

This feature is now available to all users on GitHub.com and will also be available in GitHub Enterprise Server 3.7.

Code scanning flags up potential security vulnerabilities in pull requests — well before code is merged and deployed. Starting today, such alerts will be more visible: they will appear as a review on the pull request Conversation tab. As with any review, developers can then have a conversation about specific areas of the code that was changed.

And of course, from the code review by the GitHub code scanning bot, you can dive deeper into the alert: view the details, check the data flow paths, and dismiss an alert.

Code scanning and branch protection rules

Users were already able to configure code scanning as a required check in the branch protection settings in a repository.

With the new code scanning functionality, developers can start a conversation about code scanning alerts. Branch protection rules that require all conversations to be resolved before a PR can be merged apply equally to conversations about code scanning alerts: as soon as a code reviewer comments on a code scanning alert, the PR cannot be merged until the conversation is marked as resolved. This helps ensure comments made on alerts are addressed prior to merging.

As you'd expect, when an alert is fixed, the conversation around the alert gets resolved and the PR can be merged.

Learn more about GitHub Advanced Security and code scanning.

Users can now add a comment when dismissing a code scanning alert.

Providing a dismissal comment is optional. Dismissal comments are recorded in the alert timeline. They can also be set via the code scanning REST API when updating an alert, and retrieved through the new dismissed_comment attribute.
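Here is a minimal Python sketch of setting a dismissal comment through the REST API, building the request without sending it. It assumes the alert update route PATCH /repos/{owner}/{repo}/code-scanning/alerts/{alert_number} and the dismissed_reason values false positive, won't fix, and used in tests; the helper name dismiss_alert_request, the repository, and the alert number are hypothetical.

```python
import json
import urllib.request
from typing import Optional

API_ROOT = "https://api.github.com"
# Reasons accepted by the code scanning API when dismissing an alert.
DISMISSED_REASONS = {"false positive", "won't fix", "used in tests"}


def dismiss_alert_request(
    owner: str,
    repo: str,
    alert_number: int,
    reason: str,
    comment: Optional[str] = None,
    token: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a request that dismisses an alert.

    dismiss_alert_request is a hypothetical helper, not part of any library.
    """
    if reason not in DISMISSED_REASONS:
        raise ValueError(f"unsupported dismissed_reason: {reason!r}")
    body = {"state": "dismissed", "dismissed_reason": reason}
    if comment is not None:
        # The comment is optional; when present it appears in the alert timeline.
        body["dismissed_comment"] = comment
    headers = {
        "Accept": "application/vnd.github+json",
        "Content-Type": "application/json",
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    url = f"{API_ROOT}/repos/{owner}/{repo}/code-scanning/alerts/{alert_number}"
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="PATCH"
    )


req = dismiss_alert_request(
    "octo-org", "octo-repo", 42, "false positive", comment="Input is validated upstream."
)
print(req.get_method(), req.full_url)
```

Reading the alert back afterwards should show the same text in its dismissed_comment attribute.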

This feature is now available to all users on GitHub.com and will be released in GHES 3.6.

On March 30, 2022, we released CodeQL Action v2, which runs on the Node.js 16 runtime. The CodeQL Action v1 will be deprecated at the same time as GHES 3.3, which is currently scheduled for December 2022.

How does this affect me?

Users of GitHub.com, GitHub AE, and GitHub Enterprise Server 3.5 (and later)

All users of GitHub code scanning (which by default uses the CodeQL analysis engine) on GitHub Actions on the following platforms should update their workflow files:

  • GitHub.com (including open source repositories, users of GitHub Team and GitHub Enterprise Cloud)
  • GitHub AE
  • GitHub Enterprise Server (GHES) 3.5 and later

Users of the above-mentioned platforms should update their CodeQL workflow file(s) to refer to the new v2 version of the CodeQL Action.

Users of GitHub Enterprise Server 3.4 (and older)

We do not recommend that users of GitHub Enterprise Server 3.4 (and older) update their configuration to use the v2 version of the CodeQL Action:

  • GHES 3.3 (and older) does not support running Actions using the Node 16 runtime and is therefore unable to run the v2 version of the CodeQL Action. Please upgrade to a newer version of GitHub Enterprise Server prior to changing your CodeQL Action workflow files.
  • While GHES 3.4 does support Node 16 Actions, it does not ship with v2 of the CodeQL Action. Users who want to migrate to v2 on GHES 3.4 should request that their system administrator enable GitHub Connect to download v2 onto GHES before updating their workflow files.

The upcoming release of GitHub Enterprise Server 3.5 will ship with v2 of the CodeQL Action included.

Exactly what do I need to change?

To upgrade to the CodeQL Action v2, open your CodeQL workflow file(s) in the .github/workflows directory of your repository and look for references to:

  • github/codeql-action/init@v1
  • github/codeql-action/autobuild@v1
  • github/codeql-action/analyze@v1
  • github/codeql-action/upload-sarif@v1

These entries need to be replaced with their v2 equivalents:

  • github/codeql-action/init@v2
  • github/codeql-action/autobuild@v2
  • github/codeql-action/analyze@v2
  • github/codeql-action/upload-sarif@v2
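If you maintain many workflow files, this replacement can be scripted. The following is a minimal Python sketch, not an official migration tool: it rewrites only the four action references listed above, only when they are pinned to the bare tag v1, and leaves everything else (including other actions) untouched. The name upgrade_codeql_action is hypothetical.

```python
import re

# Matches the four CodeQL Action steps pinned to the bare "v1" tag.
# The negative lookahead leaves refs such as "@v10" or "@v1.2.3" alone.
CODEQL_V1_REF = re.compile(
    r"(github/codeql-action/(?:init|autobuild|analyze|upload-sarif))@v1(?![\w.])"
)


def upgrade_codeql_action(workflow_text: str) -> str:
    """Return the workflow text with CodeQL Action v1 references bumped to v2."""
    return CODEQL_V1_REF.sub(r"\1@v2", workflow_text)


before = """\
steps:
  - uses: actions/checkout@v3
  - uses: github/codeql-action/init@v1
  - uses: github/codeql-action/analyze@v1
"""
print(upgrade_codeql_action(before))
```

Applying the function to each file under .github/workflows and reviewing the resulting diff before committing would cover a whole repository.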

Can I use Dependabot to help me with this upgrade?

Yes, you can! For more details on how to configure Dependabot to automatically upgrade your Actions dependencies, see "Keeping your actions up to date with Dependabot" in the GitHub documentation.

What happens in December 2022?

In December 2022, the CodeQL Action v1 will be officially deprecated (at the same time as the GHES 3.3 deprecation). At that point, no new updates will be made to v1, which means that new CodeQL analysis capabilities will only be available to users of v2. We will keep a close eye on the migration progress across GitHub. If many workflow files still refer to v1 of the CodeQL Action, we might consider scheduling one or more brownout moments later in the year to increase awareness.

The CodeQL runner has been deprecated in favor of the CodeQL CLI. As previously announced, as of March 14th the CodeQL bundle no longer includes the CodeQL runner. This deprecation only affects users who use CodeQL code scanning in 3rd party CI/CD systems; users of GitHub Actions are not affected.

GitHub Enterprise Server (GHES)

The CodeQL runner was shipped as part of GitHub Enterprise Server (GHES) versions up to and including 3.3.x. GitHub Enterprise Server 3.4 and later no longer include the CodeQL runner. We strongly recommend that customers migrate to the CodeQL CLI, which is a feature-complete replacement for the CodeQL runner and has many additional features.

How does this affect me?

If you’re using CodeQL code scanning on GitHub Actions, you are not affected by this change.

If you’ve configured code scanning to run the CodeQL runner inside another CI/CD system, we recommend migrating to the CodeQL CLI as soon as possible.
Starting April 1st, changes to both the CodeQL analysis engine and the code scanning API are not guaranteed to be compatible with older CodeQL runner releases.

What actions should I take?

You should configure your CI/CD system to use the CodeQL CLI before upgrading to GHES 3.4.0. When setting up the CodeQL CLI, we recommend that you test your setup to verify that the CLI is correctly configured to analyze your repository.
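A typical CLI setup replaces the runner with three commands: create a database, analyze it, and upload the SARIF results. The minimal Python sketch below only assembles those commands as argument lists so they can be inspected or handed to subprocess; it assumes the default code scanning suite is addressed as <language>-code-scanning.qls and that upload-results reads its token from the GITHUB_TOKEN environment variable. Compiled languages may also need a build --command, which is omitted here. The function name and all argument values are hypothetical.

```python
from typing import List


def codeql_cli_steps(
    database: str,
    language: str,
    source_root: str,
    sarif: str,
    repository: str,
    ref: str,
    commit: str,
) -> List[List[str]]:
    """Return the three CodeQL CLI invocations as argument lists (hypothetical helper)."""
    return [
        # 1. Extract the source code into a CodeQL database.
        ["codeql", "database", "create", database,
         f"--language={language}", f"--source-root={source_root}"],
        # 2. Run the default code scanning suite and write SARIF output.
        ["codeql", "database", "analyze", database,
         f"{language}-code-scanning.qls",
         "--format=sarif-latest", f"--output={sarif}"],
        # 3. Upload the SARIF results to GitHub code scanning.
        ["codeql", "github", "upload-results",
         f"--repository={repository}", f"--ref={ref}",
         f"--commit={commit}", f"--sarif={sarif}"],
    ]


steps = codeql_cli_steps(
    "codeql-db", "javascript", ".", "results.sarif",
    "octo-org/octo-repo", "refs/heads/main", "0123abcd",
)
for cmd in steps:
    print(" ".join(cmd))
```

In a real pipeline each list would be run in order, for example with subprocess.run(cmd, check=True), so that a failing step stops the job.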

Learn more about migrating from the CodeQL runner to the CodeQL CLI in the code scanning documentation.

The code scanning alert page now shows the analysis origin for an alert. Code scanning alerts can originate from different analysis configurations on a repository. These may be using different tools or targeting different languages or areas of the code. For example, an alert generated using the default CodeQL analysis with GitHub Actions will have a different analysis origin from an alert generated externally and uploaded via the code scanning API. If an alert is generated by multiple analysis origins, the alert may be fixed in one origin but remain open in another.

Code scanning now shows the details of the analysis origin of an alert. If an alert has more than one analysis origin, it is shown in the ‘Affected branches’ sidebar and in the alert timeline. You can hover over the analysis origin icon in the ‘Affected branches’ sidebar to see the alert status in each analysis origin. If an alert only has a single analysis origin, no information about analysis origins is displayed on the alert page.
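The same per-origin view can be reconstructed from API data. This minimal Python sketch assumes alert instances shaped like those returned by the alert instances endpoint (fields ref, analysis_key, category, and state) and treats the pair (analysis_key, category) as the identity of an analysis origin; both that choice and the sample instances are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# An analysis origin, identified here (by assumption) as (analysis_key, category).
Origin = Tuple[str, str]


def status_by_origin(instances: List[Dict[str, str]], ref: str) -> Dict[Origin, str]:
    """Map each analysis origin seen on `ref` to the alert's state in that origin."""
    statuses: Dict[Origin, str] = {}
    for instance in instances:
        if instance["ref"] != ref:
            continue
        origin = (instance["analysis_key"], instance.get("category", ""))
        # If an origin appears more than once, the last instance wins.
        statuses[origin] = instance["state"]
    return statuses


# Hypothetical instances: one alert seen by two analysis configurations.
instances = [
    {"ref": "refs/heads/main", "analysis_key": ".github/workflows/codeql.yml:analyze",
     "category": "/language:javascript", "state": "fixed"},
    {"ref": "refs/heads/main", "analysis_key": "external-upload",
     "category": "nightly-scan", "state": "open"},
]
for origin, state in status_by_origin(instances, "refs/heads/main").items():
    print(origin, state)
```

In this sample the alert is fixed in one origin but still open in the other, which is exactly the case the ‘Affected branches’ sidebar surfaces.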

These improvements will make it easier to understand your alerts, in particular those that have multiple analysis origins. This is especially useful for setups with multiple analysis configurations, such as monorepos.

Read more about code scanning analysis configurations

The code scanning alert page now always shows the alert status and information for the default branch. There is a new ‘Affected branches’ panel in the sidebar to see the status of the alert in other branches. If the alert does not exist in your default branch, the alert page will show the status as ‘In branch’ or ‘In pull request’ for the location where the alert was last seen.

This improvement makes it easier to understand the status of alerts which have been introduced into your code base.

The alert list page is not changed and can be filtered by branch. You can use the code scanning API to retrieve more detailed branch information for alerts.
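Filtering by branch through the API comes down to a query parameter. This minimal Python sketch builds a list-alerts URL, assuming the endpoint GET /repos/{owner}/{repo}/code-scanning/alerts accepts a ref parameter (a full Git ref such as refs/heads/feature-x) and a state parameter; the helper name and repository are hypothetical.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

API_ROOT = "https://api.github.com"


def list_alerts_url(
    owner: str, repo: str, ref: Optional[str] = None, state: Optional[str] = None
) -> str:
    """Build a list-alerts URL, optionally filtered by Git ref and alert state."""
    params: Dict[str, str] = {}
    if ref is not None:
        params["ref"] = ref
    if state is not None:
        params["state"] = state
    # urlencode percent-encodes the slashes in the ref, e.g. refs%2Fheads%2F...
    query = f"?{urlencode(params)}" if params else ""
    return f"{API_ROOT}/repos/{owner}/{repo}/code-scanning/alerts{query}"


print(list_alerts_url("octo-org", "octo-repo", ref="refs/heads/feature-x", state="open"))
```

Leaving ref unset returns alerts for the default branch, mirroring what the alert page now shows by default.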

Read more about alert details.

GitHub code scanning helps open source maintainers and organizations find potential vulnerabilities in their code, before these can make their way into deployments. CodeQL, our very own analysis engine, powers the majority of those checks. Over the past few months, we have been working hard to improve the depth and breadth of our analysis to cover more CWEs, add support for a host of new language versions, and improve our platform compatibility.

Before we dive into the details: If you haven’t tried GitHub code scanning with CodeQL yet, you can enable it now on your repositories by following this guide! It’s free for open-source projects and available as part of GitHub Advanced Security for our enterprise customers.

All improvements below are available to users of GitHub code scanning on GitHub.com today, and will be part of the next GitHub Enterprise Server release (GHES version 3.5). Users of other GHES versions can also update their CodeQL version to benefit from these analysis improvements straight away.

Language Support

Today, CodeQL already supports JavaScript/TypeScript, Python, Ruby, Java, C#, Go, and C/C++. These languages are themselves under constant development, and we now support the following language versions:

  • C# 10 / .NET 6,
  • Python 3.10,
  • Java 17, and
  • TypeScript 4.5

The standard language features in those language releases are now fully supported by CodeQL.

Performance and Compatibility

For our Linux users, we have fixed an issue that caused the CodeQL CLI to be incompatible with systems running glibc version 2.34 and newer.

For users of the CodeQL Apple Silicon support (beta), we are now bundling a native Java runtime for improved performance. Rosetta 2 and macOS Developer Tools are still required for other CodeQL components.

Security Coverage

The Common Weakness Enumeration (CWE) system is an industry-standard way of cataloging insecure software development patterns. CodeQL runs hundreds of queries out of the box that are able to detect an even greater number of CWEs. We went back through our existing queries, and aligned dozens of them with updated CWE IDs to give users better insight into the potential impact of a security issue when an alert is flagged up by code scanning.

We’ve added and improved detection for a large number of CWEs. These are the most significant changes:

  • CWE-190 – Integer Overflow: The cpp/uncontrolled-arithmetic query for C/C++ detects potential user-controlled inputs to calculations that could produce an overflow condition
  • CWE-319 – Cleartext Transmission of Sensitive Data: The cpp/cleartext-transmission query for C/C++ detects network transmissions of sensitive data without encryption
  • CWE-120 – Buffer Overflow: The cpp/very-likely-overrunning-write query for C/C++ now detects cases of out-of-bounds writes based on advanced range analysis
  • CWE-732 – Incorrect Permission Assignment for Critical Resource: The cpp/open-call-with-mode-argument query (and the optional cpp/world-writable-file-creation query) for C/C++ detect issues that could lead to stack memory disclosure or attacker-writable files
  • CWE-295 – Improper Certificate Validation: The java/insecure-trustmanager query for Java now detects missing or lax certificate handling that could lead to man-in-the-middle attacks
  • CWE-829 – Inclusion of Functionality from Untrusted Control Sphere: The js/insecure-dependency query for JavaScript/TypeScript detects dependency downloads over unencrypted communication channels
  • CWE-347 – Improper Verification of Cryptographic Signature: The js/jwt-missing-verification query for JavaScript/TypeScript detects scenarios in which a JWT payload is not verified with a cryptographic secret or public key
  • CWE-918 – Server-Side Request Forgery: SSRF detection queries for Python have been improved, and now differentiate between partially and fully (py/full-ssrf) user-controlled URLs

Behind the scenes, we’re also working on support for mobile application security, with additional support for Kotlin and Swift on our roadmap. In the meantime, we’ve also added more coverage for mobile security issues for our existing Java support:

GitHub code scanning supports a wide variety of code analysis engines through GitHub Actions workflows — including our own CodeQL engine. Users can now discover and configure Actions workflow templates for partner integrations straight from their repository's "Actions" tab under a category called "Security". Workflows are recommended based on the repository's content: we will suggest analysis engines that are compatible with the source code in your repository.

Code scanning and our own CodeQL analysis engine are freely available for public repositories. Analysis engines and services provided by partners might require a subscription. You can also configure code scanning for organization-owned private repositories where GitHub Advanced Security is enabled.

Learn more about code scanning workflows on GitHub Actions.

Users can now retrieve all their code scanning alerts at the GitHub organization level via the REST API. This new API endpoint supplements the existing repository level endpoint.

This API is available on GitHub.com starting today and will also be available to GitHub Enterprise Server users starting with version 3.5.
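Organization-wide results can span many pages. This minimal Python sketch builds the first-page URL for GET /orgs/{org}/code-scanning/alerts and extracts the next page from the standard GitHub Link response header; the per_page value of 100, the helper names, and the organization name are assumptions for illustration.

```python
import re
from typing import Optional
from urllib.parse import urlencode

API_ROOT = "https://api.github.com"
# Matches the URL inside a `<...>; rel="next"` entry of a Link header.
NEXT_LINK = re.compile(r'<([^>]+)>;\s*rel="next"')


def org_alerts_url(org: str, state: str = "open", per_page: int = 100) -> str:
    """Build the first-page URL for an organization's code scanning alerts."""
    query = urlencode({"state": state, "per_page": per_page})
    return f"{API_ROOT}/orgs/{org}/code-scanning/alerts?{query}"


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Return the rel="next" URL from a Link header, or None on the last page."""
    if not link_header:
        return None
    match = NEXT_LINK.search(link_header)
    return match.group(1) if match else None


print(org_alerts_url("octo-org"))
```

A client would request the first URL, then keep following next_page_url(response Link header) until it returns None.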

Learn more about the code scanning REST API
Learn more about GitHub Advanced Security

We have released improvements to the code scanning API:

  • We've added the fixed_at timestamp to alerts. It records the first time that the alert was not detected in an analysis. You can use this data to better understand when code scanning alerts are being fixed.
  • We've enabled sorting of alert results using sort and direction on either created, updated or number. Use this to see the alerts that are most important to you first. For more information, see List code scanning alerts for a repository.
  • We've added a Last-Modified header to the responses of the list-alerts and get-alert endpoints. For more information, see Last-Modified in the Mozilla documentation.
  • We've added relatedLocations to the SARIF response when you request a code scanning analysis. The field may contain locations which are not the primary location of the alert. See an example in the SARIF spec and read about getting a code scanning analysis for a repository.
  • We've added help and tags data to the webhook response alert rule object. For more information, see Code scanning alert webhooks events and payloads.
  • PATs with the public_repo scope now have write access to code scanning endpoints on public repos, if the user has permission. This is a bug fix and is now in line with the documentation.
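As one way to use the new fixed_at field, this minimal Python sketch computes how long an alert stayed open. It assumes alerts carry created_at and fixed_at as UTC timestamps in the usual YYYY-MM-DDTHH:MM:SSZ form, with fixed_at empty for alerts that are not fixed; the helper name and sample alert are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional

# Assumed timestamp format of the REST API, e.g. "2022-01-03T12:00:00Z".
TIMESTAMP = "%Y-%m-%dT%H:%M:%SZ"


def time_to_fix(alert: Dict[str, Optional[str]]) -> Optional[timedelta]:
    """Return fixed_at - created_at, or None if the alert has not been fixed."""
    fixed_at = alert.get("fixed_at")
    if not fixed_at:
        return None
    created = datetime.strptime(alert["created_at"], TIMESTAMP)
    return datetime.strptime(fixed_at, TIMESTAMP) - created


# Hypothetical alert, trimmed to the two fields the helper reads.
alert = {"created_at": "2022-01-01T00:00:00Z", "fixed_at": "2022-01-03T12:00:00Z"}
print(time_to_fix(alert))
```

Aggregating this value over many alerts (for example, the median per repository) gives a simple view of how quickly code scanning findings are being fixed.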

For more information, see Code scanning in the REST API reference.

We’ve improved the depth of CodeQL's Python analysis by adding support for more libraries and frameworks, including:

  • FastAPI
  • aiomysql
  • aiopg
  • asyncpg
  • Django REST framework
  • The os.path module
  • Flask-Admin
  • toml
  • ruamel.yaml
  • SQLAlchemy

As a result, CodeQL can now detect even more potential sources of untrusted user data, steps through which that data flows, and potentially dangerous sinks in which this data could end up. This results in an overall improvement of the quality of the code scanning alerts.

We carefully choose and prioritize the libraries and frameworks supported by CodeQL based on their popularity and through user feedback. These improvements are now available to users of CodeQL code scanning on GitHub.com, and will also be available in the next release of GitHub Enterprise Server (3.4).

The latest release of the CodeQL CLI supports including markdown-rendered query help in SARIF files so that the help text can be viewed in the code scanning UI. This functionality is now available for code scanning on GitHub.com and will be available in GitHub Enterprise Server 3.4.

The CodeQL query help text is displayed in the code scanning UI whenever the query generates an alert. The query help explains the problem in more detail, and shows examples of vulnerable and fixed code. Until now, code scanning only displayed the query help for alerts generated by the default CodeQL queries. With the release of CodeQL CLI 2.7.1, the query help for your own custom queries will be uploaded to GitHub and displayed in code scanning.

Writing query help for custom CodeQL queries

When you write your own queries, we recommend that you write a query help file so that other users can properly understand the impact an alert has on the security of their code. For custom query help in your repository there are no restrictions on the content, but we recommend that you follow the Query help style guide to make the help text as useful as possible.

You should write query help for custom queries in your repository in a markdown file alongside the corresponding query. CodeQL code scanning looks for query help files written in markdown that share the same name as the corresponding query file. For example, if your query file is MyCustomQuery.ql, the query help file should be named MyCustomQuery.md.
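The naming rule can be checked mechanically. This minimal Python sketch maps a .ql file to its expected markdown help file and lists queries under a directory that have no help file yet; the function names are hypothetical, and the check covers only the same-name .md convention described above.

```python
from pathlib import Path
from typing import List, Union


def query_help_path(query_file: Union[str, Path]) -> Path:
    """Return the markdown help file expected next to a CodeQL query file."""
    path = Path(query_file)
    if path.suffix != ".ql":
        raise ValueError(f"not a CodeQL query file: {path}")
    return path.with_suffix(".md")


def queries_missing_help(root: Union[str, Path]) -> List[Path]:
    """List every .ql file under `root` that has no same-named .md file."""
    return sorted(q for q in Path(root).rglob("*.ql") if not query_help_path(q).exists())


print(query_help_path("queries/MyCustomQuery.ql"))
```

Running queries_missing_help over your custom query directory before pushing is one way to make sure every custom alert ships with its help text.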

For users of 3rd party CI/CD systems

When using CodeQL with GitHub Actions, the query help will automatically be imported from markdown files that are stored alongside the corresponding custom queries. The query help is inserted into SARIF files generated during the analysis step and made available in the code scanning UI.

If you use a different CI/CD system, you have to add the --sarif-add-query-help flag to the codeql database analyze command to include the query help in your SARIF results files. For more information, see Analyzing databases with the CodeQL CLI.
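For a non-Actions pipeline, this minimal Python sketch assembles the analyze command with the --sarif-add-query-help flag as an argument list for subprocess. The --format=sarif-latest choice, the helper name, and the database, query, and output paths are assumptions for illustration.

```python
from typing import List, Sequence


def analyze_with_query_help(
    database: str, queries: Sequence[str], output: str
) -> List[str]:
    """Build a `codeql database analyze` command that embeds query help in SARIF."""
    return [
        "codeql", "database", "analyze", database,
        *queries,
        "--format=sarif-latest",
        f"--output={output}",
        # Include the markdown query help in the SARIF results file.
        "--sarif-add-query-help",
    ]


cmd = analyze_with_query_help("codeql-db", ["queries/MyCustomQuery.ql"], "results.sarif")
print(" ".join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would run the analysis; uploading the resulting SARIF file then makes the help text visible in the code scanning UI.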

It's now easier to debug problems with CodeQL code scanning: an optional flag in the Actions workflow file will trigger diagnostic data to be uploaded as an artifact to your Actions run. To do this, you can modify the init step of your Actions workflow:

- name: Initialize CodeQL
  uses: github/codeql-action/init@v1
  with:
    debug: true

The data will be uploaded as an Actions artifact named debug-artifacts, attached to the workflow run. Such artifacts contain CodeQL logs, CodeQL databases, and the SARIF files that were produced.

These artifacts will help you when you're debugging problems with CodeQL code scanning. When you contact GitHub Support, you might also be asked for this data.

Learn more about Troubleshooting the CodeQL workflow.

We’ve improved the depth of CodeQL's analysis by adding support for more libraries and frameworks and increasing the coverage of our existing library and framework models. JavaScript analysis now supports most common templating languages, and Java now covers more than three times the endpoints of previous CodeQL versions. As a result, CodeQL can now detect even more potential sources of untrusted user data, steps through which that data flows, and potentially dangerous sinks in which this data could end up. This results in an overall improvement of the quality of the code scanning alerts.

We carefully choose and prioritize the libraries and frameworks supported by CodeQL based on their popularity and through user feedback. These improvements are now available to users of CodeQL code scanning on GitHub.com, and will also be available in the next release of GitHub Enterprise Server (3.3).

Java

We've improved coverage for the following libraries:

JavaScript

We've added support for the following templating languages:

Learn more about CodeQL and code scanning.

Developers and security researchers using the CodeQL CLI and VS Code extension can now build databases and analyze code on machines powered by Apple Silicon (e.g. Apple M1).

In order to use the CodeQL CLI and/or the VS Code extension on Apple Silicon, please make sure to install the Xcode command-line developer tools and Rosetta 2.

For detailed instructions on how to set up the CLI on supported platforms, please refer to the CodeQL CLI guide.

Learn more about CodeQL and code scanning.
