Responsible AI pair programming with GitHub Copilot
GitHub Copilot boosts developer productivity, but using it responsibly still requires good developer and DevSecOps practices.
GitHub Copilot is like something out of a sci-fi movie: an AI pair programmer that seems capable of reading your mind as you code. GitHub Copilot uses OpenAI Codex, a model trained on billions of lines of public code, to suggest code and even entire functions in real time in a developer’s integrated development environment (IDE). Using GitHub Copilot boosts developer productivity, but it is not a silver bullet, nor a replacement for good coding practices and DevSecOps processes. In this post, we look at how to think about GitHub Copilot so that you can use it effectively and responsibly.
AI-paired programming—the next frontier
Computers do not understand programming languages; they execute low-level machine instructions. However, most developers today do not write machine code or assembly. They use higher-level abstractions that make programming easier and far more productive. AI models that assist developers are well positioned to become just as ubiquitous, much as third- and fourth-generation programming languages (3GLs and 4GLs) have all but superseded low-level programming, especially for business applications.
GitHub Copilot is not the only AI-powered tool in the news. You may have heard of ChatGPT, a model created by OpenAI that can answer natural language questions, help with tasks like drafting emails, and even write code from prompts. OpenAI also created DALL·E 2, which can generate realistic images and art from natural language descriptions.
Why are these AI-based tools becoming so popular? One reason is that they abstract away the details of how to accomplish a task, which frees users to think more clearly about what they are trying to achieve.
Theory building and cognitive load
In Programming as Theory Building (1985), Peter Naur writes that “The building of the program is the same as the building of the theory of it by the team of programmers.” When developers program, they are implementing a theory (or model) that solves a business problem. The value of programming lies not in the programming itself, but in the problem it solves.
Cognitive load is the amount of working memory needed to complete a task. When programming, there are broadly three types of cognitive load: intrinsic (knowing how to program), extraneous (knowing how to construct an array, add items to a database, or call an API), and germane (knowing how to solve the business problem). Intrinsic load is unavoidable. However, developers who can reduce extraneous load and devote more of their working memory to germane load are more productive.
Imagine a developer designing an e-commerce website. They need to model the product catalog, orders, the shopping basket, shipment tracking, and other aspects of the business. Consider just the product catalog for a moment. Where do you want the developer to focus their time and energy: on how to present the right products to the right customers, or on how to query a database for a list of products? When the developer fills their “working memory” with the mechanics of querying the database and passing objects from the backend to the frontend, they aren’t really thinking about the business problem. They could instead be thinking about which subset of products to fetch first, or how to position product images for maximum visibility to increase sales. These are the germane problems the developer is trying to solve.
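To make the distinction concrete, here is a minimal Python sketch that separates the two kinds of work. The `products` table, the `Product` fields, and the featuring rule (in-stock items with the highest margin first) are hypothetical assumptions for illustration. `fetch_products` is the extraneous plumbing that an assistant like GitHub Copilot can help write. `products_to_feature` encodes the kind of business decision that deserves the developer’s attention.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Product:
    sku: str
    name: str
    stock: int
    margin: float


def fetch_products(conn: sqlite3.Connection) -> list[Product]:
    """Extraneous work: query the database and map rows to objects."""
    rows = conn.execute("SELECT sku, name, stock, margin FROM products").fetchall()
    return [Product(*row) for row in rows]


def products_to_feature(products: list[Product], limit: int = 3) -> list[Product]:
    """Germane work: a hypothetical business rule for what to show first.

    Only in-stock items are featured, highest margin first.
    """
    in_stock = [p for p in products if p.stock > 0]
    return sorted(in_stock, key=lambda p: p.margin, reverse=True)[:limit]


if __name__ == "__main__":
    # In-memory database with hypothetical sample data.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (sku TEXT, name TEXT, stock INTEGER, margin REAL)")
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?, ?)",
        [("A1", "Kettle", 5, 0.30), ("B2", "Toaster", 0, 0.50), ("C3", "Mug", 40, 0.45)],
    )
    featured = products_to_feature(fetch_products(conn))
    print([p.name for p in featured])  # ['Mug', 'Kettle']
```

The toaster has the highest margin but no stock, so the business rule leaves it out. That decision is the part no code generator can make on the developer’s behalf.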
GitHub Copilot is meant to keep the developer focused on the business problem (the theory of the program) rather than on the mechanics of programming.
Context switching
According to the 2022 Stack Overflow Developer Survey, 63% of developers surveyed spend more than 30 minutes a day searching for answers or solutions to problems (extraneous cognitive load). For a team of 50 developers, Stack Overflow estimates that this adds up to between 300 and 650 hours of lost time per week across the team.
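Dividing the team-level estimate by headcount expresses it in per-developer terms. This calculation assumes a five-day working week, which is our assumption and not part of the survey:

```latex
\begin{aligned}
\frac{300\ \text{h/week}}{50\ \text{developers}} &= 6\ \text{h/week per developer}, &
\frac{6\ \text{h/week}}{5\ \text{days/week}} &= 1.2\ \text{h/day} \\
\frac{650\ \text{h/week}}{50\ \text{developers}} &= 13\ \text{h/week per developer}, &
\frac{13\ \text{h/week}}{5\ \text{days/week}} &= 2.6\ \text{h/day}
\end{aligned}
```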
Context switching is a well-documented impediment to focused, productive work. Leaving the IDE to search the internet for how to implement some boilerplate code breaks a developer’s flow, and the less time developers spend in flow, the less productive they are. In our Research: quantifying GitHub Copilot’s impact on developer productivity and happiness report, 88% of developers using GitHub Copilot said they felt more productive, and 73% said it helped them stay in the flow. We also recruited 95 developers to write a web server in JavaScript and gave 45 of them GitHub Copilot. Those 45 developers completed the task 55% faster than the group without GitHub Copilot.
GitHub Copilot to the rescue
With these challenges in mind, it is easy to see why GitHub Copilot is so valuable. GitHub Copilot helps developers spend more time thinking about the theory they are building and less time on the code itself. In terms of cognitive load, GitHub Copilot reduces the extraneous demands on working memory so that developers can focus on the germane business problem. Developers no longer have to keep switching out of the IDE to find solutions and answers. Instead, GitHub Copilot synthesizes suggestions in context as they work, which keeps them in their flow.
But how do we know that GitHub Copilot’s solutions are correct and secure? In short, we don’t. Like any other website or code resource developers lean on, it’s meant to serve as a copilot, not an autopilot. While GitHub Copilot is designed to boost productivity, it is not meant to replace developers, nor is it meant to replace good practices and processes for scanning, testing, and validating code.
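The hypothetical pair of Python functions below shows why suggestions still need scrutiny. They are illustrative only, not actual GitHub Copilot output. Both look reasonable and return the right row for ordinary input. However, the first builds SQL by string interpolation and is open to SQL injection, while the second uses a parameterized query. The table and function names are assumptions made for this sketch.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, username: str):
    # Plausible-looking but insecure: user input is spliced into the SQL text,
    # so input such as "' OR '1'='1" changes the meaning of the query.
    query = f"SELECT id, username FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, username: str):
    # Parameterized query: the driver passes username as data, never as SQL.
    return conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])

    payload = "' OR '1'='1"
    print(len(find_user_unsafe(conn, payload)))  # 2: every row leaks
    print(len(find_user_safe(conn, payload)))    # 0: no user has that literal name
```

Scanning and review practices exist to catch exactly this kind of flaw. SAST tools such as CodeQL, for example, include queries that look for SQL injection.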
GitHub Copilot in the inner loop
When discussing DevSecOps, teams often refer to the inner and outer loops. The inner loop is the part of development that takes place in the developer’s own environment. There, developers write, run, and debug code locally, either on their own development machine or in a Codespace. The inner loop also includes peer code review, typically through a pull request. In a DevSecOps environment, security tools such as GitHub Advanced Security can be embedded into the inner loop through pull request security reviews and preventive security checks. These checks stop vulnerabilities early while keeping development velocity high. The outer loop is where Continuous Integration/Continuous Delivery (CI/CD) occurs: an automation engine like GitHub Actions builds, scans, tests, and deploys the code.
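A preventive check does not have to wait for CI. The sketch below is a minimal, hypothetical pre-commit-style secret scanner in Python. It flags lines matching two well-known credential formats (AWS access key IDs and classic GitHub personal access tokens) and returns a nonzero status so that a hook can block the commit. The pattern list and the `find_secrets` and `main` names are assumptions for illustration. GitHub Advanced Security’s secret scanning is a separate and far more comprehensive service.

```python
import re
import sys

# Illustrative patterns for two well-known credential formats. Real secret
# scanners use many more patterns plus validation to reduce false positives.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "GitHub personal access token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) pairs for each suspected secret."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


def main(paths: list[str]) -> int:
    """Scan each file; return 1 (block the commit) if anything was found."""
    status = 0
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            for lineno, name in find_secrets(f.read()):
                print(f"{path}:{lineno}: possible {name}")
                status = 1
    return status


# Scan only when file paths are passed in, e.g. staged files from a pre-commit hook.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```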
In the inner loop (the development environment), GitHub Copilot assists developers by synthesizing code snippets based on the context they are working in, which, as discussed above, keeps them in the flow. So how do developers know whether GitHub Copilot is suggesting good or bad code? To start, developers using GitHub Copilot should have a working understanding of the language they are coding in, so that they can judge whether a suggestion is valid. From there, the code should still be run and tested locally. And of course, code reviews should not be skipped!
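Here is a hypothetical scenario. GitHub Copilot suggests slicing `items[page * page_size:(page + 1) * page_size]` for a function whose pages are numbered from 1. The suggestion looks plausible, but it silently skips the first page. A few quick tests, run locally before opening a pull request, catch the mistake. The sketch below shows the corrected function and those tests, written as plain `test_*` functions that pytest can also discover. The `paginate` function and its 1-based convention are assumptions for illustration.

```python
def paginate(items: list, page: int, page_size: int) -> list:
    """Return the items on a 1-based page (page 1 is the first page)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    # Subtract 1 because pages are numbered from 1, not 0.
    start = (page - 1) * page_size
    return items[start:start + page_size]


def test_first_page_starts_at_first_item():
    assert paginate(list(range(10)), page=1, page_size=3) == [0, 1, 2]


def test_last_page_may_be_partial():
    assert paginate(list(range(10)), page=4, page_size=3) == [9]


def test_rejects_page_zero():
    try:
        paginate([1, 2, 3], page=0, page_size=2)
    except ValueError:
        return
    raise AssertionError("expected ValueError for page=0")


if __name__ == "__main__":
    test_first_page_starts_at_first_item()
    test_last_page_may_be_partial()
    test_rejects_page_zero()
    print("all pagination tests passed")
```

Against the suggested `items[page * page_size:(page + 1) * page_size]` version, the first test fails immediately because page 1 returns `[3, 4, 5]`.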
The outer loop
The outer loop should apply to all code, whether or not GitHub Copilot helped create it. Organizations should validate code synthesized by GitHub Copilot with the same practices and processes they already use for code written without it. These practices include automated linting, unit testing, static application security testing (SAST), and software composition analysis (SCA). SAST and SCA can be handled by GitHub Advanced Security, and all of these checks can be automated with GitHub Actions. Ideally, teams should also perform functional, integration, load, and penetration testing. GitHub Copilot does not remove the need for any of these practices.
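To make software composition analysis less abstract, here is a deliberately tiny sketch of the idea. It compares pinned dependencies against known advisories and reports any pinned version that predates a fix. The advisory entries and package names are hypothetical, and the version parser handles only plain numeric versions. Real SCA tooling, such as Dependabot alerts backed by the GitHub Advisory Database, also handles version ranges, multiple ecosystems, and transitive dependencies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    package: str
    fixed_in: tuple[int, ...]  # first patched version
    summary: str


# Hypothetical advisory data, for illustration only.
ADVISORIES = [
    Advisory("examplelib", (2, 4, 1), "hypothetical path traversal"),
    Advisory("otherlib", (1, 0, 0), "hypothetical unsafe deserialization"),
]


def parse_version(text: str) -> tuple[int, ...]:
    """Parse a purely numeric version such as '2.3.0' (no pre-release tags)."""
    return tuple(int(part) for part in text.split("."))


def audit(requirements: str) -> list[str]:
    """Flag pinned 'name==version' lines whose version predates a fix."""
    findings = []
    for raw in requirements.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # unpinned entries are out of scope for this sketch
        name, version = (part.strip() for part in line.split("==", 1))
        for adv in ADVISORIES:
            if adv.package == name.lower() and parse_version(version) < adv.fixed_in:
                fixed = ".".join(map(str, adv.fixed_in))
                findings.append(f"{name} {version}: {adv.summary} (fixed in {fixed})")
    return findings


if __name__ == "__main__":
    reqs = "examplelib==2.3.0\notherlib==1.2.0  # already patched\nrequests\n"
    for finding in audit(reqs):
        print(finding)
```

In a CI job, a nonzero exit status whenever `audit` returns findings would block the merge. The check applies the same way whether a person or GitHub Copilot added the dependency.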
Conclusion
GitHub Copilot is a fantastic productivity booster that helps developers stay in the flow, but it is not a replacement for good development and DevSecOps practices. As AI-powered tools gain popularity, enterprises should ensure that their teams have these practices in place so that they can adopt such tools with confidence.