Discover the latest trends and insights on public software development activity on GitHub with the release of Q3 2023 data for the Innovation Graph.
2021 Transparency Report
In GitHub's latest transparency report, we’re giving you a by-the-numbers look at how we responded to requests for user info and content removal.
At GitHub, we put developers first and work hard to provide a safe, open, and inclusive platform for code collaboration. This means we are committed to minimizing the disruption of software projects, protecting developer privacy, taking action on abusive content, and being transparent with developers about content moderation and disclosure of user information. This kind of transparency is vital because of the potential impacts to people’s privacy, access to information, and ability to dispute decisions that affect their content. With that in mind, we’ve published transparency reports going back seven years to inform the developer community about GitHub’s content moderation and disclosure of user information.
A key development in the transparency reporting space was the 2021 update of the Santa Clara Principles on Transparency and Accountability in Content Moderation. GitHub was honored to contribute to the updated principles, and we’ve made significant changes to our reporting forms, clarifications to our policies, and additions to our reporting in several categories as we work to ensure that our transparency reporting continues to reflect the spirit of the Santa Clara Principles. We also continue to follow the guidelines related to promoting transparency and taking the most narrow approach possible when restricting content, set forth in the United Nations report on content moderation.
We promote transparency by:
- Developing our policies in public by open sourcing them so that our users can provide input and track changes
- Explaining our reasons for making policy decisions
- Notifying users when we need to restrict content, along with our reasons, whenever possible
- Allowing users to appeal removal of their content
- Publicly posting all Digital Millennium Copyright Act (DMCA) and government takedown requests we process in a public repository in real time, and sending DMCA notices to the Lumen database
We limit content removal, in line with lawful limitations, as much as possible by:
- Aligning our Acceptable Use Policies with restrictions on free expression, for example, on hate speech, under international human rights law
- Providing users an opportunity to remediate or remove specific content rather than blocking entire repositories, when we see that is possible
- Restricting access to content only in those jurisdictions where it is illegal (geoblocking), rather than removing it for all users worldwide
- Before removing content based on alleged circumvention of copyright controls (under Section 1201 of the US DMCA or similar laws in other countries), we carefully review both the legal and technical claims, and we sponsor a Developer Defense Fund to provide developers with meaningful access to legal resources.
Focus areas for this year’s reporting, including what’s new
In this year’s numbers-based reporting, which includes our January-June 2021 report, GitHub continued to focus on areas of strong interest from developers and the general public, such as requests we receive from governments—whether for information about our users or to take down content posted by our users—and copyright-related takedowns. Copyright-related takedowns (which we often refer to as DMCA takedowns) are particularly relevant to GitHub because so much of our users’ content is software code and can be eligible for copyright protection. That said, only a tiny fraction of content on GitHub is the subject of a DMCA notice (under one in 10 thousand repositories).
This year, we’re increasing transparency by adding reporting on automated detection for some of the most egregious categories of Terms of Service violations: child sexual exploitation and abuse imagery, as well as terrorist and violent extremist content.
GitHub’s Guidelines for Legal Requests of User Data explain how we handle legally authorized requests, including law enforcement requests, subpoenas, court orders, and search warrants, as well as national security letters and orders.
We follow the law and also require adherence to the highest legal standards for user requests for data. Some kinds of legally authorized requests for user data, typically limited in scope, do not require review by a judge or a magistrate. For example, both subpoenas and national security letters are written orders to compel someone to produce documents or testify on a particular subject, and neither requires judicial review. National security letters are further limited in that they can only be used for matters of national security.
By contrast, search warrants and court orders both require judicial review. A national security order is a type of court order that can be put in place, for example, to produce information or authorize surveillance. National security orders are issued by the Foreign Intelligence Surveillance Court, a specialized US court for national security matters.
As we note in our guidelines:
- We only release information to third parties when the appropriate legal requirements have been satisfied, or where we believe it’s necessary to comply with our legal requirements or to prevent an emergency involving danger of death or serious physical injury to a person.
- We require a subpoena to disclose certain kinds of user information, like a name, an email address, or an IP address associated with an account, unless under rare, exigent circumstances. Exigent circumstances refers to cases where we determine that disclosure is necessary to prevent an emergency involving danger of death or serious physical injury to a person, and we keep disclosure as limited as possible in relation to the emergency.
- We require a court order or search warrant for all other kinds of user information, like user access logs or the contents of a private repository.
- We notify all affected users about any requests for their account information, except where we are prohibited from doing so by law or court order.
In 2021, GitHub received and processed 335 requests to disclose user information, as compared to 303 in 2020. Of those 335 requests, 195 were subpoenas (183 criminal and 12 civil), 94 were court orders, and 22 were search warrants. These requests also include six requests made under exigent circumstances, and 18 cross-border data requests, which we’ll share more about later in this report. These numbers represent every request we received for user information, regardless of whether we disclosed information or not, with one exception: we are prohibited from even stating whether or how many national security letters or orders we received. More information on that below. We’ll cover additional information about disclosure and notification in the next sections.
The large majority (96.4%) of these requests came from law enforcement. The remaining 3.6% were civil requests, all of which came from civil litigants wanting information about another party.
We carefully review all requests to disclose user data to ensure they adhere to our policies and satisfy all appropriate legal requirements, and we push back where they do not. As a result, we didn’t disclose user information in response to every request we received. In some cases, the request was not specific enough, and the requesting party withdrew the request after we asked for clarification. In other cases, we received very broad requests, and we were able to limit the scope of the information we provided.
When we do disclose information, we never share private content data, except in response to a search warrant. Content data includes, for example, content hosted in private repositories. With all other requests, we only share non-content data, which includes basic account information, such as username and email address, metadata (such as information about account usage or permissions), and log data regarding account activity or access history.
Of the 335 requests we received in 2021, we disclosed information in response to 269 of those. We disclosed information in response to 178 subpoenas (170 criminal and 8 civil), 64 court orders, 22 search warrants, and five requests made under exigent circumstances.
Those 269 disclosures affected 1,671 accounts.
We notify users when we disclose their information in response to a legal request, unless a law or court order prevents us from doing so. In many cases, legal requests are accompanied by a court order that prevents us from notifying users, commonly referred to as a gag order.
Of the 269 times we disclosed information in 2021, we were only able to notify users nine times. Gag orders prevented us from notifying users in 255 of the other requests. The other five were processed under exigent circumstances, where we delay notification if we determine it is necessary to prevent death or serious harm or due to an ongoing investigation.
While the number of requests with gag orders continues to be a rising trend as a percentage of overall requests, it correlates with the number of criminal requests we processed. Legal requests in criminal matters often come with a gag order, since law enforcement authorities often assert that notification would interfere with the investigation. On the other hand, civil matters are typically public record, and the target of the legal process is often a party to the litigation, obviating the need for any secrecy. None of the civil requests we processed this year came with a gag order, which means we notified each of the affected users.
In 2021, we continue to see a correlation between civil requests we processed (3.0%) and our ability to notify users (3.3%). Our data from past years also reflects this trend of notification percentages correlating with the percentage of civil requests:
- 3.3% notified and 3.0% civil requests in 2020
- 3.7% notified and 3.1% civil requests in 2019
- 9.1% notified and 11.6% civil requests in 2018
- 18.6% notified and 23.5% civil requests in 2017
- 20.6% notified and 8.8% civil requests in 2016
- 41.7% notified and 41.7% civil requests in 2015
- 40% notified and 43% civil requests in 2014
We’re very limited in what we can legally disclose about national security letters and Foreign Intelligence Surveillance Act (FISA) orders. The US Department of Justice (DOJ) has issued guidelines that only allow us to report information about these types of requests in ranges of 250, starting with zero. As shown below, we received 0–249 notices in 2021, affecting 0–249 accounts.
Governments outside the US can make cross-border data requests for user information through the DOJ via a mutual legal assistance treaty (MLAT) or similar form of international legal process. Our Guidelines for Legal Requests of User Data explain how we handle user information requests from foreign law enforcement. Essentially, when a foreign government seeks user information from GitHub, we direct the government to the DOJ so that the DOJ can determine whether the request complies with US legal protections.
If it does, the DOJ would send us a subpoena, court order, or search warrant, which we would then process like any other requests we receive from the US government. When we receive these requests from the DOJ, they don’t necessarily come with enough context for us to know whether they’re originating from another country. However, when a request does indicate this, we capture that information in our statistics for subpoenas, court orders, and search warrants. This year, we know that four of the legal requests we processed originated as cross-border requests.
In 2021, we received eighteen requests directly from foreign governments. Those requests came from five countries: one from Nepal, one from Brazil, one from Japan, three from Germany, and ten from India. This is an increase compared to 2020, when we received eight requests, all from Germany and India. Consistent with our guidelines above, in each of those cases we referred those governments to the DOJ to use the MLAT process.
In the next sections, we describe two main categories of requests we receive to remove or block user content: government takedown requests and DMCA takedown notices.
From time to time, GitHub receives requests from governments to remove content that they judge to be unlawful in their local jurisdiction. When we remove content at the request of a government, we limit it to only the jurisdiction(s) where the content is illegal whenever possible. In addition, we always post the official request that led to the block in a public government takedown repository, creating a public record where people can see that a government asked GitHub to take down content.
When we receive a request, we confirm whether:
- The request came from an official government agency
- An official sent an actual notice identifying the content
- An official specified the source of illegality in that country
If we determine the answer is “yes” to all three, we block the content in the narrowest way we see possible, for example, by geoblocking content only in a local jurisdiction.
In 2021, GitHub received and processed 26 government takedown requests based on local laws—from Russia, China, and Hong Kong. These takedowns resulted in 69 projects (five gists, all or part of 46 repositories, and 18 GitHub Pages sites) being blocked in Russia, China, or Hong Kong. In comparison, in 2020, we processed 44 takedowns that affected 44 projects, all from Russia.
We denied two government requests based on local laws, because the request was incomplete or the content was no longer available at the time of processing. These requests came from Russia. It’s worth noting that in the first half of 2021, GitHub had only processed four government takedowns, though this number steadily increased during the second half of the year. While we processed a lower number of government takedown requests in 2021 as compared to 2020, there were a larger number of affected projects overall.
In addition to requests based on violations of local law, GitHub processed nine requests from governments to take down content as a Terms of Service violation, affecting four accounts, 98 repositories, and four gists in 2021. These requests concerned phishing (US), malware (US), personal information removal (the Netherlands), GitHub Pages violations (UK and Brazil), and copyright, processed under our DMCA takedown policy (US and China).
We denied one government request that alleged a Terms of Service violation, due to lack of evidence that a violation of our Terms had occurred. This request came from the US.
Consistent with our approach to content moderation across the board, GitHub handles Digital Millennium Copyright Act (DMCA) claims to maximize the availability of code by limiting disruption for legitimate projects. Accordingly, we designed our DMCA Takedown Policy to safeguard developer interests against overreaching and ambiguous takedown requests. Most content removal requests we receive are submitted under the DMCA, which allows copyright holders to ask GitHub to take down content they believe infringes on their copyright. If the user who posted the allegedly infringing content believes the takedown was a mistake or misidentification, they can then send a counter notice asking GitHub to reinstate the content.
Additionally, before processing a valid takedown notice that alleges that only part of a repository is infringing, or if we see that’s the case, we give users a chance to address the claims identified in the notice first. We also now do this with all valid notices alleging circumvention of a technical protection measure. That way, if the user removes or remediates the specific content identified in the notice, we avoid having to disable any content at all. This is an important element of our DMCA policy, given how much users rely on each other’s code for their projects.
Each time we receive a valid DMCA takedown notice, we redact personal information, as well as any reported URLs where we were unable to determine there was a violation. We then post the notice to a public DMCA repository. In 2021, we began adding annotations to certain takedown notices to help others understand how the notices were processed. For example, when we give users a chance to address claims before taking action (as we described in the previous paragraph), we note this at the top of the posted notice. We are continuing this practice in 2022, coming up with additional annotations where we think they will improve transparency around our processes or help increase understanding.
In addition to posting the DMCA notices we process in a GitHub repository, from today onward, we are also sending those notices to the Lumen database, an independent research project of the Berkman Klein Center for Internet & Society at Harvard University that collects and analyzes requests to remove material from the web. That means that you’ll be able to search for and find them there, too.
Our DMCA Takedown Policy explains more about the DMCA process, as well as the differences between takedown notices and counter notices. It also sets out the requirements for making a valid request, which include that the person submitting the notice takes into account fair use.
In 2021, GitHub received and processed 1,828 valid DMCA takedown notices. This is the number of separate notices for which we took down content or asked our users to remove content. In addition, we received and processed 38 valid counter notices, seven retractions, one reversal, and three counter notice reversals, for a total of 1,877 notices in 2021. We also received three notices of legal action filed related to DMCA takedown requests this year.
While content can be taken down, it can also be restored. In some cases, we reinstate content that was taken down if we receive one of the following:
- Counter notice: the person whose content was removed sends us sufficient information to allege that the takedown was the result of a mistake or misidentification.
- Retraction: the person who filed the takedown changes their mind and requests to withdraw it.
- Reversal: after receiving a seemingly complete takedown request, GitHub later receives information that invalidates it, and we reverse our original decision to honor the takedown notice.
These definitions of “retraction” and “reversal” each refer to a takedown request. However, the same can happen with respect to a counter notice. In 2021, we processed three counter notice reversals.
In 2021, the total number of takedown notices ranged from 115 to 218 per month. The monthly totals for counter notices, retractions, and reversals combined ranged from one to eight.
Often, a single takedown notice can encompass more than one project. For these instances, we looked at the total number of projects, including repositories, gists, and GitHub Pages sites that we had taken down due to DMCA takedown requests in 2021.
The monthly totals for projects reinstated—based on a counter notice, retraction, or reversal—ranged from negative one to 34. (“Negative one” represents a counter notice that we reversed because it turned out to be invalid.) The number of counter notices, retractions, and reversals we receive ranges from less than one to more than five percent of the DMCA-related notices we get each month. This means that most of the time when we receive a valid takedown notice, the content comes down and stays down. In total in 2021, we took down 19,276 projects and reinstated 85, which means that 19,191 projects stayed down.
The number 19,191 may sound like a lot of projects, but it’s less than .01% of the more than 200 million repositories on GitHub in 2021.
That number also includes many projects that are actually currently up. When a user makes changes in response to a takedown notice, we count that in the “stayed down” number. Because the reported content stayed down, we include it even if the rest of the project is still up. Those projects are in addition to the number reinstated.
Within our DMCA reporting, we also look specifically at takedown notices that allege circumvention of a technical protection measure under section 1201 of the DMCA. GitHub requires additional information for a DMCA takedown notice to be complete and actionable where it alleges circumvention. In 2021, we updated our DMCA Takedown Policy to include a section that specifically addresses circumvention claims and outlines our policy with respect to how we review and process such claims. We also updated our copyright claims form to more readily communicate the additional information needed to report circumvention-related claims. These changes also support our developer-first commitment and enable us to more closely track and report on this data in our transparency reports.
We are able to estimate the number of DMCA notices we processed that include a circumvention claim by searching the takedown notices we processed for relevant keywords. On that basis, we can estimate that of the 1,828 notices we processed in 2021, 92 notices, or 5.0%, related to circumvention. Although takedown notices for circumvention violations have increased in the past few years, they are relatively few, and the proportion of takedown notices related to circumvention has fluctuated between roughly two and five percent of all takedown notices:
- 92 or 5.03% of all notices in 2020
- 49 or 2.78% of all notices in 2019
- 33 or 1.83% of notices in 2018
- 25 or 1.81% of notices in 2017
- 36 or 4.74% of notices in 2016
- 18 or 3.56% of notices in 2015
The previous DMCA numbers related to valid notices we received. We also received a lot of incomplete or insufficient notices regarding copyright infringement. Because these notices do not result in us taking down content, we do not currently keep track of how many incomplete notices we receive, or how often our users are able to work out their issues without sending a takedown notice.
Trends in DMCA data
Based on DMCA data we’ve compiled over the last few years, the number of DMCA notices we received and processed has generally correlated with growth in repositories over the same period of time. In 2021, however, we processed a similar number of DMCA notices as we did in 2018 (1,802 in 2018 vs. 1,828 in 2021), with approximately double the number of repositories (96+ million in 2018 vs. 200+ million in 2021) on the entire GitHub platform. Additionally, in 2021 we took down significantly fewer projects as a result of DMCA notices compared to 2020. We are not able to determine the exact cause of the downtick, however, we suspect that a contributing factor is GitHub’s continued focus on standing up for developers’ rights, including the update to our DMCA review process in late 2020.
A new category we’re reporting on this year is automated detection. We use automated scanning to detect some of the most egregious kinds of abuse on the platform: child sexual exploitation and abuse imagery (CSEAI) and terrorist and violent extremist content (TVEC). We scan based on robust hash matching techniques using the PhotoDNA tool. Our process also involves human review to confirm an image that is initially detected as a hit, and allows users to appeal an automated content moderation decision against them.
In 2021, out of millions of images scanned, we confirmed automated detection of one account with CSEAI, which was reported to the National Center for Missing & Exploited Children (NCMEC). None of the images scanned contained TVEC. While the data shows very little of this content on the platform, we feel it is important to put in resources to detect it to safeguard survivors and the community. It is also important to note this data does not include other staff action taken in response to reports of CSEAI or TVEC. We made four additional CSEAI reports to NCMEC based on these cases.
Reinstatements, including as a result of appeals, are a key component of fairness to our users and respect for their right to a remedy for content removal or account restrictions. Reinstatements can occur when we undo an action we had taken to disable a repository, hide an account, or suspend a user’s access to their account in response to a Terms of Service violation. While sometimes this happens because a user disputes a decision to restrict access to their content (an appeal), in many cases, we reinstate an account after a user removes content that violated our Terms of Service and agrees not to violate them going forward. For the purposes of this report, we looked at reinstatements related to:
- Abuse: violations of our Acceptable Use Policies, except for spam, phishing, and malware
- Trade controls: violations of trade sanctions restrictions
As explained above, GitHub’s Terms of Service include numerous abuse-related restrictions on content and conduct. When we determine a violation of our Terms of Service has occurred, we have a number of enforcement actions we can take. In keeping with our approach of restricting content in the narrowest way possible to address the violation, sometimes we can resolve an issue by disabling one repository (taking down one project) rather than acting on an entire account. Other times, we may need to act at the account level, for example, if the same user is committing the same violation across several repositories.
At the account level, in some cases we will only need to hide a user’s account content—for example, when the violation is based on content being publicly posted—while still giving the user the ability to access their account. In other cases, we will only need to restrict a user’s access to their account—for example, when the violation is based on their interaction with other users—while still giving other users the ability to access their shared content. For a collaborative software development platform like GitHub, we realized we need to provide this option so that other users can still access content that may provide value to their projects.
We reported on restrictions and reinstatements by type of action taken. In 2021, we hid 4,585 accounts and reinstated 341 hidden accounts. We restricted a repository owner’s access to 69 accounts and reinstated it for 23 accounts. For 4,240 accounts, we both hid and restricted the repository owner’s access, lifting both of those restrictions to fully reinstate 28 accounts and lifting one but not the other to partially reinstate eight accounts. As for abuse-related restrictions at the project level, we disabled 2,257 projects and reinstated 55 in 2021. These do not count DMCA related takedowns or reinstatements (for example due to counter notices), which are reported on in the DMCA section, above).
We’re dedicated to empowering as many developers around the world as possible to collaborate on GitHub. The US government has imposed sanctions on several countries and regions (Crimea, Cuba, Iran, North Korea, and Syria), which means GitHub isn’t fully available in some of those places. However, GitHub will continue advocating with US regulators for the greatest possible access to code collaboration services for developers in sanctioned regions. For example, in January 2021 we secured a license from the US government to make all GitHub services fully available to developers in Iran. The 2021 statistics below show that we granted all remaining appeals and denied none. We are continuing to work toward a similar outcome for developers in Crimea and Syria. Our services are also generally available to developers located in Cuba, aside from specially designated nationals, other denied or blocked parties under U.S. and other applicable law, and certain government officials.
Although trade control laws require GitHub to restrict account access from certain regions, we enable users to appeal these restrictions, and we work with them to restore as many accounts as we legally can. In many cases, we can reinstate a user’s account (grant an appeal), for example after they returned from temporarily traveling to a restricted region or if their account was flagged in error. More information on GitHub and trade controls can be found here.
We started tracking sanctions-related appeals in July 2019. Unlike abuse-related appeals, we must always act at the account level (as opposed to being able to disable a repository) because trade controls laws require us to restrict a user’s access to GitHub. In 2021, 1,504 users appealed trade-control related account restrictions, as compared to 2,500 in 2020. Of the 1,504 appeals we received in 2021, we approved 1,312 and denied 137, and required further information to process in eight cases. We also received 47 appeals that were mistakenly filed by users who were not subject to trade controls, so we excluded these from our analysis below. Although we received 60 percent fewer appeals in 2021, compared to 2020, the percentage of appeals we granted increased in 2021.
Appeals varied widely by region in 2021, ranging from more than one thousand related to Crimea to one related to North Korea and none related to Cuba. In 102 cases, we were unable to assign an appeal to a region in our data. We marked them as “Unknown” in the table below and excluded them from regional totals in the chart below it. The number of appeals from Crimea and Syria each increased as compared to 2020, while the numbers from Iran were much smaller (and all granted), because the US government granted us a license to offer GitHub to developers there in 2021.
GitHub remains committed to maintaining transparency and promoting free expression as an essential part of our commitment to developers. We aim to lead by example in our approach to transparency by providing in-depth explanation of the areas of content removal that are most relevant to developers and software development platforms. This year, we expanded our transparency reporting to include our use of automated detection for the most egregious kinds of abuse, and we updated our abuse reporting form, which will allow us to expand future transparency reporting.
Key to our commitment is protecting user privacy, free association, assembly, and expression by limiting the amount of user data we disclose, and the amount of legitimate content we take down, within the bounds of the law. Through our transparency reports, we’re continuing to shed light on our own practices, while also hoping to contribute to broader discourse on platform governance.
We hope you found this year’s report helpful and encourage you to let us know if you have suggestions for additions to future reports. For more on how we develop GitHub’s policies and procedures, check out our site policy repository.
February 7, 2022: We updated some numbers in this report to reflect updates we’ve made to the data in the DMCA repository’s data folder.