The GitHub Data Challenge II
There are millions of projects on GitHub. Every day, people from around the world are working to make these projects better. Opening issues, pushing code, submitting Pull Requests, discussing project details: GitHub activity is a paper trail of progress. Have you ever wondered what all that data looks like? There are millions of stories to tell; you just have to look.
Last year we held our first data challenge.
We saw incredible visualizations, interesting timelines and compelling analysis.
What stories will be told this year? It’s up to you!
To Enter
Send a link to a GitHub repository or gist containing your graph(s), along with a description, to data@github.com before midnight PST on May 8th, 2013.
Data access
The GitHub public timeline is a featured public dataset available on Google BigQuery. The “timeline” table has over a year’s worth of public activity and is approaching 100M rows. Even more data is available in JSON format from the GitHub Archive project.
You are free to use any tools you like. If you choose to use BigQuery, running queries against the GitHub dataset is free for the first 100GB of query processing. Pricing information for additional query processing is available on the BigQuery pricing page. After signing up for BigQuery, add the project name “githubarchive”.
https://bigquery.cloud.google.com/
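If you would rather start from the JSON data than from BigQuery, a first pass can be as simple as counting events by type. The following is a minimal sketch, not an official tool: it assumes one JSON object per line and a top-level "type" field on each event, and the function name `count_event_types` and the sample records are hypothetical, made up purely for illustration. Check the actual archive files before relying on that layout.

```python
import json
from collections import Counter
from typing import Iterable


def count_event_types(lines: Iterable[str]) -> Counter:
    """Tally events by their "type" field, skipping blank or malformed lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate truncated or corrupt records
        if isinstance(event, dict):
            # Assumption: each event carries a top-level "type" field.
            counts[event.get("type", "Unknown")] += 1
    return counts


# Hypothetical, simplified sample records; not real archive data.
SAMPLE_LINES = [
    '{"type": "PushEvent", "repo": "octocat/hello-world"}',
    '{"type": "IssuesEvent", "repo": "octocat/hello-world"}',
    '{"type": "PushEvent", "repo": "octocat/spoon-knife"}',
    '',
    'not json',
]

if __name__ == "__main__":
    for event_type, total in count_event_types(SAMPLE_LINES).most_common():
        print(f"{event_type}: {total}")
```

To run this over a downloaded file instead of the sample list, you could pass an open file handle, for example one returned by `gzip.open(path, "rt")` if the file is gzip-compressed. The resulting counts are a reasonable starting point for a bar chart or timeline.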
Prizes
GitHub staff will be voting on our favorite visualizations and there will be prizes for the top three spots:
- 1st Prize: $200 to the GitHub Shop
- 2nd Prize: $100 to the GitHub Shop
- 3rd Prize: $50 to the GitHub Shop
We will also feature the three winning entries on the GitHub blog. Winners will be announced the week of May 20th.
Analyzing Millions of GitHub Commits
Last year, Ilya Grigorik and I spoke at the Strata conference, showcasing some of the interesting analysis and visualizations from the data challenge, including which programming languages result in the most frustration (VimL), amusement (Ruby) and surprise (Perl).