Third Annual GitHub Data Challenge
GitHub’s annual data challenge is back, and we can’t wait to see what you’ll build this year, be it beautiful generative art or full-blown third-party activity dashboards. Check out the winners from 2013 and 2012 for some inspiration.
The Details
Entries are generally visualizations, prose descriptions of data analyses, or both. We love innovative entries, so an “entry” is defined somewhat loosely.
There are only three rules:
- To enter, you must fill out our submission form by midnight PDT on August 25th, 2014.
- Your entry needs to use publicly available GitHub data from one or more of the sources described below.
- Show your work! Whatever you submit needs associated code or documentation describing what data you used and how you processed it. Some examples of what we’re looking for include code (and instructions to use it) in a GitHub repository, an academic write-up of your analysis, or an informal prose write-up. If you’re not linking to a repository, you should submit a Gist with your documentation.
After the submission deadline on August 25th, GitHub employees will review and vote on all entries to pick the top three winners. We’ll notify those three by mid-September.
Data Sources
GitHub activity data is publicly available from several sources. Here are a few links to get you started:
- Our very own API.
- The GitHub Archive, providing historical archives of our public timeline data.
- Google BigQuery, where GitHub’s public timeline is a featured public dataset; see the GitHub Archive home page for getting-started instructions.
- GHTorrent, which maintains a relational model of GitHub activity data and offers archives for download.
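If you’re not sure where to begin, here is a rough sketch of a first pass over timeline data: counting events by type. It assumes the archive has been downloaded as a gzipped file with one JSON event per line, each carrying a `type` field; check the GitHub Archive documentation for the exact layout of the period you download. The function names, the file path argument, and the sample events are all made up for illustration.

```python
import gzip
import json
from collections import Counter


def count_event_types(lines):
    """Count events by their "type" field, skipping blank or malformed lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # Skip lines that are not valid JSON.
        if not isinstance(event, dict):
            continue  # Skip JSON values that are not event objects.
        counts[event.get("type", "Unknown")] += 1
    return counts


def count_event_types_in_gzip(path):
    """Assumption: `path` points to a gzipped file with one JSON event per line."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return count_event_types(handle)


# Hypothetical sample events, not real GitHub data.
SAMPLE_EVENTS = [
    '{"type": "PushEvent", "repo": {"name": "octocat/hello-world"}}',
    '{"type": "WatchEvent", "repo": {"name": "octocat/hello-world"}}',
    '{"type": "PushEvent", "repo": {"name": "octocat/spoon-knife"}}',
    '{"type": "IssuesEvent", "repo": {"name": "octocat/spoon-knife"}}',
]

if __name__ == "__main__":
    for event_type, total in count_event_types(SAMPLE_EVENTS).most_common():
        print(event_type, total)
```

A tally like this is only a starting point, but it is a quick way to see which kinds of activity dominate a slice of the timeline before you decide what story to tell.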
ProTips
There are a few things we’re looking for when we score your entry:
- Innovation/Story: Does your entry tell a good, data-driven story? Does it reveal interesting insights about GitHub activity? We love it when we’re surprised by new insights hidden in our own data.
- Accuracy: Is your analysis accurate? Do accompanying visualizations clearly and unambiguously convey your conclusions?
- Completeness: Is your entry a code submission? If so, is your code well-organized and documented? Can others easily understand and reproduce your analysis from the materials you’ve submitted?
The Prizes
The winning entry in this year’s data challenge will receive an all-expenses-paid trip to attend a one-day data visualization course taught by Edward Tufte, a data visualization expert and the author of some of our favorite books on visualization. We’ll cover your enrollment for the course (either December 18th or 19th in San Francisco, CA), along with travel expenses to and from San Francisco, lodging at a nearby hotel for two nights (the evening before and of the course), and your meals.
The second and third prize contestants will receive $500 and $250 cash prizes, respectively.
Finally, all winners will have their GitHub profile and their data challenge entry publicly featured on our blog!
If you have questions about the data challenge rules, drop us a line at data@github.com. Good luck!