
Third Annual GitHub Data Challenge

GitHub’s annual data challenge is back, and we can’t wait to see what you’ll build this year, be it beautiful generative art or full-blown third-party activity dashboards. Check out the winners from 2013 and 2012 for some inspiration.

The Details

Entries are generally visualizations, prose descriptions of data analyses, or both. We love innovative entries, so an “entry” is defined somewhat loosely.

There are only three rules:

  1. To enter, you must fill out our submission form by midnight PDT on August 25th, 2014.
  2. Your entry needs to use publicly available GitHub data from any number of available sources described below.
  3. Show your work! Whatever you submit needs associated code or documentation describing what data you used and how you processed it. Some examples of what we’re looking for include code (and instructions to use it) in a GitHub repository, an academic write-up of your analysis, or an informal prose write-up. If you’re not linking to a repository, you should submit a Gist with your documentation.

After the submission deadline on August 25th, GitHub employees will review and vote on all entries to pick the three top winners. We’ll send out notifications to those top three by mid-September.

Data Sources

GitHub activity data is publicly available from several sources. Here are a few links to get you started:

  • Our very own API.
  • The GitHub Archive, providing historical archives of our public timeline data.
  • Google BigQuery, where GitHub’s public timeline is a featured public dataset; see the GitHub Archive home page for getting-started instructions.
  • GHTorrent, which maintains a relational model of GitHub activity data and offers archives for download.
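
As a starting point for working with timeline data, here is a minimal Python sketch that tallies events by type from an archive file you have already downloaded. It rests on a few assumptions that are not stated in this post: that the file is gzip-compressed JSON, that each event object carries a "type" field (as events from the GitHub API do), and that objects may or may not be separated by newlines. The function names and the file name in the usage note are hypothetical; check the GitHub Archive home page for the actual file locations and format.

```python
import gzip
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_events(text: str) -> Iterator[dict]:
    """Yield JSON objects from text holding one or more concatenated objects.

    Works whether or not the objects are separated by newlines, since the
    exact layout of an archive file is an assumption here.
    """
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while pos < end:
        # Skip whitespace (including newlines) between objects.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def read_archive(path: str) -> str:
    """Read a gzip-compressed archive file into a string (assumed UTF-8)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return handle.read()


def count_event_types(events: Iterable[dict]) -> Counter:
    """Count events by their "type" field; events without one count as "unknown"."""
    counts = Counter()
    for event in events:
        counts[event.get("type", "unknown")] += 1
    return counts
```

With a downloaded file named, say, archive.json.gz (a placeholder name), count_event_types(iter_events(read_archive("archive.json.gz"))).most_common(5) would list the five most frequent event types. Reading the whole file into memory keeps the sketch short; a larger analysis would likely stream the data or use BigQuery instead.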

ProTips

There are a few things we’re looking for when we score your entry:

  • Innovation/Story: Does your entry tell a good, data-driven story? Does it reveal interesting insights about GitHub activity? We love it when we’re surprised by new insights hidden in our own data.
  • Accuracy: Is your analysis accurate? Do accompanying visualizations clearly and unambiguously convey your conclusions?
  • Completeness: Is your entry a code submission? If so, is your code well-organized and documented? Can others easily understand and reproduce your analysis from the materials you’ve submitted?

The Prizes

The winning entry in this year’s data challenge will receive an all-expenses-paid trip to attend a one-day data visualization course taught by Edward Tufte, a data visualization expert and the author of some of our favorite books on visualization. We’ll cover your enrollment for the course (either December 18th or 19th in San Francisco, CA), along with travel expenses to and from San Francisco, lodging at a nearby hotel for two nights (the night before and the night of the course), and your meals.

The second- and third-place winners will receive cash prizes of $500 and $250, respectively.

Finally, all winners will have their GitHub profile and their data challenge entry publicly featured on our blog!

If you have questions about the data challenge rules, drop us a line at data@github.com. Good luck!
