opensource.google.com

From Google Summer of Code to Game of Thrones on the Back of a JavaScript Dragon (Part 2)

Friday, August 12, 2016

This guest post is part of a short series about Guy Yachdav, Tatyana Goldberg and Christian Dallago, and the journey inspired by their participation as Google Summer of Code mentors for the BioJS project. Don’t miss the first post in the series. Heads up: this post contains spoilers for Game of Thrones seasons 5 and 6!

We built on the Google Summer of Code (GSoC) philosophy and the lessons we learned from participating in 2014 by starting a JavaScript Technology class at the Technical University of Munich (TUM).

We began with two dozen students who worked on expanding the BioJS visualization library. Our class became popular quickly and the number of applicants doubled each semester (nearly 180 applicants for 40 seats in the 2016 summer term).

In 2016 our team grew to include Christian Dallago, who had joined as a GSoC mentor. Together we decided to break with the tradition of our course’s previous semesters. Instead of focusing on data visualization, we wanted to introduce students to data science with JavaScript. To get our students fully engaged, we decided the project would center on data from the hit TV show Game of Thrones.

Our aim was to create an online portal for Game of Thrones fans which would:
  1. Provide the most comprehensive, structured and open data set about the Game of Thrones world, accessible via an API.
  2. Present an interactive map built with JavaScript.
  3. Listen to what people are saying on Twitter about each of the show’s characters.
  4. Use machine learning algorithms to predict the likelihood of each character’s death.

Our plan worked: the students were engaged. It was a beautiful sight to see GitHub repos humming with activity as each dev team delved deeper into its project. As a project manager, you know you’ve got something good when issues are being opened and closed at 4:00 AM!

The results were mind-blowing. In 50 days of programming, 36 students opened over 1,200 issues and pull requests, pushed 3,300 commits, released four apps to npm and, of course, produced one absolutely amazing website.

The website aggregates data on 2,028 characters. Our map shows 240 landmarks and the paths traveled by 28 characters. Our Twitter sentiment analysis tool analyzed over 3 million tweets. And we launched the first-ever machine learning-based algorithm that predicts the likelihood of death for the 1,451 characters in the show who are still alive.

Visualization of Twitter sentiment analysis data for Jon Snow during season 5 of Game of Thrones. The X axis shows the timeline and the Y axis shows the number of positive (green) and negative (red) tweets. Each tweet is analyzed by an algorithm using a neural network to determine whether the tweet’s writer has a positive, negative or neutral attitude toward the character. 
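The caption describes a two-stage pipeline: a neural network labels each tweet as positive, negative or neutral, and the labelled tweets are then counted over time to draw the chart. The classifier itself is out of scope here. The following is a minimal JavaScript sketch of the counting stage only, assuming each tweet is a hypothetical `{ timestamp, sentiment }` object whose label has already been assigned; the function name `dailySentimentCounts` and the sample data are illustrative, not the project's code.

```javascript
// Hypothetical input: tweets already labelled by a sentiment classifier
// as "positive", "negative" or "neutral".
function dailySentimentCounts(tweets) {
  const byDay = new Map();
  for (const { timestamp, sentiment } of tweets) {
    // Bucket by calendar day (UTC), e.g. "2015-06-14".
    const day = new Date(timestamp).toISOString().slice(0, 10);
    if (!byDay.has(day)) byDay.set(day, { day, positive: 0, negative: 0 });
    const bucket = byDay.get(day);
    // Neutral tweets are skipped: the chart plots only positive and negative counts.
    if (sentiment === "positive") bucket.positive += 1;
    else if (sentiment === "negative") bucket.negative += 1;
  }
  // Chronological order for the X axis (ISO dates sort lexicographically).
  return [...byDay.values()].sort((a, b) => a.day.localeCompare(b.day));
}

// Usage with made-up tweets.
const sample = [
  { timestamp: "2015-06-14T20:05:00Z", sentiment: "negative" },
  { timestamp: "2015-06-14T21:30:00Z", sentiment: "negative" },
  { timestamp: "2015-06-15T08:00:00Z", sentiment: "positive" },
  { timestamp: "2015-06-14T22:10:00Z", sentiment: "neutral" },
];
console.log(dailySentimentCounts(sample));
```

Each returned object is one point on the X axis, with the green and red series read from `positive` and `negative`.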
Since launch, the site’s popularity has skyrocketed. Following our press release, we were covered by over 1,500 media outlets, including Time, The Guardian, Rolling Stone, Daily Mail, BBC, Reuters, The Telegraph and CNET. HowStuffWorks, Vulture and others produced videos about the site, and Chris Hardwick’s Comedy Central show did a segment about us. We've also given many interviews to TV and radio stations and newspapers.

Google Analytics data for the website. The left chart shows the number of visitors to the website during the first week after launch, reaching over 73K visitors on April 25th. The right chart shows the number of visitors at each point in time during the same week.
The most exciting part of the project was using machine learning to predict the likelihood that any given character would die. Machine learning algorithms find rules and patterns in data that humans cannot easily detect. Once these rules and patterns have been learned, the trained model can make inferences or predictions on new, previously unseen data.

Warning: The next paragraphs contain spoilers for seasons 5 and 6 of Game of Thrones!

To predict the likelihood of a character’s death, we collected information about all of the characters who appeared in books 1 to 5 and analyzed over 30 features, including age, gender and marital status. We then used a support vector machine (SVM) to statistically compare the features of characters, both dead and alive, and predict who would get the axe next. Our predictions were correct in 74% of all cases, and they surprised us by placing a number of characters thought to be relatively safe in grave danger.
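For readers new to SVMs, here is a minimal JavaScript sketch of a linear SVM classifier, trained with Pegasos-style stochastic sub-gradient descent on the L2-regularized hinge loss. It is not the team's implementation: the post does not say which SVM library, kernel or settings were used, and the two features and eight toy characters are made up (the real model used over 30 features). The helper names `trainLinearSVM`, `predict` and `accuracy` are hypothetical. `accuracy` computes the kind of figure the post reports (the share of characters classified correctly); a fair estimate would measure it on characters held out from training, whereas the usage here only checks the training set and two unseen points.

```javascript
// mulberry32: tiny seeded PRNG so training is reproducible.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const dot = (a, b) => a.reduce((s, ai, j) => s + ai * b[j], 0);

// Append a constant 1 so the bias is learned as an ordinary (regularized) weight.
const augment = (x) => [...x, 1];

// Pegasos-style training on lambda/2 * ||w||^2 + mean(max(0, 1 - y * w.x)).
// Labels: +1 = dead, -1 = alive (hypothetical encoding).
function trainLinearSVM(X, y, { lambda = 0.01, epochs = 500, seed = 1 } = {}) {
  const rand = mulberry32(seed);
  const data = X.map(augment);
  const w = new Array(data[0].length).fill(0);
  let t = 0;
  for (let epoch = 0; epoch < epochs; epoch++) {
    for (let k = 0; k < data.length; k++) {
      const i = Math.floor(rand() * data.length); // sample one character
      t += 1;
      const eta = 1 / (lambda * t); // Pegasos step size
      // Check the margin with the current weights, before updating them.
      const violates = y[i] * dot(w, data[i]) < 1;
      for (let j = 0; j < w.length; j++) {
        w[j] *= 1 - eta * lambda; // regularizer shrinks the weights
        if (violates) w[j] += eta * y[i] * data[i][j]; // hinge-loss sub-gradient step
      }
    }
  }
  return w;
}

// Sign of the decision value: +1 = predicted dead, -1 = predicted alive.
const predict = (w, x) => (dot(w, augment(x)) >= 0 ? 1 : -1);

// Fraction of characters whose predicted label matches the true one.
const accuracy = (w, X, y) =>
  X.filter((x, i) => predict(w, x) === y[i]).length / X.length;

// Usage on made-up, clearly separable toy features scaled to [0, 1].
const X = [[0.9, 0.8], [0.1, 0.2], [0.8, 0.9], [0.2, 0.1],
           [0.95, 0.7], [0.15, 0.3], [0.7, 0.85], [0.3, 0.15]];
const y = [1, -1, 1, -1, 1, -1, 1, -1];
const w = trainLinearSVM(X, y);
console.log("training accuracy:", accuracy(w, X, y));
console.log("unseen [0.85, 0.9] ->", predict(w, [0.85, 0.9]));
console.log("unseen [0.1, 0.1] ->", predict(w, [0.1, 0.1]));
```

Folding the bias into the weight vector means it is regularized too, which keeps the large early Pegasos steps (step size 1 / (lambda * t)) from making the bias drift.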

According to our predictions, Jon Snow, who was seemingly betrayed and murdered by fellow members of the Night’s Watch at the end of season 5, had only an 11% chance of dying. Indeed, Jon rose from the dead in the second episode of season 6! We also predicted that the Martell rulers of Dorne, Doran and Trystane, had a high likelihood of death and, as predicted, they were taken out in the first episode of the new season.
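An SVM outputs an unbounded decision value rather than a probability, so a percentage such as Jon Snow's 11% implies some calibration step. The post does not say how this was done. One standard option is Platt scaling, which fits a sigmoid P(dead | f) = 1 / (1 + exp(A * f + B)) to SVM decision values f for characters held out from training. The JavaScript sketch here fits A and B by plain gradient descent on the log-loss, a simplification of Platt's original Newton-based procedure; `fitPlatt`, `plattProb` and all scores are hypothetical.

```javascript
// Platt scaling: P(dead | f) = 1 / (1 + exp(A * f + B)), f = SVM decision value.
const plattProb = (A, B, f) => 1 / (1 + Math.exp(A * f + B));

// Fit A and B by batch gradient descent on the mean log-loss.
// scores: decision values for held-out characters; labels: 1 = dead, 0 = alive.
function fitPlatt(scores, labels, { lr = 0.1, iters = 5000 } = {}) {
  let A = 0;
  let B = 0;
  const n = scores.length;
  for (let it = 0; it < iters; it++) {
    let gA = 0;
    let gB = 0;
    for (let i = 0; i < n; i++) {
      // With z = A*f + B and p = 1 / (1 + e^z), d(log-loss)/dz = label - p.
      const r = labels[i] - plattProb(A, B, scores[i]);
      gA += r * scores[i];
      gB += r;
    }
    A -= (lr * gA) / n;
    B -= (lr * gB) / n;
  }
  return { A, B };
}

// Made-up held-out scores; the classes overlap, so the fitted A and B stay finite.
const scores = [-2.0, -1.5, -1.0, -0.5, -0.3, 0.2, 0.5, 1.0, 1.5, 2.0];
const labels = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1];
const { A, B } = fitPlatt(scores, labels);
console.log(`P(dead) for a score of -1.3: ${(100 * plattProb(A, B, -1.3)).toFixed(1)}%`);
```

A fitted A below zero means a higher SVM score maps to a higher probability of death. Fitting on the training set's own decision values tends to give overconfident probabilities, which is why held-out scores are used.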

Of course, as is always the case with predictions, there were also misses. We didn’t expect Roose Bolton to be killed off, nor did we see Hodor’s departure coming.

This experience was an amazing ride for our team and it all started with Google Summer of Code! In the next post we’ll share what followed and where we see ourselves heading in the future.

By Guy Yachdav, Tatyana Goldberg and Christian Dallago, BioJS