Since 2005, our Open Source Programs Office has enabled 11,000+ students, ranging in age from 13 to 56, to explore open source software development. They’ve worked hands-on with over 515 projects across a variety of disciplines.

If you’re a student looking to learn new coding skills that can help make a difference, check out our upcoming programs: Google Code-in for students 13-17 and Google Summer of Code for university students.

Google Code-in - Program starts for students December 7, 2015

For the sixth year in a row, Google Code-in will give 13-17 year old pre-university students an opportunity to dive in and explore the world of open source. Students with many different skills -- coders and non-coders alike -- will find opportunities to learn by doing and earn prizes. It’s easy to get started: just choose an interesting task from our participating organizations’ lists and complete it under the guidance of a mentor.

Google Code-in is for students asking questions like:
  • What is open source?
  • What kinds of stuff do open source projects do?
  • How can I write real code when all I’ve done is a little classroom work?
  • Can I contribute even if I’m not really a coder?

With tasks in five different categories, there’s something to fit almost any student’s skills:
  • Code: writing or refactoring
  • Documentation/Training: creating/editing documents and helping others learn more
  • Outreach/research: community management, outreach/marketing, or studying problems and recommending solutions
  • Quality Assurance: testing and ensuring code is of high quality
  • User Interface: user experience research or user interface design and interaction
GCI 2014 Grand Prize Winners on the Google Campus

Over 2,200 students from 87 countries have taken part in Google Code-in, and we’re excited to welcome many more into this year’s edition. We’ll be announcing this year’s participating organizations on November 13th, so stay tuned.

Google Summer of Code - Student applications open on March 14, 2016
GSoC logos from the last 10 years
Google Summer of Code (GSoC) is an innovative program dedicated to introducing students from universities around the world to open source software development. The program offers student developers stipends to write code for a wide variety of carefully selected open source projects while under the guidance of mentors. Our goal is to help these students pursue academic challenges over the summer break while they create and release open source code for the benefit of all. Over the past 11 years, over 8,300 mentors and 8,500 student developers in 101 countries have produced a stunning 55 million lines of code.

500+ GSoC Students and Mentors

We’re proud to continue this tradition for another year: we’ll be welcoming another batch of students into Google Summer of Code 2016. We’ll be accepting applications from open source organizations in February and student applications from March 14 to 25, 2016, so it’s not too early to start thinking about proposals.

Spread the word to your friends and stay tuned for more details coming soon!

By Stephanie Taylor and Carol Smith, Open Source Programs Office

Our wrap-up post this Friday features HPCC Systems, another organization new to Google Summer of Code 2015. HPCC aims to solve big problems around big data. Read below to learn more.
HPCC Systems was designed to solve “big data” problems. It can process, analyze and find links and associations in high volumes of complex data at high speed and with incredible accuracy. While it was originally created by LexisNexis and is still used in-house, the HPCC Systems Project went open source four years ago. Free downloads of the software, documentation and training materials are available from our website.

This is the first time we participated in Google Summer of Code (GSoC) and it has been a great success. As a first-time organization, we were allocated two student slots. It was quite hard to choose which proposals to accept because there were many high quality contenders. We selected two projects that highlight areas of specific interest not just for us but for our community and the world of big data.

Add Statistics to the Linear and Logistic Regression Modules - Sarthak Jain

Machine learning statistics are important to the big data world, providing a way to drill down into data using complex queries and produce meaningful results to help businesses maintain their competitive edge in the market place. The HPCC Systems Machine Learning Library has been around for a while now and we are always looking for ways to improve it. The new statistics added as part of this project give vastly improved results about the models created.
Slide taken from Sarthak's presentation describing some of the tasks completed
The statistics Sarthak added provide metrics which indicate the “goodness” of the model created. He completed the tasks associated with these statistics in very good time and also added three stepwise functions to the same modules which find the best model by adding or taking away independent variables. A goodness metric was also added to these features to select which independent variables are added to or taken away from the model. The three functions he added were forward, backward and bidirectional.
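To make the stepwise idea concrete, here is a minimal Python sketch of forward selection only. It is not the HPCC Systems implementation (which is not shown in this post): the `forward_stepwise` function, the `toy_goodness` metric and the variable names are all hypothetical, and a real goodness metric would be computed from a fitted regression model.

```python
from typing import Callable, FrozenSet, List, Sequence


def forward_stepwise(
    candidates: Sequence[str],
    goodness: Callable[[FrozenSet[str]], float],
) -> List[str]:
    """Greedily add the variable that most improves the goodness score."""
    selected: List[str] = []
    best_score = goodness(frozenset())
    remaining = list(candidates)
    while remaining:
        # Score every model that adds exactly one more variable.
        scored = [(goodness(frozenset(selected + [v])), v) for v in remaining]
        top_score, top_var = max(scored)
        if top_score <= best_score:
            break  # No single addition improves the model, so stop.
        selected.append(top_var)
        remaining.remove(top_var)
        best_score = top_score
    return selected


# Hypothetical goodness metric: each variable contributes a fixed gain,
# and every variable in the model costs a 1.0 complexity penalty.
GAIN = {"age": 5.0, "income": 3.0, "noise": 0.2}


def toy_goodness(variables: FrozenSet[str]) -> float:
    return sum(GAIN[v] for v in variables) - 1.0 * len(variables)


chosen = forward_stepwise(list(GAIN), toy_goodness)
print(chosen)
```

Backward selection would start from the full set and remove variables by the same rule, and a bidirectional variant would consider both moves at each step.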

Expand the HPCC Systems Visualization Framework (Web-Based) - Anmol Jagetia

Currently the HPCC Systems Platform has very little support for visual analytics. While there are plenty of “off the shelf” visual analytic tools and dashboard creators, none are really suitable for big data because they typically work with local datasets (think charting with a spreadsheet). The HPCC Systems Visualization Framework aims to solve the issue by bringing together existing “best of breed” visualizations as well as bespoke HPCC Systems visualizations into a consistent framework.

Anmol’s project involved adding unit tests and linting as well as adding new visualization widgets and enhancing existing ones. He used his knowledge and experience to enhance our build quality infrastructure and has also added a range of new features to the existing framework including the addition of a time lapse capability and a number of features which enable bar charts to be used as Gantt charts. The work he has done, which is already being used, significantly improves the user experience.

Below is an illustration of the work Anmol did to add range support in a column chart where there is both an upper and lower bound.

We’ve really enjoyed participating in GSoC this year and we will definitely apply to be accepted again next year. Our thanks go to the students for contributing to our project. We hope they enjoyed working with us.

By Lorraine Chapman, HPCC Systems Release Manager and GSoC Org Admin


For our Google Summer of Code wrap-up this week we have The Distributed Little Red Hen Lab. A new organization for 2015, Red Hen Lab had three student projects. Read on to learn about the Lab and their effort to scan a huge repository of international television news programming.

The Distributed Little Red Hen Lab is an international consortium for research on multimodal communication. We develop open source tools for joint parsing of text, audio/speech and video, using datasets of various sorts, most centrally a very large dataset of international television news called the UCLA Library Broadcast NewsScape. Red Hen uses 100% open source software. In fact, not just the software but everything else—including recording nodes—is shared in the consortium.

The Red Hen archive is a huge repository of recordings of TV programming, processed in a range of ways to produce derived products useful for research, expanded daily, and supplemented by various sets of other recordings. Our challenge is to create tools that allow us to access audio, visual, and textual (closed-captioning) information in the corpus in various ways by creating abilities to search, parse and analyze the video files. However, the archive is very large, so creating processes that can scan the entire dataset is time consuming and often comes with a margin of error.

Our projects for Google Summer of Code 2015 (GSoC) challenged students to assist in a number of projects, including some that have successfully improved our ability to search, parse and extract information from the archive.

Ekaterina Ageeva - Multiword Expression Search and Tagging

Ekaterina built a multiword expressions toolkit (MWEtoolkit), which is a tool for detecting multi-word units (e.g. phrasal verbs or idiomatic expressions) in large corpora. The toolkit operates via command-line interface. To ease access and expand the toolkit's audience, Ekaterina developed a web-based interface, which builds on and extends the toolkit functionality.

The interface allows us to do the following:
  • Upload, manage, and share corpora
  • Create XML patterns which define constraints on multiword expressions
  • Search the corpora using the patterns
  • Filter search results by occurrence and frequency measures
  • Tag the corpora with obtained search results

The interface is built with Python/Django. It currently supports operations with corpora tagged with Stanford CoreNLP parser, with a possibility to extend to other formats supported by MWEtoolkit. The system uses part of speech and syntactic dependency information to find the expressions. Users may rely on various frequency metrics to obtain the most relevant search results.
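As a rough illustration of searching by part of speech, the Python sketch below scans a tagged sentence for a fixed sequence of tags. It is an assumption-laden toy: it does not use MWEtoolkit, its XML pattern format, or Stanford CoreNLP output, it ignores syntactic dependencies, and the `find_pos_pattern` helper and the hand-tagged sentence are made up for this example.

```python
from typing import List, Sequence, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)


def find_pos_pattern(
    tokens: Sequence[Token], pattern: Sequence[str]
) -> List[Tuple[str, ...]]:
    """Return every run of consecutive words whose tags match the pattern."""
    matches = []
    width = len(pattern)
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        if all(tag == wanted for (_, tag), wanted in zip(window, pattern)):
            matches.append(tuple(word for word, _ in window))
    return matches


# Hand-tagged toy sentence. The tags follow Penn Treebank style
# (VBD = past-tense verb, RP = particle), chosen here for illustration.
sentence = [
    ("She", "PRP"), ("gave", "VBD"), ("up", "RP"),
    ("and", "CC"), ("walked", "VBD"), ("away", "RB"),
]

phrasal_verbs = find_pos_pattern(sentence, ["VBD", "RP"])
print(phrasal_verbs)
```

Counting how often each match occurs across a corpus would be one simple way to get the kind of frequency measure used for filtering results.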

Owen He - Automatic Speaker Recognition System

Owen used a reservoir computing method called conceptor together with the traditional Gaussian Mixture Models (GMM) to distinguish voices between different speakers. He also used a method proposed by Microsoft Research last year at the Interspeech Conference, which used a Deep Neural Network (DNN) and an Extreme Learning Machine (ELM) to recognize speech emotions. The DNN was trained to extract segment-level (256 ms) features and the ELM was trained to make decisions based on the statistics of these features at the utterance level.

Owen’s project focused on applying this to detect male and female speakers, specific speakers, and emotions by collecting training samples from different speakers and audio signals with different emotional features. He then preprocessed the audio signals and created the statistical models from the training dataset. Finally, he computed the combined evidence in real time and tuned the apertures for the conceptors so that the optimal classification performance could be reached. You can check out the summary of results on GitHub.

Vasanth Kalingeri - Commercial detection system

Vasanth built a system for detecting commercials in television programs from any country and in any language. The system detects the location and the content of ads in any stream of video, regardless of the content being broadcast and other transmission noise in the video. In tests, the system achieved 100% detection of commercials. An online interface was built along with the system to allow regular inspection and maintenance.

Initially, the user starts from a set of hand-tagged commercials; a set of 60 hand-labelled commercials is provided to work with. The system detects these commercials in the TV segment and then divides the entire broadcast into blocks. The user can view each block and tag it as a commercial. This process takes about 10 to 30 minutes for a one-hour TV segment, depending on the number of commercials that have to be tagged.

When the database holds an appreciable number of commercials (usually around 30 per channel), we can use it to recognize commercials in any unknown TV segment. When changes are made through the web interface, the system updates its database with the new or edited commercials. The web interface can also be used to view the detected commercials. For more information see Vasanth’s summary of results.
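One way to picture the block-splitting step is the Python sketch below. How Vasanth's system actually divides a broadcast is not described here, so this is only an assumption: it cuts the timeline at the edges of already-detected commercials and expects the detected intervals not to overlap. The `split_into_blocks` name and the timings are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds)
Block = Tuple[float, float, str]  # (start_seconds, end_seconds, label)


def split_into_blocks(duration: float, detected: List[Interval]) -> List[Block]:
    """Cut a broadcast at the edges of detected, non-overlapping commercials."""
    blocks: List[Block] = []
    cursor = 0.0
    for start, end in sorted(detected):
        if start > cursor:
            # Gap before this commercial: left for a person to review.
            blocks.append((cursor, start, "unlabelled"))
        blocks.append((start, end, "commercial"))
        cursor = end
    if cursor < duration:
        blocks.append((cursor, duration, "unlabelled"))
    return blocks


# A one-hour segment with two known commercials found in it.
blocks = split_into_blocks(3600, [(1800, 1830), (600, 630)])
for block in blocks:
    print(block)
```

Each "unlabelled" block is what a user would then view and, where appropriate, tag as a new commercial.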

By Patricia Wayne, UCLA Communication Studies

It’s Friday! Time for another Google Summer of Code wrap-up post. Boston University / XIA is one of the 37 new organizations to the program this year. Read below about three student projects and their work to discover the future architecture of the internet.
Linux XIA is the native implementation of eXpressive Internet Architecture (XIA), a meta network architecture that supports evolution of all of its components, which we call “principals,” and promotes interoperability between these principals. We are developing Linux XIA because we believe that the most effective way to find the future Internet architecture that will eventually replace TCP/IP is to crowdsource the search. This crowdsourced search is possible in Linux XIA.

Our organization, Boston University / XIA, received 34 proposals from 12 countries. As a first-year organization in Google Summer of Code (GSoC), we were surprised by the number of proposals, and we did our best to choose great students for each of the following projects:

XLXC is a set of scripts written in Ruby that creates network topologies using virtual interfaces and Linux containers. While testing a new network stack, a good amount of work goes into creating testing environments. XLXC saves developers and tinkerers a lot of time while experimenting with Linux XIA. Our student Aryaman Gupta from India worked with mentor Rahul Kumar to enable XLXC to emulate any topology using a language to describe the topologies.

Linux XIA needs to call forwarding functions that correspond to each XID type in order to forward a packet. XID types are 32-bit identifiers associated with principals which, in turn, define the forwarding functions. Being able to hash each XID type to a unique entry in an array increases the number of packets Linux XIA can forward per second because it reduces the number of memory accesses per lookup. Our student Pranav Goswami, also from India, worked with mentor Qiaobin Fu to find the best perfect hashing algorithm for Linux XIA to use in this case, and implemented it in Linux XIA.
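To show why a perfect hash keeps lookups cheap, here is a small Python sketch. It is not the algorithm Pranav selected (this post does not name it) and it is not Linux XIA kernel code: it simply searches at random for a multiplicative hash that gives each key its own array slot. The XID type numbers, `slot` and `find_perfect_hash` are all made up for the example.

```python
import random
from typing import Sequence, Tuple

MASK32 = 0xFFFFFFFF


def slot(xid_type: int, multiplier: int, bits: int) -> int:
    """Multiplicative hash: keep the top `bits` bits of a 32-bit product."""
    return ((xid_type * multiplier) & MASK32) >> (32 - bits)


def find_perfect_hash(
    keys: Sequence[int], tries_per_size: int = 10_000, seed: int = 0
) -> Tuple[int, int]:
    """Search for (multiplier, bits) that gives every key its own slot."""
    rng = random.Random(seed)
    bits = max(1, (len(keys) - 1).bit_length())
    while bits <= 32:
        for _ in range(tries_per_size):
            multiplier = rng.getrandbits(32) | 1  # odd multipliers only
            if len({slot(k, multiplier, bits) for k in keys}) == len(keys):
                return multiplier, bits
        bits += 1  # No luck at this size, so try a table twice as large.
    raise ValueError("keys must be distinct 32-bit values")


# Hypothetical XID type numbers, invented for this example.
xid_types = [0x10, 0x11, 0x12, 0x13]
multiplier, bits = find_perfect_hash(xid_types)

# Build the table once; each entry stands in for a forwarding function.
table = [None] * (1 << bits)
for xid in xid_types:
    table[slot(xid, multiplier, bits)] = xid

print(f"table size {len(table)}, multiplier {multiplier:#x}")
```

Once the table is built, a lookup is one multiply, one shift and one array read, with no collision chain to walk.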

We do not know how the future Internet will route packets between autonomous systems (ASes), but we are certain that Linux XIA can leverage IP's routing tables to have large deployments of Linux XIA. This is the goal of the LPM principal: leveraging routing tables derived from BGP, OSPF, IS-IS and any other IP routing protocol to forward XIA packets natively, that is, without encapsulation in IP. Thanks to the evolution mechanism built into Linux XIA, when a better way to route between ASes becomes available, we will be able to incrementally phase LPM out. Student André Ferreira Eleuterio from Brazil implemented the LPM principal in Linux XIA with the help of mentor Cody Doucette.
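For readers unfamiliar with longest prefix matching, the lookup rule that IP routing tables use can be sketched in a few lines of Python. This is an illustration only, not the LPM principal's code: it uses the standard `ipaddress` module and a linear scan, whereas real forwarding paths typically use tries or similar structures, and the routes and port names are hypothetical.

```python
import ipaddress
from typing import Dict, Optional


def longest_prefix_match(routes: Dict[str, str], destination: str) -> Optional[str]:
    """Return the next hop of the most specific prefix containing the address."""
    addr = ipaddress.ip_address(destination)
    best_len = -1
    best_hop = None
    for prefix, next_hop in routes.items():
        network = ipaddress.ip_network(prefix)
        # A longer prefix is more specific, so it wins over a shorter one.
        if addr in network and network.prefixlen > best_len:
            best_len = network.prefixlen
            best_hop = next_hop
    return best_hop


# Hypothetical routing table: prefix -> outgoing port.
routes = {
    "0.0.0.0/0": "default",
    "10.0.0.0/8": "port-a",
    "10.1.0.0/16": "port-b",
}

print(longest_prefix_match(routes, "10.1.2.3"))
print(longest_prefix_match(routes, "10.9.9.9"))
print(longest_prefix_match(routes, "192.0.2.1"))
```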

We are going to work with our students during the fall to have their contributions merged into our repositories and to add new projects to our ideas list that build upon their contributions. We expect that this will motivate new contributors by showing how much impact they can have on Linux XIA. Finally, new collaborators do not need to wait for the next GSoC to get involved! Join our community today, and "do what you can, with what you have, where you are" to make a difference like our three students successfully did.

By Michel Machado, Organization Administrator for Boston University / XIA

"Time zones are logical and easy to use."
—no one ever

Programming with time zones is notoriously difficult and error prone. Sure, this is partially because time zones have some inherent complexity. But perhaps the bigger problem is that programmers don't have a clear conceptual model of how time and time zones work. Additionally, library support may not be what it should. The end result is that code dealing with time zones is often overly complicated and sometimes even wrong.

A couple years ago we set out to fix these time zone programming woes within Google. We did this first by defining a greatly simplified mental model that enables programmers to understand time concepts and correctly reason about their code. We also created a C++ Time Zone library that closely matches this mental model and allows programmers to handle even the most complicated issues in a general and clear way.

And since we don't believe that time zone programming problems are unique to Google, we think our solutions may be useful to others. We presented these ideas and announced the open sourced cctz library this week at CppCon 2015. Even if you don't use C++, we hope you'll take a moment to read about the simplified mental model and perhaps flip through the slides from our talk, because those ideas are language independent.
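Because the ideas are language independent, here is a small Python sketch of the core distinction between an absolute point in time and the civil time that people read off a clock. It does not use cctz, and it makes one loud simplification: the two zones are modeled as fixed UTC offsets, whereas real time zones carry rules that change over time (daylight saving, for example). The date and offsets are picked purely for illustration.

```python
from datetime import datetime, timedelta, timezone

# One absolute instant, stored in UTC.
instant = datetime(2015, 9, 22, 16, 0, tzinfo=timezone.utc)

# Fixed UTC offsets standing in for real time zones (an assumption:
# a real zone is a set of rules, not a single offset).
new_york = timezone(timedelta(hours=-4), "EDT")
sydney = timezone(timedelta(hours=10), "AEST")

# The same instant read as civil time in two places.
ny_civil = instant.astimezone(new_york)
syd_civil = instant.astimezone(sydney)

print(ny_civil.strftime("%Y-%m-%d %H:%M %Z"))
print(syd_civil.strftime("%Y-%m-%d %H:%M %Z"))

# Different wall-clock readings, but still the same moment.
print(ny_civil == syd_civil)
```

Keeping the absolute instant as the stored value and converting to civil time only at the edges, for display or input, avoids most of the confusion.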

by Greg Miller and Bradley White, Google Engineering

At Google, we think that internet users’ time is valuable, and that they shouldn’t have to wait long for a web page to load. Because fast is better than slow, two years ago we published the Zopfli compression algorithm. This received such positive feedback in the industry that it has been integrated into many compression solutions, ranging from PNG optimizers to preprocessing web content. Based on its use and other modern compression needs, such as web font compression, today we are excited to announce that we have developed and open sourced a new algorithm, the Brotli compression algorithm.

While Zopfli is Deflate-compatible, Brotli is a whole new data format. This new format allows us to get 20–26% higher compression ratios over Zopfli. In our study ‘Comparison of Brotli, Deflate, Zopfli, LZMA, LZHAM and Bzip2 Compression Algorithms’ we show that Brotli is roughly as fast as zlib’s Deflate implementation. At the same time, it compresses slightly more densely than LZMA and bzip2 on the Canterbury corpus. The higher data density is achieved by 2nd order context modeling, re-use of entropy codes, a larger memory window of past data and joint distribution codes. Just like Zopfli, the new algorithm is named after Swiss bakery products. Brötli means ‘small bread’ in Swiss German.
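For anyone who wants to see how a figure like "higher compression ratio" is computed, here is a short Python sketch. It is only a sketch of the arithmetic: it uses the standard library's `zlib` (Deflate) at two effort levels rather than Brotli or Zopfli, the sample text is made up, and it makes no attempt to reproduce the numbers from the study.

```python
import zlib


def compression_ratio(data: bytes, level: int) -> float:
    """Uncompressed size divided by compressed size (higher is denser)."""
    return len(data) / len(zlib.compress(data, level))


def percent_gain(baseline_ratio: float, new_ratio: float) -> float:
    """How much higher the new ratio is than the baseline, in percent."""
    return (new_ratio / baseline_ratio - 1.0) * 100.0


# Made-up, highly repetitive sample input.
sample = b"The quick brown fox jumps over the lazy dog. " * 200

fast = compression_ratio(sample, 1)   # least effort
dense = compression_ratio(sample, 9)  # most effort

print(f"level 1 ratio: {fast:.1f}")
print(f"level 9 ratio: {dense:.1f}")
print(f"gain: {percent_gain(fast, dense):.1f}%")
```

Results depend heavily on the input, which is why comparisons like the one in the study are run over standard corpora.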

The smaller compressed size allows for better space utilization and faster page loads. We hope that this format will be supported by major browsers in the near future, as the smaller compressed size would give additional benefits to mobile users, such as lower data transfer fees and reduced battery use.

By Zoltan Szabadka, Software Engineer, Compression Team

Pencil Code is a collaborative programming site for art, music and creating games. It is also a place to experiment with mathematical functions, geometry, graphing, webpages, simulations and algorithms. Pencil Code had three Google Summer of Code students in 2015. You can read more about their project successes below.

As we return to school and look around Pencil Code in preparation for classes this fall, there are quite a few improvements created by our Google Summer of Code (GSoC) students. The first thing you see when you log in — icons everywhere! But better yet, if you have saved the program recently, the icon will be a screenshot of the program's output. This change will help students and teachers quickly identify saved projects, and will help people find interesting projects they want to share.
The icon implementation was done by Xinan Liu, a student at the National University of Singapore. He rewrote several bits of the Pencil Code server to support the icons, and then on the client side, he integrated the very cool html2canvas library to create the screenshots.

Xinan also contributed quite a bit beyond this project. He refactored our node.js-based build to switch from require.js to browserify, and he has been contributing to other sharing and scaling features on Pencil Code, helping other non-GSoC contributors get up to speed and reviewing their pull requests. We're looking forward to Xinan's continuing involvement and contributions to our little open source community.
The next contribution was by IIIT Hyderabad student Saksham Aggarwal. Saksham has implemented an HTML block mode for the Droplet block editor, which means that teachers can introduce beginners to HTML syntax using a drag-and-drop interface. And as usual with Droplet, you can toggle between blocks and text at any time. Saksham is also working on a similar Droplet-based editor for CSS syntax. The visual HTML syntax editor is a very accessible way to see and work with HTML syntax without having to type every bracket. And yet, magically, it does not hide the syntax - by toggling into text, you can work directly with traditional code. It is fully authentic, but highly accessible. You can read a paper about Saksham's work here.
The final project was a collaboration between GSoC student Jeremy Ruten from the University of Saskatchewan, and two of our summer students Amanda Boss from Harvard and Cali Stenson from Wellesley. They created an incredibly ambitious project to implement a "rewindable" debugger in Pencil Code. Although it is not quite ready for production yet, we are already using pieces of it in Pencil Code. You will see the debugger in coming months! For examples of how it transforms code, you can check out Jeremy, Amanda and Cali's writeup of their debugging work.

Did I mention that the three of them are students? And that they built this rewindable debugger over just one summer!? They made improvements that will make a real difference as we use Pencil Code to bring computer science to the next generation of students.

We'd like you to participate!

If you are interested in bringing some of this cool work into your classroom, join our discussion group by signing up at We have teachers from elementary school to college, from Texas to Singapore. And if you'd like to make an open source contribution, check out for project ideas, and join the teaching discussion group — also an area where our open source contributors hang out.

We are grateful to Google for supporting our summer open source program with GSoC. We hope the summer was as interesting for our students as it was productive for our project. We look forward to our students' continued involvement in the Pencil Code community.

By David Bau, Organization Administrator for Pencil Code

Now that the 11th year of Google Summer of Code has officially come to a close, we will devote Fridays to wrap-up posts from a handful of the 137 mentoring organizations that participated in 2015. Organizations this year represented a wide range of computing fields including artificial intelligence, featured below.


Two software libraries that originate from our laboratory, the Institute for Artificial Intelligence, that are used and supported by a larger user community are the KnowRob system for robot knowledge processing and the CRAM (Cognitive Robot Abstract Machine) framework for plan-based robot control. In our group, we have a very strong focus on open source software and active maintenance and integration of projects. The systems we develop are available under BSD and MIT licenses, and partly (L)GPL.

Within the context of these frameworks, we offered four projects during the summer term in 2015, which were all accepted to Google Summer of Code (GSoC).

Multi-modal Big Data Analysis for Robotic Everyday Manipulation Activities

The project "Multi-modal Big Data Analysis for Robotic Everyday Manipulation Activities" added to our ongoing work to build the robotic perception system RoboSherlock for service robots performing household chores. Our GSoC student, Alexander, made exciting progress and valuable contributions during the summer. He ported an earlier prototypical proprioceptive module from Java to C++ to integrate it into RoboSherlock, he developed tools for visualizing the module's various detections and annotations, and applied this infrastructure to detect collisions of the robot's arms with unperceived parts of the environment in a shelf reordering task. We are also very happy that Alexander decided to stay and keep on working on RoboSherlock after GSoC ended.

Kitchen Activity Games GUI

Our GSoC student, Mesut, developed a GUI to interact with the robotics simulator Gazebo. The simulator has been used as a library, allowing different scenarios (worlds) to be selected and executed. Playlists can be generated in order to replay logged episodes. During the replay, various plugins can be linked and executed from the GUI to allow post processing the data. The user interface will ease organizing and saving simulation data further used for learning. You can view Mesut’s project on GitHub here.

Symbolic Reasoning Tools with Bullet using CRAM

Autonomous robots performing complex manipulation tasks in household environments, such as preparing a meal or tidying up, are required to know where different objects are located and what properties they have. The knowledge about their environment is called “belief state”, i.e. the information that the robot believes holds true in the surrounding world. Our GSoC student, Kunal, worked on improving the world representation of the CRAM robotic framework, which represents the environment as a 3-dimensional world where simple physics rules of the Bullet Physics engine apply. The goal of the project was to issue events when errors are found in the belief state, for example when the robot thinks its arm is inside a table, which is physically impossible. A stand-alone ROS (Robot Operating System) publisher node that notifies all its listeners about errors was partially implemented, while integration with the CRAM belief state is still in progress.
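The "arm inside a table" case can be illustrated with a simple geometric check. The Python sketch below is a stand-in, not the project's code: it does not use Bullet, ROS or CRAM, it approximates every object with an axis-aligned box, and the `Box` class, object names and coordinates are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box given by its lowest and highest corners."""
    name: str
    lo: Vec3
    hi: Vec3


def boxes_overlap(a: Box, b: Box) -> bool:
    """Two boxes intersect only if their extents overlap on all three axes."""
    return all(a.lo[i] < b.hi[i] and b.lo[i] < a.hi[i] for i in range(3))


def belief_state_errors(parts: Sequence[Box], obstacles: Sequence[Box]) -> List[str]:
    """Report every robot part that the belief state places inside an obstacle."""
    return [
        f"{part.name} intersects {obstacle.name}"
        for part in parts
        for obstacle in obstacles
        if boxes_overlap(part, obstacle)
    ]


# Hypothetical belief state, in meters.
table = Box("table", (0.0, 0.0, 0.70), (1.0, 1.0, 0.75))
left_arm = Box("left_arm", (0.4, 0.4, 0.72), (0.5, 0.5, 0.90))
gripper = Box("gripper", (0.4, 0.4, 1.00), (0.5, 0.5, 1.10))

errors = belief_state_errors([left_arm, gripper], [table])
print(errors)
```

In a setup like the one described, each reported error would be published as an event so that listeners can react to the inconsistent belief state.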

Report Card Generation from Robot Mobile Manipulation Activities

Throughout the summer, our GSoC student Kacper made great progress in developing a framework for automatically generating report cards from robot experiences. We have a special focus on mobile manipulation activities in robots and are interested in anomaly detection in our rather complex systems. The developed components greatly help us save time on mundane analysis tasks and make complicated analysis steps (looking up all aspects of a certain action, comparing different trials) easier to do.

By Jan Winkler, Organization Administrator and PhD student at the Institute for Artificial Intelligence

We're excited to announce the Beta release of Bazel, an open source build system designed to support a wide variety of different programming languages and platforms.

There are lots of other build systems out there -- Maven, Gradle, Ant, Make, and CMake just to name a few. So what’s special about Bazel? Bazel is what we use to build the large majority of software within Google. As such, it has been designed to handle build problems specific to Google’s development environment, including a massive, shared code repository in which all software is built from source, a heavy emphasis on automated testing and release processes, and language and platform diversity. Bazel isn’t right for every use case, but we believe that we’re not the only ones facing these kinds of problems and we want to contribute what we’ve learned so far to the larger developer community.

Our Beta release provides:

Check out the tutorial app to see a working example using several languages.

We still have a long way to go.  Looking ahead towards our 1.0.0 release, we plan to provide Windows support, distributed caching, and Go support among other features. See our roadmap for more details and follow our blog or Twitter account for regular updates.  Feel free to contact us with questions or feedback on the mailing list or IRC (#bazel on freenode).

By Jeff Cox, Bazel team