Noemi Derzsy – scikit-learn sprint

Post written by Noemi Derzsy:

Contributing to open source can seem intimidating and overwhelming for first timers. At least I know that was the case for me. I kept reading blog posts about the importance of open source contribution: why we need to support open source software, how the scientific Python ecosystem is built on and relies on the work of volunteer coders, and why it is each of our responsibility to join the community’s effort to sustain the packages our daily work depends on. This all sounded very motivating, but I still felt as if I were standing in front of a deep pool of water thinking: “I should learn to do this, but I cannot just jump!” This is where the workshop I describe below came in really handy, as it was meant to take us by the hand and guide us down the pool steps one at a time, getting us comfortable with the water and ultimately with swimming in it.

On March 4th I had the opportunity to participate in Scikit-learn Sprint – Crash-Course in contributing to scikit-learn, another excellent workshop organized by WiMLDS in NYC. The event was designed to bring together volunteer programmers to contribute to scikit-learn, Python’s data science toolkit. It was hosted by Stack Exchange at the Stack Overflow HQ in NYC, a winner of Crain’s “Best places to work in NYC” award. I know this from the awards exhibited at their front desk, and of course from enjoying a day of brainstorming, learning and networking at their office on the 28th floor, with amazing Manhattan views and great amenities. Throughout the day we were pampered with refreshments provided by Bloomberg.

The event was led by Andreas Müller, lecturer in Data Science at Columbia University, one of the core developers of the scikit-learn machine learning library, and author of the book Introduction to Machine Learning with Python. If you want to know more about Andreas, Reshama Shaikh, co-organizer of WiMLDS, did an awesome interview with him. They discussed the lack of women’s involvement in open source projects (there is only one woman among the top 100 contributors to the scikit-learn library) and the ways Andreas and WiMLDS are working together to overcome this gender gap and encourage women to contribute to open source. For the full interview click here.

Since this was my first ever open source contribution, it seemed overwhelming at first, as mentioned above. However, Andreas and his agile and helpful team supported us throughout the day with everything from technical issues to workflow questions. Andreas had prepared a presentation for us and started the day with an overview of scikit-learn and the GitHub workflow, along with instructions on how to get started. After that, we worked in pairs to tackle the more than 700 open issues. We were instructed to start with issues labeled “Easy” in the issue tracker on GitHub to familiarize ourselves with the workflow, and then move on to more complex issues. In addition to contributing bug fixes or enhancements to the code, we also had the option of contributing to the documentation. By the end of the event we had submitted 18 pull requests, 3 of which were merged. After spending several hours on GitHub and the scikit-learn project, I am certainly a lot more confident about contributing to open source in the future. It was a day very well spent, thanks to the wonderful WiMLDS organizers who made this event possible.

And as if that had not already been awesome enough, the organizers ended the day with a raffle of O’Reilly t-shirts and booklets, and Andreas gave us scikit-learn stickers (yaaay!). I can’t wait for the next WiMLDS event. It is always a joy to learn, work and network in the company of extraordinary, ambitious professional women.

If this blog post piqued your curiosity about contributing to open source projects, you can contribute to the scikit-learn repository here, as there are still more than 700 open issues that need to be fixed. The explanations on how to contribute are here.