Image Recognition Through Deep Learning & Neural Networks (Clarifai)

“Understand every image and video to improve life”. If you haven’t heard this quote before, then you definitely missed an amazing presentation by Matt Zeiler, the founder and CEO of Clarifai.

The event started with an outstanding demo of the Clarifai product. Matt uploaded a picture of a heart, and in a fraction of a second, concept tags and similar images appeared on the screen. But that was not even the best part! He also uploaded a video of a scenic landscape, and the output was a time series for each tag, giving, for example, the likelihood that a mountain appeared in the video at minute 1:31. Don’t believe me? Check it out on their website, where you can upload your own pictures and videos and try the classifier for yourself. And if English isn’t your first language, don’t worry: the tags are available in 21 languages!

Matt then explained how several of the companies Clarifai collaborates with use the product. Vodafone, Trivago, and Vimeo use it for video and photo search or for organizing pictures. Two other companies, Style me Pretty and Inside, use it in more interesting ways.
Style me Pretty is a wedding magazine and blog for which Clarifai developed a special product tailored to “wedding” vocabulary to organize its photographs. Inside is a medical company that records videos for medical diagnostics. Clarifai created an algorithm that, by analyzing the frames of these videos, can recognize about a dozen diseases. Pretty impressive, right?

Matt also explained the machine learning methods powering Clarifai’s algorithms. He described how neural networks learn features directly from the data, without the need for handcrafted features. In the ImageNet 2012 competition, Matt’s neural network algorithm outperformed all other methods, increasing accuracy by 10%! But what is this learning model exactly?
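To make “learning features from data” concrete, here is a toy sketch, not Clarifai’s method: instead of handcrafting an edge filter, we recover one by gradient descent from nothing but example inputs and responses. The hidden filter, data sizes, and learning rate are all illustrative assumptions. Real networks learn many such filters jointly from labeled images via backpropagation; this only shows the core idea on a single linear filter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: random 1-D "signals" and the responses of a hidden edge filter.
# The learner never sees this filter, only (signal, response) pairs.
hidden_filter = np.array([1.0, 0.0, -1.0])
signals = rng.normal(size=(200, 32))
responses = np.array([np.convolve(s, hidden_filter, mode="valid") for s in signals])

# Every length-3 window of every signal, flattened into one design matrix.
X = np.lib.stride_tricks.sliding_window_view(signals, 3, axis=1).reshape(-1, 3)
y = responses.reshape(-1)

# Learn the filter weights by gradient descent on the mean squared error,
# starting from zeros instead of a handcrafted kernel.
w = np.zeros(3)
learning_rate = 0.1
for _ in range(300):
    grad = X.T @ (X @ w - y) / len(y)
    w -= learning_rate * grad

# np.convolve flips its kernel, so the learned sliding-window weights
# approximate the hidden filter reversed, i.e. roughly [-1, 0, 1].
print(np.round(w, 3))
```

The learned weights end up detecting the same edge pattern as the hidden filter, even though nobody wrote that pattern into the model.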
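To see what such a network has learned, Matt uses deconvolutional networks, which project feature activations back to the pixel space of the input. The steps are the following:
feature maps -> unpooling -> nonlinearity -> convolution -> input image.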
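The pipeline above can be sketched in a few lines of NumPy for a single layer with one channel. This is a minimal illustration, assuming an 8x8 random image, a hand-picked vertical-edge kernel, and 2x2 max pooling; none of these come from the talk. The forward pass records “switches” (where each pooling maximum came from), and the deconvolutional pass reverses the steps: unpool using the switches, rectify, then apply the transposed convolution to land back in input space. As is common for this kind of visualization, we keep only the strongest activation so the reconstruction shows which image patch triggered it.

```python
import numpy as np

def correlate_valid(x, f):
    """Forward 'convolution' as used in CNNs: valid cross-correlation."""
    windows = np.lib.stride_tricks.sliding_window_view(x, f.shape)
    return np.einsum("ijab,ab->ij", windows, f)

def max_pool_with_switches(a, size=2):
    """2x2 max pooling that also returns the argmax position in each block."""
    h, w = a.shape[0] // size, a.shape[1] // size
    blocks = a[:h * size, :w * size].reshape(h, size, w, size)
    blocks = blocks.transpose(0, 2, 1, 3).reshape(h, w, size * size)
    return blocks.max(axis=2), blocks.argmax(axis=2)

def unpool(pooled, switches, size=2):
    """Place each pooled value back at the location recorded by its switch."""
    h, w = pooled.shape
    out = np.zeros((h * size, w * size))
    for i in range(h):
        for j in range(w):
            di, dj = divmod(switches[i, j], size)
            out[i * size + di, j * size + dj] = pooled[i, j]
    return out

def transposed_conv(y, f):
    """Transpose of correlate_valid: spread each activation over a kernel-sized patch."""
    kh, kw = f.shape
    out = np.zeros((y.shape[0] + kh - 1, y.shape[1] + kw - 1))
    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            out[i:i + kh, j:j + kw] += y[i, j] * f
    return out

rng = np.random.default_rng(0)
image = rng.random((8, 8))
kernel = np.array([[1.0, 0.0, -1.0]] * 3)  # assumed vertical-edge filter

# Forward pass: convolution -> ReLU -> max pooling (remembering switches).
fmap = np.maximum(correlate_valid(image, kernel), 0.0)  # 6x6
pooled, switches = max_pool_with_switches(fmap)         # 3x3

# Keep only the strongest activation to visualize what triggered it.
top = np.zeros_like(pooled)
i, j = np.unravel_index(pooled.argmax(), pooled.shape)
top[i, j] = pooled[i, j]

# Deconvnet: feature maps -> unpooling -> nonlinearity -> convolution -> input space.
# The ReLU is a no-op at this first layer (values are already >= 0); it matters
# when reconstructions pass back through several stacked layers.
reconstruction = transposed_conv(np.maximum(unpool(top, switches), 0.0), kernel)
print(reconstruction.shape)
```

The reconstruction has the same shape as the input image and is non-zero only around the patch that produced the chosen activation.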
Matt showed us how the different layers of the neural net respond to different patterns. For example, the first layer focuses mostly on simple patterns like colors, the second can distinguish skin tones, and the fifth can even tell you whether there is a husky or the wheel of a car in your image. The deeper the layer, the more specific the patterns it detects.
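One commonly cited factor behind this progression is that each successive layer “sees” a larger patch of the original image, so deeper units can respond to whole object parts rather than single colors; the other factor is that they combine the features of the layers below. The sketch below computes the receptive field of each layer for a hypothetical stack of convolution and pooling layers. The kernel sizes and strides are illustrative assumptions, not Clarifai’s actual architecture.

```python
from typing import List, Tuple

# Hypothetical layer stack: (name, kernel_size, stride).
# Values are illustrative only, not Clarifai's actual architecture.
LAYERS: List[Tuple[str, int, int]] = [
    ("conv1", 7, 2),
    ("pool1", 3, 2),
    ("conv2", 5, 2),
    ("pool2", 3, 2),
    ("conv3", 3, 1),
    ("conv4", 3, 1),
    ("conv5", 3, 1),
]

def receptive_fields(layers: List[Tuple[str, int, int]]) -> List[Tuple[str, int]]:
    """Return (layer name, receptive field width in input pixels) for each layer."""
    field = 1  # a unit of the input image covers exactly one pixel
    jump = 1   # distance, in input pixels, between adjacent units of the current layer
    result = []
    for name, kernel, stride in layers:
        # A kernel spanning k units adds (k - 1) * jump input pixels to the field.
        field += (kernel - 1) * jump
        jump *= stride
        result.append((name, field))
    return result

for name, field in receptive_fields(LAYERS):
    print(f"{name}: each unit sees a {field}x{field} pixel patch")
```

With these assumed settings, a first-layer unit covers only a few pixels, while a fifth-layer unit covers a region large enough to contain something like a dog’s face or a car wheel.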

Matt finished the talk by introducing Clarifai’s new baby: Forevery, an app available on the Apple App Store that organizes your photos automatically. It doesn’t rely only on the usual tags like places and time; you can actually teach it to recognize specific things, like your best friend or a whiteboard. Once it has learned them, you can search your galleries using one, two, or even more combined tags.

So if your dream is to work for a visual recognition company that gives you one day per week to work on any project of your choice, look no further: Clarifai is your place! Check out their openings!
Thank you to Clarifai for hosting us! You can find their blog post and slides from the presentation here.