Post written by Eszter D. Schoell:
PyGotham 2016 – another excellent year
Due to the generous scholarship provided by PyGotham and awarded by Women in Machine Learning and Data Science (WiMLDS), I was able to attend an absolutely excellent conference.
PyGotham – the NYC-based, Python-centric conference – once again lived up to its name as an eclectic, developer-run and developer-attended gathering. Topics ranged from anomalies to Yosai (no Z topics yet). Particularly interesting were the NLP (natural language processing) talks. NLP is a huge area, requiring expertise from linguists, computer scientists and data scientists – and all were represented at the conference, giving developers a concrete introduction. Mike Williams (Fast Forward Labs) introduced three methods for summarizing documents: Luhn’s method from 1958, topic modeling and recurrent neural networks. Luhn’s method may not produce the best results, but it is an excellent place to start if you want to practice turning an academic paper into code. For topic modeling, Mike spoke about Latent Dirichlet Allocation, a mature method from David Blei; the scikit-learn class to use is LatentDirichletAllocation (not the standalone lda package). Finally, recurrent neural networks are by far the best performing, but the path to their results is essentially uninterpretable (do not forget to regularize with dropout when using them!).
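To make the topic-modeling pointer concrete, here is a minimal sketch using scikit-learn’s LatentDirichletAllocation on a toy corpus; the corpus and parameter choices are mine, not from the talk:

```python
# Topic modeling with scikit-learn's LatentDirichletAllocation
# (the class the talk pointed to, as opposed to the standalone lda package).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [  # toy corpus; real summarization would use full documents
    "neural networks learn representations from text data",
    "recurrent neural networks model sequences of words",
    "birds in a flock follow simple local rules",
    "simple rules produce complex flocking behavior",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)  # document-term count matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Print the top words for each inferred topic.
terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = topic.argsort()[-4:][::-1]
    print(f"topic {i}:", ", ".join(terms[j] for j in top))
```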
For those not yet familiar with NLP, a great step-by-step introduction was given by Steven Butler and Max Schwartz, computational linguists at the City University of New York. Before you delve into NLP, make sure you understand how strings are handled in Python 2 versus Python 3 – Steven and Max highly recommend using Python 3.
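A quick illustration of why the version matters – in Python 3, text (str) and raw bytes are distinct types, where Python 2 silently mixed them:

```python
# Python 3: str is Unicode text, bytes is raw data, and mixing them is an error.
text = "naïve"                       # str (Unicode)
data = text.encode("utf-8")          # bytes
print(type(text), type(data))        # <class 'str'> <class 'bytes'>
print(data.decode("utf-8") == text)  # True
# In Python 2, "naïve" was a byte string and u"naïve" the Unicode type;
# implicit coercion between the two caused the classic UnicodeDecodeError.
```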
Burton DeWilde (Chartbeat) continued the NLP theme by introducing textacy, a Python package built on spaCy for higher-level NLP. As one follower noted, this is ‘pandas for NLP’! The project is on GitHub, and contributions are most welcome.
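For a taste of what “higher-level” means, here is a minimal textacy sketch; it assumes a recent textacy plus spaCy’s en_core_web_sm model, and API details have shifted across versions:

```python
# textacy wraps spaCy docs with convenient extraction helpers.
import textacy
from textacy import extract

text = "textacy is a higher-level NLP package built on spaCy."
doc = textacy.make_spacy_doc(text, lang="en_core_web_sm")

print(list(extract.entities(doc)))   # named entities found by spaCy
print(list(extract.ngrams(doc, 2)))  # bigrams, with stopwords filtered by default
```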
Ever wonder how to explain complicated concepts so they are easily understood? Watch Daniel Kronovet’s talk “Understanding GPU Programming” and learn! For example, how do you explain training models?
“Data are matrices
Transformations are matrices
Lots of matrices
Big matrices
Just multiply them together
Forever.”
Got it? 😉
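In case the koan is too terse, here is the same idea in a few lines of NumPy (the shapes and layer count are arbitrary illustrations):

```python
# Data are matrices, transformations are matrices: just multiply them together.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(4, 3))                        # 4 samples, 3 features
layers = [rng.normal(size=(3, 3)) for _ in range(5)]  # lots of matrices

out = data
for W in layers:      # forever (well, five times)
    out = out @ W
print(out.shape)      # still (4, 3) -- a GPU just does these multiplies faster
```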
Another complex topic made easy was “Probabilistic Graphical Models” by Aileen Nielsen. She was highly successful in moving from an abstract explanation of probabilities to code examples using the Python package pgmpy. For example, if you want to model how a physician thinks, this is the way to go!
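As a flavor of what that looks like, here is a minimal pgmpy sketch of physician-style inference – a two-node Bayesian network with invented probabilities, assuming a recent pgmpy where the model class is BayesianNetwork (older releases call it BayesianModel):

```python
# A toy Bayesian network: does observing a symptom change belief in a disease?
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

model = BayesianNetwork([("disease", "symptom")])

cpd_disease = TabularCPD("disease", 2, [[0.99], [0.01]])  # P(disease)
cpd_symptom = TabularCPD(                                 # P(symptom | disease)
    "symptom", 2,
    [[0.9, 0.2],   # symptom absent
     [0.1, 0.8]],  # symptom present
    evidence=["disease"], evidence_card=[2],
)
model.add_cpds(cpd_disease, cpd_symptom)

# How should the physician update on seeing the symptom?
posterior = VariableElimination(model).query(["disease"], evidence={"symptom": 1})
print(posterior)
```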
On a more abstract level, Gene Callahan’s talk “Write less code with algebra” was a great introduction to agent-based modeling. Biology provides an easy-to-understand analogy: birds flying together in a flock. Simple rules, such as “fly to the right of the bird in front of you,” combine into complex behavior. In programming, this translates into creating computational units that act as independent agents, each with its own set of rules about how to behave.
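A toy sketch of the idea (my own illustration, not Gene’s code): give each “bird” one local rule and watch global alignment emerge:

```python
# Agent-based modeling in miniature: simple local rules -> emergent order.
import random

random.seed(0)

class Bird:
    def __init__(self):
        self.heading = random.uniform(0.0, 360.0)  # degrees

    def step(self, neighbors):
        # One rule: nudge your heading toward the flock's average heading.
        mean = sum(b.heading for b in neighbors) / len(neighbors)
        self.heading += 0.1 * (mean - self.heading)

flock = [Bird() for _ in range(50)]
for _ in range(200):
    for bird in flock:
        bird.step(flock)  # here each agent sees the whole flock

spread = max(b.heading for b in flock) - min(b.heading for b in flock)
print(f"heading spread after 200 steps: {spread:.6f} degrees")  # near zero
```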
Because PyGotham is also NY-centric (though not exclusively), Evan Misshula’s talk “Python for the Win: Creating Excitement” deserves the recency effect. Evan is at CUNY and advocates for under-represented groups in math and computer science. For example, did you know that about 80% of programmers in NYC come from somewhere else, while 50% of CUNY computer science graduates earn less than expected because they are not getting jobs writing code? Evan is part of the NYC Tech Talent Pipeline, which aims to address this issue and prepare New Yorkers for careers in tech through internships.
This summary is but a small glimpse of the wonderful talks at PyGotham. Once the videos are up, I highly suggest taking a look.