Birunda Chelliah – PLOTCON 2016

Post written by Birunda Chelliah:

Thanks to a complimentary pass from WiMLDS, I was able to attend the PLOTCON 2016 conference, which took place in New York City from November 15 to 18. The conference focused on data visualization techniques, tools, and methods using open source data science languages (R/Python/MATLAB/Julia). The talks were organized into four days, each with a broad topic area: Journalism & Web Tech, The Enterprise & Productization, Open Viz in Python, and Open Viz in R.

Apart from unanimously agreeing that pie charts are poor visualizations, the talks ranged from simple but important topics, like Eduardo Arino de la Rubia’s talk on ‘23 visualizations and when to use them’, to advanced material such as the talk by Mike Williams on text analysis. All speakers provided as much context as possible and explained how they used certain tools to develop visualizations or solve a data challenge. While it was not possible to learn a particular tool or technique within 30 minutes, these talks curated useful data libraries and packages and inspired the audience to explore, and eventually master, some of these tools in their own data analysis work.

The two speakers I found particularly interesting on Day 1 focused on text analysis visualizations. The first speaker, Irene Ros, presented on ‘Text is Data! Analysis and visualization methods’, using Python’s NLTK library to analyze Alice in Wonderland as a single document. She covered the basics of data cleaning and different text analysis methods such as simple word counts, classifiers, and topic modeling. As a novice in text analysis, I found this talk very useful because it provided a quick and dirty summary of text analysis techniques, visualizations, and real-world applications. One of the main takeaways from her talk was the need to balance preserving context against aggressive data cleaning: the more words you strip out, the less of the original meaning remains.
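To make that trade-off concrete, here is a minimal sketch of a word count before and after cleaning. It is not the speaker's code: it uses only the Python standard library rather than NLTK, the `STOP_WORDS` set and the `tokenize` and `word_counts` helpers are hypothetical names made up for illustration, and the sample text is just the opening sentence of the book.

```python
import re
from collections import Counter

# Hypothetical, hand-picked stop words; a real analysis would use a fuller list.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "by", "on", "her", "was", "very"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(text, stop_words=frozenset()):
    """Count tokens, skipping any that appear in stop_words."""
    return Counter(token for token in tokenize(text) if token not in stop_words)


sample = (
    "Alice was beginning to get very tired of sitting by her sister "
    "on the bank, and of having nothing to do."
)

raw = word_counts(sample)                  # no cleaning: filler words dominate
cleaned = word_counts(sample, STOP_WORDS)  # cleaning: content words remain

print(raw.most_common(3))
print(cleaned.most_common(3))
```

Removing stop words makes the content words stand out, but every word dropped is also a bit of context lost, which is the balance the talk described.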

On a similar topic, Mike Williams, another speaker, discussed ‘Beyond the word cloud: New Approaches for text summarization’. His statement that “A document is not a collection of words, it is a collection of ideas” really resonated with me. The talk emphasized that word clouds are rudimentary tools that do not fully capture the main ideas and themes of a document. Using an example of Amazon book reviews, he also touched upon topic modeling, a process of grouping words that tend to co-occur, as a better approach to text analysis. Essentially, the key message was that text analysis should be able to reduce a large document to a distinct set of topics without losing the meaning of the document.
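The idea of grouping words that co-occur can be illustrated with a much simpler stand-in for topic modeling: counting how often pairs of words appear in the same document. This is only a sketch under stated assumptions. The four reviews are invented for illustration (they are not the Amazon data from the talk), `STOP` and `cooccurrence` are hypothetical names, and real topic models such as LDA are considerably more sophisticated than pair counting.

```python
from collections import Counter
from itertools import combinations

# Invented toy reviews, for illustration only.
reviews = [
    "gripping plot with a surprising ending",
    "the plot was slow but the ending was worth it",
    "beautiful prose and vivid characters",
    "vivid characters and elegant prose",
]

# Hypothetical stop words to ignore when pairing.
STOP = {"a", "the", "was", "but", "it", "and", "with"}


def cooccurrence(docs, stop):
    """Count how many documents each pair of words appears in together."""
    pairs = Counter()
    for doc in docs:
        # Unique, sorted words so each pair is counted once per document
        # and always stored in the same (alphabetical) order.
        words = sorted(set(doc.lower().split()) - stop)
        pairs.update(combinations(words, 2))
    return pairs


pairs = cooccurrence(reviews, STOP)

# Pairs seen in more than one review hint at a shared theme.
for pair, count in pairs.items():
    if count > 1:
        print(pair, count)
```

In this toy data, "ending" and "plot" travel together, as do "characters" and "prose", which suggests two rough themes (story and writing style) that a word cloud of single words would not separate.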

Day 4 was my personal favorite as it included everything R! One speaker who really stood out was Jenny Bryan, a Professor at the University of British Columbia. She presented on data rectangling (aka data wrangling), a task that typically takes more time than the actual analysis or visualization. Her talk recommended keeping all data wrangling activities within a data frame instead of leaving them as floating vectors. I personally found Professor Bryan’s presentation fascinating, as she took a unique spin on the conference’s overarching theme of visualization by using LEGOs to visualize challenging data structures such as nested lists within data frames. She illustrated how to iterate functions over multiple columns using the purrr package. Overall, her presentation was very engaging, especially as she was able to explain complex problems using very simple examples.

Overall, I found this conference to be an eye-opening experience, especially as a non-programmer from the applied social research field. There was an outstanding set of speakers who spoke about their own experiences solving ongoing data issues. In addition, there was plenty of time during lunch to network with people from diverse fields. Lastly, an unlimited coffee supply never hurts anyone!

All the talks are now available online, and the presentation materials will be available shortly. Some materials can already be found on the speakers’ GitHub accounts. I would highly recommend going through some of the talks!