“Overview first, zoom and filter, then details-on-demand.” — Ben Shneiderman
Data Visualisation for Data Science
The aim of the workshop is to provide a thorough introduction to data visualization in R or Python. The focus will be on hands-on experience with R or Python libraries for creating data visualizations, both for exploratory analysis of large data sets and for communicating insights from data visually. There are three key objectives:
- To gain experience using R or Python libraries to create data visualizations for exploration and communication.
- To understand different approaches to creating data visualizations: static graphics, web-based interactive graphics and interactive data products.
- To explore the use of visualization in different statistical contexts, such as exploration, modeling (e.g. regression), clustering and mapping, to gain insight from data.
The workshop would be scheduled over two days and would be delivered with a hands-on, interactive approach, requiring the participants to do in-class programming in R or Python as part of learning and discussion. Shorter one-day or half-day versions can be customized based on specific requirements. The workshop would aim to cover the following topics.
Session 1 - Intro to visualization
- Understand Exploratory Data Analysis (EDA) and graphics in R or Python
- Using base library for graphics [R: graphics, Python: matplotlib]
- Click to see Sample Session #1 Slides
Session 2 - The Grammar of Graphics
- Visualizing single & dual variables [R: ggplot2, Python: seaborn]
- Visualizing categorical data [R library: vcd]
- Click to see Sample Session #2 Slides
Session 3 - Visualizing multiple variables
- Using aesthetics and facets for more than two variables [R: ggplot2, Python: seaborn]
- Using matrix view, parallel coordinates for more than two variables [R: GGally]
- Click to see Sample Session #3 Slides
Session 4 - Communicating with graphs
- Graphical perception and critique
- Understand color, scales, labeling and annotation [R: RColorBrewer, ggplot2]
- Theming and publication ready graphics [R: ggplot2, Python: matplotlib]
Session 5 - Interactive graphics
- Creating interactive graphs for the web [R: ggvis, rCharts, Python: Bokeh, Altair]
- Allowing interactive data-model manipulation [R: shiny, Python: Bokeh]
- Click to see Sample Session #5 Slides
Session 6 - Escaping flatland
- Visualizing multi-dimensional variables using linking & brushing and dynamic queries [R: iplots / Mondrian, Python: Bokeh, ipyvega]
- 3D visualization, projections & tours [R: rggobi / GGobi, scatterplot3d, rgl]
Session 7 - Visualizing clusters and networks
- Clustering - Hierarchical and Non-Hierarchical [R library - graphics]
- Trees and graphs in R [R: Rgraphviz, igraph]
Session 8 - Visualizing geo-spatial data
- Maps and map projections [R: ggmap, rleaflet, maps, Python: folium]
Session 9 - Composer of Visualization
- Thinking beyond established charts
- Creating your own layered graphics
Session 10 - Creating Interactive Data Products
- Creating interactive data products for user exploration [R: shiny, Python: Flask]
Participant Profile — The workshop is ideal for anyone who uses the open source R or Python stack for statistical analysis and visualization. If you are not using R or Python for statistical analysis, then familiarity with another statistical programming tool such as SPSS, SAS or MATLAB would be needed. Familiarity with the R or Python libraries mentioned above is not a prerequisite.
Tools Used - For the exercises during the workshop, we would be using either R with the RStudio IDE or the Anaconda Distribution for Python. Please install these on your machine prior to the workshop session. A detailed list of R libraries to install would be shared ahead of the workshop session.
Number of Participants — The maximum number of participants for the workshop would be capped at 30. The small class size would enable a more participative environment with group interaction and presentations possible as well as opportunities to have one-to-one learning interactions.
Duration — The workshop would be conducted over 2 days from 0900 to 1700. There will be short breaks during the morning and afternoon session and a longer lunch break of around 45 minutes in the middle.
Venue Logistics — A training venue for the workshop, with availability of a projector, sound system and whiteboard would be needed for conducting the session.
The workshop would be charged at Rs. 150,000 per day (for Indian locations) or USD 5,000 per day (for International locations). Service tax and other government charges as applicable will be additional. Also, for sessions conducted outside of Bangalore, the facilitator’s travel and accommodation cost would be charged on actuals.
Amit Kapoor is interested in learning and teaching the craft of telling visual stories with data. He is the founding partner at narrativeVIZ Consulting, where he teaches data-science, data-visualisation and data-stories as tools for improving communication, persuasion, and leadership, and conducts workshops on these topics for businesses, nonprofits, and academic institutes. He also teaches visualisation as guest faculty, in a design context at NID, Bangalore and in a management context at IIM Bangalore & IIM Ahmedabad.
His background is in strategy consulting, using data-driven stories to drive change across organizations and businesses. He has more than 15 years of management consulting experience, first with AT Kearney in India, then with Booz & Company in Europe and more recently with startups in Bangalore. He did his B.Tech in Mechanical Engineering from IIT, Delhi and PGDM (MBA) from IIM, Ahmedabad. You can find more about him at amitkaps.com and tweet him at @amitkaps.