amitkaps

The Art of Data Science

Ever-increasing computational capacity has enabled us to acquire, process and analyse larger data sets. We increasingly want to apply a data-driven lens to business problems. But business problems are inherently ‘wicked’ in nature - with multiple stakeholders, differing problem definitions, interdependent solutions, constraints, amplifying loops and so on. There is no one trick to solve them. What is required is a structured approach to problem solving that can be applied to a large set of these problems. One possible starting point is a hypothesis-driven approach - problem definition, scoping, issue identification and hypothesis generation. In this workshop, you will learn how to apply this hypothesis-driven approach through seven pragmatic steps - Frame, Acquire, Refine, Transform, Explore, Model, and Insight - to any business problem. The focus will be on learning the principles through an applied case study and by actually coding in R or Python to solve it.

Objective


Approach

Curriculum

1. INTRO

“I think, therefore I am”

2. FRAME

“Framing the problem is often far more essential than its solution”

3. ACQUIRE

“Data is the new oil”

4. REFINE

“Data is messy”

5. TRANSFORM

“What is measured may not help answer what is needed”

6. EXPLORE

“I don’t know what I don’t know”

7. MODEL

“All models are wrong, some of them are useful”

8. INSIGHT

“The goal is to turn data into insight”
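As a rough illustration of how the seven steps fit together, here is a minimal sketch in plain Python. It is not the workshop's case study: the business question, the tiny in-memory data set, and every function name below are invented for illustration, and the workshop itself works with R or Python libraries rather than hand-written helpers.

```python
import csv
import io
import statistics

# FRAME: a hypothetical question - does advertising spend relate to weekly sales?
# The data below is made up purely to show the flow of the steps.
RAW_CSV = """week,ad_spend,sales
1,10,120
2,15,150
3,,130
4,20,185
5,25,210
"""


def acquire(text):
    """ACQUIRE: read rows from a CSV source (an in-memory string here)."""
    return list(csv.DictReader(io.StringIO(text)))


def refine(rows):
    """REFINE: drop rows with missing values and convert strings to numbers."""
    return [
        {key: float(value) for key, value in row.items()}
        for row in rows
        if all(value.strip() for value in row.values())
    ]


def transform(rows):
    """TRANSFORM: derive the measure the question needs (sales per unit of spend)."""
    return [dict(row, sales_per_spend=row["sales"] / row["ad_spend"]) for row in rows]


def explore(rows):
    """EXPLORE: summarise the data before modelling it."""
    return {
        "n": len(rows),
        "mean_spend": statistics.mean(row["ad_spend"] for row in rows),
        "mean_sales": statistics.mean(row["sales"] for row in rows),
    }


def model(rows):
    """MODEL: fit a least-squares line, sales = intercept + slope * ad_spend."""
    xs = [row["ad_spend"] for row in rows]
    ys = [row["sales"] for row in rows]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def insight(slope):
    """INSIGHT: turn the fitted number into a statement a stakeholder can act on."""
    return f"Each extra unit of ad spend is associated with about {slope:.1f} more units of sales."


def run_pipeline(text=RAW_CSV):
    """Chain the steps and return the intermediate results."""
    rows = transform(refine(acquire(text)))
    slope, intercept = model(rows)
    return {
        "rows": rows,
        "summary": explore(rows),
        "slope": slope,
        "intercept": intercept,
        "insight": insight(slope),
    }


if __name__ == "__main__":
    print(run_pipeline()["insight"])
```

Keeping each step as its own small function mirrors the curriculum: any one of them can be revisited (for example, a different way of handling the missing value in Refine) without rewriting the rest.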

Target Audience

The workshop is ideal for anyone who wants to learn how to use the open source R or Python stack for statistical analysis and visualization. If you are not already using R or Python for statistical analysis, familiarity with data analysis in some other tool will help. No prior familiarity with the R or Python libraries used in the workshop is required.

Software Requirements

For the exercises during the workshop, we will use either R with the RStudio IDE or the Anaconda distribution for Python. Please install one of these on your machine prior to the workshop session. For the more curious attendees: we will be working in R Markdown or Jupyter Notebook. Some of the main libraries we will be using in the session are:

The working repo for this workshop is at https://github.com/amitkaps/art-data-science

Facilitators’ Profile

Amit Kapoor teaches the craft of telling visual stories with data. He conducts workshops and trainings on Data Science in Python and R, as well as on Data Visualisation topics. His background is in strategy consulting, having worked with AT Kearney in India, then with Booz & Company in Europe and more recently for startups in Bangalore. He did his B.Tech in Mechanical Engineering from IIT, Delhi and PGDM (MBA) from IIM, Ahmedabad. You can find more about him at http://amitkaps.com/ and tweet to him at @amitkaps.

Bargava Subramanian is a practicing Data Scientist. He has 14 years of experience delivering business analytics solutions to Investment Banks, Entertainment Studios and High-Tech companies. He has given talks and conducted workshops on Data Science, Machine Learning, Deep Learning and Optimization in Python and R. He has a Master's in Statistics from the University of Maryland, College Park, USA. He is an ardent NBA fan. You can find more about him at http://bargava.com/ and tweet to him at @bargava.