Data Science Bootcamp
“Jack of all trades, master of none, though oft times better than master of one.”
In this intensive data science bootcamp, you will learn how to solve business problems using data science (The Art of Data Science), the principles and applications of data visualisation (Data Visualisation for Data Science), the math behind machine learning the hacker’s way (HackerMath for ML), how to use machine learning in an applied context (Applied ML) and, finally, how to create a data-driven product (Full Stack Data Science).
Day 1: The Art of Data Science
- Frame a problem
- Acquire the data
- Refine the data
- Transform the data
- Explore the data
- Model the data
- Communicate the insight
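To make the seven steps concrete, here is a minimal sketch in Python using pandas and NumPy. The framing question, the tiny dataset and its column names (`ad_spend`, `sales`) are hypothetical, the helper `run_pipeline` is illustrative, and the "model" is a single straight-line fit with `numpy.polyfit`. Treat it as an illustration of the flow, not as the course material.

```python
import io

import numpy as np
import pandas as pd

# Frame: "How are sales related to advertising spend?" (hypothetical question)
# Acquire: a tiny made-up dataset; in practice this would be a file, database or API.
RAW_CSV = """month,region,ad_spend,sales
1,north,10,105
2,north,12,
3,north,15,150
1,south,8,82
2,south,9,95
3,south,11,118
"""


def run_pipeline(raw_csv):
    df = pd.read_csv(io.StringIO(raw_csv))                      # Acquire: load the raw data
    df = df.dropna(subset=["sales"])                            # Refine: drop rows missing the target
    df = df.assign(sales_per_ad=df["sales"] / df["ad_spend"])   # Transform: add a derived column
    by_region = df.groupby("region")["sales"].mean()            # Explore: summarise by group
    slope, intercept = np.polyfit(df["ad_spend"], df["sales"], deg=1)  # Model: straight-line fit
    return df, by_region, slope, intercept


if __name__ == "__main__":
    df, by_region, slope, intercept = run_pipeline(RAW_CSV)
    print(by_region)
    # Communicate: state the finding in plain language
    print("Each extra unit of ad spend is associated with about {:.1f} more sales.".format(slope))
```

In a real project each step is usually much larger and iterative; exploring the data often sends you back to refine or transform it again.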
Day 2: Data Visualisation for Data Science
- Intro to visualization
- Understand Exploratory Data Analysis (EDA)
- Communicating insights using visualisation
- The Grammar of Graphics
- Visualizing single & dual variables
- Visualizing categorical data
- Visualizing Multi-Dimensional Data
- Using aesthetics and facets for more than two variables
- Using matrix view, parallel coordinates for more than two variables
- Communicating with graphs
- Graphical perception and critique
- Understand color, scales, labeling and annotation
- Theming and publication ready graphics
- Interactive graphics
- Creating interactive graphs for the web
- Allowing interactive data-model manipulation
- Visualizing geo-spatial data
- Creating Interactive Data Dashboards
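One common way to show more than two variables is faceting (small multiples): one panel per category, with shared axes so the panels can be compared directly. The sketch below does this with matplotlib; the helper name `facet_scatter` and the randomly generated data are hypothetical. Grammar-of-graphics libraries (for example ggplot2 in R or plotnine in Python) express the same idea declaratively as a facet specification.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def facet_scatter(x, y, groups):
    """Draw one scatter panel per category of `groups`, sharing the x and y axes."""
    x, y, groups = np.asarray(x), np.asarray(y), np.asarray(groups)
    categories = sorted(set(groups.tolist()))
    fig, axes = plt.subplots(1, len(categories), sharex=True, sharey=True,
                             figsize=(4 * len(categories), 3), squeeze=False)
    for ax, cat in zip(axes[0], categories):
        mask = groups == cat
        ax.scatter(x[mask], y[mask], s=10)
        ax.set_title("group = {}".format(cat))
        ax.set_xlabel("x")
    axes[0][0].set_ylabel("y")  # the y axis is shared, so label only the first panel
    fig.tight_layout()
    return fig


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    groups = rng.choice(["a", "b", "c"], size=150)
    x = rng.normal(size=150)
    # hypothetical relationship whose slope differs by group
    slopes = np.array([{"a": 1.0, "b": 2.0, "c": -1.0}[g] for g in groups])
    y = slopes * x + rng.normal(scale=0.5, size=150)
    facet_scatter(x, y, groups).savefig("facets.png", dpi=100)
```

Sharing the axes is the key design choice: without it, each panel autoscales and differences in slope between groups become hard to see.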
Day 3: HackerMath for Machine Learning
- Linear Algebra
- Matrix: Basics, Inverse
- Solve for
- Solve for
- Application: Linear Regression
- Calculus and Numerical Optimisation
- Cost Function
- Gradient Descent
- Application: Classification
- Direct Simulation
- Application: A/B Testing
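The "hacker's way" means checking the math numerically. As a sketch assuming NumPy, the block below fits the same linear regression two ways: by solving the normal equations X^T X w = X^T y (the linear-algebra route; `np.linalg.solve` is used rather than forming the inverse explicitly, for numerical stability) and by gradient descent on the mean-squared-error cost. It then estimates an A/B-test p-value by direct simulation under a pooled-rate null hypothesis. The function names, learning rate, iteration count and simulated data are illustrative choices, not the workshop's code.

```python
import numpy as np


def fit_normal_equation(X, y):
    """Least squares via the normal equations (X^T X) w = X^T y.

    Mathematically equal to inv(X^T X) @ X^T @ y, but solving the linear
    system is more numerically stable than computing the inverse.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)


def fit_gradient_descent(X, y, lr=0.1, n_iter=2000):
    """Minimise the cost J(w) = mean((X w - y)^2) with batch gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = (2.0 / n) * X.T @ (X @ w - y)  # gradient dJ/dw
        w = w - lr * grad
    return w


def ab_test_pvalue(conv_a, n_a, conv_b, n_b, n_sims=10000, seed=0):
    """One-sided p-value for 'B converts better than A' by direct simulation.

    Under the null hypothesis both variants share the pooled conversion rate.
    We simulate many experiments at that rate and count how often the
    simulated lift is at least as large as the observed lift.
    """
    rng = np.random.default_rng(seed)
    observed_lift = conv_b / n_b - conv_a / n_a
    pooled_rate = (conv_a + conv_b) / (n_a + n_b)
    sim_a = rng.binomial(n_a, pooled_rate, size=n_sims) / n_a
    sim_b = rng.binomial(n_b, pooled_rate, size=n_sims) / n_b
    return float(np.mean(sim_b - sim_a >= observed_lift))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    x = rng.uniform(0, 1, size=100)
    X = np.column_stack([np.ones_like(x), x])  # intercept column + one feature
    y = 3.0 + 2.0 * x + rng.normal(scale=0.1, size=100)
    print(fit_normal_equation(X, y), fit_gradient_descent(X, y))
    print(ab_test_pvalue(conv_a=100, n_a=1000, conv_b=130, n_b=1000))
```

Comparing the two fitted weight vectors is a quick check that the gradient-descent loop and the closed-form solution agree.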
Day 4: Applied Machine Learning
- Model Building
- Decision Trees
- Bagging and Boosting
- Random Forest
- Gradient Boosting Machines
- Feature Importance
- Model Evaluation
- Training and Validation
- Model Evaluation Metrics - Accuracy, ROC, AUC, Confusion Matrix etc.
- Overfitting and Bias-Variance trade-off
- Regularization (L1/L2)
- K-fold Cross Validation
- Model Selection
- Model Pipelines
- Feature Engineering
- Ensemble Models (Advanced)
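K-fold cross-validation is easy to get subtly wrong, so it helps to see it written out. This is a from-scratch sketch assuming only NumPy: shuffle the indices, split them into k folds, train on k-1 folds and score accuracy on the held-out fold. The `NearestCentroid` class is a deliberately tiny, hypothetical stand-in model chosen to keep the example short. In practice you would use scikit-learn, whose `KFold` and `cross_val_score` (in `sklearn.model_selection`) do this for you and work with tree ensembles such as random forests and gradient boosting machines.

```python
import numpy as np


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)


class NearestCentroid:
    """Hypothetical toy classifier: predict the class with the closest mean vector."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Euclidean distance from every sample to every centroid: (n_samples, n_classes)
        dists = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[dists.argmin(axis=1)]


def cross_val_accuracy(make_model, X, y, k=5, seed=0):
    """Return k accuracy scores, each from a model trained without that fold."""
    folds = k_fold_indices(len(y), k, seed)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = make_model().fit(X[train_idx], y[train_idx])
        scores.append(float(np.mean(model.predict(X[val_idx]) == y[val_idx])))
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # two synthetic, partly overlapping classes in 2-D
    X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(3.0, 1.0, (100, 2))])
    y = np.array([0] * 100 + [1] * 100)
    scores = cross_val_accuracy(NearestCentroid, X, y, k=5)
    print("fold accuracies:", np.round(scores, 3), "mean:", round(float(np.mean(scores)), 3))
```

Reporting the per-fold scores, not just their mean, shows how sensitive the estimate is to the particular split.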
Day 5: Full Stack Data Science
- Overview of the case
- Building a simple ML model (linear/logistic regression)
- Creating a RESTful API
- Integrating model output into a database
- Updating the model as more data comes in (batch only, no streaming)
- Building a simple web front-end to visualise the results and interact with the API
- Creating a simple application that accomplishes this end-to-end
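At the core of such an application is a model behind an HTTP endpoint whose outputs are stored in a database. A minimal sketch using only the Python standard library (`wsgiref` and `sqlite3`) follows. The coefficients stand in for a model trained earlier and, like the `/predict` route, the JSON shape and the `predictions` table, they are hypothetical. A web framework such as Flask would typically replace the hand-written WSGI routing.

```python
import json
import sqlite3
from wsgiref.simple_server import make_server

# Hypothetical coefficients of a previously trained linear model: y_hat = b0 + b1 * x
COEF = {"intercept": 3.0, "slope": 2.0}

# In-memory SQLite database for the sketch; pass a file path instead to persist results.
DB = sqlite3.connect(":memory:")
DB.execute("CREATE TABLE IF NOT EXISTS predictions (x REAL, y_hat REAL)")


def predict(x):
    return COEF["intercept"] + COEF["slope"] * x


def json_response(start_response, status, payload):
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def app(environ, start_response):
    """WSGI app: POST /predict with a JSON body like {"x": 1.5}."""
    if environ.get("PATH_INFO") != "/predict" or environ.get("REQUEST_METHOD") != "POST":
        return json_response(start_response, "404 Not Found", {"error": "not found"})
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length).decode("utf-8"))
        x = float(payload["x"])
    except (ValueError, KeyError, TypeError):
        return json_response(start_response, "400 Bad Request",
                             {"error": "expected a JSON body with a numeric 'x'"})
    y_hat = predict(x)
    with DB:  # the connection's context manager commits the insert
        DB.execute("INSERT INTO predictions (x, y_hat) VALUES (?, ?)", (x, y_hat))
    return json_response(start_response, "200 OK", {"x": x, "y_hat": y_hat})


def serve(port=8000):
    """Serve the API locally (blocks until interrupted)."""
    server = make_server("127.0.0.1", port, app)
    server.serve_forever()
```

To try it, call `serve()` and send `curl -X POST -d '{"x": 1.5}' http://127.0.0.1:8000/predict`; with these coefficients the response contains `y_hat` equal to 6.0, and a row is added to the `predictions` table. Batch retraining would then read that table, refit, and update `COEF`.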
Who should attend
- A programmer but not a data science practitioner: a programmer with experience in server-side or front-end development who may have some familiarity with data analysis. You might be looking to transition into building data-driven products or to create a richer product experience with data.
- A data science practitioner but not a programmer: a data science newbie with some experience in data analysis, preferably in a scripting language (R/Python/Scala), who wants a deeper and richer experience in data science.
Prerequisites
- Programming knowledge is mandatory. Attendees should be able to write conditional statements, use loops, write functions comfortably, understand code snippets and come up with programming logic.
- Participants should have a basic familiarity with Python. Specifically, we expect participants to know the first three sections of the Python Practice Book: http://anandology.com/python-practice-book/
We will be using the Python data stack for the workshop. Please install Anaconda for Python 3.5 before the workshop. Any additional requirements will be communicated to participants.
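After installing Anaconda, a quick sanity check is to import the main packages and print their versions. This sketch assumes the stack includes NumPy, pandas, matplotlib and scikit-learn (imported as `sklearn`); that package list and the helper `check_environment` are assumptions, since the text only says "Python data stack".

```python
import importlib
import sys

# Assumed core packages of the Python data stack (hypothetical list).
REQUIRED = ["numpy", "pandas", "matplotlib", "sklearn"]


def check_environment(packages=REQUIRED):
    """Return {package: version string, or None if it cannot be imported}."""
    report = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
            report[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            report[name] = None
    return report


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, version in sorted(check_environment().items()):
        print("{:<12} {}".format(name, version or "MISSING"))
```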
Instructors
Amit Kapoor teaches the craft of telling visual stories with data. He conducts workshops and trainings on Data Science in Python and R, as well as on Data Visualisation topics. His background is in strategy consulting, having worked with AT Kearney in India, then with Booz & Company in Europe and, more recently, with startups in Bangalore. He did his B.Tech in Mechanical Engineering at IIT Delhi and his PGDM (MBA) at IIM Ahmedabad. You can find more about him at http://amitkaps.com/ and tweet him at @amitkaps.
Bargava Subramanian is a practicing Data Scientist. He has 14 years of experience delivering business analytics solutions to Investment Banks, Entertainment Studios and High-Tech companies. He has given talks and conducted workshops on Data Science, Machine Learning, Deep Learning and Optimization in Python and R. He has a Master's in Statistics from the University of Maryland, College Park, USA. He is an ardent NBA fan. You can tweet him at @bargava.