Full Stack Data Science
“Jack of all trades, master of none, though oft times better than master of one.”
One of the common pain points we have come across in big organizations is the last-mile delivery of data science applications. One common delivery vehicle is a dashboard (BI). Another, very useful but more often than not neglected, is to create APIs that integrate seamlessly with other applications within the company. This requires a basic understanding of machine learning, server-side programming and front-end application development.
In this workshop, you will learn how to build a seamless end-to-end data-driven application (data exploration, machine learning model, RESTful API and web application) to solve a business prediction problem.
- Introduction to Data Science Process
- Building a simple Machine Learning model
- Building a simple ML Service (localhost)
- Improving the ML model and creating many models
- Creating RESTful API and deploying to cloud
- Creating a dashboard to visualise the results and interact with the API
- Persisting model output
- Updating the model as more data comes in (batch only - no streaming)
- Creating a simple application that accomplishes this end-to-end
This will be covered in eight two-hour sessions over two days.
Session 1: Introduction and Concepts
- Approach for building ML products
- Problem definition and dataset
- Build your first ML Model (Part 1)
Session 2: Build a Simple ML Service
- Build your first ML Model (Part 2)
- Concept of ML Service
- Deploy your first ML Service - localhost API
Session 3: Build & Evaluate ML Models
- Feature Engineering
- Build your second ML model
- ML model evaluation
- Accuracy metrics
- Cross Validation
Session 4: Practice Session
- Practice problem overview and data
- Build your ML Model
- Build your API
Session 5: Build a Simple Dashboard
- Concept of dashboard design
- Create your first dashboard
- Integrate ML model API with dashboard
Session 6: Deploy to the Cloud
- Get started with cloud server setup
- Deploy your ML service as cloud API
- Deploy your dashboard as cloud service
Session 7: Repeatable ML as a Service
- Build data pipelines
- Update model, API and dashboard
- Schedule the ML-as-a-Service process
Session 8: Practice Session & Wrap-up
- Deploy the dashboard and API on the cloud
- Best practices and challenges in building ML services
- Where to go from here
- A programmer but not a data science practitioner: A programmer with experience in server-side or front-end development and possibly some familiarity with data analysis. You could be looking to transition into building data-driven products or to create a richer product experience with data.
- A data science practitioner but not a programmer: A data scientist with some experience in data analysis, preferably in a scripting language (R/Python/Scala), who wants a deeper and more applied perspective on creating data-driven products.
- Programming knowledge is mandatory. Attendees should be able to write conditional statements, use loops, write functions comfortably, understand code snippets and work out programming logic.
- Participants should have a basic familiarity with Python. Specifically, we expect participants to know the first four sections of this book: http://anandology.com/python-practice-book/
- Participants should also have some experience using Python for data science. Specifically, participants should be able to work with the following Python libraries:
  - jupyter: For doing literate programming in notebooks
  - numpy: For scientific computation
  - pandas: For data wrangling and transformation of tabular data (dataframes)
  - scikit-learn: For building machine learning models
We will be using the Python data stack for the workshop. Please install Anaconda for Python 3.5 before the workshop. Any additional requirements will be communicated to participants.
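Before the first session, one quick way to confirm the stack is in place is a short script that tries to import each library listed above and reports its version. This is a minimal sketch, not part of the official workshop material; it assumes Jupyter can be detected through its `jupyter_core` package and that scikit-learn is imported under its module name, `sklearn`. It uses `str.format` rather than f-strings so it also runs on Python 3.5.

```python
import importlib
import sys

# Libraries listed in the prerequisites, mapped to the module name to import.
# Assumption: Jupyter is detected via jupyter_core; scikit-learn imports as sklearn.
REQUIRED = {
    "jupyter": "jupyter_core",
    "numpy": "numpy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
}


def check_environment(required=REQUIRED):
    """Return {label: version string, "unknown", or None if the import fails}."""
    found = {}
    for label, module_name in required.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Library is not installed in the active environment.
            found[label] = None
        else:
            # Most libraries expose __version__; fall back if one does not.
            found[label] = getattr(module, "__version__", "unknown")
    return found


if __name__ == "__main__":
    print("Python {}.{}".format(*sys.version_info[:2]))
    # Sort for stable output (dicts are unordered on Python 3.5).
    for label, version in sorted(check_environment().items()):
        status = version if version is not None else "MISSING"
        print("{:<14} {}".format(label, status))
```

Run it from the Anaconda environment you plan to use in the workshop; any library reported as MISSING can be installed with `conda install <package>`.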
Anand Chitipothu is a software consultant and trainer based in Visakhapatnam. He has over 13 years of experience in architecting and developing a variety of software applications. He is a co-author of web.py, a micro web framework in Python. He has worked at Strand Life Sciences and Internet Archive. You can tweet to him at @anandology.
Amit Kapoor teaches the craft of telling visual stories with data. He conducts workshops and trainings on Data Science in Python and R, as well as on Data Visualisation topics. His background is in strategy consulting, having worked with AT Kearney in India, then with Booz & Company in Europe, and more recently with startups in Bangalore. He holds a B.Tech in Mechanical Engineering from IIT Delhi and a PGDM (MBA) from IIM Ahmedabad. You can find more about him at http://amitkaps.com/ and tweet to him at @amitkaps.
Bargava Subramanian is a practicing Data Scientist. He has 14 years of experience delivering business analytics solutions to investment banks, entertainment studios and high-tech companies. He has given talks and conducted workshops on Data Science, Machine Learning, Deep Learning and Optimization in Python and R. He has a Master's in Statistics from the University of Maryland, College Park, USA. He is an ardent NBA fan. You can tweet to him at @bargava.