Online Data Science

This course runs for 5 weeks at 8 hours per week. These 8 hours are split between face-to-face sessions at the Steinbies Knowledge centre and online training.

The instructor has a career of over 25 years in the IT industry. He specializes in the tools and technology space today, having created his own products in R, Python, RapidMiner and Hadoop. His Data Science specialization is from Johns Hopkins University.
Mukund Veeraraghavan

Who is this course for?

  • Final-year Engineering and Science college students who are preparing themselves for Data Science jobs


  • Career professionals who are looking to switch to Data Science careers


  • Industry practitioners who want to solve their Data Problems effectively.




  • Industry Focused Problem Solving Techniques using Data Science and Machine Learning.
  • Hybrid learning approach – both on premises at the Steinbies Knowledge centre and online.
  • Courses taught by leading industry experts with over 4 decades of collective experience working on Data Science products and projects for global organizations such as IBM, Alcatel-Lucent, HP, Wipro and Tecnotree.
  • Extensive knowledge sharing platform offered by Steinbies International and Jnanadvisory.
  • Flexible Modular Coursework.
  • Rich and comprehensive course content, which enables students to learn and prepare themselves for Data Science careers.
  • Hands-on exercises and quizzes to check knowledge levels.
  • Real life problem-based Capstone projects.
  • Interactive sessions and Q&A to clear doubts.


Introduction and Basics about Data

  • Introduction and Discussion about Industry Problems associated with Data Science
  • Basics of Python – Installation of the Anaconda distribution, useful libraries – scikit-learn, Keras, etc.
  • How to define a Data science Problem
  • Basics of Statistics as needed for this course
  • Data Acquisition techniques – Reading Data from different sources
  • A small primer about Big Data
  • Installing Oracle VM VirtualBox and reading data into HDFS (Hadoop Distributed File System)
  • Visualization of Data – Visual interpretation of Data using plots and charts
  • Statistical inference of Data. Understanding Mean, Median, Standard deviation, percentile and other parameters, with a Hands-on Exercise
  • Sampling techniques as applied to quality control
  • Quiz and Hands-on Exercises. (Grades – 80 % pass mark)
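
As an illustration of the statistics topic in this module (mean, median, standard deviation, percentile), here is a minimal sketch using only Python's standard `statistics` module. The function name `summarize` and the sample readings are hypothetical and are not taken from the course material.

```python
import statistics

def summarize(values):
    """Return mean, median, sample standard deviation and 90th percentile."""
    # quantiles(n=10) returns 9 cut points; the last one is the 90th percentile.
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample (n - 1) standard deviation
        "p90": deciles[-1],
    }

# Hypothetical measurements, for illustration only.
readings = [2, 4, 4, 4, 5, 5, 7, 9]
summary = summarize(readings)
print(summary)
```

The same summary can be produced with pandas or NumPy once those libraries are installed through Anaconda; the standard library version is used here so that the sketch runs without extra packages.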

Preparing Data for Machine learning and Basic Modelling

  • Preparing Data for Modelling – Indexing, slicing, pivoting and aggregation techniques, and using Outlier detection to reduce the dimension of the Data.
  • Basics of Data Modelling in Big Data – Relational Data Model, Semi structured Data Model.
  • Feature Engineering in Data Science – Extracting useful Information from Data.
  • Techniques to partition data into training, development and testing samples.
  • Using Statistics to understand how features in the Data impact the final prediction and how the features correlate with each other.
  • Cause-and-effect diagram (also known as the “fishbone” or Ishikawa diagram)
  • Check sheet.
  • Control chart.
  • Pareto chart.
  • Scatter diagram.
  • Stratification (alternately, flow chart or run chart)
  • Quiz and Hands-on Exercise.
  • Introduction to WECO Rules for quality control and their implementation with Python
  • Machine learning Models – Their Classification and taxonomy. Where and when to use which Model.
  • Classification Models – Naive Bayes, Linear Discriminant Analysis, Decision Trees, K Nearest Neighbour.
  • Regression – Linear Regression, Logistic Regression, Least Angle Regression (LARS)
  • Clustering – K-Means Clustering, Hierarchical Clustering, K-Medians Clustering
  • Basics of machine learning model performance.
  • Quiz and Hands-on Exercise. (Grades – 80 % pass mark)
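
As an illustration of the WECO (Western Electric) rules topic in this module, here is a minimal sketch of two of the rules in plain Python: a single point beyond 3 sigma, and a run of eight consecutive points on one side of the center line. The function names, the sample data, and the center line and sigma values are assumptions made for illustration; in practice they would be estimated from an in-control baseline.

```python
def weco_rule1(points, center, sigma):
    """Indices of points lying more than 3 sigma from the center line."""
    return [i for i, x in enumerate(points) if abs(x - center) > 3 * sigma]

def weco_rule4(points, center, run=8):
    """Indices at which `run` or more consecutive points sit on one side
    of the center line."""
    flagged = []
    streak, last_side = 0, 0
    for i, x in enumerate(points):
        side = (x > center) - (x < center)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == last_side:
            streak += 1
        else:
            streak = 1 if side != 0 else 0
        last_side = side
        if streak >= run:
            flagged.append(i)
    return flagged

# Hypothetical process data; center line 10.0 and sigma 1.0 are assumed.
data = [10.2, 9.8, 13.5, 10.1, 10.3, 10.4, 10.2, 10.6, 10.1, 10.5, 10.3]
rule1_hits = weco_rule1(data, center=10.0, sigma=1.0)
rule4_hits = weco_rule4(data, center=10.0)
print(rule1_hits, rule4_hits)
```

The remaining rules (two of three points beyond 2 sigma, four of five beyond 1 sigma) follow the same pattern with a sliding window.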

Improving Performance of Machine learning Models

  • Some tools and techniques for data preparation that help in improving accuracy – K-fold cross validation, Principal Component Analysis.
  • Running Statistical testing on Training data to validate data integrity. Hypothesis testing, Student's t-test, p-value, Confidence Interval, Statistical Power.
  • Concepts of Bias, Variance in Data. Bias-Variance balancing. Performance Indicators of model such as Sensitivity, Specificity, Accuracy, Precision and Recall.
  • Using specific performance indicators such as ROC curves and RMSE scores.
  • Baselining Models using Random prediction and the Zero Rule Algorithm.
  • Concept of Gradient Boosting.
  • Comparing various Machine learning models.
  • Some Ensemble techniques (higher-order algorithms) to improve performance.
  • Quiz and Hands-on Exercise. (Grades – 80 % pass mark)
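
As an illustration of the Zero Rule baseline mentioned in this module, here is a minimal sketch in plain Python. Zero Rule always predicts the most frequent class seen in the training labels, which gives a floor that any real model should beat. The function names and the labels below are hypothetical.

```python
from collections import Counter

def zero_rule_classifier(train_labels):
    """Return a predictor that always outputs the most frequent training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda n: [most_common] * n

def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

# Hypothetical labels, for illustration only.
train_labels = ["ok", "ok", "fault", "ok", "ok", "fault"]
test_labels = ["ok", "fault", "ok", "ok"]

predict = zero_rule_classifier(train_labels)
baseline_accuracy = accuracy(test_labels, predict(len(test_labels)))
print(baseline_accuracy)
```

A model that does not clearly exceed this baseline accuracy has not learned anything useful from the features.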

Neural Network and its application

  • Basics of Neural Networks. How Neural Networks are fast becoming one of the most used models, with some applications.
  • Introduction to the Theano and Keras libraries.
  • Basic building Blocks of Neural Networks
  • Solving simple logistic regression classification problems using a Neural Network.
  • Some advanced concepts like Residual Networks (ResNet) in Neural Networks.
  • A short Note on how to productize the model using APIs
  • Industry 4.0 and use of Data Science/ANN in fault detection and predictive maintenance
  • Quiz and Hands-on Exercise. (Grades – 80 % pass mark)
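
As an illustration of solving a logistic regression problem with a neural network, here is a minimal sketch of a single sigmoid neuron trained by batch gradient descent in plain Python. This is not the course's Keras or Theano code; the function names, learning rate, epoch count and toy data are all assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(xs, ys, lr=0.5, epochs=5000):
    """Fit one weight and one bias by gradient descent on cross-entropy loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            # Gradient of the cross-entropy loss w.r.t. the pre-activation.
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Hypothetical one-feature data set: small values are class 0, large are class 1.
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = train_neuron(xs, ys)
predictions = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]
print(predictions)
```

A single neuron with a sigmoid activation is mathematically the same model as logistic regression; stacking more such units in layers gives the basic building blocks of a neural network.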

One Week Capstone Project (Hands on)

  • Industry Problem Identification and definition
  • Acquiring Data and preparing Data, with Exploratory Data Analysis.
  • Spreadsheet Parsing Techniques using Pandas with mini assignment on maintenance scheduling
  • Interim Report – 1 on results of Data Acquisition, preparation and Data Visualization
  • Dividing Data into Training, Development and Test Data
  • Selecting the right Model for solving the problem.
  • Applying Model to predict on Training Data
  • Validating results using Statistical testing, P value, Confidence interval.
  • Interim Report – 2 on the above three steps
  • Predicting the performance of Model.
  • Comparing models; here students can choose between one lower-order Model and one Neural Network model.
  • Finally, using Ensembling techniques to show improvement of performance, with the final report (Report – 3)
  • Grade (80 % pass mark)
  • Congratulatory Note on completion.
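
As an illustration of the capstone step that divides data into training, development and test sets, here is a minimal sketch in plain Python. The 60/20/20 proportions, the fixed seed and the function name `split_dataset` are assumptions, not requirements of the course.

```python
import random

def split_dataset(rows, train_frac=0.6, dev_frac=0.2, seed=42):
    """Shuffle rows reproducibly and split them into train, dev and test lists."""
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_dev = int(len(shuffled) * dev_frac)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # whatever remains goes to test
    return train, dev, test

# Hypothetical data set of 20 row identifiers.
rows = list(range(20))
train, dev, test = split_dataset(rows)
print(len(train), len(dev), len(test))
```

Shuffling before splitting avoids ordering effects in the source data, and the fixed seed makes the split repeatable across the interim reports.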