Jigsaw Academy - Data Scientist Outline (1)


  • 8/11/2019 Jigsaw Academy - Data Scientist Outline (1)

    1/6

    Jigsaw Academy aims to meet the growing demand for talent in the field of analytics by providing industry-relevant training and education to develop business-ready professionals.

    Data Scientist Course: Analytics Techniques

    Jigsaw Academy Education Pvt. Ltd.

    Overview of Analytics
      • What is analytics?
      • Types of problems in analytics
      • Case studies of application of analytics in business
      • When analytics does not work
      • Analytics vs. data warehousing, OLAP, Statistics
      • Widely used analytic software
      • Companies using analytics
      • Day in the life of a business analyst
      • Career path in analytics
      • Qualities of a business analyst

    BigData Analytics
      • What is BigData?
      • Applications
      • Sample cases
      • Structured vs. Unstructured Data
      • Hadoop Ecosystem
      • MapReduce Concepts
      • BigData Tools
      • Companies using BigData
      • Career path in Data Science
      • Focus on Applications
      • Case study: Twitter Analysis

    Analytic Methodology
      • Problem definition
      • Data selection
      • Data exploration
      • Data partition
      • Data cleansing
      • Data transformation
      • Modeling
      • Validation
      • Deployment
      • Assessment
      • Re-start

    Module: Introduction to Analytics & BigData Overview

    Models and Algorithms
      • Modeling Terminology
      • Linear Regression
      • Logistic Regression
      • Decision Trees
      • MARS
      • Rule Induction
      • K-nearest Neighbors
      • Neural Network
      • Genetic Algorithm

    Problem Definition
      • Basics of problem definition
      • Case study - Car Insurance
      • Case study - Credit Cards
      • Case study - Telecom

    Analytic Tools
      • Overview of Analytic Tools
      • GUI based and Programming based
      • Excel
      • SAS
      • R
      • Others

    Data Preparation
      • Why data prep
      • Outlier treatment
      • Missing values treatment
      • Telecom case study
      • Categorical variables
      • Dummy variables
      • Derived variables
      • Lag variables
      • Interaction variables
      • Variable transformation
      • Quadratic variables
      • Date, time variables
      • Sampling and partitioning
      • Case study - Auto manufacturer

    Regression
      • Basics of Regression
      • Linear Regression
      • Logistic Regression
      • Interpretation of modeling results
      • Violation of regression assumptions
      • Insurance Case study

    Decision Trees
      • What are decision trees?
      • Examples of trees
      • Terminology in decision trees
      • Data preparation for trees
      • How to create a tree?
      • Measure of effectiveness
          • Gini
          • Chi-square
          • Information gain
          • Reduction in variance
          • Others
      • Application of algorithms
      • Case study - Fraud detection
      • Case study - Car Insurance pricing
      • Use of decision trees
      • Pros and cons
      • What makes a good tree?
      • When to use Decision trees?
      • Widely used software for Decision trees

    Clustering
      • What is clustering
      • Types of clustering
      • K-means clustering
      • Measures of homogeneity
      • Data prep
      • Hierarchical clustering
      • Cluster evaluation
      • Cluster profiling
      • When to use
      • Important considerations
      • Clustering in SAS - case study on store clustering

    Pitfalls to avoid while Modeling
      • Misleading patterns
      • Biased population
      • Data at wrong level
      • Already known insights
      • Un-actionable insights

    Data Scientist Course: Tools Training

    Introduction to R
      • What is R?
      • Origins of R
      • Current Status
      • R Ecosystem
      • Commercial products
      • How R can help in Business Analytics, Data Mining, Data Visualization
      • R Installation
          • Windows
          • Linux
          • Using VMware for virtual partition
      • CRAN and Packages
      • R Interfaces
          • Command line
          • Graphical User Interfaces
          • IDE
          • Web Interfaces
      • Advantages and Disadvantages of using R

    Data Input
      • Data Import
      • Spreadsheet-like Data
      • Statistical File Formats
      • Databases
      • Internet Data
      • Data from Packages
      • Using GUI for Data Import

    Data Manipulation
      • Transposing Dataset
      • Conditional selection of rows, columns, variables
      • Using Reshape
      • Merging Datasets

    Data Exploration
      • Types of Data in R
      • Summarizing data in R
      • Using GUI for Data Exploration
      • Graphs in R

    Module: R

    Data Visualization
      • Introduction to ggplot2
      • Plotting using R data frames
      • Graphics on large data
      • Visualizing Statistical Outputs

    Regression modeling using R
      • Logistic Regression
      • Linear Regression
      • Creating a model and scoring a model
      • Understanding Model Output (Coefficients, Fit, Residuals, R-squared, P-value)

    Decision Trees & Clustering in R
      • Hierarchical Clustering
      • K-Means Clustering
      • Decision trees using rpart package

    How to export data?
      • Exporting Graphs
      • Saving code and output
      • Exporting using GUI

    Working with Big Data

    Big Data Overview
      • What is Big Data?
      • Generators
      • Drivers
      • Characteristics
      • Structured vs. Unstructured Data
      • Challenges

    Big Data Analytics
      • Comparison with traditional Analytics
      • Tools and Technologies
      • Applications
      • Sample Case Studies
      • Project Life Cycle

    Hadoop
      • RDBMS Limitations
      • NoSQL Databases
      • History of Hadoop
      • Why Hadoop
      • Hadoop Ecosystem
      • Big Data Applications
      • Installation Modes
      • Working with Hadoop

    MapReduce
      • What is MapReduce?
      • Map Process
      • Reduce Process
      • Anatomy of MapReduce program

    Hadoop Components
      • HDFS Overview
      • HDFS Architecture
      • HBASE Overview
      • HIVE
      • FLUME

    Module: Big Data Module

    RHadoop for Big Data
      • R & Hadoop Overview
      • Why RHadoop?
      • Advantages
      • Installation
      • Configuration
      • Sample Use Cases

    Final Case Study
      • Case Study: Twitter Analytics using HIVE and FLUME

    Data Scientist Course: Statistics

    Statistics
      • Introduction to statistics
      • Summary statistics
          • Mean
          • Median
          • Mode
          • Variance

    Random Variables
      • Probability
      • Probability distribution
          • Binomial
          • Poisson
          • Normal

    Module: Statistics

    Hypothesis testing
      • Intuition
      • Standard approaches
      • T-test
          • One sample
          • Two sample
      • Chi-square test
          • Variance
          • Association
      • Multiple Sample Tests
          • ANOVA
          • Chi-square
      • Non-parametric testing