8/11/2019 Jigsaw Academy - Data Scientist Outline (1)
Jigsaw Academy aims to meet the growing demand for
talent in the field of analytics by providing industry-relevant
training and education to develop business-ready
professionals.
Data Scientist Course: Analytics Techniques
Jigsaw Academy Education Pvt. Ltd.
Overview of Analytics
  What is analytics?
  Types of problems in analytics
  Case studies of application of analytics in business
  When analytics does not work
  Analytics vs. data warehousing, OLAP, Statistics
  Widely used analytic software
  Companies using analytics
  Day in the life of a business analyst
  Career path in analytics
  Qualities of a business analyst
BigData Analytics
  What is BigData?
  Applications
  Sample cases
  Structured vs. Unstructured Data
  Hadoop Ecosystem
  MapReduce Concepts
  BigData Tools
  Companies using BigData
  Career path in Data Science
  Focus on Applications
  Case study: Twitter Analysis
Analytic Methodology
  Problem definition
  Data selection
  Data exploration
  Data partition
  Data cleansing
  Data transformation
  Modeling
  Validation
  Deployment
  Assessment
  Re-start
Module: Introduction to Analytics & BigData Overview
Models and Algorithms
  Modeling Terminology
  Linear Regression
  Logistic Regression
  Decision Trees
  MARS
  Rule Induction
  K-nearest neighbors
  Neural Network
  Genetic Algorithm
Problem Definition
  Basics of problem definition
  Case study - Car Insurance
  Case study - Credit Cards
  Case study - Telecom
Analytic Tools
  Overview of Analytic Tools
  GUI based and Programming based
  Excel
  SAS
  R
  Others
Data Preparation
  Why data prep
  Outlier treatment
  Missing values treatment
  Telecom case study
  Categorical variables
  Dummy variables
  Derived variables
  Lag variables
  Interaction variables
  Variable transformation
  Quadratic variables
  Date, time variables
  Sampling and partitioning
  Case study - Auto manufacturer
Regression
  Basics of Regression
  Linear Regression
  Logistic Regression
  Interpretation of modeling results
  Violation of regression assumptions
  Insurance Case study
Decision Trees
  What are decision trees?
  Examples of trees
  Terminology in decision trees
  Data preparation for trees
  How to create a tree?
  Measure of effectiveness
    Gini
    Chi-square
    Information gain
    Reduction in variance
    Others
  Application of algorithms
  Case study - Fraud detection
  Case study - Car Insurance pricing
  Use of decision trees
  Pros and cons
  What makes a good tree?
  When to use Decision trees?
  Widely used software for Decision trees
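As an illustration of two of the effectiveness measures listed above (Gini and information gain), here is a minimal Python sketch. It is not taken from the course material: the function names and the tiny fraud/ok label sample are hypothetical, chosen only to show how a candidate split is scored.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def weighted(measure, groups):
    """Size-weighted average of an impurity measure over child nodes."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * measure(g) for g in groups)


def information_gain(parent, groups):
    """Entropy of the parent minus weighted entropy of the children."""
    return entropy(parent) - weighted(entropy, groups)


# Hypothetical labels for one parent node and a candidate two-way split.
parent = ["fraud"] * 4 + ["ok"] * 4
left = ["fraud"] * 3 + ["ok"]
right = ["fraud"] + ["ok"] * 3

print("Gini before split:", gini(parent))
print("Gini after split: ", weighted(gini, [left, right]))
print("Information gain: ", round(information_gain(parent, [left, right]), 4))
```

A split is more useful the further it lowers the weighted impurity of the child nodes relative to the parent, which is the quantity a tree-growing algorithm compares across candidate splits.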
Clustering
  What is clustering
  Types of clustering
  K-means clustering
  Measures of homogeneity
  Data prep
  Hierarchical clustering
  Cluster evaluation
  Cluster profiling
  When to use
  Important considerations
  Clustering in SAS - case study on store clustering
Pitfalls to avoid while Modeling
  Misleading patterns
  Biased population
  Data at wrong level
  Already known insights
  Un-actionable insights
Data Scientist Course: Tools Training
Introduction to R
  What is R?
  Origins of R
  Current Status
  R Ecosystem
  Commercial products
  How R can help in Business Analytics, Data Mining, Data Visualization
R Installation
  Windows
  Linux
  Using VMware for virtual partition
  CRAN and Packages
R Interfaces
  Command line
  Graphical User Interfaces
  IDE
  Web Interfaces
Advantages and Disadvantages of using R
Data Input
  Data Import
  Spreadsheet Like Data
  Statistical File Formats
  Databases
  Internet Data
  Data from Packages
  Using GUI for Data Import
Data Manipulation
  Transposing Dataset
  Conditional selection of rows, columns, variables
  Using Reshape
  Merging Datasets
Data Exploration
  Types of Data in R
  Summarizing data in R
  Using GUI for Data Exploration
  Graphs in R
Module: R
Data Visualization
  Introduction to ggplot2
  Plotting using R data frames
  Graphics on large data
  Visualizing Statistical Outputs
Regression modeling using R
  Logistic Regression
  Linear Regression
  Creating a model and Scoring a model
  Understanding Model Output (Coefficients, Fit, Residuals, R square, P Value)
Decision Trees & Clustering in R
  Hierarchical Clustering
  K Means Clustering
  Decision trees using rpart package
How to export data?
  Exporting Graphs
  Saving code and output
  Exporting using GUI
Working with Big Data
Big Data Overview
  What is Big Data?
  Generators
  Drivers
  Characteristics
  Structured vs. Unstructured Data
  Challenges
Big Data Analytics
  Comparison with traditional Analytics
  Tools and Technologies
  Applications
  Sample Case Studies
  Project Life Cycle
Hadoop
  RDBMS Limitations
  No-SQL Databases
  History of Hadoop
  Why Hadoop
  Hadoop Ecosystem
  Big Data Applications
  Installation Modes
  Working with Hadoop
MapReduce
  What is MapReduce?
  Map Process
  Reduce Process
  Anatomy of MapReduce program
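The map and reduce processes named above can be sketched in plain Python with a word count, the customary introductory example. This is an in-memory illustration only and does not use Hadoop; the function names and the two sample lines are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


# Hypothetical input: each string stands in for one record of a large file.
lines = ["big data big ideas", "data beats ideas"]
counts = word_count(lines)
print(counts)
```

In a real cluster the map and reduce functions run in parallel on different machines and the framework performs the shuffle; the single-process version here only shows how data flows between the phases.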
Hadoop Components
  HDFS Overview
  HDFS Architecture
  HBASE Overview
  HIVE
  FLUME
Module: Big Data Module
RHadoop for Big Data
  R & Hadoop Overview
  Why RHadoop?
  Advantages
  Installation
  Configuration
  Sample Use Cases
Final Case Study
  Case Study: Twitter Analytics using HIVE and FLUME
Data Scientist Course: Statistics
Statistics
  Introduction to statistics
  Summary statistics
    Mean
    Median
    Mode
    Variance
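The four summary statistics listed above can be computed with Python's standard `statistics` module, as in the short sketch below. The sample values are hypothetical and are not drawn from the course.

```python
import statistics

# Hypothetical sample, e.g. number of claims filed per customer.
claims = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(claims)          # arithmetic average
median = statistics.median(claims)      # middle value of the sorted data
mode = statistics.mode(claims)          # most frequent value
pop_var = statistics.pvariance(claims)  # variance, dividing by n
samp_var = statistics.variance(claims)  # variance, dividing by n - 1

print(mean, median, mode, pop_var, round(samp_var, 3))
```

The module offers both variance functions because the right divisor depends on whether the data are the whole population (`pvariance`) or a sample drawn from it (`variance`).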
Random Variables
  Probability
  Probability distribution
    Binomial
    Poisson
    Normal
Module: Statistics
Hypothesis testing
  Intuition
  Standard approaches
T-test
  One sample
  Two Sample
Chi-square test
  Variance
  Association
Multiple Sample Tests
  ANOVA
  Chi Square
Non-parametric testing