Antrix Academy of Data Science

Courses Offered:

1. Microsoft Excel & VBA

2. Base SAS Programming

3. Advanced SAS Programming

4. R Programming for Data Analytics

5. Python Based Data Analytics

6. Machine Learning with R

7. Big Data & Hadoop Development

Training Venue:

Bhagmal Complex, 110, 1st Floor, Captain Vijyant Thapar Marg, Naya Bans, Naya Bans Village, Sector 15, Noida, Uttar Pradesh 201301, India

Contact No. 9971-283-969

Email Id - [email protected]

Website – www.antrixacademy.com

Topic: Microsoft Excel & VBA

Duration: 16 Hrs

Tool: MS Excel 2013/2016

Learning Mode: Instructor-Led Training

Excel - Basic

Introduction to Excel

Working with Formulas and functions

Formatting & Conditional Formatting

Filtering, sorting, paste special, etc.

Functions (Logical & Text, Mathematical, Statistical etc)

Data Manipulation & Data Aggregation

Data Analysis using functions

Excel - Advanced

Analyzing Data using Pivots

Descriptive Statistics

Creating Charts & Graphics

Data analytics tools (What-if analysis, Goal Seek, Data Table, Solver)

Protecting Workbooks, worksheets and formulas

Introduction to VBA

Working with VBE (Visual Basic Editor)

Introduction to Excel Object Model

Understanding of Sub and Function Procedures

Key Components of a Programming Language

Understanding of If, Select Case, and With...End With Statements

Looping with VBA

User Defined Function

Some Commonly Used Macro Examples

Error Handling

Object and Memory Management in VBA

User Form Controls

ActiveX Controls

Communicating with an MS Access Database through ADO - Exporting/Importing Data

Topic: Base SAS Programming

Duration: 20 Hrs

Tool: SAS Studio 9.5

Learning Mode: Instructor-Led Training

SAS - Introduction - Data importing

Introduction to SAS, GUI

Concepts of Libraries, PDV, data execution etc

Building blocks of SAS (Data & Proc Steps - Statements & options)

Debugging SAS Codes

Importing different types of data & connecting to data bases

Data Understanding (metadata, variable attributes (format, informat, length, label, etc.))

SAS Procedures for data import/export/understanding (Proc import/Proc contents/Proc print/Proc means/Proc freq)

SAS - Data Manipulation

Data Manipulation steps (Sorting, filtering, duplicates, merging, appending, subsetting, derived variables, sampling, Data type conversions, renaming, formatting, etc.)

Data manipulation tools (Operators, Functions, Procedures, control structures, Loops, arrays)

SAS Functions (Text, numeric, date, utility functions)

SAS Procedures for data manipulation (Proc sort, proc format etc)

SAS Options (System Level, procedure level)

SAS - Exploratory Data Analysis & Data visualization

Introduction to exploratory data analysis

Descriptive statistics, Frequency Tables and summarization

Univariate Analysis (Distribution of data & Graphical Analysis)

Bivariate Analysis (Cross Tabs, Distributions & Relationships, Graphical Analysis)

SAS Procedures for Data Analysis (Proc freq/Proc means/Proc summary/Proc tabulate/Proc univariate, etc.)

SAS Procedures for Graphical Analysis (Proc Sgplot, proc gplot etc)

SAS - Reporting - Output Exporting

Introduction to Reporting

SAS Reporting Procedures (Proc print, Proc Report, Proc Tabulate etc)

Exporting data sets into different formats (Using proc export)

Concept of ODS (output delivery system)

ODS System - Exporting output into different formats

Topic: Advanced SAS Programming

Duration: 20 Hrs

Tool: SAS Studio 9.5

Learning Mode: Instructor-Led Training

Advanced SAS (Proc SQL - Macros) - Optimizing SAS

Introduction to Advanced SAS - Proc SQL & Macros

Understanding select statement (From, where, group by, having, order by etc)

Proc SQL - Data creation/extraction

Proc SQL - Data Manipulation steps

Proc SQL - Summarizing Data

Proc SQL - Concept of sub queries, indexes etc

SAS Macros - Creating/defining macro variables

SAS Macros - Defining/calling macros

SAS Macros - Concept of local/global variables

SAS Macros - Debugging techniques

Know-How of Statistical Concepts

Introduction to Statistics

Descriptive and inferential statistics

Explanatory Versus Predictive Modeling

Population and samples

Uses of variables: independent and dependent

Types of variables: quantitative and categorical

Descriptive Statistics Introduction

Histogram

Measures of shape: skewness

Box Plots

Univariate Procedure

Statistical graphics procedures

The SGPLOT Procedure

ODS Graphics Output

Using SAS to picture your data

Confidence Intervals for the Mean: Introduction

Distribution of sample means

Normality and the central limit theorem

Calculation of 95% confidence interval

Hypothesis Testing: Introduction

Decision Making Process

Steps in Hypothesis Testing

Types of error and power

The p-value, effect size, and sample size

Statistical Hypothesis Test

The t statistic, t distribution, and two-sided t-test

Using proc univariate to generate a t statistic

Topic: R Programming for Data Analytics

Duration: 20 Hrs

Tool: RStudio

Learning Mode: Instructor-Led Training

R-Introduction - Data Importing/Exporting

Introduction to R/RStudio - GUI

Concept of Packages - Useful Packages (Base & other packages) in R

Data Structure & Data Types (Vectors, Matrices, factors, Data frames, and Lists)

Importing Data from various sources

Database Input (Connecting to database)

Exporting Data to various formats

Viewing Data (Viewing partial data and full data)

Variable & Value Labels – Date Values

R - Data Manipulation

Data Manipulation steps (Sorting, filtering, duplicates, merging, appending, subsetting, derived variables, sampling, Data type conversions, renaming, formatting, etc.)

Data manipulation tools (Operators, Functions, Packages, control structures, Loops, arrays, etc.)

R Built-in Functions (Text, numeric, date, utility functions)

R User Defined Functions

R Packages for data manipulation (base, dplyr, plyr, reshape, car, sqldf, etc.)

R - Data Analysis - Visualization

Introduction to exploratory data analysis

Descriptive statistics, Frequency Tables and summarization

Univariate Analysis (Distribution of data & Graphical Analysis)

Bivariate Analysis (Cross Tabs, Distributions & Relationships, Graphical Analysis)

Creating Graphs (Bar/pie/line chart/histogram/boxplot/scatter/density, etc.)

R Packages for Exploratory Data Analysis (dplyr, plyr, gmodels, car, vcd, Hmisc, psych, doBy, etc.)

R Packages for Graphical Analysis (base, ggplot2, lattice, etc.)

Topic: Python Based Data Analytics

Duration: 20 Hrs

Tool: Python IDE

Learning Mode: Instructor-Led Training

Python: Introduction & Essentials

Overview of Python- Starting Python

Introduction to Python Editors & IDEs (Canopy, PyCharm, Jupyter, Rodeo, IPython, etc.)

Custom Environment Settings

Concept of Packages/Libraries - Important packages (NumPy, SciPy, scikit-learn, Pandas, Matplotlib, etc.)

Installing & loading Packages & Namespaces

Data Types & Data objects/structures (Tuples, Lists, Dictionaries)

List and Dictionary Comprehensions

Variable & Value Labels – Date & Time Values

Basic Operations - Mathematical - string - date

Reading and writing data

Simple plotting

Control flow

Debugging

Code profiling

Python: Accessing/Importing and Exporting Data

Importing Data from various sources (Csv, txt, excel, access etc)

Database Input (Connecting to database)

Viewing Data objects - subsetting, methods

Exporting Data to various formats

Python: Data Manipulation – cleansing

Cleansing Data with Python

Data Manipulation steps (Sorting, filtering, duplicates, merging, appending, subsetting, derived variables, sampling, Data type conversions, renaming, formatting, etc.)

Data manipulation tools (Operators, Functions, Packages, control structures, Loops, arrays, etc.)

Python Built-in Functions (Text, numeric, date, utility functions)

Python User Defined Functions

Stripping out extraneous information

Normalizing data

Formatting data

Important Python Packages for data manipulation (Pandas, Numpy etc)

Python: Data Analysis – Visualization

Introduction to exploratory data analysis

Descriptive statistics, Frequency Tables and summarization

Univariate Analysis (Distribution of data & Graphical Analysis)

Bivariate Analysis (Cross Tabs, Distributions & Relationships, Graphical Analysis)

Creating Graphs (Bar/pie/line chart/histogram/boxplot/scatter/density, etc.)

Important Packages for Exploratory Analysis (NumPy Arrays, Matplotlib, Pandas, scipy.stats, etc.)

Python: Basic statistics

Basic Statistics - Measures of Central Tendency and Variance

Building blocks - Probability Distributions - Normal distribution - Central Limit Theorem

Inferential Statistics - Sampling - Concept of Hypothesis Testing

Statistical Methods - Z/t-tests (One sample, independent, paired), ANOVA, Correlation and Chi-square

Python: Polyglot Programming

Making Python talk to other languages and database systems

How do R and Python play with each other

Topic: Machine Learning with R

Duration: 40 Hrs

Tool: RStudio

Learning Mode: Instructor-Led Training

Introduction to Machine Learning

What is machine learning?

What are the use cases of machine learning?

Statistical learning vs. Machine learning

Iteration and evaluation

Major Classes of Learning Algorithms - Supervised vs. Unsupervised Learning

Different Phases of Predictive Modelling (Data Pre-processing, Sampling, Model Building, Validation)

Concept of Overfitting and Underfitting (Bias-Variance Trade-off) & Performance Metrics

Types of Cross-validation (Train & Test, Bootstrapping, K-Fold validation, etc.)

Introduction to CARET package

Introduction to H2O package

Supervised Learning

Linear Regression

Logistic regression

Generalization & Non Linearity

Recursive Partitioning (Decision Trees)

Ensemble Models (Random Forest, Bagging & Boosting (ada, gbm, etc.))

Artificial Neural Networks (ANN)

Support Vector Machines (SVM)

K-Nearest neighbours

Naive Bayes

Unsupervised Learning

K-means clustering

Challenges of unsupervised learning and beyond K-means

Recommendation Engine

Market Basket Analysis

Collaborative Filtering

Social Media and Text Analytics Using R

Social Media – Characteristics of Social Media

Applications of Social Media Analytics

Metrics (Measures, Actions) in social media analytics

Examples & Actionable Insights using Social Media Analytics

Text Analytics – Sentiment Analysis using R

Text Analytics – Word cloud analysis using R

Text Analytics - K-Means Clustering

Text Mining, Social Network Analysis and NLP

Taming big text; Unstructured vs. Semi-structured Data; Fundamentals of information retrieval; Properties of words; Vector space models; Creating Term-Document (TxD) Matrices; Similarity measures; Low-level processes (Sentence Splitting; Tokenization; Part-of-Speech Tagging; Stemming; Chunking)

Handling big graphs

The purpose of it all: Finding patterns in data

Finding patterns in text: text mining, text as a graph

Natural Language processing (NLP)

Topic: Big Data & Hadoop Development

Duration: 40 Hrs

Tool: SAS Studio 9.5

Learning Mode: Instructor-Led Training

Introduction to Big Data

Introduction and relevance

Uses of Big Data analytics in various industries like Telecom, E-commerce, Finance and Insurance, etc.

Problems with Traditional Large-Scale Systems

Hadoop (Big Data) Ecosystem

Motivation for Hadoop

Different types of projects by Apache

Role of projects in the Hadoop Ecosystem

Key technology foundations required for Big Data

Limitations and Solutions of existing Data Analytics Architecture

Comparison of traditional data management systems with Big Data management systems

Evaluate key framework requirements for Big Data analytics

Hadoop Ecosystem & Hadoop 2.x core components

Explain the relevance of real-time data

Explain how to use big and real-time data as a Business planning tool

Hadoop Cluster - Architecture - Configuration files

Hadoop Master-Slave Architecture

The Hadoop Distributed File System - Concept of data storage

Explain different types of cluster setups (Fully distributed/Pseudo, etc.)

Hadoop cluster set up - Installation

Hadoop 2.x Cluster Architecture

A Typical enterprise cluster – Hadoop Cluster Modes

Understanding cluster management tools like Cloudera Manager/Apache Ambari

Hadoop Core Components - HDFS & MapReduce (YARN)

HDFS Overview & Data storage in HDFS

Getting data into Hadoop from a local machine (Data Loading Techniques) and vice versa

MapReduce Overview (Traditional way vs. MapReduce way)

Concept of Mapper & Reducer

Understanding MapReduce program Framework

Develop MapReduce Program using Java (Basic)

Develop MapReduce program with streaming API (Basic)

Data Integration Using SQOOP & FLUME

Integrating Hadoop into an Existing Enterprise

Loading Data from an RDBMS into HDFS by Using Sqoop

Managing Real-Time Data Using Flume

Accessing HDFS from Legacy Systems

Data Analysis Using PIG

Introduction to Data Analysis Tools

Apache PIG - MapReduce Vs Pig, Pig Use Cases

PIG’s Data Model

PIG Streaming

Pig Latin Program & Execution

Pig Latin: Relational Operators, File Loaders, Group Operator, COGROUP Operator, Joins and COGROUP, Union, Diagnostic Operators, Pig UDF

Writing Java UDFs

Embedded PIG in JAVA

PIG Macros

Parameter Substitution

Use Pig to automate the design and implementation of MapReduce applications

Use Pig to apply structure to unstructured Big Data

Data Analysis Using Hive

Apache Hive - Hive Vs. PIG - Hive Use Cases

Discuss the Hive data storage principle

Explain the file formats and record formats supported by the Hive environment

Perform operations with data in Hive

HiveQL: Joining Tables, Dynamic Partitioning, Custom Map/Reduce Scripts

Hive Script, Hive UDF

Hive Persistence formats

Loading data in Hive - Methods

Serialization & Deserialization

Handling Text data using Hive

Integrating external BI tools with Hadoop Hive