Antrix Academy of Data Science
Courses Offered:
1. Microsoft Excel & VBA
2. Base SAS Programming
3. Advanced SAS Programming
4. R Programming for Data Analytics
5. Python Based Data Analytics
6. Machine Learning with R
7. Big Data & Hadoop Development
Training Venue:
Bhagmal Complex, 110, 1st Floor, Captain Vijyant Thapar Marg, Naya
Bans, Naya Bans Village, Sector 15, Noida, Uttar Pradesh 201301, India
Contact No. 9971-283-969
Email Id - [email protected]
Website – www.antrixacademy.com
Topic : Microsoft Excel & VBA
Duration 16 Hrs
Tool MS Excel 2013/2016
Learning Mode Instructor Led Training
Excel - Basic
Introduction to Excel
Working with Formulas and functions
Formatting & Conditional Formatting
Filtering, sorting, paste special etc
Functions (Logical & Text, Mathematical, Statistical etc)
Data Manipulation & Data Aggregation
Data Analysis using functions
Excel - Advanced
Analyzing Data using Pivots
Descriptive Statistics
Creating Charts & Graphics
Data analytics tools (What-if analysis, Goal Seek, Data Table, Solver)
Protecting Workbooks, worksheets and formulas
Introduction to VBA
Working with VBE (Visual Basic Editor)
Introduction to Excel Object Model
Understanding of Sub and Function Procedures
Key Components of a Programming Language
Understanding of If, Select Case, With End With Statements
Looping with VBA
User Defined Function
Some Commonly Used Macro Examples
Error Handling
Object and Memory Management in VBA
User Form Controls
ActiveX Controls
Communicating with Database MS Access through ADO - Exporting/Importing Data
Topic : Base SAS Programming
Duration 20 Hrs
Tool SAS Studio 9.5
Learning Mode Instructor Led Training
SAS - Introduction - Data importing
Introduction to SAS, GUI
Concepts of Libraries, PDV, data execution etc
Building blocks of SAS (Data & Proc Steps - Statements & options)
Debugging SAS Codes
Importing different types of data & connecting to data bases
Data Understanding(Meta data, variable attributes(format, informat, length, label etc))
SAS Procedures for data import/export/understanding (Proc import/Proc contents/Proc
print/Proc means/Proc freq)
SAS - Data Manipulation
Data Manipulation steps (Sorting, filtering, duplicates, merging, appending, subsetting, derived
variables, sampling, Data type conversions, renaming, formatting, etc)
Data manipulation tools (Operators, Functions, Procedures, control structures, Loops, arrays )
SAS Functions (Text, numeric, date, utility functions)
SAS Procedures for data manipulation (Proc sort, proc format etc)
SAS Options (System Level, procedure level)
SAS - Exploratory Data Analysis & Data visualization
Introduction exploratory data analysis
Descriptive statistics, Frequency Tables and summarization
Univariate Analysis (Distribution of data & Graphical Analysis)
Bivariate Analysis(Cross Tabs, Distributions & Relationships, Graphical Analysis)
SAS Procedures for Data Analysis(proc freq/Proc means/proc summary/proc tabulate/Proc
univariate etc)
SAS Procedures for Graphical Analysis (Proc Sgplot, proc gplot etc)
SAS - Reporting - Output Exporting
Introduction to Reporting
SAS Reporting Procedures (Proc print, Proc Report, Proc Tabulate etc)
Exporting data sets into different formats (Using proc export)
Concept of ODS (output delivery system)
ODS System - Exporting output into different formats
Topic : Advanced SAS Programming
Duration 20 Hrs
Tool SAS Studio 9.5
Learning Mode Instructor Led Training
Advanced SAS (Proc SQL - Macros) - Optimizing SAS
Introduction to Advanced SAS - Proc SQL & Macros
Understanding select statement (From, where, group by, having, order by etc)
Proc SQL - Data creation/extraction
Proc SQL - Data Manipulation steps
Proc SQL - Summarizing Data
Proc SQL - Concept of sub queries, indexes etc
SAS Macros - Creating/defining macro variables
SAS Macros - Defining/calling macros
SAS Macros- Concept of local/global variables
SAS Macros - Debugging techniques
Know-How of Statistical Concepts
Introduction of Statistics
Descriptive and inferential statistics
Explanatory Versus Predictive Modeling
Population and samples
Uses of variables: independent and dependent
Types of variables: quantitative and categorical
Descriptive Statistics Introduction
Histogram
Measures of shape: skewness
Box Plots
Univariate Procedure
Statistical graphics procedures
The SGPLOT Procedure
ODS Graphics Output
Using SAS to picture your data
Confidence Intervals for the Mean Introduction
Distribution of sample means
Normality and the central limit theorem
Calculation of 95% confidence interval
Hypothesis Testing introduction
Decision Making Process
Steps in Hypothesis Testing
Types of error and power
The p-value, effect size, and sample size
Statistical Hypothesis Test
The t statistic, t distribution, and two-sided t test
Using proc univariate to generate a t statistic
Topic : R Programming for Data Analytics
Duration 20 Hrs
Tool R Studio
Learning Mode Instructor Led Training
R-Introduction - Data Importing/Exporting
Introduction R/R-Studio - GUI
Concept of Packages - Useful Packages (Base & other packages) in R
Data Structure & Data Types (Vectors, Matrices, factors, Data frames, and Lists)
Importing Data from various sources
Database Input (Connecting to database)
Exporting Data to various formats
Viewing Data (Viewing partial data and full data)
Variable & Value Labels – Date Values
R - Data Manipulation
Data Manipulation steps (Sorting, filtering, duplicates, merging, appending, subsetting, derived
variables, sampling, Data type conversions, renaming, formatting etc)
Data manipulation tools(Operators, Functions, Packages, control structures, Loops, arrays etc)
R Built-in Functions (Text, numeric, date, utility functions)
R User Defined Functions
R Packages for data manipulation(base, dplyr, plyr, reshape,car, sqldf etc)
R - Data Analysis - Visualization
Introduction exploratory data analysis
Descriptive statistics, Frequency Tables and summarization
Univariate Analysis (Distribution of data & Graphical Analysis)
Bivariate Analysis(Cross Tabs, Distributions & Relationships, Graphical Analysis)
Creating Graphs (Bar/pie/line chart/histogram/boxplot/scatter/density etc)
R Packages for Exploratory Data Analysis (dplyr, plyr, gmodels, car, vcd, Hmisc, psych, doBy etc)
R Packages for Graphical Analysis (base, ggplot2, lattice etc)
Topic : Python Based Data Analytics
Duration 20 Hrs
Tool Python IDE
Learning Mode Instructor Led Training
Python: Introduction & Essentials
Overview of Python- Starting Python
Introduction to Python Editors & IDEs (Canopy, PyCharm, Jupyter, Rodeo, IPython etc.)
Custom Environment Settings
Concept of Packages/Libraries - Important packages(NumPy, SciPy, scikit-learn, Pandas,
Matplotlib, etc)
Installing & loading Packages & Name Spaces
Data Types & Data objects/structures (Tuples, Lists, Dictionaries)
List and Dictionary Comprehensions
Variable & Value Labels – Date & Time Values
Basic Operations - Mathematical - string - date
Reading and writing data
Simple plotting
Control flow
Debugging
Code profiling
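As a small illustration of the data structures and comprehensions listed above, the following sketch builds a list and two dictionaries from a list of tuples. The names and scores are invented sample data, and the grading rule is an assumption made only for the example.

```python
# Hypothetical sample data: (name, score) tuples invented for illustration.
scores = [("asha", 72), ("ravi", 85), ("meena", 64), ("john", 91)]

# List comprehension: names of everyone scoring 70 or more.
passed = [name for name, score in scores if score >= 70]

# Dictionary comprehension: map each name to its score.
score_by_name = {name: score for name, score in scores}

# Dictionary comprehension with a derived value (assumed rule: 80+ is grade "A").
grade_by_name = {name: ("A" if score >= 80 else "B") for name, score in scores}

print(passed)
print(score_by_name["ravi"])
print(grade_by_name)
```

Each comprehension replaces an explicit loop that would append to a list or assign into a dictionary one item at a time.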
Python: Accessing/Importing and Exporting Data
Importing Data from various sources (Csv, txt, excel, access etc)
Database Input (Connecting to database)
Viewing Data objects - subsetting, methods
Exporting Data to various formats
Python: Data Manipulation – cleansing
Cleansing Data with Python
Data Manipulation steps(Sorting, filtering, duplicates, merging, appending, subsetting,
derived variables, sampling, Data type conversions, renaming, formatting etc)
Data manipulation tools(Operators, Functions, Packages, control structures, Loops, arrays
etc)
Python Built-in Functions (Text, numeric, date, utility functions)
Python User Defined Functions
Stripping out extraneous information
Normalizing data
Formatting data
Important Python Packages for data manipulation (Pandas, Numpy etc)
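Several of the manipulation steps listed above (removing duplicates, filtering, sorting, merging, derived variables, normalizing) can be sketched with plain Python before moving on to Pandas. The records, field names, and the threshold of 100 below are hypothetical and chosen only for illustration.

```python
# Hypothetical records; in practice these would be read from a file or database.
orders = [
    {"order_id": 1, "customer": "C1", "amount": 250.0},
    {"order_id": 2, "customer": "C2", "amount": 90.0},
    {"order_id": 2, "customer": "C2", "amount": 90.0},  # duplicate row
    {"order_id": 3, "customer": "C1", "amount": 410.0},
]
customers = {"C1": "Noida", "C2": "Delhi"}  # lookup table used for merging

# Remove duplicates (keyed on order_id) while preserving the original order.
seen = set()
unique_orders = []
for row in orders:
    if row["order_id"] not in seen:
        seen.add(row["order_id"])
        unique_orders.append(row)

# Filtering: keep orders of at least 100 (assumed threshold).
large_orders = [r for r in unique_orders if r["amount"] >= 100]

# Sorting: largest amount first.
sorted_orders = sorted(unique_orders, key=lambda r: r["amount"], reverse=True)

# Merging and derived variables: add the customer's city and a boolean flag.
enriched = [
    {**r, "city": customers[r["customer"]], "is_large": r["amount"] >= 100}
    for r in unique_orders
]

# Normalizing: min-max scale the amounts into the range [0, 1].
amounts = [r["amount"] for r in unique_orders]
low, high = min(amounts), max(amounts)
normalized = [(a - low) / (high - low) for a in amounts]

print(sorted_orders[0]["order_id"], enriched[1]["city"], normalized)
```

With Pandas the same steps map roughly onto `drop_duplicates`, boolean indexing, `sort_values`, and `merge`.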
Python: Data Analysis – Visualization
Introduction exploratory data analysis
Descriptive statistics, Frequency Tables and summarization
Univariate Analysis (Distribution of data & Graphical Analysis)
Bivariate Analysis(Cross Tabs, Distributions & Relationships, Graphical Analysis)
Creating Graphs (Bar/pie/line chart/histogram/boxplot/scatter/density etc)
Important Packages for Exploratory Analysis(NumPy Arrays, Matplotlib, Pandas and
scipy.stats etc)
Python: Basic statistics
Basic Statistics - Measures of Central Tendencies and Variance
Building blocks - Probability Distributions - Normal distribution - Central Limit Theorem
Inferential Statistics -Sampling - Concept of Hypothesis Testing
Statistical Methods - Z/t-tests (One sample, independent, paired), Anova, Correlation and
Chi-square
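The central-tendency, variance, and one-sample t-test topics above can be sketched with the standard library alone. The sample values and the hypothesized mean of 50 are invented for illustration. The confidence interval uses a normal (z) critical value as an approximation; for a sample this small, a t critical value (for example from scipy.stats) would normally be used instead.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample of 10 measurements, invented for illustration.
sample = [52, 48, 55, 50, 47, 53, 51, 49, 54, 51]
mu0 = 50  # hypothesized population mean (assumption for the example)

n = len(sample)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)      # sample standard deviation (n - 1)
std_error = sample_sd / math.sqrt(n)      # standard error of the mean

# One-sample t statistic for H0: population mean == mu0.
t_stat = (sample_mean - mu0) / std_error

# Approximate 95% confidence interval using the normal critical value (~1.96).
z_crit = NormalDist().inv_cdf(0.975)
ci_low = sample_mean - z_crit * std_error
ci_high = sample_mean + z_crit * std_error

print(f"mean={sample_mean}, sd={sample_sd:.3f}, t={t_stat:.3f}")
print(f"approx. 95% CI: ({ci_low:.2f}, {ci_high:.2f})")
```

The t statistic would then be compared against a t distribution with n - 1 degrees of freedom to obtain a p-value.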
Python: Polyglot Programming
Making Python talk to other languages and database systems
How do R and Python play with each other
Topic : Machine Learning with R
Duration 40 Hrs
Tool R Studio
Learning Mode Instructor Led Training
Introduction to Machine Learning
What is machine learning?
What are the use cases of machine learning?
Statistical learning vs. Machine learning
Iteration and evaluation
Major Classes of Learning Algorithms -Supervised vs Unsupervised Learning
Different Phases of Predictive Modelling (Data Pre-processing, Sampling, Model Building,
Validation)
Concept of Overfitting and Underfitting (Bias-Variance Trade-off) & Performance Metrics
Types of Cross validation(Train & Test, Bootstrapping, K-Fold validation etc)
Introduction to CARET package
Introduction to H2O package
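The K-fold validation idea listed above comes down to index bookkeeping: each observation lands in the test set exactly once. The course itself uses R (for example the caret package); the sketch below is plain Python purely to show the splitting logic, and the function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    # The first (n_samples % k) folds get one extra observation.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop

# Example: 10 observations split into 3 folds.
folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

In practice the row order is usually shuffled before splitting so that each fold is a random subset; this sketch skips that step to keep the output deterministic.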
Supervised Learning
Linear Regression
Logistic regression
Generalization & Non Linearity
Recursive Partitioning(Decision Trees)
Ensemble Models(Random Forest, Bagging & Boosting(ada, gbm etc))
Artificial Neural Networks(ANN)
Support Vector Machines(SVM)
K-Nearest neighbours
Naive Bayes
Unsupervised Learning
K-means clustering
Challenges of unsupervised learning and beyond K-means
RECOMMENDATION ENGINE
Market Basket Analysis
Collaborative Filtering
SOCIAL MEDIA AND TEXT ANALYTICS USING R
Social Media – Characteristics of Social Media
Applications of Social Media Analytics
Metrics(Measures Actions) in social media analytics
Examples & Actionable Insights using Social Media Analytics
Text Analytics – Sentiment Analysis using R
Text Analytics – Word cloud analysis using R
Text Analytics - K-Means Clustering
Text Mining, Social Network Analysis and NLP
Taming big text; Unstructured vs. Semi-structured Data; Fundamentals of information retrieval;
Properties of words; Vector space models; Creating Term-Document (TxD) Matrices; Similarity
measures; Low-level processes (Sentence Splitting; Tokenization; Part-of-Speech Tagging;
Stemming; Chunking)
Handling big graphs
The purpose of it all: Finding patterns in data
Finding patterns in text: text mining, text as a graph
Natural Language processing (NLP)
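The term-document matrix and similarity measures mentioned above can be illustrated with a minimal sketch. It is written in plain Python rather than R, the three documents are invented, and tokenization is simplified to lowercasing and splitting on whitespace (no stemming or part-of-speech tagging).

```python
import math
from collections import Counter

# Hypothetical mini-corpus, invented for illustration.
docs = [
    "the course covers text mining",
    "text mining finds patterns in text",
    "hadoop stores big data",
]

# Tokenization (simplified): lowercase and split on whitespace.
tokenized = [doc.lower().split() for doc in docs]
vocab = sorted({term for doc in tokenized for term in doc})

# Term-document matrix: one row per term, one column per document.
counts = [Counter(doc) for doc in tokenized]
tdm = [[c[term] for c in counts] for term in vocab]

def cosine(i, j):
    """Cosine similarity between document columns i and j of the matrix."""
    a = [row[i] for row in tdm]
    b = [row[j] for row in tdm]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(round(cosine(0, 1), 3))  # documents sharing "text" and "mining"
print(cosine(0, 2))            # documents with no terms in common
```

Documents that share terms get a similarity above zero, while documents with disjoint vocabularies score exactly zero.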
Topic : Big Data & Hadoop Development
Duration 40 Hrs
Tool SAS Studio 9.5
Learning Mode Instructor Led Training
Introduction to Big Data
Introduction and relevance
Uses of Big Data analytics in various industries like Telecom, E-commerce, Finance and
Insurance etc.
Problems with Traditional Large-Scale Systems
Hadoop (Big Data) Ecosystem
Motivation for Hadoop
Different types of projects by Apache
Role of projects in the Hadoop Ecosystem
Key technology foundations required for Big Data
Limitations and Solutions of existing Data Analytics Architecture
Comparison of traditional data management systems with Big Data management systems
Evaluate key framework requirements for Big Data analytics
Hadoop Ecosystem & Hadoop 2.x core components
Explain the relevance of real-time data
Explain how to use big and real-time data as a Business planning tool
Hadoop Cluster -Architecture - Configuration files
Hadoop Master-Slave Architecture
The Hadoop Distributed File System - Concept of data storage
Explain different types of cluster setups(Fully distributed/Pseudo etc)
Hadoop cluster set up - Installation
Hadoop 2.x Cluster Architecture
A Typical enterprise cluster – Hadoop Cluster Modes
Understanding cluster management tools like Cloudera Manager/Apache Ambari
Hadoop Core Components - HDFS & MapReduce(YARN)
HDFS Overview & Data storage in HDFS
Get the data into Hadoop from local machine(Data Loading Techniques) - vice versa
Map Reduce Overview (Traditional way Vs. MapReduce way)
Concept of Mapper & Reducer
Understanding MapReduce program Framework
Develop MapReduce Program using Java (Basic)
Develop MapReduce program with streaming API (Basic)
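The Mapper and Reducer concepts above can be illustrated with a word count simulated in a single Python process. This is only a sketch of the programming model, not a Hadoop job: the `shuffle` function stands in for the grouping that the framework performs between the two phases, and the input lines are invented.

```python
from collections import defaultdict

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Stand-in for the framework's shuffle/sort: group values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(key, values):
    """Reduce phase: sum all the counts emitted for one word."""
    return key, sum(values)

# Hypothetical input split, invented for illustration.
lines = ["big data big ideas", "data moves fast"]

mapped = [pair for line in lines for pair in mapper(line)]
counts = dict(reducer(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

In a real streaming job the mapper and reducer would be separate scripts that read from standard input and write tab-separated key/value lines to standard output.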
Data Integration Using SQOOP & FLUME
Integrating Hadoop into an Existing Enterprise
Loading Data from an RDBMS into HDFS by Using Sqoop
Managing Real-Time Data Using Flume
Accessing HDFS from Legacy Systems
Data Analysis Using PIG
Introduction to Data Analysis Tools
Apache PIG - MapReduce Vs Pig, Pig Use Cases
PIG’s Data Model
PIG Streaming
Pig Latin Program & Execution
Pig Latin : Relational Operators, File Loaders, Group Operator, COGROUP Operator, Joins and
COGROUP, Union, Diagnostic Operators, Pig UDF
Writing JAVA UDFs
Embedded PIG in JAVA
PIG Macros
Parameter Substitution
Use Pig to automate the design and implementation of MapReduce applications
Use Pig to apply structure to unstructured Big Data
Data Analysis Using Hive
Apache Hive - Hive Vs. PIG - Hive Use Cases
Discuss the Hive data storage principle
Explain the File formats and Records formats supported by the Hive environment
Perform operations with data in Hive
Hive QL: Joining Tables, Dynamic Partitioning, Custom Map/Reduce Scripts
Hive Script, Hive UDF
Hive Persistence formats
Loading data in Hive - Methods
Serialization & Deserialization
Handling Text data using Hive
Integrating external BI tools with Hadoop Hive