An Analyst’s Toolbox and the inclusion of R

Ali Arsalan Kazmi

April 9th, 2015

Aimia Analytics Off site, Dubai, 2015



What is this presentation about?

Tools

They determine:

• What can be done

• How it can be done

• By when it can be done

They undergo rapid change and improvement

New tools are constantly being made

All tools are designed to be ‘cures’ for specific ‘problems’


Tools

Tools may be:

• Under-utilised

• Over-utilised

• Incorrectly utilised

Tools may not ‘cure’ problems, as such

“What problems do we ‘cure’ using our tools at Aimia?”


Presentation Overview

• R

• An Analyst’s work flow

• Tool #1: Reproducibility

• Tool #2: Automation

• Tool #3: Visualisation

• What problems R does not solve

• Conclusion: A Data Analyst’s Toolbox


R – Environment for Statistical Computing

Why choose R?

• Lingua franca for computational statisticians

• Now capable of performing almost all data extraction, manipulation, analysis, and visualisation tasks

• Offers specialist as well as general data analysis functions

• Is continuously improved

• Despite all the above, is still free


An Analyst’s Work Flow

Generally, different tools are used at each stage

Each stage faces different problems

…but tools are available to cure those problems…


Tool #1: Reproducibility

Definition: The ability of an analysis or a work flow to be reproduced

• Computational Reproducibility

• Statistical Reproducibility

What problems does this tool cure?

• Unreliability

• Lack of Quality Control

• Concealment of knowledge

• Dynamic/Reactive Documents and Reports
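Computational reproducibility can be illustrated with a seeded random process: rerunning the same code on the same data with the same seed gives identical results. The talk's tool is R, where `set.seed()` plays this role; the sketch below uses Python as a stand-in, and the function name, the seed value and the spend figures are hypothetical.

```python
import random
import statistics


def bootstrap_mean_interval(data, n_resamples=1000, seed=2015):
    """Percentile bootstrap interval (roughly 95%) for the mean of `data`.

    A private, seeded random generator makes the resampling reproducible:
    the same data and the same seed always give the same interval.
    """
    rng = random.Random(seed)  # independent of any other random calls in the session
    means = []
    for _ in range(n_resamples):
        resample = [rng.choice(data) for _ in data]  # sample with replacement
        means.append(statistics.mean(resample))
    means.sort()
    lower = means[n_resamples * 25 // 1000]  # ~2.5th percentile
    upper = means[min(n_resamples * 975 // 1000, n_resamples - 1)]  # ~97.5th percentile
    return lower, upper


if __name__ == "__main__":
    spend = [120.0, 85.5, 240.0, 60.0, 150.25, 95.0, 310.0, 45.5]  # hypothetical values
    print(bootstrap_mean_interval(spend) == bootstrap_mean_interval(spend))  # True
```

Passing a different seed would give a slightly different interval, which is exactly why an unseeded analysis cannot be reproduced number for number.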


Tool #1: Reproducibility

How to use this tool?

With Excel?

• Absence of operation history

• Unclear intra/inter-sheet organisation

• Quality Control difficult

• Difficult for a newcomer to follow

With R?

• Command history available

• Comments can be included in scripts

• Scripting makes automation easier

• Quality control checks can be built into the code

• Dynamic/Reactive documents and reports
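The point about building quality control checks into code can be sketched as a validation step that every analysis must pass through before it runs. The sketch is in Python rather than R (in R, `stopifnot()` serves a similar purpose), and the record layout and the specific checks are hypothetical examples.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    customer_id: str  # hypothetical record layout
    amount: float


def quality_checks(rows):
    """Return `rows` unchanged if every check passes; otherwise raise ValueError."""
    if not rows:
        raise ValueError("QC failed: no rows loaded")
    if any(not r.customer_id for r in rows):
        raise ValueError("QC failed: missing customer_id")
    negatives = sum(1 for r in rows if r.amount < 0)
    if negatives:
        raise ValueError(f"QC failed: {negatives} negative amount(s)")
    return rows


def total_spend_by_customer(rows):
    # The analysis cannot run on data that fails QC, so bad inputs stop the script loudly
    rows = quality_checks(rows)
    totals = {}
    for r in rows:
        totals[r.customer_id] = totals.get(r.customer_id, 0.0) + r.amount
    return totals


if __name__ == "__main__":
    data = [Transaction("A", 10.0), Transaction("B", 5.0), Transaction("A", 2.5)]
    print(total_spend_by_customer(data))  # {'A': 12.5, 'B': 5.0}
```

Because the checks live in the script, they rerun on every execution, unlike manual spot checks in a spreadsheet.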


Tool #2: Automation

Definition: Using a computer to mechanise a set of tasks on the basis of rules (i.e. programmable code)

What problems does this tool cure?

• Unproductiveness

• Human-induced errors


Tool #2: Automation

How to use this tool?

With Excel?

• Record a macro for basic tasks

• Learn VBA

With R?

• Windows Task Scheduler

• scheduleR package
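Automation with a scheduler usually means writing the job as a script that runs unattended, with no manual steps, and letting the scheduler start it. The talk's examples are R with Windows Task Scheduler or the scheduleR package; the sketch below is a Python stand-in, and the file layout (a CSV with `region` and `amount` columns), the script name and the paths are hypothetical.

```python
import csv
import datetime as dt
import pathlib
import sys


def build_daily_summary(input_csv: pathlib.Path, output_dir: pathlib.Path,
                        run_date: dt.date) -> pathlib.Path:
    """Read a CSV with 'region' and 'amount' columns and write per-region totals.

    The output name is derived from run_date, so each scheduled run produces
    a new, predictably named report with no manual copying or renaming.
    """
    totals = {}
    with input_csv.open(newline="") as f:
        for row in csv.DictReader(f):
            totals[row["region"]] = totals.get(row["region"], 0.0) + float(row["amount"])
    output_dir.mkdir(parents=True, exist_ok=True)
    out_path = output_dir / f"summary_{run_date.isoformat()}.csv"
    with out_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["region", "total_amount"])
        for region in sorted(totals):
            writer.writerow([region, f"{totals[region]:.2f}"])
    return out_path


def main(argv):
    # Started by the scheduler as: python daily_summary.py <input.csv> <output_dir>
    if len(argv) != 3:
        print("usage: daily_summary.py <input.csv> <output_dir>", file=sys.stderr)
        return 2
    out = build_daily_summary(pathlib.Path(argv[1]), pathlib.Path(argv[2]), dt.date.today())
    print(f"wrote {out}")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

On Windows, a job like this could be registered to run daily with something like `schtasks /Create /SC DAILY /TN DailySummary /TR "python C:\jobs\daily_summary.py C:\data\sales.csv C:\reports" /ST 06:00`, where every path is a placeholder; cron plays the same role on Linux.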


Tool #3: Visualisation

Definition: A visual generated in response to one or more questions. It opens up the analysis of data pictorially.

What problems does this tool cure?

• Unintuitive communication of data analyses

• Inaccessible insights from data

What if the user does not have a static set of questions?

• Interactivity


Tool #3: Visualisation

How to use this tool?

With Excel?

• Inflexible

• Difficult to automate

• Inefficient

• Less fluid dashboards

With R?

• Charting package (ggplot2) based on the Grammar of Graphics

• Greater charting capabilities

• Easily automated

• Efficient with large data sets

• Interactivity

• Interactive and efficient dashboards
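The Grammar of Graphics idea behind ggplot2 is that a chart is assembled from independent parts: data, aesthetic mappings from columns to visual properties, scales, and geometric layers added one at a time (in ggplot2, with `+`). The toy sketch below shows that structure in Python; the class and function names are hypothetical, it is not ggplot2's API, and it stops at computing mark positions rather than drawing anything.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Layer:
    geom: str                                     # e.g. "point" or "line"
    mapping: dict = field(default_factory=dict)   # layer-specific aesthetic overrides


@dataclass(frozen=True)
class Plot:
    data: list       # list of dicts, one per row
    mapping: dict    # aesthetic -> column name, e.g. {"x": "month", "y": "sales"}
    layers: tuple = ()

    def __add__(self, layer: Layer) -> "Plot":
        # Adding a layer returns a new Plot, so a chart is built up piece by piece
        return Plot(self.data, self.mapping, self.layers + (layer,))


def linear_scale(values, out_min, out_max):
    """Map data values linearly onto the output range [out_min, out_max]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    return [out_min + (v - lo) / span * (out_max - out_min) for v in values]


def render(plot: Plot, width=400, height=300):
    """Resolve each layer into marks with pixel coordinates."""
    marks = []
    for layer in plot.layers:
        aes = {**plot.mapping, **layer.mapping}
        xs = [row[aes["x"]] for row in plot.data]
        ys = [row[aes["y"]] for row in plot.data]
        px = linear_scale(xs, 0, width)
        py = linear_scale(ys, height, 0)  # screen coordinates: larger y drawn higher up
        marks.append({"geom": layer.geom, "points": list(zip(px, py))})
    return marks


if __name__ == "__main__":
    data = [{"month": 1, "sales": 10}, {"month": 2, "sales": 20}, {"month": 3, "sales": 15}]
    chart = Plot(data, {"x": "month", "y": "sales"}) + Layer("line") + Layer("point")
    print(render(chart, width=200, height=100)[0]["points"])
    # [(0.0, 100.0), (100.0, 0.0), (200.0, 50.0)]
```

Because each `+` returns a new chart, layers can be swapped or added without touching the data or the mappings, which is also what makes scripted charts easy to regenerate automatically.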


What problems does R not solve?

R:

• May be difficult for some to use…?

• Has limitations based on the size of data, since it holds data in memory

But it is rapidly improving (packages support parallelisation and Hadoop)


A Data Analyst’s Toolbox

In today’s world, the tools used by analysts, computational statisticians, and computer scientists are continuously evolving…

Ideally, a toolbox will contain tools to cure multiple problems across multiple dimensions

Over-utilisation, under-utilisation, and incorrect use of tools will keep us bound by problems


Thank You