10 Commandments for BI in Big Data, Shant Hovsepian, Arcadia Data [FirstMark's Data Driven]

What’s so special about BI on Big Data?

12.14.15

Shant Hovsepian @superdupershant

Presentation prepared for Data Driven NYC #42

#BigDataSeacrest

Co-Founder & CTO

Came out of stealth mode in June and just announced our GA product release.

Rapidly Growing and focused on the Fortune 2000

See lots of customer struggles with data, Big and Small

You don’t use previous generation architectures to store Big Data so why use previous generation BI tools to analyze it?

Create business value from Big Data

– OUR FOUNDING VISION –

@BigDataBorat

#BigDataSeacrest

#BigDataMoses

– THE 10 COMMANDMENTS OF BI ON BIG DATA –

Thou shalt not move Big Data

Moving Big Data Is Expensive

On-Cluster BI is now possible
- YARN and Mesos have made it possible to run a BI server right next to the data.
- The benefits of unified management, performance, and workload management are huge when the infrastructure is converged.

Push all the computation down close to the data
- Lots of native analysis engines are out there; make sure your BI tools support them.
- ODBC/JDBC connectors aren’t always enough.

Be careful about having to extract data out to data marts & cubes
- Having to extract data out of the system is slow and defeats the purpose of having a specialized architecture.
- Extracts and cubes in situ aren’t so bad, as long as they are not a required first step to analysis.
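
The difference between extracting data and pushing computation down to it can be sketched in a few lines. This is only an illustration: an in-memory SQLite database stands in for the on-cluster engine, and the `events` table, its rows, and the function names are all hypothetical. The point is the number of rows that cross the boundary between engine and BI tool.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory table standing in for a large on-cluster dataset."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (region TEXT, amount INTEGER)")
    rows = [("east", 10), ("west", 20), ("east", 30), ("west", 40), ("east", 50)]
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    return conn


def totals_by_extract(conn):
    """Anti-pattern: move every row to the client, then aggregate there."""
    totals = {}
    rows_moved = 0
    for region, amount in conn.execute("SELECT region, amount FROM events"):
        totals[region] = totals.get(region, 0) + amount
        rows_moved += 1
    return totals, rows_moved


def totals_by_pushdown(conn):
    """Push the aggregation into the engine; only the summary rows move."""
    cursor = conn.execute("SELECT region, SUM(amount) FROM events GROUP BY region")
    result = cursor.fetchall()
    return dict(result), len(result)


conn = build_demo_db()
extract_totals, extract_rows = totals_by_extract(conn)
pushdown_totals, pushdown_rows = totals_by_pushdown(conn)
print("extract :", extract_totals, "rows moved:", extract_rows)
print("pushdown:", pushdown_totals, "rows moved:", pushdown_rows)
```

Both paths produce the same totals, but the extract path moves one row per event while the pushdown path moves one row per group. On a real cluster that gap grows with the data.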

Thou shalt not steal or violate corporate security policy

Security is Serious

- All the serious Big Data infrastructure vendors have implemented some form of security; your BI tool should support it.
- BI software shouldn’t require re-implementing all the access control rules all over again.
- RBAC: Role-Based Access Control.
- Single Sign-On, especially for embedded use cases.

Thou shalt not pay for every user or megabyte

Be wary of pricing models that penalize you for increased adoption

- We’ve seen Big Data deployments quadruple in size and adoption within a couple of months.
- Keep an eye out for licensing models that bill by user count or data size; both can grow much faster than you anticipate.

Thou shalt covet thy neighbor’s visualizations

First Class Support for Collaboration

PUBLISH
- Export to PDF or email is expected by everyone.
- Publish to a server to preserve interactivity instead of a static image.
- Supporting source data updates after publishing is even better.

SHARE
- Preserve data lineage and how.
- Network effects: GitHub for BI, with clone and fork.

Collaborative exploration is needed because in some cases no single person understands the entire data set.

Thou shalt analyze thine data in its natural form

This is What Big Data Looks Like
- Free form text
- Key Value Pairs
- JSON / Semi-Structured
- Tables

8=FIX.4.2^A9=145^A35=D^A34=4^A49=ABC_DEFG01^A52=20090323-15:40:29^A56=CCG^A115=XYZ^A11=NF0542/03232009^A54=1^A38=100^A55=CVS^A40=1^A59=0^A47=A^A60=20090323-15:40:29^A21=1^A207=N^A10=139^A

Don’t let your BI solution tell you otherwise.
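
The key-value example on the slide is a FIX 4.2 message, printed with `^A` standing in for the SOH (0x01) field delimiter. As a rough sketch of what analyzing data in its natural form can mean, the snippet below turns that string into tag/value pairs without any prior schema or ETL step. The `parse_fix` helper and the small `TAG_NAMES` lookup are illustrative assumptions, not part of any FIX library, and the lookup covers only a handful of standard tags.

```python
# The message from the slide; "^A" is the printed form of the SOH delimiter.
RAW_FIX = (
    "8=FIX.4.2^A9=145^A35=D^A34=4^A49=ABC_DEFG01^A52=20090323-15:40:29^A"
    "56=CCG^A115=XYZ^A11=NF0542/03232009^A54=1^A38=100^A55=CVS^A40=1^A"
    "59=0^A47=A^A60=20090323-15:40:29^A21=1^A207=N^A10=139^A"
)

# Hand-picked subset of standard FIX tag numbers, for readability only.
TAG_NAMES = {8: "BeginString", 35: "MsgType", 38: "OrderQty", 54: "Side", 55: "Symbol"}


def parse_fix(message: str, delimiter: str = "^A") -> dict:
    """Split a delimited FIX string into a {tag_number: value} dictionary."""
    fields = {}
    for pair in message.split(delimiter):
        if not pair:  # the trailing delimiter leaves one empty chunk
            continue
        tag, _, value = pair.partition("=")
        fields[int(tag)] = value
    return fields


fields = parse_fix(RAW_FIX)
for tag, name in TAG_NAMES.items():
    print(f"{name} ({tag}) = {fields[tag]}")
```

On a live feed the delimiter would be the raw byte, so the call would be `parse_fix(message, "\x01")`.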

Thou shalt not wait endlessly for thy results

No Surprise Here, Things Should Be Fast

Tricks legacy BI tools use to achieve performance

Take Samples of the Data
- Instant gratification, though the results may not be correct initially.
- How far down can the samples be pushed? You need to be cognizant of blocking operations.

Build an OLAP Cube
- This works pretty well once you’ve got a good idea of what metrics matter.
- Don’t get stuck with “cube first, results later”.
- Make sure your cubes can live on cluster or scale out easily.

Create Temp Tables
- This can be as simple as fancy caching. Make sure some of the tables can be intelligently reused.
- Materialize complex expressions so they don’t have to be recalculated every time.
- Store them on cluster where they belong. Be wary of extracts out.
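
The sampling trick can be sketched as a progressive estimate: answer quickly from a small random sample, then refine as more data is scanned. The snippet below is a minimal illustration under stated assumptions: the population is synthetic, `progressive_mean` is a hypothetical helper, and the sample sizes must be positive and increasing. It shows the "may not be correct initially" behavior for a mean. Blocking operations such as an exact sort or an exact distinct count need every row before they can answer, so they don't refine this way.

```python
import random


def progressive_mean(values, sample_sizes, seed=0):
    """Yield (sample_size, estimated_mean) for growing random samples.

    Assumes sample_sizes is positive and strictly increasing.
    """
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)  # a random prefix of a shuffle is a random sample
    running_total = 0.0
    consumed = 0
    for size in sample_sizes:
        size = min(size, len(shuffled))
        # Extend the previous sample instead of rescanning from scratch.
        running_total += sum(shuffled[consumed:size])
        consumed = size
        yield size, running_total / size


# Synthetic data standing in for a large table column.
data_rng = random.Random(42)
population = [data_rng.uniform(0, 100) for _ in range(100_000)]
true_mean = sum(population) / len(population)

estimates = list(progressive_mean(population, [100, 1_000, 10_000, 100_000]))
for size, estimate in estimates:
    error = abs(estimate - true_mean)
    print(f"n={size:>7}  estimate={estimate:8.3f}  error={error:.3f}")
```

The first line of output arrives after touching 0.1% of the data; the last one matches the full scan.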

Thou shalt not build reports but apps instead

What comes to mind when I say reports?

- Traffic Report
- Weather Report
- Book Report
- Report Card

What comes to mind when I say apps?

Visual Information Seeking Mantra

“overview, zoom and filter, then details on demand”

Rails made web apps easy; BI tools should do the same.
- Templates and reusable components.
- Decouple the data from the app to make it easy to manage and mass-produce multiple apps.

Async data from multiple sources.
- Pull in new data asynchronously without having to refresh the entire thing.
- Support auxiliary data sources and APIs to bring in richer content.

Interact with visual elements, not text boxes.
- We don’t want to deal with control box and parameter hell.
- We want to interact with the actual visual elements drawn and have the visualization update accordingly.
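
The "async data from multiple sources" point can be sketched with Python's `asyncio`: each panel of an app fetches from its own source, and each result is handled as soon as it arrives rather than after a full-page refresh. Everything here is hypothetical: the panel names, delays, and payloads are made up, and `asyncio.sleep` stands in for a real query or API call.

```python
import asyncio


async def fetch_panel(name: str, delay: float, payload: dict) -> tuple:
    """Hypothetical data source: the sleep stands in for a query or API call."""
    await asyncio.sleep(delay)
    return name, payload


async def load_dashboard() -> dict:
    # Start all fetches concurrently; no panel waits for an unrelated one.
    tasks = [
        asyncio.create_task(fetch_panel("sales", 0.03, {"total": 1200})),
        asyncio.create_task(fetch_panel("weather", 0.01, {"temp_f": 41})),
        asyncio.create_task(fetch_panel("traffic", 0.02, {"delay_min": 15})),
    ]
    panels = {}
    for finished in asyncio.as_completed(tasks):
        name, payload = await finished
        # A real app would redraw only this panel at this point.
        panels[name] = payload
    return panels


panels = asyncio.run(load_dashboard())
print(panels)
```

Total wait time is roughly that of the slowest source, not the sum of all three.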

Thou shalt use intelligent tools

“Smart” BI Tools will help the user out.

- Help with suggesting visualizations to create.
- Built-in search for everything.
- Automatically maintain models and caches so the burden isn’t on the end user.

Thou shalt go beyond the basics

You don’t ask the same questions of your Big Data?

Big Data is a gold mine of predictive and advanced analytics use cases.

Make sure some of that functionality is available in an easy-to-use manner.

Thou shalt use Arcadia Data

Just kidding

Arcadia Data Converged Analytics Platform

arcadiadata.com

Thank you.
