
Running Analytics at the Speed of Your Business


Page 1: Running Analytics at the Speed of Your Business

Home of Redis

Analytics at the Speed of Business with Redis and Spark

Leena Joshi, VP Product Marketing

Noel Yuhanna, Principal Analyst, Forrester

Page 2: Running Analytics at the Speed of Your Business


Agenda

• Why Data & Analytics Need to be Real Time

• Drivers and Challenges for Real-Time Analytics

• The Roadmap to Fast Data

• Recommendations

• Brief Introduction to Redis

• Analytics with Redis

• Redis-Spark Integration

• Making Analytics Cost Effective

• Extended analytics with Redis Modules

Noel Yuhanna – 20 min Leena Joshi – 20 min

Page 3: Running Analytics at the Speed of Your Business

Running Analytics at the Speed of Your Business

Noel Yuhanna, Principal Analyst

Redis Labs Webinar

Page 4: Running Analytics at the Speed of Your Business

© 2016 Forrester Research, Inc. Reproduction Prohibited 4

Data bottlenecks are creating business bottlenecks that are impacting growth and innovation!

Page 5: Running Analytics at the Speed of Your Business


Data is the new currency, the new oil.

Digital transformation is all about the data…

But what if your data is slow and is not being utilized for analytics, or not in a timely manner?

Page 6: Running Analytics at the Speed of Your Business

Today, business users think of analytics as a set of boring reports and dashboards … they don’t want yesterday’s data tomorrow!

12% of enterprise data is used for analytics.

Page 7: Running Analytics at the Speed of Your Business

Source: Forrester

Performance remains a key database challenge.

Page 8: Running Analytics at the Speed of Your Business


Trends affecting your database strategy

› Increasing transaction volume

› Data volume explosion

› Continuous 24x7 availability

› Stronger security measures

› All types of data formats

› New analytical requirements

› Faster access to information

› Correlated/unified data access

› More self-service capabilities

› Unpredictable workloads/patterns


Page 9: Running Analytics at the Speed of Your Business


Businesses want real-time access to information…

› Mobile devices – we need data now!

› Competitive pressure – to act more quickly

› Pressure from businesses (LOB) - to support real-time data access

› New insights, advanced analytics – real-time BI

› Global business – that needs global real-time access

› IoT applications – sensors, devices, …

› Lower cost of memory and computing

Page 10: Running Analytics at the Speed of Your Business


TREND – The need for Fast Data

[Figure labels: Mostly Batch; Real-time; FAST DATA]

Page 11: Running Analytics at the Speed of Your Business


What is Fast Data?

Fast Data is combining Systems of Engagement (batch) and Systems of Record (real-time) together quickly to support new next-generation business analytics.

Systems of engagement (SOE)

• Mobile, web, and smart devices

• Frequent changes

• Delight clients

• Delivered frequently

Systems of record (SOR)

• Stable requirements

• Highly transactional

• Less change

• Delivered infrequently

Forrester estimates that 20% of all data in an enterprise is Fast Data, and that will double over the next three years.

[Figure labels: Fast Data; Traditional Data; Real-time data]

Page 12: Running Analytics at the Speed of Your Business


Key capabilities you need for a Fast Data strategy

› Distributed In-memory computing layer

› Low-latency access to large volumes of data

› Ability to integrate data from disparate data sources

› Continuous availability of the database/data platform

› Support for scale-out architecture to support extreme scale

› Ability to support hybrid environment – on-prem and cloud

› Easy to deploy, highly automated and with built-in intelligence

Page 13: Running Analytics at the Speed of Your Business


Apache Spark offers new possibilities…

› Open source distributed computing framework that uses an in-memory platform to scale, process and provide low-latency access

› Key benefits: i) Performance, ii) Supports streaming and complex analytics, iii) Supports SQL, iv) Easy to write apps using Java, Scala or Python

› Use cases: i) Sensor data processing, ii) Stream processing, iii) Interactive analytics and data processing platform, iv) Interactive algorithms in machine learning, v) IoT analytics, vi) Complex analytics

› Adoption: Current adoption of Apache Spark is estimated at 30% in large enterprises and is likely to double in the next three years

Page 14: Running Analytics at the Speed of Your Business


Road Map for your Fast Data Strategy

Page 15: Running Analytics at the Speed of Your Business


Recommendations

› In the era of big data, you need to look beyond traditional data architectures to succeed and gain competitive advantage.

› A fast data strategy needs to be on your roadmap, focusing on making data available more quickly to business users and decision makers.

› Look for automation, simplification and ease-of-use in database solutions that can help support faster time-to-value initiatives.

› Look at in-memory and scale-out architectures to support new business analytics to grow business and innovate.

› Look at open source that can provide lower cost and deliver a platform to support your fast data strategy.

Page 16: Running Analytics at the Speed of Your Business


Thank you

Noel Yuhanna

www.forrester.com

Twitter: @nyuhanna

Page 17: Running Analytics at the Speed of Your Business


Who We Are

The open source home and commercial provider of Redis

Open source. The leading in-memory data structure store, supporting any high performance operational or analytic use case.

Page 18: Running Analytics at the Speed of Your Business


Redis is a Game Changer

Simplicity (through Data Structures)

Extensibility (through Redis Modules)

Performance

Data structures: Strings, Lists, Sets, Sorted Sets, Hashes, HyperLogLogs, Bitmaps, Bit Fields, Geospatial Indexes

Page 19: Running Analytics at the Speed of Your Business


Simplicity: Data Structures - Redis’ Building Blocks

Data structures: Strings, Lists, Sets, Sorted Sets, Hashes, HyperLogLogs, Bitmaps, Geospatial Indexes

• Used by developers like “Lego” blocks

• Enable data to be processed at the database level rather than the application level

• Turn complex functionality into a single command, such as “Get the e-mail address of the user with the highest score in a game that started on July 24th at 11:00pm PST”: ZREVRANGE 07242015_2300 0 0

• Enable solving complex problems by creating relations between data structures, using standard or custom (Lua) commands

• The result: cleaner, more elegant code and faster execution time

Page 20: Running Analytics at the Speed of Your Business


Extensibility: Modules Extend Redis Infinitely

• Add-ons that use a Redis API to seamlessly add new use cases and data structures to Redis

• Modules enjoy Redis’ simplicity, super high performance, infinite scalability and high availability

• Modules can be created by anyone. Certified by Redis Labs.

Example modules: Full Text Search, Enhanced JSON, Graph Operations, Secondary Indexes, Linear Algebra, SQL Support, Image Processing, N-Dimension Queries, …

Page 21: Running Analytics at the Speed of Your Business


Performance: the Most Powerful Database

Highest Throughput at Lowest Latency in High Volume of Writes Scenario

Lowest number of servers needed to deliver 1 Million writes/second

[Figure: bar chart of the number of servers needed to deliver 1 million writes/second]

Benchmarks performed by Avalon Consulting Group. Benchmarks published in the Google blog.

Page 22: Running Analytics at the Speed of Your Business


Wide Adoption

Redis Cloud: available since mid-2013, 6,100+ enterprise customers

Redis Labs Enterprise Cluster (RLEC): available since early 2015, 100+ enterprise customers

Page 23: Running Analytics at the Speed of Your Business

Why Use Redis in Analytics

Page 24: Running Analytics at the Speed of Your Business


Popular Redis Use Cases

• Geo Search: Location-based Applications

• Data Ingestion: High Throughput Buffering

• Social Functionality: Following, Followers, Relations

• Job & Queue: Any Business Application

• Caching: Any Web or Mobile App

• High Speed Transactions: Business Applications

• Time-Series: Time-Based Analysis

• Analytics: Real-time Computations

Page 25: Running Analytics at the Speed of Your Business


Example : Redis For Bid Management

The Application Problem

• Many users bidding on items

• Need to instantly show who’s leading, in what order and by how much

• May also need to display analytics, such as how many users are bidding in what range

• Disk-based DBMSs are too slow for real-time, high-scale calculations

Why Redis Rocks This

• Sorted sets automatically keep the list of users and scores updated and in order (ZADD)

• ZRANGE, ZREVRANGE will get your top users

• ZRANK will get any user’s rank instantaneously

• ZCOUNT will return a count of users in a range

• ZRANGEBYSCORE will return all the users in a range by their bids

Page 26: Running Analytics at the Speed of Your Business


Redis Sorted Sets

ZADD item:1 10000 id:2 21000 id:1
ZADD item:1 34000 id:3 35000 id:4
ZINCRBY item:1 10000 id:3
ZREVRANGE item:1 0 0
→ id:3

Resulting sorted set item:1 (highest bid first):
id:3 44000
id:4 35000
id:1 21000
id:2 10000
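The bid example above can be traced without a Redis server. What follows is a minimal sketch in plain Python that only models the sorted-set semantics; the class `SortedSetModel` and its method names are hypothetical stand-ins chosen to mirror the Redis commands, not a Redis client API. It uses `zrevrank` (the counterpart of ZRANK for highest-first ordering) because the leaderboard is read from the top.

```python
class SortedSetModel:
    """In-memory stand-in for one Redis sorted set (illustration only)."""

    def __init__(self):
        self.scores = {}  # member -> score

    def zadd(self, *pairs):
        # pairs are (score, member) tuples, mirroring ZADD's argument order
        for score, member in pairs:
            self.scores[member] = score

    def zincrby(self, increment, member):
        self.scores[member] = self.scores.get(member, 0) + increment
        return self.scores[member]

    def _ascending(self):
        # Redis orders by score, then lexicographically by member
        return sorted(self.scores, key=lambda m: (self.scores[m], m))

    def zrevrange(self, start, stop):
        # stop is inclusive, as in Redis; negative indexes are not modelled
        return self._ascending()[::-1][start:stop + 1]

    def zrevrank(self, member):
        return self._ascending()[::-1].index(member)

    def zcount(self, low, high):
        return sum(1 for s in self.scores.values() if low <= s <= high)

    def zrangebyscore(self, low, high):
        return [m for m in self._ascending() if low <= self.scores[m] <= high]


bids = SortedSetModel()
bids.zadd((10000, "id:2"), (21000, "id:1"))
bids.zadd((34000, "id:3"), (35000, "id:4"))
bids.zincrby(10000, "id:3")          # id:3 raises its bid to 44000

leader = bids.zrevrange(0, 0)        # current highest bidder
ranking = bids.zrevrange(0, 3)       # all four bidders, highest first
mid_range = bids.zrangebyscore(20000, 40000)  # bidders in a bid range
```

A real deployment would issue the same commands to Redis, which keeps the set ordered on every write instead of re-sorting on read as this model does.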

Page 27: Running Analytics at the Speed of Your Business


Example: Redis for Recommendations

The Application Problem

• Users, items, likes, dislikes, similarities

• Set comparisons of user likes and user dislikes should help create similarity scores, which can then be stored in a sorted set

• Set comparisons of similar users’ likes/dislikes with items not purchased by the current user should yield suggestions

• High speed and low latency requirements

Why Redis Rocks This

• Redis Sets are unordered collections of strings; SADD to add objects to each tag

• Set operations are executed in memory, at blazing fast speeds

• SINTER, SINTERSTORE to intersect multiple sets

• SUNIONSTORE to combine multiple sets

• SISMEMBER to determine membership, SMEMBERS to retrieve all values

• Sets and Sorted Sets combined are a great choice for recommendation engines

Page 28: Running Analytics at the Speed of Your Business


Redis Sets

SADD item:1 tag:1 tag:22 tag:24
SADD tag:1 item:1 item:3
SADD tag:2 item:22 item:14 item:3
SINTER tag:1 tag:2
→ item:3
SUNIONSTORE tag:x tag:1 tag:2
SMEMBERS tag:x
→ item:1 item:3 item:22 item:14

Resulting sets:
item:1 = {tag:1, tag:22, tag:24}
tag:1 = {item:1, item:3}
tag:2 = {item:22, item:14, item:3}
tag:x = {item:1, item:22, item:14, item:3}
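The set operations on this slide map directly onto ordinary set algebra. Here is a minimal sketch using plain Python sets as a stand-in for Redis Sets; the helper functions are hypothetical and named after the Redis commands purely for readability. It assumes tag:1 holds item:1 and item:3, as the slide's diagram shows.

```python
# Plain Python sets standing in for Redis Sets (illustration only).
store = {}  # key -> set of members


def sadd(key, *members):
    store.setdefault(key, set()).update(members)


def sinter(*keys):
    sets = [store.get(k, set()) for k in keys]
    return set.intersection(*sets) if sets else set()


def sunionstore(dest, *keys):
    # Stores the union under dest and returns its size, like SUNIONSTORE
    store[dest] = set().union(*(store.get(k, set()) for k in keys))
    return len(store[dest])


def smembers(key):
    return set(store.get(key, set()))


def sismember(key, member):
    return member in store.get(key, set())


sadd("item:1", "tag:1", "tag:22", "tag:24")
sadd("tag:1", "item:1", "item:3")
sadd("tag:2", "item:22", "item:14", "item:3")

common = sinter("tag:1", "tag:2")                    # items carrying both tags
union_size = sunionstore("tag:x", "tag:1", "tag:2")  # items carrying either tag
```

Because a set holds each member once, item:3 appears a single time in tag:x even though it belongs to both source sets.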

Page 29: Running Analytics at the Speed of Your Business

Redis & Spark

Page 30: Running Analytics at the Speed of Your Business


Spark & Redis – Serving Layer & Accelerator

Internal accelerator

Page 31: Running Analytics at the Speed of Your Business


Accelerate Spark Time-Series with Redis

Redis sorted sets accelerate time series data processing by 100 times compared to other in-memory K/V stores

Example time series data: Stock prices for 1024 stocks over 32 years
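The slide does not show how the series are laid out, so the following is only a sketch under one common assumption: each stock gets its own sorted set, with the timestamp as the score, so that a date-range query becomes a single range lookup (ZRANGEBYSCORE in Redis). The class `TimeSeriesModel` is a hypothetical plain-Python model of that idea, and the prices are made-up illustration values.

```python
from bisect import bisect_left, bisect_right, insort


class TimeSeriesModel:
    """Timestamp-ordered list standing in for a Redis sorted set."""

    def __init__(self):
        self.points = []  # (timestamp, value) tuples kept in timestamp order

    def add(self, timestamp, value):
        # Keeps the list ordered on insert, as a sorted set does on ZADD
        insort(self.points, (timestamp, value))

    def range_by_time(self, start, end):
        # Inclusive on both ends, like ZRANGEBYSCORE with plain bounds
        lo = bisect_left(self.points, (start,))
        hi = bisect_right(self.points, (end, float("inf")))
        return self.points[lo:hi]


prices = TimeSeriesModel()
# Hypothetical day numbers and prices, inserted out of order on purpose
prices.add(3, 101.5)
prices.add(1, 100.0)
prices.add(2, 100.75)
prices.add(4, 99.25)

window = prices.range_by_time(2, 3)  # all points from day 2 through day 3
```

The point of the layout is that a time window is found by binary search over already-ordered data rather than by scanning and filtering every record.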

Page 32: Running Analytics at the Speed of Your Business


Accelerating Spark Time-Series with Redis

Redis is faster by up to 100 times compared to HDFS and over 45 times compared to Tachyon or Spark

Page 33: Running Analytics at the Speed of Your Business


More Details About the Redis & Spark Integration

Github link: Spark-Redis Connector Package https://github.com/RedisLabs/spark-redis

How to get started with Spark and Redis: https://redislabs.com/solutions/spark-and-redis

Blog: https://redislabs.com/blog/connecting-spark-and-redis

Page 34: Running Analytics at the Speed of Your Business

Cost Effective Analytics

Page 35: Running Analytics at the Speed of Your Business


Price/Performance of Memory Technology

Page 36: Running Analytics at the Speed of Your Business


Redis on Flash

Flash used as a RAM extender and NOT as persistent storage

Page 37: Running Analytics at the Speed of Your Business


How to Achieve Optimal Price/Performance

By dynamically setting the RAM/Flash ratio

Behind the scenes…

Page 38: Running Analytics at the Speed of Your Business


Single Server Results with Dell & Samsung NVMe

>2M ops/sec, <1 msec latency, >1GB disk bandwidth

Throughput (ops/sec): Avg 2.04M; Max 2.14M

Latency (msec): Avg 0.91; Max 0.98; 100% below 1 msec

Disk bandwidth: Avg 313 MB/sec read / 9.4 MB/sec write; Max 1.71 GB/sec read / 96 MB/sec write

Network bandwidth: Avg 1.45 Gbps (Tx) / 0.97 Gbps (Rx); Max 1.6 Gbps (Tx) / 1.2 Gbps (Rx)

Test setup:

• Redis Labs Enterprise Cluster v3.2

• Dell Xeon CPU E5-2670 v3 @ 2.50GHz

• 4x Samsung NVMe PM1725

• Memtier benchmark (open source tool)

• 100B object size

• 80% read

• 20% write

Page 39: Running Analytics at the Speed of Your Business


Customer Example : Redis on Flash

• Genome dataset: 31TBs of raw data

• Optimized data set through encoding and using Redis Hashes

• Resulting data runs high speed analyses with 55GB of RAM and 4.5TB of Flash

• 97% annual savings compared to a pure RAM solution

              Redis on RAM              Redis on Flash
RAM size      5TB                       0.5TB
Flash size    N/A                       4.5TB
Servers       on AWS: 21x r3.8xlarge    on P8: 2x S822LC
1yr costs     $489,333                  $15,677
P8 savings                              97%
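As a quick check on the savings figure, the two 1-year costs in the comparison above can be plugged into a few lines of Python. The variable names are illustrative; the dollar amounts are the ones from the slide.

```python
ram_only_cost = 489_333   # 1-year cost, Redis on RAM (21x r3.8xlarge on AWS)
flash_cost = 15_677       # 1-year cost, Redis on Flash (2x P8 servers)

savings = 1 - flash_cost / ram_only_cost   # fraction of the RAM-only cost saved
savings_pct = round(savings * 100)         # whole-percent figure, as on the slide
cost_ratio = ram_only_cost / flash_cost    # how many times cheaper the Flash setup is
```

The result rounds to the 97% quoted on the slide, which corresponds to the RAM-only deployment costing roughly 31 times as much per year.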

Page 40: Running Analytics at the Speed of Your Business

Extending Redis Analytics


Page 41: Running Analytics at the Speed of Your Business

What Can Modules Do

• All modules are certified by Redis Labs for full compliance with OSS Redis, Redis Cloud and Redis Labs Enterprise Cluster (RLEC)

Example modules: Full Text Search, Enhanced JSON, Graph Operations, Secondary Indexes, Linear Algebra, SQL Support, Image Processing, N-Dimension Queries, …

Page 42: Running Analytics at the Speed of Your Business

RediSearch: the world's fastest text search engine

Average latency (msec), RLEC / Elasticsearch / Solr:
Full text search: 3.15 / 21.00 / 24.57
Prefix search: 2.40 / 8.70 / 10.61

Ops/sec, RLEC / Elasticsearch / Solr:
Full text search: 20,045 / 690 / 621
Prefix search: 6,831 / 3,686 / 3,133

Chart callouts: throughput 32x higher (full text search) and 85% higher (prefix search); latency 7.8x faster (full text search) and 4.1x faster (prefix search)

Page 43: Running Analytics at the Speed of Your Business


Redis Module Hub (www.redismodules.com)

Page 44: Running Analytics at the Speed of Your Business

Redis Labs proprietary & confidential information

Next Steps

Learn More:

Redis with Spark: https://redislabs.com/solutions/spark-and-redis

Redis on Flash : https://redislabs.com/solutions/redis-for-very-large-datasets

Redis Modules : www.redismodules.com


Page 45: Running Analytics at the Speed of Your Business

Home of Redis

Questions?

@socialeena