ScaleDB: Persistence for Stream Data



ScaleDB: Big Fast Data w/ MariaDB

(Quadrant chart. Axes: Data Velocity, driven by performance; Data Volume, driven by cost, DRAM vs. disk.)

  • In-Memory: SAP HANA, BigQuery
  • Disk: MariaDB, Oracle, SQL Server, etc.
  • High-Velocity / Disk: ScaleDB
  • Disk: Hadoop

Demo

Payment table:

  • P.K.
  • FK: Account, Time
  • Fields: Store, Amount, Coupon

Operations shown:

  • Inserts
  • Lookup by primary key
  • Lookup by account (foreign key)
  • Complex queries: BI & analytics

Copyright 2014 ScaleDB. The information contained herein is subject to change without notice.

ScaleDB's Solution

1M Inserts/Second (indexed) with Simultaneous Queries

  • Commodity cloud instances
    Total: 6 nodes, 48 cores, 0.2 TB main memory
    ~1M inserts/second, cost is less than $15,000
  • SAP HANA (in-memory DBMS)
    Cluster total: 100 nodes, 4,000 cores, 100 TB of main memory
    1.5M inserts/second (Vishal Sikka, SAP TechEd)
    In memory: DRAM cost alone is ~$2M

More Than 2 Orders of Magnitude Cost Advantage
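
The "more than 2 orders of magnitude" claim can be checked against the two cost figures quoted on the slide. The short Python sketch below does only that arithmetic. The variable and function names are made up for illustration, and the result is the raw cost ratio as the slide states it, not normalized for the different insert rates (1M vs. 1.5M inserts/second).

```python
import math

# Figures quoted on the slide (approximate, as stated by the deck).
scaledb_cluster_cost_usd = 15_000   # "cost is less than $15,000"
hana_dram_cost_usd = 2_000_000      # "DRAM cost alone is ~ $2M"
hana_dram_tb = 100                  # "100TB of main memory"


def cost_advantage(expensive: float, cheap: float) -> tuple[float, float]:
    """Return (ratio, orders of magnitude) between two costs."""
    ratio = expensive / cheap
    return ratio, math.log10(ratio)


ratio, orders = cost_advantage(hana_dram_cost_usd, scaledb_cluster_cost_usd)

# DRAM price per terabyte implied by the slide's own numbers.
implied_dram_usd_per_tb = hana_dram_cost_usd / hana_dram_tb

print(f"cost ratio: {ratio:.0f}x ({orders:.2f} orders of magnitude)")
print(f"implied DRAM price: ${implied_dram_usd_per_tb:,.0f} per TB")
```

Because $15,000 is given as an upper bound for the ScaleDB cluster and $2M covers DRAM alone for the SAP HANA cluster, the ratio computed here is a lower bound on the hardware-cost gap the slide describes.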

Data Volumes are Exploding

(Charts: Tweets per Day; iPhone Downloads; AWS S3 & Dropbox Data Objects.)

Driven by new data sources and data types:

  • Devices
  • Social
  • Log Files
  • Analytics
  • Business

Faster Insights = More Value

(Chart: value of the data to users/advertisers, from higher to lower, against response latency.)

  • 0 ms
  • Milliseconds to minutes
  • Later. Possibly much later

Labeled on the chart: Twitter Storm. (Complements Kinesis, Storm, etc.)

Big Data / Fast Data

  • Big Data: pools of data at rest; batch (programmatic) processing; Hadoop
  • Fast Data: real-time data; ad hoc (SQL) processing; ScaleDB & stream processors

Logos shown: Twitter Storm, MillWheel, BigQuery

Hadoop's Batch Processing

"MapReduce technologies are good at handling large volumes of data. But they are fundamentally batch-based, and struggle with enabling real-time decisions on a never-ending, and never fully complete, stream of data."

Terry Hanold, Vice President of New Business Initiatives, Amazon AWS

Fast Data: The Car Metaphor

  • Limited view / real-time data; no historical view
  • Historical view; batch lag
  • Real-time data; historical view; SQL support

DRAM Too Expensive for Stream Data

Media Costs Based upon Data Volume (DRAM vs. Disk)

  Volume        DRAM           Disk
  1 TB          $20,000        $43
  10 TB         $200,000       $430
  100 TB        $2,000,000     $4,300
  1 Petabyte    $20,000,000    $43,000

  • This is why Amazon uses disk-based S3 (non-DBMS) for Kinesis
  • 1M inserts/second (100-byte rows) for 24 hours = >8.5 TB/day
  • Disk media cost = ~$370
  • DRAM media cost = ~$172,800 (>450X more)

But Data Volumes Increase 78% CAGR

According to IDC [1] and Gartner [2], data volumes have been measured to increase ten-fold every five years.

  1. Gantz, John F. The Diverse and Exploding Digital Universe: An Updated Forecast of Worldwide Information Growth Through 2011. An IDC White Paper.
  2. Paquet, Raymond. Technology Trends You Can't Afford to Ignore. Lecture. Gartner Webinar. Gartner.com. Gartner Inc., Jan. 2010.

In-Memory & Big Data

Data Volume Growth Dramatically Outpaces DRAM Affordability

(Two charts. Axes: Years; Increase Multiplier (Volume/Affordability).)

ScaleDB: Big Fast Data w/ MariaDB

(Same quadrant chart as the opening slide. Axes: Data Velocity, driven by performance; Data Volume, driven by cost, DRAM vs. disk.)

  • In-Memory: SAP HANA, BigQuery
  • Disk: MariaDB, Oracle, SQL Server, etc.
  • High-Velocity / Disk: ScaleDB
  • Disk: Hadoop

Cost at 1,000,000 inserts per second:

  • BigQuery cost: $86,400/day
  • ScaleDB cost*: $46/day

* AWS: $28 for 8.4 TB storage, $18 for 6 instances of heavy usage, EBS optimized

How it Works

Scaling the Database

(Diagram. Labels: DBMS Instance, Storage Instance, MariaDB, MyISAM, InnoDB, ScaleDB, Data Storage.)

Scaling the Database Tier

(Diagram: Cluster Manager, four DBMS instances, two storage instances.)

Scaling the Storage Tier

(Diagram: Cluster Manager, four DBMS instances, five storage instances.)

High-Availability

(Diagram: Cluster Manager, four DBMS instances, five storage instances, mirrored volumes.)

NoSQL v. MySQL

  Function                                               NoSQL                                 ScaleDB
  Transactions                                           No                                    Yes
  Joins                                                  No                                    Yes
  Data Consistency                                       No (Eventual)                         Yes
  SQL Support                                            No                                    Yes
  ACID Compliant                                         No                                    Yes
  Mature Ecosystem (e.g. MySQL tools, apps, developers)  No                                    Yes
  Optimal for Analytics / BI / Reporting                 No                                    Yes
  Disk-Based Insert Performance                          25,000-40,000/second                  1,000,000/second
  Ideal Use Case                                         Storing/Accessing Individual Objects  Processing Large Quantities of Data

Push-Down: Distributed Parallel Processing

(Diagram. Labels: MariaDB, ScaleDB, ScaleDB Storage, Query (x4), Response (x4).)

  • Push processing to the data
  • Result: high-performance parallel processing
  • Similar to Map/Reduce
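
The push-down idea on the slide above (send the work to where the data lives, then combine the small per-node results) can be illustrated with a generic scatter-gather sketch in Python. This is not ScaleDB's implementation or API. The node data, the function names, and the use of threads to stand in for separate storage nodes are all assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands in for the rows held by one
# storage node. Rows are (store, amount) pairs invented for illustration.
STORAGE_NODES = [
    [("north", 10.0), ("south", 5.0)],
    [("north", 2.5), ("east", 7.0)],
    [("south", 1.0), ("east", 3.0), ("north", 4.0)],
]


def partial_totals(rows):
    """Runs 'on the storage node': aggregate local rows, return a small result."""
    totals = Counter()
    for store, amount in rows:
        totals[store] += amount
    return totals


def pushed_down_total_by_store(nodes):
    """Coordinator: send the same work to every node in parallel, then merge."""
    merged = Counter()
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        for partial in pool.map(partial_totals, nodes):
            merged.update(partial)  # Counter.update adds values per key
    return dict(merged)


totals = pushed_down_total_by_store(STORAGE_NODES)
print(totals)
```

Only the per-store partial sums travel back to the coordinator, not the raw rows, which is the sense in which the slide compares the approach to Map/Reduce.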

Customer Success Story

Customer Success Story: Statricks

  • Target: 300M-450M listings per day
  • From: eBay, Craigslist
  • Processing: price trends, listing longevity, spam detection, ad metrics, price trend time series, statistical analysis

Thank You