ScaleDB: Persistence for Stream Data
ScaleDB: Big Fast Data w/ MariaDB
Axes: Data Velocity (driven by performance) and Data Volume (driven by cost, DRAM vs. disk)
In-Memory: SAP HANA, BigQuery
Disk: MariaDB, Oracle, SQL Server, etc.
High-Velocity / Disk: ScaleDB
Disk: Hadoop

Demo
Payment Table
  P.K.
  FK: Account, Time
  Fields: Store, Amount, Coupon
Inserts
Lookup by Primary Key
Lookup by Account (Foreign Key)
Complex queries: BI & analytics

Demo
Copyright 2014 ScaleDB. The information contained herein is subject to change without notice.

ScaleDB's Solution
1M Inserts/Second (indexed) with Simultaneous Queries
Commodity Cloud Instance
  Total: 6 nodes, 48 cores, 0.2 TB main memory
  ~1M inserts/second, cost is less than $15,000
SAP HANA (in-memory DBMS)
  Cluster total: 100 nodes, 4,000 cores, 100 TB of main memory
  1.5M inserts/second (Vishal Sikka, SAP TechEd)
  In memory: DRAM cost alone is ~$2M
More Than 2 Orders of Magnitude Cost Advantage
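The "more than 2 orders of magnitude" claim can be checked with a few lines of arithmetic. The sketch below uses only the two cost figures quoted on the slide (under $15,000 for the 6-node cloud setup, roughly $2M for the DRAM in the 100-node SAP HANA cluster) and treats them as exact for illustration. The constant and function names are made up for this example.

```python
import math

# Figures quoted on the slide, treated as exact here (an assumption).
SCALEDB_CLUSTER_COST_USD = 15_000   # "cost is less than $15,000" (6 nodes)
HANA_DRAM_COST_USD = 2_000_000      # "DRAM cost alone is ~$2M" (100 TB)
HANA_DRAM_TB = 100

def cost_ratio(expensive: float, cheap: float) -> float:
    """Return how many times more the expensive option costs."""
    return expensive / cheap

ratio = cost_ratio(HANA_DRAM_COST_USD, SCALEDB_CLUSTER_COST_USD)
orders_of_magnitude = math.log10(ratio)
dram_usd_per_tb = HANA_DRAM_COST_USD / HANA_DRAM_TB

print(f"cost ratio: {ratio:.0f}x")                         # cost ratio: 133x
print(f"orders of magnitude: {orders_of_magnitude:.2f}")   # orders of magnitude: 2.12
print(f"implied DRAM price: ${dram_usd_per_tb:,.0f}/TB")   # implied DRAM price: $20,000/TB
```

Because the SAP HANA figure counts DRAM alone and the ScaleDB figure is stated as an upper limit, the real gap in hardware cost would be at least this large.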

Data Volumes are Exploding
Charts: Tweets per Day; iPhone Downloads; AWS S3 & Dropbox Data Objects
Driven by new data sources and data types: Devices, Social, Log Files, Analytics, Business

Faster Insights = More Value
Response Latency: 0 ms; milliseconds to minutes; later, possibly much later
Value of the Data to Users/Advertisers: Higher to Lower
Twitter Storm
(Complements Kinesis, Storm, etc.)

Big Data / Fast Data
Big Data: Pools of Data at Rest; Batch (programmatic) Processing; Hadoop
Fast Data: Real-Time Data; Ad Hoc (SQL) Processing; ScaleDB & Stream Processors
Also shown: Twitter Storm, MillWheel, BigQuery

Hadoop's Batch Processing
"MapReduce technologies are good at handling large volumes of data. But they are fundamentally batch-based, and struggle with enabling real-time decisions on a never-ending and never fully complete stream of data."
Terry Hanold, Vice President of New Business Initiatives, Amazon AWS

Fast Data: The Car Metaphor
Panel 1: Limited View / Real-Time Data; No Historical View
Panel 2: Historical View; Batch Lag
Panel 3: Real-Time Data; Historical View; SQL Support

DRAM Too Expensive for Stream Data
Media Costs Based upon Data Volume (DRAM vs. Disk)
This is why Amazon uses disk-based S3 (non-DBMS) for Kinesis
1M inserts/second (100 byte rows), 24 hours = >8.5 TB/day
Disk Media Cost = ~$370
DRAM Media Cost = ~$172,800 (>450X more)

Volume        DRAM           Disk
1 TB          $20,000        $43
10 TB         $200,000       $430
100 TB        $2,000,000     $4,300
1 Petabyte    $20,000,000    $43,000

But Data Volumes Increase 78% CAGR
According to IDC [1] and Gartner [2], data volumes have been measured to increase ten-fold every five years.
1. Gantz, John F. The Diverse and Exploding Digital Universe: An Updated Forecast of Worldwide Information Growth Through 2011. Tech. An IDC White Paper.
2. Paquet, Raymond. Technology Trends You Can't Afford to Ignore. Lecture. Gartner Webinar. Gartner.com. Gartner Inc., Jan. 2010.

In-Memory & Big Data
Data Volume Growth Dramatically Outpaces DRAM Affordability
Chart axes (two charts): Years; Increase Multiplier (Volume/Affordability)

ScaleDB: Big Fast Data w/ MariaDB
Axes: Data Velocity (driven by performance) and Data Volume (driven by cost, DRAM vs. disk)
In-Memory: SAP HANA, BigQuery
Disk: MariaDB, Oracle, SQL Server, etc.
High-Velocity / Disk: ScaleDB
Disk: Hadoop
1,000,000 Inserts per Second
BigQuery Cost: $86,400/day
ScaleDB Cost*: $46/day
* AWS: $28 for 8.4 TB storage, $18 for 6 instances of heavy usage, EBS optimized

How it Works

Scaling the Database
Diagram labels: Storage Instance, DBMS Instance, MariaDB, MyISAM, InnoDB, Data Storage, MariaDB, ScaleDB, ScaleDB

Scaling the Database Tier
Diagram: 2 Storage Instances, 4 DBMS Instances, Cluster Manager

Scaling the Storage Tier
Diagram: Cluster Manager, 5 Storage Instances, 4 DBMS Instances

High-Availability
Diagram: Cluster Manager, 5 Storage Instances, 4 DBMS Instances, Mirrored Volumes

NoSQL v. MySQL
Function                                               NoSQL                                  ScaleDB
Transactions                                           No                                     Yes
Joins                                                  No                                     Yes
Data Consistency                                       No (Eventual)                          Yes
SQL Support                                            No                                     Yes
ACID Compliant                                         No                                     Yes
Mature Ecosystem (e.g. MySQL tools, apps, developers)  No                                     Yes
Optimal for Analytics / BI / Reporting                 No                                     Yes
Disk-Based Insert Performance                          25,000-40,000/second                   1,000,000/second
Ideal Use Case                                         Storing/Accessing Individual Objects   Processing Large Quantities of Data

Push-Down: Distributed Parallel Processing
Diagram labels: MariaDB; ScaleDB; ScaleDB Storage (x3); Query (x4); Response (x4)
Push Processing to the Data
Result: High-Performance Parallel Processing
Similar to Map/Reduce
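
The push-down idea on the slide above (send the query to each storage node, let every node process its own data in parallel, then merge the small responses) can be sketched as a scatter-gather aggregation. This is a minimal illustration and not ScaleDB's implementation: the `StorageNode` class, the `push_down_sum` helper, and the payment rows are all hypothetical, and threads stand in for separate storage instances.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class StorageNode:
    """Hypothetical storage node holding one partition of payment rows."""
    rows: list  # each row is (account, store, amount)

    def partial_sum(self, account: int) -> tuple:
        # Runs next to the data: filter and aggregate locally,
        # returning only a small (total, count) pair.
        amounts = [amt for acct, _store, amt in self.rows if acct == account]
        return sum(amounts), len(amounts)

def push_down_sum(nodes, account):
    """Send the same query to every node in parallel and merge the responses."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(lambda n: n.partial_sum(account), nodes))
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total, count

# Three partitions of a payment table (made-up data).
nodes = [
    StorageNode([(1, "A", 10.0), (2, "B", 5.0)]),
    StorageNode([(1, "C", 2.5)]),
    StorageNode([(3, "A", 7.0), (1, "B", 1.5)]),
]
print(push_down_sum(nodes, account=1))  # (14.0, 3)
```

Only the per-node `(total, count)` pairs travel back to the coordinator, which is the property that makes the approach resemble Map/Reduce.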
Customer Success Story: Statricks
Target: 300M-450M Listings per Day
From: eBay, Craigslist ...
Processing:
  Price Trends
  Listing Longevity
  Spam Detection
  Ad Metrics
  Price Trend Time Series
  Statistical Analysis

Thank You