Real-time Analytics Powered by GPU-Accelerated Databases Chris Prendergast and Woody Christy GTC, May 8, 2017


Wins IDC HPC Innovation Excellence Award for work with US Postal Service.

Kinetica Background

2009 2012

United States Army Intelligence seeks a means to assess terrorist and other national security threats.

No database in the market was fast or flexible enough to meet their needs.

Founders Amit Vij and Nima Negahban start on the pioneering use of GPUs while building a GPU-accelerated database from the ground up.

2014

Commercialization entered production with USPS.

2016

Rebranded to Kinetica. Seed funding. Moved HQ to San Francisco. Expanded management team. Hired field team.

Wins IDC HPC Innovation Excellence Award for work with US Army.

GPUdb goes live with the US Army Intelligence.

Patent granted for “Method and system for improving computational concurrency using a multi-threaded GPU calculation engine”


Evolution of Analytics


Simple Reporting: List customer energy consumption in the past 3 years.

Standard Analytics: What is the average consumption by region monthly? Per household? Residential vs. commercial?

Real-time Analytics: What is the current energy consumption by a region/household? How does that compare to historic averages? How does it compare to other regions?

Machine Learning: Given location, history, demographic, usage, what is the likelihood of service issues/outage?

Deep Learning: Deduce from unspecified signals across a wide range of datasets the likelihood this customer will consume more/less energy? Have service interruption?

GPU Acceleration
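The first three stages above can be pictured as progressively harder queries over the same table. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for any SQL database; the `consumption` table, its columns, and the sample rows are invented for illustration and do not come from the slides.

```python
import sqlite3

# Hypothetical schema and sample rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE consumption (customer TEXT, region TEXT, month TEXT, kwh REAL)"
)
conn.executemany(
    "INSERT INTO consumption VALUES (?, ?, ?, ?)",
    [
        ("a", "north", "2017-03", 300.0),
        ("a", "north", "2017-04", 320.0),
        ("b", "north", "2017-04", 280.0),
        ("c", "south", "2017-03", 500.0),
        ("c", "south", "2017-04", 700.0),
    ],
)

# Simple reporting: list one customer's consumption history.
history = conn.execute(
    "SELECT month, kwh FROM consumption WHERE customer = ? ORDER BY month",
    ("a",),
).fetchall()

# Standard analytics: average consumption by region and month.
monthly_avg = conn.execute(
    "SELECT region, month, AVG(kwh) FROM consumption "
    "GROUP BY region, month ORDER BY region, month"
).fetchall()

# Real-time analytics: current month versus the historic average, per region.
current_month = "2017-04"
vs_history = conn.execute(
    "SELECT region, "
    "AVG(CASE WHEN month = ? THEN kwh END) AS current_avg, "
    "AVG(CASE WHEN month < ? THEN kwh END) AS historic_avg "
    "FROM consumption GROUP BY region ORDER BY region",
    (current_month, current_month),
).fetchall()
```

The real-time question differs from the other two mainly in that its answer changes with every new event, so it has to be evaluated against the live table rather than a report prepared earlier.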

GPU Acceleration Overcomes Processing Bottlenecks


4,000+ cores per device in many cases, versus 16 to 32 cores per typical CPU-based device.

High-performance computing is trending toward using GPUs to solve massive processing challenges.

GPU acceleration brings high-performance compute to commodity hardware.

Parallel processing is ideal for scanning an entire dataset and for brute-force compute.

GPUs are designed around thousands of small, efficient cores that are well suited to performing repeated similar instructions in parallel. This makes them well suited to the compute-intensive workloads required of large datasets.
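The pattern described here (apply the same small operation to every slice of the data, then combine the partial results) can be sketched on the CPU. This is not GPU code and says nothing about how Kinetica implements its scans; the function names are hypothetical, and Python threads are used only to show the shape of the computation.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_chunk(chunk, threshold):
    """Apply the same filter-and-sum step to one slice of the data."""
    return sum(v for v in chunk if v > threshold)


def parallel_scan(values, threshold, workers=4):
    """Brute-force scan: split the data, scan every chunk, combine partial sums."""
    size = max(1, (len(values) + workers - 1) // workers)
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda c: scan_chunk(c, threshold), chunks)
        return sum(partials)


readings = list(range(1, 1001))  # made-up sample data
total = parallel_scan(readings, threshold=900)
```

No index is consulted: every value is visited once, and the work divides evenly across however many workers are available, which is the property that lets thousands of GPU cores be kept busy.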

Kinetica: A Distributed, In-Memory Database


GPU-accelerated database operations

Natural language processing based full-text search

Native GIS and IP-address object support

Real-time data handlers to ingest structured and unstructured data

Deep integration with open source and commercial frameworks and applications: Hadoop, Spark, NiFi, Accumulo, H2O, Tableau, Kibana and Caravel

Predictable scale-out for data ingestion and querying

No typical tuning, indexing, and tweaking

Distributed visualization pipeline built in

Kinetica: Unique Strengths & Capabilities

Fast, Distributed, OLAP Engine for Fast Moving, Large Scale Data


OLAP Performance, Scalability, Stability

Geospatial Processing & Visualization

API for GPU Powered Data & Compute Orchestration

Converged AI and BI

Native Geospatial and Visualization Pipeline

Fast Data

In-Database Analytics

Interactive Location-Based Analytics

Challenges with Lambda and Kappa Architectures

What is the main problem?

Database or cache system serving up pre-computed aggregates

It also takes a lot of effort to re-compute aggregates and to load the serving database or cache


Performance BI

0.09s

2.5s

Query 2: Sum aggregation with a subquery aggregation joining both tables

LARGE TELCO

Leading Enterprise Database


345s

44s

0.65s

0.68s

CASE STUDY

Leading Enterprise Database

Query 1: Simple average calculation on the 1.8B row table

Real-Time, Advanced Analytics, Speed Layer for Teradata or Oracle


Parallel ingestion of events

Lambda-type architecture for Teradata or Oracle

Kinetica is the speed layer with real-time analytic capabilities for millisecond SLAs

Converge Machine Learning, Deep Learning, NLP, streaming and location analytics and fast Query, Reporting & Analytics with Kinetica & Teradata/Oracle

Diagram labels: DATA IN MOTION AND REST; DATA WAREHOUSE/TRANSACTIONAL; Amazon Kinesis; ANALYSTS; MOBILE USERS; DASHBOARDS & APPLICATIONS; ALERTING SYSTEMS; Kinetica Connectors; STREAM/ETL PROCESSING; Fast GPU-accelerated, in-memory database; Converge ML, DL, streaming, location, and QR&A.

Speed Layer for Hadoop


Parallel Ingestion

Parallel ingestion of events

Kinetica is the speed layer with real-time analytic capabilities

HDFS for archival store

Much looser coupling than traditional lambda architecture

Batch-mode Spark or MR jobs can push data to Kinetica as needed for fast query on data loaded from HDFS

Diagram labels: EVENTS; MESSAGE BROKERS; Amazon Kinesis; ANALYSTS; MOBILE USERS; DASHBOARDS & APPLICATIONS; ALERTING SYSTEMS; Put, get, scan; Execute complex analytics on the fly; Kinetica Connectors; STREAM PROCESSING; HDFS (Hadoop Distributed File System).

• No need to regularly recompute aggregates.

• No need to load and manage a separate serving system or cache to make deep historical aggregates available to your stream processing code.

• Aggregates are always up to date, as they are computed on demand; the latest events are always included.

• Better performance with significantly reduced operational complexity, hardware footprint and cost.
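The difference between serving pre-computed aggregates and computing them on demand can be shown with a toy model. Both classes below are hypothetical illustrations, not Kinetica APIs: one answers from a cache that a batch job must rebuild, the other scans its events at query time.

```python
from collections import defaultdict


class PrecomputedServing:
    """Serving layer that answers from totals built by a periodic batch job."""

    def __init__(self):
        self.events = []
        self.cached_totals = {}

    def ingest(self, region, kwh):
        self.events.append((region, kwh))

    def recompute(self):
        # Batch step: rebuild every aggregate and reload the cache.
        totals = defaultdict(float)
        for region, kwh in self.events:
            totals[region] += kwh
        self.cached_totals = dict(totals)

    def total(self, region):
        # Events ingested since the last recompute() are not visible here.
        return self.cached_totals.get(region, 0.0)


class OnDemandStore:
    """Store that scans its events at query time, so no cache can go stale."""

    def __init__(self):
        self.events = []

    def ingest(self, region, kwh):
        self.events.append((region, kwh))

    def total(self, region):
        return sum(kwh for r, kwh in self.events if r == region)


batch, live = PrecomputedServing(), OnDemandStore()
for store in (batch, live):
    store.ingest("north", 10.0)
batch.recompute()
for store in (batch, live):
    store.ingest("north", 5.0)  # arrives after the last batch run

stale_total = batch.total("north")  # reflects only the first event
fresh_total = live.total("north")   # includes the latest event
```

The on-demand version trades a scan per query for the removal of the recompute-and-reload step, which is only attractive when scans are fast enough to meet the query latency target.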

SIMPLIFY YOUR ARCHITECTURE

STREAMING ANALYTICS, SIMPLIFIED

Diagram labels: EVENTS; MESSAGE BROKERS; Amazon Kinesis; ANALYSTS; MOBILE USERS; DASHBOARDS & APPLICATIONS; ALERTING SYSTEMS; PUT, GET, SCAN; Execute complex analytics on the fly; Kinetica Connectors; STREAM PROCESSING.

INTELLIGENCE: US Army - INSCOM

US Army’s in-memory computational engine for any data with a geospatial or temporal attribute for a major joint cloud initiative within the Intelligence Community (ICITE).

Intel analysts are able to conduct near real-time analytics and fuse SIGINT, ISR, and GEOINT streaming big data feeds and visualize in a web browser.

First time in history military analysts are able to query and visualize billions to trillions of near real-time objects in a production environment.

Major executive military and congressional visibility.

Oracle Spatial (92 Minutes)

42x Lower Space, 28x Lower Cost, 38x Lower Power Cost

U.S. Army INSCOM Shift from Oracle to GPUdb

GPUdb (20 ms)

1 GPUdb server vs 42 servers with Oracle 10gR2 (2011)
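Taking the two timings on this slide at face value, and assuming they describe the same query (the slide does not say so explicitly), the implied speedup can be worked out directly. The variable names are illustrative.

```python
# Timings as printed on the slide, converted to milliseconds.
oracle_ms = 92 * 60 * 1000   # Oracle Spatial: 92 minutes
gpudb_ms = 20                # GPUdb: 20 ms

# Ratio of the two timings; only meaningful if both ran the same query.
speedup = oracle_ms / gpudb_ms
```

Under that assumption the ratio comes to 276,000, and it was measured with 1 server against 42, so the per-server gap would be larger still.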

CASE STUDY: LOCATION BASED ANALYTICS

LOGISTICS: Route optimization

DISTRIBUTED ANALYSIS

AT SCALE: 200,000 USPS devices emitting location each minute → 250+ million events captured and analyzed daily … tracked on 10 nodes.
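The daily event figure can be sanity-checked from the device count. The calculation below assumes every device reports once a minute around the clock, which gives an upper bound rather than the measured volume; the variable names are illustrative.

```python
devices = 200_000                      # from the slide
reports_per_device_per_day = 60 * 24   # assumes one report per minute, all day
nodes = 10                             # from the slide

max_events_per_day = devices * reports_per_device_per_day
avg_events_per_second = max_events_per_day / 86_400
max_events_per_node_per_day = max_events_per_day // nodes
```

The bound works out to 288 million events per day, or roughly 3,300 per second across the cluster. That sits comfortably with the "250+ million" on the slide if some devices are idle for part of the day, though the slide does not give the actual duty cycle.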

USPS is the single largest logistic entity in the country, moving more individual items in four hours than the combination of UPS, FedEx, and DHL move all year.

CASE STUDY: LOCATION BASED ANALYTICS

15,000 simultaneous sessions

PREDICTIVE INFRASTRUCTURE MANAGEMENT


Kinetica operates as a speed layer with ESRI to monitor, manage, and predict infrastructure health.

LARGE UTILITY COMPANY

CASE STUDY: LOCATION BASED ANALYTICS

LOGISTICS & FLEET MANAGEMENT


Kinetica enables agile tracking of shipments to assist store managers in tracking inventory and arrival times.

• Visibility and tracking of deliveries & trucks for store managers

• ETA & Notifications – Provide estimated time of delivery, notifications and custom location based alerting

• Route Optimization based on truck size, and if cargo is perishable or contains hazardous materials.
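A location-based alert and a simple ETA of the kind listed above can be sketched with great-circle distance. This is a generic illustration, not Kinetica's geospatial API: the function names, coordinates, radius, and speed are all hypothetical, and a real ETA would follow the road network rather than a straight line.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def eta_minutes(distance_km, speed_kmh):
    """Straight-line ETA; ignores roads, traffic, and stops."""
    return distance_km / speed_kmh * 60


def inside_geofence(truck, store, radius_km):
    """True when the truck is within radius_km of the store."""
    return haversine_km(*truck, *store) <= radius_km


truck = (37.7749, -122.4194)  # hypothetical truck position (lat, lon)
store = (37.8044, -122.2712)  # hypothetical store position (lat, lon)

distance = haversine_km(*truck, *store)
eta = eta_minutes(distance, speed_kmh=40.0)       # assumed average speed
notify = inside_geofence(truck, store, radius_km=15.0)
```

Evaluating the geofence check against each incoming position report is what turns a tracking table into an alerting system.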

LARGE RETAILER

CASE STUDY: LOCATION BASED ANALYTICS

PIPELINE & WELL ANALYTICS


Kinetica enables interactive query and geospatial visualization of large numbers of upstream and midstream assets.

• Complex joins across several tables with 300m rows of data. Approx. 100 GB in size.

• Create custom visualizations, charts.

• Visualization of wells by land ownership, region, etc.

ENERGY RESEARCH

CASE STUDY: LOCATION BASED ANALYTICS

LIFE SCIENCES: GENOMICS RESEARCH

CASE STUDY: ADVANCED IN-DATABASE ANALYTICS


GPU acceleration on Kinetica enables processing of transcriptomics to run simulations for drug research.

• Seeking out signals from massive collection of drug targets combined with historical data.

• Accelerate simulations of chemical reactions.

• In-database processing to develop models, leveraging GPU acceleration for performance, and direct access to CUDA APIs via UDFs deployed within Kinetica.

"One of the things I like about Kinetica is it gives us more of a general-purpose use of the technology. There has been a lot of software created to answer certain questions [but] highly specialized tools have limited functionality and are tuned to do a certain workload."

Mark Ramsey, Chief Data Officer at GSK

RISK MANAGEMENT


Large financial institution moves counterparty risk analysis from overnight to real-time.

• Data collected by xVA library, which computes risk metrics for each trade.

• Risk computations are becoming more complex and computationally heavy. xVA analysis needs to project years into the future.

• Kinetica enables banks to move from batch/overnight analysis to a streaming/real-time system for flexible real-time monitoring by traders, auditors and management.

MULTINATIONAL BANK

CASE STUDY: ADVANCED IN-DATABASE ANALYTICS

Faster Analytics on Inventory and Sales

0.65s

0.68s

LARGE RETAILER

Enterprise In-Mem DB


34s

44s

0.65s

0.68s

CASE STUDY

Enterprise In-Mem DB

Query 1: Sum of retail sales grouped by region

Query 2: Sum of inventory available grouped by type

Stop by Booth #431 and Get your Free T-shirt

www.kinetica.com