Implementing a GPU-based Machine Learning Library on Apache Spark

James Jia, Pradeep Kalipatnapu, Richard Chiou, Yiheng Yang; John F. Canny, Ed.

Electrical Engineering and Computer Sciences
University of California at Berkeley

Technical Report No. UCB/EECS-2016-51

http://www.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-51.html

May 11, 2016

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.

James Jia

Electrical Engineering and Computer Sciences

University of California, Berkeley

Table of Contents

1.1 Problem Definition
1.2 Task Breakdown
1.3 Setting up EC2
1.4 Supporting Reading and Writing from HDFS
1.5 Parallelizing BIDData
    1.5.1 BIDData Architecture Overview
    1.5.2 Extending the Learner
    1.5.3 API for Distributed Machine Learning
    1.5.4 Implementing Distributed KMeans
    1.5.5 KMeans Benchmarks

2.1 Introduction
2.2 Trends and Market
2.3 Industry Analysis
    2.3.1 Value Chain Analysis
    2.3.2 Porter’s Five Forces Analysis
    2.3.3 Threat of Substitutes
    2.3.4 Bargaining Power of Suppliers
    2.3.5 Bargaining Power of Consumers
    2.3.6 Threat of New Entrants
    2.3.7 Competitive Rivalry
2.4 Go to Market Strategy
Bibliography

1.1 Problem Definition

Our capstone project is to port BIDData onto Apache Spark, an open-source engine used for large-scale data processing, so that real-world developers at companies such as Yahoo, Twitter, and Facebook can leverage the 10x performance and cost benefits of GPU-accelerated machines at a large scale for machine learning tasks. BIDData is a GPU-optimized machine learning library that runs on a single machine, outperforming other setups that run on hundreds of multi-core CPU machines due to the parallel structure of many machine learning algorithms. In fact, it holds the benchmark records for a plethora of common algorithms such as k-Nearest Neighbors, Random Forests, and Latent Dirichlet Allocation (Canny 2015).

System    Nodes/cores  nclust  Time (s)  Cost   Energy (kJ)
Spark     32/128       256     180       $0.45  1150
BIDMach   1            256     320       $0.06  90
Spark     96/384       4096    1100      $9.00  22000
BIDMach   1            4096    735       $0.12  140

Running KMeans on the MNIST-8M (25 GB) dataset (Canny 2015)

However, running BIDData on a single machine can only process data on the scale of hundreds of gigabytes. For comparison, many companies, like Yahoo, process petabytes of data every day (Feng 2013). Developers who want to scale up BIDData to match their data processing needs have to handle all of the problems that arise when creating a scalable, distributed platform, such as incompatible library dependencies and data storage and synchronization issues with their database files. By integrating BIDData into Apache Spark, we can eliminate this overhead so developers can focus on performing actual analytics.

1.2 Task Breakdown

We split our project into two major components. Pradeep and Yiheng focused on the first component: creating an automated build management system for running BIDData on multiple platforms. Currently, developers who want to use BIDData to run machine learning algorithms on the GPU need to build the libraries themselves. This represents a huge barrier to entry, as many alternative machine learning libraries have released pre-built versions that simplify the deployment process. However, as Pradeep and Yiheng will discuss in greater detail in their reports, establishing an automated build management system for BIDData is complicated by the fact that we are supporting multiple host architectures, each of which has distinct native libraries upon which BIDData depends.

Meanwhile, Richard and I worked on the second component: porting BIDData on top of Apache Spark and the Hadoop Distributed File System (HDFS). Since BIDData will now be reading from and writing to HDFS instead of simply writing to disk, we need to extend BIDData’s existing API to support this new requirement. Furthermore, we also need to implement several parallel machine learning algorithms in order to document the performance of running BIDData on top of Spark and compare it against other alternatives. Richard worked on implementing distributed logistic regression, and I worked on extending support to reading and writing from HDFS as well as implementing the distributed K-Means algorithm.

1.3 Setting up EC2

My first priority was to set up a sample distributed system as a mock environment that we could use for future development. Without such an environment, we would not be able to benchmark the parallel variants of common machine learning algorithms against the serial versions. We chose Amazon EC2 as it is the predominant cloud computing platform used in both academia and industry to power distributed computation. This entailed setting up Apache Spark on the master and slave nodes with the appropriate configuration and hooking Spark up to the Hadoop Distributed File System (HDFS) in which we house our data.

The BIDData libraries and Apache Spark both depend on native libraries that are specific to the machine on which they run. This meant that running the BIDData libraries on top of Spark required additional configuration steps beyond downloading and building Spark and Hadoop on the EC2 instances: we needed a mechanism to specify all the library modules that BIDData requires to be distributed across all the nodes in our cluster. Traditionally, users who want to run BIDData on a single machine run a shell script that downloads all the library dependencies. The Spark guide on configuration and deployment recommends adding the appropriate library paths to the default configuration file located at SPARK_HOME/conf/spark-defaults.conf, so I created a shell script that deploys the libraries from the master node to the slave nodes and included the file path in the configuration file on all nodes. Although this solution is not robust to changes in library dependencies from version upgrades in Spark or BIDData, it removed the bottleneck that prevented us from parallelizing our work distribution. Since we also want to allow developers to build Spark with BIDData for themselves, I documented the deployment process as well as the command-line invocations to import the BIDData libraries when running Spark as a standalone cluster.

1.4 Supporting Reading and Writing from HDFS

BIDData has its own specialized matrix data types that characterize the different primitives stored within the matrix as well as the representation of the matrix itself (sparse or dense). As the abstractions for saving and loading matrices are the same, I will detail only the process for saving a matrix to disk for brevity. Currently, to save a matrix to disk, the BIDMach API exposes an overloaded saveMat function that, at runtime, calls the appropriate save function based on the matrix type, writing it to an OutputStream. In order to support reading and writing from the Hadoop Distributed File System, we needed to revamp the matrix I/O routines to be based on the DataInputStream and DataOutputStream classes rather than their current generic variants, and to implement a wrapper over these custom matrix data types in order to allow Hadoop to serialize the data for transmission across the network.

To update the matrix I/O routines, Professor Canny rewrote the layer that writes the matrix data to disk to operate on DataOutputStreams instead of OutputStreams. This ensures that the underlying data is formatted in a platform-independent way and adheres to HDFS’s abstraction for I/O. I then added a middle layer that checks the matrix’s file path. If the path is addressed to HDFS, I invoke the new HDFS function to wrap the matrix in a serializable format for Hadoop.
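The two ideas in this section, stream-based matrix I/O and a path check that routes HDFS-bound matrices differently, can be sketched in a few lines of Scala. This is only an illustration under stated assumptions: DenseMat, MatIOSketch, write, read, and isHdfsPath are hypothetical names rather than BIDMat or BIDMach identifiers, the on-stream layout (two integers followed by the values) is invented for the example, and the "hdfs://" prefix test is an assumed way of recognizing an HDFS path. The Hadoop wrapper itself is not shown.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

// Hypothetical stand-in for a dense float matrix; this is not BIDMat's own type.
final case class DenseMat(nrows: Int, ncols: Int, data: Array[Float])

object MatIOSketch {
  // Write the dimensions followed by the values. DataOutputStream always
  // writes big-endian, so the bytes are the same on every platform.
  def write(m: DenseMat, out: DataOutputStream): Unit = {
    out.writeInt(m.nrows)
    out.writeInt(m.ncols)
    m.data.foreach(v => out.writeFloat(v))
  }

  // Read a matrix back in the same (assumed) layout.
  def read(in: DataInputStream): DenseMat = {
    val nrows = in.readInt()
    val ncols = in.readInt()
    DenseMat(nrows, ncols, Array.fill(nrows * ncols)(in.readFloat()))
  }

  // Middle-layer check (assumption: HDFS paths carry the hdfs:// scheme).
  def isHdfsPath(path: String): Boolean = path.startsWith("hdfs://")

  def main(args: Array[String]): Unit = {
    val buf = new ByteArrayOutputStream()
    write(DenseMat(2, 2, Array(1f, 2f, 3f, 4f)), new DataOutputStream(buf))
    val back = read(new DataInputStream(new ByteArrayInputStream(buf.toByteArray)))
    println(back.data.mkString(","))
    println(isHdfsPath("hdfs://namenode/data/m.fmat"))
  }
}
```

Because both directions go through the DataInput/DataOutput family, the same routines could in principle be pointed at a local file stream or at a stream obtained from a distributed file system without changing the matrix code.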

1.5 Parallelizing BIDData

1.5.1 BIDData Architecture Overview

Within the BIDData architecture, the Model class is an abstraction that implements the specific machine learning algorithm. For example, there is a KMeans model that implements the KMeans model update and prediction methods. The Datasource and Datasink classes encapsulate the logic for reading from and writing to the various data formats that we support. Professor Canny created a datasource called IteratorSource designed to work with the Iterators class provided by Spark. Finally, the Learner class orchestrates the flow of model training and prediction. It iteratively updates the model and outputs the predictions to the datasink. The learner also has a reference to an options class, which, as the name implies, stores the configurations that the developer provides. The developer creates an instance of a learner, passing in the datasource, the datasink, and the model that is appropriate for their machine learning task, trains the learner using a subset of the data, and makes predictions with the remaining data (Canny 2015).

1.5.2 Extending the Learner

The learner class previously assumed that the model runs on a single machine, and thus ran through all iterations of training at once upon invocation.

    // Build a learner for the training data and train it
    val (mm, opts) = KMeans.learner(data_path)
    mm.train
    // Build a predictor from the trained model and run it on the test data
    val (pp, popts) = KMeans.predictor(mm.model, test_data_path)
    pp.predict

Sample code for running KMeans on a single machine (Canny 2015)

However, when running BIDData on a distributed platform, we need to synchronize the models within the learners that are running on the separate executors after each pass through the dataset.

I abstracted the logic that handles one pass through the dataset into two functions, firstPass and nextPass, and have train call the appropriate pass function based on the current iteration index. This more fine-grained control over the learner logic lets the distributed learner perform model synchronization after each pass through the dataset, while still allowing the single-machine variant to iterate through all passes at once. The distinction between firstPass and nextPass exists because, in certain algorithms, the first pass through the dataset is fundamentally different from the remaining passes. As an example, the K-Means algorithm initializes the cluster centroids in the first pass, whereas the remaining passes improve the centroids.

Figure: Comparison between iterative training and distributed training
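The split into firstPass and nextPass can be illustrated with a toy learner. This is a minimal sketch rather than the BIDMach Learner: ToyLearner, onePass, log, DistributedSketch, and the sync callback are hypothetical names introduced only for the example, and the passes merely record what they would do. It shows how the same pass functions serve both the single-machine loop and a driver that synchronizes after every pass.

```scala
// Hypothetical toy learner; only train, firstPass and nextPass are names from the text.
class ToyLearner(val npasses: Int) {
  var ipass = 0
  val log = scala.collection.mutable.ArrayBuffer.empty[String]

  // First pass: e.g. initialize the model (for K-Means, pick initial centroids).
  def firstPass(): Unit = { log += "init"; ipass = 1 }

  // Later passes: refine the existing model.
  def nextPass(): Unit = { log += s"update$ipass"; ipass += 1 }

  // Choose the pass function from the current iteration index.
  def onePass(): Unit = if (ipass == 0) firstPass() else nextPass()

  // Single-machine variant: run every pass at once.
  def train(): Unit = while (ipass < npasses) onePass()
}

object DistributedSketch {
  // Distributed variant: each learner does one pass, then the models are
  // synchronized through a caller-supplied hook before the next pass starts.
  def train(learners: Seq[ToyLearner], sync: Seq[ToyLearner] => Unit): Unit = {
    val npasses = learners.head.npasses
    for (_ <- 0 until npasses) {
      learners.foreach(_.onePass())
      sync(learners)
    }
  }
}
```

In this sketch the synchronization hook is called once per pass, which is the point at which a model reduction across executors would take place.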

1.5.3 API for Distributed Machine Learning

We distribute our model and our data across the executors. One benefit of this approach is that we can train models that do not fit into the memory of a single GPU. However, this also brings the additional challenge of needing to synchronize the model after each pass through the dataset. At a high level, we want to create a learner for each executor and iterate through the data that exists locally on that executor on each pass, synchronizing the learner’s model between the passes. Although we later want to utilize Kylix, a butterfly all-reduce communication network, to synchronize the model for optimum network performance, I currently use Spark’s implementation of treeReduce as a baseline for comparison.

I created an application programming interface (API) that abstracts away the implementation details for running BIDData on top of Spark. The API takes in four parameters: the Spark context of class SparkContext, the learner of class Learner, the data of class RDD[(SerText, MatIO)] on which to run the learner, and the number of executors in the cluster. The SparkContext variable is required to operate on the RDD abstraction, the learner and data are necessary to run the machine learning algorithm on the dataset, and the number of executors is passed in so the program can load balance across the executors.
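To make the reduction step concrete, the following sketch imitates a tree-shaped reduction on a local collection: values are combined pairwise, level by level, until one remains. It is not Spark's treeReduce implementation and does not involve an RDD; TreeReduceSketch and its treeReduce function are hypothetical names used only to show the shape of the computation that the per-executor models would go through.

```scala
object TreeReduceSketch {
  // Combine neighbours pairwise, level by level, until a single value remains.
  @annotation.tailrec
  def treeReduce[A](xs: Vector[A])(combine: (A, A) => A): A = {
    require(xs.nonEmpty, "cannot reduce an empty collection")
    if (xs.size == 1) xs.head
    else {
      val next = xs.grouped(2).map { pair =>
        // A trailing odd element is carried unchanged to the next level.
        if (pair.size == 2) combine(pair(0), pair(1)) else pair(0)
      }.toVector
      treeReduce(next)(combine)
    }
  }
}
```

With n inputs the sketch performs about log2(n) levels of combination, which is why a tree-shaped reduction is a reasonable baseline when every executor holds one model to be merged.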

1.5.4 Implementing Distributed KMeans

To test the framework for running machine learning algorithms on Spark using BIDData, I implemented a distributed variant of the existing KMeans algorithm in the BIDData machine learning library. This required adding a general reduction operator, combineModels, to the Model class, which the KMeans model overrides with its model-specific reduction operation. Since the first iteration of a model update may be different from the other iterations, the function also takes in the iteration index.

In the first iteration of distributed KMeans, we want to randomly sample n clusters from our dataset with equal probability. I do so by counting the number of data points that have been processed by each model and sampling n clusters from the two models’ clusters with probability proportional to the number of processed data points. In the other iterations of model reduction, I add up all the centers’ features and average them post-reduction.
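One possible reading of this reduction can be sketched as follows. The sketch is an assumption-laden illustration, not the BIDMach code: KModel, CombineSketch, combineFirst, combineNext, and average are hypothetical names, centroids are plain vectors of doubles, and the first-pass sampling is interpreted slot by slot (for each of the n centroid positions, take the centroid from one of the two models with probability proportional to the number of points that model has processed). Only the name combineModels and the use of an iteration index come from the text.

```scala
import scala.util.Random

// Hypothetical per-executor state: centroids plus the number of points processed.
final case class KModel(centroids: Vector[Vector[Double]], npoints: Long)

object CombineSketch {
  // First iteration: for each centroid slot, keep model a's centroid with
  // probability a.npoints / (a.npoints + b.npoints), otherwise model b's.
  def combineFirst(a: KModel, b: KModel, rng: Random): KModel = {
    val pA = a.npoints.toDouble / (a.npoints + b.npoints)
    val picked = a.centroids.zip(b.centroids).map { case (ca, cb) =>
      if (rng.nextDouble() < pA) ca else cb
    }
    KModel(picked, a.npoints + b.npoints)
  }

  // Later iterations: add the centroids feature by feature; averaging comes afterwards.
  def combineNext(a: KModel, b: KModel): KModel = {
    val summed = a.centroids.zip(b.centroids).map { case (ca, cb) =>
      ca.zip(cb).map { case (x, y) => x + y }
    }
    KModel(summed, a.npoints + b.npoints)
  }

  // Post-reduction step: divide the summed centroids by the number of models reduced.
  def average(sum: KModel, nmodels: Int): KModel =
    sum.copy(centroids = sum.centroids.map(_.map(_ / nmodels)))

  // Dispatch on the iteration index, as the text describes.
  def combineModels(ipass: Int, a: KModel, b: KModel, rng: Random): KModel =
    if (ipass == 0) combineFirst(a, b, rng) else combineNext(a, b)
}
```

Carrying the point count through each reduction means that, in the first pass, a model that has already absorbed several others is weighted by everything it represents when it meets the next one.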

1.5.5 KMeans Benchmarks

256 clusters, batch size of 10000, 10 epochs

System                                Total time (s)  Time per epoch (s)
Sequential BIDMach                    349             37
Distributed BIDMach with 16 clusters  45.6            1.7
Spark with 16 clusters                263.1           26

5000 clusters, batch size of 10000, 10 epochs

System                                Total time (s)  Time per epoch (s)
Sequential BIDMach                    1605            169.2
Distributed BIDMach with 16 clusters  235             14.4
Spark with 16 clusters                5779            576

These benchmarks were obtained from running on AWS g2.2xlarge instances, each with one NVIDIA GPU with 1,536 CUDA cores and eight Intel Xeon processors (Amazon 2016). From the benchmarks we see that running BIDMach on Spark vastly outperforms both BIDMach running on a single machine and Spark running on the same EC2 setup. As expected, there is some overhead for reducing the model at the end of each epoch and redistributing the updated model back to the workers at the beginning of the following epoch.
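The size of the improvement can be read directly off the total times in the two benchmark tables. The short sketch below only does that arithmetic; the object name SpeedupSketch and the map keys are hypothetical labels chosen for the example, while the numbers are the total times reported above.

```scala
object SpeedupSketch {
  // Total times in seconds, copied from the two benchmark tables.
  val total256  = Map("sequential" -> 349.0, "distributed" -> 45.6, "spark" -> 263.1)
  val total5000 = Map("sequential" -> 1605.0, "distributed" -> 235.0, "spark" -> 5779.0)

  // Speedup of distributed BIDMach relative to another system.
  def speedup(times: Map[String, Double], baseline: String): Double =
    times(baseline) / times("distributed")

  def main(args: Array[String]): Unit =
    for ((name, t) <- Seq("256" -> total256, "5000" -> total5000); base <- Seq("sequential", "spark"))
      println(f"$name clusters, vs $base: ${speedup(t, base)}%.1fx")
}
```

By this measure, distributed BIDMach is roughly 7.7 times faster than sequential BIDMach and 5.8 times faster than Spark at 256 clusters, and roughly 6.8 times and 24.6 times faster respectively at 5000 clusters.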

Chapter 2: Engineering Leadership

James Jia

Pradeep Kalipatnapu

Richard Chiou

Yiheng Yang

2.1 Introduction

As data storage becomes increasingly commoditized, companies are collecting transactional records on the order of several petabytes that are beyond the ability of typical database software tools to store and analyze. Analysis of this “big data” can yield business insights such as customer preferences and market trends, which can bring companies benefits such as new revenue opportunities and improved operational efficiency (Manyika et al. 2011). To analyze this big data, companies typically build or use third-party machine learning (ML) tools.

Professor John Canny of UC Berkeley’s EECS department has developed BIDData, a set of GPU-based ML libraries that is capable of completing common ML tasks an order of magnitude faster than rival technologies when run on a single machine. However, most “big data” tasks require greater computational power and storage space than that offered by a single machine.

Our capstone project integrates BIDData with Apache Spark, a fast, open-source big data processing framework with rapid adoption by the software industry. Integration of BIDData with Spark will enable more complex big data analysis and generate time, energy, and cost savings.

2.2 Trends and Market

The market opportunity for big data is astronomical. Due to the convenience of consumer electronics, more and more daily services such as banking and shopping are being conducted online. These transactions generate a gargantuan amount of user and market data that companies can benefit from. As an example, the social networking and search engine industries often use machine learning in order to increase their profits on selling advertising space to various companies, optimizing the pricing model for each ad spot as well as determining the best advertisement placement to maximize the likelihood of user clicks (Kahn 2014: 7). Determining the optimal pricing and placement strategy is especially pivotal to Google and Yahoo’s operations, since most of these ad spaces are paid for through a pay-per-click model and constitute 98.8% of the total revenue of $11.2 billion in 2014 (Kahn 2014: 14).

Processing gigantic amounts of information within sub-second time intervals is computationally demanding. Modern central processing units (CPUs) can process data at 1 billion floating point operations per second, but typical big data sets are on the order of quadrillions of bytes (Intel 2016). Traditional CPUs would take hours to process a single big data set, and will become increasingly inefficient as data set sizes continue to grow. Thus, there exists a great opportunity for companies that focus on cost-efficient data analytics tools which provide easy integration and process data quickly. For instance, McKinsey Global Institute estimates that retailers that use efficient big data tools could increase their operating margins by more than 60 percent (Manyika et al. 2011).

Recent technology trends show that, in order to accelerate big data ML algorithms, CPU technology is being replaced by the graphics processing unit (GPU). Because GPUs are optimized for mathematical operations, they are orders of magnitude faster than CPUs on tasks related to big data analysis, and both industry and academia are moving towards using GPUs (Lopes and Ribeiro 2010). As a GPU-based ML library, BIDData also offers numerous improvements compared to rival technologies. In terms of single-machine processing speed and electricity expenditure, BIDData already beats the performance of other competitors, including distributed ML libraries, by an order of magnitude, assuming the data set can be housed on a single machine (Canny 2015).

However, single-instance ML libraries such as BIDData are currently not widely used in industry, as they lack the scalability to handle the increasing sizes and complexities of big data sets. Instead, most companies are turning towards efficient distributed computing platforms to process the data (Low et al. 2012). Thus, our capstone project aims to integrate BIDData with the distributed framework of Spark, a leading machine learning platform in the market today. Moreover, the project also incorporates Amazon Web Services for big data storage, another platform which many companies are already leveraging today (Amazon 2016). By using this underlying infrastructure, BIDData is more likely to be viewed as a highly desirable big data analysis tool by the market.

2.3 Industry Analysis

2.3.1 Value Chain Analysis

Figure 1: The above value chain covers how users and companies alike benefit from big data. As consumers use products, technology companies are able to collect data on their habits and preferences. Analytics firms and third-party tools process this data and sell their findings. Ultimately, big data analytics can lead to improved products and better user experiences.

In the big data industry, firms process user data to gain insights into customer habits and preferences. The statistics and trends that they discover can be used to refine existing products and services or be sold to advertisers. For example, users either buy a product (e.g. FitBit) or sign up to use a free ad-supported product (e.g. Gmail) from various tech companies. These companies collect data from their users based on their usage patterns: FitBit provides anonymized, aggregated data for research purposes, and Gmail provides relevant information on users to the Google Ads team. This data is passed on to organizations that specialize in big data analysis. After obtaining insights on the data, these organizations then sell their findings back to the tech companies or to firms such as advertisers. Ultimately, big data analytics can be used to improve products and user experiences for consumers.

While these big data analysis companies have access to lots of data, they may not have the understanding or the resources to create every appropriate tool for analyzing all of their big data, which can come in various formats. Consequently, these big data analysis organizations must turn to third-party tools to process some of their data. Because its machine learning libraries record the best benchmarks among their peers, BIDData has a strong case for becoming one of these leading third-party tools. A startup with expertise in BIDData on Spark could therefore act as both a supplier and a consultant for these big data organizations.

2.3.2 Porter’s Five Forces Analysis

According to Michael Porter, all companies face five forces of competition within their industry. Despite its benchmark-leading performances, BIDData faces potential competition from existing firms and future entrants in the big data industry.

2.3.3 Threat of Substitutes

Current machine learning libraries mostly run on CPUs, but as discussed in the Trends and Market section, the industry is shifting away from them. As evidenced by Netflix’s recent switch to using GPUs, the industry has noticed that GPU-based software solutions record higher benchmarks than traditional CPU-based tools (Morgan 2014). Thus, in the long term, the threat of substitutes is relatively weak, as CPU-based software is gradually replaced.

2.3.4 Bargaining Power of Suppliers

Typically, programmers are the main individuals involved in the creation of software libraries. Currently, BIDData is open source, allowing any interested independent programmers to collaborate and make unpaid contributions to the project. Thus, the bargaining power of suppliers with regard to wages is extremely weak. As an additional benefit, the open source model can potentially lead to high-quality products at a fraction of the cost (Weber 2004).

2.3.5 Bargaining Power of Consumers

On the other hand, the bargaining power of consumers in this space is strong. Although lots of research on GPU-based ML techniques has been conducted in recent years, most computers used by consumers and companies alike still only use CPUs. Consequently, customers are more likely to choose from the wide variety of CPU-based ML tools. Moreover, the biggest and most profitable customers (e.g. Google) have the resources to create their own data analysis tools for internal use, which can further depress demand for third-party machine learning tools in this space. While newer computers come with GPUs, it may take some time before users and companies fully invest in and transition to GPU-based ML tools.

2.3.6 Threat of New Entrants

In general, the software industry has an extremely low barrier to entry, as it only takes a single programmer with one computer to create fully functional software. Additionally, software undergoes many iterations quickly. Although BIDData’s underlying GPU technology is still relatively new, startups and existing firms alike are growing more interested in GPU-based solutions. Consequently, the industry faces a very strong threat from new entrants.

2.3.7 Competitive Rivalry

Because big data comes from a variety of sources and in a multitude of formats, consumers require many different ML techniques for analysis. If BIDData does not contain an implementation of a specific ML algorithm, consumers could simply use another tool that offers it. As a result, there exists an intense feature-based rivalry in the third-party tools space, in which several firms offer a multitude of services. For instance, one firm may specialize in data classification, while another firm may provide a product optimized for data regression. Nonetheless, this feature-based rivalry could allow more players to co-exist in this space.

2.4 Go to Market Strategy

Although GPU acceleration has been identified as a promising development, it is largely still in its infancy and has not seen widespread adoption across multiple sectors. Rather than focusing on profitability as in traditional models, we will focus on a strategy that helps us gain market share. To do so, we are going to target a particular subset of companies that have big data problems, specifically companies that have the ability to collect large amounts of data, but not necessarily access to the computational power or resources to obtain business intelligence from it. As we have discussed earlier, Fitbit is a prime example: they have a great capacity to gather information from their devices as an auxiliary effect of their product, but it would require an extraordinary amount of technical expertise as well as infrastructure to sift through the vast sea of data.

With this in mind, our go to market strategy has three main emphases. First, we are utilizing a “plug and play” model that emphasizes fluid software integration to encourage early adoption. Second, since we are going to remain open source, this will inspire developers to contribute back to our codebase. The stronger advocates could also serve to evangelize within their companies, giving us stronger leverage over our competitors. Lastly, the nature of our product is inherently scalable. Once we have written code ready for production, the cost to have an additional developer use our codebase is negligible.

While customers may cite a lack of technical support as an obstacle to adoption, there exists an opportunity for startups like Databricks to gain customers through their consulting services. We could therefore proactively establish partnerships with companies like Databricks that operate on a consulting model, positioning our solution as the leading GPU-accelerated machine learning library.

Bibliography

Amazon. (n.d.). All AWS Case Studies. Retrieved March 5, 2016, from https://aws.amazon.com/solutions/case-studies/all/

Amazon. (n.d.). EC2 Instance Types. Retrieved May 7, 2016, from https://aws.amazon.com/ec2/instance-types/

Apache. (n.d.). Spark Programming Guide. Retrieved April 14, 2016, from http://spark.apache.org/docs/latest/programming-guide.html

Apache. (n.d.). Clustering - spark.mllib. Retrieved February 9, 2016, from http://spark.apache.org/docs/latest/mllib-clustering.html

Canny, J. F., & Zhao, H. (2013). BIDMach: Large-scale Learning with Zero Memory Allocation. BigLearning. Retrieved from http://biglearn.org/2013/files/papers/biglearning2013_submission_23.pdf

Canny, J. F. (2015, August 19). Benchmarks. Retrieved February 10, 2016, from https://github.com/BIDData/BIDMach/wiki/Benchmarks

Canny, J. F. (2015, September 27). BIDMach Tutorials. Retrieved February 14, 2016, from https://github.com/BIDData/BIDMach/wiki/BIDMach-Tutorials

Intel. (n.d.). 6th Generation Intel® Core™ i7 Processors. Retrieved March 5, 2016, from https://www-ssl.intel.com/content/www/us/en/processors/core/core-i7-processor.html

Kahn, S. (2014). Search Engines in the US. IBISWorld Industry Report 51913a. Retrieved from IBISWorld database.

Lopes, N., Ribeiro, B., & Quintas, R. (2010). GPUMLib: A new Library to combine Machine Learning algorithms with Graphics Processing Units. 2010 10th International Conference on Hybrid Intelligent Systems. doi:10.1109/his.2010.5600028

Low, Y., Bickson, D., Gonzalez, J., Guestrin, C., Kyrola, A., & Hellerstein, J. M. (2012). Distributed GraphLab. Proceedings of the VLDB Endowment, 5(8), 716-727. doi:10.14778/2212351.2212354

Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H. (2011, May). Big data: The next frontier for innovation, competition, and productivity (Rep.). Retrieved from http://www.mckinsey.com/business-functions/business-technology/our-insights/big-data-the-next-frontier-for-innovation

Morgan, T. P. (2014, February 11). Netflix Speeds Machine Learning With Amazon GPUs. Retrieved from http://www.enterprisetech.com/2014/02/11/netflix-speeds-machine-learning-amazon-gpus/

Porter, M. E. (2008). The Five Competitive Forces That Shape Strategy. Harvard Business Review.

Weber, S. (2004). The success of open source. Cambridge, MA: Harvard University Press.
