
Implementing a GPU-based Machine Learning Library

on Apache Spark

James Jia, Pradeep Kalipatnapu, Richard Chiou, Yiheng Yang
John F. Canny, Ed.

Electrical Engineering and Computer Sciences
University of California at Berkeley

Technical Report No. UCB/EECS-2016-51

http://www.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-51.html

May 11, 2016


Copyright © 2016, by the author(s). All rights reserved.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission.


Implementing a GPU-based Machine Learning Library on Apache Spark

James Jia

Electrical Engineering and Computer Sciences

University of California, Berkeley


Table of Contents

1.1 Problem Definition
1.2 Task Breakdown
1.3 Setting up EC2
1.4 Supporting Reading and Writing from HDFS
1.5 Parallelizing BIDData
    1.5.1 BIDData Architecture Overview
    1.5.2 Extending the Learner
    1.5.3 API for Distributed Machine Learning
    1.5.4 Implementing Distributed KMeans
    1.5.5 KMeans Benchmarks
2.1 Introduction
2.2 Trends and Market
2.3 Industry Analysis
    2.3.1 Value Chain Analysis
    2.3.2 Porter's Five Forces Analysis
    2.3.3 Threat of Substitutes
    2.3.4 Bargaining Power of Suppliers
    2.3.5 Bargaining Power of Consumers
    2.3.6 Threat of New Entrants
    2.3.7 Competitive Rivalry
2.4 Go-to-Market Strategy
Bibliography


1.1 Problem Definition

Our capstone project is to port BIDData onto Apache Spark, an open-source engine used for large-scale data processing, so that real-world developers at companies such as Yahoo, Twitter, and Facebook can leverage the 10x performance and cost benefits of GPU-accelerated machines at a large scale for machine learning tasks. BIDData is a GPU-optimized machine learning library that runs on a single machine, outperforming setups that run on hundreds of multi-core CPU machines by exploiting the parallel structure of many machine learning algorithms. In fact, it holds the benchmark for a plethora of common algorithms such as k-Nearest Neighbors, Random Forests, and Latent Dirichlet Allocation, beating other setups that use hundreds of multi-core CPU computers (Canny 2015).

System     nodes/cores   nclust   Time      Cost     Energy (KJ)
Spark      32/128        256      180 s     $0.45    1,150
BIDMach    1             256      320 s     $0.06    90
Spark      96/384        4096     1,100 s   $9.00    22,000
BIDMach    1             4096     735 s     $0.12    140

Running KMeans on the MNIST-8M (25 GB) dataset (Canny 2015)

However, BIDData running on a single machine can only process data on the scale of hundreds of gigabytes. For comparison, many companies, like Yahoo, process petabytes of data every day (Feng 2013). Developers that want to scale up BIDData to match their data processing needs have to handle all of the problems that arise when creating a scalable, distributed platform, such as incompatible library dependencies and data storage and synchronization issues with their database files. However, by integrating BIDData into Apache Spark, we can eliminate this overhead so developers can focus on performing actual analytics.

1.2 Task Breakdown

We split our project into two major components. Pradeep and Yiheng focused on the first component: creating an automated build management system for running BIDData on multiple platforms. Currently, developers that want to use BIDData to run machine learning algorithms on the GPU need to build the libraries themselves. This represents a huge barrier to entry, as many alternative machine learning libraries have released pre-built versions that simplify the deployment process. However, as Pradeep and Yiheng will discuss in greater detail in their reports, establishing an automated build management system for BIDData is complicated by the fact that we are supporting multiple different host architectures, and each of these architectures has distinct native libraries upon which BIDData depends. Meanwhile, Richard and I worked on the second component: porting BIDData on top of Apache Spark and the Hadoop Distributed File System (HDFS). Since BIDData will now be reading and writing to HDFS instead of simply writing to disk, we needed to extend BIDData's existing API to support this new requirement. Furthermore, we also needed to implement several parallel machine learning algorithms in order to document the performance of running BIDData on top of Spark and compare it against other alternatives. Richard worked on implementing distributed logistic regression, and I worked on extending support for reading and writing from HDFS as well as implementing the distributed K-Means algorithm.

1.3 Setting up EC2

My first priority was to set up a sample distributed system as a mock environment that we could use for future development. Without such an environment, we would not be able to benchmark the parallel variants of common machine learning algorithms to compare against the serial versions. We chose to use Amazon EC2, as it is the predominant cloud computing platform used in both academia and industry to power distributed computation. This entailed setting up Apache Spark on the master and slave nodes with the appropriate configuration and hooking Spark up to the Hadoop Distributed File System (HDFS) in which we house our data. The BIDData libraries and Apache Spark both depend on native libraries that are specific to the machine that one is running on. This meant that running the BIDData libraries on top of Spark requires additional configuration steps beyond downloading and building Spark and Hadoop on the EC2 instances: we needed a mechanism to specify all the library modules that BIDData requires to be distributed across all the nodes in our cluster. Traditionally, users who want to run BIDData on a single machine run a shell script that downloads all the library dependencies. The Spark guide on configuration and deployment recommended adding the appropriate library paths to the default configuration file located in SPARK_HOME/conf/spark-defaults.conf, so I created a shell script that deploys the libraries from the master node to the slave nodes and included the file path in the configuration file on all nodes. Although this solution is not robust to changes in library dependencies from version upgrades in Spark or BIDData, it removed the bottleneck that prevented us from parallelizing our work distribution. Since we also want to allow developers to build Spark with BIDData for themselves, I documented the deployment process as well as the command line invocations to import the BIDData libraries when running Spark as a standalone cluster.
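
For illustration, the configuration entries look roughly like the following. The /opt/BIDData paths and jar names are placeholders for wherever the deployment script places the libraries on each node; the property names themselves are standard Spark configuration keys.

spark.driver.extraClassPath       /opt/BIDData/BIDMach.jar:/opt/BIDData/BIDMat.jar
spark.executor.extraClassPath     /opt/BIDData/BIDMach.jar:/opt/BIDData/BIDMat.jar
spark.driver.extraLibraryPath     /opt/BIDData/lib
spark.executor.extraLibraryPath   /opt/BIDData/lib

Illustrative spark-defaults.conf entries for distributing the BIDData dependencies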

1.4 Supporting Reading and Writing from HDFS

BIDData has its own specialized matrix data types that characterize the different primitives that are stored within the matrix as well as the representation of the matrix itself (sparse or dense). As the abstractions for saving and loading matrices are the same, for brevity I will detail only the process for saving a matrix to disk. Currently, to save a matrix to disk, the BIDMach API exposes an overloaded saveMat function that, at runtime, calls the appropriate save function based on the matrix type, writing it to an OutputStream. In order to support reading and writing from the Hadoop Distributed File System, we needed to revamp the matrix I/O routines to be based on the DataInputStream and DataOutputStream classes rather than their current generic variants, and to implement a wrapper over these custom matrix data types in order to allow Hadoop to serialize the data for transmission across the network.

To update the matrix I/O routines, Professor Canny rewrote the layer that writes the matrix data to disk to operate on DataOutputStreams instead of OutputStreams. This ensures that the underlying data is formatted in a platform-independent way and adheres to HDFS's abstraction for I/O. I then added a middle layer that checks the matrix's file path. If the path is addressed to HDFS, I invoke the new HDFS function to wrap the matrix in a serializable format for Hadoop.
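
A minimal sketch of such a dispatch layer is shown below. The writeMat callback stands in for the DataOutputStream-based writer described above; the helper name and the path check are illustrative, not the actual BIDMat API.

import java.io.{DataOutputStream, FileOutputStream}
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Route a matrix save either to HDFS or to local disk, depending on the path.
// writeMat is the matrix-type-specific writer that expects a DataOutputStream
// (hypothetical name, standing in for the BIDMat save routines).
def saveMatTo(path: String, writeMat: DataOutputStream => Unit): Unit = {
  val out: DataOutputStream =
    if (path.startsWith("hdfs://")) {
      // Hadoop's FileSystem API returns an FSDataOutputStream, which is a
      // DataOutputStream subclass, so the same writer works unchanged.
      FileSystem.get(new URI(path), new Configuration()).create(new Path(path))
    } else {
      // Local paths keep using an ordinary file-backed DataOutputStream.
      new DataOutputStream(new FileOutputStream(path))
    }
  try writeMat(out) finally out.close()
}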


1.5 Parallelizing BIDData

1.5.1 BIDData Architecture Overview

Within the BIDData architecture, the Model class is an abstraction that implements the specific machine learning algorithm. For example, there is a KMeans model that implements the KMeans model update and prediction methods. The Datasource and Datasink classes encapsulate the logic for reading from and writing to the various data formats that we support. Professor Canny created a datasource called IteratorSource designed to work with the Iterators class provided by Spark. Finally, the Learner class orchestrates the flow of model training and prediction. It iteratively updates the model and outputs the predictions to the datasink. The learner also has a reference to an options class, which, as the name implies, stores the configurations that the developer provides. The developer creates an instance of a learner, passing in the datasource, the datasink, and the model that is appropriate for their machine learning task, trains the learner using a subset of the data, and makes a prediction with the remaining data (Canny 2015).

1.5.2 Extending the Learner

The learner class previously assumed that the model is run on a single machine, and thus runs through the iterations of training all at once upon invocation.

val (mm, opts) = KMeans.learner(data_path)
mm.train

val (pp, popts) = KMeans.predictor(mm.model, test_data_path)
pp.predict

Sample code for running KMeans on a single machine (Canny 2015)

However, when running BIDData on a distributed platform, we need to synchronize the models within the learners that are running on the separate executors after each pass through the dataset.

I abstracted away the logic that handles one pass through the dataset into two functions, firstPass and nextPass, and instead have train call the appropriate pass function based on the current iteration index. This finer-grained control over the learner logic lets the distributed learner perform model synchronization after each pass through the dataset, while still allowing the single-machine variant to iterate through all at once. The distinction between firstPass and nextPass is based on the fact that, in certain algorithms, the first pass through the dataset is fundamentally distinct from the remaining passes. As an example, the K-Means algorithm instantiates the centroid clusters in the first pass, whereas the remaining passes improve the centroids.

Figure: Comparison between iterative training and distributed training
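
A minimal sketch of this split follows; apart from firstPass and nextPass, the names are illustrative and this is only the control-flow idea, not the actual BIDMach Learner code.

// Sketch of the refactored training loop. The distributed learner calls
// firstPass/nextPass one pass at a time and synchronizes models in between;
// the single-machine train loop below still runs every pass back to back.
abstract class PassBasedLearner {
  def firstPass(ipass: Int): Unit   // e.g. K-Means seeds its centroids here
  def nextPass(ipass: Int): Unit    // subsequent passes refine the model

  def train(numPasses: Int): Unit = {
    for (ipass <- 0 until numPasses) {
      if (ipass == 0) firstPass(ipass) else nextPass(ipass)
    }
  }
}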


1.5.3 API for Distributed Machine Learning

We distribute our model and our data across the executors. One benefit of this approach is that we can train models that do not fit into the memory of a single GPU. However, this also brings the additional challenge of needing to synchronize the model after each pass through the dataset. At a high level, we want to create a learner for each executor and iterate through the data that exists locally on that executor on each pass, synchronizing the learners' models between the passes. Although we later want to utilize Kylix, a butterfly all-reduce communication network, to synchronize the model for optimal network performance, I currently use Spark's implementation of treeReduce as a baseline for comparison.

I created an application programming interface (API) that abstracts away the implementation details for running BIDData on top of Spark. The API takes in several parameters: the Spark context of class SparkContext, the learner of class Learner, the data of class RDD[(SerText, MatIO)] on which to run the learner, and the number of executors in the cluster. The SparkContext variable is required to operate on the RDD abstraction, the learner and data are necessary to run the machine learning algorithm on the dataset, and the number of executors is passed in so the program can load balance across the executors.
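
As a minimal, self-contained sketch of the per-pass synchronization step with treeReduce: the model here is just an array of floats, whereas the real API reduces BIDData Model objects through the learner's reduction operator, and names such as synchronizePass are illustrative.

import org.apache.spark.rdd.RDD

object DistributedPassSketch {
  // One pass: each executor trains a local model on its partition, then the
  // local models are merged pairwise with Spark's treeReduce.
  def synchronizePass(
      data: RDD[Array[Float]],
      numExecutors: Int,
      localUpdate: Iterator[Array[Float]] => Array[Float],
      merge: (Array[Float], Array[Float]) => Array[Float]): Array[Float] = {
    data
      .coalesce(numExecutors)                              // balance partitions across executors
      .mapPartitions(part => Iterator(localUpdate(part)))  // one locally updated model per partition
      .treeReduce(merge)                                   // tree-structured model reduction
  }
}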

1.5.4 Implementing Distributed KMeans

To test out the framework for running machine learning algorithms on Spark using BIDData, I implemented a distributed variant of the existing KMeans algorithm in the BIDData machine learning library. This required adding a general reduction operator, combineModels, to the Model class, which the KMeans model overrides with its model-specific reduction operation. Since the first iteration of a model update may be different from the other iterations, the function also takes in the iteration index.

In the first iteration of distributed KMeans, we want to randomly sample n clusters from our dataset with equal probability. I do so by counting the number of data points that have been processed by each model and sampling n clusters from the two models' clusters with probability proportional to the number of processed data points. In the other iterations of model reduction, I add up all the centers' features and average them post-reduction.
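
The following sketch illustrates that reduction rule, with the model represented simply as an array of centroids plus a count of processed points. The actual BIDMach combineModels operates on the library's matrix types rather than on this hypothetical KMeansState class.

import scala.util.Random

// Illustrative model state: centroid feature rows plus the number of data
// points this model has processed so far.
case class KMeansState(centroids: Array[Array[Float]], pointsSeen: Long)

def combineKMeans(a: KMeansState, b: KMeansState, ipass: Int): KMeansState = {
  val totalSeen = a.pointsSeen + b.pointsSeen
  if (ipass == 0) {
    // First pass: resample n centroids from the two models' centroids, with
    // probability proportional to how many points each model has processed.
    val rng = new Random()
    val pickFromA = a.pointsSeen.toDouble / totalSeen
    val sampled = Array.fill(a.centroids.length) {
      if (rng.nextDouble() < pickFromA) a.centroids(rng.nextInt(a.centroids.length))
      else b.centroids(rng.nextInt(b.centroids.length))
    }
    KMeansState(sampled, totalSeen)
  } else {
    // Later passes: sum corresponding centroid features; the caller averages
    // them once the full reduction has finished.
    val summed = a.centroids.zip(b.centroids).map { case (ca, cb) =>
      ca.zip(cb).map { case (x, y) => x + y }
    }
    KMeansState(summed, totalSeen)
  }
}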

1.5.5 KMeans Benchmarks

256 clusters, batch size of 10,000, 10 epochs

System                                   Total Time       Per Epoch
Sequential BIDMach                       349 seconds      37 seconds
Distributed BIDMach with 16 clusters     45.6 seconds     1.7 seconds
Spark with 16 clusters                   263.1 seconds    26 seconds

5,000 clusters, batch size of 10,000, 10 epochs

System                                   Total Time       Per Epoch
Sequential BIDMach                       1,605 seconds    169.2 seconds
Distributed BIDMach with 16 clusters     235 seconds      14.4 seconds
Spark with 16 clusters                   5,779 seconds    576 seconds

These benchmarks were obtained from running on AWS g2.2xlarge instances, each with one NVIDIA GPU with 1,536 CUDA cores and eight Intel Xeon processors (Amazon 2016). From the benchmarks, we see that running BIDMach on Spark vastly outperforms both BIDMach running on a single machine and Spark running on the same EC2 setup. As expected, there is some overhead for reducing the model at the end of each epoch and redistributing the updated model back to the workers at the beginning of the following epoch.


Chapter 2: Engineering Leadership

James Jia
Pradeep Kalipatnapu
Richard Chiou
Yiheng Yang


2.1 Introduction

As data storage becomes increasingly commoditized, companies are collecting transactional records on the order of several petabytes that are beyond the ability of typical database software tools to store and analyze. Analysis of this "big data" can yield business insights such as customer preferences and market trends, which can bring companies benefits such as new revenue opportunities and improved operational efficiency (Manyika et al. 2011). To analyze this big data, companies typically build or use third-party machine learning (ML) tools.

Professor John Canny of UC Berkeley's EECS department has developed BIDData, a set of GPU-based ML libraries that is capable of completing common ML tasks an order of magnitude faster than rival technologies when run on a single machine. However, most "big data" tasks require greater computational power and storage space than that offered by a single machine.

Our capstone project integrates BIDData with Apache Spark, a fast, open-source big data processing framework with rapid adoption by the software industry. Integration of BIDData with Spark will enable more complex big data analysis and generate time, energy, and cost savings.

2.2 Trends and Market

The market opportunity for big data is astronomical. Due to the convenience of consumer electronics, more and more daily services such as banking and shopping are being conducted online. These transactions generate a gargantuan amount of user and market data that companies can benefit from. As an example, the social networking and search engine industries often use machine learning in order to increase their profits on selling advertising space to various companies, optimizing the pricing model for each ad spot as well as determining the best advertisement placement to maximize the likelihood of user clicks (Kahn 2014: 7). Determining the optimal pricing and placement strategy is especially pivotal to Google and Yahoo's operations, since most of these ad spaces are paid for through a pay-per-click model and constitute 98.8% of the total revenue of $11.2 billion in 2014 (Kahn 2014: 14).

Processing gigantic amounts of information within sub-second time intervals is computationally demanding. Modern central processing units (CPUs) can process data at 1 billion floating point operations per second, but typical big datasets are on the order of quadrillions of bytes (Intel 2016). Traditional CPUs would take hours to process a single big dataset, and they will become increasingly inefficient as dataset sizes continue to grow. Thus, there exists a great opportunity for companies that focus on cost-efficient data analytics tools which provide easy integration and process data quickly. For instance, the McKinsey Global Institute estimates that retailers that use efficient big data tools could increase their operating margins by more than 60 percent (Manyika et al. 2011).

Recent technology trends show that, in order to accelerate the speed of big data ML algorithms, CPU technology is being replaced by the graphics processing unit (GPU). Because GPUs are optimized for mathematical operations, they are orders of magnitude faster than CPUs on tasks related to big data analysis, and both industry and academia are moving towards using GPUs (Lopes and Ribeiro 2010). As a GPU-based ML library, BIDData also offers numerous improvements compared to rival technologies. In terms of single-machine processing speed and electricity expenditure, BIDData already beats the performance of other competitors, including distributed ML libraries, by an order of magnitude, assuming the dataset can be housed on a single machine (Canny 2015).

However, single-instance ML libraries such as BIDData are currently not widely used in industry, as they lack the scalability to handle the increasing sizes and complexities of big datasets. Instead, most companies are turning towards efficient distributed computing platforms to process the data (Low et al. 2012). Thus, our capstone project aims to integrate BIDData with the distributed framework of Spark, a leading machine learning framework in the market today. Moreover, the project also incorporates Amazon Web Services for big data storage, another platform which many companies are already leveraging today (Amazon 2016). By using this underlying infrastructure, BIDData is more likely to be viewed as a highly desirable big data analysis tool by the market.

2.3 Industry Analysis

2.3.1 Value Chain Analysis


Figure 1: The above value chain covers how users and companies alike benefit from big data. As consumers use products, technology companies are able to collect data on their habits and preferences. Analytics firms and third-party tools process this data and sell their findings. Ultimately, big data analytics can lead to improved products and better user experiences.

In the big data industry, firms process user data to gain insights into customer habits and preferences. The statistics and trends that they discover can be used to refine existing products and services or be sold to advertisers. For example, users either buy a product (e.g. FitBit) or sign up to use a free ad-supported product (e.g. Gmail) from various tech companies. These companies collect data from their users based on their usage patterns: FitBit provides anonymized, aggregated data for research purposes, and Gmail provides relevant information on users to the Google Ads team. This data is passed on to organizations that specialize in big data analysis. After obtaining insights on the data, these organizations then sell their findings back to the tech companies or to firms such as advertisers. Ultimately, big data analytics can be used to improve products and user experiences for consumers.

While these big data analysis companies have access to lots of data, they may not have the understanding or the resources to create every appropriate tool for analyzing all of their big data, which can come in various formats. Subsequently, these big data analysis organizations must turn to third-party tools to process some of their data. Because its machine learning libraries record the best benchmarks amongst their peers, BIDData has a strong case for becoming one of these leading third-party tools. Subsequently, a startup with expertise in BIDData on Spark could act as both a supplier and a consultant for these big data organizations.

2.3.2 Porter's Five Forces Analysis

According to Michael Porter, all companies face five forces of competition within their industry. Despite its benchmark-leading performances, BIDData faces potential competition from existing firms and future entrants in the big data industry.

2.3.3 Threat of Substitutes

Current machine learning libraries mostly run on CPUs, but as discussed in the Trends and Market section, the industry has shifted away from them. As evidenced by Netflix's recent switch to using GPUs, the industry has noticed that GPU-based software solutions record higher benchmarks than traditional CPU-based tools (Morgan 2014). Thus, in the long term, the threat of substitutes is relatively weak, as CPU-based software is gradually replaced.

2.3.4 Bargaining Power of Suppliers

Typically, programmers are the main individuals involved in the creation of software libraries. Currently, BIDData is open source, allowing any interested independent programmers to collaborate and make unpaid contributions to the project. Thus, the bargaining power of suppliers with regard to wages is extremely weak. As an additional benefit, the open source model can potentially lead to high-quality products at a fraction of the cost (Weber 2004).

2.3.5 Bargaining Power of Consumers

On the other hand, the bargaining power of consumers in this space is strong. Although lots of research on GPU-based ML techniques has been conducted in recent years, most computers used by consumers and companies alike still only use CPUs. Subsequently, customers are more likely to choose from the wide variety of CPU-based ML tools available. Moreover, the biggest and most profitable customers (e.g. Google) have the resources to create their own data analysis tools for internal use, which can further depress demand for third-party machine learning tools in this space. While newer computers come with GPUs, it may take some time before users and companies fully invest in and transition to GPU-based ML tools.

2.3.6 Threat of New Entrants

In general, the software industry has an extremely low barrier to entry, as it only takes a single programmer with one computer to create fully-functional software. Additionally, software undergoes lots of iterations quickly. Although BIDData's underlying GPU technology is still relatively new, startups and existing firms alike are growing more interested in GPU-based solutions. Subsequently, the industry faces a very strong threat from new entrants.


2.3.7 Competitive Rivalry

Because big data comes from a variety of sources and in a multitude of formats, consumers require many different ML techniques for analysis. If BIDData does not contain an implementation of a specific ML algorithm, consumers could simply use another tool that offers it. As a result, there exists an intense feature-based rivalry in the third-party tools space, in which several firms offer a multitude of services. For instance, one firm may specialize in data classification, while another firm may provide a product optimized for data regression. Nonetheless, this feature-based rivalry could allow more players to co-exist in this space.

2.4 Go-to-Market Strategy

Although GPU acceleration has been identified as a promising development, it is largely still in its infancy and has not seen widespread adoption across multiple sectors. Rather than focusing on profitability as in traditional models, we will focus on a strategy that helps us gain market share. To do so, we are going to target a particular subset of companies that have big data problems, specifically companies that have the ability to collect large amounts of data, but not necessarily access to the computational power or resources to obtain business intelligence from it. As we have discussed earlier, Fitbit is a prime example: they have a great capacity to gather information from their devices as an auxiliary effect of their product, but it would require an extraordinary amount of technical expertise as well as infrastructure to sift through the vast sea of data.


With this in mind, our go-to-market strategy has three main emphases. First, we are utilizing a "plug and play" model that emphasizes fluid software integration to encourage early adoption. Second, since we are going to remain open source, this will inspire developers to contribute back to our codebase. The stronger advocates could also serve to evangelize within their companies, giving us stronger leverage over our competitors. Lastly, the nature of our product is inherently scalable. Once we have written code ready for production, the cost to have an additional developer use our codebase is negligible.

While customers may cite a lack of technical support as an obstacle to adoption, there exists an opportunity for startups like Databricks to gain customers through their consulting services. Subsequently, we could proactively establish partnerships with companies like Databricks that operate on a consulting model, positioning our solution as the leading GPU-accelerated machine learning library.


Bibliography

Amazon. (n.d.). All AWS Case Studies. Retrieved March 5, 2016, from https://aws.amazon.com/solutions/case-studies/all/

Amazon. (n.d.). EC2 Instance Types. Retrieved May 7, 2016, from https://aws.amazon.com/ec2/instance-types/

Apache. (n.d.). Spark Programming Guide. Retrieved April 14, 2016, from http://spark.apache.org/docs/latest/programming-guide.html

Apache. (n.d.). Clustering - spark.mllib. Retrieved February 9, 2016, from http://spark.apache.org/docs/latest/mllib-clustering.html

Canny, J. F., & Zhao, H. (2013). BIDMach: Large-scale Learning with Zero Memory Allocation. Big Learning. Retrieved from http://biglearn.org/2013/files/papers/biglearning2013_submission_23.pdf

Canny, J. F. (2015, August 19). Benchmarks. Retrieved February 10, 2016, from https://github.com/BIDData/BIDMach/wiki/Benchmarks

Canny, J. F. (2015, September 27). BIDMach Tutorials. Retrieved February 14, 2016, from https://github.com/BIDData/BIDMach/wiki/BIDMach-Tutorials

Intel. (n.d.). 6th Generation Intel® Core™ i7 Processors. Retrieved March 5, 2016, from https://www-ssl.intel.com/content/www/us/en/processors/core/core-i7-processor.html

Kahn, S. (2014). Search Engines in the US. IBISWorld Industry Report 51913a. Retrieved from IBISWorld database.

Lopes, N., Ribeiro, B., & Quintas, R. (2010). GPUMLib: A new Library to combine Machine Learning algorithms with Graphics Processing Units. 2010 10th International Conference on Hybrid Intelligent Systems. doi:10.1109/his.2010.5600028

Low, Y., Bickson, D., Gonzalez, J., Guestrin, C., Kyrola, A., & Hellerstein, J. M. (2012). Distributed GraphLab. Proceedings of the VLDB Endowment, 5(8), 716-727. doi:10.14778/2212351.2212354

Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H. (2011, May). Big data: The next frontier for innovation, competition, and productivity (Rep.). Retrieved from http://www.mckinsey.com/business-functions/business-technology/our-insights/big-data-the-next-frontier-for-innovation

Morgan, T. P. (2014, February 11). Netflix Speeds Machine Learning With Amazon GPUs. Retrieved from http://www.enterprisetech.com/2014/02/11/netflix-speeds-machine-learning-amazon-gpus/

Porter, M. E. (2008). The Five Competitive Forces That Shape Strategy. Harvard Business Review.

Weber, S. (2004). The success of open source. Cambridge, MA: Harvard University Press.