Possible Visions for Mahout 1.0


These are the slides that we used to ignite the conversation with the audience at Hadoop Summit EU. Come over to the Mahout dev list to be part of the ongoing conversation.


© 2014 MapR Technologies 1

What’s Coming in Mahout 1.0?

Ted Dunning, Chief Application Architect, MapR Technologies

A typical encounter with a potential Mahout user

Which leads us to the Mahout 1.0 vision

Example: Cooccurrence Analysis

How often do items co-occur?

    // load distributed matrix
    val A = drmFromHDFS(...)

    // compute co-occurrences
    val C = A.t %*% A

Under the covers:

• The optimizer rewrites the transpose-and-multiply expression A.t %*% A into a single TransposeSelf operator
• The optimizer then chooses between two physical operators to execute TransposeSelf
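
A fused TransposeSelf operator can exist because A'A equals the sum, over the rows r of A, of the outer products r'r. This means C can be built in a single pass over the rows, and A.t never has to be materialized. The sketch below illustrates that identity in plain Scala. It assumes a small dense in-memory matrix rather than a distributed DRM. The names TransposeSelfSketch, naive and transposeSelf are hypothetical, and this is not Mahout's physical operator. The sketch computes C twice: once literally as transpose-then-multiply, and once as accumulated row outer products.

```scala
// Hypothetical in-memory sketch, NOT Mahout's TransposeSelf implementation.
// Rows of A are users, columns are items; A(u)(i) = 1.0 if user u touched item i.
object TransposeSelfSketch {
  type Matrix = Array[Array[Double]]

  // Literal reading of A.t %*% A: materialize the transpose, then multiply.
  def naive(a: Matrix): Matrix = {
    val t = Array.tabulate(a.head.length, a.length)((i, u) => a(u)(i))
    Array.tabulate(t.length, t.length) { (i, j) =>
      a.indices.map(u => t(i)(u) * a(u)(j)).sum
    }
  }

  // Fused form: A'A = sum over rows r of the outer product r'r,
  // accumulated in one pass over the rows without building A.t.
  def transposeSelf(a: Matrix): Matrix = {
    val n = a.head.length
    val c = Array.ofDim[Double](n, n)
    for (row <- a) {
      // only nonzero entries contribute, which keeps sparse rows cheap
      val nz = row.indices.filter(i => row(i) != 0.0)
      for (i <- nz; j <- nz) c(i)(j) += row(i) * row(j)
    }
    c
  }

  def main(args: Array[String]): Unit = {
    val a: Matrix = Array(
      Array(1.0, 1.0, 0.0), // user 0 touched items 0 and 1
      Array(1.0, 1.0, 1.0), // user 1 touched all three items
      Array(0.0, 0.0, 1.0)  // user 2 touched only item 2
    )
    val c = transposeSelf(a)
    // C(i)(j): users who touched both i and j; C(i)(i): users who touched i
    c.foreach(r => println(r.mkString(" ")))
    // both formulations should agree element by element
    println(c.map(_.toSeq).toSeq == naive(a).map(_.toSeq).toSeq)
  }
}
```

Each row's outer product is independent of the others. As a result, partial sums computed over different row blocks can simply be added together, which is what makes this formulation attractive on a cluster.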

Which items co-occur anomalously?

    // compute & broadcast number
    // of interactions per item
    val numInteractions =
      drmBroadcast(A.colSums)

    // create indicator matrix
    val I = C.mapBlock() { case (keys, block) =>
      // allocate sparse block of indicator matrix
      val indicatorBlock = sparse(block.nrow, block.ncol)
      // compute indicators with log-likelihood ratio test
      for (row <- block)
        indicatorBlock(row.index, ::) = computeLLR(row, numInteractions)
      keys -> indicatorBlock
    }
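
The slide does not define computeLLR. Its comment names a log-likelihood ratio test, and a standard form of that test is Dunning's G² statistic on a 2×2 contingency table for each item pair. The sketch below makes several assumptions:

• It uses plain in-memory arrays instead of a distributed C.
• It needs a total user count, numUsers, which the slide's computeLLR signature does not show.
• It applies a hypothetical threshold to decide what counts as "anomalous".

The names IndicatorSketch, llr and indicators are illustrative only, not Mahout API.

```scala
// Hypothetical sketch of the computeLLR step, NOT Mahout's API.
// Uses Dunning's log-likelihood ratio (G^2) on a 2x2 contingency table.
object IndicatorSketch {
  private def xLogX(x: Long): Double =
    if (x == 0L) 0.0 else x * math.log(x.toDouble)

  // Unnormalized entropy: xLogX(total) minus the sum of xLogX(k)
  private def entropy(ks: Long*): Double = xLogX(ks.sum) - ks.map(xLogX).sum

  // k11: users with both items, k12: item j only,
  // k21: item i only, k22: neither item
  def llr(k11: Long, k12: Long, k21: Long, k22: Long): Double = {
    val rowEntropy = entropy(k11 + k12, k21 + k22)
    val columnEntropy = entropy(k11 + k21, k12 + k22)
    val matrixEntropy = entropy(k11, k12, k21, k22)
    // guard against tiny negative values from floating-point round-off
    if (rowEntropy + columnEntropy < matrixEntropy) 0.0
    else 2.0 * (rowEntropy + columnEntropy - matrixEntropy)
  }

  // cooc: item-by-item co-occurrence counts (C = A'A)
  // numInteractions: interactions per item (A.colSums)
  // numUsers: ASSUMED extra input, total number of rows of A
  // threshold: HYPOTHETICAL cut-off for an "anomalous" co-occurrence
  def indicators(cooc: Array[Array[Long]], numInteractions: Array[Long],
                 numUsers: Long, threshold: Double): Array[Array[Boolean]] =
    Array.tabulate(cooc.length, cooc.length) { (i, j) =>
      if (i == j) false // an item is not an indicator for itself
      else {
        val k11 = cooc(i)(j)
        val k12 = numInteractions(j) - k11
        val k21 = numInteractions(i) - k11
        val k22 = numUsers - numInteractions(i) - numInteractions(j) + k11
        llr(k11, k12, k21, k22) > threshold
      }
    }

  def main(args: Array[String]): Unit =
    // two items that always occur together among 20 users
    println(llr(10, 0, 0, 10))
}
```

Pairs whose co-occurrence count is just what the item frequencies would predict score near zero. Pairs that occur together far more often than chance score high, so only those survive the threshold into the indicator matrix.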

Runtime

• Prototype on Apache Spark
  – fast and expressive cluster computing system
  – general computation graphs, in-memory primitives, rich API, interactive shell
• Future: add Stratosphere
  – project recently proposed to the Apache Incubator
  – similar to Apache Spark; adds data flow optimization and efficient out-of-core execution

How Does This Apply?

How Can I Start?

Q & A

@ted_dunning @mapr maprtech

tdunning@mapr.com

Engage with us!

MapR

maprtech

mapr-technologies
