Transcript
Page 1: Possible Visions for Mahout 1.0

© 2014 MapR Technologies 1

What’s Coming in Mahout 1.0?

Ted Dunning, Chief Application Architect, MapR Technologies

Page 2: Possible Visions for Mahout 1.0

Page 3: Possible Visions for Mahout 1.0

Page 4: Possible Visions for Mahout 1.0

A typical encounter with a potential Mahout user

Page 5: Possible Visions for Mahout 1.0

Which leads us to the Mahout 1.0 vision

Page 6: Possible Visions for Mahout 1.0

Page 7: Possible Visions for Mahout 1.0

Page 8: Possible Visions for Mahout 1.0

Page 9: Possible Visions for Mahout 1.0

Example: Cooccurrence Analysis

Page 10: Possible Visions for Mahout 1.0

How often do items co-occur?

// load distributed matrix
val A = drmFromHDFS(...)

// compute co-occurrences
val C = A.t %*% A
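To make the product concrete, here is a minimal in-memory sketch in plain Scala, with no Mahout dependency. It assumes A has one row per user and one column per item, with 1.0 marking an interaction; that layout and every name in the block (CooccurrenceSketch, transpose, multiply, cooccurrences) are illustrative and not part of the Mahout DSL.

```scala
// Illustrative only: small dense arrays stand in for a distributed matrix.
object CooccurrenceSketch {
  type Matrix = Array[Array[Double]]

  // Swap rows and columns.
  def transpose(a: Matrix): Matrix =
    Array.tabulate(a(0).length, a.length)((j, i) => a(i)(j))

  // Textbook matrix product: entry (i, j) is the dot product of
  // row i of `a` with column j of `b`.
  def multiply(a: Matrix, b: Matrix): Matrix =
    Array.tabulate(a.length, b(0).length) { (i, j) =>
      b.indices.map(k => a(i)(k) * b(k)(j)).sum
    }

  // C = A' * A. For a 0/1 matrix, entry (i, j) counts the users who
  // interacted with both item i and item j.
  def cooccurrences(a: Matrix): Matrix = multiply(transpose(a), a)
}
```

With binary input the diagonal of C holds the number of interactions per item, and the off-diagonal entries hold the pairwise co-occurrence counts.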

Page 11: Possible Visions for Mahout 1.0

How often do items co-occur?

// load distributed matrix
val A = drmFromHDFS(...)

// compute co-occurrences
val C = A.t %*% A

Under the covers:

Optimizer rewrites the matrix multiplication and transpose operations to a TransposeSelf operator

Optimizer chooses from two physical operators for TransposeSelf
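One reason a fused operator can pay off: A' * A equals the sum of the outer products of the rows of A, so it can be computed in a single pass over the rows without ever materializing the transpose. The sketch below shows that identity in plain Scala. The slides do not say how Mahout's physical operators are implemented, so treat this as an assumption about the general idea; the names TransposeSelfSketch and transposeSelf are hypothetical.

```scala
// Illustrative only: not Mahout's operator, just the row-wise identity.
object TransposeSelfSketch {
  type Matrix = Array[Array[Double]]

  // Accumulate A' * A one row at a time: each row r contributes the
  // outer product r' * r. Zero entries are skipped, which keeps the
  // work low when rows are sparse.
  def transposeSelf(a: Matrix): Matrix = {
    val n = a(0).length
    val c = Array.ofDim[Double](n, n)
    for (row <- a; i <- 0 until n if row(i) != 0.0; j <- 0 until n)
      c(i)(j) += row(i) * row(j)
    c
  }
}
```

Because each row contributes independently, partial sums can be built per partition and then added together, which suits a distributed runtime.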

Page 12: Possible Visions for Mahout 1.0

Which items co-occur anomalously?

// compute & broadcast number
// of interactions per item
val numInteractions = drmBroadcast(A.colSums)

// create indicator matrix
val I = C.mapBlock() {
  case (keys, block) =>
    // allocate sparse block of indicator matrix
    val indicatorBlock = sparse(block.nrow, block.ncol)
    // compute indicators with loglikelihood ratio test
    for (row <- block)
      indicatorBlock(row.index, ::) = computeLLR(row, numInteractions)
    keys -> indicatorBlock
}
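The slide leaves computeLLR undefined. As a rough guide to what such a test computes, here is a minimal sketch of the log-likelihood ratio (G²) statistic for a 2x2 contingency table, written in the entropy form. The names LlrSketch, llr and itemPairLlr are hypothetical, and the mapping from co-occurrence counts to table cells, including the extra total argument for the number of users, is an assumption rather than something the slides state.

```scala
// Illustrative only: a stand-alone LLR score for one pair of items.
object LlrSketch {
  private def xLogX(x: Double): Double =
    if (x == 0.0) 0.0 else x * math.log(x)

  // Unnormalized entropy of a set of counts.
  private def entropy(counts: Double*): Double =
    xLogX(counts.sum) - counts.map(xLogX).sum

  // k11: both events, k12: first only, k21: second only, k22: neither.
  def llr(k11: Double, k12: Double, k21: Double, k22: Double): Double = {
    val rowEntropy = entropy(k11 + k12, k21 + k22)
    val colEntropy = entropy(k11 + k21, k12 + k22)
    val matEntropy = entropy(k11, k12, k21, k22)
    // Clamp tiny negative values caused by floating-point round-off.
    math.max(0.0, 2.0 * (rowEntropy + colEntropy - matEntropy))
  }

  // Assumed mapping: `both` is an entry of C, `countA` and `countB` are
  // the per-item interaction counts, `total` is the number of users.
  def itemPairLlr(both: Double, countA: Double, countB: Double, total: Double): Double =
    llr(both, countA - both, countB - both, total - countA - countB + both)
}
```

A score near zero means the pair co-occurs about as often as chance would predict. Larger scores flag a departure from independence, so keeping only the high-scoring pairs yields a sparse indicator matrix.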

Page 13: Possible Visions for Mahout 1.0

Runtime

• prototype on Apache Spark
  – fast and expressive cluster computing system
  – general computation graphs, in-memory primitives, rich API, interactive shell

• future: add Stratosphere
  – project proposed to Apache Incubator recently
  – similar to Apache Spark, adds data flow optimization and efficient out-of-core execution

Page 14: Possible Visions for Mahout 1.0

Page 15: Possible Visions for Mahout 1.0

Page 16: Possible Visions for Mahout 1.0

How Does This Apply?

Page 17: Possible Visions for Mahout 1.0

How Can I Start?

Page 18: Possible Visions for Mahout 1.0

Q & A

@ted_dunning @mapr maprtech

Engage with us!

MapR

maprtech

mapr-technologies

Page 19: Possible Visions for Mahout 1.0