DML Syntax & Invocation Nakul Jindal Spark Technology Center, San Francisco

DML Syntax and Invocation process


Goal of These Slides

• Provide you with basic DML syntax
• Link to important resources
• Invocation

Non-Goals
• Comprehensive syntax and API coverage

Resources

• Google "Apache SystemML"
• Documentation - https://apache.github.io/incubator-systemml/
• DML Language Reference - https://apache.github.io/incubator-systemml/dml-language-reference.html
• MLContext - https://apache.github.io/incubator-systemml/spark-mlcontext-programming-guide.html#spark-shell-scala-example
• GitHub - https://github.com/apache/incubator-systemml

Note
• Some documentation is outdated
• If you find a typo or want to update the document, consider making a Pull Request
• All docs are in Markdown format
• https://github.com/apache/incubator-systemml/tree/master/docs

About DML Briefly

• DML = Declarative Machine Learning
• R-like syntax, some subtle differences from R
• Dynamically typed
• Data Structures
    • Scalars - Boolean, Integers, Strings, Double Precision
    • Cacheable - Matrices, DataFrames
• Data Structure Terminology in DML
    • Value Type - Boolean, Integers, Strings, Double Precision
    • Data Type - Scalar, Matrices, DataFrames*
    • You can have a DataType[ValueType], not all combinations are supported
        • For instance - matrix[double]
• Scoping
    • One global scope, except inside functions

*Coming soon

About DML Briefly

• Control Flow
    • Sequential imperative control flow (like most other languages)
    • Looping -
        • while (<condition>) {…}
        • for (var in <for_predicate>) {…}
        • parfor (var in <for_predicate>) {…} // Iterations in parallel
    • Guards -
        • if (<condition>) {...} [ else if (<condition>) {...} ... else {…} ]
• Functions
    • Built-in - List available in language reference
    • User Defined - (multiple return parameters)
        • functionName = function (<formal_parameters>…) return (<formal_parameters>) {...}
        • Can only access variables defined in the formal_parameters in the body of the function
    • External Function - same as user defined, can call external Java Package

About DML Briefly

• Imports
    • Can import user defined/external functions from other source files
    • Disambiguation using namespaces
• Command Line Arguments
    • By position - $1, $2 …
    • By name - $X, $Y ...
• Limitations
    • A user defined function can only be called on the right hand side of an assignment, as the only expression
    • Cannot write
        • X <- Y + bar()
        • for (i in foo(1,2,3)) {…}

Sample Code

    A = 1.0 # A is a double (1.0 is a double literal)
    X <- matrix("4 3 2 5 7 8", rows=3, cols=2) # X = matrix of size 3,2; '<-' is assignment
    Y = matrix(1, rows=3, cols=2) # Y = matrix of size 3,2 with all 1s
    b <- t(X) %*% Y # %*% is matrix multiply, t(X) is transpose
    S = "hello world"

    i = 0
    while (i < max_iteration) {
        H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W)) # * is element by element mult
        W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H))
        i = i + 1; # i is an integer
    }

    print (toString(H)) # toString converts a matrix to a string
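The while loop above has the shape of the multiplicative updates commonly used for non-negative matrix factorization. For readers more at home in Python, here is a minimal pure-Python sketch of the same two update lines. The helper names (matmul, transpose, ratio, update, divergence), the matrix sizes and the random starting values are all assumptions made for illustration; none of them come from the slides or from SystemML.

```python
import math
import random

def matmul(A, B):
    """Matrix product of two lists-of-rows, like A %*% B in DML."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    """Like t(A) in DML."""
    return [list(col) for col in zip(*A)]

def ratio(V, W, H):
    """Element-wise V / (W %*% H)."""
    WH = matmul(W, H)
    return [[v / wh for v, wh in zip(v_row, wh_row)] for v_row, wh_row in zip(V, WH)]

def update(V, W, H):
    """One pass of the loop body: H is updated first, then W uses the new H."""
    col_sums_W = [sum(col) for col in zip(*W)]   # colSums(W), one value per factor
    num_H = matmul(transpose(W), ratio(V, W, H))
    H = [[h * n / col_sums_W[k] for h, n in zip(H[k], num_H[k])]
         for k in range(len(H))]
    row_sums_H = [sum(row) for row in H]         # rowSums(H), one value per factor
    num_W = matmul(ratio(V, W, H), transpose(H))
    W = [[w * n / row_sums_H[k] for k, (w, n) in enumerate(zip(w_row, n_row))]
         for w_row, n_row in zip(W, num_W)]
    return W, H

def divergence(V, W, H):
    """Generalized KL divergence between V and W %*% H (illustrative metric)."""
    WH = matmul(W, H)
    return sum(v * math.log(v / wh) - v + wh
               for v_row, wh_row in zip(V, WH) for v, wh in zip(v_row, wh_row))

# Hypothetical toy data: strictly positive entries keep every division defined.
random.seed(0)
m, n, r = 4, 5, 2
V = [[random.uniform(0.5, 1.5) for _ in range(n)] for _ in range(m)]
W = [[random.uniform(0.5, 1.5) for _ in range(r)] for _ in range(m)]
H = [[random.uniform(0.5, 1.5) for _ in range(n)] for _ in range(r)]

start = divergence(V, W, H)
i = 0
max_iteration = 20
while i < max_iteration:
    W, H = update(V, W, H)
    i = i + 1
end = divergence(V, W, H)
print(start, end)
```

The divisions by t(colSums(W)) and t(rowSums(H)) in the DML become per-factor scalars here, which is how the sketch reads the broadcasting in those two lines.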

Sample Code

    source("nn/layers/affine.dml") as affine # import a file in the "affine" namespace
    [W, b] = affine::init(D, M) # calls the init function, multiple return

    parfor (i in 1:nrow(X)) { # i iterates over 1 through num rows in X in parallel
        for (j in 1:ncol(X)) { # j iterates over 1 through num cols in X
            # Computation ...
        }
    }

    write (M, fileM, format="text") # M=matrix, fileM=file, also writes to HDFS
    X = read (fileX) # fileX=file, also reads from HDFS

    if (ncol (A) > 1) {
        # Matrix A is being sliced by a given range of columns
        A[,1:(ncol (A) - 1)] = A[,1:(ncol (A) - 1)] - A[,2:ncol (A)];
    }
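The column-slicing assignment above relies on 1-based, inclusive ranges, which is easy to misread when coming from 0-based languages. As a rough Python analogue, assuming the right-hand side is computed from the original values of A before anything is written back, every column except the last has its right-hand neighbour subtracted. The function name and the sample matrix are hypothetical.

```python
def subtract_next_column(A):
    """Return a copy of A where each column but the last has its
    right-hand neighbour subtracted; the last column is unchanged."""
    ncol = len(A[0])
    if ncol <= 1:                      # mirrors the guard: if (ncol(A) > 1)
        return [row[:] for row in A]
    # DML columns 1..ncol-1 correspond to Python indices 0..ncol-2.
    return [[row[j] - row[j + 1] for j in range(ncol - 1)] + [row[-1]]
            for row in A]

A = [[10, 7, 3],
     [5, 5, 1]]
result = subtract_next_column(A)
print(result)
```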

Sample Code

    interpSpline = function(double x, matrix[double] X, matrix[double] Y, matrix[double] K) return (double q) {
        i = as.integer(nrow(X) - sum(ppred(X, x, ">=")) + 1)
        # misc computation …
        q = as.scalar(qm)
    }

    eigen = externalFunction(Matrix[Double] A) return(Matrix[Double] eval, Matrix[Double] evec)
        implemented in (classname="org.apache.sysml.udf.lib.EigenWrapper", exectype="mem")
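The index expression in interpSpline, nrow(X) - sum(ppred(X, x, ">=")) + 1, can look cryptic at first. A small Python sketch of the same arithmetic follows. It assumes X is a single column sorted in ascending order, which the slide does not state. Under that assumption the result is the 1-based position of the first entry that is greater than or equal to x. The function name and sample values are made up for illustration.

```python
from bisect import bisect_left

def first_index_at_or_above(X, x):
    """1-based index computed as nrow(X) - count(X >= x) + 1."""
    count_ge = sum(1 for v in X if v >= x)   # sum(ppred(X, x, ">="))
    return len(X) - count_ge + 1

knots = [1.0, 2.0, 3.0, 4.0]                 # hypothetical sorted column
i = first_index_at_or_above(knots, 2.5)

# For sorted input the same position is what a binary search would give,
# shifted by one because DML indices start at 1.
same_as_bisect = all(
    first_index_at_or_above(knots, q) == bisect_left(knots, q) + 1
    for q in [0.5, 1.0, 2.0, 2.5, 4.0, 5.0]
)
print(i, same_as_bisect)
```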

Sample Code (From LinearRegDS.dml*)

    A = t(X) %*% X
    b = t(X) %*% y

    if (intercept_status == 2) {
        A = t(diag (scale_X) %*% A + shift_X %*% A [m_ext, ])
        A = diag (scale_X) %*% A + shift_X %*% A [m_ext, ]
        b = diag (scale_X) %*% b + shift_X %*% b [m_ext, ]
    }

    A = A + diag (lambda)

    print ("Calling the Direct Solver...")

    beta_unscaled = solve (A, b)

*https://github.com/apache/incubator-systemml/blob/master/scripts/algorithms/LinearRegDS.dml#L133
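The excerpt above builds the normal equations, adds a regularization term on the diagonal and calls a direct solver. A minimal pure-Python sketch of that core path is given below. It is a simplification rather than a port of the script: it skips the intercept_status == 2 rescaling branch, treats lambda as a single scalar added to every diagonal entry, and replaces DML's built-in solve with a small Gaussian elimination written for this example. All function names and the toy data are assumptions.

```python
def transpose(A):
    """Like t(A) in DML."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Matrix product of two lists-of-rows, like A %*% B in DML."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (b is a flat list)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                     # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def linreg_direct(X, y, lam):
    """A = t(X) %*% X + lam on the diagonal, b = t(X) %*% y, beta = solve(A, b)."""
    Xt = transpose(X)
    A = matmul(Xt, X)
    b = [sum(xi * yi for xi, yi in zip(col, y)) for col in Xt]
    for j in range(len(A)):
        A[j][j] += lam
    return solve(A, b)

# Hypothetical data: first column is a constant 1, so y = 1 + 2 * x exactly.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [3.0, 5.0, 7.0, 9.0]
beta = linreg_direct(X, y, 0.0)
print(beta)
```

With lam set to 0.0 and noise-free data the sketch recovers the generating coefficients; a positive lam shrinks them, which is the role diag (lambda) plays in the DML.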

MLContext API

• You can invoke SystemML from the
    • Command line or a
    • Spark Program
• The MLContext API lets you invoke it from a Spark Program
• Command line invocation described later
• Available as a Scala API and a Python API
• These slides will only talk about the Scala API

MLContext API - Example Usage

    import org.apache.sysml.api.MLContext

    val ml = new MLContext(sc)

    // Parse tab-separated product pairs, skipping comment lines that start with '#'
    val X_train = sc.textFile("amazon0601.txt")
        .filter(!_.startsWith("#"))
        .map(_.split("\t") match { case Array(prod1, prod2) => (prod1.toInt, prod2.toInt, 1.0) })
        .toDF("prod_i", "prod_j", "x_ij")
        .filter("prod_i < 5000 AND prod_j < 5000") // Change to smaller number
        .cache()

MLContext API - Example Usage

    val pnmf = """
    # data & args
    X = read($X)
    rank = as.integer($rank)

    # Computation ....

    write(negloglik, $negloglikout)
    write(W, $Wout)
    write(H, $Hout)
    """


    ml.registerInput("X", X_train)
    ml.registerOutput("W")
    ml.registerOutput("H")
    ml.registerOutput("negloglik")

    val outputs = ml.executeScript(pnmf, Map("maxiter" -> "100", "rank" -> "10"))

    val negloglik = getScalarDouble(outputs, "negloglik")

Invocation - How to run a DML file

• SystemML can run on
    • Your laptop (Standalone)
    • Spark
    • Hybrid Spark - using the better choice between the driver and the cluster
    • Hadoop
    • Hybrid Hadoop
• For this presentation, we care about standalone, spark & hybrid_spark
• Documentation has detailed instructions on the others

Invocation - How to run a DML file

Standalone
In the systemml directory
    bin/systemml <dml-filename> [arguments]

Example invocations:
    Named arguments:
        bin/systemml LinearRegCG.dml -nvargs X=X.mtx Y=Y.mtx B=B.mtx
    Positional arguments:
        bin/systemml oddsRatio.dml -args X.mtx 50 B.mtx

Invocation - How to run a DML file

Spark / Hybrid Spark
Define SPARK_HOME to point to your Apache Spark Installation
Define SYSTEMML_HOME to point to your Apache SystemML installation

In the systemml directory
    scripts/sparkDML.sh <dml-filename> [systemml arguments]

Example invocations:
    Named arguments:
        scripts/sparkDML.sh LinearRegCG.dml --nvargs X=X.mtx Y=Y.mtx B=B.mtx
    Positional arguments:
        scripts/sparkDML.sh oddsRatio.dml --args X.mtx 50 B.mtx

Invocation - How to run a DML file

Spark / Hybrid Spark
Define SPARK_HOME to point to your Apache Spark Installation
Define SYSTEMML_HOME to point to your Apache SystemML installation
Using the spark-submit script

    $SPARK_HOME/bin/spark-submit \
        --master <master-url> \
        --class org.apache.sysml.api.DMLScript \
        ${SYSTEMML_HOME}/SystemML.jar -f <dml-filename> <systemml arguments> \
        -exec {hybrid_spark,spark}

Example invocation:
    $SPARK_HOME/bin/spark-submit \
        --master local[*] \
        --class org.apache.sysml.api.DMLScript \
        ${SYSTEMML_HOME}/SystemML.jar -f LinearRegCG.dml --nvargs X=X.mtx Y=Y.mtx B=B.mtx
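When the spark-submit invocation above has to be launched from a script, it can help to assemble the argument list programmatically. The sketch below only builds the list and does not run anything. The function name, its parameters and the /opt/... paths are hypothetical, while the flags are copied from the slide. One deliberate choice is an assumption rather than something the slide states: when an execution mode is given, the sketch places -exec before the named arguments so that the argument list comes last, whereas the slide's template shows -exec at the end.

```python
import shlex

def spark_submit_command(dml_file, named_args, spark_home, systemml_home,
                         master="local[*]", exec_mode=None):
    """Assemble an argv list that mirrors the slide's example invocation."""
    cmd = [
        f"{spark_home}/bin/spark-submit",
        "--master", master,
        "--class", "org.apache.sysml.api.DMLScript",
        f"{systemml_home}/SystemML.jar",
        "-f", dml_file,
    ]
    if exec_mode is not None:
        # Assumption of this sketch: keep the named arguments last.
        cmd += ["-exec", exec_mode]
    cmd.append("--nvargs")
    cmd += [f"{name}={value}" for name, value in named_args.items()]
    return cmd

cmd = spark_submit_command(
    "LinearRegCG.dml",
    {"X": "X.mtx", "Y": "Y.mtx", "B": "B.mtx"},
    spark_home="/opt/spark",          # hypothetical install locations
    systemml_home="/opt/systemml",
)
# Quote each part so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(part) for part in cmd))
```

The list form can be handed to subprocess.run as-is, which avoids shell quoting issues with values such as local[*].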

Editor Support

• Very rudimentary editor support
• Bit of shameless self-promotion:
• Atom - Hackable Text editor
    • Install package - https://atom.io/packages/language-dml
    • From GUI - http://flight-manual.atom.io/using-atom/sections/atom-packages/
    • Or from command line - apm install language-dml
    • Rudimentary snippet based completion of built-in functions
• Vim
    • Install package - https://github.com/nakul02/vim-dml
    • Works with Vundle (vim package manager)
• There is an experimental Zeppelin Notebook integration with DML -
    • https://issues.apache.org/jira/browse/SYSTEMML-542
    • Available as a docker image to play with - https://hub.docker.com/r/nakul02/incubator-zeppelin/
• Please send feedback when using these, requests for features, bugs
    • I'll work on them when I can

Other Information

• All scripts are in - https://github.com/apache/incubator-systemml/tree/master/scripts
• Algorithm Scripts - https://github.com/apache/incubator-systemml/tree/master/scripts/algorithms
• Test Scripts - https://github.com/apache/incubator-systemml/tree/master/src/test/scripts
• Look inside the test folder for programs that run the tests, play around with some of them - https://github.com/apache/incubator-systemml/tree/master/src/test/java/org/apache/sysml/test

Thanks!

• The documentation might be outdated and have typos
    • Please submit fixes
• If a language feature does not make sense or is missing, ask a SystemML team member
• Have Fun!

BACKUP SLIDES

Editor Support

• There was an attempt at an Eclipse Plugin late last year -
    • https://www.mail-archive.com/dev%40systemml.incubator.apache.org/msg00147.html
    • The project is largely dead