CS6140: Machine Learning, Spring 2017
Instructor: Lu Wang, College of Computer and Information Science, Northeastern University
Webpage: www.ccs.neu.edu/home/luwang
Email: [email protected]
Time and Location
• Time: Thursdays from 6:00pm – 9:00pm
• Location: Forsyth Building 129
Course Webpage
• http://www.ccs.neu.edu/home/luwang/courses/cs6140_sp2017.html
Prerequisites
• Programming
– Able to write code proficiently in some programming language (e.g., Python, Java, C/C++, Matlab)
• Courses
– Algorithms
– Probability and statistics
– Linear algebra
• A quiz:
– 22 simple questions, 20 of them True/False (relevant to probability, statistics, and linear algebra)
– The purpose of this quiz is to indicate the expected background of students.
– 80% of the questions should be easy to answer.
– Not counted in your final score!
Textbook and References
• Main Textbooks
– Kevin Murphy, "Machine Learning: A Probabilistic Perspective", MIT Press, 2012.
– Christopher M. Bishop, "Pattern Recognition and Machine Learning", Springer, 2006.
• Other textbooks
– Tom Mitchell, "Machine Learning", McGraw Hill, 1997.
• Machine learning lectures
Content of the Course
• Regression: linear regression, logistic regression
• Dimensionality Reduction: Principal Component Analysis (PCA), Independent Component Analysis (ICA), Linear Discriminant Analysis
• Probabilistic Models: Naive Bayes, maximum likelihood estimation
• Statistical Learning Theory: VC dimension
• Kernels: Support Vector Machines (SVMs), kernel tricks, duality
• Sequential Models and Structural Models: Hidden Markov Model (HMM), Conditional Random Fields (CRFs)
• Clustering: spectral clustering, hierarchical clustering
• Latent Variable Models: K-means, mixture models, expectation-maximization (EM) algorithms, Latent Dirichlet Allocation (LDA), representation learning
• Deep Learning: feedforward neural network, restricted Boltzmann machine, autoencoders, recurrent neural network, convolutional neural network
• Reinforcement Learning: Markov decision processes, Q-learning
• And others, including advanced topics for machine learning in natural language processing and text analysis
The Goal
• Scientific understanding of machine learning models
• How to apply and design learning methods for novel problems
• Not only what, but also why!
Grading
• Assignments
– 3 assignments, 10% each
• Quizzes
– 10 in-class tests, 1% each
• Exam
– 1 exam, 30%
• Project
– 1 project, 27%
• Participation: 3%
– Classes
– Piazza
Exam
• Open book
• April 20, 2017
Course Project
• A machine learning relevant research project
• 2-3 students as a team
Topics
• Machine learning relevant
– Natural language processing
– Computer vision
– Robotics
– Bioinformatics
– Health informatics
– …
Course Project Grading
• We want to see novel and interesting projects!
– The problem needs to be well-defined, novel, useful, and practical
– Appropriate use of machine learning techniques
– Reasonable results and observations
Projects from Last Year
• Predicting Follow-back Behavior in Instagram Users
• Predicting Grasp Points Using Convolutional Neural Networks
• Artificial Neural Networks for Drug Response Prediction in Tailored Therapy
• Threat Detection from Twitter
• Player Ranking in Popular Games
Course Project Grading
• Three reports
– Proposal (2%)
– Progress, with code (10%)
– Final, with code (10%)
• One presentation
– In class (5%)
Submission and Late Policy
• Each assignment or report, both electronic copy and hard copy, is due at the beginning of class on the corresponding due date.
• Programming language
– Python, Java, C/C++, Matlab
• Electronic version
– On Blackboard
• Hard copy
– In class
• An assignment or report turned in late will be charged 10 points (out of 100) for each late day (i.e., 24 hours).
• Each student has a budget of 5 late days throughout the semester before the late penalty is applied.
How to Find Us?
• Course webpage:
– http://www.ccs.neu.edu/home/luwang/courses/cs6140_sp2017.html
• Office hours
– Lu Wang: Thursdays from 4:30pm to 5:30pm, or by appointment, 448 WVH
– Rui Dong (TA): Tuesdays from 4:00pm to 5:00pm, or by appointment, 466B WVH
• Piazza
– http://piazza.com/northeastern/spring2017/cs614002
– All course-relevant questions go here
What is Machine Learning?
• "A set of methods that can automatically detect patterns in data, and then use the uncovered patterns to predict future data, or to perform other kinds of decision making under uncertainty."
Real World Applications
Relations with Other Areas
• Natural Language Processing
• Computer Vision
• Robotics
• A lot of other areas…
Today's Outline
• Basic concepts in machine learning
• K-nearest neighbors
• Linear regression
• Ridge regression
Supervised vs. Unsupervised Learning
• Supervised learning
– Training set, training sample, gold-standard label
– Classification, if the label is categorical
– Regression, if the label is numerical
Supervised Learning
• Goal:
– Generalizable to new input samples
– Overfitting vs. underfitting
– One solution: we use probabilistic models
• Typical setup:
– Step 1: Features
– Step 2: Training set, test set, development set
– Step 3: Evaluation
• Regression examples
– Predicting stock price
– Predicting temperature
– Predicting revenue
– …
Supervised vs. Unsupervised Learning
• Unsupervised learning
• More about "knowledge discovery"
Unsupervised Learning
• Dimension reduction
– Principal component analysis
• Clustering (e.g., graph mining)
– [Figure from "RolX: Role Extraction and Mining in Large Networks", Henderson et al., 2011]
• Topic modeling
Parametric vs. Non-parametric Models
• Fixed number of parameters?
– If yes, a parametric model
• Number of parameters grows with the amount of training data?
– If yes, a non-parametric model
• Computational tractability
Today's Outline
• Basic concepts in machine learning
• K-nearest neighbors
– Supervised learning
– A non-parametric classifier
• Linear regression
• Ridge regression
A Non-parametric Classifier: K-Nearest Neighbors (KNN)
• Basic idea: memorize all the training samples
– The more you have in training data, the more the model has to remember
• Nearest neighbor (or 1-nearest neighbor):
– Testing phase: find the closest training sample, and return its label
• K-nearest neighbors:
– Testing phase: find the K nearest neighbors, and return the majority vote of their labels
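The testing phase described above can be sketched in a few lines of plain Python (an illustrative sketch only, not course code; the data, function names, and choice of Euclidean distance are assumptions):

```python
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Return the majority-vote label of the k training samples closest to x."""
    # Squared Euclidean distance from x to every memorized training sample.
    dists = [sum((a - b) ** 2 for a, b in zip(xi, x)) for xi in train_X]
    # Indices of the k nearest neighbors.
    nearest = sorted(range(len(train_X)), key=lambda i: dists[i])[:k]
    # Majority vote over their labels; k = 1 gives the nearest-neighbor rule.
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
y = ["no", "no", "yes", "yes"]
print(knn_predict(X, y, (0.9, 0.9), 3))  # -> yes
```

Note that there is no training step beyond storing the data, which is exactly why kNN is non-parametric.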
About K
• K = 1: just piecewise-constant labeling
• K = N: global majority vote (always predicts the most frequent class)
Problems of kNN
• Can be slow when training data is big
– Searching for the neighbors takes time
• Needs lots of memory to store the training data
• Needs to tune K and the distance function
• Not a probability distribution
• Choice of distance function
– Euclidean distance
– Mahalanobis distance: weights on components
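The two distance functions can be sketched with NumPy (illustrative; `S` stands for an assumed covariance matrix supplying the per-component weights):

```python
import numpy as np

def euclidean(x, y):
    # Unweighted: every component contributes equally.
    return np.sqrt(np.sum((x - y) ** 2))

def mahalanobis(x, y, S):
    # S^{-1} re-weights (and de-correlates) the components before
    # measuring distance; S = identity recovers the Euclidean distance.
    d = x - y
    return np.sqrt(d @ np.linalg.inv(S) @ d)

x, y = np.array([1.0, 0.0]), np.array([0.0, 0.0])
print(euclidean(x, y))                         # 1.0
print(mahalanobis(x, y, np.diag([4.0, 1.0])))  # 0.5: first axis down-weighted
```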
Probabilistic kNN
• We prefer a probabilistic output because sometimes we may get an "uncertain" result
– 1 sample says "yes", 199 samples say "no" → ?
– 99 samples say "yes", 101 samples say "no" → ?
• Probabilistic kNN: return the fraction of each class among the K neighbors,
– p(y = c | x) = (1/K) Σ_{i ∈ neighbors(x)} I(y_i = c)
Probabilistic kNN
• [Figure: 3-class synthetic training data]
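A minimal sketch of the probabilistic output: the predicted probability of each class is simply its fraction among the K retrieved neighbors (function and variable names are mine, not the lecture's):

```python
from collections import Counter

def knn_probs(neighbor_labels, classes):
    """Empirical class distribution over the K retrieved neighbors."""
    counts = Counter(neighbor_labels)
    k = len(neighbor_labels)
    return {c: counts[c] / k for c in classes}

# The two cases from the slide, with K = 200 neighbors:
confident = knn_probs(["yes"] * 1 + ["no"] * 199, ["yes", "no"])
uncertain = knn_probs(["yes"] * 99 + ["no"] * 101, ["yes", "no"])
print(confident)  # {'yes': 0.005, 'no': 0.995}
print(uncertain)  # {'yes': 0.495, 'no': 0.505}
```

A hard majority vote returns "no" in both cases; the probabilities make clear that only the first answer is confident.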
Smoothing
• Class 1: 3, class 2: 0, class 3: 1
• Original probability:
– P(y=1) = 3/4, P(y=2) = 0/4, P(y=3) = 1/4
• Add-1 smoothing:
– Class 1: 3+1, class 2: 0+1, class 3: 1+1
– P(y=1) = 4/7, P(y=2) = 1/7, P(y=3) = 2/7
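The worked example above can be reproduced directly (a sketch; the function name is mine):

```python
def add_one_smoothing(counts):
    """Add 1 to every class count, then renormalize to probabilities."""
    smoothed = {c: n + 1 for c, n in counts.items()}
    total = sum(smoothed.values())  # 4 + 1 + 2 = 7 for the slide's example
    return {c: n / total for c, n in smoothed.items()}

probs = add_one_smoothing({1: 3, 2: 0, 3: 1})
print(probs)  # {1: 4/7, 2: 1/7, 3: 2/7} -- no class has zero probability
```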
Softmax
• Class 1: 3, class 2: 0, class 3: 1
• Original probability:
– P(y=1) = 3/4, P(y=2) = 0/4, P(y=3) = 1/4
• Redistribute probability mass into different classes
– Define a softmax as softmax(a)_c = exp(a_c) / Σ_{c'} exp(a_{c'})
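A sketch of the standard softmax, here applied to the raw neighbor counts as the scores (that application is my assumption for illustration; the lecture's definition may include a temperature parameter):

```python
import math

def softmax(scores):
    """softmax(a)_c = exp(a_c) / sum over c' of exp(a_c')."""
    m = max(scores)  # subtracting the max avoids overflow; result unchanged
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

p = softmax([3, 0, 1])  # counts for classes 1, 2, 3
print(p)  # roughly [0.84, 0.04, 0.11]: every class receives some mass
```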
Today's Outline
• Basic concepts in machine learning
• K-nearest neighbors
• Linear regression
– Supervised learning
– A parametric model
• Ridge regression
A Parametric Model: Linear Regression
• Assumption: the response is a linear function of the inputs:
– y(x) = wᵀx + ε
– wᵀx: inner product between input sample x and weight vector w
– ε: residual error, the difference between the prediction and the true label
• Assume the residual error has a normal distribution:
– ε ~ N(0, σ²), equivalently p(y | x, θ) = N(y | wᵀx, σ²)
• We can further assume a basis function expansion:
– p(y | x, θ) = N(y | wᵀφ(x), σ²), where φ(x) is a nonlinear mapping of the inputs
• [Figure: fitted surface; vertical axis: temperature, horizontal axes: location within a room]
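As one concrete instance of basis function expansion (a hypothetical polynomial basis; the lecture's φ may differ), the model stays linear in w even though it is nonlinear in x:

```python
import numpy as np

def poly_features(x, degree):
    """phi(x) = [1, x, x^2, ..., x^degree] for a vector of scalar inputs."""
    return np.vstack([x ** d for d in range(degree + 1)]).T

x = np.array([0.0, 1.0, 2.0])
print(poly_features(x, 2))
# [[1. 0. 0.]
#  [1. 1. 1.]
#  [1. 2. 4.]]
```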
Learning with Maximum Likelihood Estimation (MLE)
• Maximum Likelihood Estimation (MLE):
– θ̂ = argmax_θ log p(D | θ)
• Log-likelihood:
– ℓ(θ) = Σ_{i=1}^N log p(y_i | x_i, θ)
• Maximizing the log-likelihood is equivalent to minimizing the negative log-likelihood (NLL)
• With our normal distribution assumption:
– NLL(w) = (1/(2σ²)) Σ_{i=1}^N (y_i − wᵀx_i)² + const
– Σ_i (y_i − wᵀx_i)² is the residual sum of squares (RSS) → we want to minimize it!
Derivation of MLE for Linear Regression
• Rewrite our objective function as:
– NLL(w) = (1/2) wᵀ(XᵀX)w − wᵀ(Xᵀy), dropping terms that do not depend on w
• Get the derivative (or gradient):
– g(w) = XᵀXw − Xᵀy
• Set the derivative to 0:
– XᵀXw = Xᵀy
• Ordinary least squares solution:
– ŵ_OLS = (XᵀX)⁻¹Xᵀy
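The closed-form solution can be checked numerically (a sketch on synthetic data; the true weights, noise level, and sizes are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: y = 1 + 2*x1 - 3*x2 + small Gaussian noise.
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])  # bias column + 2 features
true_w = np.array([1.0, 2.0, -3.0])
y = X @ true_w + 0.01 * rng.normal(size=100)

# Normal equations: solve (X^T X) w = X^T y rather than forming the inverse.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(w_hat)  # close to [1, 2, -3]

# np.linalg.lstsq solves the same least-squares problem, more stably.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(w_hat, w_lstsq))
```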
Overfitting
• Feature weights w: [figure omitted]
A Prior on the Weights
• Zero-mean Gaussian prior:
– p(w) = Π_j N(w_j | 0, τ²)
• New objective function (maximize the log-likelihood plus the log-prior):
– Σ_{i=1}^N log N(y_i | w₀ + wᵀx_i, σ²) + Σ_{j=1}^D log N(w_j | 0, τ²)
Today's Outline
• Basic concepts in machine learning
• K-nearest neighbors
• Linear regression
• Ridge regression
Ridge Regression
• We want to minimize:
– J(w) = (1/N) Σ_{i=1}^N (y_i − (w₀ + wᵀx_i))² + λ‖w‖₂²
• New estimation for the weights:
– ŵ_ridge = (λI_D + XᵀX)⁻¹Xᵀy
• The ‖w‖₂² penalty is called L2 regularization
• The proof is left for Assignment 1!
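The ridge estimate can be verified numerically as well (a sketch; unlike the objective above, this version has no separate bias term w₀, so every weight is penalized):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """w = (lam * I + X^T X)^{-1} X^T y, computed via a linear solve."""
    d = X.shape[1]
    return np.linalg.solve(lam * np.eye(d) + X.T @ X, X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=50)

w_ols = ridge_fit(X, y, 0.0)    # lam = 0 recovers the OLS solution
w_reg = ridge_fit(X, y, 10.0)   # larger lam shrinks the weights toward zero
print(np.linalg.norm(w_ols), np.linalg.norm(w_reg))
```

As lam grows, the weight norm shrinks toward zero, which is the shrinkage effect L2 regularization is designed to produce.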
What We Learned
• Basic concepts in machine learning
• K-nearest neighbors
• Linear regression
• Ridge regression
Homework
• Reading: Murphy ch. 1, ch. 2, and ch. 7 (only the sections covered in the lecture)
• Sign up at Piazza
– http://piazza.com/northeastern/spring2017/cs614002
• Start thinking about the course project and find a team!
– Project proposal due Jan 26
• Next Time: Logistic Regression, Decision Trees, Generative Models (Naive Bayes)
– Reading: Murphy Ch. 3, 8.1-8.3, 8.6, 16.2