Upload
hoangtram
View
251
Download
1
Embed Size (px)
Citation preview
Sem22017Lecturer:TrevorCohn
Lecture1.Introduction.ProbabilityTheoryCOMP90051MachineLearning
AdaptedfromslidesprovidedbyBenRubinstein
COMP90051MachineLearning(S22017) L1
WhyLearnLearning?
2
COMP90051MachineLearning(S22017) L1
Motivation
• “Wearedrowningininformation,butwearestarvedforknowledge”
- JohnNaisbitt,Megatrends
• Data=rawinformation
• Knowledge=patternsormodelsbehindthedata
3
COMP90051MachineLearning(S22017) L1
Solution:MachineLearning• Hypothesis:pre-existingdatarepositoriescontainalotofpotentiallyvaluableknowledge
• Missionoflearning:findit
• Definitionoflearning:
(semi-)automaticextractionofvalid,novel,usefulandcomprehensibleknowledge– intheformofrules,regularities,patterns,constraintsormodels– fromarbitrarysetsofdata
4
COMP90051MachineLearning(S22017) L1
ApplicationsofMLareDeepandPrevalent• Onlineadselectionandplacement
• Riskmanagementinfinance,insurance,security
• High-frequencytrading
• Medicaldiagnosis
• Miningandnaturalresources
• Malwareanalysis
• Drugdiscovery
• Searchengines…
5
COMP90051MachineLearning(S22017) L1
DrawsonManyDisciplines
• ArtificialIntelligence• Statistics• Continuousoptimisation• Databases• InformationRetrieval• Communications/informationtheory• SignalProcessing• ComputerScienceTheory• Philosophy• Psychologyandneurobiology
…6
COMP90051MachineLearning(S22017) L1
Job$
7
ManycompaniesacrossallindustrieshireMLexperts:
DataScientistAnalyticsExpertBusinessAnalystStatisticianSoftwareEngineerResearcher…
COMP90051MachineLearning(S22017) L1
AboutthisSubject
8
(refertosubjectoutlineongithub formoreinformation– linkedfromLMS)
COMP90051MachineLearning(S22017) L1
VitalStatistics
9
Lecturers:Weeks1;9-12
Weeks2-8
TrevorCohn(DMD8.,[email protected])A/Prof&FutureFellow,Computing&InformationSystemsStatisticalMachineLearning,NaturalLanguageProcessing
Andrey Kan ([email protected])ResearchFellow,WalterandEliza HallInstituteML,Computationalimmunology,Medicalimageanalysis
Tutors: YasmeenGeorge([email protected])Nitika Mathur ([email protected])YuanLi([email protected])
Contact: Weeklyyoushouldattend2xLectures,1xWorkshop
OfficeHours Thursdays1-2pm,7.03DMDBuilding
Website: https://trevorcohn.github.io/comp90051-2017/
COMP90051MachineLearning(S22017) L1
AboutMe(Trevor)
10
• PhD2007– UMelbourne
• 10yearsabroadUK* EdinburghUniversity,inLanguagegroup* SheffieldUniversity,inLanguage&Machinelearninggroups
• Expertise:Basicresearchinmachinelearning;Bayesianinference;graphicalmodels;deeplearning;applicationstostructuredproblemsintext(translation,sequencetagging,structuredparsing,modellingtimeseries)
COMP90051MachineLearning(S22017) L1
SubjectContent
• Thesubjectwillcovertopicsfrom
Foundationsofstatisticallearning,linearmodels,non-linearbases,kernelapproaches,neuralnetworks,Bayesianlearning,probabilisticgraphicalmodels(BayesNets,MarkovRandomFields),clusteranalysis,dimensionalityreduction,regularisationandmodelselection
• Wewillgainhands-onexperiencewithallofthisviaarangeoftoolkits,workshoppracs,andprojects
11
COMP90051MachineLearning(S22017) L1
SubjectObjectives
• Developanappreciationfortheroleofstatisticalmachinelearning,bothintermsoffoundationsandapplications
• GainanunderstandingofarepresentativeselectionofMLtechniques
• Beabletodesign,implementandevaluateMLsystems
• BecomeadiscerningMLconsumer
12
COMP90051MachineLearning(S22017) L1
Textbooks
• Primarilyreferencesto* Bishop(2007)PatternRecognitionandMachineLearning
• Othergoodgeneralreferences:* Murphy(2012)MachineLearning:AProbabilisticPerspective[readfreeebookusing‘ebrary’athttp://bit.ly/29SHAQS]
* Hastie,Tibshirani,Friedman(2001)TheElementsofStatisticalLearning:DataMining,InferenceandPrediction [freeathttp://www-stat.stanford.edu/~tibs/ElemStatLearn]
13
COMP90051MachineLearning(S22017) L1
Textbooks
• ReferencesforPGMcomponent* Koller,Friedman(2009)ProbabilisticGraphicalModels:PrinciplesandTechniques
14
COMP90051MachineLearning(S22017) L1
AssumedKnowledge(Week2WorkshoprevisesCOMP90049)
• Programming* Required:proficiencyatprogramming,ideallyinpython* Ideal:exposuretoscientificlibrariesnumpy,scipy,matplotlib etc.(similarinfunctionalitytomatlab &aspectsofR.)
• Maths* Familiaritywithformalnotation* Familiaritywithprobability(Bayesrule,marginalisation)* Exposuretooptimisation(gradientdescent)
• ML:decisiontrees,naïveBayes,kNN,kMeans
15
𝐏𝐫 𝒙 =% 𝐏𝐫(𝒙, 𝒚)�
𝒚
COMP90051MachineLearning(S22017) L1
Assessment
• Assessmentcomponents* Twoprojects– onereleasedearly(w3-4),onelate(w7-8);willhave~3weekstocomplete• Firstprojectfairlystructured(20%)• Secondprojectincludescompetitioncomponent(30%)
* FinalExam
• Breakdown* 50%Exam* 50%Projectwork
• 50%Hurdleappliestobothexam andongoingassessment
16
COMP90051MachineLearning(S22017) L1
MachineLearningBasics
17
COMP90051MachineLearning(S22017) L1
Terminology
• Inputtoamachinelearningsystemcanconsistof
* Instance:measurementsaboutindividualentities/objectsaloanapplication
* Attribute(akaFeature,explanatoryvar.):componentoftheinstancestheapplicant’ssalary,numberofdependents,etc.
* Label(akaResponse,dependentvar.):anoutcomethatiscategorical,numeric,etc.forfeitvs.paidoff
* Examples:instancecoupledwithlabel<(100k,3),“forfeit”>
* Models:discoveredrelationshipbetweenattributesand/orlabel18
COMP90051MachineLearning(S22017) L1
Supervisedvs UnsupervisedLearning
19
Data Modelusedfor
Supervisedlearning Labelled Predictlabelsonnew
instances
Unsupervisedlearning Unlabelled
Cluster relatedinstances;Projecttofewerdimensions;Understandattributerelationships
COMP90051MachineLearning(S22017) L1
ArchitectureofaSupervisedLearner
20
Testdata
Traindata Learner
Model
Evaluation
Examples
Instances
Labels
Labels
COMP90051MachineLearning(S22017) L1
Evaluation(SupervisedLearners)
• Howyoumeasurequalitydependsonyourproblem!
• Typicalprocess* Pickanevaluationmetriccomparinglabelvsprediction* Procureanindependent,labelledtestset* “Average”theevaluationmetricoverthetestset
• Exampleevaluationmetrics* Accuracy,Contingencytable,Precision-Recall,ROCcurves
• Whendatapoor,cross-validate
21
COMP90051MachineLearning(S22017) L1
Dataisnoisy(almostalways)
22
• Example:* givenmarkforKnowledgeTechnologies(KT)
* predictmarkforMachineLearning(ML)
KTmark
MLm
ark
*syntheticdata:)
Trainingdata*
COMP90051MachineLearning(S22017) L1
Typesofmodels
23
𝑦- = 𝑓 𝑥
KTmarkwas95,MLmarkispredictedto
be95
𝑃 𝑦 𝑥
KTmarkwas95,MLmarkislikelytobein
(92,97)
𝑃(𝑥, 𝑦)
probabilityofhaving(𝐾𝑇 = 𝑥,𝑀𝐿 = 𝑦)
𝑥
COMP90051MachineLearning(S22017) L1
ProbabilityTheory
24
Briefrefresher
COMP90051MachineLearning(S22017) L1
BasicsofProbabilityTheory
• Aprobabilityspace:* SetW ofpossibleoutcomes
* SetF ofevents(subsetsofoutcomes)
* ProbabilitymeasureP:Fà R
• Example:adieroll* {1,2,3,4,5,6}
* {j,{1},…,{6},{1,2},…,{5,6},…,{1,2,3,4,5,6}}
* P(j)=0,P({1})=1/6,P({1,2})=1/3,…
25
COMP90051MachineLearning(S22017) L1
AxiomsofProbability
1. 𝑃(𝑓) ≥ 0 foreveryeventf inF
2. 𝑃 ⋃ 𝑓�8 = ∑ 𝑃(𝑓)�8 forallcollections*ofpairwise
disjointevents
3. 𝑃 Ω = 1
26
*Wewon’tdelvefurtherintoadvancedprobabilitytheory,whichstartswithmeasuretheory.Buttobeprecise,additivityisovercollectionsofcountably-manyevents.
COMP90051MachineLearning(S22017) L1
RandomVariables(r.v.’s)
• ArandomvariableX isanumericfunctionofoutcome𝑋(𝜔) ∈ 𝑹
• 𝑃 𝑋 ∈ 𝐴 denotestheprobabilityoftheoutcomebeingsuchthatX fallsintherangeA
• Example:X winningson$5betonevendieroll* Xmaps1,3,5to-5Xmaps2,4,6to5
* P(X=5)=P(X=-5)=½
27
COMP90051MachineLearning(S22017) L1
Discretevs.ContinuousDistributions
• Discretedistributions* Governr.v.takingdiscretevalues
* Describedbyprobabilitymassfunctionp(x)whichisP(X=x)
* 𝑃 𝑋 ≤ 𝑥 = ∑ 𝑝(𝑎)DEFGH
* Examples:Bernoulli,Binomial,Multinomial,Poisson
• Continuousdistributions* Governreal-valuedr.v.
* CannottalkaboutPMFbutratherprobabilitydensityfunctionp(x)
* 𝑃 𝑋 ≤ 𝑥 = ∫ 𝑝 𝑎 𝑑𝑎DGH
* Examples:Uniform,Normal,Laplace,Gamma,Beta,Dirichlet
28
COMP90051MachineLearning(S22017) L1
-4 -2 0 2 4
0.0
0.1
0.2
0.3
0.4
x
p(x)
Expectation
• ExpectationE[X]isther.v.X’s“average”value* Discrete:𝐸 𝑋 = ∑ 𝑥𝑃(𝑋 = 𝑥)�
D
* Continuous:𝐸 𝑋 = ∫ 𝑥𝑝 𝑥 𝑑𝑥D
• Properties* Linear:𝐸 𝑎𝑋 + 𝑏 = 𝑎𝐸 𝑋 + 𝑏
𝐸 𝑋 + 𝑌 = 𝐸 𝑋 + 𝐸 𝑌* Monotone:𝑋 ≥ 𝑌 ⇒ 𝐸 𝑋 ≥ 𝐸 𝑌
• Variance:𝑉𝑎𝑟 𝑋 = 𝐸[ 𝑋 − 𝐸 𝑋 T]
29
COMP90051MachineLearning(S22017) L1
IndependenceandConditioning
• X,Y areindependent if* 𝑃 𝑋 ∈ 𝐴, 𝑌 ∈ 𝐵 =𝑃 𝑋 ∈ 𝐴 𝑃(𝑌 ∈ 𝐵)
* Similarlyfordensities:𝑝W,X 𝑥, 𝑦 = 𝑝W(𝑥)𝑝X(𝑦)
* Intuitively:knowingvalueofY revealsnothingaboutX
* Algebraically:thejointonX,Yfactorises!
• Conditionalprobability
* 𝑃 𝐴 𝐵 = Y(Z∩\)Y(\)
* Similarlyfordensities𝑝 𝑦 𝑥 = ](D,^)
](D)
* Intuitively:probabilityeventA willoccurgivenweknoweventB hasoccurred
* X,Yindependentequiv to𝑃 𝑌 = 𝑦 𝑋 = 𝑥 = 𝑃(𝑌 = 𝑦)
30
COMP90051MachineLearning(S22017) L1
InvertingConditioning:Bayes’Theorem
• IntermsofeventsA,B* 𝑃 𝐴 ∩ 𝐵 = 𝑃 𝐴 𝐵 𝑃 𝐵 = 𝑃 𝐵 𝐴 𝑃 𝐴
* 𝑃 𝐴 𝐵 = Y 𝐵 𝐴 Y(Z)Y(\)
• Simplerulethatletsusswapconditioningorder
• Bayesianstatisticalinferencemakesheavyuse* Marginals: probabilitiesofindividualvariables* Marginalisation:summingawayallbutr.v.’s ofinterest
31
Bayes
COMP90051MachineLearning(S22017) L1
Summary
• Whystudymachinelearning?
• Machinelearningbasics
• Reviewofprobabilitytheory
32