Seminar: Statistical NLP
Girona, June 2003

Machine Learning for Natural Language Processing

Lluís Màrquez
TALP Research Center
Llenguatges i Sistemes Informàtics
Universitat Politècnica de Catalunya
Outline

• Machine Learning for NLP
• The Classification Problem
• Three ML Algorithms
• Applications to NLP
ML4NLP: Machine Learning

• There are many general-purpose definitions of Machine Learning (or artificial learning):

  Making a computer automatically acquire some kind of knowledge from a concrete data domain

• Learners are computers: we study learning algorithms
• Resources are scarce: time, memory, data, etc.
• It has (almost) nothing to do with cognitive science, neuroscience, the theory of scientific discovery and research, etc.
• Biological plausibility is welcome, but it is not the main goal
ML4NLP: Machine Learning

• Learning... but what for?
  – To perform some particular task
  – To react to environmental inputs
  – Concept learning from data:
    • modelling concepts underlying data
    • predicting unseen observations
    • compacting the knowledge representation
    • knowledge discovery for expert systems

• We will concentrate on:
  – Supervised inductive learning for classification = discriminative learning
ML4NLP: Machine Learning

A more precise definition:

  Obtaining a description of the concept in some representation language that explains observations and helps predict new instances of the same distribution

• What to read? Machine Learning (Mitchell, 1997)
ML4NLP: Empirical NLP

90's: Application of Machine Learning (ML) techniques to NLP problems

• Lexical and structural ambiguity problems = classification problems
  – Word selection (SR, MT)
  – Part-of-speech tagging
  – Semantic ambiguity (polysemy)
  – Prepositional phrase attachment
  – Reference ambiguity (anaphora)
  – etc.

• What to read? Foundations of Statistical Natural Language Processing (Manning & Schütze, 1999)
ML4NLP: NLP "classification" problems

• Ambiguity is a crucial problem for natural language understanding/processing. Ambiguity resolution = classification

  He was shot in the hand as he chased the robbers in the back street
  (The Wall Street Journal Corpus)
ML4NLP: NLP "classification" problems

• Morpho-syntactic ambiguity

  He was shot in the hand as he chased the robbers in the back street
  (The Wall Street Journal Corpus)

  [Figure: competing POS tags (NN/VB, JJ/VB) shown over several of the words]
ML4NLP: NLP "classification" problems

• Morpho-syntactic ambiguity: Part of Speech Tagging

  He was shot in the hand as he chased the robbers in the back street
  (The Wall Street Journal Corpus)
ML4NLP: NLP "classification" problems

• Semantic (lexical) ambiguity

  He was shot in the hand as he chased the robbers in the back street
  (The Wall Street Journal Corpus)

  [Figure: "hand" annotated with the competing senses body-part vs. clock-part]
ML4NLP: NLP "classification" problems

• Semantic (lexical) ambiguity: Word Sense Disambiguation

  He was shot in the hand as he chased the robbers in the back street
  (The Wall Street Journal Corpus)
ML4NLP: NLP "classification" problems

• Structural (syntactic) ambiguity

  He was shot in the hand as he chased the robbers in the back street
  (The Wall Street Journal Corpus)
ML4NLP: NLP "classification" problems

• Structural (syntactic) ambiguity: PP-attachment disambiguation

  He was shot in the hand as he (chased (the robbers)NP (in the back street)PP)
  (The Wall Street Journal Corpus)
Outline

• Machine Learning for NLP
• The Classification Problem
• Three ML Algorithms in detail
• Applications to NLP
Classification: Feature Vector Classification (AI perspective)

• An instance is a vector x = <x1, ..., xn> whose components, called features (or attributes), are discrete or real-valued.
• Let X be the space of all possible instances.
• Let Y = {y1, ..., ym} be the set of categories (or classes).
• The goal is to learn an unknown target function f : X → Y.
• A training example is an instance x belonging to X, labelled with the correct value of f(x), i.e., a pair <x, f(x)>.
• Let D be the set of all training examples.
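To make the notation concrete, here is a minimal sketch in Python (the three discrete features anticipate the SIZE/COLOR/SHAPE toy example used a few slides below; the hypothesis h is just an illustration):

from typing import List, Tuple

Instance = Tuple[str, str, str]        # x = <x1, x2, x3>, discrete features
Y = {"positive", "negative"}           # the set of categories

# D: training examples, i.e. pairs <x, f(x)>
D: List[Tuple[Instance, str]] = [
    (("small", "red", "circle"), "positive"),
    (("big", "blue", "circle"), "negative"),
]

# The learner outputs a hypothesis h: X -> Y that approximates f on D.
def h(x: Instance) -> str:
    return "positive" if x[1] == "red" and x[2] == "circle" else "negative"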
Classification: Feature Vector Classification

• The hypotheses space, H, is the set of functions h: X → Y that the learner can consider as possible definitions.

  The goal is to find a function h belonging to H such that for every pair <x, f(x)> belonging to D, h(x) = f(x).
Classification: An Example

  Example  SIZE   COLOR  SHAPE     CLASS
  1        small  red    circle    positive
  2        big    red    circle    positive
  3        small  red    triangle  negative
  4        big    blue   circle    negative

Rules:
  (COLOR=red) ∧ (SHAPE=circle) → positive
  otherwise → negative

Decision Tree:
  COLOR
  ├─ red → SHAPE
  │        ├─ circle → positive
  │        └─ triangle → negative
  └─ blue → negative
Classification: An Example

  Example  SIZE   COLOR  SHAPE     CLASS
  1        small  red    circle    positive
  2        big    red    circle    positive
  3        small  red    triangle  negative
  4        big    blue   circle    negative

Rules:
  (SIZE=small) ∧ (SHAPE=circle) → positive
  (SIZE=big) ∧ (COLOR=red) → positive
  otherwise → negative

Decision Tree:
  SIZE
  ├─ small → SHAPE
  │          ├─ circle → pos
  │          └─ triang → neg
  └─ big → COLOR
           ├─ red → pos
           └─ blue → neg
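As a side note, a tree of this kind can be induced from the four examples with scikit-learn, assuming the library is available; its DecisionTreeClassifier is a CART-style learner, not exactly the TDIDT variants discussed below:

from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

X = [["small", "red", "circle"],
     ["big", "red", "circle"],
     ["small", "red", "triangle"],
     ["big", "blue", "circle"]]
y = ["positive", "positive", "negative", "negative"]

enc = OneHotEncoder()                          # categorical -> one-hot indicators
Xe = enc.fit_transform(X).toarray()

clf = DecisionTreeClassifier(criterion="entropy").fit(Xe, y)
names = list(enc.get_feature_names_out(["SIZE", "COLOR", "SHAPE"]))
print(export_text(clf, feature_names=names))   # prints the induced tree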
Classification: Some important concepts

• Inductive Bias
  "Any means that a classification learning system uses to choose between two functions that are both consistent with the training data is called inductive bias" (Mooney & Cardie, 99)
  – Language / search bias

  e.g., the previous decision tree:
  COLOR
  ├─ red → SHAPE
  │        ├─ circle → positive
  │        └─ triangle → negative
  └─ blue → negative
Classification: Some important concepts

• Inductive bias
• Training error and generalization error
• Generalization ability and overfitting
• Batch learning vs. on-line learning
• Symbolic vs. statistical learning
• Propositional vs. first-order learning
Classification: Propositional vs. Relational Learning

• Propositional learning
  color(red) ∧ shape(circle) → classA

• Relational learning = ILP (induction of logic programs)
  course(X) ∧ person(Y) ∧ link_to(Y,X) → instructor_of(X,Y)
  research_project(X) ∧ person(Z) ∧ link_to(L1,X,Y) ∧ link_to(L2,Y,Z) ∧ neighbour_word_people(L1) → member_proj(X,Z)
Classification: The Classification Setting (CoLT/SLT perspective)
Class, Point, Example, Data Set, ...

• Input Space: X ⊆ R^n
• (binary) Output Space: Y = {+1, −1}
• A point, pattern or instance: x ∈ X, x = (x1, x2, ..., xn)
• Example: (x, y) with x ∈ X, y ∈ Y
• Training Set: a set of m examples generated i.i.d. according to an unknown distribution P(x,y):
  S = {(x1, y1), ..., (xm, ym)} ∈ (X × Y)^m
Classification: The Classification Setting (Learning, Error, ...)

• The hypotheses space, H, is the set of functions h: X → Y that the learner can consider as possible definitions. In SVMs they are of the form:

  h(x) = Σ_{i=1..n} w_i φ_i(x) + b

• The goal is to find a function h belonging to H such that the expected misclassification error on new examples, also drawn from P(x,y), is minimal (Risk Minimization, RM)
Classification: The Classification Setting (Learning, Error, ...)

• Expected error (risk):
  R(h) = ∫ loss(h(x), y) dP(x, y)

• Problem: P itself is unknown; only the training examples are known → an induction principle is needed

• Empirical Risk Minimization (ERM): find the function h belonging to H for which the training error (empirical risk) is minimal:
  R_emp(h) = (1/m) Σ_{i=1..m} loss(h(x_i), y_i)
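A minimal sketch of the empirical risk with 0-1 loss, in Python (the hypothesis h and sample S below are toy assumptions):

def zero_one_loss(y_pred, y_true):
    return 0.0 if y_pred == y_true else 1.0

def empirical_risk(h, S):
    # R_emp(h) = (1/m) * sum_i loss(h(x_i), y_i) over the m examples in S.
    return sum(zero_one_loss(h(x), y) for x, y in S) / len(S)

h = lambda x: +1                            # a (bad) constant hypothesis
S = [((0.5, 1.2), +1), ((0.1, -0.3), -1)]
print(empirical_risk(h, S))                 # 0.5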
Classification: The Classification Setting (Error, Over/underfitting, ...)

• Low training error ⇒ low true error?
• The overfitting dilemma (Müller et al., 2001):

  [Figure: underfitting vs. overfitting]

• Trade-off between training error and complexity
• Different learning biases can be used
Outline

• Machine Learning for NLP
• The Classification Problem
• Three ML Algorithms
  − Decision Trees
  − AdaBoost
  − Support Vector Machines
• Applications to NLP
Algorithms: Learning Paradigms

• Statistical learning:
  – HMM, Bayesian Networks, ME, CRF, etc.
• Traditional methods from Artificial Intelligence (ML, AI):
  – Decision trees/lists, exemplar-based learning, rule induction, neural networks, etc.
• Methods from Computational Learning Theory (CoLT/SLT):
  – Winnow, AdaBoost, SVMs, etc.
Algorithms: Learning Paradigms

• Classifier combination:
  – Bagging, Boosting, Randomization, ECOC, Stacking, etc.
• Semi-supervised learning: learning from labelled and unlabelled examples
  – Bootstrapping, EM, Transductive learning (SVMs, AdaBoost), Co-Training, etc.
• etc.
Algorithms: Decision Trees

• Decision trees are a way to represent the rules underlying training data, with hierarchical structures that recursively partition the data.
• They have been used by many research communities (Pattern Recognition, Statistics, ML, etc.) for data exploration with some of the following purposes: description, classification, and generalization.
• From a machine-learning perspective: decision trees are n-ary branching trees that represent classification rules for classifying the objects of a certain domain into a set of mutually exclusive classes.
Algorithms: Decision Trees

• Acquisition: Top-Down Induction of Decision Trees (TDIDT)
• Systems: CART (Breiman et al. 84), ID3, C4.5, C5.0 (Quinlan 86, 93, 98), ASSISTANT, ASSISTANT-R (Cestnik et al. 87; Kononenko et al. 95), etc.
Algorithms: An Example

  [Figure: a generic n-ary decision tree with attribute nodes A1, A2, A3, A5, branch values v1-v7, and leaf classes C1, C2, C3]

  A concrete instance, the tree learned before:
  SIZE
  ├─ small → SHAPE
  │          ├─ circle → pos
  │          └─ triang → neg
  └─ big → COLOR
           ├─ red → pos
           └─ blue → neg
Algorithms: Learning Decision Trees

  [Diagram — Training: a Training Set (examples + classes) is fed to TDIDT, which outputs a decision tree (DT). Test: the DT maps each new example to a class.]
Algorithms: General Induction Algorithm

function TDIDT (X: set-of-examples; A: set-of-features)
  var tree1, tree2: decision-tree;
      X': set-of-examples; A': set-of-features
  end-var
  if (stopping_criterion (X)) then
    tree1 := create_leaf_tree (X)
  else
    amax := feature_selection (X, A);
    tree1 := create_tree (X, amax);
    for-all val in values (amax) do
      X' := select_examples (X, amax, val);
      A' := A - {amax};
      tree2 := TDIDT (X', A');
      tree1 := add_branch (tree1, tree2, val)
    end-for
  end-if
  return (tree1)
end-function
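Below is a runnable Python rendering of the pseudocode above — a sketch assuming information gain as the feature_selection criterion and the majority class at the leaves; the tree is represented as nested (feature, {value: subtree}) pairs:

from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def information_gain(X, y, a):
    by_val = {}
    for xi, yi in zip(X, y):
        by_val.setdefault(xi[a], []).append(yi)
    rem = sum(len(ys) / len(y) * entropy(ys) for ys in by_val.values())
    return entropy(y) - rem

def tdidt(X, y, A):
    if len(set(y)) == 1 or not A:                # stopping_criterion(X)
        return Counter(y).most_common(1)[0][0]   # create_leaf_tree(X)
    amax = max(A, key=lambda a: information_gain(X, y, a))  # feature_selection
    tree = {}
    for val in {xi[amax] for xi in X}:           # values(amax)
        sel = [(xi, yi) for xi, yi in zip(X, y) if xi[amax] == val]
        Xs = [xi for xi, _ in sel]
        ys = [yi for _, yi in sel]
        tree[val] = tdidt(Xs, ys, A - {amax})    # TDIDT(X', A') + add_branch
    return (amax, tree)                          # create_tree(X, amax)

# Toy run on the SIZE/COLOR/SHAPE data (features indexed 0, 1, 2):
X = [("small", "red", "circle"), ("big", "red", "circle"),
     ("small", "red", "triangle"), ("big", "blue", "circle")]
y = ["positive", "positive", "negative", "negative"]
print(tdidt(X, y, {0, 1, 2}))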
Algorithms: Feature Selection Criteria

• Functions derived from Information Theory:
  – Information Gain, Gain Ratio (Quinlan 86)
• Functions derived from distance measures:
  – Gini Diversity Index (Breiman et al. 84)
  – RLM (López de Mántaras 91)
• Statistically based:
  – Chi-square test (Sestito & Dillon 94)
  – Symmetrical Tau (Zhou & Dillon 91)
• RELIEFF-IG: variant of RELIEFF (Kononenko 94)
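For concreteness, a tiny sketch of the Gini diversity index named above (the information-gain criterion already appears in the TDIDT code earlier):

from collections import Counter

def gini(labels):
    # Gini diversity index (Breiman et al. 84): 1 - sum_k p_k^2
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(["pos", "pos", "neg", "neg"]))   # 0.5  (maximally impure, two classes)
print(gini(["pos", "pos", "pos", "pos"]))   # 0.0  (pure node)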
Algorithms: Extensions of DTs (Murthy 95)

• Pruning (pre/post)
• Minimizing the effect of the greedy approach: lookahead
• Non-linear splits
• Combination of multiple models
• Incremental learning (on-line)
• etc.
Algorithms: Decision Trees and NLP

• Speech processing (Bahl et al. 89; Bakiri & Dietterich 99)
• POS tagging (Cardie 93; Schmid 94b; Magerman 95; Màrquez & Rodríguez 95, 97; Màrquez et al. 00)
• Word sense disambiguation (Brown et al. 91; Cardie 93; Mooney 96)
• Parsing (Magerman 95, 96; Haruno et al. 98, 99)
• Text categorization (Lewis & Ringuette 94; Weiss et al. 99)
• Text summarization (Mani & Bloedorn 98)
• Dialogue act tagging (Samuel et al. 98)
Algorithms: Decision Trees and NLP

• Noun phrase coreference (Aone & Benett 95; McCarthy & Lehnert 95)
• Discourse analysis in information extraction (Soderland & Lehnert 94)
• Cue phrase identification in text and speech (Litman 94; Siegel & McKeown 94)
• Verb classification in Machine Translation (Tanaka 96; Siegel 97)
Algorithms: Decision Trees, pros & cons

• Advantages
  – Acquire symbolic knowledge in an understandable way
  – Very well studied ML algorithms and variants
  – Can be easily translated into rules
  – Availability of software: C4.5, C5.0, etc.
  – Can be easily integrated into an ensemble
Algorithms: Decision Trees, pros & cons

• Drawbacks
  – Computationally expensive when scaling to large natural language domains: training examples, features, etc.
  – Data sparseness and data fragmentation: the problem of small disjuncts => probability estimation
  – DTs are a model with high variance (unstable)
  – Tendency to overfit the training data: pruning is necessary
  – Require quite a big effort in tuning the model
Algorithms: Boosting algorithms

• Idea: "to combine many simple and moderately accurate hypotheses (weak classifiers) into a single, highly accurate classifier"
• AdaBoost (Freund & Schapire 95) has been theoretically and empirically studied extensively
• Many other variants and extensions (1997-2003):
  http://www.lsi.upc.es/~lluism/seminari/ml&nlp.html
Algorithms: AdaBoost, general scheme

  [Diagram — Training: on each round t = 1..T, a weak learner is trained on the training set TSt under the current probability distribution Dt, producing a weak hypothesis ht; the distribution is then updated. Test: the weak hypotheses are combined into a linear combination F(h1, h2, ..., hT).]
Algorithms: AdaBoost algorithm (Freund & Schapire 97)

  [The algorithm pseudocode was displayed as a figure on this slide.]
Algorithms: AdaBoost example

  Weak hypotheses = vertical/horizontal hyperplanes
Algorithms: AdaBoost, rounds 1-3

  [Figures: the reweighted distribution and the chosen weak hypothesis after each of the first three rounds.]
Algorithms: Combined Hypothesis

  [Figure: the final combined hypothesis.]
  www.research.att.com/~yoav/adaboost
Algorithms: AdaBoost and NLP

• POS tagging (Abney et al. 99; Màrquez 99)
• Text and speech categorization (Schapire & Singer 98; Schapire et al. 98; Weiss et al. 99)
• PP-attachment disambiguation (Abney et al. 99)
• Parsing (Haruno et al. 99)
• Word sense disambiguation (Escudero et al. 00, 01)
• Shallow parsing (Carreras & Màrquez 01a, 02)
• Email spam filtering (Carreras & Màrquez 01b)
• Term extraction (Vivaldi et al. 01)
Algorithms: AdaBoost, pros & cons

+ Easy to implement and few parameters to set
+ Time and space grow linearly with the number of examples; ability to manage very large learning problems
+ Does not explicitly constrain the complexity of the learner
+ Naturally combines feature selection with learning
+ Has been successfully applied to many practical problems
Algorithms: AdaBoost, pros & cons

± Seems to be rather robust to overfitting (number of rounds), but sensitive to noise
± Performance is very good when there are relatively few relevant terms (features)
– Can perform poorly when there is insufficient training data relative to the complexity of the base classifiers, or when the training errors of the base classifiers become too large too quickly
Algorithms: SVM, A General Definition

• "Support Vector Machines (SVM) are learning systems that use a hypothesis space of linear functions in a high dimensional feature space, trained with a learning algorithm from optimisation theory that implements a learning bias derived from statistical learning theory." (Cristianini & Shawe-Taylor, 2000)

  [A second slide repeats the definition with its key concepts highlighted.]
Algorithms: Linear Classifiers

• Hyperplanes in R^N.
• Defined by a weight vector (w) and a threshold (b).
• They induce a classification rule:

  h(x) = +1 if Σ_{i=1..N} w_i x_i + b ≥ 0, and −1 otherwise;
  i.e., h(x) = sign(w · x + b)

  [Figure: a hyperplane with weight vector w and threshold b separating + points from − points.]
Algorithms: Optimal Hyperplane: Geometric Intuition
Algorithms: Optimal Hyperplane: Geometric Intuition

  [Figure: the maximal margin hyperplane; the points lying on the margin are the support vectors.]
Algorithms: Linearly separable data

  Maximizing the margin is equivalent to minimizing ||w||² / 2, subject to the constraints:
    y_i (w · x_i + b) ≥ 1   for all i = 1, ..., l
  (the geometric margin is 2 / ||w||)

  ⇒ Quadratic Programming
Algorithms: Non-separable case (soft margin)

  Minimize  ||w||² / 2 + C Σ_{i=1..l} ξ_i
  subject to the constraints:
    y_i (w · x_i + b) ≥ 1 − ξ_i   for all i = 1, ..., l
    ξ_i ≥ 0                       for all i = 1, ..., l

  where ξ_1, ..., ξ_l are positive slack variables introduced to account for misclassification costs.
Algorithms: Non-linear SVMs

• Implicit mapping into feature space via kernel functions

  Non-linear mapping:  Φ : X → F

  Set of hypotheses:   f(x) = Σ_{i=1..n} w_i φ_i(x) + b

  Dual formulation:    f(x) = Σ_{i=1..l} α_i y_i ⟨φ(x_i), φ(x)⟩ + b

  Kernel function:     K(x, z) = ⟨φ(x), φ(z)⟩

  Evaluation:          f(x) = Σ_{i=1..l} α_i y_i K(x_i, x) + b
Algorithms: Non-linear SVMs

• Kernel functions
  – Must be efficiently computable
  – Characterization via Mercer's theorem
  – "One of the curious facts about using a kernel is that we do not need to know the underlying feature map in order to be able to learn in the feature space!" (Cristianini & Shawe-Taylor, 2000)
  – Examples: polynomials, Gaussian radial basis functions, two-layer sigmoidal neural networks, etc.
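A small numpy sketch of the two kernels named above and of the dual evaluation f(x) = Σ α_i y_i K(x_i, x) + b (the coefficients α, labels y, support vectors and bias b are assumed to come from a trained SVM):

import numpy as np

def poly_kernel(x, z, d=3, c=1.0):
    # Polynomial kernel: K(x, z) = (<x, z> + c)^d
    return (np.dot(x, z) + c) ** d

def rbf_kernel(x, z, gamma=0.5):
    # Gaussian RBF kernel: K(x, z) = exp(-gamma * ||x - z||^2)
    diff = np.asarray(x) - np.asarray(z)
    return np.exp(-gamma * np.dot(diff, diff))

def svm_decision(x, sv_x, alpha, y, b, K=rbf_kernel):
    # f(x) = sum_i alpha_i * y_i * K(x_i, x) + b ; the class is sign(f(x)).
    return sum(a * yi * K(xi, x) for a, yi, xi in zip(alpha, y, sv_x)) + b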
Algorithms: Non-linear SVMs

  [Figure: a degree-3 polynomial kernel on a linearly separable and a linearly non-separable data set.]
Algorithms: Toy Examples

• All examples have been run with the 2D graphic interface of LIBSVM (Chang and Lin, National Taiwan University):

  "LIBSVM is an integrated software for support vector classification (C-SVC, nu-SVC), regression (epsilon-SVR, nu-SVR) and distribution estimation (one-class SVM). It supports multi-class classification. The basic algorithm is a simplification of both SMO by Platt and SVMLight by Joachims. It is also a simplification of the modification 2 of SMO by Keerthi et al. Our goal is to help users from other fields to easily use SVM as a tool. LIBSVM provides a simple interface where users can easily link it with their own programs..."

• Available from: www.csie.ntu.edu.tw/~cjlin/libsvm (it includes an integrated Web demo tool)
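As an illustration of driving LIBSVM from code (an assumption not made in the slides): scikit-learn's SVC wraps LIBSVM, so a C-SVC with an RBF kernel can be sketched as:

from sklearn.svm import SVC   # scikit-learn's SVC is built on LIBSVM

X = [[0, 0], [1, 1], [1, 0], [0, 1]]   # tiny XOR-like toy data
y = [0, 0, 1, 1]

clf = SVC(kernel="rbf", C=10.0, gamma=1.0).fit(X, y)
print(clf.support_)                    # indices of the support vectors
print(clf.predict([[0.9, 0.1]]))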
Algorithms: Toy Examples (I)

  Linearly separable data set; linear SVM; maximal margin hyperplane.
  What happens if we add a blue training example here?
Algorithms: Toy Examples (I)

  (Still) linearly separable data set; linear SVM with a high value of the C parameter; maximal margin hyperplane.
  The example is correctly classified.
Algorithms: Toy Examples (I)

  (Still) linearly separable data set; linear SVM with a low value of the C parameter; trade-off between margin and training error.
  The example is now a bounded SV.
Algorithms: Toy Examples (II)

  [Figures shown on three slides.]

Algorithms: Toy Examples (III)

  [Figure.]
Algorithms: SVM Summary

• SVMs were introduced at COLT'92 (Boser, Guyon & Vapnik, 1992). Great development since then
• Kernel-induced feature spaces: SVMs work efficiently in very high dimensional feature spaces (+)
• Learning bias: maximal margin optimisation. Reduces the danger of overfitting. Generalization bounds for SVMs (+)
• Compact representation of the induced hypothesis. The solution is sparse in terms of SVs (+)
Algorithms: SVM Summary

• Due to Mercer's conditions on the kernels, the optimisation problems are convex. No local minima (+)
• Optimisation theory guides the implementation. Efficient learning (+)
• Mainly for classification, but also for regression, density estimation, clustering, etc.
• Success in many real-world applications: OCR, vision, bioinformatics, speech recognition, NLP (text categorization, POS tagging, chunking, parsing, etc.) (+)
• Parameter tuning (–). Implications on convergence times, sparsity of the solution, etc.
Outline

• Machine Learning for NLP
• The Classification Problem
• Three ML Algorithms
• Applications to NLP
Applications: NLP problems

• Warning! We will not focus on final NLP applications, but on intermediate tasks...
• We will classify the NLP tasks according to their (structural) complexity
Applications: NLP problems, structural complexity

• Decisional problems
  − Text categorization, document filtering, word sense disambiguation, etc.
• Sequence tagging and detection of sequential structures
  − POS tagging, named entity extraction, syntactic chunking, etc.
• Hierarchical structures
  − Clause detection, full parsing, IE of complex concepts, composite named entities, etc.
Applications: POS tagging

• Morpho-syntactic ambiguity: Part of Speech Tagging

  He was shot in the hand as he chased the robbers in the back street
  (The Wall Street Journal Corpus)
Applications: POS tagging

The "preposition-adverb" tree (reconstructed from the slide figure):

  root                       P(IN)=0.81,  P(RB)=0.19
  └─ Word Form = "As"/"as"   P(IN)=0.83,  P(RB)=0.17
     └─ tag(+1) = RB         P(IN)=0.13,  P(RB)=0.87
        └─ tag(+2) = IN      P(IN)=0.013, P(RB)=0.987  (leaf)
  (the "others" branches are elided in the slide)

Probabilistic interpretation:

  P̂(RB | word="A/as" ∧ tag(+1)=RB ∧ tag(+2)=IN) = 0.987
  P̂(IN | word="A/as" ∧ tag(+1)=RB ∧ tag(+2)=IN) = 0.013
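The decision path of this tree can be read as nested conditionals; a toy Python sketch returning the estimated P(RB) (the branches elided in the slide fall back here to the parent node's estimate, a simplification):

def p_rb(word, tag_next1, tag_next2):
    if word not in ("As", "as"):
        return 0.19                  # root: P(RB) = 0.19
    if tag_next1 != "RB":
        return 0.17                  # Word Form = "As"/"as": P(RB) = 0.17
    if tag_next2 != "IN":
        return 0.87                  # tag(+1) = RB: P(RB) = 0.87
    return 0.987                     # leaf, tag(+2) = IN: P(RB) = 0.987

print(p_rb("as", "RB", "IN"))        # 0.987, e.g. the first "as" in "as_RB soon_RB as_IN"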
Applications: POS tagging

Collocations:
  "as_RB much_RB as_IN"
  "as_RB well_RB as_IN"
  "as_RB soon_RB as_IN"

  (all follow the "preposition-adverb" tree branch shown above)
Applications: POS tagging

RTT (Màrquez & Rodríguez 97):

  [Diagram: raw text → morphological analysis → disambiguation loop (classify → update → filter, repeated until a stop condition holds), using a language model → tagged text]

A Sequential Model for Multi-class Classification: NLP/POS Tagging (Even-Zohar & Roth, 01)
Applications: POS tagging

STT (Màrquez & Rodríguez 97):

  [Diagram: raw text → morphological analysis → disambiguation with the Viterbi algorithm, using lexical and contextual probabilities from a language model → tagged text]

The Use of Classifiers in Sequential Inference: Chunking (Punyakanok & Roth, 00)
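A minimal sketch of the Viterbi disambiguation step in the STT scheme (plain Python; the lexical probabilities P(w|t), contextual bigram probabilities P(t|t_prev), and start probabilities are assumed to be supplied by the language model — the toy numbers below are invented):

def viterbi(words, tags, p_start, p_ctx, p_lex):
    # p_start[t] = P(t | start); p_ctx[tp][t] = P(t | tp); p_lex[t][w] = P(w | t).
    # Unknown words get a small floor probability (crude smoothing).
    V = [{t: (p_start[t] * p_lex[t].get(words[0], 1e-9), [t]) for t in tags}]
    for w in words[1:]:
        col = {}
        for t in tags:
            col[t] = max((V[-1][tp][0] * p_ctx[tp][t] * p_lex[t].get(w, 1e-9),
                          V[-1][tp][1] + [t]) for tp in tags)
        V.append(col)
    return max(V[-1].values())[1]            # most probable tag sequence

tags = {"IN", "RB"}
p_start = {"IN": 0.5, "RB": 0.5}
p_ctx = {"IN": {"IN": 0.3, "RB": 0.7}, "RB": {"IN": 0.6, "RB": 0.4}}
p_lex = {"IN": {"as": 0.2, "soon": 0.0001}, "RB": {"as": 0.1, "soon": 0.3}}
print(viterbi(["as", "soon", "as"], tags, p_start, p_ctx, p_lex))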
Applications: Detection of sequential and hierarchical structures

• Named entity recognition
• Clause detection
Conclusions: Summary

• We have briefly outlined:
  − The ML setting: "supervised learning for classification"
  − Three concrete machine learning algorithms
  − How to apply them to solve intermediate NLP tasks
Conclusions: Summary

• Any ML algorithm for NLP should be:
  – Robust to noise and outliers
  – Efficient in large feature/example spaces
  – Adaptive to new/changing domains: portability, tuning, etc.
  – Able to take advantage of unlabelled examples: semi-supervised learning
Conclusions: Summary

• Statistical and ML-based Natural Language Processing is a very active and multidisciplinary area of research
Conclusions: Some current research lines

• An appropriate learning paradigm for all kinds of NLP problems: TiMBL (DBZ99), TBEDL (Brill95), ME (Ratnaparkhi98), SNoW (Roth98), CRF (Pereira & Singer02), etc.
• Definition of an adequate (and task-specific) feature space: mapping from the input space to a high dimensional feature space, kernels, etc.
• Resolution of complex NLP problems: inference with classifiers + constraint satisfaction
• etc.
Conclusions: Bibliography

• You may find additional information at:
  http://www.lsi.upc.es/~lluism/
    tesi.html
    publicacions/pubs.html
    cursos/talks.html
    cursos/MLandNL.html
    cursos/emnlp1.html

• This talk at:
  http://www.lsi.upc.es/~lluism/udg03.ppt.gz
Seminar: Statistical NLP
Girona, June 2003

Machine Learning for Natural Language Processing

Lluís Màrquez
TALP Research Center
Llenguatges i Sistemes Informàtics
Universitat Politècnica de Catalunya