Christopher Manning, CS300 talk – Fall 2000, http://nlp.stanford.edu/~manning/

Probabilistic Grammars


Page 1: Probabilistic Grammars


Christopher Manning

CS300 talk – Fall 2000


Page 2: Probabilistic Grammars


Research areas of interest:

NLP/CL

• Statistical NLP models: Combining linguistic and statistical sophistication

• NLP and ML methods for extracting meaning relations from webpages, medical texts, etc.

• Information extraction and text mining

• Lexical and structural acquisition from raw text

• Using robust NLP: dialect/style, readability, …

• Using pragmatics, genre, NLP in web searching

• Computational lexicography and the visualization of linguistic information

Page 3: Probabilistic Grammars


Models for language

• What is the motivation for statistical models for understanding language?

• From the beginning, logics and logical reasoning were invented for handling natural language understanding

• Logics have a language-like form that draws from and meshes well with natural languages

• Where are the numbers?

Page 4: Probabilistic Grammars


Sophisticated grammars for NL

• From NP → Det Adj* N

• there developed precise and sophisticated grammar formalisms (such as LFG, HPSG)

Page 5: Probabilistic Grammars


The Problem of Ambiguity

• Any broad-coverage grammar is hugely ambiguous (often hundreds of parses for 20+ word sentences).

• Making the grammar more comprehensive only makes the ambiguity problem get worse.

• Traditional (symbolic) NLP methods don’t provide a solution.
– Selectional restrictions fail because creative/metaphorical use of language is everywhere:
• I swallowed his story
• The supernova swallowed up the planet

Page 6: Probabilistic Grammars


The problem of ambiguity close up

• “The post office will hold out discounts and service concessions as incentives.”

• 12 words. Real language. At least 83 parses.

Page 7: Probabilistic Grammars


Page 8: Probabilistic Grammars


Statistical NLP methods

• P(to | Sarah drove)

• P(time is verb | Time flies like an arrow)

• P(NP → Det Adj N | mother = VP[drive])

• Statistical NLP methods:

– Estimate grammar parameters by gathering counts from texts or structured analyses of texts

– Assign probabilities to various things to determine the likelihood of word sequences, sentence structure, and interpretation
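
As a rough illustration of "gathering counts from texts", a quantity such as P(to | Sarah drove) can be estimated by relative frequency. The sketch below is only a minimal maximum-likelihood estimate; the four-sentence corpus and the names `train` and `p_next` are invented for the example and are not from the talk.

```python
from collections import Counter

# Toy corpus, invented purely for illustration.
corpus = [
    "Sarah drove to work",
    "Sarah drove to school",
    "Sarah drove home",
    "Kim drove to town",
]

def train(sentences):
    """Count word trigrams and the two-word contexts that precede a next word."""
    trigram_counts, context_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
            trigram_counts[(a, b, c)] += 1
            context_counts[(a, b)] += 1
    return trigram_counts, context_counts

trigram_counts, context_counts = train(corpus)

def p_next(word, context):
    """Relative-frequency estimate of P(word | context) for a two-word context."""
    context = tuple(w.lower() for w in context)
    total = context_counts[context]
    if total == 0:
        return 0.0  # unseen context; a real model would smooth instead
    return trigram_counts[context + (word.lower(),)] / total

# "sarah drove" occurs 3 times and is followed by "to" in 2 of them.
p_to = p_next("to", ("Sarah", "drove"))
```

Unsmoothed estimates like this assign zero probability to anything unseen, which is one reason practical models add smoothing or back-off.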

Page 9: Probabilistic Grammars


Probabilistic Context-Free Grammars

NP → Det N: 0.4
NP → NPposs N: 0.1
NP → Pronoun: 0.2
NP → NP PP: 0.1
NP → N: 0.2






P(subtree above) = 0.1 x 0.4 = 0.04
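
The product above can be reproduced mechanically: in a PCFG, the probability of a (sub)tree is the product of the probabilities of the rules used in it. The sketch below assumes the subtree in the missing figure is NP → NP PP over NP → Det N; that choice is a guess (NP → NPposs N would give the same 0.1 factor), and the tree encoding and the name `tree_prob` are illustrative only.

```python
from math import isclose, prod

# Rule probabilities for NP, taken from the slide.
RULE_PROB = {
    ("NP", ("Det", "N")): 0.4,
    ("NP", ("NPposs", "N")): 0.1,
    ("NP", ("Pronoun",)): 0.2,
    ("NP", ("NP", "PP")): 0.1,
    ("NP", ("N",)): 0.2,
}

def tree_prob(tree):
    """Multiply the probabilities of all rules used in the tree.

    A tree is either a bare category string (left unexpanded, contributing 1.0)
    or a pair (label, list_of_children).
    """
    if isinstance(tree, str):
        return 1.0
    label, children = tree
    child_labels = tuple(c if isinstance(c, str) else c[0] for c in children)
    return RULE_PROB[(label, child_labels)] * prod(tree_prob(c) for c in children)

# Assumed shape of the subtree: [NP [NP Det N] PP], with PP left unexpanded.
subtree = ("NP", [("NP", ["Det", "N"]), "PP"])
p_subtree = tree_prob(subtree)  # 0.1 * 0.4, up to floating-point rounding
```

The five NP expansions sum to 1, as they must for the rules to define a probability distribution over the ways of rewriting NP.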

Page 10: Probabilistic Grammars


Why Probabilistic Grammars?

• The predictions about grammaticality and ambiguity of categorical grammars are not in accord with human perceptions or engineering needs.

• Categorical grammars aren’t predictive
– They don’t tell us what “sounds natural”

• Probabilistic grammars model error tolerance, online lexical acquisition, … and have been amazingly successful as an engineering tool

• They capture a lot of world knowledge for free

• Relevant to linguistic change and variation, too!

Page 11: Probabilistic Grammars


Example: near

• In Middle English, near was an adjective [Maling]

• But, today, is it an adjective or a preposition?
– The near side of the moon
– We were near the station

• Not just a word with multiple parts of speech! There is evidence of blending:
– We were nearer the bus stop than the train
– He has never been nearer the center of the financial establishment

Page 12: Probabilistic Grammars


Research aim

• Most current statistical models are quite simple (linguistically and also statistically)

• Aim: To combine the good features of statistical NLP methods with the sophistication of rich linguistic analyses.

Page 13: Probabilistic Grammars


Lexicalising a CFG





P[inside] NP[box]

D[the] N[box]

the box

• A lexicalized CFG can capture probabilistic dependencies between words
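
One way to picture lexicalization is as head percolation: each phrase is annotated with the head word of its head child. The sketch below is only an illustration; the head rules (`HEAD_CHILD`), the tree encoding, and the PP root label are assumptions made to match the node labels that survive from the figure, not details given in the talk.

```python
# Assumed head rules: which child category supplies the head word of each phrase.
HEAD_CHILD = {"PP": "P", "NP": "N"}

def lexicalize(tree):
    """Return (annotated_tree, head_word).

    A preterminal is (tag, word); a phrase is (label, [children]).
    """
    label, rest = tree
    if isinstance(rest, str):  # preterminal: its head is the word itself
        return (f"{label}[{rest}]", rest), rest
    new_children, head = [], None
    for child in rest:
        new_child, child_head = lexicalize(child)
        new_children.append(new_child)
        if child[0] == HEAD_CHILD[label]:
            head = child_head  # percolate the head child's word upward
    return (f"{label}[{head}]", new_children), head

tree = ("PP", [("P", "inside"), ("NP", [("D", "the"), ("N", "box")])])
lexicalized, head_word = lexicalize(tree)
# lexicalized is:
# ("PP[inside]", [("P[inside]", "inside"),
#                 ("NP[box]", [("D[the]", "the"), ("N[box]", "box")])])
```

Once every node carries a head word, a rule such as PP[inside] → P[inside] NP[box] pairs "inside" with "box", which is where word-to-word probabilistic dependencies can enter the model.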

Page 14: Probabilistic Grammars


Left-corner parsing

• The memory requirements of standard parsers do not match human linguistic processing. What humans find hardest – center embedding:
– *The man that the woman the priest met knows couldn’t help

• is really the bread-and-butter of standard CFG parsing:
– (((a + b)))

• As an alternative, left-corner parsing does capture this.

Page 15: Probabilistic Grammars


Parsing and (stack) complexity

• She ruled that the contract between the union and company dictated that claims from both sides should be bargained over or arbitrated.

Page 16: Probabilistic Grammars


Tree geometry vs. stack depth

Stack depth by parsing strategy (TD = top-down, LC = left-corner, BU = bottom-up):

                                                      TD   LC   BU
• Kim’s friend’s mother’s car smells.                  5    1    1
• Kim thinks Sandy knows she likes green apples.       1    1    7
• The rat that the cat that Kim likes chased died      3    3    7

Page 17: Probabilistic Grammars


Probabilistic Left-Corner Grammars

• Use richer probabilistic conditioning
– Left corner and goal category rather than just parent
• P(NP → Det Adj N | Det, S)

• Allow left-to-right online parsing (which can hope to explain how people build partial interpretations online)

• Easy integration with lexicalization, part-of-speech tagging models, etc.



Det Adj N
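
To make the conditioning concrete, P(NP → Det Adj N | Det, S) can be read as: given that the left corner just found is Det and the current goal category is S, how often is this the rule chosen? A minimal relative-frequency sketch follows; the observation list is invented for illustration and `p_rule` is a hypothetical helper, not part of any described system.

```python
from collections import Counter

# Invented observations of (rule, left_corner, goal_category), as might be
# read off left-corner derivations of treebank trees.
observations = [
    (("NP", ("Det", "Adj", "N")), "Det", "S"),
    (("NP", ("Det", "N")), "Det", "S"),
    (("NP", ("Det", "N")), "Det", "S"),
    (("NP", ("Det", "Adj", "N")), "Det", "S"),
    (("NP", ("Det", "N")), "Det", "VP"),
]

joint_counts = Counter(observations)
context_counts = Counter((lc, goal) for _, lc, goal in observations)

def p_rule(rule, left_corner, goal):
    """Relative-frequency estimate of P(rule | left_corner, goal)."""
    total = context_counts[(left_corner, goal)]
    if total == 0:
        return 0.0  # unseen conditioning context
    return joint_counts[(rule, left_corner, goal)] / total

p = p_rule(("NP", ("Det", "Adj", "N")), "Det", "S")  # 2 of the 4 (Det, S) cases
```

Compared with a plain PCFG, which conditions a rule only on its parent category, the same counts are simply partitioned by a richer context.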

Page 18: Probabilistic Grammars


Probabilistic Head-driven Grammars

• The heads of phrases are the source of the main constraining information about a sentence structure

• We work out from heads by following the dependency order of the sentence

• The crucial property is that we have always built – and have available to us for conditioning – all governing heads and all less oblique dependents of the same head

• We can also easily integrate phrase length

Page 19: Probabilistic Grammars


Information from the web: The problem

• When people see web pages, they understand their meaning
– By and large. To the extent that they don’t, there’s a gradual degradation

• When computers see web pages, they get only character strings and HTML tags

Page 20: Probabilistic Grammars


The human view

Page 21: Probabilistic Grammars


The intelligent agent view

<HTML> <HEAD><TITLE>Ford Motor Company - Home Page</title>
<META NAME="Keywords" CONTENT="cars, automobiles, trucks, SUV, mazda, volvo, lincoln, mercury, jaguar, aston martin, ford">
<META NAME="description" CONTENT="Ford Motor Company corporate home page">
<SCRIPT LANGUAGE="JavaScript1.2"> … </SCRIPT>
<!-- Trustmark code -->
<DIV ID=trustmarkDiv> <TABLE BORDER="0" CELLPADDING=0 CELLSPACING=0 WIDTH=768> <TR><TD WIDTH=768 ALIGN=CENTER>
<A HREF="default.asp?pageid=473" onmouseover="logoOver('fordscript');rolloverText('ht0')" onmouseout="logoOut('fordscript');rolloverText('ht0')"><img border="0" src="images/homepage/fordscript.gif" ALT="Learn more about Ford Motor Company" WIDTH="521" HEIGHT="39"></A><br>
… </TD></TR></TABLE></DIV> </BODY></HTML>
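
What an agent can recover from markup like this without any language understanding is roughly the following: tags, attribute values, and character strings. The sketch uses Python's standard `html.parser` on a shortened copy of the head of the page above; the class name `HeadFieldParser` is invented for the example.

```python
from html.parser import HTMLParser

class HeadFieldParser(HTMLParser):
    """Collect the <title> text and named <meta> fields; nothing more."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

# Shortened copy of the page head shown above.
page = (
    '<HTML> <HEAD><TITLE>Ford Motor Company - Home Page</title>'
    '<META NAME="Keywords" CONTENT="cars, automobiles, trucks, SUV">'
    '<META NAME="description" CONTENT="Ford Motor Company corporate home page">'
)

parser = HeadFieldParser()
parser.feed(page)
parser.close()
keywords = [k.strip() for k in parser.meta["keywords"].split(",")]
```

The result is a title string and a list of keyword strings. Nothing in it says that Ford makes cars or that "SUV" names a kind of vehicle; that knowledge stays with the human reader.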

Page 22: Probabilistic Grammars


The problem (cont.)

• We'd like computers to see meanings as well, so that computer agents could more intelligently process the web

• These desires have led to XML, RDF, agent markup languages, and a host of other proposals and technologies which attempt to impose more syntax and semantics on the web – in order to make life easier for agents.

Page 23: Probabilistic Grammars



• The problem can’t and won’t be solved by mandating a universal semantics for the web

• The solution is rather agents that can ‘understand’ the human web by text and image processing

Page 24: Probabilistic Grammars


(1) The semantics

• Are there adequate and adequately understood methods for marking up pages with such a consistent semantics, in such a way that it would support simple reasoning by agents?

• No.

Page 25: Probabilistic Grammars


What are some AI people saying?

“Anyone familiar with AI must realize that the study of knowledge representation—at least as it applies to the “commonsense” knowledge required for reading typical texts such as newspapers—is not going anywhere fast. This subfield of AI has become notorious for the production of countless non-monotonic logics and almost as many logics of knowledge and belief, and none of the work shows any obvious application to actual knowledge-representation problems. Indeed, the only person who has had the courage to actually try to create large knowledge bases full of commonsense knowledge, Doug Lenat …, is believed by everyone save himself to be failing in his attempt.” (Charniak 1993:xvii–xviii)

Page 26: Probabilistic Grammars


(2) Pragmatics not semantics

pragmatic: relating to matters of fact or practical affairs often to the exclusion of intellectual or artistic matters

pragmatics (linguistics): concerned with the relationship of the meaning of sentences to their meaning in the environment in which they occur

• A lot of the meaning in web pages (as in any communication) derives from the context – what is referred to in the philosophy of language tradition as pragmatics

• Communication is situated

Page 27: Probabilistic Grammars


Pragmatics on the web

• Information supplied is incomplete – humans will interpret it
– Numbers are often missing units
– A “rubber band” for sale at a stationery site is a very different item to a rubber band on a metal lathe
– A “sidelight” means something different to a glazier than to a regular person

• Humans will evaluate content using information about the site, and the style of writing
– value filtering

Page 28: Probabilistic Grammars


(3) The world changes

• The way in which business is being done is changing at an astounding rate
– or at least that’s what the ads from e-business companies scream at us

• Semantic needs and usages evolve (like languages) more rapidly than standards (cf. the Académie française)

• People use words that aren’t in the dictionary.

• Their listeners understand them.

Page 29: Probabilistic Grammars


(4) Interoperation

Ontology: a shared formal conceptualization of a particular domain

• Meaning transfer frequently has to occur across the subcommunities that are currently designing *ML languages, and then all the problems reappear, and the current proposals don't do much to help

Page 30: Probabilistic Grammars


Many products cross industries


• Interfilm offers a complete range of SKC's Skyrol® brand polyester films for use in a wide variety of packaging and industrial processes.

• Gauges: 48 - 1400

• Typical End Uses: Packaging, Electrical, Labels, Graphic Arts, Coating and Laminating
– labels: milk jugs, beer/wine, combination forms, laminated coupons, …

Page 31: Probabilistic Grammars


(5) Pain but no gain

• A lot of the time people won't put in information according to standards for semantic/agent markup, even if they exist.

• Three reasons…
– Laziness: Only 0.3% of sites currently use the (simple) Dublin Core metadata standard.
– Profits: Having an easily robot-crawlable site is a recipe for turning what you sell into a commodity, and hence making little profit
– Cheats: There are people out there that will abuse any standard, if it’s profitable

Page 32: Probabilistic Grammars


(6) Less structure to come

• “the convergence of voice and data is creating the next key interface between people and their technology. By 2003, an estimated $450 billion worth of e-commerce transactions will be voice-commanded.*”

• Question: will these customers speak XML tags?

Intel ad, NYT, 28 Sep 2000

*Data Source: Forrester Research.

Page 33: Probabilistic Grammars


The connection to language

Decker et al. IEEE Internet Computing (2000):

• “The Web is the first widely exploited many-to-many data-interchange medium, and it poses new requirements for any exchange format:
– Universal expressive power
– Syntactic interoperability
– Semantic interoperability”

But human languages have all these properties, and maintain superior expressivity and interoperability through their flexibility and context dependence

Page 34: Probabilistic Grammars


NLP and information access

• Solution: use robust natural language processing and machine learning techniques

• NLP comes into its own when you want to do more than just standard IR.

• E.g., defined information needs over text:
– “An apartment with 2 bedrooms in Menlo Park for less than $1,500.”
– “Where was there an airline accident today?”
– “What proteins is this gene known to


Page 35: Probabilistic Grammars


Example of extracting textual relations: Real Estate Ads

• System starts with plain text of ads
– These are hardly exactly “English”
• But an unstructured information source, close to English
– Chosen as lowest common denominator

• Output: database records
– A variety of tables giving information about:
• the property: bedrooms, garages, price
• the real estate agency
• inspection times

Page 36: Probabilistic Grammars


Real Estate Ads: Input

<ADNUM>2067206v1</ADNUM>
<DATE>March 02, 1998</DATE>
<ADTITLE>MADDINGTON $89,000</ADTITLE>
<ADTEXT>OPEN 1.00 - 1.45<BR>
U 11 / 10 BERTRAM ST<BR>
NEW TO MARKET Beautiful<BR>
3 brm freestanding<BR>
villa, close to shops & bus<BR>
Owner moved to Melbourne<BR>
ideally suit 1st home buyer,<BR>
investor & 55 and over.<BR>
Brian Hazelden 0418 958 996<BR>
R WHITE LEEMING 9332 3477


Page 37: Probabilistic Grammars


Real Estate Ads: Output

• Output is database tables

• But the general idea in slot-filler format:


[Manning & Whitelaw, U. Sydney 1998; in daily use at News Corp.]
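
To give a feel for slot filling on the MADDINGTON ad shown earlier, here is a deliberately naive pattern-matching sketch. It is not the system cited above, whose methods are not described on these slides; the slot names, the regular expressions, and the function `extract_slots` are all assumptions made for illustration.

```python
import re

# Abbreviated copy of the input ad.
ad = (
    "<ADNUM>2067206v1</ADNUM><DATE>March 02, 1998</DATE>"
    "<ADTITLE>MADDINGTON $89,000</ADTITLE>"
    "<ADTEXT>OPEN 1.00 - 1.45<BR> U 11 / 10 BERTRAM ST<BR> "
    "NEW TO MARKET Beautiful<BR> 3 brm freestanding<BR> "
    "villa, close to shops & bus<BR> "
    "Brian Hazelden 0418 958 996<BR> R WHITE LEEMING 9332 3477"
)

def extract_slots(text):
    """Fill a few slots with hand-written patterns (illustrative only)."""
    slots = {}
    m = re.search(r"<ADNUM>(.*?)</ADNUM>", text)
    if m:
        slots["ad_number"] = m.group(1)
    # Title of the form "SUBURB $price".
    m = re.search(r"<ADTITLE>\s*([A-Z ]+?)\s*\$([\d,]+)", text)
    if m:
        slots["suburb"] = m.group(1)
        slots["price"] = int(m.group(2).replace(",", ""))
    # Bedroom counts written as "3 brm", "2br", "4 bed".
    m = re.search(r"(\d+)\s*(?:brm|br|bed)\b", text)
    if m:
        slots["bedrooms"] = int(m.group(1))
    # Inspection time written as "OPEN 1.00 - 1.45".
    m = re.search(r"OPEN\s+(\d{1,2}\.\d{2})\s*-\s*(\d{1,2}\.\d{2})", text)
    if m:
        slots["inspection"] = (m.group(1), m.group(2))
    return slots

record = extract_slots(ad)
```

Patterns like these break as soon as an ad says "the Paddington of the west" or "Was $y now $z", which is the reason for wanting robust NLP rather than fixed templates.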

Page 38: Probabilistic Grammars


Page 39: Probabilistic Grammars
Page 40: Probabilistic Grammars


One needs a little NLP

• There is no semantic coding to use

• Standard IR doesn’t work:
– suburbs
• the Paddington of the west
• one hours drive from Sydney
• real estate agent
– prices
• recently sold for $x. Was $y now $z. Rent.
– bedrooms
– multi-property ads

Page 41: Probabilistic Grammars


Text Segmentation

Real-estate ads have a hierarchical text structure!!

SOUTHPORT UNIT SPECIALS
$58,900 o.n.o. 2 brm close to water and shops.
$114,000 "Grandview", excellent value, good returns
LJ Coleman Real Estate
Contact Steve 5527 0572

GLEBE 2br yd $250; 4br yd $430

COOGEE 3br yd $320; 1br $150

BALMAIN 1br $180

H.R. Licensed FEE 9516-3211
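
The rental ad above illustrates the hierarchy: one ad, several suburbs, several properties per suburb, and an agency line shared by all of them. A minimal segmentation sketch follows, assuming each property line starts with an upper-case suburb name followed by "Nbr … $rent" items; the pattern and the function `segment` are illustrative and were written only for this one ad.

```python
import re

ad_lines = [
    "GLEBE 2br yd $250; 4br yd $430",
    "COOGEE 3br yd $320; 1br $150",
    "BALMAIN 1br $180",
    "H.R. Licensed FEE 9516-3211",
]

# One listing: a bedroom count such as "2br", then the next dollar amount.
LISTING = re.compile(r"(\d+)br\b.*?\$(\d+)")

def segment(lines):
    """Split a multi-property ad into property records plus shared lines."""
    properties, shared = [], []
    for line in lines:
        m = re.match(r"([A-Z]+)\s+(.*)", line)
        items = LISTING.findall(m.group(2)) if m else []
        if not items:
            shared.append(line)  # e.g. the agency/contact line
            continue
        for bedrooms, rent in items:
            properties.append(
                {"suburb": m.group(1), "bedrooms": int(bedrooms), "rent": int(rent)}
            )
    return properties, shared

properties, shared = segment(ad_lines)
# Five property records; the final line is kept as information shared by all.
```

The contact details then have to be attached to each of the five records, which is the step that a flat one-record-per-ad view of the text misses.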

Page 42: Probabilistic Grammars


The End