

1. Introduction to COLLOCATION (1) – the IDEA; I usually introduce this idea to students after dealing with other forms of word combinations that students are likely to be more familiar with (hyphenated words, compounds, phrasal verbs, idioms, and phrases):

Firth, J.R. (1957) ‘Modes of Meaning’ in F.R. Palmer (ed) Papers in Linguistics 1934-51. London: Oxford University Press. pp. 190-215.

2. Introduction to COLLOCATION (2) – the initial linguistic THEORY (Firth) [NB Firth was the first to make collocation a part of linguistic theory, and the first to see collocational meaning as something different from grammatical and semantic meanings; Palmer incorporated common grammatical combinations (e.g. verb + noun) into his ideas for a Dictionary for Learners of English. I have not been able to read Palmer’s article yet, but it is described in detail in Cowie (2000).]

Firth, J.R. (1957) ‘Modes of Meaning’ in F.R. Palmer (ed) Papers in Linguistics 1934-51. London: Oxford University Press. pp. 190-215.

Palmer, H.E. (1933) Second Interim Report on English Collocations. Tokyo: Kaitakusha.

Cowie, A.P. (2000) ‘The EFL Dictionary Pioneers and their Legacies’. Kernerman Dictionary News, Number 8, July 2000. [http://www.kdictionaries.com/newsletter/kdn8-1.html]

3. Introduction to COLLOCATION (3) – adding details to the THEORY (Halliday) and implementing it PRACTICALLY (Sinclair)

Halliday, M.A.K. (1966) ‘Lexis as a linguistic level’ in C.E. Bazell, J.C. Catford, M.A.K. Halliday, R.H. Robins (eds) In Memory of J.R. Firth. London: Longman. pp. 148-162.

Sinclair, J.M. (1966) ‘Beginning the study of lexis’ in C.E. Bazell, J.C. Catford, M.A.K. Halliday, R.H. Robins (eds) In Memory of J.R. Firth. London: Longman. pp. 410–430.

Sinclair, J.M., Jones, S. & Daley, R. (1970) English Lexical Studies. Published as Krishnamurthy, R. (ed) (2004) English Collocation Studies. London and New York: Continuum.

Sinclair, J.M. (1987b) ‘Collocation: a progress report’, in Steele, R. & Threadgold, T. (eds) Language Topics: Essays in Honour of Michael Halliday. Amsterdam: John Benjamins. pp. 319-31. [NB we put ‘b’ after a date when we need to distinguish 2 items by the same author in the same year]

4. For many years, I could not understand what Firth meant by his statement ‘one of the meanings of night is its collocability with dark’! I think I am beginning to understand it now:

In a corpus, you can check which other words occur with night, and which have a similar meaning to dark, and which have an opposite meaning!

Counting these words gave me a shock: ALL the other words added together still did not occur as often as dark! So there is a special relationship (= collocation) which is not the same as grammatical or semantic relationships.
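The counting step can be sketched in a few lines of Python. This is only an illustration: the function name left_neighbours and the sample text are my own inventions, and the toy text says nothing about the real corpus figures. It simply counts which word comes immediately before night.

```python
from collections import Counter
import re

def left_neighbours(text, node):
    """Count the words that occur immediately before `node` in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(prev for prev, word in zip(tokens, tokens[1:]) if word == node)

# Invented toy text, NOT real corpus data.
sample = ("It was a dark night. The dark night was cold. "
          "A quiet night followed the dark night, then a bright night.")

counts = left_neighbours(sample, "night")
print(counts.most_common())
```

On a real corpus you would run the same count over millions of words, and then compare the figure for dark with the combined figure for all the other words.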

Grammar only tells us that ‘a noun can be preceded by an adjective; and an adjective is usually followed by a noun’.

Semantics only tells us that the words must be ‘appropriate’ in meaning for each other.

For example, think of Chomsky’s famous invented example: Colourless green ideas sleep furiously. This sentence is grammatically correct – or ‘well-formed’ – but semantically incoherent. Collocation tells us that the words in this sentence are NOT COLLOCATES of each other!

* I think Firth (1951) was written in 1951, but published in 1957…

5. Strong collocational patterns can be easily seen, even in random corpus examples. This is a random KWIC (Key Word In Context) concordance for arrant:

So, which words can you see occurring frequently? These words are called the collocates of arrant.
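If you want to see how a KWIC display is put together, here is a minimal sketch in Python. The function name kwic, the width setting and the three sample sentences are all assumptions of mine (the sentences are invented, not taken from the corpus). Real corpus software does much more, but the principle is the same: find each occurrence of the key word and line it up in the middle of the screen.

```python
import re

def kwic(text, node, width=20):
    """Return one line per occurrence of `node`, padded so the node words line up."""
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", flags=re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        # Take up to `width` characters of context on each side of the key word.
        left = text[max(0, match.start() - width):match.start()].strip()
        right = text[match.end():match.end() + width].strip()
        lines.append(f"{left:>{width}} {match.group(0)} {right:<{width}}")
    return lines

# Invented sentences, NOT real corpus lines.
sample = ("He dismissed the report as arrant nonsense. "
          "She called him an arrant coward. "
          "It was arrant nonsense from start to finish.")

lines = kwic(sample, "arrant")
for line in lines:
    print(line)
```

Because the left context is right-aligned and the right context is left-aligned, the key word always appears in the same column, which is what makes repeated collocates easy to spot.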

6. Most corpus software will allow you to SORT concordances alphabetically, by the word to the right, the word to the left, etc. The concordances in the previous slide have now been sorted – but which way?

Some strong collocates are even easier to see now – ‘arrant nonsense’. Even if you don’t understand all the words, you can still see the pattern!
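Sorting can be sketched in the same way. In this hypothetical example (the function names and the toy word list are mine, not the behaviour of any particular corpus program), each concordance line is stored as a left context, the key word, and a right context. Sorting to the right compares the first word after the key word, then the second, and so on; sorting to the left starts with the word nearest the key word and works outwards.

```python
def concordance(tokens, node, span=4):
    """Return (left, node, right) triples; left and right hold up to `span` tokens."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node:
            hits.append((tokens[max(0, i - span):i], tok, tokens[i + 1:i + 1 + span]))
    return hits

def sort_right(hits):
    """Sort by the 1st word to the right, then the 2nd, and so on."""
    return sorted(hits, key=lambda h: [w.lower() for w in h[2]])

def sort_left(hits):
    """Sort by the word nearest the key word on the left, then the next one out."""
    return sorted(hits, key=lambda h: [w.lower() for w in reversed(h[0])])

# Invented toy data, NOT real corpus lines.
tokens = ("such arrant nonsense again and an arrant coward ran from "
          "the arrant bigot who spoke arrant nonsense daily").split()

hits = concordance(tokens, "arrant")
for left, node, right in sort_right(hits):
    print(" ".join(left).rjust(25), node, " ".join(right))
```

After a right sort, identical collocates such as nonsense end up on adjacent lines, so the pattern is visible even before you count anything.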

(a) Now look more closely – can you notice any similarity between the words occurring one to the right of arrant? For example: bigot, bullshit, chauvinism, coward, drunkard, effrontery, knave, lunacy, melodrama, nonsense (24 examples!), racism, stupidity, superstitions, traitors. Would you say these were generally ‘good’ things or ‘bad’ things?

(b) Now look at some of the other words one to the right of arrant. For example: beginner, Communists, democracy, romanticism, Scottishness. Would you agree that these are all ‘neutral’ words, or perhaps even ‘good/positive’ words (e.g. democracy)? Look again more closely at the lines with these words in. Notice: even against an arrant beginner… Arrant Communists, all four of them… denouncing some proposal as ‘arrant democracy’… 14th century truth in favour of arrant romanticism… but despite the arrant Scottishness of such painters.

Can you see that there are some clues (underlined; including the quote marks) that the words in (b) have been affected by the ‘negative’ value of the words in (a)? Some linguists call this SEMANTIC PROSODY. The word ‘arrant’ takes on a negative value from its habitual collocation with negative words like nonsense, and influences any other words that occur near it. So, even ‘positive’ words like democracy become ‘negative’ when they are near arrant.

7. These corpus concordance lines are for the word fragranced. Have you come across this word before? Can you think of another word with a similar meaning (synonym)? The word fragranced seems to be quite common, as there are many examples for it in the corpus. But is there anything similar about these lines? What type of text do you think most of these examples come from?

(a) Are these lines SORTED? (b) In which direction? (c) Which words immediately to the left of fragranced occur more frequently than others? (d) What is the ‘grammatical word-class’ or ‘part of speech’ of fragranced? (e) What is the word-class of many of the words immediately to the left of fragranced? (f) Is this sequence of word-classes a normal pattern in English? (g) Can you see any collocations to the right of fragranced? (h) How could we make them easier to see?

8. John Sinclair (1987) and Bill Louw (1993) began to notice patterns involving SEMANTIC PROSODIES:

a) Many of the examples they found were NEGATIVE – Sinclair had noticed that the phrasal verb set in had negative subjects.

b) Louw added that:
(i) even when the example seemed to break the SEMANTIC PROSODY, e.g. seemed to have a positive meaning, there were usually other clues in the example (e.g. inverted commas around improving in line 5 of the concordance for bent on below) which made the meaning negative;
(ii) these exceptional examples were often humorous or sarcastic (= IRONIC), or may be an indication that the writer/speaker did not really believe what he/she was saying, or was deliberately lying (= INSINCERITY) – see the title of his article below!

The examples below are from a seminar Bill Louw gave at Birmingham University in 1991.

Sinclair, J.M. (1987) Looking Up - An account of the COBUILD Project in lexical computing. London: HarperCollins.

Louw, W. E. (1993) Irony in the text or insincerity in the writer? The diagnostic potential of semantic prosodies. In: Text and technology. In honour of John Sinclair. Ed. M. Baker, G. Francis and E. Tognini-Bonelli, 152–176. Amsterdam: John Benjamins.

9. These were two more examples discovered by Bill Louw in 1991: symptomatic of and even a very common adverb like utterly. Mike Stubbs (1995) then managed to show that very common verbs like happen and cause had negative semantic prosodies!

Louw, W. E. (1993) Irony in the text or insincerity in the writer? The diagnostic potential of semantic prosodies. In: Text and technology. In honour of John Sinclair. Ed. M. Baker, G. Francis and E. Tognini-Bonelli, 152–176. Amsterdam: John Benjamins.

Stubbs, M. (1995) ‘Collocations and semantic profiles: On the cause of the trouble with quantitative studies’, Functions of Language 2/1: 1–33.

10. Even a seemingly innocent phrase like without feeling has a negative SEMANTIC PROSODY (Louw seminar, 1991).

11. Collocation can be a good way to disambiguate synonyms. Disambiguating means noticing the small differences between two words that seem to be very similar in meaning.

A student asked me in 1992: “When do you use electric and when do you use electrical?” These are the words that I found occurring to the right of electric and to the right of electrical. At first I couldn’t see any pattern…

Q.1 What do you know about electric lights, electric fires, electric fences, electric bulbs, electric motors, and electric razors?

Q.2 What type of words are system, energy, appliances, equipment, goods, apparatus, devices, and machinery?

A.1 lights, fires, fences, bulbs, motors, and razors are physical things (machines, devices) that you connect to a source of electricity, and the electricity makes them function. You can switch them on and off.

A.2 system, energy, appliances, equipment, goods, apparatus, devices, and machinery are more general words, with more abstract meanings – things that are part of an electrical system (system, energy), or groups of things that are powered by electricity (appliances, equipment).

Note that fire occurs in both columns. An electric fire is one that you plug in and switch on to keep warm. An electrical fire is a fire that occurs by accident – because of a fault in the electrical system!
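The comparison of the two columns can also be sketched in code. This is a hypothetical illustration: the function right_collocates is my own, and the toy word list only reuses a few of the collocates mentioned above rather than the full corpus data. It collects the words found immediately to the right of each adjective, then separates the words found with only one of them from the words found with both.

```python
def right_collocates(tokens, node):
    """Return the set of words that occur immediately after `node`."""
    return {nxt for word, nxt in zip(tokens, tokens[1:]) if word == node}

# Toy word list built from a few collocates mentioned above, NOT the full data.
tokens = ("electric lights electric fire electric fences electric razors "
          "electrical system electrical equipment electrical fire "
          "electrical appliances").split()

electric = right_collocates(tokens, "electric")
electrical = right_collocates(tokens, "electrical")

print("only electric:  ", sorted(electric - electrical))
print("only electrical:", sorted(electrical - electric))
print("both:           ", sorted(electric & electrical))
```

The words that appear with both adjectives are the interesting ones to check in the concordance lines, because that is where the two meanings can be compared directly.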

****************** THE FOLLOWING SLIDES (12-16) ARE MY ATTEMPT TO REPRODUCE A WONDERFUL PRESENTATION BY PROFESSOR JOHN SINCLAIR THAT I FIRST SAW MANY YEARS AGO.

HE LATER PUBLISHED HIS VERSION IN A PAPER:

Sinclair, J.M. (1996) ‘The Search for Units of Meaning’. Textus IX: 75-106. ******************

12. If we just count the number of words occurring (= ‘raw frequency’), the main collocates of every word in the corpus will (almost always) be the most frequent words in the corpus (the, of, and, to, in, etc). So we often use STATISTICS instead; e.g. the T-SCORE statistic takes into account how often each word occurs in the whole corpus, and then calculates whether the two words are occurring together more or less often than expected. We can call these SIGNIFICANT collocates. Now if a very frequent word occurs (e.g. an, on, the, below), we can be more certain that it is a collocate.
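To make the idea of ‘more or less than expected’ concrete, here is a minimal sketch of one common textbook version of the t-score: the observed number of co-occurrences minus the expected number, divided by the square root of the observed number. I am assuming this formulation for illustration; corpus programs differ in the details (e.g. how the span is handled), so it may not match the exact calculation behind the figures on the slide. All the numbers in the example are invented.

```python
import math

def t_score(observed, f_node, f_collocate, corpus_size, span=1):
    """T-score for a pair of words (one common formulation, assumed here).

    observed     -- how often the collocate occurs within `span` words of the node
    f_node       -- frequency of the node word in the whole corpus
    f_collocate  -- frequency of the collocate in the whole corpus
    corpus_size  -- total number of words in the corpus
    """
    if observed <= 0:
        raise ValueError("observed must be positive")
    # Co-occurrences we would expect by chance, given the two overall frequencies.
    expected = f_node * f_collocate * span / corpus_size
    return (observed - expected) / math.sqrt(observed)

# Invented figures: a 1,000,000-word corpus and a node word occurring 1,000 times.
the_score = t_score(observed=70, f_node=1000, f_collocate=60000, corpus_size=1_000_000)
keep_score = t_score(observed=40, f_node=1000, f_collocate=500, corpus_size=1_000_000)
print(round(the_score, 2), round(keep_score, 2))
```

In this invented case the very frequent word has the higher raw count (70 against 40), but almost all of those 70 are expected by chance, so the rarer word gets the higher score.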

(a) Can you make up sentences containing eye and its top collocates, e.g. a sentence containing eye and an, a sentence containing eye and on, a sentence containing eye and keep, etc? (b) Notice that you are having to GUESS the word-class and meaning of the collocate in many cases (e.g. is contact a noun or a verb? Which meaning of contact might occur in each of the 320 examples? Will all the examples contain contact being used with the same meaning?)

13. Now let us look at the top collocates of another word, naked. Notice that naked is much less frequent (4274 lines in this corpus) than eye (19521 lines).

(a) Can you make up sentences containing naked and its top collocates? (b) Are you having difficulties with the first collocate, gun? Don’t worry, it is not because your knowledge of the English language is poor! Corpora containing authentic texts, as well as reflecting the language, also reflect the real world, especially the CULTURE of the place and time when they were created. So if you notice that gun is the top collocate of naked in this corpus, it is useful to know that Naked Gun was a series of 3 very popular Hollywood films made in 1988, 1991, and 1994 – see http://en.wikipedia.org/wiki/The_Naked_Gun for details. (c) naked and body are much easier. (d) In 3rd place, we have naked and eye! But eyes don’t wear clothes, so how can they be naked?

14-16. Here are the 157 concordance lines for the combination naked eye. Have a quick look through them, and note down anything you notice. (What kind of features did we notice before? If you can’t remember, look back now to remind yourself, before starting this exercise).

15. Concordance lines for naked eye (continued; page 2 of 3)

16. Concordance lines for naked eye (continued; page 3 of 3)

(a) What did you notice?

(b) Are the concordance lines sorted at all? In which direction?

(c) Did you notice that the lines are sorted not just by the 1st word, but also by the 2nd word?

(d) Did you notice anything immediately after the word eye? Does this help to explain the direction of sorting?

(e) What is the word occurring most frequently before naked eye? How frequent is it? How many lines do NOT contain that word? What word class / part of speech is the word?

(f) What is the word occurring most frequently 2 words before naked eye? What word class / part of speech is that word? What other words occur 2 words before naked eye? What word class / part of speech are they?

(g) What is the word occurring most frequently 3 words before naked eye? What word class / part of speech is that word? What other words occur 3 words before naked eye? What word class / part of speech are they?

(h) What is the word occurring most frequently 4 words before naked eye? What word class / part of speech is that word? What other words occur 4 words before naked eye? What word class / part of speech are they?

(i) Did you notice that as you get further away from the core of the phrase, the variety of choices increases, and the variety of word classes?
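The position-by-position counting in questions (e) to (h) can be sketched as follows. The function position_counts and the four sample lines are my own inventions for illustration; they are not taken from the 157 real concordance lines, so the counts they produce are not the answers to the questions above.

```python
from collections import Counter

def position_counts(lines, phrase, max_offset=3):
    """For each offset 1..max_offset, count the words that many places before `phrase`."""
    counts = {k: Counter() for k in range(1, max_offset + 1)}
    n = len(phrase)
    for tokens in lines:
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == phrase:
                for k in range(1, max_offset + 1):
                    if i - k >= 0:  # do not run off the start of the line
                        counts[k][tokens[i - k]] += 1
    return counts

# Invented lines, NOT the real concordance data.
lines = [s.split() for s in [
    "visible to the naked eye",
    "invisible to the naked eye",
    "seen with the naked eye",
    "see it with your naked eye",
]]

counts = position_counts(lines, ["naked", "eye"])
for offset in sorted(counts):
    print(offset, counts[offset].most_common())
```

Comparing the number of different words at each position is one simple way to check the observation in (i), that the variety of choices grows as you move away from the core of the phrase.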

THAT WAS THE BOTTOM-UP PRESENTATION, MEANING THAT WE STARTED FROM THE INDIVIDUAL EXAMPLES AND GRADUALLY IDENTIFIED AND BUILT UP THE PATTERNS. I consider this to be an inductive methodology. Notice also that we already managed to explain almost all the examples.

NEXT, I WILL SHOW YOU A MORE TRADITIONAL TOP-DOWN PRESENTATION OF THE SAME INFORMATION. This is a deductive methodology. The rule/pattern/general statement is given first, then some details/examples.

17. TRADITIONAL TOP-DOWN PRESENTATION (1 of 4): note that eye (44283) is over 4 times as frequent as naked (10105) in the corpus.

18. TRADITIONAL TOP-DOWN PRESENTATION (2 of 4):

19. TRADITIONAL TOP-DOWN PRESENTATION (3 of 4):

20. TRADITIONAL TOP-DOWN PRESENTATION (4 of 4):

(a) Which presentation did you prefer? I think students learn more from the bottom-up, inductive methodology, because it focusses on PROCESS. The top-down, deductive methodology seems to focus more on PRODUCT.

(b) If you had only seen the top-down presentation, how much would you remember after a week, or a month, or a year? I think you will remember more because of the bottom-up presentation.

21. When I first noticed the expression cutting edge, I tried to do a similar bottom-up analysis. I first wrote about this in my paper: Krishnamurthy, R. 1996. The Data is The Dictionary: Corpus at the Cutting Edge of Lexicography, in F. Kiefer, G. Kiss & J. Pajzs (eds) Papers in Computational Lexicography, COMPLEX'96, Budapest: Hungarian Academy of Sciences, Research Institute for Linguistics, pp 117-144.

22. I first looked in native-speaker dictionaries, but realised that:

AHD = American Heritage Dictionary
Times 2000 = Times English Dictionary

23. I then looked in bilingual dictionaries, and English learners’ dictionaries, and realised that:

24. Here is the entry for cutting edge in the Collins English-Spanish bilingual dictionary:

Note that the English-Spanish section of the dictionary gives vanguardia as an equivalent for cutting edge, but the Spanish-English section entry for vanguardia does not give cutting edge as an equivalent, so a Spanish speaker will not learn this new phrase.

25. Here are the entries for cutting edge in FOUR English learners’ dictionaries:

26. So which feature of cutting edge did ALL 4 of the English learners’ dictionaries omit? Look at the following concordance from the Bank of English corpus:

There is a hyphenated adjective form: cutting-edge!

27. If I were a learner of English, and came across an example of cutting edge, such as: ‘Cobuild is at the cutting edge of computational lexicography’, I would ask myself the following questions:

28. And this is what I would find, if I analysed the Bank of English corpus data:

NOTE: this final comment applies only to native-speaker dictionaries!