Postgraduate Diploma in Translation Lecture 1 Computers and Language



Feb 2005 -- MR Diploma in Translation - Lecture 1 2

Course Information

Web: http://www.cs.um.edu.mt/~mros/diptran

Email: [email protected]@um.edu.mt

D. Arnold et al. (1994). Machine Translation: An Introductory Guide. See website.

H. Somers (2003). Computers and Translation: A Translator’s Guide. See website.


Computers and Language

Computational Linguistics
  Emphasis on mechanised linguistic theories.
  Grew out of early Machine Translation efforts.

Natural Language Processing
  Computational models of language analysis, interpretation, and generation.

Language Engineering
  Emphasis on large-scale performance.
  Example: Google


CL: Two Main Disciplines

Computer Science
Linguistics


Linguistics

Phonetics: The study of speech sounds
Phonology: The study of sound systems
Morphology: The study of word structure
Syntax: The study of sentence structure
Semantics: The study of meaning
Pragmatics: The study of language use


Grammar Rules: Prescriptive versus Descriptive

Prescriptive Grammar

Rules for and against certain uses

Proscribed forms that are in current use

“don’t end a sentence with a preposition”

Subjective

Descriptive Grammar

Rules characterizing what people actually say

Goal: to characterize all and only what speakers find acceptable

Objective


Noam Chomsky

Noam Chomsky’s work in the 1950s radically changed linguistics, making syntax central.

Chomsky has been the dominant figure in linguistics ever since.

Chomsky invented the generative approach to grammar.


Generative Grammar: Key Points

A language is a (possibly infinite) set of sentences.
Grammar is finite.
The grammar of a particular language expresses linguistic knowledge of that language.
The theory of grammar includes a mathematical definition of what a grammar is.
The “Theory of Grammar” is a theory of human linguistic abilities.
[source: Sag & Wasow]


Theories of Sentence and Word Structure: Rewrite Rules

Rules can be used to specify the sentences of a language.

Rules have the form LHS → RHS.
LHS may be a sequence of symbols.
RHS may be a sequence of symbols or words.

Lexicon specifies words and their categories


A Simple Grammar/Lexicon

grammar:
  S → NP VP
  NP → N
  VP → V NP

lexicon:
  V → kicks
  N → John
  N → Bill

Parse tree for "John kicks Bill":
  [S [NP [N John]] [VP [V kicks] [NP [N Bill]]]]
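The rewrite rules above can be tried out directly. The following is a minimal sketch, assuming the grammar and lexicon are stored as Python dictionaries; the names `GRAMMAR`, `LEXICON`, and `expand` are illustrative choices, not part of the lecture. It lists every sentence the rules generate, and it only terminates because this particular grammar has no recursive rules.

```python
from itertools import product

# Rewrite rules: each left-hand side maps to a list of right-hand sides.
GRAMMAR = {"S": [["NP", "VP"]], "NP": [["N"]], "VP": [["V", "NP"]]}
# Lexicon: each category maps to the words that belong to it.
LEXICON = {"V": ["kicks"], "N": ["John", "Bill"]}


def expand(symbol):
    """Return every word sequence that `symbol` can be rewritten to."""
    if symbol in LEXICON:
        return [[word] for word in LEXICON[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Expand each right-hand-side symbol, then combine the alternatives.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results


sentences = [" ".join(words) for words in expand("S")]
for sentence in sentences:
    print(sentence)
```

With two nouns and one verb, the grammar generates four sentences, one of which is "John kicks Bill".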


Formal v. Natural Languages

Formal Languages
  Arithmetic: 3290 1 1010101
  Logic: ∀x man(x) → mortal(x)
  URL: http://www.cs.um.edu.mt

Natural Languages
  English: John saw the dog
  German: Johann hat den Hund gesehen
  Maltese: Ġianni ra kelb


Points of Similarity

A language is considered to be a (possibly infinite) set of sentences.
Sentences are sequences of words.
Rules determine which sequences are valid sentences.
Sentences have a definite structure.
Sentence structure is related to meaning.


Points of Difference

Formal Languages
  The grammar defines the language
  Restricted application
  Unambiguous

Natural Languages
  The language defines the grammar
  Universal application
  Highly ambiguous


Ambiguity

Morphological ambiguity: en-large-ment
Lexical ambiguity: the sheep is in the pen
Syntactic ambiguity: small animals and children laugh
Semantic ambiguity: every girl loves a sailor
Pragmatic ambiguity: can you pass the salt?

The management of ambiguity is central to the success of CL in general and MT in particular.
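One way to make the syntactic example concrete is to count how many parse trees a sentence has under a small grammar. The sketch below is illustrative only: the grammar rules, the lexicon, and the function name `count_parses` are assumptions made for this example, not material from the lecture. Under these assumed rules, "small" can modify either "animals" alone or the whole phrase "animals and children".

```python
from functools import lru_cache

# Hypothetical toy grammar: an NP may be Adj + NP, a coordination, or a bare noun.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Adj", "NP"), ("NP", "Conj", "NP"), ("N",)],
    "VP": [("V",)],
}
# Hypothetical lexicon: word -> set of categories.
LEXICON = {
    "small": {"Adj"},
    "animals": {"N"},
    "and": {"Conj"},
    "children": {"N"},
    "laugh": {"V"},
}


def count_parses(sentence, start="S"):
    """Count the distinct parse trees of `sentence` rooted in `start`."""
    words = tuple(sentence.split())

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        # Number of ways `symbol` can cover words[i:j].
        total = 0
        if j - i == 1 and symbol in LEXICON.get(words[i], ()):
            total += 1
        for rhs in GRAMMAR.get(symbol, ()):
            total += count_seq(rhs, i, j)
        return total

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        # Number of ways the symbol sequence `rhs` can cover words[i:j].
        if len(rhs) == 1:
            return count(rhs[0], i, j)
        # The first symbol covers words[i:k]; every remaining symbol
        # needs at least one word, which bounds k from above.
        return sum(
            count(rhs[0], i, k) * count_seq(rhs[1:], k, j)
            for k in range(i + 1, j - len(rhs) + 2)
        )

    return count(start, 0, len(words))


print(count_parses("small animals and children laugh"))  # 2
print(count_parses("children laugh"))                    # 1
```

The two parses correspond to the bracketings [small [animals and children]] and [[small animals] and children].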


Computer Science

The study of basic concepts
  Information
  Data
  Algorithm
  Program

The application of these concepts to practical tasks.

Implementation of computational models from other fields.


Information

Information is a theoretical concept introduced by Shannon in 1948 to measure uncertainty.
The units of this measure are called bits.
Length is measured in metres, weight in kilos, information in bits.

1 bit is the amount of uncertainty inherent in a situation with exactly two equally likely outcomes. Example: for breakfast I will have coffee or I will have tea (nothing else).

When I tell you that I have tea, I have conveyed one bit of information.

The greater the number of possible outcomes, the more bits of information are involved in the statement that indicates the actual outcome.
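The relation between outcomes and bits can be worked out numerically. This is a minimal sketch that assumes all outcomes are equally likely, in which case naming one of N outcomes conveys log2(N) bits; the function name `bits_needed` is a hypothetical choice for this example.

```python
import math


def bits_needed(num_outcomes):
    """Bits conveyed by naming one of `num_outcomes` outcomes.

    Assumption: every outcome is equally likely.
    """
    if num_outcomes < 1:
        raise ValueError("there must be at least one possible outcome")
    return math.log2(num_outcomes)


print(bits_needed(2))  # coffee or tea: 1.0
print(bits_needed(8))  # one of eight drinks: 3.0
```

Doubling the number of outcomes adds exactly one bit, so the measure grows slowly compared with the number of possibilities.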


Data

A formalized representation of facts or concepts suitable for communication, interpretation, or processing by people or automated means.

Example: a telephone directory
Unlike information, which is abstract, data is concrete.
Data has a certain level of structure. In the telephone directory, for example, we have the structure of a list of entries, each of which has a name, an address, and a number.
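The directory structure described above, a list of entries with three fields each, could be represented as follows. This is only a sketch: the class name `Entry`, the function `lookup`, and all names, addresses, and numbers are made-up placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One directory entry: a name, an address, and a number."""
    name: str
    address: str
    number: str


# Placeholder data, invented purely for illustration.
directory = [
    Entry("A. Example", "1 Sample Street", "555-0100"),
    Entry("B. Example", "2 Sample Street", "555-0101"),
]


def lookup(entries, name):
    """Return the numbers of all entries whose name matches exactly."""
    return [entry.number for entry in entries if entry.name == name]


print(lookup(directory, "A. Example"))  # ['555-0100']
```

Because the structure is explicit, a program can process the data (here, search it by name) without any understanding of what the entries mean.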


Algorithm

A well-defined procedure for the solution of a given problem in a finite number of steps.

Abstract
Designed to perform a well-defined task.
Finite description length.
Guaranteed to terminate.


Algorithm for Chocolate Cake


Program to Add X and Y

Read X and Y (example: X = 2, Y = 3)
X = 0?
  yes: Output Y
  no: subtract 1 from X, add 1 to Y, then test X = 0 again
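The flowchart for this slide can be written as a short loop. The sketch below is one possible rendering, assuming X is a non-negative integer (otherwise the test X = 0 would never succeed); the function name `add` is chosen for this example.

```python
def add(x, y):
    """Add x to y by moving one unit at a time from x to y.

    Assumption: x is a non-negative integer.
    """
    if x < 0:
        raise ValueError("x must be non-negative, or the loop never ends")
    while x != 0:  # the test "X = 0?"
        x -= 1     # subtract 1 from X
        y += 1     # add 1 to Y
    return y       # Output Y


print(add(2, 3))  # 5
```

Each pass through the loop moves one unit from X to Y, so when X reaches 0, Y holds the sum of the two original values.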


Computer Program

A set of instructions, written in a specific programming language, which a computer follows in processing data, performing an operation, or solving a logical problem.

Concrete
A program can implement an algorithm.
More than one program may implement the same algorithm.
Not all programs express good algorithms!


Instructions vs. Execution Steps

1. Read X

2. Read Y

3. X = X-1

4. Y = Y+1

5. If X = 0 then Print(Y) else goto 3

How many instructions?

How many execution steps?
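The two questions can be explored by simulating the program and counting. The listing has five instructions regardless of input, but the number of execution steps depends on X. This sketch assumes that each executed instruction counts as one step, including the two reads and the test in instruction 5, and that X is at least 1 (the listing decrements X before testing it). The function name `run` and the sample inputs are illustrative.

```python
def run(x, y):
    """Simulate the five-instruction program.

    Returns (final_x, final_y, execution_steps).
    Assumption: one step per executed instruction, and x >= 1.
    """
    if x < 1:
        raise ValueError("the loop only terminates for x >= 1")
    steps = 2           # instructions 1 and 2: read X, read Y
    while True:
        x -= 1          # instruction 3
        y += 1          # instruction 4
        steps += 3      # instructions 3 and 4, plus the test in 5
        if x == 0:      # instruction 5
            return x, y, steps


print(run(2, 3))  # (0, 5, 8)
```

Under this counting convention the program takes 2 + 3X execution steps, so the step count grows with the input while the instruction count stays at five.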


Algorithms and Linguistics

Does linguistic theory make sense without implementing the concepts?

Linguistic theory provides linguistic knowledge in the form of
  grammar rules
  theories about grammar rules

Putting knowledge to some use involves processing issues:
  parsing
  generation


Computational Linguistics – Issues

How are a grammar and a lexicon represented?
How is the structure of a given sentence actually discovered?
How can we actually generate a sentence to express a particular meaning?
How can linguistic theory be made concrete enough to test algorithmically?
Can an artificial system learn a language with limited exposure to grammatical sentences?


Non-computational theories can be misleading

Representational details omitted.
Computer memory requirements omitted.
Nature of individual steps may be unclear.
Difficult to test.
Potentially unimplementable.


Example of a Non-Computational Model


Computers and Language: Twin Goals

Scientific Goal: Contribute to Linguistics by adding a computational dimension.

Technological Goal: Develop machinery capable of handling human language that can support “language engineering”.


Computers and Language: Tools & Resources

Grammar Formalisms, e.g. Definite Clause Grammars
Parsing Algorithms: sentence → structure
Generation Algorithms: structure → sentence
Statistical Methods
Linguistic Corpora


Computers and Language: Applications

Information Retrieval/Extraction
Document Classification
Question Answering
Style and Spell Checking
Integrated Multimodal Tasks
Machine Translation