Human Translation - Machine Translation
Natural Language Processing (NLP) and Translation
Anca Christine Pascu, Université de Bretagne Occidentale, LabSTICC, Brest, France
A. P. Genova, May 2015
Outline
Cognition – Language – Translation
Natural Language Processing (NLP) and Translation
Modelling in Translation
Computational Logic
Logic and Translation
Computation and Translation
Concepts and Objects in Translation
The Text Structure
The Lattice Structure of a Text
Formal Concept Analysis and the Text Structure
Human Translation – Machine Translation
It is true that we can express the same meaning (thought) in different languages; but the psychological trappings, the clothing of the thought, will often be different. That is why learning foreign languages is useful for education in logic. We learn to better distinguish the verbal husk from the kernel to which it is organically linked in any language. This is how the differences between natural languages can facilitate our apprehension of what is logical.
G. Frege, Nachgelassene Schriften (Posthumous Writings), Hamburg, Meiner, 1969, quoted in Desclés, J.-P. (1998), « Les Langues sont-elles des représentations du monde », Essais sur le langage, logique, et sens commun, Editions universitaires, Fribourg. Dedicated to Evandro Agazzi.
Cognition – Language - Translation
Cognition: a set of processes related to knowledge:
attention, memory (psychology)
judgement, reasoning, « computation », problem solving, decision making (logic, computer science)
comprehension and production of language (linguistics, psychology)
[Diagram: Cognition at the centre, linked to attention and memory (Psychology); reasoning, judgement, computation, problem solving and decision making (Logic, CS); language comprehension and language production (Linguistics, Psychology)]
Some Questions about Language and Cognition
Are natural languages representations of the world?
Can each natural language project itself onto the external world?
Can each natural language construct its own cognitive representations?
Do natural languages refer to a universal system of mental representations?
Jean-Pierre Desclés, « Les Langues sont-elles des représentations du monde », Essais sur le langage, logique, et sens commun, Editions universitaires, Fribourg, 1998.
Three Epistemological Hypotheses
Relativistic hypothesis – Sapir-Whorf (Whorf, 1966);
Anti-relativistic hypothesis – Fodor (Fodor, 1975), Shaumyan (Shaumyan, 1977);
Anti-anti-relativistic hypothesis – Desclés (Desclés, 1998)
Linguistics - Logic
Natural Language – Language
Linguistics: Lexis, Morphology, Syntax, Semantics – Discourse – Text
Logic: Hypotheses, Inferences, Conclusions – Reasoning
Inferences: Deduction, Induction, Abduction
Meaning Item (Unit) – Translation Item (Unit) (Ballard, 2004)
Ordered Structure of a Text: Argumentative Structure, Descriptive Structure
NLP Fields via Linguistics
Lexical level
Error detection and correction
Automatic documentation, indexing, search engines
Morphological level
Morphological annotation
Syntactic level
Grammars and parsers
Semantic level
Automatic processing of meaning
Automatic text comprehension
Machine translation
NLP Fields via Applications
Automatic annotation of corpora
Morphological annotation
Semantic annotation
Text mining; indexing
Automatic summarizing
Text generation
Machine translation: automatic translation, computer-assisted translation
Definition
Natural Language Processing (NLP): a multidisciplinary field studying a set of:
Theories (linguistics, mathematics, logic....);
Methods (procedures, algorithms....);
Computer science systems (languages, procedures......)
for analysis and synthesis in natural languages, solving problems related to language and natural languages
Lexical Level: Word Processing
Spell checker
Lexical labeling: labeling words with linguistic labels
Concordancers: a concordancer is a computer program that finds all the occurrences of a word in a text, together with their contexts (http://ecolore.leeds.ac.uk/xml/materials/overview/tools/concordancer.xml?lang=fr)
Concordancers are used to build linguistic corpora
The form of the word: lemma, inflected form ......
Lemmatizers: lemma – inflected form
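The concordancer described on this slide can be illustrated with a minimal keyword-in-context sketch. The function name `concordance`, the `width` parameter and the sample sentence are illustrative assumptions, not part of any particular concordancing tool.

```python
import re

def concordance(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of `keyword`."""
    lines = []
    # Whole-word, case-insensitive search for the keyword.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

# Hypothetical sample text, used only for illustration.
sample = ("The officer of the watch keeps the watch at night. "
          "A second officer relieves the watch at dawn.")
for line in concordance(sample, "watch", width=20):
    print(line)
```

Each printed line shows one occurrence with a fixed window of left and right context, which is the basic display a concordancer offers when building or exploring a corpus.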
Syntactic Level: Grammars and Parsers
The techniques of analysis are almost the same as those used for formal languages.
Formal Grammar = a system of rules which, starting from a vocabulary, allows one:
to analyse a string
to generate a string
Formal Language = a set of words over a vocabulary (finite or infinite)
Word = a finite string obtained by concatenating elements of a vocabulary.
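The two uses of a formal grammar named above, generating strings and analysing a string, can be sketched on a toy grammar. The grammar below (vocabulary {a, b}, rules S → a S b and S → a b) is a hypothetical example chosen for brevity. The pruning step assumes that no rule shortens a string, which holds here.

```python
from collections import deque

# Hypothetical toy grammar: vocabulary {a, b}, one non-terminal S.
RULES = {"S": [("a", "S", "b"), ("a", "b")]}
START = "S"

def generate(max_len):
    """Generate every word of length <= max_len derivable from START."""
    words, queue, seen = set(), deque([(START,)]), {(START,)}
    while queue:
        form = queue.popleft()
        # Locate the leftmost non-terminal of the current form, if any.
        idx = next((i for i, s in enumerate(form) if s in RULES), None)
        if idx is None:
            words.add("".join(form))  # only vocabulary symbols left: a word
            continue
        for rhs in RULES[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            # No rule shortens a form, so over-long forms can be dropped.
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(words, key=lambda w: (len(w), w))

def analyse(word):
    """Decide whether `word` belongs to the language of the grammar."""
    return word in generate(len(word))

print(generate(6))
print(analyse("aabb"), analyse("abab"))
```

The language of this grammar contains one word for every n ≥ 1 (n letters a followed by n letters b), so it is an infinite set described by a finite set of rules.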
Grammars and Parsers: Types of Formal Grammars
Chomsky’s classification: L3 ⊂ L2 ⊂ L1 ⊂ L0;
Categorial Grammar (CG)
Lexical Functional Grammar (LFG)
Generalized Phrase Structure Grammar (GPSG)
Tree Adjoining Grammar (TAG)
Head-driven Phrase Structure Grammar (HPSG)
Dependency Grammar (DG)
Grammars and Parsers
The steps of a syntactic analysis:
Segmentation (tagger) ;
Lemmatisation (identifying words in their canonical form)
Labeling (identifying the morpho-syntactic category)
The Syntax – Semantics relation:
Surface Structure – Deep Structure
Typing Lexical Units (Categorial grammars).
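The three steps listed on this slide (segmentation, lemmatisation, labeling) can be sketched as a small pipeline. The mini-lexicon and the category names `PROPN`, `VERB` and `UNK` are hypothetical, invented for this illustration. A real system would rely on a full morphological dictionary or a trained tagger.

```python
import re

# Hypothetical mini-lexicon: inflected form -> (lemma, morpho-syntactic category).
LEXICON = {
    "jean": ("Jean", "PROPN"),
    "aime": ("aimer", "VERB"),
    "marie": ("Marie", "PROPN"),
}

def segment(sentence):
    """Segmentation: split the sentence into word tokens."""
    return re.findall(r"\w+", sentence)

def lemmatise(token):
    """Lemmatisation: map a token to its canonical form (itself if unknown)."""
    return LEXICON.get(token.lower(), (token, "UNK"))[0]

def label(token):
    """Labeling: map a token to its category ('UNK' if unknown)."""
    return LEXICON.get(token.lower(), (token, "UNK"))[1]

def analyse(sentence):
    """Run the three steps and return (token, lemma, category) triples."""
    return [(tok, lemmatise(tok), label(tok)) for tok in segment(sentence)]

print(analyse("Jean aime Marie"))
```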
Example of CG
Jean    aime       Marie
N       (S\N)/N    N
Types: N, S are basic types; (S\N)/N is a derived type
CG Rules
Right Application (>):
OPER : T1/T2    OP : T2
-----------------------
(OPER OP) : T1
Left Application (<):
OPER : T1\T2    OP : T2
-----------------------
(OPER OP) : T1
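The two application rules can be tried out on the example « Jean aime Marie ». The sketch below assumes the usual reading in which a type T1/T2 looks for its T2 operand on its right and a type T1\T2 looks for it on its left. The tuple encoding of types and the greedy reduction strategy are choices made for this illustration, not a full categorial parser.

```python
# A basic type is a string; a derived type is a tuple (slash, result, argument).
N, S = "N", "S"

def right(result, argument):
    """Build the type result/argument (operand expected on the right)."""
    return ("/", result, argument)

def left(result, argument):
    """Build the type result\\argument (operand expected on the left)."""
    return ("\\", result, argument)

# Lexicon taken from the example: aime has type (S\N)/N.
LEXICON = {"Jean": N, "Marie": N, "aime": right(left(S, N), N)}

def reduce_once(items):
    """Apply the first applicable rule to two adjacent (expression, type) items."""
    for i in range(len(items) - 1):
        (e1, t1), (e2, t2) = items[i], items[i + 1]
        # Right application: OPER : T1/T2 followed by OP : T2 gives (OPER OP) : T1.
        if isinstance(t1, tuple) and t1[0] == "/" and t1[2] == t2:
            return items[:i] + [((e1, e2), t1[1])] + items[i + 2:]
        # Left application: OP : T2 followed by OPER : T1\T2 gives (OPER OP) : T1.
        if isinstance(t2, tuple) and t2[0] == "\\" and t2[2] == t1:
            return items[:i] + [((e2, e1), t2[1])] + items[i + 2:]
    return None

def parse(sentence):
    """Reduce the typed words until one item remains; None if stuck."""
    items = [(word, LEXICON[word]) for word in sentence.split()]
    while len(items) > 1:
        items = reduce_once(items)
        if items is None:
            return None
    return items[0]

print(parse("Jean aime Marie"))
```

The result pairs an applicative expression, in which aime is first applied to Marie and the result is then applied to Jean, with the type S of a sentence.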
Computer Text Comprehension
Meaning problem: there are two main positions on the formalisation of meaning:
An independent linguistic level
The interdependence between the linguistic level and the level of mind (which implies a degree of dependence)
Computer Text Comprehension and Automatic Processing
Semantics:
Truth-functional (truth conditions);
Intensional (based on corresponding concepts);
Extensional (based on corresponding objects);
Componential (word decomposition into primitive units of meaning);
Procedural (an expression is a procedure containing a set of actions);
Argumentative (the chain of speech acts).
Computer Text Comprehension and Automatic Processing
Structural Approaches to the Text
Text Grammars (D. Rumelhart, 1975):
Story = Exposition + Theme + Plot + Resolution
Rhetorical Structure Theory (W. Mann, S. Thompson, 1987):
A text is a set of units linked by relations
Computer Text Comprehension and Automatic Processing
Text Thematic Analysis:
Analysis based on knowledge representation (semantic networks, concept maps);
Analysis using statistical tools.
Computer Text Comprehension and Automatic Processing
Concept maps: http://en.wikipedia.org/wiki/Concept_map
WordNet: http://wordnet.princeton.edu
Ontology = a network of objects and concepts linked by relations; it is specific to a domain
Computer Text Comprehension and Automatic Processing
Argumentative Structure of a Text: the text is organised in « argumentation units »
Hypothesis
Conclusion
Rules of inference
Elements outside the text
Semantic Annotation
Text Annotation: labeling the text according to a set of categories defined a priori.
Semantic Annotation: the categories are semantic classes (classes of meaning based on relations):
Causality
Definition
Utterance
Quotation
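Semantic annotation with categories defined a priori can be illustrated by a toy annotator that labels text segments using cue phrases. The cue lists, the sentence-based segmentation and the sample text are assumptions made for this sketch. It covers only three of the classes above and does not reproduce any specific annotation system.

```python
import re

# Hypothetical cue phrases per semantic class; a real annotator would rely
# on much richer linguistic markers than these few patterns.
CUES = {
    "Causality": [r"\bbecause\b", r"\btherefore\b", r"\bcauses?\b"],
    "Definition": [r"\bis defined as\b", r"\bmeans\b"],
    "Quotation": [r"«[^»]*»", r'"[^"]*"'],
}

def annotate(text):
    """Split the text into sentences and attach matching class labels."""
    segments = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    result = []
    for seg in segments:
        labels = [cls for cls, patterns in CUES.items()
                  if any(re.search(p, seg, re.IGNORECASE) for p in patterns)]
        result.append((seg, labels))
    return result

# Hypothetical sample text, used only for illustration.
sample = ("A translation unit is defined as an elementary unit of meaning. "
          "The text is hard because it is ambiguous. "
          "Frege wrote « the kernel ».")
for segment, labels in annotate(sample):
    print(labels, segment)
```

A segment can receive several labels or none. The annotated segments are the kind of data that can later serve as objects or attributes in a formal context.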
Translation unit
Translation Unit (TU) (Ballard, 2004): an elementary unit of meaning in the source language (Ls) which can be transferred into the target language (Lt).
Computer Science: the form of a source file after it has passed through the C preprocessor; in this case the output is deterministic and depends only on the input and the rules.
Translation: a pair (TUs, TUt) such that there is an « equivalence » between TUs and TUt. It depends on:
Concepts
Sentence, phrase, paragraph
Concepts, concept network, ontologies
Concept (C): a set of specific features (more primitive than the notion) (Int C);
The concept is expressed in a natural language by a word;
Some authors denote this pair by term (T). We consider it as a concept with its « language code » (the word).
C = (Int C, W).
Concepts, concept network, ontologies
A concept in a language depends on that language, i.e. on the cognitive representations in that language
Concepts are organised in networks
They do not all have the same status (position)
The network in one language differs from the network in another (Desclés, 2006)
Int C as a network (Desclés, Pascu, 2011):
Two intensions of the same concept
[Diagram relating the intensions Int s and Int c. English labels: officer of the watch, officer to watch, quarter. French labels: officier de quart, officier, quart, surveiller. Example sentence pair: « Il est logique d'interpréter cette assertion par ...... » / « It makes sense to interpret this statement by ...... »]
Computer Science: cloud computing – traitement des données hautement distribuées
Mathematics: rough set – ensemble approximatif (ensemble grossier)
[Diagram: E with Ext E, Int E and Fr E]
Examples
The Logic of Determination of Objects (LDO)
Concepts
Links between concepts – global network
Inheritance – comprehension relation
The Logic of Determination of Objects (LDO)
Objects
Links between objects – local network
Determination – relation between objects
[Diagram label: σ]
The Logic of Determination of Objects (LDO)
The link between objects and concepts
f--- f
ordered set - filter
ordered set - ideal
Formal Concept Analysis (FCA)
OBJ – the set of objects
ATT – the set of attributes
R – binary relation between OBJ and ATT
K = (OBJ, ATT, R) – formal context
O ⊆ OBJ: O↑ is the set of all attributes common to all objects in O
A ⊆ ATT: A↓ is the set of all objects that have all attributes in A
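The two derivation operators ↑ and ↓ can be written directly from their definitions. In this sketch the formal context is an assumption: its relation R is read off the attribute extents shown on the context-lattice slide of this deck (A1: o1, o2, o3, o4; A2: o1, o3; A3: o1, o3, o4). The function names `up` and `down` are illustrative.

```python
# Formal context K = (OBJ, ATT, R), with R given as (object, attribute) pairs.
# Assumption: R is reconstructed from the extents on the context-lattice slide.
OBJ = {"o1", "o2", "o3", "o4"}
ATT = {"A1", "A2", "A3"}
R = {
    ("o1", "A1"), ("o1", "A2"), ("o1", "A3"),
    ("o2", "A1"),
    ("o3", "A1"), ("o3", "A2"), ("o3", "A3"),
    ("o4", "A1"), ("o4", "A3"),
}

def up(objects):
    """O↑: the attributes common to all objects in `objects`."""
    return {a for a in ATT if all((o, a) in R for o in objects)}

def down(attributes):
    """A↓: the objects that have all attributes in `attributes`."""
    return {o for o in OBJ if all((o, a) in R for a in attributes)}

print(sorted(up({"o1", "o4"})))   # attributes shared by o1 and o4
print(sorted(down({"A2"})))       # objects having A2
```

With the empty set as input, `up` returns all of ATT and `down` returns all of OBJ, since the condition holds vacuously.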
Formal Concept
Formal Concept: (Ext, Int) such that:
Ext↑ = Int
Int↓ = Ext
Subconcept – superconcept: (A1, B1) ≤ (A2, B2) iff A1 ⊆ A2 (equivalently, B2 ⊆ B1)
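The definition of a formal concept suggests a simple, brute-force way of listing all concepts of a small context: close every set of attributes and keep the resulting (Ext, Int) pairs. The context below is an assumption, with R reconstructed from the attribute extents shown on the context-lattice slide. The enumeration is exponential in the number of attributes and is meant only as a sketch.

```python
from itertools import combinations

# Assumed context: R reconstructed from the extents on the context-lattice slide.
OBJ = {"o1", "o2", "o3", "o4"}
ATT = {"A1", "A2", "A3"}
R = {
    ("o1", "A1"), ("o1", "A2"), ("o1", "A3"),
    ("o2", "A1"),
    ("o3", "A1"), ("o3", "A2"), ("o3", "A3"),
    ("o4", "A1"), ("o4", "A3"),
}

def up(objects):
    """O↑: the attributes common to all objects in `objects`."""
    return {a for a in ATT if all((o, a) in R for o in objects)}

def down(attributes):
    """A↓: the objects that have all attributes in `attributes`."""
    return {o for o in OBJ if all((o, a) in R for a in attributes)}

def concepts():
    """Return all pairs (Ext, Int) with Ext↑ = Int and Int↓ = Ext."""
    found = set()
    for k in range(len(ATT) + 1):
        for attrs in combinations(sorted(ATT), k):
            ext = down(set(attrs))           # extent of the attribute set
            found.add((frozenset(ext), frozenset(up(ext))))  # closed intent
    return found

def is_subconcept(c1, c2):
    """(A1, B1) <= (A2, B2) iff A1 is included in A2."""
    return c1[0] <= c2[0]

for ext, intent in sorted(concepts(), key=lambda c: len(c[0])):
    print(sorted(ext), sorted(intent))
```

Ordering the printed pairs by extent size lists subconcepts before their superconcepts, which is the order relation of the concept lattice.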
Example Concepts
Formal context: (OBJ, ATT, R)
C1 = ({o1,o3}, {A1, A2})
C2 = ({o1,o3}, {A1, A2, A3})
C3 = ({o1,o4}, {A1, A3})
C4 = ({o1,o2,o3,o4}, {A1})
C5 = ({o1,o3}, {A2})
C6 = ({o1,o3,o4}, {A3})
Galois Lattice
Two ordered sets: (OBJ, ≤OBJ), (ATT, ≤ATT)
Two mappings: φ: OBJ → ATT, ψ: ATT → OBJ such that
If o1 ≤OBJ o2 then φ(o1) ≥ATT φ(o2)
If A1 ≤ATT A2 then ψ(A1) ≥OBJ ψ(A2)
o ≤OBJ ψ(φ(o)) and A ≤ATT φ(ψ(A))
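The three conditions of a Galois connection can be checked mechanically on a small formal context. This sketch assumes that φ is the derivation operator ↑ and ψ the operator ↓, both acting on sets of objects and sets of attributes ordered by inclusion. The context itself is an assumption, reconstructed from the attribute extents on the context-lattice slide.

```python
from itertools import combinations

# Assumed context: R reconstructed from the extents on the context-lattice slide.
OBJ = {"o1", "o2", "o3", "o4"}
ATT = {"A1", "A2", "A3"}
R = {
    ("o1", "A1"), ("o1", "A2"), ("o1", "A3"),
    ("o2", "A1"),
    ("o3", "A1"), ("o3", "A2"), ("o3", "A3"),
    ("o4", "A1"), ("o4", "A3"),
}

def up(objects):
    """phi: the attributes common to all objects in `objects`."""
    return {a for a in ATT if all((o, a) in R for o in objects)}

def down(attributes):
    """psi: the objects that have all attributes in `attributes`."""
    return {o for o in OBJ if all((o, a) in R for a in attributes)}

def subsets(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for k in range(len(items) + 1)
            for c in combinations(items, k)]

def is_galois_connection():
    """Check that both maps reverse inclusion and that both composites are extensive."""
    for o1 in subsets(OBJ):
        for o2 in subsets(OBJ):
            if o1 <= o2 and not up(o2) <= up(o1):
                return False
    for a1 in subsets(ATT):
        for a2 in subsets(ATT):
            if a1 <= a2 and not down(a2) <= down(a1):
                return False
    extensive_obj = all(o <= down(up(o)) for o in subsets(OBJ))
    extensive_att = all(a <= up(down(a)) for a in subsets(ATT))
    return extensive_obj and extensive_att

print(is_galois_connection())
```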
The Context Lattice
[Lattice diagram; nodes given as attributes: objects]
∅: o1, o2, o3, o4
A1: o1, o2, o3, o4
A2: o1, o3
A3: o1, o3, o4
A1,A2: o1, o3
A1,A3: o1, o3, o4
A2,A3: o1, o3
A1,A2,A3: o1, o3
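     P1 P2 P3 P4 P5 P6 P7 P8 P9
O1   1  1  1  .  .  .  .  .  .
O2   1  .  .  1  .  .  1  .  .
O3   1  .  .  .  .  .  .  .  .
O4   1  1  .  .  .  .  .  .  .
O5   .  1  .  .  .  .  .  .  .
O6   .  1  .  .  .  .  .  .  .
O7   .  1  1  .  .  .  .  .  .
O8   .  .  1  .  .  .  .  .  .
O9   .  .  1  1  .  .  1  .  .
O10  .  .  1  .  1  1  .  .  .
O11  .  .  1  1  .  .  .  .  .
O12  .  .  .  1  .  .  .  .  .
O13  .  .  .  .  1  .  .  .  .
O14  .  .  .  .  1  .  .  .  .
O15  .  .  .  .  .  1  .  .  .
O16  .  .  .  .  .  .  .  .  1
O17  .  .  .  1  .  .  1  .  1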
[Lattice diagram; nodes given as properties: objects]
∅
P1: 1, 2, 3, 4
P1P2: 1, 4
P2: 1, 4, 5, 6, 7
P3: 1, 7, 8, 9, 10, 11
P4: 2, 9, 11, 12, 17
P5: 10, 13, 14
P6: 10, 15
P7: 2, 9, 17
P8: ∅
P9: 16, 17
P1P3: 1
P1P4: 2
P2P3: 1, 7
P4P7: 2, 9, 17
P3P4: 9, 11
P1P2P3: 1
P1P4P7: 2
P3P4P7: 9
P4P7P9: 17
P1P2P3P4P5P6P7P8P9: ∅
P7P9: 17
............................
Interpretation
No differences between the two lattices
The idea of « the pursuit of happiness »
Applications of the FCA Model to Translation
Objects            Attributes         Independent/Together
Semantic classes   Segments of text   Independent
Segments of text   Semantic classes   Independent
Segments of text   Semantic classes   Independent
Segments of text   Semantic classes   Together
Conclusions about FCA
It gives the lattice structure of a text, depending on the choice of objects and attributes
The lattice structure can be used to model the translation unit and to implement it in a translation engine
The choice of objects: semantic classes, style elements
The choice of attributes: segments of text; type of segmentation
Apply the FCA model in an appropriate manner to a corpus of texts
Translation Engine Types
Rule-Based – Grammars
Learning-Model-Based – Statistics
Discussion
Modelling
Define: Translation Unit – Meaning Unit and their Computer Model
Transfer Rules based on these primitives
Linguistic Architecture versus Computer Architecture – to give a degree of unification
Architecture
Translation Systems containing:
Semantic Annotator
Key Word Searcher
Domain Ontology of Source Language – Target Language
Appropriate Tools for Translation Data Mining
References
BALLARD M. (2004), « La théorisation comme structuration de l’action du traducteur », La Linguistique, n. 40, Linguistique et traductologie, 2004/1, pp. 51-65. http://www.cairn.info/revue-la-linguistique-2004-1-page-51.htm
BAKER M. (1992), In Other Words: A Coursebook on Translation, London/New York, Routledge.
CURRY H. B., FEYS R. (1958), Combinatory Logic, vol. 1, North Holland.
DESCLES J.-P. (2003), « La grammaire Applicative et Cognitive construit-elle des représentations universelles ? », http://linx.revues.org/226
DESCLES J.-P. (1998), « Les Langues sont-elles des représentations du monde », Essais sur le langage, logique, et sens commun, Editions universitaires, Fribourg.
ENGLAND R., HANSON S. (2008), « Technical Translation and a Role for FCA », International Conference on Advanced Language Processing and Web Information Technology, IEEE, pp. 99-103.
FODOR J. A. (1975), The Language of Thought, Harvard University Press, Cambridge Mass.
GANTER B., STUMME G., WILLE R. (2005), Formal Concept Analysis, Foundations and Applications, Springer.
PASCU A., DESCLES J.-P. (2005), « Modélisation sémantique et logique de la catégorisation », LALICC, Paris-Sorbonne, http://lalic.paris-sorbonne.fr/AXESRECHERCHE/operation5.html
SHAUMYAN S. (1977), Applicational Grammar as a Semantic Theory of Natural Language, Chicago University Press.
WHORF B. L. (1966), Linguistique et anthropologie, Payot, Paris (Language Thought and Reality, Wiley and Sons, New York, 1958).
FCA home page: http://www.fcahome.org.uk/fca.html
Concept Explorer (CONEXP): http://sourceforge.net/projects/conexp/
Fred Sommers, The Logic of Natural Languages, Oxford University Press, 1984
« There is as much truth in beauty as is beauty in truth. »