
Computer Science Translators


Version 1.0 © Abelardo López Lagunas

TC2006 Programming Languages

Abelardo López Lagunas, Ph. D. School of Engineering and Architecture

Computer Science department ITESM Campus Toluca

[email protected]


Outline

• Course objectives
• Grading and policy of the course
• Tentative schedule
• Textbook and other references
• Introduction
  • Why so many programming languages?
  • Programming paradigms
  • Programming language specification
• Abstraction, grammars & programming languages


Course Objectives

• Present the fundamental concepts
  • Programming language classification
  • Programming paradigms
• How to describe programming languages
  • The basic translation process
  • Lexical and syntactic analysis
• Present the basics of functional languages
  • Motivation and the use of Scheme/LISP
• Present the basic concepts in concurrency
  • Motivation and possibly cover a parallel language
• Present other programming paradigms
  • Logic languages, script languages, and domain specific languages


Course Policy

• Class starts at 7:35 and ends at 8:55 on Mondays and Thursdays.
• Partial grades are reported using only the integer part of the number. The final grade is calculated using floating-point numbers with four digits of precision. Grades with a fractional part greater than or equal to 0.75 will be rounded up to the next integer; otherwise the fractional part will be truncated.
• The deadline for the assignments is 16:59 on the specified date. If you decide to submit your assignment electronically, the deadline is 22:59 (Campus Server time).
• There are four programming assignments, out of which I will select the three with the highest grades.
• No project reports will be accepted after the due date and time specified above. The projects should include the following:
  • Student ID and name of each team member, written on the cover page or in the upper right corner of the first sheet of paper. Also include the submission date.
  • Description of the project, including the design methodology, all data structures, and all function or procedure definitions, plus results and conclusions.
• Make-up exams will be given only with the corresponding medical or legal documentation and with the authorization of the dean of your department.
• Cheating will result in a grade of 10/100 on the exam or assignment in which it occurred. A written report will be sent to both your department dean and the registrar so that it is attached to your file.


Course Grading

• Exams:
  • Duration of 85 minutes.
  • No “cheat sheets”, textbooks, or articles are allowed.
• There is no project; instead there are four assignments that represent 20% of the grade.
  • Select your software environment by next week (Cygwin, Co-Linux, Linux, OS X). You can use Windows tools, but use this opportunity to learn something new.
  • As an incentive I’ll provide the tools only for non-Windows platforms.

All dates are tentative

Element          Assigned on   Returned by   Weight   Description
1st              11-Sep-14     11-Sep-14     25%      Topic 1 and 2.2
2nd              27-Oct-14     27-Oct-14     25%      Topic 2.3 through 3
Final exam       TBD           TBD           20%      Topics 1 through 5.2
Assignment # 1   28-Aug-14     04-Sep-14     5%       Lexical specification *
Assignment # 2   08-Sep-14     15-Sep-14     10%      Syntax specification *
Assignment # 3   02-Oct-14     16-Oct-14     5%       Programming assignment *
Assignment # 4   30-Oct-14     20-Nov-14     10%      Programming assignment *


Grade History

• Students who have taken this course have told me it was of above-average complexity.
• To give you an idea of the complexity of the course, the figure on the right is the grade histogram of past semesters.
• The historic average is 81.6 for a population of 53 students.

[Figure: grade histogram with bins X<70, 70<X<80, 80<X<90, 90<X<95, X>95]


Assignment Description

• There are four assignments:
  • Lexical analyzer and symbol table management.
  • Syntactic analyzer with simple error recovery.
  • A programming assignment using Scheme/LISP.
  • A programming assignment on concurrent programming.
    • Can use C threads or a parallel language, such as Erlang

• All assignments must follow the guidelines set in the course policy


Office Hours

[Figure: weekly timetable, Monday through Friday, in half-hour slots from 7:30 to 18:30. Recoverable entries:]
• 7:30: Programming Languages (127); Adv. Digital Systems (xxx)
• 9:00: Software Engineering (127); Analysis & Modeling SW (311); OS Lab (127)
• 10:30: Operating Systems (311)
• Legend: Office hours; Class preparation


Office Location

Concepción Muciño

Office block at Aulas I Second Floor

My office


References

• Text:
  • Robert W. Sebesta, “Concepts of Programming Languages,” 9th edition, Addison Wesley.
• Auxiliary texts:
  • Michael L. Scott, “Programming Language Pragmatics,” Morgan Kaufmann, 2nd edition.
  • Harold Abelson, Gerald J. Sussman, “Structure and Interpretation of Computer Programs,” on-line version (http://mitpress.mit.edu/sicp/full-text/book/book.html)
  • Tom Niemann, “A Compact Guide to LEX & YACC,” epaperpress.com (can be downloaded from the web)


Concepts of Programming Languages

• Why study the basic concepts behind programming languages?
  • Increased ability to express ideas
  • Improved background for choosing appropriate languages (choose the right language for the task)
  • Increased ability to learn new languages
  • Better understanding of significance of implementation
  • Better use of languages that are already known
  • Learn new paradigms and try to use them in other languages (if feasible).
  • Overall advancement of computing


Why so many languages?

Source: http://www.digibarn.com/collections/posters/tongues/ComputerLanguagesChart-med.png


Why so many languages?

• The goal of any programming language is to capture the programmer’s intent.
• In the beginning, computer time was more important than programmers’ time, so the language was closer to the actual hardware implementation.
  • Very difficult to do, even for small programs
  • Little portability across different hardware devices
• As computing power increased and hardware costs decreased, tools were devised to translate a higher-level language into the instructions actually executed by the hardware.
• However, the programmer’s intent can be captured in many ways


Language design

• There are several factors that enable the proliferation of computer languages:
  • Evolution of computer science: still growing
  • Special purpose tasks: some languages solve problems more efficiently in different domains.
  • Personal preference: people think differently and thus can express themselves better in different languages.
  • Ease of use: how difficult is it to learn?
  • Ease of implementation: some languages are harder to map into hardware platforms.
  • Expressive power: some languages are better than others at abstracting details.
  • Good translators: how efficient is the translator?


Language Evaluation Criteria

• Readability: the ease with which programs can be read and understood

• Writability: the ease with which a language can be used to create programs

• Reliability: conformance to specifications (i.e., performs to its specifications)

• Cost: the ultimate total cost

Source: Robert W. Sebesta. Copyright © 2009 Addison-Wesley. All rights reserved.

Language Evaluation Criteria

• Readability
  • Simplicity: manageable set of features and constructs with minimal feature multiplicity and minimal operator overloading.
  • Orthogonality: relatively small set of primitive constructs that can be combined in a relatively small number of ways. Every possible combination is legal
  • Useful data types and expressive syntax
• Writability
  • Abstraction: ability to define and use complex structures or operations in ways that allow details to be ignored
  • Expressivity: convenient ways of specifying operations

Source: Robert W. Sebesta. Copyright © 2009 Addison-Wesley. All rights reserved.

Evaluation Criteria: Reliability

• Type checking
  • Testing for type errors
• Exception handling
  • Intercept run-time errors and take corrective measures
• Aliasing
  • Presence of two or more distinct referencing methods for the same memory location
• Readability and writability
  • A language that does not support “natural” ways of expressing an algorithm will require the use of “unnatural” approaches, and hence reduced reliability

Source: Robert W. Sebesta. Copyright © 2009 Addison-Wesley. All rights reserved.

Evaluation Criteria: Cost

• Training programmers to use the language
• Writing programs (closeness to particular applications)
• Compiling programs
• Executing programs
• Language implementation system: availability of free compilers
• Reliability: poor reliability leads to high costs
• Maintaining programs

Source: Robert W. Sebesta. Copyright © 2009 Addison-Wesley. All rights reserved.

Evaluation Criteria: Others

• Portability
  • The ease with which programs can be moved from one implementation to another
• Generality
  • The applicability to a wide range of applications
• Well-definedness
  • The completeness and precision of the language’s official definition


Source: Robert W. Sebesta. Copyright © 2009 Addison-Wesley. All rights reserved.

The Four Rs of Programming Language Design

Dominic Orchard
Computer Laboratory, University of Cambridge, UK

[email protected]

Categories and Subject Descriptors D.1.0 [Software]: Programming Techniques—General; I.0 [Computing Methodologies]: GENERAL

General Terms Design, Languages

Keywords Programming language design, The Four Rs, Domain-specific languages

“I can learn the poor things reading, writing, and ’rithmetic, and counting as far as the rule of three, which is just as much as the likes of them require;” Lawrie Todd: Or the Settlers in the Woods, Galt (1832) [4].

Many will be familiar with the old adage that at the core of any child’s education should be the three Rs: reading, writing, and ’rithmetic. The phrase, which appeared first in print in 1825 [12] has been appropriated and parodied at length (“read, reason, recite”, “reduce, reuse, recycle”, etc.). Each permutation has the same purpose: to express succinctly the core tenets of an approach or philosophy.

The four Rs of programming language design is another such parody of this old phrase, providing a rubric, or framework, for the design and evaluation of effective programming languages and language features.

Since the very first programming language back in the 1940s [14] thousands of programming languages have been developed, representing a broad spectrum of paradigms, perspectives, and philosophies. And yet, there is no single language which is “all things to all men” (and women!).

The four Rs were born out of trying to answer a number of questions about the nature of programming languages and programming language design: what makes a programming language effective or ineffective? What should be the core aims of a language designer? How should programming languages and features be compared? Why is there no single “perfect” language? The four Rs go some way towards answering these questions.

Before I reveal the four Rs, let’s first consider some more foundational questions:

Why programming languages? The development of programming languages has greatly aided software engineering. As hardware and software have grown increasingly complex, programming languages have developed to manage this complexity more effectively, aiding us in expressing ideas and solving increasingly complex problems.

Programming languages provide abstraction, by both hiding details and allowing components to be reused, allowing programmers to more effectively manage complexity in software and hardware. While it is in principle possible for any program to be written in machine code, it’s hard to imagine some of the larger computer programs we interact with daily being developed in such a way. By building layers of abstraction with languages, increasingly complex systems can be constructed.

What is programming? In essence, programming is a communication process between one or more programmers and one or more computer systems. Programming languages are the medium of this communication.

Programming is not only a communication process, it is also a translation process. Each participant in the programming process has an internal language, both programmers and machines. In the case of a machine, the internal language comprises the instructions of the underlying hardware. In the case of a programmer, the internal language is far more nebulous, perhaps comprising natural and formal languages, along with other incorporeal, abstract thoughts.

In any case, a programming language acts as the intermediate language of translation between the participants. Programming is the translation from a programmer’s internal language to a programming language, and execution is the translation from the programming language to the machine’s internal language. McCracken, in 1957, captured some of this sentiment, saying “Programming [...] is basically a process of translating from the language convenient to human beings to the language convenient to the computer” where the convenient language for humans was “mathematics or English statements of decisions to be made” [8]. Here we consider the “language convenient to human beings” to be programming languages, bridging the gap between our ideas and the underlying, low-level instructions of a computer system.

Sometimes, programming is more exploration than communication. In which case, a programmer explores and learns about a problem by translating their internal thoughts into a program and the re-internalising the result to gain further insight. Again the process is a translational.

It is from this view of programming, as a translation, communication, and exploration, that the four Rs are sculpted.

A programming language should improve the four Rs of programs: reading, writing, running, and reasoning.

These four tenets are both guidelines for language design and research, and criteria for judging a language. They are by no means mutually exclusive, independent, or orthogonal, but are all interrelated. They are also not designed to subsume or replace the

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Onward! 2011, October 22–27, 2011, Portland, Oregon, USA. Copyright © 2011 ACM 978-1-4503-0941-7/11/10. . . $10.00


Additional Readings

• Tom Jepsen “How Programming Languages Evolve,” IT Pro November ❘ December 1999.

68 IT Pro November ❘ December 1999 1520-9202/99/$10.00 © 1999 IEEE

How Programming Languages Evolve
Tom Jepsen

Learning what forces drive programming-language evolution can help you pick one for your project.

Once upon a time, life was simple. Large monolithic computers, usually painted blue, ran single-threaded batch programs under the watchful eyes of an operator and a system programmer. A card reader served as the input device, a tape drive provided storage, and a line printer processed output. Programmers wrote business applications in Cobol, scientific applications in Fortran. In either case, they worked out the program logic on paper first, used a keypunch to produce punched cards, then ran the resulting deck through the card reader. After a few debugging sessions, they received their computed results on sheets of fanfold paper. If a program required documentation, programmers produced it on a manual typewriter.

Today, a few lumbering programming languages cling to life while hordes of newer types struggle to replace them. These new species include a confusing swirl of languages that are

• compiled,
• interpreted,
• Web based,
• scripting and modeling capable,
• object oriented,
• graphically based,
• text-processing based, or
• founded on artificial intelligence routines.

In addition, all kinds of specialized languages address the challenges of developing specific applications. To an outside observer, this proliferation might seem strange. Why haven’t computer scientists and IT professionals (people generally scientific and rational almost to a fault) focused on creating a few robust and adaptable problem-solving languages that could succeed in any computing environment?

ECOLOGICAL NICHES

Quite simply, the rapid spread of computerization into all phases of modern life, and the diversity of application and problem domains this trend has created, require a wide range of problem-solving tools.

Programming languages have evolved to provide such tools and, like evolution in nature, have generated numerous mutations in the process, some successful and some not. A few forces drive this evolutionary process; understanding them and which languages evolved as an answer to these forces may help you select a language for your development project.

Send in the clones

Much of software programming consists of reinventing the wheel with a slightly different color scheme, turning radius, and spoke orientation. This realization launched programmers on a quest to develop techniques for cloning and recycling code used on previous projects. Structured programming led to code modularization, which in time led to object-oriented programming and component-based development. The quest for reusable software also contributed to the development of architecture- and platform-independent languages.

The incredible shrinking computer

With each succeeding generation, computers have housed the same processing power in smaller packages: First, mainframes shrank to minicomputers, then minicomputers dwindled to desktops. Now ever smaller computers find their way into handheld personal digital assistants, clothing, and jewelry. Computers have become ubiquitous. Almost all technological artifacts now contain a stored program of some sort. Such small computers demand economical languages that leave a comparably small footprint.

Have code, will travel

Manufacturers of mainframes and even early PCs designed their products for standalone

• Dominic Orchard “The Four Rs of Programming Language Design,” Onward! 2011, October 22–27, 2011, Portland, Oregon, USA. © 2011 ACM.


Language classification

• Languages can be grouped in two families
  • Declarative languages: what the computer is to do
    • Functional (LISP/Scheme, ML, Haskell)
    • Dataflow (Id, Val)
    • Logic or constraint-based (Prolog, spreadsheets)
    • Template-based (XSLT)
  • Imperative languages: how the computer should do it
    • Sequential or von Neumann (C, Ada, Fortran)
    • Scripting (Perl, Python, PHP)
    • Object-oriented (Smalltalk, Eiffel, C++, Java)
• Note that there are concurrent extensions of some of the above languages.
  • However, some languages are naturally concurrent, such as those based on the dataflow model.
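
The "what" versus "how" distinction can be illustrated inside a single language. The sketch below is only an illustration in Python (which the slide lists as a scripting language); both function names are hypothetical. The first version spells out how to accumulate the result step by step, while the second states what the result is in a functional style.

```python
def sum_even_squares_imperative(values):
    # Imperative style: explicit state (total) updated step by step.
    total = 0
    for v in values:
        if v % 2 == 0:
            total += v * v
    return total


def sum_even_squares_declarative(values):
    # Declarative (functional) style: describe the result, not the steps.
    return sum(v * v for v in values if v % 2 == 0)
```

Both functions compute the same value; the difference is only in how the intent is expressed.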


Influence of Computer Engineering

von Neumann model

initialize the program counter
repeat forever
    fetch the instruction pointed to by the program counter (PC)
    increment the PC
    decode the instruction
    execute the instruction
end repeat
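
The fetch-decode-execute cycle above can be sketched as a small simulator. This is a minimal illustration, not a real instruction set: the accumulator machine, its four opcodes (LOAD, ADD, JMP, HALT) and the `max_steps` safety bound are all assumptions made for the example.

```python
def run(program, max_steps=1000):
    """Simulate a hypothetical accumulator machine.

    `program` is a list of (opcode, operand) pairs stored in "memory".
    """
    pc = 0   # initialize the program counter
    acc = 0  # single accumulator register (assumed for this sketch)
    for _ in range(max_steps):          # "repeat forever", bounded for safety
        opcode, operand = program[pc]   # fetch the instruction pointed to by PC
        pc += 1                         # increment the PC
        if opcode == "LOAD":            # decode and execute
            acc = operand
        elif opcode == "ADD":
            acc += operand
        elif opcode == "JMP":
            pc = operand
        elif opcode == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    raise RuntimeError("step limit reached without HALT")


result = run([("LOAD", 2), ("ADD", 3), ("HALT", 0)])
```

Note that every instruction is fetched from memory before it runs, which is the traffic that the bottleneck discussed on the next slide refers to.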


von Neumann Bottleneck

• Connection speed between a computer’s memory and its processor determines the speed of a computer
  • Program instructions often can be executed much faster than the speed of the connection; the connection speed thus results in a bottleneck
• Known as the von Neumann bottleneck; it is the primary limiting factor in the speed of computers


Motivation: Why do we need Translators?

• What is a translator (a.k.a. compiler)?
  • A program that transforms a description written in a source language (usually a high-level language) into a destination or object language (usually machine language)
  • The name “compiler” was used early on (~1950) because the translation process was seen as the compilation of subroutines selected from a library.
• A translator automates the process of mapping the high-level language into machine instructions, allowing the programmer to create higher levels of abstraction.
  • Machine Language -> Assembly -> Low Level Language -> High Level Language -> Application Programming -> ...
• Translators may optimize the mapping into machine instructions
  • Smaller code size, faster execution times, lower memory usage, etc.
• Translators may take care of resource management
  • Automatic memory management (e.g., garbage collection)


Motivation: Why do we need Translators? (2)

High level of abstraction
• Complex manipulation
• Automatic/implied resource management

Low level of abstraction
• Optimized machine instructions
• Explicit resource management
• Data path management (VLIW)

[Figure: the Translator sits between the high and low levels of abstraction]


On Abstraction

‘The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise’
- Edsger Dijkstra


Translator Types

• Translators can be classified depending on their function:
  • Compilers: translate from a high-level language to machine language
  • Assemblers: translate from assembly language to machine language
  • Interpreters: the translation process does not produce machine language. The high-level language is executed as it is translated. This will be the emphasis of this course.
  • Optimizers: translators that optimize the object language (size, speed, etc.)
  • One pass or multiple passes
    • One pass: the object code generation is done at the same time as the source language is analyzed.
    • Multiple passes: the object code generation is done as the last stage of translation.
  • Syntax checkers: analyze the source language for syntax and semantic errors or ambiguities (e.g., Lint)
  • Parallel or scalar


Translation Process (1)

• Lexical Analysis: takes the source language and extracts its fundamental parts, or tokens, such as identifiers, constants, and reserved words.
• Syntactic Analysis: takes the list of tokens and creates syntactic trees for grammatical structures such as expressions, statements, declarations, etc. It also verifies the syntactic correctness of the source.
• Semantic Analysis: checks the syntactic trees, verifies their structure, and determines their meaning. It generates an intermediate representation based on a predefined set of semantic actions.
• Intermediate Representation (IR): also known as intermediate code, it represents the semantic actions in a format amenable to systematic manipulation (mainly optimization).

[Figure: translation pipeline from Source Code to Object Code, with stages Lexical Analysis, Syntax Analysis, Semantic Analysis, IR, IR Optimization, Instruction Selection, Code Optimization, Assembly, and Linking; Interpretation is shown as a separate path]


Translation Process (2)

• IR Optimization: performs manipulations that reduce the IR representation by removing unnecessary code and references.
• Instruction Selection: translates IR code into object code using instruction templates. May also include register assignment.
• Object Code Optimization: manipulates the object code to minimize size, execution time, memory usage, memory references, etc.
• Assembly: translates the object code into machine language. May perform further manipulations.
• Linking: generates executable code by including runtime support routines.
• Each of the above stages may generate error messages. Error generation and reporting is very important because it is the main source of feedback to the user.


Translation: Lexical

• Identify tokens of the language
  • Numbers
  • Identifiers
  • Operators
  • Reserved words
  • Strings
• Find and report errors such as
  • Malformation of tokens
  • Unknown symbols
• Assign values to the tokens
  • See values in parentheses in the example.

Source: Z = -1.8 * (X / 2)
Tokens: Id(Z) Eq_op Num(-1.8) Mul_op Lpar_op Id(X) Div_op Num(2) Rpar_op

Source: 72..18 / va#l
Tokens: Error bad number, Num(18), Div_op, Error bad Id
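
A lexical analyzer of this kind can be sketched with regular expressions. The code below is a minimal illustration and not the course's implementation: the token names follow the slide, but the patterns, the `tokenize` function, and the choice to fold a leading minus sign into the number (as the slide's Num(-1.8) suggests) are assumptions. Unknown characters are reported as Error tokens instead of being classified as "bad number" or "bad Id".

```python
import re

# (token name, pattern) pairs; order matters because earlier patterns win.
TOKEN_SPEC = [
    ("Num",     r"-?\d+(?:\.\d+)?"),       # assumption: sign is part of the number
    ("Id",      r"[A-Za-z][A-Za-z0-9]*"),
    ("Eq_op",   r"="),
    ("Mul_op",  r"\*"),
    ("Div_op",  r"/"),
    ("Lpar_op", r"\("),
    ("Rpar_op", r"\)"),
    ("Skip",    r"\s+"),
    ("Error",   r"."),                     # any other single character
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (kind, value) pairs for the source string."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "Skip":
            continue                       # whitespace carries no token
        if kind == "Num":
            tokens.append((kind, float(text)))
        elif kind in ("Id", "Error"):
            tokens.append((kind, text))
        else:
            tokens.append((kind, None))    # operators carry no value
    return tokens


tokens = tokenize("Z = -1.8 * (X / 2)")
```

Folding the sign into Num keeps the sketch close to the slide, but it would misread a binary minus such as `a-1`; a fuller lexer would emit a separate minus token and let the parser decide.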


Translation: Syntax\Semantics

• Syntax Analyzer builds a syntactic or parse tree
  • Enables checks for malformed expressions, statements, declarations, etc.
• Semantic Analyzer uses the parse tree to:
  • Check declarations, types, possible promotion, uninitialized variables, etc.
  • Annotate the parse tree.

Source: Z = -1.8 * (X / 2)

Lexical Analysis: Id(Z) = Num(-1.8) Op(*) Op(() Id(X) Op(/) Num(2) Op())

Syntax Analysis (parse tree):
  Assignment -> id(Z) Op_eq(=) Expression
  Expression -> Num(-1.8) Op(*) ( Expression )
  Expression -> id(X) Op(/) Num(2)

Semantic Analysis: Check_Symbol(Z), Check_Symbol(X), Promote(2)
  Z: declared as Float, get reference of Z
  X: declared as Float, get value or reference of X
  2: promote to float
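
The two phases can be sketched as a small recursive-descent parser followed by an annotation pass. This is an illustrative sketch under several assumptions: the tuple-based tree shape, the function names `parse_assignment` and `annotate`, the tiny grammar (assignment, `*`, `/`, parentheses), and the symbol table that declares X and Z as float are all hypothetical. Tokens are given directly as (kind, value) pairs, with the literal 2 kept as an integer so that promotion is visible.

```python
def parse_assignment(tokens):
    """Parse `Id = expr`, where expr uses *, / and parentheses."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, found {peek()}")
        value = tokens[pos][1]
        pos += 1
        return value

    def term():
        if peek() == "Num":
            return ("num", expect("Num"))
        if peek() == "Id":
            return ("id", expect("Id"))
        expect("Lpar_op")
        node = expr()
        expect("Rpar_op")
        return node

    def expr():
        node = term()
        while peek() in ("Mul_op", "Div_op"):
            op = "*" if peek() == "Mul_op" else "/"
            expect(peek())
            node = ("binop", op, node, term())
        return node

    target = expect("Id")
    expect("Eq_op")
    value = expr()
    if peek() is not None:
        raise SyntaxError("unexpected trailing tokens")
    return ("assign", target, value)


def annotate(node, symbols):
    """Check declarations and insert ("promote", ...) nodes; returns (tree, type)."""
    kind = node[0]
    if kind == "num":
        return node, "int" if isinstance(node[1], int) else "float"
    if kind == "id":
        if node[1] not in symbols:
            raise NameError(f"undeclared identifier {node[1]}")
        return node, symbols[node[1]]
    if kind == "binop":
        _, op, left, right = node
        left, ltype = annotate(left, symbols)
        right, rtype = annotate(right, symbols)
        if ltype != rtype:                 # mixed int/float: promote the int side
            if ltype == "int":
                left = ("promote", left)
            else:
                right = ("promote", right)
            return ("binop", op, left, right), "float"
        return ("binop", op, left, right), ltype
    _, target, value = node                # kind == "assign"
    if target not in symbols:
        raise NameError(f"undeclared identifier {target}")
    value, vtype = annotate(value, symbols)
    if symbols[target] == "float" and vtype == "int":
        value = ("promote", value)
    return ("assign", target, value), symbols[target]


TOKENS = [("Id", "Z"), ("Eq_op", None), ("Num", -1.8), ("Mul_op", None),
          ("Lpar_op", None), ("Id", "X"), ("Div_op", None), ("Num", 2),
          ("Rpar_op", None)]
tree = parse_assignment(TOKENS)
annotated, result_type = annotate(tree, {"Z": "float", "X": "float"})
```

A malformed token list raises SyntaxError and an undeclared name raises NameError, which stand in for the error reporting that the slide assigns to these stages.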


Translation: IR\Optimization

• Translates the annotated parse tree into a linear notation
  • Easier to manipulate by the optimizer
• IR Optimization removes unnecessary code by transforming constants, resolving references, algebraic manipulations, etc.

[Figure: annotated parse tree for the assignment, with Z.location, X.value and Promote(2)]

Intermediate Representation:
  Temp1 = int_to_float(2)
  Temp2 = Value(X) / Temp1
  Temp3 = -1.8 * Temp2
  Location(Z) = Temp3

After IR Optimization:
  Temp1 = Value(X) / 2.0
  Location(Z) = -1.8 * Temp1
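
The IR optimization step can be sketched on a list of three-address instructions. This is a simplified illustration, not the optimizer used in the course: the tuple format `(dest, op, arg1, arg2)`, the `copy` opcode, and the `optimize` function are assumptions. It folds `int_to_float` on constants and merges a trailing copy into the instruction that produced the copied temporary; it assumes that temporary is not used again, and it does not renumber temporaries.

```python
def optimize(ir):
    """Apply constant folding and copy elimination to three-address code."""
    consts = {}   # temporary name -> constant value known at compile time
    out = []
    for dest, op, a, b in ir:
        a = consts.get(a, a)              # propagate known constants
        b = consts.get(b, b)
        if op == "int_to_float" and isinstance(a, int):
            consts[dest] = float(a)       # fold the conversion, emit nothing
            continue
        if op == "copy" and out and out[-1][0] == a:
            # The copied temporary was defined by the previous instruction:
            # write that result straight into the destination.
            _, prev_op, prev_a, prev_b = out.pop()
            out.append((dest, prev_op, prev_a, prev_b))
            continue
        out.append((dest, op, a, b))
    return out


IR = [
    ("Temp1", "int_to_float", 2, None),
    ("Temp2", "/", "Value(X)", "Temp1"),
    ("Temp3", "*", -1.8, "Temp2"),
    ("Location(Z)", "copy", "Temp3", None),
]
optimized = optimize(IR)
```

The result has the same shape as the optimized IR on the slide (one division by 2.0 and one multiplication stored into Location(Z)), except that the surviving temporary keeps its original name Temp2.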


Translation: Back End

Optimized IR:
  Temp1 = Value(X) / 2.0
  Location(Z) = -1.8 * Temp1

Instruction Selection:
  Ld RA, [X.location]
  Div RB, RA, 2.0
  Mul RC, RB, -1.8
  St [Z.location], RC

Register Allocation:
  Ld R0, [X.location]
  Div R1, R0, 2.0
  Mul R2, R1, -1.8
  St [Z.location], R2

Code Optimization (Reclaim Registers):
  Ld R0, [X.location]
  Div R1, R0, 2.0
  Mul R0, R1, -1.8
  St [Z.location], R0

Code Optimization (Remove References):
  Ld R0, X.Value
  Div R1, R0, 2.0
  Mul R0, R1, -1.8

Assembly:
  FFED FEC5 FBA0

Linker:
  Runtime Code (load)
  FFED FEC5 FBA0
  Runtime Code (end)


Compiler hints

‘If you lie to the compiler, it will get its revenge.’
-- Henry Spencer


Translator Design & Formal Languages

• Translators are designed in a modular fashion
  • Enables support for different source languages on the same platform by only changing the analysis modules (“front-end”).
  • Enables support of several platforms for the same source language by only changing the synthesis modules (“back-end”).
  • Can explore different optimization algorithms without disrupting the rest of the modules.
• Translators use formal languages to describe the analysis of the source language:
  • Regular Expressions for Lexical Analysis
  • Context free grammars for Syntactic Analysis
  • Attributed Grammars for Semantic Analysis


Formal Language Concepts (2)

• For example, if L = {A, B, ..., Z, a, b, ..., z} and D = {0, 1, ..., 9} then
  • L U D is the set of letters and digits
  • LD is the set of strings consisting of a letter followed by a digit
  • L^4 is the set of strings of four letters
  • L* is the set of all strings of letters, including ε
  • L(L U D)* is the set of all strings of letters and digits that start with a letter; ε is not included
  • D+ is the set of all strings of one or more digits.
• As can be seen, this formalism helps manipulate an infinite number of string combinations with a small set of symbols.
  • But not all the combinations are meaningful
  • Need a way to specify strings with patterns of interest. Furthermore, we need to formalize how to generate strings.
  • Introduce the concept of grammars
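
These set operations map directly onto regular-expression syntax. The sketch below uses Python's `re` module to test membership in each of the example languages; the dictionary keys and the helper name `in_language` are hypothetical labels chosen for the illustration.

```python
import re

L = "[A-Za-z]"   # the set of letters
D = "[0-9]"      # the set of digits

PATTERNS = {
    "L U D":     f"(?:{L}|{D})",       # union: one letter or one digit
    "LD":        f"{L}{D}",            # concatenation: letter then digit
    "L^4":       f"{L}{{4}}",          # exactly four letters
    "L*":        f"{L}*",              # zero or more letters (includes ε)
    "L(L U D)*": f"{L}(?:{L}|{D})*",   # letter, then letters or digits
    "D+":        f"{D}+",              # one or more digits
}


def in_language(name, string):
    """Return True if the whole string belongs to the named language."""
    return re.fullmatch(PATTERNS[name], string) is not None
```

For instance, `in_language("L(L U D)*", "help3")` holds while `in_language("L(L U D)*", "3help")` does not, and the empty string belongs to L* but not to D+.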


Formal Language Concepts (3)

• A grammar is the ordered quartet G = (N, T, P, S) over a language L
  • N is a finite non-empty set of non-terminal symbols (written in bold)
  • T is a finite non-empty set of terminal symbols
  • P is a set of productions of the form u -> v such that u is an element of (N U T)+ and v is an element of (N U T)*
  • S is a non-terminal symbol denoted the start symbol
• Example:
  • identifier -> letter (letter | digit)*
  • letter -> A | B | ... | Z | a | b | ... | z
  • digit -> 0 | 1 | ... | 9
• So N = {letter, digit, identifier}, T = {A, B, ..., Z, a, b, ..., z, 0, ..., 9}, P is the set of productions above, and S = identifier
• The | operator separates alternatives (a logical “or”).
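
The quartet can be written down as a small data structure. The sketch below is one possible encoding, not a standard one: the `Grammar` type and `is_well_formed` check are hypothetical, left-hand sides are restricted to a single non-terminal (narrower than the general definition above), and the `(letter | digit)*` repetition is rewritten with an extra helper non-terminal `rest`, which is an assumption of this example.

```python
import string
from typing import NamedTuple


class Grammar(NamedTuple):
    nonterminals: frozenset   # N
    terminals: frozenset      # T
    productions: dict         # P: non-terminal -> list of right-hand sides
    start: str                # S


def is_well_formed(g):
    """Check that S is in N, N and T are disjoint, and P uses known symbols."""
    if g.start not in g.nonterminals or g.nonterminals & g.terminals:
        return False
    symbols = g.nonterminals | g.terminals
    for lhs, alternatives in g.productions.items():
        if lhs not in g.nonterminals:
            return False
        for rhs in alternatives:
            if any(symbol not in symbols for symbol in rhs):
                return False
    return True


IDENTIFIER_GRAMMAR = Grammar(
    nonterminals=frozenset({"identifier", "rest", "letter", "digit"}),
    terminals=frozenset(string.ascii_letters + string.digits),
    productions={
        "identifier": [("letter", "rest")],
        # rest encodes (letter | digit)*; the empty tuple stands for ε
        "rest": [("letter", "rest"), ("digit", "rest"), ()],
        "letter": [(c,) for c in string.ascii_letters],
        "digit": [(c,) for c in string.digits],
    },
    start="identifier",
)
```

Each alternative separated by | on the slide becomes one tuple in the list for its non-terminal.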


Formal Language Concepts (4)

• To generate a string r from a grammar G:
  • Start with the start symbol S and apply productions to its non-terminal symbols to produce a new string r'.
  • Repeat the above step recursively until the new string contains only terminal symbols from T.
• Note: a grammar generates strings from a language L with a pattern specified by the production rules. The language L is just a collection of strings over an alphabet.
• Let s, t, u and v be strings over some language L.
  • If w = s u t, w' = s v t and u -> v is a production of the grammar G, then the string w' is directly generated by the string w
  • If there is a sequence of strings w0, w1, ..., wn, w where w' is directly generated by w0, w0 is directly generated by w1, and in general wi-1 is directly generated by wi (with wn directly generated by w), then the sequence w', w0, w1, ..., wn, w is called a derivation sequence.


Examples of Derivation Sequences

• Given
  • identifier -> letter (letter | digit)*
  • letter -> A | B | ... | Z | a | b | ... | z
  • digit -> 0 | 1 | ... | 9
• Then the string help3 is derived from:

  identifier
  letter letter letter letter digit
  h e l p 3

• Given a grammar G for some language L it is possible to construct an automaton to parse through strings (a finite state automaton when G is regular).
  • The automaton will accept any string for which there is a valid derivation sequence; otherwise it will reject it.
• Automata will be used to design lexical and syntactic analyzers that will accept all valid strings that can be generated.
• A grammar can be seen as a meta-language that specifies how to derive valid strings from a source language.
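
The derivation of help3 can be generated mechanically. The sketch below is an illustration under an assumption: the `(letter | digit)*` part is rewritten with a helper non-terminal `rest` (rest -> letter rest | digit rest | ε), and for brevity the choice of production and the replacement of letter or digit by the actual character are shown as a single step per character. The function name `derive_identifier` is hypothetical.

```python
import string

LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)


def derive_identifier(target):
    """Return the sentential forms deriving `target`, or None if it is rejected."""
    if not target or target[0] not in LETTERS:
        return None                       # an identifier must start with a letter
    if any(ch not in LETTERS | DIGITS for ch in target):
        return None                       # unknown symbol
    steps = [["identifier"], ["letter", "rest"]]
    done = []                             # terminals produced so far
    for i, ch in enumerate(target):
        done.append(ch)
        steps.append(done + ["rest"])     # letter -> ch  or  digit -> ch
        if i + 1 < len(target):
            kind = "letter" if target[i + 1] in LETTERS else "digit"
            steps.append(done + [kind, "rest"])   # rest -> letter rest | digit rest
        else:
            steps.append(list(done))      # rest -> ε ends the derivation
    return [" ".join(step) for step in steps]


derivation = derive_identifier("help3")
```

The first form is the start symbol and the last contains only terminals; a string such as 3x has no derivation and is rejected, which is the accept/reject behavior described for the automaton.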


Grammar Classification

• Grammars can be classified in four groups:
  • Type 0 - Unrestricted: there are no construction restrictions.
  • Type 1 - Context sensitive: has productions u -> v where u belongs to (N U T)+, v belongs to (N U T)*, and v is at least as long as u
  • Type 2 - Context free: has productions u -> v where u is a single non-terminal in N and v belongs to (N U T)*
  • Type 3 - Regular: has productions in one of two forms, where A and B belong to N and t belongs to T*:
    • Right linear: A -> tB or A -> t
    • Left linear: A -> Bt or A -> t
• Note that regular grammars are also context free, and context-free grammars without ε-productions are also context sensitive.
• Although the number of symbols of a grammar is finite, it can generate an infinite number of strings.
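
The classification can be checked production by production. The sketch below is a simplified illustration: the `classify` function and the encoding of productions as (left-hand side, right-hand side) tuples of symbols are assumptions, and the Type 1 test uses only the length condition (the right side is at least as long as the left), ignoring the usual special case for an ε-production on the start symbol.

```python
def classify(productions, nonterminals):
    """Return the most restrictive type (3, 2, 1 or 0) that fits all productions."""

    def context_free(lhs, rhs):
        return len(lhs) == 1 and lhs[0] in nonterminals

    def right_linear(lhs, rhs):
        # A -> tB or A -> t: a non-terminal may only appear in the last position
        return context_free(lhs, rhs) and not any(s in nonterminals for s in rhs[:-1])

    def left_linear(lhs, rhs):
        # A -> Bt or A -> t: a non-terminal may only appear in the first position
        return context_free(lhs, rhs) and not any(s in nonterminals for s in rhs[1:])

    if all(right_linear(l, r) for l, r in productions) or \
       all(left_linear(l, r) for l, r in productions):
        return 3
    if all(context_free(l, r) for l, r in productions):
        return 2
    if all(len(l) <= len(r) for l, r in productions):
        return 1
    return 0


# I -> a R, R -> a R | ε  (right linear)
regular = [(("I",), ("a", "R")), (("R",), ("a", "R")), (("R",), ())]
# S -> ( S ) | ε  (balanced parentheses: context free but not regular in this form)
parens = [(("S",), ("(", "S", ")")), (("S",), ())]
```

Here `classify(regular, {"I", "R"})` gives 3 and `classify(parens, {"S"})` gives 2. The function classifies the grammar as written, not the language: a language may have a regular grammar even when a particular grammar for it is only context free.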