Mechanics

Copyright © Software Carpentry 2010

This work is licensed under the Creative Commons Attribution License

See http://software-carpentry.org/license.html for more information.

Regular Expressions

Regular Expressions Mechanics

Notebook #1

Site     Date        Evil (millivaders)
----     ----        ------------------
Baker 1  2009-11-17  1223.0
Baker 1  2010-06-24  1122.7
Baker 2  2009-07-24  2819.0
Baker 2  2010-08-25  2971.6
Baker 1  2011-01-05  1410.0
Baker 2  2010-09-04  4671.6
⋮        ⋮           ⋮

Notebook #2

Site/Date/Evil
Davison/May 22, 2010/1721.3
Davison/May 23, 2010/1724.7
Pertwee/May 24, 2010/2103.8
Davison/June 19, 2010/1731.9
Davison/July 6, 2010/2010.7
Pertwee/Aug 4, 2010/1731.3
Pertwee/Sept 3, 2010/4981.0
⋮

'(.+)/([A-Z][a-z]+) ([0-9]{1,2}),? ([0-9]{4})/(.+)'

This pattern matches:

- one or more characters

- a slash

- a single upper-case letter

- one or more lower-case letters

- a space

- one or two digits

- a comma if one is there

- a space

- exactly four digits

- a slash

- one or more characters
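
The pattern above can be tried directly on records from Notebook #2. This is a minimal sketch using Python's standard `re` module; the helper name `parse` is made up for illustration and is not part of the original slides.

```python
import re

# Pattern copied from the slide: site / month day[,] year / reading
PATTERN = r'(.+)/([A-Z][a-z]+) ([0-9]{1,2}),? ([0-9]{4})/(.+)'

# Sample records copied from Notebook #2 (the first is the header row)
records = [
    'Site/Date/Evil',
    'Davison/May 22, 2010/1721.3',
    'Pertwee/Sept 3, 2010/4981.0',
]

def parse(record):
    """Return (site, month, day, year, reading), or None if no match."""
    m = re.match(PATTERN, record)
    return m.groups() if m else None

for record in records:
    print(record, '->', parse(record))
```

The header row does not match because nothing after a slash in it has the month, space, day shape, so `parse` returns None for it.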

How?

Using finite state machines

Match a single 'a'

[Diagram: state machine with one arc labeled 'a']

- start here

- match this character

- must be here at the end

Patterns so far: a

Match one or more 'a'

[Diagram: state machine with two arcs, each labeled 'a']

- match this as before

- match again and again

- don't have to stop here the first time, just have to be here at the end

Patterns so far: a, a+
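
One way to make these two machines concrete is a lookup table of arcs. The sketch below is an assumed reading of the diagrams, not the author's code: the state names `start` and `end` and the function `accepts` are invented here, and the 'a+' machine is taken to be the single-'a' machine plus a loop on the final state.

```python
# Each machine maps (state, character) -> next state.
# State names are hypothetical; the slides do not name the nodes.
SINGLE_A = {('start', 'a'): 'end'}
ONE_OR_MORE_A = {('start', 'a'): 'end', ('end', 'a'): 'end'}

def accepts(transitions, text, start='start', final='end'):
    """Run the machine over text, looking at each character once."""
    state = start
    for ch in text:
        state = transitions.get((state, ch))
        if state is None:       # no arc for this character: reject
            return False
    return state == final       # must be in the final state at the end

for text in ['', 'a', 'aaa', 'ab']:
    print(repr(text), accepts(SINGLE_A, text), accepts(ONE_OR_MORE_A, text))
```

Passing through the final state early is harmless; only the state reached when the input runs out decides the result.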

Match 'a' or nothing

[Diagram: state machine with an arc labeled 'a' and a second, "free" transition]

- transition is "free"

So this is '(a|)'

Which is 'a?'

Patterns so far: a, a+, a?
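
The claim that '(a|)' and 'a?' describe the same machine can be checked informally with Python's `re` module. This is only a spot check on a few strings, not a proof; the helper `accepted` is a made-up name.

```python
import re

def accepted(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

candidates = ['', 'a', 'aa', 'b']
print(accepted(r'(a|)', candidates))
print(accepted(r'a?', candidates))
```

Both patterns accept the empty string and a single 'a', and reject everything else in the list.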

Match zero or more 'a'

[Diagram: state machine with two arcs, each labeled 'a']

Combine ideas

This is 'a*'

Patterns so far: a, a+, a?, a*
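
A rough sketch of how "combining ideas" might look in code: the loop from 'a+' plus the free transition from 'a?'. The layout below is an assumption about the diagram, and `FREE`, `follow_free`, and `accepts` are names invented for this example. Because a free transition consumes no input, the simulator tracks a set of possible states rather than a single one.

```python
FREE = None  # marker for a transition that consumes no input

# Assumed layout: the a+ machine plus a free arc from start to end.
ZERO_OR_MORE_A = {
    ('start', 'a'): {'end'},
    ('end', 'a'): {'end'},
    ('start', FREE): {'end'},
}

def follow_free(transitions, states):
    """Add every state reachable through free transitions."""
    states = set(states)
    todo = list(states)
    while todo:
        s = todo.pop()
        for t in transitions.get((s, FREE), ()):
            if t not in states:
                states.add(t)
                todo.append(t)
    return states

def accepts(transitions, text, start='start', final='end'):
    """True if some path through the machine ends in the final state."""
    current = follow_free(transitions, {start})
    for ch in text:
        reached = set()
        for s in current:
            reached |= transitions.get((s, ch), set())
        current = follow_free(transitions, reached)
    return final in current

for text in ['', 'a', 'aaaa', 'ab']:
    print(repr(text), accepts(ZERO_OR_MORE_A, text))
```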

What regular expression is this?

[Diagram: state machine with arcs labeled 'a', 'a', 'b', 'c', and 'd']

a+|(b(c|d))
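
The answer can be sanity-checked by writing the machine as a table and comparing it with Python's `re` module on every short string over the letters a, b, c, d. The state names and the exact wiring below are assumptions made for this sketch; the original diagram may be drawn differently.

```python
import re
from itertools import product

# Assumed states: 'A' = one or more a seen, 'B' = b seen,
# 'END' = b followed by c or d.
TRANSITIONS = {
    ('start', 'a'): 'A',
    ('A', 'a'): 'A',
    ('start', 'b'): 'B',
    ('B', 'c'): 'END',
    ('B', 'd'): 'END',
}
FINAL = {'A', 'END'}

def accepts(text):
    """Run the table-driven machine over text."""
    state = 'start'
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False
    return state in FINAL

def disagreements(max_len=4):
    """Strings over 'abcd' where the machine and the regex differ."""
    bad = []
    for n in range(max_len + 1):
        for chars in product('abcd', repeat=n):
            text = ''.join(chars)
            if accepts(text) != bool(re.fullmatch(r'a+|(b(c|d))', text)):
                bad.append(text)
    return bad

print(disagreements())
```

An empty list means the table and the pattern agree on every string up to the chosen length.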

Action at a node depends only on:

- arcs out of that node

- characters in target data

Finite state machines have no memory

Means it is impossible to write a regular expression to check if arbitrarily nested parentheses match

"(((....)))" requires memory (or at least a counter)
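
For contrast, here is a minimal sketch of the counter-based check that a finite state machine cannot express for unbounded nesting. The function name `parens_balanced` is made up for this example.

```python
def parens_balanced(text):
    """Check nesting with a counter, which a finite state machine lacks."""
    depth = 0
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:       # a ')' with no matching '('
                return False
    return depth == 0           # every '(' must have been closed

for text in ['(((....)))', '(()', ')(']:
    print(repr(text), parens_balanced(text))
```

The counter can grow without bound, which is exactly what a fixed, finite set of states cannot represent.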

Similarly, only way to check if a word contains each vowel once is to write 5! = 120 clauses
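
The 120 clauses do not have to be typed by hand. The sketch below generates one clause per ordering of the vowels and joins them with '|'. It assumes lowercase input and treats only a, e, i, o, u as vowels; the names `clause` and `each_vowel_once` are invented for illustration.

```python
import re
from itertools import permutations

VOWELS = 'aeiou'
OTHER = '[^aeiou]*'     # any run of non-vowel characters

def clause(order):
    """One clause: the five vowels in a fixed order, once each."""
    return OTHER + OTHER.join(order) + OTHER

clauses = [clause(order) for order in permutations(VOWELS)]
EACH_VOWEL_ONCE = re.compile('|'.join(clauses))

def each_vowel_once(word):
    """True if the lowercase word has each vowel exactly once."""
    return EACH_VOWEL_ONCE.fullmatch(word) is not None

print(len(clauses))
for word in ['facetious', 'sequoia', 'banana']:
    print(word, each_vowel_once(word))
```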

Why use a tool with limits?

They're fast

- After some pre-calculation, a regular expression only has to look at each character in the input data once

They're readable

- More readable than procedural equivalent

And regular expressions can do a lot more than what we've seen so far

June 2010

created by

Greg Wilson
