DIG 3563: Lecture 2a:

Regular Expressions

Michael Moshell

University of Central Florida

Information Management

DO NOT BE A RABBIT!

If you don’t know how to do something, don’t hide under a bush.

Tell me, or come see me.

Naturphoto.cz

Regular Expressions

• A "grammar" for validating input; useful for many kinds of pattern recognition

The basic built-in matching function in PHP is called 'preg_match'.

It takes two or three arguments:

• the pattern, like "cat"
• the test string, like "catastrophe"
• an (optional) array variable, which we can ignore for now

It returns 1 (which PHP treats as TRUE) if the pattern is found anywhere in the test string, and 0 (treated as FALSE) if it is not.
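Here is a minimal sketch of what preg_match hands back, assuming PHP 7 or later. The patterns are wrapped in / marks because PHP requires delimiters around the pattern; the variable names are made up for illustration.

```php
<?php
// preg_match returns 1 when the pattern is found, 0 when it is not,
// and false if the pattern itself is broken.
$hit  = preg_match('/cat/', 'catastrophe');
$miss = preg_match('/cat/', 'dogma');

var_dump($hit);   // int(1)
var_dump($miss);  // int(0)

// In an if-test, 1 behaves like TRUE and 0 like FALSE.
if ($hit) {
    print "I found a cat!\n";
}
```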

Perl-Compatible Regular Expressions (PCRE)

Always begin with "/ and end with /" (for today's lesson)

$instring = "catastrophe";

if (preg_match("/cat/", $instring)) {
    print "I found a cat!";
} else {
    print "No cat here.";
}

I found a cat!

PRACTICE 1:

"/cat/" is the regular expression.

Make up a Regular Expression to recognize not the word cat, but rather the word dog.

Write it on your paper, now.

Yes, I mean YOU. Where is your paper and pencil?

(You can use your laptop if that’s what you have…)

Answer: "/dog/"

Yep, it’s that simple. But I gotta get you STARTED.

Regular Expressions

Wild cards: period . matches any single character

$instring = "cotastrophe";

if (preg_match("/c.t/", $instring)) {
    print "I found a c.t!";
} else {
    print "No c.t here.";
}

I found a c.t!

Regular Expressions

Wild cards: a* matches any number of a characters (including zero of them!)

$instring = "caaaatastrophe";

if (preg_match("/ca*t/", $instring)) {
    print "I found a match!";
} else {
    print "No ca*t here.";
}

I found a match!

Regular Expressions

Wild cards: .* matches any string of characters (including the empty string!)

$instring = "cotastrophe";

if (preg_match("/c.*t/", $instring)) {
    print "I found a c.*t!";
} else {
    print "No c.*t here.";
}

I found a c.*t!

Regular Expressions

Wild cards: .* matches any string of characters (including the empty string!)

$instring = "cflippingmonstroustastrophe";

if (preg_match("/c.*t/", $instring)) {
    print "I found a c.*t!";
} else {
    print "No c.*t here.";
}

I found a c.*t!
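To see the three wild cards side by side, here is a small sketch (assuming PHP 7 or later; the variable names are invented for this example). It adds one string the slides did not use, "ctastrophe", to show where . and * part ways.

```php
<?php
// . must stand for exactly one character.
$dotHit  = preg_match('/c.t/', 'cotastrophe');  // 1: . stands for "o"
$dotMiss = preg_match('/c.t/', 'ctastrophe');   // 0: nothing sits between c and t

// a* allows any number of a's, including none.
$starMany = preg_match('/ca*t/', 'caaaatastrophe');  // 1
$starNone = preg_match('/ca*t/', 'ctastrophe');      // 1: zero a's is fine

// .* allows any run of characters, long or empty.
$dotStar = preg_match('/c.*t/', 'cflippingmonstroustastrophe');  // 1

print "$dotHit $dotMiss $starMany $starNone $dotStar\n";  // 1 0 1 1 1
```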

PRACTICE 2:

Model REs for you: "/c.t/", "/c.*t/" and "/ca*t/"

Make up a Regular Expression to recognize

Rob or Rb or Roob or Rooob, etc.

But to REJECT Reb and Rab and Rats and Mike ….

Answer: "/Ro*b/"
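If you know PHP, you can check the answer against every name in the exercise. This is only a sketch; the $words and $results names are made up here.

```php
<?php
$pattern = '/Ro*b/';
$words = ['Rob', 'Rb', 'Roob', 'Rooob', 'Reb', 'Rab', 'Rats', 'Mike'];

$results = [];
foreach ($words as $word) {
    // Remember whether each word was accepted by the pattern.
    $results[$word] = preg_match($pattern, $word) === 1;
    print $word . ': ' . ($results[$word] ? 'accepted' : 'rejected') . "\n";
}
```

The first four words should come out accepted and the last four rejected.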

Quantification

Multiple copies of something:

a+ means ONE OR MORE a’s
Example: "/fa+ther/" matches father, faather, faaather, etc.

a* means ZERO OR MORE a’s
Example: "/fa*ther/" matches fther, father, faather, etc.

a? means ZERO OR ONE a
Example: "/flavou?r/" will match flavor AND flavour.

a{33} means 33 instances of a

Quantification Example

How to recognize “Rob” or “Robb”?

Answer: "/Robb?/"

Another way: "/Rob{1,2}/"
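The quantifiers can be checked in one small table. This sketch assumes PHP 7.1 or later (for the list syntax in foreach); the $cases table and the shortened a{3} example are made up for illustration.

```php
<?php
// Each row: pattern, test string, result we expect from preg_match.
$cases = [
    ['/fa+ther/',  'fther',   0],  // + needs at least one a
    ['/fa*ther/',  'fther',   1],  // * is happy with zero a's
    ['/flavou?r/', 'flavour', 1],  // ? allows the optional u
    ['/a{3}/',     'aa',      0],  // {3} needs three a's in a row
    ['/Robb?/',    'Rob',     1],
    ['/Rob{1,2}/', 'Robb',    1],
];

$allAsExpected = true;
foreach ($cases as [$pattern, $text, $expected]) {
    $actual = preg_match($pattern, $text);
    print "$pattern on $text: $actual\n";
    if ($actual !== $expected) {
        $allAsExpected = false;
    }
}
```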

Escaping

Backslash means "don't interpret this:"

\. is just a dot

\* is just an asterisk.
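A quick sketch of the difference the backslash makes (the test strings are invented for this example). The patterns are written in single quotes so that PHP itself leaves the backslash alone. preg_quote is a built-in PHP function that adds the backslashes for you.

```php
<?php
$unescaped = preg_match('/a.b/', 'axb');   // 1: . matches the x
$escaped   = preg_match('/a\.b/', 'axb');  // 0: \. insists on a real dot
$realDot   = preg_match('/a\.b/', 'a.b');  // 1

// preg_quote escapes every special character in a plain string.
$quoted = preg_quote('1.5*2', '/');
print $quoted . "\n";  // 1\.5\*2
```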

The concept:

$t = "/a{3}\.b{1,4}/";
$s = "aaa.bbb";

Would $s be accepted? Is preg_match($t,$s) true or false?

TRUE, because $s matches the pattern string $t: three a’s, one dot, and between one and four b characters.

The concept:

$t = "/a{3}\.b{1,4}/";
$s = "aaa.bbbbb";

Would $s be accepted? Is preg_match($t,$s) true or false?

Perhaps surprisingly, TRUE: because $s contains three a’s, a dot, and four b’s. The fifth b is simply left over.

If you have $1.00 and I asked you “do you have 75 cents?” the answer would be YES.

If you want an EXACT match, I'll show you how in a bit.
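To see which part of the string was actually matched, you can peek at the optional third argument of preg_match (the array these slides otherwise ignore). A minimal sketch, with $matches and $result as made-up names:

```php
<?php
$t = '/a{3}\.b{1,4}/';
$s = 'aaa.bbbbb';

// The optional third argument receives the text that actually matched.
$result = preg_match($t, $s, $matches);

print $result . "\n";      // 1
print $matches[0] . "\n";  // aaa.bbbb  (four b's; the fifth is left over)
```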

Grouping

Multiple copies of something:

(abc)+ means ONE OR MORE copies of the string abc
(abc)* means ZERO OR MORE copies of the string abc

like abcabcabc

SETS:

[0-9] matches any single digit character
[A-Z] matches any single uppercase letter
[AZ] matches A or Z
[AZ]? (i.e. 0 or 1 of the previous) matches the empty string, A or Z
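Groups and sets can be tried out the same way. In this sketch the x and y around the group, and all the test strings, are invented so that the zero-copies case is visible.

```php
<?php
// Groups: the quantifier applies to the whole group (abc).
$plusHit  = preg_match('/x(abc)+y/', 'xabcabcy');  // 1: two copies of abc
$plusMiss = preg_match('/x(abc)+y/', 'xy');        // 0: + needs at least one copy
$starHit  = preg_match('/x(abc)*y/', 'xy');        // 1: * accepts zero copies

// Sets: one character chosen from inside the brackets.
$digit = preg_match('/[0-9]/', 'route 66');   // 1
$upper = preg_match('/[A-Z]/', 'lowercase');  // 0
$aOrZ  = preg_match('/[AZ]/', 'Zebra');       // 1
$notAZ = preg_match('/[AZ]/', 'Maple');       // 0: M lies between A and Z but is not in the set

print "$plusHit $plusMiss $starHit $digit $upper $aOrZ $notAZ\n";  // 1 0 1 1 0 1 0
```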

Starting and Ending

preg_match("/cat/","abunchofcats") is TRUE
but
preg_match("/^cat/","abunchofcats") is FALSE

because ^ means the match must start at the very first character of the string.

preg_match("/cats$/","abunchofcats") is TRUE
but
preg_match("/cats$/","mycatsarelazy") is FALSE

So, ^ marks the head and $ marks the tail.
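The head and tail marks are easy to verify. A sketch, with one extra made-up string ("catsarelazy") to show ^ succeeding:

```php
<?php
$anywhere = preg_match('/cat/',   'abunchofcats');   // 1
$atHead   = preg_match('/^cat/',  'abunchofcats');   // 0: string starts with "abu"
$headHit  = preg_match('/^cat/',  'catsarelazy');    // 1
$atTail   = preg_match('/cats$/', 'abunchofcats');   // 1
$tailMiss = preg_match('/cats$/', 'mycatsarelazy');  // 0: string ends with "lazy"

print "$anywhere $atHead $headHit $atTail $tailMiss\n";  // 1 0 1 1 0
```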

Exact Matching with ^ and $

$t = "/^a{3}\.b{1,4}$/";
$s = "aaa.bbbbb";

Would $s be accepted? Is preg_match($t,$s) true or false?

FALSE, because the ending $ in the pattern says "no more input is acceptable" but more stuff comes.

This would also reject
$s = "aaa.bbbbAndMoreText";
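Here is the exact-match pattern in action, as a sketch with made-up variable names. It also shows one wrinkle of PHP's regular expressions that is worth knowing when validating input: by default $ tolerates a single newline at the very end, and the D modifier turns that off.

```php
<?php
$t = '/^a{3}\.b{1,4}$/';

$ok       = preg_match($t, 'aaa.bbb');              // 1
$tooMany  = preg_match($t, 'aaa.bbbbb');            // 0
$trailing = preg_match($t, 'aaa.bbbbAndMoreText');  // 0

// By default, $ also matches just before a final newline.
$newline = preg_match($t, "aaa.bbb\n");             // 1

// The D modifier makes $ mean the very end of the string.
$strict = preg_match('/^a{3}\.b{1,4}$/D', "aaa.bbb\n");  // 0
```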

Alternatives - the 'or' mark |

$t="/flav(o|ou)r/";

This will match 'flavor' and 'flavour'.
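A short sketch of the | mark at work. The misspelling "flavr" and the cat/dog pattern are made up here to show a failure and a whole-word alternative.

```php
<?php
$t = '/flav(o|ou)r/';

$us   = preg_match($t, 'flavor');   // 1
$uk   = preg_match($t, 'flavour');  // 1
$typo = preg_match($t, 'flavr');    // 0: neither o nor ou is present

// Alternatives can be whole words, too.
$pet   = preg_match('/(cat|dog)/', 'hotdog');   // 1
$noPet = preg_match('/(cat|dog)/', 'hamster');  // 0
```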

And (yes!) there is often more than one way to do things; for instance, our good old ? mark.

"/flavou?r/"

Sets - Examples

[A-E]{3} matches AAA, ABA, ADD, ... EEE

[PQX]{2,4} matches PP, PQ, PX ... up to XXXX

[A-Za-z]+ matches any alphabetic string with 1 or more characters

[A-Z][a-z]* matches any alpha string with a capital first letter followed only by lowercase letters.

[a-z0-9]+ matches any string of lowercase letters and numerals
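These set expressions describe whole strings, so this sketch wraps each one in ^ and $ before testing. The fits() helper is hypothetical (it is not a PHP built-in), and the test strings are made up.

```php
<?php
// Hypothetical helper: does the WHOLE string fit the set expression?
function fits(string $re, string $text): bool
{
    return preg_match('/^' . $re . '$/D', $text) === 1;
}

var_dump(fits('[A-E]{3}', 'ADD'));         // true
var_dump(fits('[A-E]{3}', 'ADF'));         // false: F is outside A-E
var_dump(fits('[PQX]{2,4}', 'XXXX'));      // true
var_dump(fits('[PQX]{2,4}', 'PQXPQ'));     // false: five characters is too many
var_dump(fits('[A-Za-z]+', 'Moshell'));    // true
var_dump(fits('[A-Z][a-z]*', 'moshell'));  // false: first letter is not a capital
var_dump(fits('[a-z0-9]+', 'smith23'));    // true
```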

Practice in class

Write a RE that recognizes any string that begins with "sale".

Here's an example for you to look at, to help you remember:

^cat

From now on, the RE is just ^cat. You don't need to write the other stuff (preg_match, "/, etc.)

Practice

1) Write a RE that recognizes any string that begins with "sale".

Answer: ^sale

2) Write a RE that recognizes a string that begins with "smith" followed by a two-digit integer, like smith23 or smith99.

Here's an example from your recent past: a{3}\.b{1,4}

Answer: ^smith[0-9]{2}

3) Write a RE that recognizes Social Security numbers in a form like

123-45-6789

Helpers from the recent past: ^smith[0-9]{2}

a{3}\.b{1,4}

Answer:

[0-9]{3}\-[0-9]{2}\-[0-9]{4}

NOTE: That's a conservative answer. It turns out that the dash character is not a special symbol outside sets, and so you could also write

[0-9]{3}-[0-9]{2}-[0-9]{4}

But I don't like to remember stuff, so I use \ a lot.
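Both spellings of the Social Security RE can be checked side by side. This is a sketch with made-up variable names and test strings; the last two lines add ^ and $ (an addition, not part of the answer above) to show what changes when an exact match is wanted.

```php
<?php
$escaped   = '/[0-9]{3}\-[0-9]{2}\-[0-9]{4}/';
$unescaped = '/[0-9]{3}-[0-9]{2}-[0-9]{4}/';

$a = preg_match($escaped, '123-45-6789');    // 1
$b = preg_match($unescaped, '123-45-6789');  // 1: same result without the backslashes
$c = preg_match($unescaped, '123456789');    // 0: the dashes are required

// Without ^ and $, extra digits around the number are tolerated.
$loose  = preg_match($unescaped, '99123-45-678900');                        // 1
$strict = preg_match('/^[0-9]{3}-[0-9]{2}-[0-9]{4}$/', '99123-45-678900');  // 0
```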

How to study this stuff?

Practice making up RE for problems like these:

• The UCF NID
• French telephone numbers like (+33 5 23 46 22 91)
• Dollars and cents, like $942.73
• A field that may contain only lowercase strings with exactly ONE vowel.

How do you know if they're good? If you know PHP, you can test them. Otherwise, check out each other's work.

(OR come see me in office hours!) (Or by appointment!) 407 694 6763
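If you do know PHP, a tiny test harness along these lines may help. Everything here is an assumption for illustration: checkPattern() is a hypothetical helper, and the dollars-and-cents RE is just one candidate to test, not the official answer.

```php
<?php
// Hypothetical helper: try one pattern on strings that should pass and should fail.
function checkPattern(string $pattern, array $good, array $bad): bool
{
    foreach ($good as $text) {
        if (preg_match($pattern, $text) !== 1) {
            print "Should have matched: $text\n";
            return false;
        }
    }
    foreach ($bad as $text) {
        if (preg_match($pattern, $text) !== 0) {
            print "Should have been rejected: $text\n";
            return false;
        }
    }
    return true;
}

// One candidate RE for dollars and cents (an assumption, not the official answer).
$dollars = '/^\$[0-9]+\.[0-9]{2}$/';
$passed = checkPattern($dollars, ['$942.73', '$0.05'], ['942.73', '$942.7', '$9.999']);

print $passed ? "Looks good\n" : "Needs work\n";
```

Swap in your own pattern and your own lists of good and bad strings for the other problems.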