
Welcome to CSC 108

Introduction to Computer Programming

Lecture W10C

Drs. Michael Liut, Andi Bergen, Larry Zhang

Mathematical and Computational Sciences

University of Toronto Mississauga

November 20 2020

This session is being recorded!

1 / 39


Regular Expressions (Regex)

Definition 1: A regular expression is a sequence of characters that forms a search pattern. Typical uses include finding or validating:

1. Phone numbers

2. Email addresses

3. Postal Codes / Zip Codes

4. Valid variable names

(e.g., variable names cannot start with digits)

5. etc.
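As a small illustration of item 3, here is a sketch of a pattern for Canadian postal codes. It assumes the simplified format letter, digit, letter, optional space, digit, letter, digit, and it ignores the rules about which letters real postal codes may use. The names POSTAL_CODE and is_postal_code are hypothetical.

```python
import re

# Simplified Canadian postal code (assumption): letter digit letter,
# an optional space, then digit letter digit. Real codes exclude some letters.
POSTAL_CODE = re.compile(r"[A-Z]\d[A-Z] ?\d[A-Z]\d")


def is_postal_code(text: str) -> bool:
    """Return True if the whole of text looks like a postal code."""
    return POSTAL_CODE.fullmatch(text) is not None


print(is_postal_code("L5L 1C6"))   # True
print(is_postal_code("L5L1C6"))    # True (the space is optional)
print(is_postal_code("5LL 1C6"))   # False (must start with a letter)
```

Using fullmatch rather than search means extra characters before or after the code make the check fail.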

2 / 39


Regular Expressions (Regex)

1. Groups: ( ) capturing vs. (?: ) non-capturing

2. Quantifiers: *, +, ? and {1,2}

3. Character classes: [A-Za-z]

4. Escape characters: \.

5. Logical operators: a | b

6. Use of raw strings in Python: r'regex'
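Item 6 matters because Python processes backslash escapes in ordinary string literals before the regex engine ever sees the pattern. A minimal sketch: \b means "word boundary" to the regex engine, but inside a normal string literal "\b" is the backspace character.

```python
import re

text = "cat scatter"

# In a normal string, "\b" becomes a backspace character (\x08),
# so the regex looks for literal backspaces and finds nothing.
print(re.findall("\bcat\b", text))    # []

# In a raw string, the backslash reaches the regex engine intact,
# so \b means "word boundary": "cat" matches, "scatter" does not.
print(re.findall(r"\bcat\b", text))   # ['cat']
```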

3 / 39


Simple Usage

>>> import re
>>> txt = "Today's topic in CSC108: Regexes."
>>> x = re.search("CSC108", txt)
>>> if x:
...     print("Yes, there's a match")
...
Yes, there's a match

We could use the in operator ("CSC108" in txt) for this simple example.
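Beyond a yes/no answer, re.search returns a Match object (or None) that also records what matched and where. A short sketch using the same txt:

```python
import re

txt = "Today's topic in CSC108: Regexes."

# re.search returns a Match object for the first match, or None.
match = re.search("CSC108", txt)
if match:
    print(match.group())   # the matched text: CSC108
    print(match.span())    # (start, end) indices of the match: (17, 23)

# For a fixed string like this, the in operator gives the same yes/no answer.
print("CSC108" in txt)     # True
```

The Match object becomes useful once the pattern is not a fixed string, because then you need to ask which text actually matched.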

4 / 39


Find all phone numbers with area code 416

We don't know exactly which numbers we are looking for (e.g., 416-555-1235), only the pattern they follow.

txt = ("143-614-3330, 556-732-3881, 680-964-1127, 568-769-3556, "
       "099-887-1597, 081-997-3959, 842-502-6372, 406-648-1681, "
       "416-475-8283, 259-778-2868, 105-776-7011, 912-576-5192, "
       "018-087-9554, 975-845-6860, 702-619-1033, 326-382-3556, "
       "416-294-6744, 957-135-4565, 667-624-1973, 603-418-9850")

We could use split(), loops, if, startswith() and string slicing.
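For comparison, a minimal sketch of that non-regex approach, assuming the numbers are separated by a comma and a space exactly as in txt. The function name find_416_without_regex is hypothetical.

```python
txt = ("143-614-3330, 556-732-3881, 680-964-1127, 568-769-3556, "
       "099-887-1597, 081-997-3959, 842-502-6372, 406-648-1681, "
       "416-475-8283, 259-778-2868, 105-776-7011, 912-576-5192, "
       "018-087-9554, 975-845-6860, 702-619-1033, 326-382-3556, "
       "416-294-6744, 957-135-4565, 667-624-1973, 603-418-9850")


def find_416_without_regex(text: str) -> list[str]:
    """Collect the comma-separated numbers whose first four characters are '416-'."""
    result = []
    for number in text.split(","):
        number = number.strip()          # drop the space after each comma
        if number[:4] == "416-":         # slicing; number.startswith("416-") also works
            result.append(number)
    return result


print(find_416_without_regex(txt))   # ['416-475-8283', '416-294-6744']
```

Note that this version only checks the area code; unlike the regex, it never verifies that the remaining characters are digits in the 3-4 layout.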

5 / 39


Find all phone numbers with area code 416

• A phone number is a sequence of characters that follows a pattern

• Pattern: 416- followed by 3 digits, a dash and then 4 digits

6 / 39


Findall

• Search literals: "CSC108", "416-"

• Search ranges/classes: [a-zA-Z], [0-9]

• Wildcard: . (dot) matches any character except a newline

• Zero or more / one or more occurrences: * and + (e.g., .* or .+)

• Escape characters: \. \? \+

• Logical operators: a | b (matches either a or b)

• Specific number of occurrences: {1} {2} {3}
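A quick sketch that exercises each of these building blocks with re.findall on a made-up sample string:

```python
import re

text = "CSC108 and CSC148 cost $5.99?"

print(re.findall("CSC[0-9]{3}", text))    # literal + class + count: ['CSC108', 'CSC148']
print(re.findall("C.C", text))            # dot wildcard: ['CSC', 'CSC']
print(re.findall(r"\$\d+\.\d+", text))    # escaped $ and ., one or more digits: ['$5.99']
print(re.findall("108|148", text))        # either alternative: ['108', '148']
print(re.findall(r"\?", text))            # escaped ?: ['?']
```

Without the backslashes, $ would mean "end of string", . would match any character, and ? would make the previous item optional.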

7 / 39


Findall

How do we find a phone number with “416” area code?

• Search literals: “416”, “-”

• Search ranges/classes: [0-9]

• Specific number of occurrences: {1} {2} {3}

8 / 39


Findall

Find all phone numbers with “416” in this string.

txt = ("143-614-3330, 556-732-3881, 680-964-1127, 568-769-3556, "
       "099-887-1597, 081-997-3959, 842-502-6372, 406-648-1681, "
       "416-475-8283, 259-778-2868, 105-776-7011, 912-576-5192, "
       "018-087-9554, 975-845-6860, 702-619-1033, 326-382-3556, "
       "416-294-6744, 957-135-4565, 667-624-1973, 603-418-9850")

9 / 39


Findall

>>> x = re.findall("416-[0-9]{3}-[0-9]{4}", txt)
>>> print(x)
['416-475-8283', '416-294-6744']
>>> x = re.findall(r"416-\d{3}-\d{4}", txt)
>>> print(x)
['416-475-8283', '416-294-6744']

\d is shorthand for a digit; the raw string r"..." passes the backslash through to the regex engine unchanged.
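Two details the txt example does not reveal, shown in a small sketch with made-up inputs: the pattern happily matches inside a longer run of digits unless you add word boundaries (\b), and \d is not exactly the same as [0-9] for Python str patterns, since \d also accepts other Unicode decimal digits unless the re.ASCII flag is given.

```python
import re

# Without boundaries, the pattern also matches inside a longer run of digits.
print(re.findall(r"416-\d{3}-\d{4}", "99416-475-82831"))       # ['416-475-8283']

# \b (word boundary) requires the number to start and end at a word edge.
print(re.findall(r"\b416-\d{3}-\d{4}\b", "99416-475-82831"))   # []
print(re.findall(r"\b416-\d{3}-\d{4}\b", "call 416-475-8283")) # ['416-475-8283']

arabic_three = "\u0663"   # ARABIC-INDIC DIGIT THREE
print(len(re.findall(r"\d", arabic_three)))                    # 1: \d accepts it
print(re.findall("[0-9]", arabic_three))                       # []
print(re.findall(r"\d", arabic_three, flags=re.ASCII))         # []
```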

10 / 39


Algorithm Steps

Regex: “416-\d{3}-\d{4}”

Search text: “416-555-1234 , 516-55-51234”

• Iterate through the input, trying to match the current input character to the current item of the regex

• If it matches, advance to the next input character and the next item of the regex

• If there was a match and the regex has no next item, add the matched string to the results

• If it did not match, reset the regex to its start and retry from the input character after the one where this attempt began
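A minimal Python sketch of these steps, assuming the regex contains only literal characters and \d with fixed counts, so every match has the same length and no backtracking is needed. After a failed attempt it retries from the character after where that attempt began, and after a success it continues just past the match. DIGIT, token_matches, simple_findall and PHONE_416 are hypothetical names; Python's re engine is far more general than this.

```python
import re

# Hypothetical marker standing for one \d item of the regex.
DIGIT = "\\d"


def token_matches(token: str, ch: str) -> bool:
    """Does a single regex item match a single input character?"""
    if token == DIGIT:
        return ch in "0123456789"
    return token == ch                     # a literal character


def simple_findall(tokens: list[str], text: str) -> list[str]:
    """Fixed-length matcher following the steps above (no backtracking)."""
    results = []
    start = 0                              # where the current attempt began
    while start + len(tokens) <= len(text):
        i = 0                              # current item of the regex
        while i < len(tokens) and token_matches(tokens[i], text[start + i]):
            i += 1                         # match: advance input and regex
        if i == len(tokens):               # regex has no next item: take result
            results.append(text[start:start + i])
            start += i                     # continue after the match
        else:
            start += 1                     # reset regex, retry one char later
    return results


# "416-\d{3}-\d{4}" expanded by hand into one item per character
PHONE_416 = list("416-") + [DIGIT] * 3 + ["-"] + [DIGIT] * 4

print(simple_findall(PHONE_416, "416-555-1234 , 516-55-51234"))  # ['416-555-1234']
```

Restarting one character after the failed attempt (rather than at the character where the mismatch happened) is what lets the sketch find "416-555-1234" inside "4416-555-1234".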

11 / 39

Algorithm Steps II

“416-\d{3}-\d{4}”

Search text: “416-555-1234, 516-55-51234”

• “4”, “1”, “6” and “-” each match the next input character literally.

• \d{3}: first, second and third occurrence, matching “555”.

• “-” matches the second dash.

• \d{4}: first, second, third and fourth occurrence, matching “1234”.

• After the fourth occurrence the regex is empty: take the result and reset the regex.

• No attempt starting inside “516-55-51234” succeeds, since it never contains “416-”.

Result: [“416-555-1234”]

Continue to try to match one item of the regex at a time with the next character of the input string, until the input string is empty.
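To see where each result comes from and that scanning resumes after a match, re.finditer yields one Match object per result. The sketch below uses the walkthrough's search text plus a hypothetical third number appended for illustration.

```python
import re

# Walkthrough text plus a hypothetical third number, to show scanning resumes
text = "416-555-1234 , 516-55-51234, 416-000-9999"

for m in re.finditer(r"416-\d{3}-\d{4}", text):
    # Each Match records the matched text and where it sits in the input.
    print(m.group(), m.span())
# 416-555-1234 (0, 12)
# 416-000-9999 (29, 41)
```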

26 / 39


More examples (Find special characters)

>>> txt = "Abc. Hello! 100 + 8"
>>> x = re.findall(".!+", txt)
>>> print(x)
['o!']    <-- We want . ! +
>>> x = re.findall(r"\.\!\+", txt)
>>> print(x)
[]    <-- Still not quite correct
>>> x = re.findall("[!.+]+", txt)
>>> print(x)
['.', '!', '+']
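The second attempt returns nothing because \.\!\+ asks for the three characters side by side. A short sketch of two related tools: re.escape, which backslash-escapes regex metacharacters for you, and a character class, inside which . and + lose their special meaning.

```python
import re

txt = "Abc. Hello! 100 + 8"

# re.escape puts a backslash before characters that are special in a regex,
# so the result matches the text ".!+" literally.
literal = re.escape(".!+")
print(re.findall(literal, txt))        # [] : the three never appear side by side
print(re.findall(literal, "Wow.!+"))   # ['.!+']

# Inside a character class, . and + are ordinary characters,
# so each one is found on its own.
print(re.findall("[.!+]", txt))        # ['.', '!', '+']
```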

27 / 39


More examples (Groups)

>>> txt = "csccsccsc"
>>> x = re.findall(r'(csc)', txt)
>>> print(x)
['csc', 'csc', 'csc']
>>> x = re.findall(r'(?:csc){3}', txt)
>>> print(x)
['csccsccsc']
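Why the non-capturing group matters: when a pattern contains capturing groups, re.findall returns the groups instead of the whole match. A sketch of the difference, plus the tuple shape findall uses when there are several groups:

```python
import re

txt = "csccsccsc"

# With a quantified capturing group, findall returns only the last repetition.
print(re.findall(r"(csc){3}", txt))     # ['csc']
# A non-capturing group returns the whole match instead.
print(re.findall(r"(?:csc){3}", txt))   # ['csccsccsc']

# With several capturing groups, findall returns a tuple of groups per match.
print(re.findall(r"(\d{3})-(\d{3})-(\d{4})", "416-475-8283, 416-294-6744"))
# [('416', '475', '8283'), ('416', '294', '6744')]
```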

28 / 39


More examples (Or)

>>> txt = "csccsccsc"
>>> x = re.findall("csc|cs", txt)
>>> print(x)
['csc', 'csc', 'csc']
>>> x = re.findall("cs|csc", txt)
>>> print(x)
['cs', 'cs', 'cs']
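Alternatives are tried from left to right, and the first one that succeeds at a position wins. A sketch on a made-up string showing how adding word boundaries forces the engine to backtrack into the longer alternative:

```python
import re

text = "csc cs"

# Alternatives are tried left to right; "cs" wins even inside "csc".
print(re.findall("cs|csc", text))            # ['cs', 'cs']

# Requiring a word boundary makes "cs" fail inside "csc",
# so the engine backtracks and tries the next alternative.
print(re.findall(r"\b(?:cs|csc)\b", text))   # ['csc', 'cs']
```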

29 / 39


More examples (Words)

>>> txt = "Mike Miller, Mick Furrier, Mike Baker, Myke Mason"
>>> x = re.findall(r"Mike \w*", txt)
>>> print(x)
['Mike Miller', 'Mike Baker']
>>> x = re.findall(r"(M[iy]ke \w*)", txt)
>>> print(x)
['Mike Miller', 'Mike Baker', 'Myke Mason']
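Moving the parentheses changes what findall returns, and a one-character class like [iy] can equally be written as an alternation. A short sketch on the same txt:

```python
import re

txt = "Mike Miller, Mick Furrier, Mike Baker, Myke Mason"

# Only the part inside the parentheses is returned: the last names.
print(re.findall(r"M[iy]ke (\w+)", txt))    # ['Miller', 'Baker', 'Mason']

# [iy] is a one-character class; the alternation (?:i|y) is equivalent here.
print(re.findall(r"M(?:i|y)ke \w+", txt))   # ['Mike Miller', 'Mike Baker', 'Myke Mason']
```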

30 / 39


Task 1

import re
from typing import List


def find_specific(txt: str) -> List[str]:
    """Return a list of matches where the string
    begins with one 'a' followed by one or more 'b's.

    >>> find_specific("aaabbbabbacb")
    ['abbb', 'abb']
    >>> find_specific("")
    []
    """
    pass

31 / 39


Answer

import re
from typing import List


def find_specific(txt: str) -> List[str]:
    """Return a list of matches where the string
    begins with one 'a' followed by one or more 'b's.

    >>> find_specific("aaabbbabbacb")
    ['abbb', 'abb']
    >>> find_specific("")
    []
    """
    return re.findall("ab+", txt)
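The choice of quantifier is the whole answer here. A quick sketch comparing + (one or more) with * (zero or more) on the doctest input:

```python
import re

txt = "aaabbbabbacb"
print(re.findall("ab+", txt))   # ['abbb', 'abb']: one or more b's required
print(re.findall("ab*", txt))   # ['a', 'a', 'abbb', 'abb', 'a']: a lone 'a' also counts
```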

32 / 39


Task 2

import re
from typing import List


def run_me(txt: str, my_regex: str) -> List[str]:
    if len(my_regex) <= 6:
        return re.findall(my_regex, txt)
    return ["Fail"]


def task2_helper(txt: str) -> List[str]:
    """Create a regex that is no more than 6 characters long
    that matches when a string, or substring, starts with
    an 'a' followed by 0 or more numerals and ends
    with the string 'CS'.

    >>> task2_helper("a998289CSC is great aaCS")
    ['a998289CS', 'aCS']
    >>> task2_helper("aCSC108")
    ['aCS']
    >>> task2_helper("")
    []
    """
    your_regex = r""
    return run_me(txt, your_regex)

33 / 39


Answer

import re
from typing import List


def run_me(txt: str, my_regex: str) -> List[str]:
    if len(my_regex) <= 6:
        return re.findall(my_regex, txt)
    return ["Fail"]


def task2_helper(txt: str) -> List[str]:
    """Create a regex that is no more than 6 characters long
    that matches when a string, or substring, starts with
    an 'a' followed by 0 or more numerals and ends
    with the string 'CS'.

    >>> task2_helper("a998289CSC is great aaCS")
    ['a998289CS', 'aCS']
    >>> task2_helper("aCSC108")
    ['aCS']
    >>> task2_helper("")
    []
    """
    your_regex = r'a\d*CS'
    return run_me(txt, your_regex)
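The 6-character limit is what rules out the longer spelling of "a digit". A sketch (repeating run_me so it runs on its own) comparing \d with the equivalent class [0-9]:

```python
import re
from typing import List


def run_me(txt: str, my_regex: str) -> List[str]:
    if len(my_regex) <= 6:
        return re.findall(my_regex, txt)
    return ["Fail"]


text = "a998289CSC is great aaCS"
print(len(r"a\d*CS"), run_me(text, r"a\d*CS"))         # 6 ['a998289CS', 'aCS']
print(len(r"a[0-9]*CS"), run_me(text, r"a[0-9]*CS"))   # 9 ['Fail']
```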

34 / 39


Task 3

import re


def task3(txt: str) -> bool:
    """Return True if the input text contains only
    valid variable names in Python.
    Hint: are the doctests complete?

    >>> task3("x y foo foobar")
    True
    >>> task3("x y x-y foobar")
    False
    >>> task3("foo_bar")
    True
    >>> task3(" ")
    False
    >>> task3("")
    False
    """
    pass

35 / 39


Answer

import re


def task3(txt: str) -> bool:
    """Return True if the input text contains only valid variable names in Python.
    Hint: are the doctests complete?

    >>> task3("x y foo foobar")
    True
    >>> task3("x y x-y foobar")
    False
    >>> task3("foo_bar")
    True
    >>> task3(" ")
    False
    >>> task3("")
    False
    """
    # a name starts with a letter or underscore, then letters, digits or underscores
    res = re.findall(r"[A-Za-z_]\w*", txt)
    if not res:
        return False
    # valid only if every non-space character belongs to some match
    return len(txt.replace(" ", "")) == len("".join(res))
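On the hint: a check based only on the pattern [A-Za-z_]\w* accepts Python keywords such as for, which cannot be used as variable names. One possible stricter variant, a sketch rather than the official answer, checks each word with fullmatch and rejects keywords using the standard keyword module. The name task3_strict is hypothetical, and it splits on any whitespace rather than only on spaces.

```python
import keyword
import re

IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def task3_strict(txt: str) -> bool:
    """Hypothetical variant: every whitespace-separated word must be a
    full identifier match and must not be a Python keyword."""
    words = txt.split()            # splits on any whitespace, not just " "
    if not words:
        return False               # "" and " " contain no names at all
    return all(IDENTIFIER.fullmatch(word) is not None
               and not keyword.iskeyword(word)
               for word in words)


print(task3_strict("x y foo foobar"))   # True
print(task3_strict("for x"))            # False: 'for' is a keyword
print(task3_strict("1abc"))             # False: cannot start with a digit
```

The built-in str.isidentifier() is another option for the per-word test; unlike the ASCII class above, it also accepts non-ASCII letters that Python allows in names.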

36 / 39


Task 4

Rhyming words...

import re
from typing import List


def task4(target: str, valid_words: List[str], quality: int = 2) -> List[str]:
    """Find all words that rhyme with the target string.
    Words are considered to rhyme when their last <quality> characters are identical.

    target: the string to find rhyming words for
    valid_words: contains all words in the language
    quality: how many characters at the end of the target have to match

    >>> task4("bat", ["cat", "mat", "butter", "adjective", "flat"])
    ['cat', 'mat', 'flat']
    >>> task4("batter", ["cat", "weather", "butter", "adjective", "flatter"])
    ['weather', 'butter', 'flatter']
    >>> task4("batter", ["cat", "weather", "butter", "adjective", "flatter"], 3)
    ['butter', 'flatter']
    >>> task4("a", ["cat", "weather", "butter", "adjective", "flatter"])
    []
    """
    pass

37 / 39


Answer

import re
from typing import List


def task4(target: str, valid_words: List[str], quality: int = 2) -> List[str]:
    """Find all words that rhyme with the target string.
    Words are considered to rhyme when their last <quality> characters are identical.

    target: the string to find rhyming words for
    valid_words: contains all words in the language
    quality: how many characters at the end of the target have to match

    >>> task4("bat", ["cat", "mat", "butter", "adjective", "flat"])
    ['cat', 'mat', 'flat']
    >>> task4("batter", ["cat", "weather", "butter", "adjective", "flatter"])
    ['weather', 'butter', 'flatter']
    >>> task4("batter", ["cat", "weather", "butter", "adjective", "flatter"], 3)
    ['butter', 'flatter']
    >>> task4("a", ["cat", "weather", "butter", "adjective", "flatter"])
    []
    """
    if len(target) < quality:
        return []
    v = ",".join(valid_words)
    # \b makes the suffix end at a word boundary, so "bat" is not found
    # inside "batter"; re.escape treats the suffix as literal text
    reg = r"\w+" + re.escape(target[-quality:]) + r"\b"
    return re.findall(reg, v)
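For comparison, the same rule can be written without a regex using str.endswith. This is a sketch, not the course answer; the name task4_endswith is hypothetical, and the len(word) > quality check mirrors the regex answer's requirement of at least one character before the suffix.

```python
from typing import List


def task4_endswith(target: str, valid_words: List[str], quality: int = 2) -> List[str]:
    """Hypothetical non-regex version: keep words that end with the target's
    last <quality> characters and have at least one character before them."""
    if len(target) < quality:
        return []
    suffix = target[-quality:]
    return [word for word in valid_words
            if len(word) > quality and word.endswith(suffix)]


print(task4_endswith("bat", ["cat", "mat", "butter", "adjective", "flat"]))  # ['cat', 'mat', 'flat']
print(task4_endswith("bat", ["batter"]))                                     # []
```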

38 / 39


Next Time

1. Sorting.

2. Time and Complexity.

39 / 39