20
REGULAR EXPRESSIONS (REGEX) 4 / 20 1 / 20

REGUL AR EXPRESSIONS (REGEX)

  • Upload
    others

  • View
    15

  • Download
    0

Embed Size (px)

Citation preview

REGULAR EXPRESSIONS (REGEX)

4 / 201 / 20

REGULAR EXPRESSIONA regular expression (regex) is a pattern that a string may or may not match.Example: [0-9]+

[0-9] means a character in that range“+” means one or more of what came beforeStrings that match:     9125    4Strings that don’t:       abc         empty string

Note: Symbols like [, ], and + have special meaning. They are not part of thestring that matches. We’ll �nd out how to look for these in a string shortly.

4 / 202 / 20

USES FOR REGULAR EXPRESSIONSHandling white space: A program ought to be able to treat any number ofwhite space characters as a separator.Identifying blank lines: Most people consider a line with just spaces on it tobe blank.Validating input: To check that input has the expected format (e.g., a date inDD-MM-YYYY format).

4 / 203 / 20

WHY DO WE USE REGEX?It’s much easier to declare a pattern that you want matched than to writecode that matches it.By having the pattern explicitly declared, rather than implicit in code thatmatches it, it’s much easier to understand what the pattern is and modify itif need be.

4 / 204 / 20

WHERE DO WE USE REGEX?Regular expressions are used in many places.Editors like vi, emacs, and Sublime Text allow you to use regular expressionsfor searching.Many unix commands use regular expressions. Example: grep patternfile prints all lines from �le that match patternMany programming languages provide a library for regular expressions.The syntax varies from context to context, but the core is the sameeverywhere.

4 / 205 / 20

SOME SIMPLE PATTERNS

4 / 206 / 20

ANCHORINGAnchoring lets you force the position of the match.

^ matches the beginning of the line$ matches the end

4 / 207 / 20

ESCAPINGWhat if I want my regular expression to look for the string $ ^ [We would need to escape it’s prede�ned quanti�er meaning by adding in abackslash.

\$ \^ \[

We can also use escapes to get other characters\t is a tab character\n is a newline

4 / 208 / 20

QUANTIFIERS

4 / 209 / 20

PREDEFINED CHARACTERS

4 / 2010 / 20

DEFINING YOUR OWN CHARACTER CLASSES

What if you want to de�ne your own character range?

4 / 2011 / 20

CAPTURING GROUPSCapturing groups allow you to treat multiple characters as a single unit. Useparentheses to group.e.g. (BC)* means zero or more instances of BC, e.g., BC, BCBC, BCBCBC, etc.

4 / 2012 / 20

CAPTURING GROUPS ARE NUMBEREDCapturing groups are numbered by counting their opening parenthesesfrom left to right((A)(B(C))) has the following groups

1. ((A)(B(C)))2. (A)3. (B(C))4. (C)

The numbers are useful because they can used as references to the groups.

4 / 2013 / 20

CAPTURING GROUPS AND BACKREFERENCESThe section of the input string matching the capturing group(s) is saved inmemory for later recall via backreference.A backreference is speci�ed in the regular expression as a backslash (\)followed by a digit indicating the number of the group to be recalled.

For example, one matching string for the regex (\d\d)\1 is 1212One matching string for the regex (\w*)\s\1 is asdf asdf

4 / 2014 / 20

FINDING REGEX PATTERNSWhen given a set of strings that need to be matched, a proper regex for thisset will:

match ALL strings in this set, andwill NOT match any string that is NOT in this set

4 / 2015 / 20

REGULAR EXPRESSIONS IN JAVAThe java.util.regex package contains:

Pattern: a compiled regular expressionMatcher: the result of a match

import java.util.regex.Pattern; import java.util.regex.Matcher;

1 / 2016 / 20

2 / 2017 / 20

MATCHES() VS. FIND()matches():matches the entire string with the regexfind(): tries to �nd a substring that matches the regex

3 / 2018 / 20

GET MORE PRACTICE!Some great tutorials/tools for practicing regex:

https://regexone.com/https://regex101.com/

4 / 2019 / 20

DIFFERENT "FLAVOURS" OF REGEXDifferent languages have different implementations of Regex which maybehave differently.For this course, we will only be testing you on regex usage in JavaFor other types of regex you may use outside this course: know your �avourbefore using it (read the ).manual

4 / 2020 / 20