Colloquium - grep, v1.0

A. Magee

April 6, 2010

Grep Introduction

DESCRIPTION

A brief introduction to the grep command line tool

Outline

1 Introduction
    What does grep offer?
    When should I use grep?

2 Understanding Regular Expressions
    Class Basics
    Quantifiers & Grouping
    Online Tools
    Examples

3 Using Regular Expressions With grep

Introduction What?

What does grep offer?

grep matches regular expressions.

Your first question should be “What is a regular expression?” A regular expression (RE) is a pattern that describes a set of strings.

grep and REs allow us to find complex things in text.

Complex is relative and can vary from a single character to an IP address.

Single character complex: [ajk+0-]

IP complex: ((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
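The two patterns above can be tried out roughly as follows. This is only a sketch, assuming a grep that accepts -E (extended REs); the sample addresses, words and variable names are made up for illustration, and the ^ and $ anchors are added here so that a whole line must be an address.

```shell
# Pattern for one octet (0-255), taken from the slide.
octet='(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
# Three "octet." groups followed by a final octet; ^ and $ anchor the whole line.
ip_re="^(${octet}\.){3}${octet}\$"

# Hypothetical sample input: two valid addresses and two invalid ones.
ip_matches=$(printf '%s\n' 192.168.1.1 256.1.1.1 10.0.0.255 1.2.3 | grep -E "$ip_re")
printf '%s\n' "$ip_matches"

# Single-character set: count the lines containing any of a, j, k, +, 0 or -.
set_count=$(printf '%s\n' jam bed 'x+y' | grep -c '[ajk+0-]')
printf '%s\n' "$set_count"
```

The octet pattern is built once and reused, which keeps the long RE readable.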

Introduction When?

When should I use grep?

Always!

Unless you find some better tool.

P.S. - grep stands for g/re/p, an ed command that means global / regular expression / print.

Regular Expressions Class Basics

Class Basics

A character class is a symbol or collection of symbols that describes a group of characters.

. (period): This matches any single character.

[...]: This matches any one character in the set.

[aeiou] matches one of the vowels.
[a-z] matches one lowercase letter.
[0-5] matches one numeral, 0 through 5.

You will not remember all of these until you use them often, but there are many special classes that can save you some typing.
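A small sketch of the period and the set in action, assuming only a standard grep. The word list and variable names are hypothetical.

```shell
# Hypothetical word list used only for this illustration.
words=$(printf '%s\n' cat cut cot c3t c9t ct)

# . matches any single character, so c.t needs exactly one character between c and t.
dot_hits=$(printf '%s\n' "$words" | grep -c '^c.t$')

# [aeiou] restricts that one character to a vowel.
vowel_hits=$(printf '%s\n' "$words" | grep -c '^c[aeiou]t$')

# [0-5] accepts only the numerals 0 through 5, so c3t matches and c9t does not.
digit_hits=$(printf '%s\n' "$words" | grep -c '^c[0-5]t$')

printf '%s %s %s\n' "$dot_hits" "$vowel_hits" "$digit_hits"
```

grep -c prints the number of matching lines, which makes it easy to compare how much each class lets through.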

Common Classes

Special Class   Meaning                 Simple RE
\d              Digit characters        [0-9]
\D              Non-digit characters    [^0-9]
\w              Word characters         [a-zA-Z_0-9]
\W              Non-word characters     [^a-zA-Z_0-9]
\s              Whitespace characters   [ \f\n\r\t]
\S              Non-space characters    [^ \f\n\r\t]
\b              Word boundary

The word boundary class is very special as it is zero length and matches the transition between a word character (\w) and a non-word character (\W), in either direction, as well as the start or end of a line next to a word character.
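A sketch of the shorthand classes with grep, under the assumption that the grep in use is GNU grep (or another grep that supports \w, \b and -o as extensions). \d is a Perl-style shorthand that grep -E does not define, so the equivalent set [0-9] is used instead. The sample line and variable names are hypothetical.

```shell
line='cat concat cat_food 42 cats'

# \b is zero length: it matches the position between a word and a non-word character.
# -o prints each match on its own line.
whole_words=$(printf '%s\n' "$line" | grep -oE '\bcat\b')

# \w+ matches runs of letters, digits and underscores.
word_count=$(printf '%s\n' "$line" | grep -oE '\w+' | wc -l | tr -d ' ')

# No \d in grep -E, so spell the digit class out as a set.
digits=$(printf '%s\n' "$line" | grep -oE '[0-9]+')

printf '%s\n' "$whole_words" "$word_count" "$digits"
```

Only the first cat is reported by the \b pattern: in concat, cat_food and cats the letters are glued to other word characters, so there is no boundary on both sides.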

More Common Classes

Special Class   Meaning                        Simple RE
[:alpha:]       All alphabetic characters      [a-zA-Z]
[:alnum:]       All alphabetic and numeric     [a-zA-Z0-9]
[:blank:]       Tab and space                  [ \t]
[:cntrl:]       Control characters             [\x00-\x1F\x7F]
[:digit:]       A numeric digit                [0-9]
[:graph:]       Any visible character          [\x21-\x7E]
[:lower:]       Lowercase characters           [a-z]
[:print:]       Printables (i.e. no controls)  [\x20-\x7E]
[:punct:]       Punctuation & symbols          [!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~]
[:space:]       Space, tab, newline, etc       [ \t\r\n\v\f]
[:upper:]       Uppercase characters           [A-Z]
[:word:]        Word characters                [a-zA-Z0-9_]
[:xdigit:]      Hex digits                     [A-Fa-f0-9]

[:word:] is not one of the POSIX classes; it is an extension found in Perl-compatible engines such as pcregrep.
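A few of the bracketed classes from the table, sketched with grep -E. The sample strings and variable names are invented for illustration, and -o (print only the matched text) is assumed to be available, as it is in GNU and BSD grep.

```shell
sample='Route 66, Exit 12B'

# [[:digit:]]+ : runs of digits. Note the double brackets: the class name sits inside a set.
numbers=$(printf '%s\n' "$sample" | grep -oE '[[:digit:]]+')

# [[:upper:]][[:lower:]]+ : a capital letter followed by lowercase letters.
capitalised=$(printf '%s\n' "$sample" | grep -oE '[[:upper:]][[:lower:]]+')

# [[:xdigit:]]+ anchored at both ends: the whole line must be hexadecimal.
hex_lines=$(printf '%s\n' DEADBEEF c0ffee xyz | grep -cE '^[[:xdigit:]]+$')

printf '%s\n' "$numbers" "$capitalised" "$hex_lines"
```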

Regular Expressions Quantifiers & Grouping

Quantifiers & Grouping

Quantifiers are how a RE counts things.

?       Exactly zero or one occurrence
*       Zero or more occurrences
+       One or more occurrences
*?      Zero or more occurrences, non-greedy
+?      One or more occurrences, non-greedy
{x}     Exactly x occurrences
{x,}    At least x occurrences
{x,y}   At least x but no more than y occurrences

Grouping is used to collect patterns together and to create back-references. A group is simply a set of parentheses ().
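The quantifiers and a back-reference can be sketched with grep -E as below. The word lists are hypothetical. Two assumptions are worth stating: the non-greedy forms *? and +? are Perl-style and are not part of grep -E, so they are not shown, and back-references such as \1 under -E are an extension (supported by GNU grep) rather than something every grep guarantees.

```shell
words=$(printf '%s\n' color colour colouur)

# ? : the u is optional (zero or one), so color and colour match but colouur does not.
optional=$(printf '%s\n' "$words" | grep -cE '^colou?r$')

# {2,} : at least two u characters in a row.
repeated=$(printf '%s\n' "$words" | grep -E '^colou{2,}r$')

# A group plus the back-reference \1 finds a doubled letter:
# \1 must repeat whatever the group ([a-z]) matched.
doubled=$(printf '%s\n' hello world bookkeeper | grep -E '([a-z])\1')

printf '%s\n' "$optional" "$repeated" "$doubled"
```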

Regular Expressions Online Tools

Helpful Tools

The best way to understand the rest of this presentation is to see what is being matched live. Here are some online tools that work for our needs.

RegExr - www.gskinner.com/RegExr
    beware Flash, but it works well

regexpal - regexpal.com
    very simple

reanimator - osteele.com/tools/reanimator
    beware Flash, recommend CS 4/570 first

rubular - rubular.com
    nice on-page reference

Regular Expressions Examples

Your First RE

Let’s skip trivial REs and get on to something useful. These may be more complex than you’re used to, but the quicker you are able to read long, complex REs the better. This is a nice, but not perfect, email address matcher.

[[:alnum:]][[:word:]\.%+-]*@(?:[[:alnum:]-]+\.)+[[:alpha:]]{2,4}

[[:alnum:]][[:word:]\.%+-]*
Match a word that doesn’t start with [.%+-].

@(?:[[:alnum:]-]+\.)+
Match the @ symbol and any number of subdomains followed by periods.

[[:alpha:]]{2,4}
Match the top level domain of 2, 3 or 4 characters.
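The matcher above uses two Perl-style features, [[:word:]] and (?: ), so as written it needs a Perl-compatible tool such as pcregrep. The sketch below is a translation to grep -E, not the slide's exact pattern: [[:alnum:]_] stands in for [[:word:]], a plain group replaces (?: ), and the period inside the set is left unescaped. The addresses and variable names are hypothetical; -x asks grep to match whole lines only.

```shell
# ERE translation of the email matcher (an adaptation, see the assumptions above).
email_re='[[:alnum:]][[:alnum:]_.%+-]*@([[:alnum:]-]+\.)+[[:alpha:]]{2,4}'

# Hypothetical candidates: two ordinary addresses, one non-address,
# one with a leading period, and one with a six-letter top level domain.
valid=$(printf '%s\n' \
    'alice@example.com' \
    'bob.smith+tag@mail.example.org' \
    'not-an-address' \
    '.dot@example.com' \
    'carol@example.museum' | grep -xE "$email_re")

printf '%s\n' "$valid"
```

The last candidate is a real-looking address that is rejected because of the {2,4} limit, which is one reason the matcher is described as not perfect.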

Your First RE - Part 2

Let’s examine the first part.

[[:alnum:]][[:word:]\.%+-]*

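[[:alnum:]] - Must start with an alphanumeric character.
NB: All [: ... :] classes must live in a set like [[: ... :]].

[[:word:]\.%+-] - Other characters may be a ‘word’ character, a literal period, percent symbol, plus symbol or a dash.
NB: Outside a set the period must be escaped because it has special meaning. Inside a set it is already literal; the escape is harmless in Perl-compatible engines, but POSIX grep treats the backslash as one more member of the set.

* - Repeat the previous set zero or more times.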
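The first part can be examined on its own with grep -E. This is a sketch under the same assumption as before: [[:alnum:]_] is substituted for the Perl-style [[:word:]], and the period in the set is left unescaped. The sample strings and variable names are hypothetical.

```shell
local_re='[[:alnum:]][[:alnum:]_.%+-]*'

# -o prints only the matched text; ^ keeps the match at the start of the line.
local_part=$(printf '%s\n' 'bob.smith+tag@mail.example.org' | grep -oE "^$local_re")

# A leading period is rejected because the first character must be alphanumeric.
whole_line_hits=$(printf '%s\n' '.bob' 'bob' | grep -cE "^$local_re\$")

# In a POSIX set the backslash is an ordinary character, so [\.] also matches a backslash.
backslash_hits=$(printf '%s\n' 'a\b' | grep -cE '[\.]')

printf '%s\n' "$local_part" "$whole_line_hits" "$backslash_hits"
```

The match stops at the @ because that character is not in the set.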

Your First RE - Part 3

Now the second part, the subdomains, sub-subdomains, etc.

@(?:[[:alnum:]-]+\.)+

@ - Well, that literally matches the ‘at’ character.

The opening parenthesis denotes the beginning of a group. The ?: is a confusing, Perl-style notation that suppresses the creation of a back-reference. It is here so you’ll know of it, but it is rarely needed.

Again we see a special class for alphanumerics, but we’ve also included a dash. The plus symbol tells us to look for one or more of these characters, followed by a period.

And lastly we close the group, and the plus symbol now tells us to look for one or more of these groups.
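The repeated group can be watched at work with grep -E. Since (?: ) is not available there, this sketch assumes an ordinary capturing group instead, which matches the same text. The addresses and variable names are hypothetical.

```shell
domain_re='@([[:alnum:]-]+\.)+'

# One label followed by a period satisfies the group once.
one=$(printf '%s\n' 'user@example.com' | grep -oE "$domain_re")

# The + after the group lets it repeat, covering sub-subdomains too.
many=$(printf '%s\n' 'user@a.b-c.example.com' | grep -oE "$domain_re")

# With no period after the @ the group cannot match at all.
# grep exits non-zero when nothing matches, hence the "|| true".
none=$(printf '%s\n' 'user@localhost' | grep -cE "$domain_re" || true)

printf '%s\n' "$one" "$many" "$none"
```

In both matching cases the text stops at the last period, leaving the final label for the top level domain part of the RE.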

Your First RE - Part 4

Finally the third part, the top level domain.

[[:alpha:]]{2,4}
Well, now this part is easy. Just match 2, 3 or 4 alphabetical characters.

Your Second RE

Now we’ll look at a RE that can help us build a header file for a C program file, given that some neglectful programmer has failed to design his/her C program properly. This will be a quicker example.

^[\w\s]*\([\w\s\*&,]*\)\s*{

^[\w\s]*\(
At the beginning of a line, match some keywords and types and the function name, and then a literal opening parenthesis.

[\w\s\*&,]*
Match some more words, keywords, variable modifiers and commas.

\)\s*{
Finally match the closing parenthesis, some whitespace and the left curly brace, denoting the start of the function body.
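One way the header-building idea might look in practice is sketched below. The slide's pattern relies on \w and \s inside sets, which needs a Perl-compatible tool; this version is a translation for grep -E, with [[:alnum:]_] standing in for \w and [[:space:]] for \s. The C source, the variable names and the sed step that turns each opening brace into a semicolon are all assumptions made for illustration.

```shell
# Hypothetical C source used only for this illustration.
c_source='#include <stdio.h>
int add(int a, int b) {
    return a + b;
}
static void log_msg(const char *msg, int level) {
    printf("%d %s\n", level, msg);
}'

# ERE translation of the slide pattern.
func_re='^[[:alnum:]_[:space:]]*\([[:alnum:]_[:space:]*&,]*\)[[:space:]]*\{'

# Keep the matching lines, then replace the trailing " {" with ";" to form prototypes.
prototypes=$(printf '%s\n' "$c_source" | grep -E "$func_re" | sed 's/[[:space:]]*{$/;/')
printf '%s\n' "$prototypes"
```

Like the original, this pattern is deliberately simple: a return type containing * would not match, and a line such as "while (n) {" would, so the output still deserves a quick read before it goes into a header.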

Your Second RE - Fine Details

^[\w\s]*\([\w\s\*&,]*\)\s*{

In general, line-oriented tools such as grep read their input one line at a time, so a RE will not match across multiple lines, even though the \s class matches the newline character. This is very bothersome but is easily overcome by using pcregrep with its multiline option (-M). pcre is Perl Compatible Regular Expressions. This is all I will ever say about Perl.

Notice that the literal * is escaped like so, \*. (Inside a set, as here, the escape is optional.)

The literal parentheses must be escaped too, due to their special RE meaning.

Escaping so many characters is very annoying, but unfortunately it is necessary.
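The effect of escaping can be seen in a short grep -E sketch. The input lines and variable names are hypothetical.

```shell
lines=$(printf '%s\n' 'f(x)' 'fx' 'a*b' 'aab')

# Escaped parentheses are literal characters.
literal_parens=$(printf '%s\n' "$lines" | grep -E 'f\(x\)')

# Unescaped, they form a group, so the pattern is effectively just "fx".
grouped=$(printf '%s\n' "$lines" | grep -E 'f(x)')

# \* is a literal asterisk; a bare * means "zero or more of the previous item".
literal_star=$(printf '%s\n' "$lines" | grep -E 'a\*b')
bare_star=$(printf '%s\n' "$lines" | grep -E '^a*b$')

printf '%s\n' "$literal_parens" "$grouped" "$literal_star" "$bare_star"
```

When nothing in a pattern is meant as a metacharacter, grep -F treats the whole pattern as a fixed string and no escaping is needed.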

Appendix

4 Appendix
