CIS 191: Linux and Unix Class 4 October 7th, 2015

Next week

• Lecture on Makefiles
• Xiuruo OOO

Running at

• In Ubuntu, you’ll probably need to install at
  – sudo apt-get install at
  – It should just work after this…
• In OSX, at relies on the atrun daemon to manage its jobs
  – See man atrun

“The atrun utility runs commands queued by at(1). It is invoked periodically by launchd(8) as specified in the com.apple.atrun.plist property list. By default the property list contains the Disabled key set to true, so atrun is never invoked. Execute the following command as root to enable atrun: launchctl load -w /System/Library/LaunchDaemons/com.apple.atrun.plist”

Outline

Language Theory Overview

Grep Regular Expressions

Examples of Grep Regular Expressions

Sed

Languages

• A set of strings of symbols
• These symbols form an “alphabet”
• The language is “decided” by some process which decides if a string is in the language or not

Regular Languages

• A regular language is a set that can be decided by viewing a single character at a time, using a fixed amount of memory!
  – Specifically, regular languages are languages that can be decided by a DFA (deterministic finite automaton); you’ll learn more about this in CIS 262 if you haven’t taken it already.
• It doesn’t matter how long the string is!
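A minimal sketch of the "one character at a time, fixed memory" idea, assuming a POSIX shell. The function name ends_in_b and its two-state encoding are illustrative and not from the lecture. It decides the language of strings of 'a's and 'b's that end in 'b', and the only thing it remembers between characters is a single state variable.

```shell
# Hypothetical two-state DFA: accept strings over {a,b} that end in 'b'.
ends_in_b() {
    state=0          # 0 = not accepting, 1 = the last character read was 'b'
    rest=$1
    while [ -n "$rest" ]; do
        ch=${rest%"${rest#?}"}   # first character of the remaining input
        rest=${rest#?}           # drop that character
        case $ch in
            a) state=0 ;;
            b) state=1 ;;
            *) return 1 ;;       # symbol outside the alphabet: reject
        esac
    done
    [ "$state" -eq 1 ]
}

for s in b aab abab aba; do
    if ends_in_b "$s"; then echo "accept: $s"; else echo "reject: $s"; fi
done
```

However long the input is, the loop never stores more than the current state, which is what makes the language regular.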

Regular Expressions

• A regular expression exactly describes a regular language
  – That is, every regular language can be described by some regular expression
  – And every regular expression describes a regular language

Regular Expressions Illustrated

• Suppose A and B are regular languages.

Regular Extensions

• A few extensions to classical regular expressions that stay within regular languages
  – If A is an RE, then A+ matches one or more copies of A
  – If A is an RE, then A? matches one or no copies of A

Core regex in one page

• ABC
  – Sequence of A, B, and C, exactly one copy of each
• A | B
  – A or B
• *
  – >= 0 copies
• +
  – >= 1 copies
• ?
  – 0 or 1 copies

Truly Regular Expressions

• abc matches only the string “abc”
• (ab)* matches the empty string “”, “ab”, “abab”, …
• (a|b)+ matches any nonempty string made up only of ‘a’s and ‘b’s
• (a*b)+ matches any number of ‘a’s followed by a single ‘b’, repeated at least once
  – In other words, any string of ‘a’s and ‘b’s which ends in a ‘b’.
• a(b|c)*a matches any string which starts and ends with an ‘a’ and has only ‘b’s and ‘c’s in between.
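The claims above can be checked from the command line. This is a small sketch assuming GNU grep; the helper name matches is made up for illustration. It uses -x so the whole string has to match the expression, and -q so only the exit status is reported.

```shell
# Hypothetical helper: succeeds if the whole string ($2) matches the extended regex ($1).
matches() {
    printf '%s\n' "$2" | grep -Eqx -- "$1"
}

matches '(ab)*'    ''       && echo "(ab)* accepts the empty string"
matches '(ab)*'    'abab'   && echo "(ab)* accepts abab"
matches '(a|b)+'   'abba'   && echo "(a|b)+ accepts abba"
matches '(a*b)+'   'aabab'  && echo "(a*b)+ accepts aabab"
matches '(a*b)+'   'aba'    || echo "(a*b)+ rejects aba, which does not end in b"
matches 'a(b|c)*a' 'abcbca' && echo "a(b|c)*a accepts abcbca"
```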

More Regular Expression Extensions

• There are a number of extensions that allow for more concise representation
  – . (dot) matches any single character (any character at all)
  – [cde] matches any single character listed between the square brackets (here: c, d, or e)
  – [h-l] matches any character in the range of characters from h to l
    • To match any character not in the list, place a caret (^) first inside the brackets.
      – [^0-9] matches anything that is not a digit.
  – If A is an RE, then A{n,m} matches anywhere between n and m copies of A, inclusive.
  – A{n} matches exactly n copies of A.
• On this slide, ., [, ], {, and } are metacharacters.
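A short sketch of these extensions in action, assuming GNU grep with -E. The word list is made-up sample input, and -x again forces each pattern to match the whole line.

```shell
# Made-up sample input, one word per line
words=$(printf '%s\n' cat cot c9t hill mill 2015 20155)

# . matches any single character: cat, cot and c9t all match
dot=$(printf '%s\n' "$words" | grep -Ex 'c.t')

# [^0-9] excludes digits, so c9t drops out
nodigit=$(printf '%s\n' "$words" | grep -Ex 'c[^0-9]t')

# [h-l] is a range: "hill" starts inside it, "mill" does not
range=$(printf '%s\n' "$words" | grep -Ex '[h-l]ill')

# {4} demands exactly four digits, so 20155 is rejected
year=$(printf '%s\n' "$words" | grep -Ex '[0-9]{4}')

printf '%s\n' "$dot" "$nodigit" "$range" "$year"
```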

Metacharacters

• A certain number of predefined shortcuts (character classes) are provided.
  – [[:space:]], or ‘\s’, matches any whitespace character.
  – [[:alnum:]] matches any letter or digit; ‘\w’ matches any “word” character
    • By which we mean letters and numbers, plus the underscore (_) in GNU grep and sed
  – [[:digit:]] matches any digit (0-9); the shorthand ‘\d’ is Perl syntax and is not understood by grep -E or sed -r
  – ^ matches “beginning-of-line”
  – $ matches “end-of-line”
  – \< and \> match word boundaries (the start and the end of a word, respectively)

Metacharacters

• \\ matches backslash (\)
  – Since \ is normally used to specify other metacharacters
• \* matches an asterisk
  – Since * normally means “zero or more copies of the preceding item”
• \. matches a dot
• Metacharacters need to be preceded by a backslash in order to match the literal character
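To see what escaping buys you, here is a small sketch assuming GNU grep with -E; the sample lines are invented for illustration. Note the single quotes around each pattern, which keep the shell from touching the backslashes.

```shell
# Made-up sample lines
lines=$(printf '%s\n' 'a.b' 'axb' '2*3' 'C:\temp')

# An unescaped dot matches any character, so both a.b and axb match
loose=$(printf '%s\n' "$lines" | grep -E 'a.b')

# An escaped dot matches only a literal dot
strict=$(printf '%s\n' "$lines" | grep -E 'a\.b')

# \* is a literal asterisk and \\ is a literal backslash
star=$(printf '%s\n' "$lines" | grep -E '2\*3')
slash=$(printf '%s\n' "$lines" | grep -E '\\')

printf '%s\n' "$loose" "$strict" "$star" "$slash"
```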

“Regular” Expressions: a Misnomer

• Just about any name but “regular” would have been better!
  – Many extensions describe non-regular languages
  – The syntax and behavior are different for just about every system involving regular expressions!
  – What needs escaping changes based on implementation
    • In fact, Vim has four different settings for this.
      – See “:help magic”
  – The way we describe or apply regular expressions and gather the matches differs across settings.

New Skill

xkcd.com/208

Our focus: grep and sed

• As we’ve discussed, grep applies a regular expression to each line in the input file or files
• sed is a stream editor
  – More on this soon…

Motivating Examples

• We’re usually searching for a particular kind of text
  – An integer, maybe with a minus sign in front
  – A decimal number (for example 2.718)
  – A first name followed by a last name
    • Or maybe last, first
  – An email address
  – Sentences beginning with the word “The”, ending with punctuation.
  – A phone number
  – Prime numbers
    • This really does exist, but it relies on backreferences and is rather inefficient…

Integers and Decimals

• Integers start with an optional -, followed by one or more digits. The perfect regular expression is therefore…
  – -?[[:digit:]]+
  – -?\d+ (Perl-style shorthand; works with grep -P, but not with grep -E)
• How about decimals? First, we need a characterization.
  – There is an optional minus sign, then an optional string of digits, followed by a ., then a string of digits.
  – -?[[:digit:]]*\.[[:digit:]]+
  – -?\d*\.\d+ (again Perl-style)
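A quick way to try both expressions, sketched under the assumption of GNU grep. The sample sentence is invented. Because each pattern begins with a minus sign, -- is used to tell grep that the options are over; -o prints each match on its own line.

```shell
# Made-up sample text
text='Offsets: -42, 7, 2.718, -.5 and 10.'

# Integers: optional minus sign, then one or more digits.
# Note that this also picks up the digit runs inside 2.718 and -.5.
ints=$(printf '%s\n' "$text" | grep -oE -- '-?[[:digit:]]+')

# Decimals: optional minus, optional digits, a literal dot, then digits.
# "10." is skipped because no digits follow the dot.
decs=$(printf '%s\n' "$text" | grep -oE -- '-?[[:digit:]]*\.[[:digit:]]+')

printf 'integers:\n%s\n' "$ints"
printf 'decimals:\n%s\n' "$decs"
```

The integer pattern matching fragments of decimals is a reminder that a regex only describes what to match; deciding where a number starts and ends in running text usually needs anchors or word boundaries as well.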

Names

• Let’s begin with a characterization of First Name Last Name format.
  – A capital letter, followed by any number of letters, then a space, then another capital followed by any number of letters
• Now, let’s come up with the regular expression
  – [A-Z]\w*\s[A-Z]\w*
• Do you see any potential issues with this approach?
  – What about hyphenated names? Multiple names? Middle initials? Middle names written out?
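The issues listed above are easy to reproduce. This sketch assumes GNU grep (for \w and \s); the names are sample data, and the looser pattern is only one hypothetical way to relax the rule, not the course's answer.

```shell
# Sample names, one per line
names=$(printf '%s\n' 'Ada Lovelace' 'Mary-Jane Watson' 'John Q Public')

# The slide's pattern, anchored to the whole line with -x:
# only the plain "First Last" entry survives
plain=$(printf '%s\n' "$names" | grep -Ex '[A-Z]\w*\s[A-Z]\w*')

# Hypothetical relaxation: allow hyphens inside a name and one or more
# further capitalized words (middle names or initials)
loose=$(printf '%s\n' "$names" | grep -Ex '[A-Z][[:alpha:]-]*(\s[A-Z][[:alpha:]-]*)+')

printf 'strict:\n%s\n' "$plain"
printf 'relaxed:\n%s\n' "$loose"
```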

Aside: Solve the Problem You Want to

• Many regular expressions will match the target
  – But some are easier to construct (and to understand) than others.
• If you know a little more about the text you will be handling, you can sometimes take shortcuts
  – This will become more apparent when we get to replacing (rather than just matching) text.
• Modifying the problem is a major theme throughout computer science, and in this course as well!

Aside #2: Evil Regular Expressions!!!

• There are two main kinds of RE engines.
  – NFA (Nondeterministic Finite Automaton) engines step through the regex and may backtrack on the input text
  – DFA (Deterministic Finite Automaton) engines always move forward in the string character by character
  – Nonbacktracking NFA engines do exist…
  – See http://swtch.com/~rsc/regexp/regexp1.html for more details on the differences.
• The runtime can increase drastically for the following
  – Repetitions of overlapping alternations
  – Repetitions within repetitions
  – Repetitions containing both wildcards and normal characters

Aside #2: Some evil examples

• Can you figure out why these might be “evil”?
  – (x*)*
  – (x.)*
  – (x|xx)*
  – (x|x?)*
  – The prime number checker we mentioned earlier
• Think about how they behave on the string
  – xxxxxxxxxxxxxxxxy
• In a backtracking engine, matching (x*)* is exponential because each ‘x’ can be consumed either by the inner x* or by a new repetition of the outer group; every time the engine sees an ‘x’, the number of potential matching paths doubles!

ReDoS

• Regular expression denial of service
• Use evil regex to attack a service that accepts arbitrary regex
• https://en.wikipedia.org/wiki/ReDoS

grep with extended regex

• Generally, we want to use extended regular expressions (as we discussed earlier)
  – So when you call grep, call it with the -E flag

ps -aux

• All processes
• You can look up a particular process using grep…

ps aux

$ ps -aux | grep yes | less

ps aux with word boundary

$ ps -aux | grep -w yes | less

C identifiers

• Suppose we want to find all uses of the function strfry in the directory chef

• We can use Bash expansions and grep together!

$ grep -E strfry *.c
chef.c: strfry(p_str);
chef.c: cond ? strfry(uuname) : uuname
recipes.c: is_strfry_ingredient(p_src)

C Identifiers

• But grep included results that we didn’t want, such as is_strfry_ingredient

• What can we do?
  – Include word boundaries! (Quote the pattern so that Bash does not strip the backslashes.)

$ grep -E '\<strfry\>' *.c
chef.c: strfry(p_str);
chef.c: cond ? strfry(uuname) : uuname
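The effect of the word boundaries can be reproduced without any C files. This sketch assumes GNU grep; it feeds the three lines from the example through a pipe and uses -c to count matching lines.

```shell
# The three source lines from the example, supplied on standard input
src=$(printf '%s\n' 'strfry(p_str);' 'cond ? strfry(uuname) : uuname' 'is_strfry_ingredient(p_src)')

# A plain search matches every line that contains the substring
all=$(printf '%s\n' "$src" | grep -cE 'strfry')

# \< and \> require a word boundary on each side; the underscores in
# is_strfry_ingredient count as word characters, so that line is skipped
bounded=$(printf '%s\n' "$src" | grep -cE '\<strfry\>')

# -w is shorthand for the same whole-word test
whole=$(printf '%s\n' "$src" | grep -cw 'strfry')

echo "plain: $all, bounded: $bounded, -w: $whole"
```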

Grepping for Hardware…

• Another common scenario: attempting to find a particular piece of hardware

• The lspci command will spit out a list of available PCI (Peripheral Component Interconnect) devices

$ lspci | grep -i Network
Ethernet controller: Intel 82566MM Gigabit
Network controller: Intel PRO/Wireless

Grepping for Hardware

• Which kernel modules are related?

$ lsmod | grep -i iwl
iwl4965 202721 0
iwl_legacy 146875 1 iwl4965
mac80211 267163 2 iwl4965,iwl_legacy
cfg80211 170485 3 iwl4965,iwl_legacy,mac80211

Display only the matching text

• Generally, when grep finds a match, it will display the entire line
• Most of the time this is what you want!
• But when you are trying to extract a match from the text
  – Like when you are looking for an address or a phone number…
• You may want to only display the match.
• You can do this with the -o option
  – grep -oE 'regular expression' file_list
  – displays just the matches on separate lines
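For instance, pulling phone numbers out of running text might look like the sketch below. It assumes GNU grep, and the sentence and numbers are invented sample data.

```shell
# Made-up sample text containing two phone numbers
notes='Call 215-555-0100 before noon, or 215-555-0199 after.'

# -o prints each match on its own line instead of the whole input line
phones=$(printf '%s\n' "$notes" | grep -oE '[0-9]{3}-[0-9]{3}-[0-9]{4}')

printf '%s\n' "$phones"
```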

Greedy Matching

• Let’s write a regular expression to match all instances of html tags of the form <p>, <em>, <title>…
  – <.*>
• What if we run this on
  – <strong>Hi! I’m an example!</strong>
• We’ll get the following match:
  – <strong>Hi! I’m an example!</strong>

What went wrong?

• Grep matches expressions greedily.
• This means that it will try to match as much as it can (if there is more to match in a line, it will do so, even if it has already found a match!)
• While there are some syntaxes (such as Perl) which allow for lazy matching, Grep’s extended regex syntax does not allow this!
• You can use Perl syntax with grep -P, but we are not allowing that for assignments in this class.

A right answer (without greed)

• <strong>Hi! I’m an example!</strong>
• What if we try the following expression:
  – <[^>]*>
• This matches an opening angle bracket, then any run of characters that are not a closing angle bracket, then the closing angle bracket itself.
• Hallelujah! Success! We get
  – <strong>
  – </strong>
• Just as we expected.
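The two behaviors side by side, as a sketch assuming GNU grep; the apostrophe in the sample sentence is dropped only to keep the shell quoting simple.

```shell
html='<strong>Hi! I am an example!</strong>'

# Greedy: .* runs to the last > on the line, so the whole line is one match
greedy=$(printf '%s\n' "$html" | grep -oE '<.*>')

# Negated class: [^>]* cannot cross a >, so each tag is matched separately
tags=$(printf '%s\n' "$html" | grep -oE '<[^>]*>')

printf 'greedy:\n%s\n' "$greedy"
printf 'per tag:\n%s\n' "$tags"
```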

Sed Introduction

• The man page for sed describes it as “a stream editor for filtering and transforming text.”
• You should always run sed with the -r option, which allows for extended regular expressions
  – Noticing a pattern here?
• You also always want to give sed its regular expressions in single quotes, which tells Bash not to expand dollar signs, asterisks, question marks, and so on

Sed Syntax

• sed substitution commands take the form
  – s/regex/replacement/flags
• The g flag tells sed not to stop after the first replacement
  – Think “globally”
• Patterns can be captured in parentheses, and used in the replacement with backreferences
  – Sort of like storing matched information in variables…
  – Tell sed to store this information using extra parentheses in your expression. Refer to them later with \1 for first group, \2 for second group…
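A compact sketch of the three ideas above, assuming GNU sed with -r. The input strings are invented for illustration.

```shell
# Without g, only the first match on the line is replaced
first=$(echo 'a-b-c' | sed -r 's/-/+/')

# With g, every match on the line is replaced
every=$(echo 'a-b-c' | sed -r 's/-/+/g')

# Two capture groups, swapped in the replacement via \2 and \1
swapped=$(echo 'key=value' | sed -r 's/(\w+)=(\w+)/\2=\1/')

printf '%s\n' "$first" "$every" "$swapped"
```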

Regular Expression Parenthesis Groups

• Groups are numbered from the outside in first, then from left to right.
• Recall the Name example from earlier
  – [A-Z]\w*\s[A-Z]\w*
• If we rewrite the expression as
  – (([A-Z]\w*)\s([A-Z]\w*))
• Group “1” matches the full name
• Group “2” matches the first name
• Group “3” matches the last name
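One way to see the numbering is to print each group with a label. This is a sketch assuming GNU sed; the name is sample input and the bracketed labels are just for display.

```shell
# Label each backreference so the group numbering is visible
out=$(echo 'Ada Lovelace' |
    sed -r 's/(([A-Z]\w*)\s([A-Z]\w*))/full=[\1] first=[\2] last=[\3]/')

echo "$out"
```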

Sed Examples

$ echo "hello" | sed -r 's/lo/p/'
help
$ echo "Here is a sentence" | sed -r 's/is/was/'
Here was a sentence
$ echo "This is a sentence" | sed -r 's/is/is not/'
This not is a sentence
$ echo "This is a sentence" | sed -r 's/is/XXX/'
ThXXX is a sentence
$ echo "This is a sentence" | sed -r 's/is/is not/g'
This not is not a sentence
$ echo "This is a sentence" | sed -r 's/\<is\>/is not/g'
This is not a sentence

Another Sed example

• Consider translating a list of phone numbers from
  – (xxx)-xxx-xxxx to
  – xxx-xxx-xxxx
• We need to replace the parenthesized part of the numbers with its contents…
• sed -r 's/\(([0-9]{3})\)/\1/' numbers
  – Extra parentheses tell sed to store the matched number
  – \1 grabs the matched text as a backreference
• But there’s a simpler solution… Remove the parentheses!
  – sed -r 's/[()]//g' numbers
  – The g flag matters here: each number has two parentheses to remove
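Both approaches can be compared on a couple of made-up numbers. This sketch assumes GNU sed and reads from a pipe rather than a file called numbers. The second command uses the bracket expression [()] together with the g flag, since every line contains two parentheses.

```shell
# Made-up phone numbers in (xxx)-xxx-xxxx form
numbers=$(printf '%s\n' '(215)-555-0100' '(610)-555-0199')

# Capture the area code and put it back without its parentheses
with_group=$(printf '%s\n' "$numbers" | sed -r 's/\(([0-9]{3})\)/\1/')

# Or simply delete every parenthesis on the line
strip_parens=$(printf '%s\n' "$numbers" | sed -r 's/[()]//g')

printf '%s\n' "$with_group" "$strip_parens"
```

The second form is shorter, but it also removes parentheses anywhere else on the line, so it is only safe when you know the input contains nothing but phone numbers.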

Another Example

• Consider changing a list of names from (Last, First) to (First, Last)
• As usual, we need to characterize the input first
  – A capital letter, followed by any number of letters, then a comma and a space; finally, one more capital letter and any number of other letters.
• And the sed expression?
  – sed -r 's/([A-Z]\w*),\s([A-Z]\w*)/\2, \1/g'
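Trying the expression on two sample names, assuming GNU sed (the names are invented test data):

```shell
# Swap the two captured words around the comma
swapped=$(printf '%s\n' 'Lovelace, Ada' 'Hopper, Grace' |
    sed -r 's/([A-Z]\w*),\s([A-Z]\w*)/\2, \1/g')

printf '%s\n' "$swapped"
```

As with the earlier name pattern, hyphenated or multi-word names would need a broader character class than \w.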