Sed – Tips and Tricks
Logan Palanisamy
Agenda
Basics
Bio Break
Intermediate Concepts
Q & A
What is sed
Non-interactive
Non-screen oriented
Line oriented
Input file can be any size
Input file not affected
sed syntax
sed [options] 'cmd' in_file(s)
sed [options] 'cmd' in_file(s) [> out_file]
Standard input: sed [options] 'cmd' < in_file [> out_file]
Pipelined input: command | sed [options] 'cmd' [> out_file]
Simple examples
Example                                       Explanation
sed 's/pat1/pat2/g' file                      Substitute all occurrences of pat1 with pat2
sed '/pat1/d' file                            Delete lines containing pat1 from the output
sed '/pat1/w newfile' file                    Save lines containing pat1 to newfile
sed -n '30,40p' file                          Print lines 30 to 40
sed '10q' file                                Print the top 10 lines
sed -e 's/pat1/pat2/' -e 's/pat3/pat4/' file  Substitute pat1 with pat2, and pat3 with pat4
sed 's/pat1/pat2/;s/pat3/pat4/' file          Substitute pat1 with pat2, and pat3 with pat4
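A small sketch of three of these commands run on made-up input. It assumes GNU sed; the sample lines and variable names are invented for illustration.

```shell
# Sample input (hypothetical data)
input=$(printf 'apple pie\nbanana split\napple tart\n')

# Substitute every "apple" with "cherry"
subst=$(printf '%s\n' "$input" | sed 's/apple/cherry/g')

# Delete lines containing "banana"
deleted=$(printf '%s\n' "$input" | sed '/banana/d')

# Print the first two lines, then quit
top2=$(printf '%s\n' "$input" | sed '2q')

printf '%s\n---\n%s\n---\n%s\n' "$subst" "$deleted" "$top2"
```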
Different sed options/switches
Option          Remarks
-e              Used when multiple commands are given on the command line
-n              Suppress automatic printing of the pattern space
-f script-file  Execute the commands in script-file
-r              Use extended regular expressions in the script. With this option, characters such as (, ), {, }, | become meta characters, and don't have to be escaped.
-s              Consider files as separate rather than as a single continuous long stream
-i              In-place editing of the input file
The "-f" option - example
cat mysed.txt
/pat1/ s/this/that/g
/pat2/ s/before/after/

sed -f mysed.txt in_file
Note: apostrophes are not used around the commands inside the script file.
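A runnable sketch of the script-file approach. The two script lines are the ones shown on the slide; the input lines and the temporary directory are assumptions made for this example.

```shell
tmp=$(mktemp -d)

# Script file: one sed command per line, no surrounding apostrophes
cat > "$tmp/mysed.txt" <<'EOF'
/pat1/ s/this/that/g
/pat2/ s/before/after/
EOF

# Hypothetical input file
printf 'pat1 this and this\npat2 before noon\nother this before\n' > "$tmp/in_file"

result=$(sed -f "$tmp/mysed.txt" "$tmp/in_file")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Only the lines matching each address are changed; the third line passes through untouched.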
With and without "-s" option - Comparison
Lines in f1  Lines in f2  sed -n '1,10p' f1 f2                sed -ns '1,10p' f1 f2
4            5            9 lines                             4 lines from f1, 5 lines from f2
6            6            6 lines from f1, 4 lines from f2    6 lines from f1, 6 lines from f2
12           10           10 lines from f1, no lines from f2  10 lines from f1, 10 lines from f2
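The 6-line/6-line case can be checked with a short sketch. It assumes GNU sed (the -s option is a GNU extension); the files are generated with seq purely for illustration.

```shell
tmp=$(mktemp -d)
seq 1 6     > "$tmp/f1"   # 6 lines
seq 101 106 > "$tmp/f2"   # 6 lines

# Without -s the two files form one 12-line stream, so 1,10 selects 10 lines
joined=$(sed -n '1,10p' "$tmp/f1" "$tmp/f2" | wc -l)

# With -s line numbers restart for each file, so all 6 + 6 lines are selected
separate=$(sed -ns '1,10p' "$tmp/f1" "$tmp/f2" | wc -l)

echo "without -s: $joined lines, with -s: $separate lines"
rm -rf "$tmp"
```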
Address Specification
Range          Remarks
10             Just line 10
1,10           Lines 1 to 10 (inclusive)
10,$           Line 10 to end of file
10,+3          Line 10 and 3 lines below (lines 10, 11, 12 and 13)
10~3           Line 10 and every third line after it (lines 10, 13, 16, ...)
10,~3          Line 10 up to the next line whose number is a multiple of 3 (lines 10, 11 and 12)
/pat1/         All lines containing pat1
/pat1/,+3      Lines containing pat1 and the three lines following each
/pat1/,~3      From a line containing pat1 up to the next line whose number is a multiple of 3
/pat1/,20      Lines between the line containing pat1 and line 20 if pat1 appears before line 20. Otherwise, just the line containing pat1
/pat1/,/pat2/  Lines between the line containing pat1 and the next line containing pat2 (inclusive)
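The address forms can be tried on numbered input. This sketch assumes GNU sed (addr,+N, first~step and addr,~N are GNU extensions); `oneline` is a helper defined here only to keep the output compact.

```shell
oneline() { paste -sd' ' -; }   # hypothetical helper: join output lines with spaces

plus=$(seq 20 | sed -n '10,+3p' | oneline)          # 10 11 12 13
step=$(seq 20 | sed -n '10~3p' | oneline)           # 10 13 16 19
mult=$(seq 20 | sed -n '10,~3p' | oneline)          # 10 11 12
range=$(seq 20 | sed -n '/^5$/,/^7$/p' | oneline)   # 5 6 7

printf '%s\n' "$plus" "$step" "$mult" "$range"
```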
Address Specification with negation
Range       Remarks
10!         All lines except line 10 (! is the negation indicator)
1,10!       Lines from 11 to end of the file
10,$!       Lines 1 to 9
10,+3!      All lines except 10, 11, 12 and 13
10~3!       All lines except line 10 and every third line after that
10,~3!      All lines except lines 10, 11 and 12
/pat1/!     All lines not containing pat1
/pat1/,+3!  All lines except lines containing pat1 and the three lines following each
/pat1/,~3!  All lines except those from a line containing pat1 up to the next multiple of 3
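A brief sketch of negated addresses, assuming GNU sed; the input and the `oneline` helper are invented for the example.

```shell
oneline() { paste -sd' ' -; }   # hypothetical helper: join output lines with spaces

# Print everything except lines 10 to 13
not_range=$(seq 15 | sed -n '10,+3!p' | oneline)

# Print only the lines that do NOT contain pat1
not_pat=$(printf 'pat1 a\nb\npat1 c\nd\n' | sed -n '/pat1/!p' | oneline)

printf '%s\n' "$not_range" "$not_pat"
```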
Regular Expressions
Meta character  Meaning
.               Matches any single character except newline
*               Matches zero or more of the character preceding it. e.g.: bugs*, table.*
^               Denotes the beginning of the line. ^A denotes lines starting with A
$               Denotes the end of the line. :$ denotes lines ending with :
\               Escape character (\., \*, \[, \\, etc.)
[ ]             Matches any one of the characters within the brackets. e.g. [aeiou], [a-z], [a-zA-Z], [0-9], [[:alpha:]], [a-z?,!]
[^ ]            Matches any one character other than the ones inside the brackets. e.g. ^[^13579] denotes lines whose first character is not an odd digit, [^02468]$ denotes lines whose last character is not an even digit
\<, \>          Match the beginning and the end of a word, respectively
Extended Regular Expressions
Meta character  Meaning
|               Alternation. e.g.: ho(use|me), the(y|m), (they|them)
+               One or more occurrences of the previous character. a+ is the same as aa*
?               Zero or one occurrence of the previous character
{n}             Exactly n repetitions of the previous char or group
{n,}            n or more repetitions of the previous char or group
{,m}            Zero to m repetitions of the previous char or group
{n,m}           n to m repetitions of the previous char or group
(....)          Used for grouping
sed -r ...      The "-r" option may have to be used on some versions of sed for extended regular expressions to work
e.g.: sed -r '/(pat1|pat2)/ s/pat3/pat4/'
      sed -rn '/(pat1|pat2){3,}/ p'
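A short sketch of grouping, repetition and alternation with -r, next to the escaped form without it. It assumes GNU sed (the \| alternation without -r is a GNU extension); the sample words and the `oneline` helper are made up.

```shell
oneline() { paste -sd' ' -; }   # hypothetical helper: join output lines with spaces
sample=$(printf 'ab\nabab\nababab\nhouse\nhome\nhose\n')

# Two or more repetitions of the group "ab"
grouped=$(printf '%s\n' "$sample" | sed -rn '/^(ab){2,}$/p' | oneline)

# Alternation with -r: no escaping needed
either=$(printf '%s\n' "$sample" | sed -rn '/^ho(use|me)$/p' | oneline)

# The same pattern without -r: parentheses and | must be escaped
escaped=$(printf '%s\n' "$sample" | sed -n '/^ho\(use\|me\)$/p' | oneline)

printf '%s\n' "$grouped" "$either" "$escaped"
```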
Regular Expressions – Examples
Example                                         Meaning
.{10,}                                          10 or more characters. Without the -r option, the curly braces have to be escaped: .\{10,\}
[0-9]{3}-[0-9]{2}-[0-9]{4}                      Social Security number
\([0-9]{3}\)[0-9]{3}-[0-9]{4}                   Phone number (xxx)yyy-zzzz. With the -r option, literal parentheses have to be escaped
[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}  IP address format. The dots are escaped so that they match only a literal dot
[0-9]{3}[ ]*[0-9]{3}                            Postal code in India
[0-9]{5}(-[0-9]{4})?                            US ZIP Code + 4
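A sketch that exercises three of these patterns with -r, assuming GNU sed. The sample values are invented, and the IP pattern is written with escaped dots so that they match literally.

```shell
oneline() { paste -sd' ' -; }   # hypothetical helper: join output lines with spaces

# Social Security number format nnn-nn-nnnn
ssn=$(printf 'id 123-45-6789\nid 12-345-6789\n' |
      sed -rn '/[0-9]{3}-[0-9]{2}-[0-9]{4}/p')

# ZIP or ZIP+4, anchored to the whole line
zip=$(printf '95014\n95014-1234\n9501\n' |
      sed -rn '/^[0-9]{5}(-[0-9]{4})?$/p' | oneline)

# IP address format, anchored to the whole line
ip=$(printf '192.168.1.10\n192x168x1x10\n' |
     sed -rn '/^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$/p')

printf '%s\n' "$ssn" "$zip" "$ip"
```

These patterns check only the format; for example, the IP pattern would also accept 999.999.999.999.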
Substitution – Format and Options
[address[!]]s/pat1/pat2/[options]
Options: g – global, w – write, i – ignore case, p – print, n – nth occurrence
:, ;, +, @ or any other character, including space, could also be used as the delimiter. Useful when / is part of the search or replacement string.
Substitution - Examples
Example                                 Explanation
sed 's/pat1/pat2/' fn                   Substitute the FIRST occurrence of pat1 with pat2 on each line
sed 's/pat1/pat2/g' fn                  Substitute ALL occurrences of pat1 with pat2
sed 's/pat1/pat2/3' fn                  Substitute the third occurrence of pat1 with pat2
sed 's/pat1/pat2/3g' fn                 Substitute all but the first two occurrences of pat1 with pat2
sed 's/pat1/pat2/gi' fn                 Substitute ALL occurrences of pat1 with pat2 ignoring the case
sed 's/pat1/pat2/giw new_file' fn       Write to new_file the lines containing pat1, after substituting pat1 with pat2
sed -n 's/pat1/pat2/gp' fn              Print the lines containing pat1, after substituting pat1 with pat2
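The effect of each flag is easiest to see on a single line with several matches. A sketch assuming GNU sed (a number combined with g, and the i flag, are GNU extensions); the input string is made up.

```shell
line='a-a-a-a'

first=$(echo "$line" | sed 's/a/X/')      # X-a-a-a
all=$(echo "$line"   | sed 's/a/X/g')     # X-X-X-X
third=$(echo "$line" | sed 's/a/X/3')     # a-a-X-a
from3=$(echo "$line" | sed 's/a/X/3g')    # a-a-X-X
nocase=$(echo "$line" | sed 's/A/X/gi')   # X-X-X-X

# With -n and the p flag, only lines where a substitution happened are printed
printed=$(printf 'a-a\nb-b\n' | sed -n 's/a/X/gp')   # X-X

printf '%s\n' "$first" "$all" "$third" "$from3" "$nocase" "$printed"
```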
Substitution – Examples contd
Example
    Explanation
sed '/pat1/ s/pat2/pat3/g' fn
    Substitute all occurrences of pat2 with pat3 on lines containing pat1
sed '/pat1/! s/pat2/pat3/g' fn
    Substitute all occurrences of pat2 with pat3 on lines NOT containing pat1
sed '/pat1/,/pat2/ s/pat3/pat4/g' fn
    Substitute all occurrences of pat3 with pat4 on lines between pat1 and pat2 (inclusive)
sed '1,100 s/pat1/pat2/gi' fn
    Substitute ALL occurrences of pat1 with pat2 ignoring the case between lines 1 and 100
sed '/pat1/,/pat2/! s/pat3/pat4/g' fn
    Substitute all occurrences of pat3 with pat4 on lines NOT between pat1 and pat2 (inclusive)
sed '1,100! s/pat1/pat2/3i' fn
    Substitute the third occurrence of pat1 with pat2 ignoring the case from line 101 to end of file
sed '2~5 s/pat1/pat2/w new_file' fn
    On every 5th line starting with the 2nd line, substitute pat1 with pat2 and write the changed lines to new_file
Substitution – Examples contd.
Example
    Explanation
sed 's/\(pat1\|pat2\)/pat3/g' fn
    Substitute all occurrences of either pat1 or pat2 with pat3
sed -r 's/(pat1|pat2)/pat3/g' fn
    Same as above. With the -r option, parentheses and alternation characters don't have to be escaped
sed -r 's/abc(pat1|pat2)/pat3/g' fn
    Substitute all occurrences of either abcpat1 or abcpat2 with pat3
sed 's/\<pat1/pat3/g' fn
    Substitute all occurrences of pat1 that begin a word with pat3
sed 's/pat1\>/pat3/g' fn
    Substitute all occurrences of pat1 that end a word with pat3
sed 's/\<pat1\>/pat3/g' fn
    Substitute all occurrences of the word pat1 with pat3. Note: The angular brackets (< and >) have to be escaped even with the -r option.
sed 's/[a-z]/\u&/g' fn
    Convert all lower case letters to upper case letters
sed 's/./&&/g' fn
    Double each character
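A sketch of the word-boundary anchors and the & replacement, assuming GNU sed (\<, \> and \u are GNU extensions). The word "cat" stands in for pat1 and the sample text is invented.

```shell
text='cat concat cats'

starts=$(echo "$text" | sed 's/\<cat/DOG/g')     # DOG concat DOGs
ends=$(echo "$text"   | sed 's/cat\>/DOG/g')     # DOG conDOG cats
whole=$(echo "$text"  | sed 's/\<cat\>/DOG/g')   # DOG concat cats

upper=$(echo 'abc def' | sed 's/[a-z]/\u&/g')    # ABC DEF
doubled=$(echo 'abc'   | sed 's/./&&/g')         # aabbcc

printf '%s\n' "$starts" "$ends" "$whole" "$upper" "$doubled"
```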
Grouping and Back Referencing
Parts of strings in the Search/Left-hand side can be grouped and referenced in the Replacement/Right-hand side
Up to nine groups possible (\1, \2, ..\9)
Groups can be nested or referenced back on the Search side
Same group can be referenced any number of times
Substitution with Grouping and Back Referencing. Examples
Command
    Explanation
sed 's/^\(.*\):\(.*\)/\2:\1/' fn
    Swap two fields delimited with :. "column A:column B" becomes "column B:column A"
sed -r 's/^(.*):(.*)/\2:\1/' fn
    Same as above. With the -r option, the parentheses don't have to be escaped.
sed -r 's/^([^:]*):([^:]*)/\2:\1/' fn
    Same as above. Exchanges only the first two columns even if there are more.
sed -r 's/^(This (.*) nested)/\2 \1/' fn
    Group 1 contains the whole string from "This" to "nested". Group 2 contains just the characters between "This" and "nested".
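The difference between the greedy .* groups and the [^:]* groups shows up once a line has more than two fields. A sketch with invented input:

```shell
# Two fields: both forms swap them
swap=$(echo 'column A:column B' | sed 's/^\(.*\):\(.*\)/\2:\1/')

# Three fields: the first .* is greedy and takes "a:b", so the split is at the LAST colon
greedy=$(echo 'a:b:c' | sed -r 's/^(.*):(.*)/\2:\1/')

# [^:]* cannot cross a colon, so only the first two fields are exchanged
first_two=$(echo 'a:b:c' | sed -r 's/^([^:]*):([^:]*)/\2:\1/')

# Nested groups: \1 is the outer group, \2 the inner one
nested=$(echo 'This is nested' | sed -r 's/^(This (.*) nested)/\2 \1/')

printf '%s\n' "$swap" "$greedy" "$first_two" "$nested"
```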
Substitution with Grouping and Back Referencing. Examples
Command
    Explanation
sed -rn '/(\w)(\w)(\w)\w?\3\2\1/p' fn
    Print lines with six or seven character long palindromes
sed -r 's/(.)(.)(.)\3\2\1/\1\2\3\1\2\3/g' fn
    Convert six char palindrome strings to repetitive strings. Note: any run of six identical characters, such as six embedded or trailing spaces, will also match
sed -r ':a;s/(^|[^0-9.])([0-9]+)([0-9]{3})/\1\2,\3/g;ta' numbers.txt
    Add comma as thousands separator (Think how easy it would have been if we could do this from the left side)
sed -r 's/(.*)/\1\1/' fn
    Duplicate the contents of each line on the same line
sed -r 's:(.*):\1\1:' fn
    Same as above, but using : as the delimiter between search and replacement strings
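The thousands-separator command loops: each pass inserts one more comma, working from the right, and the t command branches back to label a for as long as a substitution was made. A sketch assuming GNU sed, with made-up numbers:

```shell
commas() {
  # hypothetical wrapper around the command from the slide
  sed -r ':a;s/(^|[^0-9.])([0-9]+)([0-9]{3})/\1\2,\3/g;ta'
}

big=$(echo '1234567' | commas)               # 1,234,567
small=$(echo '999' | commas)                 # 999
inline=$(echo 'cost 1000000 units' | commas) # cost 1,000,000 units

printf '%s\n' "$big" "$small" "$inline"
```

The [^0-9.] part keeps the pattern from starting right after a decimal point, so digits after the point are left alone.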
Inserting text with "i" and "a"
Example
    Meaning
sed '4i\
my text 1\
my text 2' in_file
    Inserts two lines before line 4
sed '/pat1/ i\
my text 1\
my text 2' in_file
    Inserts two lines before every line that contains pat1
sed '4a\
my text 1\
my text 2' in_file
    Inserts two lines after line 4
sed '$a\
my text 1\
my text 2' in_file
    Inserts two lines after the last line of the file
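A runnable sketch of i and a with the text written one line per row, each continued with a trailing backslash. The three-line input comes from seq and is only for illustration.

```shell
# Insert two lines before line 2
before=$(seq 3 | sed '2i\
my text 1\
my text 2')

# Append two lines after the last line
after_last=$(seq 3 | sed '$a\
my text 1\
my text 2')

printf '%s\n---\n%s\n' "$before" "$after_last"
```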
Changing text
Example
    Meaning
sed '4c\
my text 1\
my text 2' in_file
    Replace line 4 with the two new lines
sed '/pat1/ c\
my text 1\
my text 2' in_file
    Replace all lines containing pat1 with the two new lines
sed '/pat1/,/pat2/ c\
my text 1\
my text 2' in_file
    Replace all lines between pat1 and pat2 with the two new lines
Change (c) affects the whole line. Substitute (s) affects just the matching words or strings on the line.
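The contrast between c and s in a small sketch with invented input:

```shell
input=$(printf 'keep\nold line here\n')

# c replaces the entire matching line
changed=$(printf '%s\n' "$input" | sed '/old/c\
new line')

# s replaces only the matched string and leaves the rest of the line
substituted=$(printf '%s\n' "$input" | sed 's/old/new/')

printf '%s\n---\n%s\n' "$changed" "$substituted"
```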
Deleting lines
Command                        Result
sed '10d' in_file              Delete the 10th line
sed '1,10d' in_file            Delete lines 1 to 10
sed '10,$d' in_file            Delete lines from 10 to end of file
sed '10,+3d' in_file           Delete line 10 and 3 lines below (lines 10, 11, 12 and 13)
sed '10~3d' in_file            Delete line 10 and every third line after it
sed '10,~3d' in_file           Delete line 10 and up to the next multiple of 3 (lines 10, 11 and 12)
sed '/pat1/d' in_file          Delete all lines containing pat1
sed '/pat1/,+3d' in_file       Delete all lines containing pat1 and the three lines following each
sed '/pat1/,~3d' in_file       Delete lines containing pat1 and up to the next multiple of 3
sed '/pat1/!d' in_file         Delete all lines NOT containing pat1
sed -r '/[0-9]{8,}/d' in_file  Delete lines containing 8 or more consecutive digits
Printing lines
Command
    Result
sed -n '10,/pat1/p' in_file
    Print lines from 10 to the next line containing pat1
sed -n '10,+3p' in_file
    Print line 10 and 3 lines below (lines 10, 11, 12 and 13)
sed -n '10~3p' in_file
    Print line 10 and every third line after it
sed -n '10,~3p' in_file
    Print line 10 and up to the next multiple of 3 (lines 10, 11 and 12)
sed -n '/pat1/p' in_file
    Print all lines containing pat1
sed -n '/pat1/,+3p' in_file
    Print all lines containing pat1 and the three lines following each
sed -n '/pat1/,~3p' in_file
    Print lines containing pat1 and up to the next multiple of 3
sed -n '/pat1/,/pat2/p' in_file
    Print every block of lines that starts at a line containing pat1 and ends at the next line containing pat2
sed -n '/pat1/!p' in_file
    Print all lines NOT containing pat1
sed -n '/([0-9]\{3\})[0-9]\{3\}-[0-9]\{4\}/p' in_file
    Print lines containing phone numbers
A note on the -r option
Command
    Result
sed -n '/([0-9]\{3\})[0-9]\{3\}-[0-9]\{4\}/p' in_file
    Print lines containing phone numbers like (408)806-8330, (408)349-3699
sed -nr '/\([0-9]{3}\)[0-9]{3}-[0-9]{4}/p' in_file
    Same as above with the -r option. Note the escaping of parentheses. Without escaping, ( and ) become meta characters, not part of the search string.
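Both forms can be checked against the same input. A sketch assuming GNU sed; the second sample line is invented to show a non-match.

```shell
input=$(printf 'call (408)806-8330 now\ncall 408-806-8330\n')

# Without -r: parentheses are literal, braces must be escaped
basic=$(printf '%s\n' "$input" | sed -n '/([0-9]\{3\})[0-9]\{3\}-[0-9]\{4\}/p')

# With -r: braces are plain, literal parentheses must be escaped
extended=$(printf '%s\n' "$input" | sed -nr '/\([0-9]{3}\)[0-9]{3}-[0-9]{4}/p')

printf '%s\n%s\n' "$basic" "$extended"
```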
Printing with and without "-n" option - Comparison
Command                     Comments
sed '1,10p' fn              Lines 1 to 10 are printed twice. Rest of the lines are printed once
sed -n '1,10p' fn           Lines 1 to 10 are printed just once. No other lines are printed
sed 's/pat1/pat2/p' fn      Lines containing pat1 are printed twice after substituting pat1 with pat2. Other lines are printed once
sed -n 's/pat1/pat2/p' fn   Only lines containing pat1 are printed after substituting pat1 with pat2
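The same comparison on a three-line input, as a sketch; the `oneline` helper is defined here only to keep the output compact.

```shell
oneline() { paste -sd' ' -; }   # hypothetical helper: join output lines with spaces

with_p=$(seq 3 | sed '1,2p' | oneline)       # 1 1 2 2 3
only_p=$(seq 3 | sed -n '1,2p' | oneline)    # 1 2

subst_p=$(printf 'pat1\nx\n' | sed 's/pat1/pat2/p' | oneline)      # pat2 pat2 x
subst_np=$(printf 'pat1\nx\n' | sed -n 's/pat1/pat2/p' | oneline)  # pat2

printf '%s\n' "$with_p" "$only_p" "$subst_p" "$subst_np"
```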
Inserting/reading from a file
Command                                 Result
sed '10r my_file' in_file               Insert my_file after the 10th line
sed '1,10r my_file' in_file             Insert my_file after each line between 1 and 10
sed '$r my_file' in_file                Append my_file at the end
sed '10,+3r my_file' in_file            Insert my_file after every line between 10 and 13
sed '10~3r my_file' in_file             Insert my_file after line 10 and after every third line following it
sed '/pat1/,/pat2/r my_file' in_file    Insert my_file after every line between pat1 and pat2
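A sketch of the r command with a one-line file. The temporary directory, the file contents and the `oneline` helper are assumptions made for the example.

```shell
oneline() { paste -sd' ' -; }   # hypothetical helper: join output lines with spaces
tmp=$(mktemp -d)
printf 'INSERTED\n' > "$tmp/my_file"

after2=$(seq 3 | sed "2r $tmp/my_file" | oneline)    # after line 2 only
at_end=$(seq 3 | sed "\$r $tmp/my_file" | oneline)   # after the last line
each=$(seq 3 | sed "1,2r $tmp/my_file" | oneline)    # after each of lines 1 and 2

printf '%s\n' "$after2" "$at_end" "$each"
rm -rf "$tmp"
```

The file name runs to the end of the command line, so r should be the last command in its -e expression.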
Writing selectively
Command
    Result
sed '1,10w nf' in_file
    Write lines 1 to 10 to new file "nf"
sed '/^pat1/w nf' in_file
    Write all lines beginning with pat1 to nf
sed '/pat1$/w nf' in_file
    Write lines that end with pat1 to nf
sed '/\<pat1\>/w nf' in_file
    Write lines that contain the WORD pat1 to nf
sed -e '1~2w odd_lines' -e '2~2w even_lines' in_file
    Write odd numbered lines to odd_lines, and even numbered lines to even_lines
sed '/pat1/,/pat2/w nf' in_file
    Write all lines between pat1 and pat2 to nf
sed -r '/[0-9]{3}-[0-9]{2}-[0-9]{4}/w ssn' in_file
    Write lines containing Social Security Number nnn-nn-nnnn to the file ssn. Note the use of -r option. Otherwise, { and } have to be escaped.
sed -r '/(.)(.)(.)\3\2\1/w palin.txt' in_file
    Write lines containing six character palindromes to palin.txt
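A sketch of splitting input into odd and even lines with two w commands. It assumes GNU sed (first~step is a GNU extension); the temporary directory and the `oneline` helper are invented for the example.

```shell
oneline() { paste -sd' ' -; }   # hypothetical helper: join output lines with spaces
tmp=$(mktemp -d)

# -n suppresses normal output; each w command writes its own selection
seq 6 | sed -n -e "1~2w $tmp/odd_lines" -e "2~2w $tmp/even_lines"

odd=$(oneline < "$tmp/odd_lines")     # 1 3 5
even=$(oneline < "$tmp/even_lines")   # 2 4 6

printf '%s\n%s\n' "$odd" "$even"
rm -rf "$tmp"
```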
Transliterating (like 'tr')
Command                  Result
y/source/dest/           One to one, character by character substitution. source and dest strings have to be of the same length
sed 'y/13579/aeiou/' fn  Replace 1 by a, 3 by e, and so on
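A quick sketch of y on invented input; characters that are not in the source string pass through unchanged.

```shell
mapped=$(echo '13579' | sed 'y/13579/aeiou/')     # aeiou
mixed=$(echo 'room 315' | sed 'y/13579/aeiou/')   # room eai

printf '%s\n%s\n' "$mapped" "$mixed"
```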
Quitting from sed
Command                         Result
sed '15q' in_file               Quit after the 15th line. Prints the first 15 lines
sed '/\<[Ee]nd\>/q' in_file     Quit after encountering the word End or end
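A quick check of how many lines q lets through, as a sketch on generated input. The current line is still printed before sed quits.

```shell
# Count and inspect what '15q' prints from a 100-line input
printed=$(seq 100 | sed '15q' | wc -l)
last=$(seq 100 | sed '15q' | tail -n 1)

# Quit at the first line containing the word "end" or "End"
upto=$(printf 'start\nthe end\nnever\n' | sed '/\<[Ee]nd\>/q')

echo "lines printed: $printed, last line: $last"
printf '%s\n' "$upto"
```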
Grouping multiple commands with { and }
sed -n '/pat1/,/pat2/ {
s/pat3/pat4/g
s/pat5/pat6/g
w new_file
p
}' inp_file
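A runnable sketch of the grouped form. The input lines and the temporary directory are invented; the script is double-quoted here only so that the output path can be expanded into the w command.

```shell
tmp=$(mktemp -d)

# Every command inside { } applies only to lines in the /pat1/,/pat2/ range
out=$(printf 'x pat3\npat1 pat3\nmid pat5\npat2 end\ny pat3\n' |
sed -n "/pat1/,/pat2/ {
s/pat3/pat4/g
s/pat5/pat6/g
w $tmp/new_file
p
}")

saved=$(cat "$tmp/new_file")
printf '%s\n' "$out"
rm -rf "$tmp"
```

The first and last input lines fall outside the range, so their pat3 is neither substituted nor printed.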
One liners
Command
    Result
sed 'p' in_file
    Duplicate all the lines one below the other
sed '/^$/d' in_file
    Delete the blank lines from the input file
sed 's/./&:/80'
    Add a colon after the 80th character (replace the eightieth occurrence of the pattern with itself and a colon)
sed -r 's/^(.*) \1/\1/' fn
    Collapse a string that is repeated at the start of the line, separated by a space, into a single copy
sed G in_file
    Add a blank line after every line
find . -type f -name 'my_files.*.sql' -exec sed -i 's/TableA/TableB/g' {} +
    Replace all occurrences of TableA with TableB in all my SQL scripts
More one-liners: http://www.catonmat.net/blog/sed-one-liners-explained-part-one/
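A few of the one-liners run on tiny made-up inputs, as a sketch. The colon example uses the 3rd character instead of the 80th to keep the input short.

```shell
oneline() { paste -sd' ' -; }   # hypothetical helper: join output lines with spaces

no_blank=$(printf 'a\n\nb\n' | sed '/^$/d' | oneline)          # a b
colon=$(echo 'abcdef' | sed 's/./&:/3')                        # abc:def
dedup=$(echo 'hello hello world' | sed -r 's/^(.*) \1/\1/')    # hello world
spaced=$(printf 'a\nb\n' | sed G | wc -l)                      # 4 lines: a, blank, b, blank

printf '%s\n' "$no_blank" "$colon" "$dedup" "$spaced"
```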
Additional concepts not covered
labels, branching to them
h, H: copy/append pattern space to hold space
g, G: copy/append hold space to pattern space
x: exchange the contents of the hold and pattern spaces
References
sed & awk by Dale Dougherty & Arnold Robbins
http://www.grymoire.com/Unix/Sed.html
http://sed.sourceforge.net/sedfaq3.html
http://en.wikipedia.org/wiki/Sed
http://groups.yahoo.com/group/sed-users/
http://sed.sourceforge.net/sed1line.txt
Q & A
http://twiki.corp.yahoo.com/view/Main/LpalaniYahoo
Unanswered questions
How to simulate tail with sed?
How to substitute the nth to mth occurrences of pat1 with pat2?
How to substitute the last N occurrences or the nth occurrence from the end?
How to identify a palindrome of any length?
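For the first question, one commonly cited approach keeps a sliding window of lines in the pattern space using N and D. This is a sketch assuming GNU sed, not the only possible answer; the other questions are left open.

```shell
oneline() { paste -sd' ' -; }   # hypothetical helper: join output lines with spaces

# tail -n 1: delete every line except the last
last1=$(seq 20 | sed '$!d')

# tail -n 10: append lines with N; from line 11 on, D drops the oldest line,
# so the pattern space never holds more than 10 lines; $q prints it at the end
last10=$(seq 20 | sed -e :a -e '$q;N;11,$D;ba' | oneline)

printf '%s\n%s\n' "$last1" "$last10"
```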