AWK


awk: a text processing language

• Created for Unix by Aho, Weinberger and Kernighan
• Basically an:
  ▫ interpreted
  ▫ text processing
  ▫ programming language

• Updated versions
  ▫ NAWK: "new awk"
  ▫ GAWK: the Free Software Foundation’s (GNU) version

awk Basics

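• Basic form:
  ▫ awk options 'selection criteria {action}' file(s)
• Can use regular expressions
• Files are read one line at a time, and each line's contents are split into fields
• Fields are numbered ($1, $2, etc.)
  ▫ The entire line is $0
• Can run standalone (program typed on the command line)
• Can run as a program stored in a file
• Uses blanks as the default field separator (runs of blanks and tabs count as one separator)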
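A minimal sketch of the basic form, assuming a POSIX sh and a POSIX-conforming awk. The helper name show_fields is hypothetical: it feeds one line to awk and reports the number of fields (the built-in variable NF, listed later), the first and last fields, and the whole line. The repeated blanks in the sample act as single separators.

```shell
# Hypothetical helper: show how awk splits one line into fields.
show_fields() {
    printf '%s\n' "$1" | awk '{ print NF ": first=" $1 ", last=" $NF ", line=" $0 }'
}

show_fields "Tony   Kombol  Lecturer"
# prints: 3: first=Tony, last=Lecturer, line=Tony   Kombol  Lecturer
```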

-f Option (stored awk programs)
• awk programs can be stored in a file
• awk -f awkfile datafile
  ▫ awkfile (the argument to -f) contains the awk program
  ▫ datafile contains the data

Example
• Find the TAs in the personnel file
  ▫ The file is blank separated
    - The -F option defines the delimiter
    - Use "\ " to escape the blank (a blank after the \), followed by another blank before the program
  ▫ Note: the blank is the default separator anyway
  ▫ Title is in the 3rd field

# cat personnel.data
Tony Kombol Lecturer 800111222 704-687-1111
Jinyue Xia TA 800111333 704-687-2222
Hadi Hashemi TA 800111444 704-687-3333
#
# awk -F\  '$3 == "TA" { print }' personnel.data
Jinyue Xia TA 800111333 704-687-2222
Hadi Hashemi TA 800111444 704-687-3333
#

Example
• To run an awk program
  ▫ personnel.data has the data
  ▫ findta.awk is the code
    - Looks for TA (3rd field)
    - Prints first name and telephone number (1st and 5th fields)
  ▫ Note: what small formatting problem is here?

# awk -F\  -f findta.awk personnel.data
TAs
Jinyue704-687-2222
Hadi704-687-3333
Done

# cat personnel.data
Tony Kombol Lecturer 800111222 704-687-1111
Jinyue Xia TA 800111333 704-687-2222
Hadi Hashemi TA 800111444 704-687-3333

# cat findta.awk
BEGIN { print "TAs";}
$3 == "TA" {print $1 $5}
END { print "Done"}
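One possible variant of the stored program, sketched under the assumption of a POSIX sh with mktemp and a POSIX awk; the file name findta2.awk is hypothetical. It writes print $1, $5: the comma makes print insert the output field separator (OFS, a blank by default) between the two fields. Because the data is blank separated, the default field separator is used and -F is left out.

```shell
# Recreate the sample data and a variant of findta.awk in a scratch directory.
dir=$(mktemp -d)

cat > "$dir/personnel.data" <<'EOF'
Tony Kombol Lecturer 800111222 704-687-1111
Jinyue Xia TA 800111333 704-687-2222
Hadi Hashemi TA 800111444 704-687-3333
EOF

cat > "$dir/findta2.awk" <<'EOF'
BEGIN      { print "TAs" }
$3 == "TA" { print $1, $5 }   # comma: fields separated by OFS
END        { print "Done" }
EOF

awk -f "$dir/findta2.awk" "$dir/personnel.data"
# prints:
# TAs
# Jinyue 704-687-2222
# Hadi 704-687-3333
# Done
```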

print and printf
• Output goes to standard output
  ▫ can be redirected with > or |
    - the redirection target must be in quotes: print $2, $1 | "sort"
    - here the output of the print goes to the sort command
• print is unformatted
• printf allows formatting
  ▫ %s – string: %-20s left-justifies the string in a 20-character field (the - flag)
  ▫ %d – integer: %8d right-justifies the number in an 8-character field
  ▫ %f – floating point: %8.2f uses a field at least 8 characters wide in total (counting the decimal point), with 2 digits after the decimal point
  ▫ printf needs \n to start a new line
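A short sketch of both output styles, assuming a POSIX sh, a POSIX awk and the standard sort utility. The sample values are made up for illustration; the | markers in the printf format only make the field widths visible.

```shell
# printf: %-10s left-justifies in 10 columns, %5d right-justifies in 5,
# %8.2f is at least 8 columns wide with 2 digits after the decimal point.
awk 'BEGIN { printf "%-10s|%5d|%8.2f|\n", "Tony", 42, 3.14159 }'
# prints: Tony      |   42|    3.14|

# print piped to a command: the command string must be quoted.
printf 'b 2\na 1\n' | awk '{ print $2, $1 | "sort" }'
# prints:
# 1 a
# 2 b
```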

Number processing
• AWK supports basic computation
  ▫ +   addition
  ▫ -   subtraction
  ▫ *   multiplication
  ▫ /   division
  ▫ %   modulus
  ▫ ^   exponentiation
• Also supports:
  ▫ ++  add one to a variable (postfix and prefix)
  ▫ --  subtract one from a variable (postfix and prefix)
  ▫ +=  add to a variable and assign
  ▫ -=  subtract from a variable and assign
  ▫ *=  multiply a variable and assign
  ▫ /=  divide a variable and assign
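A minimal sketch exercising these operators, assuming a POSIX sh and awk; the helper name arith_demo is hypothetical. It also shows that awk arithmetic is floating point, so 7 / 2 gives 3.5 rather than 3.

```shell
# Hypothetical helper running the operators listed above.
arith_demo() {
    awk 'BEGIN {
        x = 7; y = 2
        print x / y               # 3.5: awk arithmetic is floating point
        print x % y               # 1
        print y ^ 10              # 1024
        x += 3; print x           # 10
        old = x++; print old, x   # 10 11: postfix ++ yields the old value
        print ++x                 # 12: prefix ++ yields the new value
    }'
}

arith_demo
```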

Variables and Expressions
• awk is loosely typed
• do not need to declare variables
  ▫ x = 5
• do not need $ to access a variable (unlike the shell; in awk, $ selects a field)
  ▫ print x
• strings are double quoted
  ▫ x = "This is a string"
• there is no explicit string concatenation operator; values written side by side are concatenated
  ▫ x = "string1"; y = "string2"; print x y
    - the space between x and y is required (xy would be a different variable)
• some conversions are done automatically
  ▫ x = "56"; y = 43; z = "abc"
    print x y      # gives 5643: y converted to a string
    print x + y    # gives 99: + converts x to a number
    print y + z    # gives 43: + converts z to the number 0
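The conversion examples can be run as-is; a sketch assuming a POSIX sh and awk, with the hypothetical helper name convert_demo. The last line adds length() (covered under String Functions) to confirm that concatenation inserts no blank.

```shell
# Hypothetical helper reproducing the conversion examples.
convert_demo() {
    awk 'BEGIN {
        x = "56"; y = 43; z = "abc"
        print x y              # 5643: side by side means concatenation
        print x + y            # 99: + converts x to a number
        print y + z            # 43: "abc" converts to 0
        s = "string1" "string2"
        print s, length(s)     # string1string2 14
    }'
}

convert_demo
```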

Comparison and Logical Operators
• awk supports string and numeric comparisons
  ▫ == is the equality operator
    - = is for assignment
  ▫ < and > can be used on strings
    - Beware of conversions when dealing with strings that consist of numbers
  ▫ ~ is used for regular expression matching (!~ means "does not match"): $2 ~ /[dh]og/
    - true if field 2 contains hog or dog
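A sketch of the conversion pitfall, assuming a POSIX sh and a POSIX-conforming awk; compare_demo is a hypothetical name. Under the POSIX rules a value that holds a string is compared as a string, while numbers, and input fields that look like numbers, are compared numerically. Each comparison prints 1 for true and 0 for false.

```shell
# Hypothetical helper contrasting string and numeric comparison.
compare_demo() {
    awk 'BEGIN {
        print ("10" < "9")           # 1: two string constants compare as strings
        print (10 < 9)               # 0: two numbers compare numerically
        a = "10"; b = 9
        print (a < b)                # 1: a holds a string, so b is compared as "9"
        print (a + 0 < b)            # 0: adding 0 forces a numeric comparison
        print ("hotdogs" ~ /[dh]og/) # 1: ~ matches anywhere in the string
    }'
    # Input fields that look like numbers compare numerically.
    echo "10 9" | awk '{ print ($1 < $2) }'   # 0
}

compare_demo
```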

Comparison and Logical Operators
• awk supports boolean operations
  ▫ && – and
  ▫ || – or
  ▫ ! – not
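A sketch combining all three operators on the personnelyears.data records from the simple comparison example, assuming a POSIX sh and awk. The helper years_data is hypothetical and just prints those records so the example needs no file.

```shell
# Hypothetical helper: print the sample records (field 6 = years).
years_data() {
    cat <<'EOF'
Tony Kombol Lecturer 800111222 704-687-1111 6
Jinyue Xia TA 800111333 704-687-2222 3
Hadi Hashemi TA 800111444 704-687-3333 1
Fred Flintstone RA 800123321 704-687-1212 10
Barney Rubble URA 800112233 704-687-3344 4
EOF
}

# TAs or RAs who do NOT have more than 3 years.
years_data | awk '($3 == "TA" || $3 == "RA") && !($6 > 3) { print $1, $3, $6 }'
# prints:
# Jinyue TA 3
# Hadi TA 1
```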

simple comparison

• Field 6 is the number of years with the organization
  ▫ Find those with more than 5 years

# awk '$6 > 5 { print $2 ", " $1 ":" $6}' personnelyears.data
Kombol, Tony:6
Flintstone, Fred:10
#

# cat personnelyears.data
Tony Kombol Lecturer 800111222 704-687-1111 6
Jinyue Xia TA 800111333 704-687-2222 3
Hadi Hashemi TA 800111444 704-687-3333 1
Fred Flintstone RA 800123321 704-687-1212 10
Barney Rubble URA 800112233 704-687-3344 4
#

Regular Expression comparison example

• Find the TAs and RAs, including the URAs

# cat personnel.data
Tony Kombol Lecturer 800111222 704-687-1111
Jinyue Xia TA 800111333 704-687-2222
Hadi Hashemi TA 800111444 704-687-3333
Fred Flintstone RA 800123321 704-687-1212
Barney Rubble URA 800112233 704-687-3344

# awk '$3 ~ /[RT]A/ {print $1 " " $2 " " $5}' personnel.data
Jinyue Xia 704-687-2222
Hadi Hashemi 704-687-3333
Fred Flintstone 704-687-1212
Barney Rubble 704-687-3344
#

BEGIN and END Sections
• BEGIN and END allow for some pre- and post-processing
  ▫ Both are optional
• General format:
  ▫ BEGIN { action } { action } END { action }
  ▫ BEGIN's actions are done before the processing of the datafile begins
    - Good for headers, setup, etc.
  ▫ END's actions are done after the processing of the datafile ends
    - Good for post processing, notes, etc.
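A minimal sketch of the three sections, assuming a POSIX sh and awk; the input is just the years column from the sample data. BEGIN prints a header, the unnamed middle block runs once per line, and END uses the running total and NR (lines read) to print an average.

```shell
# BEGIN runs before any input, the middle block once per line, END after the last line.
printf '%s\n' 6 3 1 10 4 | awk '
BEGIN { print "Summing years" }
      { total += $1 }
END   { print NR " lines, average " total / NR }'
# prints:
# Summing years
# 5 lines, average 4.8
```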

another regular expression
• This is a more complex check using a file for the awk program
  ▫ Check to see that the ID is 800......
    - That is, 800 followed by 6 characters

# awk -f findbadid.awk personnelbad.data
List of bad IDs follows
Bad Id has a bad id:809123456
End of list

# cat personnelbad.data
Tony Kombol Lecturer 800111222 704-687-1111 6
Jinyue Xia TA 800111333 704-687-2222 3
Hadi Hashemi TA 800111444 704-687-3333 1
Fred Flintstone RA 800123321 704-687-1212 10
Barney Rubble URA 800112233 704-687-3344 4
Bad Id LX 809123456 704-687-8890 0

# cat findbadid.awk
BEGIN { print "List of bad IDs follows";}
$4 !~ /^800....../ { print $1 " " $2 " has a bad id:" $4};
END { print "End of list";}
#
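The pattern /^800....../ accepts any six characters after 800 (a dot matches any character) and is not anchored at the end, so IDs such as 800abcdef or 8001112223 would pass. A stricter sketch, assuming a POSIX sh and awk, spells out six [0-9] bracket expressions and anchors the end with $. It avoids the {6} interval form, which some older awk versions do not support. The sample IDs are made up.

```shell
# Flag any ID that is not exactly 800 followed by six digits.
printf '%s\n' 800111222 809123456 800abcdef 8001112223 |
awk '$1 !~ /^800[0-9][0-9][0-9][0-9][0-9][0-9]$/ { print "bad id:", $1 }'
# prints:
# bad id: 809123456
# bad id: 800abcdef
# bad id: 8001112223
```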

awk file example

# cat grades.data
Fred Ziffle:99:A
Arnold Ziffle: 55: F
Tara Boomdea: 85:B
Neo:100:A
Buffy Summers: 72:C
Sheldon Cooper:67:D
Zorbon Prentwist: 88 : B
Zorbax Bottlewit:88:B
Bad Grade: 33: A

# cat ckgrades.awk
BEGIN {print "Listing Bs\n"}
$3 == "B" {print $0 }
END {print "\nDone"}
#

# awk -F: -f ckgrades.awk grades.data
Listing Bs

Tara Boomdea: 85:B
Zorbax Bottlewit:88:B

Done
#

Note: the line ending in ": B" (Zorbon Prentwist) does not get matched, because its 3rd field is " B" (with a leading blank), not "B"
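One way to also match the " : B" line is to make the field separator absorb the blanks. A sketch assuming a POSIX sh and awk: a field separator longer than one character is treated as a regular expression, so ' *: *' means a colon with any blanks around it. The helper grades_data is hypothetical and prints the grades.data records shown above.

```shell
# Hypothetical helper: print the grades data shown above.
grades_data() {
    cat <<'EOF'
Fred Ziffle:99:A
Arnold Ziffle: 55: F
Tara Boomdea: 85:B
Neo:100:A
Buffy Summers: 72:C
Sheldon Cooper:67:D
Zorbon Prentwist: 88 : B
Zorbax Bottlewit:88:B
Bad Grade: 33: A
EOF
}

# FS is a regular expression: a colon with any blanks around it.
grades_data | awk -F' *: *' '$3 == "B" { print $1 ": " $2 }'
# prints:
# Tara Boomdea: 85
# Zorbon Prentwist: 88
# Zorbax Bottlewit: 88
```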

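Positional Parameters
• $1, $2, ... usually refer to the fields of each line
• A parameter can also be passed to the awk program
  ▫ Used with a shell program
  ▫ Inside the single-quoted awk program, the shell parameter is written as '$2': the quotes end and restart the awk text so that the shell substitutes its own 2nd parameter
• e.g. instead of
  ▫ $4 > 12
    - 4th field in line is > 12
• use
  ▫ $4 > '$2'
    - 4th field in line is > 2nd parameter passed to the program: prog.awk 50 82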
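A sketch of two ways to get a shell value into an awk program, assuming a POSIX sh and awk; the function names above and above_spliced are hypothetical, and they use the function's 1st argument rather than the 2nd. The first uses the standard -v option to set an awk variable. The second splices the shell parameter into the program text as described above, with double quotes added so the shell does not split the value. Adding 0 ensures a numeric comparison.

```shell
# -v copies the shell value into the awk variable min.
above() {
    awk -v min="$1" '$4 > min + 0 { print }'
}

# Quote splicing: close the single quotes, let the shell expand "$1", reopen.
above_spliced() {
    awk '$4 > '"$1"' { print }'
}

printf '%s\n' "a b c 10" "a b c 60" "a b c 90" | above 50
# prints:
# a b c 60
# a b c 90
```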

Arrays
• awk supports arrays
  ▫ arrays do not need to be "declared"
    - they are created the moment they are first used
• Arrays are associative
  ▫ the index can be numeric or alphabetic
  ▫ thisday["Tue"] = "Tuesday"; thisday[2] = "Tuesday";
    - these are two elements of the array thisday, each referencing a separate string
  ▫ printf("thisday[\"Tue\"] is %s\n", thisday["Tue"]);
    printf("thisday[2] is %s\n", thisday[2]);
    - Both will print "Tuesday" for the element referenced
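A common use of associative arrays is counting by a string key. A minimal sketch, assuming a POSIX sh, awk and sort; the name/title pairs are taken from the sample personnel data. The output is piped through sort because for (t in count) visits the indices in no fixed order.

```shell
# Count people per title: count[$2] is created the first time it is used.
printf '%s\n' "Jinyue TA" "Hadi TA" "Fred RA" "Tony Lecturer" |
awk '{ count[$2]++ }
     END { for (t in count) print t, count[t] }' | sort
# prints:
# Lecturer 1
# RA 1
# TA 2
```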

Arrays
• ENVIRON[ ]
  ▫ an associative array containing all the environment variables
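A minimal sketch, assuming a POSIX sh and awk; GREETING is a hypothetical variable set only for this one awk command. The second command reads HOME, which is normally set in a login session, so its output depends on the machine.

```shell
# A variable placed in the environment of the awk command shows up in ENVIRON.
GREETING=hello awk 'BEGIN { print ENVIRON["GREETING"] }'
# prints: hello

# HOME is normally set in a login session.
awk 'BEGIN { print "home is " ENVIRON["HOME"] }'
```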

Built-in Variables
• awk has a set of built-in variables
  ▫ Some can be overridden

Variable   Function                                         Default
NR         Cumulative number of records (lines) read        -
FS         Input field separator                            space
OFS        Output field separator                           space
OFMT       Default output format for numbers                %.6g
RS         Record separator                                 newline
NF         Number of fields in current record (line)        -
FILENAME   Current input file                               -
ARGC       Number of arguments on the command line          -
ARGV       Array containing the list of arguments           -
ENVIRON    Associative array of all environment variables   -
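A sketch of several of these variables, assuming a POSIX sh and a POSIX-conforming awk; the inputs and the arguments one and two are made up. Assigning $1 = $1 makes awk rebuild $0 using OFS. ARGV[0] usually holds the name of the awk command, so it is not printed here.

```shell
# NR and NF per record; assigning $1 = $1 rebuilds $0 using OFS.
printf 'a b c\nd e\n' | awk 'BEGIN { OFS = "-" } { $1 = $1; print NR, NF, $0 }'
# prints:
# 1-3-a-b-c
# 2-2-d-e

# ARGC counts the command-line arguments, ARGV holds them.
awk 'BEGIN { print ARGC, ARGV[1], ARGV[2] }' one two
# prints: 3 one two

# OFMT controls how print shows non-integer numbers.
awk 'BEGIN { x = 1; y = 3; print x / y; OFMT = "%.2f"; print x / y }'
# prints:
# 0.333333
# 0.33
```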

Functions

• awk has several built-in functions
  ▫ () are optional if there are no parameters, but using them is encouraged
  ▫ Arithmetic functions
  ▫ String functions

Arithmetic Functions

• int(x): integer part of x (truncated toward zero)
• sqrt(x): square root of x
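A one-line check, assuming a POSIX sh and awk. Note that int() truncates toward zero rather than rounding down, and that sqrt(2) is printed with the default %.6g number format.

```shell
awk 'BEGIN { print int(3.9), int(-3.9), sqrt(16), sqrt(2) }'
# prints: 3 -3 4 1.41421
```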

String Functions
• length()
  ▫ length of the complete line ($0)
• length(x)
  ▫ length of x
• tolower(s)
  ▫ returns s as lower case
• toupper(s)
  ▫ returns s as upper case
• substr(str,m)
  ▫ returns the string starting at position m (the first character is position 1) to the end of the string
• substr(str,m,n)
  ▫ returns the string starting at position m, for n characters
• index(s1,s2)
  ▫ finds the position of s2 inside s1 (0 if not found)
• split(str,arr,ch)
  ▫ splits str into the array arr, using ch as the delimiter, and returns the number of elements
• system("cmd")
  ▫ executes a system (Linux) command and returns its exit status
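A sketch calling each string function once, assuming a POSIX sh and awk; string_demo is a hypothetical helper name and the sample strings come from the personnel data. The command "exit 3" is run by system() only to show that the exit status is returned.

```shell
# Hypothetical helper exercising the string functions listed above.
string_demo() {
    awk 'BEGIN {
        s = "Hadi Hashemi"
        print length(s)            # 12
        print toupper(s)           # HADI HASHEMI
        print substr(s, 6)         # Hashemi (positions start at 1)
        print substr(s, 1, 4)      # Hadi
        print index(s, "Hash")     # 6
        print index(s, "xyz")      # 0: not found
        n = split("704-687-2222", part, "-")
        print n, part[1], part[3]  # 3 704 2222
        print system("exit 3")     # 3: exit status of the command
    }'
}

string_demo
```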

If

• Syntax:
  ▫ if (condition is true) {
        statements
    } else {
        statements
    }
  ▫ Notes:
    - else is optional
    - {} are not needed for a single statement
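A sketch of an if/else chain, assuming a POSIX sh and awk; grade is a hypothetical helper and the letter cut-offs are made up. The first two branches are single statements without braces; the last uses braces.

```shell
# Hypothetical helper: letter for a numeric score.
grade() {
    awk -v score="$1" 'BEGIN {
        s = score + 0          # force a numeric value
        if (s >= 90)
            print "A"          # single statements need no braces
        else if (s >= 80)
            print "B"
        else {
            print "below B"
        }
    }'
}

grade 85    # prints: B
```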

For
• Syntax form 1:
  ▫ for (startval ; condition ; control) statement
    - C-like in form
  ▫ Example: for (k = 1 ; k < 9 ; k++) print k
• Syntax form 2:
  ▫ for (var in array) statement
    - var takes each index of the array in turn (in no fixed order)
    - Great for associative arrays: non-numeric indices, gaps in the array
    - See ENVIRON example in previous slide
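A sketch of both forms, assuming a POSIX sh and awk; loop_demo and the array day are hypothetical. The second loop only counts the indices, since the order in which for (key in day) visits them is not fixed.

```shell
loop_demo() {
    awk 'BEGIN {
        for (k = 1; k < 4; k++) print k     # form 1: C-like
        day["Mon"] = 1; day["Wed"] = 3; day[7] = "gap"
        n = 0
        for (key in day) n++                # form 2: each index once, in no fixed order
        print n " indices"
    }'
}

loop_demo
```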

While

• Syntax:
  ▫ while (condition is true) {
        statement(s)
    }
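A minimal while loop, assuming a POSIX sh and awk: it doubles n until the condition n < 100 is no longer true.

```shell
awk 'BEGIN {
    n = 1
    while (n < 100) {   # keep doubling until n reaches at least 100
        n *= 2
    }
    print n             # 128
}'
```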

continue and break

• continue and break can be used in both kinds of loop
  ▫ for
  ▫ while
• break
  ▫ stops the innermost loop it is in
• continue
  ▫ skips the rest of the statements in this iteration
  ▫ continues with the next iteration
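A sketch showing both statements in one for loop, assuming a POSIX sh and awk; skip_demo is a hypothetical name. continue skips the even numbers, and break ends the loop once i passes 7, so 9 is never added.

```shell
skip_demo() {
    awk 'BEGIN {
        for (i = 1; i <= 10; i++) {
            if (i % 2 == 0) continue        # skip even numbers: go to the next iteration
            if (i > 7) break                # leave the loop entirely
            s = ((s == "") ? i : s "," i)   # build a comma-separated list
        }
        print s                             # 1,3,5,7
    }'
}

skip_demo
```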

Summary

• awk is a "primitive" scripting language
• good for processing text files
  ▫ filtering
• perl is a more modern replacement
  ▫ "religious war" over which is better
• if you understand awk, it will be a good basis for understanding perl