Shell Scripting and Awk - Day01

Agenda

Shell Scripting & awk
2013 GlobalLogic Inc. CONFIDENTIAL

Shell scripting
- Standard file descriptors & IO redirection
- File test operators
- Functions
- Executing SQLs/FTP from a shell-script

awk
- Programming model
- Records and fields
- Pattern matching
- Functions
- System/built-in variables
- Data manipulation and report generation

Training schedule: Day-1
- Shell scripting
- awk

Shell scripting

UNIX standard files: IO redirection
Re-direction of input and output: >, >>, x2

UNIX standard files [contd.]
Ex.2
Login as a non-root user and go to root:
    cd /

Find everything:
    find .

Again, find everything, but redirect output to a file:
    find . > $HOME/x1
What is being shown on screen? Why?

Redirect o/p & error to different files:
    find . > $HOME/x1 2>$HOME/x2

Send o/p & error to the same file:
    find . > $HOME/x1 2>&1
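The exercise can be tried on a command whose output is known in advance. This is a minimal sketch: the `emit` function and the scratch directory are hypothetical stand-ins for `find` and `$HOME`. Anything a command writes to stderr (for `find` run from `/` as a non-root user, typically "Permission denied" messages) stays on screen unless file descriptor 2 is redirected as well.

```shell
# emit is a hypothetical stand-in for find: one line to stdout, one to stderr
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

workdir=$(mktemp -d)                      # scratch directory used instead of $HOME

emit > "$workdir/x1" 2> "$workdir/x2"     # o/p and error go to different files
emit > "$workdir/both" 2>&1               # o/p and error go to the same file

cat "$workdir/x1"                         # to stdout
cat "$workdir/x2"                         # to stderr
cat "$workdir/both"                       # both lines
```

The order matters in the last form: `2>&1` must come after `> file`, because it duplicates wherever stdout points at that moment.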

UNIX Shell Scripting: Here document (heredoc/hereis/here-script)
- Special-purpose code block; the command reads it from stdin
- Spread over multiple lines
- Delimiter on the last line, in the first column
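A small sketch of a here document, using `cat` instead of `mail` so that it runs anywhere. The variable `name` and the delimiter word `EOF` are arbitrary choices made for this illustration.

```shell
name="world"      # hypothetical variable, expanded inside the here document

# Everything up to the line containing only EOF is fed to cat on stdin
message=$(cat <<EOF
Hello, $name
This message is spread
over multiple lines
EOF
)
echo "$message"
```

Quoting the delimiter (`<<'EOF'`) would switch off the `$name` expansion.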

Example - send mail with a message spread over multiple lines:
    mail -s "Hello" [email protected]

UNIX FTP examples
FTP

    ftp -n -i ftp.FreeBSD.org

UNIX SQL handling [contd. 2]
Process output in this single column:
    sqlite3 TrainingDB

File test operators
    -S          socket
    -t          terminal device, e.g. whether the stdin [ -t 0 ] or stdout [ -t 1 ] in a given script is a terminal
    -r/-w/-x    file has read / write / execute permission
    -g          set-group-id (sgid) flag set on file or directory. If set on a directory, any file created in it gets the directory's group ID
    -u          set-user-id (suid) flag set on file
    -k          sticky bit set (the t at the end of the permissions in ls -l o/p). The save-text-mode flag is a special type of file permission: if set on a file, the file will be kept in cache memory; if set on a directory, write permission will be restricted
    -O          you are owner of file
    -G          group-id of file same as yours
    -N          file modified since it was last read
    f1 -nt f2   file f1 is newer than f2
    f1 -ot f2   file f1 is older than f2
    f1 -ef f2   files f1 and f2 are hard links to the same file
    !           "not" -- reverses the sense of the tests above (returns true if condition absent)
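The comparison operators are easiest to see on files whose age and links are known. The sketch below creates hypothetical scratch files for that purpose; it assumes a shell whose `[` supports `-nt` and `-ef` (bash, ksh and dash do).

```shell
# Hypothetical scratch files, created only for this demonstration
dir=$(mktemp -d)
touch -t 200001010000 "$dir/old"      # empty file, timestamp forced to 1 Jan 2000
echo "data" > "$dir/new"              # created now, so newer than old
ln "$dir/new" "$dir/link"             # second hard link to the same file

results=""
[ "$dir/new" -nt "$dir/old" ]  && results="$results newer"
[ "$dir/new" -ef "$dir/link" ] && results="$results same-file"
[ ! -s "$dir/old" ]            && results="$results old-is-empty"
echo "$results"

# -t tests a file descriptor, not a file name
if [ -t 0 ]; then echo "stdin is a terminal"; else echo "stdin is not a terminal"; fi

rm -rf "$dir"
```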

File test operators
File existence:
    if [ -f /home/user11/Yogesh ]; then echo "File exists"; else echo "File NOT present"; fi
Directory existence:
    if [ -d /home/user11/Yogesh ]; then echo 'This is not a file'; fi
Executable file:
    if [ -x /home/user11/TrainingScripts/01Hello.sh ]; then echo 'Wow, I can run this!'; fi
Writeable file:
    if [ -w /home/user11/TrainingScripts/01Hello.sh ]; then echo 'Warning! This file could be over-written!!'; fi

File-test operators
Logical operators:
    !     NOT
    -a    AND
    -o    OR
Examples:
File empty or not?
    if [ -f /home/user11/Yogesh.data -a -s /home/user11/Yogesh.data ]; then echo 'Some data in file'; fi
Risk of losing data?
    if [ -f /home/user11/Yogesh.data -a -s /home/user11/Yogesh.data -a -w /home/user11/Yogesh.data ]; then echo 'Danger of a non-empty file being written over!'; fi

Functions
- To be used within the script
- Functions or procedures
- A function may return a value in one of four different ways:
    1. Change the state of a variable or variables
    2. Use the exit command to end the shell script
    3. Use the return command to end the function, and return the supplied value to the calling section of the shell script
    4. echo output to stdout, which will be caught by the caller just as c=`expr $a + $b` is caught
- Can be defined within a file, or inside a project library as well
- There is _NO_ scoping, other than the parameters ($1, $2, $@ etc.)

Functions: scoping
Declare as:
    function_name () {
        list of commands
    }
Invoke as:
    function_name
    function_name 1 b 3 other-arguments

Return a value as:
    return 10
Evaluate the return code with $?:
    iRC=$?
    if [ $iRC -ge 0 ] ...
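Two of the ways of returning a value can be sketched side by side. The function names `add` and `is_even` are made up for this illustration.

```shell
# add: "returns" its result by echoing it to stdout
add() {
    echo $(( $1 + $2 ))
}

# is_even: reports its result through the return code (0 = true, 1 = false)
is_even() {
    if [ $(( $1 % 2 )) -eq 0 ]; then
        return 0
    fi
    return 1
}

sum=$(add 2 4)            # caught by the caller, like c=`expr $a + $b`

is_even "$sum"
iRC=$?                    # read $? immediately; the next command overwrites it

# A return code can also be tested directly, without going through $?
if is_even 3; then parity="even"; else parity="odd"; fi

echo "sum=$sum iRC=$iRC parity=$parity"
```

Return codes are limited to the range 0-255, so `return` suits status values while `echo` suits data.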

Checklist: Shell scripting
    01 Standard file descriptors & IO redirection
    02 Executing SQLs/FTP from a shell-script
    03 File test operators
    04 Shell-scripting: functions

awk

awk programming model
- Designed for text processing and typically used for data extraction
- Data-driven, interpreted
- awk views the input stream as a collection of records
- Records are made up of fields
- A field is a word of one or more non-whitespace characters
- Fields are separated by one or more whitespace characters
- An awk program consists of pairs of patterns and braced actions
- All patterns are examined for every input record
- Fields can be accessed as $1, $2 etc.; $0 is the whole record
- The program consists of a main input loop, which is executed over all the records
- A typical awk program looks like this:
    BEGIN{}
    {}
    END{}
    pattern { action }
    pattern { action }
    Input_file

    awk 'BEGIN { action; } /search/ { action; } END { action; }' input_file1 file2

awk programming model

BEGIN: what happens before processing, e.g. the initialization part
Main input loop: the processing
END: what happens after processing, e.g. print some concluding stats

Main input loop:
- You don't write the loop yourself, as you would in C with while(!EOF) do {readline()}
- Instructions are written as a series of pattern/action procedures
- Multiple BEGIN/END/main-loop blocks are possible; they are executed in the order of appearance
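The three phases can be seen in a short program that totals a column. This is a sketch on inline sample data in a Category,Amount layout; the variable name `sum` is arbitrary.

```shell
total=$(printf 'Medicine,200\nGrocery,500\nRent,900\n' |
    awk -F, '
        BEGIN { sum = 0 }           # runs once, before any input is read
              { sum += $2 }         # runs for every record; awk does the looping
        END   { print NR, sum }     # runs once, after the last record
    ')
echo "$total"     # 3 1600
```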

awk programming model - examples
    awk '$0 ~ /Rent/{print}' file                 # Rent,900
    awk '/Rent/{print}' file
    awk '/Rent/' file
    awk -F, '$1 ~ /Rent/' file                    # search only in the first field
    awk -F, '$1 == "Medicine"{print $2}' file     # 200 and 600, one per line
    awk '/Rent|cine/' file                        # 3 lines, for Medicine and Rent
    awk '!/Medicine/' file                        # the non-Medicine lines
    awk -F, '$2>500' file                         # where did I spend more than 500?
    awk 'NR==3 || NR==5' file                     # 3rd and 5th lines
    awk 'NR!=1' file                              # skip the header
    awk 'BEGIN{IGNORECASE=1} /Rent/' file         # Rent + Restaurent as well! (IGNORECASE is a gawk extension)
    awk '/Rent/{print} /cine/{print}' file        # + Medicine
    awk 'BEGIN{IGNORECASE=1;print("--START--")} /Rent/{print} /cine/;END{print ("--END--")}' file     # ...+ report heading / footer!
    awk 'NR>2{ print x} {x=$0}' file              # skip the first and the last line... what/how?

cat file
    Medicine,200
    Grocery,500
    Rent,900
    Grocery,800
    Medicine,600
    Restaurent,300
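The last one-liner, which skips the first and the last line, is worth tracing. The sketch below recreates the sample file in a temporary location (the `expenses` variable is hypothetical) and runs it next to a simpler numeric filter.

```shell
expenses=$(mktemp)
printf '%s\n' Medicine,200 Grocery,500 Rent,900 Grocery,800 Medicine,600 Restaurent,300 > "$expenses"

# x always holds the previous line. Printing only starts at record 3,
# so line 1 is never printed, and the last line is stored in x but never printed.
middle=$(awk 'NR>2 { print x } { x = $0 }' "$expenses")
echo "$middle"

# Records whose second comma-separated field exceeds 500
big=$(awk -F, '$2 > 500' "$expenses")
echo "$big"

rm -f "$expenses"
```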

awk programming model: BEGIN and END
Implementation of wc -l in awk (run as awk -f awkScriptName inputFileName):

    BEGIN { lines = 0 }
    { lines = lines + 1 }
    END { print lines }

Implementation of cat -n in awk (run as awk -f awkScriptName inputFileName):

    BEGIN { linenum = 0 }
    {
        linenum = linenum + 1
        print linenum "\t" $0
    }
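Since awk already counts records in the built-in variable NR, both scripts above can be shortened. A sketch on inline input:

```shell
# wc -l: NR still holds the number of records read when END runs
count=$(printf 'one\ntwo\nthree\n' | awk 'END { print NR }')
echo "$count"

# cat -n (without the right-aligned padding): record number, a tab, then the line
numbered=$(printf 'one\ntwo\n' | awk '{ print NR "\t" $0 }')
echo "$numbered"
```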

awk programming model - examples
    awk -f awkscript02.awk file        # run the script from a file
awk is a C-like language, so you'll see printf(), if, while and for, with syntax much the same as in C.

awk programming model: basic awk programs

cat awkscript01.awk
    BEGIN{
        IGNORECASE=1
        print("--START--")
    }

    /Rent/{print}
    /cine/

    END{
        print ("--END--")
    }

cat -n awkscript02.awk
    BEGIN{
        IGNORECASE=1
        print("--START--")
    }

    END{
        print ("--END--")
    }

    /Rent/{print}
    /cine/

cat -n awkscript03.awk
    BEGIN{
        IGNORECASE=1
        FS=","
        print("--START--")
    }
    {
        print $1
    }
    END{
        print ("--END--")
    }

    /Rent/{print}
    /cine/

Records and fields
Input is structured and not just an endless string of characters.

Fields are delimited by spaces or tabs:
    echo a b c d | awk 'BEGIN { one = 1; two = 2 } { print $(one + two) }'      # c
    echo a,b,c,d | awk 'BEGIN { one = 1; two = 2 } { print $(one + two) }'      # null string

$0: whole record; $1 / $2: first/second field etc.; $NF: last field

You can change the field separator with the -F option on the command line:
    echo a,b,c,d | awk -F, 'BEGIN { one = 1; two = 2 } { print $(one + two) }'

-f vs -F (-f names the program file, -F sets the field separator):
    awk -F, -f awkScriptFile.awk inputDataFile.dat

A better option is to specify it in BEGIN:
    BEGIN { FS = "," }
    FS = "\t"         # a single tab as the field separator
    FS = "\t+"        # tabs, one or more!
    FS = "[':\t]"     # any one of these three (', : or tab) could be present
    awk -F 'word[0-9][0-9][0-9]' file     # fields separated by "word" followed by 3 digits
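A quick sketch of both ways of setting the separator, on inline input so the results are easy to check:

```shell
# -F on the command line: third field, last field, and the field count
third=$(echo 'a,b,c,d' | awk -F, '{ print $3, $NF, NF }')
echo "$third"       # c d 4

# FS set in BEGIN to a bracket expression: split on either a colon or a tab
second=$(printf 'a:b\tc\n' | awk 'BEGIN { FS = "[:\t]" } { print $2, $3 }')
echo "$second"      # b c
```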

Records and Fields
RS - how to separate records; the default value is \n
It can be changed:
    awk 'BEGIN { RS = "/" } ; { print $0 }' BBS-list
or from the command line, like this (i.e. even before it starts processing BBS-list!):
    awk '{ print $0 }' RS="/" BBS-list
NR: record number, i.e. total records read so far (across all files, if there are several); FNR resets for each file
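A sketch of RS and of the NR/FNR difference. The input is inline and the two scratch files `f1` and `f2` are hypothetical.

```shell
# With RS = "/", each slash-separated piece becomes one record
records=$(printf 'a/b/c' | awk 'BEGIN { RS = "/" } { print NR ": " $0 }')
echo "$records"

# NR keeps counting across files, FNR restarts at 1 for each file
dir=$(mktemp -d)
printf 'x\ny\n' > "$dir/f1"
printf 'z\n'    > "$dir/f2"
counts=$(awk '{ print NR, FNR }' "$dir/f1" "$dir/f2")
echo "$counts"
rm -rf "$dir"
```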

awk pattern and actions
Kinds of patterns:
- /regular expression/: matches when the text of the input record fits the regular expression
- Expression: a single expression; matches when its value is non-zero (if a number) or non-null (if a string)
- Range pattern, e.g. pat1, pat2: a pair of patterns separated by a comma, specifying a range of records. The range includes both the initial record that matches pat1 and the final record that matches pat2
- BEGIN / END: special patterns to supply start-up or clean-up actions
- Empty: the empty pattern matches every input record

awk pattern and actions: Regular expressions
A regular expression pattern matches when the text of the input record fits the regular expression.
    awk '/foo/ { print $2 }' BBS-list
    awk '$1 ~ /J/' inventory-shipped
    awk '$1 !~ /J/' inventory-shipped
    tolower($1) ~ /foo/ { ... }

Regexp operators, e.g. ^ (start), $ (end), . (1 char), [] (char list), * (0 or more), + (1 or more) etc., can be used.

awk pattern and actions: Expressions and Range
A single expression matches when its value is non-zero (if a number) or non-null (if a string).

    awk '$1 == "foo" { print $2 }' BBS-list     # first field is exactly the word foo
    awk '$1 ~ /foo/ { print $2 }' BBS-list      # first field contains foo
    awk '/2400/ && /foo/' BBS-list              # 2400 and foo, both should be present
    awk '/2400/ || /foo/' BBS-list              # either of these two
    awk '! /foo/' BBS-list                      # all lines except those containing foo

Range pattern, pat1, pat2: a pair of patterns separated by a comma, specifying a range of records. The range includes both the initial record that matches pat1 and the final record that matches pat2.
    awk '$1 == "on", $1 == "off"'               # everything b/w on and off, inclusive
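A sketch of the range pattern on made-up input, to show that both boundary records are included:

```shell
inside=$(printf '%s\n' before on "step 1" "step 2" off after |
    awk '$1 == "on", $1 == "off"')
echo "$inside"
# on
# step 1
# step 2
# off
```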

awk pattern and actions: BEGIN-END and Empty pattern
BEGIN / END - special patterns to supply start-up or clean-up actions:
    awk 'BEGIN { print "Analysis of \"foo\"" } /foo/ { ++n } END { print "\"foo\" appears " n " times." }' BBS-list

Empty: an action without a pattern runs for every input record, e.g. to print the first field of each record:
    awk '{ print $1 }' BBS-list

awk functions
Built-in functions: C-like operations and operators.

Arithmetic functions:
    int(), sqrt(), sin(), cos(), exp(), atan2(), rand(), srand()
    http://www.staff.science.uu.nl/~oostr102/docs/nawk/nawk_91.html#SEC94

String functions:
    index(), length(), match(), split(), sprintf(), sub(), gsub(), substr(), tolower(), toupper()
    http://www.staff.science.uu.nl/~oostr102/docs/nawk/nawk_92.html

    awk '{ x = sqrt($2+$3); printf("%f,%f,%f,%f\n", x, $2, $3, $2+$3) }' file2

    awk '{ x = sqrt($2+$3); printf("%s %.2f %d %d %d\n", substr($1,3,3), x, $2, $3, $2+$3) }' file2
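A few of the string functions can be tried on a single record. The sketch below uses one inline row in the same layout as the first three columns of file2; the variable names inside the awk program are arbitrary.

```shell
out=$(echo 'chr11 61731756 61735132' |
    awk '{
        span  = $3 - $2                               # arithmetic on fields
        name  = toupper(substr($1, 1, 3)) substr($1, 4)
        where = index($1, "11")                       # 1-based position, 0 if absent
        n     = gsub(/1/, "I", $2)                    # returns the number of replacements
        printf("%s %d %d %d %s\n", name, span, where, n, $2)
    }')
echo "$out"     # CHR11 3376 4 2 6I73I756
```

Note that gsub() modifies its target in place, which is why the span is computed before the substitution.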

awk functions
User-defined functions:

Define:
    function myprint(num)
    {
        printf "%6.3g\n", num
    }

File rev.awk:
    function rev(str)
    {
        if (str == "")
            return ""

        return (rev(substr(str, 2)) substr(str, 1, 1))
    }

Use:
    awk 'function myprint(num) { printf "%6.3g\n", num } $3>0 { myprint($3) }' datafile
--OR--
    awk 'function myprint(num) { printf "%6.3g\n", num } { if ($3>0) myprint($3) }' datafile

    echo "Don't Panic" | gawk -e '{ print rev($0) }' -f rev.awk
--OR--
    echo "Don't Panic" | gawk -f rev.awk -e '{ print rev($0) }'
--OR--
    echo "Don't Panic" | gawk -f rev.awk -e '{ print rev($0 $1) }'
(-e is a gawk option, so these three need gawk)

datafile:
    1.2 3.4 5.6 7.8
    9.10 11.12 -13.14 15.16
    17.18 19.20 21.22 23.24
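Where gawk's -e option is not available, the function can simply be defined inside the program text. A sketch that should run with any POSIX awk:

```shell
reversed=$(echo "Don't Panic" | awk '
    function rev(str) {
        if (str == "")
            return ""
        # reverse the tail first, then append the first character
        return (rev(substr(str, 2)) substr(str, 1, 1))
    }
    { print rev($0) }')
echo "$reversed"     # cinaP t'noD

# myprint on the first row of datafile: third field, 6 wide, 3 significant digits
formatted=$(echo '1.2 3.4 5.6 7.8' |
    awk 'function myprint(num) { printf "%6.3g\n", num } $3 > 0 { myprint($3) }')
echo "$formatted"
```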

awk Operators
Arithmetic operators: ^, **, -, +, *, /, %
Comparison operators: <, <=, ==, !=, >, >=, ~, !~, in
String concatenation: no explicit operator, simply write strings next to each other, e.g. print "Field number one: " $1
Assignment: =
Increment/Decrement: ++, -- (both postfix and prefix)

Regexp operators:
    \       Suppress the special meaning of a character, e.g. \$ matches a literal $ and not the end of a line
    ^       Beginning of a string
    $       End of a string
    .       (Period) Any single character
    ()      Group regexps together, e.g. @(samp|code)\{[^}]+\} matches both @code{foo} and @samp{bar}
    *       Repeat as many times as possible (zero or more), e.g. ph* looks for one p followed by 0 or more h: p, ph, phhh
    +       Repeat at least once, e.g. ph+ looks for one p followed by 1 or more h: ph, phh etc., but not p
    ?       Match once or not at all, e.g. fe?d matches fd or fed, but not feed
    {n} / {n,} / {n,m}
            Match exactly n / n or more / n to m times, e.g. wh{3}y matches whhhy; wh{1,2}y matches why and whhy; wh{1,}y matches why, whhy, whhhy etc.
    []      Bracket expression, matches any one of the listed characters, e.g. [Yog] matches any one of Y, o or g
    [^]     Complemented bracket expression, e.g. [^Yog] matches any single character other than Y, o or g
    |       Alternation operator, e.g. ^P|[aeiouy] matches if the string either starts with a P or contains any of aeiouy
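Three of these operators, tried on short made-up word lists. The anchors ^ and $ are added in the first two patterns so that the whole line has to match.

```shell
optional=$(printf '%s\n' fd fed feed | awk '/^fe?d$/')
echo "$optional"         # fd and fed, but not feed

one_or_more=$(printf '%s\n' p ph phh | awk '/^ph+$/')
echo "$one_or_more"      # ph and phh, but not p

alternation=$(printf '%s\n' Perl xyz bcd | awk '/^P|[aeiouy]/')
echo "$alternation"      # Perl (starts with P) and xyz (contains y)
```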

awk built-in variables
- Field variables: $1, $2, $3, and so on ($0 represents the entire record)
- NR: current count of the input records (lines) read so far
- NF: count of the fields in the current input record; $NF is the last field in the input record
- FILENAME: name of the current input file
- FS: "field separator" for the input record; the default is white space (one or more spaces/tabs). FS can be reassigned to another character to change the field separator
- RS: "record separator"; newline is the default record separator
- OFS: output field separator, placed between o/p fields when awk prints them; the default is a single space
- ORS: output record separator, placed after o/p records when awk prints them; the default is a newline
- OFMT: format for numeric output; the default format is "%.6g"

awk Data Manipulation
cat file2
    #track
    chr11 61731756 61735132 FTH1 -
    chr12 6643584 6647537 GAPDH +
    chr11 18415935 18429765 LDHA +
    chr12 21788274 21810728 LDHB -
    chr22 24236564 24237409 MIF +
    chr4 6641817 6644470 MRFAP1 +
    chr15 72491369 72523727 PKM -
    chr10 73576054 73611082 PSAP -
    chr2 85132762 85133799 TMSB10 +
    chr13 45911303 45915297 TPT1 -

Input file: a flat file containing records and fields, available for string manipulation and arithmetic operations.
e.g. consider file2:
- If the 5th column is "+", then subtract 5000 from column 2 and add 2000 to column 3
- If the 5th column is "-", then add 5000 to column 3 and subtract 2000 from column 2
Ref. Stack Exchange: http://unix.stackexchange.com/questions/127471/using-awk-for-data-manipulation

awk '$5 == "+" {$2-=5000;$3+=2000}; $5 == "-"{$3+=5000;$2-=2000};{print}' file2
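The effect of the one-liner can be checked by hand on two rows of file2, one for each sign. This is a sketch with the rows passed inline rather than read from the file.

```shell
out=$(printf '%s\n' 'chr12 6643584 6647537 GAPDH +' 'chr11 61731756 61735132 FTH1 -' |
    awk '$5 == "+" { $2 -= 5000; $3 += 2000 }
         $5 == "-" { $3 += 5000; $2 -= 2000 }
         { print }')
echo "$out"
# chr12 6638584 6649537 GAPDH +
# chr11 61729756 61740132 FTH1 -
```

Assigning to a field makes awk rebuild $0 with OFS between the fields, so tab-separated input would come out space-separated unless OFS is set to a tab as well.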

    awk -f awkscript04.awk file2

cat awkscript04.awk
    BEGIN { print("---START-----") }
    {
        if ($5 == "+") {
            $2-=5000; $3+=2000
        }
        if ($5 == "-") {
            $3+=5000; $2-=2000
        }
        print
    }
    END { print ("---END-----") }

Data transformation and report generation language

Data manipulation and retrieval of information from text files

Initialize variables before reading a file:
    awk -f progfile a=1 f1 f2 a=2 f3
sets a to 1 before reading input from f1 and sets a to 2 before reading input from f3.

The -v option lets you assign a value to a variable before the awk program begins running (that is, before the BEGIN action). For example, in
    awk -v v1=10 -f prog datafile
v1 is set to 10 before the program starts.
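The difference in timing between the two forms can be sketched as follows; the scratch files `f1` and `f2` are hypothetical.

```shell
# -v: the value is already available inside BEGIN
doubled=$(awk -v v1=10 'BEGIN { print v1 * 2 }')
echo "$doubled"     # 20

# var=value between file names: assigned only when awk reaches that argument
dir=$(mktemp -d)
echo "first"  > "$dir/f1"
echo "second" > "$dir/f2"
tagged=$(awk '{ print a, $0 }' a=1 "$dir/f1" a=2 "$dir/f2")
echo "$tagged"
# 1 first
# 2 second
rm -rf "$dir"
```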

Report Generation - I
Get employee names and salary:
    awk '{print $2, $5}' employee.txt

- or something more report-like:

cat -n report01.awk
    BEGIN {
        printf(" Salary Report\n");
        printf("EName\tSalary\n");
        printf("=====\t=======\n")
    }
    { printf("%s\t%s\n",$2,$5) }    # %s, because $5 is text such as $5,000 (%d would print 0)
    END {
        printf("--END OF REPORT--\n")
    }

    awk -f report01.awk employee.txt
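Tabs line up only while the names are short. A variant with fixed column widths, sketched here on two inline rows in the employee.txt layout; the widths of 8 are an arbitrary choice.

```shell
report=$(printf '%s\n' '100 Thomas Manager Sales $5,000' '200 Jason Developer Technology $5,500' |
    awk 'BEGIN { printf("%-8s %8s\n", "EName", "Salary") }     # %-8s: left-aligned, 8 wide
               { printf("%-8s %8s\n", $2, $5) }')              # %8s: right-aligned, 8 wide
echo "$report"
```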

employee.txt
    100 Thomas Manager Sales $5,000
    200 Jason Developer Technology $5,500
    300 Sanjay Sysadmin Technology $7,000
    400 Nisha Manager Marketing $9,500
    Randy DBA Technology $6,000
    ..

Report Generation - II
An HTML report:
    awk -f report02.awk -v v1=Technology employee.txt > abc.html

cat report02.awk:
    # NOTE: the HTML tags in the strings below are assumed; the original markup did not survive.
    BEGIN {
        title="Salary Report by awk"
        print "<html>\n<title>" title "</title>"
        print "<body>\n<h1>Salary Report</h1>"
        print "<h2>for " v1 " department</h2>";
        print "<table><tr><th>#</th><th>EName</th><th>Salary</th></tr>"
        totalSal=0
        count=0
    }
    {
        #if($4=="Technology")
        if($4==v1) {
            count++
            print "<tr><td>" count "</td><td>" $2 "</td><td>" $5 "</td></tr>"
            sal=$5
            gsub(/[$,]/, "", sal)    # "$5,000" -> "5000", so that it adds up as a number
            totalSal+=sal
        }
    }
    END {
        print "</table>\n";
        if(count>0)
            printf("Total %s Salary:%d, for %d employees, average:%0.2f",v1,totalSal,count,totalSal/count);
        else
            printf("<br>Invalid department %s", v1)
        print "<br><br>Report run Date/Time:" strftime("%Y-%m-%d/%H:%M:%S")    # strftime() is a gawk extension
        print "<br>Number of records processed:",NR
        print "</body></html>";
    }

...or automate the whole reporting process!
    mail -s "Salary Report on `date`" [email protected]