GREP

What's Grep?

Grep is a popular Unix program that supports a special pattern language for writing regular expressions.

The grammar used by software that handles regular expressions is based on grep's; Perl extends it further.

Regular Expression

Search String

Compiles

Engine parses your search string

produces a state machine

FALSE

Searches

Input sent into State Machine

Conceptually, 1 shape/letter at a time

TRUE

Found: the state machine object changes state (in this example, it is set to TRUE)

The user checks the machine's state when it finishes running
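The compile-then-search flow above can be sketched in Python. This is only an illustration: Python's `re` module stands in for the engine, and `run` and `found` are hypothetical names that mirror the FALSE/TRUE state in the diagram.

```python
import re

# Compile: the engine parses the search string once and builds its matcher.
machine = re.compile(r"john")

def run(machine, text):
    """Feed the input to the compiled pattern and report the final state."""
    found = False                 # state before any input is seen
    if machine.search(text):      # the engine scans the input
        found = True              # a match flips the state
    return found

# The user checks the state after the run completes.
state_hit = run(machine, "john was here")
state_miss = run(machine, "nobody was here")
print(state_hit, state_miss)
```

Compiling once and reusing the compiled object is the usual choice when the same pattern is run against many inputs.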

Grep Expressions

The “grep” language is used to write regular expressions for text processing

“Grep pattern” is another name for them

more formally called “regular expressions”

Grep Expressions

A string of text to match with special characters

“john.*”

would return True on a search of: “john was here”

Grep Expressions

“.*\.txt”

.* is any character (.) repeated any number of times (*)

\. is literally a . (the \ before it means the next character is literal, that is, not special)

txt is plain letter matching

This would pick out .txt files

It's similar to what you see in Windows, but it's not the same: it's more powerful than the simple “wildcards” (*) you often see.
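A minimal sketch of that pattern at work, assuming Python's `re` module as the engine and a made-up list of file names. It also shows that without the end-of-line symbol `$` the pattern accepts any name that merely contains `.txt`.

```python
import re

pattern = re.compile(r".*\.txt")

# Hypothetical file names, chosen only for illustration.
names = ["notes.txt", "notes_txt", "photo.png", "report.txt.bak"]

# search() succeeds if the pattern is found anywhere in the name.
matches = [name for name in names if pattern.search(name)]
print(matches)

# Adding $ keeps only the names that really end in .txt
anchored = re.compile(r".*\.txt$")
strict = [name for name in names if anchored.search(name)]
print(strict)
```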

Special Chars

. = any single character

^ = beginning of a line

$ = end of line

\w = word characters (letters, digits, and underscore)

\d = decimal digits (0-9)

\ = escape char

Backslash \ (leans to the left)

most popular escape character

sneaks past illegal characters

makes special code characters (such as \n or \t)

Data encodings always have one

Examples

... = three of ANYTHING

\d\d\d = three digits (decimal numbers)

remember the \ is the escape code

\w\w\w = three word characters (letters or digits, no symbols)

good: abc, a34

bad: ab!
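These three-character patterns can be checked with a short Python sketch. The helper `matches_whole` is a hypothetical name; it uses `re.fullmatch` so the whole sample has to fit the pattern.

```python
import re

def matches_whole(pattern, text):
    """True only if the entire text fits the pattern."""
    return re.fullmatch(pattern, text) is not None

# ... accepts any three characters, symbols included
any_three = [matches_whole(r"...", s) for s in ["a!3", "ab"]]

# \d\d\d accepts three digits only
three_digits = [matches_whole(r"\d\d\d", s) for s in ["123", "12a"]]

# \w\w\w rejects symbols such as !
three_word = [matches_whole(r"\w\w\w", s) for s in ["abc", "ab!"]]

print(any_three, three_digits, three_word)
```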

Approach

searching for “john” or “joan”

What is the difference between them?

what symbol works?
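One possible answer, sketched in Python as an assumption about what the exercise is after: the two names differ only in the third letter, so `.` can stand in for it. The `[]` set covered later in these slides is a narrower alternative.

```python
import re

words = ["john", "joan", "jon", "join"]

# . accepts any single character in the third position
loose = re.compile(r"jo.n")
loose_results = {w: bool(loose.fullmatch(w)) for w in words}

# [ha] accepts only h or a in the third position
tight = re.compile(r"jo[ha]n")
tight_results = {w: bool(tight.fullmatch(w)) for w in words}

print(loose_results)
print(tight_results)
```

Note that `jo.n` also accepts “join”, which is why the narrower set can be the safer choice.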

Special Chars

\D = non-digits

\W = non-word characters

\s = white space

\S = non white space

\n = new line (return/enter key)

\t = tab

\s\s\s = three whitespace characters

tabs, spaces, possibly newlines

\D\s\W = non-digit, whitespace, non-word

Examples:

matches: ! !

does not match: x 4, = 4, A<tab>5 (4 and 5 are word characters, so \W rejects them)
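A quick Python check of these classes, with sample strings invented for illustration. Each position in the pattern must be satisfied by the character in the same position of the sample.

```python
import re

# Three whitespace characters: space, tab, newline
three_spaces = re.fullmatch(r"\s\s\s", " \t\n") is not None

# non-digit, whitespace, non-word
pattern = re.compile(r"\D\s\W")
samples = ["! !", "x\t-", "4 !", "a b"]
results = {s: bool(pattern.fullmatch(s)) for s in samples}

# "4 !" fails on \D (starts with a digit);
# "a b" fails on \W (ends with a word character).
print(three_spaces, results)
```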

Quantity Chars

* = 0 or more

? = 0 or 1

+ = 1 or more

[] = any one of the chars inside, e.g. [abc]

[^] = any one char NOT listed in the []

[a-zA-Z] = ranges of chars

Examples

X+ = 1 or more X

[XYZ] = exactly 1 of these chars

X, Y, Z

[XYZxyz]+ = 1+ of any of these

y, XYz, zYZZyX, ZZzzzzz
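The quantity characters and sets above can be tried out with a small Python sketch. The helper `whole` is a hypothetical name, and the empty-string and failing samples are added here for contrast.

```python
import re

def whole(pattern, text):
    """True only if the entire text fits the pattern."""
    return re.fullmatch(pattern, text) is not None

zero_or_more = [whole(r"X*", s) for s in ["", "XXXX"]]
zero_or_one = [whole(r"X?", s) for s in ["", "X", "XX"]]
one_or_more = [whole(r"X+", s) for s in ["X", "XXX", ""]]

# A set stands for exactly one character unless a quantity follows it.
one_of_set = [whole(r"[XYZ]", s) for s in ["X", "Y", "Z", "XY", "x"]]
runs = [whole(r"[XYZxyz]+", s) for s in ["y", "XYz", "zYZZyX", "ZZzzzzz", "XYa"]]

print(zero_or_more, zero_or_one, one_or_more, one_of_set, runs)
```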

EXAMPLES

[a-zA-Z0-9] = any single letter or digit, but no spaces or symbols

\.?$ = maybe ends with a .

remember: $ is end of line

.* = 0 to ∞ of any character

[^abc]* = 0 to ∞ of anything but lowercase a, b, or c
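A hedged Python sketch of these four patterns, using made-up sample text. `group()` returns the text that was actually matched, which makes the "maybe" and "0 to ∞" behaviour visible.

```python
import re

# One letter or digit; a space or an underscore is rejected.
single = [re.fullmatch(r"[a-zA-Z0-9]", s) is not None for s in ["q", "7", " ", "_"]]

# \.?$ matches the final dot when there is one, and nothing otherwise.
with_dot = re.search(r"\.?$", "done.").group()
without_dot = re.search(r"\.?$", "done").group()

# .* accepts zero characters as well as many.
anything = [re.fullmatch(r".*", s) is not None for s in ["", "anything at all!"]]

# [^abc]* stops at the first a, b, or c.
prefix = re.match(r"[^abc]*", "xyzabc").group()

print(single, repr(with_dot), repr(without_dot), anything, prefix)
```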

Problems

Unicode vs ASCII

The regular expression language is older than Unicode

Many newer engines support Unicode

Minor extensions to the language are required for full Unicode support
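As one illustration of the difference, and only as it applies to Python's `re` module (other engines behave differently), `\w` covers accented letters by default for text strings, while the `re.ASCII` flag restricts it to the ASCII range.

```python
import re

word = "caf\u00e9"   # "café", written with an escape so the sample is unambiguous

# Default for str patterns in Python 3: \w is Unicode-aware.
unicode_match = re.fullmatch(r"\w+", word) is not None

# With re.ASCII, \w means only [a-zA-Z0-9_], so the accented letter fails.
ascii_match = re.fullmatch(r"\w+", word, flags=re.ASCII) is not None

print(unicode_match, ascii_match)
```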

Options

RegExp Engines typically have options

ignoreCase

saves you from doing [Aa] for each

global

repeats the search after each match until the end of the input; by default, it stops at the 1st match (useful for replace)
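A rough Python equivalent of these two options, with a made-up sample sentence. Python's `re` has an `IGNORECASE` flag but no "global" flag, so the sketch assumes the nearest analogues: `search` stops at the first match, `findall` keeps going, and `sub` takes a `count` argument.

```python
import re

text = "John met john and JOHN."

# ignoreCase: one pattern covers every capitalisation.
case_sensitive = re.findall(r"john", text)
ignore_case = re.findall(r"john", text, flags=re.IGNORECASE)

# First match only versus every match.
first_only = re.search(r"john", text, flags=re.IGNORECASE).group()

# Replace: count=1 stops after the first match; the default replaces all.
replace_first = re.sub(r"john", "X", text, count=1, flags=re.IGNORECASE)
replace_all = re.sub(r"john", "X", text, flags=re.IGNORECASE)

print(case_sensitive, ignore_case, first_only)
print(replace_first)
print(replace_all)
```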

Options

multiline

Most engines break up the input into lines:

At the end of a line, the engine resets for the next line

This option makes it ignore line endings (unless you use ^ or $, which refer to the beginning and end of lines)
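Engines differ in what exactly "multiline" changes. As one data point, and assuming Python's `re` rather than the engine these slides have in mind, the `MULTILINE` flag controls whether ^ and $ apply to every line or only to the input as a whole.

```python
import re

text = "one two\nthree four"

# Without the flag, ^ matches only at the very start of the input.
default_starts = re.findall(r"^\w+", text)

# With the flag, ^ also matches right after each line ending.
multiline_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)

print(default_starts, multiline_starts)
```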

/Common Use/

/string/ delimiters work like “quotes” on strings

if you use a “string” instead, you must escape each backslash:

/\d\d/ (2-digit match, written as a pattern)

“\\d\\d” (the same 2-digit match, written as a string)
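Python has no /pattern/ literal, so this sketch uses a raw string as the closest analogue (an assumption about how the idea carries over). It shows that the doubled-backslash string and the raw string describe the same pattern.

```python
import re

escaped = "\\d\\d"   # ordinary string: each backslash is doubled
raw = r"\d\d"        # raw string: backslashes are left alone

same = escaped == raw

# Either spelling finds the two digits in a made-up sample.
found = re.search(escaped, "room 42").group()

print(same, found)
```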
