7. Regular Expressions

A regular expression defines a set of one or more strings of characters. Many Unix utilities, including grep, sed, and awk, use regular expressions to search for and replace strings. A simple string of characters is a regular expression that defines one string of characters: itself. A more complex regular expression uses letters, numbers, and special characters to define many different strings of characters. A regular expression is said to match any string it defines.

7.1. Simple Strings

Regular Expression    Matches     Examples
------------------    --------    ---------------------
ring                  ring        ring, spring, ringing
or not                or not      or not, poor nothing
Thursday              Thursday    Thursday, Thursday’s
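One quick way to check a row of the table is to feed candidate lines to grep and see which ones come back. The sketch below assumes a POSIX shell and any standard grep; the word rang is an extra sample added here as a non-match and does not come from the table.

```shell
# Print only the input lines that contain the simple string "ring".
printf '%s\n' ring spring ringing rang | grep 'ring'
# rang is not printed, because it does not contain the string "ring".
```

The same pattern works for the other rows: replace the sample lines and the quoted expression.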

7.1.1. Exercise

Describe what the following command does.

cat | grep -o "or not"

Try it on some example input.

Try using it to test the other regular expressions in the preceding table.

7.2. Special Characters

Note: in what follows, a regular expression always matches the longest possible string, starting as close to the beginning of the line as possible.

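7.2.1. Period

A period matches any single character. The slashes in this table delimit the expression and are not part of it; omit them when passing the expression to grep.

Regular Expression    Matches                             Examples
------------------    --------------------------------    -------------------------------
/ .alk/               a space, any character, then alk    will talk, may balk
/.ing/                any character followed by ing       singing, ping, before inglenook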
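To see exactly which part of each line the period expression picks up, the matched text can be printed on its own. This is a minimal sketch, assuming a grep that supports the -o option (GNU and BSD grep do); the bare line ing is an extra sample added here to show a non-match.

```shell
# .ing needs one character of any kind directly before "ing".
printf '%s\n' 'singing' 'ping' 'before inglenook' 'ing' | grep -o '.ing'
# Prints sing, ping, and " ing" (with its leading space).
# The bare line ing prints nothing: no character precedes "ing".
```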

7.2.2. Square Brackets

Regular Expression    Matches                                                             Examples
------------------    ----------------------------------------------------------------    ---------------------
[bB]ill               b or B followed by ill                                              bill, Bill, billed
t[aeiou].k            t followed by a lowercase vowel, any character, and k               talk, talkative
number [6-9]          number, a space, and a single digit in the range 6-9 (inclusive)    number 7, knumber 601
[^a-zA-Z]             any character that is not a letter                                  1, pdf2tif

Notice the negating action of ^ when it is the first character inside [].
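The bracket expressions can be tried the same way. A small sketch, again assuming grep -o is available; the word will is an extra sample added here as a non-match.

```shell
# [bB]ill accepts either case of the first letter; "will" is not selected.
printf '%s\n' bill Bill billed will | grep -o '[bB]ill'
# [^a-zA-Z] picks out the one character in pdf2tif that is not a letter.
echo 'pdf2tif' | grep -o '[^a-zA-Z]'
```

The first command prints bill, Bill, and bill (from billed); the second prints 2.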

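7.2.3. Asterisk

Regular Expression    Matches                                                        Examples
------------------    -----------------------------------------------------------    -------------------------------
ab*c                  a followed by zero or more bs followed by c                    ac, abc, aaabc
ab.*c                 ab followed by zero or more other characters followed by c     abc, abxc, xab 756.345 x cat
t.*ing                t followed by zero or more other characters followed by ing    thing, ting, I thought of going
[a-zA-Z]*             a string composed only of letters (upper or lower case)        1!!, !any text string
(.*)                  as long a string as possible between ( and )                   Get (this (and) that) done
([^)]*)               the shortest string possible between ( and )                   (Get (this or)

Because * also allows zero occurrences, [a-zA-Z]* matches the empty string as well, which is why a line with no letters, such as 1!!, still matches.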
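The difference between the last two expressions is easiest to see on a single line of input. The following sketch assumes grep -o is available and uses plain grep, where ( and ) are ordinary characters; the variable name line is just a convenience.

```shell
line='Get (this (and) that) done'
# (.*) runs from the first ( to the last ) on the line.
echo "$line" | grep -o '(.*)'
# ([^)]*) stops at the first ) that follows the opening (.
echo "$line" | grep -o '([^)]*)'
# ab*c allows any number of bs, including none.
printf '%s\n' ac abc aaabc | grep -o 'ab*c'
```

The first command prints (this (and) that), the second prints (this (and), and the third prints ac, abc, and abc.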

7.2.4. Caret (^) and Dollar ($)

Regular Expression    Matches                                               Examples
------------------    --------------------------------------------------    ----------------------
^T                    a T at the beginning of a line                        Time (but not In Time)
^+[0-9]               a + followed by a digit at the beginning of a line    +514, +3.14
:$                    a : that ends a line                                  … following :
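A short check of the two anchors, assuming any standard grep. The lines as follows: and ratio 3:1 are made-up samples, chosen so that one ends in a colon and the other only contains one.

```shell
# ^T matches only when T is the first character of the line.
printf '%s\n' 'Time' 'In Time' | grep '^T'
# :$ matches only when the colon is the last character of the line.
printf '%s\n' 'as follows:' 'ratio 3:1' | grep ':$'
```

Each command prints a single line: Time for the first, as follows: for the second.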

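7.2.5. Quoting Special Characters

A backslash in front of a special character quotes it, so that the character stands for itself.

Regular Expression    Matches                                        Examples
------------------    -------------------------------------------    ------------------
end\.                 strings containing end followed by a period    end. (but not end)
\$                    a dollar sign                                  lot $ of money
\[[0-9]\]             a single digit in []                           [9] (but not [90])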
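The effect of quoting shows up when the same input is run through a quoted and an unquoted expression. A minimal sketch, assuming a grep with the -o option; the word ends is an extra sample added here, and the expressions are in single quotes so the shell passes the backslash through to grep unchanged.

```shell
# Unquoted, the period matches any character, so "ends" is selected too.
printf '%s\n' 'end.' 'ends' | grep 'end.'
# Quoted with a backslash, only a literal period matches.
printf '%s\n' 'end.' 'ends' | grep 'end\.'
# \$ matches a literal dollar sign instead of anchoring to the end of the line.
echo 'lot $ of money' | grep -o '\$'
```

The first command prints both lines, the second prints only end., and the third prints $.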

7.2.6. Full Regular Expressions

To test these examples you will need to use egrep (equivalently, grep -E), replacing or not with the expression under test:

cat | egrep -o "or not"

Regular Expression    Matches                                       Examples
------------------    ------------------------------------------    ------------------
ab+c                  a followed by one or more bs followed by c    yabcw, abbc57
ab?c                  a followed by zero or one b followed by c     back, abcdef
(ab)+c                one or more ab followed by c                  zabcd, abababc
(ab)?c                zero or one ab followed by c                  xc, abcc
ab|ac                 either ab or ac                               ab, ac, abac
(D|N)\.Jones          D.Jones or N.Jones                            N.Jones, P.D.Jones
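The full expressions can be exercised without typing input by hand. This sketch uses grep -E, the standard equivalent of egrep, and assumes the -o option is available; the inputs are taken from the table or are short variations on them.

```shell
# + requires at least one b, so "ac" is not selected.
printf '%s\n' ac abc abbc | grep -E -o 'ab+c'
# ? allows at most one b, so "abbc" is not selected.
printf '%s\n' ac abc abbc | grep -E -o 'ab?c'
# Parentheses group, so + repeats the whole string ab.
echo 'abababc' | grep -E -o '(ab)+c'
# | separates alternatives; abac contains one match for each.
echo 'abac' | grep -E -o 'ab|ac'
```

In order, the commands print abc and abbc; ac and abc; abababc; ab and ac.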

7.3. Further Special Characters

7.3.1. Standard Special Characters

These work with grep.

Special Character    Function
-----------------    --------
.                    Matches any single character
[xyz]                Defines a character class that matches x, y, or z
[^xyz]               Defines a character class that matches any character except x, y, or z
[x-z]                Defines a character class that matches any character x through z inclusive
*                    Matches zero or more occurrences of the preceding character
^                    Forces a match to the beginning of the line
$                    Forces a match to the end of the line
\                    Used to quote special characters
\(xyz\)              Matches what xyz matches (a bracketed regular expression)
\<                   Forces a match to the beginning of a word
\>                   Forces a match to the end of a word
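The word anchors are written with a leading backslash, \< and \>. They are extensions rather than part of the POSIX standard, so the sketch below assumes GNU grep (or another grep that supports them); the sample words are reused from the simple-strings table.

```shell
# \<ring matches ring only at the start of a word, so "spring" is skipped.
printf '%s\n' ring spring ringing | grep '\<ring'
# ring\> matches ring only at the end of a word, so "ringing" is skipped.
printf '%s\n' ring spring ringing | grep 'ring\>'
```

The first command prints ring and ringing; the second prints ring and spring.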

7.3.2. Full Special Characters

These work with egrep.

Special Character    Function
-----------------    --------
+                    Matches one or more occurrences of the preceding character
?                    Matches zero or one occurrence of the preceding character
(xyz)+               One or more occurrences of xyz
(xyz)?               Zero or one occurrence of xyz
(xyz)*               Zero or more occurrences of xyz
xyz|abc              Either xyz or abc
(xy|ab)c             Either xyc or abc
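Grouping decides how much of the expression each side of | covers. As a rough illustration, assuming grep -E, the expression (xy|ab)c is one way to write "either xyc or abc", and it selects different lines from xyz|abc on the same input.

```shell
# Without parentheses, | separates the whole strings xyz and abc.
printf '%s\n' xyz abc xyc | grep -E 'xyz|abc'
# With parentheses, | chooses between xy and ab, and c must follow either one.
printf '%s\n' xyz abc xyc | grep -E '(xy|ab)c'
```

The first command prints xyz and abc; the second prints abc and xyc.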