7. Regular Expressions
A regular expression defines a set of one or more strings of characters. Many Unix utilities, including grep, sed, and find, use regular expressions to search for strings, and some, such as sed, also use them to replace strings. A simple string of characters is a regular expression that defines one string of characters: itself. A more complex regular expression uses letters, numbers, and special characters to define many different strings of characters. A regular expression is said to match any string it defines.
7.1. Simple Strings
| Regular Expression | Matches | Examples |
|---|---|---|
| `ring` | ring | ring, spring |
| `or not` | or not | or not |
| `Thursday` | Thursday | Thursday, Thursday’s |
7.1.1. Exercise
Describe what the following command does:
cat | grep -o "or not"
Try it on some example input.
Then use it to test the other regular expressions in the preceding table.
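For repeated testing it can be convenient to supply the input from `printf` instead of typing it interactively. The following is a minimal sketch; the sample lines and the variable name `matches` are made up for illustration.

```shell
# Feed three sample lines to grep instead of typing them at the terminal.
# -o prints only the matched text, one match per output line.
matches=$(printf '%s\n' 'to be or not to be' 'nothing here' 'or not, or not' \
  | grep -o "or not")
printf '%s\n' "$matches"
# Prints "or not" three times: once for the first line, twice for the third.
```

The second line produces no output because it does not contain the string "or not".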
7.2. Special Characters
Note: in what follows, a regular expression always matches the longest possible string, starting as close to the beginning of the line as possible.
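The longest-match rule can be observed directly with `grep -o`. This is a small sketch with an invented input word; the variable name `longest` is illustrative only.

```shell
# 's.*ing' could be satisfied by "sing" alone, but grep extends the match
# as far as it can, so the whole word is reported.
longest=$(echo 'singing' | grep -o 's.*ing')
echo "$longest"    # singing
```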
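7.2.1. Period
| Regular Expression | Matches | Examples |
|---|---|---|
| `/ .alk/` | strings containing a space, then any character, then alk | will talk |
| `/.ing/` | all strings with any character preceding ing | singing, ping |

The slashes delimit the regular expression so that the leading space is visible; they are not part of the expression itself.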
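A short sketch of the period in action, using `grep -o` on a few sample lines. The expressions are given to grep without the surrounding slashes, and the variable names are hypothetical.

```shell
# A period stands for exactly one character, whatever it is.
period_matches=$(printf '%s\n' 'will talk' 'singing' 'ping' | grep -o '.ing')
printf '%s\n' "$period_matches"
# sing
# ping

# The leading space is part of the expression, so it is part of the match.
space_match=$(echo 'will talk' | grep -o ' .alk')
printf '[%s]\n' "$space_match"    # [ talk]
```

In "singing" only the first four characters are reported: the trailing "ing" has no character left in front of it once "sing" has been consumed.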
7.2.2. Square Brackets
| Regular Expression | Matches | Examples |
|---|---|---|
| `[bB]ill` | b or B followed by ill | bill, Bill, billed |
| `t[aeiou].k` | t followed by a lowercase vowel, any character, and k | talk, talkative |
| `number [6-9]` | number followed by a space and a digit from 6 to 9 | number 7, knumber 601 |
| `[^a-zA-Z]` | any character that is not a letter | 1, pdf2tif |

Notice the negating action of ^ when it is the first character inside [].
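The character classes from the table can be tried as follows. This is a sketch only; the input "pill" is an added non-matching example and the variable names are invented.

```shell
# [bB] accepts either case of the first letter; "pill" does not match.
bill_matches=$(printf '%s\n' 'bill' 'Bill' 'billed' 'pill' | grep -o '[bB]ill')
printf '%s\n' "$bill_matches"
# bill
# Bill
# bill

# A range inside brackets: only the digits 6 through 9 are accepted.
number_match=$(echo 'knumber 601' | grep -o 'number [6-9]')
echo "$number_match"    # number 6

# A leading ^ inside brackets negates the class.
non_letter=$(echo 'pdf2tif' | grep -o '[^a-zA-Z]')
echo "$non_letter"      # 2
```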
7.2.3. Asterisk
| Regular Expression | Matches | Examples |
|---|---|---|
| `ab*c` | a followed by zero or more b's followed by c | ac, abc |
| `ab.*c` | ab followed by zero or more characters followed by c | abc, abxc (not cat) |
| `t.*ing` | t followed by zero or more characters followed by ing | thing, ting |
| `[a-zA-Z]*` | a string composed of only letters (zero or more, so the empty string also matches) | 1!!, !any text string |
| `(.*)` | as long a string as possible between ( and ) | Get (this (and) that) done |
| `([^)]*)` | the shortest string possible that starts with ( and ends with ) | (Get (this or) |
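The difference between the last two rows is easiest to see by running both expressions on the same line. A minimal sketch, with hypothetical variable names; note that with plain grep the parentheses are ordinary characters.

```shell
# Zero or more b's between a and c; "adc" does not match.
star=$(printf '%s\n' 'ac' 'abc' 'abbbc' 'adc' | grep -o 'ab*c')
printf '%s\n' "$star"
# ac
# abc
# abbbc

line='Get (this (and) that) done'

# .* is as long as possible, so the match runs to the last ).
greedy=$(echo "$line" | grep -o '(.*)')
echo "$greedy"     # (this (and) that)

# [^)]* cannot cross a ), so the match stops at the first one.
bounded=$(echo "$line" | grep -o '([^)]*)')
echo "$bounded"    # (this (and)
```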
7.2.4. Caret (^) and Dollar ($)
| Regular Expression | Matches | Examples |
|---|---|---|
| `^T` | a T at the beginning of a line | Time (not In Time) |
| `^+[0-9]` | a + followed by a digit at the beginning of a line | +514, +3.14 |
| `:$` | a : that ends a line | … following : |
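The anchors can be checked with a few sample lines. This sketch uses invented inputs and variable names; the expressions are single-quoted so that the shell does not try to expand `$`.

```shell
# ^ anchors the match to the start of the line.
starts=$(printf '%s\n' 'Time' 'In Time' | grep '^T')
echo "$starts"    # Time

# With plain grep, + is an ordinary character here.
plus=$(printf '%s\n' '+514' '+3.14' '5+5' | grep -o '^+[0-9]')
printf '%s\n' "$plus"
# +5
# +3

# $ anchors the match to the end of the line.
colon=$(printf '%s\n' 'the following:' 'a:b' | grep ':$')
echo "$colon"     # the following:
```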
7.2.5. Quoting Special Characters
| Regular Expression | Matches | Examples |
|---|---|---|
| `end\.` | strings containing end followed by a period | end. (not end) |
| `\$` | a dollar sign | lot $ of money |
| `\[[0-9]\]` | a single digit in [] | [9] (not [90]) |
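Preceding a special character with a backslash makes it match itself literally. The sketch below compares a period with and without the backslash; the sample lines and variable names are illustrative.

```shell
# Unquoted, the period matches any character, so both lines match.
unquoted=$(printf '%s\n' 'end.' 'ending' | grep -c 'end.')
echo "$unquoted"    # 2

# Quoted with a backslash, only a literal period matches.
quoted=$(printf '%s\n' 'end.' 'ending' | grep -c 'end\.')
echo "$quoted"      # 1

# A quoted dollar sign is a literal $, not the end-of-line anchor.
dollar=$(echo 'lot $ of money' | grep -o '\$')
echo "$dollar"      # $

# Quoted square brackets are literal; the inner pair is still a class.
bracket=$(echo 'see [9]' | grep -o '\[[0-9]\]')
echo "$bracket"     # [9]
```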
7.2.6. Full Regular Expressions
To test these examples you will need to use egrep (equivalent to grep -E):
cat | egrep -o "or not"
| Regular Expression | Matches | Examples |
|---|---|---|
| `ab+c` | a followed by one or more b's followed by c | yabcw, abbc57 |
| `ab?c` | a followed by zero or one b followed by c | back, abcdef |
| `(ab)+c` | one or more ab followed by c | zabcd, abababc |
| `(ab)?c` | zero or one ab followed by c | xc, abcc |
| `ab\|ac` | either ab or ac | ab, ac, abac |
| `(D\|N)\.Jones` | D.Jones or N.Jones | N.Jones, P.D.Jones |
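The full (extended) expressions can be tried non-interactively with `grep -E`, which accepts the same syntax as egrep. A minimal sketch using example strings from the table; the variable names are hypothetical.

```shell
# + requires at least one b.
plus=$(printf '%s\n' 'yabcw' 'abbc57' 'ac' | grep -E -o 'ab+c')
printf '%s\n' "$plus"
# abc
# abbc

# ? allows the b to be absent.
optional=$(echo 'back' | grep -E -o 'ab?c')
echo "$optional"    # ac

# Parentheses group, so + applies to the whole pair "ab".
group=$(echo 'abababc' | grep -E -o '(ab)+c')
echo "$group"       # abababc

# | separates alternatives; \. is a literal period.
jones=$(echo 'P.D.Jones' | grep -E -o '(D|N)\.Jones')
echo "$jones"       # D.Jones
```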
7.3. Further Special Characters
7.3.1. Standard Special Characters
These work with grep.
| Special Character | Function |
|---|---|
| `.` | Matches any single character |
| `[xyz]` | Defines a character class that matches x, y, or z |
| `[^xyz]` | Defines a character class that matches any character except x, y, or z |
| `[x-z]` | Defines a character class that matches any character x through z inclusive |
| `*` | Matches zero or more occurrences of the preceding character |
| `^` | Forces a match to the beginning of the line |
| `$` | Forces a match to the end of the line |
| `\` | Used to quote special characters |
| `\(xyz\)` | Matches what xyz matches (a grouped regular expression) |
| `\<` | Forces a match at the beginning of a word |
| `\>` | Forces a match at the end of a word |
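In plain grep, the word anchors and grouping parentheses are written with a leading backslash: `\<`, `\>`, `\(` and `\)`. The sketch below assumes GNU grep, where the word anchors are available as an extension; other implementations may differ. Variable names are illustrative.

```shell
# \< matches only where a word starts, so "spring" is rejected.
word_start=$(printf '%s\n' 'ring' 'spring' | grep '\<ring')
echo "$word_start"    # ring

# \> matches only where a word ends, so "ringing" is rejected.
word_end=$(printf '%s\n' 'ring' 'ringing' | grep 'ring\>')
echo "$word_end"      # ring

# \( \) group, so * applies to the whole pair "ab".
grouped=$(echo 'abab' | grep -o '\(ab\)*')
echo "$grouped"       # abab

# Without backslashes, parentheses are ordinary characters in plain grep.
literal=$(echo 'f(xyz)' | grep -o '(xyz)')
echo "$literal"       # (xyz)
```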
7.3.2. Full Special Characters
These work with egrep.
| Special Character | Function |
|---|---|
| `+` | Matches one or more occurrences of the preceding character |
| `?` | Matches zero or one occurrence of the preceding character |
| `(xyz)+` | One or more occurrences of xyz |
| `(xyz)?` | Zero or one occurrence of xyz |
| `(xyz)*` | Zero or more occurrences of xyz |
| `xyz\|abc` | Either xyz or abc |
| `(xy\|ab)c` | Either xyc or abc |
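The three repetition operators can be compared by applying each to the same grouped expression. This is a sketch with made-up input lines and variable names, using `grep -E` as the equivalent of egrep.

```shell
# The same three lines are used for each operator.
input=$(printf '%s\n' 'c' 'xyzc' 'xyzxyzc')

one_or_more=$(printf '%s\n' "$input" | grep -E -o '(xyz)+c')
printf '%s\n' "$one_or_more"
# xyzc
# xyzxyzc

zero_or_one=$(printf '%s\n' "$input" | grep -E -o '(xyz)?c')
printf '%s\n' "$zero_or_one"
# c
# xyzc
# xyzc

zero_or_more=$(printf '%s\n' "$input" | grep -E -o '(xyz)*c')
printf '%s\n' "$zero_or_more"
# c
# xyzc
# xyzxyzc

# Alternation inside a group, followed by a common suffix.
either=$(printf '%s\n' 'xyc' 'abc' 'xbc' | grep -E -o '(xy|ab)c')
printf '%s\n' "$either"
# xyc
# abc
```

With `?`, the third line yields only its final "xyzc", because at most one copy of xyz may precede the c.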