# Regular Expressions

A regular expression defines a set of one or more strings of characters. Many Unix utilities, including
grep, sed, and awk, use regular expressions to search for and replace strings.
A simple string of characters is a regular expression that defines one string of characters: itself. A more complex regular expression uses letters, numbers, and special characters to define many different strings of characters. A regular expression is said to match any string it defines.



## Simple Strings

|Regular <br> Expression|Matches|Examples|
|-|-|-|
| ring | **ring** | **ring**,sp**ring**, <br> **ring**ing |
| or not| **or not** |**or not**, <br> po**or not**hing  |
|Thursday | **Thursday**| **Thursday**,**Thursday**'s  |

### Exercise

Try describing what the following command does.

```bash
cat | grep -o "or not"
```

Try using it to test some example input.

Try using it to test the other regular expressions in the preceding table.
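
For a repeatable test, the input can be piped in with `printf` instead of being typed at the terminal. The following is a minimal sketch, assuming a grep that supports the `-o` option (GNU and BSD grep do); the variable name `matches` and the third sample line are only illustrative.

```bash
# Feed three sample lines to grep; -o prints only the matched text, one match per line.
matches=$(printf '%s\n' 'or not' 'poor nothing' 'no match here' | grep -o 'or not')
echo "$matches"
```

The first two lines each produce one match; the third line produces none.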

## Special Characters

**Note** - in what follows, a regular expression always matches the longest possible string, starting as close to the beginning of the line as possible.
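
The longest-match rule can be seen with a short test. This sketch assumes a grep with the `-o` option; the sample text and the variable name `longest` are made up for illustration.

```bash
# 't.*g' could stop at the first g ("ting"), but the match runs on to the last g in the line.
longest=$(printf 'a ting-a-ling\n' | grep -o 't.*g')
echo "$longest"   # prints: ting-a-ling
```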

### Period

|Regular <br> Expression|Matches|Examples|
|-|-|-|
| / .alk/ | strings containing a space, then any <br> character, followed by **alk** | will **talk** <br> may **balk** |
| /.ing/ | all strings with any character preceding <br> **ing** | **sing**ing, **ping**, <br> before **ing**lenook|
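
The first row of the table can be checked as follows. This is a sketch that assumes a grep with the `-o` option; the surrounding slashes in the table only delimit the expression and are not passed to grep. The line `chalk` is an extra sample added to show a non-match.

```bash
# ' .alk' is a space, then any one character, then "alk".
period=$(printf '%s\n' 'will talk' 'may balk' 'chalk' | grep -o ' .alk')
echo "$period"
```

Each match starts with the space, and `chalk` is skipped because no space precedes it.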



### Square Brackets

|Regular Expression|Matches|Examples|
|-|-|-|
| [bB]ill | **b** or **B** followed by **ill** | **bill**,**Bill**,**bill**ed|
| t[aeiou].k | **t** followed by a lowercase <br> vowel, any character, and **k** | **talk**, **talk**ative |
| number [6-9] |**number** followed by a space <br> and a single digit in the range <br> 6-9 (inclusive)| **number 7**, k**number 6**01|
| [^a-zA-Z] | any character that is not a letter | **1**, pdf**2**tif |

Notice the negating action of ^ when it is inside [].
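
A quick way to see both forms of character class, sketched under the assumption that grep supports `-o`; the variable names and the extra sample line `will` are illustrative.

```bash
# [bB]ill accepts either case for the first letter.
bill=$(printf '%s\n' 'bill' 'Bill' 'billed' 'will' | grep -o '[bB]ill')
# [^a-zA-Z] matches one character that is not a letter.
nonletter=$(printf 'pdf2tif\n' | grep -o '[^a-zA-Z]')
echo "$bill"
echo "$nonletter"   # prints: 2
```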


### Asterisk

|Regular Expression|Matches|Examples|
|-|-|-|
| ab*c| **a** followed by zero or more <br> **b**s followed by **c** | **ac**, **abc**, **abbc**, <br> aa**abc**|
| ab.*c| **ab** followed by zero or more <br> other characters followed by **c**| **abc**, **abxc**, **ab45c**, <br> x**ab 756.345 x c**at|
| t.*ing | **t** followed by zero or more <br> other characters <br> followed by **ing**| **thing**, **ting**, <br> I **thought of going**|
| [a-zA-Z]*| a string composed of only letters <br> (upper or lower case)| **any**, **text**99, <br> **string**! |
|(.*)  | as long a string as possible <br> between ()| Get **(this (and) that)** done| 
| ([^)]*)| the shortest string possible <br> between ()| **(Get (this or)**|
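
The difference between the last two rows is easiest to see side by side. This sketch assumes a grep with the `-o` option and basic regular expression syntax, where `(` and `)` are ordinary characters; the variable names and the sample lines `aaabbbc` and `adc` are illustrative.

```bash
# ab*c: zero or more b's between a and c.
star=$(printf '%s\n' 'ac' 'abc' 'aaabbbc' 'adc' | grep -o 'ab*c')
# (.*) runs to the last ) on the line; ([^)]*) stops at the first ).
greedy=$(printf 'Get (this (and) that) done\n' | grep -o '(.*)')
short=$(printf 'Get (this (and) that) done\n' | grep -o '([^)]*)')
echo "$star"
echo "$greedy"   # prints: (this (and) that)
echo "$short"    # prints: (this (and)
```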


### Caret (^) and Dollar ($)

|Regular Expression|Matches|Examples|
|-|-|-|
| ^T | a **T** at the beginning <br> of a line| **T**ime, In Time|
| ^+[0-9]| a **+** followed by a digit <br> at the beginning of a line | **+5**14, **+3**.14 |
|:$ | a **:** that ends a line| ... following **:**|
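
The anchors can be tested by counting matching lines with `grep -c`. A minimal sketch; the variable names and the sample line `a:b` are illustrative.

```bash
# ^T matches T only at the start of a line; :$ matches a colon only at the end.
starts=$(printf '%s\n' 'Time' 'In Time' | grep -c '^T')
ends=$(printf '%s\n' 'the following:' 'a:b' | grep -c ':$')
echo "$starts $ends"   # prints: 1 1
```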


### Quoting Special Characters

|Regular Expression|Matches|Examples|
|-|-|-|
|end\\. | strings containing **end.**| **end.**,end|
| \\$| a dollar sign | lot **$** of money|
| \\[[0-9]\\] | a single digit in [] | **[9]**,[90]|
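
The effect of the backslash shows up when the quoted and unquoted forms are compared. This is a sketch with illustrative variable names and an extra sample line `ends`; the patterns are single-quoted so the shell passes the backslash through to grep unchanged.

```bash
# Unquoted, . matches any character; \. matches only a literal period.
any=$(printf '%s\n' 'end.' 'ends' | grep -c 'end.')
literal=$(printf '%s\n' 'end.' 'ends' | grep -c 'end\.')
# \$ is a literal dollar sign rather than the end-of-line anchor.
dollar=$(printf 'lot $ of money\n' | grep -o '\$')
echo "$any $literal $dollar"   # prints: 2 1 $
```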


### Full Regular Expressions

To test these examples you will need to use **egrep** (equivalent to `grep -E`):

```bash
cat | egrep -o "or not"
```


|Regular Expression|Matches|Examples|
|-|-|-|
|ab+c| **a** followed by one or <br> more **b**s followed by <br> a **c**| y**abc**w, **abbc**57 |
| ab?c|**a** followed by zero or one **b** <br> followed by a **c** | b**ac**k,**abc**def|
| (ab)+c| one or more **ab** followed <br> by a **c**  | z**abc**d, **abababc** |
| (ab)?c| zero or one **ab** followed <br> by **c**| x**c**, **abc**c |
| ab\|ac | either **ab** or **ac** | **ab**,**ac**,**ab**ac|
| (D\|N)\\.Jones| **D.Jones** or **N.Jones**| **N.Jones**, P.**D.Jones**|
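
Some rows of this table can be checked non-interactively. The sketch below uses `grep -E`, which accepts the same full regular expressions as egrep, and assumes the `-o` option is available; the variable names and the sample lines `ac` and `DxJones` are illustrative.

```bash
# + means one or more of the preceding item; ? means zero or one.
plus=$(printf '%s\n' 'yabcw' 'abbc57' 'ac' | grep -E -o 'ab+c')
opt=$(printf '%s\n' 'back' 'abcdef' | grep -E -o 'ab?c')
# (D|N) chooses between two letters; \. is a literal period.
alt=$(printf '%s\n' 'N.Jones' 'P.D.Jones' 'DxJones' | grep -E -o '(D|N)\.Jones')
echo "$plus"
echo "$opt"
echo "$alt"
```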


## Further Special Characters

### Standard Special Characters

These work with **grep**

|Special Character|Function|
|-|-|
| . | Matches any single character|
| [xyz] | Defines a character class that matches **x**,**y**, or **z**|
| [^xyz] | Defines a character class that matches any character except **x**, **y**, or **z**|
| [x-z]| Defines a character class that matches any character **x** through **z** inclusive|
| * |Matches zero or more occurrences of the preceding character |
| ^ | Forces a match to the beginning of the line|
| $ | Forces a match to the end of the line|
| \\ | Used to quote special characters|
| \\(xyz\\) | Groups **xyz** and matches what **xyz** matches |
| \\< |Forces match at beginning of word |
| \\> | Forces match at end of word|
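
The word anchors and the grouping operator are the rows not covered by the earlier tables. The sketch below assumes GNU grep, where the word anchors are available in basic regular expressions; other grep implementations may differ. The sample words and variable names are illustrative.

```bash
# \<ring matches "ring" only at the start of a word; ring\> only at the end of one.
start=$(printf '%s\n' 'ringing' 'spring' | grep -c '\<ring')
finish=$(printf '%s\n' 'ringing' 'spring' | grep -c 'ring\>')
# \(ab\)* repeats the whole group "ab", not just the b.
group=$(printf 'zabababcd\n' | grep -o '\(ab\)*c')
echo "$start $finish $group"   # prints: 1 1 abababc
```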













### Full Special Characters 

These work with **egrep**

|Special Character|Function|
|-|-|
| + | Matches one or more occurrences of the preceding character|
| ? | Matches zero or one occurrence of the preceding character|
| (xyz)+ | One or more occurrences of **xyz**|
| (xyz)? |Zero or one occurrence of **xyz** |
| (xyz)* |Zero or more occurrences of **xyz**|
| xyz\|abc |Either **xyz** or **abc** |
| (xy\|ab)c |Either **xyc** or **abc** |
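
The split between the two tables matters in practice: a character such as `+` is ordinary in a basic regular expression and special in a full one. A minimal sketch of the difference, with illustrative sample lines and variable names, using `grep -E` in place of egrep:

```bash
# The same pattern read as a basic and as a full (extended) regular expression.
basic=$(printf '%s\n' 'a+b' 'aab' | grep 'a+b')        # + is an ordinary character
extended=$(printf '%s\n' 'a+b' 'aab' | grep -E 'a+b')  # + means one or more a
echo "basic: $basic"         # prints: basic: a+b
echo "extended: $extended"   # prints: extended: aab
```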







