Topics:
Metacharacters that are Operators
Metacharacters used for literal meaning
Escaping metacharacters
Useful Tips for Regular Expressions
Metacharacters that are operators
| means or (alternation)
The regex ‘abc|ABC’ matches either abc or ABC
( ) means grouping
Useful with the repetition metacharacters
A repetition metacharacter applies only to the single character immediately before it, so to repeat a sequence of characters you need to enclose the sequence in ( )
Examples
egrep 'linux|LINUX' inputFile
Any line containing linux or LINUX will match
egrep 'abc{3}' inputFile
Any line containing abccc (a, followed by b, followed by 3 c’s) will match
egrep '(abc){3}' inputFile
Any line containing abcabcabc (3 abc’s in a row) will match
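The three examples above can be tried without creating a file. The following is a minimal sketch, assuming a small set of made-up sample lines in place of inputFile, and using grep -E, which is the equivalent of egrep. The variable names (sample, alt, rep_char, rep_group) are illustrative only.

```shell
# Hypothetical sample lines standing in for inputFile.
sample='I use linux
I use LINUX
I use Linux
abccc
abcabcabc
abc'

# Alternation: selects lines containing linux or LINUX (but not Linux).
alt=$(printf '%s\n' "$sample" | grep -E 'linux|LINUX')

# {3} applies only to the c, so this needs ab followed by ccc.
rep_char=$(printf '%s\n' "$sample" | grep -E 'abc{3}')

# {3} applies to the whole group, so this needs abc three times in a row.
rep_group=$(printf '%s\n' "$sample" | grep -E '(abc){3}')

printf 'linux|LINUX selects:\n%s\n\n' "$alt"
printf 'abc{3} selects:\n%s\n\n' "$rep_char"
printf '(abc){3} selects:\n%s\n' "$rep_group"
```

With this sample, the line abc by itself is selected by neither repetition pattern, which shows that both patterns require the repetition to be present.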
Metacharacters used for literal meaning
When the regex engine sees a metacharacter, it uses the special meaning of the character
If you want the metacharacter to have its literal meaning, you need to escape it
2 ways to give a metacharacter its literal meaning:
\ gives the next character its literal meaning
[characters] most metacharacters inside [ ] have their literal meaning (exceptions: ^ as the first character, - between two characters, and ])
Examples:
egrep '2.5' inputFile
Matches any line with 2, followed by any single character, followed by 5
Matching lines can contain: 2a5 or 2 5 or 2.5 or 215
egrep '2\.5' inputFile or egrep '2[.]5' inputFile
Matches any line containing 2.5
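The difference between the unescaped and escaped dot can be checked with a short experiment. This is a sketch under the assumption of a few made-up sample lines; grep -E is used as the equivalent of egrep, and the variable names are illustrative only.

```shell
# Hypothetical sample lines standing in for inputFile.
sample='2.5
2a5
2 5
215
25'

# Unescaped dot: any single character may sit between 2 and 5.
any_char=$(printf '%s\n' "$sample" | grep -E '2.5')

# Escaped dot: only a literal . may sit between 2 and 5.
escaped=$(printf '%s\n' "$sample" | grep -E '2\.5')

# Dot inside [ ]: also a literal .
bracket=$(printf '%s\n' "$sample" | grep -E '2[.]5')

printf '2.5 selects:\n%s\n\n' "$any_char"
printf '2\\.5 selects:\n%s\n\n' "$escaped"
printf '2[.]5 selects:\n%s\n' "$bracket"
```

Note that the line 25 is not selected by any of the three patterns, because each of them requires exactly one character between the 2 and the 5.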
Useful Tips for Regular Expressions
(1) For a regular expression to be flexible (and therefore more useful), it will most likely include both literal characters and metacharacters
(2) Make your regular expression as simple (as few characters) as you can
Examples of simplifying:
‘a+’ and ‘a’ both match any line with at least 1 a. Use ‘a’
‘a{1}’ and ‘a’ both describe 1 a. Use ‘a’
‘aaaaaaaaaa’ and ‘a{10}’ both describe 10 a’s. Use ‘a{10}’
‘^.*$’ and ‘.*’ both match everything in the line. Use ‘.*’
‘linux|Linux’ and ‘[lL]inux’ both match linux or Linux. Use ‘[lL]inux’
‘^A’ and ‘^A.*$’ both describe a line that starts with A. Use ‘^A’
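One way to convince yourself that each pair above is interchangeable is to compare which lines the two patterns select. The sketch below assumes made-up sample lines and two hypothetical helper functions, same_lines and check; grep -E is used as the equivalent of egrep.

```shell
# Hypothetical sample lines standing in for inputFile.
sample='aaaaaaaaaa
a
I use linux
I use Linux
Apple pie
banana
xyz'

# same_lines PATTERN1 PATTERN2 (hypothetical helper):
# succeeds when both patterns select exactly the same lines of $sample.
same_lines() {
  first=$(printf '%s\n' "$sample" | grep -E -- "$1")
  second=$(printf '%s\n' "$sample" | grep -E -- "$2")
  [ "$first" = "$second" ]
}

# check PATTERN1 PATTERN2 (hypothetical helper): reports the comparison.
check() {
  if same_lines "$1" "$2"; then
    echo "same lines:      $1   vs   $2"
  else
    echo "different lines: $1   vs   $2"
  fi
}

check 'a+' 'a'
check 'a{1}' 'a'
check 'aaaaaaaaaa' 'a{10}'
check '^.*$' '.*'
check 'linux|Linux' '[lL]inux'
check '^A' '^A.*$'
```

The comparison is about which lines are selected. The text matched within a line can still differ between the two patterns of a pair (for example, ‘a+’ matches a whole run of a’s while ‘a’ matches one a at a time), which matters only when the matched text itself is used, such as with grep -o.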
(3) Pay attention to what the repetition metacharacters will match
Examples of non-intuitive repetition matches:
‘a*’ will match aaaaaaaa (the obvious case), but it will also match bcd (the not so obvious case), because * allows 0 a’s
‘^a+$’ means that the line has to have at least 1 a, but
‘^a*$’ means the line can be empty (no character)
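The behavior of * and + described above can be observed by counting matching lines. This is a minimal sketch, assuming four made-up sample lines (one of them empty); grep -E -c prints the number of lines that match, and the variable names are illustrative only.

```shell
# Hypothetical sample: a run of a's, a line with no a, an empty line, a single a.
sample='aaaaaaaa
bcd

a'

# 'a*' accepts zero a's, so every line matches, including bcd and the empty line.
star=$(printf '%s\n' "$sample" | grep -E -c 'a*')

# '^a+$' needs at least 1 a and nothing else: aaaaaaaa and a.
plus_anchored=$(printf '%s\n' "$sample" | grep -E -c '^a+$')

# '^a*$' also accepts the empty line: aaaaaaaa, the empty line, and a.
star_anchored=$(printf '%s\n' "$sample" | grep -E -c '^a*$')

echo "a*    matches $star lines"
echo "^a+\$  matches $plus_anchored lines"
echo "^a*\$  matches $star_anchored lines"
```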
(4) Don’t forget the anchors ^ and $ when you need to describe the entire line. This typically happens when you’re looking for:
exactly n a’s and nothing else: ‘^a{n}$’
only a’s and nothing else: ‘^a+$’
no a’s (and at least 1 other character): ‘^[^a]+$’
If the 3 regexes above don’t have both anchors, then the text string aaaabc will match all 3 of them (for any n up to 4)
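The effect of dropping the anchors can be checked against the string aaaabc directly. The sketch below assumes n = 3 purely for illustration and uses a hypothetical helper function, matches; grep -E -q reports through its exit status whether the line matches.

```shell
# matches LINE PATTERN (hypothetical helper):
# succeeds when PATTERN matches somewhere in LINE.
matches() {
  printf '%s\n' "$1" | grep -E -q -- "$2"
}

line='aaaabc'

# The first three patterns lack anchors; the last three describe the entire line.
# n = 3 is an arbitrary choice for this illustration.
for re in 'a{3}' 'a+' '[^a]+' '^a{3}$' '^a+$' '^[^a]+$'; do
  if matches "$line" "$re"; then
    echo "$line matches:        $re"
  else
    echo "$line does not match: $re"
  fi
done
```

Without anchors, each pattern only needs to find its match somewhere inside the line: aaa for ‘a{3}’, aaaa for ‘a+’, and bc for ‘[^a]+’. With both anchors, none of the three accepts aaaabc.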