10. Regular Expression (Part 2)
Topics:
- Metacharacters that are Operators
- Metacharacters used for literal meaning
- Escaping metacharacters
- Useful Tips for Regular Expression
Metacharacters that are operators
- | means or
  - The regex 'abc|ABC' means either abc or ABC can be a match
- ( ) means grouping
  - Useful with the repetition metacharacters
  - A repetition metacharacter repeats only the single item just before it (normally one character). To repeat a sequence of several characters, enclose the sequence in ( )
Examples
- egrep 'linux|LINUX' inputFile
  - Any line containing linux or LINUX will match
- egrep 'abc{3}' inputFile
  - Any line containing a, followed by b, followed by 3 c’s (abccc) will match
- egrep '(abc){3}' inputFile
  - Any line containing abcabcabc (3 abc’s in a row) will match
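The three commands above can be tried on a small sample file. This is a minimal sketch: the file contents are made up for illustration, mktemp is assumed to be available, and grep -E is used because it runs the same extended-regex matcher as egrep.

```shell
# Build a throwaway sample file (hypothetical contents, for illustration only).
inputFile=$(mktemp)
printf '%s\n' 'I use linux daily' 'LINUX in caps' 'Linux capitalized' \
    'abccc' 'abcabcabc' 'abcabc' > "$inputFile"

# | is alternation: a line matches if it contains linux or LINUX.
alt=$(grep -E 'linux|LINUX' "$inputFile")

# Without ( ), {3} repeats only the c, so this looks for abccc.
rep_c=$(grep -E 'abc{3}' "$inputFile")

# With ( ), {3} repeats the whole group, so this looks for abcabcabc.
rep_group=$(grep -E '(abc){3}' "$inputFile")

printf 'linux|LINUX:\n%s\n\n' "$alt"
printf 'abc{3}:\n%s\n\n' "$rep_c"
printf '(abc){3}:\n%s\n' "$rep_group"

rm -f "$inputFile"
```

Note that the line "Linux capitalized" is not selected by the first command, because neither alternative starts with a capital L followed by lowercase letters.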
Metacharacters used for literal meaning
- When the regex engine sees a metacharacter, it applies the character’s special meaning
- To match a metacharacter as an ordinary (literal) character, you need to escape it from its meta meaning
- 2 ways for a metacharacter to take its literal meaning:
  - \ gives the next character its literal meaning
  - [characters] most metacharacters inside [ ] have their literal meaning (exceptions include ^ as the first character, - between two characters, and ])
Examples:
- egrep '2.5' inputFile
  - Matches any line containing 2, followed by any single character, followed by 5
  - Matching lines can contain: 2a5 or 2 5 or 2.5 or 215
- egrep '2\.5' inputFile or egrep '2[.]5' inputFile
  - Matches any line containing 2.5
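The difference between the unescaped and escaped period can be checked with a short script. This is a sketch under assumptions: the sample lines are invented for the demonstration, mktemp is assumed to be available, and grep -E stands in for egrep.

```shell
# Sample file (hypothetical contents): one line per candidate string.
inputFile=$(mktemp)
printf '%s\n' '2.5' '2a5' '2 5' '215' '25' > "$inputFile"

# Unescaped . is a metacharacter: any single character between 2 and 5.
any_char=$(grep -E '2.5' "$inputFile")

# Both escaped forms match only a literal period between 2 and 5.
backslash=$(grep -E '2\.5' "$inputFile")
bracket=$(grep -E '2[.]5' "$inputFile")

printf "2.5 selects:\n%s\n\n" "$any_char"
printf '2\\.5 selects:\n%s\n\n' "$backslash"
printf '2[.]5 selects:\n%s\n' "$bracket"

rm -f "$inputFile"
```

The line 25 is selected by none of the three commands, since each regex requires exactly one character between the 2 and the 5.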
Useful Tips for Regular Expression
- (1) For a regular expression to be flexible (and therefore more useful), it most likely will include both literal characters and metacharacters
- (2) Make your regular expression as simple (as few characters) as you can
  - Examples of simple thinking (when the goal is to select matching lines):
    - 'a+' and 'a' both select lines with at least 1 a. Use 'a'
    - 'a{1}' and 'a' both describe 1 a. Use 'a'
    - 'aaaaaaaaaa' and 'a{10}' both describe 10 a’s. Use 'a{10}'
    - '^.*$' and '.*' both match everything in the line. Use '.*'
    - 'linux|Linux' and '[lL]inux' both match linux or Linux. Use '[lL]inux'
    - '^A' and '^A.*$' both describe a line that starts with A. Use '^A'
- (3) Pay attention to what the repetition metacharacters will match
  - Examples of non-intuitive matches with repetition:
    - 'a*' will match aaaaaaaa (the obvious case), but it will also match bcd (the not so obvious case), because * allows 0 a’s
    - '^a+$' means the line must contain at least 1 a (and only a’s), but '^a*$' also matches an empty line (no characters)
- (4) Don’t forget the anchors ^ and $ when you need to describe the entire line. This typically happens when you’re looking for:
  - exactly n a’s and nothing else: '^a{n}$'
  - only a’s and nothing else: '^a+$'
  - no a’s: '^[^a]+$' (use '^[^a]*$' if empty lines should also match)
  - Without both anchors, the text string aaaabc matches all 3 of these regexes (for n up to 4), even though it fits none of the 3 descriptions
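Tips (3) and (4) can be verified by comparing anchored and unanchored versions of the same regex. This is an illustrative sketch: the four sample lines (including one empty line) are assumptions chosen for the demonstration, n is fixed at 3, mktemp is assumed to be available, and grep -E is used in place of egrep.

```shell
# Sample file (hypothetical contents); the last line is empty.
inputFile=$(mktemp)
printf '%s\n' 'aaaabc' 'aaa' 'bcd' '' > "$inputFile"

# Tip (3): -c prints the number of matching lines.
star_any=$(grep -E -c 'a*' "$inputFile")      # 0 a's is allowed, so every line matches
plus_line=$(grep -E -c '^a+$' "$inputFile")   # only the line aaa
star_line=$(grep -E -c '^a*$' "$inputFile")   # the line aaa and the empty line

# Tip (4): the same regex without and with both anchors.
exact_loose=$(grep -E 'a{3}' "$inputFile")     # any line containing aaa somewhere
exact_strict=$(grep -E '^a{3}$' "$inputFile")  # the whole line is exactly aaa
none_loose=$(grep -E '[^a]+' "$inputFile")     # any line with at least 1 non-a character
none_strict=$(grep -E '^[^a]+$' "$inputFile")  # non-empty lines with no a at all

printf 'a* matches %s lines, ^a+$ matches %s, ^a*$ matches %s\n' \
    "$star_any" "$plus_line" "$star_line"
printf 'a{3}:\n%s\n\n^a{3}$:\n%s\n\n' "$exact_loose" "$exact_strict"
printf '[^a]+:\n%s\n\n^[^a]+$:\n%s\n' "$none_loose" "$none_strict"

rm -f "$inputFile"
```

In this run the line aaaabc is selected by both unanchored regexes and by neither anchored one, which is the behavior the anchors are there to prevent.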