Regular Expression “Word Boundary”
Word Boundary:
(The explanation below is for grep under Linux, using Bash.)
Before you can use regular expressions effectively, you need to understand clearly which parts of a given string are treated as words.
Here are a few points:
- The characters in the set [a-zA-Z0-9_] are word characters; a word is an unbroken run of one or more of them.
- Any other character between two such runs acts as a word separator.
- A run of [a-zA-Z0-9_] that extends to the end of the string or line is a word.
- A run of [a-zA-Z0-9_] from the beginning of the string up to the first character outside the set is a word.
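The points above can be checked with a minimal sketch that prints every run of word characters on its own line. It assumes a grep that supports the -o option (GNU grep does), and the sample string is made up for illustration:

```shell
# Hypothetical sample string: punctuation and spaces separate words,
# while the underscore and the digit stay inside one.
sample='Some !mice. (mice) cat_mice9'

# -o prints each match on its own line; the pattern matches every
# maximal run of word characters.
words=$(printf '%s\n' "$sample" | grep -oE '[a-zA-Z0-9_]+')
printf '%s\n' "$words"
# prints:
# Some
# mice
# mice
# cat_mice9
```

Note that cat_mice9 comes out as a single word, because the underscore and the digit both belong to the set.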
Take the following sentences/strings:
This line contains mice.
mice in the beginning.
is there one in the end! mice
Some mice are free.
Some are trapped like this !mice. !mice! (mice) .
Some are mixedmice, catmicecat and micecat.
Notice that I have put 'mice' in many places in the strings. Open your favourite text editor, copy the above strings into a file, and save it as, say, 'mice.txt'.
Before we proceed with grep, let's make sure it is set to display matches in a different color.
Run the following command:
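If you would rather not open an editor, the file can also be created from the shell. This is only a sketch of one way to do it, using a here-document; it writes mice.txt into the current directory:

```shell
# Quoting the delimiter ('EOF') stops the shell from performing
# parameter or command substitution on the text.
cat > mice.txt <<'EOF'
This line contains mice.
mice in the beginning.
is there one in the end! mice
Some mice are free.
Some are trapped like this !mice. !mice! (mice) .
Some are mixedmice, catmicecat and micecat.
EOF

# An empty pattern matches every line, so -c reports the line count.
line_count=$(grep -c '' mice.txt)
echo "$line_count"
# prints: 6
```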
$ alias grep='grep --color=always'
If you run grep directly on the file mice.txt for 'mice', you will notice that every line that contains mice is selected (in the terminal, each match is highlighted):
$ grep 'mice' mice.txt
This line contains mice.
mice in the beginning.
is there one in the end! mice
Some mice are free.
Some are trapped like this !mice. !mice! (mice) .
Some are mixedmice, catmicecat and micecat.
Notice that the literal sequence 'mice' is selected wherever it occurs, even inside longer strings such as mixedmice and catmicecat.
To see how words are distinguished, run either of the following commands to search for the word 'mice':
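To see that a plain search counts occurrences inside longer strings too, here is a small sketch on the last sample line. It assumes a grep with the -o option, which prints each match on a separate line:

```shell
line='Some are mixedmice, catmicecat and micecat.'

# -c counts matching lines; the line is counted once however many hits it has.
matching_lines=$(printf '%s\n' "$line" | grep -c 'mice')

# -o emits one line per match, so counting those lines counts occurrences.
occurrences=$(printf '%s\n' "$line" | grep -o 'mice' | grep -c '')

echo "lines: $matching_lines, occurrences: $occurrences"
# prints: lines: 1, occurrences: 3
```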
$ grep '\<mice\>' mice.txt
OR
$ grep -w 'mice' mice.txt
This line contains mice.
mice in the beginning.
is there one in the end! mice
Some mice are free.
Some are trapped like this !mice. !mice! (mice) .
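The following sketch checks that the two commands select the same lines for this sample. It assumes GNU grep, where \< and \> are supported as word-boundary anchors, and it uses a temporary file so that it does not depend on mice.txt being present:

```shell
# Recreate the sample text in a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
This line contains mice.
mice in the beginning.
is there one in the end! mice
Some mice are free.
Some are trapped like this !mice. !mice! (mice) .
Some are mixedmice, catmicecat and micecat.
EOF

with_anchors=$(grep '\<mice\>' "$tmp")   # explicit word-boundary anchors
with_w=$(grep -w 'mice' "$tmp")          # whole-word option

if [ "$with_anchors" = "$with_w" ]; then result=same; else result=different; fi
matched=$(grep -cw 'mice' "$tmp")
echo "$result, $matched of 6 lines matched"
# prints: same, 5 of 6 lines matched

rm -f "$tmp"
```

Only the last line is dropped, because every 'mice' in it is glued to other word characters.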
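Let's understand the output one line at a time. (The very first line of the output, 'This line contains mice.', matches because 'mice' has a space before it and a period after it, neither of which is in the set [a-zA-Z0-9_]; the numbering below starts from the line after it.)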
Output line 0) mice in the beginning.
Here 'mice' is at the beginning of the line and is followed by a space, a character that is not in the set [a-zA-Z0-9_].
So any run of characters from [a-zA-Z0-9_] that starts the string and is followed by a character outside the set is a word.
Output line 1) is there one in the end! mice
Here 'mice' is at the end of the line, and just before the 'm' there is a space, which is not in the set [a-zA-Z0-9_]. This makes it a word.
Output line 2) Some mice are free.
Here 'mice' has a space before and after it. Since a space is not part of the word character set, 'mice' is considered a word. This is also the everyday understanding of how words are separated.
Output line 3) Some are trapped like this !mice. !mice! (mice) .
The third line is interesting because three matches are selected in it. The first is in !mice. where the characters before and after mice (! and .) are not in the set [a-zA-Z0-9_]. This makes mice a word. Similarly, in !mice! and (mice) the characters surrounding mice are not in the set [a-zA-Z0-9_], so mice is a word there too.
So in strings like the following, where mice sits between characters outside the set, mice will be treated as a word:
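The three matches on this line can be listed separately with a short sketch. It assumes a grep that accepts -o together with -w, as GNU grep does:

```shell
line='Some are trapped like this !mice. !mice! (mice) .'

# -o prints each whole-word match on its own line.
hits=$(printf '%s\n' "$line" | grep -ow 'mice')
printf '%s\n' "$hits"
# prints:
# mice
# mice
# mice

hit_count=$(printf '%s\n' "$hits" | grep -c '')
echo "$hit_count"
# prints: 3
```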
#@$$$mice##
abcde!mice.ran
can.you.see.the.mice.here
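As a quick check of the three strings above, this sketch feeds them to grep -w and counts how many lines match. The single quotes keep the shell from interpreting characters such as $ and !:

```shell
word_hits=$(printf '%s\n' \
  '#@$$$mice##' \
  'abcde!mice.ran' \
  'can.you.see.the.mice.here' | grep -cw 'mice')
echo "$word_hits"
# prints: 3
```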
In the following strings, mice will not be treated as a word, because it is joined to other word characters:
Three blindmice
micemice
redmice
mincemice
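The same kind of check, sketched for the four strings above, should report no whole-word matches. grep exits with status 1 when nothing matches, so the sketch adds || true to keep that from being treated as a failure:

```shell
# -c still prints 0 when nothing matches; "|| true" only masks the exit status.
non_word_hits=$(printf '%s\n' \
  'Three blindmice' \
  'micemice' \
  'redmice' \
  'mincemice' | grep -cw 'mice' || true)
echo "$non_word_hits"
# prints: 0
```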
Hope this helped you understand what is considered a word from the point of view of a regex.





