The original post grew too long, so I have split it into three parts.
Here is a short explanation of what the backslash sequences \b and \B do when used in a regular expression. I will demonstrate them using the grep command on Linux.
In most places, the official explanation reads:
\b Match the empty string at the edge of a word.
\B Match the empty string provided it’s not at the edge of a word.
Before proceeding, you should be clear about how a word is defined in a regex, that is, what counts as a word when you use options or metacharacters that operate on words. See the earlier entry, Regular Expression “Word Boundary”.
If you are already clear on how a word is defined, we can go ahead.
Take the following sentences/strings:
This is a line that has a cat.
cat command is used to concatenate two or more files.
old weapon used to hit birds is catapult.
Be unique do not be a copycat.
We will use the string ‘cat’ as the reference for understanding how \b and \B work. Open your favorite text editor, copy the above strings into a file, and save it as, say, ‘wordedge.txt’.
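If you would rather not open an editor, the same file can be created from the shell. This is a small sketch that assumes a POSIX-style shell such as bash; the here-document delimiter EOF is an arbitrary choice, and note that the command overwrites any existing wordedge.txt in the current directory.

```shell
# Create wordedge.txt with the four sample sentences.
# Quoting 'EOF' stops the shell from expanding anything inside the text.
cat > wordedge.txt <<'EOF'
This is a line that has a cat.
cat command is used to concatenate two or more files.
old weapon used to hit birds is catapult.
Be unique do not be a copycat.
EOF

# Quick check: the file should contain four lines.
wc -l < wordedge.txt
```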
Before we proceed with grep, let's make sure grep is set to display matches in a different color.
Run the following command: $ alias grep='grep --color=always'
First, run grep for the plain string ‘cat’ on the file wordedge.txt. Your output should look similar to this:
$ grep 'cat' wordedge.txt
This is a line that has a cat.
cat command is used to concatenate two or more files.
old weapon used to hit birds is catapult.
Be unique do not be a copycat.
In the above case, grep looks for the character ‘c’ followed by ‘a’ followed by ‘t’, and selects every occurrence, wherever it appears in the line.
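To see exactly which pieces of text the plain pattern picks out, grep's -o option prints only the matched part of each line, one match per output line. The sketch below recreates the sample file so that it runs on its own, then counts matching lines and individual matches. It assumes a grep that supports -o, such as GNU grep.

```shell
# Recreate the sample file so this snippet is self-contained.
cat > wordedge.txt <<'EOF'
This is a line that has a cat.
cat command is used to concatenate two or more files.
old weapon used to hit birds is catapult.
Be unique do not be a copycat.
EOF

# -c counts matching lines: all four lines contain 'cat'.
grep -c 'cat' wordedge.txt

# -o prints each match on its own line; wc -l then counts the matches.
# There are five, because the second line contains 'cat' twice
# (the word cat and the cat inside concatenate).
grep -o 'cat' wordedge.txt | wc -l
```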
\b is used to match the pattern at the edge(s) of the word.
\B is used to match the pattern which is not at the edge of the word.
Using \b: Finding pattern at the left edge of words.
Enter the following command (note the \b at the beginning of the pattern):
$ grep '\bcat' wordedge.txt
This is a line that has a cat.
cat command is used to concatenate two or more files.
old weapon used to hit birds is catapult.
Notice that in the first and second lines of output, ‘cat’ is selected where it is a complete word. This is because the word cat stands alone, so the pattern ‘cat’ begins at the left edge of the word. In the second line, the ‘cat’ inside ‘concatenate’ is not selected, because it does not start a word.
The third line of output makes this clear: ‘cat’ is selected from ‘catapult’ because it is at the left edge of the word.
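One way to confirm which words the left-edge pattern matched is to extend it so that it swallows the rest of the word and print only the matches. This is a sketch, not part of the original walkthrough: the \w (word character) escape and the -o option are assumptions about your grep, and both are available in GNU grep.

```shell
# Recreate the sample file so this snippet is self-contained.
cat > wordedge.txt <<'EOF'
This is a line that has a cat.
cat command is used to concatenate two or more files.
old weapon used to hit birds is catapult.
Be unique do not be a copycat.
EOF

# \bcat  : 'cat' must start at the left edge of a word
# \w*    : then take any remaining word characters
# -o     : print only the matched text, one match per line
grep -o '\bcat\w*' wordedge.txt
# With GNU grep this prints:
#   cat
#   cat
#   catapult
```

Neither ‘concatenate’ nor ‘copycat’ appears, because in both of them ‘cat’ does not start the word.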
Using \b: Finding pattern at the right edge of words.
Enter the following command (note the \b at the end of the pattern):
$ grep 'cat\b' wordedge.txt
This is a line that has a cat.
cat command is used to concatenate two or more files.
Be unique do not be a copycat.
Notice that in the first and second lines of output, ‘cat’ is selected where it is a complete word. This is because the word cat stands alone, so the pattern ‘cat’ ends at the right edge of the word.
So one point to understand is that if the pattern occurs as a standalone word, it matches at both the left and the right edge.
The third line of output makes this clear: ‘cat’ is selected from ‘copycat’ because it is at the right edge of the word.
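The right-edge case can be checked the same way, by letting the pattern take any word characters before ‘cat’ and printing only the matches. As before, this is a sketch that assumes GNU grep for the \w escape and the -o option.

```shell
# Recreate the sample file so this snippet is self-contained.
cat > wordedge.txt <<'EOF'
This is a line that has a cat.
cat command is used to concatenate two or more files.
old weapon used to hit birds is catapult.
Be unique do not be a copycat.
EOF

# \w*    : any word characters leading up to 'cat'
# cat\b  : 'cat' must end at the right edge of a word
# -o     : print only the matched text, one match per line
grep -o '\w*cat\b' wordedge.txt
# With GNU grep this prints:
#   cat
#   cat
#   copycat
```

Here ‘catapult’ and ‘concatenate’ drop out, since more word characters follow ‘cat’ in both.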
Using \bpattern\b
So what happens when we put ‘\b’ on both sides of our pattern?
Enter the following command (note the \b at both ends of the pattern):
$ grep '\bcat\b' wordedge.txt
This is a line that has a cat.
cat command is used to concatenate two or more files.
So when ‘\b’ is used on both sides, only the whole word is selected.
This can be read as: cat must be at the beginning of the word as well as at the end of the word, which means it has to be exactly that word.
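For a simple pattern like this one, grep's -w (whole word) option should give the same result as wrapping the pattern in \b on both sides. The sketch below compares the two by counting matching lines with -c; it assumes GNU grep, and the equivalence is only claimed for this example, not for every pattern.

```shell
# Recreate the sample file so this snippet is self-contained.
cat > wordedge.txt <<'EOF'
This is a line that has a cat.
cat command is used to concatenate two or more files.
old weapon used to hit birds is catapult.
Be unique do not be a copycat.
EOF

# Count lines where 'cat' appears as a whole word, using \b on both sides.
grep -c '\bcat\b' wordedge.txt

# Count the same thing using the -w (whole word) option.
grep -c -w 'cat' wordedge.txt

# Both commands print 2: only the first two sentences contain
# 'cat' as a standalone word.
```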
Next: Regular Expression - Use of \B (NOT Word Edge)