Nothing To Lose

If you don’t have it, how can you lose it!

Archive for the ‘Regular Expressions’

Convert lines of text to HTML list item – using regex

March 21, 2011 By: Dexter Category: HTML, Regular Expressions, Web Development

Many times I have to update pages where content must be displayed as an HTML list. The problem is that the content is sent to me as any number of lines of plain text.

Usually you end up copying and pasting <li> and </li> at the beginning and end of every line. For a few lines that is fine, but what if you have around 100 lines?

If you know regular expressions, your life is saved. Here is what you do:

  • Copy and paste all the text into a text editor whose find and replace supports regular expressions.

This is line one
This is line two
This is line three
This is line four
This is line five
This is line ….
…………………….
…………………….
This is line fifty

  • Open the replace dialog box and fill in the following (make sure the regular-expression option is selected for find):
    • find text : ^     (yes, the caret symbol, which matches the start of a line)
    • replace with : <li>
    • Hit replace all. Every line should now start with <li>.

<li>This is line one
<li>This is line two
<li>This is line three
<li>This is line four
<li>This is line five
<li>This is line ….
<li>…………………….
<li>…………………….
<li>This is line fifty

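  • Open the replace dialog box again and repeat the same steps, with only these changes:
    • find text : $     (the dollar sign, which matches the end of a line)
    • replace with : </li>
    • Hit replace all. Every line should now end with </li>.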

<li>This is line one</li>
<li>This is line two</li>
<li>This is line three</li>
<li>This is line four</li>
<li>This is line five</li>
<li>This is line ….</li>
<li>…………………….</li>
<li>…………………….</li>
<li>This is line fifty</li>

  • Do not forget to add the unordered or ordered list tags: <ul> or <ol> as the first line, and </ul> or </ol> respectively as the last line, to complete the HTML list.

<ul>

<li>This is line one</li>
<li>This is line two</li>
<li>This is line three</li>
<li>This is line four</li>
<li>This is line five</li>
<li>This is line ….</li>
<li>…………………….</li>
<li>…………………….</li>
<li>This is line fifty</li>

</ul>

And in three steps your complete text is now an HTML list.
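If your editor has no regular-expression replace, the same three steps can be sketched on the command line with sed. This is only an illustration, assuming a standard sed is available; the function name lines_to_ul is made up for this example and is not from the original post.

```shell
# Hypothetical helper (not part of the original post): reads plain text
# lines on stdin and prints them as an unordered HTML list.
lines_to_ul() {
    echo '<ul>'
    # ^ anchors the start of each line, $ anchors the end
    sed -e 's/^/<li>/' -e 's/$/<\/li>/'
    echo '</ul>'
}

printf 'This is line one\nThis is line two\n' | lines_to_ul
```

Note that this sketch does not escape characters such as & or < in the text, just like the editor approach above.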

Catching email IDs from file(s) using grep and other utils

April 18, 2009 By: Dexter Category: BASH, Linux Commands, Regular Expressions, Shell Scripting, Tutorial

Here is a simple mechanism for collecting all the email IDs from one or more files into a single file. To do this we will use the commands cat, grep, sort and uniq.

This one-liner should do the work:

cat file | grep -io '\<[^-.][0-9A-Za-z\.\-\_]\+@[0-9A-Za-z.]\+\>' | sort | uniq

If you want all the IDs saved in a file, redirect the output of the above command to that file:

cat file | grep -io '\<[^-.][0-9A-Za-z\.\-\_]\+@[0-9A-Za-z.]\+\>' | sort | uniq > mailid.txt
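To see the one-liner at work, here is a small self-contained run. It assumes GNU grep (for -o, \< and \>); the sample addresses and the temporary file are made up purely for illustration.

```shell
# Made-up sample data, only for illustration.
sample=$(mktemp)
printf '%s\n' \
    'Write to bob@example.com or alice@example.org.' \
    'Reminder: bob@example.com is listed twice.' > "$sample"

# Same pipeline as above, with plain single quotes around the pattern.
emails=$(cat "$sample" | grep -io '\<[^-.][0-9A-Za-z\.\-\_]\+@[0-9A-Za-z.]\+\>' | sort | uniq)
echo "$emails"
rm -f "$sample"
```

With GNU grep this should print the two distinct addresses once each, even though one of them appears twice in the input.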

Now let's convert this into a shell script that asks the user for a directory name. This directory is the one containing the files that hold the email IDs.
You can download the script from here: Script to retrieve Email ids from files in a directory
I have noticed that copying and pasting code from this page may not work because of formatting characters.

#!/bin/bash
clear
echo -n "Enter the name of a DIRECTORY from where you want to pick up email ids: "
read -r dirname

# check if the entered name is a directory
if [ -d "$dirname" ]; then
    cd "$dirname" || exit 1    # if it exists, change to the directory
else
    echo "+============================+"
    echo "| Check your directory name! |"
    echo "+============================+"
    exit 1
fi

# loop through all files in the given directory
for file in *
do
    if [ ! -d "$file" ]; then
        # process each file and append the matches to a temporary file in the user's home dir
        echo "Processing file $file"
        cat "$file" | grep -E -io '\<[^-.][0-9A-Za-z\.\-\_]+@[0-9A-Za-z.]+\>' >> ~/$$
        echo "Processed"
    fi
done

cd - > /dev/null    # get back to the previous working dir

# sort the email ids, remove duplicates and store them in the final file in the home dir
sort ~/$$ | uniq >> ~/emailids.$$

# remove the temporary file
rm ~/$$

# tell the user where the mail ids are stored
echo
echo "+=================================================+"
echo " Your email ids are available in ~/emailids.$$ "
echo "+=================================================+"
exit 0
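If you would rather not answer a prompt, the same idea can be sketched as a small function that takes the directory as an argument. This is an assumption-laden variant, not the downloadable script: the name collect_emails is made up, GNU grep is assumed, and the results go to standard output instead of a file in the home directory.

```shell
# Hypothetical non-interactive variant: collect_emails DIR prints the unique
# matches found in the regular files directly inside DIR (no subdirectories).
collect_emails() {
    dir=$1
    [ -d "$dir" ] || { echo "Not a directory: $dir" >&2; return 1; }
    for f in "$dir"/*; do
        [ -f "$f" ] || continue        # skip directories and anything unusual
        grep -E -io '\<[^-.][0-9A-Za-z\.\-\_]+@[0-9A-Za-z.]+\>' "$f"
    done | sort -u                     # sort -u does the job of sort | uniq
}

# Usage sketch with made-up data.
demo=$(mktemp -d)
echo 'mail carol@example.net today' > "$demo/a.txt"
echo 'carol@example.net and dave@example.com' > "$demo/b.txt"
collect_emails "$demo"
rm -rf "$demo"
```

Redirect the output (for example collect_emails somedir > mailid.txt) if you want the list in a file.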

A word of warning: the regular expression catches anything that looks like an email ID, so you may end up with many entries that are not real addresses.
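A quick way to see such false positives is to feed the pattern strings that merely contain an @ sign. The two inputs below are made up (an image file name and a version tag), and GNU grep is assumed.

```shell
# Made-up input: neither string is an email address.
matches=$(printf 'logo@2x.png\nv2@rc1\n' |
    grep -io '\<[^-.][0-9A-Za-z\.\-\_]\+@[0-9A-Za-z.]\+\>')
echo "$matches"
```

Both lines should come back as matches, so it is worth reviewing the collected list before using it.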
[end]

Counting occurrences of a word/pattern using grep and wc

March 29, 2009 By: Dexter Category: Regular Expressions, Tutorial

Sometimes you will need to count how many times a word or pattern occurs in a file (text).
Here is a simple use of the commands grep and wc to do just that.
Let's use the following file as an example (I have named it sample.txt):

Linux is a nice operating system.
Many people thing Linux just has a text based interface.
When people see the GUI on Linux they are really amazed.
Linux was developed by Linus Torvalds.
Linux is to UNIX, so if you have worked on Linux you will be able to work on UNIX also.

Now to count how many times “Linux” occurs in the text you can use:

grep -o 'Linux' sample.txt | wc -l

Explanation:
The grep command with the -o option prints only the part of each line that matches the pattern in quotes, in this case 'Linux', one match per line. If you run just the grep part on its own you get output like this:

grep -o 'Linux' sample.txt
Linux
Linux
Linux
Linux
Linux
Linux

Now in the actual command

grep -o 'Linux' sample.txt | wc -l

this output is piped to wc -l. wc is a counting tool (lines, words and bytes), and with the -l option it counts the lines it receives. Since grep -o puts each match on its own line, the number of lines equals the number of occurrences of the text/pattern/word.

Remember that grep is case sensitive, so use the -i option to ignore case:

grep -io 'Linux' sample.txt | wc -l

Otherwise only the exact spelling 'Linux' is counted, and occurrences such as 'linux' or 'LINUX' are ignored.

Of course, if you only want to know how many lines contain a particular pattern/word, and not the count of the word/pattern itself, use

grep -c 'Linux' sample.txt
OR
grep -ic 'Linux' sample.txt    # to ignore case
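The difference between counting lines and counting occurrences only shows up when a line contains the word more than once. The following sketch uses a made-up three-line input (not sample.txt) to put the variants side by side; it assumes a grep that supports -o, such as GNU grep.

```shell
# Made-up three-line input; the first line contains the word twice.
text=$(printf 'Linux and more Linux\nlinux in lower case\nUNIX only\n')

lines=$(printf '%s\n' "$text" | grep -c 'Linux')                  # matching lines
words=$(printf '%s\n' "$text" | grep -o 'Linux' | wc -l)          # occurrences
words_nocase=$(printf '%s\n' "$text" | grep -io 'Linux' | wc -l)  # occurrences, any case
echo "lines=$lines occurrences=$words ignoring_case=$words_nocase"
```

Here one line matches 'Linux', the word occurs twice, and ignoring case brings the count to three.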

If you are familiar with pattern matching, you can replace 'Linux' with a regular expression to count occurrences of a particular pattern,

e.g. '(Linux|UNIX|AIX)' will look for occurrences of Linux or UNIX or AIX.

Note that with plain grep you must not forget to escape the brackets and the pipe symbol, i.e. '\(Linux\|UNIX\|AIX\)'. With grep -E (egrep) no escaping is needed.
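As a sketch of the two spellings, the snippet below counts the alternation once with escaped basic syntax and once with grep -E. The two-line input is made up, and the \| form relies on GNU grep, where it is an extension to basic regular expressions.

```shell
# Made-up input containing four matches in total.
text=$(printf 'Linux is to UNIX\nAIX is a UNIX too\n')

# Basic regular expressions: escape the brackets and the pipe.
bre=$(printf '%s\n' "$text" | grep -o '\(Linux\|UNIX\|AIX\)' | wc -l)

# Extended regular expressions: no escaping needed.
ere=$(printf '%s\n' "$text" | grep -oE '(Linux|UNIX|AIX)' | wc -l)
echo "$bre $ere"
```

Both counts should agree, since the two patterns describe the same alternation.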

Hope that was useful.

NOTE: This explanation is with respect to BASH Shell (GNU bash, version 3.1.17(2)) with grep (GNU grep) 2.5

[end]

Regular Expression — Using of \b and \B together. Part 3

March 18, 2009 By: Dexter Category: Regular Expressions, Tutorial

Using \B and \b together: finding a pattern at one edge of a word, but not as the whole word.

Continues from Previous Post: Regular Expression — Use of \B (Not Word Edges) Part 2

For ‘\B’ at the beginning of the pattern and ‘\b’ at the end of the pattern


$ grep '\Bcat\b' wordedge.txt
Be unique do not be a copycat.


When ‘\B’ is at the beginning of the pattern and ‘\b’ at the end, ‘\B’ makes sure the pattern does not start at the beginning of a word, and ‘\b’ makes sure the pattern ends at the right edge of the word.
So in this case the pattern is always found at the right edge of a longer word.
This differs from using just ‘\b’ at the end of the pattern, because ‘\b’ alone also matches when the whole word itself is the pattern.
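The difference can be checked with a small stand-in for wordedge.txt. The file below is hypothetical: its first two lines are the ones quoted in this post, and the third is made up so that the whole word is present too. GNU grep is assumed for \b and \B.

```shell
# Hypothetical stand-in for wordedge.txt (third line is made up).
f=$(mktemp)
printf '%s\n' \
    'Be unique do not be a copycat.' \
    'old weapon used to hit birds is catapult.' \
    'The cat sat on the mat.' > "$f"

end_only=$(grep 'cat\b' "$f")      # right edge, whole word allowed
inner_end=$(grep '\Bcat\b' "$f")   # right edge, whole word excluded
printf '%s\n---\n%s\n' "$end_only" "$inner_end"
rm -f "$f"
```

The first search should return both the copycat line and the made-up line with the whole word, while the second returns only the copycat line.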

Now vice-versa:

For ‘\b’ at the beginning of the pattern and ‘\B’ at the end of the pattern.


$ grep '\bcat\B' wordedge.txt
old weapon used to hit birds is catapult.


When ‘\b’ is at the beginning of the pattern and ‘\B’ at the end, ‘\b’ makes sure the pattern starts at the beginning of a word, and ‘\B’ makes sure the pattern does NOT end at the right edge of the word.
So in this case the pattern is always found at the left edge of a longer word.
This differs from using just ‘\b’ at the beginning of the pattern, because ‘\b’ alone also matches when the whole word itself is the pattern.
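The mirror case can be tried the same way. Again the file is a hypothetical stand-in for wordedge.txt, with a made-up third line that contains the whole word, and GNU grep is assumed.

```shell
# Hypothetical stand-in for wordedge.txt (third line is made up).
f=$(mktemp)
printf '%s\n' \
    'Be unique do not be a copycat.' \
    'old weapon used to hit birds is catapult.' \
    'The cat sat on the mat.' > "$f"

start_only=$(grep '\bcat' "$f")      # left edge, whole word allowed
inner_start=$(grep '\bcat\B' "$f")   # left edge, whole word excluded
printf '%s\n---\n%s\n' "$start_only" "$inner_start"
rm -f "$f"
```

The first search should return the catapult line and the made-up whole-word line, while the second returns only the catapult line.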

Hope this was useful.

[enough]