Catching email IDs from files using grep and other utilities
Here is a simple way to collect all the email IDs from one or more files into a single file. To do this we will use the commands cat, grep, sort and uniq.
This one-liner should do the work:
cat file | grep -io '\<[^-.][0-9A-Za-z._-]\+@[0-9A-Za-z.-]\+\>' | sort | uniq
If you want all the IDs saved in a file, redirect the output of the above command to a file:
cat file | grep -io '\<[^-.][0-9A-Za-z._-]\+@[0-9A-Za-z.-]\+\>' | sort | uniq > mailid.txt
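One thing the pipeline above does not handle is letter case: sort and uniq treat Bob@Example.com and bob@example.com as two different IDs. Here is a minimal sketch, assuming GNU grep, that lower-cases the matches before removing duplicates and accepts several files at once. The function name extract_emails and the slightly simplified pattern are hypothetical, chosen only for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the unique, lower-cased email-like strings
# found in the files given as arguments (or on stdin if none are given).
extract_emails() {
    # -E extended regex, -i ignore case, -o print only the match,
    # -h never prefix matches with the file name
    grep -Eioh '\<[0-9A-Za-z][0-9A-Za-z._-]*@[0-9A-Za-z.-]+\>' "$@" |
        tr '[:upper:]' '[:lower:]' |   # fold case so duplicates collapse
        sort -u                        # same effect as sort | uniq
}

printf 'Mail Bob@Example.com or bob@example.com\n' | extract_emails
# prints: bob@example.com
```

To save the result, redirect it just like before, for example: extract_emails file1 file2 > mailid.txt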
Now let's convert this into a shell script that accepts a directory name from the user. This is the directory containing the files that hold the email IDs.
You can download the script from here: Script to retrieve email IDs from files in a directory
Note that copying and pasting the code below may not work if the straight quotes get converted into typographic quotes along the way, so check the quote characters before running it.
#!/bin/bash
clear
echo -n "Enter the name of a DIRECTORY from where you want to pick up email IDs: "
read -r dirname
# check if the entered name is a directory
if [ -d "$dirname" ]; then
    cd "$dirname" || exit 1   # if it exists, change to the directory
else
    echo "+============================+"
    echo "| Check your directory name! |"
    echo "+============================+"
    exit 1
fi
# start with an empty temporary file in the user's home dir, named after the PID
: > ~/$$
# Loop through all files in the given directory
for file in *
do
    if [ ! -d "$file" ]; then
        # process each file and append the matches to the temporary file
        echo "Processing file $file"
        grep -Eio '\<[^-.][0-9A-Za-z._-]+@[0-9A-Za-z.-]+\>' "$file" >> ~/$$
        echo "Processed"
    fi
done
cd - > /dev/null   # get back to the previous working dir
# sort the email IDs, remove duplicates and store them in the final file in the home dir
sort ~/$$ | uniq > ~/emailids.$$
# remove the temporary file
rm ~/$$
# tell the user where the email IDs are stored
echo
echo "+=================================================+"
echo "  Your email IDs are available in ~/emailids.$$"
echo "+=================================================+"
exit 0
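If you would rather not change directories or keep a temporary file around, the same idea can be written as a function that pipes the matches straight into sort. This is only a sketch under a few assumptions: GNU grep is available, the function name collect_emails is hypothetical, and the pattern is a slightly simplified version of the one used in the script.

```shell
#!/bin/bash
# Hypothetical variant of the script: takes the directory as an argument,
# never calls cd, and needs no temporary file.
collect_emails() {
    local dir=$1 file
    # refuse anything that is not a directory
    [ -d "$dir" ] || { echo "Not a directory: $dir" >&2; return 1; }
    for file in "$dir"/*; do
        # skip subdirectories and anything else that is not a regular file
        [ -f "$file" ] || continue
        grep -Eioh '\<[0-9A-Za-z][0-9A-Za-z._-]*@[0-9A-Za-z.-]+\>' "$file"
    done | sort -u   # sort and drop duplicates across all files at once
}
```

A call could look like this, where both the directory and the output file name are just examples: collect_emails ~/mail > emailids.txt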
A word of warning: the regular expression will catch anything that looks like an email ID, so you may end up with a lot of strings that merely resemble email addresses.
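One way to trim the false positives is to run the collected list through a second, stricter pattern. The sketch below is a rough heuristic, not a full address validator: it assumes one candidate per line and keeps only those whose domain contains at least one dot and ends in two or more letters. The function name plausible_emails is hypothetical.

```shell
#!/bin/bash
# Hypothetical filter: reads one candidate per line on stdin and keeps only
# lines of the form local@label.label...tld, where tld is 2+ letters.
plausible_emails() {
    grep -Ei '^[0-9a-z][0-9a-z._-]*@([0-9a-z]([0-9a-z-]*[0-9a-z])?\.)+[a-z]{2,}$'
}

printf '%s\n' alice@example.com root@localhost v1.2@3 | plausible_emails
# prints: alice@example.com
```

Applied to the file produced earlier, this could be run as: plausible_emails < mailid.txt > mailid.clean.txt (the second file name is only an example). Keep in mind that grep exits with status 1 when no line passes the filter.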