Saturday, November 27, 2010

The dumb mail project


For ages I have tried to teach my friends and relatives not to spread hoax mails and the like, and to my astonishment they keep on sending that sxxt! Instead of surrendering, I will try to have some fun with it. My plan is the following:
  • Forward those hoax and non-hoax mails with a small Tux picture, tux.gif, hosted on a server I own.
  • Study the path and behaviour of those emails and of the people who open them.
  • Draw some conclusions, plot maps, etc. We will see.
Some results can be seen at the end of this post.


What I used:
  • apache2
  • the gimp
  • a sub domain
  • an email account that, like everyone's, is crowded with hoaxes and similar stuff

The pic

I edited a Tux image with the GIMP so that it is small: tux.gif
$ file tux.gif
tux.gif: GIF image data, version 89a, 10 x 12

Apache2 configuration

I created a custom log format in Apache suited to my needs:
  • referer: to try to see which email services the users are using
  • date and time
  • originating IP
  • user agent: to see which web browsers the users are using
In the file /etc/apache2/apache2.conf there are some already-defined log formats:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %b" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent

I added a new custom log format named dumbmail. I wanted the date to be yymmddhhmmss for tracing purposes:
LogFormat "%{%y%m%d%H%M%S}t|%h|%r|%{Referer}i|%{User-agent}i" dumbmail

The braces before the i select a single request header: %{Name}i logs the value of the header called Name in the request sent by the user's browser. I only need the User-Agent and Referer headers.
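To make the field layout concrete, here is a minimal Python sketch that splits one line written in the dumbmail format back into its five fields. The names Hit and parse_dumbmail_line are made up for this illustration, and the sketch assumes that neither the request nor the referer contains a "|" character.

```python
from datetime import datetime
from typing import NamedTuple


class Hit(NamedTuple):
    """One logged request (hypothetical helper type, not part of Apache)."""
    when: datetime
    host: str
    request: str
    referer: str
    user_agent: str


def parse_dumbmail_line(line: str) -> Hit:
    # Fields follow the LogFormat order: time|host|request|referer|user-agent.
    # maxsplit=4 keeps any "|" inside the user-agent string intact.
    stamp, host, request, referer, agent = line.rstrip("\n").split("|", 4)
    # The timestamp uses the same strftime pattern as the LogFormat line.
    when = datetime.strptime(stamp, "%y%m%d%H%M%S")
    return Hit(when, host, request, referer, agent)


sample = ("081129154722||GET /test3.gif HTTP/1.1||"
          "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; InfoPath.1)")
hit = parse_dumbmail_line(sample)
print(hit.when.isoformat(), hit.request)
```

A line with fewer than five fields raises ValueError here, which is probably what you want while checking that the format works.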
As I want to see how different types of emails spread across the net, I needed an easy, automatic way to tell which incoming connection belongs to which mail. I solved this with a redirect: only one tux.gif resource exists on my server, but it is reached through different URLs, which lets me infer which email the users are reading.
Here is the virtual host definition I created:
<VirtualHost *>
        DocumentRoot /var/www/thedumbmail
        CustomLog /var/log/dumbmail.log dumbmail
        RedirectMatch ^/test tux.gif
</VirtualHost>

By the way, I don't want that file to be rotated, so I'll take care of compressing or managing it myself. As you can see, when a request whose path starts with /test comes to my virtual host, it is redirected to tux.gif.
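As a quick check of which paths the ^/test expression catches, the same pattern can be tried out in Python. This is only a sketch: Apache uses its own regex engine, but for a pattern this simple the behaviour should be the same. The function name is_redirected is made up for the illustration.

```python
import re

# Same expression as in the RedirectMatch line above.
REDIRECT_PATTERN = re.compile(r"^/test")


def is_redirected(path: str) -> bool:
    """Return True if the URL path would match the redirect pattern."""
    return REDIRECT_PATTERN.search(path) is not None


for path in ["/test1.gif", "/test3.gif", "/tux.gif", "/other/test1.gif"]:
    print(path, is_redirected(path))
```

Note that /tux.gif itself does not match, so the redirect target is served directly and there is no redirect loop.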

Forwarding the emails

Finally, I edited the mails in HTML mode, adding the following:
<IMG alt="" src="" >

where src points to a test#.gif URL on my subdomain and # is the number of the test I am sending. Each test corresponds to a certain email I am forwarding.
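If you forward many mails, the tag can be generated instead of typed by hand. The following is a small sketch only: the host tracker.example.org is a placeholder (the real subdomain is not shown in this post) and tracking_img is a made-up helper name.

```python
from html import escape

# Hypothetical host name; replace it with your own subdomain.
TRACKER_HOST = "tracker.example.org"


def tracking_img(test_number: int) -> str:
    """Build the IMG tag for one forwarded mail, identified by its test number."""
    url = f"http://{TRACKER_HOST}/test{test_number}.gif"
    # escape() guards against characters that would break the attribute value.
    return f'<IMG alt="" src="{escape(url, quote=True)}">'


print(tracking_img(3))
# prints: <IMG alt="" src="http://tracker.example.org/test3.gif">
```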

The resulting log file

The result in my log file is a series of GET requests to the different test# URLs I am putting in the emails. As you can see, clients first "touch" the test#.gif resource and are then redirected to tux.gif:
081129154722||GET /test3.gif HTTP/1.1||Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; InfoPath.1)

081129154722||GET /tux.gif HTTP/1.1||Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; InfoPath.1)
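From lines like these it is easy to work out how often each mail was opened. Below is a minimal Python sketch, assuming the "|"-separated layout shown above and test#.gif style URLs. The function summarize and its return values are my own naming for the illustration, not something Apache provides.

```python
import re
from collections import Counter

# Matches the request field of a test hit and captures the test number.
TEST_REQUEST = re.compile(r"^GET /test(\d+)\.gif ")


def summarize(lines):
    """Return (hits per test number, first timestamp seen per test number)."""
    counts = Counter()
    first_seen = {}
    for line in lines:
        fields = line.rstrip("\n").split("|", 4)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        match = TEST_REQUEST.match(fields[2])
        if match is None:
            continue  # e.g. the follow-up request for /tux.gif
        test = int(match.group(1))
        counts[test] += 1
        # yymmddhhmmss stamps compare chronologically as plain strings
        # (as long as all of them fall in the same century).
        first_seen[test] = min(first_seen.get(test, fields[0]), fields[0])
    return counts, first_seen


log = [
    "081129154722||GET /test3.gif HTTP/1.1||Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; InfoPath.1)",
    "081129154722||GET /tux.gif HTTP/1.1||Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; InfoPath.1)",
]
counts, first_seen = summarize(log)
print(dict(counts), first_seen)
```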

Apache offers a conditional logging facility that lets me log only the interesting requests, which are the ones for the test# URLs. You only have to add the following to your virtual host configuration, replacing the earlier CustomLog line:
        SetEnvIf Request_URI "tux.gif$" dontlog
        CustomLog /var/log/dumbmail.log dumbmail env=!dontlog
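Lines that were written before this rule was added still contain the tux.gif requests. As a rough sketch, the same expression can be applied in Python to an existing log to decide which lines the new rule would have kept. The helper names request_uri and would_be_logged are made up, and the "|"-separated dumbmail layout is assumed.

```python
import re

# Same expression as in the SetEnvIf line above.
DONTLOG = re.compile(r"tux.gif$")


def request_uri(line: str) -> str:
    """Return the path part of the request field, or "" if the line is malformed."""
    fields = line.split("|", 4)
    if len(fields) < 5:
        return ""
    parts = fields[2].split(" ")  # e.g. ["GET", "/tux.gif", "HTTP/1.1"]
    if len(parts) < 2:
        return ""
    # Request_URI does not include the query string, so drop it here too.
    return parts[1].split("?", 1)[0]


def would_be_logged(line: str) -> bool:
    """True if the line does not carry the dontlog mark under the rule above."""
    return DONTLOG.search(request_uri(line)) is None
```

For example, filtering the lines of an old log through would_be_logged keeps the /test3.gif request shown earlier and drops the /tux.gif one.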

Exploiting the information

With the information gathered I am plotting a map that you can see at the following link. The numbers are ordered by timestamp:
