Every time an e-mail address is published on a website, spam robots can collect and abuse it. If you run a website (e.g. a blog or forum) that displays its users' e-mail addresses, masking them from spam robots is a nice service to offer.

The safest way to display an e-mail address is to break it up and write it as something like “name at company dot com”. That approach has its problems, though: it is harder to read, and you can’t turn it into a working hyperlink, because “mailto:name at company dot com” is not a valid address. If you want a hyperlink, the best way is to use a JavaScript function similar to this:

function SendMail(name, company, domain)
{
  // Assemble the address at click time so it never appears whole in the markup.
  var link = 'mai' + 'lto:' + name + '@' + company + '.' + domain;
  window.location.replace(link);
}

Then call that method with a hyperlink like this:

<a href="JavaScript:SendMail('name', 'company', 'domain');void(0)">name at company dot com</a>

That makes it pretty difficult for a spam robot to parse, because the complete address never appears in the page source.
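
If the addresses are stored in full, the link does not have to be written by hand. As a minimal sketch, the hypothetical helper buildMaskedLink below (not part of the original module or download) splits an address into the three arguments SendMail expects and emits the anchor markup. It assumes a simple address of the form name@company.domain whose parts contain no quotes or HTML special characters.

```javascript
// Hypothetical helper (an assumption, not from the original post): builds the
// masked anchor for a simple address such as "name@company.com".
function buildMaskedLink(address) {
  var at = address.indexOf('@');
  var dot = address.lastIndexOf('.');

  // Require a non-empty name, company and domain, with the last dot after the "@".
  if (at < 1 || dot < at + 2 || dot === address.length - 1) {
    throw new Error('Unexpected address format: ' + address);
  }

  var name = address.substring(0, at);
  var company = address.substring(at + 1, dot);
  var domain = address.substring(dot + 1);

  // The visible text uses the readable "name at company dot com" form.
  var text = name + ' at ' + company + ' dot ' + domain;

  // Quote each part as a JavaScript string argument for SendMail.
  var args = "'" + [name, company, domain].join("', '") + "'";

  return '<a href="JavaScript:SendMail(' + args + ');void(0)">' + text + '</a>';
}
```

Calling buildMaskedLink('name@company.com') returns the same kind of hyperlink as the one shown above, with 'com' passed as the domain argument.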

Another approach is to encode the characters as numeric character references, for example &#64; for @ (often loosely called hex codes, although the values used here are decimal). Every browser renders them as normal text, but they can prove more difficult, though not impossible, for robots to parse. A robot can simply decode every character reference in the HTML document back into clear text, which exposes the e-mail addresses. If we mix clear text and encoded characters, however, it becomes much more difficult for the robot. That’s what the following HttpModule does.

HttpModule

The module replaces all e-mail addresses on your website with a random mix of encoded and clear-text characters. It turns this:

<a href="mailto:name@company.com">name@company.com</a>

into this:

<a href="&#109;ai&#108;to:&#110;am&#101;&#64;c&#111;&#109;p&#97;n&#121;&#46;c&#111;m">
n&#97;&#109;e&#64;&#99;&#111;m&#112;any.c&#111;&#109;</a>

It uses the System.Random class to mix the clear text with the encoded values. The module's primary methods are the ones that find the clear-text addresses with a regular expression and replace them:

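// Requires: using System; using System.Text; using System.Text.RegularExpressions;

// Optional "mailto:" prefix, local part, "@", then a dotted domain.
// The hyphen comes last in each character class so it is a literal, and the dot is escaped.
private static readonly Regex _Regex = new Regex(@"(mailto:)?\w[\w.-]*@[\w-]+(\.[\w-]+)+");
private static readonly Random _Random = new Random();

private static string EncodeEmails(string html)
{
  // Replace each match where it occurs, in a single pass over the markup.
  return _Regex.Replace(html, match => Encode(match.Value));
}

private static string Encode(string value)
{
  StringBuilder sb = new StringBuilder();
  for (int i = 0; i < value.Length; i++)
  {
    bool encode;
    lock (_Random) // Random is not thread-safe and requests run concurrently.
    {
      encode = _Random.Next(2) == 1;
    }

    if (encode)
      sb.AppendFormat("&#{0};", Convert.ToInt32(value[i])); // decimal character reference
    else
      sb.Append(value[i]);
  }

  return sb.ToString();
}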
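
To experiment with the mixing outside ASP.NET, the same idea can be sketched in JavaScript. This is an illustrative port, not the module's code: the function names, the simplified address pattern and the injectable random parameter (which makes the output reproducible) are all assumptions. decodeReferences mimics what a browser does with decimal character references when it renders the page.

```javascript
// Simplified address pattern (an assumption): optional "mailto:", local part,
// "@", then a dotted domain.
var EMAIL_PATTERN = /(mailto:)?\w[\w.-]*@[\w-]+(\.[\w-]+)+/g;

// Encodes roughly half of the characters as decimal character references.
// `random` must return a number in [0, 1); it defaults to Math.random.
function encode(value, random) {
  random = random || Math.random;
  var out = '';
  for (var i = 0; i < value.length; i++) {
    out += random() < 0.5
      ? '&#' + value.charCodeAt(i) + ';'
      : value.charAt(i);
  }
  return out;
}

// Replaces every address found in the markup with its mixed encoding.
function encodeEmails(html, random) {
  return html.replace(EMAIL_PATTERN, function (match) {
    return encode(match, random);
  });
}

// Turns decimal character references back into the characters they stand for.
function decodeReferences(html) {
  return html.replace(/&#(\d+);/g, function (whole, code) {
    return String.fromCharCode(Number(code));
  });
}

// Usage: the encoded markup differs on each run but always decodes to the original.
var html = '<a href="mailto:name@company.com">name@company.com</a>';
var encoded = encodeEmails(html);
console.log(encoded);
console.log(decodeReferences(encoded) === html); // true
```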

Implementation

You can add this module to any existing web application without breaking any code. Download EmailSpamModule.cs below and place it in the App_Code folder of your website. Then add the following to the <system.web> section of the web.config:

<httpModules>
  <add type="EmailSpamModule" name="EmailSpamModule" />
</httpModules>

Even though the module makes it much more difficult to decode any e-mail address, it is still my advice that you use the JavaScript method if possible. If you're lazy or don't get paid by the hour, go for the module.

Download

EmailSpamModule.zip (1,16 KB)
