Friday, 4 May 2012

Protecting E-mail Addresses on Webpages: Beware of using mailto protocol

Placing an e-mail address on a Web page is a dangerous prospect nowadays. If the document on which the address appears generates even a medium amount of traffic, it is a given that a robot or other harvester will pick up the e-mail address and add it to dozens of spam lists.

How do these bots and harvesters collect the e-mail address? They simply fetch the document and examine its source. For example, to insert a link to e-mail Jill at The Oasis of Tranquility, the following code can be inserted into a document:

    <a href="mailto:jill@oasisoftranquility.com">Email Jill</a>

Although this shows as simply "Email Jill" on a user agent's screen, the harvester can read the code and find the address. The mailto protocol confirms that an e-mail address is within the anchor tag. The key to protecting your e-mail address is not to add it to documents in an unencoded format. Instead, obfuscate it using one of several methods, including the following:

1. Break it into pieces that a script reassembles, so that harvesters cannot easily discern the whole address.
2. Encode it using a method that preserves its functionality.

Tip: One low-security method for obscuring an e-mail address is to replace the at sign (@) with its entity equivalent, &#64;. This method relies on the assumption that most harvesters search documents for the literal "@" in their quest for e-mail addresses. By removing the literal at sign, you impede the harvester's ability to recognize e-mail addresses. By using the equivalent entity, you ensure that compliant browsers will still render the at sign properly.

However, most harvesters are now wise to this trick and recognize the entity as well as the literal at sign.
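To illustrate why the entity trick offers so little protection, here is a minimal sketch of what a harvester might do. The function names and the simplified address pattern are assumptions made for this example, not code taken from any real harvester.

```javascript
// Hypothetical harvester logic; all names here are illustrative only.

// Turn numeric character references such as &#64; or &#x40; back into characters.
function decodeNumericEntities(source) {
  return source
    .replace(/&#x([0-9a-f]+);/gi, function (match, hex) {
      return String.fromCodePoint(parseInt(hex, 16));
    })
    .replace(/&#(\d+);/g, function (match, dec) {
      return String.fromCodePoint(parseInt(dec, 10));
    });
}

// Collect anything that looks like an e-mail address once entities are decoded.
function harvest(source) {
  var decoded = decodeNumericEntities(source);
  // Deliberately simplified address pattern (an assumption for this sketch).
  return decoded.match(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g) || [];
}

var page = '<a href="mailto:jill&#64;oasisoftranquility.com">Email Jill</a>';
console.log(harvest(page)); // logs an array containing jill@oasisoftranquility.com
```

Two short replace calls are enough to undo the substitution, which is why the entity alone should not be relied on.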

The first method is fairly straightforward and uses a script similar to the following:

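<script type="text/JavaScript">
    // Open the anchor tag; the href value is written in pieces below.
    document.write('<a href="');
    // Each fragment on its own is meaningless to a harvester.
    var t1 = "mai";
    var t2 = "lto";
    var t3 = ":";
    var t4 = "jill";
    var t5 = "&#64;";
    var t6 = "oasi";
    var t7 = "softra";
    var t8 = "nquil";
    var t9 = "ity";
    var t10 = ".";
    var t11 = "com";
    // Join the fragments with +, then write the address and close the tag.
    var text = t1+t2+t3+t4+t5+t6+t7+t8+t9+t10+t11;
    document.write(text);
    document.write('">Mail Jill</a>');
</script>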

The script breaks the e-mail portion into small chunks, assigns each chunk to a variable, concatenates the chunks into one variable, and then outputs the entire anchor tag. The key to this method is that the pieces of the e-mail address never appear together in the file. For additional security, the chunks could be defined in scrambled order, for example assigning chunk 6 before chunk 3, as long as they are still concatenated in the correct order.
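The scrambling idea can be sketched as follows. This is one possible arrangement, not the only one: the chunks array, the order array, and the assemble function are hypothetical names introduced for this example, and the link is attached with DOM methods instead of document.write.

```javascript
// Fragments stored out of order; names and layout are illustrative assumptions.
var chunks = ["oasi", "lto", "ity", "jill", "mai", ".", ":", "softra", "com", "@", "nquil"];

// order[i] is the index in chunks of the i-th piece of the final string.
var order = [4, 1, 6, 3, 9, 0, 7, 10, 2, 5, 8];

// Rebuild the string by walking the order array.
function assemble(chunks, order) {
  var text = "";
  for (var i = 0; i < order.length; i++) {
    text += chunks[order[i]];
  }
  return text;
}

var href = assemble(chunks, order);

// In a browser, attach the link; this assumes the script runs after <body> exists.
if (typeof document !== "undefined" && document.body) {
  var link = document.createElement("a");
  link.href = href;
  link.textContent = "Mail Jill";
  document.body.appendChild(link);
}
```

Because the href is assigned through the DOM rather than parsed as markup, the at sign is stored as a plain "@" chunk here instead of as an entity.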

The other method, encoding the address, is a little more complicated. It requires that you first run a program to encode the address and then use those results in your document. The encoding can be done in a variety of ways, one of which is shown in the following listing, an HTML document with form entry and JavaScript for the encoding:

<html>
<head>
    <title>Email Encoder</title>
    <script type="text/JavaScript">
    // Convert every character of the address into a numeric entity (&#NNN;).
    function encode(email) {
      var encoded = "";
      for (var i = 0; i < email.length; i++) {
        encoded += "&#" + email.charCodeAt(i) + ";";
      }
      return encoded;
    }
    </script>
</head>
<body>
<form action="" name="encoder"
    onsubmit="encoded.value = encode(email.value);
    return false;">
<table border="0" cellpadding="3px">
  <tr>
     <td>Enter your<br/>email address:</td>
     <td><input type="text" name="email" size="30" /></td>
     <td><input type="submit" value="Encode"/></td>
  </tr>
  <tr>
     <td>Encoded email:</td>
     <td colspan="2"><input type="text" name="encoded" /></td>
  </tr>
</table>
</form>
</body>
</html>

This document displays a form where you can enter your e-mail address. When you click the Encode button, the e-mail address you entered is converted, character by character, into entity equivalents and placed in the Encoded email field, where you can copy it to the clipboard for use in your documents. Note that you can encode only the e-mail address or, optionally, the mailto: protocol string along with it. Do not encode the anchor tag's own markup, because encoded angle brackets are displayed as text rather than parsed as a tag. Just be sure to replace the same amount of text in your document as you encoded.
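As a rough illustration of how the encoded output might be used, the sketch below repeats the listing's encode function and adds a hypothetical buildAnchor helper (not part of the original listing) that encodes the mailto: prefix together with the address.

```javascript
// Same conversion as the encode() function in the listing above.
function encode(email) {
  var encoded = "";
  for (var i = 0; i < email.length; i++) {
    encoded += "&#" + email.charCodeAt(i) + ";";
  }
  return encoded;
}

// Hypothetical helper: builds anchor markup whose href value is fully encoded.
// Only the attribute value is encoded; the tag itself stays as ordinary markup.
function buildAnchor(address, label) {
  return '<a href="' + encode("mailto:" + address) + '">' + label + "</a>";
}

console.log(encode("a@b.c")); // &#97;&#64;&#98;&#46;&#99;
console.log(buildAnchor("jill@oasisoftranquility.com", "Mail Jill"));
```

The resulting markup can be pasted into a page as is. The browser decodes the entities in the href value, so the link behaves like an ordinary mailto link.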
