Anti-spamming Techniques for Combating Unwanted Spam

April 30, 1998

1. Introduction

In the 21st century an e-mail address is, for most people, something “sacred”, like a cell phone number. It is a reliable means by which people can contact each other. However, the pervasiveness of spam (also known as junk e-mail and unsolicited commercial e-mail) has eroded Internet users’ trust in the Internet. Worldwide statistics show that spam constitutes over 40% of all e-mail traffic. Many of these unwanted e-mail messages promote get-rich-quick schemes, advertise Viagra or pornography, or even disseminate viruses.

Spammers use various techniques to locate e-mail addresses on the Internet. This contribution takes a look at these techniques and provides certain guidelines that Internet users can use to reduce the risk of spammers finding their e-mail addresses and/or to ensure that they do not receive further spam from existing spammers.

2. E-mail Harvesting

2.1 The technique: E-mail collecting programs.

The first technique spammers employ to locate existing e-mail addresses is called “e-mail harvesting”. Spammers install computer programs (called “spam ware”), which can be downloaded from the Internet, that search the Internet for existing e-mail addresses. These programs employ “search bots” (also known as “web bots”) that navigate the Internet and automatically retrieve e-mail addresses from public “forums” such as web pages, chat forums, mailing lists, online directories and newsgroups. These e-mail addresses are then collected for use by the spammer.

Each web page on the Internet has a “source code”. The source code contains the content of the web page and instructs the web browser (e.g. Microsoft Internet Explorer) how to display the said content. The source code of any web page can be viewed by clicking on “View” (accessible on the drop-down list on Internet Explorer) and then on “Source”.

At present, search bots scan the source code of web pages for normal text e-mail addresses. An example of a normal text e-mail address is “john@hofmeyr.com”.
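To illustrate why a normal text e-mail address is so easy to collect, the following is a minimal sketch in Python of what such a scan might look like. The pattern and the function name “find_plain_addresses” are assumptions made for illustration only; they are not taken from any actual spam ware, and the pattern is deliberately simplified.

```python
import re

# Simplified pattern for normal text addresses; real address syntax is broader.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_plain_addresses(page_source: str) -> list[str]:
    """Return the distinct normal text addresses in a page's source code,
    in order of first appearance."""
    found: list[str] = []
    for address in EMAIL_PATTERN.findall(page_source):
        if address not in found:
            found.append(address)
    return found


# Hypothetical fragment of a web page's source code.
sample_source = '<p>Contact <a href="mailto:john@hofmeyr.com">john@hofmeyr.com</a></p>'
print(find_plain_addresses(sample_source))
```

A few lines of code are therefore enough to collect every address that appears in normal text, which is why the prevention techniques below concentrate on how the address is stored in the source code.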

2.2 The prevention

How does one prevent an e-mail address from being discovered by these search bots?

Firstly, never give your e-mail address, or allow your employees to give their e-mail addresses, to other individuals by means of public forums such as chat forums and newsgroups. An appropriate e-mail policy can regulate this in the workplace.

Secondly, an entrepreneur should be careful in the way in which he displays his e-mail address on his web site. There are a few alternatives that an entrepreneur can use to prevent spammers from finding his e-mail address:

(A) The “Munging” technique.

Some web site operators add a few letters or spaces to their e-mail addresses to confuse search bots. Examples are “john@hofmeyr.REMOVE-THIS.com”, “John @ hofmeyr.com” (note the spaces before and after the “@”), or “john at hofmeyr dot com”. This is called “Munging”. The web page on which this e-mail address is listed will then indicate somewhere that readers should either remove the word “REMOVE-THIS” from the e-mail address, ignore the said spaces, or replace the words “at” and “dot” with the appropriate symbols.

Note, however, that this technique is not fool-proof. IT experts indicate that spamming programs exist that automatically remove words such as “REMOVE-THIS” and “NO-SPAM” from e-mail addresses and strip out the spaces in an e-mail address. Furthermore, the above addresses are still vulnerable where a spammer employs another individual to search for e-mail addresses. Thirdly, aesthetically speaking, it looks terrible.
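How little effort this takes can be shown with a short sketch in Python. It is an illustration under stated assumptions only: the list of filler words and the function name “demunge” are hypothetical, and the sketch handles just the two munging styles mentioned above (filler words and spaces around the “@”).

```python
import re

# Filler words named in the text; an actual program may know many more.
FILLER_WORDS = ("REMOVE-THIS", "NO-SPAM")


def demunge(text: str) -> str:
    """Undo two simple munging styles: filler words and spaces around '@'."""
    for word in FILLER_WORDS:
        # Remove the filler word together with the dot that joins it to the address.
        text = text.replace("." + word, "").replace(word + ".", "")
    # Remove any spaces directly before or after the '@' sign.
    return re.sub(r"\s*@\s*", "@", text)


print(demunge("john@hofmeyr.REMOVE-THIS.com"))
print(demunge("John @ hofmeyr.com"))
```

Both munged examples are reduced to an ordinary address by a handful of lines, which supports the experts’ caution.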

(B) Obscuring e-mail addresses in the source code

IT experts mostly suggest that an e-mail address should not be stored in normal plain text in a web page’s source code, for then a search bot will most certainly find it. As explained above, normal text simply means that the e-mail address appears as “john@hofmeyr.com” in the source code. This is also the way it will appear on the web page.

IT experts recommend that the characters of an e-mail address stored in the source code should be replaced with “hexadecimal encoding”, in which each character is written as a numeric code rather than as the character itself. This can be explained as follows. In hexadecimal encoding the e-mail address “john@hofmeyr.com” is expressed as follows in the web page’s source code: “&#x6A;&#x6F;&#x68;&#x6E;&#x40;&#x68;&#x6F;&#x66;&#x6D;&#x65;&#x79;&#x72;&#x2E;&#x63;&#x6F;&#x6D;”

For the layman’s curiosity, “&#x6A;” at the beginning of the above code simply translates to “j”, and “&#x40;” translates to “@”. When the web browser reads this, it displays the e-mail address as “john@hofmeyr.com” on the web page. This is known as a human-readable e-mail: the e-mail address is obscured in the source code while it remains readable to the human eye when displayed by the web browser.

Free programs exist on the Internet that can obscure e-mail addresses. One such program can be found at www.wbwip.com/wbw/emailencoder.html. One simply keys in the e-mail address and the program automatically changes it to the “obscured e-mail address”, which can then simply be cut and pasted into the web page’s source code.

Most IT experts are of the opinion that these “obscured” e-mail addresses are unrecognisable to present search bots and will therefore not be collected by them. However, one should bear in mind that, whilst this may be true for now, spammers may some day create search bots that can translate these obscured e-mail addresses into normal text e-mail addresses. Secondly, these e-mail addresses are also vulnerable where a spammer employs another individual to physically search the Internet for e-mail addresses.
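The following minimal sketch in Python shows both sides of this point. It assumes that the obscured form consists of HTML numeric character references written in hexadecimal (for example, the code for “j” is 6A); the encoder mentioned above may use a different notation. The function names are hypothetical.

```python
import html


def encode_address(address: str) -> str:
    """Write every character as a hexadecimal numeric character reference."""
    return "".join(f"&#x{ord(character):X};" for character in address)


def decode_address(encoded: str) -> str:
    """Turn character references back into normal text, as a web browser does."""
    return html.unescape(encoded)


obscured = encode_address("john@hofmeyr.com")
print(obscured)                  # contains no normal text "@" sign
print(decode_address(obscured))  # the original address again
```

The decoding step relies on a single standard library call, which suggests that the warning above is realistic: a search bot written to decode such references before scanning would recover the address without difficulty.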

Another, more laborious means of “obscuring” an e-mail address (i.e. ensuring that search bots do not recognise it) is to display the e-mail address in the form of a graphic image (think of a logo). The e-mail address then does not appear in normal text in the web page’s source code and will not be recognised by search bots. Note, once more, that these e-mail addresses are vulnerable to spammers employing other people to surf the web for valid e-mail addresses.

(C) Creating “online contact forms”

Most IT experts are of the opinion that the only fool-proof technique to prevent search bots from finding e-mail addresses is not to display e-mail addresses on a web site at all.

To ensure that Internet users can still contact you by means of e-mail, you create an online “contact form”. The web page might, for example, read that John Doe is the manager of ABC Company and that, if you wish to contact him, you should click on his name or on a link provided for this purpose. When the Internet user clicks on the link, an online contact form appears, where a message can be typed. The Internet user, after typing the message, then clicks on “send”. This message is received by John Doe as an e-mail.

Therefore, John’s e-mail address is never displayed to the Internet user nor on the web page. The Internet user will for the first time see John’s e-mail address when John replies to the said message. This reply will be in the form of a normal e-mail.
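What happens behind such a form can be sketched as follows in Python. This is an illustration only: the lookup table, the addresses (which use reserved example domains), and the function name “build_contact_message” are all hypothetical, and the sketch only builds the message; actually sending it would depend on the web site’s own mail set-up.

```python
from email.message import EmailMessage

# Hypothetical server-side table. These addresses are kept on the server
# and never appear in the source code of any web page.
RECIPIENTS = {"john-doe": "john.doe@abc-company.example"}


def build_contact_message(recipient_key: str, visitor_address: str, text: str) -> EmailMessage:
    """Build the e-mail that the contact form delivers to the chosen person."""
    message = EmailMessage()
    message["To"] = RECIPIENTS[recipient_key]            # looked up on the server only
    message["From"] = "webform@abc-company.example"      # hypothetical form address
    message["Reply-To"] = visitor_address                 # lets John reply to the visitor
    message["Subject"] = "Message from the web site contact form"
    message.set_content(text)
    return message


message = build_contact_message("john-doe", "visitor@example.org", "Please call me back.")
print(message["To"])
```

The web page itself would refer only to the key “john-doe”, so a search bot scanning the page’s source code finds no e-mail address to collect.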

3. Dictionary Attacks

3.1 The technique

The term “dictionary attacks”, also known as “mail server attacks” and “brute force attacks”, refers to instances when a spammer employs a computer program to rapidly generate possible e-mail addresses by combining names, letters, or numbers into numerous permutations, in an attempt to discover valid e-mail addresses. Spam is then sent to these “possible” e-mail addresses in order to locate valid and active e-mail addresses.

For example, if a dictionary attack is launched on Hofmeyr’s e-mail server, it will begin with “a@hofmeyr.com” and then “aa@hofmeyr.com” and so on. The computer program employed may also start with general terms or names such as “info@hofmeyr.com”, “webmaster@hofmeyr.com”, “sales@hofmeyr.com” or “james@hofmeyr.com”.

Although this is a relatively inaccurate technique, in that it “creates” thousands of non-working e-mail addresses that will simply be rejected by the recipient’s e-mail server as “non-existing”, short e-mail addresses such as “john@hofmeyr.com” are vulnerable to these attacks. According to IT experts, dictionary attacks have, for example, been conducted against the mail servers of Hotmail.com at a rate of 3 to 4 tries per second, 24 hours per day, continuously for 5 months. Therefore, some valid and existing e-mail addresses will be discovered by dictionary attacks and then be recorded as active e-mail addresses.
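The arithmetic behind the vulnerability of short addresses can be made concrete with a small calculation in Python. It rests on two illustrative assumptions that are not in the reports quoted above: that names consist only of the 26 lowercase letters, and that the rate is 3.5 tries per second (the midpoint of the quoted range).

```python
SECONDS_PER_DAY = 24 * 60 * 60


def count_names(max_length: int, alphabet_size: int = 26) -> int:
    """Number of possible names of 1 up to max_length letters."""
    return sum(alphabet_size ** length for length in range(1, max_length + 1))


def days_to_try(number_of_names: int, tries_per_second: float = 3.5) -> float:
    """Days needed to try every name once at the given rate."""
    return number_of_names / tries_per_second / SECONDS_PER_DAY


short_names = count_names(4)
print(short_names)
print(round(days_to_try(short_names), 2))
```

Under these assumptions there are 475,254 possible names of one to four letters, and all of them can be tried in well under two days. An attack sustained for months therefore has ample time to cover every short name on a mail server.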

3.2 Prevention

IT experts are of the opinion that, with the appropriate anti-spam software installed, a web site operator will be in a position to recognise these attacks and, once it is confirmed that a dictionary attack is in progress, to block all e-mail received from the specific e-mail address responsible.
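One way in which such software might recognise an attack is by counting how many messages from the same sender are addressed to non-existing recipients. The sketch below, in Python, is a simplified illustration and not a description of any particular product; the log format, the threshold of 20, and the function name “senders_to_block” are assumptions.

```python
from collections import Counter

# Hypothetical limit: this many rejected recipients marks a sender as an attacker.
REJECTION_THRESHOLD = 20


def senders_to_block(delivery_log, threshold: int = REJECTION_THRESHOLD) -> set:
    """Return the senders with at least `threshold` messages to non-existing addresses.

    delivery_log is an iterable of (sender, recipient_exists) pairs.
    """
    rejected = Counter(
        sender for sender, recipient_exists in delivery_log if not recipient_exists
    )
    return {sender for sender, count in rejected.items() if count >= threshold}


# Hypothetical log: one sender guessing addresses, one ordinary correspondent.
log = [("bulk@example.net", False)] * 25 + [
    ("friend@example.org", True),
    ("friend@example.org", False),
]
print(senders_to_block(log))
```

An ordinary correspondent who mistypes an address once stays far below the threshold, whereas a sender working through thousands of guesses exceeds it quickly.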

4. E-mail Brokers

It is generally accepted that spammers who acquired e-mail addresses by means of the above harvesting techniques normally sell these e-mail addresses to other spammers. These sellers are called “e-mail brokers”. Normally a plethora of e-mail addresses are recorded on a CD and then sold.

The result is that once you are on one spammer’s e-mail list, the chances are very good that your e-mail address will find its way to other spammers’ e-mail lists.

5. Further Steps that can be Implemented

5.1 Anti-spam software

Most Internet service providers have e-mail filtering programs that reduce, to some extent, spam.

However, to further minimise one’s exposure, one must install anti-spam software and ensure that its settings are correct. I will use the functions of Norton’s Anti-virus to explain how to curb spam. Norton’s Anti-virus integrates with Microsoft Outlook Express, which makes additional functions available in the latter program.

Once you receive spam from, for example, shelly@aol.com, you can configure Outlook Express, if you use this e-mail program, to block e-mails from that particular e-mail address. This is done by clicking on “Tools”, then “Message Rules”, then “Blocked Senders List”, and then by adding the unwanted e-mail address.

One can also change Outlook Express’ sorting rules to automatically block e-mails containing certain pre-determined words, contained either in the subject line or in the text of the e-mail message. This is done by firstly clicking on “Tools”, “Message Rules”, “Mail”, then by clicking on the appropriate option such as “Where the subject line contains certain words”, then by clicking on “contains certain words” and then adding words such as “sex”, “viagra”, etc.

Note that this is not fool-proof, in that most spammers intentionally misspell words in their subject lines in order to thwart such word filters. For example, the word “Viagra” can be misspelled as “Vi@gra”, “V!agra”, “V1agra” or “V*I*A*G*R*A”. Whether the anti-virus program will pick up and block these e-mails will depend on the type of anti-virus program used.
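A filter can counter some of these misspellings by “normalising” the subject line before comparing it with the list of blocked words. The following Python sketch illustrates the idea under stated assumptions: the substitution table, the list of blocked words, and the function names are hypothetical, and no claim is made that any particular anti-virus program works this way.

```python
import re

# Hypothetical table of look-alike characters and the letters they stand for.
LOOK_ALIKES = str.maketrans({"@": "a", "!": "i", "1": "i", "0": "o", "$": "s"})
BLOCKED_WORDS = {"viagra", "sex"}


def normalise(subject: str) -> str:
    """Lower-case the subject, map look-alike characters to letters,
    and drop separators such as '*'."""
    lowered = subject.lower().translate(LOOK_ALIKES)
    return re.sub(r"[^a-z\s]", "", lowered)


def is_blocked(subject: str) -> bool:
    """True if any word of the normalised subject is on the blocked list."""
    return any(word in BLOCKED_WORDS for word in normalise(subject).split())


for subject in ("Vi@gra", "V!agra", "V1agra", "V*I*A*G*R*A", "Meeting at 10"):
    print(subject, is_blocked(subject))
```

This catches the four misspellings given above, but it remains a partial measure: a spammer who separates the letters with spaces, for instance, would still slip past this particular sketch.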

5.2 Do not forward e-mails

Be careful of forwarding e-mails to other people. Internet users love to forward jokes and funny pictures to other Internet users, and especially e-mail stating that if you forward this e-mail to other Internet users, then either you stand to win something or you will save someone’s life by warning them of some lurking danger.

At the top of the e-mail, and normally in the text body of the e-mail, you can see the e-mail addresses of all the other people to whom this particular e-mail was forwarded or had been forwarded. This e-mail may fall into the wrong hands, such as those of a spammer. Secondly, it is not inconceivable that a spammer may one day be in a position to insert malicious programs in these e-mails, sending all e-mail addresses displayed in the e-mail to the creators of the said e-mail.

In a workplace scenario, this can be addressed by an appropriate e-mail policy. Outside the workplace, you may consider politely requesting your friends to cease forwarding these types of e-mails to you.

5.3 Exercise discretion in downloading programs from the Internet

One should also be careful of what one downloads from the Internet. Programs professing to speed up Internet connections may contain Trojan programs (called “spyware”) collecting e-mail addresses from one’s Microsoft Outlook. There are programs available on the Internet to counter this.

5.4 Do not open obvious spam messages

Opening these messages can indicate to the spammer that the e-mail address is active. You can instruct your Outlook Express to, for example, never send a “read” message to the sender without your permission. This is done by clicking on “Tools”, “Options”, “Receipts” and then by clicking on the appropriate option.

5.5 Never respond to spam messages

Many spammers include a return e-mail address in their spam, stating that if one does not want to receive further messages from them, one should click on the return e-mail address or simply reply to the said message stating that one does not wish to receive further messages. This is the easiest manner in which a spammer will know that an e-mail address is active. Once this is discovered, one can be sure to receive more spam.

Note, however, that not all unsubscribe messages are malicious. If you previously subscribed to a particular “legitimate” web site, you can normally accept that by following the unsubscribe options, your e-mail address will be removed from their mailing list.

5.6 Be careful when completing online forms

Many web sites require Internet users to fill in online forms, for whatever purpose. Think for example of “Joke-for-the-day” web sites. Prior to filling in any online form, one should carefully read the web site’s terms of use and the privacy statements. If the web site is silent on this aspect, the chances are very good that they may sell your e-mail address to third parties.

Furthermore, if it is an unknown web site, there is no guarantee that your e-mail address will remain confidential and not be revealed or sold to third parties, even if the web site contains an e-mail privacy policy.

Likewise, if a web site requests you to subscribe your friends to its free service, do not provide the web site with your friends’ e-mail addresses.

One should also bear in mind that even if these web site operators, such as those managing Joke-for-the-day web sites, do not sell the e-mail addresses obtained, spammers can use programs to hack into these systems and extract the e-mail addresses.

5.7 Do not easily surrender your e-mail address

Do not easily provide your e-mail address to all. An example is warranty cards. As one commentator put it: Do you really want the toaster company to send you e-mail about all its forthcoming toasters?

5.8 Use in appropriate circumstances a second “disposable e-mail address”

If you really have to provide an e-mail address and you do not entirely trust the web site, open a free e-mail account (such as Hotmail.com) and use that e-mail address for communicating with the web site. Should this e-mail address subsequently receive spam, you can simply drop the e-mail address. There are no cost implications.

Another option is to use free e-mail forwarding services. Examples are SpamEx, Spam Motel, and Despammed. These services work as follows. You create an e-mail account with these entities. All the e-mail received by this e-mail address is forwarded to your “real” e-mail address, i.e. the one you use most. Should this forwarding e-mail address receive spam, you simply disable the forwarding mechanism.
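The forwarding mechanism can be pictured with the following toy model in Python. It is a sketch of the general idea only and does not describe how SpamEx, Spam Motel, or Despammed are actually built; the class name, method names, and addresses are hypothetical.

```python
from typing import Optional


class ForwardingService:
    """Toy model: each forwarding address points to a real address
    and can be switched off."""

    def __init__(self) -> None:
        self._targets: dict[str, str] = {}
        self._enabled: dict[str, bool] = {}

    def create_alias(self, alias: str, real_address: str) -> None:
        """Register a forwarding address for a real address."""
        self._targets[alias] = real_address
        self._enabled[alias] = True

    def disable(self, alias: str) -> None:
        """Stop forwarding, for example once the alias starts receiving spam."""
        self._enabled[alias] = False

    def route(self, alias: str) -> Optional[str]:
        """Return the real address for incoming mail, or None if it is dropped."""
        if self._enabled.get(alias):
            return self._targets[alias]
        return None


service = ForwardingService()
service.create_alias("shop-alias@forwarder.example", "me@example.org")
print(service.route("shop-alias@forwarder.example"))
service.disable("shop-alias@forwarder.example")
print(service.route("shop-alias@forwarder.example"))
```

The point of the design is that the web site only ever learns the forwarding address, so switching it off costs nothing and leaves the “real” address untouched.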

5.9 Complaining to the spammer’s ISP

Often, one can glean which e-mail address a spammer is employing. For example, unsolicited e-mail may be sent from “alice@aol.com”. From the second level domain name (i.e. “aol”) it can be gleaned that a user of AOL (America Online) is sending the messages. Recipients can ask their system administrators to e-mail AOL’s system administrator(s) and complain that their e-mail system is being used for spamming. In some instances it is easy to find the correct person to complain to: the complaining e-mail can be sent to “abuse@” or “webmaster@”. In other instances it is an onerous task to find the correct person to whom one can complain.
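Deriving the likely complaint addresses from a sender’s address is a mechanical step, sketched below in Python. The function name is hypothetical, and the sketch assumes that the sender’s address shown in the e-mail is genuine and that the provider actually maintains the “abuse@” or “webmaster@” mailboxes; neither is guaranteed.

```python
def complaint_addresses(sender: str) -> list[str]:
    """Build the usual complaint addresses for the domain of a sender's address."""
    # Everything after the last '@' is the domain, e.g. "aol.com".
    domain = sender.rsplit("@", 1)[1].lower()
    return [f"abuse@{domain}", f"webmaster@{domain}"]


print(complaint_addresses("alice@aol.com"))
```

For the example above this yields “abuse@aol.com” and “webmaster@aol.com” as the addresses to try first.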

Bearing in mind that some ISPs receive thousands of complaints per day, one should not expect a quick response. However, it is worth a shot.

5.10 Complain to your own ISP

You can also indicate to your own ISP that spam is being received from a specific e-mail address. Indicating these e-mail addresses to them will assist them in identifying spammers and empower them to block spam received from certain e-mail addresses.

6. Conclusion

E-mail addresses are constantly at risk of being discovered, and persistent spammers will go to great lengths to find valid and active e-mail addresses. None of the above techniques guarantees a spam-free e-mail address, but together they will reduce spam and help to manage the risk of discovery.

Dr Gerrie Ebersöhn is a Member of the IT LAW UNIT at Hofmeyr Herbstein & Gihwala Inc (Sandton), South Africa.