E-mail Tracing

August 31, 2004

Winter 1971, Cambridge, Massachusetts: Computer engineer Ray Tomlinson was about to revolutionise the way people communicate all across the world. With his simple “Send Message” program, Tomlinson sent the very first e-mail. Initially running only on a local network, SNDMSG (as it was known) was subsequently developed further to make it possible to send messages across the recently developed precursor to the Internet, the Arpanet. The ubiquitous “@” symbol came to Tomlinson after “30 to 40 seconds of thought”, as it is not a symbol that occurs naturally in a person’s name, nor is it a digit. The usage of e-mail has grown to the point where in October 2003 it was suggested that 31 billion e-mails are now being sent on a daily basis and, for many people, their personal and business lives would now be unthinkable without it.

With the exception of unwanted advertising, so-called “spam”, the vast majority of the e-mail messages that criss-cross the Internet are perfectly legitimate; however, a small number have a more nefarious purpose – to misdirect and to deceive. There are many ways in which e-mail may be unlawfully used, a few examples being: to threaten anonymously; as a form of correspondence during criminal activity; to propagate viruses; or as part of a series of fraudulent transactions. Because the individuals behind such messages need to be identified, the ability to trace the origins of such e-mails takes on great importance.

The Basics

To understand the processes involved in tracing e-mail messages, one must first understand the basic mechanics of how data is transferred across the Internet.

When a computer connects to the Internet it must have a unique, identifying address known as an IP address (a “dotted” decimal number, for example 216.27.45.134), which identifies that specific computer for as long as it is connected. The computers that make up the Internet rely on this address to know where to deliver the information that a computer requests. For example, when a user types the address of a Web site into his Internet browser, his computer sends a request across the Internet to the computer that stores the data for that Web site. This computer then sends the data (i.e. the requested Web page) back to the computer that made the request. Without an IP address there would be no way of uniquely identifying the computer that made the request, and the computer hosting the Web site would therefore not know where to send the required data.

Typically, a computer used by either a home user or small business will not be directly connected to the Internet but will connect via an Internet Service Provider which has a pool of IP addresses available for their customers. An IP address will typically be dynamically allocated to an individual computer for the duration of a specific Internet session and will not change for the duration of that session. When the computer disconnects from the Internet, the IP address that has been allocated to that computer will be freed up to be re-allocated by the ISP.

Most corporate users will own, or lease, one or more IP addresses that, unlike those typically used by home users, do not change over time; these are known as static IP addresses. A company will usually have far fewer unique IP addresses than it has users connected to the Internet at any one time. To resolve this disparity, a company will typically use Network Address Translation (NAT), which enables multiple computers on a network to share the same IP address while each computer can still uniquely request and receive information from the Internet. A computer on the corporate network performs a “translation” between the internal, private address that each computer on the network is allocated and the external, public IP address through which data on the Internet is accessed.
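The translation step can be illustrated with a short sketch. It is a simplified model rather than a description of any particular product: it assumes the common port-based form of Network Address Translation, the class name NatTable and the internal 192.168.0.x addresses are hypothetical, and the public address is simply the example address used earlier in this article.

```python
import ipaddress


class NatTable:
    """Toy port-based NAT: many private hosts share one public address."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound = {}  # (private_ip, private_port) -> public_port
        self.inbound = {}   # public_port -> (private_ip, private_port)

    def translate_out(self, private_ip, private_port):
        """Map an internal source to the shared public address and a port."""
        if not ipaddress.ip_address(private_ip).is_private:
            raise ValueError(f"{private_ip} is not a private address")
        key = (private_ip, private_port)
        if key not in self.outbound:
            # First request from this host and port: reserve a public port.
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.outbound[key]

    def translate_in(self, public_port):
        """Find which internal host a reply on public_port belongs to."""
        return self.inbound[public_port]


nat = NatTable("216.27.45.134")               # example public address
a = nat.translate_out("192.168.0.10", 51000)  # hypothetical internal PCs
b = nat.translate_out("192.168.0.11", 51000)
print(a, b)                    # same public IP, different public ports
print(nat.translate_in(a[1]))  # reply is routed back to 192.168.0.10
```

From the outside, both internal computers appear to use the same public IP address; only the translating device's own table can tell them apart.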

E-mail transmission

The way an individual e-mail message is sent and received across the Internet can be represented diagrammatically. The diagram in Figure 1 represents the individual steps that an e-mail passes through as it is sent from an individual home user to a corporate user:

As shown in the diagram, when the author sends the e-mail, the message is sent from the author’s computer to the e-mail server at their ISP. The ISP’s server then examines the recipient’s e-mail address contained within the message to determine the e-mail server to which the message ultimately needs to be delivered. Once this has been determined, the server can identify the next computer on the Internet to which the message should be passed on its way to that destination. The message is then sent, typically via a number of different computers on the Internet, to the ISP of the recipient’s company. The ISP then passes the e-mail to the company e-mail server, which identifies the recipient of the e-mail message and stores it within the recipient’s e-mail account. When the recipient accesses their e-mail account, the message is finally delivered to their computer.

Each of the computers on the Internet through which the e-mail message passes on its journey from the author’s computer to the recipient’s computer will typically add additional information into the e-mail, recording the details of the computer and the date and time that the message passed through. This information, in addition to other information added by the author’s computer when the message was sent, and the recipient’s computer when it was received, is stored within the header of the e-mail message. The message header is typically hidden from view by most e-mail applications but, if required, can be accessed. It is this information contained within the e-mail header that is required to enable the author of the e-mail message to be traced.
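How this trail builds up can be sketched in a few lines. The example below is a simplified illustration, not real mail server behaviour: the function relay, the host names, and the message itself are all hypothetical, and the "Received" line is reduced to its bare essentials. It does show the key point, which is that each server adds its line above the existing header, so the earliest hop ends up at the bottom.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import format_datetime


def relay(raw_message, from_host, by_host, when):
    """Return the message as it looks after passing through one server.

    A simplified Received line is prepended above the existing header.
    """
    stamp = f"Received: from {from_host} by {by_host}; {format_datetime(when)}\n"
    return stamp + raw_message


# Hypothetical message as it leaves the author's computer.
original = (
    "From: author@example.com\n"
    "To: recipient@example.org\n"
    "Subject: test\n"
    "\n"
    "Hello.\n"
)

# Hypothetical route: author's PC -> author's ISP -> company mail server.
hops = [
    ("author-pc", "mail.author-isp.example"),
    ("mail.author-isp.example", "mail.company.example"),
]

t0 = datetime(2004, 5, 8, 16, 28, 45, tzinfo=timezone(timedelta(hours=1)))
raw = original
for i, (src, dst) in enumerate(hops):
    raw = relay(raw, src, dst, t0 + timedelta(seconds=30 * i))

# Reading top to bottom gives the newest hop first and the oldest last.
received = message_from_string(raw).get_all("Received")
for line in received:
    print(line)
```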

E-mail headers

The following e-mail message header is taken from a message which has been sent from a home computer to a computer on a corporate network:

Return-path: <Craig@car15.com>

Received: from mail.lee-and-allen.com – 1

([184.123.18.240]) – 2

by lee-and-allen.com; Sat, 08 May 2004 16:36:21 +0100

Received: from hafnium.btinternet.com (unverified) by mail.lee-and-allen.com

(Content Technologies SMTPRS 4.2.5) with ESMTP id <T6971fd41a3c0a8012808f@mail-host.lee-and-allen.com> for <cearnshaw@lee-and-allen.com>;

Sat, 8 May 2004 17:06:28 +0100

Received: from [217.148.167.180] – 3 (helo=cge_home_pc) – 4

by hafnium.btinternet.com with smtp – 5 (Exim 3.22 #25)

id 1BMTkq-0007ZN-00

for cearnshaw@lee-and-allen.com; Sat, 08 May 2004 16:28:45 +0100

From: "Craig Earnshaw" <Craig@Car15.Com> – 6

To: <cearnshaw@lee-and-allen.com>

Subject: FW:

Date: Sat, 8 May 2004 16:31:29 +0100 – 7

Message-ID: <MMEAJJKILJMDNKOMFGLOEEKKDFAA.Craig@Car15.Com> – 8

MIME-Version: 1.0

Content-Type: text/plain; charset="iso-8859-1"

Content-Transfer-Encoding: 7bit

X-Priority: 3 (Normal)

X-MSMail-Priority: Normal

X-Mailer: Microsoft Outlook IMO, Build 9.0.6604 (9.0.2911.0) – 9

X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1409

Importance: Normal

The above header contains a large amount of information that assists in the tracing of the route by which the e-mail message has been transferred across the Internet as well as assisting with the identification of the author of the message, including:

1. The name of the e-mail server at the recipient’s ISP.

2. The IP address of the recipient’s company e-mail server.

3. The IP address allocated to the author’s PC when the e-mail was sent.

4. The name of the author’s PC.

5. The name of the e-mail server at the author’s ISP.

6. The e-mail address of the author.

7. The time, date, and offset from GMT at which the e-mail message was sent.

8. The unique ID number associated with the e-mail message.

9. The name of the e-mail application used to generate the message.
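Several of the items listed above can be pulled out of a header programmatically. The following sketch uses Python's standard email package on a shortened copy of the sample header, with the numbered annotations removed. The indentation of the continuation lines is an assumption (it marks them as belonging to the preceding field), and the regular expressions are tailored to this one sample; they are not a general-purpose parser for "Received" lines.

```python
import re
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Shortened copy of the sample header, annotations removed.
HEADER = """\
Return-path: <Craig@car15.com>
Received: from mail.lee-and-allen.com
 ([184.123.18.240])
 by lee-and-allen.com; Sat, 08 May 2004 16:36:21 +0100
Received: from hafnium.btinternet.com (unverified) by mail.lee-and-allen.com
 (Content Technologies SMTPRS 4.2.5) with ESMTP id <T6971fd41a3c0a8012808f@mail-host.lee-and-allen.com> for <cearnshaw@lee-and-allen.com>;
 Sat, 8 May 2004 17:06:28 +0100
Received: from [217.148.167.180] (helo=cge_home_pc)
 by hafnium.btinternet.com with smtp (Exim 3.22 #25)
 id 1BMTkq-0007ZN-00
 for cearnshaw@lee-and-allen.com; Sat, 08 May 2004 16:28:45 +0100
From: "Craig Earnshaw" <Craig@Car15.Com>
To: <cearnshaw@lee-and-allen.com>
Subject: FW:
Date: Sat, 8 May 2004 16:31:29 +0100
Message-ID: <MMEAJJKILJMDNKOMFGLOEEKKDFAA.Craig@Car15.Com>
X-Mailer: Microsoft Outlook IMO, Build 9.0.6604 (9.0.2911.0)

"""

msg = message_from_string(HEADER)

# The earliest hop is the last Received field; collapse its folded lines.
first_hop = " ".join(msg.get_all("Received")[-1].split())

ip = re.search(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]", first_hop)
helo = re.search(r"helo=([^\s)]+)", first_hop)
by = re.search(r"\bby\s+(\S+)", first_hop)

summary = {
    "author_ip": ip.group(1) if ip else None,           # item 3
    "author_pc": helo.group(1) if helo else None,       # item 4
    "author_isp_server": by.group(1) if by else None,   # item 5
    "author_address": parseaddr(msg["From"])[1],        # item 6
    "sent": parsedate_to_datetime(msg["Date"]),         # item 7
    "message_id": msg["Message-ID"],                    # item 8
    "mailer": msg["X-Mailer"],                          # item 9
}

for key, value in summary.items():
    print(f"{key}: {value}")
```

The fields set by the author's own computer (the "From", "Date" and "X-Mailer" lines, for instance) are only as reliable as that computer, whereas the "Received" lines are written by the servers that handled the message.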

Tracing the author

As the IP address of the author’s computer is contained within the header of the e-mail message (shown at 3 above), this information can be used to ascertain the ISP to which the author’s computer was connected at the time that the message was sent. The author’s IP address will typically be included within the e-mail header whether the message was sent via a traditional e-mail application, such as Microsoft Outlook, or by an Internet-based “webmail” service such as Hotmail. In addition, if the message is sent using an e-mail application such as Microsoft Outlook, the name of the author’s computer will also be contained within the e-mail header (after the “helo=” at 4 in the above sample e-mail header), which can be of considerable benefit when identifying the actual computer used to send the e-mail.

When the author’s ISP has been ascertained from the information contained within the e-mail header, the author’s ISP can then be contacted and provided with a number of elements from the e-mail header which can typically be correlated with the IP address allocation logs, and additional logged data, to identify the account that was used to send the message. The account registration details kept by the ISP would typically contain information such as the user’s name, address, telephone number and payment details. In order for the ISP to disclose this information, a Norwich Pharmacal order is usually required to protect the ISP from claims that it is in breach of the confidentiality and privacy agreements that it has with its customers. It is vital that the ISP is contacted as soon as possible after the e-mail has been received as the IP address allocation logs, and other relevant logged data, are typically stored by the ISP only for a short period (around 14 days).
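The correlation that the ISP performs can be pictured as a simple lookup: which account held the IP address at the moment the message was received? The sketch below is illustrative only. The Lease record, the account names, and the log entries are entirely hypothetical (real ISP logs vary in format and content); only the IP address and the timestamp are taken from the sample header.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Lease:
    """One hypothetical entry in an ISP's IP address allocation log."""
    ip: str
    start: datetime
    end: datetime
    account: str


def account_for(leases, ip, when):
    """Return the account that held `ip` at `when`, or None if none did."""
    for lease in leases:
        if lease.ip == ip and lease.start <= when < lease.end:
            return lease.account
    return None


BST = timezone(timedelta(hours=1))  # the +0100 offset seen in the header

# Invented log: the same dynamic address is held by two accounts in turn.
leases = [
    Lease("217.148.167.180",
          datetime(2004, 5, 8, 14, 0, tzinfo=BST),
          datetime(2004, 5, 8, 15, 55, tzinfo=BST),
          "account-A"),
    Lease("217.148.167.180",
          datetime(2004, 5, 8, 16, 10, tzinfo=BST),
          datetime(2004, 5, 8, 18, 0, tzinfo=BST),
          "account-B"),
]

# Time at which the author's ISP recorded receiving the sample message.
received_at = datetime(2004, 5, 8, 16, 28, 45, tzinfo=BST)
print(account_for(leases, "217.148.167.180", received_at))
```

Because a dynamic address passes from one customer to another, the IP address alone is not enough; the timestamp, including its offset from GMT, is what ties the address to a single account.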

The account details that are provided by the ISP may not necessarily be indicative of the author of the e-mail in question, as they will record the contact details of only the individual to whom the account is registered. This information is, however, usually enough to enable any further action to be taken, for instance obtaining a subsequent order for the delivery up of the computers that are configured with the name of the computer included in the e-mail header that are located at the address at which the account is registered.

If it is identified that the author’s ISP is not UK-based, as long as the legal system in the jurisdiction in which they are based supports an order with the same effect as a Norwich Pharmacal order, then the same procedures can be followed.

There is a possibility that the IP address, or other information present in the e-mail message, may not be indicative of the computer from which the e-mail was originally authored. There are a number of software applications, online re-mailing and anonymising services, and other, more advanced, techniques that can be used to obfuscate an individual’s IP address, thereby potentially preventing it from being traced in this manner.

Similar tracing scenarios

Another important form of investigation (which works on a very similar basis to that of tracing the author of an e-mail message) is that of tracing the author of messages posted to online forums. Many types of online forum exist, discussing wide-ranging topics from stock prices to the events happening in a certain neighbourhood. There are occasions where people will post malicious, threatening, fraudulent, or confidential material into these online forums and parties such as the police, the administrators of the forums, or the company affected by the postings require that the identity of the individual who made the postings be ascertained. The methodology for identifying the individual behind a posting is the same as that for identifying the author of an e-mail message: a user, when posting to a message board, would typically have their public IP address recorded by the forum which can be used, in a similar manner to that described above, to ascertain the individual’s identity.

Craig Earnshaw is the head of Lee & Allen’s Forensic Computing Services group and is responsible for the group’s three offices in London, New York, and Hong Kong. He has acted as an expert witness in the UK and several international jurisdictions in a wide range of matters and has also acted as a court-appointed expert. Sandeep Jadav is a Forensic Computing Consultant based at the London offices of Lee & Allen’s Forensic Computing Services group.