eDiscovery Tools in a Forensic Investigation

November 29, 2010

E-mail is a primary communication medium in today’s organisations, allowing them to reach a broad range of recipients in a manner that is fast and auditable. However, e-mail is easily abused, whether by sending unsolicited messages or by concealing an identity in a sophisticated fraud scheme, and such abuse can damage an organisation. This article details one incident that Ernst & Young was called in to investigate.

Sarah, an employee of a global organisation, received anonymous slanderous e-mails suggesting she was having an inappropriate personal relationship with one of her team. These e-mails were also sent to Sarah’s clients with the intention of damaging the business relationship. Sarah suspected that the author was a disgruntled employee with grievances against her and the company, as the recipient list could only have been generated by someone with knowledge of business operations. The author sent the e-mails through a well-known third-party e-mail provider under a false identity. Concerned about the potential damage to her reputation and the company’s, Sarah notified senior management, who in turn contacted EY for assistance. Our client was wary of public disclosure, as it felt any leak of information could damage its brand, and therefore wanted the investigation completed within one week. The objective of our investigation was to determine whether electronic evidence could be found to support the allegations of wrongdoing, to uncover the facts and to identify the author of the e-mails.


IT forensic investigations are generally structured around collection, examination, analysis and reporting.[1] This structure aligns closely with the Electronic Discovery Reference Model (EDRM)[2] (Figure 1), which allows us to use eDiscovery techniques to complement a traditional investigation: the EDRM is a proven model for increasing the relevance of the data under review whilst decreasing its volume.


The identification stage of our investigation relied on identifying the suspects and the potential sources of data, and on analysing this information to direct the later stages of the investigation. Great care needs to be taken at the identification stage to balance the advantages of cost and time streamlining against the risk of excluding relevant data sources and custodians from the investigation.

We created two teams. One team gathered information regarding the ‘who knew what, when, and how’. To achieve this, the team interviewed the e-mail recipient to learn more about the business processes. A top-down approach of interviewing management was also used to narrow down the suspects to investigate. The second team gathered information from the client’s IT management to get a full breakdown of hardware, software and back-up policies. We created a topological diagram of the IT landscape in order to clearly illustrate the relationship between custodians, data, and physical hardware.

Of approximately 150 employees, eight were identified as priority suspects. Data was stored on network file shares, laptops, desktops, mobile phones, external hard drives, home shares, the e-mail server, back-up tapes, CCTV systems, and portable media such as CDs, DVDs and USB thumb drives. Our analysis of the business processes indicated that only laptops, mobile phones and the e-mail server were relevant to our investigation. However, we complemented this with CCTV analysis of the office area.

Collection & Preservation

Our EY collection team arrived on-site fully equipped with forensic acquisition kits [Figure 2:  not reproduced online] to start the collection and preservation. Each acquisition kit contains over forty pieces of hardware and includes items such as write blockers, dedicated hard drive copiers, anti-static mats, SATA drives, cameras and software for on-site processing. 

We generated bit-stream copies (images) of media using guidelines outlined by ACPO.[3] The ACPO guidelines specify how data should be captured in a forensically sound manner.  On this investigation, the custodians’ computers were imaged using a dedicated hard drive imaging device called a Dossier.[4] The Dossiers allowed us to image data from one or two suspect drives onto back-up and target drives simultaneously. This allowed us to quickly image suspect hard drives and create multiple copies of the suspect drives to speed the investigation process.
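A common way to demonstrate that a bit-stream copy is identical to its source is to compare cryptographic hashes of the two. The sketch below is illustrative only and is not the Dossier's own verification procedure: the function name `stream_digest`, the choice of SHA-256 and the in-memory stand-ins for drives are all our assumptions.

```python
import hashlib
import io


def stream_digest(stream, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a binary stream in chunks so that a large image need not fit in memory."""
    digest = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# In-memory stand-ins for a suspect drive, a faithful image and an altered image.
data = b"\x00" * 4096 + b"suspect data"
source_hash = stream_digest(io.BytesIO(data))
copy_hash = stream_digest(io.BytesIO(data))
altered_hash = stream_digest(io.BytesIO(data[:-1] + b"X"))

print("copy matches source:", source_hash == copy_hash)
print("altered image matches source:", source_hash == altered_hash)
```

In practice the streams would be opened read-only from the source device (behind a write blocker) and from the image file, and the hash values would be recorded alongside the evidence.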

The client had CCTV cameras installed at different locations around the building. The CCTV system was connected to a main computer system. We imaged the hard drives of the CCTV computer for analysis to determine who was in the office on the dates the e-mails were sent. Before commencing analysis, we read the clock of the CCTV system to verify that it displayed an accurate time and date, and compared it with the time offset on the suspects’ computers.
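The clock comparison matters because timestamps from different devices can only be placed on one timeline once each device's offset from a trusted reference is known. The following is a minimal sketch of that arithmetic; the function names and every date and time in it are hypothetical and are not taken from the case.

```python
from datetime import datetime


def clock_offset(device_reading, reference_reading):
    """Return how far the device clock is ahead of (positive) or behind the reference."""
    return device_reading - reference_reading


def to_reference_time(device_timestamp, offset):
    """Convert a timestamp recorded by the device into reference time."""
    return device_timestamp - offset


# Hypothetical readings: the CCTV clock shows 09:03:00 when the reference shows 09:00:00.
cctv_offset = clock_offset(datetime(2010, 11, 1, 9, 3, 0), datetime(2010, 11, 1, 9, 0, 0))

# A CCTV frame stamped 14:33:00 therefore corresponds to 14:30:00 reference time.
corrected = to_reference_time(datetime(2010, 11, 1, 14, 33, 0), cctv_offset)
print(cctv_offset, corrected)
```

Applying the same correction to each laptop allows CCTV frames and computer activity to be compared on a single timeline.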

The suspects’ company mobile phones were also seized for imaging. The phones were examined using a tool called Aceso, which is developed by Radio Tactics. Aceso allows handsets, SIM cards and media cards to be examined.[5] This tool allowed us to extract information from around the time the e-mails were sent, to see whether any text messages related to the incident had been sent.

We also collected data from a Microsoft Exchange e-mail server. The data was extracted using a tool called Exchange Mailbox Merge Wizard (EXmerge). The extracted data was locked down in a logical evidence file.[6]

A preservation exercise was carried out on the collected data by means of bar-coding each piece of evidence and adding this to an evidence database to maintain a legally defensible audit trail. All imaged data collected was copied onto separate target and backup drives and then sealed in tamper-proof evidence bags. The backup copies were put into a secure evidence safe. The target copies were used for analysis.  

The integrity of the evidence collected and preserved was maintained through proper handling in accordance with ACPO guidelines, which include requirements as to full chain of custody documentation. Chain of custody refers to the chronological documentation or paper trail, showing the seizure, custody, control, transfer, analysis, and disposition of evidence, whether physical or electronic.[7] 


Prior to processing the data, we ran Early Case Assessment (ECA) EnCase EnScripts (EnScripts) over the imaged data to get an early indication of the suspects’ behaviour, such as which applications were used and the broad categories and date ranges of the files that were created. We then applied a pre-processing EnScript over each image to extract only readily recoverable user-generated documents, ie container files, documents and archived e-mail. This process extracts user-generated files from the image based on their file signatures, but excludes files whose hashes match those in the National Software Reference Library (NSRL) list of application and system-generated files.[8] We also generated our own exclusion list by hashing files from the client’s corporate laptop load-set. This reduces the number of redundant files to be examined and thus increases the efficiency of the investigation.
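The two filters described above, selection by file signature and exclusion by known hash, can be sketched as follows. This is not the EnScript we ran: the signature table is a small illustrative subset, the function names are hypothetical, and the use of SHA-1 for the exclusion set is an assumption made for the example.

```python
import hashlib

# Leading bytes ("magic numbers") for a few user-generated file types (illustrative subset).
SIGNATURES = {
    b"%PDF": "pdf",
    b"PK\x03\x04": "zip container",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "OLE2 (legacy Office)",
    b"!BDN": "pst mail archive",
}


def identify(content):
    """Return the file type implied by the leading bytes, or None if unrecognised."""
    for magic, kind in SIGNATURES.items():
        if content.startswith(magic):
            return kind
    return None


def select_for_review(files, known_hashes):
    """Keep files with a recognised signature whose hash is not on the exclusion list."""
    selected = []
    for name, content in files.items():
        if identify(content) is None:
            continue  # not a user-generated type of interest
        if hashlib.sha1(content).hexdigest() in known_hashes:
            continue  # known application or load-set file
        selected.append(name)
    return selected


# Made-up file contents standing in for files found on an image.
files = {
    "report.pdf": b"%PDF-1.4 user report",
    "help.pdf": b"%PDF-1.4 vendor help file",
    "kernel.dll": b"MZ\x90\x00 binary",
}
known_hashes = {hashlib.sha1(files["help.pdf"]).hexdigest()}
print(select_for_review(files, known_hashes))
```

Identifying files by signature rather than by extension means that a renamed document is still selected, while the hash exclusion removes files that are byte-for-byte identical to known software.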

When investigating multiple suspects, the likelihood of examining duplicate files increases. To combat this, we employ a number of de-duplication techniques to reduce the number of duplicate documents that are processed and subsequently examined. In this investigation, we applied custodian-level de-duplication, which retains a single copy of each file within each suspect’s data and suppresses the duplicates. Another method is global de-duplication, whereby we retain only a single copy of each document across the entire case. Additionally, near de-duplication uses more sophisticated techniques to detect subtle changes between documents, so that only the latest version of a document is presented. De-duplication can cut down the total data volume by over 40%. Similarly, e-mail thread reconstruction ensures that only the latest copy of an e-mail conversation is ever seen. These eDiscovery techniques allow an investigator to quickly filter out irrelevant documents.
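The difference between custodian-level and global de-duplication comes down to the key under which a document counts as already seen. The sketch below assumes exact duplicates identified by a content hash; the function name, the `scope` parameter and the sample documents are hypothetical, and near de-duplication is not covered.

```python
import hashlib


def dedupe(documents, scope="custodian"):
    """Keep the first copy of each document.

    documents: iterable of (custodian, name, content) tuples.
    scope: "custodian" suppresses duplicates within one custodian's data only;
           "global" suppresses duplicates across the entire case.
    """
    seen = set()
    kept = []
    for custodian, name, content in documents:
        digest = hashlib.sha256(content).hexdigest()
        key = (custodian, digest) if scope == "custodian" else digest
        if key in seen:
            continue
        seen.add(key)
        kept.append((custodian, name))
    return kept


documents = [
    ("alice", "budget.doc", b"budget figures"),
    ("alice", "copy of budget.doc", b"budget figures"),
    ("bob", "budget.doc", b"budget figures"),
    ("bob", "notes.txt", b"meeting notes"),
]
print(dedupe(documents, scope="custodian"))
print(dedupe(documents, scope="global"))
```

Custodian-level de-duplication preserves the fact that both custodians held the budget document, which can matter when the question is who had access to what; global de-duplication gives the smallest review set.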

At this stage, we did not carry out a deep forensic analysis of the suspects’ media. We wanted to get an overview of all of the suspects’ data prior to carrying out any deep forensic investigation.

Analysis & Review

After extracting the user-generated documents and e-mails, the next stage was to load the custodians’ data into a review tool to allow the investigating team to interrogate the data and narrow down the suspect list. To achieve this, we loaded the data into a review tool called Attenex Patterns eDiscovery software.[9] This tool generates a visual concept map of the document collection [Figure 3][10] by grouping items according to the nouns and noun phrases in each e-mail or document, a technique known as clustering. A reviewer can dynamically re-cluster the collection around concepts or documents of interest to quickly focus on a particular aspect of the review or investigation. If similar information is contained in multiple documents, those documents are clustered together, making it very easy to drill down into e-mails quickly and make a review decision on individual documents or whole clusters (eg relevant, not relevant, hot). Attenex’s clustering has a distinct advantage over the traditional keyword searches that would be run in other review tools, or in IT forensic investigations. Whereas a keyword search for the word ‘party’ may return birthday party invitations in addition to e-mails discussing political parties, Attenex’s clustering would detect the different noun phrases and cluster documents about birthday parties separately from documents about political parties. In addition to this, the reviewable set of documents was reduced by using custodian and near de-duplication.

Attenex also has a search facility powered by dtSearch, with which keyword searches can be used to interrogate all of the data in a particular set. Data can also be searched by date range, file type and other document metadata.

The review team found one particular custodian with highly relevant e-mails. The e-mail exchanges contained references to the victim. We identified this custodian as the main suspect to investigate further. The evidence at this stage was not concrete; however, it did prompt us to carry out further analysis on the custodian’s hard drive, mobile phone and CCTV data. Attenex allowed us to decide very quickly which custodian was most relevant.

Every e-mail has a header containing information about routing, the origin, authorship and the recipients.[11] By analysing the header, we were able to determine the IP address from which the e-mail was sent. The header of an e-mail sent from an organisation will typically report the IP address of the organisation’s Internet-facing server, and not the address of individual internal computers. By checking the IP address against the list of IP addresses provided by our client, we were able to determine that the sender probably used a proxy in an attempt to conceal the source of the e-mail. We knew that the e-mail was sent from a web-based e-mail provider, where the user would be required to use an ID string (eg someuser@email.com). Typically, a user will use the same ID for many different Internet services, such as e-commerce websites, Internet forums and e-mail. We needed to answer several questions. Could we associate this custodian with the e-mail ID? Could we prove the use of proxies? Could we show that this custodian accessed his or her web mail?
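The header check can be illustrated with Python's standard `email` and `ipaddress` modules. This is a sketch under stated assumptions rather than the procedure we followed: the message, host names and client network below are invented (the addresses come from ranges reserved for documentation), and the helper names are hypothetical. Header fields can be forged, so a result of this kind is an indicator to corroborate and not proof.

```python
import email
import ipaddress
import re

# A made-up message; Received fields are prepended by each server, so the last one is the earliest hop.
RAW = """\
Received: from mx.example.org (mx.example.org [198.51.100.7])
\tby mail.client.example with ESMTP; Mon, 1 Nov 2010 09:15:02 +0000
Received: from webmail.example.com (unknown [203.0.113.45])
\tby mx.example.org with HTTP; Mon, 1 Nov 2010 09:15:00 +0000
From: someuser@email.com
To: recipient@client.example
Subject: test

body
"""

IP_RE = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def received_ips(raw_message):
    """Return the bracketed IPv4 addresses from the Received fields, latest hop first."""
    message = email.message_from_string(raw_message)
    ips = []
    for field in message.get_all("Received", []):
        for candidate in IP_RE.findall(field):
            try:
                ips.append(ipaddress.ip_address(candidate))
            except ValueError:
                pass  # looked like an address but is not a valid one
    return ips


def is_client_address(ip, client_networks):
    """True if the address falls inside any network supplied by the client."""
    return any(ip in network for network in client_networks)


CLIENT_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]  # hypothetical client range
hops = received_ips(RAW)
earliest_hop = hops[-1]
print(earliest_hop, is_client_address(earliest_hop, CLIENT_NETWORKS))
```

An earliest hop that falls outside the client's published ranges is consistent with the message having been relayed through an external proxy, which is the inference described above.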

Previously, in order to gain an overview, we had extracted only live user-generated documents. Once we had narrowed down our list of suspects to a single custodian, we carried out a data carve on the custodian’s laptop. A data carve is a technique used to extract data that is no longer referenced by the file system,[12] which is typically the case after a file has been deleted. We found text fragments indicating deleted Internet History records pointing to the webmail provider’s website. We were also able to find the ID used to access the e-mail provider referenced on several other websites. Finally, our CCTV footage corroborated our electronic evidence by placing the custodian at a computer at the time the e-mails were sent.
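Carving for text fragments amounts to scanning raw bytes for recognisable patterns without consulting the file system. The sketch below searches a byte buffer for ASCII URLs that mention a keyword; the function name, the pattern and the sample buffer are all hypothetical. Real history records may be stored in other encodings (for example UTF-16), which this simple pattern would miss.

```python
import re

# Runs of printable, non-space ASCII that start like a URL.
URL_RE = re.compile(rb"https?://[\x21-\x7e]{4,200}")


def carve_urls(raw, keyword):
    """Return (offset, url) pairs for URLs in the raw bytes that contain the keyword."""
    hits = []
    for match in URL_RE.finditer(raw):
        if keyword in match.group(0).lower():
            hits.append((match.start(), match.group(0).decode("ascii")))
    return hits


# A made-up stand-in for unallocated space: two URL fragments separated by binary noise.
raw = (
    b"\x00\x00junk"
    + b"Visited: user@http://webmail.example.com/inbox?id=someuser\x00"
    + b"\xff" * 16
    + b"http://intranet.example/home\x00"
)
hits = carve_urls(raw, b"webmail")
print(hits)
```

Recording the byte offset of each hit allows the fragment to be located again on the image and examined in context.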


Our investigation was a success. By combining eDiscovery tools and methodologies with traditional IT forensics, we were able to significantly reduce the volume of data facing the investigation. In normal IT forensic investigations, analysis can be time-consuming and expensive, particularly when faced with a large population of suspects. Our approach allowed us to target the key suspect more easily. We were therefore able to spend more time on a thorough investigation to answer the questions that would associate the suspect with the e-mail, and to provide the evidence to the client.


Andrew Pimlott is Fraud Investigation & Dispute Services Manager at Ernst & Young LLP specialising in E-discovery/E-disclosure tools

[1] National Institute of Justice (July 2001) Electronic Crime Scene Investigation A Guide for First Responders. http://www.ncjrs.org/pdffiles1/nij/187736.pdf.

[2] http://edrm.net/wiki/index.php/Main_Page

[3] http://www.forensicrecovery.co.uk/forensicRecovery-ACPO.php

[4] http://www.akl-it.com/en/our-products/duplication-systems/forensic-duplicators/228-forensic-dossier

[5] http://www.radio-tactics.com/products/aceso/

[6] http://www.digitalintelligence.com/software/guidancesoftware/encase/

[7] http://edrm.net/wiki/index.php/Processing_-_Audit_and_Chain_of_Custody & http://en.wikipedia.org/wiki/Chain_of_custody

[8] http://www.nsrl.nist.gov/

[9] http://www.ftitechnology.com/products/attenex_patterns.aspx

[10] http://www.ftitechnology.com/Portals/0/images/Attenex%20Screenshot.png

[11] http://whatismyipaddress.com/email-header

[12] Advanced Data Carving – a paper by S/A Daniel Dickerman, SCERS, GSEC of the IRS Criminal Investigation, Electronic Crimes Program 2006