Filtering Electronic Evidence

January 1, 2004

As the dependence on e-mail and personal computers in standard business practices increases, UK solicitors are beginning to realise that paper is no longer the sole source of evidence. Indeed, it is often no longer even the main one. Research in the US estimates that 2.8 billion e-mails are sent every day, with predictions that this will grow by 150% annually. In addition, nearly 90% of all corporate documents are created and stored electronically and 70% of these never migrate to paper. Consequently, it is essential that e-mail and other electronic documents be taken into account in the courts. While electronic disclosure has not formally surfaced in UK case law, several arguments and recent events, such as the Hutton Inquiry, highlight the value of considering electronic documents in legal actions.

In the US, however, electronic discovery (as the practice is known there) has taken off with full force, and lawyers routinely rely on electronic discovery technology to help them collect e-mail and documents contained on floppy disks, hard drives, servers, and back-up tapes. Our US colleagues are finding that this technology can greatly streamline the process of searching for and producing relevant documents, thus making litigation far more cost-effective and efficient.

So, how does electronic disclosure technology work? How are US lawyers integrating it in their cases? And can this technology apply in a similar way in the UK? Overall, the aim of electronic disclosure is to focus and limit the search from the outset. This is done by:

  • focusing on where to look – ie which users and date periods
  • limiting that data universe by conducting key word searches and other techniques.

This article analyses some of the most common electronic disclosure sampling and filtering techniques that can help you manage the use of e-evidence in your next case.

Determining what you need to search

It is clear that any e-disclosure exercise must meet a high proportionality hurdle within the CPR, ie the legal and other costs incurred must be in proportion to the size of the claim. Generally it would be unreasonable for a producing party to search all e-mails and documents from every hard drive, server, and back-up tape if the claim does not justify that effort. Instead, the best way to start an electronic evidence investigation is to limit the search by custodian and by time/date frame.

Custodian filtering – In the paper document disclosure world, solicitors begin by advising their clients to segregate the paper files of individuals who are relevant to the case. In this respect, electronic discovery is no different to paper discovery. Instead of finding documents located in paper files and filing cabinets, the relevant electronic documents are often located in key computer systems such as desktop and laptop computers or back-up tapes. Electronic evidence experts employ technology to segregate the key custodians who may be relevant to the case and isolate the files associated with those specific individuals for further handling.

Time and date filtering – Another piece of technology that can be applied in electronic discovery is the date and time filter. This allows solicitors to target discrete periods of time that are particularly relevant to a case or that must be produced in accordance with a pending court order. If back-up tapes are at issue in the case, practitioners may segregate tapes for particular time periods according to the recycling or back-up schedule of the client’s archival tape procedure. For example, tapes are often catalogued by year, month, week, or day. Solicitors may therefore request from their client’s IT department the tapes representing back-ups for the time periods of particular interest in a given case.
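The two filters described above can be pictured as a simple selection over document metadata. The following is a minimal sketch only, assuming each document already carries a custodian name and a creation date; the `Document` record, the `filter_by_custodian_and_date` function and the sample data are hypothetical illustrations, not any vendor's actual tooling.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    # Hypothetical metadata record; real collections carry far more fields.
    doc_id: str
    custodian: str
    created: date


def filter_by_custodian_and_date(documents, custodians, start, end):
    """Keep documents held by a key custodian and dated within [start, end]."""
    wanted = {name.lower() for name in custodians}
    return [
        doc for doc in documents
        if doc.custodian.lower() in wanted and start <= doc.created <= end
    ]


documents = [
    Document("D1", "alice", date(2003, 3, 14)),
    Document("D2", "bob", date(2003, 3, 20)),
    Document("D3", "alice", date(2001, 7, 2)),
    Document("D4", "carol", date(2003, 4, 1)),
]

selected = filter_by_custodian_and_date(
    documents,
    custodians=["Alice", "Carol"],
    start=date(2003, 1, 1),
    end=date(2003, 12, 31),
)
print([doc.doc_id for doc in selected])
```

Here only the documents held by the two named custodians and dated within 2003 survive; everything else is set aside before any more expensive processing begins.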

By restricting the electronic data search by these parameters, solicitors can begin to address the proportionality concerns set out in rule 1.1(2)(c) of the CPR. In the United States, courts have also recognised the important role that the proportionality principle plays, and a practice known as ‘data sampling’ is being regularly used by the courts to determine if producing a larger volume of electronic evidence is worth the cost involved. For instance, in Zubulake v UBS Warburg, 2003 WL 21087884 (S.D.N.Y. May 13, 2003), the court ordered the defendant to produce, at its own expense, all responsive e-mail on five back-up tapes as selected by the plaintiff. After the results of this sample were produced, the court stated that it would determine if electronic discovery of the remaining ninety tapes was warranted. When data sampling is used, it is in the requesting party’s best interest to identify the back-up tapes or other media most likely to yield the largest amount of responsive data to support its case. Sampling procedures such as those used in Zubulake ensure that the parties do not incur expenses beyond the value of the claim as they search for the needle in the haystack.

Further narrowing the search

Once the solicitors have determined what needs to be searched, the electronic evidence expert restores the data contained on the selected media and converts it to a common format for further processing. It is at this time that advanced filtering options can be applied, further narrowing the universe of electronic evidence.

Keyword searching – One of the distinct advantages to employing electronic evidence technology is the availability of keyword and term searching in order to segregate potentially responsive information for further review and scrutiny. When using an electronic evidence expert to conduct this data searching, as opposed to conducting the searching within individual files using software applications, the solicitor receives the benefit of a keyword term search across the entire document universe or data set all at once.

Depending on the capabilities of the individual electronic evidence expert retained, keyword searching may be very complex or very simple. In our experience, a list of between 30 and 50 keyword terms is recommended in most situations: it tends to find potentially responsive information without being over-inclusive of irrelevant data. When significantly more terms than this are applied to the document universe, the practitioner runs the risk of diminishing the benefits of the keyword search and including documents which have no bearing on the subject matter of the case.

Additionally, lawyers should consult their electronic evidence expert to determine whether the expert has a list of ‘noise words’ that should not be applied to the document universe. Noise words, such as ‘it, a, and, the’, would, if included in the search, return an unreasonably high number of ‘hits’. Most sophisticated e-evidence experts also recommend that no initials, acronyms, or two- to three-character words be applied against the document universe, unless special care is taken to ensure that these terms are handled properly and deliver results as intended.
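To make the keyword guidance concrete, here is a minimal sketch of a term search that first discards noise words and very short terms, then looks for whole-word matches across an entire document set in one pass. The noise-word list is taken from the examples above; the four-character minimum, the function names and the sample e-mails are assumptions made purely for illustration, and a real expert's tooling will be considerably more sophisticated.

```python
import re

NOISE_WORDS = {"it", "a", "and", "the"}  # the examples given in the text
MIN_TERM_LENGTH = 4  # assumption: excludes two- and three-character terms


def usable_terms(terms):
    """Drop noise words and very short terms from a proposed keyword list."""
    cleaned = []
    for term in terms:
        normalised = term.strip().lower()
        if normalised in NOISE_WORDS or len(normalised) < MIN_TERM_LENGTH:
            continue
        cleaned.append(normalised)
    return cleaned


def keyword_hits(documents, terms):
    """Map each document id to the sorted terms found in it as whole words."""
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in usable_terms(terms)
    }
    hits = {}
    for doc_id, text in documents.items():
        matched = sorted(t for t, p in patterns.items() if p.search(text))
        if matched:
            hits[doc_id] = matched
    return hits


documents = {
    "E1": "Please review the merger agreement before Friday.",
    "E2": "Lunch at noon? It is on me.",
    "E3": "The indemnity clause in the Agreement needs work.",
}

hits = keyword_hits(
    documents, ["merger", "agreement", "indemnity", "the", "it", "HR"]
)
print(hits)
```

In this sketch ‘the’, ‘it’ and the two-letter acronym are silently dropped before searching, so the lunch e-mail produces no hit. In practice, short terms that genuinely matter to a case would need the special handling mentioned above rather than blanket exclusion.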

Privilege searching – The mechanism of keyword searching may also be used to segregate potentially privileged data for further review and scrutiny. This type of searching is generally accomplished by a combination of identification of data from custodians, who are likely to create and maintain privileged information, such as in-house and outside counsel, and the application of keyword term searches aimed at specifically identifying potentially privileged information. Such terms may include words such as ‘privileged, confidential, attorney, work product’ and other similar words, depending on the circumstances of the particular case. Employing this type of technology may speed up the identification of privileged information, and may help to avoid the pitfalls of the inadvertent waiver of privilege.
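A privilege screen of this kind might be sketched as follows. This is an illustration under stated assumptions rather than a description of any particular product: the e-mail records, the addresses and the `flag_potentially_privileged` function are hypothetical, and the choice to flag a message when either test is met (counsel involved, or a privilege term present) is a deliberately cautious design assumption, since the aim is only to route material to a lawyer for further review.

```python
# Terms taken from the examples in the text; a real list is case-specific.
PRIVILEGE_TERMS = ("privileged", "confidential", "attorney", "work product")


def flag_potentially_privileged(emails, counsel):
    """Return ids of e-mails that involve counsel or contain a privilege term."""
    counsel_set = {address.lower() for address in counsel}
    flagged = []
    for email in emails:
        people = {email["sender"].lower()}
        people.update(r.lower() for r in email["recipients"])
        body = email["body"].lower()
        involves_counsel = bool(people & counsel_set)
        has_term = any(term in body for term in PRIVILEGE_TERMS)
        if involves_counsel or has_term:
            flagged.append(email["id"])
    return flagged


emails = [
    {"id": "M1", "sender": "counsel@example.com",
     "recipients": ["ceo@example.com"], "body": "Draft advice attached."},
    {"id": "M2", "sender": "sales@example.com",
     "recipients": ["ops@example.com"], "body": "Quarterly numbers look good."},
    {"id": "M3", "sender": "ceo@example.com",
     "recipients": ["cfo@example.com"],
     "body": "Privileged and confidential: summary of legal advice."},
]

flagged = flag_potentially_privileged(emails, counsel=["counsel@example.com"])
print(flagged)
```

Flagged items are candidates only. Whether a document is in fact privileged remains a legal judgement for the reviewing lawyer.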

De-duplication – When electronic documents are gathered which span many time periods or encompass many individuals’ electronic information, there is bound to be duplication within that data universe. Consider, for example, how frequently individuals in large corporations send e-mails ‘company wide’ on topics that are both mission critical and mundane. These e-mail messages are then duplicated, not only in the mailbox of each individual throughout the company, but also on daily, weekly, monthly and/or quarterly back-up tapes for each person! This explosion of data can be enormous.

For the lawyer conducting e-discovery, this duplication adds unnecessary risk and costs to the review and production of electronic information. Further, when employing large teams of lawyers or paralegals to conduct reviews without ‘de-duplication’ of electronic data, there is a great risk that individual reviewers will make decisions as to the responsiveness or privileged nature of documents that are not consistent with each other.

This problem can be solved through the use of e-disclosure industry technology, known as de-duplication. Essentially, de-duplication involves the identification of documents that are duplicates of one another and the elimination of these duplicate documents from the review and production set of documents. De-duplication can decrease the number of documents that need to be reviewed by as much as 90%, and by 30 to 40% in the average case.
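One common way to identify exact duplicates, offered here as an assumption since the text does not say how any given tool does it, is to compute a cryptographic hash of each document's content and keep only the first document seen for each hash value. The sketch below uses SHA-256 from Python's standard library; the `deduplicate` function and the sample paths are hypothetical.

```python
import hashlib


def deduplicate(documents):
    """Keep the first copy of each distinct content; record later copies.

    Returns (unique_ids, duplicates), where duplicates maps a retained id
    to the ids of the identical copies that were set aside.
    """
    first_seen = {}  # content hash -> id of the retained copy
    unique_ids = []
    duplicates = {}
    for doc_id, content in documents:
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest in first_seen:
            duplicates.setdefault(first_seen[digest], []).append(doc_id)
        else:
            first_seen[digest] = doc_id
            unique_ids.append(doc_id)
    return unique_ids, duplicates


documents = [
    ("alice/inbox/1", "All staff: the office closes at 3pm on Friday."),
    ("bob/inbox/7", "All staff: the office closes at 3pm on Friday."),
    ("tape-week12/alice/1", "All staff: the office closes at 3pm on Friday."),
    ("bob/sent/2", "Re: contract terms, see my comments."),
]

unique_ids, duplicates = deduplicate(documents)
reduction = 1 - len(unique_ids) / len(documents)
print(unique_ids, duplicates, f"{reduction:.0%}")
```

Recording which copies were set aside, rather than simply discarding them, means a single review decision on the retained copy can be applied consistently to every duplicate. Note that this sketch catches only byte-for-byte identical content; near-duplicates need other techniques.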


As the table below shows, the e-explosion is generating thousands, even millions, of pages of e-documents, some of which will be disclosable. Take the example of a network hard drive, which can hold 40 gigabytes of data. If 4,000,000 pages were to be printed on 80gsm A4 paper, they would weigh an astonishing 20 metric tonnes. Using proprietary technology that (a) limits the data to relevant time periods and relevant document custodians, (b) searches the data by keywords, and (c) eliminates duplicate instances of documents and e-mail, solicitors can narrow this vast universe of computer-based documents down to a more manageable set. It is our experience that filtering technology can reduce the volume of documents by an average of 85% to 95%.
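The weight figure and the effect of filtering can be checked with a little arithmetic. The short calculation below is a sketch that assumes one printed page per sheet (single-sided printing) and uses the standard A4 dimensions of 210 mm by 297 mm; the variable names are illustrative only.

```python
A4_AREA_M2 = 0.210 * 0.297  # standard A4 sheet, in square metres
GSM = 80                    # paper weight in grams per square metre
PAGES = 4_000_000           # page count used in the example above

sheet_grams = A4_AREA_M2 * GSM                # roughly 5 g per sheet
total_tonnes = PAGES * sheet_grams / 1_000_000

# Pages left for review after an 85% and a 95% reduction respectively.
remaining = [round(PAGES * (1 - rate)) for rate in (0.85, 0.95)]

print(f"{sheet_grams:.2f} g per sheet, {total_tonnes:.1f} tonnes in total")
print(remaining)
```

On these assumptions each sheet weighs about 5 grams, the full print-out comes to just under 20 tonnes, and filtering leaves somewhere between 200,000 and 600,000 pages to review.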

Type of Storage Medium / Approximate number of pages before filtering (assuming a mixture of file and e-mail data)

3.5” Disk (1.44 Megabytes)
CD-Rom (625 Megabytes)
Laptop Hard Drive (2 Gig)
PC Hard Drive (4 Gig)
Network Hard Drive (40 Gig)

The cost and review time savings associated with such technology speak for themselves, making it only a matter of time before e-disclosure technology really catches on in the UK. However, for those law firms that continue to rely on good old-fashioned paper in their search for vital evidence, who knows how much highly-sensitive information and how many ‘smoking gun’ e-mails are being missed?

Tom Hopkinson is a Legal Consultant in the UK Electronic Evidence Services group of Kroll Ontrack Ltd. He can be reached at Michele C.S. Lange, Esq. is a Staff Attorney in the US Electronic Evidence Services group of Kroll Ontrack Inc. She can be reached at