Pre-culling and Non-native Restoration

February 9, 2007

When embarking upon the e-disclosure process, many corporations, organisations and law firms are overwhelmed by the volume of data that needs to be examined, not to mention the costs associated with restoring and processing relevant files.

The plentiful supply of inexpensive electronic storage options available today, coupled with increasing regulatory compliance demands, has prompted companies to archive any file, document or correspondence they feel may be valuable at some point in the future. As a result, companies retain tremendous amounts of information, saved on hard drives or back-up tape. This presents a significant challenge when these firms are subsequently required to produce responsive documents during litigation or regulatory compliance activities. The larger the data pool, the longer it takes to uncover and prepare relevant documents, and the higher the price tag associated with the process.

Advanced technologies are available, however, to reduce the time and costs inherent in e-disclosure activities:

• Pre-culling strategies allow corporate officers and their lawyers to view data structures and files in their raw native format, so that non-relevant information can be excluded prior to expensive restoration and processing.
• Non-native data restoration allows firms to restore data without having to re-create the originating or 'native' environment, that is, the combination of hardware and software used at the time the materials were preserved.

Back-up tapes as incontrovertible evidentiary sources

The vast majority of organisations rely upon back-up tape to archive files, records and documents. Besides being cost effective, tape also offers the advantage of being highly portable and can therefore be readily transported for off-site storage.

Tape is also well regarded as a rich evidentiary source, because it is effectively tamper-proof. While the tape can be damaged or destroyed, it is virtually impossible for an individual to extract a specific file or record from the tape, make changes and reinsert the altered document. Its value in this regard was confirmed in October 2005 when Part 31 of the Civil Procedure Rules was modified so as to change the definition of 'document' to include specific reference to tape as an evidentiary source in e-disclosure.

While few dispute the integrity of back-up tape, lawyers and corporate officers alike contend that the costs inherent in restoring data contained on tape are prohibitive. They argue that it is often extraordinarily expensive to restore and process years of tape in its native format, in the hope of uncovering documents that may or may not have an impact on current litigation or regulatory compliance activities. In this regard, they maintain, documents stored on tape can hardly be considered reasonably accessible.


These objections to back-up tape are accurate when conventional means are used to restore the data. Traditionally, organisations have had to undertake an expensive series of activities to prepare data for electronic disclosure. These typically entail restoring all tapes contained in the data pool, followed by processing activities (eg de-duplication, culling, keyword searches and data filtering). Each of these steps requires an investment of time and resources, often adding thousands of pounds to the costs.

Pre-culling circumvents many of these steps by providing users with a means to view data structures and files in their raw native format prior to expensive restoration and processing. Pre-culling techniques render the data sortable and searchable by a variety of fields, including subject matter, keywords, content, context, custodian, metadata and others. Parties to the e-disclosure process are then able to view the data from numerous perspectives, making early determinations about the relevance of the data, and thereby saving time and cost by excluding extraneous documents.

Pre-culling typically follows a progression of steps:

1. Header scans reveal all available information in the back-up header (eg back-up dates, back-up software type, internal volume identifiers). This process allows the user to enter a window of time from which responsive material must be produced, for instance the 18 months prior to a merger between two corporations.
2. Server scans isolate back-up clients or servers that contain responsive data of particular interest. During an IP case, for instance, this process can identify servers germane to the matter and allow counsel to exclude irrelevant file systems, such as unrelated human resources (HR) network shares.
3. File-level catalogs isolate file names, document creation and modification dates, file or directory pathways, and other information contained on the tape. This allows the user to extract only those documents created or handled by individuals affiliated with the litigation or regulatory investigation. Further, these catalogs pinpoint the specific back-up tapes that contain these relevant files.

Employing any combination of these techniques isolates the specific tapes that contain files of interest, thereby significantly reducing the volume of data that needs further processing.
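
The three steps above can be sketched as successive filters over a catalogue of back-up sessions. This is only an illustrative sketch: the `BackupSession` and `FileEntry` records, the function names and the idea that the catalogue is already held in memory are assumptions made for the example, not a description of any vendor's tooling.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Set


@dataclass
class FileEntry:
    """Hypothetical file-level catalogue record read from a tape."""
    path: str
    custodian: str
    modified: date


@dataclass
class BackupSession:
    """Hypothetical summary of one back-up session found on a tape."""
    tape_id: str
    backup_date: date   # taken from the back-up header
    server: str         # back-up client or server name
    files: List[FileEntry] = field(default_factory=list)


def header_scan(sessions: List[BackupSession], start: date, end: date) -> List[BackupSession]:
    """Step 1: keep sessions whose back-up date falls inside the window."""
    return [s for s in sessions if start <= s.backup_date <= end]


def server_scan(sessions: List[BackupSession], relevant: Set[str]) -> List[BackupSession]:
    """Step 2: keep sessions taken from servers germane to the matter."""
    return [s for s in sessions if s.server in relevant]


def file_level_catalog(sessions: List[BackupSession], custodians: Set[str]) -> Dict[str, List[FileEntry]]:
    """Step 3: map each tape to the files handled by the named custodians."""
    hits: Dict[str, List[FileEntry]] = {}
    for s in sessions:
        matched = [f for f in s.files if f.custodian in custodians]
        if matched:
            hits.setdefault(s.tape_id, []).extend(matched)
    return hits


def pre_cull(sessions, start, end, relevant_servers, custodians):
    """Apply the three filters in order; only the returned tapes need restoring."""
    in_window = header_scan(sessions, start, end)
    on_servers = server_scan(in_window, relevant_servers)
    return file_level_catalog(on_servers, custodians)
```

Each filter works only on header and catalogue information, so tapes that drop out at any stage never need to be restored.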

Non-native techniques

Once relevant tapes have been segregated, parties to the e-disclosure process must then retrieve the archived information for review. This can present a tremendous challenge if those attempting to restore the original files try to recreate the native environment (ie the precise combination of hardware and software) in which the documents were saved.

Native restoration attempts are frequently stymied for the following reasons.

• In many instances, particularly with older archives, the operating systems and applications used to store information are unknown. In addition, the employees who created the files or maintained the system may no longer be with the company in question and the current staff can provide no assistance.
• Often the programs used to originate or archive the documents are obsolete. Software is rewritten and updated at an extraordinary rate, and current versions may not support older files. Likewise, support from the manufacturer may no longer be available, rendering the systems needed to run the programs unattainable.
• Personnel who may be familiar with the contents of stored or back-up media may no longer be employed by the enterprise, and therefore can provide no insight regarding which files are important and which are not.

Non-native restoration, however, negates the need to utilise original hardware and software when restoring data stored on tape. Best-in-class technology vendors have developed functionality to 'decode' the intricacies of various drives, operating systems and software applications. Equipped with tools and resources developed through years of electronic data disclosure projects, vendors like eMag Solutions are able to access data quickly and cost-effectively even when no prior knowledge about the host hardware, content, application or format is available.

In other words, non-native restoration applications enable the retrieval of data that, in the recent past, would have been considered inaccessible, either because of technology limitations or because of the cost of restoration.
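
To give a flavour of what reading data without its originating software can involve, the sketch below identifies an archive format purely from signature bytes in the first block of raw data. It is a simplified illustration using a few openly documented formats (tar, gzip, ZIP and ASCII cpio); the function name `identify_format` is an assumption for the example, proprietary back-up formats are not covered, and nothing here should be read as the method any particular vendor uses.

```python
import gzip
import io
import tarfile


def identify_format(block: bytes) -> str:
    """Guess an archive format from signature bytes in the first block read."""
    if block[:2] == b"\x1f\x8b":
        return "gzip"   # gzip streams begin with the bytes 1F 8B
    if block[:4] == b"PK\x03\x04":
        return "zip"    # ZIP local file header signature
    if block[:6] in (b"070707", b"070701", b"070702"):
        return "cpio"   # ASCII cpio header magic numbers
    if len(block) >= 262 and block[257:262] == b"ustar":
        return "tar"    # POSIX tar stores "ustar" at offset 257
    return "unknown"


# Usage: build a small tar archive in memory and inspect its first 512 bytes.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    data = b"hello"
    info = tarfile.TarInfo(name="memo.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

print(identify_format(buf.getvalue()[:512]))
print(identify_format(gzip.compress(b"example")))
```

Once the format is recognised from the raw bytes, a suitable reader can be chosen without knowing which application or system originally wrote the data.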


With growing emphasis on the value of tape as a reliable evidentiary source, lawyers and corporate officers will routinely be expected to produce documents contained in these archives during e-disclosure. While technological barriers, time-consuming processes and unreasonable expense may have interfered with a firm's ability to comply in the past, advances in the field have minimised these impediments. Progressive e-disclosure vendors are able to mitigate the most significant obstructions by 1) reducing the volume of data that needs to be restored through efficient pre-culling, and 2) streamlining the restoration of relevant documents through non-native techniques that eliminate the need to reproduce the hardware/software combinations originally used to create the files.

Ian Bartlett is a Solutions Analyst with eMag Solutions (UK headquarters: Cardiff, Wales), a leading electronic discovery company specialising in accessing electronic data from a variety of archived sources.