The Power of Metadata

November 1, 2004

There are many instances in litigation where precise information about the provenance of a document can be crucial. Fortunately for investigators and litigators, accurate interpretation of document metadata can often provide it.

What is Metadata?

Quite simply, it is data about data. In the context of electronic documents, metadata is information about the document that is routinely recorded by the software that created it but is not shown on the face of the document.

Metadata can be accessed in several ways: by viewing the properties of the document in the application that created it (eg MS Word) or by using specially written software. Typical metadata includes:

· The name of the document.

· The author of the document, as determined by the computer system. This information is not necessarily a reliable guide to who actually created the document or who last worked on it. Documents may pass through many revisions and be copied and sent to many different people. The authorship information will, however, not necessarily change. For a complete analysis it would be necessary to have the underlying information about how and when the system recorded the name of that individual or company as the document author.

· The company from which the document originates. This information is subject to the same caveats and limitations as the authorship information.

· The location of the file on the computer system.

· When the file was created – a record of the time and date when the file was created at the location from which it has been opened.

· When the file was last accessed – a record of the time and date when the file was last opened.

· When the file was last modified – a record of the time and date when the contents of the file were last changed and saved. This is usually a reliable indicator of when the file was last worked on and data was added or deleted.

· By whom the file was last saved.
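The file system items in the list above (location, created, last accessed, last modified) can be read without opening the document in its application. The following is a minimal sketch in Python, not a forensic tool. The function name `file_timestamps` and its output keys are illustrative assumptions. Note that the creation time is exposed differently on different operating systems, so the sketch returns `None` where the platform does not report it.

```python
import os
from datetime import datetime, timezone
from pathlib import Path


def file_timestamps(path):
    """Return basic file system metadata for `path`, with times in UTC."""
    st = os.stat(path)  # reads metadata only; the file itself is not opened

    def to_utc(seconds):
        return datetime.fromtimestamp(seconds, tz=timezone.utc)

    stamps = {
        "location": str(Path(path).resolve()),
        "size_bytes": st.st_size,
        "last_accessed": to_utc(st.st_atime),
        "last_modified": to_utc(st.st_mtime),
    }

    # Creation time is platform dependent. Some systems expose st_birthtime;
    # on Windows st_ctime holds the creation time; on most Unix systems
    # st_ctime is the time of the last metadata change, not creation.
    birth = getattr(st, "st_birthtime", None)
    if birth is not None:
        stamps["created"] = to_utc(birth)
    elif os.name == "nt":
        stamps["created"] = to_utc(st.st_ctime)
    else:
        stamps["created"] = None  # not reported by os.stat on this platform
    return stamps
```

These values should only be relied on when read from a forensically sound copy, since ordinary handling of a file can alter them.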

Additional metadata that can be extracted by use of specialist software includes:

· when the document was last saved

· when it was last printed

· the identity of the last 10 authors and document locations – this can be a very valuable piece of information as it can indicate how a document got to its current location and who previously worked on it.
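As an illustration of how embedded properties of this kind can be read programmatically, the sketch below extracts the author, last saved and last printed fields from a Word document. It assumes the newer XML-based .docx format, which stores these properties in a part named docProps/core.xml inside a zip package; the binary .doc files current when this article was written need different tooling. The function name `read_core_properties` and the output labels are illustrative assumptions, and the sketch does not recover the history of previous authors and locations held in older .doc files.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core properties part of a .docx package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Output label -> element holding that property.
FIELDS = {
    "author": "dc:creator",
    "last_saved_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "last_saved": "dcterms:modified",
    "last_printed": "cp:lastPrinted",
    "revision": "cp:revision",
}


def read_core_properties(docx):
    """Read embedded properties from a .docx path or binary file object."""
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    props = {}
    for label, tag in FIELDS.items():
        element = root.find(tag, NS)
        # A property the document never recorded is reported as None.
        props[label] = element.text if element is not None else None
    return props
```

The values are returned as the raw strings stored in the document. As with the authorship caveats above, they record what the software wrote, not necessarily who actually did the work.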

The time and date stamps on files can be extremely valuable, but care must be taken to check the time on the computer’s internal clock, as this can often be significantly different from “real” time. If computers are being operated in different time zones, the time zone setting on the computer from which the document originates must also be checked and any necessary conversion made. This is particularly important if documents are being sent across different time zones (eg from the US to the UK) and precise times are important to the case.
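The two corrections described above, for clock error and for time zone, amount to simple arithmetic. The sketch below is one way to express it in Python. The function name `normalise_timestamp`, the seven-minute clock error and the recorded time are hypothetical values chosen for illustration. Fixed offsets from UTC are assumed for simplicity (UTC-4 for the US east coast and UTC+1 for the UK, both in September); a real analysis would need to confirm the actual time zone and daylight saving settings on each machine.

```python
from datetime import datetime, timedelta, timezone


def normalise_timestamp(local_stamp, utc_offset_hours, clock_error=timedelta(0)):
    """Convert a recorded local time stamp to corrected UTC.

    local_stamp      -- naive datetime exactly as recorded by the computer
    utc_offset_hours -- the computer's offset from UTC when the stamp was made
    clock_error      -- how far the computer's clock ran ahead of real time
                        (negative if it ran behind)
    """
    recorded_zone = timezone(timedelta(hours=utc_offset_hours))
    aware = local_stamp.replace(tzinfo=recorded_zone)
    # Remove the clock error first, then express the result in UTC.
    return (aware - clock_error).astimezone(timezone.utc)


# Hypothetical example: a stamp of 09:15 recorded on a US machine (UTC-4)
# whose clock was found to be seven minutes fast.
utc_time = normalise_timestamp(
    datetime(2003, 9, 26, 9, 15), -4, clock_error=timedelta(minutes=7)
)
uk_time = utc_time.astimezone(timezone(timedelta(hours=1)))  # UK summer time
print(utc_time.isoformat(), uk_time.isoformat())
```

Converting every time stamp to a single reference such as UTC before comparing events makes it much harder to misorder them.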

Metadata can be extracted from any live document or any recovered deleted document. It is, however, essential that a proper forensic copy of the original file is made before metadata is analysed: this must be done to preserve the original time and date stamps and other potentially important information about the file. It is likely that a non-forensic copy will contain inaccurate information because, if the file has been opened and then copied from the original computer system, time and date stamps and possibly other information will be changed.
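The effect of copying on time stamps can be shown with a short Python sketch. It is not a substitute for a proper forensic copy, which is normally made with dedicated imaging tools and write protection. It only illustrates two ideas: a copy can be made in a way that carries the modification time across, and a cryptographic hash can confirm that the content of the copy is identical to the original. The function names `sha256_of` and `copy_and_verify` are illustrative assumptions.

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hash of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, destination):
    """Copy `source` with its time stamps and check the copy against it."""
    before = os.stat(source)  # record the time stamps before reading the file
    source_hash = sha256_of(source)
    # shutil.copy2 copies the content and, where the platform allows, the
    # modification and access times. A plain shutil.copy would stamp the
    # new file with the time of copying instead.
    shutil.copy2(source, destination)
    after = os.stat(destination)
    return {
        "sha256": source_hash,
        "content_matches": sha256_of(destination) == source_hash,
        "modified_preserved": int(after.st_mtime) == int(before.st_mtime),
    }
```

Even this approach does not preserve everything: the creation time of the copy, for example, will generally be the time the copy was made, which is one reason forensic imaging is preferred.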

The information contained in metadata can be combined with other information obtained from other investigative techniques (for example e-mail and phone record analysis) to add significant value to an investigation.

Case Study

A UK company was faced with claims by a former employee alleging unfair dismissal and sex discrimination. These claims were disputed. Computer forensics experts were asked to carry out a forensic examination of a computer that belonged to our client and had been used by the employee. They were asked to look, among other things, for copies of two letters.

The letters in question were addressed to the employee’s line manager. Both were dated 22 September 2003. The first was a draft and had been sent for comment by e-mail to a senior member of the employee’s team on 22 September 2003. The second had been handed as hard copy to the employee’s line manager on the afternoon of 26 September 2003. The second letter was identical to the first except that it contained a paragraph alleging sex discrimination.

It became very important during the litigation to establish exactly when and, if possible, on whose advice the second letter had been created. The employee was claiming that both the draft and the final version of the letters had been written on the same day.

A forensic image of the computer was taken and this was searched, using key words and time and date searches, for the two letters. Copies of both letters were found and it was possible to determine very quickly from the basic properties encoded into the documents that the first had been created late in the evening on 21 September (a Sunday) and the second during the morning of 26 September. By deeper analysis of the metadata in these documents it was possible to tell who the authors were and when the documents had been printed. It was also possible to determine the file path of the earlier versions of the documents. In other words, it was possible to establish how the documents had got onto the computer.

The analysis confirmed the creation dates and also showed that an early version of the second letter had been copied from an external storage device (probably a USB pen drive). This letter had been copied from the device to the desktop and then to the “my documents” folder on the hard disk of the computer. Metadata about the original author of the letters and the company from which they had originated was cross-referenced with records from the mobile phone issued to the employee and e-mails found on the hard disk.

Taking all this information together, we were able to say with reasonable certainty from whom the employee had received the letter and who had provided advice about the crucial paragraph which, as the metadata showed, could not have been written on the date on the face of the letter.
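The core comparison in this case study, the date on the face of a letter against the creation time recorded in its metadata, can be expressed as a simple check. The Python sketch below uses the dates given above; the exact times of day (23:00 and 10:00) are hypothetical stand-ins for “late in the evening” and “during the morning”, and the names `DocumentRecord` and `created_after_face_date` are illustrative. A flagged document is a prompt for further analysis rather than proof in itself, since a creation stamp can also reflect when a file was copied to its current location.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class DocumentRecord:
    label: str
    face_date: date    # the date typed on the document itself
    created: datetime  # the creation time stamp taken from the metadata


def created_after_face_date(records):
    """Return the labels of documents created later than the date they bear."""
    return [r.label for r in records if r.created.date() > r.face_date]


# Dates from the case study; times of day are hypothetical.
letters = [
    DocumentRecord("draft letter", date(2003, 9, 22), datetime(2003, 9, 21, 23, 0)),
    DocumentRecord("final letter", date(2003, 9, 22), datetime(2003, 9, 26, 10, 0)),
]
print(created_after_face_date(letters))
```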

Metadata and Disclosure

The American Bar Association paper Managing and Planning for Electronic Discovery (August 2003) states: “Electronic documents contain “hidden” information not reflected in paper documents ... such hidden information ... complicates review of the documents in advance of production as the reviewing persons must know how and where to look for such information and determine whether reviewing for such information is needed.”

In summary, the CPR disclosure rules state that:

• A document is “anything on which information of any description is recorded”. This clearly includes electronic documents.

• Standard disclosure entails a “reasonable” search for all documents supporting or adversely affecting either party’s case.

• Specific disclosure can be ordered at the request of the parties; such disclosure may relate to specific documents or classes of documents.

Relevant factors to be considered in deciding if a search is reasonable are:

· the number of documents involved

· the nature and complexity of the proceedings

· the ease and expense of retrieval

· the significance of any document likely to be located.

Bearing these factors in mind, under what circumstances might a party be obliged to produce a document in its original electronic format preserving the original metadata?

“The [CPR] contain no guidance whatsoever as to the type of search which would be reasonable in the context of electronic data ... there are no hard and fast rules that could possibly be devised which would be appropriate in every piece of litigation or, for that matter, even in every piece of commercial litigation ... it should, it is submitted, be very rare for any disclosure exercise to require recourse to replicant data, back up data or residual data ... a possible exception might be a fraud case” (Electronic Disclosure – a paper for the Commercial Litigators’ Forum, see C&L, vol 14, iss 5).

It seems to be accepted that standard disclosure ought to encompass a search of the active data on a computer system. The CLF paper classifies computer data as active (directly accessible), replicant (automatically created by the desktop computer), backup (created by the system administrator to restore in the event of a disaster) and residual (that which is regarded as deleted but may be capable of being recovered using computer forensic techniques).

There may be many cases, and not just those where fraud is alleged, where information contained in metadata will be immensely valuable to provide evidence as to an important issue. In these cases, it is important that those documents the provenance of which is disputed are identified at as early a stage as possible and agreement obtained (probably at the case management conference) or orders sought for their preservation in their original electronic format.

It is likely that, in the case of a disputed document, a search would have to be undertaken for residual copies of that document. This is because the existence of earlier drafts and the circumstances of their creation may be crucial. This will inevitably mean the use of computer forensic techniques to recover residual data. That has cost and time implications, but these need not be as great as imagined if the exercise is approached in a pragmatic and reasonable fashion: it is possible for write-protected copies of individual files or groups of files to be made in a forensically sound manner.

Once identified and located, the document should be copied in a forensically sound way to preserve its metadata and a procedure agreed upon for extraction and examination of the metadata. It may be that the party requesting the search for the disputed document will have to, initially at least, bear the cost of production (see para 17 of LiST’s Draft Practice Direction on the use of Technology in Civil Proceedings: “17. Unless the parties agree otherwise or the court orders otherwise, the costs incurred by a party in arranging the use of and using technology will initially be borne by that party. This is subject to the court’s general discretion in relation to the payment of costs.”).

In commercial litigation, it is our experience that in appropriate cases, particularly where fraud is alleged, the courts are likely to be sympathetic to applications for specific disclosure of document metadata.

Simon Dawson is Head of Corporate Investigations at The Risk Advisory Group Limited.