Predictive Coding = Proportionality

June 25, 2012

In a recent US Federal case called Da Silva Moore v Publicis Groupe and MSL Group, in which Recommind’s predictive coding software was being used by the defendants, US Judge Peck approved the use of a predictive coding workflow when he said ‘Computer-assisted review now can be considered judicially-approved for use in appropriate cases.’ The decision has prompted the question ‘What attitude will the judges of England and Wales adopt towards predictive coding?’ from US lawyers and from their UK counterparts.

Case managing judges in England and Wales are more concerned with outcomes than with methods; our disclosure rules are driven by words like ‘reasonable’ and ‘proportionate’, and we have cases which say expressly that parties are not obliged to look under every stone; without ignoring the possibility of misconduct, we do not, on the whole, assume that the opposing lawyers will conceal documents or lie to us; our Practice Direction 31B requires us to discuss the use of technology and the ‘tools and techniques’ which we intend to use, but we do not need precedents for judicial approval of any particular method, tool or technique as they seem to want in the US; case managing judges have wide discretion which is rarely successfully appealed; lastly, we have Senior Master Whitaker’s judgment in Goodale v Ministry of Justice [2009] EWHC B41 (QB), which expressly approves in broad terms the use of software which will: 

render [the data] down to a more sensible size and search it by computer to produce a manageable corpus for human review – which is of course the most expensive part of the exercise. Indeed, when it comes to review, I am aware of software that will effectively score each document as to its likely relevance and which will enable a prioritisation of categories within the entire document set. 

If you take all these points together, you conclude that there will not necessarily be a general judicial view on the use of predictive coding or of any other ‘tools and techniques’. Parties will argue for whatever they think is right. The receiving party may want to challenge the proposed tools and processes pre-emptively, without waiting for the result. Not every such attack will be a bona fide one – the UK has its share of lawyers who will challenge everything in the US style, on what they call principle.  Others will genuinely not understand. There are cases and data sets for which predictive coding is not appropriate. The judge may have to resolve the resulting dispute, and will do so by reference to proportionality, the facts of the case, and the arguments on each side. What arguments will support the use of predictive coding? 

First, what is predictive coding? Predictive coding software such as that used in the Da Silva Moore case is trained by an expert (usually a senior lawyer who understands the issues in the case) who provides initial instructions to ‘seed’ the software, which ‘understands’ the meaning of documents and recognises patterns within them. Documents are ranked according to likely relevance and those provisionally marked as most likely to be relevant are put quickly in front of appropriately skilled lawyers for review. None of the documents are discarded and the instructions can be amended to narrow, broaden or refine the search criteria according to the initial results. The software itself ‘learns’ from the feedback it is given, and non-qualifying documents are sampled for anomalies.
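For readers who like to see the mechanics, the train-then-rank idea can be illustrated with a deliberately simple Python sketch. It is not Recommind’s method or that of any other product: the word-counting model, the function names and the sample documents are all assumptions made up for illustration, and real software uses far more sophisticated analysis.

```python
from collections import Counter

def tokens(text):
    """Lower-case a document and split it into words."""
    return text.lower().split()

def train(seed_set):
    """Build word weights from (text, is_relevant) pairs coded by the expert."""
    weights = Counter()
    for text, is_relevant in seed_set:
        for word in set(tokens(text)):
            weights[word] += 1 if is_relevant else -1
    return weights

def score(weights, text):
    """Average the word weights in a document; higher means more likely relevant."""
    words = tokens(text)
    return sum(weights[w] for w in words) / len(words) if words else 0.0

def rank(weights, documents):
    """Return every document, most likely relevant first; nothing is discarded."""
    return sorted(documents, key=lambda d: score(weights, d), reverse=True)

# Hypothetical seed set coded by the senior lawyer.
seed_set = [
    ("bonus pay review for female staff", True),
    ("promotion decision and pay grade", True),
    ("canteen menu for friday", False),
]
# Hypothetical documents awaiting review.
documents = [
    "friday canteen closed",
    "pay review meeting notes",
    "office party friday",
]
weights = train(seed_set)
ranked = rank(weights, documents)
```

In this toy version, ‘learning’ from feedback would simply mean adding the reviewers’ later coding decisions to seed_set and calling train again before re-ranking.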

What’s not to like? Well, that brief description is open to the easy (by which I mean ‘unthinking’) charge that it is a ‘black box’. Those who assert this mean, if I understand them correctly, that the lawyers cannot see what is being done on their behalf, and feel that they are surrendering their role – and their duty – to an algorithm. That view can be attacked on two grounds. First, traditional keyword searching and linear review are inaccurate and time-consuming, and offer no means of checking the reviewer’s work short of a parallel and duplicative re-review. Second, predictive coding software provides multiple ways to sample and cross-check results and, if necessary, to re-run all or part of the exercise.
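One such check, sampling the low-ranked documents, can be sketched in a few lines of Python. This is an assumed, simplified illustration rather than a description of any product: the function names, the sample size of 50 and the reviewer decisions are all hypothetical.

```python
import random

def sample_for_review(low_ranked_ids, sample_size, seed=0):
    """Draw a random sample of low-ranked document ids for a lawyer to check."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    return rng.sample(low_ranked_ids, min(sample_size, len(low_ranked_ids)))

def estimated_miss_rate(reviewer_decisions):
    """Fraction of the sampled documents which the reviewer found relevant."""
    if not reviewer_decisions:
        return 0.0
    return sum(reviewer_decisions) / len(reviewer_decisions)

# Hypothetical population of 1,000 low-ranked documents.
low_ranked_ids = list(range(1000))
sample = sample_for_review(low_ranked_ids, 50)

# Hypothetical result of the manual check: 2 of the 50 were relevant.
decisions = [True] * 2 + [False] * 48
rate = estimated_miss_rate(decisions)
projected_missed = rate * len(low_ranked_ids)
```

A rate which the parties or the judge consider too high would point towards further training and a re-run; what counts as too high is a question of proportionality, not something the code decides.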

Other often-heard criticisms are based on a misunderstanding of what the software is doing. It is not making final coding decisions, but provisional ones; it is not discarding any documents of its own volition, merely ranking them by presumed degrees of relevance; it does not involve the disclosure of documents which have not been manually reviewed – users may decide that this is what they want to do, but that is not the advertised purpose or effect of the software.

It should also be clear that predictive coding is but one of the tools available to the reviewing lawyers. Keywords and conventional clustering tools retain their place both as a way to find obvious categories of documents and as part of the cross-checking mechanism available to lawyers – if a keyword finds hits in that part of the population which has been provisionally ranked as unlikely to be relevant, then some further training may be required. Indeed, one of the by-products of the predictive coding training is the identification of keywords which may otherwise have been overlooked.
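The keyword cross-check described above might look like this in outline. It is a minimal sketch under stated assumptions: the scores, the cutoff of 0.0 and the documents are invented, and the function name is hypothetical rather than part of any real tool.

```python
def keyword_hits_below_cutoff(scored_documents, keywords, cutoff):
    """Return low-ranked documents which contain any of the keywords."""
    wanted = {k.lower() for k in keywords}
    hits = []
    for text, score in scored_documents:
        # Only look at documents ranked below the cutoff.
        if score < cutoff and wanted & set(text.lower().split()):
            hits.append(text)
    return hits

# Hypothetical (document, relevance score) pairs produced by the ranking.
scored_documents = [
    ("pay review meeting notes", 0.75),
    ("office party friday", -0.33),
    ("draft bonus letter", -0.10),
]
hits = keyword_hits_below_cutoff(scored_documents, ["bonus"], cutoff=0.0)
```

Here the keyword ‘bonus’ turns up in a document with a low ranking, which is the kind of result that would prompt a closer look and perhaps further training.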

The best way to look at predictive coding is as a means of relegating the least relevant material so that it is looked at, if at all, later in the process and perhaps by junior fee-earners. The provisional relevance ranking permits the lawyers to assess the proportionality of looking further: the top-ranking documents will contain a high number of disclosable documents; as the lawyers progress through these, the frequency of disclosable documents will diminish, with ever-longer gaps between them.  
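The diminishing return can be put in numbers. The sketch below, in Python, assumes that the lawyers’ decisions are recorded in rank order and measures the proportion of disclosable documents in each successive batch; the batch size, the threshold of 0.25 and the decisions themselves are purely illustrative assumptions.

```python
def batch_yields(decisions_in_rank_order, batch_size):
    """Proportion of disclosable documents in each successive batch."""
    yields = []
    for start in range(0, len(decisions_in_rank_order), batch_size):
        batch = decisions_in_rank_order[start:start + batch_size]
        yields.append(sum(batch) / len(batch))
    return yields

def first_batch_below(yields, threshold):
    """Index of the first batch whose yield drops below the threshold, or None."""
    for index, value in enumerate(yields):
        if value < threshold:
            return index
    return None

# Hypothetical review decisions, highest-ranked document first
# (True means the lawyer found the document disclosable).
decisions = [True, True, True, False,
             True, False, False, True,
             False, False, False, False]
yields = batch_yields(decisions, batch_size=4)
stop_at = first_batch_below(yields, threshold=0.25)
```

The figures do not decide when to stop; they give the lawyers, and if necessary the judge, something concrete on which to base an argument about proportionality.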

Perfection is not required by the rules; Rule 31.7 requires only a ‘reasonable’ search. The proportionality expected by the Civil Procedure Rules entitles – indeed, requires – parties to expend no more time and money than is needed to be reasonably sure that they have done their job properly. Traditional methods effectively require that every potentially relevant document be looked at. At its simplest, predictive coding, fortified by sampling, cross-checking and other forms of quality assurance checking, increases the probability that the expensive review process referred to by Master Whitaker in Goodale is focused on the documents which matter most. 

Lawyers are not generally early adopters of new technology. Predictive coding technology itself is not new – anti-spam software uses a form of it to decide whether the characteristics of an e-mail, including the recipients’ prior decisions, make it probable that the e-mail is spam. Many businesses use a mixture of objective fact and past experience to predict, for example, shopping preferences. Master Whitaker and major litigation firms like Herbert Smith (who are building an in-house litigation management function around Recommind’s solution) are leading the charge in bringing a specialised form of predictive coding technology, and education about its potential, to UK litigation in order to reduce its time and costs.


Chris Dale is a former commercial litigation partner turned e-disclosure consultant. He runs the e-Disclosure Information Project and is a leading speaker in that field.