Intelligent Review Technology: How to Stop Worrying and Start Practising

August 7, 2012

Technology users are a fickle breed. The latest must-have gadgets have been known to sell out within the first day of availability, while other innovations have been met with such scepticism that they have foundered almost immediately.

Sitting somewhere in the middle of this cruel spectrum is Intelligent Review Technology (IRT), also referred to as predictive coding or technology assisted review. This is technology that can read documents and categorise them based on logic learned from human reviewers.  Used properly, IRT offers many benefits as an alternative to traditional search and review methods in e-disclosure. 

Critical to its adoption is the widespread craving for guidance from those brave enough to be the first to use IRT to conduct a reasonable search for the documents that need to be produced in disclosure. To date, there is no such guidance in English case law, but cases currently underway in the US, such as Da Silva Moore v Publicis Groupe & MSL Group, Case No. 11 Civ. 1279 (S.D.N.Y. April 25, 2012), have stirred up significant interest among those keen to learn how to use IRT in a properly defensible manner, largely because the parties have been encouraged to agree on and follow a protocol. But what should such a protocol look like?

The case for using it 

First, the parties should agree on the necessity of using IRT.

The value proposition behind IRT is that it makes the process of document review more efficient by offering the opportunity to conclude document reviews at an earlier stage, without necessarily laying eyes upon every document. With data volumes growing exponentially, this is an attractive argument for its use.  

However, there is little point in using IRT unless the volume of documents to be searched by either party makes a manual human review prohibitively expensive. In Global Aerospace Inc. v Landow Aviation the defendants had a clear-cut case for using IRT because they faced a first-pass review of approximately 2,000,000 documents. By the defendants' estimate, manually reviewing the entire data set would take approximately 20,000 hours and cost over two million dollars. At the lower end of the spectrum, parties may agree that there are benefits to using IRT with a document population as small as 20,000 documents.

Given that IRT requires a suitably qualified case reviewer to train the system using a seed set of documents, the merits of using the technology on cases involving smaller quantities of documents are debatable; in any event, the smaller the document population, the lesser the need for technological assistance. The questions in the Electronic Documents Questionnaire annexed to Practice Direction 31B are designed to define the scope of the reasonable search for disclosable documents and should help the parties agree on the sources to be searched and assess the likely volume of documents to be reviewed.

So much choice, so little time 

There are a variety of IRT products in the marketplace and naturally each behaves differently. IRT products typically use algorithms to analyse initial samples of human-reviewed documents, known as seed sets. To understand what makes a document relevant, the algorithms examine the content of the seed set and the decisions applied by the human reviewer, and then consistently apply the same observed logic to the remaining unreviewed documents.
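The learning step described above can be sketched in miniature. The snippet below is an illustrative toy, not any vendor's algorithm: a human-coded seed set trains a crude word-evidence model, which is then applied to unreviewed documents. All document text and the scoring rule are invented for illustration.

```python
# Toy sketch of seed-set learning: a human codes a small seed set, the
# "model" learns which words signal relevance, and that learned logic is
# applied to unreviewed documents. Real IRT products use far more
# sophisticated, proprietary algorithms.
from collections import Counter

def train(seed_docs, seed_labels):
    """Count how often each word appears in relevant vs irrelevant seeds."""
    relevant, irrelevant = Counter(), Counter()
    for doc, label in zip(seed_docs, seed_labels):
        (relevant if label else irrelevant).update(doc.lower().split())
    return relevant, irrelevant

def predict(doc, model):
    """Score a document by the learned word evidence; 1 = relevant."""
    relevant, irrelevant = model
    score = sum(relevant[w] - irrelevant[w] for w in doc.lower().split())
    return 1 if score > 0 else 0

# Hypothetical seed set coded by a human reviewer.
seed_docs = [
    "invoice for aircraft maintenance services",   # coded relevant
    "quarterly hangar lease payment schedule",     # coded relevant
    "office party catering menu",                  # coded not relevant
    "holiday rota for reception staff",            # coded not relevant
]
seed_labels = [1, 1, 0, 0]

model = train(seed_docs, seed_labels)
print(predict("overdue invoice for hangar maintenance", model))  # → 1
print(predict("catering order for the office party", model))     # → 0
```

The point of the sketch is the workflow, not the scoring: the human decisions on the seed set are the only input the system learns from, which is why the quality of that initial review matters so much.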

Seed sets may be selected at random or may be carefully constructed. The size of the sample may vary depending on the product used and the general purpose of its use, and each product offers different types of confidence settings. All of these things could be challenged by an opponent looking for flaws in your choice of technology. In some of the US cases there have been discussions about the finer workings of certain products, and similar discussions may arise in UK cases where the parties use different IRT products. To avoid algorithm-level disputes, the parties may agree to use one common platform, thus eliminating potential disagreements over the choice of technology.

Getting under the hood 

Yet in order to put forward a robust and defensible process, let alone reach agreement with the other side, it is necessary to understand the rudimentary principles of whichever system you decide to use. The key to success when using any type of technology is understanding both its capabilities and its limitations, and developing a sound strategy which, when articulated, addresses each of these considerations. 

For example, it is important to avoid the common misconception that IRT will achieve peak performance after humans have reviewed as few as 1,000 documents. This proposition is untenable when you are contemplating the review of, say, 1 million documents. Anyone who has ever supervised a team of reviewers (whether paralegals or attorneys, employed directly or through a third party) conducting a large review will tell you that they must be constantly trained and their work regularly checked to ensure that all of the documents are correctly coded (and even then, absolute perfection is seldom achieved). The same is true of computer-assisted review, and therefore any proposals made to the Court must openly acknowledge the need to provide continuous training, following a set of predefined success criteria.

Choosing realistic success criteria will require the active engagement of the parties in sensible discussions to plan a review framework that assures everyone of a reasonable and proportionate undertaking. 

In any document review project, there is bound to be a margin of error and, in deciding what is acceptable, lawyers must become familiar with the terms Recall, Precision and F-Measure.  

Recall measures the system's ability to find all of the relevant documents. For example, if the system predicts that 100,000 documents are relevant but there are actually 125,000 relevant documents, then recall is 80% (100,000 ÷ 125,000).

Precision measures the proportion of documents predicted to be relevant that actually are relevant. For example, if a human review of 1,000 predicted-responsive documents demonstrates that 300 are actually not responsive, then the system is 70% precise (700 ÷ 1,000).

F-Measure combines Recall and Precision into a single score, conventionally as their harmonic mean: 2 × (Recall × Precision) ÷ (Recall + Precision). With 80% Recall and 70% Precision the F-Measure would be approximately 74.7%. The higher the score, the better the performance.
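The three measures can be computed directly from the figures in the examples above. Note that the F-Measure below is the conventional harmonic mean of Recall and Precision, which comes out marginally below a simple average of the two:

```python
# Recall, Precision and F-Measure, using the worked figures from the text.
def recall(found_relevant, total_relevant):
    """Share of all truly relevant documents that the system found."""
    return found_relevant / total_relevant

def precision(true_relevant, predicted_relevant):
    """Share of predicted-relevant documents that really are relevant."""
    return true_relevant / predicted_relevant

def f_measure(r, p):
    """Harmonic mean of recall and precision (the standard F1 score)."""
    return 2 * r * p / (r + p)

r = recall(100_000, 125_000)  # 0.80 — the Recall example above
p = precision(700, 1_000)     # 0.70 — 300 of 1,000 were not responsive
print(f"Recall {r:.0%}, Precision {p:.0%}, F-Measure {f_measure(r, p):.1%}")
```

A simple average of 80% and 70% would give 75%; the harmonic mean penalises imbalance between the two measures, which is why it is the standard choice.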

In order to achieve a high F-Measure, it will be necessary to perform several iterations of training and to take an F-Measure reading at the end of each round. The number and scope of iterations would be a matter for agreement, as would the degree to which the system's decisions are quality-checked by human reviewers as the F-Measure readings from each successive sample are monitored.
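An agreed stopping rule for those iterations might be sketched as follows. This is purely illustrative: the target score, the minimum round-on-round gain and the readings themselves are invented, and any real protocol would be a matter for negotiation between the parties.

```python
# Sketch of an iterative-training protocol: after each round a sample is
# human-reviewed, an F-Measure reading is taken, and training stops once
# the agreed target is reached or the readings plateau. All figures are
# hypothetical.
def run_rounds(readings, target=0.80, min_gain=0.01):
    """Return the round number at which training may reasonably stop."""
    previous = 0.0
    for round_no, f in enumerate(readings, start=1):
        if f >= target:
            return round_no   # agreed quality level reached
        if f - previous < min_gain:
            return round_no   # further training is adding little
        previous = f
    return len(readings)      # agreed budget of rounds exhausted

# Illustrative F-Measure readings after each of five training rounds.
print(run_rounds([0.55, 0.66, 0.74, 0.79, 0.81]))  # → 5
```

The value of writing the rule down in advance is that neither party can later argue the process was stopped arbitrarily: the success criteria are fixed before the readings are taken.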

Another matter for debate is the extent to which you should pre-filter the documents before putting them in front of the IRT system. It is not clear whether it is best to present a raw unfiltered set of data (to teach the system in a balanced way) or a set of results based on a carefully crafted search, which some may argue is somewhat prejudicial. Until there are better statistics and more guidelines from real cases, the ultimate decision is likely to be a strategic one. 

Optimum process 

In Da Silva Moore v Publicis Groupe & MSL Group, Judge Carter endorsed the protocol because it contained “standards for measuring the reliability of the process and … buil(t) in levels of participation by Plaintiffs.  It provide[d] that the search methods [would] be carefully crafted and tested for quality assurance, with Plaintiffs participating in their implementation.” 

This encourages the thought that, if parties were willing, it may be possible for them to jointly train the system to look for relevant documents. Their combined knowledge of the issues would be a very powerful teaching aid that could deliver real efficiency for all concerned, especially if they were to jointly draft the relevance criteria. 

One suspects that, for the majority of litigants, this might be a step too far too soon but, given that in some cases litigants have already pooled documents for centralised de-duplication in order to save costs, such an approach towards training an IRT system may not be as far away as one might think. 

Provided that confidentiality, data privacy and privilege were adequately protected – perhaps using US-style claw back agreements, or an independently appointed reviewer to do the first pass review (a growing trend for many barristers) – this might go some way towards settling some of the potential anxieties.


IRT is a highly desirable search tool, but also much feared and misunderstood. We are currently witnessing a battle to convince litigators to use IRT, in which the infantry are the e-disclosure vendors while the cavalry are pioneers within private practice and the judiciary. Their weapons are education and the dedication to show that the technology works. 

If you believe that the technology does what it says on the tin, does that mean that you can stop worrying and start using IRT without any further responsibility? Not entirely. If you plan on using this new technology it makes sense to try and hammer out a detailed protocol approved by all parties which describes how the technology will be used and how the results will be scientifically validated.  

Only the foolhardy believe that technology (of any sort) can be used as a magic bullet with little or no supervision of the process, which is good news for those who thought that IRT might spell the end of the human labour force in document review. 

Robert Jones is a Legal Consultant & Electronic Disclosure Expert at Kroll Ontrack UK.