E-disclosure: TAR is Not a Plaything

May 30, 2012

Bart Simpson is perhaps not best known for his in-depth commentary on the use of Technology Assisted Review (TAR) when reviewing electronically stored information (ESI) in a legal environment, but his prescient blackboard scrawling of ‘Tar is not a plaything’ (first aired in 1990) showed a depth of understanding which is only just being attained by those of us in the litigation support community.

TAR has been the talk of the e-disclosure world for some time now (certainly the cognoscenti of the industry have talked of little else since Legal Tech New York 2011). US case law is springing up on the use of TAR in document review projects (see Da Silva Moore v Publicis Groupe et al; Global Aerospace, Inc. v. Landow Aviation, LP (Consolidated Case No. CL 61040)). As I write this article, a live feed of tweets is coming through from a Recommind-sponsored panel at one of the numerous e-disclosure conferences. TAR doesn’t seem to be going away any time soon.

It would seem, therefore, that this is an appropriate time for a brief, objective look at TAR and its potential.

What is TAR?

TAR enables computers to categorise a collection of documents (typically as ‘relevant’ or ‘not relevant’) based on a human’s manual review of one or more subsets of that collection (the seed documents). The software analyses each document in the collection using complex algorithms in order to draw together ‘like’ documents, and it usually also gives a ‘confidence level’ indicating how sure it is that a document is ‘relevant’ or ‘not relevant’.
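To make the idea concrete, here is a deliberately simplified Python sketch. It is not how any commercial TAR product works: it assumes a toy nearest-neighbour approach in which each unreviewed document takes the coding of the most similar seed document, with similarity measured by cosine similarity on raw word counts. The seed texts, the function names and the crude ‘confidence’ ratio are all hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter

def vectorise(text):
    """Turn a document into a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical seed set: documents already coded by a human reviewer.
seeds = [
    ("invoice for the disputed shipment contract", "relevant"),
    ("shipment delayed under the supply contract", "relevant"),
    ("team lunch on friday at noon", "not relevant"),
    ("office car park closed on friday", "not relevant"),
]

def classify(text, seeds):
    """Return (label, confidence) based on the closest seed of each label."""
    doc = vectorise(text)
    best = {"relevant": 0.0, "not relevant": 0.0}
    for seed_text, label in seeds:
        best[label] = max(best[label], cosine(doc, vectorise(seed_text)))
    rel, non = best["relevant"], best["not relevant"]
    label = "relevant" if rel >= non else "not relevant"
    total = rel + non
    # Crude illustrative confidence: share of similarity held by the winner.
    confidence = max(rel, non) / total if total else 0.5
    return label, confidence

print(classify("contract shipment dispute", seeds))
print(classify("lunch on friday", seeds))
```

Real systems use far richer features and statistical models, but the shape is the same: human coding of a seed set goes in, and a label plus a measure of confidence comes out for every other document.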

Why use TAR?

The providers of this type of technology (of which there are at least 20 worldwide, 16 of which definitely operate within the UK) claim that this technology will reduce cost and improve accuracy.

Costs are reduced, they say, as there is so much less human intervention, and accuracy is improved as computers are not subject to the same levels of subjectivity and inconsistency that humans are. Intuitively both claims seem to be accurate, but for the sake of thoroughness they both deserve a little more examination.

Reduced Costs?

Traditionally document reviews have been a linear affair with documents being presented to reviewers in chronological order. Review rates in this type of review vary, but let’s say that reviewers review on average 50 documents per hour – so in a review that has 100,000 documents for review it’ll take 2,000 hours to review in the traditional manner.

The argument in favour of TAR says that you only need to review a fraction of those documents to get more accurate results. Service providers seem reluctant to put a figure on the percentage of documents that needs to be reviewed in order to extrapolate the coding accurately across the whole document population; estimates vary according to the service provider that you talk to, but 15% seems to be about the level that most come out at. It is fair to assume that the person responsible for completing the review of the 15% will be more senior than the average document reviewer and that, given that their coding decisions are to be rolled out across the whole document population, they will progress at a slower speed. So, if we assume that the senior reviewer progresses at a speed of 35 documents per hour, it’ll take them 429 hours to review the 15,000 documents required to seed the entire document population.

If we assume that the document reviewers cost the client £50 per hour (total cost £100,000 for 2,000 hours) and the senior reviewer, say an associate, costs the client £200 an hour (total cost £85,800 for 429 hours), then the review saving is £14,200. However, that saving does not take into account the cost of turning the TAR technology on, which is usually priced at about £100 per GB of data. It is probably accurate to say that 100,000 documents would consist of approximately 40GB of data, and so it would cost around £4,000 to turn TAR on, leaving an overall saving of £10,200 (roughly a 10% saving).
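The arithmetic above can be set out as a short Python sketch. The figures are the ones assumed in this article (review rates, hourly costs, the 15% seed fraction, £100 per GB and 40GB of data). The constant and function names are purely illustrative, and rounding the seed review up to whole hours is an assumption made to match the 429 hours quoted.

```python
import math

DOCS = 100_000          # documents in the review population
LINEAR_RATE = 50        # documents per hour, traditional linear review
SEED_PERCENT = 15       # percentage of documents coded as the seed set
SEED_RATE = 35          # documents per hour, senior reviewer
REVIEWER_COST = 50      # GBP per hour, document reviewer
SENIOR_COST = 200       # GBP per hour, senior reviewer
TAR_FEE_PER_GB = 100    # GBP per GB to switch TAR on
DATA_GB = 40            # assumed size of 100,000 documents

def linear_cost():
    """Cost of reviewing every document at the reviewer rate."""
    hours = DOCS / LINEAR_RATE
    return hours * REVIEWER_COST

def pure_tar_cost():
    """Cost of the senior seed review plus the TAR fee."""
    seed_docs = DOCS * SEED_PERCENT // 100
    hours = math.ceil(seed_docs / SEED_RATE)  # assumption: round up to whole hours
    return hours * SENIOR_COST + TAR_FEE_PER_GB * DATA_GB

saving = linear_cost() - pure_tar_cost()
print(f"Linear review: GBP {linear_cost():,.0f}")
print(f"Pure TAR:      GBP {pure_tar_cost():,.0f}")
print(f"Saving:        GBP {saving:,.0f} ({saving / linear_cost():.1%})")
```

Changing SEED_RATE or SEED_PERCENT shows how sensitive the saving is to the seed review: on these assumptions, a senior reviewer who manages only 30 documents per hour turns the saving into a loss.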

Improved Accuracy?

The argument goes that, as a single well informed reviewer (or a small group of them) codes the seed documents of a TAR review, the quality is likely to be far higher because:

1.      the more senior reviewer(s) will know the issues in the case better than a document reviewer

2.      when the computer extrapolates the results it will not make the errors of inconsistency that human reviewers do.

Of course it is likely that humans will make mistakes and so using either method is unlikely to be error free.

So, given that errors are made using either process, is it fair to say that TAR is more accurate? From a pure numbers point of view the answer is probably yes (ie there will be fewer errors overall). However, it is the qualitative nature of the errors which could give rise to concern.

If a group of document reviewers review 100,000 documents then there will be errors spread across all classes of documents within the population, with documents within any given subset of the population being coded both as ‘relevant’ and as ‘not relevant’. The result of this is that, when it comes to a more detailed 2nd-pass review, at least some documents from within each subset will be seen (the ones coded as relevant will be picked up). When using TAR, the risk is that entire subsets will be incorrectly coded as irrelevant and so are more likely to be ignored at the 2nd-pass review stage.

Is TAR better?

So the situation at the moment would seem to be that, whilst TAR offers some potential cost savings and the opportunity for the review results to be of a higher quality, that opportunity is not without risks, principally:

1.      the seed review takes longer than expected, which will quickly wipe out any cost saving

2.      the seed reviewer makes an error which results in a relevant subset of documents being wholly ignored

3.      the ‘soft’ edges of the review are lost. This risk is harder to quantify: those soft edges traditionally help inform lawyers as to the content of a document population (not least through the questions asked by the review team) and subsequently help mould the way in which a case is argued in the later stages. This iterative process takes place as the review progresses and can be very helpful in some cases.

Ways forward

It seems to me that the reason that TAR was developed in the first place should not be forgotten. Quite apart from it being the natural extension to the ‘clustering’ and ‘concept searching’ that e-disclosure service providers have been offering for some time, the impetus for its development is likely to have been the very large disclosure reviews that occasionally take place in the US, with small armies of document reviewers (or contract attorneys) slaving away for months on end over millions of documents. When you have a very large number of reviewers reviewing a very large number of documents, the results are very likely to be, at best, inconsistent, and there are likely to be very clear advantages from using TAR in your review.

The situation in the UK is not so clear. Regulators give very precise and often complex instructions on the formation of keyword searches, and most parties to litigation these days are quite adept at tackling the issues of e-disclosure and at using CPR Part 31 and its practice directions (as well as the electronic documents questionnaire) to reduce the amount of data for review to a manageable level. As a result our review requirements are generally of a smaller scale. That makes the benefits of using TAR less clear whilst the risks associated with it remain the same.

However, I think that there is no reason why TAR and traditional review cannot be used in conjunction with one another, the ‘Assisted’ part of the TAR acronym being of particular importance. If TAR is used to present the documents to reviewers in an order which means that the ‘most relevant’ documents are reviewed first, there will be very obvious time (and cost) savings as well as a higher degree of accuracy as reviewers review ‘like’ documents alongside one another. As the review progresses and the ‘buckets’ of documents that are likely to be relevant are worked through, the reviewers will be left with the less-relevant documents, which should be reviewed at a very fast pace.
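As a minimal sketch of this prioritised ordering, assume that the TAR tool exposes a relevance score between 0 and 1 for each document. The document ids, the scores and the function name below are all hypothetical.

```python
# Hypothetical TAR output: document id -> estimated likelihood of relevance.
scores = {
    "DOC-001": 0.12,
    "DOC-002": 0.91,
    "DOC-003": 0.55,
    "DOC-004": 0.78,
}

def review_queue(scores):
    """Order document ids from most to least likely to be relevant."""
    return sorted(scores, key=scores.get, reverse=True)

queue = review_queue(scores)
print(queue)
```

Every document is still seen by a reviewer; TAR only changes the order in which the documents are presented.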

Additionally, the use of TAR as a quality control tool is a really interesting and very relevant area worth further exploration. If TAR is used to group documents into ‘like’ buckets which are then reviewed by a review team, it is straightforward to compare the results of the review against the TAR results to see which documents are being coded by the reviewers in a manner which is inconsistent with the TAR results. By doing this, documents which have been incorrectly coded, whether by TAR or by the reviewers, are easily identified, as are the reviewers who are consistently getting things wrong.
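The comparison described above might look something like the following Python sketch. The record layout, reviewer names and document ids are hypothetical, and a disagreement here only flags a document for a second look: it does not say whether TAR or the reviewer is the one in error.

```python
from collections import defaultdict

# Hypothetical records: (document id, reviewer, reviewer's coding, TAR's coding)
codings = [
    ("DOC-001", "reviewer_a", "relevant", "relevant"),
    ("DOC-002", "reviewer_a", "not relevant", "relevant"),
    ("DOC-003", "reviewer_b", "relevant", "relevant"),
    ("DOC-004", "reviewer_b", "not relevant", "not relevant"),
    ("DOC-005", "reviewer_a", "relevant", "not relevant"),
    ("DOC-006", "reviewer_b", "not relevant", "not relevant"),
]

def find_disagreements(records):
    """Document ids where the reviewer and TAR coded differently."""
    return [doc_id for doc_id, _, human, tar in records if human != tar]

def disagreement_rate_by_reviewer(records):
    """Share of each reviewer's documents that conflict with TAR."""
    totals = defaultdict(int)
    conflicts = defaultdict(int)
    for _, reviewer, human, tar in records:
        totals[reviewer] += 1
        if human != tar:
            conflicts[reviewer] += 1
    return {reviewer: conflicts[reviewer] / totals[reviewer] for reviewer in totals}

print(find_disagreements(codings))
print(disagreement_rate_by_reviewer(codings))
```

A reviewer with a persistently high disagreement rate is a candidate for retraining, whereas disagreements concentrated in one bucket of ‘like’ documents point instead to a problem with how TAR has coded that bucket.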


The unanswered question in all of this is how the senior lawyers, who thought that their days of document review were over, feel about coding the initial seed set of documents. How much of their time will be recoverable? What work were they not doing whilst they were completing the document review? And is it really an effective use of their time and their clients’ money?

It seems to me that a halfway house is more likely than the pure use of TAR, with no review team input, to produce a faster, higher quality review which lawyers and clients are comfortable with. In that halfway house TAR is seeded using the work product of a review team, the consistency and quality of the reviewers’ coding is then checked against the TAR results, and the overall speed of the review is increased by presenting like documents to reviewers together.

If the example used at the start of this piece is used and the review speed of the team of reviewers is increased to 70 documents per hour (over the course of the review) using TAR (from 50 without TAR), the saving, after allowing for the £4,000 cost of turning TAR on, would be approximately £24,500 over the linear review and £14,400 over the pure TAR review.
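For anyone who wants to test the sensitivity of this figure, the hybrid calculation can be sketched in Python as follows. It reuses the assumptions made earlier in this piece (100,000 documents, reviewers at £50 per hour, a linear review costing £100,000 and a pure TAR review costing £89,800). It also assumes, which the example does not state explicitly, that the hybrid review still bears the £4,000 cost of turning TAR on. The names are illustrative only.

```python
DOCS = 100_000           # documents in the review population
REVIEWER_COST = 50       # GBP per hour, document reviewer
TAR_FEE = 4_000          # GBP, assumed still payable in the hybrid approach
LINEAR_COST = 100_000    # GBP, linear review at 50 documents per hour
PURE_TAR_COST = 89_800   # GBP, senior seed review plus TAR fee

def hybrid_cost(docs_per_hour=70):
    """Cost of a full team review speeded up by TAR, plus the TAR fee."""
    hours = DOCS / docs_per_hour
    return hours * REVIEWER_COST + TAR_FEE

cost = hybrid_cost()
print(f"Hybrid review:        GBP {cost:,.0f}")
print(f"Saving over linear:   GBP {LINEAR_COST - cost:,.0f}")
print(f"Saving over pure TAR: GBP {PURE_TAR_COST - cost:,.0f}")
```

The result depends heavily on the speed-up actually achieved: on these assumptions, a review rate of 60 rather than 70 documents per hour cuts the saving over the linear review to under £13,000.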

Unarguably the use of TAR offers lawyers the ability to save time and money, but simply using it as a substitute for a ‘traditional’ review may not maximise the savings on offer (or keep the workforce happy). 

Through his firm i-Lit Limited (www.i-lit.co.uk), barrister turned technogeek, Mike Taylor, has advised law firms and their clients on e-disclosure best practice and strategy since 2006.