SCL Annual Lecture 2011: The Advent of Meaning-based Computing

March 9, 2011

This was an SCL Lecture by the leading light in a field that is itself leading edge. It was about petabytes of data and the possibility of an increasing role for lawyers in a tough and fast-changing regulatory environment. It covered Bayes’ Theorem, probability and the bridge between mathematics and reality. Dr Mike Lynch went on to cover legal hold, conceptual search and the impending requirement for big law firms to have direct access to big corporates’ databanks so as to provide insurance against PR debacles and criminal sanctions. In this avalanche of information, it is comforting to think that this can all be understood, to a large extent at least, by considering the technologies necessary to recognise a reference to a dog.

Although we had penguins, monkeys and elephants in the course of the lecture, if you focused just on dogs, hounds, curs and the like, you pretty much understood the technology. Consider one sentence from the lecture: ‘Snoopy is not as much a dog as a Dalmatian’. Grasp its full import and you are more than halfway there. That sentence encapsulates four of the basics: information is not always structured, indeed human-friendly information is rarely structured; perspective matters – what the key piece of information is depends on who you are and what you want; some information is more valuable than other information; and recognising context, which humans are usually good at, is essential if we are to distinguish the ‘good’ from the ‘not good’ (that too was illuminated in the lecture).

Stripped of a great deal of valuable information (the value having been conceptually determined by me), the message from Mike Lynch is that search technologies have been rubbish, that all the BG (Before Google) generation are much too easily pleased and that we have entered a phase where our carefully crafted keyword search is an insult to the intelligence of the computing power available. What is transforming search is the need to deal with human-friendly information and to find ways to cope with radical regulatory changes. This is forcing people like Mike Lynch to concentrate on the I in IT – not least because there is so much of it and it is ‘smeared’ across organisations in a random fashion. 

So short is the history of the technologies in which Mike Lynch’s company, Autonomy, specialises that he is able to reflect on almost the entire history – although even he does not go back to the Reverend Bayes of Bayes’ Theorem fame. He understands the limitations that we have all come to accept and has been a major player in the move towards finding innovative solutions. He explained how conceptual ‘thinking’ and perspective can be a part of search and, crucially, discovery. Since most information is unstructured (think of a telephone call, for instance), the traditional computing mind-set, which sees a need to impose structure where none exists, will drown in a sea of unstructured data; Boolean searches and even sophisticated linguistic search approaches will be found wanting. The conceptual search relies on understanding what is actually wanted, the true search requirement, and delivers that, using a series of mathematical tricks based on probability theories that eliminate the unwanted in a nanosecond. But once you take that innovation a little further, the computer can be enabled to make intelligent decisions on how to treat the information – the tantalizing suggestion was that it might single out an e-mail from the office moron that threatens to sink a company’s reputation or which amounts to a clear breach of compliance policies and then block it in real time.
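For readers who want to see how probability can stand in for keyword matching, here is a minimal sketch. It is emphatically not Autonomy's technology, which the lecture did not set out in detail; it is a textbook naive Bayes classifier, and the training snippets, labels and function names are all invented for illustration. It applies Bayes' Theorem to estimate how likely a piece of text is to be 'about dogs' from the words that tend to occur in dog-related text, so that a sentence mentioning a hound can score well without ever using the word 'dog'.

```python
import math
from collections import Counter

# Hypothetical, hand-made training snippets (not data from the lecture).
TRAINING = {
    "dog": [
        "the dog chased the ball and barked",
        "a hound barked at the postman",
        "the dalmatian puppy wagged its tail and barked",
    ],
    "other": [
        "the penguin slid across the ice",
        "the monkey climbed the tree",
        "the elephant drank at the river",
    ],
}


def train(training):
    """Count words per label and estimate prior probabilities."""
    counts = {
        label: Counter(word for doc in docs for word in doc.split())
        for label, docs in training.items()
    }
    n_docs = sum(len(docs) for docs in training.values())
    priors = {label: len(docs) / n_docs for label, docs in training.items()}
    vocab = {word for c in counts.values() for word in c}
    return counts, priors, vocab


def posterior(text, counts, priors, vocab):
    """Return P(label | text) via Bayes' Theorem with add-one smoothing."""
    log_scores = {}
    for label, word_counts in counts.items():
        total = sum(word_counts.values())
        log_p = math.log(priors[label])
        for word in text.split():
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            log_p += math.log((word_counts[word] + 1) / (total + len(vocab)))
        log_scores[label] = log_p
    # Normalise the log scores so the probabilities sum to 1.
    top = max(log_scores.values())
    unnormalised = {l: math.exp(s - top) for l, s in log_scores.items()}
    z = sum(unnormalised.values())
    return {l: v / z for l, v in unnormalised.items()}


counts, priors, vocab = train(TRAINING)
query = "a hound barked at the gate"  # note: the word "dog" never appears
result = posterior(query, counts, priors, vocab)
print(result)
```

A keyword search for 'dog' would miss the query sentence altogether; the probabilistic score rates it as more likely dog-related than not, because 'hound' and 'barked' were seen in dog-related text. A real system would need vastly more data and far more sophisticated modelling, but the principle of weighing evidence instead of matching strings is the same.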

What clearly focused the attention of those lawyers in the audience, some of whom were struggling with the concepts and technologies that had been outlined, was Dr Lynch’s predictions about what might in future be expected of law firms. Since legal hold simply does not work in a large corporate environment because of the weight and complexity of the information held, boards are looking to avoid the risk that they will be accused of deceit – when the reality is that it is likely to be cock-up, not cover-up. They will want law firms to have direct access to their information repositories because they need fast legal advice. Law firms may have to become IT experts as well as legal experts if they are to supply what is needed. Basically, clients want to outsource risk and will want law firms to put their own reputations on the line with an early case assessment of complex issues. And this is a new meaning of early case assessment – the deadline can be 48 hours, not 48 days.

For this to work in the new environment, where the ability to locate crucial data amid the miasma of information of all kinds is essential, new techniques are vital. We heard about clustering techniques to make things like side-letters stand out, record deletion management so as to minimize volume (and potential embarrassment), back-filling by computer so that a quick assessment can be made of what a regulator has seized and is focusing on, visualisation tools that can find ‘air gaps’ – the sort of holes that set investigators searching avidly – and techniques for dealing with information in every language or that can compress thousands of hours of tape to an hour of relevant material.

Dr Mike Lynch is a compelling speaker. He has no need for histrionics; he is compelling simply because he is a master of his subject. His lecture was generously sprinkled with amusing and memorable illustrations so that his complex message was always within even a non-technologist’s grasp (without any of its meaning being debased). He has certainly sparked me into further enquiry and I suspect that a high proportion of his audience went away better informed yet full of a thirst for more answers.

