AustLII, BAILII and more

April 30, 2000

Andrew Mowbray, the Co-Director of AustLII, is an Associate Professor at the University of Technology, Sydney. He spoke to Laurence Eastham, the magazine’s Co-ordinating Editor, in London on 10 March.


LE: How did you get into the computing side of law?


AM: I did computing science before I did law and so I came into it from that perspective. In the early days, we were doing expert systems research and that’s what started me in an academic career in law.


LE: But with a degree in computer science you could have gone off into the market place and made your fortune in computing science or in computers.


AM: That sort of thing did not interest me very much. I think the combination of computing science and law has always been something that has been good for me – because there have been so few people that have this expertise. Very often people bringing together different sorts of expertise from disparate areas produce some interesting results.


LE: How did your association with Graham Greenleaf begin?


AM: Originally I met Graham in about 1984 when he was teaching at the University of New South Wales and I was a student.


I built an expert system shell called ‘LES’ which was originally part of the assessment for one of his courses and we went on from there. We have worked in partnership for 16 years. Graham is the writer and I do the technical things normally.


LE: How did the idea for AustLII arise?


AM: I think we had this thought a few times before we actually did AustLII. By the time we got to 1995 the Web had happened and it was possible at that point to do things in a way that hadn’t been possible before, although we had actually been working with hypertext systems and free text systems (but all things that we had written ourselves) prior to that point. It was just this crazy idea to put in a grant application to get some money from the Australian Research Council to set up a legal research infrastructure facility – and they gave us the money and at that point I guess we built the system. We put in the application for the grant in 1994 but didn’t start until 1995.


LE: Were you surprised yourself by how quickly it worked?


AM: Yes. I think that originally we were sceptical about whether or not AustLII would fly as a concept. We had no data, we had no people other than ourselves and it was the sort of thing that was on a very tight time-frame – we had a year to spend the money. The grant was for $110,000 plus university contributions of about $25,000 each I think, so we had $160,000 for the year – and by the end of the year we had to have something going that could be sustained. A lot of AustLII has been built from a sense of urgency – it has been something from day one where we had to just get it to go and we had a hell of a lot of luck. Timing was crucial: the Attorney-General’s department had for a long time run a system called Scale (using the old status software which you might recall from Eurolex days). Basically they had a very good set of data but the software was so user-hostile that nobody could really use it outside of some of the government departments. So there was a very rich source of data in beautifully consistent format that we could access once we had permission. There was a little bit of ‘crash and crash through’ at the beginning but, with luck and help from the Law Foundation (and particularly Vince Bruce, a Supreme Court judge in New South Wales, who got us contacts and so on in other courts), by the end of the year we had something like 14 databases up and a substantial user base and the system was well and truly established.


LE: Was your aim always as ambitious as it is now?


AM: I’ve been teaching legal research to students since I started in 1986 and Graham probably a bit longer than that. We had basically been teaching people how to use commercial systems, which in their own right, whilst they were free at university, were things that I think students and staff could get a lot out of. The reality was that once the students got into the business world or into practice, the systems were unaffordable and the levels of usage for those systems were extraordinarily low. In fact there seemed to be a very positive decision on the part of the electronic data providers in those days to go for low usage and very high prices, which from the public interest point of view is not a good thing. So it seemed to us, out of a sense of frustration more than anything, that it would be nice to have something that was generally available to anybody who wanted to access it. I think that Graham and I both had a strong feeling that public legal information, information which is produced by courts and governments, ought to be in the public domain and it ought to be something which ordinary people would be able to access.


LE: Is that something you went into the project thinking, or was that something that emerged in the course of doing it?


AM: I guess that we didn’t have an extremely clear vision when we started but it became very clear that once we had the system going we would have something that was going to be very big, in Australian terms anyway. The commercial world was demonstrating a basic inability to provide this sort of information in a way that was good for the people. Graham and I come from university environments, and the role of universities is not just to educate individuals but also to provide something back to the community to justify their overall existence.


LE: Something a lot of universities forget I think.


AM: If universities forget their public role then I don’t think they will be here in a short space of time. I imagine that in the UK it is the same as in Australia – that we are being increasingly pushed to become more and more business orientated and to generate more and more of our own funds. I think it is something that is very sad and leaves a social gap.


LE: So by 1995 you had established something successful. Since then have the biggest changes been in the width and the growth in content that you have had?


AM: Yes, but the initial 12 months was a period of pretty intense development. We needed a search engine so I sat down and I wrote the Sino search engine. We needed to have better approaches to massive mark-ups so I rewrote the mark-up software so that it was scalable to many gigabytes of text. But after that we went through a bit of a phase of consolidation and the size of the databases grew from 14 at the end of 1995 to about 80 now. We always said that we had an aim to achieve national coverage (which we define to mean the consolidated legislation from all Australian jurisdictions, that is all states and territories as well as the Commonwealth, and material from all of the major courts in all the states). We got to that point about half way through last year when some of the more difficult data providers finally relented and gave us their material to complete the collection. But, whilst that was happening, we had developed the Worldlaw service and spidering software. This allows people to access software not only on AustLII but off AustLII, so we have one of the most comprehensive centrally maintained databases and searchable distributed databases of targeted legal materials in the world. That was spurred on by project Dial, a project to develop a search facility primarily for legislation but also for other legal materials from developing Asian countries – we had quite significant funding from the Asian Development Bank for that. That actually took us off in a slightly new direction. There are always various bits and pieces of research – we have kept the expert systems research going and we are still trying to see if we can add value to what we are doing with inferencing technologies.


LE: What is inferencing technology?


AM: Prior to AustLII coming into being, we had something like the hypertext and free text facility that you are used to on the Web but it also would allow people to say ‘OK I’m interested in this piece of legislation, I’ve got a specific problem and I want to have an answer to it’. The inferencing engine, with a knowledge base built for the purpose, enables the user to click on a section and the system then starts asking questions ‘have you done this?’ or whatever, and eventually it will give some sort of answer. The problem with this sort of technology (and of artificial intelligence and expert systems technologies generally) has been historically that it is difficult to scale. You can do it if you target a very small problem. If you put a lot of work into it then you can get something which actually will work reasonably well. With AustLII we are dealing with a great deal of material and we need to have something which is scalable across the database. We needed to develop methodologies to allow us to build very large knowledge bases. It is a fairly optimistic goal but that is basically where the research has been focused. We are in the process at the moment of developing a new expert system shell to replace some of the older stuff that I have written.


LE: Are you contemplating people asking questions in natural language?


AM: The knowledge bases themselves are currently written in a quasi-natural language; they still have a structure to them but you can read them. The questions that are asked are natural language questions and the answers are ‘yes’, ‘no’, a name, a date – there are no chunks of text that we try and parse anything from. It’s a rule-based system essentially and we use a process of backward and forward chaining to draw inferences and to drive the process.


LE: The enquirer is not asking the question, the enquirer is answering questions to limit the field of enquiry?


AM: That’s right. The enquirer asks the initial question: ‘I want to know who is going to receive under this will’ and then the system will start asking the questions which will be driven by the underlying legislation or case law and the system will ultimately try to give them an answer which is explained. At the moment we haven’t got anything that I would call really production oriented; AustLII does a range of things from pure public interest work to fairly heavy research and that’s right at the research end of what we do.
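The question-and-answer dialogue described above can be illustrated with a minimal backward-chaining sketch. This is not AustLII's inferencing engine; the rules, the goal names and the `prove` function are all hypothetical, chosen only to show how a rule base can drive which yes/no questions the enquirer is asked.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical rule base: (conclusion, [conditions that must all hold]).
RULES: List[Tuple[str, List[str]]] = [
    ("spouse inherits", ["will is valid", "spouse survived the testator"]),
    ("will is valid", ["will is in writing", "will is signed and witnessed"]),
]


def prove(goal: str, ask: Callable[[str], bool], facts: Dict[str, bool]) -> bool:
    """Backward chaining: work from the goal back to facts the user supplies."""
    if goal in facts:
        return facts[goal]  # already known, so do not ask again
    bodies = [conds for concl, conds in RULES if concl == goal]
    if bodies:
        # The goal holds if every condition of at least one rule holds.
        result = any(all(prove(c, ask, facts) for c in conds) for conds in bodies)
    else:
        # No rule concludes this goal, so it becomes a question for the user.
        result = ask(goal)
    facts[goal] = result
    return result
```

Calling `prove("spouse inherits", ask, {})` with an `ask` function that prompts the user only raises the questions the rules actually need, and stops as soon as one answer settles the matter.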


LE: One of the things that I recall from Graham Greenleaf’s presentation in November was him saying, and I’m sure I’m paraphrasing, that if we can’t do it automatically, we don’t do it at all.


AM: Yes, that is still fairly true. There are 21 million links or so in the system and they are all put in automatically. That is the hypertext mark-up software which we were talking about previously.


LE: That is the thing that probably has astonished me. Was that there at the start or is that something that has developed?


AM: We had been working with hypertext systems for six years or more prior to AustLII. We had developed some automated approaches to inserting links based on textual regularities, an approach based on the sparse parsing mechanism. By ‘sparse parsing’ I mean you locate bits of a case which look interesting and then you try to parse them so as to identify a reference to a section of an Act or something that the program understands. It then tries to extract the relevant information from it and inserts its best estimate of where the links should go. The process that we have gone through with the development of the massively automated hypertext mark-up software is essentially a compromise between getting it right most of the time and trying to pick up as much as you can. It is all very well to pick up 100 million links on the system but if 50 per cent of them are wrong then it is not something that people will find acceptable, so you need to adjust it so that you are still applying a fundamentally heuristic approach, something based on rule of thumb, something which is in a sense designed to make errors – but the error rate has to be very small. It can’t be deterministic because the English language is too complex but if you can get it so that it is about 99% right and keep that balance, then you can have a lot of hypertext links without any manual editing.
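A rough sketch of the idea, assuming a single hand-written pattern and a hypothetical table of known Acts, might look like the following. It is not AustLII's mark-up software; the pattern, the URL layout and the `add_links` function are invented for illustration. It shows the trade-off described above: a reference is linked only when the Act is recognised, and is otherwise left alone rather than guessed at.

```python
import re

# Illustrative pattern for references such as "section 12 of the Evidence Act 1995".
SECTION_REF = re.compile(
    r"\b(?:[Ss]ection|s)\s+(?P<num>\d+[A-Z]*)\s+of\s+the\s+"
    r"(?P<act>(?:[A-Z][A-Za-z]*\s+)+Act\s+\d{4})"
)


def add_links(text: str, known_acts: dict) -> str:
    """Wrap recognised section references in hypertext links.

    known_acts maps an Act title to a (hypothetical) base URL.
    """
    def link(match: re.Match) -> str:
        act = " ".join(match.group("act").split())  # normalise whitespace
        base = known_acts.get(act)
        if base is None:
            return match.group(0)  # unknown Act: leave unlinked rather than guess
        num = match.group("num")
        return f'<a href="{base}/s{num}.html">{match.group(0)}</a>'

    return SECTION_REF.sub(link, text)
```

Loosening the pattern picks up more references at the cost of more wrong links, and tightening it does the reverse, which is the balance the interview describes.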


LE: So, with the knowledge that there are going to be some errors, you then don’t look for the errors, you just accept that there are going to be some.


AM: When you are dealing with such huge amounts of data, you have no choice unless you want a very big editorial contribution to the maintenance of the system. You’ve got to remember that a lot of the fundamental parts of AustLII were built in its first year when the technical team originally was three of us, and then only two. We were maintaining the boxes that it runs on, writing the software and everything else – so we didn’t have the time to sit there looking at it wondering if a link was right or not.


LE: The search engine for AustLII seems to have changed.


AM: The interface to it has changed but at the back-end is a program called Sino which was originally written in a very short space of time. It was designed to be small, elegant, fast and not to get too worried about the space overhead that it imposed. All programming is a design compromise between being fast and being space efficient. Over the years Sino has developed and we have needed to add a lot of extra functionality to it. Most recently we had to add the facility to have what we call ‘virtual concordances’ – to have a number of indexes appear as though they are one from the user perspective. This was because our index sizes were just getting bigger than the maximum file size limits on the system we were working with – we were working with 32-bit operating systems. There have been a lot of behind the scenes changes to Sino to support the database and give extra functionality. But the biggest change in the new interface has been to try and make the system more obvious from the user perspective and we went through quite a lot of user feedback. We literally got users in and sat them down and watched what they did. We had a formal set of tasks that we wanted them to go through and somebody sat with each person and watched. We adjusted the system until we got to a point where we thought it was not a bad compromise given that we had such a range of people using it, from well trained law librarians through to members of the public who might have only seen the site on the occasion that they visited it.
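The ‘virtual concordance’ idea, several physical indexes presented as one, can be sketched as follows. This is only an illustration of the general technique and makes no claim about how Sino implements it; the `VirtualConcordance` class and its in-memory dictionaries are assumptions standing in for index files on disk.

```python
from typing import Dict, List, Tuple


class VirtualConcordance:
    """Present several separate indexes as a single searchable index.

    Each part is (index, doc_count): index maps a term to the local document
    numbers containing it, and doc_count is how many documents that part holds.
    """

    def __init__(self, parts: List[Tuple[Dict[str, List[int]], int]]) -> None:
        self.parts: List[Tuple[int, Dict[str, List[int]]]] = []
        offset = 0
        for index, doc_count in parts:
            # Remember where this part's documents start in the global numbering.
            self.parts.append((offset, index))
            offset += doc_count

    def lookup(self, term: str) -> List[int]:
        """Return global document numbers for a term across all parts."""
        hits: List[int] = []
        for offset, index in self.parts:
            hits.extend(offset + doc for doc in index.get(term, []))
        return hits
```

Because each part stays below whatever size limit applies to a single file, the collection as a whole can keep growing while the user still sees one index.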


LE: So what do you see happening next from the AustLII perspective? Is it going to be a case of moving beyond your original jurisdictions?


AM: I think that there is an element of that certainly. I think that AustLII is fundamentally a centralised database; we have taken data from a number of database providers and we have put it into one central system. The Web in a sense allows you to work in a much more distributive model and ultimately it will be sensible for each data provider to provide their own material – but in a way which can be consistently presented and is seamlessly searchable. I don’t think we are at that stage yet; I think that the AustLII system, the centralised model that it works upon, is probably good for perhaps another ten years.


LE: So ten years plus on, you wouldn’t worry about an AustLII equivalent, you would just have everybody holding their material available on the Web in one way or another and you would send your search engine off to find what you want.


AM: That’s going to require the development and acceptance of standards in relation to interoperability which just aren’t there at the moment. At the moment there is a danger of total fragmentation.


LE: What drew you to becoming involved in online provision of British and Irish law?


AM: One factor which has been a constant reminder that we might usefully do something has been the regular reference to English material. It is frustrating for Australian law purposes to have references to leading English cases and then not be able to access them.


A second factor is the increasing international content of AustLII and our widening involvement with other jurisdictions.


The third factor was our perception that the UK was falling behind in this area – not only because of AustLII but by comparison with LEXUM in Canada and the Legal Information Institute at Cornell.


Above all of course we believe in the idea and we were simply keen to help. We had been over and given a presentation as long ago as 1997 and maintained links with SCL and other organisations and individuals. Back in 1997 we were aiming to raise expectations – anything that arose from that had to be better than what was available then. Our involvement has mushroomed following Graham Greenleaf’s visit in November.


LE: So you produced BAILII?


AM: I had begun to feel that people were getting hung up on detailed issues. We felt that by producing something we would at least demonstrate that it could be done and make people focus on what they liked and did not like about what we had done. It was not a case of pretending it’s that easy – but it was important to demonstrate that it could be done.


LE: I have to say that I am very impressed with the structure that you have come up with – from a relatively brief inspection it seems very sound. What do you see as the main challenges ahead in the UK?


AM: The biggest challenges are not really the technical challenges. The process of motivating and organising the data suppliers is vital. We need to change attitudes, especially in institutions which might see potential BAILII data as their own information with revenue potential. The courts have to take responsibility for recording judgments, for keeping copies, numbering judgments and so on.


LE: In Australia, I understand that you now actually receive consolidated amendment legislation direct from the draftsman.


AM: That’s right. That’s the level of co-operation we now have and which the people in Britain and Ireland need to aim for. Of course you also have enormous challenges in relation to creating backsets of information – cases and legislation from earlier years are obviously crucial to the success of the project. Some of this is proprietary commercial information.


LE: How long do you see your involvement, and AustLII’s involvement, in BAILII lasting?


AM: I would hope that we can hand over responsibility for the pilot to a locally based organisation in six months or a year. I don’t see that as the end of our involvement – we would always hope to be involved. But while there is no technical reason for us not to host the site, there is a need for a local institution which can organise and liaise, particularly with information providers. I expect that to be a university-based organisation because the economics of the project dictate that.


I do not want to be seen to be taking all the credit for BAILII – this has been a collaborative venture and both the UK Steering Group and the Irish Committee have been vital. John Mee from the University of Cork Law School has been tremendous in getting data both from Northern Irish and Irish sources and facilitating that. It is very interesting that the Irish part of the BAILII project has become so prominent. If you have a look at the site at the moment the Irish coverage is very good. We have all the major courts and we have legislation from both Northern Ireland and Ireland. So I think that’s been one of the spin-offs of the project. The steering committee’s role has been, and I think ought to be, to facilitate funding and access to information. It’s not the operational body that will drive BAILII and I don’t think it was ever designed to do that.


LE: Do you see a problem in persuading courts to spend money on setting up systems to supply information to BAILII or its equivalent?


AM: There may be some initial cost involved. For the system to be effective, courts do have to become the authoritative source for judgments. In Australia there were costs involved in starting new systems within courts but ultimately I don’t think it ended up costing them money – it may have saved them money in general administration. In particular, the High Court previously had to send photocopies and so on of judgments and transcripts, which was something that wasn’t producing any profit for them and it was an inconvenient thing for them to have to do – now everything is sitting there on the AustLII system and they can just point people to that. But yes, there will be some need to spend money on putting courts in a position where they can be responsible for their judgments.


LE: Do you think this is a transitional period? Looking at BAILII, as I was yesterday, you go through the judgments and you see the references that you would expect to see, to WLR or whatever. Obviously you don’t want to see those – what you want to see is a neutral reference. Was there a time in Australia when the courts were referring to commercial law reports?


AM: Yes, they still do that to some large extent. The acceptance and use of neutral citation took a year’s negotiation or thereabouts. It has now become fairly entrenched but the conventional published citations haven’t gone away, they simply appear as parallel citations. What we have managed to do is to make part of the standard that the vendor-neutral citation for a case is part of the name of the case, which means that whenever you refer to Smith v Jones, you should get Smith v Jones [1998] HCA 123 or whatever the decision number is. That also means that it comes first as it is part of the name of the case and any parallel citations will follow.


The conventional law reports are something which will not exist forever; they may have a life-span of perhaps a decade or two but I don’t think that in the future there will be very many series of conventionally published law reports.


LE: Does that mean the death of the court reporter then, although I appreciate that you have never had court reporters in the same way anyway.


AM: We don’t have many ex tempore judgments but there seem to be quite a number of those here in the UK. Headnotes are now for the most part, in Australia, being produced by the courts themselves; the judges do this and supply catch words in accordance with a standard. It really doesn’t leave a lot for a conventional publisher to do. I mean you can get into matters of whether or not there should be third-party proofreading and editing of material but to my way of thinking the judgment as written by the judge in the court, and released by the court with court authorised headnotes, citations and catch words, is a much more sensible thing to rely on than something which has been altered and made less official by a third-party commercial publisher.


LE: And how do you deal with the old cases – pre 1990 let’s say?


AM: You either continue to refer to them in the old way or alternatively you retrofit a vendor-neutral citation system. We haven’t done that at AustLII but we plan to, and it is something that I have done on BAILII. So, for example, look at the House of Lords’ database, which is one of the few databases on BAILII which is, as far as I am aware, complete in the sense that it has all the decisions. I have effectively imposed a system of vendor-neutral citation on the decisions by sorting them into date order and allocating citations accordingly. Similarly with paragraph numbering, most of the old decisions won’t have paragraph numbers, which are necessary if you are going to be able to cite electronically. One approach is to retrofit automatic paragraph numbers – the only difficulty with that is that you would need to make sure that all of the publishers of that material, certainly all the electronic publishers, agree on what those paragraph numbers are.
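The retrofitting step, sorting decisions into date order and allocating citations accordingly, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure actually used on BAILII: the `assign_citations` function, the per-year numbering and the court code passed in are all hypothetical.

```python
from datetime import date
from itertools import groupby
from typing import List, Tuple


def assign_citations(decisions: List[Tuple[str, date]], court_code: str) -> List[str]:
    """Allocate sequential per-year citation numbers in date order.

    decisions is a list of (case name, decision date) pairs. The result puts
    the citation directly after the case name, for example
    "Smith v Jones [1998] HCA 123".
    """
    ordered = sorted(decisions, key=lambda d: d[1])  # stable sort by date
    cited: List[str] = []
    for year, group in groupby(ordered, key=lambda d: d[1].year):
        # Numbering restarts at 1 for each year.
        for number, (name, _decided) in enumerate(group, start=1):
            cited.append(f"{name} [{year}] {court_code} {number}")
    return cited
```

The numbering is only reproducible if every publisher starts from the same complete, identically ordered list, which is the same agreement problem the interview raises for retrofitted paragraph numbers.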


LE: You were saying that the challenge now is to look for consistency and to get co-operation from the courts and the legislature. Have you had a chance to see the Statute Law Database?


AM: Yes, I saw that in 1997 and it looked fairly good at the time. I certainly got the impression that they knew what they were doing and that a lot of resources were being put into it. If that data really is there, it would make a lot of difference. It had a very sophisticated way of presenting the legislation – it would be a beautiful data source. Apart from anything else the maintenance of material in that format is the responsibility of the legislature; that is their public duty. It is all part of the provision of public legal information. If you are going to pass laws which fine people and if ignorance is no excuse then you need to make sure that they can actually read what the laws say.


In Australia, every jurisdiction, even the smallest – like Tasmania and the Australian Capital Territory, jurisdictions which have fewer than 300,000 people – is supplying maintained legislative material.


LE: And if they can do it!


AM: Absolutely. Apart from it just being a matter of common sense and public responsibility it is something which government departments themselves are reliant upon and which they need access to. It is very short-sighted to rely upon third-party commercial providers as you will end up paying lots of money to access your own material.


There is a balance to maintain between the basic task of maintaining and providing a vanilla version of primary material and when you start to add value, which is perhaps the responsibility of the commercial publisher. There is always going to be a role for the commercial publisher, to annotate, to write commentary, to produce something which provides a higher level of service which requires editorial input. But I get the impression in the UK at the moment that perhaps the balance between the responsibility of the data providers and publishers could be shifted somewhat. At the end of the day it makes the lives of everybody much easier. If you are a parliamentary counsel producing legislation, you really do need to track when amendments have happened and you really do need to be able to produce for the parliament versions of legislation at a point in time. I think if you had a careful look at all the work and the cost that would go into being able to perform basic functions, both directly in respect of the parliament and in respect of all of the other government departments and so on, you very quickly build a case for maintaining a basic sort of legislative database along the lines of the Statute Law Database.


LE: I suppose the last question that we should close on is: are you feeling optimistic?


AM: Yes. The BAILII project looks remarkably good. It is much more ambitious than I expected when I started the project. I thought I would put up two or three databases as a ‘look and feel’ exercise. Because of a lot of goodwill from some of the people on the ground in sending me data, e-mails, zip files and CD-Roms – things have been coming in in every shape and form – it has got to the point now where I think it is a genuinely useful service which not only demonstrates an idea but is a going concern. So I would be very surprised and disappointed if we can’t really get to the point where there is something as good as or better than AustLII happening in the UK and Ireland. It is just a matter of getting the data flows right and getting somebody locally who has the enthusiasm and the wherewithal to be able to do it – a bit of drive and a lot of impatience. We certainly look forward to a future of reciprocal cooperation – just as we have with the Canadians and with the Cornell site. There is a lot of exchange of ideas and to some extent bits of software; that creates a cooperative environment where really everybody wins at the end of the day, in the public interest.


LE: Andrew, thank you very much for your time and for the insights you have given us.