False citations: AI and ‘hallucination’

February 26, 2024

Adrian Aronsson-Storrier and Helen Hart look at a recent decision involving fabricated tax cases

SCL has previously reported on lawyers in the USA using AI to generate case references for court proceedings, only to discover that the AI had “hallucinated” cases which did not exist.

Hallucinations are one of the limitations of language models like ChatGPT, where the model occasionally generates outputs that are factually incorrect and do not correspond to information in the training data. These hallucinations occur because tools like ChatGPT have no human-like comprehension of reality – ChatGPT is merely ‘stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning’. The providers of these tools are aware of the risk of hallucination, and OpenAI warns ChatGPT users that, while it has safeguards in place, ‘the system may occasionally generate incorrect or misleading information’. As noted in a recent SCL Techlaw News Round-Up, guidance was recently issued for the UK judiciary about the uses and risks of generative AI in courts and tribunals, including the risk that AI “may make up fictitious cases, citations or quotes”. The SCL Round-Up has also noted guidance from other bodies such as the Bar Council and Solicitors Regulation Authority warning about this possibility.
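To make the mechanism concrete, the toy Python sketch below is an illustration only: its probability tables are entirely hypothetical and it does not describe how any real model is built. It simply assembles a citation-shaped string from probabilities over fragments it has “seen”, so the output looks like a citation, but nothing in the process checks whether the case actually exists.

```python
# Toy illustration of probabilistic text generation (hypothetical data only).
# A real large language model is vastly more sophisticated, but the key point
# is the same: text is assembled from likely continuations, with no lookup
# against any register of real cases.
import random

# Hypothetical tables of "which fragment tends to follow which", standing in
# for the statistical patterns a model learns from its training data.
NEXT_FRAGMENT = {
    "<start>": [("Smith v", 0.4), ("Jones v", 0.35), ("Green v", 0.25)],
    "Smith v": [("HMRC [2021]", 0.5), ("HMRC [2019]", 0.5)],
    "Jones v": [("HMRC [2020]", 0.6), ("HMRC [2018]", 0.4)],
    "Green v": [("HMRC [2022]", 1.0)],
    "HMRC [2021]": [("UKFTT 123 (TC)", 1.0)],
    "HMRC [2019]": [("UKFTT 456 (TC)", 1.0)],
    "HMRC [2020]": [("UKFTT 789 (TC)", 1.0)],
    "HMRC [2018]": [("UKFTT 321 (TC)", 1.0)],
    "HMRC [2022]": [("UKFTT 654 (TC)", 1.0)],
}


def generate_citation() -> str:
    """Stitch together a plausible-looking citation one fragment at a time."""
    fragments, current = [], "<start>"
    while current in NEXT_FRAGMENT:
        options, weights = zip(*NEXT_FRAGMENT[current])
        current = random.choices(options, weights=weights)[0]
        fragments.append(current)
    return " ".join(fragments)


if __name__ == "__main__":
    # Each output resembles a tribunal citation, but no step verifies that
    # the cited case exists in any court database.
    for _ in range(3):
        print(generate_citation())
```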

The case of Harber v Commissioners for His Majesty’s Revenue and Customs [2023] UKFTT 1007 (TC) is the first reported UK decision to have found that cases cited by a litigant were not genuine judgments, but had been created by an AI system such as ChatGPT. The case is a recent First-tier Tribunal Tax Chamber judgment, where a self-represented appellant taxpayer had provided the tribunal with the citations and summaries for nine First-tier Tribunal decisions which purportedly supported her position. While these citations and summaries had “some traits that [were] superficially consistent with actual judicial decisions”, they were ultimately found not to be genuine case citations.

In Harber, the respondent gave evidence that it had checked each of the cases provided by the self-represented appellant against the Tribunal’s records, searching not only for the appellant’s fictitious party names and the year given, but also for several years on either side, and was unable to find any such cases. The Tribunal itself also carried out checks for those cases in its own database and in other legal repositories. The litigant in person accepted that it was possible the cases had been generated by AI, stating that they had been provided to her by “a friend in a solicitor’s office” whom she had asked to assist with her appeal, and she had no alternative explanation as to why the cases could not be located in any available database of tribunal judgments.

The Tribunal ultimately concluded that the cited cases did not exist and had been generated by an AI system such as ChatGPT. The Tribunal accepted that the taxpayer was unaware that the AI cases were fabricated, and ultimately the incident made no difference to her unsuccessful appeal. However, the Tribunal made clear that citing invented judgments was not harmless, causing “the Tribunal and HMRC to waste time and public money, and this reduces the resources available to progress the cases of other court users who are waiting for their appeals to be determined”. The Tribunal also highlighted other problems caused by citing hallucinated judgments, endorsing comments made in the highly publicised US District Court decision Mata v Avianca, Case No. 22-cv-1461 (PKC) (S.D.N.Y.), where lawyers submitted non-existent judicial opinions with fake quotes and citations created by the AI tool ChatGPT. In Mata the US Court noted:

Many harms flow from the submission of fake opinions. The opposing party wastes time and money in exposing the deception. The Court’s time is taken from other important endeavors. The client may be deprived of arguments based on authentic judicial precedents. There is potential harm to the reputation of judges and courts whose names are falsely invoked as authors of the bogus opinions and to the reputation of a party attributed with fictional conduct. It promotes cynicism about the legal profession and the…judicial system. And a future litigant may be tempted to defy a judicial ruling by disingenuously claiming doubt about its authenticity.

Although the Harber case involved a litigant in person, it seems likely that the Tribunal would have taken a harder line with a lawyer representing a client, and there may well have been a referral to a regulator such as the SRA. The Guidance for Judicial Office Holders on AI notes that until the legal profession becomes more familiar with generative AI technologies, judges may need to remind individual lawyers of their professional obligation to ensure material they put before the court is accurate and appropriate. The judiciary may also require individual lawyers to confirm they have verified the accuracy of any research or case citations generated with the assistance of AI. It therefore remains vital for lawyers to carefully review the output of large language models when they are used for legal tasks, and to search reputable databases when looking for judicial authorities.

The UK judicial guidance on AI notes that chatbots such as ChatGPT may be the only source of advice or assistance some self-represented litigants receive, and that such litigants may lack the skills to independently verify legal information provided by AI chatbots and may not be aware that these tools are prone to error. It is therefore likely that lawyers and the UK courts will need to be alert to the increasing risk that hallucinated material may be relied upon by unrepresented litigants.

Helen Hart, Senior Practice Development Lawyer at Lewis Silkin
Adrian Aronsson-Storrier, Practice Development Lawyer at Lewis Silkin