Can Machine Learning and black box systems properly exercise contractual discretion?

January 23, 2023

The courts may impose restrictions on, and subsequently scrutinise, how a party exercises its discretion under a contract. Traditionally, only humans have been capable of exercising that discretion – the current law reflects this. But what if machine learning were used to assist with, or exercise, that discretion? Will those using machine learning systems to exercise contractual discretion be able to justify their decisions?

These are not questions for the future. Machine Learning (ML) is already being used to assist and make real-world decisions. However, the technology is quickly evolving and being used in a greater variety of use cases. Regulators and institutions, including the Financial Conduct Authority and Prudential Regulation Authority,1 and separately the Medicines and Healthcare products Regulatory Agency,2 are looking at whether existing regulations are suitable for the rapidly evolving technology and its expanding set of use cases. Similarly, existing laws may require a different approach for evolving technologies.

In this article we look at:

  • potential issues arising where contractual discretion exercised with the assistance of, or by, machine learning systems is scrutinised, and
  • what those procuring, developing and deploying ML systems will want to consider to address those challenges and justify their decisions.3

An implied duty to exercise contractual discretion properly

The courts may imply a contractual duty that, where a contract gives party A (usually, but not always, the party in the stronger negotiating position) the discretion to make a decision that affects both party A and party B, party A must exercise that discretion rationally.

If the duty is implied, the court will scrutinise the decision-making, looking at:

(a) the process – have “the right matters been taken into account”?

(b) the outcome – is “the result so outrageous that no reasonable decision-maker could have reached it”?4

This is sometimes referred to as the Braganza duty after a 2015 Supreme Court case.5 In that case, BP had the contractual discretion to determine how Mr Braganza had died and concluded that he had committed suicide, a conclusion which released BP from the obligation to pay his widow death-in-service benefits. It was held that BP’s decision-making process did not take the right matters into account and that the outcome was not reasonable in the circumstances, with the result that BP’s decision could not stand.

Knowing when the Braganza duty will be implied is not always straightforward. It requires, amongst other things, consideration of:

(i) the contractual wording;

(ii) the nature of the discretion being exercised;

(iii) whether the party exercising discretion has a conflict of interest; and

(iv) whether it is necessary to imply the term.

However, what is relevant to our questions is that the Braganza duty has been found in various types of contract and sectors, for example: employment;6 financial services;7 and professional services.8 Whether a decision was made entirely by a human or with computer assistance has not been a relevant factor in the case law to date. So is it possible that an organisation exercising contractual discretion with, or through, an ML system will have a Braganza duty implied and its decision-making scrutinised?

What is Machine Learning?

Machine learning is usually seen as a sub-set of artificial intelligence. The key points relevant to how the law concerning contractual discretion operates are:

  • ML allows a system to learn and improve from examples without all its instructions being explicitly programmed.
  • An ML system is trained to carry out a task by analysing large amounts of training data and building a model that it can use to process future data, extrapolating its knowledge to unfamiliar situations – a minimal sketch of this follows below.
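
To make those two points concrete, the short sketch below uses an invented example and assumes the scikit-learn Python library; it is an illustration only, not a description of any particular ML system. The model is never given explicit decision rules: it infers them from labelled examples and then applies what it has learned to a case it has not seen before.

```python
# A minimal, invented illustration of machine learning, assuming scikit-learn.
# The model learns from examples rather than from explicitly programmed rules.
from sklearn.linear_model import LogisticRegression

# Hypothetical training examples: [years_of_service, performance_score] -> bonus awarded (1) or not (0)
X_train = [[1, 0.2], [2, 0.4], [5, 0.9], [7, 0.8], [3, 0.3], [6, 0.7]]
y_train = [0, 0, 1, 1, 0, 1]

model = LogisticRegression()
model.fit(X_train, y_train)       # "learning" from examples; no explicit decision rules are programmed

print(model.predict([[4, 0.6]]))  # extrapolating to an input the model has never seen
```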

Issues with scrutinising decisions made by ML systems

Determining how an ML system actually made a decision may not be possible because it is a ‘black box’ – ‘a system […] that can be viewed in terms of its inputs and outputs, without any knowledge of its internal workings’.9 The risk is that a black box ML system lacks explainability – a phrase often interchanged with ‘interpretability’ and ‘intelligibility’ – the ability to present or explain an ML system’s decision-making process in terms that can be understood by humans.10

The black box problem should be expected with current approaches, as there is a trade-off between technical performance and the ability of humans to explain how an ML system produces its outputs: the greater the performance, the lower the explainability, and vice versa.11

Regulations or standards may require a specified balance between, or a minimum level of, technical performance and explainability, in particular for safety or liability reasons. For example, emerging regulation for automated vehicles points to specified data requirements for purposes including explainability of decision-making for legal purposes.

These are both drivers behind the field of Explainable AI (also known as XAI) – the ability to explain an ML system, or to present it, in terms understandable to a human.12 There are ways to increase the explainability of ML systems, including: ML models which are by their nature interpretable; decomposable ML systems, where the ML’s analysis is structured in stages and interpretability is prioritised for the steps that most influence the output; and tools such as proxy models, which provide a simplified approximation of a complex ML system.13
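
By way of illustration only, the sketch below (which assumes the scikit-learn Python library and uses invented data) shows the proxy-model idea in miniature: a shallow, human-readable decision tree is trained to mimic the outputs of a more complex model, producing a simplified approximation of how that model behaves.

```python
# Illustrative sketch only, assuming scikit-learn and invented data: a "proxy"
# (surrogate) model - an interpretable decision tree trained to mimic the
# outputs of a more complex, less explainable model.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=4, random_state=0)

# The complex, low-explainability model ("black box" for present purposes).
black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# A shallow, human-readable tree trained on the black box's *predictions*,
# giving a simplified approximation of how the complex model behaves.
proxy = DecisionTreeClassifier(max_depth=2, random_state=0)
proxy.fit(X, black_box.predict(X))

print(export_text(proxy))  # a rule-like summary a human can read and challenge
```

A decision-maker could then point to the proxy’s rules when explaining its decision-making, even where the underlying model remains opaque – though a proxy is only an approximation and may not faithfully capture every individual decision.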

However, it is foreseeable that developers, purchasers and/or users of an ML system may prioritise technical performance over explainability because they prioritise the accuracy of the outcome over understanding how it was achieved.14 For example, it is reasonable for a patient to be more concerned with the accuracy of a cancer diagnosis than with understanding how it was reached: whether or not their cancer diagnosis is accurate is of significant and direct use to their ability to make an informed decision about what to do next; being able to explain how the ML system arrived at the diagnosis does not have similar use for them.15 Also, it is foreseeable that in some circumstances XAI tools may only provide limited technical benefit and may not be a commercial option.

A court may question whether it was reasonable for an ML system developer or purchaser to have struck a particular balance between the ML system’s technical performance and explainability where it had a choice to do so (e.g. it was not prescribed in legislation or in contract). Relevant factors may include:

  • which stakeholders are involved and their respective experience, skills and resources available – factors already relevant to determining whether a Braganza duty should be implied;
  • the nature and magnitude of any potential harm or benefits resulting from the decision which require explanation;
  • the relationship between performance and explainability – what is the cost/benefit of increasing or decreasing either?

However, whether or not the choice of a black box ML system was reasonable does not change the issues faced when trying to scrutinise the decision made. The decision-maker will still risk being unable to meet the burden of proof upon it to explain why and how it made a decision and, if it cannot do so, the court concluding that the decision was made by simply “throwing darts at a dart board” (Hills v Niksun Inc [2016] EWCA Civ 115).

If the ML system is still a black box, could a court approach the problem by looking at how the ML system was intended to work, either at the point of design or of decision? Consider the Singaporean case of B2C2 Ltd v Quoine Pte Ltd [2019]. Whilst B2C2 concerned (amongst other things) the approach to unilateral mistake, rather than contractual discretion, the court had to consider ‘intention or knowledge […] underlying the mode of operation of a machine’ – that is, why it did what it did.

In B2C2 the court held:

‘in circumstances where it is necessary to assess the state of mind of a person in a case where acts of deterministic computer programs are in issue, regard should be had to the state of mind of the programmer of the software of that program at the time the relevant part of the program was written.’

However, potential issues remain with such an approach.

First, the nature of ML. Certain AI systems, such as the one in B2C2, are ‘deterministic’, meaning that when given a particular input they will produce the same output each time. However, other ML systems may be ‘non-deterministic’, meaning that the same inputs may not always result in the same outputs.
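
The difference can be shown with a trivial, invented example (an illustrative sketch only, not drawn from B2C2 or any real system): a deterministic rule always returns the same output for the same input, whereas a sampling-based, non-deterministic rule may not.

```python
# Illustrative sketch (invented example): the same input handled by a
# deterministic rule versus a non-deterministic, sampling-based one.
import random

def deterministic_decision(score):
    # The same input always yields the same output.
    return "approve" if score >= 0.5 else "reject"

def non_deterministic_decision(score):
    # The score is treated as a probability and sampled from, so repeated
    # runs on the same input can produce different outputs.
    return "approve" if random.random() < score else "reject"

print([deterministic_decision(0.6) for _ in range(3)])      # ['approve', 'approve', 'approve']
print([non_deterministic_decision(0.6) for _ in range(3)])  # may vary from run to run
```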

Also, ML systems do not require all instructions to be explicitly programmed. The ML system may ‘learn’ throughout its lifecycle, for example because it analyses more data and identifies patterns with greater precision. As a result, how the ML system was intended to work may diverge from how the ML system actually did work.

Second, determining a party’s intention may be difficult. Documentation may not be available or may be insufficient. Technical instructions may be of little value in determining how a decision was intended to be made, for example if the instructions have not been updated. Also, the ML system may not maintain sufficient logs of how it worked.

Further, it may not be possible for one person (as in B2C2), or a few people, to explain how an ML system was intended to work. Instead, there may be many people involved in the ML system lifecycle (including third parties) who each had differing degrees of influence over the ML system and can only explain separate parts of how it was supposed to work.

Difficulties in scrutinising the decision-making process may place greater emphasis instead on the outcome – was “the result so outrageous that no reasonable decision-maker could have reached it”?

However, ML systems may identify patterns which would otherwise not have been identified by humans. This may be because of the vast amounts of data that ML systems are able to analyse, which a human simply could not. That may result in the ML system identifying patterns which “no reasonable decision-maker” could have identified where that decision-maker is human, but which are reasonable for an ML system to identify.

These potential issues are important. Those procuring, developing and deploying ML systems to exercise contractual discretion will be concerned about being in a position where they cannot evidence or justify their decisions and the court concluding they, in effect, threw darts at a dart board.

What practical steps can organisations take to address the legal risks?

Organisations procuring, developing and deploying ML systems to assist with or exercise contractual discretion need to consider, amongst other things, the legal issue. For contractual discretion this can be summarised as:

  • were only relevant factors taken into consideration and given appropriate weight; and
  • was the outcome reasonable.

What that analysis looks like will depend on the circumstances of each case. But asking certain questions will help, including:

  • When are ML systems used to exercise contractual discretion?
  • What are the contractual (and other) requirements (explicit and implied) about how that discretion is exercised?
  • What evidence is available as to how decisions should be made? For example, what technical documentation, impact assessments, risk management and governance reports are available?
  • What evidence is available as to how decisions are actually made? Did the ML system produce event logs (see the sketch after this list)?
  • Who is in a position to explain how decisions should be and are made? Are any of those third parties and are they required to co-operate, if needed?
  • Are there factors in favour of choosing/designing an ML system, and/or using Explainable AI tools, which can help the decision-maker explain their decision?
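
By way of illustration only, the sketch below (with hypothetical field names and file paths) shows one way an organisation might record an event log for each automated decision, so that the inputs, model version and output can be evidenced later.

```python
# Illustrative sketch only, using hypothetical field names: a simple audit log
# recording each automated decision so it can be evidenced and explained later.
import datetime
import json
import logging

logging.basicConfig(filename="ml_decisions.log", level=logging.INFO)

def log_decision(model_version, inputs, output, rationale=None):
    """Write a structured record of a single automated decision."""
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "model_version": model_version,  # which version of the model made the decision
        "inputs": inputs,                # the factors actually taken into account
        "output": output,                # the decision reached
        "rationale": rationale,          # e.g. output of an XAI tool, if available
    }
    logging.info(json.dumps(record))

# Hypothetical usage after an automated exercise of discretion:
log_decision("bonus-model-v1.3", {"years_of_service": 4, "performance_score": 0.6}, "approve")
```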

Many of these are questions organisations are already asking. This may be because of existing regulations requiring risk management, such as in financial services. Additionally, it may be part of preparations to comply with future AI regulations, such as the EU AI Act which, amongst other things, specifies a range of obligations for high-risk AI systems. In any event, these questions (and more) should be asked – and kept under review – where ML systems are being deployed, as part of good governance and risk management.

Tom Whittaker is a Senior Associate and solicitor advocate in Burges Salmon’s Dispute Resolution and Technology Teams.

Notes and references

1 https://www.bankofengland.co.uk/prudential-regulation/publication/2022/october/artificial-intelligence

2 https://www.gov.uk/government/publications/software-and-ai-as-a-medical-device-change-programme/software-and-ai-as-a-medical-device-change-programme-roadmap#wp-9-ai-rig-ai-rigour

3 Various regulations may also be applicable but they are not discussed here.

4 Braganza paragraph 24

5 Braganza v BP Shipping Limited and another [2015] UKSC 17

6 Braganza

7 UBS v Rose [2018] EWHC 3137 (Ch)

8 Watson v Watchfinder [2017] EWHC 1275 (Comm)

9 https://ico.org.uk/for-organisations-guide-to-data-protection/key-dp-themes/guidance-on-ai-and-data-protection/glossary/

10 see ‘POST Interpretable machine learning’

11 https://docs.aws.amazon.com/whitepapers/latest/model-explainability-aws-ai-ml/interpretability-versus-explainability.html. Also, see ‘POST Interpretable machine learning’, https://researchbriefings.files.parliament.uk/documents/POST-PN-0633/POST-PN-0633.pdf. Though see https://wired.co.uk/article/psychology-artificial-intelligence for a different view

12 https://storage.googleapis.com/cloud-ai-whitepapers/AI%20Explainability%20Whitepaper.pdf

13 see ‘POST Interpretable machine learning’

14 Assuming that there are no applicable laws or regulations which affect what is required. This is not to say that explainability is never required but instead that the balance between accuracy and explainability can vary depending on context and stakeholders.

15 In contrast to the patient example, a doctor or an NHS Trust may place increased importance on explainability to understand how their systems are working and to improve their other diagnostic processes.