Data Protection Impact Assessments meet AI: Smart questions for building compliant AI systems that use personal data

September 5, 2022

As the tech world continues to discuss the future of AI regulation, it is important to remember that there are already robust legal regimes that affect the development and launch of AI systems, including the General Data Protection Regulation. AI systems that fall within the GDPR’s scope must comply with its requirements – including the requirement to perform and document Data Protection Impact Assessments. Since many AI systems will use personal data at some point during their lifecycle, DPIAs are of utmost importance for organisations that use AI. DPIAs assess the risks that a processing activity poses to individuals (known as ‘data subjects’ for the purposes of the GDPR) and the measures organisations take to mitigate those risks. In practice, this analysis is more than just a compliance task: it provides key teams with an opportunity to collaborate and, ultimately, results in more ethical and robust AI systems. However, drafting DPIAs for AI systems is challenging, due both to the complexity of the technology and to the developing regulatory landscape. Further, a well-drafted DPIA will require the involvement of various teams (including technical, risk & compliance and legal teams), who will need to work closely to identify and mitigate the unique risks associated with the system – all whilst speaking different “languages” from an operations perspective.

Given these complexities, there is no copy-and-paste set of questions to ask when analysing AI systems. However, some key issues arise time and again. Asking the questions below can help trigger practical conversations with operational teams about the most crucial issues at the intersection of data protection and AI.

Have you identified your purpose and is the processing of personal data for this purpose lawful?

Every DPIA must clearly set out the purpose of processing and the lawful basis that the responsible party (known as the ‘controller’ under the GDPR) relies on to process personal data. After all, whilst the use of AI is not in itself unlawful under the GDPR, AI has been used for unlawful purposes and without an adequate lawful basis. Measures should also be put in place to ensure that the AI system is used only for that purpose and not for other, unrelated purposes (a risk sometimes referred to as “function creep”).

Whilst the purpose of the AI system is a commercial/operational question that needs to be answered, the lawful basis for processing personal data will also need consideration from a legal perspective. It is important to bear in mind that the lawful basis may vary depending on the stage of the AI lifecycle. For example, personal data used to train a model is likely to be processed on the basis of the controller’s or a third party’s legitimate interests. Conversely, the lawful basis for the use of the developed AI system may rest on entirely different legitimate interests, or an alternative ground may apply. Whichever lawful basis applies, it is important that its requirements are fully met before any personal data is processed.

Have you ensured that your use of personal data in the AI system is necessary and proportionate to the purpose?

In the data protection compliance world, just because ‘you can’ does not mean ‘you should’. In the context of AI systems, this is particularly relevant when selecting a training data set. For technical teams, there may be a temptation to collect and use everything that is available to train a model but, when personal data is involved, it is important to comply with the data minimisation requirement. This requirement means using only personal data that is adequate, relevant and limited to what is necessary to achieve the purpose.

A good way of implementing this principle in practice is to ask operational teams to identify, at the earliest stages of the lifecycle, which data attributes are strictly necessary for the AI system. Further attributes may be added later where you can demonstrate that they are necessary (for example, to reduce the false positive rate or to mitigate an identified source of bias). In addition, during the training phase you might consider whether you need personal data at all – could anonymous or synthetic data suffice until a later phase in the AI lifecycle?
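
To make the attribute-by-attribute approach concrete, the sketch below shows one way a technical team might enforce an agreed attribute list in code before any training takes place. It is a minimal illustration only: the column names, the pandas-based pipeline and the `minimise` helper are hypothetical assumptions, not a prescribed method.

```python
import pandas as pd

# Attributes the DPIA has documented as necessary for the model's purpose
# (hypothetical names, for illustration only).
APPROVED_ATTRIBUTES = ["account_age_days", "transaction_amount", "merchant_category"]

def minimise(raw: pd.DataFrame) -> pd.DataFrame:
    """Restrict a raw data extract to the approved attribute list before training."""
    unexpected = sorted(set(raw.columns) - set(APPROVED_ATTRIBUTES))
    if unexpected:
        # Surface extra personal data for review rather than silently using it.
        print(f"Dropping attributes not approved in the DPIA: {unexpected}")
    return raw[APPROVED_ATTRIBUTES].copy()
```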

Can you explain the logic behind your AI system’s decision-making capabilities?

The concept of ‘explainable AI’ (i.e. AI whose decisions can be understood by humans) has played a key role in the discussion of ethical AI systems for many years. In contrast, machine learning models built on sophisticated neural networks whose workings are unintelligible to humans are often referred to as ‘black boxes’ of data processing. As well as presenting an ethical conundrum, this can present a legal challenge. Controllers of AI systems must remain transparent about their data processing activities, fulfil rights requests from affected individuals and navigate potential adverse consequences resulting from their AI system’s output – achieving these requirements is challenging when you cannot clearly explain how the AI came to its conclusions.

In addition to being transparent with data subjects about the use of AI, you should be able to provide meaningful information about the logic involved in the AI system and should have identified the potential consequences for those affected by its output. Can you implement technical ‘explainability’ measures for your AI system (for example, building in an automatic logging process while the AI system is operating)?
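
By way of illustration, the sketch below shows what such an automatic logging step might look like in practice. The field names, the model interface and the version label are hypothetical assumptions rather than a fixed design.

```python
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
decision_log = logging.getLogger("ai_decisions")

def predict_and_log(model, features: dict):
    """Run the model and record what it saw and what it decided,
    so the decision can be explained and reviewed later."""
    output = model(features)
    decision_log.info(json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "inputs": features,        # the data the decision was based on
        "output": output,          # the system's decision or score
        "model_version": "v1.0",   # hypothetical version identifier
    }))
    return output
```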

Wherever possible, AI systems should be used in partnership with humans who understand the wider context of the processing activity. Where your AI system will be used by others, you may consider publishing instructions for use so that users can interpret the output and use it appropriately (whether as part of general human oversight or as part of an appeal process).

Have you critically identified and mitigated against the risk of bias?

It is well known by now that societal bias is reflected in real-world data. Even with the best intentions (following the goal of ‘AI for good’), if you have not critically considered your training data you may end up with an AI system that replicates inherent biases. The same challenge applies to models that continue to learn from outputs which are later fed back in as input for further operations. A biased AI system will not fulfil the ‘fairness’ principle set out in the GDPR.

Technical teams should critically analyse their input data within the context of the intended purpose of the AI system. This analysis should include checking that the data is representative and free from error, and taking any opportunity to reduce bias. Consider whether data validation techniques can be applied to challenge the output, and review test results critically. Importantly, technical teams should have the room to adjust, amend and test the AI system, without pressure to launch, until they are confident that any bias risk is sufficiently mitigated.

Once the AI system is in service, you should consider setting up regular audits to check for bias creep. In addition, it is recommended to implement safety features that allow the user to stop the AI system and investigate where questions over bias arise.
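
As a rough illustration of what a recurring audit might check, the sketch below compares positive-outcome rates across groups of a protected attribute. The column names and tolerance threshold are hypothetical, and real audits would draw on a wider set of fairness metrics.

```python
import pandas as pd

def selection_rates(results: pd.DataFrame, group_col: str, outcome_col: str) -> pd.Series:
    """Positive-outcome rate per group (e.g. approval rate by age band)."""
    return results.groupby(group_col)[outcome_col].mean()

def flag_disparity(rates: pd.Series, max_ratio: float = 1.25) -> bool:
    """Flag for investigation if the best-treated group's rate exceeds the
    worst-treated group's by more than the chosen tolerance."""
    return (rates.max() / rates.min()) > max_ratio

# Example usage (hypothetical column names):
#   audit = selection_rates(decisions_df, "age_band", "approved")
#   if flag_disparity(audit): escalate for human review
```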

How have you ensured the accuracy of your AI system?

AI systems cannot be accurate 100% of the time. There are often logical technical reasons for this (for example, the risk of overfitting), but technical teams should actively look for opportunities to reduce the rate of false positives and false negatives wherever possible. You should discuss this process with the technical teams, understand the likelihood and potential impact of errors, and identify what mitigating measures are necessary to avoid notable risks.

Organisational measures can also help to protect the rights of individuals, for example, by setting out the accuracy metrics of the AI system and sharing these with the users of the system so that they can consider the output in context.
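
The sketch below illustrates one way such metrics might be produced from labelled test results so that they can be shared with users. It assumes a simple binary classifier and is not tied to any particular framework.

```python
def error_rates(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """False positive and false negative rates for a binary classifier."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }

# These figures can be published alongside the system so that users can
# judge its output in context.
```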

Have you identified and mitigated against any vulnerabilities of your AI system?

AI systems are not infallible, and they are vulnerable to the adversarial techniques of those wishing to attack them. Providers of AI systems that use personal data should identify these vulnerabilities and put in place measures to mitigate or, ideally, prevent these risks altogether. The likelihood and impact of such vulnerabilities depend on the purpose and context of the AI system; your teams should identify and discuss any such vulnerabilities and look to implement appropriate solutions to address them. For example, having in place technical redundancy solutions (including back-ups and fail-safe plans) that can be called upon in a critical situation, or implementing measures to prevent and control for attacks (such as data poisoning or adversarial inputs).
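
One simple layer of such a defence might be a basic input sanity check, sketched below. The expected value ranges and field names are hypothetical and would in practice be derived from the profile of the training data.

```python
# Hypothetical ranges derived from the training data profile.
EXPECTED_RANGES = {
    "transaction_amount": (0.0, 50_000.0),
    "account_age_days": (0, 36_500),
}

def validate_input(features: dict) -> bool:
    """Reject records whose values fall outside the ranges seen in training,
    routing them to human review instead of the model."""
    for name, (low, high) in EXPECTED_RANGES.items():
        value = features.get(name)
        if value is None or not (low <= value <= high):
            return False
    return True
```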

Closing comments

Asking these key questions will help your team to decipher some of the most significant issues when combining AI with personal data. Just as importantly, documenting these topics in your DPIA from the outset of your AI’s lifecycle will ensure you are better positioned to handle enquiries and protect your business’s
reputation.

AI and personal data are not mutually exclusive – they can be used together in a way that meets regulatory requirements. We see that European data protection regulators recognise the value that AI systems offer organisations by extracting patterns and learning from data, but they have also raised concerns about the associated risks. The European Data Protection Board is beginning to issue guidance on specific areas of AI processing (most recently on the use of facial recognition technology in law enforcement, currently in its consultation phase). In addition, a number of regulators have issued detailed tools and opinions on the lawful use of AI systems for processing personal data (for example, the UK, Dutch, Italian and Spanish regulators). As we move closer towards regulation of AI, we expect to see more clarity and enforcement in this area.

Francesca Pole

Francesca Pole is a data protection and cybersecurity associate at DLA Piper. She is an active member of
DLA Piper’s Working Group on AI and the firm’s Global Technology Sector Group.

This article was first published on the DLA Piper website and is reproduced here with permission.