The spread of anti-vaccination misinformation on social media (and its implications for public health and the global fight against COVID-19) is a textbook example of how misinformation can have serious real-world effects, particularly while we tackle the virus. In advance of a webinar for SCL in September looking at causation in civil cases, a team of economists and data experts from FTI Consulting explain how social media data, artificial intelligence and traditional statistical analysis can be deployed – together – to examine important questions of causation arising from 'fake news'.
Readers will be aware that social media networks are rife with conspiracy theories and misinformation about the origins of the COVID-19 virus and treatments for the disease. Misinformation is not a new phenomenon. However, the rise of social media and changes in how people obtain and consume news have made misinformation much more infectious: fake news can now spread faster, wider and more freely than ever before. This so-called ‘infodemic’ is having significant – and in the current crisis, even dangerous – effects on public health. Outside of public health, concerted smear campaigns can cause permanent damage to individual and corporate reputations.
In this article, we explore these important issues using a recent ‘test case’: misinformation around the Measles, Mumps and Rubella (MMR) vaccine. We identify and measure misinformation on Twitter, quantify the causal effect it has in the real world, and identify what implications our findings have for legal and policy responses more generally – including in relation to the regulation and behaviour of social media companies, and to the determination of commercial disputes.
Misinformation is not a new phenomenon – especially when it comes to infectious disease, vaccination and public health. The roots of the anti-vaccination movement can be traced back to the 18th Century, when smallpox inoculations were outlawed after being blamed for a severe outbreak of the disease in Paris.1
Thankfully, vaccination techniques have since improved, and the public health case for widespread vaccination has become stronger, better understood and more widely accepted, resulting in global vaccination programmes and the effective eradication of certain infectious diseases.
However, the anti-vaccination movement persists and it is known to have a significant effect on public health. In 1998, for example, Dr Andrew Wakefield authored a study published in The Lancet, a well-regarded medical journal, claiming to identify a link between the Measles, Mumps and Rubella (MMR) vaccine and the onset of symptoms of autism. The Wakefield study was eventually debunked, found to be fraudulent and retracted entirely – but the damage was done. The study received widespread publicity at the time and was blamed for contributing to a sharp fall in MMR vaccination coverage rates and a resurgence of measles in Western countries.2
FIGURE 1: “THE COW-POCK-OR-THE WONDERFUL EFFECTS OF THE NEW INOCULATION!”, BY JAMES GILLRAY
This famous satirical cartoon, by British caricaturist James Gillray, pokes fun at contemporary anti-vaccination rumours about the side effects of vaccination against smallpox.
In more recent years, and despite significant pro-vaccine efforts by health practitioners and policy makers, the false link between the MMR vaccine and autism identified in the Wakefield study lives on and has, in fact, found a new lease of life through social media. Posts spreading misinformation continue to pervade online discussions to this day, fuelled by content created and shared by ordinary individuals, celebrities, politicians and organisations who, for various reasons, either wish to propagate the myth or are ignorant of its inaccuracy.
Is this just harmless discussion or does it have a real effect on public health? And what implications might this have for the global fight against the COVID-19 pandemic? In this article, we use the MMR vaccination and measles as a ‘test case’. We apply a combination of:
We find evidence that misinformation can be dangerous. In particular, the proliferation of anti-vaccination misinformation on social media has a material and ‘statistically significant’ causal effect: it has reduced vaccination coverage across England and Wales and contributed to the resurgence of the disease. Our analysis has three important implications:
We explain our approach, our results and their implications in the rest of this article.
How do we measure misinformation?
Before we can assess the effect of misinformation, we need to be able to measure it. That is no easy feat, for three main reasons:
We address these issues by focusing on the spread of misinformation on Twitter, because it is one of the most popular social networking platforms, and because Twitter makes its data publicly available for detailed analysis.3 We use artificial intelligence to sift through hundreds of thousands of tweets that relate to the MMR vaccine, and identify which ones contain misinformation. In particular, we use a technique called ‘supervised machine learning’, which entails: (1) performing a human review of a sample of tweets to identify instances of misinformation, (2) training an algorithm to identify such tweets in the broader population, (3) using this algorithm to classify them all and (4) using the classification to construct a statistical index that tracks both the amount of misinformation and its exposure over time – which is shown in Figure 2 below.4
We find that there was a general increase in misinformation over time, but with ‘surges’ around particular events – such as Donald Trump linking vaccination and autism during a Presidential candidate debate in late 2015 and Robert De Niro appearing on television to debate the vaccine-autism link in early 2016.
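To make the four steps above concrete, the following is a minimal sketch in Python. It is an illustration only and not the model used for the analysis: the tweets, the exposure figures and the period labels are invented, and a simple Naive Bayes classifier stands in for whatever algorithm was actually trained. The function names (train, predict, misinformation_index) are likewise hypothetical.

```python
import math
from collections import Counter, defaultdict

# Step 1: a hand-labelled sample (invented toy tweets; 1 = misinformation).
LABELLED = [
    ("mmr vaccine causes autism wake up", 1),
    ("vaccines cause autism the study proved it", 1),
    ("doctors hide the truth mmr causes autism", 1),
    ("book your child's mmr vaccine at the clinic", 0),
    ("measles outbreak reported mmr vaccine is safe", 0),
    ("study finds no link between mmr and autism", 0),
]

def tokens(text):
    return text.lower().split()

def train(samples):
    """Step 2: fit a multinomial Naive Bayes model (assumed stand-in algorithm)."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in samples:
        class_counts[label] += 1
        word_counts[label].update(tokens(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab

def predict(model, text):
    """Step 3: return 1 if the tweet is more likely misinformation, else 0."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)  # add-one smoothing
        for word in tokens(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return 1 if scores[1] > scores[0] else 0

def misinformation_index(model, tweets):
    """Step 4: per period, sum the exposure of tweets classified as misinformation."""
    index = defaultdict(int)
    for period, text, exposure in tweets:
        index[period] += exposure * predict(model, text)
    return dict(index)

model = train(LABELLED)

# Unlabelled tweets: (period, text, exposure). Exposure is a hypothetical
# measure of reach, such as the number of followers who could see the tweet.
UNLABELLED = [
    ("2015Q4", "the mmr vaccine causes autism", 500),
    ("2015Q4", "mmr vaccine clinic open today", 200),
    ("2016Q1", "they hide the truth vaccines cause autism", 900),
]
index = misinformation_index(model, UNLABELLED)
print(index)
```

Weighting each flagged tweet by its exposure, rather than simply counting tweets, is one way for an index to reflect both the amount of misinformation and how widely it was seen.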
How can we isolate the causal effect of misinformation on vaccination coverage?
‘Vaccination coverage’ is a measure of the proportion of individuals eligible for a vaccine who actually receive the vaccine. Coverage for the MMR vaccination in England and Wales has fallen in recent years (from around 94% in 2013 to almost 90% in 2019),5 and over the same period of time our misinformation index has increased. This suggests that there may be a relationship between the two, but it is well known that simple correlation does not necessarily imply causation.
To measure the causal effect, we need to account for ‘confounding factors’. Confounding factors are those factors (other than misinformation) which might also be related to vaccination coverage. Those effects, therefore, need to be disentangled and removed from the picture before we can attribute any causal effect to misinformation. In general, there are only two ways to measure causal effects:
(1) to conduct an experiment, akin to the randomised controlled trials used by clinicians to assess the effect of a new drug; or
(2) to perform a statistical analysis, using real-world data.
The former approach is the ‘gold standard’ – but it is hugely expensive, and often practically impossible.
Instead, we use a standard statistical methodology known as ‘multiple regression analysis’, which is capable of disentangling and removing the effect of multiple confounding factors, to leave the effect that we are interested in. In order to identify the relevant confounding factors, we call upon an extensive public health literature on the determinants of higher or lower rates of vaccination coverage, such as socio-economic factors that determine acceptance of vaccines (levels of income, employment, education and ethnicity etc.), and other policy and logistical factors that affect access to vaccines.6
How do we take confounding factors into account?
We collect detailed official data from the UK Office for National Statistics, Public Health England and Public Health Wales in relation to vaccination coverage and the most important confounding factors. We process and match all this data into a single dataset that covers approximately 160 separate geographical areas across England and Wales, with quarterly observations, over a period of 7 years, from 2012 to 2018. The data reveal complex relationships between vaccination coverage and the various factors. For example, Figures 3 – 5 show that vaccination coverage varies significantly over time and across England and Wales, that areas with greater proportions of the population from minority ethnic groups tend to have lower vaccination coverage rates and that areas with more highly educated populations tend to have lower vaccination coverage rates.
We use this data to construct a ‘multiple regression model’ that quantifies the relationship between the level of MMR vaccination coverage in each geographical area and time period (the ‘dependent variable’) on one hand, and the factors that drive vaccination coverage (the ‘explanatory variables’) on the other. The explanatory variables include our measure of anti-vaccination misinformation, and separately, variables representing the confounding factors.7
By measuring the effect of these confounding factors separately, the regression model is able to isolate and quantify the causal effect of anti-vaccination misinformation. We illustrate this in Figure 6 below.
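As a rough illustration of how a regression with ‘fixed effects’ separates the effect of misinformation from confounding factors, the sketch below estimates a two-variable model on invented panel data. Everything in it is an assumption made for the example: the two areas, the numbers, the single control variable (education), the choice to measure misinformation in doublings, and the helper names. It does not reproduce the actual model or data.

```python
from collections import defaultdict

def demean_by_area(rows, key):
    """Subtract each area's own average (the 'within' transformation).

    This removes anything that differs between areas but is constant
    over time, which is what area fixed effects capture.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        sums[row["area"]] += row[key]
        counts[row["area"]] += 1
    return [row[key] - sums[row["area"]] / counts[row["area"]] for row in rows]

def fixed_effects_ols(rows):
    """Regress coverage on misinformation and one control, with area fixed effects."""
    y = demean_by_area(rows, "coverage")
    x1 = demean_by_area(rows, "misinfo")
    x2 = demean_by_area(rows, "education")
    # Normal equations for two regressors, solved with Cramer's rule.
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    beta_misinfo = (s1y * s22 - s2y * s12) / det
    beta_education = (s2y * s11 - s1y * s12) / det
    return beta_misinfo, beta_education

# Invented data: two areas, four periods each.
AREA_EFFECTS = {"A": 95.0, "B": 90.0}  # unobserved, time-invariant differences
TOY = {
    # area: (misinformation in doublings, % of population with higher education)
    "A": ([0, 1, 2, 3], [30, 31, 31, 33]),
    "B": ([0, 1, 3, 4], [20, 20, 22, 21]),
}
rows = []
for area, (misinfo, education) in TOY.items():
    for m, e in zip(misinfo, education):
        # Coverage is generated with known effects so the estimate can be checked.
        coverage = AREA_EFFECTS[area] - 0.2 * m - 0.1 * e
        rows.append({"area": area, "misinfo": m, "education": e, "coverage": coverage})

beta_misinfo, beta_education = fixed_effects_ols(rows)
print(round(beta_misinfo, 3), round(beta_education, 3))
```

Because the toy data contain no random noise, the regression recovers the effects that were built in (a fall of 0.2 per unit of misinformation and 0.1 per unit of education), even though the two areas start from different baseline levels of coverage. With real data the estimates would carry statistical uncertainty.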
What do we find?
We find that the proliferation of anti-vaccination misinformation on social media has a ‘statistically significant’8 relationship with vaccination coverage, i.e. with parents’ decisions to vaccinate their children, and that this relationship is not explained by the various confounding factors. In other words, it appears to be a causal relationship.
In particular, we find that when our measure of misinformation increases by 100% (that is, it doubles), this causes vaccination coverage to fall by about 0.20 percentage points, on average. On the face of it, this seems like a small effect. However, it is magnified by the significant growth in misinformation over time: over the 5-year period from 2014-2018, misinformation increased by approximately 800%.9 Vaccination coverage fell by approximately 3 percentage points10 and our regression analysis suggests that over half of this fall may be due to misinformation.11
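The arithmetic behind the ‘over half’ figure can be reproduced in a few lines of Python, using only the numbers given in notes 10 and 11. The variable names are ours, and the calculation relies on the same assumption as note 11, namely that the estimated effect stays constant as misinformation grows.

```python
effect_per_100pct_pp = 0.205       # note 11: fall in coverage per 100% rise in misinformation
misinformation_growth_pct = 800    # approximate growth in misinformation, 2014 to 2018
coverage_start, coverage_end = 93.4, 90.4  # note 10: Q1 2014 and Q4 2018

# Fall in coverage attributed to misinformation, in percentage points.
fall_from_misinformation = (misinformation_growth_pct / 100) * effect_per_100pct_pp
# Total observed fall in coverage, in percentage points.
total_fall = coverage_start - coverage_end
share = fall_from_misinformation / total_fall

print(f"{fall_from_misinformation:.2f} pp of a {total_fall:.1f} pp fall ({share:.0%})")
# prints: 1.64 pp of a 3.0 pp fall (55%)
```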
The effect on public health
The link between vaccination coverage and public health is a matter of medicine and epidemiology. However, it is generally well understood that populations with lower rates of vaccination coverage are likely to experience more frequent and more widespread outbreaks of infectious disease. This applies to the MMR vaccine and measles. Our data suggests that, on average, a 1% decrease in vaccination coverage is associated with a 2% increase in the measles incidence rate.12 This finding is consistent with other estimates in the epidemiology literature.13
Our findings have a number of implications:
First, misinformation can affect human behaviour.
Discussion on social media can have a ‘real world’ impact on human behaviour, and those behavioural changes can lead to serious health problems. This raises important questions that are at the forefront of the current debate, for:
At the same time, the US Government is currently conducting a review of Section 230 of the Communications Decency Act, which is relevant to whether social media companies should be treated as publishers or distributors of content created by their users.16
The debate around such regulations will need to balance the potential benefits against the potential costs. For example, previous attempts to hold websites, such as YouTube, accountable for failing to remove content that violates certain rules have been criticised for risking the destruction of the free-form sharing that makes social networks viable.17
Second, social media has an important part to play in the global fight against the COVID-19 pandemic.
As at the date of this article, no vaccine has yet been fully developed and approved for COVID-19. However, if (and hopefully when) such a vaccine is discovered, tested, approved and produced in large enough volumes for population-wide vaccination to be feasible, achieving high levels of vaccination coverage quickly (ideally, in excess of the herd immunity threshold) will be essential in order to reduce the economic and social burden of the disease. Our analysis suggests that a flood of misinformation on social media in relation to this vaccine will have a serious effect on its acceptance, and ultimately on economic and social wellbeing. Using the techniques outlined here, social media users, companies and policy makers could start work now to anticipate and mitigate this impact. The largest social media platforms are already doing so. For example, Twitter recently updated its approach to address misinformation on its network – including for content relating to COVID-19 – and started to label or even remove posts altogether depending on the form of the misinformation and its propensity to cause harm.18
Facebook and Instagram are also making efforts to take down content with harmful misinformation, to reduce its distribution and to apply warning labels.19
Third, our analysis demonstrates how social media data, artificial intelligence and traditional statistical analysis can be deployed – together – to examine important questions of causation.
These techniques are relevant in a wide range of contexts. For example:
Misinformation can have real life consequences for individuals, businesses and public authorities: it is one of the most important, controversial and hotly debated topics in public discourse today. However, the debate – like many others – is sometimes devoid of facts. We believe that such debates can and should be enriched by clear and objective analysis, using rigorous and scientific techniques from economics, statistics and data science such as those demonstrated in this article.
NOTES & SOURCES
1 The anti-vaccination movement, Measles & Rubella Initiative; History of Anti-vaccination Movements, The College of Physicians of Philadelphia, 10 January 2018.
2 Measles cases spike globally due to gaps in vaccination coverage, World Health Organization, 29 November 2018.
4 We explain our approach to using various artificial intelligence techniques (including both supervised and unsupervised techniques) to measure misinformation in a separate article.
5 FTI Consulting analysis of public health data from Public Health England and Public Health Wales concerning the coverage ratio for the 2-year MMR1 vaccine. The MMR vaccine is delivered in two doses: the first dose (MMR1) at the age of 1, and the second dose (MMR2) at the age of 3 years and 4 months. The 2-year MMR1 coverage ratio is calculated as the number of children who turned two during that quarter and had received the vaccine, divided by the total number of children who turned two that quarter.
6 Dunn, A.G., Surian, D., Leask, J., Dey, A., Mandl, K.D., Coiera, E. (May 2017). Mapping information exposure on social media to explain differences in HPV vaccination coverage in the United States, Vaccine, volume 35, issue 23, pages 3033-3040.
7 The precise way in which confounding factors are taken into account varies between the factors. Certain factors (such as education and ethnicity) are taken into account explicitly, by including measures of those factors in the regression model (such as the proportion of population with A-Level education or above, or the proportion of the population from an ethnic minority group). Other factors (such as political views or levels of income) are taken into account implicitly, using ‘catch all’ variables called ‘fixed effects’. These fixed effects capture the effect of confounding factors that differ between geographical areas (political views and levels of income which vary significantly between, for example, central London and rural areas in the north of England), but do not vary over the period of time covered by our study.
8 It is common practice in regression analysis to perform a formal statistical test of whether the effect estimated by the regression model is: (a) a genuine effect or (b) the product of chance (i.e. there is in fact no effect at all). An effect is described as ‘statistically significant’ when the test indicates that it is unlikely to have arisen by chance.
9 FTI Consulting machine learning analysis of Twitter data.
10 The proportion of children receiving their first dose by age 2 fell from 93.4% in Q1 2014 to 90.4% in Q4 2018.
11 Our model suggests that for every 100% increase in misinformation, we would expect to see a 0.205 percentage point drop in vaccination coverage. Under the assumption that this effect remains constant, an 800% increase in misinformation therefore results in a (800 / 100) * 0.205 = 1.64 percentage point reduction in vaccination coverage.
12 Measles incidence refers to the number of reported measles cases divided by the total population in any given time period. Our data suggests that for every one percent reduction in the 2-year MMR1 vaccination coverage percentage, there is on average a two percent increase in the measles incidence rate.
13 For example, Hall, R. & Jolley, D. (July 2011) International Measles Incidence and Immunization Coverage, The Journal of Infectious Diseases, Volume 204, Supplemental Issue 1, page S161.
14 Tackling online disinformation, European Commission, last updated 13 September 2019.
15 Coronavirus: EU strengthens action to tackle disinformation, European Commission, 9 June 2020.
16 Donald Trump orders legal review targeting social media groups, Financial Times, 29 May 2020.
17 Article 13: UK will not implement EU copyright law, BBC, 24 January 2020. Critics pointed out that a copyright law making social media platforms responsible for user generated content could result in the companies having to pre-moderate content, preventing the real time sharing and the sheer volume of sharing that makes these networks so ubiquitous.
18 Updating our Approach to Misleading Information, Twitter, 11 May 2020.
19 Post by Mark Zuckerberg, CEO of Facebook, 16 April 2020. Instagram will begin blocking hashtags that return anti-vaccination misinformation, The Verge, 9 May 2019.
20 Coronavirus: Man jailed for 5G mast arson attack, BBC, 8 June 2020.
21 The causal effect of mobile phone technology on economic growth is well documented. See, for example, Calling across the Divide, The Economist, 10 March 2005, which makes reference to a detailed empirical study co-authored by Dr Meschi (one of the authors of the current article). A similar causal relationship may apply to the more modern 5G standard. See, for example, 5G conspiracy theories threaten the U.S. recovery, The Washington Post, 4 June 2020, which explains why investment in 5G is important to the US’s recovery from the current recession.
22 Colin Kaepernick explains why he sat during national anthem, National Football League, 27 August 2016.
23 Nike stock price reaches all-time high after Colin Kaepernick ad, CBS News, 14 September 2018.
24 We explain how these techniques are relevant to assessing legal liability and compensatory damages in a separate article.
Dr Meloria Meschi, Senior Managing Director, Economic and Financial Consulting, +44 20 3727 1362, email@example.com
David Eastwood, Senior Managing Director, Economic and Financial Consulting, +44 203 727 1292, firstname.lastname@example.org
Ravi Kanabar, Senior Director, Economic and Financial Consulting, +44 20 3727 1280, email@example.com