Richard Graham and Adam Lewington paint a Big Data landscape, and highlight the risks in the foreground and those looming on the horizon
We are now sharing more data on digital platforms each day than was previously available to the entire world. Such raw data supports business intelligence and analytics, as well as enterprise-wide solutions. This has paved the way for innovative technological platforms facilitating the collection, storage, aggregation and processing of data. Other types of data collected on mobile devices and other publicly available 'open data' can augment the underlying data and assist with our understanding of the combined dataset. The term 'big data' has been coined to cover our analysis of this data, and the product of this analysis is being used to develop marketing and business strategies, and steer changes to product development and customer experience. It is also relevant in all areas of our society, from business and education to healthcare, government and politics. Increasingly, big data is fuelling the growth in e-commerce and m-commerce, and contributing to the development and exploitation of successful digital strategies.
Big data refers to mass repositories of structured, unstructured or semi-structured data, the size of which is beyond the ability of standard database software to structure and analyse. Such data can be collected across multiple digital platforms, in multiple jurisdictions and in multiple languages. We have traditionally been presented with technological barriers to analysing and understanding this data effectively. However, the explosion in cloud computing, open source software and software interoperability has provided us with the opportunity to analyse this data, which includes data deriving from social media platforms such as LinkedIn, Twitter, Facebook, Pinterest and now Google+. The value of data should not be underestimated, particularly for the US$500 billion advertising industry. However, big data raises a number of legal issues, including those around intellectual property ownership and licensing, data protection and privacy, interception of communications, cyber security and data breach notification. In addition, there are regulatory issues for certain organisations exploiting big data and liability issues associated with our reliance on the product of the analytics.
The McKinsey Global Institute has projected that the amount of data being generated globally will grow by 40% each year. Indeed, the sheer volume of data being created is simply too much to be retained and much of it is currently being deleted. We are also living in a world where the 'Internet of Services' has become a reality, with networks pulling data from objects, including smart meters, automobiles and retail devices. Everything that is required to process and analyse big data can now be made available as a service on the Internet, including the processing software, the tools to develop the software and applications and the platforms (servers, storage and communications) upon which they are hosted. The next challenge is to fully understand how to interpret and analyse this data in a meaningful way.
Opportunities: Is it all about innovation and growth?
Much of the big data debate relates to innovation and growth. The explosion in online behavioural advertising is an example of the value of information analytics, and both Google and Facebook use their mass repositories of data to produce detailed, targeted analytics for marketing purposes. Healthcare providers have also been making use of their datasets, for example to enhance the development of new medicines or spot relationships between data that were otherwise hidden. MasterCard and Visa have also led the way in identifying a market for analysing credit card purchases to target advertisements and create profiles for individuals by pooling different datasets. We are also now witnessing enterprises harnessing the value of big data, which is leading to improvements in productivity, customer engagement and profitability.
The McKinsey Global Institute claims that the use of big data could be worth up to US$300 billion per annum to the US healthcare sector and €250 billion per annum to the European public sector administration, and could also give rise to a 60% increase in retailers' operating margins. Data has now become a significant asset, and many organisations are only just beginning to recognise its importance.
However, innovation and growth are just one of the consequences of big data. The year 2011 saw the 'Arab Spring' and riots in the UK. Valuable intelligence from social media was used to determine the hotspots of anger or criminal activity in both cases, and this allowed law enforcement agencies to allocate resources in the best way that they could. In the 2012 French presidential election, François Hollande was able to determine geographic views and moods using social media. Techniques such as sentiment analysis, knowledge mining and the aggregation of conversations into trends will play an increasingly important role in all aspects of our society.
Risks: Is it just about privacy?
Perhaps the highest profile big data disaster was the unfortunate incident involving Target. Target is the second largest discount retailer in the United States behind Walmart. The New York Times ran a story on how Target, by looking into buying patterns, was able to work out whether its customers were pregnant. This is a gold mine for any retailer, as a pregnant customer is much more likely to shop in bulk. Target undertakes advanced analytics on its customers' behaviour and an incident allegedly occurred where a teenage girl started receiving baby-related coupons at home following a change in her purchasing behaviour. Her father confronted Target about the inappropriateness of his daughter being sent such coupons, only to subsequently learn that his daughter was in fact pregnant. This case highlights the clear privacy risks associated with big data and offers a valuable lesson: the use of big data analytics still needs to be regulated by humans in the real world.
However, the big data revolution is not just about privacy issues. There are also a number of other risks to big data innovation, including issues arising out of the ownership and licensing of intellectual property rights, data protection, the interception of communications, information security and data breach notification, and regulation. There are also liability risks associated with relying on this big data in circumstances where inaccuracies could give rise to significant losses.
Perhaps the most interesting area for big data is intellectual property law. Any data analytics or data mining will often involve the wholesale copying of information or databases, all of which will be protected by intellectual property rights in relevant jurisdictions. Where data is not owned or licensed then the user will need to rely on an exception to copyright infringement to be able to use such data. This has given rise to a gathering storm between data owners on the one side and technology providers on the other, as complex arguments relating to ownership, licensing and exceptions to copyright are currently being rehearsed.
In the UK, the report on copyright reform prepared by Professor Ian Hargreaves made sweeping and controversial recommendations for changes to copyright law to make it fit the requirements of the digital economy. A key proposal was to permit non-commercial use of analytics as well as promoting at a European Union level an exception to support text mining and data analytics for commercial use. This proposal was met with widespread criticism, particularly from copyright owners. However, in December 2012, Vince Cable announced as part of the 'Modernising Copyright: a modern, robust and flexible framework' initiative, that non-commercial researchers will be allowed to use computers to study published research results and other data without copyright law interfering.
There are also complicated issues arising out of 'fair use' or 'fair dealing' of copyright work, and this is an area where technology is regularly in conflict with law. Linked to this is the question of whether there is any copyright infringement at all where arguments relating to the use of the copyright work being 'non-consumptive' or 'transformative' are relevant. Where such use is considered to be 'non-consumptive', the argument is that the data analytics does not seek to take advantage of the originality expended by the original copyright owner. Where such use is considered to be 'transformative' the argument is that the product of the data analytics does not itself create a derivative of the original but exhibits sufficient individualism to be classified as a distinct work in its own right. Accordingly, there should be no copyright infringement.
The battle around the licensing of newspaper headlines in the Meltwater cases (Newspaper Licensing Agency v Meltwater Holding BV EWCA Civ 890 and associated Copyright Tribunal decisions) highlights the importance for any organisation of ensuring that it complies with intellectual property law. In this case, the Newspaper Licensing Agency and a number of newspaper publishers successfully argued that Meltwater required a licence to offer a news monitoring service to clients, in circumstances where it used technology to scan newspaper websites for relevant content that it had no original licence to use.
As data protection and privacy continue to dominate global political agendas, the data protection risks associated with big data still remain a fundamental concern. In general, regulatory requirements, particularly in the EU, dictate that personal data must be processed for specified and lawful purposes and that the processing must be adequate, relevant and not excessive. These provisions can be distilled into the requirements for transparency and necessary justification for processing. The impact of these requirements on big data is significant, with data subjects being able to ask digital platforms to refrain from processing, or remove, their personal data in certain circumstances.
The European Commission published its proposed General Data Protection Regulation on 25 January 2012. The Regulation looks to strengthen the existing rights, such as the requirement to achieve transparency and obtain explicit consent, and introduces new rights and concepts, such as privacy by design and the right to be forgotten. All of these rights present a challenge and a cost for big data innovation - a large part of the success of big data has been its ability to analyse mass quantities of data collected over years. In particular, the success of big data has been about identifying unforeseen consequences. Any restrictions on the ability to collect or retain any of this data will present a challenge to big data. There are also further hurdles for big data providers to overcome, including the costs associated with data portability and dealing with the procedural constraints around the international transfer of data.
The privacy implications associated with data mining are becoming highly controversial. The lawfulness of data mining for commercial purposes has been explored in case law in the United States, where privacy rights came into direct conflict with the right to freedom of speech. Sorrell v IMS Health Inc (No 10-779) involved a dispute over a Vermont statute that effectively restricted the data mining of physician prescriber records for drug marketing purposes. The US Supreme Court struck down the statute on the basis that it infringed the First Amendment right to freedom of speech. The Court held that the statute was unlawful because it was selective in prohibiting the sale and use of personal data for marketing purposes. The ruling serves as an important milestone in the attempt by the courts to protect legitimate privacy rights (ie by prohibiting persuasive marketing) without suppressing free speech in the process (ie by prohibiting data mining for commercial purposes). The chances of this jurisprudence having any influence on European legislation or case law are slim.
Interception of Communications
A key concern with big data is the use that is made of the data, in particular by enforcement and intelligence agencies. Digital platforms provide evidence of communications between individuals (eg who contacted whom, when and where) and the content of the communication itself. There are strict rules on both the access to the underlying communications data and the interception of the communications. Indeed, in the UK, the Queen's Speech of 2012 announced a Draft Communications Data Bill that would maintain the ability of the law enforcement and intelligence agencies to access vital communications data under strict safeguards to protect the public.
Cybersecurity and Data Breach Notification
Information security relates to the availability, confidentiality and integrity of data. A fundamental cornerstone of any big data solution will be the requirement that the data is kept secure and protected against unauthorised or unlawful processing, and against accidental loss or destruction of, or damage to, the data. Weaknesses in information security have become a fundamental threat to the success of new cloud-based solutions and big data. The larger the amount of data that is harnessed within these solutions, the greater the losses arising out of the misappropriation of that data.
There is also a growing global movement to ensure that effective data breach notification regimes are in place where systems are breached. One of the biggest risks associated with big data arises when the underlying data is lost. The costs of data breach include both the direct cost of immediate investigation, response and notification, and the indirect and long-term costs of reputation damage and business interruption. There will also be costs to third parties whose identities and personal information have been appropriated and used to their financial detriment. The negligence of any entity in breach could lead to substantial litigation costs. Any organisation working with big data must therefore give serious consideration to identifying and managing these risks, both externally and internally.
Organisations will therefore need to keep track of the constantly evolving reporting requirements across jurisdictions in order to avoid falling foul of them. In Europe, the data breach notification laws are in their infancy. At present, only ISPs and telecommunications providers are legally required to notify relevant authorities and data subjects. However, this will change as the proposed data protection reform is implemented, and in February 2013 the European Commission announced plans for a Cyber Security Directive, which includes breach notification laws for critical infrastructure providers. Laws and regulations in most US states mandate notice of the breach to the affected individuals if they are resident in that state, and some states also require reporting to regulatory agencies or the relevant attorney-general. Often, vast numbers of individuals can be involved in a single breach, and large breaches usually involve residents of many jurisdictions.
There is a major difference between the EU regime and the US regime. In Europe, if a data breach occurs, the data controller will be required to notify all the data subjects regardless of where they are resident. The laws of the Member State in which the data controller is established will dictate the form and content of the notification. In the USA, the state laws protect the residents of that state and a data controller that suffers a breach must carefully review the requirements of each applicable jurisdiction of the data subjects to determine the obligations in that particular jurisdiction.
There will also be regulatory issues to address when undertaking big data analytics in regulated areas. The most obvious sector will be the financial services sector where price sensitive information and other core data is heavily regulated. Any big data analytics on trends could give rise to allegations of insider trading or other forms of market abuse. In addition, Solvency II will impose significant additional regulatory burdens on organisations operating in the insurance sector. A key element of Solvency II focuses on risk management and insurers are increasingly using data analytics to anticipate their enterprise risks and implement risk control procedures to mitigate potential losses. Any use of external data for these purposes will need to comply with the requirements of Solvency II.
The final significant risk with big data is the liability arising out of the reliance on that data, in particular in circumstances where the output is based on inaccurate or incomplete information. It has been suggested that the mere fact that big data contains an enormous amount of information does not make it a representative sample of the population.
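The point about representativeness can be illustrated with a short simulation. What follows is only a sketch under invented assumptions: the population size, the 30% platform share, the spending figures and the function name simulate are all hypothetical and are not drawn from any real dataset. It compares a very large dataset gathered from a single platform with a much smaller sample drawn at random from the whole population.

```python
import random


def simulate(seed=0):
    """Compare a large single-platform dataset with a small random sample.

    All figures are hypothetical and chosen purely for illustration.
    """
    rng = random.Random(seed)

    # Hypothetical population of 100,000 people, about 30% of whom use
    # a given digital platform. Assumption: platform users spend more
    # on average (120) than non-users (80).
    population = []
    for _ in range(100_000):
        on_platform = rng.random() < 0.30
        spend = rng.gauss(120.0 if on_platform else 80.0, 20.0)
        population.append((on_platform, spend))

    true_mean = sum(spend for _, spend in population) / len(population)

    # The "big" dataset: every platform user, but nobody else.
    big = [spend for on_platform, spend in population if on_platform]
    big_mean = sum(big) / len(big)

    # The small dataset: 500 people drawn at random from everyone.
    small = [spend for _, spend in rng.sample(population, 500)]
    small_mean = sum(small) / len(small)

    return true_mean, len(big), big_mean, len(small), small_mean


if __name__ == "__main__":
    true_mean, n_big, big_mean, n_small, small_mean = simulate()
    print(f"whole population average spend: {true_mean:.1f}")
    print(f"platform-only dataset ({n_big} records): {big_mean:.1f}")
    print(f"random sample ({n_small} records): {small_mean:.1f}")
```

Under these assumptions the platform-only dataset is many times larger than the random sample, yet its average sits well above the population average, because it only ever observes platform users. An organisation relying on the larger dataset alone would overstate spending for the population as a whole, which is the kind of inaccuracy that could give rise to liability.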
The Road Ahead
It is clear that big data presents an opportunity for all existing and developing sectors, including health, retail, government, manufacturing and organisations pioneering location-based services. As the amount of data being generated grows exponentially, so will the opportunities and the value we attribute to the underlying data. There will be many operational, technological and legal barriers to overcome before the full value of big data innovation can be captured. However, provided organisations exploit this opportunity in a transparent and open manner, there will be exciting developments ahead.
Richard Graham is a partner in the London office of Edwards Wildman, where he specialises in technology, media and telecommunications (TMT) and privacy matters.
Adam Lewington is currently a trainee solicitor at Edwards Wildman, with experience on IP/IT matters.
Big data: The next frontier for innovation, competition, and productivity, 2011
Letter from the President of the CNIL to Larry Page, CEO, Google Inc, dated 27 February 2012
How Companies Learn Your Secrets, Charles Duhigg, The New York Times Magazine, 16 February 2012
The Data Retention (EC Directive) Regulations 2009 in the UK
Regulation of Investigatory Powers Act 2000 in the UK