Peter Leonard challenges some accepted wisdom about the value of big data and how data driven businesses are regulated.
Much of today’s discussion as to reasons to regulate big data is misguided. Misuse and abuse of data about citizens and consumers clearly must be subject to sanctions that operate as effective disincentives to misconduct by data controllers. But there is not always a close correlation between the size of data holdings and the potential to cause harm to data subjects. Common errors include: over-estimation of the value of raw data, as distinct from the value of the ability and capability to link diverse data sets and thereby derive actionable insights; generalisation of conclusions about the data capabilities of global consumer data platforms to other large data driven businesses and shared data eco-systems; and over-concentration upon current tools of competition policy, rather than exploration of the possibilities for using a variety of incentives and regulatory tools to effect context specific rebalancing of data rights.
Raw data has little inherent value. Large quantities of data are often less valuable than small quantities of the right diversity of transformed and correlated data sets. Data value is derived not from what data is, but from the ability of an entity to transform, link and analyse that data to derive actionable insights, and to maintain practical control over it.
Exclusivity of an entity’s practical control of data can be qualified through regulatory action in a variety of ways, including value depleting interventions such as mandated rights of access to, or portability of, data held by that entity.
Sometimes data derives value not through direct application of that data, but through enabling testing and development of code for application on other data. So-called artificial intelligence (AI) didn’t beat grandmasters in chess and Go by being intelligent, but by playing games 24x7x365, generating ‘training data’ to inform machine learning (ML). In AI and ML applications, data may thereby enable code that in turn analyses other data, making that other data more valuable. Often a large volume of data of uneven quality can yield algorithms of substantial value, which may then make poor or narrow data sets more valuable. In short, data (through the intermediary of code) can transform the value of other data.
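The self-play mechanism described above can be illustrated with a deliberately tiny sketch: noughts-and-crosses rather than chess or Go, with random players. Everything here (function names, the labelling scheme) is an illustrative assumption, not a description of any particular AI system; the point is only that the game engine itself manufactures labelled ‘training data’ for a downstream learner.

```python
import random

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def play_random_game():
    """Play one noughts-and-crosses game between two random players.
    Returns the move history and the winner ('X', 'O' or None)."""
    board = [" "] * 9
    history = []
    for turn in range(9):
        player = "X" if turn % 2 == 0 else "O"
        move = random.choice([i for i, c in enumerate(board) if c == " "])
        history.append((tuple(board), player, move))  # state before the move
        board[move] = player
        if any(board[a] == board[b] == board[c] == player
               for a, b, c in WIN_LINES):
            return history, player
    return history, None  # draw

def generate_training_data(n_games):
    """Self-play loop: label every position with the eventual result for
    the player to move (+1 win, -1 loss, 0 draw)."""
    data = []
    for _ in range(n_games):
        history, winner = play_random_game()
        for state, player, move in history:
            outcome = 0 if winner is None else (1 if winner == player else -1)
            data.append((state, move, outcome))
    return data

data = generate_training_data(200)
```

A learner trained on `data` would be code enabled by data; applied to fresh game positions, it is the code that then makes that other data more valuable.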
Creating value in data
Valuation of so-called ‘data rich’ businesses is sometimes confused by a failure to distinguish between the quantity and range of data sets that a business holds, and the capabilities (or lack thereof) of a business to transform those data sets into actionable insights or other sustainable business advantage. Transformational methods, code and algorithms are often fungible across business sectors, with the result that data rich businesses concentrated within particular industry sectors may not achieve economies of scope of data analysis that are available to cross-sector service providers. Scarcity of human capital, and in particular experienced data scientists, means that much data that is captured today is not transformed and never achieves its potential value. Human capital remains the key investment in cleansing, transforming and linking data, in discovering useful correlations, and in creating and applying algorithms to data sets to derive actionable insights. Technology enables, but humans (still) create. What is more, humans are ambitious, fickle and moveable. A quality people culture will often be the key business differentiator of good data driven organisations.
To put it another way: the analogy commonly drawn between ‘control of data’ and ‘ownership of oil’ undervalues the value-adding contribution of the processes required to ‘refine’ data and create algorithms and code to power actionable insights for businesses. Good insights as outputs are only possible through, first, a good deal of hard work in creation of quality data inputs, and second, development, refinement, testing and deployment of robust algorithms that are the engine of transformation of data into insights. Creation of quality data inputs and robust algorithms is difficult and can be slow. This is one of the principal reasons why many of the more ambitious predictions as to roll-out of applications of artificial intelligence have proven incorrect.
Valuable business insights are often deployed in disrupted product or service sectors that are characterised by increasingly short product lifecycles, where returns on investment are highly uncertain. Markets for outputs of data are volatile and unpredictable. Refined (real) oil can be stockpiled, whereas much data is time sensitive and rapidly loses value. Actionable insights often have narrow application, a short shelf-life and require continuing innovation and reapplication. Oil is fungible across many industrial, transport and heating applications, and the movement from fossil fuels to alternative energy is still agonisingly slow. Oil markets may appear to be volatile, but the markets for outputs of data analysis are often substantially more unpredictable.
Further, you can own oil, but (generally) you can’t own data. The closest simulation of ‘real’ legal ownership of data that is available to a data controller is to ensure that ‘the service provider’s data’ (which the provider does not ‘own’ as ‘property’) remains defensibly protectable as trade secret and confidential information. But increasingly, data sets must be shared to some degree to yield value. Data sharing within multi-party data ecosystems is required to deliver almost all online services, particularly internet of things (IoT) applications, and also many offline supplied products and services. Many IoT services, and online platforms such as Amazon and Alibaba, require a complex supply-side data sharing eco-system of five or more data holding entities to enable delivery of a service to an end-user and billing for that service. A business to consumer IoT service may include a retail service provider, a data analytics service provider, a cloud data platform, a telecommunications network services provider, a billing services provider, a mobile app provider and an IoT device provider, all sharing data in a world today without settled industry standards as to data minimisation and data security. In other words, at least some sharing of data is required to deliver many services, while at the same time the service provider seeks to protect data value through imposition of safeguards and controls to ensure that ‘the service provider’s data’ (which it does not ‘own’ as ‘property’) remains defensibly trade secret and confidential. This is a difficult balancing act.
How should uses and applications of data be regulated?
Before we can determine whether particular uses and applications of data need to be regulated, we need to apply a nuanced understanding of data and good data governance.
Data can be infinitely reproduced and shared at effectively zero cost. Data does not derive its value through scarcity. Value in data is usually created through investment in ‘discoverability’: in collecting and transforming raw data to enhance capability to link data to other data and then explore the linked data sets for correlations and insights. Often in data analytics projects about 70-80% of the cost is cleansing and transforming raw data to make it discoverable: the high-end work of then analysing the transformed data is the smaller part of a program budget.
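The cost split described above can be made concrete with a minimal, hypothetical sketch. The record layouts, field names and cleaning rule below are invented for illustration; the point is proportion: most of the code is cleansing and transforming to make records ‘discoverable’ (linkable), while the analysis step at the end is a single line.

```python
# Two 'raw' records about the same customer, captured by different systems
# with inconsistent formatting. All names and fields are illustrative.
crm_records = [
    {"name": "  Jane DOE ", "email": "Jane.Doe@Example.com", "spend": 120},
]
support_records = [
    {"customer_email": "jane.doe@example.com\n", "tickets": 3},
]

def normalise_email(raw):
    """Cleansing step: strip whitespace and lower-case, so that records
    captured by different systems share a common linkage key."""
    return raw.strip().lower()

# Transformation: re-index each data set on the cleansed linkage key.
crm = {normalise_email(r["email"]): r for r in crm_records}
support = {normalise_email(r["customer_email"]): r for r in support_records}

# Analysis: join the transformed data sets and derive a combined view.
linked = {k: {**crm[k], **support[k]} for k in crm.keys() & support.keys()}
```

Even at toy scale, the join succeeds only because of the cleansing step; on the raw values, `"Jane.Doe@Example.com"` and `"jane.doe@example.com\n"` would never match.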
Discoverability may be created within a privacy protected data analytics environment. In many cases, substantial data value can be created and commercialised without particular individuals being or becoming identifiable. Through deployment of appropriate controls and safeguards, analysis of personal data need not be privacy invasive. Of course, it is easier to link disparate data sets by using personal identifiers than it is to deploy a properly isolated and safeguarded data analytics environment that uses only pseudonymised data linkage transactor keys. It is also easier to release outputs and insights without taking reliable steps to ensure that the outputs cannot be used to re-identify affected individuals. Good privacy management is exacting. The frameworks, tools and methodologies for good data governance are immature and are therefore not well understood. And good data handling on its own does not create good outputs. Executives of organisations often do not know how to evaluate the quality of their data scientist units and the reliability of data science outputs and insights. The term ‘data science’ carries, as the term management science once did, the enticing ring of exactitude. However, algorithms may be painstakingly derived and applied, but may be based on poor data, or simply misapplied in particular contexts. Often poor data practices are implemented inadvertently, or as a result of cutting corners, rather than through bad intent.
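A pseudonymised linkage key of the kind described above can be sketched as a keyed hash of a direct identifier: records about the same person still link on the same pseudonym, but the analytics environment never handles the raw identifier. This is a sketch under stated assumptions, not a production design; the salt value and record layouts are invented, and in a real deployment the hashing key would be held by a separate, trusted linkage unit rather than sitting alongside the data.

```python
import hashlib
import hmac

# Illustrative only: in practice this key is held by a trusted linkage
# unit, separate from the data analytics environment.
SECRET_KEY = b"held-by-a-trusted-linkage-unit"

def linkage_key(identifier: str) -> str:
    """Replace a direct identifier with a keyed-hash pseudonym.
    Deterministic, so the same person links across data sets, but not
    reversible without the secret key."""
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()

# Two data custodians each pseudonymise before contributing records.
hospital = {linkage_key("jane.doe@example.com"): {"admissions": 2}}
pharmacy = {linkage_key("jane.doe@example.com"): {"scripts": 5}}

# Linkage occurs on pseudonyms only; no direct identifier crosses over.
linked = {k: {**v, **pharmacy.get(k, {})} for k, v in hospital.items()}
```

The keyed hash (rather than a plain hash) matters: without the secret key, an outsider cannot rebuild the pseudonyms from a list of known email addresses and re-identify individuals.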
Most importantly, we need to recognise that most data about what humans think or do is generated through transactions involving those humans in circumstances where they no longer understand or control the data exhaust associated with those transactions. Most data is inherently transactional, but it is gathered from, or is about, transactions that take place in circumstances where many individuals making the transaction do not fully understand it, or even that a transaction has occurred. In any event, these transactions are between multiple parties, and so there is a bundle of rights and responsibilities attaching to them that can be reallocated or repackaged by regulatory intervention. Where citizens and consumers are unwilling or unknowing transactors, there is particular vulnerability to data uses that may be adverse to their interests. A simple example: I don’t choose to be observed by my very smart rental car, but I am. When I drive it out of the parking slot, I don’t reach for the vehicle manual to school up on the car’s data analytics capabilities. Often, I have no real opportunity to think about whether or not to give consent. Even when I am informed about particular data collections, life is too short for me to read and evaluate the terms: I do not knowingly and reflectively give consent to particular uses.
Protecting the rights of participants in multi-party data ecosystems
Should recalcitrant consumers (such as me) who don’t read all terms proffered to us be punished for our failure to engage with the torrent of privacy disclosures by organisations with whom we deal?
I don’t need more notice or more click-through consents.
I don’t expect, or need, regulators to force more responsibility on me.
But even if I don’t care about privacy, I might wish to join ranks with many millennials and demand to know who is doing what with the data they hold about me. Many millennials do not care about privacy or transparency as rights in themselves, but sense that value is being derived from data about them, that free services are great but no-cost may be less than fair value, and that they are not given enough information to force a meaningful negotiation over fair allocation of data value.
Many businesses are reluctant to initiate a discussion as to what is fair to consumers, because they can’t control that discussion, or they simply don’t want to give away value. Some early mover data platform businesses, including Facebook, captured the data high ground and since then have engaged in tactical retreats, giving away certain data value if and when required to mitigate particular crises in digital trust of consumers. Many other data driven organisations, such as some insurers and banks, are more willing to sacrifice short term data value in order to preserve longer term certainty and therefore sustainability for data value-adding investments. However, they are concerned that initiating a discussion with customers as to fair data exchange can lead to unpredictable and uncontrollable outcomes: explanations of many data applications and data value chains are devilishly tricky and can sound self-serving, or just plain spooky. Try explaining to sceptical citizens and consumer advocates how real time programmatic advertising does not require any disclosure of the identity of ad recipients, or explaining how audience segmentation value is allocated at points in the advertising and media supply chain. Most data applications have unique, but similarly complex, multi-party supply and fulfilment value chains.
Leaving aside the desire for demand-side transparency to reduce information asymmetry and to enable negotiation as to data value exchange, why should a consumer need to engage with a data collector as to whether a particular collection of data is a fair, proportionate and reasonable exchange for the benefits that the data collector provides to the data subject? More transparency may help a consumer advocate or regulator to make relevant assessments, but regulators should not force transparency on the pretext that citizens should then determine whether to change their behaviour. Regulators don’t require consumers to take responsibility for determining whether a consumer product is fit for purpose and safe when used for the product’s stated purpose, and unsuitable or unsafe when used for other purposes. Why should data driven services be any different?
In any event, I usually don’t know when an algorithm is being used in a way that may affect how an entity deals with me, particularly where the algorithm is fuelled by data which is not personally identifying (and therefore largely unregulated by most existing data privacy laws). I don’t want transparency and then responsibility to exercise a decision based upon evaluation of that transparency. Instead, I want accountability of the data controller, to ensure that the data controller responsibly and reliably does what is fair and reasonable. This may lead to a need to restrict data flows within a multi-entity data ecosystem, or to require opening up of data ecosystems to new data intermediaries. Of course, ‘fairness’ is a notoriously normative concept, which is why competition law seeks the exactitude of economic theory in evaluating effects on consumer welfare. Beneficence for most consumers may mean less than ‘fair’ treatment of a few, at least as those few perceive their treatment by others. It all turns on the particular context.
Critics of data driven businesses often rightly say that too many data businesses are not self-reflective about balancing their own and societal interests. Many businesses don’t stop to ask: just because I can use data in a particular way, should I? There is a risk that regulators will succumb to a similar temptation when considering regulation of business uses of data. Big data holdings of global data corporations look like clear candidates for competition regulation. Data driven businesses can’t assert legal protection against deprivation of ‘their property’ in data, because the bundle of rights and responsibilities of a data controller does not constitute property in data. Rebalancing is unusually easy to enable because most data is not legally ‘owned’, as ‘ownership’ is conventionally analysed in most jurisdictions. Legally recognised ‘property’ may be tangible (chairs, dogs and pencils) or intangible (software, creative writing, trade marks and patents). Data is none of these things: I don’t own personal data about what I think or do, and often I don’t even know when it is collected or used. Often a large component of intangible value is trade secret (confidential information). Trade secrets are not ‘property’ in most national legal systems and in most (if not all) national variants of generally accepted accounting principles. As a result, rights of protection of trade secrets more readily yield to regulatory interventions.
Of course, the market capitalisation of both ‘unicorns’ and ‘data giants’ demonstrates that public share markets and venture capitalists see value outside traditional classes of property. A single trade secret ‘asset’ can be worth millions, or billions, of dollars. Google emerged out of nowhere to dominate the search engine world by use of its trade secret algorithms. Google’s success today depends upon protecting the trade secret assets collectively described as the Google brand. Many trade secrets derive their value through closely guarded central control: the recipe for Coke, the Google search ranking algorithms, and so on. These trade secret ‘assets’ may not appear in the balance sheet as assets, but derive value through being closely guarded: it is this management that creates scarcity.
Regulators have a broad range of available regulatory tools that may be used to affect activities of data driven businesses. Available tools include enforcement of data protection, consumer protection and competition (antitrust) laws, the new ‘consumer data right’, and facilitation of enforcement by individuals of rights of access to, or portability of, transactional data (whether or not personal information about them) as held by data custodians. These tools should be selectively and surgically used to address particular contexts of data use by businesses that warrant regulatory intervention. But protection of consumers, of individual’s rights of privacy, and of fair competition between entities that operate in a shared data ecosystem over a data platform controlled by one of the parties, are tightly intertwined. Rebalancing the rights and responsibilities of participants in this ecosystem – affected individuals, other consumers, platform operators and entities that willingly or not contribute relevant data through use of the platform – can have profound implications. There is clearly a role, and a need, for good regulation. But context is critical in dynamic markets. Outcomes of regulatory interventions may be unpredictable and unintended. It is hard to be a good regulator.
Peter Leonard, Principal Data Synergies, and Professor of Practice (IT Systems and Management and Business Law), UNSW Business School Sydney