Big Data, Open Data, Midata… It’s All Linked

Readers of James Gleick’s book ‘The Information’ will know it’s a daunting task to explain ‘what’s going on in the data world’, let alone to make commercial sense of it. We have realised that data is everywhere; indeed, it’s everything. We can store 100 million hours of high-definition video in a coffee cup. Indeed, our own bodies are highly evolved networks, and even rats can be linked to form a super-computer.

Yet we are still in the ‘primordial soup’ phase of our evolving efforts to extract meaning from the great chaotic swirl of data. Our systems for creating, storing, extracting, editing, filtering and accessing data are still very crude. E-mail and text messaging are as silly as the ‘Chappe telegraph’.

Nevertheless, our capabilities for extracting meaning from uncertainty are evolving much faster than our tools for managing and directing those capabilities – law, regulation, self-regulatory codes, standards, contracts, data consents and so on – and it is important to understand the gap. Not only do we need to define and deal with data misuse, but we also need to know when our ‘old’ systems and related management tools are inhibiting innovation and competition in our markets for goods and services. It is all very well to ignore Facebook for one’s personal use, but we mustn’t overlook the fact that over a billion people use the service.

Understanding the latest developments and where they might lead is a constant challenge, but some shorthand definitions of the latest buzzwords should provide a useful starting point:

· ‘Midata’ is a voluntary programme, facilitated by the Department for Business, Innovation and Skills (BIS), to encourage suppliers to make each customer’s transaction data available to the customer in machine-readable format (if they aren’t doing it already);

· ‘Open Data’ is a government programme to enable public access to public data by connecting sets of data using uniform resource identifiers (‘Linked Data’) and publishing data about them in machine-readable formats;

· ‘Internet of Things’ is the result of networking physical items using a combination of microchips, sensors, Linked Data and machine-readable formats;

· ‘Big Data’ refers to the exploding array of tools for analysing giant sets of data.

So, what do these terms mean in practice?

Midata

What’s the point of gaining access to your own transaction data in computer-readable form? By ‘transaction data’ we mean records of, say, mobile phone calls, retail purchases, current account transactions and energy usage – not just till receipts, monthly summaries or bills. Machine-readable formats make it easier for you or your service provider to convert the data into knowledge, either on its own or in combination with other data. You probably won’t ever ‘look’ at this data, but it can be analysed to ensure you are on the most appropriate mobile phone or energy tariff, for example, and to keep switching as new tariffs become available.
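By way of illustration only, the kind of analysis described above might look like the following minimal Python sketch. The usage export, its column names (`date`, `kwh`) and the three tariffs are all hypothetical, invented for this example; real midata exports and tariff structures will differ.

```python
import csv
import io

# Hypothetical machine-readable export of daily energy usage.
# The column names and figures are assumptions made for this sketch.
USAGE_CSV = """date,kwh
2013-01-01,12.5
2013-01-02,9.0
2013-01-03,14.5
"""

# Hypothetical tariffs: a daily standing charge and a unit rate, both in pence.
TARIFFS = {
    "Standard": {"standing_pence": 25, "unit_pence": 15},
    "LowUser": {"standing_pence": 0, "unit_pence": 18},
    "HighUser": {"standing_pence": 40, "unit_pence": 12},
}


def load_usage(text):
    """Parse the CSV export into a list of daily kWh readings."""
    reader = csv.DictReader(io.StringIO(text))
    return [float(row["kwh"]) for row in reader]


def cost_pence(daily_kwh, tariff):
    """Total cost of the recorded usage under one tariff."""
    days = len(daily_kwh)
    return days * tariff["standing_pence"] + sum(daily_kwh) * tariff["unit_pence"]


def cheapest_tariff(daily_kwh, tariffs):
    """Name of the tariff that would have cost least for this usage."""
    return min(tariffs, key=lambda name: cost_pence(daily_kwh, tariffs[name]))


usage = load_usage(USAGE_CSV)
print(cheapest_tariff(usage, TARIFFS))
```

The customer never needs to read the underlying records; a service provider could re-run a comparison of this kind whenever new tariffs are published.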

New intermediaries are evolving to help this happen. Let’s call them ‘personal information managers’. They may simply store transaction data, or record where it is stored (‘data stores’); or they may offer analytical services, evaluating different data sets to identify alternative products, and handling the switching process.

New data-sharing arrangements are also necessary in the light of ‘midata’ scenarios. Your transaction data may be released directly to you or your nominated data store or service provider, and then may be transferred to others. Such data sharing presents new challenges for the existing framework of data management controls. Some of these controls are general (such as the compliance regime that flows from the Data Protection Act 1998) while others are industry specific (eg the Consumer Credit Act 1974, s 159, relating to credit reference agency data). The picture is complicated by the fact that the transaction flows (the contracts and messaging amongst the parties who determine the flow of data) tend to be more complex than the flow of data itself. Operational risks that industry participants are working to control include:

• lack of consumer consent, or even lack of awareness that data will be released or shared at all;

• failure to identify one or more parties, fraud, wrongful disclosure and the release of the wrong person’s data;

• ‘wrongful’ refusal to release data;

• interception of messaging and/or data in transit;

• the supply of data that is inaccurate, late, false, corrupted or otherwise unreliable;

• data misuse, loss or destruction;

• apportioning liability for all of the above, including complaints handling procedures.

Industry participants, including consumer groups, are evolving new controls to deal with these challenges, partly through service design and contractual provisions and partly through involvement with other participants in programmes like ‘Midata’ and the World Economic Forum on ‘rethinking personal data’.

While large-scale suppliers in key industries have dragged their feet (hence the government’s legislative plan to force recalcitrant providers to co-operate if necessary), there are already many instances of this type of ‘midata’ sharing. Internet banking services, for example, usually enable you to download current account data in machine-readable form. Other instances are more complex. LinkedIn recently replaced the ability to add certain third-party data storage applications to your profile with the ability to add direct links to data posted elsewhere. Any data that you may have loaded for display on LinkedIn via an affected third-party application can no longer be displayed, and must now be transferred to the third-party provider’s systems if you want to display it. To enable that transfer, you must first set up an account with the third party, then follow the instructions to import the data from LinkedIn. By following this process I was able to transfer my Amazon.com ‘Reading List’ to my new profile on Shelfari, the literary network, without ever handling or seeing the data.

Open Data, Linked Data and the Internet of Things

The practical applications of the Open Data programme are too numerous to begin to summarise, but can be gleaned from the site statistics (which are, naturally, available in machine-readable form). ‘There are over 9,000 datasets available, from all central government departments and a number of other public sector bodies and local authorities.’ In support of that programme, the Open Data Institute was recently founded by Sir Tim Berners-Lee and Professor Nigel Shadbolt, as an independent, impartial, non-profit company to:

‘catalyse an open data culture that … will unlock supply, generate demand, create and disseminate knowledge to address local and global issues.’

While the Open Data programme is focused on opening up public sector data, the real power of Linked Data will only be fully realised when commercial organisations consistently publish their product data in computer-readable format. I could then program an application or ‘spider’ to search product providers’ open systems to find and assemble the products that are right for me. This spider could use my personal data to conduct its search without disclosing that data to any product providers, at least until the time of purchase (and disclosure might not even be necessary then). It could also collect open public sector data related to my desired activity, and analyse that in the context of my relevant personal transaction history, as well as data drawn from ‘Linked-things’ that I own. ‘Mashing’ all this data could save time and vastly improve my choice of new car, appliances, holiday or home improvement, for example, as well as how I pay for those purchases. The spider could also save me money by keeping me on the right energy and mobile phone tariffs and so on.

This is not about ‘intent-casting’ or ‘demand-casting’ in order to encourage suppliers to send me their offers. My spider would not announce to the world that it is looking for anything. It would simply run around the web collecting and analysing openly available data and report its findings to me.
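A rough Python sketch of the idea follows, under stated assumptions. The supplier feeds, field names and personal profile are all hypothetical, and the feeds are held as in-memory strings rather than fetched from the web so that the example is self-contained. The point it tries to show is that the matching happens on the customer’s side: the suppliers’ data is read, but the customer’s data is never sent anywhere.

```python
import json

# Hypothetical machine-readable product feeds, as two suppliers might publish them.
# A real spider would fetch these over the web; they are inlined here for the sketch.
SUPPLIER_FEEDS = {
    "supplier-a": """[
        {"name": "Tariff A1", "monthly_minutes": 300, "price_gbp": 12.0},
        {"name": "Tariff A2", "monthly_minutes": 1000, "price_gbp": 20.0}
    ]""",
    "supplier-b": """[
        {"name": "Tariff B1", "monthly_minutes": 600, "price_gbp": 15.0}
    ]""",
}

# Hypothetical personal data. It stays on the customer's side and is only
# ever used locally to filter what the suppliers have published.
MY_PROFILE = {"minutes_needed": 500, "max_price_gbp": 25.0}


def collect_products(feeds):
    """Read every supplier's published feed into one list of products."""
    products = []
    for supplier, raw in feeds.items():
        for item in json.loads(raw):
            products.append({**item, "supplier": supplier})
    return products


def shortlist(products, profile):
    """Keep products that meet the customer's needs, cheapest first."""
    suitable = [
        p for p in products
        if p["monthly_minutes"] >= profile["minutes_needed"]
        and p["price_gbp"] <= profile["max_price_gbp"]
    ]
    return sorted(suitable, key=lambda p: p["price_gbp"])


for product in shortlist(collect_products(SUPPLIER_FEEDS), MY_PROFILE):
    print(product["supplier"], product["name"], product["price_gbp"])
```

Nothing in this sketch announces the customer’s intent to anyone; it only reads what suppliers have chosen to publish and reports the result back to its owner.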

The End of Big Data?

In a Linked Data world, the marketing challenge for suppliers is not to find customers, but to enable their product data to be found and directly embedded or accessed by customers’ machines as and when required. So, rather than spend vast resources on human-readable sales and marketing, suppliers should only need to ensure that their product data is accurate, up-to-date and in machine-readable format.

A ‘Big Data’ approach to marketing, on the other hand, involves suppliers using analytical tools to sift through giant datasets to find a specific customer type for the purpose of then targeting people who fit that profile with advertising in the hope of making a sale. This approach rests on the proposition that a more accurate profile of a person can be obtained by observing the breadth of that person’s behaviour, rather than the depth of their history in any one area. Some even go so far as to suggest that your behaviour – who has trusted you, when and why – generates ‘reputation capital’ that can be aggregated, displayed and used to extract better pricing from suppliers, similar to the effect of a credit score in financial services (see Rachel Botsman’s TED talk). However, human behaviour does not fit a bell curve. There is no ‘normal’. Similarly, the data related to our behaviour varies by:

• the context or the activity we are engaged in;
• the persona we are using at the time;
• the nature of the data itself;
• the permissions given;
• the rights that flow from those permissions; and
• the various parties involved.

It follows that you have no single reputation, but many. And they do not really ‘add up’ in a meaningful way. It remains uncertain whether a person with a ‘good’ overall reputation, or a high amount of total reputation capital, will be ‘good’ in a specific context. So, in retail marketing terms, Big Data still leaves a supplier to figure out the ‘right’ customer profile and context that signifies the need or desire for a specific product, and then how to ‘target’ similar people with the right message. The assessment remains probabilistic rather than predictive: the supplier still doesn’t know which half of its advertising spend is wasted. Similarly, consumers’ reputation capital may decline in the same way ‘trusted brands’ tend to become diluted, or less trusted, through being overly stretched by association with too many different types of products or failures.

In any case, it seems illogical to extract specific behavioural data about customers from huge datasets only to re-aggregate it into broad-based, unified scores or profiles. This is developing knowledge for the purpose of creating fresh uncertainty.

So, while using Big Data analytics may feel like suitably sophisticated hard work for corporate marketing departments, merely publishing product data in machine-readable form may ultimately prove a more reliable way to attract customers.

However, Big Data tools would be helpful in, say, isolating the many different behavioural data points that could verify a person’s identity in real time more reliably than a few items of static, official data which might have been hacked or falsified. Such unique and momentary means of verification could also be immediately discarded. As a result, we could simply, conveniently and efficiently prove our identities in the course of our day-to-day activities while being less vulnerable to ‘identity theft’ (see my blog post, ‘Identity is Dynamic, Not Static. Proof: Momentary’).

Ironically, such tools might also help customers locate the right product for them where there is a huge range, and enable suppliers to tailor products to reflect highly specific demand.

In fact, a combination of Midata, Open Data and Big Data tools seems likely to liberate us from the tyranny of the ‘customer profile’ and reputational ‘scores’, and allow us instead to establish direct connections with trusted products and suppliers based on much deeper knowledge of our own circumstances.

Simon Deane-Johns is a solicitor specialising in retail financial services, e-commerce and IT. He chairs the Media Board of the Society for Computers and Law and is a member of the Interoperability Board of Midata. Simon also writes at Pragmatist and The Fine Print, and is the author of Lipstick On a Pig: Why Bail-outs Fail and People Power Will Succeed (Searching Finance, 2012).
