How to Stay Afloat
Egis has developed extensive experience in dealing with large data sets for its motorway operating companies, and it is therefore imperative for the company to monitor emerging trends in data management, as Owen O’Reilly reports.
Being able to source or gather and then analyse or process large amounts of data or very large datasets can provide insights in many areas. Some of those insights will be enablers of profitable new business models. We have seen this in the financial markets and we are now seeing it in all realms of business as each area recognises that real information insights can be embedded in masses of collected material.
It really is an area perfectly suited to a gold mining analogy: a mountain of data ore needs to be mined for perhaps very few nuggets of real information. Most of this data could be considered non-personal and benign but hidden within could be insights to an individual’s behaviour, a company’s business, or a group’s activities, all of which could potentially be confidential or private.
From a data protection perspective Big Data should be handled like any other data pool. There will be challenges around scale and manipulation but the ground rules will still apply. The open question is whether current law will be adequate to cater for the sheer scale and the ubiquitous nature of this ever-growing resource. It will be important to have a clear idea of what an individual’s and a company’s responsibilities will be in this area.
What is Big Data?
The term has already shifted in meaning in the past two decades. It’s sometimes used as a marketing term, sometimes as just another catch-phrase. There was a time when Big Data could only be processed by bespoke environments and as such was typically defined by that, being a data set so large that it needed a special architecture to process it. It was also plainly out of reach for most organisations because of the massive costs involved in such environments. Now the term is often used to refer to just any large (large being a subjective term here) data-set even if managed by standard processing systems, so we can expect to see it used in that context as well. With developments such as Hadoop (the use of open-source software processing on off-the-shelf components in a distributed computational array) crunching super-sized data sets is possible for most companies. But the large data sets are there to be mined and their availability and application will be increasing apace. Developments such as (that other popular phrase of the moment) “The Internet of Things” will ensure that the Big Data machines are fed.
“It is said that 90 per cent of all data stored in the world has been created in the last two years”.
People also think ‘gold-rush’ when they hear the phrase Big Data and see big business opportunities: ‘quick, let’s search through all our systems to see what we are gathering and see if it has an application or a market’. But where there are opportunities there are also challenges. It could be dangerous thinking if people leap to use or provision before considering consequences. We have an obligation to our companies to enable commerce but this should always be combined with a responsible stance on our stewardship of all our data assets.
One of the key issues impacting the use of Big Data is protection of personal privacy. Any company that is to be successful in its use of Big Data will have to understand how all privacy and data protection legislation impacts its use.
Every company has a responsibility for all data that comes within its remit. That data may be personal or confidential or both, and the true nature may not even be apparent at the outset. If it could potentially impact a person’s privacy then it could have far reaching consequences.
So when is data not information? Data in a raw state may tell you nothing at all: a single plot on a chart with no context says nothing. For it to make sense and convey a message it needs a context or order encoding material. Perhaps it would be better if it was called Information Protection rather than Data Protection since the key to unlocking information from a data set may actually be in a completely different data set.
This raises some fundamental questions such as what your obligations are if you sell or transfer data that in itself appears harmless but someone else takes that and combines it with other information streams and makes something more than the sum of its parts.
For example, take tolling information with number plates processed by most Egis operating companies, combine it with a national driver and/or vehicle database and you suddenly have a possible movement database that tracks individuals: it may have become a more informative dataset with all the responsibilities that incurs. In effect it could enable identity discovery through data aggregation – in this case only two sets of data, but one could imagine an example where multiple pieces of information from multiple data sets might allow you to assign or infer an identity. (I like to call such a derived data set a “Cluedo” set, who knows – maybe the term will catch on.)
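The aggregation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, field names and the `link_movements` helper are all invented for the example and do not describe any real Egis system or national database.

```python
# All plates, names, plazas and timestamps below are invented for illustration.
toll_passages = [
    {"plate": "PLATE-001", "plaza": "North Plaza", "time": "2014-03-01T08:02"},
    {"plate": "PLATE-002", "plaza": "North Plaza", "time": "2014-03-01T08:05"},
    {"plate": "PLATE-001", "plaza": "South Plaza", "time": "2014-03-01T17:41"},
]

# A second, separately held data set mapping plates to registered keepers.
vehicle_register = {
    "PLATE-001": "Driver A",
    "PLATE-002": "Driver B",
}


def link_movements(passages, register):
    """Join toll passages to the register and build a movement trail per person."""
    movements = {}
    for passage in passages:
        keeper = register.get(passage["plate"])
        if keeper is None:
            continue  # plate not in the register, so no identity can be inferred
        movements.setdefault(keeper, []).append((passage["time"], passage["plaza"]))
    for trail in movements.values():
        trail.sort()  # ISO-style timestamps sort chronologically as strings
    return movements


movements = link_movements(toll_passages, vehicle_register)
```

Neither input names a person's movements on its own; the join is what produces a per-person trail, which is the point at which the derived set takes on the responsibilities of personal data.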
If your data is in Europe and you end up with a searchable and structured database of information that can be related to identifiable individuals then you are a database owner and a data controller and subject to all the applicable European laws. Taken to an extreme, that could include everyone with a mobile phone directory, which is clearly unenforceable, so we have to use a degree of common sense when interpreting the data protection laws. Nevertheless, beware if you get it wrong, as planned European legislation makes clear the company responsibilities with little regard for how difficult they might be to deliver in the real world. If in doubt seek advice. So while no acts or laws apply to data that does not identify the individual, we still need to tread cautiously, if only to protect an asset that might confer competitive advantage. Of course Big Data could easily be personal data or have personal data hidden deep within it – one of the issues with such large sets being that they make great hiding places.
One of the big traps that a company could stumble into is the inclination to say that this is going to benefit the person as much as it will benefit our company: beware! This thinking will not free you from your responsibilities. Saying that customers will get a better service, that they will be safer, that they will get the product they don’t even know they need yet, and boy will they be happy when you sell it to them, is an arrogance that we may see in the near future, but it is just an excuse for co-opting people’s freedom to choose or indeed to opt out. This argument will run and run.
We can see this trend becoming more widespread, with many active ways in which the consumer is being trapped into disclosure; they say that “if you are not paying the full price for a service then you are the product” (I wish I could credit the source of this but it already seems to be untraceable).
If we look at Generation Z, they are not even conscious of this conflict. Having grown up ‘connected’, they just accept this level of disclosure as the price of being part of the social cloud. But what happens when all the bits of identity and use information that they have given away for a free lunch are compiled and reused? Will they lose out? Will they have lost competitive advantage and created a digital fingerprint or ghost that haunts everything they do? What about those who “sell” data for short-term gain? Will “buying” data in this way put the poor of the world at a disadvantage? We can see Data Protection rules running to keep up as the next generation realise the impact of what they have been doing, and we can see this already in legislation such as the “right to be forgotten”.
Take mobile communications, one of the big sources of information. The problem with mobile phones is that in reality they are pretty much synonymous with the individual: track the phone and you have the details of one individual’s movements; track lots of mobile phones and you can track whole populations. This is perhaps not yet getting the consideration from companies or the law that it will in the near future. With location services and applications embedded and sending back information all the time, the opportunities for business to capitalise on this are huge, as are the privacy issues.
A Framework for an Extended Company Data Protection Policy
The key to handling Big Data protection is not to be so blinded by its potential that you distinguish it from any other data the company handles. Just because there is a lot of it doesn’t mean that it should be treated with any less circumspection than smaller data sets that don’t hold the promise of marvellous data insights. Indeed, as we have seen above, it should be treated with even greater caution.
That demands that the right policies and principles for governance of this emerging opportunity are in place. The safest foundation to build these policies on is the basic rules and policies of data protection as already understood for a company’s personal data and as laid down and updated by European Union rulings.
This is no “nice to have” option: proposed European legislation could mean incurring fines of up to €100 million or 2-5 per cent of annual turnover for a breach of personal data rules. We are perhaps used to protecting the HR data of our own staff but if we now have other business sectors of our company accumulating data in an effort to generate revenue or company efficiencies then we all need to be aware of that and its possible consequences and socialise our data protection policies accordingly. The fact that your organisation is “siloed” and there was not an awareness of how some data was being used is unlikely to be a successful defence.
Below is an attempt to create a draft policy as an extension of those understood rules; each current rule is stated first, and the additions that might be considered when handling Big Data follow it.
Starting Point: Take it that governance and responsibility lie with the user of the data sets and not the supplier or collector, and also that an individual person within the company needs to be responsible for the data. (This may change in law in the future but currently there is a lot to be said for looking at this from the point of view of the group most likely to benefit from the data itself and most likely to be processing or mining that data.) This could be the current company Data Controller or a new position of Chief Data Officer (CDO), or indeed the two rolled into one. Either way the role will be important and its success will be founded on the overall company culture and the ability of the person appointed. Note that directors and managers of companies that breach data protection rules can be held accountable for those breaches, regardless of whether they approved or were aware of them.
- Fair and Lawful: Personal data shall be processed fairly and lawfully.
Be aware of the source of any data obtained: consider its legitimacy, accuracy, means of collection, any possible disputes that might arise from it. Ask yourself if there are any Intellectual Property rights questions that could arise and of course if it is legal to have and use the material. These might prove difficult questions to answer, but you should at the least be able to show that they were considered and that best efforts were made to ensure compliance. If in doubt it might be better to leave the data alone or consult with data protection officers in your country. This, of course, becomes more challenging with multinationals that operate across many countries, not all of which may have the same data protection regimes.
- Purposes: Personal data shall be obtained only for one or more specified and lawful purposes, and shall not be further processed in any manner incompatible with that purpose or those purposes.
When it comes to Big Data, especially data derived from the “Internet of Things”, the use to which it might eventually be put may bear no relation to the use planned when it was collected. Indeed its collection could be a by-product or accident of process. That said, it’s still important to ask why it was collected in the first place and whether there are any issues in using it in the manner you intend. Again consider it in light of individual privacy, company privacy, fair company competition, intellectual property rights, data ownership, appropriate compensation to the source and what you do with it after you have used it. Consider if any of the above also extends into any derived data set that you create, and keep that in mind when passing it on.
Given that it is likely to be costly to collect big data sets and store them it is probably reasonable to say that no data set should be collected unless an end use is specified and agreed in advance. In the case of by-product data sets (ones that might get collected in carrying out day to day operations, for example an operation might collect lots of road traffic statistics as a by-product of monitoring vehicle speeds) their collection is probably unavoidable but all the questions encompassed in this paper still need to be asked before using or storing such data.
- Adequacy: Personal data shall be adequate, relevant and not excessive in relation to the purpose or purposes for which they are processed.
This is going to be a serious issue with Big Data. If we only look at one obvious area of collection, then we can see that unquestionably “Civil liberties and privacy will be compromised as further technology improvements make it affordable for any organisation (private, public, or clandestine) to analyse the patterns and behaviour of anyone who uses a mobile phone”. With a mobile phone in almost everyone’s possession the device and the user have almost become inseparable where identity is concerned, to such an extent that we have had murder convictions on the basis of mobile phone tracking data. When it comes to using accurate mobile phone data we need to be extra cautious because of that almost certain link with the individual. Consider every device that could potentially be sharing location data with the cloud and you have a glimpse of the scale of the issues in question. The safest approach would be to anonymise any such data, but such anonymising might mean losing much of the potential value of the data.
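As a rough sketch of what such a step might look like, the Python below replaces a device identifier with a keyed hash and coarsens a location fix. The function names, the key and the sample values are assumptions made for illustration. Strictly speaking this is pseudonymisation rather than anonymisation: whoever holds the key, or anyone studying the movement patterns themselves, may still be able to re-identify an individual.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 token (HMAC).

    The same identifier and key always give the same token, so records can
    still be linked to each other without storing the identifier itself.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def coarsen(latitude: float, longitude: float, places: int = 2) -> tuple:
    """Round coordinates to fewer decimal places, trading precision for privacy."""
    return (round(latitude, places), round(longitude, places))


# Hypothetical key and record, invented for this example.
key = b"example-key-kept-outside-the-data-set"
token = pseudonymise("DEVICE-12345", key)
position = coarsen(53.34981, -6.26031)
```

The trade-off mentioned above is visible in the `places` parameter: the coarser the rounding, the less the data reveals about an individual and the less it is worth to an analyst.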
- Accuracy: Personal data shall be accurate and, where necessary, kept up to date.
Big Data sets may be so large that ensuring they are up to date and accurate is not going to be practical; indeed, depending on the sources, it may not even be achievable. The next best thing is that when such a set is known to be out of date or of no further use to the company it should be made inaccessible (which may mean deleting it).
- Retention: Personal data processed for any purpose or purposes shall not be kept for longer than is necessary for that purpose or those purposes.
Storage costs are lower than ever and are going to continue to fall. The usefulness or financial benefit of a data set may not be obvious at the outset and may only emerge over time. The keeping of data could then become a subjective question and one that might or might not pay dividends. One key is security: if you hold it you pay for that privilege. You have to be able to secure it and audit its use. If you can do that then you may be able to store as much as you wish or can afford. That said there will be data hoarding on an unprecedented scale in the future and all potentially for little return.
Open data also comes into play: it is becoming more common (a trend that will continue) to make some limited data sets freely available for others to use. This is certainly visible in the public sector, where there is a perception that it will encourage application development, improve delivery of services, or stimulate and facilitate research to the benefit of all. Therefore it’s important to consider what, if any, data the company releases as open data and to consider its potential application once it is in the public domain. But the public sector is no less bound than the private sector by the rules of data protection.
- Rights: Personal data shall be processed in accordance with the rights of data subjects under this Act.
Big Data may of course be data that allows individuals to be identified or be of such a nature that it could be relatively easily determined who the individuals or groups of individuals referred to are. In this case all of the rules that apply to personal data will apply to your collection, processing and stewardship of such a Big Data set. The fact that it is considered a different form of resource within your company than say HR files confers no special rights or removes no obligations.
Rights also include the right to be notified if the security of the data held about you is breached. Imagine having the personal data of thousands of people and then suffering some form of security compromise: under the legislation the data processor is obliged to notify all of those people. Not an easy task. You would also have to advise all of those people if you were to use a personal data set for a purpose other than that for which it was collected. Again this will limit the repurposing of data sets in the future. This extends to making a data set available to another party for a known alternative use.
- Security: Appropriate technical and organisational measures shall be taken against unauthorised or unlawful processing of personal data and against accidental loss or destruction of, or damage to, personal data.
This is where network infrastructure, security, and data handling rules come in. This could also be where you incur some of the greatest costs of ownership. Your ability and capacity need to match the data handling you are carrying out, or you could be making promises that your infrastructure or technology may not be able to deliver on. It is up to the company to ensure that the correct people have access to the material, that they are using it for legitimate purposes, and that where necessary it can be shown that this is the case.
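Being able to show who used what, and why, usually comes down to recording every access request alongside the decision made. The Python below is a minimal sketch of that idea, assuming a simple in-memory permission table; the user names, data set names and the `request_access` function are hypothetical, and a real system would rely on proper identity management and tamper-resistant log storage.

```python
from datetime import datetime, timezone

# Hypothetical permissions: which users may use which data sets.
AUTHORISED = {
    "analyst_a": {"traffic_statistics"},
    "hr_officer": {"staff_records"},
}

audit_log = []  # in practice this would be durable, append-only storage


def request_access(user: str, dataset: str, purpose: str) -> bool:
    """Decide whether access is allowed and record the request either way."""
    allowed = dataset in AUTHORISED.get(user, set())
    audit_log.append({
        "when": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "dataset": dataset,
        "purpose": purpose,
        "allowed": allowed,
    })
    return allowed


granted = request_access("analyst_a", "traffic_statistics", "monthly flow report")
refused = request_access("analyst_a", "staff_records", "curiosity")
```

Refused requests are logged as well as granted ones, since a pattern of refusals can be as informative to an auditor as the record of legitimate use.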
- International: Personal data shall not be transferred to a country or territory outside the European Economic Area unless that country or territory ensures an adequate level of protection for the rights and freedoms of data subjects in relation to the processing of personal data.
Data sets are already being treated like commodities; indeed there will be markets for such material and premium prices paid for the most generally – or specifically – useful. It is therefore important that when any buying or selling of a data set occurs the company takes an objective look at the transaction and ensures that the data being traded comes from or is going to a reputable entity. Of course the problem is going to be establishing who is reputable and who is not and, if the stakes are high, whether companies will make decisions with our data that are not necessarily in our interest.
This is not an all-inclusive list. A final company policy will need considerably more detail and will need to be updated to reflect changing laws, in particular EU Data Protection legislation. Such EU legislation is now more urgent than ever, but the framing of a comprehensive suite of regulation is also more challenging than ever, and whatever the final format there may be resistance and avoidance by many parties that see such material as core to their business strategies.
As well as having the policies to manage the data, you cannot underwrite its protection unless you have the means of collecting, transmitting, processing, storing and deleting such massive data sets securely. This involves suitable infrastructures, policies, processes, audit logging, protection in transmission, etc., all perhaps made more challenging as the data sets get larger. Crucially you need to resource the handling of such data sources properly in order to capitalise on them and in order to ensure that they do not introduce unacceptable risk to your business.
Big Data offers lots of opportunities for business and opportunities to improve the lot of the individual consumer. It is also a threat, a risk and an issue that is going to feature regularly in the news for the foreseeable future. As the means of collecting and using data on massive scales improve, so too will the challenges grow, with legislation always trying to catch up. Egis operating companies face these challenges daily.
When working with any data set you have to consider its end use, its source and your company’s responsibilities as a Data Controller (or ask your company Data Controller if you are not the person in the frame), and also ask yourself how you would like to be treated if this were your data. If the data is attached to individual identities then you must be cautious, as a higher duty of care applies. You would want to see transparency; you would want to see companies recognise that your data is yours and not their property to be carelessly mined for their gain. You want trust and transparency and to be able to see that at the very least the legislation that is currently in place is being adhered to and that indeed the company is helping to draft or quick to respond to new legislation when a need is recognised. If you feel that you are being treated as a commodity or that your data is being aggregated and misused then why would you think of the company that is doing that any differently than you would a hacking or spamming enterprise that uses those very same tools and techniques to target victims?
Finally, remember that while you are intent on using all this data to make a profit you might well find that those that have the best tools for mining your material are the licensing, audit and forensics teams, so be careful where you source your material and what you use it for. Already companies are adopting ethical approaches to Big Data and solidifying their in-house policies, perhaps only to win or hold onto public trust which they value but perhaps also from a recognition of the issues at stake for all of us. At the other end of the spectrum others will abuse the access to such material and certainly we will see many companies wrecked on the rock of Big Data mismanagement in the near future. How are you going to make sure you are not one of them?
Owen O’Reilly MBCS CITP is Back Office Manager and Technical Operations Manager at Egis Projects Ireland.