This is the first in a series of eLetters that will – from time to time – explore some of the issues and risks inherent in the growth of Internet access and the Information Society. The use of such powerful new tools as data mining to analyse personal data collected on the Web, for example, has significant social and personal implications that call for international regulatory action. Most of you, my readers, are involved in some way with ICTs – many, perhaps, with the questions raised by this letter.
How do you feel about these questions? What needs to be done to take proper advantage of the technologies while containing the risks? What sort of regulatory action, what sort of international action is called for? Write to me; I would like to include your thoughts and opinions in future eLetters. email@example.com
Correlated beyond all recognition, advertising, and the death of privacy
Once upon a time advertising was just a way to send a company’s message to a consumer. It was a one-way street. Today we are moving in a different, disturbing, direction.
I am not against advertising; it is an important driving force in our economy. It helps us make choices. It informs us of many things we really want to know or are not aware of, but should know about. I think some ads are truly sensational, but then, I even like catalogues. Advertising supports all the broadcast programming we see on free TV, makes magazines and newspapers affordable and one day soon will be bringing the same sort of benefits to your mobile phone.
Targeted advertising, or ‘addressable advertising’ as it is now called, is the goal of today’s advertisers and, by extension, all advertising vehicles. The idea is to address ads targeted to match each individual consumer’s profile. Understanding the consumer’s profile, their interests, needs, wants and desires can – in principle – let advertisers address highly specific messages to each individual. The difference between principle and practice is, in this case, far from trivial. Although this may seem innocent, many of the implications are not.
The data to learn about customer preferences is available. There is a staggering amount of collectable data flowing over the world’s networks. It can be, and is, collected in a great number of ways at a great number of locations, by a wide variety of interested organizations and individuals.
Websites, governments, cable company set-top boxes, mobile phone platforms, suppliers and location-based services, among others, all gather information about us – their citizens, users and buyers. The data and information they gather, properly analyzed, tell them much more about us than we suspect. Governments use the information they gather to profile people, to find terrorists, tax evaders and criminals, to track epidemics and much else. Advertisers want to understand how to reach and sell to customers, so that is how they use the data.
The main search engines save data about every search we make. They know if we are concerned about a medical problem, visit porn sites, dating sites, care about model trains, have a police record, buy books about history and like lingerie – Sorry! Is that your wife? Perhaps not? Do you have a mortgage, a credit rating?
It is amazing the sort of things a search engine knows about you. But it is not only the search engine you use; it is the sites you visit, the social networks you use, it is Facebook and YouTube – in fact it could be any site you try to access, and some you don’t: you just get redirected there by an innocent-looking link.
Even more amazing are the sorts of things today’s best data mining systems can piece together from the odds and ends. Data mining software and banks of computers can cull facts and correlations from otherwise intractable masses of data. Some of the more innocent uses can be found on the Microsoft AdCenter Labs website. The Labs’ software can predict a user’s age, gender, readiness to buy or sell, or interest in engaging in another sort of transaction, based upon the user’s recent search history. It can also funnel and analyse search patterns and keyword usage in a wide variety of other ways.
It gives me a creepy feeling to think about the data mining done to target consumers and for other less innocent reasons. We are being relentlessly stalked every time we buy something with a credit card, when we watch a show on cable TV, search for something – really everything and anything – on the Web. Any time we do anything on the Web, we are subject to the scrutiny of all kinds of data snoops. The high gods of consumerdom know things about us that even we do not know ourselves – and they are getting better at it every day. The world’s governments might know even more, and what less savoury groups, including those with criminal intent, might know is frightening.
Yahoo, according to one report, and I am certain Google as well, can predict ad response rates and even the time of day the ads will work best. Yahoo, like Microsoft, can analyse online behaviour on its network and spot potential buyers at various stages of their online search. The depth of analysis and correlation that the best data mining software can perform is awesome. Based on where you live, the searches you do and the diverse interests you have, it might, for example, be able to predict which films you like and which automobiles will interest you. In addition, data miners can analyse the sort of politics you are likely to believe in, what new products you will love and hate, and what – if any – books, magazines and newspapers you might like to read.
There is so much data that there is no way to analyse it properly without powerful computers and sophisticated software. There is also no way to act upon the information and devise an appropriate return without, again, powerful computers and sophisticated software. Assuming we can analyse the data and frame a response, the need remains to get the message to the target at the right time and place, but current platforms are designed with just this sort of interactivity in mind.
IBM recently announced a new project called Kittyhawk, to build a worldwide, distributed supercomputer. The Kittyhawk platform will, they expect, be able to run the Internet – the entire Internet – alone, as a single application, and replace the current fairly random assortment of interconnected computer networks. The migration of the Internet to one platform, should it ever happen, combined with Kittyhawk’s massive computing power (16,384 racks with up to 67.1 million cores and 32 petabytes of memory) will increase the power of data mining to unimaginable levels. Some of the Kittyhawk speculation sounds more like science fiction than fact.
Many of the potential dangers of uncontrolled data mining are obvious, but there is a more subtle, little understood danger that resides in the very nature of data mining: the knowledge discovery methodology employed, the algorithms used to spot patterns and trends and the correlations encountered between the data elements. The process sifts through vast amounts of data searching for patterns that simpler forms of analysis cannot easily find, because they are hidden by the volume and complexity of the data. Neural networks and a variety of mathematical tools are used to spot patterns and calculate the degree of correlation between different types of data. So far, so good: there is no problem with the process. There is a problem, though, with the way people understand the results.
There is an old saying, “There are three kinds of lies: lies, damned lies and statistics”. I always attributed it to my father – he repeated it often – but according to Wikipedia, Benjamin Disraeli said it first and Mark Twain later popularized it in the U.S. We need to remember this well whenever we interpret correlations and other sorts of statistical analysis; the numbers may be right, but interpretations often lie.
Correlations are among the most misunderstood mathematical tools. When two sets of data are strongly correlated, we tend to assume they are interrelated or even that one causes the other. Gasoline, beachwear and ice cream sales may be strongly correlated – they all go up in the summer – but one is hardly the cause of the other. Genes that produce supermodels might correlate with wealth, fame, newspaper scandals and the garment industry, but the genes are hardly the direct, sufficient, cause of any of these.
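The summer-sales point can be made concrete with a small sketch. Everything in it is invented for illustration: the monthly temperatures, the sales formulas and the noise levels are assumptions, not real figures. Ice cream and beachwear sales are each made to depend on temperature and on nothing else. The sketch measures how strongly the two correlate, then measures again after the temperature effect has been subtracted from both (a simple form of what statisticians call partial correlation).

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def residuals(ys, xs):
    """What is left of ys after removing its best straight-line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical monthly average temperatures (deg C), repeated over ten years.
MONTHLY_TEMPS = [2, 4, 8, 13, 18, 22, 25, 24, 20, 14, 8, 3]
temps = MONTHLY_TEMPS * 10

# Invented sales figures: each depends on temperature plus its own noise.
# Neither series is built from the other.
ice_cream = [10 + 3.0 * t + rng.gauss(0, 5) for t in temps]
beachwear = [5 + 2.0 * t + rng.gauss(0, 5) for t in temps]

raw = pearson(ice_cream, beachwear)
controlled = pearson(residuals(ice_cream, temps), residuals(beachwear, temps))

print(f"raw correlation: {raw:.2f}")
print(f"correlation after removing temperature: {controlled:.2f}")
```

With these assumptions the raw correlation comes out high, although neither product influences the other, and it largely disappears once the shared seasonal driver is taken out. A data miner who sees only the first number, and not the season behind it, could easily draw the wrong conclusion.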
In extreme cases, the searches of serious scholars using the Web might be correlated with serial murderers, perverts, tax evaders, terrorists or in some way with whatever else they may be researching.
These cases might be exaggerated, but the guilt by implication – or correlation – and the invasion of privacy that data mining implies are real issues. Data mining results in the wrong hands can destroy credibility, put jobs at risk, destroy families, and create opportunities for blackmail.
The risks might not be obvious, but today’s Big Brother is a computer programme linked to the Web. We, and our lives, might be correlated beyond all recognition, and I see little serious government action anywhere to contain the danger.
Our next Connect-World Europe Issue will be published later this month. This edition of Connect-World will be widely distributed to our reader base and, as well, at shows where we are one of the main media sponsors such as: Sviaz / Expo Comm (14-18 May, Moscow), FT Mobile Media Conference (15-16 May, London), Wimax World Europe (29-31 May, Vienna), and Von Europe (11-14 June, Stockholm).
The theme for this issue will be: The evolving ‘Net’ – Rising to the challenge of rising use.
When speaking of networks, conventional wisdom and traditional business models no longer work as they did. The lines are blurring in the fixed, mobile and even broadcasting markets. Wired networks now handle traffic once thought suitable only for wireless, and wireless is replacing wired in a broad range of applications. Seamless handoffs between wired and wireless networks – and, indeed, mergers, partnerships and consolidations bringing together networks and players of all sorts – further confuse the once neatly organised networking landscape.
This issue will examine what these changes in technologies and the market mean for the sector. How can the residential and business consumer best be served? What does the future hold for network operators of all types?