Where does all your personal data collected online really end up? We tell you

While ad-targeting is both prominent and increasingly invasive, ads are but the tip of the iceberg.

By Arnav Joshi

The biggest fallacy in many people's understanding of online data gathering is the widely held belief that the only real downside to what they willingly offer up online is seeing a few more harmless ads. Nothing, sadly, could be further from the truth.

While ad-targeting is both prominent and increasingly invasive, ads are but the tip of the iceberg, and an ominous mass of other destinations for our data lies beneath the surface. Here is some of what else happens after you scroll down those ostensibly impossible-to-read terms and conditions and hit "I accept".

* Data brokers: These are individuals and firms who trawl the Internet collecting, buying and selling people's online data, usually to anyone willing to pay. While they are not new actors in the data industry, most discussions about them centre on the sale of data to advertisers. There is, however, a far darker side: the kind of data that can be obtained, and the damage it can do.

In Europe, perhaps the jurisdiction with the strongest legal protections for data rights, a recent investigation revealed shocking details about the kind of data anyone could buy online from such brokers: financial records, browsing histories (innocuous or otherwise), locations visited and even drug preferences.

Although user data is typically "anonymised", it is surprisingly easy to merge otherwise innocuous datasets and mine them for patterns that profile and pinpoint individual users across devices, locations and websites.
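To make the re-identification point concrete, here is a toy Python sketch of what is usually called a linkage attack. It assumes two made-up datasets: an "anonymised" browsing log with no names, and a named list (a marketing database, say) that happen to share three quasi-identifiers: postcode, date of birth and gender. The records, field names and the `reidentify` helper are all hypothetical; real-world linkage is messier, but the principle is the same.

```python
# Toy linkage attack: two datasets that each look harmless on their own
# are joined on shared quasi-identifiers. All records, field names and
# helpers here are hypothetical, invented purely for illustration.

# An "anonymised" browsing dataset: no names, only quasi-identifiers.
browsing = [
    {"postcode": "SW1A 1AA", "born": "1990-04-12", "gender": "F", "sites": ["clinic.example"]},
    {"postcode": "EC1A 1BB", "born": "1985-11-03", "gender": "M", "sites": ["casino.example"]},
]

# A separate dataset (e.g. a purchased marketing list) that does carry names.
named = [
    {"name": "Alice", "postcode": "SW1A 1AA", "born": "1990-04-12", "gender": "F"},
    {"name": "Bob", "postcode": "EC1A 1BB", "born": "1985-11-03", "gender": "M"},
    {"name": "Carol", "postcode": "EC1A 1BB", "born": "1972-06-30", "gender": "F"},
]

QUASI_IDS = ("postcode", "born", "gender")


def key(record):
    """Build a join key from the quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDS)


def reidentify(anonymous, identified):
    """Attach names to anonymous records whose quasi-identifiers
    match exactly one named record."""
    index = {}
    for person in identified:
        index.setdefault(key(person), []).append(person["name"])
    matches = []
    for record in anonymous:
        candidates = index.get(key(record), [])
        if len(candidates) == 1:  # a unique match pinpoints one individual
            matches.append((candidates[0], record["sites"]))
    return matches


if __name__ == "__main__":
    for name, sites in reidentify(browsing, named):
        print(name, "visited", ", ".join(sites))
```

Neither dataset links a name to a browsing habit on its own; the join does. Treating only unique matches as re-identifications is a simplifying assumption of this sketch.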

In India, this is exacerbated by biometric Aadhaar data, which has insufficient safeguards yet is brazenly collected and linked to every service and utility the government's dart lands on. This raises the question: in the wrong hands, whether competitors, governments or unscrupulous elements on the dark web, what damage could this data cause?

* Scoring: Algorithms that use machine learning and big data analytics to support decisions are becoming ubiquitous. Crunching myriad structured and unstructured data through what are colloquially known as black boxes, they produce actionable scores and ranks (a rating from 1 to 5, say, or a red, yellow or green flag), effectively telling the user exactly what decision to make.

These are already applied in areas ranging from the familiar (credit scores), to those justified in the name of national security (terrorism scores), to those that lie firmly in the ethical twilight zone: insurance premiums linked to IoT devices, or to the speed at which you scroll through the insurer's T&Cs; job application outcomes based on obscure precedent and your digital traces; and criminality scores used by courts to gauge one's propensity for recidivism.

Although algorithmic scoring is designed to be objective and efficient, it is rife with unresolved issues, ranging from biased feedback loops to the inability to explain the reasons behind its outputs. Because these processes are buried in fine print, if disclosed at all, we are increasingly at the mercy of work-in-progress scoring algorithms while remaining largely unaware that they are even being applied.

* Algorithm fodder: The warm, felicitous (for some, eerie) feeling you get from the posts, news pieces, suggested videos and even nearby taxis you see online or in an app is attributable to a lot more than happenstance. Digital products are designed to be opiates for users, who feel constantly drawn towards the newest notification, like or suggested video when they would rather have gone to bed.

These gimmicks and filter bubbles rely on processes that continually learn, adapt and refine themselves using a steady stream of training data from us, their users. On Facebook, among a plethora of other data points, every click, hover, picture and poke is stored and analysed. Uber uses location information even after you've left your cab (although it has vowed to roll this back), and Amazon and Google use voice data shared with Alexa and Google Now respectively.

All this data enables companies to refine their products and tailor the user experience with incredible precision, to know you better than you know yourself. While there is arguably nothing wrong (or illegal) in this, users are largely blind to their contribution to improving these products, and in turn to someone's bottom line.

We can't feel our data, which makes it difficult to realise just how much it envelops us. In a world with almost entirely blurred boundaries between online and offline lives, however, data is an invaluable resource for everyone involved.

Although the technologies we have grown to know and love have contributed immeasurably to the way we learn, interact, transact and commute, not everything has gone according to plan. Fortunately, this realisation is dawning, albeit slowly, on both companies and policymakers, and paradigm shifts in the way data is gathered and used are in the offing.

As we inch towards this, there is plenty people can do by proactively educating themselves and making informed decisions about their data. The oft-repeated adage of social media economics is that on the Internet, if you're not paying for something, you are the product, not the customer. And that's probably okay for most of us, resigned as we increasingly are to our datafied fates.

We can, however, no longer ignore the caveat emptor slapped on every digital product we use. The time to wake up and smell the megabytes is here and now.

(Arnav Joshi is a technology lawyer and Data and Society master's candidate at the London School of Economics and Political Science. He can be reached via Twitter @boom_lawyered)
