Should We Be Scared or Excited?
Go to a conference on almost any topic and, at some point, the phrase “big data” will come up. Turn on your TV and you will see the latest paranoid presenter claiming that life as we know it will end because of the proliferation of big data. So, what is ‘big data’, how ‘big’ does data have to be to be called big, and is this just a subjective term created to confuse?
What the data scientists say
Data scientists usually reserve the term ‘big data’ for datasets that are very large, frequently updated and held in a variety of formats, so large that they are beyond the capability of traditional data-processing hardware and software. In practical terms, you couldn’t work with a big dataset on a calculator or even in Excel, and you most certainly couldn’t work with it on the back of an envelope. You get the picture.
And pictures are what data scientists are looking for: they search through masses of data to find the valuable information and patterns hidden within. The following are a few examples of big datasets being used to improve the human condition.
First up, healthcare
In healthcare, big data analytics has the potential to help cure diseases, including cancer, reduce the cost of treatment and predict epidemic outbreaks. At the population level, big data are being used to predict the spread of the Zika virus in Latin America and outbreaks of dengue fever in certain Brazilian cities. At the level of the individual patient, medical history and DNA data can be used to determine the most effective treatment, based on other patients with similar disease patterns and genetics (commonly referred to as personalised medicine). Taking it down another level, big data can be used at the cellular level: finding common features and patterns in individual cancer cells could help us understand how tumours grow and which drug treatments might be most effective.
Turning to the risks on roads
In the US, police in Indiana and Tennessee use big data to better understand the dangerous hot spots on their roads: the places most frequently associated with road traffic accidents. They combine data from all reported crashes in their respective states with weather data to predict the likelihood of incidents in any area during a given four-hour window each day. Officers then have real-time access to this intelligence, so they can target high-risk areas and respond faster.
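For readers curious about what “combining crash data with weather data by four-hour window” might look like, the sketch below shows the simplest possible version: count past crashes for each area, four-hour window and weather condition, then rank areas for a given window and weather. The crash records, area names and the functions `window` and `risk_ranking` are all hypothetical; this is a toy frequency count under those assumptions, not the method the Indiana or Tennessee police actually use, which would rely on far richer statistical models.

```python
from collections import Counter

# Hypothetical, made-up crash records: (area, hour of day 0-23, weather).
# Illustrative only; these are not real police data.
crashes = [
    ("Area A", 17, "rain"),
    ("Area A", 18, "rain"),
    ("Area A", 8, "clear"),
    ("Area B", 22, "clear"),
    ("Area B", 16, "rain"),
]


def window(hour: int) -> int:
    """Map an hour of the day (0-23) to one of six four-hour windows (0-5)."""
    return hour // 4


# Count crashes for each (area, window, weather) combination.
counts = Counter((area, window(hour), weather) for area, hour, weather in crashes)


def risk_ranking(weather: str, win: int) -> list[tuple[str, int]]:
    """Rank areas by historical crash count for one weather condition and window."""
    scores = {area: n for (area, w, wx), n in counts.items() if w == win and wx == weather}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Which areas have had the most crashes in rain between 16:00 and 20:00?
print(risk_ranking("rain", window(17)))  # [('Area A', 2), ('Area B', 1)]
```

Dividing each count by the number of days observed would turn it into a rough per-window rate; anything more sophisticated, such as adjusting for traffic volume, is beyond this sketch.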
Not just road hotspots – crime hotspots, too
Another example of big data use in the US: the Los Angeles and Santa Cruz Police Departments are using big data to identify patterns of criminal activity. Data from 13 million crimes committed over the past 80 years are fed into a mathematical model that predicts where crime is likely to occur on a given day. Officers are then assigned to the times and locations where the model shows a high probability of criminal activity. As a result, there has been a 33% reduction in burglaries, a 21% reduction in violent crime and a 12% reduction in property crime in the monitored areas.
As with all things – what can be used for good can also be used for profit
Big data gets a bad name when it is used for more nefarious purposes, often to exploit people or communities. Of course, what one person considers good use, another may be less enthusiastic about. The Los Angeles criminal fraternity is probably less than enthusiastic about the enhanced police intelligence, whereas law-abiding citizens probably applaud it. Who decides which data are used, and how, is still an evolving area. In medicine, ethics committees have traditionally overseen the collection, collation and use of data, but no such oversight exists for some datasets generated in the public domain.
Arguably, the greatest concern is how data are handled. Security breaches during the processing or storage of personal data are sadly far too common. Big data is a valuable commodity and, as such, great care should be taken to keep it secure. It seems we are never far away from a data breach: take, for instance, the 2018 hack affecting 500 million Marriott hotel customers. In that instance, hackers accessed the hotel chain’s reservation database and stole a wide range of customer information, including names, passport details, credit card details and email addresses.
To understand your personal data profile, an analogy helps. In the story of King Midas, from Greek mythology and many a classic children’s book, everything the king touched turned to gold. In a similar manner, everything we touch turns to data. That data may not be personal: if you pay cash for a pie, the data generated relate to the type of pie, the time of day, the location of the shop, the temperature of the pie warmer and so on. If you pay by credit card, however, the bank also collects data, and that specific pie purchase can be linked to you. Of course, data are not generated only by our purchasing behaviour; they also come from our smartwatches and fitness devices, our social media interactions, our publications, our memberships and, in fact, every facet of our daily lives.
Should we worry?
A key thing to remember is that a single data point, on its own, has very little meaning and very little value. The value and the risk arise when data are collected, collated and analysed, and from how (and by whom) they are then used. Big data has the power to improve life and living conditions, and it has the power to disrupt and oppress. As individuals, all we can do is demand that the custodians of our data treat it with respect and reverence, and take responsibility ourselves for what data we share and how we share them.