How to proceed in the age of big data?

A couple of weeks ago I read an article in the New York Times about the age of big data, and today at a science and technology conference I got into a conversation about the same thing with a US public health official.

Much has been written (and I am a guilty party) about Google’s quest for information, including allegations of privacy infringement, but not all of this capability should be seen in a negative light. I would like to give you a few examples of why.

A wealth of data

Google collects all of the search terms used by every user and categorizes them. Let’s take a hypothetical situation. You are the director of a large hospital in Manchester. What can Google tell you about your job? Probably a lot. Let’s say that this week there is an enormous peak in searches for “flu symptoms” or “rash on back and neck” across the Greater Manchester area. Indirectly, these search trends tell you that you should prepare your hospital, because by late next week you are likely to have a massive influx of patients with flu or some other contagious disease as it takes hold of the population.

This information is potentially lifesaving, as one of the main problems with epidemics is that they seem to come out of nowhere, leaving health centres unprepared.
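To make the hospital example a little more concrete, here is a minimal sketch of how a peak in a search term might be flagged. It is only an illustration under my own assumptions: the weekly counts are invented, the function name `find_spikes` and its parameters are hypothetical, and a simple comparison against the previous few weeks stands in for whatever method a real system would use.

```python
from statistics import mean, stdev

def find_spikes(weekly_counts, baseline_weeks=4, threshold=3.0):
    """Return the indices of weeks whose count sits more than `threshold`
    standard deviations above the mean of the preceding `baseline_weeks`."""
    spikes = []
    for i in range(baseline_weeks, len(weekly_counts)):
        baseline = weekly_counts[i - baseline_weeks:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # A perfectly flat baseline gives no scale to compare against,
            # so this simple sketch skips the week rather than guess.
            continue
        z = (weekly_counts[i] - mu) / sigma
        if z > threshold:
            spikes.append(i)
    return spikes

# Invented weekly counts of searches for "flu symptoms" in one area.
flu_searches = [120, 130, 125, 118, 122, 127, 410]
print(find_spikes(flu_searches))  # the final week (index 6) is flagged
```

A hospital director would care less about the exact threshold than about the lead time: the flag appears in the week people start searching, before they arrive at the door.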

Search terms can also give an indication of how the housing market will behave, with a rise in searches for houses in a certain area being reflected six months later in new sales. The type of house searched for could also improve planning, as developers would see what people were looking for and where.

Analysts and programmers are currently working on ways to expand on the simple examples above and use search terms as wider indicators. An approach called ‘sentiment analysis’ looks particularly promising.

This form of analysis looks at the words used in online communication and categorizes them according to the sentiment they express. The logic is that in an area that is prospering the language will be generally positive, but in an area threatened by decline, such as the closure of industry or other societal problems, the language will differ. This is not dissimilar to the conversation analysis sociologists use to obtain a person’s own sentiments about their position in life, with their true feelings reflected in the words they use without thought. The hope is that an accurate analysis of this type might signal unfolding problems before they become a reality, so that action can be taken in specific areas to avoid social breakdown.
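As a rough illustration of the idea, the sketch below scores short messages against two tiny word lists and averages the result per area. Everything in it is an assumption of mine: the word lists, the area names, the messages and the function names `sentiment_score` and `area_sentiment` are all invented, and real sentiment analysis relies on far richer models than counting words.

```python
import re

# Tiny, made-up word lists used purely for illustration.
POSITIVE = {"great", "hopeful", "opening", "hired", "growing"}
NEGATIVE = {"closure", "redundant", "worried", "debt", "closing"}

def sentiment_score(text):
    """Score a message from -1.0 (all negative words) to 1.0 (all positive).
    Messages containing no listed words score 0.0."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(1 for w in words if w in POSITIVE)
    neg = sum(1 for w in words if w in NEGATIVE)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

def area_sentiment(messages_by_area):
    """Average the message scores for each area; empty areas score 0.0."""
    result = {}
    for area, messages in messages_by_area.items():
        scores = [sentiment_score(m) for m in messages]
        result[area] = sum(scores) / len(scores) if scores else 0.0
    return result

# Invented messages for two hypothetical areas.
sample = {
    "Area A": ["Great news, a new shop is opening", "Just got hired!"],
    "Area B": ["Worried about the factory closure", "Made redundant today"],
}
print(area_sentiment(sample))
```

Tracked over time, a steady fall in an area's average would be the kind of early signal described above, though a score this crude would need a great deal of refinement before anyone acted on it.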

I have addressed these issues in more depth on the Bassetti Foundation website, but want to conclude by saying the following. In my posts I have often raised data collection as a problem, and the collection of personal data for advertising, or for any other purpose for that matter, does raise serious ethical issues. Here, though, Google et al. could be sitting on a mine of extremely useful and possibly globally important data, if the technology and the political will are developed to use it correctly.