Or: Why the geospatial community needs to shed old habits and learn new skills to take advantage of Big Data.
I just had a realisation. Sometime in the recent past I must have crossed a threshold. I don’t know exactly when it happened, or how. It was a kind of intuition or common sense that led me there, one small step and one gradual change at a time. Now I have suddenly woken up and seen what I have done: I no longer believe in the classical doctrine of old-fashioned GIS and data management.
As I looked around my department, staffed entirely by data experts and other Tefal heads, it suddenly dawned on me that I was no longer doing what I would have called ‘data’ a few years ago. Take our placement student, who is doing a project on artificial intelligence. Analysing our real estate data, he has shown that if a property listing mentions the word ‘radiator’, the house won’t sell. Figure that! Then there is the timeline of the British economy, brilliantly visualised by my in-house guru: 10 years’ worth of housing market data, condensed into dots and bubbles performing a beautiful choreography on a five-dimensional stage. Millions of data points made entirely comprehensible in one clip. Amazing. The list goes on.
I hate to admit it, but I’ve joined the latest bandwagon: Big Data. Yet it’s not just the ‘big’ that is different.
Like many others in the geospatial industry, I had grown up with the notion that the world was there to be abstracted, structured, ordered, and modelled with great accuracy. When I entered the industry in the late 1990s, GIS and relational databases were state of the art. People talked about how spatial data infrastructures would create virtual representations of everything that exists in the world. The digital nirvana was near.
When the nirvana finally arrived, it didn’t look quite the way people had imagined. Instead of the Legoland some had expected (a stack of bricks, neatly built from the ground up), it looked more like a pile of Play-Doh balls: amorphous, gooey, messy. Google Earth, for example, was great fun, but it didn’t take long for many of us ‘serious’ professionals to dismiss it as eye candy. Not accurate enough. Not the right projection. Bad rubber-sheeting. Poor attributes.
Critically, creating spatial data had suddenly become as easy as composing an email, and KML files began to rain down like a hailstorm heralding a tornado. People started annotating maps with random snippets of data, with no concern for relevance or quality. Purists who had dedicated their lives to structured order were exasperated: Where are the standards? Where is the metadata? To which most of the uninitiated replied: meta-what?
The truth is that mapping had simply become a more realistic, less abstract representation of the world. But hey, we said, we still do the heavy lifting, and we have the degrees to prove it. Who else knows about geodetic datums or spatial intersections? Nobody, of course. Certainly not those lightweights at Google, who manage petabytes of data requiring so much energy that they need their own power stations.
The trouble with the old data world is that the only perfect database is an empty one, because the world is not perfect, not regular, not linear. It is a kind of chaos so vast that even its randomness creates patterns, a bit like the Mandelbrot fractals that were so popular on early PCs in the 1980s.
If you are a classically trained spatial data professional like me, don’t let your well-honed perfectionism get in the way of your next ride. Big Data is here now. Take a deep breath and accept that quantity will eventually trump quality. And when the quantities are huge, the insights can be many. In the world of Big Data, your job is not about structuring or managing data. It’s about telling stories.