“GIS is not as simple as it used to be.”

As part of my job in the global energy industry I meet a lot of geoscientists. Highly passionate about all aspects of earth science, they’re geologists, geophysicists, or environmental scientists. They use GIS daily but don’t consider themselves to be GIS professionals any more than they are Excel or software professionals. For them, GIS is a means to an end.

When one geoscientist recently said that “GIS is not as simple as it used to be”, it pretty much summed up the mood I’m picking up in a lot of places. The state of GIS, data and IT is a big frustration. The geoscience community has been using geospatial tools for decades, but the issues they face with GIS have remained unchanged – in fact, they’re getting worse.

The technology has advanced dramatically in recent years, so what’s going on?

Photo: Futureatlas.com via Flickr (Creative Commons)


Geoscience ain’t Google or Facebook

We all know how to fire up Google Maps and look for the nearest coffee shop, but geoscientists do not have that luxury. Their data comes from everywhere: digitised from analog sources, scraped from literature or intelligence reports, imported from spreadsheets, dug up from archives, copied from network drives, extracted from databases, downloaded from web feeds.

As geoscientists forensically piece together their story – say, an environmental impact study, or the potential of a gas field hidden under a mile of sand or water – there are as many data formats as datasets. This is quite different to internet companies whose businesses are built on data that is natively digital: they can simply plug in the firehose and suck up whatever comes out.

So when geoscientists have painstakingly assembled their data, the last thing they need is a GIS that doesn’t do what they need, or is like brain surgery to operate.

What has GIS ever done for geoscientists?

From the perspective of a geoscientist, there has been little progress in the world of GIS. If you put yourself in their shoes, this is what we’ve given them over the past decade or so:

First, there were the delusional aspirations of standardizing everything, so that data conversion would no longer be required. The world would consist of spatial data infrastructures and portals, seamlessly joined together by open standards. We all know what happened to these, because no GIS professional today can survive without FME or GDAL. Most industries do not operate like the public sector. The “spatial is special” mantra merely reinforced data silos, hampering integration with other domains.

Coupled with SDIs, web-based GIS also proved that the GIS industry can take a perfectly simple web browser and turn it into a fiendishly slow and complex map system. This came in different incarnations until Google Maps finally put us out of our misery in 2005.

Meanwhile, desktop GIS has gone full-circle from being the professional’s tool to “everyone should use it”, and back to being the professional’s tool. Well, at least it is no longer pretending to be simple, and vendors can now freely praise its complexity as sophistication.

More recently arrived the mobile mapping apps, which – at last! – were simple to use (probably because they were not designed by GIS people). But alas, it is quite a leap from finding the nearest Starbucks to doing geological data interpretation. And so again, the geoscientists fell through the cracks. GIS vendors, forced to respond to the threat of internet mapping start-ups, spread themselves very thinly and ended up pleasing neither their new nor their core user base.

Then finally came Open Source, the saviour who was going to deliver long-suffering GIS users from the evil clutches of proprietary vendor lock-in. Great. Except that open source GIS is still GIS. It’s still growing arms and legs, just like any other GIS. It tries to cater for every audience, just like any other GIS. Progress is measured in added functionality. In this model it simply does not occur to developers to take something away to make it more usable. Steve Jobs would be horrified.

What happened to GIS empowering people?

Professional mapping can now be done in hundreds of ways including proprietary, open source, desktop, web/cloud-hosted, and mobile solutions. Faced with such an array of generic options, where everything is possible but nothing does what you need straight off the shelf, we’ll have to forgive the geoscientists if they’re not ecstatic about what’s on offer.

The task of making GIS usable for a particular purpose falls to the users themselves who, of course, have neither the time nor the skills to do it. Sure, there are partner developers who can provide add-ons or customisations, but that’s not the point.

GIS was always about empowering people. But GIS tools defeat their purpose when you need a whole GIS department to support basic workflows, and a whole IT department to support the GIS stack. People end up working for the technology, not the other way round.

So when the ever-quotable Brian Timoney recently tweeted that most enterprise GIS requirements would be satisfied with Google Earth and networked KML files on a shared drive, he was not far off the truth. Such a pragmatic architecture would certainly have its drawbacks, but in some scenarios it might be easier to work around these than to create what you need with a ‘proper’ GIS stack.

It is not surprising that some geoscientists still feel nostalgic for ArcView 3. Sure, this might show their age, but in their eyes ArcView 3 represented everything that a GIS should be: simple enough to use, advanced and flexible enough to do something useful with. Of course, in a modern connected environment ArcView 3 would be woefully inadequate. But in today’s world nothing seems to have taken its place in terms of usability. QGIS comes close but unfortunately it’s beginning to look more and more like ArcMap, which it hopes to displace. For many geoscientists this is barking up the wrong tree.

First the cake, then the icing

To this day GIS has not truly offered geoscientists what they need. Where are the innovative solutions for dealing with the variety of geoscience data? Where are the productivity tools to simply assemble digital scrapbooks of georeferenced information? Where are the flexible data models that enable thematic harvesting and analysis irrespective of data type? Where are the analytical tools that can handle dirty and incomplete data without hours of pre-processing? Where are the predictive user interfaces that only show relevant options?

These tools are emerging, of course… but not in the GIS world. GIS software still assumes that the data is just there, nice and clean. And so it requires a lot of sweat, mud and tears before you even get there – many don’t.

Meanwhile, data mining and analytics packages are slowly absorbing GIS functions into their tools. From the open-source R to the proprietary SAS, many now come with mapping functionality as standard. It is the same with specialist geoscience packages, such as those from Schlumberger or Landmark. None of these may ever reach the full geospatial specification of a GIS, but why should they? Once you’ve got all the data in one place, with decent analytics tools, mapping just becomes one data representation out of many. For illustration, just look at the SAS page above.

Spatial is not special. We in the GIS community have always assumed that because geoscientists like working with maps, they will always like GIS. This is not the case. If GIS is the icing, we first need to help our users bake the cake. If we don’t do it, somebody else will. And if the icing is too complex or expensive, they will just eat the cake without it.

UPDATE 09OCT13: This post has now received over 1,000 hits and 30+ re-tweets. Thank you. But there have been relatively few comments… surely not all of you simply agree? Agree or not, feel free to add more thoughts below.

Inequality visualised – and solved – with a British Airways 777 airliner

On a recent trip to the US it occurred to me that BA’s 777 aircraft is a perfect illustration of our unequal distribution of resources, and how we might solve this conundrum for the benefit of everyone.

The problem

At 30 000 feet, the most precious resource (after breathable air) is leg room. And so BA’s 777 cabin is divided into four classes where passengers are granted increasing floor space in return for higher fares. It’s astonishing: over half the aircraft is dedicated to First and Business (Club) Class. I guess it reflects the wealth of the carrier’s host nation. But even so, the aircraft’s most precious resource is distributed rather unequally across four classes. This is not dissimilar to the general distribution of wealth although, to keep the same proportions on board, the front half would have fewer passengers than cabin staff.

Still, as it is, 62 people at the front occupy the same space as 162 at the rear*. One passenger in First consumes about the same square footage as 4 or 5 people in Economy. It’s a miracle that such a back-loaded plane can even take off:

Four-class seat layout of a British Airways 777. 14 seats in First, 48 in Club, 40 in Premium and 122 in Economy (total 224). Source: seatplans.com


The reality

Over the years I’ve been lucky enough to experience all four classes on long-haul flights. Being 6’7” tall I (barely) survived overnight flights in Economy but I also cruised in Premium or stretched out in Business on corporate missions, and slept like a baby in First thanks to a couple of accidental upgrades.

But, just like diamonds, airline classes are bullshit.

Think about it. Flying is a miracle as well as a privilege – I always ask for a window seat so I can enjoy the view, which I’ll never tire of. This privilege is available whatever the class. And your prime goal is to go from A to B. First Class does not get there any faster. You might save 10 minutes at the immigration queue or baggage carousel, but that’s not worth the extra few grand.

When you think about it objectively, First Class in the air is equivalent to a youth hostel dormitory on the ground. Where else would you share a room with a dozen strangers? Ditto with airline lounges. If they really are that nice, why don’t cafés and pubs look more like them?

I call bullshit. It’s mostly a status thing. It’s outdated and we need to get over it. In an age of resource scarcity we can no longer afford this collective stupidity.

The solution

The solution is quite simple. If the whole aircraft were kitted out in Premium Economy, everybody would be happy enough. In Premium, even I can stretch out enough to avoid Deep Vein Thrombosis. Besides, you get served half-edible food and entertained by a decent in-flight movie collection. Out of all classes, Premium Economy offers by far the best value for money. What’s more, if every seat on the plane were a Premium chair, you’d fit 134 more people on board than with the current layout (358 instead of 224) – an increase of a whopping 60%! And because everybody would be treated equally, there would be no smugness or guilt at the front, and no despair or resentment at the back. Everybody would just “be”. Happy days!

The 777 re-arranged in all-Premium configuration (total: 358), as photoshopped with GIMP.
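The capacity figures are easy to double-check in a few lines of R. This is a minimal sketch: the seat counts come from the four-class seat plan, while the 358 total is taken from the re-arranged layout rather than derived from cabin dimensions.

```r
# Seat counts per class in the four-class BA 777 layout (from the seat plan)
seats <- c(First = 14L, Club = 48L, Premium = 40L, Economy = 122L)

current_total <- sum(seats)                         # 224 seats today
front <- sum(seats[c("First", "Club")])             # 62 seats at the front
rear  <- sum(seats[c("Premium", "Economy")])        # 162 seats at the rear

# Total for the all-Premium layout, taken from the re-arranged seat plan
all_premium_total <- 358L

extra_seats  <- all_premium_total - current_total   # 134 extra passengers
increase_pct <- 100 * extra_seats / current_total   # roughly 60%

cat(sprintf("Front: %d, rear: %d, today: %d, all-Premium: %d (+%d seats, +%.1f%%)\n",
            front, rear, current_total, all_premium_total, extra_seats, increase_pct))
# Front: 62, rear: 162, today: 224, all-Premium: 358 (+134 seats, +59.8%)
```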


The economics

But wait a minute. What about the cost?

Airline fares are notoriously complex, so let’s make some assumptions here. Let’s say the average return fare for a typical long-haul flight is £700 for Economy, £1500 for Premium, £4000 for Business, and £8000 for First. So with the standard cabin layout the total fare would be around £450,000. In the airline business it’s hard enough to make a profit, so let’s also assume that this is a fair return. Now, if we divide this figure by the 358 seats in the all-Premium layout, that averages out at about £1250 per seat. Ouch.
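For anyone who wants to check the sums or plug in their own numbers, here’s a small R sketch. The fares are the illustrative averages assumed above, not actual BA prices, and the seat counts come from the four-class seat plan.

```r
# Seats per class in the standard four-class layout
seats <- c(Economy = 122L, Premium = 40L, Business = 48L, First = 14L)

# Assumed average long-haul return fares in GBP (illustrative, not real prices)
fares <- c(Economy = 700, Premium = 1500, Business = 4000, First = 8000)

revenue <- seats * fares                   # fare income per class
total_revenue <- sum(revenue)              # 449400, i.e. "around 450,000"

all_premium_seats <- 358L
fare_per_seat <- total_revenue / all_premium_seats       # just over 1250
ratio_to_economy <- fare_per_seat / fares[["Economy"]]   # about 1.8 x Economy

# Share of the total fare income paid by Business and First
front_share <- sum(revenue[c("Business", "First")]) / total_revenue

cat(sprintf("Total: GBP %.0f, per all-Premium seat: GBP %.0f (%.2f x Economy)\n",
            total_revenue, fare_per_seat, ratio_to_economy))
cat(sprintf("Business + First: %.0f%% of income from %d of %d seats\n",
            100 * front_share, sum(seats[c("Business", "First")]), sum(seats)))
```

On these assumptions the 62 seats at the front bring in roughly two thirds of the fare income.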

When doing these sums you realise that people in First and Business Class actually subsidise the people in Economy through overinflated fares (no wonder airlines are keen to attract and retain these lucrative customers). Even if the nirvana of equality does not come true, this should provide at least some comfort for the rest of us.

But imagine if we could somehow pull off a one-class flight (and society). Would this be worth paying £1250 each? That’s almost twice the current Economy fare. If the wealth across our society was as equally distributed as the seats on this all-Premium airliner, would £1250 be affordable to all?

I guess it would. And it would more naturally reflect the true cost of flying, economically and environmentally, as well as make more efficient use of resources – in the case of the airliner, 60% more capacity.

What do you think? Do I need to get my head tested, or am I onto something?

* Correction 01Oct13: Not 54 at the front vs 160 at the rear, as posted originally (this was a gremlin that had slipped in during early drafting). So 14 First + 48 Business = 62 seats, and 40 Premium + 122 Economy = 162 seats.

This blog is still alive

There won’t be many readers wondering why I haven’t posted on here for a while but if you stumble on this page, yes it is still alive. In fact my last post, Which type of mapmaker are you? was my most successful ever in terms of views (it rocketed once it got a mention on TechCrunch!) but it also coincided with the start of a new job back in March.

I’m loving my new role and it’s been busy: in my first 3 months I’ve seen clients on 3 continents. My head is bubbling over with ideas for new blog posts but there’s just not been the time (except for a couple of posts on GIS strategy design over at the Exprodat blog). I’m now looking forward to some holidays and will hopefully be able to post new stuff here soon(ish).

Bye for now, and I hope you have a great summer – Thierry

P.S. Thanks to Steven Feldman for inspiring this blog post with his recent update :-)

Flickr: Cynr (CC)


Which type of mapmaker are you?

10 ways to make a map – and what they say about you

Once, in my earlier career, I performed a “Geomatics Striptease” during a presentation. Before you get worried, it was just a term I had coined to ask the geomatics profession a simple question: What unique skill do you have left after all the layers of overlapping disciplines have been peeled off? After stripping away IT, geography, GIS, remote sensing and so on, the one core skill unique to the profession was geodetic positioning – making sure that things are precisely & accurately in the right place.

Anybody can make a map these days so, beyond that, what unique skill can you contribute to society? Whatever your profession, you can probably list any number of skills that other disciplines also offer. So what is unique about you? Why should your customer pay you and not someone else? If you’re unsure about the answer, a good starting point might be to determine your preferred mapping style. It will provide some clues as to what makes you unique. So I’ve made this chart to help you find out:

(Click on chart for full size, or get it here from my Flickr page.)

which type of mapmaker are you

The red telephone box

Yesterday, on a Sunday family walk, I came across this red telephone box in Budleigh Salterton, Devon. I was strangely drawn to it. It wasn’t in the best state of repair but, being right on the beach and overlooking the sea, it had a real presence. Despite technological advances that now largely make it obsolete this phone box was, quite literally, standing its ground. It felt like a monolith out of Space Odyssey, its true purpose still waiting to be uncovered.

redtelephonebox

The red telephone box is not just an icon of design but also an incredibly well-proportioned cubicle. Within seconds, literally tens of ideas for alternative uses came to me. It could obviously be a Wi-Fi hotspot or mobile phone charger, but why not a shower or heater cubicle (handy after a swim or surf), espresso vending machine, touchscreen web terminal, digital library, tyre inflator point, electric car charger, first aid dispenser, tourist information, lighthouse, kite launcher, photo booth, exercise/physio stretch bar, hair dryer cubicle, immersive 3D / VR screen for education & entertainment, shoe polisher (with dog poo remover…), etc etc…

What else could it do? The red telephone box is clearly not done yet.

After 4 years on social media, the main thing I’ve learned is…

When I started out with social media in 2009 I was motivated by nothing more than a curiosity to explore and connect the dots. The journey began with Flickr, sharing pictures of places that inspired me. Then followed LinkedIn, Twitter and, finally, this blog which provided a creative outlet to expand on matters exceeding 140 characters. All along I remained focused on personal interests relating to my profession – data, mapping, geography, technology, global issues, current affairs. Over the past 4 years I have connected with people old and young, known and unknown, all over the world but also close to home, all sharing common interests. On average I have tweeted about 2-3 times daily, blogged monthly, and built an online network comprising hundreds of people.

So what have I learned?

First, pictures are mostly for telling stories rather than capturing a place or a moment. I still post pictures via Twitter but my Flickr account has pretty much gone dormant. A smartphone camera is hardly the epitome of photography but still, it would be unfair to say that social media have replaced quality with quantity. Rather, it has become clear that a picture is really just about sharing a joke or connecting with like-minded people. I never envisaged this when I started out, but my most visited image on Flickr is not a scenic vista but a cartoon. Maybe I should do more of this.

I also learned that it is better to make data open. Once I converted the licensing of my Flickr images to Creative Commons, I started getting many more comments and enquiries about re-using my images. The highlight was when Sustrans, a British charity, asked to use this picture of Mother Iveys Bay on the cover of their Cornwall Cycle Map – yay!

However, opening your data also requires that you go into it with your eyes open. LinkedIn, for example, does for all intents and purposes what it says on the tin. It’s a great tool providing a simple and unobtrusive way to stay in touch with fellow professionals. But beware of headhunters who distract you with irrelevant job openings because they have seen (but not bothered to read) your CV. Or people who endorse you for the wrong skills. Once you’re out in the open, you have to take the good as well as the bad.

Twitter is a different beast altogether. It all depends on how you use it, so it’s down to personal preference. For me, the learning curve was steep: trying to avoid mindless chat or banjo-playing squirrels, the task of filtering the signal from the noise was significant. But with an employer encouraging staff to embrace social media, I kept at it. Today I use Twitter to get industry news, share knowledge, and poke like-minded people in a good-natured way (many of whom I have since met in the flesh). Used in targeted fashion, and in combination with other online sources, Twitter can easily beat the mainstream media and even trade journals. And tools like Flipboard make curating and digesting information much easier. However, I now follow well over 200 people and am beginning to struggle to keep up in the limited time I’ve got. I really don’t know how anyone can follow thousands of accounts but I guess, beyond a certain point, tweets just become raw data which you need to filter like any other data. But I’m hesitant to go there as I don’t want to lose the human touch.

Still, for all the benefits that Twitter offers, it is not always the useful stuff that catches people’s attention. Plenty of my tweets shared things I thought were interesting, but the ones that got the most retweets were mostly those where I used my allocation of 140 characters to dispense a dose of dry humour. Like this one, after Apple’s maps fiasco:

iOStweet

On my blog the story is similar. I enjoy the creative outlet that writing offers, and I have been grateful to receive positive reactions to a range of topics I posted on, including Big Data (From Lego to Play-Doh), migration (Where are you from?), books (A true sense of place), or mapping (Let the children map our world). But again the greatest reactions were for things like poking fun at marketing strategies (The Point of No-Geo Return), scoring geo-points in Germany vs England and Seven Questions to Test Your Geo-Personality, gazing into the future in The Next 100 Years, or for suggesting a simple (no) nonsense way for governments to assess the value of open data.

So, after 4 years in social media, what is the main thing I have learned? Well… it’s not about saving the world. It’s not about connecting 7 billion brains to progress knowledge. It’s not about empowering people to topple dictators. It’s none of those things.

It’s simply about having a laugh. (Blimey, and it took me 4 years to figure this out??)

How I used R to create a word cloud, step by step

Or: R is less scary than you thought!

R, the open source package, has become the de facto standard for statistical computing and anything seriously data-related (note I am avoiding the term ‘big data’ here – oops, too late!). From data mining to predictive analytics to data visualisation, it seems like any self-respecting data professional now uses R. Or at least they pretend to. We all know that most people use Excel when nobody’s watching.

But anyway, R is immensely powerful. It is also command-line driven, which makes it quite scary, especially for those of us who don’t get to be hands-on as often as we’d like to. True, used in the wrong way, statistical algorithms can wreak havoc (garbage in, garbage out), but don’t let this intimidate you. I recently gave it a try and was hooked in a matter of minutes. And if I can do it, so can you!

There are now many free online courses teaching R but some of these represent a significant investment of time. So to get started and experience a taster of how R works, I would recommend the following: create a word cloud. If you’ve got 1-2 hours to fiddle around then the steps outlined below should help you create your first output with R. For example, here’s a word cloud of all my tweets over the past 3 years:

R word cloud 2010-2012 thierry_g

Yes, you can do this much more easily online with Wordle, but that is not the point… Besides, R also has a package to read directly from Twitter so you can plug all the power of R into it (but we won’t use that here).

So, here’s an example of how it works. I used R for Windows because the family iMac was already in use… As far as I know, however, the steps for the Mac version should be exactly the same.

Step 1: Install R.

Go to r-project.org and follow the download/installation instructions. Easy.

Step 2: Install RStudio.

Why? Because it makes R much more usable, so it won’t scare the pants off you. RStudio is an open-source user interface organising everything you need on one single screen. There are handy tabs and windows: command line, workspace, history, files, plots, packages and help. Do yourself a favour and download it from rstudio.com. Easy.

Step 3: Create a text file to turn into a word cloud

You can use any text you like. For the sake of this exercise, the most obscure I could find was the transcript of a House of Lords debate on the state of the bee population… Copy & paste the text into a plain text file (e.g. lords.txt) and stick the file into a dedicated directory in your default documents folder (I’ll call mine ‘temp’). Make sure there are no other files in this directory.
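If you’d rather script this step as well, a minimal sketch along these lines should do it. The two sentences are placeholders standing in for the debate transcript, and the folder name ‘temp’ matches the one used below. Relative paths are resolved against R’s working directory, which you can check with getwd().

```r
# Create the dedicated folder (relative to the current working directory)
dir.create("temp", showWarnings = FALSE)

# Placeholder text standing in for the debate transcript (replace with your own)
sample_text <- c(
  "My Lords, the decline of the bee population is a matter of great concern.",
  "The noble Lord raises an important point about pesticides and bees."
)

# Write it out as a plain text file inside the folder
writeLines(sample_text, file.path("temp", "lords.txt"))

# The folder should contain this one file and nothing else
print(list.files("temp"))
```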

Step 4: Open RStudio, install required or missing packages

For this exercise you need the text mining package (‘tm’) and the wordcloud package (‘wordcloud’). In turn, each of those makes use of other packages too. Click on the Packages tab (bottom right window in RStudio) and see if they’re listed. If not, go to Tools > Install Packages (top menu bar) and install them from there. Rather than mess around manually with downloaded zip files, simply install the packages straight through the default CRAN mirror option (if you have a firewall, make sure the URL is not blocked). Once installed, tick the required packages in the list under the Packages tab – this will in effect load & activate them in the workspace (it’s the same as using the ‘library’ command in R). As you tick them, you may get some warnings of further missing packages that they rely on – if so, install those packages too.
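If you prefer typing to clicking, the same step can be done from the console. This is a minimal sketch. The repos address (cloud.r-project.org) is just one CRAN mirror. I’ve also listed RColorBrewer, which provides the brewer.pal() colour palettes, and SnowballC, which tm’s stemDocument relies on in Step 5.

```r
# Packages used in this walkthrough
pkgs <- c("tm", "wordcloud", "RColorBrewer", "SnowballC")

# Install only the ones that are not already available
to_install <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}

# Load them into the session, the equivalent of ticking them in the Packages tab
for (pkg in pkgs) library(pkg, character.only = TRUE)
```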

All done? All packages installed? All packages ticked off in the list? Move on to Step 5.

Step 5: The data process – text mining, clean-up, wordcloud

Now we need to load the text file into RStudio and clean it up so that the word cloud makes sense (for example, you don’t want to highlight common words like ‘the’). For reference see Introduction to the tm (text mining) Package.

First, you need to load the text into a so-called corpus, so the tm package can process it. A corpus is a collection of documents (although in our case we only have one). The following command loads everything (beware!) from the specified directory (remember, I called it ‘temp’; the path is relative to R’s working directory, which you can check with getwd()) into a corpus called ‘lords’:

lords <- Corpus(DirSource("temp/"))

To see what’s in that corpus, type the command

inspect(lords)

This should print out contents on the main screen. Next, we need to clean it up. Execute the following in the command line, one line at a time:

lords <- tm_map(lords, stripWhitespace)

lords <- tm_map(lords, content_transformer(tolower))

lords <- tm_map(lords, removeWords, stopwords("english"))

lords <- tm_map(lords, stemDocument)

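The tm_map function comes with the tm package and applies a transformation to every document in the corpus. The various commands are self-explanatory: strip unnecessary white space, convert everything to lower case (otherwise the word cloud might count capitalised words separately), remove common English words like ‘the’ (so-called ‘stopwords’), and carry out text stemming for the final tidy-up (stemDocument relies on the SnowballC package, so install that too if R asks for it). In more recent versions of tm, plain base R functions such as tolower are best wrapped in content_transformer(), as in tm_map(lords, content_transformer(tolower)), so that the documents keep their tm structure. Depending on what you want to achieve you could also explicitly remove numbers and punctuation with the removeNumbers and removePunctuation transformations, e.g. tm_map(lords, removePunctuation).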
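Putting the clean-up together, here is a self-contained sketch you can paste into the console. To keep it standalone it builds the corpus from two made-up sentences with VectorSource() instead of reading the ‘temp’ folder. I’ve also moved stripWhitespace to after the removals so it tidies up the gaps they leave behind.

```r
library(tm)  # stemDocument() below also needs SnowballC to be installed

# Two made-up sentences standing in for the transcript; with your own data
# you would use Corpus(DirSource("temp/")) instead
docs <- Corpus(VectorSource(c(
  "My Lords, the decline of the BEE population is a matter of great concern.",
  "The noble Lord raises 2 important points about pesticides and bees!"
)))

docs <- tm_map(docs, content_transformer(tolower))        # lower-case everything
docs <- tm_map(docs, removeNumbers)                       # drop digits such as "2"
docs <- tm_map(docs, removePunctuation)                   # drop commas, full stops, "!"
docs <- tm_map(docs, removeWords, stopwords("english"))   # drop "the", "of", "is", ...
docs <- tm_map(docs, stripWhitespace)                     # collapse the gaps left behind
docs <- tm_map(docs, stemDocument)                        # reduce words to their stems

# Print the cleaned text of each document
for (i in seq_along(docs)) print(content(docs[[i]]))
```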

It is possible that you may get error messages whilst executing some of the commands, e.g. missing packages. If so, install these as outlined above in Step 4, and repeat. At one point I also got a message about Java being corrupted (JAVA_HOME not found), so looking this up on Google I found the solution was just to reinstall Java on my machine, reboot, and try again (note you can save your workspace in RStudio, so you never lose any work and always retain the history of what you’ve done). It might all go smoothly the first time, or it might not. Some issues can be specific to your particular hardware, operating system, or software versions. Be prepared for some fiddling – it’s called hacking! And remember, there are loads of R help forums and tutorials online if you get stuck. Just type the relevant R command or error message into Google and you’ll find something relevant.

If all is well then you should now be ready to create your first wordcloud! Try this:

wordcloud(lords, scale=c(5,0.5), max.words=100, random.order=FALSE, rot.per=0.35, use.r.layout=FALSE, colors=brewer.pal(8, "Dark2"))

This command does what it says on the tin: try it as is, or fiddle with the settings to change the output. For further explanation of the command arguments see e.g. this page. To highlight a few: scale sets the range of font sizes, from the largest to the smallest word; max.words limits the number of words in the cloud (if you omit it, R will try to squeeze every unique word into the diagram!); rot.per is the proportion of words drawn vertically (0.35 means 35%); and colors offers a wide choice of ways to symbolise your data, from a single colour (e.g. colors="black") to preset colour palettes from the RColorBrewer package (e.g. colors=brewer.pal(8, "Dark2")). Here’s the result:

Lordswordcloud

Congratulations!
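If you’d like to see the numbers behind the cloud, or control exactly what gets plotted, you can count the terms yourself and pass words and frequencies to wordcloud() explicitly. Here is a sketch on a tiny made-up corpus (swap in your cleaned ‘lords’ corpus). Setting min.freq = 1 is my addition so that rare words in such a small sample still appear. set.seed() simply makes the partly random layout repeatable.

```r
library(tm)
library(wordcloud)
library(RColorBrewer)

# Tiny made-up corpus; swap in your cleaned 'lords' corpus
docs <- Corpus(VectorSource(c(
  "bees bees pollination bees pesticides",
  "pesticides bees habitat pollination"
)))

# Count how often each term occurs across all documents
tdm  <- TermDocumentMatrix(docs)
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
print(head(freq, 10))   # most frequent terms first

# The cloud layout is partly random; fixing the seed makes it repeatable
set.seed(42)

# Plot from explicit word/frequency pairs; min.freq = 1 keeps words that
# occur only once (the default of 3 would drop most of this small sample)
wordcloud(words = names(freq), freq = freq, min.freq = 1,
          scale = c(5, 0.5), max.words = 100, random.order = FALSE,
          rot.per = 0.35, colors = brewer.pal(8, "Dark2"))
```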

Now, to go a step further, you may want to manually remove words from the cloud. For example, to get rid of the words “noble” and “lord”, you could use these commands:

lords <- tm_map(lords, removeWords, "noble")

lords <- tm_map(lords, removeWords, "lord")

Or you can make a list of words, c("noble", "lord", etc…), to remove them in one go:

lords <- tm_map(lords, removeWords, c("noble", "lord"))

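Just rerun the wordcloud command used above (hint: rather than type it all over again, use the Up arrow to scroll back to previously used commands) and see the result. Done! One thing to watch: if you ran stemDocument earlier, the corpus now holds stemmed forms, so a word like “noble” may appear in the cloud as “nobl” and needs to be removed in that form (or you can remove your own words before the stemming step).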
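Because stemDocument rewrites words (with SnowballC’s English stemmer, “noble” comes out as “nobl”), custom words are easiest to remove before stemming. One way is to append them to the standard stopword list. Here is a sketch on a made-up sentence. The extra words are just examples.

```r
library(tm)  # stemDocument() needs SnowballC installed

# Made-up sentence standing in for the transcript
docs <- Corpus(VectorSource("The noble Lord and the noble Baroness discussed bees."))

# Standard English stopwords plus my own (example) words to drop
my_stopwords <- c(stopwords("english"), "noble", "lord", "baroness")

docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removePunctuation)
docs <- tm_map(docs, removeWords, my_stopwords)  # remove the words first...
docs <- tm_map(docs, stripWhitespace)
docs <- tm_map(docs, stemDocument)               # ...then stem what is left

print(content(docs[[1]]))
```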

Have fun!