How I used R to create a word cloud, step by step

Or: R is less scary than you thought!

R, the open source package, has become the de facto standard for statistical computing and anything seriously data-related (note I am avoiding the term ‘big data’ here – oops, too late!).  From data mining to predictive analytics to data visualisation, it seems like any self-respecting data professional now uses R. Or at least they pretend to. We all know that most people use Excel when nobody’s watching.

But anyway, R is immensely powerful. It is also command-line driven, which makes it quite scary, especially for those of us who don’t get to be hands-on as often as we’d like to. True, used in the wrong way, statistical algorithms can wreak havoc (garbage in – garbage out), but don’t let this intimidate you. I recently gave it a try myself and found myself hooked in a matter of minutes. And if I can do it, so can you!

There are now many free online courses teaching R but some of these represent a significant investment of time. So to get started and experience a taster of how R works, I would recommend the following: create a word cloud. If you’ve got 1-2 hours to fiddle around then the steps outlined below should help you create your first output with R. For example, here’s a word cloud of all my tweets over the past 3 years:

R word cloud 2010-2012 thierry_g

Yes, you can do this much more easily online with Wordle, but that is not the point… Besides, R also has a package to read directly from Twitter so you can plug all the power of R into it (but we won’t use that here).

So, here’s an example of how it works. I used R for Windows because the family iMac was already in use… As far as I know, however, the steps for the Mac version should be exactly the same.

Step 1: Install R.

Go to r-project.org and follow the download/installation instructions. Easy.

Step 2: Install RStudio.

Why? Because it makes R much more usable, so it won’t scare the pants off you. RStudio is an open-source user interface organising everything you need on one single screen. There are handy tabs and windows: command line, workspace, history, files, plots, packages and help. Do yourself a favour and download it from rstudio.com. Easy.

Step 3: Create a text file to turn into a wordle

You can use any text you like. For the sake of this exercise, the most obscure I could find was the transcript of a House of Lords debate on the state of the bee population… Copy & paste the text into a plain text file (e.g. lords.txt) and stick the file into a dedicated directory in your default documents folder (I’ll call mine ‘temp’). Make sure there are no other files in this directory.

Step 4: Open RStudio, install required or missing packages

For this exercise you need the text mining package (‘tm’) and the wordcloud package (‘wordcloud’). In turn, each of those makes use of other packages too. Click on the Packages tab (bottom right window in RStudio) and see if they’re listed. If not, go to Tools > Install Packages (top menu bar) and install them from there. Rather than mess around manually with downloaded zip files, simply install the packages straight through the default CRAN mirror option (if you have a firewall, make sure the URL is not blocked). Once installed, tick the required packages in the list under the Packages tab – this will in effect load & activate them in the workspace (it’s the same as using the ‘library’ command in R). As you tick them, you may get some warnings of further missing packages that they rely on – if so, install those packages too.

All done? All packages installed? All packages ticked off in the list? Move on to Step 5.
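If you prefer typing to ticking boxes, the same check can be sketched in the console. This is only a rough equivalent of the point-and-click steps above: missing_packages is a made-up helper name, and including SnowballC and RColorBrewer is my assumption about what recent versions of tm and wordcloud need for stemming and colour palettes.

```r
# Packages used in this walkthrough. SnowballC and RColorBrewer are assumed
# extras (stemming and colour palettes); adjust the list to your own setup.
needed <- c("tm", "wordcloud", "SnowballC", "RColorBrewer")

# Hypothetical helper: which of the given packages are not installed yet?
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(needed)

if (length(to_install) > 0) {
  # Nothing is installed automatically; this only prints the command to run
  message("Still missing, install with: install.packages(c(",
          paste0('"', to_install, '"', collapse = ", "), "))")
} else {
  # Same effect as ticking the boxes under the Packages tab
  for (pkg in needed) library(pkg, character.only = TRUE)
}
```

Run it again after installing anything it reports as missing; once the list is empty, the packages are loaded for you.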

Step 5: The data process – text mining, clean-up, wordcloud

Now we need to load the text file into RStudio and clean it up so that the word cloud makes sense (for example, you don’t want to highlight common words like ‘the’). For reference see Introduction to the tm (text mining) Package.

First, you need to load the text into a so-called corpus, so the tm package can process it. A corpus is a collection of documents (although in our case we only have one). The following command loads everything (beware!) from the specified directory (remember, I called it ‘temp’) into a corpus called ‘lords’:

lords <- Corpus(DirSource("temp/"))
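Since that command swallows everything in the folder, it may be worth a quick look at what is in there first. Here is a small sketch using base R only; files_to_load is a made-up helper name, and "temp" is simply the folder name I chose in Step 3. The path is relative to R's working directory, which getwd() will show you.

```r
# Hypothetical helper: list the files that sit in the corpus folder
files_to_load <- function(dir) {
  if (!dir.exists(dir)) {
    message("No such folder: ", dir, " (working directory is ", getwd(), ")")
    return(character(0))
  }
  list.files(dir)
}

# "temp" is the folder name used in this post; ideally this shows only lords.txt
print(files_to_load("temp"))
```

If the folder is reported as missing, the usual culprit is the working directory pointing somewhere other than your documents folder.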

To see what’s in that corpus, type the command

inspect(lords)

This should print out contents on the main screen. Next, we need to clean it up. Execute the following in the command line, one line at a time:

lords <- tm_map(lords, stripWhitespace)

lords <- tm_map(lords, content_transformer(tolower))

lords <- tm_map(lords, removeWords, stopwords("english"))

lords <- tm_map(lords, stemDocument)

The tm_map function comes with the tm package. The various commands are self-explanatory: strip unnecessary white space, convert everything to lower case (otherwise the wordcloud might highlight capitalised words separately), remove English common words like ‘the’ (so-called ‘stopwords’), and carry out text stemming for the final tidy-up. Note that in recent versions of tm, tolower needs to be wrapped as content_transformer(tolower), otherwise the corpus ends up in a form the later steps can’t handle. Depending on what you want to achieve you could also explicitly remove numbers and punctuation by passing removeNumbers and removePunctuation to tm_map in the same way.
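To see what these steps actually do, here is a small sketch on a made-up sentence rather than the Lords transcript, with the optional number and punctuation removal thrown in. It assumes a reasonably recent version of tm; the sample text and variable names are mine, and I run stripWhitespace last here so that it tidies up the gaps left by the earlier steps.

```r
library(tm)

# A one-document corpus built in memory from a made-up sentence
demo <- VCorpus(VectorSource("The Noble Lord counted 25 hives; the BEES were   thriving!"))

demo <- tm_map(demo, content_transformer(tolower))       # everything to lower case
demo <- tm_map(demo, removePunctuation)                  # drop ; and !
demo <- tm_map(demo, removeNumbers)                      # drop 25
demo <- tm_map(demo, removeWords, stopwords("english"))  # drop common words such as 'the'
demo <- tm_map(demo, stripWhitespace)                    # collapse leftover gaps

# Print the cleaned text of the first (and only) document
cleaned <- content(demo[[1]])
print(cleaned)
```

Trying the steps one at a time on a short sentence like this is a quick way to decide which ones you actually want before running them on the full text.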

It is possible that you may get error messages whilst executing some of the commands, e.g. missing packages. If so, install these as outlined above in Step 4, and repeat. Once I also got a message about Java being corrupted (JAVA_HOME not found), so looking this up on Google I found the solution was just to reinstall Java on my machine, reboot, and try again (note you can save your workspace in RStudio, so you never lose any work and always retain the history of what you’ve done). It might all go smoothly the first time, or it might not. Some issues can be specific to your particular hardware, operating system, or software versions. Be prepared for some fiddling – it’s called hacking! And remember, there are loads of R help forums and tutorials online if you get stuck. Just type the relevant R command or error message into Google and you’ll find something relevant.

If all is well then you should now be ready to create your first wordcloud! Try this:

wordcloud(lords, scale=c(5,0.5), max.words=100, random.order=FALSE, rot.per=0.35, use.r.layout=FALSE, colors=brewer.pal(8, "Dark2"))

This command does what it says on the tin – try it as is, or fiddle with the settings to change the output. For further explanation of the command arguments see e.g. this page. To highlight a few: scale controls the difference between the largest and smallest font, max.words limits the number of words in the cloud (if you omit it, R will try to squeeze in as many words as it can), rot.per is the proportion of words rotated by 90 degrees, and colors offers a wide choice for symbolising your data, from single colours (e.g. colors="black") to pre-set colour palettes from the RColorBrewer package (e.g. colors=brewer.pal(8, "Dark2")). Here’s the result:

Lordswordcloud

Congratulations!
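If you are curious what drives the size of each word: it comes down to counting. The following is a rough base-R sketch of that idea on a made-up sentence, not what the wordcloud package does internally; top_words is a hypothetical helper, and its n argument plays a similar role to max.words.

```r
# Hypothetical helper: the n most frequent words in a piece of text
top_words <- function(text, n = 100) {
  words <- unlist(strsplit(tolower(text), "[^a-z]+"))  # split on non-letters
  words <- words[words != ""]                          # drop empty strings
  counts <- sort(table(words), decreasing = TRUE)      # most frequent first
  head(counts, n)
}

sample_text <- "Bees need flowers. Flowers need bees. Lords debate bees."
print(top_words(sample_text, n = 3))
```

The more often a word turns up, the bigger it is drawn, which is why words that dominate the text can crowd out everything else.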

Now, to go a step further, you may want to manually remove words from the cloud. For example, to get rid of the words “noble” and “lord”, you could use these commands:

lords <- tm_map(lords, removeWords, "noble")

lords <- tm_map(lords, removeWords, "lord")

Or you can make a list of words, c("noble", "lord", etc…), to remove them in one go:

lords <- tm_map(lords, removeWords, c("noble", "lord"))

Just rerun the wordcloud command used above (hint: rather than type it all over again, use the Up arrow to scroll back to previously used commands) and see the result. Done!

Have fun!

Redefining the meaning of progress on the internet

On the internet nothing is too dangerous, or too remote, to connect people. So why are my radio and telephone not working?

Yesterday I was one of millions of people watching live as Felix Baumgartner jumped higher and faster than anyone else before him. It was an awesome feat; absolutely gobsmacking.

Once the initial euphoria has worn off, however, will we still be as impressed? Some records from the 1960s still stand, and probably always will. Baumgartner didn’t quite achieve the longest freefall ever, and obviously he didn’t walk on the moon or even go into orbit. He clearly pushed new limits but in truth the jump may go down in history as something unrelated to space exploration or daredevil pursuits.

Baumgartner takes a dive (Photo via Guardian.co.uk)

In the age of the internet, progress is not as clear-cut as it used to be. Take my internet radio, for example. It can play thousands of stations from around the world, but the menu is complicated and the signal often cuts out when the wifi goes flaky (it’s unreliable near the kitchen wall). Any old FM radio has no such issues. You press a single button and there you go: instant pleasure, without interruption. The only downside: fewer stations. So which is better?

Old radio (Photo by Frisno, Flickr CC)

It’s the same story with internet-based telephony. Whatever happened to just picking up the phone I don’t know. At work we have rolled out MS Lync: a great tool that enables you to make calls, arrange teleconferences, share your desktop, or instant message with colleagues. But to make a call you have to wear a headset. Unless you want to spend the whole day looking like a dick (wearing a Bluetooth piece) or being physically chained to your laptop (via headphones), answering the phone becomes a panicked fumble with audio equipment that may – or may not – attach to your ear before the call goes to voicemail.

Now that’s what I call a telephone (Photo by mightyohm, Flickr CC)

I really miss my old telephone and radio. At the same time I was absolutely thrilled to be watching Baumgartner’s jump live. So what is progress?

Progress is when I hear someone casually mention Baumgartner at the swimming pool and I spontaneously whip out a device from my trouser pocket that lets us witness him jumping, live and in high definition, from a space balloon nearly 40km above the Earth and some 5000 miles away. 10 people crowded around my iPhone and felt as inspired by the live connection as by the jump itself. We were one of those 8 million live viewers on YouTube  – as TV channels will also have noted with interest. This is a new frontier where nothing is too dangerous, or too remote, to connect people. And anyone can do it.

Well, almost anyone. Now I need Baumgartner to fix my radio and telephone.

Art vs Science: We are all Leonardos now

It doesn’t take a genius to point out that we are experiencing a technology-driven renaissance. We can now share and create knowledge faster than ever before, driving exponential progress that impacts on every aspect of society. Some may lament the fact that we no longer have Concorde, but the web connects people faster than any aircraft could ever hope to.

Inevitably, tech commentators have been drawing comparisons between the current tech revolution and the invention of the printing press, which heralded the Renaissance around the 15th century. It brought learning to the masses and enabled the dissemination of ideas. This was the era of groundbreaking luminaries such as Galileo Galilei or Leonardo da Vinci: the earth was no longer the centre of the universe, and the world saw the first design of a workable flying machine. The progress which we are now experiencing in the 21st century is of a similar magnitude, and so the argument goes that we are witnessing a modern Renaissance. Call it Renaissance 2.0 or whatever.

But this is missing the point entirely.

Just consider this simple example: NASA’s visualisation of ocean currents. They’ve taken scientific measurements from a number of years, bundled them all into a visualisation model, and turned it into a YouTube video. No big deal, surely. But look closely:

NASA Perpetual Ocean. (Credits: NASA, via Mashable)

Yes, it looks beautiful. Science and beauty, together in one picture. (You can read more about how it was done in this article.)

This kind of thing hasn’t really happened since Leonardo was both a painter and a scientist. For the first time since the original Renaissance, science and art are finally converging again. Over the course of decades and centuries, people have become increasingly specialised in their jobs, but finally we are being freed again from the tyranny of pigeon-holing. Many of us were still brought up with the notion that your education would determine the entire course of your life. You could either become an artist or a scientist, but not both. You would go to school, get a degree (or not), and stick with that profession for life.

No longer. The latest technological advances now enable anyone to do almost anything. Scientists can tell their stories with artful visualisations, and artists can use scientific tools to express themselves – and not just their MacBooks. What we are witnessing now is not a new and faster printing press, but a reconvergence of science and art as envisaged by the cult book Zen and the Art of Motorcycle Maintenance. Art and technology are no longer opposing worlds but complementary again – as during the Renaissance. Just as it is possible to find emotional fulfilment in the technical pursuit of motorcycle maintenance, the new technology renaissance is liberating us to express ourselves in a more complete way. With the emergence of inexpensive 3D printers, for example, you will be able to create a custom part for your bike or a sculpture for your garden – or something in-between that nobody has ever thought of. And that is just the beginning.

‘Renaissance’ comes from the French language, meaning rebirth and rejuvenation. The current era is a renaissance not because of the increased speed of communication, but because it is enabling us all to become aspiring Leonardos, seamlessly embracing both art and technology.

A true sense of place: My 10 favourite books

Following on from my last (slightly off-topic) post about bookshops, I felt inspired to explore my oak bookshelf to find the books which, to me, have managed to convey a true and magical sense of place. Resisting the temptation of selecting publications from the technical worlds of mapping or geography, I went in search of books I have read and loved because of how they made me feel. A place is not just defined by its position in 2D or 3D space, but also by the time, feelings and senses. This also makes it impossible for anyone ever to (re)visit the same place – except maybe in a book.

I hope you enjoy the list, and please do let me know if you can recommend any more!

1. 30 Days in Sydney – A wildly distorted account, by Peter Carey

As a student I spent two years in this magical city. Using a masterful blend of autobiography, fact and fiction, Peter Carey truly manages to capture the Australian spirit of Sydney, from its maritime heritage to its lush outback. Simply amazing, especially if you have visited or lived there.

2. A Man on the Moon: The Voyages of the Apollo Astronauts, by Andrew Chaikin

Published in 1994, the 25th anniversary of the first moon landing, this is the most complete and authoritative account of the Apollo Astronauts’ experiences. As I read it I had to keep pinching myself along the way – all of this really happened. I don’t think you’ll find a better read to get a sense of what it’s like to go to the moon, and then walk on it.

3. Quiet for a Tuesday: Solo in the Algerian Sahara, by Tom Sheppard

Similarly to the moon landings this is a beautiful account, accompanied by stunning images, that conveys a great sense of the desert’s space, light, and peacefulness. This time I didn’t have to pinch myself because I have been lucky enough to experience the Sahara myself during my days in oil exploration. This book transported me right back there.

4. Uncommon Places: The Complete Works, by Stephen Shore (Photographer)

This photo essay is just unbelievable. Reprinted in glorious high-definition, it captures Stephen Shore’s large format images of American scapes in the 1970s and 1980s. The quality of the pictures is so sharp you’d think they were taken yesterday. Not only do they preserve a particular era; the images also transform boring places such as car parks into fascinating spaces you never knew you wanted to explore.

5. The Life and Times of the Thunderbolt Kid, by Bill Bryson

I was born neither in America nor in the 1950s, but this book captures Bill Bryson’s childhood so well that it brought back long lost memories of my own childhood which took place 20 years later, thousands of miles away. Similarly to Uncommon Places, this book manages to capture a place and time in an absorbing yet utterly different way. And because it’s written by Bill Bryson, it also makes you laugh!

6. On Chesil Beach, by Ian McEwan

Having read this novel a couple of years ago I can’t remember much about it except for one inconspicuous scene where one of the main characters goes on a bike ride along a country lane. McEwan’s masterful description of this ordinary setting took me down memory lanes that may or may not have existed in my life – immersed in green hills, fragrant fields, and hopeful youth. This is the stuff that powerful writing is made of, and it is impossible to do it justice here (certainly not with my writing!).

7. Atlas of Remote Islands: Fifty Islands I Have Not Visited and Never Will, by Judith Schalansky

A beautiful and original work of art. This is indeed an atlas (it includes maps) but not as you know it. Schalansky, inspired by the Cold War’s travel restrictions in her native East Germany, compiled this world atlas of far-away islands so that the imagination is free to roam. Each of these islands is steeped in history but, even today, logistically very difficult to reach. The atlas deliberately blurs the lines between fact and fiction, taking you on a journey where it is unclear where history ends and the author’s dreams and imagination begin. A pure delight.

8. The Wave: In Pursuit of the Ocean’s Greatest Furies, by Susan Casey

This book masterfully intertwines a good story with oceanography, shipping, sailing and big-wave surfing. This is a book that conveys a true sense of the ocean like nothing else I’ve read (e.g. Moby-Duck – no typo here – doesn’t come close). I never thought a book about waves could be such a page-turner. And even though it is entirely non-fiction, the climax at the end is incredible. Spoiler alert: this is what a big wave feels like.

9. The Wild Places, by Robert Macfarlane

Macfarlane is probably Britain’s best wilderness writer, and this book doesn’t disappoint. I’ve already read it twice and will probably read it twice more. Even if you’re not interested in Britain’s wild places in particular, this masterpiece will transport you to peaceful and beautiful spaces whenever you need to. I particularly loved Macfarlane’s account of his overnight mountain bivvy on Red Pike in the Lake District where, under the stars, he suddenly found a snow-capped winter wonderland all to himself.

10. Eternity: Our Next Billion Years, by Michael Hanlon

This book takes place and time to the next level. Until I read it I had never fully appreciated the true timescale of human endeavour compared to the evolution of the universe. Being of a geo-background I take a natural interest in all of the global issues of the day, from climate change to technology to geopolitics, and the challenges at hand can sometimes seem overwhelming. This book really puts the ‘here and now’ into perspective. This sense was also reconfirmed when I later read Tim Flannery’s epic book, Here on Earth, which highlights the widely underestimated ability of nature and humans to adapt to change.

Oh, and one more thing…

11. Monocle magazine, edited by Tyler Brûlé

What the…? A magazine? And of all magazines, Monocle??

Indeed. Just let me explain. True, Monocle is somewhat pretentious – what with the fashion, the adverts for expensive briefcases and Rolexes, or some of the pompous commentary on culture, design and global affairs. Unfortunately it seems to be targeted at globetrotting yuppie hipsters and (wannabe) wealthy elites. But once you get over that, you will find an eclectic mix that truly celebrates places and their people. It champions small-scale entrepreneurism and intelligent city design. It celebrates creativity, passion, design and artisanship, from corner shops to handmade bicycles to Ordnance Survey’s cartography. In a nutshell, this is a monthly publication that might inspire you to believe that the world is fundamentally a good place, and it provides ideas for making it an even better one.

Happy Jubilee Holidays.

What my ideal bookshop would look like

This week, much-loved British bookseller Waterstones announced a surprising tie-up with its arch-enemy, Amazon, to sell Kindles and ebooks within its physical high street stores. Whether this is a stroke of genius or a unilateral suicide pact remains to be seen. Either way, Waterstones is probably right to embrace the digital age – whether it survives all depends on how it does it.

I love a good bookshop and so I’m keen for physical stores to survive. But I’m only a humble book reader, not an expert media commentator. So here, from a user’s point of view and slightly biased by my professional (data) background, is what my ideal bookshop would look like:

1. Don’t turn it into a Starbucks or Costa. Their business model is based on charging £3 for a 50p cup of coffee. What you actually pay for is the rental of an armchair. So just cut out the middleman and provide comfy reading chairs yourself. And anybody who buys a book gets a free coffee thrown in.

2. A bookshop should be an oasis of calm in the urban jungle. The mind needs space and time to browse. If you provide quiet & attractive areas where people can enjoy their purchases (make them buy before they read, so they can enjoy their free coffee!) they will associate the shop with pleasure and come back again and again.

3. I don’t like current e-readers for two reasons: they don’t provide the satisfying tactile/sensual experience a physical book does, and their plastic covers look cheap and disposable. Why on earth would anyone want to make their book look disposable? Literature is not junk food. So cut the rubbish out and provide decent screens – iMacs or whatever – for people to browse. And if that is not compatible with a Kindle, it merely highlights the third issue I have with ebooks: compatibility. I don’t have to buy new furniture every time I get a book in print. So provide unified terminals for people to browse & buy any format they like – including print. And maybe a small onsite printing machine could churn out personalised, special editions – that would be cool. And to please the more digitally inclined, make sure to have a decent supply of power sockets and Wifi, so they can recharge not just themselves but also their devices.

4. Speaking of embracing digital, bookshops should be fountains of knowledge and entertainment, not just a shop. They could do well borrowing a few concepts from university libraries. Terminals should encourage you to explore, learn, have fun, and – oh – download (i.e. buy) content. And maybe the pièce de résistance could be a Wolfram Alpha type machine that lets you type in any question and gives you the answer, fed by open data, Wikipedia, open source as well as proprietary intelligence from around the web. This way you will also attract classes of school children on educational outings, and raise the next generation of book, sorry knowledge, buyers.

5. Another news item this week was that sales of fountain pens are on the rise again. Incidentally I bought myself one only 3 weeks ago, confirming the trend. The reason is that people increasingly value the personal touch in the digital age. So a good bookshop should also sell good stationery, pens and pencils, and provide spaces that inspire people to use them straight away. Doodle, draw, write a postcard (what a novelty!) – some things are simply more satisfying on paper.

The digital age has given us access to all the world’s information, on small devices originally designed to be held against one ear. That does not, however, mean we also have to consume and create all information on these devices. (Although I confess this blog post was written on a train using my iPhone.)

Bookshops have the unique opportunity to become focal points that bring together the analog and digital worlds in meaningful and satisfying ways. I hope they don’t waste it.

Forget the tech evangelists, these are the real people you should be learning from

Things go in and out of fashion all the time, and so it is with technology. If like me you are a data professional you will be familiar with today’s hot potatoes: open source, open data, big data, cloud, and so on.

Today, as I visited one of my employer’s more recent acquisitions (a small business specialising in high-end data analytics) I was reminded of the fact that each technology has its place. Being small, this business can easily mix & match the best tool and approach for every job at hand.

You might automatically assume that they went for the latest tech in everything. In fact, nothing could be further from the truth: their client industry still uses ASCII files as their ‘standard’ for data exchange, and so they have to cater for this outdated practice. And yet, they run a highly sophisticated operation. Technically it is built on a hybrid stack of proprietary, open source, and homegrown technology: SQL Server for the core database, PostGIS for the spatial bits, homegrown code for the clever analytics.

But the real sophistication lies not in the technology but in how they run their business (and how they spend their time). These guys are domain experts, highly focused on their customers.  They are passionate about what they do, but they also keep a low profile. There is no time to go round tech events evangelising their favourite piece of technology. They routinely beat the competition through hard work, a shrewd eye for opportunities, and quiet persuasion rather than public chest-beating. Besides, if they went to speak at conferences they would only be handing their hard-earned advantage to competitors. The only events they might attend are purely focused on their clients.

Now contrast this with some of the tech communities. By the very nature of technology these communities are focused more around the HOW rather than the WHAT or WHY. This is fine and provides useful inspiration for like-minded individuals, as well as social fun. However, people are tribal by nature, and so these communities invariably end up with leaders and followers. Unless members of a community make a conscious effort to keep an open mind, they can easily fall prey to ‘group think’ where nobody asks tough questions anymore, and any deviation from the gospel is seen as heresy.  I have witnessed this particularly in the open data or open source communities, and the same is true of some proprietary vendors. The end result can be reminiscent of a cult. And cults breed spiritual leaders: evangelists.

I’m highly suspicious of evangelists. As most people know, there is never a single solution to a particular problem. Sure, you need a tech strategy but you also need pragmatism. A crusade for its own sake achieves very little.

So stop listening to the evangelists, keep an open mind, keep asking the tough questions, and seek out real people who run real businesses. They may be harder to find but when you do, it will be worthwhile.

Dispatches from the geospatial event circuit: Selling ice to eskimos

On my way to Germany I stopped by at the Geospatial World Forum 2012 which is being held in Amsterdam this week. It was certainly a very enjoyable day: many of the usual suspects were there, and it was great catching up with old friends and colleagues.

The guest keynote was an inspiring highlight. Former Dutch astronaut Wubbo Ockels spoke of his shock when, blasting into orbit, he realised that space – the frontier he’d been dreaming about all his life – was just a dark and scary void. It made him appreciate that it is not space but Earth which is special: a beautiful spaceship delicately wrapped in a wafer-thin blanket of air. Ever since his epiphany Ockels has dedicated his life to developing renewable energy and transport technologies. He also spoke of the need to energise young people. Except – there weren’t any young people in the audience. At 41, I was one of the youngest there.

Next up was a panel with the usual geo-industry luminaries. The first talk quickly descended into a vendor sales pitch and so I made my exit, heading for the trade exhibition.  But the floor circuit took no more than 5 minutes to complete: GIS, CAD, GPS, a few theodolites. Why would I want to buy any of this? But hey, I was able to grab a few pens to replenish my shrinking stock at home.

In the afternoon I dropped in and out of various themed sessions. Good idea to break the conference down into streams: it’s much better to talk about specific topics rather than just high-level benefits. So I dropped in on Energy, Mobile, and Open Source. And contrary to what you might expect from a traditional geo-conference dominated by old men, the open source session was totally packed. People were standing at the back of the room. Respect.

So what’s the problem?

The problem is that I didn’t learn anything. The problem is that I don’t learn anything anymore when I attend a “traditional” geospatial industry event. There’s simply no information that you haven’t already learned elsewhere (usually online). Presentations are quite short, so speakers can only scratch the surface of what they want to share. Booths are manned by sales people who can’t answer detailed questions or run technical workshops. My office department, staffed by about 30 data professionals, has now reached a 50/50 gender balance but these industry events are run by old men, for old men, trying to catch up on what’s happening on the geo-technology front. What’s the bloody point?

Over the course of my career I’ve been to most types of geospatial events around the world, and I’m tired of them. They drain me rather than energise me. They still talk about the same issues as 15 years ago. They try to sell me stuff I don’t need, and most importantly, they don’t put me in touch with my customers because they’re not there. It’s more like going to a class reunion.

Don’t get me wrong, I enjoy class reunions. There are people in the geo-community who are very dear to me, and I love sharing a joke or a drink with them. But don’t make me pay over 400 euros for a bad cup of coffee in a windowless conference lobby. I can organise a drink with a contact in a much more pleasant location, for much less. And it will result in a more illuminating conversation than anything that can be discussed on a live stage.

So – having slaughtered a sacred cow – who’s up for a drink? Even if we pop open a €100 bottle of Château Lafite it will result in an 80% saving.

Google: The real questions we should be asking

Google, who famously coined the motto “don’t be evil,” has had a tough time recently. First, it was lambasted for introducing charges to its map service for sites attracting more than 25,000 hits per day. Next, one of Google’s affiliates was caught playing dirty in Kenya, scraping third party IP without permission and trying to poach customers by deception. Around the same time, people pointed the finger at Google for vandalising OpenStreetMap data  in what seemed a related incident. A week later, Google announced a deal with the World Bank which appears to be handing Google a pseudo-monopoly on crowd-sourced mapping in developing countries. Then came the new privacy policy where Google is merging the terms and conditions of multiple products into one, which some said amounts to the end of privacy on the web. And finally, last week a French court fined Google €500,000 over anti-competitive tactics, that is, a “general elimination strategy” of smaller competitors.

All this matters, of course, because Google is a dominant player in a large market and most of us make use of its services. Nobody – including Google – will agree with the use of illegal practices but beyond that it becomes much less black and white. So how did you react to the recent events?

  1. If you’re a passionate advocate for open data, privacy or fair competition, you will probably have come to the conclusion that Google is evil. That was indeed the predominant stance taken by the tech media and blogosphere.
  2. If you’re a Google fan (or employee) you will see a bigger picture of Google trying to provide a better service around its self-proclaimed mission of “organising the world’s information.” So you may have reacted with a mix of feelings including embarrassment, bemusement, or righteousness.
  3. If you’re a bystander you may just be thinking – yeah, Google has simply become a big company and needs to face up to the kind of challenges experienced by any major multinational. That seems to have been the stance of the mainstream media, who largely ignored most of it.

All of these attitudes are valid to a point, but none is entirely satisfactory. What I’ve been missing in recent weeks is reporting that looks at the bigger picture, and much of that should be looking at what we, as a society, actually expect from the internet now.

The Google privacy policy and map charges got some people hot under the collar but what’s the real story here – has Google become such a public commodity that people think it now belongs to them rather than the shareholders? What then, you want to nationalise it? Following through on that argument, be careful what you wish for.

A lot of the issues in the online market are directly related to common attitudes and practices of national government bodies and regulators. For example, why did the World Bank’s lawyers think it was okay to sign this deal with Google? Were they so naïve that competition concerns didn’t even occur to them, or was it a conscious decision based on hard-nosed pragmatism? Are we comfortable with national governments interfering in commercial markets, handing single players (including sometimes their own agencies) a unique competitive advantage? Also see e.g. these pieces on ESRI or Ordnance Survey.

Should regulators not be more concerned about the ever deeper entrenchment of established players? This is not just about Google, Facebook, Amazon, Microsoft or Apple, but also about smaller niche companies who are busy creating pseudo-monopolies in their own sectors. There are solid garden walls emerging everywhere on the web, partly because the established players capitalised on their first-mover advantage, and partly because their product is such that it feeds off its own users in a perpetual virtuous circle to the exclusion of others (think e.g. of social networks like LinkedIn, or music and data sites that use proprietary formats and are not interoperable).

The result is that users are suffering from vendor lock-in and that new players are prevented from entering the market. Where’s the interoperability? Where’s the consumer choice and freedom to move your data between suppliers? Try migrating your friends between Twitter, Facebook or Google+, or your cloud data from Amazon to Azure, and you see what I mean.

It is this that we should be worried about, not (just) the behaviour of individual companies. What we need is a considered approach and not cheap headlines. Whatever Google does or doesn’t do is mostly a matter for them, their shareholders and their customers, and good luck to them all. What we need is a level playing field, open to all, and that is not Google’s job alone. This is where political leaders and regulators need to step in.

And they will only do so if we ask the right questions.

Simplicity

Politicians around the world argue about making the tax and welfare system fairer, but the system has grown so complex that even experts no longer understand it, so it cannot be reformed.

A state-of-the-art Air France airliner crashed into the Atlantic Ocean, killing all on board, because the avionics have grown so complex that only a computer can control them, and the junior pilot in charge had no idea how to fly the plane manually.

The iPhone app from the UK Meteorological Office used to be nice and simple, but the latest version has so many data points and buttons and map overlays that it is impossible to tell what the weather is going to be.

Are you going to create something simple today?