Internationalizing OpenStreetMap in the Summer of Code

Arindam Ghosh’s project proposal to internationalize OpenStreetMap has been accepted by Google Summer of Code, and I’ll be mentoring. Congrats Arindam, and the other three SoC students.

Internationalizing and localizing is crucial to the growth of the project. There are many facets here, from the website itself, to the tile rendering, and other tools. We’re going to be flexible and see what’s possible this summer. Internationalization is deceptively complicated, especially when you throw Indic renderings into the mix, but I’ve been here before with My Yahoo! And Arindam is very enthusiastic.

How do *we* determine the names for things?

Illuminating and fascinating reading on Google’s naming policy for disputed places. Commendable that their decision making process, and the process for devising that process, have been so transparently communicated. There are things here for any mapping organization to learn from. And interesting to see how far Google’s process has paralleled OpenStreetMap, and where it ultimately diverges .. in the basis for authority.

I’ve had an interest in the nature of borders and conflicts in mapping ever since maps started to take over my life ;). Traditional cartography hadn’t done a very good job of representing the multiple, fluid realities of the world, especially in the age of nation states. The reality of borders isn’t represented very well by a thick black line. For instance, the national boundaries within the EU are tending towards something like US state borders; in fact, crossing into California, through the agricultural checkpoints, is more restrictive than driving from Germany to Austria. Better are the maps indicating the shifting line of control of WW2, so fascinating to me in my childhood atlas. The thick black line gives the illusion of stasis and control, but ultimately it’s all temporary.

Like most ultimately liberating technologies, maps were primarily designed as military technology, to claim territory, demonstrate authority, control reality. The promise of digital democratic mapping up-ends the military origins. Digital maps have the potential to express multiple and opposing points of view.

OpenStreetMap had the start of its first edit war in the fall. It won’t be the last. At issue was the territory of North Cyprus, in conflict since an invasion by Turkish forces over 30 years ago, regarded nearly universally (Turkey excepted) as illegal. An English expatriate living in North Cyprus was labelling places with Turkish names. A Greek Cypriot, whose father fled from North Cyprus, had been switching to the Greek names. And back and forth. Traditionally places had both Greek and Turkish names, and Greek and Turkish people, and people used whatever localisation they chose. With the political/military conflict, and a global platform for communication, the conflict has spread to open databases. The Wikipedia article on Cyprus is often in conflict, and now OSM.

I attempted to intervene and broker a solution between the fighting editors. Perhaps I had been spending too much time at the UN, and fancied myself a digital diplomat. With much patience, we came to agree that both Greek and Turkish names would be represented, in A/B fashion. But in the end, we could not reach consensus on who was A and who was B. So, disappointingly, dialogue in this case failed.

The discussion within OSM for a solution has been wide ranging .. everything from totally laissez faire (i.e. letting the editors fight it out until one gets tired), to policies for disabling certain user accounts (ineffective), whitelists, blacklists, and locking down certain areas from any editing at all. As with most “decisions” in OSM, the solution has been a combination of the simplest thing that works and whatever someone takes the initiative to actually code. We have a rule but no code changes (though the disputes have lent some impetus to implementing changesets and reverts). We have called our rule the On the Ground Rule, which greatly resembles Google’s Primary Local Usage. The difference of course is in where the ultimate authority for applying this rule lies.

Preceding this difference, I think I can detect in Google’s account something of the frustration I’ve experienced myself in attempting to free data from the United Nations. “We considered attempting to extricate Google entirely from the problem of deciding placenames by simply deferring to the determinations of an existing, authoritative, multilateral or multistakeholder institution.” But the UN keeps a strict policy that their maps are not official political representations, and takes no authoritative stance on boundaries or names. Frustrating, since essentially the UN is hamstrung by the traditional, single reality view of cartography.

So, Google has imbued itself with the authority for these decisions. And they have the funds to employ a Director of Global Public Policy to think these things fully through. Of course, of any authority, they are presently the most open and transparent. Google is spreading its remit way beyond organizing the world’s information, to organizing the world. They are investing in green energy technologies, sponsoring humanitarian information software development, advising governments and intelligence agencies on how to operate. And perhaps the primary colored, happy, relatively open and efficient world of Google is a better alternative to the current world order! They’ve done such a good job with the web, give them our world.

Google has become a target for such sarcasm on many fronts .. because it wasn’t supposed to be this way. The interweb would rebalance power and authority, and this potential is what inspires me for democratic digital mapping. In this vein, Google is a systematic aberration, amassing power by doing what it should: being good. But does power inevitably corrupt? That’s the fear, and the remorse that it doesn’t necessarily need to be this way.

Where does OpenStreetMap derive its authority? The discussion of the tagging process touched on this very issue. Jokingly, OpenStreetMap was described as an anarchic collective, but I don’t think that’s far off. There is some ultimate authority (the Foundation runs the servers) but only in the most hands off caretaker position. Beyond some extremes, it’s a continual negotiation and consensus building process, never definitively settling, open to newcomers and new perspectives. The authority of OpenStreetMap is its Openness.

This difference in authority has real implications for real maps. Here is Google and OpenStreetMap compared for Cyprus.


Google has decided to dodge the issue completely by not providing any data for Cyprus. There is definitely data commercially available, since Microsoft maps do show boundaries, roads, and names. OpenStreetMap shows the still somewhat messy circumstances.

OSM has better potential solutions. Our database is already internationalized. All that’s waiting is i18n and localisation of the maps themselves. In an interesting twist, this is one of OSM’s proposed Google Summer of Code projects. Another twist is that the student proposer is from India, and the problem of localising Indic scripts is a complicated one — how a series of characters is rendered differs based on their order, so rulesets/state machines need to be embedded in fonts. Yahoo India has made some impressive progress here, rendering Indian placenames in the local script of each Indian state. The Indian state itself provides the forum for sorting out placenames — state divisions are organized, and reorganized, along real and sometimes semi-imagined linguistic lines. But of any place I’ve ever visited, India demonstrates the greatest variety of people living in relative harmony .. so if there’s any place that will work out the solution for the promise of multiple points of view in digital democracy, I reckon it’s India.
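To make “already internationalized” a little more concrete, here is a minimal sketch of how a localized map could pick a label. It assumes places carry per-language name tags of the form `name:<language code>` alongside a default `name`; the `localized_name` helper and the sample tag values are hypothetical illustrations, not code from OSM or from the Summer of Code project.

```python
def localized_name(tags, preferred_languages):
    """Return the first name available in the reader's languages.

    `tags` is a plain dict of tag keys to values (assumed layout:
    "name" for the default label, "name:<lang>" for translations).
    Falls back to the default "name" tag, or None if that is missing too.
    """
    for lang in preferred_languages:
        value = tags.get(f"name:{lang}")
        if value:
            return value
    return tags.get("name")


# Hypothetical tags for a disputed place: both communities' names are
# stored side by side, so neither has to overwrite the other.
nicosia = {
    "name": "Λευκωσία / Lefkoşa",
    "name:el": "Λευκωσία",
    "name:tr": "Lefkoşa",
    "name:en": "Nicosia",
}

for langs in (["el"], ["tr"], ["hi", "en"], ["ja"]):
    print(langs, "->", localized_name(nicosia, langs))
```

The point of the sketch is that the choice of name moves from the editor to the reader: the database holds every name, and each rendering of the map decides which one to show. It does nothing for the A/B ordering problem of the default label, which still needs a human agreement.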

California Fires were Baja California Fires too

On Saturday, I attended Guinevere Harrison’s thesis talk on Neogeography. It’s interesting to see the work of our “field” contextualized for an art criticism audience. Of course, the links between neogeography and art criticism go deep, with Locative Media fostering us techies in the early days, particularly Ben Russell of the Headmap Manifesto.

One shocking item Guinevere discussed was the wildfires last fall. The fires and ash didn’t stop at the border.

Baja California Fires

I never heard about this. Tragic that our media, especially the bottom up media, totally failed to get this message out, and that a political border led to a conceptual divide. The environment doesn’t pay attention to our nationalistic designs on the earth.

GreenDevCamp, Yuri’s Night, LugRadio

It appears that GreenDevCamp has been postponed to June! Disappointed! Hopefully we can get organized by then — this could be a great event.

GreenDevCamp is taking place in Redwood City at the Green Building Exchange (originally April 19, now postponed). An un-conference with this focus seems long overdue in the Bay Area, home to some of the most forward thinking technologists and environmentalists. I’m very very interested in how information technology can be applied to environmental problems, though not sure exactly what I’ll talk about (note this weblog’s subtitle: “Building Digital Technology for Our Planet”), and a gathering of this sort brings to mind the fantastic PlanetWork conferences a few years ago. If you have any angle on sustainability and technology, please do attend!

A couple other events, the weekend before, April 12/13. Yuri’s Night at NASA Ames was awesome last year, and it’s happening again. This is the most amazing setting you can imagine, and a great lineup. We’re putting together some kind of OpenStreetMap related, GPS mapping and animation like activity, will be fun. And at LugRadio Steve and I will have a booth promoting OpenStreetMap to the collected freetards (I mean that in the sweetest way).

In California and Support Open Geo Data? Oppose AB 1978

Update from Adina Levin: “I just heard from Sacramento — the bill sponsor has withdrawn the AB1978 and does not plan to resubmit it this year. In other words, the bill is dead.” Congrats people

When I relocated from the UK, my expectation was a welcome rest from struggles with absurd licensing schemes of government data. Well, Nope. Read this and act and write your California representative. More information here as well.

US Federal Law requires that all works created by US government agencies are public domain, excepting classified material. This doesn’t require the works be released for free, that’s what the Freedom of Information Act instruments. But in practice, many agencies simply release data, resulting in primary public domain geodata sets like TIGER (the US basis for Navteq, TeleAtlas and OpenStreetMap), NGA Geonames (core component of Geonames), VMAP0, Landsat, and USGS Topos and Orthophotos. It’s a valid point that without a cost recovery mechanism, funds for maintaining and updating these data sets dry up, and the data quality suffers; this has been particularly true of the USGS. But I’d rather start to find solutions from the assumption of free and open to the public, rather than the other way around, and in practice the web and community of open geo data have stepped up to fill this role.

However, this federal law doesn’t hold for state and local geo data. And as is common in the States, there’s a whole spectrum of policies from public domain to commercial level fees. It’s almost the opposite of the UK, where national data is non-free, and local authorities would like to release their data, if only the OS wouldn’t claim derivative rights over that data.

Santa Clara County started charging high fees for distribution of its geodata. They were sued and California courts ruled that costs can only cover the cost of distribution. And really for digital data the cost of distribution is nearly zero. At the time this was going on I surveyed the terra nullius of new suburban subdivisions in Santa Clara and thought it a damn shame these data sources should disappear.

Well now, Assembly member Jose Solorio’s AB1978 is attempting an end run around this ruling by extending the definition of exempt materials (usually classified public safety stuff, think precise locations of critical infrastructure) to “assembled model data, metadata, and listings of metadata”. The language is vague and ambiguous, and the intended effect of this change is to allow local governments to reinstate high commercial fees for access to their geodata. As stated here, “Rather than clarifying the Public Records Act, his bill’s proposed paragraph would make the Act more ambiguous, confusing, mis-informed, and obstructive of the public’s right to obtain its government’s records.”

If you’re in California and care about such things, write your representatives.

In a similar vein, I think about the trend of municipalities releasing their geodata for inclusion in Google Maps and Earth. In exchange for some small compensation and maybe some free copies of Google Earth Pro, cities happily give up the goods. Which is great, in part, as that data has been collected with taxpayer money and should be made available to the widest audience possible, and Google certainly has that. One town has gone all the way and published all its geodata (actually, Jason Birch clarifies in the comments: they publicly published their geodata in KML, though orthophotos are only served by Google due to licensing restrictions). But .. the data is still not truly free and available to any other company or hacker or activist to use. Disappointingly, Google’s only motivation is to get that data into Google, not to the public. The same is true of Google Transit, where special deals are made to release data to Google and no one else, something UrbanMapping is fighting the good fight on.

Working for government action at the national level is like moving a mountain range. There’s much more possibility to have an effect on your local government. Oakland Crimespotting motivated the Oakland PD to release their data in a machine readable format. Merano, Italy released a treasure trove of CC-SA data. Inspire your local government with these examples and others, and push for truly free and open data .. leading to democracy, participation, and an inventive public life.