Datamining Folksonomy

Yahoo Research is going in interesting directions with datamining the relationship between geo and folksonomic tags, and Dan Catt hints at future work visualizing along the time dimension as well. That’s 10 million plus photos with location and time, all associated with free-form words. There’s definitely some meaning lurking in this folksonomy.

I explored this in a crufty way with mapping flickr tags, a WMS which randomly chooses tags for photos geotagged in an area. It’s slippy map browsable .. without the underlying map. It’s often possible to identify where you are based on the tags alone, which frequently name the place, and on the shape the tags trace. For instance, search for “New York, NY”, “United States”, and Manhattan is visible in rough outline. Other times the results are geographically poetic. “Barnet”, “UK” is southeast of “clouds” and northwest of “gas”.
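
A minimal sketch of that tag-picking step, not the actual WMS code. It assumes photos arrive as dicts with hypothetical `lat`, `lon` and `tags` keys, and that the bounding box is given as west, south, east, north.

```python
import random


def tags_for_bbox(photos, bbox, rng=None):
    """Pick one random tag for each photo that falls inside the bounding box.

    photos: iterable of dicts with "lat", "lon", "tags" (hypothetical shape).
    bbox:   (west, south, east, north) in degrees (assumed order).
    Returns a list of (lat, lon, tag) labels ready to be drawn on a tile.
    """
    rng = rng or random.Random()
    west, south, east, north = bbox
    labels = []
    for photo in photos:
        lat, lon = photo["lat"], photo["lon"]
        inside = south <= lat <= north and west <= lon <= east
        # Untagged photos contribute nothing to the map.
        if inside and photo["tags"]:
            labels.append((lat, lon, rng.choice(photo["tags"])))
    return labels
```

Drawing each tag at its photo’s position is what lets the outline of a place emerge without any base map.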

When I first swapped mapping flickr tags over from searching for photos tagged “geotagged” to using the new flickr geo api, the density of words dropped substantially. It seems that many people map their photos but don’t bother to tag them with anything .. perhaps geography is enough of an organizing principle. But it made my hack boring, so I added back the restriction to “geotagged” photos, which tend to have a lot of other tags too. Maybe that’s because those people were already using the tagging interface, where it was easy to add a few more tags; now that the mapping interface is standalone, there isn’t a well-worn path towards adding more.


It seemed that many people added meaningful placenames to their geotagged photos, so a geocoder could be built on top of the association .. something potentially much richer than purpose-built geocoders. I built a really crude prototype that didn’t quite work. Tim Waters made it work, really well, by applying clustering to the results in flickr geocodr.
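
To give a feel for why clustering helps, here is a rough sketch that is not Tim Waters’ method: it buckets the coordinates of photos carrying a tag into a coarse grid and answers with the centroid of the fullest cell, so a few stray photos elsewhere can’t drag the result away. The function name and the `cell_deg` parameter are made up for illustration.

```python
from collections import defaultdict


def geocode_tag(points, cell_deg=0.5):
    """Guess a location for a tag from the (lat, lon) of photos carrying it.

    Crude grid clustering: group points into cell_deg-sized cells and
    return the centroid of the most populated cell, or None if no points.
    """
    if not points:
        return None
    cells = defaultdict(list)
    for lat, lon in points:
        key = (int(lat // cell_deg), int(lon // cell_deg))
        cells[key].append((lat, lon))
    # The densest cell wins; ties go to whichever cell was seen first.
    densest = max(cells.values(), key=len)
    n = len(densest)
    return (sum(p[0] for p in densest) / n, sum(p[1] for p in densest) / n)
```

A fixed grid splits clusters that straddle a cell boundary, so a real version would want a proper clustering algorithm.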


Can the same principle be applied to time? jbum’s datamining on flickr sunsets is still a mindblower and inspirational. Almost all digital cameras stamp photos with the time they were taken (or at least the time the camera is set to). Perhaps the association between timestamps and tags could build a timecoder? Again I built a really crude prototype, flickr timecodr, which doesn’t work that well (4th of July is timecoded to June 30), but gets the idea across .. it simply averages all the timestamps across a tag; temporal clustering could help.
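
A small sketch of why plain averaging drifts and how even the crudest clustering helps. This is not the timecodr code: it assumes timestamps are Python `datetime` objects and reduces each to a day of the year, which is my simplification.

```python
from collections import Counter
from datetime import datetime
from statistics import mean


def day_of_year(ts):
    """Day number within the year (1 = January 1). Leap years shift this by one."""
    return ts.timetuple().tm_yday


def timecode_mean(timestamps):
    """Average day of year: every stray timestamp pulls the answer off target."""
    return round(mean(day_of_year(t) for t in timestamps))


def timecode_mode(timestamps):
    """Most common day of year: the crudest possible temporal cluster."""
    counts = Counter(day_of_year(t) for t in timestamps)
    return counts.most_common(1)[0][0]


# Four photos taken on July 4 plus one from a camera with a wrong date.
photos = [
    datetime(2005, 7, 4, 21, 0),
    datetime(2005, 7, 4, 21, 30),
    datetime(2005, 7, 4, 22, 0),
    datetime(2006, 7, 4, 21, 15),
    datetime(2005, 6, 10, 12, 0),
]
print(timecode_mean(photos), timecode_mode(photos))
```

The mean lands a few days before July 4 because of the single stray photo, while the mode stays on the day itself.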

This first iteration of the timecodr does better on distinct events .. live8 for instance. I really look forward to seeing the time-based animations Dan Catt hints about. With hugely growing geospatial databases everywhere, time will become essential to making sense of that volume.


The other facet I find interesting is geometries. Incredibly accurate line and polygon arrangements of photos were discovered right at the launch of flickr geotagging. Some clustering and line simplification could yield really useful results. Simply plotting photos from west to east can yield crufty but interesting results, as in flickr linecodr. Races and roads work well .. like bay to breakers, boston marathon, route 66.
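
The west-to-east plotting can be sketched in a few lines. This is an illustration rather than the linecodr code, and the `min_gap_deg` thinning is a made-up stand-in for real line simplification.

```python
def linecode(points, min_gap_deg=0.0):
    """Turn (lat, lon) photo positions into polyline vertices, west to east.

    Points are sorted by longitude; a vertex is dropped when it lies less
    than min_gap_deg of longitude east of the last vertex kept.
    """
    ordered = sorted(points, key=lambda p: p[1])
    line = []
    for lat, lon in ordered:
        if not line or lon - line[-1][1] >= min_gap_deg:
            line.append((lat, lon))
    return line
```

Sorting by longitude only suits routes that run roughly east-west and never double back, which is probably why races and long roads come out best.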

machine tags

Whoa! While I’ve been writing, flickr has announced support for machine tags. Blowing my mind, will need to let that sink in. I wonder if such a thing will be coming to crufty hacks on delicious.
