Playing the Pipes

Yahoo Pipes is pretty incredible .. and it’s only just started. About a dozen laptop screens at Lift07 were distracted with Pipe dreams .. and it’s no coincidence that the favicons for Lift and Pipes resemble each other.

I’m really excited about this class of web native scripting services (I’ll try to define that class a few paragraphs in). For now, I’m going to talk about my first go at building a Pipe: Tag 2 GeoRSS, which takes a tag and builds a GeoRSS feed of people posting about that tag.

Tag 2 GeoRSS

The basic idea is to request a list of tag citations from Technorati, then loop through them with the For Each: Replace module, requesting bloginfo for each one in another pipe. You may not know that Technorati stores GeoURL tags, and that they’re available through the bloginfo API.

Pipes initially only groks RSS. The Technorati API can return RSS, but that RSS contains less information than their XML format, and the missing pieces are crucial to constructing this Pipe. First, the RSS feed from the tag API contains a direct link to the post, but not to the weblog, and bloginfo only works on the root blog URL. Second, the bloginfo RSS does not contain the GeoURL location.
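The data flow of the Pipe can be sketched outside Pipes. This is a minimal Python sketch, not how Pipes runs internally: `fetch_tag_citations` and `fetch_bloginfo` are hypothetical stand-ins for the Technorati tag and bloginfo requests (passed in as functions so the sketch runs offline), and the `weblog_url`, `lat` and `lon` field names are assumptions.

```python
from typing import Callable, Dict, List, Optional


def tag_to_geo_items(
    tag: str,
    fetch_tag_citations: Callable[[str], List[Dict]],
    fetch_bloginfo: Callable[[str], Optional[Dict]],
) -> List[Dict]:
    """List the posts citing a tag, then look up each post's weblog (For Each: Replace)."""
    items = []
    for citation in fetch_tag_citations(tag):
        # bloginfo only works on the weblog's root URL, not the post permalink,
        # so each citation has to carry the weblog URL as well.
        info = fetch_bloginfo(citation["weblog_url"]) or {}
        item = dict(citation)  # copy, so the caller's data is left untouched
        if "lat" in info and "lon" in info:
            item["lat"] = info["lat"]
            item["lon"] = info["lon"]
        items.append(item)
    return items


# Offline stand-ins for the two (hypothetical) API calls.
citations = {"geo": [
    {"title": "Post A", "permalink": "http://a.example/2007/02/post", "weblog_url": "http://a.example/"},
    {"title": "Post B", "permalink": "http://b.example/p/1", "weblog_url": "http://b.example/"},
]}
blogs = {"http://a.example/": {"lat": 46.2, "lon": 6.15}}

result = tag_to_geo_items("geo", lambda t: citations.get(t, []), blogs.get)
```

Items whose weblog has no location are kept here; dropping them is a separate filtering step.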

So I decided to use XSL to transform the Technorati API XML into the RSS I want. The W3C hosts an XSLT Service, so I hacked up tapitag2rss.xml and tapiblog2rss.xml. If I had those hosted on S3, the whole Pipe could be entirely web native. The Technorati request UrlBuilder is piped into another UrlBuilder, which builds the W3C XSLT request.
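The chained UrlBuilders amount to nesting one URL inside another. The inner Technorati URL has its own `?`, `&` and `=`, so it must be percent-escaped to survive as a single `xmlfile` value. A minimal sketch: the endpoint URLs and the `xslfile`/`xmlfile` parameter names are assumptions based on those services as I recall them (Technorati's API has since been retired), and the example.org stylesheet location is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# ASSUMED endpoints, for illustration only.
TECHNORATI_TAG_API = "http://api.technorati.com/tag"
W3C_XSLT_SERVICE = "http://www.w3.org/2005/08/online_xslt/xslt"


def technorati_tag_url(api_key: str, tag: str) -> str:
    """First UrlBuilder: the raw Technorati XML request."""
    return TECHNORATI_TAG_API + "?" + urlencode({"key": api_key, "tag": tag})


def xslt_url(xsl_url: str, xml_url: str) -> str:
    """Second UrlBuilder: wrap the first URL as the xmlfile argument.

    urlencode percent-escapes the inner URL's '?', '&' and '=', so its query
    string is not mistaken for parameters of the outer XSLT request.
    """
    return W3C_XSLT_SERVICE + "?" + urlencode({"xslfile": xsl_url, "xmlfile": xml_url})


inner = technorati_tag_url("YOUR_API_KEY", "geo")
outer = xslt_url("http://example.org/tapitag2rss.xml", inner)  # hypothetical XSL host

# Decoding the outer query string recovers the inner URL intact.
round_trip = parse_qs(urlsplit(outer).query)["xmlfile"][0]
```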

Iterating through development was frustrating because of some bugs. The xmlfile argument for the XSLT UrlBuilder kept reverting to Hostname, rather than [url], any time the Pipe was reloaded. After editing the “sub-pipe”, the main pipe had trouble refreshing it. Many requests seem to be cached behind the scenes, so changes to the XSL weren’t picked up; I’d have to do something like rename the XSL in order to get a fresh request. Also, the Filter module couldn’t be set to screen out items that don’t have Geo tags.
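The filter the Filter module couldn’t express is short in code. A minimal sketch, assuming locations arrive either as a GeoRSS Simple `georss:point` or as W3C Geo `geo:lat`/`geo:long` on each RSS `<item>`; the function names are hypothetical, while the two namespace URIs are the published ones.

```python
import xml.etree.ElementTree as ET

GEORSS = "http://www.georss.org/georss"
W3C_GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"


def has_geo(item: ET.Element) -> bool:
    """True if an RSS <item> carries a GeoRSS Simple point or W3C Geo lat/long."""
    if item.find(f"{{{GEORSS}}}point") is not None:
        return True
    return (item.find(f"{{{W3C_GEO}}}lat") is not None
            and item.find(f"{{{W3C_GEO}}}long") is not None)


def drop_items_without_geo(rss_xml: str) -> str:
    """Remove every <item> lacking a location, leaving the rest of the feed intact."""
    root = ET.fromstring(rss_xml)
    for channel in root.iter("channel"):
        for item in list(channel.findall("item")):  # iterate a copy while removing
            if not has_geo(item):
                channel.remove(item)
    return ET.tostring(root, encoding="unicode")


feed = """<rss version="2.0" xmlns:georss="http://www.georss.org/georss"
  xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"><channel><title>t</title>
<item><title>A</title><georss:point>46.2 6.15</georss:point></item>
<item><title>B</title></item>
<item><title>C</title><geo:lat>51.5</geo:lat><geo:long>-0.12</geo:long></item>
</channel></rss>"""

kept = [i.findtext("title") for i in ET.fromstring(drop_items_without_geo(feed)).iter("item")]
```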


The weirdest thing was trying to push GeoRSS through. I was certain GeoRSS was supported, from Brady’s Deconstruction, yet when I had the XSL output GeoRSS Simple or W3C Geo, it wasn’t present in the Pipes output. It seems that within a Pipe, geo data is acted upon only in the y:location element, which produces GeoRSS on output .. and having my XSL output y:location did work. I think this is wrong on two counts. Pipes should grok GeoRSS on input as well. And any namespace present in a Pipe’s sources should be passed through, even if Pipes doesn’t know it.
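The pass-through behaviour argued for above can be sketched in a few lines. This is not how Pipes works internally; it is a minimal illustration, assuming a tool that rebuilds each item from the core RSS elements it understands and copies any namespaced extension element (GeoRSS, W3C Geo, or anything else) across verbatim instead of dropping it. `namespace_of` and `copy_item` are hypothetical names.

```python
import copy
import xml.etree.ElementTree as ET

GEORSS = "http://www.georss.org/georss"


def namespace_of(tag: str) -> str:
    """ElementTree writes namespaced tags as '{uri}local'; return the uri, or ''."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def copy_item(src: ET.Element, understood: set) -> ET.Element:
    """Rebuild an RSS <item>, passing unknown namespaced elements through untouched."""
    dst = ET.Element("item")
    for child in src:
        if child.tag in understood:
            # A core element the tool knows: rebuild it (text only, in this sketch).
            ET.SubElement(dst, child.tag).text = child.text
        elif namespace_of(child.tag):
            # An extension the tool does not know: copy it verbatim.
            dst.append(copy.deepcopy(child))
    return dst


src = ET.fromstring(
    '<item xmlns:georss="http://www.georss.org/georss">'
    "<title>Lift07</title><georss:point>46.2 6.15</georss:point></item>"
)
out = copy_item(src, understood={"title", "link", "description"})
point = out.findtext(f"{{{GEORSS}}}point")
```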

Still, it’s an amazing start, and the Pipes team has done a good turn by releasing early and working with the web to find out what’s needed. GeoRSS is a key part of the “mashup” environment, and it’s already available in some form and featured in some of the top pipes.

Web Native Scripting

There’s been a need for a web native scripting language, an abstraction to cover the bits and pieces necessary for mashups. Major parts of the mashup toolbox have become codified, but it’s still awkward to code these things in server-based scripting languages, and if you’re not a programmer they’re still out of reach.

I’ve grown to admire Excel for how much power it puts in non-programmers’ hands, and new services like Dabble DB and EditGrid have adopted that model for its explanatory leverage, while also embracing the web native scripting approach. Transformation services like Dappit and FeedBurner have their own partial approaches. Swivel and Many Eyes focus on visualization. Ning has of course been cultivating a new style of development, and has abstracted out many of the common mashups and APIs, but it hasn’t hit a sweet spot for either non-programmers (PHP is too hard) or programmers (Rails). Even Greasemonkey was, in some sense, an iteration in this space. And I could count Mapufacture among these tools too (by the way, here’s Pipes Nearby Something in Mapufacture).

All these tools are pushing forward the overall idea of making the web a programming environment, and making that environment as widely friendly as possible. It’s hard to know where the right balance lies .. surely it shouldn’t grow to resemble any other scripting language. But should it try to embrace some of the heavyweight ideas of Web Orchestration, in some lightweight fashion? For instance, my Technorati API key was maxed out during development of this Pipe, so I signed up for a new one .. but figuring out that this was the problem required leaving Pipes and making requests directly. There could be more ways to access the underlying data flow, and to set up triggers for unexpected behavior. Could Pipes support OpenID, XSLT, Microformat operations and RESTful services?

The scaling problems seem real enough, and who knows, maybe this eventually pushes some processing back towards the browser (just in time for Firefox 4). Can the concept of Pipes be made portable? Is there an abstract way to encode it? There are open source projects like Plagger in which to pursue these ideas too.

GeoRSS at Yahoo

I’ve meant to dig into Yahoo’s new support of GeoRSS since the announcement in September.

Two cool improvements. They have GeoRSS export and polylines in GeoRSS!

Keeping annotations purely in code deprives other developers, and the entire Geospatial Web, of that data. For instance, Housing Maps put work into geolocating Craigslist housing ads, freeing and transforming that data, but if someone else wants to build on it (say, combining Housing Maps with Chicago Crime), they can’t leverage any of that work and have to start from scratch, building screenscrapers, etc. GeoRSS is designed to fully liberate data.

Yahoo has supported GeoRSS since the beginning of their API, and giving developers the ability to export their maps to GeoRSS is a great step toward encouraging more sharing. However, it could go further. The developer must actively export, and the exported data is not available through a subscription-compatible interface. True, it would be much more complex on Yahoo’s part to provide GeoRSS feeds for all uses of their API, and the developer should have some choice in the matter. But by default we should be sharing, just as weblogs produce RSS feeds by default.

The Yahoo polylines are specified using <geo:line>, which was my unofficial extension to the W3C Geo namespace. GeoRSS now has official support for more geometries .. lines, polygons, and boxes. Perhaps this was partly my fault, since I hadn’t updated the worldKit documentation on GeoRSS and polygons to reflect the GeoRSS work .. until today; GeoRSS Simple and GML are now the recommended encodings (though every other format out there is still supported).
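For comparison with the unofficial <geo:line>, here is a minimal sketch of emitting the official GeoRSS Simple geometries. It assumes the `georss` prefix is bound to `http://www.georss.org/georss` in the enclosing feed. Coordinates are whitespace-separated `lat lon` pairs, a polygon’s first and last points must be the same, and a box gives the lower (south-west) corner before the upper (north-east) one. The helper names are hypothetical.

```python
from typing import List, Tuple

LatLon = Tuple[float, float]


def _pairs(points: List[LatLon]) -> str:
    # GeoRSS Simple: whitespace-separated "lat lon" pairs, latitude first.
    return " ".join(f"{lat} {lon}" for lat, lon in points)


def georss_line(points: List[LatLon]) -> str:
    if len(points) < 2:
        raise ValueError("a line needs at least two points")
    return f"<georss:line>{_pairs(points)}</georss:line>"


def georss_polygon(points: List[LatLon]) -> str:
    # The ring must be closed, so repeat the first point at the end if needed.
    ring = list(points) if points[0] == points[-1] else list(points) + [points[0]]
    if len(ring) < 4:
        raise ValueError("a polygon needs at least three distinct points")
    return f"<georss:polygon>{_pairs(ring)}</georss:polygon>"


def georss_box(south: float, west: float, north: float, east: float) -> str:
    # Lower (south-west) corner first, then upper (north-east) corner.
    return f"<georss:box>{south} {west} {north} {east}</georss:box>"


route = [(45.256, -110.45), (46.46, -109.48), (43.84, -109.86)]
line = georss_line(route)
polygon = georss_polygon(route)
box = georss_box(42.943, -71.032, 43.039, -69.856)
```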

It would be great to see Yahoo Maps update to GeoRSS Simple as well. Other parts of Yahoo, like flickr, are already publishing GeoRSS Simple. And I think there’s enough in GeoRSS to do away with the need for the extra ymap namespace.

So, some critical feedback, but still good stuff. Thanks as ever to Yahoo for supporting GeoRSS.

So long 20th Century Yahoo Bookmarks

Techcrunch reports Yahoo! Bookmarks Enters 21st Century.

Back in 1999, I developed Yahoo Bookmarks for the 20th Century. Building the site was a fun little project, it was basically a single My Yahoo module blown to full product size. But the real action was in the Yahoo Companion (now Toolbar) which gave seamless access to bookmarks across multiple machines, by getting underneath the skin of the browser. This was pretty mind blowing stuff back then!

So now Bookmarks will run on the MyWeb platform (without the sharing), along with delicious, and that’s so long to storing bookmarks as serialized key-value structures in the user database. Bookmarks was based pretty literally on Netscape Bookmarks. When I migrated just now, first-level folders were converted to tags, and nested folders were discarded. Adios, 20th Century.
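The migration rule is simple enough to sketch. A minimal Python version, assuming a Netscape-style folder tree, and assuming (the post doesn’t say) that bookmarks inside nested folders survive, tagged with their first-level folder’s name, while only the nested folder names are discarded. `Folder` and `folders_to_tags` are hypothetical names, and the URLs are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Folder:
    """A Netscape-style bookmark folder (hypothetical shape for this sketch)."""
    name: str
    bookmarks: List[str] = field(default_factory=list)  # URLs directly in this folder
    folders: List["Folder"] = field(default_factory=list)  # nested subfolders


def _all_urls(folder: Folder) -> List[str]:
    """Every URL in a folder and, recursively, in its subfolders, in order."""
    urls = list(folder.bookmarks)
    for sub in folder.folders:
        urls.extend(_all_urls(sub))
    return urls


def folders_to_tags(root: Folder) -> List[Tuple[str, List[str]]]:
    """Flatten a folder tree into (url, tags) pairs.

    First-level folder names become tags; names of deeper folders are dropped.
    ASSUMPTION: bookmarks in nested folders are kept under the top-level tag.
    """
    pairs = [(url, []) for url in root.bookmarks]  # bookmarks at the root get no tag
    for top in root.folders:
        pairs.extend((url, [top.name]) for url in _all_urls(top))
    return pairs


tree = Folder("root", ["http://example.com/home"], [
    Folder("Maps", ["http://example.com/maps"], [
        Folder("GeoRSS", ["http://example.com/georss"]),
    ]),
])
flat = folders_to_tags(tree)
```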

I’m really surprised to read that Yahoo Bookmarks has 20 million active users; probably 99.9% of them use it through the toolbar. The Bookmarks site hasn’t changed one iota since I stopped working on it; the Export Bookmarks feature has been “new” for 7 years. Implementing this feature was a small political struggle .. not everyone was convinced we should allow users to leave. Guess this hasn’t been a problem! We also had a 1000-bookmark limit .. somewhat arbitrary, but there were efficiency limits in the user database. One Techcrunch commenter was “heart broken” by this limit .. me too. Oh well, limits are so 20th century.

Interestingly, we did have a lot of discussion back then about public bookmarks. This wouldn’t have been quite like today’s social bookmarking, but rather a kind of searchable uber-directory, perhaps available only to friends in Yahoo Messenger. I don’t know, it wasn’t well thought through, and would’ve been a big mess .. the web hadn’t figured out how to share things on that scale. And there was still a wariness about user-contributed content, one of the reasons why Yahoo didn’t jump on RSS immediately.

Pretty incredible to see how far Yahoo and the web have come since those days.