
February 26, 2007

Fun with User-Agents: Firefox and IE7

One of the key parts of FeedBurner's stats processing is trying to determine if a request for a feed represents a casual, drive-by browse or an intentioned subscriber. It has gotten a little bit more complicated lately as some clients serve double-duty. I wanted to share with you how we handle the requests from the two most popular browsers, Firefox and Internet Explorer 7.

Firefox

The Firefox browser can also be used as a feed-reading client via the Live Bookmarks feature. So, the key for FeedBurner is to determine whether a request for a feed is coming from Live Bookmarks (where we can count it as a subscriber) or from a visitor who just happened to click on the feed chicklet (where we just report it as a browser hit). Up until Firefox version 2.0.0.1, we really have to guess, since the requests for the most part look identical: they both have a User-Agent that looks like Mozilla/.*Firefox/.*. So we look at a couple of other headers, X-Moz and Referer, and the logic tree looks like this:

If version < 2.0.0.1 and (X-Moz: prefetch or Referer is not empty), then it's a browser.

Otherwise, it's a Live Bookmarks request.

That's not ideal, because if someone just types in the feed URL in the location bar or launches the feed URL from a different app, we'll count it as a Live Bookmarks hit because the Referer will be empty. But we have nothing else to hang onto.

Firefox 2.0.0.1 has a wonderful new addition that makes this tracking much more accurate. Now, if the request is coming from Live Bookmarks, there will be an X-Moz: livebookmarks header. We can detect that, and we no longer have to play the referrer guessing game.

If version >= 2.0.0.1 and X-Moz: livebookmarks, then it's a Live Bookmarks request.

Otherwise, it's a browser request.
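The decision tree above can be sketched in a few lines of Python. This is purely illustrative: the function name, the `headers` dict, and the pre-parsed `version` tuple are stand-ins (a real implementation would first extract the version number from the User-Agent string):

```python
def classify_firefox(version, headers):
    """Classify a Firefox feed request as 'browser' or 'livebookmarks'.

    `version` is a tuple like (2, 0, 0, 1) parsed from the User-Agent;
    `headers` is a dict of request headers.
    """
    x_moz = headers.get("X-Moz", "").lower()
    referer = headers.get("Referer", "")

    if version >= (2, 0, 0, 1):
        # Firefox 2.0.0.1+ labels Live Bookmarks polls explicitly.
        return "livebookmarks" if x_moz == "livebookmarks" else "browser"

    # Older Firefox: fall back to the X-Moz/Referer heuristic.
    if x_moz == "prefetch" or referer:
        return "browser"
    return "livebookmarks"
```

Note how the pre-2.0.0.1 branch can only ever be a guess: an empty Referer is treated as Live Bookmarks, which is exactly the over-counting described above.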

Internet Explorer 7

The latest version of Internet Explorer adds feed reading capabilities by leveraging the Windows RSS Platform. So, on the surface, things seem really straightforward, since the Windows RSS Platform has its own User-Agent that's distinct from the IE7 User-Agent.

If User-Agent matches Windows[- ]RSS[- ]Platform/\S+ .*, then it's a "Windows RSS Platform" subscription.

At this point, however, things get complicated. Outlook 2007 has a cool feed reading capability. Unfortunately, the Microsoft Office team didn't get the memo: Outlook identifies itself the same as IE7 instead of leveraging the Windows RSS Platform, which would have made much more sense. So how do we distinguish between IE7 browser hits and Outlook 2007 subscriptions? We use the old referrer trick: if there's no referrer, assume the request came from the automated poller fueling Outlook.

If User-Agent matches Mozilla/4\.0 \(compatible; MSIE 7.* and Referer is empty, then it's an Outlook 2007 subscription.

But wait ... there's more! It turns out that some Microsoft Vista Gadgets also identify themselves as IE7, and we think it's more appropriate to treat those requests as subscriptions rather than browser hits. Fortunately, there's a hook: we can look at the Referer, and if it starts with x-gadget:///, then the request is coming from a Gadget.

If User-Agent matches Mozilla/4\.0 \(compatible; MSIE 7.* and Referer starts with x-gadget:///, then it's a Vista Gadget subscription.

Finally, if none of the other rules match, we treat it as an IE7 browser hit.
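Putting the IE7-family rules together in order gives a small classifier like the following sketch (function name and return labels are illustrative; the regexes are the ones from the rules above):

```python
import re

# Patterns from the rules above, anchored at the start of the User-Agent.
WINDOWS_RSS = re.compile(r"Windows[- ]RSS[- ]Platform/\S+")
MSIE7 = re.compile(r"Mozilla/4\.0 \(compatible; MSIE 7")

def classify_ie7(user_agent, referer=""):
    """Classify a request from the IE7 family of User-Agents."""
    if WINDOWS_RSS.match(user_agent):
        return "windows-rss-platform"   # subscription
    if MSIE7.match(user_agent):
        if referer.startswith("x-gadget:///"):
            return "vista-gadget"       # subscription
        if not referer:
            return "outlook-2007"       # subscription (heuristic)
        return "ie7-browser"            # plain browser hit
    return "other"
```

Rule order matters: the Gadget check has to come before the empty-referrer Outlook heuristic, since a Gadget request would otherwise never be reached.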

So, those are the kinds of decisions we make when evaluating each of the over 300 million feed requests we get each day. We're constantly reviewing the list of User-Agents we see in those requests in an effort to make these stats as accurate as they can be. What really makes our lives easier is when we can definitively discern through request headers whether a request represents an intentioned subscription vs. "other". With developments like a distinct User-Agent header for the Windows RSS Platform and the new X-Moz header, we're getting closer!

February 25, 2007

FeedBurner is 3

I recently checked my email archive to find out when, exactly, we launched the "pre-alpha" of FeedBurner. Turns out it was February 25, 2004 ... three years ago today. Happy birthday, FeedBurner!

Here is the original email that we sent out to friends and family on that day. I'm amazed at how we've pretty much stayed on target with our original vision.

Hi there,

We are very excited to announce a pre-alpha release of our new service, FeedBurner.com ( http://www.feedburner.com ). FeedBurner is an RSS/Atom post-processing service that enables publishers of syndicated content to enhance their feeds in a variety of interesting and powerful ways. By "pre-alpha", we mean that the software still contains a number of bugs, and while we wouldn't release something that was "unstable", we also wouldn't go throwing around the term "high availability" just yet either.

Our pre-alpha release contains a small subset of the services we will be rolling out over the coming weeks. Look for additional services like authenticated feeds (enabling premium content to be syndicated), "future-proofing" to eliminate the market's current debate over feed formats, and even content namespace enhancements to facilitate the broad syndication and feed-splicing of rich content types (eg, think syndicated music meta-data and associated news and purchase information).

As syndication and non-browser content aggregation/display become rapidly more popular, we believe our "syndication clearinghouse" model will provide a large collection of publishers with an enourmous [sic] amount of leverage in the market. For example, today even the most popular bloggers have little to no control over how frequently their content is polled, what newsreaders do with their content layout/style, or how many advertisements are layered into the display at the fringes of syndication. Nor are they afforded any revenue opportunities for their syndicated content. By channeling large numbers of publishers through our post-processing facility, we can begin to help these publishers effect changes to their benefit in a number of interesting ways ( poorly coded newsreaders that excessively drain bandwidth can be shutout, significant opportunities to provide revenue channels to the individual publisher emerge, and much much more).

If you'd like to stay up to speed on the service and the company, we are going to maintain a weblog at //www.burningdoor.com/feedburner that tracks the progress of the company and service, and we'll also use this space to discuss publisher services and business issues.

best,

Dick Costolo
CEO
Burning Door Syndication Services, Inc.
http://www.feedburner.com


February 15, 2007

Pipes Dreams

First off, let me just say that I think Yahoo! Pipes is very cool and that it has the potential to be an important building block for the next phase of the web (see "A More Personalized Internet?" for an overview). It's the logical next step for an ecosystem made possible by a standard content interchange format called a "feed". Feeds first established a loose coupling between content publishers and content consumers, letting each evolve separately. Then, FeedBurner came around and showed that this loose coupling also enabled value-add middleware that respected and in some cases even strengthened the "content contract" between producers and clients. Pipes takes the next step and does a very cool thing: it allows external parties to construct content workflows and, most importantly, gives them a sharable URL. FeedBurner and Pipes actually complement each other very well, and I've been having a lot of fun over the past week demonstrating that.

There are some very interesting directions that Pipes can take as it evolves, and I'll be curious to see what Yahoo! does with it. One of the first things I wanted to do when I started working with Pipes was to construct and share new modules. I hope that's something they would consider exposing, because man would that be tight! From personal experience, though, I know that it's probably not going to happen -- it's really hard to lock down any kind of code that would have to execute in your process space, so that's probably out. But maybe they could just expand the existing "Fetch" module so that it could POST the current state of the stream to an external URL I host on a server somewhere, I could return the transformed content, and that could be wrapped up in a sub-pipe that expects additional user inputs as the config parameters ... something like that could work.

Which brings me to the meat of this post: wonderful things could happen if you marry Pipes to the Atom Publishing Protocol (APP). What if the pipe output, rather than just being XML that spills on the floor when the URL is requested, could instead be hooked up to a module that speaks APP? Now you've got a really cool content routing mechanism. The "Fetch" module already really handles the input end of things, but being able to channel the output to a different destination could open up some amazing possibilities.

One detail to be worked out is the triggering mechanism for the workflow. Currently, a request to the resultant URL is what triggers the workflow's execution. This is how FeedBurner works as well -- there's no master cronjob ticking away and retrieving all the source feeds every 30 minutes. Instead, when a request for the burned version of the feed comes in and the source feed is stale (i.e., hasn't been checked in the last 30 minutes), we go refresh the source feed. That way, you don't waste cycles updating dormant feeds. Pipes works the same way.
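This lazy, request-driven refresh policy fits in a few lines. The cache, the `fetch` callable, and the injected clock below are hypothetical stand-ins for the real machinery, but the core idea -- refresh only when a request arrives and the cached copy is stale -- is exactly as described above:

```python
import time

REFRESH_INTERVAL = 30 * 60  # seconds; the 30-minute window from the post

# hypothetical in-memory cache: feed URL -> (fetched_at, content)
_cache = {}

def get_feed(url, fetch=lambda u: "<feed/>", now=time.time):
    """Return cached feed content, re-fetching the source only when a
    request comes in and the cached copy is older than REFRESH_INTERVAL."""
    entry = _cache.get(url)
    if entry is None or now() - entry[0] > REFRESH_INTERVAL:
        _cache[url] = (now(), fetch(url))
    return _cache[url][1]
```

A feed nobody requests is never fetched at all, which is the whole point: dormant feeds cost nothing.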

So, if there isn't a request URL, how would you "run" the workflow? The most appropriate thing would probably be something like a ping mechanism: if the pipe is pinged and the content has been modified since the previous run, you run the pipe. That could work.

In the end, if you take the promise of Pipes, the potential of Google Base, and add some of the stuff that you'll see from FeedBurner in the next few months, you'll have some wicked tools to start rewiring the next version of the web. I think it's going to be quite a trip.