Of Devious User-Agents and Web Applications
Think you've already seen the most ingenious uses of existing, "benign" communications technology to channel spam? Check this one out:
Here at FeedBurner, we have a variety of statistics and reporting tools for publishers who manage their feeds with our service. One of the most popular is a daily subscribership breakdown that shows exactly where the traffic for your feed is coming from. We maintain an extensive catalog of "user-agents," the text codes that identify a request for a feed as coming from a service (e.g., My Yahoo), an ordinary web browser, or an individual desktop feed aggregator (e.g., FeedDemon or NewsGator for Outlook). Below is a snippet of what our "Subscribers" report looks like:
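To make the idea of a user-agent catalog concrete, here is a minimal Python sketch of matching request strings against known clients and tallying them into a subscriber breakdown. It is not FeedBurner's code. The catalog entries, the matching substrings, and the function names `classify_user_agent` and `subscriber_breakdown` are all hypothetical, and the substrings are assumptions rather than the exact tokens these products send.

```python
from collections import Counter

# Hypothetical catalog: substring expected in a User-Agent header -> (client name, client type).
# Entries are illustrative assumptions, not the real tokens these clients send.
CATALOG = [
    ("FeedDemon", ("FeedDemon", "desktop aggregator")),
    ("NewsGator", ("NewsGator", "desktop aggregator")),
    ("YahooFeedSeeker", ("My Yahoo", "web service")),
]


def classify_user_agent(user_agent):
    """Return (client name, client type) for a recognized user-agent, or None if unknown."""
    for needle, info in CATALOG:
        if needle in user_agent:
            return info
    return None


def subscriber_breakdown(user_agents):
    """Tally requests by recognized client name; unrecognized strings are kept verbatim."""
    counts = Counter()
    for ua in user_agents:
        info = classify_user_agent(ua)
        counts[info[0] if info else ua] += 1
    return counts
```

Note that unrecognized user-agents are kept verbatim as report keys. That is convenient for publishers, but it also means whatever text a client chooses to send ends up in the report.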
We list user-agents we recognize as links (which you can click for additional detail) and unrecognized user-agents as plain black text entries in that table. Last week, we started to see some truly mysterious behavior. Any time someone viewed the Subscribers report for certain highly-subscribed feeds, their browser was hijacked and redirected to www.sirseek.com, a search engine whose quality I leave for you to judge.
At first, we suspected spyware or a virus on my Mac PowerBook G4. Impossible, right? OS X Macs just don't get sick. Subsequent testing on other machines, Mac and PC alike, revealed the same hijackish behavior when viewing the report, no matter which browser or operating system we used. At that point I suspected that JavaScript we hadn't put on the page ourselves was being executed, and sure enough, some jackass had coded a user-agent string as follows:
<SCRIPT>window.location='http://www.sirseek.com'</script> (compatible; MSIE 6.0; Windows NT 5.1; Avant Browser)
I doubt anyone involved with the official Avant Browser product had anything to do with this particular user-agent. Shame on us, though, for displaying third-party text in our own web application without taking the simple step of encoding it so that it could not execute. We immediately changed all of our reporting capabilities to HTML-encode third-party strings so that they can never again run as obnoxious, unwanted scripts.
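Here is a minimal Python sketch of the fix described above, assuming a report table rendered as HTML strings on the server. It uses the standard library's `html.escape` to encode the third-party user-agent before it is placed in the page. The function name `render_user_agent_cell` and the `/useragents?name=` detail-link scheme are hypothetical; FeedBurner's actual reporting code is not shown in this post.

```python
import html
from urllib.parse import quote


def render_user_agent_cell(user_agent, known_name=None):
    """Render one cell of a hypothetical Subscribers table.

    Recognized clients become links; unknown user-agents are shown as plain text.
    Either way, third-party text is HTML-encoded first, so markup such as
    <SCRIPT> is displayed as literal text instead of being parsed by the browser.
    """
    if known_name is not None:
        # Hypothetical detail-page URL: percent-encode the query value,
        # then HTML-escape the whole attribute value.
        href = "/useragents?name=" + quote(known_name)
        return '<td><a href="{}">{}</a></td>'.format(
            html.escape(href), html.escape(known_name)
        )
    # Unknown user-agent: escape <, >, &, and quotes before inserting it.
    return "<td>{}</td>".format(html.escape(user_agent))


malicious = ("<SCRIPT>window.location='http://www.sirseek.com'</script> "
             "(compatible; MSIE 6.0; Windows NT 5.1; Avant Browser)")
print(render_user_agent_cell(malicious))
```

A naive version that dropped the raw string into `"<td>{}</td>"` would hand the `<SCRIPT>` tag straight to the browser, which is exactly the hijack described above. Escaping at render time, rather than when the request is logged, keeps the stored user-agent intact for analysis while ensuring it is always shown as text.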
This little life lesson shows that web application developers who retrieve and display third-party content need to be on the lookout for edge cases like this one, because spam is on the march, and it's getting more insidiously clever by the hour. If anything, I'm surprised we hadn't seen this sooner, given that FeedBurner recognizes and reports on thousands of unique user-agents.
Comments
Matt, that is a useful edge case to share. Thanks.
Posted by: John Roberts | June 4, 2006 10:44 PM