Beyond Syntax

Looking beyond syntactical meaning

RSS Sucks.

I'll admit that I haven't spent too much time working with RSS feeds, but so far I'm unimpressed. All they really seem to provide is a consistent view of published data for clients to read when they want. That seems okay, but inefficient and a little redundant. It seems like you could implement the same thing by just sending an email to people who want to subscribe. At least then the end-user doesn't have to use both an email client and feed reader (yes, I understand some programs combine the two technologies). Alright, fine maybe you don't want to give the "evil-faceless-corporate-giant" your email address, after all you know they're going to sell it to someone. Is there a better way to publish data?

So, I'll start with what possessed me to write this. I'm trying to watch a Google Code project and I want to get updates whenever something changes. The "easiest" way to do that is through the RSS feed. But, I don't really want to download and use another application just to watch the feed for updates. Even if I did download the program, it wouldn't really gain me anything since it is just going to query ("poll") the server for new updates periodically, just like my email client already does.

I begin searching online for something that will watch RSS feeds on my behalf and send me an email when it updates. The first thing I come across is Feed My Inbox, they seem to offer the exact service I want. Upon closer inspection, they promise to only send one email every 24 hours. That won't cut it---I want my updates and I want them now! After a bit more searching I find rss2email, a simple Python program that keeps track of multiple RSS feeds and converts entries into emails when it executes. I go through the initial configuration and set up a cronjob to check for new entries every 4 minutes. Good enough for now.

However, this brings up an annoyance with RSS feeds:

RSS feeds do not provide real-time updates

Once you get down to it, all an RSS feed does is provide some subset of content on a page. It is still up to the client to ask the server when new content exists. This fact has bugged me very slightly in the past since I know xkcd updates every Monday, Wednesday, and Friday at 11pm central time, but my RSS subscription in Firefox won't provide me with the link until (at its discretion) polls the server for new content. Now, it bothers me slightly more since I know I have to wait at least 4 minutes for my cronjob to run, plus any time it might take Google to publish the updates in the feed (but I'll ignore that part).

Is there any way to improve this and give end-users real-time updates from content-providers? It seems like this would be great for users of Twitter and Facebook, since the end-users want to know what is happening now.

One answer seems to be in push notifications, that seem to have been popularized by Blackberry and iPhone applications. These allow a central server to send a tiny message to the phone that nudges the device that there is data to be had. Of course, this works very well on phone systems that can easily associate an content generator with a telephone number to contact. However, it is a bit tougher with IP-only devices that migrate from network to network. Although, it seems Apple's Push Notification Service (APNS) should be able to do this. Either way, this technology seems to be heading in the right direction.

APNS works by maintaining a connection between the client and server, that way when an event happens server side it just sends it on to all the connected clients. Unfortunately, I'm not convinced at how well these push notifications will scale. A system implementing IP push notifications seems like it could easily have on the order of 1000s of simultaneous, persistent connections. According to this "Comparison of Push and Pull Techniques for AJAX," (tech report, PDF) push-style system do bog down servers a bit. (Admittedly, the methodology for that paper might not be the best, but I'm guessing the conclusions are valid---I would like to see more/better studies of this.)

This bring me to what I want to see implemented or for someone to point me to the implementation of a distributed content syndication protocol (DCSP). The high-level view that I think would work (I haven't thought long or carefully about it), would be similar to other distributed networks. The content provider would maintain a complete list of current computers subscribed to the feed and the feed itself. The client would run software that asks the server who to connect to and select a few peers and create a long running connection. When new content arrives, the server pushes the content to its peers, who push to their peers, and so forth. This removes the burden of pushing content to all subscribers from the server, giving it scalability (in my mind). It would then be up to the client to connect and maintain connections with a collection of peers to get the real-time updates. I'm sure there would have to be some control messages to prevent flooding. But, it seems like it would give real-time updates to users.

I suppose this would mean I would have to run another program on my system, but it could either be a front-end client that handles my content feeds or a daemon running in the background and set up to deliver an email to a local mailbox (or even a remote mailbox) when fresh content arrives.

Ah well, who knows if it would work. Hell, maybe I just missed a fact about RSS that doesn't make it suck as much as I think it does. Thus ends my stream of though.