2.10. Robustness Principle

TCP implementations will follow a general principle of robustness: be conservative in what you do, be liberal in what you accept from others.

Long ago I accepted and internalized this bit of sound general advice as almost universally applicable in software design. It was so long ago, in fact, that I had forgotten the source.

Admittedly, I have not tried to puzzle out all the tumult around Atom and the various RSS flavors. It seems to me that if you generate feeds, you want to make sure whatever you generate is as clean as possible. On the other hand, if you want to consume the universe of feeds, you have to adapt to all the flawed generators out there; for aggregator sites and the like there simply is no other choice. You might try to encourage the flawed generators to clean up their acts, but adaptation is something you cannot practically avoid.

For specific, limited applications you can often afford to be picky and require that all the feeds you consume be well-formed. General-purpose feed consumers do not have this luxury and must be “liberal in what you accept”. Adapting to flawed generators is a pain, but it is also something you cannot ignore. In the true spirit of the Internet, if your application does an incomplete job, users will simply route around you and use other applications that do a better job.
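To make “liberal in what you accept” concrete, here is a minimal Python sketch of a consumer that tries a strict XML parse first and falls back to crude scraping when the feed is not well-formed. It is not taken from any real feed parser. The function name `extract_titles` and the regex fallback are hypothetical illustrations, and a real liberal parser handles far more cases than this.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, deliberately crude pattern used only when strict parsing fails.
_TITLE_RE = re.compile(r"<title(?:\s[^>]*)?>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_titles(feed_text: str) -> list[str]:
    """Return feed and entry titles: strictly if possible, leniently otherwise."""
    try:
        root = ET.fromstring(feed_text)
    except ET.ParseError:
        # Liberal path: the feed is not well-formed XML, so scrape what we can.
        return [m.strip() for m in _TITLE_RE.findall(feed_text)]
    # Strict path: walk the tree in document order and match on the local
    # name, so both plain RSS <title> and namespaced Atom <title> are found.
    return [
        (el.text or "").strip()
        for el in root.iter()
        if isinstance(el.tag, str) and el.tag.rsplit("}", 1)[-1] == "title"
    ]


# A bare "&" makes this feed malformed, but the titles are still recovered.
broken = "<rss><channel><title>Fish & Chips</title></channel></rss>"
print(extract_titles(broken))  # ['Fish & Chips']
```

A strict consumer would simply report the `ParseError` and stop. The fallback trades accuracy for coverage, which is the trade general-purpose consumers are forced to make. The `list[str]` annotation assumes Python 3.9 or later.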

The above conclusion is pretty easy to reach, so why all the noise?

While trying to puzzle out an obscure posting from Sam Ruby, I just ran across a nearly two-year-old item from Mark Pilgrim (who is doing exactly the right thing, BTW):

There are no exceptions to Postel’s Law [dive into mark]

This last one is interesting, in that it tries to wish away Postel’s Law (originally stated in RFC 793 as “be conservative in what you do, be liberal in what you accept from others”). Various people have tried to mandate this principle out of existence, some going so far as to claim that Postel’s Law should not apply to XML, because (apparently) the three letters X, M, and L are a magical combination that signal a glorious revolution that somehow overturns the fundamental principles of interoperability.

There are no exceptions to Postel’s Law. Anyone who tries to tell you differently is probably a client-side developer who wants the entire world to change so that their life might be 0.00001% easier. The world doesn’t work that way.

I maintain a feed parser. Real people rely on it. It is used in several end-user products, including Chandler and Straw, and lots of other people use it in their own homegrown aggregators. It is as liberal as possible because that is what clients need to be. It handles the 7 different versions of RSS seamlessly and equally. It handles Atom. It even goes so far as to try to abstract away the differences between RSS and Atom, duplicating RSS elements into Atom fields and Atom elements into RSS fields, RSS 2.0 fields into RSS 1.0 fields, and so forth and so on. If all you care about is title/link/description, you can get that from any feed, even a souped-up Atom feed. If you want to use the more advanced content model of Atom, you can do that too, even from the most minimal RSS feed.

My feed parser resolves relative links. It maps non-standard elements to standard ones. It parses 10 different types of dates and then normalizes them in case somebody claims their latest entry was last modified on June 31st. It handles many common cases of non-well-formed XML, because many feeds contain XML well-formedness errors, even the feeds of people who should really know better, and a feed parsing library that can’t parse the feeds that exist in the real world is simply a waste of everyone’s time. Don’t whine to me that parsing feeds is hard. I know how hard it is.

I also help maintain a feed validator, and I strongly advocate its use among producers. It tickles me whenever I go to a new site and see a valid RSS or valid Atom banner. I have spent an ungodly amount of time making the validator as easy to use as possible and also as strict as possible. It has almost 1000 test cases backing up its rules; 300 of those were added in the last release alone. It checks for June 31st. I swear to God it does. One day I was writing test cases and Sam was writing code to pass them, and when he saw that test case fail he almost reached through his cable modem and strangled me. He almost removed the test case out of spite. He gave in and coded it anyway, and checked it in, and we deployed, and three days later I got a bug report from someone who couldn’t figure out why his feed wasn’t validating. And I couldn’t figure it out either, until he mentioned that it only seemed to choke on the date for one specific entry, and I looked at it one more time and I swear to God it said 2003-06-31.

There are no exceptions to Postel’s Law.
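The June 31st story shows the two halves of Postel’s Law applied to a single field. The sketch below is a hypothetical illustration, not code from Pilgrim’s parser or the validator. It pairs a strict check that rejects impossible dates, as a validator should, with a lenient normalizer that accepts them, as a consumer might. Rolling the extra day into the following month is just one assumed policy. Clamping to the last day of the month would be equally defensible.

```python
from datetime import date, timedelta


def is_valid_date(text: str) -> bool:
    """Strict, validator-style check: the date must actually exist."""
    try:
        # fromisoformat raises ValueError for impossible dates like 2003-06-31
        date.fromisoformat(text)
    except ValueError:
        return False
    return True


def normalize_date(text: str) -> date:
    """Lenient, parser-style handling: roll an overflowing day forward."""
    year, month, day = (int(part) for part in text.split("-"))
    # Start at the first of the stated month and add the remaining days,
    # so 2003-06-31 becomes 2003-07-01 instead of raising an error.
    return date(year, month, 1) + timedelta(days=day - 1)


print(is_valid_date("2003-06-31"))   # False: the validator complains
print(normalize_date("2003-06-31"))  # 2003-07-01: the parser copes
```

The sketch only handles the `YYYY-MM-DD` form, and an out-of-range month still raises `ValueError` from `date()`. A parser that accepts many date formats would need a separate parsing step in front of the normalization.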

More recently, from the Microsoft Team RSS Blog (“Feeds and well-formed XML”), we hear:

… we’ve adopted the following overriding principle for IE 7 and RSS platform in Windows Vista: We will only support feeds that are well-formed XML.

On reflection, I suppose this is harmless. If you have an application with a limited domain, you can use the Microsoft code with little grief. If you need to support a larger domain, then you either use a better feed parser or run your feeds through an aggregator (presumably itself running a better feed parser). If anything, this will encourage the folks writing new applications to do a better job with the feeds they generate.

At the very least, the above reading snagged links to a feed validator and an excellent feed parser. I might have use for both in the coming months.

The feed parser is Python code. I wonder if it will run in a JVM using Jython?