random memes }

Massively scalable wikis

While out hiking yesterday, I had another inspiration as to how to make a wiki site extremely efficient.

The basic idea boils down to the web server returning three files for a wiki page:

site.css links.css page.html

The first file, and the rationale behind it, is pretty standard. The site.css is a static file containing all the common style elements used by all wiki pages. With the usual browser-side caching, this static file will be fetched once for the first page viewed on the site, and will not be fetched for subsequent pages. The usual argument for using a separate CSS file applies - by moving presentation information into the CSS, the HTML files become smaller.

The second file was yesterday's inspiration. The links.css file contains just the CSS to style the wiki links; it is usually the smallest of the three files, and the file most likely to change from request to request. The usual convention in wikis is to vary the formatting of a link based on whether the linked-to wiki page exists. The essential bit here is that each unique wiki link gets a uniquely named CSS rule. So while the contents of a wiki page only change when the page is edited, the link formatting changes more often - whenever any of the referenced pages are created or deleted. Regenerating a very small CSS file uses much less server-side CPU than regenerating an entire wiki page.
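A minimal sketch of the idea, generating a links.css with one named rule per wiki link (the class-name convention, colors, and function names here are assumptions for illustration, not part of the original design):

```python
# Hypothetical sketch: regenerate links.css from the set of existing pages.
# Each linked page name gets exactly one named CSS rule; links to pages
# that do not yet exist are styled differently from links to real pages.

def make_links_css(linked_names, existing_pages):
    """Emit one CSS rule per linked wiki page name."""
    rules = []
    for name in sorted(linked_names):
        if name in existing_pages:
            style = "color: #0645ad;"                       # existing page
        else:
            style = "color: #ba0000; font-style: italic;"   # missing page
        rules.append("a.wiki-%s { %s }" % (name, style))
    return "\n".join(rules)

css = make_links_css({"FrontPage", "NewIdea"}, {"FrontPage"})
print(css)
```

When a referenced page is created or deleted, only this small file needs to be rewritten; the cached page.html fragments stay untouched.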

The third file represents another notion to dramatically improve performance. The page.html is either a static file, or a static HTML fragment pasted into a template with only very simple processing. The basic idea here is to store wiki pages as HTML fragments. The usual approach I have seen in wiki implementations is to store the exact text typed by the user into an edit box. When a request comes in to display the wiki page, the original text is run through (very CPU-expensive) code to render the text to HTML. Typically a page is edited once and viewed many times. A more efficient solution is to render the page once when the edited page is saved, and then do only very simple processing when the page is viewed.
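The render-on-save idea can be sketched as follows. The CamelCase link syntax and the function names here are assumptions chosen for illustration:

```python
# Hypothetical sketch: render wiki markup to an HTML fragment once, at
# save time, rather than on every view.
import re

page_store = {}  # page name -> pre-rendered HTML fragment

def render_fragment(text):
    """Turn CamelCase words into wiki links with per-page CSS classes."""
    return re.sub(
        r"\b([A-Z][a-z]+(?:[A-Z][a-z]+)+)\b",
        r'<a class="wiki-\1" href="/wiki/\1">\1</a>',
        text,
    )

def save_page(name, text):
    # The expensive rendering happens once, here.
    page_store[name] = render_fragment(text)

def view_page(name):
    # Viewing is just a lookup plus (at most) trivial templating.
    return page_store[name]

save_page("FrontPage", "See the NewIdea page.")
print(view_page("FrontPage"))
```

Note that the rendered fragment carries only the per-link class names; whether each link looks "live" or "missing" is decided entirely by links.css, so the fragment never has to be re-rendered when other pages change.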

The notion of storing wiki pages as partially rendered HTML came to me a couple of years back, but I got hung up on how to efficiently update the links when other pages were edited. It seems to me that this last bit of inspiration pretty much nails the problem.

There are of course other details involved in making updates to the links.css optimal, but nothing especially unusual - cached database queries and in-memory inverse hash structures cover it.
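The inverse structure mentioned above might look something like this - an in-memory map from each page name to the set of pages linking to it, so that creating or deleting a page identifies exactly which links.css entries need regenerating (the data-structure names are assumptions for illustration):

```python
# Hypothetical sketch of the server-side bookkeeping: a forward map of
# outgoing links plus an inverse map of incoming links, kept in memory.
from collections import defaultdict

links_from = {}               # page -> set of pages it links to
linked_by = defaultdict(set)  # page -> set of pages that link to it

def record_links(page, targets):
    """Update both maps when a page is saved."""
    for old in links_from.get(page, set()):
        linked_by[old].discard(page)   # drop stale inverse entries
    links_from[page] = set(targets)
    for target in targets:
        linked_by[target].add(page)

record_links("FrontPage", {"NewIdea", "OldIdea"})
record_links("SideBar", {"NewIdea"})

# When NewIdea is finally created, only these pages' link rules change:
print(sorted(linked_by["NewIdea"]))   # ['FrontPage', 'SideBar']
```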

(sigh) Another interesting problem I don't have time to work on....