random memes }

Structured documents, DITA, DocBook, and Wikis

Dived into reading about DocBook and DITA this week.

This all started with my long-standing annoyance with MS Word (or OpenOffice at present) when editing documents with structure. Back in the early 1990s, I was at a small outfit writing both software and documentation. The documentation was somewhat long (hundreds of pages), and illustrated well how poorly MS Word was suited to longer documents. Want to maintain a consistent format through the document? You need to learn to use customized styles. Need to make a consistent change through the document? You need to eliminate any deviations from your customized set of styles. Possible ... but very tedious.

Skip forward to the more recent task of writing design documents. Style is an annoyance, as the master "template" documents (done by someone else) are a bit of a hash. My other long-standing annoyance is that documents of this sort tend to be throw-away. Extracting information from design documents is always done manually. The documents have structure, but the structure is not in the document in any form that is easily accessible to a program. Seems the lack of consistent, manageable style and the lack of explicit, accessible form are simply two sides of the same problem.

You could encode the structure of your documents as XML. You could then apply transforms and stylesheets to get the appearance you want. Certainly possible ... but rather than start from scratch, mucked around looking for what other folks had done. No point in re-inventing the wheel, after all.
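To make the idea concrete, here is a rough sketch in Python using only the standard library. The element and attribute names (`design`, `section`, `requirement`, `id`) are invented for illustration and do not come from DocBook, DITA, or any real schema. A hand-written function stands in for the transform, where in practice a stylesheet language such as XSLT might do the job.

```python
import xml.etree.ElementTree as ET

# Hypothetical design-document markup; every tag and attribute name here
# is made up for this example.
SOURCE = """
<design title="Widget Service">
  <section name="Overview">
    <para>Stores widgets.</para>
  </section>
  <section name="Requirements">
    <requirement id="R1">Must persist widgets.</requirement>
    <requirement id="R2">Must list widgets.</requirement>
  </section>
</design>
"""


def list_requirements(root):
    """Extract every requirement as (id, text), wherever it sits in the document."""
    return [(r.get("id"), r.text.strip()) for r in root.iter("requirement")]


def to_html(root):
    """A tiny hand-written transform: same structure, different appearance."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = root.get("title")
    for section in root.findall("section"):
        ET.SubElement(body, "h2").text = section.get("name")
        for child in section:
            if child.tag == "para":
                ET.SubElement(body, "p").text = child.text
            elif child.tag == "requirement":
                # Show the requirement id inline so it survives into the output.
                ET.SubElement(body, "p").text = f"[{child.get('id')}] {child.text}"
    return ET.tostring(html, encoding="unicode")


root = ET.fromstring(SOURCE.strip())
```

Calling `list_requirements(root)` is the sort of extraction that otherwise gets done by hand, and `to_html(root)` shows that appearance can be decided separately from content.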

Found DocBook. DocBook has been around a fairly long time, and has much supporting material (books, articles, and software support). On the other hand, seems to be oriented rather heavily towards books (not entirely appropriate in my context), and to be rather cumbersome to learn and use. Might be convinced to make the effort, but would I want to inflict this on all the other developers in the company? Nope, not really.

Found DITA. As a more recent product, DITA seems to have learned from DocBook. DITA is heavily used/supported by some folk at IBM. Might be a bit lighter to pick up and learn than DocBook. There is an open source DITA toolkit, though rather cumbersome to learn and use. Would I want to inflict this on other developers? Nope. Still too "heavy".

Had briefly hoped that OpenOffice (with its basis in XML) might have some sort of direct support. There is some support for DocBook that kinda/almost works. So much for the notion of generating structured documents from a word processor.

Took another look at Apache Cocoon and Lenya. Seems like a CMS that processes structured documents as XML might fit. But Lenya seems to be a dying project, and as such a dubious bet.

So after burning much of the week, came up pretty much empty-handed. After chewing on the question for a while, my opinion is that adopting a "heavy" framework (like DocBook or DITA) would be a mistake for occasional use in creating documents with lightweight structure. Better to use something lightweight to begin with - like a wiki - with support for exporting documents as XML with transformable structure.
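As a rough sketch of what "wiki with XML export" could mean, the Python below turns a very small wiki-like markup into an XML tree. This is an assumption-laden toy: it only understands `== Heading ==` lines and blank-line-separated paragraphs, and the output element names (`document`, `section`, `para`) are hypothetical rather than taken from any particular wiki engine.

```python
import re
import xml.etree.ElementTree as ET

# Matches a block that is nothing but a heading, e.g. "== Goals ==".
HEADING = re.compile(r"^==\s*(.+?)\s*==$")


def wiki_to_xml(text, title):
    """Convert toy wiki text into a <document> tree of sections and paragraphs."""
    doc = ET.Element("document", title=title)
    current = doc  # paragraphs before the first heading attach to the root
    for block in re.split(r"\n\s*\n", text.strip()):
        block = block.strip()
        match = HEADING.match(block)
        if match:
            current = ET.SubElement(doc, "section", name=match.group(1))
        elif block:
            # Collapse hard line wraps inside a paragraph into single spaces.
            ET.SubElement(current, "para").text = " ".join(block.split())
    return doc


page = "Intro text.\n\n== Goals ==\n\nFirst goal\ncontinues here.\n\n== Risks ==\n\nNone yet."
doc = wiki_to_xml(page, "Design")
print(ET.tostring(doc, encoding="unicode"))
```

The point is only that lightweight markup can carry enough structure to be exported and then transformed; a real wiki would need lists, links, nesting, and so on.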

But what to use as a base?