Ran into an unexpected problem.

Had this bit of inspiration for what I thought would be an optimally performant wiki/weblog, and started putting together a prototype. The client-side JavaScript went together pretty easily. The server side is presenting more of a problem.

The basic notion is I want to send an HTML fragment to the server. The server-side code would then:

  1. Read and parse an HTML file from disk into a DOM tree.
  2. Replace a DIV identified by an ID attribute with the HTML fragment received from the client.
  3. Save the edited HTML into a file on disk.
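
For illustration, here is a minimal Python sketch of those three steps, using only the standard library's `html.parser`. This is not the PHP code discussed in this post. The names `DivReplacer`, `replace_div`, and `update_page` are made up for the example, and it assumes "replace" means swapping the DIV's contents while keeping the DIV tag itself. It also streams the page through a tolerant tokenizer rather than building a real DOM tree, so treat it as a stand-in for step 1.

```python
from html.parser import HTMLParser


class DivReplacer(HTMLParser):
    """Copy HTML through mostly unchanged, swapping the contents of one DIV."""

    def __init__(self, target_id, fragment):
        # convert_charrefs=False so entities such as &amp; pass through as written.
        super().__init__(convert_charrefs=False)
        self.target_id = target_id
        self.fragment = fragment
        self.out = []
        self.depth = 0          # > 0 while we are inside the target DIV
        self.replaced = False

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Inside the target: drop old content, but count nested DIVs.
            if tag == "div":
                self.depth += 1
            return
        self.out.append(self.get_starttag_text())
        if tag == "div" and not self.replaced and dict(attrs).get("id") == self.target_id:
            self.out.append(self.fragment)
            self.depth = 1
            self.replaced = True

    def handle_startendtag(self, tag, attrs):
        # Self-closed tags like <br/> never change the DIV depth.
        if not self.depth:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth:
            if tag == "div":
                self.depth -= 1
                if self.depth == 0:
                    self.out.append("</div>")  # close the target DIV
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.depth:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if not self.depth:
            self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")  # e.g. the DOCTYPE line


def replace_div(html, target_id, fragment):
    """Return html with the contents of <div id=target_id> replaced by fragment."""
    parser = DivReplacer(target_id, fragment)
    parser.feed(html)
    parser.close()
    if not parser.replaced:
        raise KeyError(f"no DIV with id {target_id!r}")
    return "".join(parser.out)


def update_page(path, target_id, fragment):
    """Steps 1-3: read the file, replace the DIV contents, write it back."""
    with open(path, encoding="utf-8") as f:
        html = f.read()
    updated = replace_div(html, target_id, fragment)
    with open(path, "w", encoding="utf-8") as f:
        f.write(updated)
```

A call like `update_page("page.html", "entry", fragment)` would cover one edit, where the file name and the id `entry` are placeholders. Counting only DIV tags keeps the sketch tolerant of unclosed tags such as `<br>` inside the target, though an unclosed nested DIV would still throw it off.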

Pretty simple, right?

The application-specific code on the server is very small. I was hoping to keep the server-side code simple so multiple implementations (PHP, Perl, Java) would be feasible. This would allow the widest possible usage.

Granted, a parser for sloppy HTML is not trivial. On the other hand, you only need one good open source HTML parser for all the languages implemented in C/C++. I know there are a couple for Java. By now I had hoped that HTML -> DOM (and back) would be relatively common.

Guess I was wrong.

Pulled up the PHP documentation and found the DOM functions. Hmmm, PHP4 apparently only supported XML, which is not good enough. I want a parser able to tolerate less-than-perfect HTML. PHP5 apparently has a more tolerant parser, though I’d hoped not to have to require the very latest version. Oh well … wrote code to slurp the HTML fragment from the client (OK), slurp the HTML file off disk (OK), find the DIV to replace with getElementById() … nope. The file is HTML 4.01 Strict (been through the validator) and getElementById() works from browser JavaScript. Looks like the newish PHP DOM functions may be a bit buggy still.

Don’t see an alternative, so PHP may be a non-starter.

My web host allows use of PHP, Perl, Python, and Ruby … so off to do some reading.

Update: Found my way around the problem with the PHP DOM functions. A bit of a hack, but not too bad. What I really want is an assignable equivalent to innerHTML, as this would be simpler and more efficient. Oh well, sometimes you work with what you have.