Server-side parsing of HTML to DOM
Ran into an unexpected problem.
The basic idea: I want to send an HTML fragment to the server. The server-side code would then:
- Read and parse an HTML file from disk into a DOM tree.
- Replace a DIV identified by an ID attribute with the HTML fragment received from the client.
- Save the edited HTML into a file on disk.
Pretty simple, right?
The application-specific code on the server is very small. I was hoping to keep the server-side code simple so multiple implementations (PHP, Perl, Java) would be feasible. This would allow the widest possible usage.
Granted, a parser for sloppy HTML is not trivial. On the other hand, you only need one good open source HTML parser to cover all the languages implemented in C/C++. I know there are a couple for Java. By now I had hoped that HTML -> DOM (and back) would be relatively common.
Guess I was wrong.
Don't see an alternative, so PHP may be a non-starter.
My web host allows use of PHP, Perl, Python, and Ruby ... so off to do some reading.
Update: Found my way around the problem with the PHP DOM functions. A bit of a hack, but not too bad. What I really want is an assignable equivalent to innerHTML, as this would be simpler and more efficient. Oh well, sometimes you work with what you have.
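For illustration only, here is a minimal sketch of the three steps (read, replace, save) in Python, one of the languages listed above. It is not the PHP DOM hack mentioned in the update. Instead of building a DOM tree it uses the standard library's html.parser to copy the page through unchanged and swap the contents of one DIV, which is roughly what an assignable innerHTML would do. The names DivReplacer, replace_div, and update_file are made up for this example. It also assumes the page balances its DIV tags, and it keeps the DIV itself (so the ID survives for the next edit) rather than replacing the whole element.

```python
from html.parser import HTMLParser


class DivReplacer(HTMLParser):
    """Copy HTML through unchanged, except swap the contents of one div."""

    def __init__(self, target_id, fragment):
        # convert_charrefs=False so entities such as &amp; pass through as written.
        super().__init__(convert_charrefs=False)
        self.target_id = target_id
        self.fragment = fragment
        self.out = []
        self.depth = 0          # > 0 while inside the target div
        self.replaced = False

    def _emit(self, text):
        # Anything inside the target div is old content and is dropped.
        if self.depth == 0:
            self.out.append(text)

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            if tag == "div":
                self.depth += 1  # nested div inside the old content
            return
        self.out.append(self.get_starttag_text())
        if (tag == "div" and not self.replaced
                and dict(attrs).get("id") == self.target_id):
            self.out.append(self.fragment)  # new content goes in here
            self.depth = 1
            self.replaced = True

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/> are copied as written.
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth > 0:
            if tag == "div":
                self.depth -= 1
                if self.depth == 0:
                    self.out.append("</div>")  # close the target div
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")


def replace_div(html, target_id, fragment):
    """Return html with the contents of <div id=target_id> replaced."""
    parser = DivReplacer(target_id, fragment)
    parser.feed(html)
    parser.close()
    if not parser.replaced:
        raise ValueError(f"no div with id {target_id!r}")
    return "".join(parser.out)


def update_file(path, target_id, fragment):
    """Read the page from disk, replace the div contents, write it back."""
    with open(path, encoding="utf-8") as f:
        html = f.read()
    result = replace_div(html, target_id, fragment)
    with open(path, "w", encoding="utf-8") as f:
        f.write(result)
```

Usage would be something like update_file("page.html", "content", fragment), where the file name and ID are placeholders. A real version would also need to validate the fragment coming from the client before writing it to disk.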