Preston L. Bannister { random memes }

2009.12.31

… status as Invited Expert in HTML Working Group

Filed under: Web, html@w3c — Preston L. Bannister @ 11:23 pm

At one time I had hoped there was a small chance I might be able to nudge the HTML working group in a constructive direction. Over time, what I found is that a small number of individuals are able to invest an inordinate amount of time in this same working group, and I cannot possibly invest the time to construct thoughtful responses to the flood of ill-considered notions.

There is almost no chance I can move the working group in a useful direction. Time to disconnect.

This is all rather discouraging. The HTML working group will proceed. Some of the work is worthwhile. Much (measured by volume of email list traffic) is not. Whatever mix makes it into the resulting proposed “standard” is sure to be a mess. Not sure how to change any of this.

My status as an “Invited Expert” is up for renewal. With extreme reluctance … my judgement is that I cannot make a useful contribution, and should disassociate from the HTML working group. Of course, they will continue on the present course, in my absence. My withdrawal makes no difference of significance. There is a fair chance the body of work from this working group will be adopted, imperfect as it is. The existing body of work is … badly skewed by an imperfect process.

Nothing meaningful I can do. The result will be a mess, and will create a mess for years after. Time to disengage.

Funny bit – I do not see a way to force a disconnect.

2009.12.30

Almost but not quite … server-side JavaScript

Filed under: Javascript, Software — Preston L. Bannister @ 8:05 pm

A bit over three years back I looked at server-side JavaScript, and was not enthused with the available choices.

Three distinct usages I’d like to cover: optimal performance, Windows web server (IIS) interoperability, and webhosting.

In addition, there are three interesting aspects of optimal performance: throughput, scalability, and stability.

For serving static content, I really like the model of a single-threaded non-blocking web server, of which thttpd was an early example, and for which the C10K question clarified the need. A small/simple web server has a much better chance of being very reliable. With the single-threaded non-blocking model, massive scalability is possible.
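To make the single-threaded non-blocking model a little more concrete, here is a minimal sketch using Java NIO. This is not how thttpd is written (thttpd is C); the class name `TinyStaticServer`, the `pollOnce` method, and the canned reply are all made up for illustration, and a real server would parse the request and serve files. The point is only that one thread and one selector can service many connections without ever blocking on a single client.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

/** Hypothetical sketch: one thread, one selector, no blocking I/O calls. */
class TinyStaticServer implements AutoCloseable {
    // Canned reply standing in for real static content.
    private static final byte[] RESPONSE =
        "HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 6\r\n\r\nhello\n"
            .getBytes(StandardCharsets.US_ASCII);

    private final Selector selector;
    private final ServerSocketChannel listener;

    TinyStaticServer(int port) throws IOException {
        selector = Selector.open();
        listener = ServerSocketChannel.open();
        listener.configureBlocking(false);
        listener.bind(new InetSocketAddress("127.0.0.1", port)); // port 0 = any free port
        listener.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() {
        return listener.socket().getLocalPort();
    }

    /** One pass of the event loop: wait up to timeoutMillis (> 0), then service every ready channel. */
    void pollOnce(long timeoutMillis) throws IOException {
        selector.select(timeoutMillis);
        Iterator<SelectionKey> ready = selector.selectedKeys().iterator();
        while (ready.hasNext()) {
            SelectionKey key = ready.next();
            ready.remove();
            if (key.isAcceptable()) {
                SocketChannel client = listener.accept();
                if (client != null) {
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(4096));
                }
            } else if (key.isReadable()) {
                onReadable(key);
            } else if (key.isWritable()) {
                onWritable(key);
            }
        }
    }

    private void onReadable(SelectionKey key) throws IOException {
        SocketChannel client = (SocketChannel) key.channel();
        ByteBuffer request = (ByteBuffer) key.attachment();
        int n = client.read(request);
        if (n < 0 || !request.hasRemaining()) {
            client.close();                          // peer went away, or request too large
        } else if (endsWithBlankLine(request)) {
            key.attach(ByteBuffer.wrap(RESPONSE));   // headers complete: switch to writing
            key.interestOps(SelectionKey.OP_WRITE);
        }
    }

    private void onWritable(SelectionKey key) throws IOException {
        SocketChannel client = (SocketChannel) key.channel();
        ByteBuffer response = (ByteBuffer) key.attachment();
        client.write(response);                      // may be partial; we come back if so
        if (!response.hasRemaining()) {
            client.close();                          // HTTP/1.0 style: close after replying
        }
    }

    private static boolean endsWithBlankLine(ByteBuffer b) {
        int n = b.position();
        return n >= 4 && b.get(n - 4) == '\r' && b.get(n - 3) == '\n'
                      && b.get(n - 2) == '\r' && b.get(n - 1) == '\n';
    }

    @Override
    public void close() throws IOException {
        listener.close();
        selector.close();
    }
}
```

A long-running server would simply call `pollOnce` in a loop on its one thread. Nothing in the loop waits on any single client, which is where the scalability comes from.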

For serving dynamic content, I really like the isolation and load distribution possible with the FastCGI model (or the like). Dynamic code tends to be complex. JavaScript interpreters are complex. Complex code tends to fail more often. Complex code can also need more compute than a single box can supply. For intranet applications, a single front-end web server is often preferable, and load distribution via FastCGI offers more headroom. All of which tends to argue for the FastCGI model, with isolation from the front-end web service, and potential distribution of load across more than one machine.

For the widest possible usage, in addition to optimal deployments (when there is no restriction on the front-end web server), the engine on which the application runs should be deployable behind IIS (for Windows-only organizations), and at common web-hosting services (like Dreamhost). Microsoft’s recent support of FastCGI with IIS is a big help.

At that time (three years back), none of the solutions were really optimal – and in fact were pretty far from optimal. The Java-based Rhino JavaScript interpreter was easiest to embed, but failed the webhosting case. The C++ based JavaScript interpreters were a pain to embed, and offered good (but not great) performance.

Fast forward to the present, and Google offers the V8 JavaScript Engine that offers great performance, and is easy to embed. (Google as the good guys, riding to the rescue once again … you’d think they have white hats superglued to their brains.) Suddenly we have lots of projects embedding the V8 engine. In addition, seems most all the single-threaded non-blocking web servers have picked up support for FastCGI.

Oh … and I am pretty much fed up with the Java Servlet model. After considerable time with the problem, I am of the opinion that the servlet model chose the wrong abstractions, and this makes for awkward solutions. (Of course, the servlet model appeared very early in the history of web applications, so the mistake is easy to understand.)

Which means the model offered by node.js makes a lot of sense. I like the notion of a naked node (running JavaScript on the V8 engine) performing request dispatch without any extra layers or abstractions. The main lack with node.js is the ability to work via FastCGI (and thus no means to be deployed behind IIS on Windows).

But there are as yet items unresolved and/or unclear.

  • Projects like v8cgi offer the V8 JavaScript Engine connected via FastCGI.
  • The node.js project offers a single-threaded non-blocking web server … but can it work behind FastCGI?
  • Is the environment for server-side JavaScript the same (or sufficiently similar) between node.js and v8cgi?
  • Comet is still a question. Can FastCGI work well with long-outstanding requests from applications?

The good news is that we seem a lot closer to attractive and well-supported server-side JavaScript for web applications … but it seems we are not quite fully there, as yet.

2009.12.24

Wireless network and Linux

Filed under: Software — Preston L. Bannister @ 7:40 pm

A signpost of sorts – wireless network support on Linux, at least for the Intel 4965AGN adaptor – sucks.

Went with the Intel adaptor when I ordered this notebook, in part as Intel seems to be actively supporting the development of Linux drivers. In practice, my laptop wireless connection is mostly unreliable, and often near-useless.

I used to think the problem was interference from other nearby wireless routers, but my daughter’s laptop seems to work well when mine does not. I have a simple “bounce” test for network performance. My daughter’s cheap Toshiba, on the same “bounce” page, gets steady/fast performance when side by side with my laptop … which gets slower/unsteady/unreliable performance.

Not specific to the wireless router or crowded neighborhood either – I get similar poor performance when at my father’s place in Colorado (different brand and generation of router, and fewer/further neighbors).

2009.12.23

Trie in Java – revisited

Filed under: Software — Preston L. Bannister @ 5:12 pm

An earlier attempt at writing a fast general purpose Trie in Java gave huge memory use, and disappointing results. Seems a Trie implementation that is both fast and general purpose is not possible. (Translation: For most uses a Trie cannot replace a hash table.)

After the prior results, I wanted to see if a less general-purpose implementation would perform better. Given enough advantages, could a Trie out-perform a hash table? (Again, this visits to some extent the question asked in a prior discussion.)

For the current exercise I built two Trie implementations. The LinkedTrie is cheap to build, minimal in use of memory, but not especially fast to access. The FixedTrie implementation should be pretty close to optimal in access time (for a Trie), but expensive to build (in fact the FixedTrieBuilder transforms a LinkedTrie into an optimized FixedTrie).
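For readers who want a feel for the two shapes without pulling the sources, here is a rough sketch of the idea. This is not the code from the repository: the names `SketchLinkedTrie` and `SketchFixedTrie`, the first-child/next-sibling layout, and the flattening into parallel arrays are all assumptions made for illustration. The linked form is cheap to grow one node at a time; the fixed form is built once from the linked form and then only read.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical linked trie: each node holds a first-child and a next-sibling link. */
class SketchLinkedTrie {
    static final class Node {
        final byte label;   // byte on the edge leading into this node
        boolean terminal;   // true when a stored key ends here
        Node child;         // first child, or null
        Node sibling;       // next node in the parent's child list, or null
        Node(byte label) { this.label = label; }
    }

    final Node root = new Node((byte) 0);
    int nodes = 1;          // counts the root

    void add(String word) {
        Node at = root;
        for (byte b : word.getBytes(StandardCharsets.UTF_8)) {
            Node next = at.child;
            while (next != null && next.label != b) next = next.sibling;
            if (next == null) {          // no such edge yet: push a new child
                next = new Node(b);
                next.sibling = at.child;
                at.child = next;
                nodes++;
            }
            at = next;
        }
        at.terminal = true;
    }

    boolean contains(String word) {
        Node at = root;
        for (byte b : word.getBytes(StandardCharsets.UTF_8)) {
            at = at.child;
            while (at != null && at.label != b) at = at.sibling;  // linear scan of siblings
            if (at == null) return false;
        }
        return at.terminal;
    }
}

/** Hypothetical read-only trie: the linked nodes flattened into parallel arrays. */
class SketchFixedTrie {
    private final byte[] label;
    private final boolean[] terminal;
    private final int[] firstChild;  // index of a node's first child in the arrays
    private final int[] childCount;  // children are contiguous and sorted by label

    SketchFixedTrie(SketchLinkedTrie source) {
        int n = source.nodes;
        label = new byte[n];
        terminal = new boolean[n];
        firstChild = new int[n];
        childCount = new int[n];
        SketchLinkedTrie.Node[] order = new SketchLinkedTrie.Node[n];
        order[0] = source.root;
        int used = 1;
        for (int i = 0; i < used; i++) {   // breadth-first numbering keeps siblings adjacent
            SketchLinkedTrie.Node node = order[i];
            label[i] = node.label;
            terminal[i] = node.terminal;
            firstChild[i] = used;
            for (SketchLinkedTrie.Node c = node.child; c != null; c = c.sibling) order[used++] = c;
            childCount[i] = used - firstChild[i];
            Arrays.sort(order, firstChild[i], used, (x, y) -> Byte.compare(x.label, y.label));
        }
    }

    boolean contains(String word) {
        int at = 0;
        for (byte b : word.getBytes(StandardCharsets.UTF_8)) {
            int lo = firstChild[at];
            int hi = lo + childCount[at] - 1;
            int found = -1;
            while (lo <= hi) {             // binary search among the children
                int mid = (lo + hi) >>> 1;
                if (label[mid] < b) lo = mid + 1;
                else if (label[mid] > b) hi = mid - 1;
                else { found = mid; break; }
            }
            if (found < 0) return false;
            at = found;
        }
        return terminal[at];
    }
}
```

Usage would be along the lines of filling a `SketchLinkedTrie` with `add`, then passing it to the `SketchFixedTrie` constructor and calling `contains` on the result. The fixed form trades pointer chasing for array indexing, which is the general reason to expect faster reads.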

As before, the sources are in (as an Eclipse project): http://svn.bannister.us/public/Trie/.

The performance numbers make sense. The older TallTrie and WideTrie implementations (that traded increased memory use for speed) are indeed faster than the LinkedTrie, though the LinkedTrie uses much(!) less memory. The new FixedTrie is the fastest of the Tries (hurrah!) and uses the least memory (a shade less than LinkedTrie).

But even the FixedTrie is slower than a generic hash table, with read-only access about 3 times as expensive (290 ns/op versus 92 ns/op in the sample measurements below).

Sample measurements…

=== words
21 ms - 98569 words loaded

=== hash
3935/second - hash map loaded {4007 ms, 15771040 operations = 254 ns/op}
8554/second - hash map re-loaded {4010 ms, 34302012 operations = 116 ns/op}
10768/second - access each item in hash map {4000 ms, 43074653 operations = 92 ns/op}

=== trie (linked)
870/second - loaded trie (linked) {4078 ms, 3548484 operations = 1149 ns/op}
1204/second - re-loaded trie (linked) {4010 ms, 4829881 operations = 830 ns/op}
1431/second - access each item in trie (linked) {4063 ms, 5815571 operations = 698 ns/op}
98569 slots of trie (linked)
225791 nodes of trie (linked)

=== build fixed trie

=== trie (fixed)
85779/second - loaded trie (fixed) {4000 ms, 343118689 operations = 11 ns/op}
87652/second - re-loaded trie (fixed) {4000 ms, 350609933 operations = 11 ns/op}
3438/second - access each item in trie (fixed) {4013 ms, 13799660 operations = 290 ns/op}
98569 slots of trie (fixed)
225791 nodes of trie (fixed)

=== trie (wide)
65/second - loaded trie (wide) {4502 ms, 295707 operations = 15224 ns/op}
1659/second - re-loaded trie (wide) {4040 ms, 6702692 operations = 602 ns/op}
1672/second - access each item in trie (wide) {4007 ms, 6702692 operations = 597 ns/op}
41533124 slots of trie (wide)
225890 nodes of trie (wide)

=== trie (tall)
682/second - loaded trie (tall) {4043 ms, 2759932 operations = 1464 ns/op}
1733/second - re-loaded trie (tall) {4036 ms, 6998399 operations = 576 ns/op}
1913/second - access each item in trie (tall) {4019 ms, 7688382 operations = 522 ns/op}
6982694 slots of trie (tall)
446075 nodes of trie (tall)

=== string to UTF8 conversion
1393/second - word to UTF8 (stock) {4033 ms, 5618433 operations = 717 ns/op}
9585/second - word to UTF8 (fast) {4000 ms, 38343341 operations = 104 ns/op}

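(Note that the FixedTrie ignores re-load, so the times for load and re-load are bogus. Also, the “/second” figures work out to thousands of operations per second: 15771040 operations in 4007 ms is roughly 3.9 million per second.)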
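On the last pair of measurements: the post does not show what the “fast” conversion did, so the following is only a guess at the general technique. The “stock” route is assumed to be `String.getBytes`, and the hand-rolled route encodes directly into a caller-supplied buffer so nothing is allocated per word. The class and method names (`Utf8Sketch`, `stock`, `encodeInto`) are invented for this sketch.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical hand-rolled encoder; the post does not show what its "fast" variant did. */
class Utf8Sketch {
    /** The assumed "stock" route: let the JDK allocate and encode. */
    static byte[] stock(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /**
     * Encodes s into out and returns the number of bytes written.
     * out needs room for 3 * s.length() bytes (the worst case for UTF-16 input).
     */
    static int encodeInto(String s, byte[] out) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {                               // 1 byte: ASCII
                out[n++] = (byte) c;
            } else if (c < 0x800) {                       // 2 bytes
                out[n++] = (byte) (0xC0 | (c >> 6));
                out[n++] = (byte) (0x80 | (c & 0x3F));
            } else if (Character.isHighSurrogate(c) && i + 1 < s.length()
                       && Character.isLowSurrogate(s.charAt(i + 1))) {
                int cp = Character.toCodePoint(c, s.charAt(++i));   // 4 bytes: supplementary
                out[n++] = (byte) (0xF0 | (cp >> 18));
                out[n++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                out[n++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                out[n++] = (byte) (0x80 | (cp & 0x3F));
            } else if (Character.isSurrogate(c)) {        // unpaired surrogate: emit '?'
                out[n++] = (byte) '?';
            } else {                                      // 3 bytes: rest of the BMP
                out[n++] = (byte) (0xE0 | (c >> 12));
                out[n++] = (byte) (0x80 | ((c >> 6) & 0x3F));
                out[n++] = (byte) (0x80 | (c & 0x3F));
            }
        }
        return n;
    }
}
```

Whether a loop like this actually beats the JDK on a given JVM is something to measure rather than assume; the numbers above are from one machine and one JVM of that time.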

I suspect a C++ Trie implementation could do a bit better … but would not necessarily outperform a hash table.

Looks very much like even a specialized read-only Trie cannot match the performance of a generic hash table (at least in Java).
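For anyone wanting to repeat this kind of comparison, the measurement lines in this post have a regular shape that is easy to reproduce. The sketch below is a guess at a harness, not the one in the repository: `BenchSketch`, `report`, and `measure` are invented names. It assumes whole passes over the word list are repeated until a time budget is spent, which fits the operation counts above being whole multiples of 98569, and it uses truncating integer division, which reproduces the printed figures.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical timing helper; the names and structure are guesses, not the post's code. */
class BenchSketch {
    /** Formats one result line using integer arithmetic, matching the sample output's shape. */
    static String report(String label, long elapsedMs, long operations) {
        long perMs = operations / elapsedMs;                 // ops per millisecond
        long nsPerOp = elapsedMs * 1_000_000L / operations;  // truncated, as in the samples
        return perMs + "/second - " + label
             + " {" + elapsedMs + " ms, " + operations + " operations = " + nsPerOp + " ns/op}";
    }

    /** Repeats whole passes (each worth opsPerPass >= 1 operations) until the budget is spent. */
    static String measure(String label, long budgetMs, long opsPerPass, Runnable pass) {
        long start = System.nanoTime();
        long operations = 0;
        long elapsedMs;
        do {
            pass.run();
            operations += opsPerPass;
            elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        } while (elapsedMs < budgetMs);
        return report(label, Math.max(1, elapsedMs), operations);
    }

    public static void main(String[] args) {
        String[] words = {"tea", "ten", "to", "inn"};        // stand-in for the real word list
        Map<String, Integer> map = new HashMap<>();
        for (int i = 0; i < words.length; i++) map.put(words[i], i);
        int[] sink = new int[1];                             // keeps the lookups from being optimized away
        System.out.println(measure("access each item in hash map", 200, words.length, () -> {
            for (String w : words) sink[0] += map.get(w);
        }));
    }
}
```

A harness this simple ignores JIT warm-up and garbage collection pauses, so it is only good for rough comparisons of the kind made here.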