Preston L. Bannister { random memes }

2010.01.31

Multiplexed FastCGI connections?

Filed under: Software, Web — Preston L. Bannister @ 12:22 pm

Does anyone use FastCGI with FCGI_MPXS_CONNS set to “1” (for multiplexed connections)?

Most FastCGI backends seem to be written for non-multiplexed connections. (Much simpler, so understandable.) The IIS FastCGI connector apparently does not support multiplexed connections.

Am writing a FastCGI backend that allows for multiplexed connections. This would be a waste of time if no frontend supports multiplexing, or if the existing frontends are buggy.
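
For reference, multiplexing in the FastCGI specification means records for several requests interleave on one connection, each tagged with a 16-bit request ID in the 8-byte record header. Below is a minimal sketch of the demultiplexing a backend would need. The `Demux` type and its `feed` method are hypothetical names for illustration, not taken from any existing backend, and only FCGI_STDIN payloads are collected.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Record types from the FastCGI 1.0 specification (subset).
enum : std::uint8_t { FCGI_BEGIN_REQUEST = 1, FCGI_PARAMS = 4, FCGI_STDIN = 5 };

// Hypothetical demultiplexer: collects FCGI_STDIN bytes per request ID.
struct Demux {
    std::vector<std::uint8_t> pending;             // bytes not yet parsed
    std::map<std::uint16_t, std::string> stdinOf;  // request ID -> STDIN so far

    // Append bytes read from the connection and consume every complete record.
    void feed(const std::uint8_t* data, std::size_t n) {
        pending.insert(pending.end(), data, data + n);
        std::size_t pos = 0;
        while (pending.size() - pos >= 8) {
            const std::uint8_t* h = pending.data() + pos;
            // Header layout: version, type, requestIdB1, requestIdB0,
            // contentLengthB1, contentLengthB0, paddingLength, reserved.
            std::uint16_t id = static_cast<std::uint16_t>((h[2] << 8) | h[3]);
            std::size_t len = static_cast<std::size_t>((h[4] << 8) | h[5]);
            std::size_t pad = h[6];
            if (pending.size() - pos < 8 + len + pad) break;  // record incomplete
            if (h[1] == FCGI_BEGIN_REQUEST) stdinOf[id];      // open the request
            if (h[1] == FCGI_STDIN)
                stdinOf[id].append(reinterpret_cast<const char*>(h + 8), len);
            pos += 8 + len + pad;
        }
        pending.erase(pending.begin(),
                      pending.begin() + static_cast<std::ptrdiff_t>(pos));
    }
};
```

The per-request map is the extra state a non-multiplexed backend gets to skip, since there the connection itself identifies the request.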

(Not really expecting a response, but have to state the question.)

2010.01.17

Giving up HTML@W3C

Filed under: Software, Web, html@w3c — Preston L. Bannister @ 10:00 pm

Got the “status as Invited Expert in HTML Working Group” email. This I will let expire. Spent my time tilting at windmills, and do not see any point in continuing.

The HTML Working Group at W3C is … far too much noise. The HTML5 “standard” is going to be a bloated monster, and there is no chance I can change that. Time to stop the pretense of trying.

Not that all the work is bad, or that there is any shortage of well-intentioned folk. What the group lacks is any sense of minimalism, and enough strong voices able to say “no”. What we will get is going to be even harder to digest than HTML4. This is sad. Future developers are going to have an even harder time, for no good reason.

The wildcard here is that the mainstream browser implementations may not follow all the half-thought ideas thrown in by the HTML5 Working Group. No idea how this will work out.

At the core HTML is pretty damn simple – or could be. HTML4 got stuffed with a bunch of half-thought notions, most of which have since proved of no value, and were ignored by developers. Ideally we would learn from experience, omit the fuzzy disused bits, and trim HTML down to the useful core. There are well-known (though not universally known) means to achieve this aim. This is not going to happen.

I cannot keep up with the herd of Energizer-bunnies eager to make their mark, and with too-limited experience.
Time to stop pretending.

2010.01.16

Efficient UTF-8 recoding and secure processing

Filed under: Software, Web — Preston L. Bannister @ 1:20 pm

An attempt to make a point…

The use of UTF-8 on the web is common and increasing. Lots of data comes in as UTF-8, and inefficiency in UTF-8 data handling is going to have pretty pervasive impact.

On the flip side, the creators of UTF-8 did a good job. There is nothing really complicated about the UTF-8 format, and processing is simple.

So I was surprised (or rather shocked) to find in an earlier experiment that Java performed UTF-8 conversion slowly. In fact, I was able to write a faster UTF-8 decoder in Java than the stock decoder. This is just plain wrong. Conversion between encodings is a primitive/simple operation best written in C/C++ and run as native code (and this is the sort of processing where C/C++ is probably always going to be much faster than Java).

There is a problem in that malicious external parties can send oddly-encoded UTF-8, and bypass simple-minded malware detection software. Ordinary ASCII characters can be coded as alternate multi-byte sequences, and simple scanners miss the alternate encoding.
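
To make the problem concrete, here is a small sketch (function names are hypothetical) of a byte-level scan missing a '/' that has been coded as the two-byte sequence 0xC0 0xAF, while a scan that decodes two-byte sequences finds it.

```cpp
#include <cstddef>
#include <string>

// Naive check: looks for the raw byte '/' only.
bool containsSlashNaive(const std::string& s) {
    return s.find('/') != std::string::npos;
}

// Tolerant check: also decodes two-byte sequences, so the alternate
// form 0xC0 0xAF (which decodes to U+002F) is seen as '/'.
bool containsSlashDecoded(const std::string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        unsigned c = static_cast<unsigned char>(s[i]);
        if (c >= 0xC0 && c < 0xE0 && i + 1 < s.size()) {
            unsigned next = static_cast<unsigned char>(s[i + 1]);
            c = ((c & 31u) << 6) | (next & 63u);
            ++i;  // the continuation byte has been consumed
        }
        if (c == 0x2Fu) return true;
    }
    return false;
}
```

With the input `"..\xC0\xAF" "etc"` the naive check reports no slash and the decoding check reports one. Longer alternate forms (three bytes and up) would need the same treatment, which is the argument for recoding once at the edge rather than teaching every scanner to decode.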

This is a problem. There is a simple solution. One of a set of principles I adopted a long time ago is “convert at the edges”. If you have data coming in from an untrusted source, then you perform conversion and validation at the “edge” where the data is first received.

In the case of UTF-8 coming from an untrusted source, to make all later processing simpler, you must recode to eliminate any alternate encodings. This is quite simple, as recoded UTF-8 will always be the same size or smaller, and so can be done in-place. The prior experiment measured the cost of UTF-8 recoding. Looks like we can drive a 1-Gbit network link at full speed (with efficient code), while recoding the entire contents. Since UTF-8 data usually represents a smaller portion of traffic, and since other processing tends to take the larger part of the load, there is no reason to not perform recoding on any UTF-8 data coming from an untrusted source.

Combine “convert at the edges” with UTF-8 recoding, and we lose the basis for the requirement in RFC 3629 for detection of “illegal” UTF-8 code sequences. In addition we allow all downstream processing to be simpler and more efficient … and we also can be tolerant of imperfect upstream software. (Yes, I am going to invoke Postel, again.)

The basis is good (secure processing with untrusted sources), but the requirement for detection of “illegal” sequences is not necessary and (most definitely!) not optimal.

Example of UTF-8 recoding (from String.cpp).

void UTF8::String::recode() {
    // Iterate until all UTF8 characters are normalized.
    // UTF8 in canonical form can only be smaller, so work in-place.
    char* p1 = pBuffer;
    char* p2 = pBuffer;
    char* pEOS = pBuffer + nContent;
    while (p1 < pEOS) {
        int c = 255 & *p1++;
        if (c < 0x80) {
            *p2++ = c;
            continue;
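        }
        // Continuation bytes implied by the lead byte. Stop at a truncated
        // tail rather than read (and later write) past the end of the buffer.
        int n = (c < 0xE0) ? 1 : (c < 0xF0) ? 2 : (c < 0xF8) ? 3 : (c < 0xFC) ? 4 : 5;
        if (pEOS - p1 < n) break;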
        if (c < 0xE0) {
            c = (31 & c) << 6;
            c |= 63 & *p1++;
        } else if (c < 0xF0) {
            c = (15 & c) << 12;
            c |= (63 & *p1++) << 6;
            c |= 63 & *p1++;
        } else if (c < 0xF8) {
            c = (7 & c) << 18;
            c |= (63 & *p1++) << 12;
            c |= (63 & *p1++) << 6;
            c |= 63 & *p1++;
        } else if (c < 0xFC) {
            c = (3 & c) << 24;
            c |= (63 & *p1++) << 18;
            c |= (63 & *p1++) << 12;
            c |= (63 & *p1++) << 6;
            c |= 63 & *p1++;
        } else {
            c = (1 & c) << 30;
            c |= (63 & *p1++) << 24;
            c |= (63 & *p1++) << 18;
            c |= (63 & *p1++) << 12;
            c |= (63 & *p1++) << 6;
            c |= 63 & *p1++;
        }
        if (c < 0x80) {
            *p2++ = c;
        } else if (c < 0x800) {
            *p2++ = 0xC0 | (c >> 6);
            *p2++ = 0x80 | (63 & c);
        } else if (c < 0x10000) {
            *p2++ = 0xE0 | (c >> 12);
            *p2++ = 0x80 | (63 & (c >> 6));
            *p2++ = 0x80 | (63 & c);
        } else if (c < 0x200000) {
            *p2++ = 0xF0 | (c >> 18);
            *p2++ = 0x80 | (63 & (c >> 12));
            *p2++ = 0x80 | (63 & (c >> 6));
            *p2++ = 0x80 | (63 & c);
        } else if (c < 0x4000000) {
            *p2++ = 0xF8 | (c >> 24);
            *p2++ = 0x80 | (63 & (c >> 18));
            *p2++ = 0x80 | (63 & (c >> 12));
            *p2++ = 0x80 | (63 & (c >> 6));
            *p2++ = 0x80 | (63 & c);
        } else {
            *p2++ = 0xFC | (1 & (c >> 30));
            *p2++ = 0x80 | (63 & (c >> 24));
            *p2++ = 0x80 | (63 & (c >> 18));
            *p2++ = 0x80 | (63 & (c >> 12));
            *p2++ = 0x80 | (63 & (c >> 6));
            *p2++ = 0x80 | (63 & c);
        }
    }
    nContent = (int) (p2 - pBuffer);
}
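
For anyone wanting to try the idea without the rest of String.cpp, here is a compact standalone sketch of the same technique over a std::string. The function name `recodeShortest` is hypothetical, the loops replace the unrolled branches above, and dropping a truncated trailing sequence is an assumption of this sketch.

```cpp
#include <cstddef>
#include <string>

// Rewrite s in place so every code point uses its shortest encoding.
void recodeShortest(std::string& s) {
    std::size_t in = 0, out = 0;
    while (in < s.size()) {
        unsigned long c = static_cast<unsigned char>(s[in++]);
        // Number of continuation bytes implied by the lead byte.
        int extra = c < 0x80 ? 0 : c < 0xE0 ? 1 : c < 0xF0 ? 2
                  : c < 0xF8 ? 3 : c < 0xFC ? 4 : 5;
        if (s.size() - in < static_cast<std::size_t>(extra)) break;  // truncated
        if (extra > 0) c &= 0x7Fu >> (extra + 1);  // keep lead-byte payload bits
        for (int i = 0; i < extra; ++i)
            c = (c << 6) | (static_cast<unsigned char>(s[in++]) & 63u);
        // Bytes needed for the shortest form of c (never more than consumed).
        int n = c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3
              : c < 0x200000 ? 4 : c < 0x4000000 ? 5 : 6;
        if (n == 1) {
            s[out++] = static_cast<char>(c);
            continue;
        }
        // Lead byte: n high bits set, then the top payload bits.
        s[out++] = static_cast<char>(((0xFF00u >> n) & 0xFFu) | (c >> (6 * (n - 1))));
        for (int i = n - 2; i >= 0; --i)
            s[out++] = static_cast<char>(0x80u | ((c >> (6 * i)) & 63u));
    }
    s.resize(out);
}
```

Calling it on `"A\xC0\xAF" "B"` leaves `"A/B"`, and a string already in shortest form (such as the three bytes of U+20AC) comes back unchanged.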

2010.01.12

UTF8/UCS conversion benchmark

Filed under: Software, Web — Preston L. Bannister @ 9:12 pm

Point of reference…

UCS (Unicode) to UTF8 conversion, and the reverse, when efficiently coded in C++ clocks in well above 100MB/s on current generation CPUs. If you are getting something much less – enough to be a problem – then there are questions you should ask. The following run spans 1 to 6 byte UTF8 encodings.

Recoding a UTF8 string to a normalized form is a bit slower, at roughly 84 to 122 MB/s.

preston@athena:~/workspace/json-c$ time Release/json
base 00000010 :  229.9 MB/s UCS to UTF8 conversion
base 00000010 :  253.7 MB/s UTF8 to UCS conversion
base 00000010 :  122.0 MB/s UTF8 recode
base 02000010 :  196.8 MB/s UCS to UTF8 conversion
base 02000010 :  177.1 MB/s UTF8 to UCS conversion
base 02000010 :  100.6 MB/s UTF8 recode
base 04000010 :  173.8 MB/s UCS to UTF8 conversion
base 04000010 :  157.1 MB/s UTF8 to UCS conversion
base 04000010 :   89.2 MB/s UTF8 recode
base 06000010 :  174.0 MB/s UCS to UTF8 conversion
base 06000010 :  158.5 MB/s UTF8 to UCS conversion
base 06000010 :   89.2 MB/s UTF8 recode
base 08000010 :  174.0 MB/s UCS to UTF8 conversion
base 08000010 :  154.5 MB/s UTF8 to UCS conversion
base 08000010 :   89.6 MB/s UTF8 recode
base 0a000010 :  174.1 MB/s UCS to UTF8 conversion
base 0a000010 :  156.5 MB/s UTF8 to UCS conversion
base 0a000010 :   89.8 MB/s UTF8 recode
base 0c000010 :  174.0 MB/s UCS to UTF8 conversion
base 0c000010 :  155.1 MB/s UTF8 to UCS conversion
base 0c000010 :   89.6 MB/s UTF8 recode
base 0e000010 :  174.0 MB/s UCS to UTF8 conversion
base 0e000010 :  158.8 MB/s UTF8 to UCS conversion
base 0e000010 :   87.4 MB/s UTF8 recode
base 10000010 :  170.8 MB/s UCS to UTF8 conversion
base 10000010 :  158.2 MB/s UTF8 to UCS conversion
base 10000010 :   89.7 MB/s UTF8 recode
base 12000010 :  174.0 MB/s UCS to UTF8 conversion
base 12000010 :  158.8 MB/s UTF8 to UCS conversion
base 12000010 :   86.5 MB/s UTF8 recode
base 14000010 :  171.5 MB/s UCS to UTF8 conversion
base 14000010 :  153.9 MB/s UTF8 to UCS conversion
base 14000010 :   87.1 MB/s UTF8 recode
base 16000010 :  172.1 MB/s UCS to UTF8 conversion
base 16000010 :  158.1 MB/s UTF8 to UCS conversion
base 16000010 :   87.5 MB/s UTF8 recode
base 18000010 :  172.1 MB/s UCS to UTF8 conversion
base 18000010 :  158.2 MB/s UTF8 to UCS conversion
base 18000010 :   86.9 MB/s UTF8 recode
base 1a000010 :  171.3 MB/s UCS to UTF8 conversion
base 1a000010 :  158.2 MB/s UTF8 to UCS conversion
base 1a000010 :   86.5 MB/s UTF8 recode
base 1c000010 :  169.5 MB/s UCS to UTF8 conversion
base 1c000010 :  158.5 MB/s UTF8 to UCS conversion
base 1c000010 :   86.5 MB/s UTF8 recode
base 1e000010 :  173.0 MB/s UCS to UTF8 conversion
base 1e000010 :  157.8 MB/s UTF8 to UCS conversion
base 1e000010 :   86.1 MB/s UTF8 recode
base 20000010 :  173.1 MB/s UCS to UTF8 conversion
base 20000010 :  158.4 MB/s UTF8 to UCS conversion
base 20000010 :   87.2 MB/s UTF8 recode
base 22000010 :  173.1 MB/s UCS to UTF8 conversion
base 22000010 :  158.2 MB/s UTF8 to UCS conversion
base 22000010 :   88.2 MB/s UTF8 recode
base 24000010 :  173.3 MB/s UCS to UTF8 conversion
base 24000010 :  158.8 MB/s UTF8 to UCS conversion
base 24000010 :   87.6 MB/s UTF8 recode
base 26000010 :  173.8 MB/s UCS to UTF8 conversion
base 26000010 :  158.1 MB/s UTF8 to UCS conversion
base 26000010 :   87.9 MB/s UTF8 recode
base 28000010 :  172.0 MB/s UCS to UTF8 conversion
base 28000010 :  158.7 MB/s UTF8 to UCS conversion
base 28000010 :   87.6 MB/s UTF8 recode
base 2a000010 :  172.0 MB/s UCS to UTF8 conversion
base 2a000010 :  158.1 MB/s UTF8 to UCS conversion
base 2a000010 :   86.1 MB/s UTF8 recode
base 2c000010 :  172.1 MB/s UCS to UTF8 conversion
base 2c000010 :  158.5 MB/s UTF8 to UCS conversion
base 2c000010 :   86.3 MB/s UTF8 recode
base 2e000010 :  173.8 MB/s UCS to UTF8 conversion
base 2e000010 :  158.2 MB/s UTF8 to UCS conversion
base 2e000010 :   89.3 MB/s UTF8 recode
base 30000010 :  170.0 MB/s UCS to UTF8 conversion
base 30000010 :  158.7 MB/s UTF8 to UCS conversion
base 30000010 :   87.2 MB/s UTF8 recode
base 32000010 :  173.8 MB/s UCS to UTF8 conversion
base 32000010 :  158.4 MB/s UTF8 to UCS conversion
base 32000010 :   87.8 MB/s UTF8 recode
base 34000010 :  173.5 MB/s UCS to UTF8 conversion
base 34000010 :  158.5 MB/s UTF8 to UCS conversion
base 34000010 :   88.0 MB/s UTF8 recode
base 36000010 :  173.1 MB/s UCS to UTF8 conversion
base 36000010 :  158.4 MB/s UTF8 to UCS conversion
base 36000010 :   88.4 MB/s UTF8 recode
base 38000010 :  172.5 MB/s UCS to UTF8 conversion
base 38000010 :  158.5 MB/s UTF8 to UCS conversion
base 38000010 :   88.7 MB/s UTF8 recode
base 3a000010 :  170.5 MB/s UCS to UTF8 conversion
base 3a000010 :  158.4 MB/s UTF8 to UCS conversion
base 3a000010 :   87.4 MB/s UTF8 recode
base 3c000010 :  169.3 MB/s UCS to UTF8 conversion
base 3c000010 :  154.7 MB/s UTF8 to UCS conversion
base 3c000010 :   86.3 MB/s UTF8 recode
base 3e000010 :  172.0 MB/s UCS to UTF8 conversion
base 3e000010 :  156.9 MB/s UTF8 to UCS conversion
base 3e000010 :   87.9 MB/s UTF8 recode
base 40000010 :  171.8 MB/s UCS to UTF8 conversion
base 40000010 :  148.9 MB/s UTF8 to UCS conversion
base 40000010 :   83.7 MB/s UTF8 recode
base 42000010 :  173.1 MB/s UCS to UTF8 conversion
base 42000010 :  149.3 MB/s UTF8 to UCS conversion
base 42000010 :   88.1 MB/s UTF8 recode
base 44000010 :  173.7 MB/s UCS to UTF8 conversion
base 44000010 :  157.7 MB/s UTF8 to UCS conversion
base 44000010 :   87.4 MB/s UTF8 recode
base 46000010 :  172.1 MB/s UCS to UTF8 conversion
base 46000010 :  152.2 MB/s UTF8 to UCS conversion
base 46000010 :   87.3 MB/s UTF8 recode
base 48000010 :  174.0 MB/s UCS to UTF8 conversion
base 48000010 :  142.7 MB/s UTF8 to UCS conversion
base 48000010 :   87.8 MB/s UTF8 recode
base 4a000010 :  173.5 MB/s UCS to UTF8 conversion
base 4a000010 :  150.2 MB/s UTF8 to UCS conversion
base 4a000010 :   87.7 MB/s UTF8 recode
base 4c000010 :  174.0 MB/s UCS to UTF8 conversion
base 4c000010 :  158.4 MB/s UTF8 to UCS conversion
base 4c000010 :   88.4 MB/s UTF8 recode
base 4e000010 :  174.0 MB/s UCS to UTF8 conversion
base 4e000010 :  156.9 MB/s UTF8 to UCS conversion
base 4e000010 :   88.4 MB/s UTF8 recode

real	2m0.751s
user	2m0.540s
sys	0m0.190s

The above run is on an AMD Phenom(tm) II X4 955 Processor running at the stock clock rate.

Code is in: http://svn.bannister.us/public/json-c/
(A start on an experiment in fastest-possible JSON conversion.)

Note that I intentionally allow proper conversion of “invalid” UTF8 code strings. I completely understand the reason for the disallowed conversions, and I disagree.

Update: Converted to use pointer arithmetic, rather than array and index. Was not sure pointer math was still a win on current CPUs and compilers. Got a big boost in throughput, so it is!
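
For a rough idea of the shape of such a measurement, here is a sketch of a pointer-based encoder plus a timing harness. This is not the code from the repository: the names `encodeUtf8` and `megabytesPerSecond` are hypothetical, the encoder assumes code points below 0x10000 to stay short, and counting output bytes (rather than input) is an assumption.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode 32-bit code points as UTF-8 using pointer arithmetic.
// Assumes every value is below 0x10000; out must hold 3 * n bytes.
// Returns the number of bytes written.
std::size_t encodeUtf8(const std::uint32_t* in, std::size_t n, unsigned char* out) {
    unsigned char* p = out;
    for (const std::uint32_t* end = in + n; in < end; ++in) {
        std::uint32_t c = *in;
        if (c < 0x80) {
            *p++ = static_cast<unsigned char>(c);
        } else if (c < 0x800) {
            *p++ = static_cast<unsigned char>(0xC0 | (c >> 6));
            *p++ = static_cast<unsigned char>(0x80 | (c & 63));
        } else {
            *p++ = static_cast<unsigned char>(0xE0 | ((c >> 12) & 15));
            *p++ = static_cast<unsigned char>(0x80 | ((c >> 6) & 63));
            *p++ = static_cast<unsigned char>(0x80 | (c & 63));
        }
    }
    return static_cast<std::size_t>(p - out);
}

// Hypothetical harness: MB/s of output produced over `passes` repetitions.
double megabytesPerSecond(const std::vector<std::uint32_t>& src, int passes) {
    std::vector<unsigned char> dst(src.size() * 3);
    std::size_t bytes = 0;
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < passes; ++i)
        bytes += encodeUtf8(src.data(), src.size(), dst.data());
    std::chrono::duration<double> secs = std::chrono::steady_clock::now() - start;
    return static_cast<double>(bytes) / (1024.0 * 1024.0) / secs.count();
}
```

Use a buffer large enough, and enough passes, that the run takes at least a second or so. Otherwise the clock resolution dominates the result.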

2010.01.08

Musing about cumulative impact

Filed under: Software — Preston L. Bannister @ 12:52 pm

About 15 years back I was working on a C++ GUI application with a cyclic workload and a lot of string manipulation. For both performance and reliability I came up with a lightweight string class that did allocations off a free list. The class benchmarked well, and performed very well in practice. About 10 years back I started writing a C++ back-end bulk processing application with massive string manipulation, and re-used the same lightweight string (with excellent results). Five years back I again ran benchmarks (with good results) and wrote up the results.

The lightweight string class is simple enough to be reproduced from memory (nothing over-complicated), but can make quite a difference in performance. I had hoped to make a point.
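
The general idea, for anyone who has not seen it, looks something like the sketch below. This is not the class from the article: `PooledString`, the fixed 256-byte capacity, and the single-threaded free list are all assumptions made to keep the example short.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical free-list-backed string. Every buffer has the same fixed
// capacity, so a released buffer can go straight to the next string that
// needs one, with no trip to the heap. Not thread-safe.
class PooledString {
public:
    static constexpr std::size_t kCapacity = 256;  // assumed fixed buffer size

    PooledString() : buf_(acquire()), len_(0) { buf_->text[0] = '\0'; }
    ~PooledString() { release(buf_); }
    PooledString(const PooledString&) = delete;
    PooledString& operator=(const PooledString&) = delete;

    // Append as much of s as fits; returns false if it was truncated.
    bool append(const char* s) {
        std::size_t n = std::strlen(s);
        std::size_t room = kCapacity - 1 - len_;
        bool fits = n <= room;
        if (!fits) n = room;
        std::memcpy(buf_->text + len_, s, n);
        len_ += n;
        buf_->text[len_] = '\0';
        return fits;
    }
    const char* c_str() const { return buf_->text; }
    std::size_t size() const { return len_; }

    // Number of buffers currently waiting on the free list.
    static std::size_t freeCount() {
        std::size_t n = 0;
        for (Buffer* b = freeList(); b != nullptr; b = b->next) ++n;
        return n;
    }

private:
    struct Buffer { Buffer* next; char text[kCapacity]; };

    static Buffer*& freeList() { static Buffer* head = nullptr; return head; }
    static Buffer* acquire() {
        Buffer*& head = freeList();
        if (head == nullptr) return new Buffer;  // list empty: go to the heap
        Buffer* b = head;
        head = b->next;
        return b;
    }
    static void release(Buffer* b) {  // buffers are recycled, never freed
        b->next = freeList();
        freeList() = b;
    }

    Buffer* buf_;
    std::size_t len_;
};
```

With a cyclic workload the free list fills during the first cycle, and after that construction and destruction are a couple of pointer moves each.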

Since posting, after the initial burst, the articles have collected what looks like a few new hits, every day, for the past five years. Counting hits is a pretty foggy indicator, but it seems possible quite a few folk have read the article (though a relatively small fraction of the programming community). Some may have copied the C++ string class in their code. (What I hoped!)

But there is no sure way of knowing. What is the cumulative impact of this steady trickle of readers?

2009.12.31

… status as Invited Expert in HTML Working Group

Filed under: Web, html@w3c — Preston L. Bannister @ 11:23 pm

At one time I had hoped there was a small chance I might be able to nudge the HTML working group in a constructive direction. Over time, what I found is that there are a small number of individuals who are able to invest an inordinate amount of time in this same working group, and I cannot possibly invest the time to construct thoughtful responses to the flood of ill-considered notions.

There is almost no chance I can move the working group in a useful direction. Time to disconnect.

This is all rather discouraging. The HTML working group will proceed. Some of the work is worthwhile. Much (measured by volume of email list traffic) is not. What mix will make it into the generated proposed “standard” is sure to be a mess. Not sure how to change any of this.

My status as an “Invited Expert” is up for renewal. With extreme reluctance … my judgement is that I cannot make a useful contribution, and should disassociate from the HTML working group. Of course, they will continue on the present course, in my absence. My withdrawal makes no difference of significance. There is a fair chance the body of work from this working group will be adopted, imperfect as it is. The existing body of work is … badly skewed by an imperfect process.

Nothing meaningful I can do. The result will be a mess, and will create a mess for years after. Time to disengage.

Funny bit – I do not see a way to force a disconnect.

2009.12.30

Almost but not quite … server-side JavaScript

Filed under: Javascript, Software — Preston L. Bannister @ 8:05 pm

Bit over three years back I looked at server-side Javascript, and was not enthused with the available choices.

Three distinct usages I’d like to cover: optimal performance, Windows web server (IIS) interoperability, and webhosting.

In addition, there are three interesting aspects of optimal performance: throughput, scalability, and stability.

For serving static content, I really like the model of a single-threaded non-blocking web server, of which thttpd was an early example, and for which the C10K question clarified the need. A small/simple web server has a much better chance of being very reliable. With the single-threaded non-blocking model, massive scalability is possible.

For serving dynamic content, I really like the isolation and load distribution possible with the FastCGI model (or the like). Dynamic code tends to be complex. Javascript interpreters are complex. Complex code tends to fail more often. Complex code can use more compute throughput than possible on a single box. For intranet applications, a single front-end web server is often preferable, and load distribution via FastCGI offers more headroom. All of which tends to argue for the FastCGI model, with isolation from the front-end web service, and potential distribution of load across more than one machine.

For the widest possible usage, in addition to optimal deployments (when there is no restriction on the front-end web server), the engine on which the application runs should be deployable behind IIS (for Windows-only organizations), and at common web-hosting services (like Dreamhost). Microsoft’s recent support of FastCGI with IIS is a big help.

At that time (three years back), none of the solutions were really optimal – and in fact were pretty far from optimal. The Java-based Rhino JavaScript interpreter was easiest to embed, but failed the webhosting case. The C++ based JavaScript interpreters were a pain to embed, and offered good (but not great) performance.

Fast forward to the present, and Google offers the V8 JavaScript Engine that offers great performance, and is easy to embed. (Google as the good guys, riding to the rescue once again … you’d think they have white hats superglued to their brains.) Suddenly we have lots of projects embedding the V8 engine. In addition, seems most all the single-threaded non-blocking web servers have picked up support for FastCGI.

Oh … and I am pretty much fed up with the Java Servlet model. After considerable time with the problem, I am of the opinion that the servlet model chose the wrong abstractions, and this makes for awkward solutions. (Of course, the servlet model appeared very early in the history of web applications, so the mistake is easy to understand.)

Which means the model offered by node.js makes a lot of sense. I like the notion of a naked node (running JavaScript on the V8 engine) performing request dispatch without any extra layers or abstractions. The main lack with node.js is the ability to work via FastCGI (and thus no means to be deployed behind IIS on Windows).

But there are as yet items unresolved and/or unclear.

  • Projects like v8cgi offer the V8 JavaScript Engine connected via FastCGI.
  • The node.js project offers a single-threaded non-blocking web server … but can it work behind FastCGI?
  • Is the environment for server-side JavaScript the same (or sufficiently similar) between node.js and v8cgi?
  • Comet is still a question. Can FastCGI work well with long-outstanding requests from applications?

The good news is that we seem a lot closer to attractive and well-supported server-side JavaScript for web applications … but it seems we are not quite fully there, as yet.

2009.12.24

Wireless network and Linux

Filed under: Software — Preston L. Bannister @ 7:40 pm

A signpost of sorts – wireless network support on Linux, at least for the Intel 4965AGN adaptor – sucks.

Went with the Intel adaptor when I ordered this notebook, in part as Intel seems to be actively supporting the development of Linux drivers. In practice, my laptop wireless connection is mostly unreliable, and often near-useless.

I used to think the problem was interference from other nearby wireless routers, but my daughter’s laptop seems to work well when mine does not. I have a simple “bounce” test for network performance. My daughter’s cheap Toshiba, on the same “bounce” page, gets steady/fast performance when side-by-side to my laptop … which gets slower/unsteady/unreliable performance.

Not specific to the wireless router or crowded neighborhood either – I get similar poor performance when at my father’s place in Colorado (different brand and generation of router, and fewer/further neighbors).
