Preston L. Bannister { random memes }

2010.01.31

Multiplexed FastCGI connections?

Filed under: Software, Web — Preston L. Bannister @ 12:22 pm

Does anyone use FastCGI with FCGI_MPXS_CONNS set to “1” (for multiplexed connections)?

Most FastCGI backends seem to be written for non-multiplexed connections. (Much simpler, so understandable.) The IIS FastCGI connector apparently does not support multiplexed connections.

I am writing a FastCGI backend that allows for multiplexed connections. That would be a waste of time if no frontend supports multiplexing, or if existing frontends are buggy.
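For reference, multiplexing rests on the request ID carried in every FastCGI record header: records for different requests can interleave on one connection, and the backend has to sort them out by ID. What follows is a rough sketch of that demultiplexing step, using the 8-byte header layout from the FastCGI specification. The names `FcgiHeader`, `parseHeader`, and `demultiplex` are made up for illustration, and only FCGI_STDIN records (type 5) are collected; a real backend would also handle the other record types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Fields of the fixed 8-byte FastCGI record header (the reserved byte is omitted).
struct FcgiHeader {
    std::uint8_t  version;
    std::uint8_t  type;
    std::uint16_t requestId;      // says which request this record belongs to
    std::uint16_t contentLength;
    std::uint8_t  paddingLength;
};

const std::uint8_t kFcgiStdin = 5;  // FCGI_STDIN record type

// Decode a header from 8 raw bytes; multi-byte fields are big-endian.
// Hypothetical helper, not part of any FastCGI library.
inline FcgiHeader parseHeader(const std::uint8_t* p) {
    assert(p != nullptr);
    FcgiHeader h;
    h.version       = p[0];
    h.type          = p[1];
    h.requestId     = static_cast<std::uint16_t>((p[2] << 8) | p[3]);
    h.contentLength = static_cast<std::uint16_t>((p[4] << 8) | p[5]);
    h.paddingLength = p[6];
    return h;
}

// Append the content of each FCGI_STDIN record to a per-request buffer.
// Returns false if the input ends in the middle of a record.
inline bool demultiplex(const std::vector<std::uint8_t>& in,
                        std::map<std::uint16_t, std::string>& streams) {
    std::size_t pos = 0;
    while (pos < in.size()) {
        if (in.size() - pos < 8) return false;           // partial header
        const FcgiHeader h = parseHeader(in.data() + pos);
        pos += 8;
        const std::size_t need = std::size_t(h.contentLength) + h.paddingLength;
        if (in.size() - pos < need) return false;        // partial body
        if (h.type == kFcgiStdin) {
            streams[h.requestId].append(
                reinterpret_cast<const char*>(in.data() + pos), h.contentLength);
        }
        pos += need;
    }
    return true;
}
```

Feeding this interleaved records for requests 1 and 2 yields one buffer per ID. A non-multiplexed backend can skip the map entirely, which is most of why it is simpler.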

(Not really expecting a response, but have to state the question.)

2010.01.17

Giving up HTML@W3C

Filed under: Software, Web, html@w3c — Preston L. Bannister @ 10:00 pm

Got the “status as Invited Expert in HTML Working Group” email. This I will let expire. Spent my time tilting at windmills, and do not see any point in continuing.

The HTML Working Group at W3C is … far too much noise. The HTML5 “standard” is going to be a bloated monster, and there is no chance I can change that. Time to stop the pretense of trying.

Not that all the work is bad, or that there is any shortage of well-intentioned folk. What the group lacks is any sense of minimalism, and enough strong voices able to say “no”. What we will get is going to be even harder to digest than HTML4. This is sad. Future developers are going to have an even harder time, for no good reason.

The wildcard here is that the mainstream browser implementations may not follow all the half-thought ideas thrown in by the HTML5 Working Group. No idea how this will work out.

At the core HTML is pretty damn simple – or could be. HTML4 got stuffed with a bunch of half-thought notions, most of which have since proved of no value, and were ignored by developers. Ideally we would learn from experience, omit the fuzzy disused bits, and trim HTML down to the useful core. There are well-known (though not universally known) means to achieve this aim. This is not going to happen.

I cannot keep up with the herd of Energizer-bunnies eager to make their mark, and with too-limited experience.
Time to stop pretending.

2010.01.16

Efficient UTF-8 recoding and secure processing

Filed under: Software, Web — Preston L. Bannister @ 1:20 pm

An attempt to make a point…

The use of UTF-8 on the web is common and increasing. Lots of data comes in as UTF-8, and inefficiency in UTF-8 data handling is going to have pretty pervasive impact.

On the flip side, the creators of UTF-8 did a good job. There is nothing really complicated about the UTF-8 format, and processing is simple.

So I was surprised (or rather shocked) to find in an earlier experiment that Java performed UTF-8 conversion slowly. In fact, I was able to write a faster UTF-8 decoder in Java than the stock decoder. This is just plain wrong. Conversion between encodings is a primitive/simple operation best written in C/C++ and run as native code (and this is the sort of processing where C/C++ is probably always going to be much faster than Java).

There is a problem in that malicious external parties can send oddly-encoded UTF-8, and bypass simple-minded malware detection software. Ordinary ASCII characters can be coded as alternate multi-byte sequences, and simple scanners miss the alternate encoding.
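A small illustration of the problem: the slash (0x2F) can be written as the overlong two-byte sequence 0xC0 0xAF, so a scanner matching raw bytes never sees "../" even though a lenient decoder downstream will. Both function names are hypothetical, invented for this sketch.

```cpp
#include <cassert>
#include <string>

// A deliberately naive scanner: looks only for the raw bytes "../".
inline bool naiveScanFindsTraversal(const std::string& bytes) {
    return bytes.find("../") != std::string::npos;
}

// Decode a two-byte UTF-8 sequence without rejecting overlong forms,
// the way a lenient decoder would.
inline unsigned decodeTwoByte(unsigned char lead, unsigned char cont) {
    assert((cont & 0xC0) == 0x80);  // must be a continuation byte
    return ((lead & 0x1Fu) << 6) | (cont & 0x3Fu);
}
```

The bytes `2E 2E C0 AF` pass the scanner untouched, yet `decodeTwoByte(0xC0, 0xAF)` yields 0x2F, the same slash the scanner was looking for.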

This is a problem. There is a simple solution. One of a set of principles I adopted a long time ago is “convert at the edges”. If you have data coming in from an untrusted source, then you perform conversion and validation at the “edge” where the data is first received.

In the case of UTF-8 coming from an untrusted source, to make all later processing simpler, you must recode to eliminate any alternate encodings. This is quite simple, as recoded UTF-8 will always be the same size or smaller, and so can be done in-place. The prior experiment measured the cost of UTF-8 recoding. Looks like we can drive a 1-Gbit network link at full speed (with efficient code), while recoding the entire contents. Since UTF-8 data usually represents a smaller portion of traffic, and since other processing tends to take the larger part of the load, there is no reason to not perform recoding on any UTF-8 data coming from an untrusted source.

Combine “convert at the edges” with UTF-8 recoding, and we lose the basis for the requirement in RFC 3629 for detection of “illegal” UTF-8 code sequences. In addition we allow all downstream processing to be simpler and more efficient … and we also can be tolerant of imperfect upstream software. (Yes, I am going to invoke Postel, again.)

The basis is good (secure processing with untrusted sources), but the requirement for detection of “illegal” sequences is not necessary and (most definitely!) not optimal.

Example of UTF-8 recoding (from String.cpp).

void UTF8::String::recode() {
    // Iterate until all UTF8 characters are normalized.
    // UTF8 in canonical form can only be smaller, so work in-place.
    char* p1 = pBuffer;
    char* p2 = pBuffer;
    char* pEOS = pBuffer + nContent;
    while (p1 < pEOS) {
        int c = 255 & *p1++;
        if (c < 0x80) {
            *p2++ = c;
            continue;
        }
        if (c < 0xE0) {
            c = (31 & c) << 6;
            c |= 63 & *p1++;
        } else if (c < 0xF0) {
            c = (15 & c) << 12;
            c |= (63 & *p1++) << 6;
            c |= 63 & *p1++;
        } else if (c < 0xF8) {
            c = (7 & c) << 18;
            c |= (63 & *p1++) << 12;
            c |= (63 & *p1++) << 6;
            c |= 63 & *p1++;
        } else if (c < 0xFC) {
            c = (3 & c) << 24;
            c |= (63 & *p1++) << 18;
            c |= (63 & *p1++) << 12;
            c |= (63 & *p1++) << 6;
            c |= 63 & *p1++;
        } else {
            c = (1 & c) << 30;
            c |= (63 & *p1++) << 24;
            c |= (63 & *p1++) << 18;
            c |= (63 & *p1++) << 12;
            c |= (63 & *p1++) << 6;
            c |= 63 & *p1++;
        }
        if (c < 0x80) {
            *p2++ = c;
        } else if (c < 0x800) {
            *p2++ = 0xC0 | (c >> 6);
            *p2++ = 0x80 | (63 & c);
        } else if (c < 0x10000) {
            *p2++ = 0xE0 | (c >> 12);
            *p2++ = 0x80 | (63 & (c >> 6));
            *p2++ = 0x80 | (63 & c);
        } else if (c < 0x200000) {
            *p2++ = 0xF0 | (c >> 18);
            *p2++ = 0x80 | (63 & (c >> 12));
            *p2++ = 0x80 | (63 & (c >> 6));
            *p2++ = 0x80 | (63 & c);
        } else if (c < 0x4000000) {
            *p2++ = 0xF8 | (c >> 24);
            *p2++ = 0x80 | (63 & (c >> 18));
            *p2++ = 0x80 | (63 & (c >> 12));
            *p2++ = 0x80 | (63 & (c >> 6));
            *p2++ = 0x80 | (63 & c);
        } else {
            *p2++ = 0xFC | (1 & (c >> 30));
            *p2++ = 0x80 | (63 & (c >> 24));
            *p2++ = 0x80 | (63 & (c >> 18));
            *p2++ = 0x80 | (63 & (c >> 12));
            *p2++ = 0x80 | (63 & (c >> 6));
            *p2++ = 0x80 | (63 & c);
        }
    }
    nContent = (int) (p2 - pBuffer);
}
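
The listing above trusts every lead byte to be followed by its full set of continuation bytes, so input that ends mid-sequence would read past pEOS. Here is a standalone sketch of the same recoding with that case guarded, working on a std::string rather than the class's own buffer. The function names are hypothetical, and dropping a truncated tail is an assumption of this sketch, not a policy taken from String.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Bytes announced by a lead byte, using the same thresholds as recode().
inline std::size_t sequenceLength(unsigned c) {
    if (c < 0x80) return 1;
    if (c < 0xE0) return 2;
    if (c < 0xF0) return 3;
    if (c < 0xF8) return 4;
    if (c < 0xFC) return 5;
    return 6;
}

// Bytes needed for the shortest encoding of a decoded value.
inline std::size_t shortestLength(unsigned long v) {
    if (v < 0x80) return 1;
    if (v < 0x800) return 2;
    if (v < 0x10000) return 3;
    if (v < 0x200000) return 4;
    if (v < 0x4000000) return 5;
    return 6;
}

// Recode in place to the shortest form; a sequence cut off by the end of
// the buffer is dropped (assumption of this sketch).
inline void recodeShortest(std::string& s) {
    static const unsigned leadMask[7] = {0, 0x7F, 0x1F, 0x0F, 0x07, 0x03, 0x01};
    static const unsigned leadBits[7] = {0, 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC};
    std::size_t in = 0, out = 0;
    while (in < s.size()) {
        const unsigned c = static_cast<unsigned char>(s[in]);
        const std::size_t n = sequenceLength(c);
        if (s.size() - in < n) break;                    // truncated tail
        unsigned long v = c & leadMask[n];
        for (std::size_t i = 1; i < n; ++i)
            v = (v << 6) | (static_cast<unsigned char>(s[in + i]) & 0x3Fu);
        in += n;
        const std::size_t m = shortestLength(v);
        assert(out + m <= in);                           // output never overtakes input
        // Write continuation bytes last to first, then the lead byte.
        for (std::size_t i = m - 1; i > 0; --i) {
            s[out + i] = static_cast<char>(0x80 | (v & 0x3F));
            v >>= 6;
        }
        s[out] = static_cast<char>(leadBits[m] | v);
        out += m;
    }
    s.resize(out);
}
```

Given the bytes `41 C0 AF E0 80 AF 42`, this leaves `41 2F 2F 42`: both overlong slashes collapse to the plain ASCII form, and the buffer shrinks.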

2010.01.12

UTF8/UCS conversion benchmark

Filed under: Software, Web — Preston L. Bannister @ 9:12 pm

Point of reference…

UCS (Unicode) to UTF8 conversion, and the reverse, when efficiently coded in C++, clocks in well above 100 MB/s on current generation CPUs. If you are getting something much less – enough to be a problem – then there are questions you should ask. The following run spans 1 to 6 byte UTF8 encodings.

Recoding a UTF8 string to a normalized form is slower, at roughly 84 to 122 MB/s in the run below.

preston@athena:~/workspace/json-c$ time Release/json
base 00000010 :  229.9 MB/s UCS to UTF8 conversion
base 00000010 :  253.7 MB/s UTF8 to UCS conversion
base 00000010 :  122.0 MB/s UTF8 recode
base 02000010 :  196.8 MB/s UCS to UTF8 conversion
base 02000010 :  177.1 MB/s UTF8 to UCS conversion
base 02000010 :  100.6 MB/s UTF8 recode
base 04000010 :  173.8 MB/s UCS to UTF8 conversion
base 04000010 :  157.1 MB/s UTF8 to UCS conversion
base 04000010 :   89.2 MB/s UTF8 recode
base 06000010 :  174.0 MB/s UCS to UTF8 conversion
base 06000010 :  158.5 MB/s UTF8 to UCS conversion
base 06000010 :   89.2 MB/s UTF8 recode
base 08000010 :  174.0 MB/s UCS to UTF8 conversion
base 08000010 :  154.5 MB/s UTF8 to UCS conversion
base 08000010 :   89.6 MB/s UTF8 recode
base 0a000010 :  174.1 MB/s UCS to UTF8 conversion
base 0a000010 :  156.5 MB/s UTF8 to UCS conversion
base 0a000010 :   89.8 MB/s UTF8 recode
base 0c000010 :  174.0 MB/s UCS to UTF8 conversion
base 0c000010 :  155.1 MB/s UTF8 to UCS conversion
base 0c000010 :   89.6 MB/s UTF8 recode
base 0e000010 :  174.0 MB/s UCS to UTF8 conversion
base 0e000010 :  158.8 MB/s UTF8 to UCS conversion
base 0e000010 :   87.4 MB/s UTF8 recode
base 10000010 :  170.8 MB/s UCS to UTF8 conversion
base 10000010 :  158.2 MB/s UTF8 to UCS conversion
base 10000010 :   89.7 MB/s UTF8 recode
base 12000010 :  174.0 MB/s UCS to UTF8 conversion
base 12000010 :  158.8 MB/s UTF8 to UCS conversion
base 12000010 :   86.5 MB/s UTF8 recode
base 14000010 :  171.5 MB/s UCS to UTF8 conversion
base 14000010 :  153.9 MB/s UTF8 to UCS conversion
base 14000010 :   87.1 MB/s UTF8 recode
base 16000010 :  172.1 MB/s UCS to UTF8 conversion
base 16000010 :  158.1 MB/s UTF8 to UCS conversion
base 16000010 :   87.5 MB/s UTF8 recode
base 18000010 :  172.1 MB/s UCS to UTF8 conversion
base 18000010 :  158.2 MB/s UTF8 to UCS conversion
base 18000010 :   86.9 MB/s UTF8 recode
base 1a000010 :  171.3 MB/s UCS to UTF8 conversion
base 1a000010 :  158.2 MB/s UTF8 to UCS conversion
base 1a000010 :   86.5 MB/s UTF8 recode
base 1c000010 :  169.5 MB/s UCS to UTF8 conversion
base 1c000010 :  158.5 MB/s UTF8 to UCS conversion
base 1c000010 :   86.5 MB/s UTF8 recode
base 1e000010 :  173.0 MB/s UCS to UTF8 conversion
base 1e000010 :  157.8 MB/s UTF8 to UCS conversion
base 1e000010 :   86.1 MB/s UTF8 recode
base 20000010 :  173.1 MB/s UCS to UTF8 conversion
base 20000010 :  158.4 MB/s UTF8 to UCS conversion
base 20000010 :   87.2 MB/s UTF8 recode
base 22000010 :  173.1 MB/s UCS to UTF8 conversion
base 22000010 :  158.2 MB/s UTF8 to UCS conversion
base 22000010 :   88.2 MB/s UTF8 recode
base 24000010 :  173.3 MB/s UCS to UTF8 conversion
base 24000010 :  158.8 MB/s UTF8 to UCS conversion
base 24000010 :   87.6 MB/s UTF8 recode
base 26000010 :  173.8 MB/s UCS to UTF8 conversion
base 26000010 :  158.1 MB/s UTF8 to UCS conversion
base 26000010 :   87.9 MB/s UTF8 recode
base 28000010 :  172.0 MB/s UCS to UTF8 conversion
base 28000010 :  158.7 MB/s UTF8 to UCS conversion
base 28000010 :   87.6 MB/s UTF8 recode
base 2a000010 :  172.0 MB/s UCS to UTF8 conversion
base 2a000010 :  158.1 MB/s UTF8 to UCS conversion
base 2a000010 :   86.1 MB/s UTF8 recode
base 2c000010 :  172.1 MB/s UCS to UTF8 conversion
base 2c000010 :  158.5 MB/s UTF8 to UCS conversion
base 2c000010 :   86.3 MB/s UTF8 recode
base 2e000010 :  173.8 MB/s UCS to UTF8 conversion
base 2e000010 :  158.2 MB/s UTF8 to UCS conversion
base 2e000010 :   89.3 MB/s UTF8 recode
base 30000010 :  170.0 MB/s UCS to UTF8 conversion
base 30000010 :  158.7 MB/s UTF8 to UCS conversion
base 30000010 :   87.2 MB/s UTF8 recode
base 32000010 :  173.8 MB/s UCS to UTF8 conversion
base 32000010 :  158.4 MB/s UTF8 to UCS conversion
base 32000010 :   87.8 MB/s UTF8 recode
base 34000010 :  173.5 MB/s UCS to UTF8 conversion
base 34000010 :  158.5 MB/s UTF8 to UCS conversion
base 34000010 :   88.0 MB/s UTF8 recode
base 36000010 :  173.1 MB/s UCS to UTF8 conversion
base 36000010 :  158.4 MB/s UTF8 to UCS conversion
base 36000010 :   88.4 MB/s UTF8 recode
base 38000010 :  172.5 MB/s UCS to UTF8 conversion
base 38000010 :  158.5 MB/s UTF8 to UCS conversion
base 38000010 :   88.7 MB/s UTF8 recode
base 3a000010 :  170.5 MB/s UCS to UTF8 conversion
base 3a000010 :  158.4 MB/s UTF8 to UCS conversion
base 3a000010 :   87.4 MB/s UTF8 recode
base 3c000010 :  169.3 MB/s UCS to UTF8 conversion
base 3c000010 :  154.7 MB/s UTF8 to UCS conversion
base 3c000010 :   86.3 MB/s UTF8 recode
base 3e000010 :  172.0 MB/s UCS to UTF8 conversion
base 3e000010 :  156.9 MB/s UTF8 to UCS conversion
base 3e000010 :   87.9 MB/s UTF8 recode
base 40000010 :  171.8 MB/s UCS to UTF8 conversion
base 40000010 :  148.9 MB/s UTF8 to UCS conversion
base 40000010 :   83.7 MB/s UTF8 recode
base 42000010 :  173.1 MB/s UCS to UTF8 conversion
base 42000010 :  149.3 MB/s UTF8 to UCS conversion
base 42000010 :   88.1 MB/s UTF8 recode
base 44000010 :  173.7 MB/s UCS to UTF8 conversion
base 44000010 :  157.7 MB/s UTF8 to UCS conversion
base 44000010 :   87.4 MB/s UTF8 recode
base 46000010 :  172.1 MB/s UCS to UTF8 conversion
base 46000010 :  152.2 MB/s UTF8 to UCS conversion
base 46000010 :   87.3 MB/s UTF8 recode
base 48000010 :  174.0 MB/s UCS to UTF8 conversion
base 48000010 :  142.7 MB/s UTF8 to UCS conversion
base 48000010 :   87.8 MB/s UTF8 recode
base 4a000010 :  173.5 MB/s UCS to UTF8 conversion
base 4a000010 :  150.2 MB/s UTF8 to UCS conversion
base 4a000010 :   87.7 MB/s UTF8 recode
base 4c000010 :  174.0 MB/s UCS to UTF8 conversion
base 4c000010 :  158.4 MB/s UTF8 to UCS conversion
base 4c000010 :   88.4 MB/s UTF8 recode
base 4e000010 :  174.0 MB/s UCS to UTF8 conversion
base 4e000010 :  156.9 MB/s UTF8 to UCS conversion
base 4e000010 :   88.4 MB/s UTF8 recode

real	2m0.751s
user	2m0.540s
sys	0m0.190s

The above run is on an AMD Phenom(tm) II X4 955 Processor running at the stock clock rate.

Code is in: http://svn.bannister.us/public/json-c/
(A start on an experiment in fastest-possible JSON conversion.)
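
For readers unfamiliar with the 1 to 6 byte range, here is a minimal sketch of UCS to UTF8 encoding in the original six-length scheme. It is not taken from the repository above; `encodedLength` and `encodeUtf8` are names invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for the shortest encoding of a code point below 2^31.
inline int encodedLength(std::uint32_t c) {
    if (c < 0x80) return 1;
    if (c < 0x800) return 2;
    if (c < 0x10000) return 3;
    if (c < 0x200000) return 4;
    if (c < 0x4000000) return 5;
    return 6;
}

// Write the encoding of `c` to `out` (room for 6 bytes) and return its length.
inline int encodeUtf8(std::uint32_t c, unsigned char* out) {
    assert(c < 0x80000000u);  // the 6-byte form carries at most 31 bits
    static const unsigned char lead[7] = {0, 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC};
    const int n = encodedLength(c);
    // Fill continuation bytes from the last to the first, then the lead byte.
    for (int i = n - 1; i > 0; --i) {
        out[i] = static_cast<unsigned char>(0x80 | (c & 0x3F));
        c >>= 6;
    }
    out[0] = static_cast<unsigned char>(lead[n] | c);
    return n;
}
```

If each `base` value in the output is a starting code point (an assumption; the output does not say), then 00000010 sits in the 1-byte range, 02000010 in the 5-byte range, and everything from 04000010 up in the 6-byte range.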

Note that I intentionally allow proper conversion of “invalid” UTF8 code strings. I completely understand the reason for the disallowed conversions, and I disagree.

Update: Converted to use pointer arithmetic, rather than array and index. Was not sure pointer math was still a win on current CPUs and compilers. Got a big boost in throughput, so it is!

2010.01.08

Musing about cumulative impact

Filed under: Software — Preston L. Bannister @ 12:52 pm

About 15 years back I was working on a C++ GUI application with a cyclic workload and a lot of string manipulation. For both performance and reliability I came up with a lightweight string class that did allocations off a free list. The class benchmarked well, and performed very well in practice. About 10 years back I started writing a C++ back-end bulk processing application with massive string manipulation, and re-used the same lightweight string (with excellent results). Five years back I again ran benchmarks (with good results) and wrote up the results.

The lightweight string class is simple enough to be reproduced from memory (nothing over-complicated), but can make quite a difference in performance. I had hoped to make a point.
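
To give a flavor of the free-list idea, here is a minimal sketch. It is not the original class: `BufferPool`, `LiteString`, the fixed 64-byte capacity, and the truncate-on-overflow behavior are all assumptions made to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical pool: hands out fixed-size buffers and keeps returned ones
// on a free list, so later requests skip the general-purpose allocator.
class BufferPool {
public:
    static constexpr std::size_t kCapacity = 64;
    BufferPool() = default;
    BufferPool(const BufferPool&) = delete;
    BufferPool& operator=(const BufferPool&) = delete;
    ~BufferPool() { for (char* p : free_) delete[] p; }

    char* acquire() {
        if (free_.empty()) { ++allocations_; return new char[kCapacity]; }
        char* p = free_.back();
        free_.pop_back();
        return p;
    }
    void release(char* p) {
        assert(p != nullptr);
        free_.push_back(p);
    }
    std::size_t allocations() const { return allocations_; }

private:
    std::vector<char*> free_;       // the free list
    std::size_t allocations_ = 0;   // how often we fell back to new[]
};

// Hypothetical string that borrows its storage from a pool.
class LiteString {
public:
    explicit LiteString(BufferPool& pool) : pool_(pool), data_(pool.acquire()) {
        data_[0] = '\0';
    }
    LiteString(const LiteString&) = delete;
    LiteString& operator=(const LiteString&) = delete;
    ~LiteString() { pool_.release(data_); }

    // Copies at most kCapacity - 1 characters; longer input is truncated.
    void assign(const char* s) {
        std::size_t n = std::strlen(s);
        if (n >= BufferPool::kCapacity) n = BufferPool::kCapacity - 1;
        std::memcpy(data_, s, n);
        data_[n] = '\0';
    }
    const char* c_str() const { return data_; }

private:
    BufferPool& pool_;
    char* data_;
};
```

In a cyclic workload that creates and destroys strings in a loop, only the first pass reaches the allocator; every later string reuses a buffer from the free list.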

Since posting, after the initial burst, the article has collected what looks like a few new hits, every day, for the past five years. Counting hits is a pretty foggy indicator, but it seems possible quite a few folk have read the article (though a relatively small fraction of the programming community). Some may have copied the C++ string class in their code. (What I hoped!)

But there is no sure way of knowing. What is the cumulative impact of this steady trickle of readers?