Preston L. Bannister { random memes }

2010.03.08

Magnetic propulsion?

Filed under: General — Preston L. Bannister @ 1:44 am

Airplanes have always been an interest, since I was a kid (though theoretical, not actual).

Simple basic facts about aeroplanes: long thin wings tend to be more efficient (aerodynamically, not structurally) than wider/thicker/shorter wings. For much the same reason, propellers are more efficient than jets. Ducted fans are less efficient than propellers, but more efficient than pure-jet engines. Turbofan engines are basically ducted fan turboprops. Gains in jet efficiency over the past few decades are in part due to higher-bypass turbofans (basically moving from pure jets closer to propellers).
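The "same reason" can be made concrete with the textbook ideal (Froude) propulsive efficiency. This is a sketch under idealized assumptions (steady flow, no swirl or drag losses); the symbols are mine: v_0 is flight speed, v_j is the speed of the air thrown back (relative to the aircraft), and \dot{m} is the mass flow through the propulsor.

```latex
T = \dot{m}\,(v_j - v_0), \qquad
\eta_p = \frac{T\,v_0}{\tfrac{1}{2}\,\dot{m}\,\left(v_j^2 - v_0^2\right)}
       = \frac{2}{1 + v_j / v_0}
```

For the same thrust, moving a lot of air a little faster (v_j/v_0 = 1.2 gives about 0.91) beats moving a little air a lot faster (v_j/v_0 = 3 gives 0.5). That is the direction higher bypass ratios push.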

Even propellers are not ideal. Swirling a couple curved sticks of metal through the airstream at high speeds is going to chew up energy without adding to propulsion. Many-bladed turbofans chewing through the airstream have got to be worse. Lots of energy wasted – could there be a more efficient way? Nothing especially obvious … or something better would be in practice.

Ideally we would like a way to throw back the bulk of an airstream without lots of extraneous physical churning. The only certain way we know is to use propellers – like oars in water. Could there be another way?

There is a well-known phenomenon in Physics known as “electric wind”, that moves air without physical contact, but is by no measure efficient. Is there any way this could be used?

Is there any way to efficiently push an airstream without physical contact?

The “electric wind” is a stream of charged particles. A magnetic field deflects a charged particle moving through. The deflection exerts a force on the magnet (assuming the force is not sufficient to capture the charge). Movement against that force consumes energy. That energy presumably could accelerate the charged particles.

The mean free path of a charged particle at normal atmospheric pressures is short. Any accelerated ion would give up about half its energy at each collision. Short mean free paths mean many collisions. The net result would be (presumably) to accelerate a bulk of the airstream. Maybe.

Magnetic fields deflect charged particles. Strong magnets rotating in opposite directions could deflect and accelerate, then re-deflect and further accelerate charged particles – maybe. Would the result be significant? Would the result be efficient? I have no idea.

This might be an approach only possible if the “controller” is sufficiently smart, and with quite intense magnets (superconducting?). Matching the acceleration and deflection of charged particles through alternating magnetic fields through changing atmospheric conditions may not be possible with simpler control.

Is a “magnetic propeller” a practical possibility?

2010.03.04

Using GMail for mailto: links in Ubuntu

Filed under: Web — Preston L. Bannister @ 6:02 pm

Create the file $HOME/bin/mailto with the contents:

#!/bin/sh
gnome-open "https://mail.google.com/mail?extsrc=mailto&url=$*"

Make the file executable.

On Ubuntu Linux (using the Gnome desktop), go to:

System > Preferences > Preferred Applications

Under Internet / Mail Reader select “Custom” and enter the command:

/home/preston/bin/mailto %s

(Replace “/home/preston” with your $HOME.)

This should open GMail in your default web browser, composing a new message, with the recipient set.

2010.02.24

Between Marketing and Engineering

Filed under: Humor — Preston L. Bannister @ 1:00 pm

Brought up a command window in Windows 7, and saw:

Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

Yep. Windows 7 is in fact Windows version 6. Gotta love those Marketing folk.

2010.01.31

Multiplexed FastCGI connections?

Filed under: Software, Web — Preston L. Bannister @ 12:22 pm

Does anyone use FastCGI with FCGI_MPXS_CONNS set to “1” (for multiplexed connections)?

Most FastCGI backends seem to be written for non-multiplexed connections. (Much simpler, so understandable.) The IIS FastCGI connector apparently does not support multiplexed connections.

I am writing a FastCGI backend that allows for multiplexed connections. This would be a waste of time if not supported by a frontend, or if existing frontends are buggy.
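For context, multiplexing means records for several requests are interleaved on one connection, and the backend sorts them out by the request id in each 8-byte record header. A minimal sketch of that demultiplexing step follows. The header layout is from the FastCGI 1.0 specification; the names RecordHeader, parseHeader and demux are made up for illustration, and record types are ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Fixed 8-byte FastCGI record header (multi-byte fields are big-endian).
struct RecordHeader {
    std::uint8_t  version;
    std::uint8_t  type;
    std::uint16_t requestId;
    std::uint16_t contentLength;
    std::uint8_t  paddingLength;
};

// Parse one header from 8 raw bytes. The 8th byte is reserved and skipped.
inline RecordHeader parseHeader(const std::uint8_t* b) {
    RecordHeader h;
    h.version       = b[0];
    h.type          = b[1];
    h.requestId     = static_cast<std::uint16_t>((b[2] << 8) | b[3]);
    h.contentLength = static_cast<std::uint16_t>((b[4] << 8) | b[5]);
    h.paddingLength = b[6];
    return h;
}

// Hypothetical demultiplexer: split a byte stream of records into
// per-request content, keyed by requestId. Sketch only: a real backend
// would also dispatch on the record type.
inline std::map<std::uint16_t, std::string>
demux(const std::vector<std::uint8_t>& in) {
    std::map<std::uint16_t, std::string> perRequest;
    std::size_t pos = 0;
    while (in.size() - pos >= 8) {
        RecordHeader h = parseHeader(in.data() + pos);
        std::size_t total = 8u + h.contentLength + h.paddingLength;
        if (in.size() - pos < total) {
            break;  // incomplete record: wait for more bytes
        }
        perRequest[h.requestId].append(
            reinterpret_cast<const char*>(in.data() + pos + 8),
            h.contentLength);
        pos += total;
    }
    return perRequest;
}
```

A non-multiplexed backend sees a single request id per connection, so it can skip the map entirely. That is most of why it is simpler.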

(Not really expecting a response, but have to state the question.)

2010.01.17

Giving up HTML@W3C

Filed under: Software, Web, html@w3c — Preston L. Bannister @ 10:00 pm

Got the “status as Invited Expert in HTML Working Group” email. This I will let expire. Spent my time tilting at windmills, and do not see any point in continuing.

The HTML Working Group at W3C is … far too much noise. The HTML5 “standard” is going to be a bloated monster, and there is no chance I can change that. Time to stop the pretense of trying.

Not that all the work is bad, or that there is any shortage of well-intentioned folk. What the group lacks is any sense of minimalism, and enough strong voices able to say “no”. What we will get is going to be even harder to digest than HTML4. This is sad. Future developers are going to have an even harder time, for no good reason.

The wildcard here is that the mainstream browser implementations may not follow all the half-thought ideas thrown in by the HTML5 Working Group. No idea how this will work out.

At the core HTML is pretty damn simple – or could be. HTML4 got stuffed with a bunch of half-thought notions, most of which have since proved of no value, and were ignored by developers. Ideally we would learn from experience, omit the fuzzy disused bits, and trim HTML down to the useful core. There are well-known (though not universally known) means to achieve this aim. This is not going to happen.

I cannot keep up with the herd of Energizer-bunnies eager to make their mark, and with too-limited experience.
Time to stop pretending.

2010.01.16

Efficient UTF-8 recoding and secure processing

Filed under: Software, Web — Preston L. Bannister @ 1:20 pm

An attempt to make a point…

The use of UTF-8 on the web is common and increasing. Lots of data comes in as UTF-8, and inefficiency in UTF-8 data handling is going to have pretty pervasive impact.

On the flip side, the creators of UTF-8 did a good job. There is nothing really complicated about the UTF-8 format, and processing is simple.

So I was surprised (or rather shocked) to find in an earlier experiment that Java performed UTF-8 conversion slowly. In fact, I was able to write a faster UTF-8 decoder in Java than the stock decoder. This is just plain wrong. Conversion between encodings is a primitive/simple operation best written in C/C++ and run as native code (and this is the sort of processing where C/C++ is probably always going to be much faster than Java).

There is a problem in that malicious external parties can send oddly-encoded UTF-8, and bypass simple-minded malware detection software. Ordinary ASCII characters can be coded as alternate multi-byte sequences, and simple scanners miss the alternate encoding.
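To make the problem concrete: the two bytes 0xC0 0xAF decode, under a tolerant decoder, to the same code point as a plain “/” (0x2F), yet a scanner looking for the byte “/” never sees it. A small sketch, with function names invented for illustration and a decoder that only handles 1- and 2-byte sequences:

```cpp
#include <cstddef>
#include <string>

// Decode the first UTF-8 sequence in s, tolerating overlong forms.
// Handles only 1- and 2-byte sequences, enough for this demonstration.
// Returns -1 for anything else.
inline int decodeFirst(const std::string& s) {
    if (s.empty()) return -1;
    int c = 255 & s[0];
    if (c < 0x80) return c;
    if ((c & 0xE0) == 0xC0 && s.size() >= 2) {
        return ((31 & c) << 6) | (63 & s[1]);
    }
    return -1;  // longer sequences are out of scope for this sketch
}

// A simple-minded scanner: looks for the literal byte '/' only.
inline bool naiveContainsSlash(const std::string& s) {
    return s.find('/') != std::string::npos;
}
```

With the input "\xC0\xAF", decodeFirst yields '/' while naiveContainsSlash reports false.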

This is a problem. There is a simple solution. One of a set of principles I adopted a long time ago is “convert at the edges”. If you have data coming in from an untrusted source, then you perform conversion and validation at the “edge” where the data is first received.

In the case of UTF-8 coming from an untrusted source, to make all later processing simpler, you must recode to eliminate any alternate encodings. This is quite simple, as recoded UTF-8 will always be the same size or smaller, and so can be done in-place. The prior experiment measured the cost of UTF-8 recoding. Looks like we can drive a 1-Gbit network link at full speed (with efficient code), while recoding the entire contents. Since UTF-8 data usually represents a smaller portion of traffic, and since other processing tends to take the larger part of the load, there is no reason to not perform recoding on any UTF-8 data coming from an untrusted source.

Combine “convert at the edges” with UTF-8 recoding, and we lose the basis for the requirement in RFC 3629 for detection of “illegal” UTF-8 code sequences. In addition we allow all downstream processing to be simpler and more efficient … and we also can be tolerant of imperfect upstream software. (Yes, I am going to invoke Postel, again.)

The basis is good (secure processing with untrusted sources), but the requirement for detection of “illegal” sequences is not necessary and (most definitely!) not optimal.

Example of UTF-8 recoding (from String.cpp).

void UTF8::String::recode() {
    // Iterate until all UTF8 characters are normalized.
    // UTF8 in canonical form can only be smaller, so work in-place.
    char* p1 = pBuffer;
    char* p2 = pBuffer;
    char* pEOS = pBuffer + nContent;
    while (p1 < pEOS) {
        int c = 255 & *p1++;
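        if (c < 0x80) {
            *p2++ = c;
            continue;
        }
        // Input is untrusted: if the final sequence is truncated, stop here
        // (dropping the partial tail) rather than read past the end of the buffer.
        int nMore = (c < 0xE0) ? 1 : (c < 0xF0) ? 2 : (c < 0xF8) ? 3 : (c < 0xFC) ? 4 : 5;
        if (pEOS - p1 < nMore) {
            break;
        }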
        if (c < 0xE0) {
            c = (31 & c) << 6;
            c |= 63 & *p1++;
        } else if (c < 0xF0) {
            c = (15 & c) << 12;
            c |= (63 & *p1++) << 6;
            c |= 63 & *p1++;
        } else if (c < 0xF8) {
            c = (7 & c) << 18;
            c |= (63 & *p1++) << 12;
            c |= (63 & *p1++) << 6;
            c |= 63 & *p1++;
        } else if (c < 0xFC) {
            c = (3 & c) << 24;
            c |= (63 & *p1++) << 18;
            c |= (63 & *p1++) << 12;
            c |= (63 & *p1++) << 6;
            c |= 63 & *p1++;
        } else {
            c = (1 & c) << 30;
            c |= (63 & *p1++) << 24;
            c |= (63 & *p1++) << 18;
            c |= (63 & *p1++) << 12;
            c |= (63 & *p1++) << 6;
            c |= 63 & *p1++;
        }
        if (c < 0x80) {
            *p2++ = c;
        } else if (c < 0x800) {
            *p2++ = 0xC0 | (c >> 6);
            *p2++ = 0x80 | (63 & c);
        } else if (c < 0x10000) {
            *p2++ = 0xE0 | (c >> 12);
            *p2++ = 0x80 | (63 & (c >> 6));
            *p2++ = 0x80 | (63 & c);
        } else if (c < 0x200000) {
            *p2++ = 0xF0 | (c >> 18);
            *p2++ = 0x80 | (63 & (c >> 12));
            *p2++ = 0x80 | (63 & (c >> 6));
            *p2++ = 0x80 | (63 & c);
        } else if (c < 0x4000000) {
            *p2++ = 0xF8 | (c >> 24);
            *p2++ = 0x80 | (63 & (c >> 18));
            *p2++ = 0x80 | (63 & (c >> 12));
            *p2++ = 0x80 | (63 & (c >> 6));
            *p2++ = 0x80 | (63 & c);
        } else {
            *p2++ = 0xFC | (1 & (c >> 30));
            *p2++ = 0x80 | (63 & (c >> 24));
            *p2++ = 0x80 | (63 & (c >> 18));
            *p2++ = 0x80 | (63 & (c >> 12));
            *p2++ = 0x80 | (63 & (c >> 6));
            *p2++ = 0x80 | (63 & c);
        }
    }
    nContent = (int) (p2 - pBuffer);
}
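For trying the idea outside the String class, here is a compact free-function variant. It is a sketch, not the code from String.cpp: the name recodeShortest is invented, it copies into a new std::string rather than working in place, and it stops at a truncated final sequence. Like the member function, it treats a stray continuation byte as a 2-byte lead.

```cpp
#include <cstddef>
#include <string>

// Recode UTF-8 so every code point uses its shortest encoding.
// Hypothetical helper for illustration; tolerant of overlong input.
inline std::string recodeShortest(const std::string& in) {
    std::string out;
    out.reserve(in.size());  // shortest form is never longer than the input
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned c = static_cast<unsigned char>(in[i++]);
        // Continuation bytes implied by the lead byte.
        int extra = c < 0x80 ? 0 : c < 0xE0 ? 1 : c < 0xF0 ? 2
                  : c < 0xF8 ? 3 : c < 0xFC ? 4 : 5;
        if (in.size() - i < static_cast<std::size_t>(extra)) {
            break;  // truncated tail: stop rather than over-read
        }
        if (extra > 0) {
            c &= (0x3Fu >> extra);  // payload bits of the lead byte
            for (int k = 0; k < extra; ++k) {
                c = (c << 6) | (63u & static_cast<unsigned char>(in[i++]));
            }
        }
        // Continuation bytes needed for the shortest encoding of c.
        int n = c < 0x80 ? 0 : c < 0x800 ? 1 : c < 0x10000 ? 2
              : c < 0x200000 ? 3 : c < 0x4000000 ? 4 : 5;
        if (n == 0) {
            out.push_back(static_cast<char>(c));
            continue;
        }
        // Lead-byte markers for 2- to 6-byte sequences.
        static const unsigned lead[] = {0, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC};
        out.push_back(static_cast<char>(lead[n] | (c >> (6 * n))));
        for (int k = n - 1; k >= 0; --k) {
            out.push_back(static_cast<char>(0x80 | (63u & (c >> (6 * k)))));
        }
    }
    return out;
}
```

Run at the edge, recodeShortest("\xC0\xAF") comes back as a plain "/", so a downstream scanner only ever has to look for the one-byte form. Already-canonical input passes through unchanged.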

2010.01.12

UTF8/UCS conversion benchmark

Filed under: Software, Web — Preston L. Bannister @ 9:12 pm

Point of reference…

UCS (Unicode) to UTF8 conversion, and the reverse, when efficiently coded in C++ clocks in well above 100MB/s on current generation CPUs. If you are getting something much less – enough to be a problem – then there are questions you should ask. The following run spans 1 to 6 byte UTF8 encodings.

Recoding a UTF8 string to a normalized form is slower, at roughly 84 to 122 MB/s.

preston@athena:~/workspace/json-c$ time Release/json
base 00000010 :  229.9 MB/s UCS to UTF8 conversion
base 00000010 :  253.7 MB/s UTF8 to UCS conversion
base 00000010 :  122.0 MB/s UTF8 recode
base 02000010 :  196.8 MB/s UCS to UTF8 conversion
base 02000010 :  177.1 MB/s UTF8 to UCS conversion
base 02000010 :  100.6 MB/s UTF8 recode
base 04000010 :  173.8 MB/s UCS to UTF8 conversion
base 04000010 :  157.1 MB/s UTF8 to UCS conversion
base 04000010 :   89.2 MB/s UTF8 recode
base 06000010 :  174.0 MB/s UCS to UTF8 conversion
base 06000010 :  158.5 MB/s UTF8 to UCS conversion
base 06000010 :   89.2 MB/s UTF8 recode
base 08000010 :  174.0 MB/s UCS to UTF8 conversion
base 08000010 :  154.5 MB/s UTF8 to UCS conversion
base 08000010 :   89.6 MB/s UTF8 recode
base 0a000010 :  174.1 MB/s UCS to UTF8 conversion
base 0a000010 :  156.5 MB/s UTF8 to UCS conversion
base 0a000010 :   89.8 MB/s UTF8 recode
base 0c000010 :  174.0 MB/s UCS to UTF8 conversion
base 0c000010 :  155.1 MB/s UTF8 to UCS conversion
base 0c000010 :   89.6 MB/s UTF8 recode
base 0e000010 :  174.0 MB/s UCS to UTF8 conversion
base 0e000010 :  158.8 MB/s UTF8 to UCS conversion
base 0e000010 :   87.4 MB/s UTF8 recode
base 10000010 :  170.8 MB/s UCS to UTF8 conversion
base 10000010 :  158.2 MB/s UTF8 to UCS conversion
base 10000010 :   89.7 MB/s UTF8 recode
base 12000010 :  174.0 MB/s UCS to UTF8 conversion
base 12000010 :  158.8 MB/s UTF8 to UCS conversion
base 12000010 :   86.5 MB/s UTF8 recode
base 14000010 :  171.5 MB/s UCS to UTF8 conversion
base 14000010 :  153.9 MB/s UTF8 to UCS conversion
base 14000010 :   87.1 MB/s UTF8 recode
base 16000010 :  172.1 MB/s UCS to UTF8 conversion
base 16000010 :  158.1 MB/s UTF8 to UCS conversion
base 16000010 :   87.5 MB/s UTF8 recode
base 18000010 :  172.1 MB/s UCS to UTF8 conversion
base 18000010 :  158.2 MB/s UTF8 to UCS conversion
base 18000010 :   86.9 MB/s UTF8 recode
base 1a000010 :  171.3 MB/s UCS to UTF8 conversion
base 1a000010 :  158.2 MB/s UTF8 to UCS conversion
base 1a000010 :   86.5 MB/s UTF8 recode
base 1c000010 :  169.5 MB/s UCS to UTF8 conversion
base 1c000010 :  158.5 MB/s UTF8 to UCS conversion
base 1c000010 :   86.5 MB/s UTF8 recode
base 1e000010 :  173.0 MB/s UCS to UTF8 conversion
base 1e000010 :  157.8 MB/s UTF8 to UCS conversion
base 1e000010 :   86.1 MB/s UTF8 recode
base 20000010 :  173.1 MB/s UCS to UTF8 conversion
base 20000010 :  158.4 MB/s UTF8 to UCS conversion
base 20000010 :   87.2 MB/s UTF8 recode
base 22000010 :  173.1 MB/s UCS to UTF8 conversion
base 22000010 :  158.2 MB/s UTF8 to UCS conversion
base 22000010 :   88.2 MB/s UTF8 recode
base 24000010 :  173.3 MB/s UCS to UTF8 conversion
base 24000010 :  158.8 MB/s UTF8 to UCS conversion
base 24000010 :   87.6 MB/s UTF8 recode
base 26000010 :  173.8 MB/s UCS to UTF8 conversion
base 26000010 :  158.1 MB/s UTF8 to UCS conversion
base 26000010 :   87.9 MB/s UTF8 recode
base 28000010 :  172.0 MB/s UCS to UTF8 conversion
base 28000010 :  158.7 MB/s UTF8 to UCS conversion
base 28000010 :   87.6 MB/s UTF8 recode
base 2a000010 :  172.0 MB/s UCS to UTF8 conversion
base 2a000010 :  158.1 MB/s UTF8 to UCS conversion
base 2a000010 :   86.1 MB/s UTF8 recode
base 2c000010 :  172.1 MB/s UCS to UTF8 conversion
base 2c000010 :  158.5 MB/s UTF8 to UCS conversion
base 2c000010 :   86.3 MB/s UTF8 recode
base 2e000010 :  173.8 MB/s UCS to UTF8 conversion
base 2e000010 :  158.2 MB/s UTF8 to UCS conversion
base 2e000010 :   89.3 MB/s UTF8 recode
base 30000010 :  170.0 MB/s UCS to UTF8 conversion
base 30000010 :  158.7 MB/s UTF8 to UCS conversion
base 30000010 :   87.2 MB/s UTF8 recode
base 32000010 :  173.8 MB/s UCS to UTF8 conversion
base 32000010 :  158.4 MB/s UTF8 to UCS conversion
base 32000010 :   87.8 MB/s UTF8 recode
base 34000010 :  173.5 MB/s UCS to UTF8 conversion
base 34000010 :  158.5 MB/s UTF8 to UCS conversion
base 34000010 :   88.0 MB/s UTF8 recode
base 36000010 :  173.1 MB/s UCS to UTF8 conversion
base 36000010 :  158.4 MB/s UTF8 to UCS conversion
base 36000010 :   88.4 MB/s UTF8 recode
base 38000010 :  172.5 MB/s UCS to UTF8 conversion
base 38000010 :  158.5 MB/s UTF8 to UCS conversion
base 38000010 :   88.7 MB/s UTF8 recode
base 3a000010 :  170.5 MB/s UCS to UTF8 conversion
base 3a000010 :  158.4 MB/s UTF8 to UCS conversion
base 3a000010 :   87.4 MB/s UTF8 recode
base 3c000010 :  169.3 MB/s UCS to UTF8 conversion
base 3c000010 :  154.7 MB/s UTF8 to UCS conversion
base 3c000010 :   86.3 MB/s UTF8 recode
base 3e000010 :  172.0 MB/s UCS to UTF8 conversion
base 3e000010 :  156.9 MB/s UTF8 to UCS conversion
base 3e000010 :   87.9 MB/s UTF8 recode
base 40000010 :  171.8 MB/s UCS to UTF8 conversion
base 40000010 :  148.9 MB/s UTF8 to UCS conversion
base 40000010 :   83.7 MB/s UTF8 recode
base 42000010 :  173.1 MB/s UCS to UTF8 conversion
base 42000010 :  149.3 MB/s UTF8 to UCS conversion
base 42000010 :   88.1 MB/s UTF8 recode
base 44000010 :  173.7 MB/s UCS to UTF8 conversion
base 44000010 :  157.7 MB/s UTF8 to UCS conversion
base 44000010 :   87.4 MB/s UTF8 recode
base 46000010 :  172.1 MB/s UCS to UTF8 conversion
base 46000010 :  152.2 MB/s UTF8 to UCS conversion
base 46000010 :   87.3 MB/s UTF8 recode
base 48000010 :  174.0 MB/s UCS to UTF8 conversion
base 48000010 :  142.7 MB/s UTF8 to UCS conversion
base 48000010 :   87.8 MB/s UTF8 recode
base 4a000010 :  173.5 MB/s UCS to UTF8 conversion
base 4a000010 :  150.2 MB/s UTF8 to UCS conversion
base 4a000010 :   87.7 MB/s UTF8 recode
base 4c000010 :  174.0 MB/s UCS to UTF8 conversion
base 4c000010 :  158.4 MB/s UTF8 to UCS conversion
base 4c000010 :   88.4 MB/s UTF8 recode
base 4e000010 :  174.0 MB/s UCS to UTF8 conversion
base 4e000010 :  156.9 MB/s UTF8 to UCS conversion
base 4e000010 :   88.4 MB/s UTF8 recode

real	2m0.751s
user	2m0.540s
sys	0m0.190s

The above run is on an AMD Phenom(tm) II X4 955 Processor running at the stock clock rate.

Code is in: http://svn.bannister.us/public/json-c/
(A start on an experiment in fastest-possible JSON conversion.)

Note that I intentionally allow proper conversion of “invalid” UTF8 code strings. I completely understand the reason for the disallowed conversions, and I disagree.

Update: Converted to use pointer arithmetic, rather than array and index. Was not sure pointer math was still a win on current CPUs and compilers. Got a big boost in throughput, so it is!

2010.01.08

Musing about cumulative impact

Filed under: Software — Preston L. Bannister @ 12:52 pm

About 15 years back I was working on a C++ GUI application with a cyclic workload and a lot of string manipulation. For both performance and reliability I came up with a lightweight string class that did allocations off a free list. The class benchmarked well, and performed very well in practice. About 10 years back I started writing a C++ back-end bulk processing application with massive string manipulation, and re-used the same lightweight string (with excellent results). Five years back I again ran benchmarks (with good results) and wrote up the results.

The lightweight string class is simple enough to be reproduced from memory (nothing over-complicated), but can make quite a difference in performance. I had hoped to make a point.
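To give a flavor of the technique, here is a minimal sketch of a string whose storage comes off a free list. This is not the class from the article: the name PooledString, the fixed 64-byte capacity, and the single-threaded free list are all assumptions made to keep the example short.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical sketch: a fixed-capacity string whose storage blocks are
// recycled through a free list instead of going back to the heap.
// Not thread-safe, and blocks are never returned to the heap.
class PooledString {
public:
    static constexpr std::size_t kCapacity = 64;  // assumed block size

    PooledString() : block_(acquire()) { block_->length = 0; }
    ~PooledString() { release(block_); }
    PooledString(const PooledString&) = delete;
    PooledString& operator=(const PooledString&) = delete;

    // Append text; returns false if it does not fit (a real class would grow).
    bool append(const char* s) {
        std::size_t n = std::strlen(s);
        if (block_->length + n > kCapacity) return false;
        std::memcpy(block_->data + block_->length, s, n);
        block_->length += n;
        return true;
    }

    std::size_t size() const { return block_->length; }
    const char* data() const { return block_->data; }  // not NUL-terminated

    // Number of blocks currently waiting on the free list.
    static std::size_t freeCount() {
        std::size_t n = 0;
        for (Block* b = freeList(); b != nullptr; b = b->next) ++n;
        return n;
    }

private:
    struct Block {
        Block* next;
        std::size_t length;
        char data[kCapacity];
    };

    static Block*& freeList() {
        static Block* head = nullptr;
        return head;
    }

    // Pop a recycled block if one is available, else allocate a new one.
    static Block* acquire() {
        Block*& head = freeList();
        if (head != nullptr) {
            Block* b = head;
            head = b->next;
            return b;
        }
        return new Block;
    }

    // Push the block back on the free list for reuse.
    static void release(Block* b) {
        b->next = freeList();
        freeList() = b;
    }

    Block* block_;
};
```

With a cyclic workload, after the first cycle nearly every construction is a pointer pop and every destruction a pointer push, with no trip through the general-purpose allocator.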

Since posting, after the initial burst, the articles have collected what looks like a few new hits, every day, for the past five years. Counting hits is a pretty foggy indicator, but it seems possible quite a few folk have read the article (though a relatively small fraction of the programming community). Some may have copied the C++ string class in their code. (What I hoped!)

But there is no sure way of knowing. What is the cumulative impact of this steady trickle of readers?
