Preston L. Bannister { random memes }

2009.10.30

What’s Wrong with the Culture of Wall Street? – TIME

Filed under: Society — Preston @ 11:26 am

Reflections of an Anthropologist on the Wall Street mindset.

What’s Wrong with the Culture of Wall Street? – TIME
Before this crazy crash of 2008, bankers always landed on their feet, almost always. Job insecurity isn't the same thing for the average American worker. They often experience downward mobility or don't land on their feet.

2009.10.28

Moon rocks and a bit of math

Filed under: General — Preston @ 6:43 pm

Ran across an article on extracting oxygen from moon rocks. The interesting bit:

New Device Extracts Oxygen from Moon Rocks | Universe Today
Fray anticipates that three reactors, each a meter high, would be enough to generate a ton of oxygen per year on the Moon. Three tons of rock are needed to produce a ton of oxygen, and in tests the team saw almost 100% recovery of oxygen, he says.

The comments got the math badly wrong. Poking around the web I got some rough numbers:

  • ~0.25 liters / minute – oxygen consumed by a human at rest.
  • ~5 liters / minute – oxygen consumed by an athlete during heavy exercise.
  • ~1.43 grams / liter – oxygen gas at Earth sea level pressure.

Barring any dumb math errors on my part, that translates to….

  • ~636,000 liters / year – the ton of oxygen produced by the above-mentioned generator.
  • ~131,500 liters / year – oxygen consumed by a person at rest.
  • ~4.8 person-at-rest-years of oxygen generated every year.
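For anyone who wants to check the arithmetic, here is a minimal sketch in Java. The inputs are the rough numbers listed above. One assumption is mine: that "ton" means a short ton (2000 lb, about 907 kg), since that reading lands within about half a percent of the liters figure above. The class and method names are made up for illustration.

```java
// Rough check of the oxygen arithmetic, using the approximate inputs above.
final class OxygenMath {
    // Assumption: "ton" is a short ton (2000 lb), expressed in grams.
    static final double GRAMS_PER_TON = 907_184.74;
    // Oxygen gas at Earth sea level pressure, grams per liter.
    static final double GRAMS_PER_LITER = 1.43;
    // Oxygen consumed by a human at rest, liters per minute.
    static final double REST_LITERS_PER_MINUTE = 0.25;
    static final double MINUTES_PER_YEAR = 60.0 * 24.0 * 365.0;

    // Volume of one ton of oxygen gas, in liters.
    static double litersPerTon() {
        return GRAMS_PER_TON / GRAMS_PER_LITER;
    }

    // Oxygen consumed by one person at rest over a year, in liters.
    static double restLitersPerYear() {
        return REST_LITERS_PER_MINUTE * MINUTES_PER_YEAR;
    }

    // How many person-at-rest-years one ton of oxygen covers.
    static double personYearsPerTon() {
        return litersPerTon() / restLitersPerYear();
    }

    public static void main(String[] args) {
        System.out.printf("liters per ton:        %.0f%n", litersPerTon());
        System.out.printf("liters per person-year: %.0f%n", restLitersPerYear());
        System.out.printf("person-years per ton:   %.1f%n", personYearsPerTon());
    }
}
```

With a metric ton (1,000,000 grams) instead, the liters figure comes out roughly 10% higher, so the conclusion does not change much either way.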

Assuming that a “real” space habitat would also have the means to recycle oxygen from exhaled CO2, that is a lot of oxygen!

What is the real need for that much oxygen? Outside use as rocket fuel (or other manufacturing process), the main need would be to supply air for expanded living space. Assuming ~3 meter ceilings, we need ~3000 liters of air for each square meter of living space. Air is only ~21% oxygen by volume, so that is ~630 liters of oxygen per square meter. So one generator would allow expanding the living space by ~1000 square meters / year.

Not bad. And that is just one set of three small reactors (and essentially a prototype, at that).

Only one part of the equation (for habitat supply needs), of course. For practical purposes, it would be nice to also have some nitrogen in the air. Not sure where that would come from. Do any moon rocks contain nitrogen? Water is also needed. Barring any fortunate discovery of water on the moon, we could synthesize water given a source of hydrogen. Do any moon rocks contain hydrogen?

2009.10.24

First impressions – Google Wave

Filed under: Software — Preston @ 11:09 am

Basically, Google Wave is a full and natural merger of messaging, on the web. Before you say “oh, only that”, think this through. This is a pretty big deal. Put differently, this is messaging where:

  • Conversations are a first-class concept, and each message is a part of a conversation.
  • Each message can be presented by a different application.

Because Wave is on the web, the set of applications is drawn from all applications in the world (on the web).

E-mail clients can present “threads” that are something like conversations, but this requires a bit of guessing and fakery on the part of the client, and is easily confused. IM clients can present a log of messages – that looks something like a conversation – but does not work when you switch between devices. Given that email and IM servers know nothing about conversations, the clients do pretty much the best they can. If you know about conversations on the server, you can do a better job on the client.

When conversations are a first-class concept, there are lots of new possible functions.

On the flip side, the Google Wave folk have made some (common) missteps.

  • Using new names for existing notions is sometimes useful, but more often not. Calling a conversation a “wave”, calling items (messages) within a conversation “blips” … I find this more confusing than useful. This reminds me more of talk from marketing folk than engineers.
  • It’s Talk, on the web. As a programmer, I get it – this is kewl! Writing a web application that allows all participants to see each character typed – is tricky. As a demo of tricky programming, this is sure to impress. As a useful feature in a general use application – not so much. Character-at-a-time message traffic is a huge load increase on the server, and a disservice to the users. Think about:
    • Locus of attention – the on-screen activity from updates to other users’ incomplete messages is sure to distract.
    • Quality of expression – how often have you started writing a message, then revised what you said and how you said it, before sending? Do you really want everyone to see your missteps … always?

    Note that both are a disservice to all participants – cool in a programming demo, not cool in a deployed application.

Other bits….

  • It looks like Outlook. This was new and cool in 1997, but not so much now. Not at all sure this layout is efficient, especially given the changes in most-common screen layouts. In 1997 you would target 640×480, 800×600, and 1024×768 on 14″ to 17″ as most common screen sizes. Today you are looking at 1280×800 on 14″ or 15″ as your minimum size, and 1920×1080 at 20″ or bigger as common. Present and future screens have not so much new height, with lots of new width – in fact too much new width for text content. Seems a different presentation is in order.
  • Mixed applications in one web page is as cool and exciting as ActiveX was in the web browser in 1996. Viewed in terms of the good things it can do, ActiveX allows an open-ended mix of applications to present within a web page – very cool. The Microsoft folk were blindsided (as constructive, well-intentioned folk) as they did not consider all the bad things that could be done (and with ActiveX, very bad things are possible). The Google folk have lots of history to make them wary, and the scope for harm is much smaller, so likely little bad stuff gets through. Still – do not underestimate the black hats. There are bound to be some exploits (though likely few and quickly suppressed).

In sum, a decent start on a great idea. I’d want to be pretty ruthless about overhead that takes height away from the conversation view. I’d drop the per-character updates. I’d drop the use of funky names for existing ideas. None are fundamental problems, and all are easy to address.

2009.10.16

PrinterJob.pageDialog() in Java is broken?

Filed under: Software — Preston @ 10:05 pm

More specifically, the handling of margins, PageFormat, and the PrinterJob.pageDialog(PageFormat) seems to be broken.

Fixing up an old Java desktop application for viewing old “green-bar” reports. Should be pretty simple – the reports are all fixed pitch text. Given I like to do things that “just work” (from the user’s perspective), I’d added live “smart” handling of controllable font auto-sizing, empty page suppression, and margins. Given that old printed reports may be formatted for paper sizes different from those in your desktop (or network) printer, this was slightly tricky, but only that.

Of course, many of those old reports – meant for old line-printer green-bar paper – are best printed in landscape orientation. To my surprise, I find when in landscape orientation, the return from PrinterJob.pageDialog(PageFormat) consistently alters the margins, even when no changes were made, and the error is cumulative.

Um, what?

At first I assumed the error was in my code, but this bit of logged output caught the problem:

PageFormat validate::
	paper dx: 792.0 dy: 612.0
	margins top: 72 left: 72 bottom: 72 right: 72
	orientation : LANDSCAPE
PageFormat after pageDialog::
	paper dx: 794.0 dy: 613.0
	margins top: 72 left: 74 bottom: 73 right: 72
	orientation : LANDSCAPE

The first is the contents of the PageFormat before calling PrinterJob.pageDialog(PageFormat), and the second is the new PageFormat returned. Note the +1 dy bump and +2 dx bump to the page extents (and the matching errors introduced to the left/bottom margins).

The code I used to remove the error:

            controller.pageFormat = job.pageDialog(controller.pageFormat);

            // Work around an interesting(?) bug in PrinterJob.pageDialog().
            // For some reason the returned Paper is over-sized. (Why? Dunno.)
            // Use the Paper returned from PrinterJob.defaultPage() to get a proper size.
            // If the Java folk remove the bug, this code should change nothing.
            PageFormat pf = job.defaultPage();
            Paper paper1 = controller.pageFormat.getPaper();
            Paper paper2 = pf.getPaper();
            double x = paper1.getImageableX();
            double y = paper1.getImageableY();
            double dx = paper1.getImageableWidth();
            double dy = paper1.getImageableHeight();
            paper2.setImageableArea(x,y,dx,dy);
            controller.pageFormat.setPaper(paper2);

The bug was encountered using Java 1.6.0_15 (64-bit) on Ubuntu (Linux 2.6.28-15-generic #52-Ubuntu SMP Wed Sep 9 10:48:52 UTC 2009 x86_64 GNU/Linux).
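The same workaround can be packaged as a small helper. This is only a sketch: the class and method names are hypothetical (not from the application above), and it assumes the caller has a PageFormat whose paper size can be trusted, such as the one from PrinterJob.defaultPage(). It keeps the imageable area the user chose in the dialog and takes only the paper size from the trusted format.

```java
import java.awt.print.PageFormat;
import java.awt.print.Paper;

// Hypothetical helper: same idea as the workaround above, as a pure function.
final class PageFormatFix {
    static PageFormat withTrustedPaperSize(PageFormat fromDialog, PageFormat trusted) {
        Paper chosen = fromDialog.getPaper();   // getPaper() returns a copy
        Paper fixed = trusted.getPaper();       // known-good paper size
        // Carry over the imageable area (margins) selected in the dialog.
        fixed.setImageableArea(chosen.getImageableX(), chosen.getImageableY(),
                               chosen.getImageableWidth(), chosen.getImageableHeight());
        PageFormat result = (PageFormat) fromDialog.clone();  // keeps orientation
        result.setPaper(fixed);
        return result;
    }

    public static void main(String[] args) {
        // Simulate an over-sized Paper like the one seen in the log above.
        Paper oversized = new Paper();
        oversized.setSize(613.0, 794.0);
        oversized.setImageableArea(72.0, 72.0, 468.0, 648.0);
        PageFormat fromDialog = new PageFormat();
        fromDialog.setOrientation(PageFormat.LANDSCAPE);
        fromDialog.setPaper(oversized);

        PageFormat fixed = withTrustedPaperSize(fromDialog, new PageFormat());
        System.out.println("page dx: " + fixed.getWidth() + " dy: " + fixed.getHeight());
    }
}
```

In the application this would be called as withTrustedPaperSize(job.pageDialog(pf), job.defaultPage()). As with the original workaround, if the dialog returns a correct Paper, the helper should change nothing.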

2009.10.05

Example – general purpose Trie in Java

Filed under: Software — Preston L. Bannister @ 8:14 am

Ran across a mention of the Trie data structure in a slightly random discussion. Had no notion of how performance of a Trie compares with the usual hash table, so wrote a general-purpose Trie implementation in Java (sources in an Eclipse project) with a bias toward performance.

The results are not encouraging. From a test run:

=== words
14 ms - 98569 words loaded

=== hash
3837/second - hash map loaded {4007 ms, 15376764 operations = 260 ns/op}
7953/second - hash map re-loaded {4003 ms, 31837787 operations = 125 ns/op}
12958/second - access each item in hash map {4001 ms, 51847294 operations = 77 ns/op}

=== trie (wide)
343/second - loaded trie (wide) {4022 ms, 1379966 operations = 2914 ns/op}
2089/second - re-loaded trie (wide) {4010 ms, 8378365 operations = 478 ns/op}
2165/second - access each item in trie (wide) {4005 ms, 8674072 operations = 461 ns/op}
41533124 slots of trie (wide)
225890 nodes of trie (wide)

=== trie (tall)
2254/second - loaded trie (tall) {4022 ms, 9068348 operations = 443 ns/op}
2446/second - re-loaded trie (tall) {4029 ms, 9856900 operations = 408 ns/op}
2429/second - access each item in trie (tall) {4017 ms, 9758331 operations = 411 ns/op}
6982694 slots of trie (tall)
446075 nodes of trie (tall)

=== string to UTF8 conversion
4708/second - word to UTF8 (stock) {4019 ms, 18925248 operations = 212 ns/op}
10503/second - word to UTF8 (fast) {4007 ms, 42088963 operations = 95 ns/op}

The test data is a file of 98569 words found in a Linux installation. The words are read from a file, and loaded into an array of strings. The words are loaded into a HashMap (a library class) for comparison, and loaded into both a “tall” and “wide” Trie (differing space/time trade-offs).

The notion used to generate a “fast” Trie is for each node in the Trie to index off a byte in the UTF8 encoding of the key String. Along the way, found the stock String to UTF8 conversion to be a bit slow, so wrote a faster version. Even with the faster encoding, measured Trie performance is not good.
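To make the idea concrete, here is a minimal sketch of a byte-indexed trie. It is not the implementation that was measured: the class name is made up, every node gets a fixed 256-slot child array (an assumption of this sketch), and it uses the stock String.getBytes conversion rather than a faster one.

```java
import java.nio.charset.StandardCharsets;

// Sketch of a trie where each node indexes off one byte of the UTF-8 key.
final class ByteTrie<V> {
    private static final class Node<V> {
        @SuppressWarnings("unchecked")
        final Node<V>[] next = (Node<V>[]) new Node[256];  // one slot per byte value
        V value;                                           // null means "no entry here"
    }

    private final Node<V> root = new Node<>();

    // Stores value under key; returns the previous value, or null if none.
    V put(String key, V value) {
        Node<V> node = root;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            int i = b & 0xFF;                 // treat the byte as unsigned
            if (node.next[i] == null) {
                node.next[i] = new Node<>();
            }
            node = node.next[i];
        }
        V old = node.value;
        node.value = value;
        return old;
    }

    // Returns the value stored under key, or null if the key is absent.
    V get(String key) {
        Node<V> node = root;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            node = node.next[b & 0xFF];
            if (node == null) {
                return null;
            }
        }
        return node.value;
    }

    public static void main(String[] args) {
        ByteTrie<Integer> trie = new ByteTrie<>();
        trie.put("moon", 1);
        trie.put("moo", 2);
        System.out.println(trie.get("moon") + " " + trie.get("moo") + " " + trie.get("mo"));
    }
}
```

Lookup cost is one array index per key byte, with no hashing and no key comparison. The price is the memory: 256 references per node, most of them empty.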

(Bit of a shock to find the stock String-to-UTF8 conversion slow in Java, as this is used a lot – especially in web applications – and may be significant enough to affect benchmarks. Java has been around long enough, so that I expected better. In fact, I expected UTF8 to be in native code in the JVM. Byte-bashing code is better suited to C/C++.)

If strings were already in UTF8 (or ASCII) format – as in C/C++ code – a similar Trie implementation would perform somewhat better, but still use too much memory. Whether a Trie in C++ could more closely approach the performance of a hash table is a bit of an open question.

Not sure why the “wide” Trie (using one node to index one UTF8 byte) is slightly slower than the “tall” Trie (using two nodes to index 4-bits each from one UTF8 byte). Expected the “wide” version to be faster. Suspect the larger memory use is enough to bust the CPU cache, and on smaller data the “wide” variant may be faster. Also saw a lot of jitter in the times – presumably due to Java GC activity (garbage collection) – though the relative results were consistent.
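For comparison, a sketch of the “tall” shape: two nodes per UTF8 byte, indexing the high 4 bits and then the low 4 bits. As before, this is an illustration with made-up names and fixed 16-slot child arrays (my assumption), not the measured code. It also counts nodes, to make the space trade-off visible.

```java
import java.nio.charset.StandardCharsets;

// Sketch of a "tall" trie: each UTF-8 byte is consumed as two 4-bit steps.
final class NibbleTrie<V> {
    private static final class Node<V> {
        @SuppressWarnings("unchecked")
        final Node<V>[] next = (Node<V>[]) new Node[16];  // one slot per 4-bit value
        V value;                                          // null means "no entry here"
    }

    private final Node<V> root = new Node<>();
    private int nodes = 1;  // counts the root

    // Follows (and optionally creates) the child for one 4-bit index.
    private Node<V> child(Node<V> node, int nibble, boolean create) {
        Node<V> next = node.next[nibble];
        if (next == null && create) {
            next = new Node<>();
            node.next[nibble] = next;
            nodes++;
        }
        return next;
    }

    // Stores value under key; returns the previous value, or null if none.
    V put(String key, V value) {
        Node<V> node = root;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            node = child(node, (b >> 4) & 0x0F, true);  // high 4 bits
            node = child(node, b & 0x0F, true);         // low 4 bits
        }
        V old = node.value;
        node.value = value;
        return old;
    }

    // Returns the value stored under key, or null if the key is absent.
    V get(String key) {
        Node<V> node = root;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            node = child(node, (b >> 4) & 0x0F, false);
            if (node == null) return null;
            node = child(node, b & 0x0F, false);
            if (node == null) return null;
        }
        return node.value;
    }

    int nodeCount() {
        return nodes;
    }

    public static void main(String[] args) {
        NibbleTrie<Integer> trie = new NibbleTrie<>();
        trie.put("ab", 1);
        trie.put("ac", 2);
        System.out.println(trie.get("ab") + " " + trie.get("ac") + " nodes=" + trie.nodeCount());
    }
}
```

Each lookup takes twice as many pointer hops as the byte-indexed form, but each node is far smaller, which fits the guess above about CPU cache effects.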

Clearly this “fast, general-purpose” Trie implementation is not overly useful – at ~5 times slower than a hash table, and huge use of memory. Alternate forms can be more space-efficient, but are likely to be slower.

As a guess, a Trie is not going to find much use in my code. :)

2009.10.02

Concurrency and threading is the new thing, again.

Filed under: Software, Web — Preston L. Bannister @ 12:56 pm

Tim Bray is writing a series of posts, taking a run at the concurrent programming problem, with a focus on languages. I think Tim is aiming in the right direction, but has his focus set at the wrong distance.

There are good reasons to take a run at the problem. Physics is changing what we can expect from future computers. Starting a few years back, and barring any unexpected shifts in technology, the rate at which a CPU can process a single instruction stream will increase only slowly. The economics of chip fabrication allow us to build a CPU with multiple cores. The physics of power consumption tell us that we can get more computing done per watt with slimmer cores at slightly lower clock rates. All of which argues for fabricating CPU chips with a slowly increasing number of cores, and slowly increasing clock rates.

All of which means that to make full use of present and future CPUs, there has to be a lot of concurrent computing.

Concurrency in programming is tricky, and often got wrong. This is nothing new. My first job out of college (so many years ago) was to work on an Ada compiler (a computer language with direct support for multi-threaded interaction) on a product (the Pascal Microengine) that had thread support built into the CPU’s microcode. There was then much talk of how to do concurrent programming.

What we learned then and in the time since is that fine-grained multi-threaded concurrent programming is tricky, and very easy to get wrong. For the bulk of programs and programmers, there is very limited need for this sort of concurrency. All things considered, this is probably a good thing.

Tim Bray – and the folk responding to his post – are mainly focusing on a programming language for concurrent programs. I suspect this is a mistake. Maybe some new (or newish) programming language will make bug-free concurrent programs easy, but I do not think this is likely.

I think we already have the bulk solution. Web-scale applications (at least those that work well) make use of large numbers of CPUs, with a huge amount of concurrent execution. Web-scale programming is mostly about swarms of single-threaded programs, each small in execution scale (though not necessarily small as programs) and well isolated from the others. Many of the attributes Tim lists are true – or mostly-true, or should-be-true – of web scale applications.

Clearly the web-scale application approach does not work for all applications – though it may work for more than you expect. There are always going to be some applications that need fine-grained threading … but I suspect this group is very small. For the bulk of programming we want to allow for massive concurrency, but well-isolated and coarse-grained.

Why the focus on programming languages that support fine-grained concurrent programs? What problem – in the application space – does that solve? Are more than a tiny fraction of applications in that space?

My answer to Tim’s question is to point at the concerns of web-scale applications and cloud-computing. The problem does not drive an interest in new programming languages. The answer to large-scale concurrent execution is – for most applications – large numbers of single-threaded programs, responding to requests. Tim’s list of characteristics – in part – is useful for that sort of programming. Editing down the list:

(Have to admit, I am not sure the above grouped notions are distinct, when viewed at this level.)

For the bulk of programmers and applications, the main needed change is finding the simplest possible adaptation to the needs of web-scale programming. Once done, concurrency and threading are a solved problem. We do not want or need fine-scale single-name/address space multi-threading – that way lies madness. We do need well-isolated single threads, in mass numbers, cooperating across web-scale process, machine, and network boundaries.
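As a toy illustration of the coarse-grained shape, scaled down to a single process: the sketch below runs several workers, each a single thread with state that only that thread ever touches, and routes each request to a worker by key. Everything here (class name, the hit-counting task, routing by hash) is a made-up example. Real web-scale isolation happens across process and machine boundaries, not inside one JVM.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model: many single-threaded workers, no mutable state shared between them.
final class IsolatedWorkers implements AutoCloseable {
    private static final class Worker {
        final ExecutorService thread = Executors.newSingleThreadExecutor();
        // Only ever read or written by this worker's own thread, so no locks.
        final Map<String, Integer> counts = new HashMap<>();
    }

    private final Worker[] workers;

    IsolatedWorkers(int n) {
        workers = new Worker[n];
        for (int i = 0; i < n; i++) {
            workers[i] = new Worker();
        }
    }

    // Routes a request by key, so one key is always handled by the same worker.
    // The returned Future yields the running count for that key.
    Future<Integer> hit(String key) {
        Worker w = workers[Math.floorMod(key.hashCode(), workers.length)];
        Callable<Integer> task = () -> w.counts.merge(key, 1, Integer::sum);
        return w.thread.submit(task);
    }

    @Override
    public void close() {
        for (Worker w : workers) {
            w.thread.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        try (IsolatedWorkers pool = new IsolatedWorkers(4)) {
            pool.hit("/index");
            pool.hit("/about");
            Future<Integer> last = pool.hit("/index");
            System.out.println("/index hits: " + last.get());
        }
    }
}
```

The concurrency comes from the number of workers, not from threads sharing data. Within any one worker the code is plain single-threaded code.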