random memes

Wrapping up Wide Finder 2

The main point of Tim Bray's Wide Finder 2 exercise was to solve a common problem (processing web server log files) in a way that takes full advantage of a large number of CPUs. Since Sun is shipping machines with many cores per chip, and Tim works for Sun, he has good reason to use his soapbox to take a public poke at the problem.

My approach to solving the Wide Finder problem is meant to be reusable. As such, I am not interested in exotic languages or in solutions minutely specialized to this exact problem. I do not expect to get the highest possible performance, but I do expect to get reasonably close - close enough that the difference in runtime is insignificant compared to the programmer time saved through easy reuse and adaptation.

I have done quite a lot of large log file processing in the past. Each time, I was looking for slightly different information and correlations. For that sort of usage you want simple, clearly written scripts that can be easily adapted to the need at hand. My solution to Tim's problem is written for just that sort of usage.
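As an illustration of the kind of small, adaptable script meant here (not the actual Wide Finder 2 solution), a minimal Python sketch might count the most requested paths in an access log. It assumes Apache-style Common Log Format lines; the regular expression and the `top_paths` name are hypothetical and would change with whatever question is being asked of the logs.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical pattern for the request field of a Common Log Format line,
# e.g. "GET /index.html HTTP/1.1"; adjust it to the log format at hand.
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')


def top_paths(lines: Iterable[str], n: int = 10) -> list[tuple[str, int]]:
    """Count requested paths and return the n most common (path, hits) pairs."""
    counts: Counter = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:  # skip malformed lines instead of failing on them
            counts[match.group(1)] += 1
    return counts.most_common(n)
```

Usage would be something like `with open("access.log", encoding="latin-1") as log: print(top_paths(log))`, where the file name is a placeholder. Asking a different question of the logs usually means changing only the regex and the counter key.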

Getting back to the original notion: just how does the benchmark compare when run on the "wide" Sun box versus a generic x86 box? In the prior test I got about 22MB/s on the Sun box. My local x86 box (an Athlon 64 X2 4800+) gets about 15MB/s. This comparison is a bit unfair to x86, as the Sun box is current-generation hardware and my x86 box is a few years old. I would guess that a current-generation quad-processor x86 box would more than double my processing rate, and outperform the Sun box by a fair margin.
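The back-of-envelope reasoning behind that guess can be made explicit. The sketch below assumes that the Athlon 64 X2's two cores contribute equally to the 15MB/s, that "quad-processor" means four cores of the same per-core speed, and that throughput scales linearly with core count. It deliberately ignores disk I/O limits and any per-core gain from newer hardware; `scaled_rate` is a hypothetical helper.

```python
# Rates quoted in the text, in MB/s.
SUN_RATE = 22.0
X86_DUAL_RATE = 15.0

# Assumed core counts: the Athlon 64 X2 is dual-core; the quad box is
# taken to have four cores at the same per-core speed.
X86_DUAL_CORES = 2
X86_QUAD_CORES = 4


def scaled_rate(rate: float, cores_from: int, cores_to: int) -> float:
    """Naive linear scaling: throughput proportional to core count."""
    return rate / cores_from * cores_to


quad = scaled_rate(X86_DUAL_RATE, X86_DUAL_CORES, X86_QUAD_CORES)
print(f"per-core x86 rate: {X86_DUAL_RATE / X86_DUAL_CORES:.1f} MB/s")
print(f"estimated quad x86: {quad:.1f} MB/s vs Sun {SUN_RATE:.1f} MB/s")
print(f"ratio: {quad / SUN_RATE:.2f}x")
```

Under those assumptions, four cores at the old per-core rate of 7.5MB/s give 30MB/s, about 1.36 times the Sun box's 22MB/s, before crediting any per-core improvement from a newer generation.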

In fact, this is not a bad showing for Sun. This particular solution becomes much "fatter" (uses more CPU cycles in total) when distributed across many CPUs. For workloads that can be distributed more cheaply (like web serving), we could expect proportionately better performance from the Sun box. A more specialized solution - with multiple threads updating shared in-memory data - should perform better on this problem, but would likely also prove harder to reuse (and be less interesting for my usage).
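To make the "fatter" point concrete, here is a minimal Python sketch of one common way to spread this kind of log processing across CPUs; it is an illustration, not necessarily how the solution described here is structured. Each worker counts request paths in its own byte range of the file (ranges are cut on line boundaries), and the per-worker `Counter`s are merged at the end. The per-worker tables and the final merge are work a single-process script never does. The function names, the Common Log Format regex, and the `executor_factory` parameter are all hypothetical.

```python
import os
import re
from collections import Counter
from collections.abc import Callable
from concurrent.futures import Executor, ProcessPoolExecutor

# Hypothetical: request field of a Common Log Format line, matched as bytes,
# e.g. b'"GET /index.html HTTP/1.1"'.
REQUEST_RE = re.compile(rb'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')


def chunk_offsets(path: str, parts: int) -> list[tuple[int, int]]:
    """Split a file into at most `parts` byte ranges that start on line boundaries."""
    size = os.path.getsize(path)
    bounds = [0]
    with open(path, "rb") as f:
        for i in range(1, parts):
            f.seek(size * i // parts)
            f.readline()  # skip ahead to the start of the next full line
            pos = f.tell()
            if bounds[-1] < pos < size:  # drop empty or duplicate ranges
                bounds.append(pos)
    bounds.append(size)
    return list(zip(bounds, bounds[1:]))


def count_chunk(job: tuple[str, int, int]) -> Counter:
    """Count request paths in one byte range; runs inside a worker."""
    path, start, end = job
    counts: Counter = Counter()
    with open(path, "rb") as f:
        f.seek(start)
        pos = start
        while pos < end:
            line = f.readline()
            if not line:
                break
            pos += len(line)
            match = REQUEST_RE.search(line)
            if match:
                counts[match.group(1).decode("latin-1")] += 1
    return counts


def count_parallel(
    path: str,
    workers: int = 4,
    executor_factory: Callable[..., Executor] = ProcessPoolExecutor,
) -> Counter:
    """Count paths in parallel: one partial Counter per range, merged at the end."""
    jobs = [(path, start, end) for start, end in chunk_offsets(path, workers)]
    total: Counter = Counter()
    with executor_factory(max_workers=workers) as pool:
        for partial in pool.map(count_chunk, jobs):
            total.update(partial)  # the merge step a single-process script never pays for
    return total
```

With the default `ProcessPoolExecutor`, call something like `count_parallel("access.log", workers=8).most_common(10)` from inside an `if __name__ == "__main__":` guard, since some platforms start workers by re-importing the main module. Passing `ThreadPoolExecutor` instead runs the same code in threads, which is handy for testing, though in CPython threads will not speed up this CPU-bound counting. The shared-data alternative mentioned above would replace the per-worker Counters and the merge with a single table guarded by a lock or similar, trading merge cost for synchronization cost.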

Upshot: for this sort of solution, the Sun box is not going to outperform an equivalent-generation x86 box.

Prior "Wide Finder 2" articles: