My father sent a link to an article in a hardware-oriented magazine - EDN (Electronic Design News) - about the upcoming Power7 CPU from IBM.

This is outside our fields, but interesting.

http://www.edn.com/article/CA6686259.html?nid=2435&rid=8150303

I liked the one comment: “the generally horrible code from the Microsoft world”.

My characteristically reserved response:

Yes, it is usually cute when hardware guys talk about software.

As I remember, the result came from the large number of branches found in “the generally horrible code from the Microsoft world that dominated tasks in those days”. Except this characteristic is common to most modern code, not just “Microsoft world” code. It comes from complex behavior, which is increasingly common in contemporary code. Outside a small set of niche applications (heavy number crunching), you are going to have a lot of branches. Branching limits the value of both concurrent instruction dispatch and speculative execution (as we might guess). The results put anything wider than ~3-way dispatch on the wrong side of the “knee” of the curve.
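For a rough feel of why that curve has a knee, here is a back-of-envelope sketch using the textbook additive-stall model (cycles per instruction = ideal CPI + branch-stall CPI). The function names and all three parameter values (branch fraction, misprediction rate, penalty) are assumptions picked for illustration, not measurements of any real CPU or workload.

```python
def effective_ipc(width, branch_fraction=0.2, mispredict_rate=0.1, penalty_cycles=10):
    """Instructions per cycle under a simple additive stall model.

    All default parameters are illustrative assumptions, not measured values.
    """
    ideal_cpi = 1.0 / width  # best case: every dispatch slot is filled
    stall_cpi = branch_fraction * mispredict_rate * penalty_cycles  # branch cost per instruction
    return 1.0 / (ideal_cpi + stall_cpi)


def ipc_table(widths=(1, 2, 3, 4, 6, 8)):
    """Map each hypothetical dispatch width to its modeled IPC."""
    return {w: effective_ipc(w) for w in widths}


if __name__ == "__main__":
    for width, ipc in ipc_table().items():
        # Utilization is the fraction of peak dispatch bandwidth actually used.
        print(f"{width}-way dispatch: {ipc:.2f} IPC ({ipc / width:.0%} of peak)")
```

In this toy model the branch-stall term does not shrink as the core gets wider, so each extra dispatch slot buys less than the one before it, and IPC can never exceed 1 / stall_cpi (5 with these made-up numbers) however wide the machine is. Where exactly the knee lands depends entirely on the parameters chosen.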

I think the chip design offers a hint as to their expected application domain:

Each of the Power7’s cores has 12 execution units, including two each load/store and fixed-point, four double-precision floating-point, and a decimal floating-point unit. The core can dispatch six instructions per cycle.

With four floating-point execution units, my guess is that they are aiming at niche bulk number-crunching applications, not general-purpose applications. Combined with the large shared cache, these should do well on problems that allow highly concurrent bulk number crunching. We might see a lot of these running weather simulations (to predict Global Warming), and some sorts of large engineering design calculations/simulations.

As a general-purpose CPU, my guess is that benchmarks (when and if they appear) will be unimpressive.

Add to this the fact that most big new web applications are moving (have moved, in fact) to a no-shared-memory model - lots of single-board nodes with 1 to 8 CPU sockets, no shared memory between processes, connected via Ethernet to large-scale, high-performance, redundant datastores (specialized databases/filesystems).

Which makes the really cool design aspects of the Power7 pretty much useless. This is not the sort of CPU that would ever see large-scale deployment in a web-service datacenter.

Too bad the writer of the article does not have a clue about software, but no surprise from a hardware magazine.

Oh wait! I take it all back! This CPU could be terrific at the more compute-intense computer games! Perhaps Microsoft will pick this up for the next XBox? Super high frame rates in MegaBlaster2010! … except highly concurrent shared memory applications are a real bitch to write, even worse to debug, so better make that MegaBlaster2012, or 2013, or 2015, or … (hey! they never did ship!?!).

… maybe not. :)