Preston L. Bannister { random memes }

2006.06.29

Template for News Stories on Government Data Gathering

Filed under: Humor, Politics — Preston @ 8:15 am

Concurring Opinions: Template for News Stories on Government Data Gathering
Under a top secret program initiated by the Bush Administration after the Sept. 11 attacks, the [name of agency (FBI, CIA, NSA, etc.)] has been gathering a vast database of [type of records] involving United States citizens.

“This program is a vital tool in the fight against terrorism,” [Bush Administration official] said. “Without it, we would be dangerously unsafe, and the terrorists would have probably killed you and every other American citizen.” The Bush Administration stated that the revelation of this program has severely compromised national security.

“This program is a threat to privacy and civil liberties,” [name of privacy advocate] said. But [name of spokesperson for Bush Administration] said: “This is a very limited program. It only contains detailed records about every American citizen. That’s all. It does not compromise civil liberties. We have a series of procedures in place to protect liberty.”

“We’re not trolling through the personal data of Americans,” Bush said, “we’re just looking at all of their records.”

The [name of statute] regulates [type of record] and typically requires a [type of court order]. Although the [name of agency] did not obtain a [type of court order], the Bush Administration contends that the program is “totally legal.” According to the Attorney General, “we can [do whatever we did or want to do]. The program is part of the President’s emergency war powers.”

Posted by: cshardie at June 29, 2006 10:56 AM
Good stuff. But I think you’re missing a paragraph or two:

Administration officials expressed anger at the disclosure of the program by the [newspaper or magazine breaking the story; default value: New York Times]. “Some in the press have made the job of defending against further terrorist attacks more difficult by insisting on publishing detailed information about vital national security programs,” [high-ranking administration official] said, adding that the program provides ‘valuable intelligence’ and has been ‘successful in helping break up terrorist plots,’ though no specific examples of intelligence or foiled plots could be recollected.

Other supporters of the Administration went further. [Representative/Senator] [rabid, boot-licking, pseudo-fascist congressman], [Republican/Democrat] of [state] and the chairman of the [rubber-stamping committee of some kind], released a letter in which he called on the attorney general to investigate whether [publishing newspaper]’s decision to publish the article violated the Espionage Act.

2006.06.26

New Ubuntu and VMware Server install

Filed under: Software — Preston @ 9:33 pm

So … a machine dies in Virginia – the one hosting a wiki and a Notes server for our development group at work. Bit of irony here – seems it is the RAID card that died. The hardware is an old desktop box with a collection of 9GB disks in a RAID 5 configuration. Probably not worthwhile to re-build the machine.

Had a Windows XP box in the local office that was not seeing much use – it was used to host installs of software prototypes, and for testing with different databases (Oracle and Microsoft SQL Server), among other things. Had wanted to put VMware on the box eventually, to support testing with multiple server configurations (for folk in the office – I have my own test box here at home). Given the dead box in Virginia, seemed hosting the Notes server and wiki on this box was the best alternative.

Downloaded and burnt a CD with the Ubuntu 6.06 “desktop” install. Fired up the install and … everything just worked. Simpler and slicker than a Windows installation.

Was not sure, but thought that the free VMware Server might be appropriate. Downloaded the VMware Server installation files and documents (not real clear what was required – so grabbed everything). Hint: The installation instructions are in the “Server Administration Manual”. Once the install instructions were found, installation was very straightforward.

Copied over some pre-configured virtual machines from my home test box (using a paid copy of VMware Workstation) – clean/base installs of Windows NT Server, Windows 2000 Server, and Windows 2003 Server (I have a developer’s license, so this is all legit). Fired up the VMware Server console on the new Ubuntu box – and after a bit of housekeeping, the various Windows instances were ready to go.

Now for the very cool part – installed the “VMware Server Console” on my Windows box, and I can now start, control, and interact with the VMs running on the Ubuntu box – as though they were running locally! OK – if you’ve played with VMware server products before, this is probably old hat. I had not, so this was a pleasant surprise.

The free version of VMware Server is lacking the “clone” operation in VMware Workstation (fair enough). Copying an entire VM image is a bit of a pain (copy the folder containing a rather large amount of data), but do-able. Odds are the VM hosting the Notes server and wiki will eventually be transferred to a machine run by the IT folks, running a full version of VMware (once they procure more hardware).

Setting this all up was a lot easier than I had expected. Something amusing about using Linux to make Windows installations easier to use. :)

Say what?

Filed under: Web — Preston @ 9:32 pm

Doubtless I have been guilty of just such an obscurity…

Jon Udell: Say what?
“By syndicating metadata, I’m inviting others to more richly contextualize their aggregations of our stuff.”

Sounds like something you might say after a reading of Vogon poetry. Come to think of it, I am not really sure what the heck the above phrase is supposed to mean. Of course, I can pretend with the best…

A long time ago I ran across a notion, something like – “Anything you understand well you can explain to a child.” – or the converse – “If you cannot clearly explain an idea to a child, you probably do not fully understand the subject yourself.” Kind of a sanity check on your level of understanding. :)

I am sure that the notion Jon is trying to express would be a lot clearer – to everyone (including Jon) – if expressed more simply.

2006.06.23

Your data isn’t yours, with AT&T

Filed under: Politics, Web — Preston @ 8:42 am

AT&T has rewritten their privacy policy so in essence “privacy” is gone.

AT&T rewrites rules: Your data isn’t yours
AT&T has issued an updated privacy policy that takes effect Friday. The changes are significant because they appear to give the telecom giant more latitude when it comes to sharing customers’ personal data with government officials.

The new policy says that AT&T — not customers — owns customers’ confidential info and can use it “to protect its legitimate business interests, safeguard others, or respond to legal process.”

The policy also indicates that AT&T will track the viewing habits of customers of its new video service — something that cable and satellite providers are prohibited from doing.

Moreover, AT&T (formerly known as SBC) is requiring customers to agree to its updated privacy policy as a condition for service — a new move that legal experts say will reduce customers’ recourse for any future data sharing with government authorities or others.

The company’s policy overhaul follows recent reports that AT&T was one of several leading telecom providers that allowed the National Security Agency warrantless access to its voice and data networks as part of the Bush administration’s war on terror.

“They’re obviously trying to avoid a hornet’s nest of consumer-protection lawsuits,” said Chris Hoofnagle, a San Francisco privacy consultant and former senior counsel at the Electronic Privacy Information Center.

To a degree, I cannot blame AT&T for this move.

AT&T is also believed to have participated in President Bush’s acknowledged domestic spying program, in which the NSA was given warrantless access to U.S. citizens’ phone calls.

But the company also asserted that it has “an obligation to assist law enforcement and other government agencies responsible for protecting the public welfare, whether it be an individual or the security interests of the entire nation.”

Government spying programs expose AT&T to potentially massive liability from customers. To protect themselves they probably had no choice but to change their written public policy.

The new version, which is specifically for Internet and video customers, is much more explicit about the company’s right to cooperate with government agencies in any security-related matters — and AT&T’s belief that customers’ data belongs to the company, not customers.

My choice is to not be a customer of AT&T, as much as possible. I only hope Cox (my internet, cable, and phone service) will not cave in similar fashion.

2006.06.15

Kind of annoying, actually

Filed under: Software — Preston @ 10:33 pm

About a decade ago I was working on an application that used strings quite a lot – enough to be a major factor in performance. At the time I came up with a simple, lightweight C++ string class that took full advantage of the most optimal code – at the time – and benchmarked as radically faster than either the Microsoft or standard C++ string classes. I have re-used the same string class (which is simple enough to write from memory) in just about every C++ application since.

A while back I documented a series of exercises to check whether the string class code was still optimal on current compilers and CPUs. Quite a lot has changed in ten-odd years – so possibly what was once optimal might not be now. Some interesting observations came out of the exercise:

  1. The specialized x86 instructions for string operations are now slower than performing the same operation with generic x86 instructions.
  2. Out-of-line calls to the library string functions generally equal or beat simple inlined code.
  3. The default optimization settings for Microsoft C++ compiler enable the use of x86 string ops – which is not optimal for current generation CPUs.
  4. My simple string class does indeed still radically outperform the standard C++ string class.

The aim of the exercise was to check my assumptions against current generation compilers and hardware. The end goal was to determine the best performing string class for use in my applications. The upshot was I added a single #pragma to the string class implementation, and performance is once again optimal for current generation hardware.

I do indeed have a better string class – by measured performance.

For some reason I am still getting a trickle of hits from this article, where the article – and the comments – are just plain wrong. Kind of annoying, this.

Freedom of expression

Filed under: Politics — Preston @ 8:52 pm

Oddly Enough News Article | Reuters.com
A former Nazi officer living on a Wisconsin farm has built a memorial to Adolf Hitler in his bunker-like tractor storage shed, much to the chagrin of local officials.

A court order was being sought to stop Ted Junker, 87, from opening to visitors a 30- by 50-foot (9- by 15-meter) concrete building that he has spent $200,000 outfitting with photos of Hitler, a swastika-emblazoned flag and other items.

Whether or not you agree with this guy’s point of view, he should have the right to express himself. Putting up a display inside a building on his farm – seems pretty harmless. Or about as harmless as free speech ever needs to be.

Breakage – or misapplication?

Filed under: Software — Preston @ 11:17 am

So … there’s this thread across weblogs discussing the “breakage” of binary search when the search algorithm uses an array, and the index into the array approaches the maximum value for a 32-bit signed integer … and I am having trouble classifying this as a bug.
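For reference, the breakage is usually pinned on the midpoint calculation. A minimal sketch, assuming the search computes the midpoint as (low + high) / 2 using 32-bit signed ints, as Java would. Python integers never overflow, so `to_int32` is a hypothetical helper that simulates the wraparound, and `naive_mid` and `safe_mid` are illustrative names, not anything from the thread:

```python
def to_int32(value):
    """Wrap a Python int into the signed 32-bit range (hypothetical helper)."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value

def naive_mid(low, high):
    """Midpoint as (low + high) / 2, with the sum wrapped to 32 bits first."""
    total = to_int32(low + high)
    # Java integer division truncates toward zero; int() on the quotient does too.
    return int(total / 2)

def safe_mid(low, high):
    """Midpoint computed without ever forming low + high."""
    return low + (high - low) // 2

# The sum only wraps once low + high exceeds 2**31 - 1, which needs an
# array of more than 2**30 (about a billion) elements.
low, high = 2**30, 2**30 + 2
print(naive_mid(low, high))  # negative index
print(safe_mid(low, high))   # 2**30 + 1
```

Note the wraparound needs indexes beyond 2^30, which is the size range the rest of this post is concerned with.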

Let’s talk about sizes a bit. Programs on 32-bit machines can address at most 4GB of memory – with most desktop machines limited to less. Given that the smallest possible meaningful object is an int (which requires 4 bytes), no 32-bit program could create an array over ~1 billion items – which makes the “bug” found impossible.

Of course, doing a binary search of anything close to a billion integers is … dubious. Other algorithms could yield much better results.

  1. For integers, address-based lookup on a bitmap would be O(1) and require at most 0.5GB storage. (There is also a huge saving in the initial sort, and any re-sort on changes.) On pure performance this is the winner. Just on the basis of storage this is the better solution (over an array) when the number of integers approaches 128 million.
  2. For smaller (but still large) numbers of integers, you could “snap” the value – use half for address-based lookup (replacing up to 16 comparisons), and half for binary search (over at most 64K values). Storage needed is a few times 256KB (pointers plus heap overhead) plus 2 bytes per value – half the storage needed for a single array. Performance is less than (1), but less storage is needed if the number of values is less than ~250 million.
  3. For even smaller numbers of integers, I might be tempted to “snap” out the low byte for use in address-based lookup into 256 buckets, and binary search an array of integers in each bucket. Storage is slightly more than a single array, but we’ve saved ~8 comparisons on lookup. For a long-lived application where the set of values to be searched may change over time, we are going to save some time when maintaining the sort order on smaller buckets. We might want to benchmark to discover the cross-over points in performance.
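A rough sketch of options (1) and (3) above, assuming non-negative integer values. The class names `Bitmap` and `LowByteBuckets` are made up for illustration, and the bitmap takes its universe size as a parameter so the demo does not allocate the full 0.5GB:

```python
from bisect import bisect_left

class Bitmap:
    """Option (1): one bit per possible value, so lookup is index-and-mask."""

    def __init__(self, universe_bits):
        # 2**universe_bits possible values, 8 values per byte.
        self.data = bytearray((1 << universe_bits) >> 3)

    def add(self, value):
        self.data[value >> 3] |= 1 << (value & 7)

    def __contains__(self, value):
        return bool(self.data[value >> 3] & (1 << (value & 7)))

class LowByteBuckets:
    """Option (3): pick one of 256 buckets by the low byte, then binary search it."""

    def __init__(self, values):
        self.buckets = [[] for _ in range(256)]
        # Inserting in sorted order leaves every bucket sorted.
        for v in sorted(set(values)):
            self.buckets[v & 0xFF].append(v)

    def __contains__(self, value):
        bucket = self.buckets[value & 0xFF]
        i = bisect_left(bucket, value)
        return i < len(bucket) and bucket[i] == value

bits = Bitmap(16)
bits.add(12345)
print(12345 in bits, 12346 in bits)

buckets = LowByteBuckets([5, 261, 70000])
print(261 in buckets, 517 in buckets)

# Full 32-bit universe: 2**32 bits is 2**29 bytes, the 0.5GB quoted in (1).
print((2**32 // 8) / 2**30)
```

Where the cross-over points actually fall would still need benchmarking on real data.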

In most applications what you are searching over is objects, not integers. Objects are referenced through pointers, contain more than an int, and probably cost a couple of words of heap overhead. So in the usual case you are limited to ~250 million objects (or much less).
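The size limits can be checked with back-of-the-envelope arithmetic. This is a best-case sketch that assumes the whole 4GB address space is available for data, 4-byte pointers, and reads "a couple words" of heap overhead as two 4-byte words; real programs get less:

```python
ADDRESS_SPACE = 2**32        # bytes addressable by a 32-bit program
INT_SIZE = 4                 # bytes per int
POINTER_SIZE = 4             # assumption: 32-bit pointers
HEAP_OVERHEAD = 2 * 4        # assumption: two 4-byte words per object

# Plain array of ints filling the entire address space.
max_ints = ADDRESS_SPACE // INT_SIZE

# Array of pointers to minimal objects holding a single int.
bytes_per_object = POINTER_SIZE + INT_SIZE + HEAP_OVERHEAD
max_objects = ADDRESS_SPACE // bytes_per_object

print(max_ints)     # 1073741824, the "~1 billion" ceiling
print(max_objects)  # 268435456, roughly the "~250 million" figure
```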

Now with 64-bit machines and a matching 64-bit JVM we have the possibility of Java programs that could address more than 4GB of memory. We are talking about server-class machines. (The number of desktops configured with more than 4GB is statistically insignificant – much smaller than the number of crooked politicians). On a server-class machine if you are going to the trouble of loading billions of objects into memory, odds are you are going to do a lot of searches, and performance matters.

While binary search scales well, with billions of objects some form of hashing and address-based lookup is certain to work very well and chop a huge chunk off the runtime – and the time saved is likely critical to the application. When time is critical, and the data and the number of repetitions are large, you also need to think about the impact of sparse data on the processor cache. If the frequently-accessed data can fit within the processor cache, odds are you gain another healthy boost in performance.

With long-lived applications you also need to be concerned with the performance cost of maintaining your data structures when the values searched change over time. A large single array is almost certainly an expensive choice.

Taking the above into account, my suggested “fix” for binarySearch would be something like:

    IF count of objects remotely close to 1 billion
    THEN throw AbuseOfAlgorithmException() AND terminate programmer.

Seems to me the “bug” is more with the programmer’s choices than the algorithm.

2006.06.12

REST notes

Filed under: Software, Web — Preston @ 2:56 pm

REST Web Services
Create a URL to each resource. The resources should be nouns, not verbs. For example, do not use this:

    http://www.parts-depot.com/parts/getPart?id=00345

Note the verb, getPart. Instead, use a noun:

    http://www.parts-depot.com/parts/00345

To this I would add the modification, use instead:

    http://www.parts-depot.com/parts?id=00345

In the KISS spirit this does the right thing most of the time without further elaboration. Typically responses to URLs containing “?” query strings are not cached. This is the right default as applications typically cannot return a meaningful “last-modified” time without extra work. So no extra work always returns the correct (uncached) response – by default.
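Either form names the same resource, and the query-string form is no harder to pick apart on the server side. A small sketch using Python's standard `urllib.parse`; the function name `resource_and_id` is made up for illustration:

```python
from urllib.parse import urlsplit, parse_qs

def resource_and_id(url):
    """Return (collection path, id) for either URL style."""
    parts = urlsplit(url)
    ids = parse_qs(parts.query).get("id")
    if ids:
        # Query-string style: /parts?id=00345
        return parts.path, ids[0]
    # Path style: /parts/00345
    collection, _, last = parts.path.rpartition("/")
    return collection, last

print(resource_and_id("http://www.parts-depot.com/parts?id=00345"))
print(resource_and_id("http://www.parts-depot.com/parts/00345"))
```

Both calls give back the same pair, so the choice between the two styles comes down to the default caching behavior.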

Of course, if you do want to put in the extra work, and it makes sense for your application, you could add the modification time to the response header. The point is that the simplest form is always correct if you use a query string, and optimization is still possible.
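As a sketch of that extra work, assuming the application can supply a modification time as seconds since the epoch: `formatdate` from the standard `email.utils` module produces the HTTP date format. The function name `cache_headers` is hypothetical, and a real handler would set more headers than this:

```python
from email.utils import formatdate

def cache_headers(modified_epoch=None):
    """Build response headers; Last-Modified only when a time is known."""
    headers = {}
    if modified_epoch is not None:
        # usegmt=True gives the "GMT" suffix that HTTP dates use.
        headers["Last-Modified"] = formatdate(modified_epoch, usegmt=True)
    return headers

print(cache_headers())    # default: no caching hints, always correct
print(cache_headers(0))   # opt-in: a Last-Modified header for the epoch
```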
