Preston L. Bannister { random memes }

2007.02.26

Why buy new server versions of Windows?

Filed under: Software — Preston @ 9:36 pm

Just as businesses are starting to discover the benefits of virtualization, Microsoft is changing the license terms for new versions of Windows.

VMTN Blog
The goal from Microsoft seems to be to slow down the market and downplay features they can’t match, so that they have a chance to catch up.

In fact, this is in direct conflict with the needs of their customers. The move by Microsoft is not at all surprising. Before virtualization, businesses had to choose one operating system to install on each server, and there was often one application that forced that operating system to be Windows. Once you installed Windows on your server, you would likely end up buying other applications that ran on Windows – and Microsoft had a pretty strong lock on those customers. With virtualization a business can run exactly as many different operating systems as it needs, and new applications need not run on a Microsoft operating system. This is a threat to Microsoft’s lock on the market – so naturally they want to prevent the use of virtualization (except on their own terms).

Nothing surprising here.

The kicker is – if I were a business making use of virtualization, I would be looking at two things right now:

  • Moving services off Microsoft’s operating systems where practical.
  • Buying up old licenses for Windows 2000 (and 2003?) Server that did not limit use of virtualization.

Right off the bat, I bet we could free up some licenses by moving file and directory servers from Windows Servers to Samba on Linux (or Solaris). The freed licenses could be used to host any legacy Windows Server applications. In the mid-term I would be looking at moving any databases hosted on Windows Server machines onto one of the many non-Windows alternatives. Certainly if I were a forward-looking business, I would not be fond of Microsoft limiting my choices while cranking up costs.

With Samba, moving file and directory services off Microsoft servers is dead easy. Moving databases off Microsoft servers is more difficult (depending on the application), but given a bit of lead time, often quite feasible.

This is a dangerous gambit for Microsoft. By restricting the choices available to businesses currently using Windows Servers, they run the risk of pushing customers off Windows more quickly.

2007.02.24

Structured documents, DITA, DocBook, and Wikis

Filed under: Software, Web — Preston @ 7:44 pm

Dived into reading about DocBook and DITA this week.

This all started with my long-standing annoyance with MS Word (or OpenOffice at present) when editing documents with structure. Back in the early 1990s, I was at a small outfit writing both software and documentation. The documentation was somewhat long (hundreds of pages), and illustrated well how poorly MS Word was suited to longer documents. Want to maintain a consistent format through the document? You need to learn to use customized styles. Need to make a consistent change through the document? You need to eliminate any deviations from your customized set of styles. Possible … but very tedious.

Skip forward to the more recent task of writing design documents. Style is an annoyance, as the master “template” documents (done by someone else) are a bit of a hash. My other long-standing annoyance is that documents of this sort tend to be throw-away. Extracting information from design documents is always done manually. The documents have structure, but the structure is not in the document in any easily programmatically accessible form. It seems the lack of a consistent, manageable style and the lack of an explicit, accessible structure are simply two sides of the same problem.

You could encode the structure of your documents as XML. You could then apply transforms and stylesheets to get the appearance you want. Certainly possible … but rather than start from scratch, I mucked around looking for what other folks had done. No point in re-inventing the wheel, after all.
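To make the idea concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element names (`design`, `section`, `para`) are made up for illustration and are not any standard schema; in practice XSLT would be the usual tool for the transform. The point is that one source yields both a presentation (HTML) and programmatically extractable structure (the list of section titles).

```python
import xml.etree.ElementTree as ET

# Hypothetical design-document markup; element names are illustrative only.
SOURCE = """
<design title="Cache layer">
  <section title="Goals">
    <para>Reduce database load.</para>
  </section>
  <section title="Risks">
    <para>Stale reads.</para>
  </section>
</design>
"""

def to_html(design: ET.Element) -> ET.Element:
    """Map the logical structure onto presentational HTML elements."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = design.get("title")
    for section in design.iter("section"):
        ET.SubElement(body, "h2").text = section.get("title")
        for para in section.iter("para"):
            ET.SubElement(body, "p").text = para.text
    return html

def section_titles(design: ET.Element) -> list[str]:
    """The same structure is also easy to query programmatically."""
    return [s.get("title") for s in design.iter("section")]

if __name__ == "__main__":
    doc = ET.fromstring(SOURCE)
    print(ET.tostring(to_html(doc), encoding="unicode"))
    print(section_titles(doc))
```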

Found DocBook. DocBook has been around a fairly long time, and has much supporting material (books, articles, and software support). On the other hand, seems to be oriented rather heavily towards books (not entirely appropriate in my context), and to be rather cumbersome to learn and use. Might be convinced to make the effort, but would I want to inflict this on all the other developers in the company? Nope, not really.

Found DITA. As a more recent product, DITA seems to have learned from DocBook. DITA is heavily used/supported by some folk at IBM. Might be a bit lighter to pick up and learn than DocBook. There is an open source DITA toolkit, though rather cumbersome to learn and use. Would I want to inflict this on other developers? Nope. Still too “heavy”.

Had briefly hoped that OpenOffice (with its basis in XML) might have some sort of direct support. There is some support for DocBook that kind’a/almost works. So much for the notion of generating structured documents from a word processor.

Took another look at Apache Cocoon and Lenya. Seems like a CMS that processes structured documents as XML might fit. But Lenya seems to be a dying project, and as such a dubious bet.

So after burning much of the week, came up pretty much empty-handed. After chewing on the question for a while, my opinion is that adopting a “heavy” framework (like DocBook or DITA) would be a mistake for occasional use in creating documents with lightweight structure. Better to use something lightweight to begin with – like a wiki – with support for exporting documents as XML with transformable structure.

But what to use as a base?
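As a rough illustration of the wiki-plus-XML-export idea, here is a sketch that converts a tiny, made-up wiki dialect (lines starting with `=` open a section, `==` a subsection, anything else is a paragraph) into nested XML. The dialect and the element names are assumptions for illustration, not the syntax of any particular wiki engine.

```python
import xml.etree.ElementTree as ET

def wiki_to_xml(text: str) -> ET.Element:
    """Convert a tiny, hypothetical wiki dialect to structured XML.

    '= Title' opens a top-level section, '== Title' a subsection,
    and any other non-blank line becomes a paragraph in the current section.
    """
    root = ET.Element("document")
    stack = [(0, root)]  # (heading level, element) pairs, innermost last
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("="):
            level = len(line) - len(line.lstrip("="))
            title = line[level:].strip()
            # Close any open sections at the same or a deeper level.
            while stack[-1][0] >= level:
                stack.pop()
            section = ET.SubElement(stack[-1][1], "section", title=title)
            stack.append((level, section))
        else:
            ET.SubElement(stack[-1][1], "para").text = line
    return root

if __name__ == "__main__":
    src = "= Overview\nIntro text.\n== Details\nMore.\n= Risks\nNone yet."
    print(ET.tostring(wiki_to_xml(src), encoding="unicode"))
```

Writers only see the light markup; the XML (and whatever transforms apply to it) stays behind the scenes.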

2007.02.23

OpenID is a big win

Filed under: Software, Web — Preston @ 8:06 pm

So if you are a developer, the notion of single sign-on via OpenID sounds like a good idea, but since it lacks authentication, you might not be entirely convinced. Well, this story is going to sound familiar.

I wanted to check my plan with Verizon. Not sure how many minutes I have every month, mostly because I never go over. Tried to log in to the Verizon website, and (of course) I could not remember my username on their site. Tried the most likely … nope. Tried the second most likely … nope. OK, at this point I do not know if I got the wrong username, or the wrong password. A few more random tries, and they might lock my account.

Perhaps this sounds familiar? Lots of sites, each wanting a unique login. On some, your preferred username is disallowed or already taken. On some, your preferred password(s) are disallowed. Many you do not visit very often, so you are not going to remember your username/password the next time you want to log in.

Yep. I would really prefer that the Verizon site was OpenID enabled.

2007.02.22

“Groundswell” means what?

Filed under: Politics — Preston @ 5:30 pm

Admittedly I am no expert on paid political organizations, but I have to wonder: What is their motivation?

Lately the Cato folk(s) are writing about differing tax rates for corporations and wealthy individuals, as though there was some sort of competition between countries. Maybe there is – but I tend to suspect this equation has more than one variable. From the latest missive:

Cato-at-liberty » Tax Cuts North of the Border
The rest of the world is responding to tax competition, and the high corporate tax rate in the US is becoming an ever-larger problem for American companies in the global marketplace. Unfortunately, there is no groundswell — or even idle gossip — for a reduction in America’s punitive corporate tax.

Is “the rest of the world” really “responding to tax competition”? What exactly is a “groundswell” for reducing corporate taxes? Sounds like wishful thinking. Maybe if they repeat the idea enough times, perhaps they can convince … someone. A variant on “if I say it three times, it must be true”?

Somehow I rather doubt there are many folks concerned about cross-border differences in corporate tax rates. Kind’a completely misses my notion of what might be called a “groundswell”.

Is this in fact some sort of paid campaign? Could it be that some person or entity is angling for a tax break, in this rather indirect fashion?

2007.02.21

Frustrated with WYSIWYG document editors

Filed under: Software — Preston @ 10:32 am

Much recent time has gone to writing design documents (fun). As usual, this is sucking up more time than I would like, in part because I find the available tools so poorly suited.

Let's set some requirements. (Oh goody, more requirements to write down…)

  • Documents should be readable by anyone on the distribution list – i.e. no specialized reader software. Everyone has a web browser, Adobe Reader, and MS Word (or in my case, OpenOffice) installed.
  • Document templates should be in a form usable by all development folk within the company. That pretty much rules out TeX or the like, and any software with a stiff per-seat price.
  • It would be nice if the documents could be displayed from the web.

Used Google’s on-line document editor to write a “rationale” document – essentially a somewhat free-form story describing what we have done, what we are doing, and what we are going to do. For this purpose docs.google.com is pretty nice. Export to MS Word *.doc format is decent. Export to HTML is … the visual result is pretty good, but the generated HTML is a little messy. Good enough for this task.

For the documents with a bit more structure, there isn’t anything very satisfactory. Given the somewhat well defined structure, seems we would be best off with some sort of structured document format. XML seems like a logical choice, in that we could capture the logical structure of the document. Editing and presentation are problems. Presentation less so, as a transform to HTML is probably no more than tedious. Editing is not so easy. How do you edit an XML document from a template without distracting the writer? Mundane tasks – like embedding an image – are trouble (you don’t really want the image in a separate file).
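One workaround for keeping an image inside a single XML file is to store the bytes inline as base64 text. The sketch below assumes a made-up `image` element with `type` and `encoding` attributes. It works, but base64 inflates the data by roughly a third and leaves a large opaque blob in the middle of the document, which is exactly the sort of thing that gets in a writer's way.

```python
import base64
import xml.etree.ElementTree as ET

def embed_image(parent: ET.Element, data: bytes, mime: str = "image/png") -> ET.Element:
    """Store image bytes inline as base64 text in a hypothetical 'image' element."""
    img = ET.SubElement(parent, "image", type=mime, encoding="base64")
    img.text = base64.b64encode(data).decode("ascii")
    return img

def extract_image(img: ET.Element) -> bytes:
    """Recover the original bytes from an embedded image element."""
    return base64.b64decode(img.text)

if __name__ == "__main__":
    section = ET.Element("section")
    embed_image(section, b"\x89PNG\r\n\x1a\n not a real image")
    print(ET.tostring(section, encoding="unicode"))
```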

We have MS Word documents as templates for the structured documents. I find it incredibly annoying every time I have to work with one of the templates, as these are not good examples even of MS Word documents. I could generate cleaned-up documents to replace the existing templates, but … I do not want to expend the energy needed to get updated templates adopted by the company when I really want something better.

I suspect that OpenOffice could be adopted as a base for a structured document editor. The model of bundled XML and associated files in a ZIP seems an inherently extensible base. If I had a few months of free time (which I do not), perhaps OpenOffice could be adapted to the purpose.
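The ZIP-plus-XML model is easy to poke at from outside the editor. A sketch, assuming an ODF text file (`.odt`) whose body is in `content.xml` and whose paragraphs and headings carry a `text:style-name` attribute: count which named styles the document actually uses. The in-memory sample is a stripped-down stand-in, not a complete ODF file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace URI used by ODF for paragraph and heading elements.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

def paragraph_styles(odt) -> Counter:
    """Count the named styles used by paragraphs and headings in an .odt file.

    `odt` may be a path or a binary file-like object; the document body
    of an ODF text file lives in content.xml inside the ZIP archive.
    """
    with zipfile.ZipFile(odt) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    counts = Counter()
    for tag in ("p", "h"):
        for elem in root.iter(f"{{{TEXT_NS}}}{tag}"):
            counts[elem.get(f"{{{TEXT_NS}}}style-name")] += 1
    return counts

def sample_odt() -> io.BytesIO:
    """Build a tiny in-memory stand-in for an .odt (not a complete ODF file)."""
    content = (
        '<office:document-content'
        ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
        f' xmlns:text="{TEXT_NS}">'
        '<office:body><office:text>'
        '<text:h text:style-name="Heading_20_1">Goals</text:h>'
        '<text:p text:style-name="Text_20_body">One.</text:p>'
        '<text:p text:style-name="Text_20_body">Two.</text:p>'
        '</office:text></office:body></office:document-content>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        archive.writestr("content.xml", content)
    buf.seek(0)
    return buf

if __name__ == "__main__":
    print(paragraph_styles(sample_odt()))
```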

Oddly enough, I find myself missing Interleaf. Many years ago FileNet had a (somewhat unreliable) port of Interleaf on their machines (we are talking about the mid/late 1980s). Ignoring its many failings, Interleaf had one strength: you were more aware of, and had more control over, the styles used in the document. As I remember (and this was long ago), the named style for each composite document element was shown in a gutter on the left(?) side of the screen. Changes to document format could be effected by editing the stylesheet associated with the named style.

MS Word has something similar in that there is an option (unknown to most MS Word users) to display associated styles in a gutter. MS Word is somewhat clumsy in this sort of use, as accidental or ad hoc local modifications to a common style are hard to detect. Changing the format of a common element is often only partly successful, in that you have to visually scan the entire document for elements that did not format as intended, then do a bunch of tedious poking around through dialogs trying to find the problem. The pursuit of easy WYSIWYG has made MS Word difficult to use if you want to maintain a consistent document format and structure.

Unfortunately OpenOffice – the GUI in particular – is slightly worse in this regard. The “Styles and Formatting” panel is far inferior to displaying named styles outside the page margin. Worse, for some reason(?) the named style shown in the panel is only sometimes the style of the current document element (no idea why this is). Also, as with MS Word, there is no easy way to detect a locally modified style.
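The file format may be more helpful than the GUI here. In ODF, direct formatting applied to a paragraph is generally stored as an *automatic* style (names like `P1`) in `office:automatic-styles`, with `style:parent-style-name` pointing at the named style it modifies. Assuming that holds for the documents at hand, a sketch like the following lists paragraphs that have drifted from their named style. Not every automatic style is a deliberate formatting tweak, so treat the output as candidates to inspect.

```python
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}

def qname(prefix: str, local: str) -> str:
    """Expand prefix:local into ElementTree's {namespace-uri}local form."""
    return f"{{{NS[prefix]}}}{local}"

def local_overrides(content_xml: str) -> list[tuple[str, str, str]]:
    """Report paragraphs whose style is an automatic (locally modified) one.

    Returns (automatic style, parent named style, paragraph text) tuples.
    """
    root = ET.fromstring(content_xml)
    parents = {}
    for st in root.findall("office:automatic-styles/style:style", NS):
        parents[st.get(qname("style", "name"))] = st.get(qname("style", "parent-style-name"))
    found = []
    for para in root.iter(qname("text", "p")):
        name = para.get(qname("text", "style-name"))
        if name in parents:
            found.append((name, parents[name], "".join(para.itertext())))
    return found

# A hand-written fragment of content.xml: one clean paragraph, one bolded locally.
SAMPLE = (
    '<office:document-content'
    ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    ' xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"'
    ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"'
    ' xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0">'
    '<office:automatic-styles>'
    '<style:style style:name="P1" style:family="paragraph"'
    ' style:parent-style-name="Text_20_body">'
    '<style:text-properties fo:font-weight="bold"/></style:style>'
    '</office:automatic-styles>'
    '<office:body><office:text>'
    '<text:p text:style-name="Text_20_body">Clean.</text:p>'
    '<text:p text:style-name="P1">Tweaked.</text:p>'
    '</office:text></office:body></office:document-content>'
)

if __name__ == "__main__":
    print(local_overrides(SAMPLE))
```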

What I’d really like to do is take a year off, and write a web-based structured document editor, but … oh well.

2007.02.17

Introducing RDFa – or re-implementing Lisp, kinda

Filed under: Software — Preston @ 6:08 pm

Another almost-reinvention of Lisp, just messier, in XML.com: Introducing RDFa.

You could represent HTML as simple Lisp expressions – for example:

<a href="http://some/where" onclick="foo()">some content</a>

maps simply to the equivalent:

(a ((href "http://some/where") (onclick (foo))) "some content")
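The mapping is mechanical enough to sketch with Python's standard `html.parser`. One difference from the hand-written version above: attribute values stay as strings, so `onclick` comes out as `"foo()"` rather than `(foo)`. The short list of void elements is an assumption covering only a few common tags, and badly nested markup is not repaired.

```python
from html.parser import HTMLParser

# A few HTML elements that never take a closing tag (not an exhaustive list).
VOID = {"br", "hr", "img", "input", "meta", "link"}

class SexpBuilder(HTMLParser):
    """Build nested lists of the form [tag, [[attr, value], ...], child, ...]."""

    def __init__(self):
        super().__init__()
        self.root = ["top", []]
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = [tag, [[k, "" if v is None else v] for k, v in attrs]]
        self.stack[-1].append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Only close the element that is actually open.
        if len(self.stack) > 1 and self.stack[-1][0] == tag:
            self.stack.pop()

    def handle_data(self, data):
        self.stack[-1].append(data)

def render(node) -> str:
    """Print a node in Lisp notation; text becomes a quoted string atom."""
    if isinstance(node, str):
        return '"' + node.replace('"', '\\"') + '"'
    tag, attrs, *children = node
    attr_text = " ".join(f"({k} {render(v)})" for k, v in attrs)
    parts = [tag, f"({attr_text})"] + [render(c) for c in children]
    return "(" + " ".join(parts) + ")"

def html_to_sexp(html: str) -> str:
    builder = SexpBuilder()
    builder.feed(html)
    builder.close()
    return " ".join(render(n) for n in builder.root[2:])

if __name__ == "__main__":
    print(html_to_sexp('<a href="http://some/where" onclick="foo()">some content</a>'))
```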

You can then view the web browser as a sort of specialized interpreter with built-in default behaviors for ‘a’ and ‘href’ nodes. CSS lets you redefine some of those default behaviors, kinda (the semantics are less than complete). You can attach additional behaviors using Javascript, kinda (again, the semantics are incomplete).

If you are well acquainted with Lisp, the programming model increasingly exposed by new uses of HTML, CSS, and Javascript (as offered in RDFa) is rather old and familiar. As a markup format for documents, HTML is arguably better suited than Lisp notation. As an underlying data model, the combined usage seems to be stumbling steadily closer to Lisp.

Don’t know what to do about this, but it is amusing, kinda.

Benchmarking and FastCGI

Filed under: Software — Preston @ 5:57 pm

An interesting, but not especially useful benchmark.

Quick Django Benching :: SuperJared.com
Today I did an unscientific benchmark to see which was quickest between Apache’s mod_python, Lighttpd’s FastCGI and Nginx’s FastCGI. The Django application was a basic “Hello world” application that used one variable in one template.

All of the web server configurations were stripped to their minimums. I’m not going to pretend that this was a thorough benchmark, so these results might be a bit off, but they’re useful nonetheless.

Not sure that this benchmark tells us anything interesting. To get a better notion of performance characteristics, there are a few dimensions we would want to explore: performance, ease of development, and ease of scaling. Note that the following applies equally to FastCGI, SCGI, and reverse proxies.

(Apologies to the folk that have been down this path before. You will find nothing new in what follows.)

What do we care about?

First, performance has to be an issue. For small deployments where a single ordinary server is sufficient, other concerns (ease of deployment, portability, whatever) are going to dominate. As demand increases, you get interested in how much performance you can get out of a single machine. The above-referenced benchmark tells you something about the single machine scenario (not much – but something). As demand really increases you are interested in the performance and scaling across a growing group of machines.

Say you start with a single-server configuration. Let’s talk about something like the classic web application scale-out.

As demand increases, you might want to think about partitioning your application. Static content (HTML, CSS, images) is better served by the front-end HTTP service. The single-thread/async-IO HTTP services (lighttpd, thttpd) tend to beat out Apache and other general-purpose HTTP services, while putting less load on the server(s). If your application is heavy on static content (say, a photo-sharing site), then Apache is at a heavy disadvantage.

So you partition your application into static and dynamic content. Static content is optimally served by lighttpd (or the like). Dynamic content is served over FastCGI by your application code (Ruby, Python, PHP, Java, or whatever). Performance serving static content is now optimal.
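What actually travels between the front-end and the application code is compact. The sketch below follows the FastCGI 1.0 record layout (an 8-byte header of version, type, request id, content length, and padding length, plus the length-prefixed name-value encoding for parameters) to build the byte stream a front-end would send for one request. It only constructs bytes (no sockets), and it omits splitting content larger than 65535 bytes across several records; the function names are my own.

```python
import struct

FCGI_VERSION_1 = 1
FCGI_BEGIN_REQUEST = 1
FCGI_PARAMS = 4
FCGI_STDIN = 5
FCGI_RESPONDER = 1

def record(rec_type: int, request_id: int, content: bytes) -> bytes:
    """Frame content in an 8-byte FastCGI record header, padded to 8 bytes."""
    padding = -len(content) % 8
    header = struct.pack(">BBHHBx", FCGI_VERSION_1, rec_type,
                         request_id, len(content), padding)
    return header + content + b"\x00" * padding

def encode_length(n: int) -> bytes:
    """Lengths under 128 take one byte; longer ones take four, high bit set."""
    if n < 128:
        return bytes([n])
    return struct.pack(">I", n | 0x80000000)

def encode_params(params: dict[str, str]) -> bytes:
    """Encode CGI-style variables as FastCGI name-value pairs."""
    out = b""
    for name, value in params.items():
        n, v = name.encode(), value.encode()
        out += encode_length(len(n)) + encode_length(len(v)) + n + v
    return out

def request_records(request_id: int, params: dict[str, str], body: bytes = b"") -> bytes:
    """The records a front-end sends to start one responder request."""
    begin = struct.pack(">HB5x", FCGI_RESPONDER, 0)  # role, flags (no keep-alive)
    stream = record(FCGI_BEGIN_REQUEST, request_id, begin)
    stream += record(FCGI_PARAMS, request_id, encode_params(params))
    stream += record(FCGI_PARAMS, request_id, b"")  # empty record ends PARAMS
    if body:
        stream += record(FCGI_STDIN, request_id, body)
    stream += record(FCGI_STDIN, request_id, b"")   # empty record ends STDIN
    return stream

if __name__ == "__main__":
    data = request_records(1, {"REQUEST_METHOD": "GET", "SCRIPT_NAME": "/hello"})
    print(len(data), data[:16].hex())
```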

As demand increases further, you find a single server box is no longer adequate. At this point you buy another box, and move your application code to the new box. Note that users do not see any change – they are still hitting the same front-end HTTP server. FastCGI connects the old front-end HTTP server to the new back-end application server. Note that your front-end and back-end boxes can be configured differently, as needed by the different sorts of load (static file serving versus dynamic content generation). Note that your front-end box should be able to easily saturate a 100 Mbit/s network link on static content (so you have a lot of headroom).
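Some back-of-the-envelope arithmetic on that link: 100 Mbit/s is 12.5 MB/s, so with an assumed average response of 20,000 bytes the link tops out around 625 responses per second before any protocol overhead. The average size and the overhead fraction are illustrative assumptions, not measurements.

```python
def max_responses_per_second(link_mbit_per_s: float, avg_response_bytes: int,
                             overhead: float = 0.0) -> float:
    """Upper bound on responses/second a network link can carry.

    `overhead` is an assumed fraction of bandwidth lost to TCP/IP and HTTP
    headers (0.0 ignores it entirely).
    """
    usable_bytes_per_s = link_mbit_per_s * 1_000_000 / 8 * (1.0 - overhead)
    return usable_bytes_per_s / avg_response_bytes

if __name__ == "__main__":
    # Hypothetical mix: 20,000-byte images on a 100 Mbit/s link.
    print(max_responses_per_second(100, 20_000))
```

If the front-end can push static files faster than that, the network is the bottleneck, which is the sense in which the box has headroom.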

Let's say your application is heavy on serving static files, more than dynamic content. As load increases you need to buy one (or more) additional front-end servers. No problem – the second server can connect over FastCGI to the existing back-end server. Your front-end servers will be configured specifically for your static content – light on CPU, fast network throughput, fast access to your static content (disk and/or cache). Also, you are now safe against failure of any one front-end server.

As load increases you need to add more back-end servers. No problem – the ratio of front-end to back-end servers can be adjusted to exactly match the needs of your application. Note that every front-end server can connect to every back-end server, so adjusting for load and handling fail-over is straight-forward.
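Because every front-end can reach every back-end, the selection logic on the front-end can stay simple. Here is a sketch of round-robin choice that skips back-ends marked as down. The host names are hypothetical, and real front-end servers have their own, configurable balancing; this only illustrates the idea.

```python
import itertools

class BackendPool:
    """Round-robin choice of back-end servers, skipping ones marked down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.down = set()
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        self.down.add(backend)

    def mark_up(self, backend):
        self.down.discard(backend)

    def next_backend(self):
        # Try each back-end at most once per call.
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if candidate not in self.down:
                return candidate
        raise RuntimeError("no back-end servers available")

if __name__ == "__main__":
    pool = BackendPool(["app1:9000", "app2:9000", "app3:9000"])  # hypothetical hosts
    pool.mark_down("app2:9000")
    print([pool.next_backend() for _ in range(4)])
```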

Note also that the stripped-down/locked-down configuration of front-end boxes suitable for exposure to the Internet is pretty straightforward. The back-end boxes (which may need a more permissive and complex configuration) do not need to be addressable from the Internet. Security is somewhat simpler.

There is another scenario where FastCGI (or similar) is interesting. Let's say your company has a single common internal-use web server. Odds are, it is a Windows/IIS box (company policy, legacy, clueless network administrators, or whatever reason). By hooking up via FastCGI you can host your web applications on a box with whatever configuration is suitable (Linux perhaps rather than Windows). Odds are the old/common IIS box is going to go a long way just serving static content and forwarding FastCGI requests, before running out of gas. The folks putting up the web application could be in an entirely different part of the company, with domain-specific knowledge lacked by the folks managing the IIS server. Once the FastCGI link is set up from the IIS server, no further demands need be placed on the IIS folks.

What should we measure?

First, we need to measure the relative performance of just serving static files using lighttpd (or the like) versus Apache. Nothing specific about FastCGI use in this part of the benchmark. There is a range to this benchmark – small, mid, and large-size files – all served from cache (benchmarking disk throughput is a different item).

Next, we need to measure the throughput and latency servicing FastCGI requests. The back-end process needs to be on a different machine and should do no processing (so we are not measuring back-end performance). Vary the number of concurrent FastCGI connections (and matching number of back-end threads/processes). Vary the back-end delay before responding (to simulate light to heavy processing). Vary the amount of data in the response from the back-end. We could vary the size of requests – but these are usually small enough that we can ignore size. This set of numbers is relatively reliable, as the processing performed by the HTTP server is fixed (no variable application-specific code to complicate the measurement).
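The "does no processing" back-end is easy to write. Below is a sketch as a plain WSGI application whose only knobs are the delay and the response size, plus a hypothetical parameter grid for the runs. Exposing it over FastCGI would need a WSGI-to-FastCGI adapter (flup is one such library), which is outside this sketch; the grid values are placeholders, not recommendations.

```python
import itertools
import time

def make_null_app(delay_s: float, response_bytes: int):
    """A WSGI app that does no real work: it waits, then returns filler bytes.

    `delay_s` simulates light-to-heavy processing; `response_bytes` sets
    the size of the response body.
    """
    body = b"x" * response_bytes

    def app(environ, start_response):
        if delay_s:
            time.sleep(delay_s)
        start_response("200 OK", [("Content-Type", "text/plain"),
                                  ("Content-Length", str(len(body)))])
        return [body]

    return app

# Hypothetical parameter grid: concurrency x delay x response size.
CONCURRENCY = [1, 8, 64]
DELAYS_S = [0.0, 0.01, 0.1]
SIZES = [100, 10_000, 1_000_000]

def benchmark_grid():
    """Every combination of the knobs to run the load generator against."""
    return list(itertools.product(CONCURRENCY, DELAYS_S, SIZES))

if __name__ == "__main__":
    for concurrency, delay, size in benchmark_grid():
        print(f"concurrency={concurrency} delay={delay}s size={size}B")
```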

Measuring back-end performance is a different issue, and only going to be relevant for your specific application.

My guess is that we find:

  • Much higher throughput serving static files – especially at high loads (no surprise there).
  • Much higher numbers of concurrent connections to the front-end server with good performance (on modest hardware).
  • As the responses get large, there will be a cross-over point where there is no advantage to FastCGI. The rate at which responses can be generated will be limited to what Apache in-process code can generate using a modest number of threads. (Most applications should not be generating large responses, so this is atypical).

Separation of concerns is almost always a good idea in programming. There is every chance the same notion applied to web application architecture works out as well. Performance is more predictable when the workload is uniform. Uniform workloads make choosing an optimal hardware/software configuration easier. In at least some scenarios, administration is easier.

2007.02.15

microformats revisited

Filed under: Software, Web — Preston @ 8:22 am

Microformats as generally described – a small chunk of data embedded in HTML in one of a small number of recognizable formats – make a lot of sense to me. Yesterday I actually visited the microformats website, read the details of the current proposal … and found myself uncomfortable with the proposed implementation. Using the same mechanism for two quite distinct purposes is usually a bad idea. Using CSS class names both for presentation and as markers for data? Sounds like trouble.

Use of a single class name (e.g. “microformat_”) would be all right if used to wrap entire chunks of data. Not much chance of collisions between use as a marker for metadata and presentation (CSS: “.microformat_ { display: none }”). Use of potentially dozens of class names, applied to data also used for presentation? Seems like a recipe for bad programming style. The needs for use as data in a microformat and for use in presentation often differ. Better to isolate differing concerns.

So yesterday I went from generally interested in the use of microformats, to somewhat dismayed at the proposed implementation. Today an article shows up from Jon Udell expressing a similar dissonance.

XMP and microformats revisited « Jon Udell
Yesterday I exercised poetic license when I suggested that Adobe’s Extensible metadata platform (XMP) was not only the spiritual cousin of microformats like hCalendar but also, perhaps, more likely to see widespread use in the near term. My poetic license was revoked, though, in a couple of comments:

Mike Linksvayer: How someone as massively clued-in as Jon Udell could be so misled as to describe XMP as a microformat is beyond me.

Danny Ayers: Like Mike I don’t really understand Jon’s references to microformats – I first assumed he meant XMP could be replaced with a uF.

Actually, I’m serious about this. If I step back and ask myself what are the essential qualities of a microformat, it’s a short list:

1. A small chunk of machine-readable metadata,
2. embedded in a document.

I do not know if XMP is “the answer” (I am tempted more by small chunks of JSON), but I am reasonably certain the proposal at microformats.org is trouble.
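A minimal sketch of the alternative: a single marker class wrapping a small chunk of JSON, hidden from presentation by CSS and pulled out with Python's standard `html.parser`. The class name `microformat_` and the JSON fields are hypothetical; this is not the microformats.org proposal (which spreads data across many class names, as hCalendar does). It assumes the marked element contains only the JSON text.

```python
import json
from html.parser import HTMLParser

MARKER = "microformat_"  # the single, hypothetical wrapper class name

class ChunkExtractor(HTMLParser):
    """Collect parsed JSON from elements carrying the marker class."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside a marked element
        self.buffer = []
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1
        elif MARKER in classes:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.chunks.append(json.loads("".join(self.buffer)))

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)

def extract(html: str) -> list:
    parser = ChunkExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks

if __name__ == "__main__":
    page = ('<style>.microformat_ { display: none }</style>'
            '<p class="summary">Team meeting</p>'
            '<div class="microformat_">{"type": "event", "start": "2007-02-15T09:00"}</div>')
    print(extract(page))
```

Presentation classes (like `summary` above) stay free to change without breaking the data.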
