What makes a good "Databox"?

What is a good, reliable way to store and preserve your personal data? A topic I have had reason to chew on, of late. Yes, I got bit.

A week or so back (while dealing with some other hardware problems), I heard a dwindling screech from my home file server box. Not good: the disk with most of the data had died.

No problem - I had a backup (if slightly old) on another machine - or so I thought. Turns out, I couldn't find the backup. Most likely, at some point I reformatted a hard disk during an install, not recalling the presence of the backup. Not good - big time.

OK - so clearly the approach of backing up to another system (with space) did not work out. I do not like to repeat mistakes.

Considered buying a RAID box from Buffalo or Infrant. These standalone RAID boxes offer great protection against disk failure, are quiet, and use less power than a PC. But what would happen if some other part in the box failed? Would I be able to get replacements (say) ten years from now? Would there be any other way to get the data off the RAID? Guess I could buy two RAID boxes and ... maybe not.

Another interesting notion was to run ZFS (very cool, BTW) on a generic box. Replacements would be less of an issue. But ZFS requires Solaris. Setting up Solaris on generic hardware is bound to be more trouble. Once set up, a Solaris box (at least at this time) is going to be less useful. If I am going to leave a box powered on all the time, hopefully it could serve more than one role (like hosting VMware for testing). The OpenSolaris distributions are still very much bleeding edge - and I don't need another hobby.

With some reluctance, I discarded the notion of using ZFS. Today there is a posting from Tim Bray on this topic.

ongoing · The Databox: "The Databox has one or two cheap-ish CPUs running Solaris, ten or so cheapish disks, and offers a half-terabyte or so of completely reliable, completely maintainable, network-accessible storage for your data, which lives in ZFS, striped and replicated across the disks."

As much as I like this idea, for personal use it lands somewhat on the impractical side. To get the power and noise down, you are likely to end up with purpose-built hardware. A solid, reliable file server could be in use for a long time. With no guarantee of available replacement hardware, a non-disk failure years from now could put you in a really bad position.

Generic hardware and generic software give you the best chance of finding replacements in the long term.

Probably the most reliable setup would be to have two generic PCs powered on. One would act as the file server. The second could store an "rsync" copy, and be configurable to become the file server. This would protect against any single hardware failure, and replacements are not special in any way. But this means two PCs powered on all the time, with the attendant noise and power consumption. Ick.

Settled on having one box with two identical 300GB disks. One disk will hold the files served, and the second will hold an "rsync" mirror of the first. I can set up cron jobs to run "rsync" nightly, and compressed tars (say) weekly. Less often, I could copy off a compressed tar to another box. Once set up, that is one less chance for human error.
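As a rough sketch of what those cron-driven jobs might look like, here is one way to write them in Python. The mount points and function names are made up for illustration and are not the actual setup; the "rsync" flags used (-a, --delete) are standard ones.

```python
import subprocess
import tarfile
import time
from pathlib import Path

# Hypothetical mount points for the two disks; adjust to the real layout.
PRIMARY = Path("/srv/files")
MIRROR = Path("/mnt/mirror/files")
ARCHIVES = Path("/mnt/mirror/archives")


def rsync_command(src: Path, dest: Path) -> list[str]:
    """Build the nightly mirror command.

    -a preserves permissions, times and symlinks; --delete removes files
    from the mirror that no longer exist on the primary. The trailing
    slash on the source copies its contents rather than the directory.
    """
    return ["rsync", "-a", "--delete", f"{src}/", str(dest)]


def nightly_mirror(src: Path = PRIMARY, dest: Path = MIRROR) -> None:
    """Run rsync; raises CalledProcessError if rsync exits non-zero."""
    subprocess.run(rsync_command(src, dest), check=True)


def weekly_archive(src: Path = PRIMARY, out_dir: Path = ARCHIVES) -> Path:
    """Write a dated, gzip-compressed tar of src and return its path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"files-{time.strftime('%Y%m%d')}.tar.gz"
    with tarfile.open(target, "w:gz") as tar:
        tar.add(str(src), arcname=src.name)
    return target
```

One cron entry could call nightly_mirror() every night and another could call weekly_archive() once a week. Copying an archive off to another box stays a separate, less frequent step.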

Yes, I know I could use software RAID to replicate files. But a RAID mirror copies a deletion to both disks immediately. With nightly "rsync" runs I get some protection from accidental deletions - at least until the next "rsync".
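If that window turns out to be too short, one possible extension (an assumption on my part, not something set up here) is rsync's --backup and --backup-dir options, which keep files the run would otherwise delete or overwrite. A minimal sketch, with a hypothetical function name and a made-up dated "trash" directory scheme:

```python
import time
from pathlib import Path


def rsync_with_trash(src: Path, dest: Path, trash_root: Path) -> list[str]:
    """Build a mirror command that keeps what rsync would delete or overwrite.

    --backup tells rsync to keep the old copy of any file it replaces or
    removes; --backup-dir says where to put it. The dated directory is a
    hypothetical naming scheme: one folder per nightly run.
    """
    trash = trash_root / time.strftime("%Y-%m-%d")
    return [
        "rsync", "-a", "--delete",
        "--backup", f"--backup-dir={trash}",
        f"{src}/", str(dest),
    ]
```

The trash directory should live outside the mirrored tree, and old dated folders would need pruning now and then so the second disk does not fill up.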

Incidentally, I bought Maxtor 300GB disks (the local Staples had a good price). Seems this was a minor mistake. Curious about whether the power supply was adequate, went and looked up the power requirements for various 300GB drives. Seems current generation Western Digital drives use about half the power, and can be got for pretty much the same price (online) as the Maxtors. The dual Maxtors are almost too hot to touch (~120°F), even with an extra fan pulling air around the drives.

Just about finished setting up the new file server (which is also my CVS and Subversion server, hosts VMware images of various Windows OS versions, etc). Ran Memtest86+ on the memory (which, it turns out, is not up to spec!). Ran SpinRite on the disks. This is all in a very solid steel mid-tower PC Power & Cooling case, with a well-reviewed, very quiet Seasonic power supply (and vinyl floor tile for additional noise suppression - I kid you not).

In fact, though I did not have a complete backup of the old file server, I did not lose much. The small personal stuff is replicated online. Almost all my photos are up on Flickr. The mp3 files are on an iPod (or can be re-ripped from CD). The CVS archive has multiple backups. About all that leaves is the installation files for software I've bought and downloaded - may have lost some license keys. The Magnatune WAV files are gone. The downloaded ISOs and license keys from MSDN can be grabbed again. Could have been worse.