[Tech] Twile needs server halp ;~; Part I: Storage server
A little over a year ago, I got a slick rack mount case with space for 20 SATA drives. We were pushing the limits of what our previous chassis could hold, and we wanted more drive capacity in less volume.
A year later I'm back, but with different needs! This time, it's all about the OS, file system, and software.
Windows Home Server works pretty well, and it was great for our needs last year. But as our 11.5 TB has grown to 16.7 TB and we've moved out our smaller drives (presently we use 2x1 + 5x1.5 + 6x2 TB) in the name of performance, it's become evident that it wasn't the smaller drives that were slowing our system down. For whatever reason, WHS operates in a way that doesn't scale too well:
* It "balances" the drives every hour which is a lot of unwanted I/O that interferes with actual, user-initiated file transfers
* It copies files to the OS drive before offloading them to other drives, which means if you copy multiple files it's reading from the OS drive to move the first file while the second file is being written to the OS drive
* File fragmentation can become an issue, I've seen movie files split into thousands of fragments
* For no reason I can explain, when the system isn't doing anything, writing or reading a single large file (10-30 GB) will sometimes (this past month, OFTEN) slow to a 2-4 MB/sec crawl which is sustained for the entire file
* The successor to Windows Home Server has ditched the Drive Extender functionality which makes a single convenient pool of storage out of any attached hard drives.
So in short, performance is ass with the main feature we want (drive extender) and that feature has been cut from the product line. Sepf and I don't want to wind up in an evolutionary dead end, so it's time for us to move on.
What we want, as before, is a system which lets you dump lots of files, ranging from tiny .txt files to 50 GB MKV files, into a storage pool. This pool should be expandable as more capacity is needed with minimal effort--ideally, little more than a reboot. We can't have automated tools which bring user transfers to a crawl, we can't have performance dropping much below the minimum sustained transfer rate of the slowest drive, and the system needs to be scalable to at least 60 TB / 20 drives. The storage pool should be accessible as a network share to Windows machines, with per-user permissions for read/write. If parts of the storage pool can't dynamically have redundancy, then the system needs to be able to accommodate multiple pools, one of which MUST support redundancy.
To summarize, we have a lot of files which we want in a centralized location spread across dozens of drives with some amount of redundancy, performance in the 50-150 MB/sec range, and different permission levels for different users. And what we don't want is to have to remember drive letters or rely on search to find where we put that one file.
Fuzzies, tell me, what are my options? At this point we're considering most anything. Including crazy shit like using an SSD as virtual memory to accelerate drive operations.
Don't say Drobo. At $1500 for their largest box it cost more and stores less than my current solution.

Also, in defragging the server, you see it on a per-drive level. You defragment the drive, all the files on that drive drop to 0 fragments. Also, if you delete a file from a single drive (or add one) you can see the capacity change for that drive, and be untouched for the others.
I mean, I can understand some really bad ways that software could split a file across multiple disks, but I don't think this is one of them. Even just looking at the drive access lights and per-drive I/O statistics, you can see that accessing a single file only pulls from a single drive. Good thought, though.
Although, saying that... you might have to make sure that all the hard drives are the same capacity if you are going to use RAID.
I want to read a 30 GB file from the server. This file is stored on one drive which is capable of hitting 100+ MB/sec. No other I/O is happening on the disk, and very little is happening on the server. Performance caps out at 3 MB/sec. WTF?
I mean, I get that with good RAID I could get performance exceeding that of a single drive, and that with software RAID performance might drop relative to a hardware RAID solution during crazy I/O. That all makes perfect sense. What I don't get is how in a seemingly optimal situation, something that's really just JBOD can have such poor performance.
I can't comment on the server you're using, but that's normally the case... although 3 MB/s is rather low :(
Regarding RAID, you get around normal single-drive performance with RAID 5 (yes, I know you SHOULD get slightly faster, but because it's calculating parity and spreading it round all the drives it works out to about one drive's performance, with the peace of mind in case of HDD failure). With RAID 1 you'll get slightly less than single-drive performance (although we're talking hardly any at all!), but you lose half your storage due to duplicating the data between drives. RAID 0 is the one you're thinking of regarding better performance than a single drive, as you are writing to both drives at the same time (you get about 60% extra speed but double the chance of data failure).
Perhaps I could better explain it like this: Earlier today, I wanted to copy a 30 GB file from my desktop to my server. I do it about 5-6 times a week, very common. It wrote it to a single drive, and went about 2.5 MB/sec the whole time. WTF :< :< :<
I understand the difference in RAID implementations (or so I believe, and what you said is consistent with that), but really, if I have any sort of RAID on this system it wouldn't be to get performance in excess of a single drive. When you consider that our Gigabit ports max out at 119 MB/sec (and that most of the disk I/O will be limited to <5 MB/sec media streaming), implementing RAID for performance wouldn't be the best use of money, and it would only really give us a boost on the slower parts of the drives.
Most RAID implementations aren't useful for us anyway. RAID 0 is completely out because we need reliability more than throughput. RAID 1 is partially an option, though we probably only need 1-2 TB of redundant space. RAID 5 is something we want to avoid. Our best option right now is basically just JBOD + a two-drive RAID 1, and when storage gets even cheaper/we get more money, RAID 1 would be more viable and useful. Though as Sepf says, if you have 4 or more drives, why use RAID 1 when you can use RAID 10?
That is rather strange regarding the throughput of the single drive... have you looked into hardware failure, either on the HDD side or maybe the motherboard? Try benching each HDD by itself and it might give you a hint. The whole PC is always slowed by the slowest point, so you might just have to trace it through trial and error + minimal config.
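For what it's worth, once a drive is visible to any Unix box (or the eventual new server), a crude sequential read check is usually enough to spot a sick disk. A minimal sketch, assuming the drive shows up as /dev/ada1 (device name is a placeholder); this only reads, so it's safe on a drive with data on it:

# read 4 GB straight off the raw device; dd reports throughput at the end
dd if=/dev/ada1 of=/dev/null bs=1M count=4096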
All the drives are like that these days. Some are connected to the motherboard, others to one of several SATA cards. Thing is, we defragged them just a couple weeks back and it generally took about a day per 2 TB drive (and they're almost completely full) which would come out to 24 MB/sec being defragged (or, 48 MB/sec total I/O, which isn't bad when you consider it's... defragging...). So drive operations aren't necessarily slow when it's internal work, just when I want to read and write files from other systems D:
RAID 5 spreads parity blocks across every drive so that if any ONE drive fails the array can still rebuild itself.. cool eh? That means the data survives a single-drive failure while still letting you use most of your capacity. For example, if you have 4 2 TB drives, you can still use 6 TB of the total, because only one drive's worth of space (about 500 GB from each drive here) goes to parity. This scales nicely, so if you have 10 2 TB drives, you would still get 18 TB of usable space.
RAID 1, however, copies everything twice. This is generally pretty safe because the data is written on two drives, so if you lose one you still have the other to rebuild from. However, you lose HALF your storage doing so. So if you had 4 2 TB drives like above, you would have 4 TB of space. Or if you had 10 2 TB drives, you would only get 10 TB. Bit rubbish :(
Let's say you wanted 8 TB of redundant space. You could get 8x 2 TB drives at $640, or you could get 5x 2 TB drives at $400 and a RAID 5 card. Can you get a decent RAID 5 card for under $240? How will the performance measure up to 4 sets of 2-drive RAID1 arrays?
Not had any first-hand experience with servers I'm afraid; I've never needed one, and nor can I afford to get one running. I tend to give my old computers away to people in need instead of keeping them for myself, at which point I probably would turn one into a server. But Linux is very well suited to file storage, which is why most web servers (essentially just file storage devices with some additional stuff in between to process requests from browsers and clients) tend to run Linux or some form of Unix nowadays.
Plus, the chances that you saturate the PCI bus are pretty much 0, so you end up with something lower because of headroom.
And yeah, getting a hardware RAID adapter is a good idea. Anything made in software works fine, but that's just what it is, software, and the machine has to take care of it instead of being able to read/write from the RAID adapter and let it handle the I/O.
Finally, Windows? I love it for myself, but on a production server I've only ever used it where it was required. I'd suggest looking into a *nix OS and something like an ext4 filesystem, it should support SMB and permissions, and take care of your fragmentation problem pretty well.
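To give a feel for the per-user permissions side of that, a minimal Samba share definition might look something like this (share name, path, and user names here are made up, not anything from your setup):

[storage]
    path = /tank/storage
    read only = no
    valid users = twile sepf
    write list = twile sepf
    # no guest access for anyone else
    guest ok = no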
If it is fragmentation, defragment it! It should be that simple. Still, I think you'd benefit from a true RAID card just to offload some of the work from the host, or involving the OS much at all. The OS should be copying into RAM, anyway; if it's using the pagefile/swap, you might consider adding more RAM and tweaking the pagefile so it's not as appealing versus RAM. I don't suggest turning off the paging file.
If you want to keep the same setup, I'd suggest you invest in some new drives to use for temporary storage, and reformat those in production. It would help you eliminate fragmentation, and give you a chance to increase (or decrease) the block size to something that provides better performance. It would also probably help to reinstall the system (and get rid of unnecessary services, programs, or memory/CPU hogs in general), or, if you do move to a new OS, it would be a great time to start fresh.
Probably will move to a new OS. Even if we got the WHS performance fixed, I don't like the idea of us relying on some software that won't have a proper replacement, and won't be supported in a few years. It's just not a great situation to be in.
I came across this online:
Windows 2008 supports a dynamic, logical volume format which allows you to extend a partition beyond a single disk in a sort of software JBOD. By choosing to Extend a partition from its right-click menu in Server Manager > Storage > Disk Management, you can convert any basic-partitioned disk into a dynamic disk and extend that partition onto any other unallocated free space on any other (non-removable) drive.
This has a few caveats, though:
- You can convert a Basic disk to a Dynamic disk without losing data, but to convert a disk from Dynamic to Basic requires a format of (or deleting all partitions on) that disk.
- You can grow drives just fine, but NTFS is sometimes a gigantic pain in the rear to shrink down.
- Dynamic disks are not accessible in any operating systems other than Windows 2003 Server, 2008, 2008 R2, Vista, and 7. This can make data recovery, if a drive fails (backups, backups, backups!), an excitingly expensive endeavor.
If you don't need redundancy, then you can use LVM to have a dynamically growing/shrinking pool of storage that's presented as just a single device. None of the drives need to be the same size with this method.
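A minimal sketch of what that looks like in practice, assuming two placeholder disks /dev/sdb and /dev/sdc and a volume group I'm just calling storage:

# put the whole disks under LVM control
pvcreate /dev/sdb /dev/sdc
# pool them into one volume group
vgcreate storage /dev/sdb /dev/sdc
# one big logical volume using all the free space, then a filesystem on it
lvcreate -n media -l 100%FREE storage
mkfs.ext4 /dev/storage/media
# later, to grow the pool: add a disk, extend the volume, resize the filesystem
pvcreate /dev/sdd
vgextend storage /dev/sdd
lvextend -l +100%FREE /dev/storage/media
resize2fs /dev/storage/media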
From: http://home.clara.net/drdsl/windows.....#Dynamic_Disks
"Spanned Volume - Combines free space from up to 32 disks, data written to first disk until that fills up, then to second disk, and so on. Spanning provides no fault tolerance, and is in fact more prone to failure than simple volumes: failure in any one disk in a spanned volume, will result in the whole volume failing. Windows 2003 will not install to a spanned volume, nor can you extend or span an existing system volume. Recommended only as a stop gap measure or where tolerance for failure is high."
"RAID5 volume - Fault tolerant striped volume. requires three or more physical disk. data written to all disks at same rate, interlaced with checksum information (parity). If one disks fails it can be regenerated with remaining data and parity information. Parity is distributed amongst all volumes, but total overhead for fault tolerance is equal to one/numberOfPartitions used. Calculation of parity decreases performance compared to mirror when writing data, but read performance improved as multiple spindles are used. An existing volume cannot be converted to RAID 5: data must be backed up, RAID5 created, and then data restored. When a disk fails in a RAID5 volume, all read operations require that the data is regenerated on the fly using remaining data and parity, hence performance degraded"
The first one that comes to mind for me, perhaps because I set it up recently, is FreeBSD with a ZFS setup of some kind. FreeBSD's installer does not currently support installing to/booting from a ZFS filesystem on all platforms*, so if you want your boot filesystem to be ZFS (this is optional), some hackery in the fixit prompt is necessary--however, the procedure is documented on wiki.freebsd.org (look under RootOnZFS). If you do not feel comfortable doing this but still want it, ask somebody familiar with FreeBSD for help. I have done this before, so maybe I can help. However, you don't have to go to the trouble of booting from ZFS unless you really want to; you can simply have an ordinary UFS boot disk and then put all the other disks into a ZFS pool and set them up as raidz2 or something. Also, that way, in the event that your boot disk fails, the ZFS filesystem on the other disks can still be used by something that groks ZFS--say, a new boot disk with a new install of FreeBSD.
ZFS is useful here because it is basically the easiest way to get a Just-A-Bunch-Of-Disks setup doing something fast and useful, and because it's one of the only things that's any good at software RAID. An established zpool can be turned into a RAID filesystem with single parity (raidz), double parity (raidz2), or even triple parity (raidz3). Also supported are mirroring, snapshots, deduplication, hot spares (disks not in use that can be switched into service as soon as another fails) and lots of other neat stuff. With as many disks as you have, raidz or raidz2 would be a good idea. raidz3 is excessive, and would probably just be a waste of space.
Doing things with ZFS generally involves two commands: zfs and zpool. That's it. Just with lots of options. The man pages (just type, say, "man zpool") describe their various options pretty thoroughly and, if that's confusing, there are a lot of webpages about "How To Do This Thing With ZFS" which provide step-by-step instructions. I have found both these and the man pages instrumental.
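As a rough sketch (pool and device names here are placeholders, not anything on your hardware), a double-parity pool with a filesystem in it might look like:

# create a raidz2 pool out of six disks
zpool create tank raidz2 da0 da1 da2 da3 da4 da5
# carve out a filesystem within the pool and turn on compression for it
zfs create tank/media
zfs set compression=on tank/media
# check health, layout, and space
zpool status tank
zfs list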
In my experience, ZFS is stable, fast, and reliable. It takes a lot to damage a ZFS filesystem these days, and after I got acquainted with it, things like UFS, NTFS, and ext3 seemed like medieval technology. It doesn't even require fsck--it is designed not to need it, because it is designed to repair itself. Even after I overhauled a fileserver of mine--installed a bunch of new SCSI hardware, rearranged some of the disks, added some new ones, and installed Solaris 10 on two mirrored disks, overwriting whatever was on them--the new OS still found the old filesystem and asked me if I wanted to try to recover it! The most spectacular demonstration of its robustness is probably floating around on YouTube: a presentation in which some Sun guys set up a bunch of hard disks in a ZFS filesystem, hooked up externally to a server, and proceeded to smash some of those disks with a sledgehammer to demonstrate that such petty disruptions could not stop I/O operations. (Admittedly, I was more impressed when HP literally blew up a data center to demonstrate that such a trifling event as the use of high explosives was not sufficient to kill an OpenVMS cluster that spanned multiple data centers.)
There is, however, one minor problem with ZFS that you should know about (though, as I shall explain, it is actually easy to work around): Currently, once a raidz (or z2 or z3) group has been created, there is no way to add disks to that particular group. However, you can add a whole new raidz (or z2 or z3) group of disks to the existing pool, which accomplishes nearly the same effect and also allows for the use of differently sized disks--normally, RAID setups would waste that difference in space! zpools are dynamic by default, and you can just toss disks in there as you please as long as you haven't RAIDed the pool, but the way data is laid out on RAIDed disks makes widening an existing RAID group a very complex task, and adding that functionality to ZFS would likely be very difficult and could threaten performance. Quoting an old post from the OpenSolaris forum:
"You'd create your pool by saying:
zpool create mypool raidz A B C
You could later grow the pool by adding three more, like this:
zpool add mypool raidz D E F
In practice, this is generally a more useful model because by the time
you need three more disks, they're often a different (higher) capacity.
That capacity is wasted if you add them to an existing RAID stripe of
smaller disks. To make this concrete, suppose A, B, C are 250G drives,
and D, E, F are 750G drives. If you were using something like LVM
and grew your RAID-5 stripe, you'd get 5 * 250G = 1.25T capacity.
With two RAID-Z stripes in ZFS, you'd get 2 * 250G + 2 * 750G = 2.0T."
So, in other words, this problem is easily circumvented by the fact that ZFS is quite happy to add any pool to any other pool. ZFS' greatest strength is not so much its capabilities and its performance as its administration tools, which make it so easy to put those capabilities to use. Things like LVM in Linux intimidate me. I want advanced filesystem features, but I'm dumb and lazy and can't afford proprietary solutions to this problem. What do I need? ZFS!
Another option--and, honestly, I have no experience with this--is btrfs in Linux, which is designed to bring to Linux the same sort of functionality that ZFS has provided Solaris and FreeBSD. I think some distributions support btrfs from the installer; I think Ubuntu may even do this. With some others, there's a bit of DIY involved. Much to my surprise, btrfs is doing pretty well. I thought it would just be a shoddy hack meant to imitate ZFS that would evaporate as soon as Oracle (btrfs' primary financier) bought Sun and thereby acquired Solaris and its ZFS implementation. I was wrong, however! btrfs development has attracted the attention of many Linux developers who want Linux to have a better, more capable filesystem, and many of them have concluded that it should be btrfs that eventually replaces ext4, so a whole lot of people have been hacking away on btrfs and generally making it into a pretty good filesystem. In fact, btrfs even seems to beat ZFS on some benchmarks (though it is impossible to compare btrfs and ZFS directly because there is no OS kernel that supports both of them right now.) So look around and see what you can find out about btrfs support on various Linux distributions. You might find something that's pretty easy to deal with.
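For completeness, a rough btrfs equivalent might look like the following (device and mount point names are placeholders, and btrfs' RAID code was still maturing at the time of writing, so treat this as a sketch rather than a recommendation):

# make a filesystem with data and metadata mirrored across two disks
mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc
mount /dev/sdb /srv/storage
# later: add another disk to the pool and rebalance existing data onto it
btrfs device add /dev/sdd /srv/storage
btrfs filesystem balance /srv/storage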
Anyway, once you have a nice server set up, there are a lot of nifty tools you can use for backing things up onto the server and otherwise working with the thing remotely. I will describe two of the most important:
rsync.
This allows you to back up only the changes you have made to the contents of a directory, and can even be set up as a daemon that runs automatically at times of your choosing. It is usually operated over SSH (Secure SHell) for security reasons, but that's fine and dandy since SSH is easy to deal with. If you rsync a directory from machine A to machine B for the first time, you will simply copy that whole directory from A to B. However, if you have done this already, and have just added a few files to A, then, next time rsync runs, only those new files will be copied. Additionally, rsync keeps track of where it is, so if the transfer is interrupted, it can be resumed later. rsync takes several flags pertaining to permissions, verbosity, compression, whether to remove files that are on B but not A, &c. In summary, an extremely useful tool for working with backups! Most of the time, I invoke it as
rsync -avz local_directory/ rexar@remotehost:/path/to/destination/
(the trailing slash on the source ensures that the contents of "local_directory" go into the existing destination directory, rather than creating a "local_directory" folder inside "destination". I always forget which slash it is that matters, so I tend just to put both on there.)
-a for archive mode (actually corresponds to several useful flags, including recursive copying and preserving permissions), -v for verbose output, -z for compression. Sometimes I change it to -avzP if I want progress output and resumable partial transfers, too.
Numerous guides exist on how to set up an rsync daemon. You don't have to set up the daemon, though--running it by hand once in a while is fine as long as you are diligent. In fact, backing up with rsync is a little safer than backing things up instantly, because if you make a mistake on machine A, then the mistake will not be on B until you copy it there.
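If you'd rather not remember to run it, a nightly cron entry is usually enough (the paths and hostname here are just placeholders, and this assumes SSH key authentication is already set up):

# crontab entry: push /home/twile/docs to the server at 03:00 every night
0 3 * * * rsync -avz /home/twile/docs/ twile@server:/tank/backups/docs/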
I'm pretty sure there are Windows rsync clients. You can use these to rsync things from a Windows machine to a Unix fileserver.
ssh.
This may be the thing you use the most in dealing with a remote fileserver. You are from a Windows background, so I think I should give ssh a little explanation, because it is basically the best thing ever for working with servers remotely. Many users of some kind of Unix will tell you that it is either their favorite command or one of their three favorite commands**. SSH, in its simplest application, gives you a command-line shell on a remote system. On Unix machines, a command-line shell allows a user to do anything within their privileges, and you will not be denied the ability to do something just because you did not bother to waste bandwidth and responsiveness on graphics (though, as I will mention later, you can also do that with SSH if you need to); there is no limit to the kind of system administration you can do with SSH. You are limited only by your ability to use the shell, and that is an ability that, contrary to a popular opinion formulated primarily by people who do not use it, can be learned to a useful extent within a day and to great useful extent within a few days. Using SSH, you can access your backup server from anywhere and do anything you wish with it. However, SSH can also be used to forward graphics from a Unix machine's X server (ssh -X, ssh -XC, ssh -Y), and it can be used to transfer data (usually through a derivative command, scp--Secure CoPy, but there are other ways, too) and even to set up encrypted tunnels (ssh -R, ssh -L).
Usually, using ssh is as simple as:
rexar@my_machine$ ssh rexar@nowhere
This is the middle of bumfuck nowhere, and I'm a banner message in /etc/issue.
Password:
Welcome to bumfuck nowhere! Now you can do all kinds of shit.
rexar@nowhere$
If you're in Windows and want to SSH into something, you can use a free application called PuTTY. Also, if you want SCP on Windows, you can use WinSCP, which is also free.
Another note about SSH: If you can see it and it accepts SSH connections, then you can SSH to it. If you have, say, a router accepting SSH connections from the internet, then you can get into your home network from anywhere on the internet. This is extremely convenient, and I will not even attempt to describe the range of things this can enable you to do. (If you become curious, read up on GNU Screen, which is yet another program you will hear about if you ask a bunch of Unix users what their three favorite commands are. Screen + SSH are a match made in heaven.)
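A couple of concrete examples (hostnames and ports here are made up): pulling a file off the server with scp, and forwarding a web interface on the server back to your desktop through an encrypted tunnel:

# copy a file from the server to the current directory
scp twile@server:/tank/media/movie.mkv .
# make the server's local port 8080 reachable as localhost:8080 on this machine
ssh -L 8080:localhost:8080 twile@server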
In any case, if you have a good Unix storage server with a suitably performant, reliable journaling filesystem and a good RAID setup, whether hardware or software RAID, what you will have is a low-maintenance box in the corner that does its job. If all you ask of it is to hold onto your files and provide an easy means of backing them up, then that's not especially difficult to get going. Do not, however, allow it to become a single point of failure. Use this machine to back up data from elsewhere, but do not let it hold the only copy of anything. RAID is not a backup. Power supplies can fail catastrophically and zorch things; RAID controllers can fail and scribble all over the disks; some careless twit can spill a drink in a bad place; lightning can strike your house and annihilate most of your computer infrastructure in a few milliseconds. The only real backup is an offsite backup; failing that, genuine, separate copies on separate machines. Tape archives are nice, too; I am considering starting one up, since I found a bunch of data tapes and have some tape drives lying around.
If you have any questions about the tedious load of cobblers I just wrote, feel free to ask.
*Currently, Debian/kFreeBSD--a somewhat experimental Debian on top of a FreeBSD kernel--has it working on x86-64, but it's fucked on i386 last I checked. It used to be that, if you wanted good ZFS support in the installer, you'd reach for OpenSolaris, but a certain yacht-racing enthusiast has seen to the end of that for the foreseeable future. Oracle quietly strangled OpenSolaris last year, and most of the userbase is running for the hills. The Illumos project is the closest thing to a viable fork.
**Windows' lack of native SSH support, client or server, and its lack of anything providing comparable utility, is part of why I stopped using Windows. Anything without SSH and a good shell to work with just feels crippled and inflexible anymore.
Do not use ZFS on a system with too little memory! Part of why ZFS is so fast is that it does a lot of caching in memory. Use at least two to four gigabytes; generally, as much as you can stand to install. With less than two gigabytes, you may suffer degraded filesystem performance.
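If memory does get tight, the ARC can be capped; on FreeBSD that's a loader tunable (the 2G figure below is just an example, not a recommendation):

# /boot/loader.conf
vfs.zfs.arc_max="2G"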
The server would have 4GB of RAM, but RAM is cheap if we need to expand.
I wanted to try and see if I could get the OS on a USB flash drive (thinking the OS itself doesn't need fast I/O) and then use the 30 GB SSD to augment the system (L2ARC). (Waiting for parts for the VM server so I can start doing some testing.)
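For what it's worth, attaching the SSD as L2ARC should be a one-liner once the pool exists; a sketch, assuming a pool called tank and the SSD showing up as da6 (both placeholders):

# use the 30 GB SSD as a read cache for the pool
zpool add tank cache da6
zpool status tank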
Your thoughts on OpenSolaris vs FreeBSD? I assume you prefer FreeBSD since that's what you've been setting up recently.
Also, since I've no Linux systems on hand, haven't been able to look into the man pages...am I correct in saying that in order to add drives to a raidz pool, we have to add them to their own little array and then the LVM on top takes care of the rest...
But here's where I'm not too sure...how (un)elegant is replacing drives on one of these systems? Am I going to have to break one of the raid sets and have it rebuild on a larger disk? One thing we run into fairly often is we like to replace 'older' drives (like we're in the process of deprecating the 1TB drives and consuming them in other systems)...or is there some lovingly written fancy drive replacement wizard (lol)?
I appreciate the textwall though, as it does reconfirm a lot of the points that I've found in my research, I can't wait to test out some VMs (hopefully later this week!).
So, really, my hand has been forced. FreeBSD came up first because it's probably the only thing that can really match Solaris' technology. I may have to jump ship (or run an OS I can't upgrade or patch), but this way at least I don't have to lose the functionality I got used to having in Solaris.
Oh yeah. I forgot about Nexenta. You guys could totally use Nexenta for this.
Adding drives to a RAIDed zpool is as simple as in the example: One command. ZFS takes care of the details, i.e. striping the new drives separately to account for the fact that they may not be of the same size as the old ones.
If you want to switch out old disks and replace them with newer, bigger ones, yes, you may have to do some filesystem surgery. However, what you would have to do would be relatively quick and simple, as long as you had enough space to work with. Commands for ZFS filesystem/pool administration (zfs and zpool, respectively) are very straightforward, and that they make filesystem- and disk-related tasks suck substantially less is one of the best things about ZFS.
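Concretely, the surgery is usually something like this (pool and device names are placeholders):

# swap an old 1 TB disk out for a new 2 TB one; ZFS resilvers onto the new disk
zpool replace tank da2 da7
# watch the resilver progress
zpool status tank
# once every disk in a RAID group has been replaced with a bigger one, let the
# pool pick up the extra space (on older pool versions, an export/import does it)
zpool set autoexpand=on tank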
I'm not 100% sold on Raid-z just yet (I'm one of those where I've always been able to demand nothing but Raid 10 for my work servers due to the necessity of high-performance, guest-facing/financial databases) but I'm definitely wanting to give it a try!
The biggest 'issue' really is our migration path (we have 17-18 TB worth of data we'd rather not lose and no real desire to drop a whole bunch of $$$ on more new drives at the given moment), but we'll work through it.
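A hedged sketch of one way to stage that migration, assuming the new pool is up while the old WHS shares are still reachable over the network (share, user, and mount point names are made up): mount the old share read-only on the new box and rsync it over one batch at a time, recycling the freed drives into the new pool as each batch lands.

# Linux example; on FreeBSD, mount_smbfs does the same job
mount -t cifs //oldserver/videos /mnt/whs -o ro,username=twile
rsync -av --progress /mnt/whs/ /tank/media/videos/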