The death of tiering? Disk storage in 5 years

There’s been lots of discussion in the last few days about disk storage’s tiering model, which uses solid-state storage, high-speed Fibre Channel disks and low-speed SATA disks to deliver consistent performance to different applications on a shared array, and about NetApp’s suggestion that tiering is dead, to be replaced by a single layer of SATA disks fronted by a large cache.

To me, a cache is a relatively small but very fast storage area which is used for temporary workloads.

However, I know a good number of people wouldn’t agree with this simple definition of a cache vs a fast tier, so I’d go with something like this:

  • If I can move data into the fast storage in advance of it being read from the slower disk, it’s a tier.
  • If data permanently resides on the fast storage, with a copy on slower disk only used as a backup in case of hardware failure, it’s a tier.
  • If the data remains in the fast storage area even when that area is full, rather than being evicted, because some kind of classification rules keep it there, it’s a tier.

However, I do agree that manual “tiering” has a limited lifespan, and I certainly hope it goes away soon, to be replaced with policy-based decisions made by the array management software.

Having a “Tier 1 (Flash) -> Tier 2 (15K RPM SAS/FC) -> Tier 3 (7.2K RPM SATA)” model doesn’t work as well in the new structures of IT delivery, where things can change on a daily or weekly basis.

Instead, I think a model of “High, Medium and Low Priority” and “High, Medium and Low Reliability”, which can be applied to data belonging to specific applications and changed dynamically, works much better.

Simplistic examples could be:

  • Production Oracle Database – High Priority, High Reliability
  • Images for SharePoint Server – Low Priority, Low Reliability

But slightly more complicated policies like this one should be equally easy to use:

  • Oracle Database – High Priority during working hours (9-5 Mon-Fri), Medium Priority otherwise, High Reliability 24×7

Once we’ve got these kinds of policy-based management tools, the method the array uses to achieve them becomes fairly irrelevant; the only thing left to work on would be the target SLAs that you’d want the array to achieve, something like:

  • High Priority = 0.01ms Response time
  • Medium Priority = 0.5ms Response time
  • Low Priority = 5ms Response time
  • High Reliability = 99.99% Data Availability
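
To make the idea concrete, here’s a minimal sketch of what a policy like the Oracle example above might look like if you could script it – Python purely for illustration; the class, the time-window handling and the SLA mapping are all my own invention, not any vendor’s actual API.

```python
from datetime import datetime

# Illustrative only: a toy policy model, not any vendor's actual API.
SLA_TARGETS_MS = {"high": 0.01, "medium": 0.5, "low": 5.0}

class StoragePolicy:
    def __init__(self, name, default_priority, reliability,
                 peak_priority=None, peak_hours=range(9, 17),
                 peak_days=range(0, 5)):  # 9-5, Mon-Fri (0 = Monday)
        self.name = name
        self.default_priority = default_priority
        self.reliability = reliability
        self.peak_priority = peak_priority or default_priority
        self.peak_hours = peak_hours
        self.peak_days = peak_days

    def priority(self, now=None):
        """Return the priority in force right now."""
        now = now or datetime.now()
        in_peak = (now.weekday() in self.peak_days
                   and now.hour in self.peak_hours)
        return self.peak_priority if in_peak else self.default_priority

    def target_response_ms(self, now=None):
        """Map the current priority onto its SLA response-time target."""
        return SLA_TARGETS_MS[self.priority(now)]

# The Oracle example above: High Priority 9-5 Mon-Fri, Medium otherwise
oracle = StoragePolicy("Production Oracle DB", default_priority="medium",
                       reliability="high", peak_priority="high")
print(oracle.priority(), oracle.target_response_ms())
```

The array would then be free to use flash, cache or spindle placement however it likes, as long as the measured response times stay inside the target.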

This probably isn’t going to happen very quickly, but I hope it does, and I look forward to it.

Thoughts on Iomega IX4-200d performance tests

There’s been an excellent blog post overnight on the performance of the Iomega IX4-200d disk array, one of the cheapest (if not the cheapest) VMware-certified, iSCSI-capable disk arrays available.

I’m a big fan of the Iomega IX4-200d and I’ve seen them used to good effect in various situations, so I was interested to see what happens when you push its iSCSI functionality to the edge of its performance.

Executive Summary – The IX4-200d is still an excellent NAS device for SMBs, but these tests suggest that when the workloads are highly random and the box is pushed to the limit, rather than handling the situation gracefully it slows down to a crawl. The problem may be configuration, iSCSI, RAID5 or firmware related; we won’t be able to tell without more tests.

After reading through the post, I had a few questions about how close the IX4-200d was running to the limit of a 4-disk SATA array, so I went off to figure them out, using the figures from Gabe’s Virtual World post and a recent Yellow Bricks post on the RAID impact on disk IOs, which saved me from any hard maths.

Gabe helpfully listed the disks used (Seagate Barracuda 7200.11 1TB 7200 RPM drives, model ST3100520AS), that write cache was enabled, that the server is connected via iSCSI, and that all 4 disks were in a RAID5 array.

I’ve taken a quick look on the Seagate site, and while they don’t list that exact model number, the Barracuda 7200.11 is listed in general, and I’d expect around 75 IOPS per disk based on their own specifications, which is fairly typical for a 7200 RPM SATA drive. (Update – Gabe’s let me know that the model number was wrong; the correct drive is actually a 5400 RPM model, so 50 IOPS per disk is more likely.)
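
As a rough sanity check on those per-disk figures, the usual back-of-envelope estimate for random IOPS is one IO per (average seek time + average rotational latency). The seek times below are typical values I’ve assumed for desktop SATA drives, not Seagate’s published numbers:

```python
# Back-of-envelope random IOPS estimate for a single spindle.
# Average rotational latency is half a revolution; the seek times are
# assumed typical figures, not the drives' published specs.
def est_iops(rpm, avg_seek_ms):
    rotational_latency_ms = (60_000 / rpm) / 2  # half a revolution, in ms
    return 1000 / (avg_seek_ms + rotational_latency_ms)

print(round(est_iops(7200, 8.5)))   # ~79 IOPS, close to the ~75 above
print(round(est_iops(5400, 14.0)))  # ~51 IOPS, close to the ~50 above
```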

I had 2 questions about the IX4-200d’s performance – is the caching working, and is RAID5 hurting the performance of the box to such an extent that you’d only want to run it in RAID10?

Gabe ran 4 initial IOmeter tests, which gave me the bulk of the information I wanted.

Test 001a covers 100% sequential read access of the drives, in theory telling us how fast the array can possibly run. The result of 55MB/sec isn’t great, but 1761 IOPS is extremely high – given that the drives themselves can only deliver around 75-100 IOPS each, 1761 is obviously a sign that the read cache is doing its job. As I say though, 55MB/sec isn’t great; a single Seagate Barracuda 7200.11 would be expected to deliver more than that on its own, indicating there’s some kind of limiting factor outside the disks, either the iSCSI implementation or something else, possibly network related, like a slow switch being used.
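
One quick cross-check on those two numbers: throughput is just IOPS multiplied by block size, so the reported figures imply the block size used. The arithmetic is simple:

```python
# Implied block size from the reported throughput and IOPS figures.
mb_per_sec, iops = 55, 1761
implied_block_kb = (mb_per_sec * 1024) / iops
print(f"{implied_block_kb:.1f} KB")  # ~32 KB per IO
```

32KB per IO is consistent with the block size this IOmeter test pattern commonly uses (my assumption – the block size isn’t quoted in the post), so at least the two reported figures hang together.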

Test 001b is 65% read, 35% write, some sequential and some random – the “real-life” test. The MB/sec result falls through the floor here, down to just 0.69MB/sec, indicating something is up – either the write cache isn’t turned on, isn’t working, or the sheer volume of IOs being generated by IOmeter is causing the box to essentially collapse. I’d be interested to see this test re-run with the volume of IOPS ramping up slowly over time so we can see whether that’s the case. Using the figures from the Yellow Bricks RAID overhead post, 89 IOPS at 35% write turns into around 60 physical read IOPS on the disks, and 100 write IOPS because of the RAID5 overhead, as sketched below. 25 writes per second per disk isn’t too bad for a SATA drive, but it’s not good either. This result definitely suggests something isn’t working right on the IX4-200d for some workloads.
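
For anyone who wants to redo that arithmetic, here’s the calculation as a short script. Note that with the commonly cited RAID5 write penalty of 4 (read data, read parity, write data, write parity) the write figure comes out somewhat above the ~100 IOPS quoted above; how much of the read-modify-write the controller’s cache absorbs changes the answer, so treat this as an upper bound:

```python
# Frontend-to-backend IOPS for a RAID5 set, using the usual penalty model:
# each frontend write costs up to 4 backend IOs. Cache coalescing can
# reduce this in practice, which is why published figures vary.
def raid5_backend(frontend_iops, write_ratio, disks):
    reads = frontend_iops * (1 - write_ratio)
    writes = frontend_iops * write_ratio * 4  # RAID5 write penalty of 4
    return reads, writes, (reads + writes) / disks

reads, writes, per_disk = raid5_backend(89, 0.35, disks=4)
print(f"{reads:.0f} read IOPS, {writes:.0f} write IOPS, {per_disk:.0f} per disk")
```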

Test 001c is 50% read, 50% write, but all sequential, unlike Test 001b, so it should clarify whether the issue is write performance, overloading of IOPS, or random vs sequential workloads causing the slowdown. The result of 22MB/sec and 705 IOPS is massively improved over test 001b, which does suggest it’s the “random” workload that causes the IX4-200d to slow right down. The caching obviously works much better for sequential access, which isn’t unexpected, though the size of the difference is a little surprising. 705 IOPS is again definitely higher than I’d expect the 4 SATA drives to return, so the caching is working well. The 22MB/sec for test 001c compared to 55MB/sec for test 001a implies that sequential writes happen at a much lower speed than reads (which Gabe does cover in a later test, the “Super ATTO Clone pattern”).

Test 001d is the final IOmeter test, this time 70% read, 30% write, 100% random. Given my earlier comments on test 001b, I’d expect these results to be even worse, and so it proves – 0.5MB/sec and 64 IOPS suggest that with random workloads the IX4-200d simply isn’t coping; the average IO response time rises to 913ms and the maximum IO response time hits 12127ms. These figures simply aren’t workable, and suggest there’s something up with the IX4-200d under high-volume random workloads – high-volume sequential loads like test 001c produced maximum response times of 252ms at far higher performance levels.
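
A quick Little’s Law check shows just how saturated the box is: throughput multiplied by average response time gives the number of IOs in flight, and here that comes out at roughly the full queue depth IOmeter typically runs with in these tests (my assumption – the queue depth isn’t quoted in the post):

```python
# Little's Law: IOs in flight = IOPS x average response time (in seconds).
iops, avg_response_ms = 64, 913
in_flight = iops * (avg_response_ms / 1000)
print(f"{in_flight:.0f} IOs in flight")  # ~58, i.e. the queue is pinned near full
```

In other words, the array is servicing requests so slowly that almost every outstanding IO just sits queued, which matches the 913ms average latency.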

To skip a couple of tests in Gabe’s write-up, we finally come to the “Super ATTO Clone pattern”, which attempts to discover the maximum performance achievable by the disks by varying block sizes while performing reads and writes. The best figures produced are 41MB/sec read and 9.7MB/sec write at large (64KB and above) block sizes, but the 8KB block size results of 34MB/sec read and 9.2MB/sec write are very respectable, and what I’d expect the IX4-200d to be delivering.

In conclusion, it seems to me that there’s something broken with the IX4-200d in iSCSI mode with RAID5 and highly random workloads. Gabe is going to re-run his tests in NFS mode and see what difference that makes, but I’d also like to see the same tests run in RAID10 mode to see if it’s RAID5 that’s causing the issue – with 2TB drives available, RAID10 would still give you 4TB of usable disk space on the IX4-200d.

The Iomega IX4-200d is still an excellent NAS device, but these tests have made me reconsider where it could be used. It might be that NFS or RAID10 works much better, but otherwise it suggests you’re probably best not using the IX4-200d for highly random workloads.

Update 31/12/2009 – over at blog.storming there’s a follow-up post running similar benchmarks with SSDs instead of SATA drives, with more interesting results.

Red Hat Virtualization (RHEV-H) price and feature comparison

I’ve been putting together a very rough and ready comparison of the price and listed functionality of Red Hat’s new RHEV-H virtualization platform, which is based on KVM running on a small-footprint version of Red Hat Enterprise Linux, all wrapped up with a Windows-based management client.

I say “listed functionality” because Red Hat are the only x86 virtualization platform developer I can think of that doesn’t even let you quickly download a trial version of their software – slightly ironic given that they’re an open-source developer while their competitors VMware, Microsoft and Citrix are all historically closed-source companies, though Citrix have open-sourced their base XenServer virtualization system.

Assuming I can get a trial version of RHEV-H and its management client, I’ll write a new post on my experiences with it in comparison to VMware vSphere.

On paper, RHEV-H is a pretty functional product, supporting:

• High availability – failover between physical servers
• Live migration – online movement of VMs between physical hosts without interruption
• System scheduler – dynamic live migration between physical hosts based on physical resource availability
• Maintenance manager
• Image management
• Monitoring and reporting

These are the major components of a virtualization platform; indeed, live migration and the system scheduler are high-end features on the other virtualization platforms, so for Red Hat to include them in its “one-size-fits-all” package is a nice addition.

The major player in the virtualization arena is without a doubt VMware, and their vSphere Advanced product will deliver the functionality that pretty much any company would want, though they have an “Enterprise Plus” option which adds even more for larger corporations.

VMware vSphere Advanced includes:

  • VMware ESXi or VMware ESX (deployment-time choice)
  • VMware vStorage APIs / VMware Consolidated Backup (VCB)
  • VMware Update Manager
  • VMware High Availability (HA)
  • VMware vStorage Thin Provisioning
  • VMware VMotion™
  • VMware Hot Add
  • VMware Fault Tolerance
  • VMware Data Recovery
  • VMware vShield Zones

A lot of that functionality, especially Fault Tolerance, vShield Zones and the vStorage APIs, simply isn’t matched in any other virtualisation platform right now, whatever the price. However, the vSphere Standard product misses out the VMotion and Fault Tolerance functionality along with the thin provisioning and data recovery features, which means that while it’s still an excellent product, you take on more management overhead whenever you need to arrange physical server downtime, etc.

Now to the prices. I’ve put together the list prices of RHEV-H and VMware vSphere Standard and Advanced in the table below, along with a sample configuration based on 1 management server and 5 physical hosts, each with 2 sockets.

Because Tumblr doesn’t seem to let you embed a table, I’ve had to include the table as an image, sorry about that.
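
For anyone who wants to redo the sums against their own quotes, the arithmetic behind the table is simple enough to script. The prices in this sketch are deliberately left as zero placeholders rather than the actual list prices, which are in the sources linked below:

```python
# Skeleton of the cost comparison: 1 management server plus 5 hosts with
# 2 sockets each, licences priced per socket, support priced per year.
# All prices are PLACEHOLDERS - substitute the real list prices.
def three_year_cost(licence_per_socket=0.0, support_per_socket_yr=0.0,
                    mgmt_server_cost=0.0, hosts=5, sockets_per_host=2,
                    years=3):
    sockets = hosts * sockets_per_host
    return (sockets * licence_per_socket
            + sockets * support_per_socket_yr * years
            + mgmt_server_cost)
```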

As you can see, RHEV-H is the cheapest software option of the 3, though the 3-year cost benefit compared to vSphere Standard isn’t huge, especially when 24×7 support is included. vSphere Advanced costs significantly more, but delivers a lot more too, though it could be more than your own company needs.

Below are the full costs I’ve used to calculate the above results; please let me know if you think I’ve got anything wrong or missed anything out.

The prices above were taken from the VMware online store and the Red Hat Virtualization Cost PDF, both on 29th December 2009.

Overall, it looks like the pricing of Red Hat’s RHEV-H system makes it worth the effort of acquiring it and giving it a solid shakedown, but it’s not going to force VMware into radically changing their own pricing structure.

vSphere Advanced is streets ahead in terms of functionality. vSphere Standard may lack some of RHEV-H’s listed features, but the widespread adoption of VMware products means it makes up for that in other areas, especially around the management and backup-and-restore side of virtualisation, where RHEV-H has a long way to go to catch up.