3Ware SATA Raid 9550SX Performance
by Simon at 18:03 09/09/07 (Forum::Technical Advice::General)
I first suspected there was a problem with this card when trying to install CentOS 4.4 on a twin Opteron 2.4GHz H8DA8 Supermicro box with 4GB RAM, using an LVM / partition on top of a RAID 1 of 2x Maxtor 250GB SATA II disks.

Formatting the / partition seemed to take forever, with significant pauses while writing the inode table. OK, it's a large partition (200+GB), but hangs of tens of seconds and a collapse in responsiveness led me to believe that the installer had crashed.

It was only after a lot of experimenting (including trying CentOS 4.4, 4.5, openSUSE 10.2 and lastly RHEL AS 4 Update 5 - yes, I know that's the same as CentOS 4.5, but just in case... also different disks - Western Digital and Seagate - and booting the SMP kernel vs the non-SMP one) that I came to realise there's a more fundamental problem.

Attached are four PDF files showing graphs of the output of vmstat 1. During these tests the machine's in init 3, with a minimal install of RHEL AS 4 update 5 using default partitioning (small /boot, 2GB swap, large / on LVM, ext3) as suggested by the installer. There are test runs for both the SMP and the nonSMP kernels.

Write cache on the card is switched on, the drives are in 3Gb/s SATA II mode with NCQ enabled, and queuing is enabled on the card itself. The firmware's the latest (codeset 9.4.1.2 from 3Ware, fw FE9X 3.08.02.005), as is the driver (2.26.05.007), and the card has 128MB of cache installed. The RAID 1 array has been initialised and verified with 3dm2.

The test commands are:

Read test: sync; time -p dd if=/dev/sda of=/dev/null bs=1M count=X

... where X is 3072, 4096, 6162 and 20480 ie approx 3G, 4G, 6G and 20G of data

Write test: sync; time -p dd if=/dev/zero of=[filename] bs=1M count=X

... where X is as above, and [filename] is on the / partition.
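For what it's worth, here's roughly how a full run can be scripted - a sketch only, with an illustrative output filename (/ddtest-XM) rather than the exact one used:

#!/bin/bash
# Timed read and write passes for each test size (MB counts as above).
for X in 3072 4096 6162 20480; do
    sync
    echo "=== read ${X}M ==="
    time -p dd if=/dev/sda of=/dev/null bs=1M count=$X
    sync
    echo "=== write ${X}M ==="
    time -p dd if=/dev/zero of=/ddtest-${X}M bs=1M count=$X
    rm -f /ddtest-${X}M        # illustrative cleanup between runs
done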

vmstat's "blocks in" comes in typically at around 80000 blocks/s, but take a look at the "blocks out" graphs in the attached PDFs - they're extremely 'bursty' - anything up to 1,000,000+ blocks/s at times, followed by long periods of nothing at all.
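For reference, the data behind these graphs can be captured with something like the following - the awk field positions (9 for 'bi', 10 for 'bo') assume the stock RHEL 4 vmstat layout:

# Log one vmstat sample per second while a test runs, then extract bi/bo.
vmstat 1 > vmstat.log &
VMSTAT_PID=$!
# ... run the dd test here ...
kill $VMSTAT_PID
awk 'NR > 2 { print NR-2, $9, $10 }' vmstat.log > bi_bo.dat   # seconds, bi, bo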

Well, that's fine, you might think - all that IO's been handed off to the 3Ware card for processing, so it's only to be expected that enormous chunks get thrown at it periodically and it just munches through them for a bit, updating the RAID 1 array, before asking for more.

Trouble is, during these periods of not much going on, system responsiveness - 'feel', as it were - goes off a cliff, with as much as a minute between typing 'ls' in a small directory and getting any output back.

Load average heads ever upwards - reaching 12+ in some instances (and before anyone starts telling me load average is calculated differently in 2.6 kernels: I've read that debate and understand it's not the best indicator of what's going on) - with processes like pdflush, kjournald and kswapd hanging around for ages in D state (uninterruptible sleep).

I've seen up to 8 pdflush processes like this during the 20G run, and believe me this has a major impact on things.
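If you want to watch it happening, a quick loop over ps will show the D-state processes piling up (a throwaway sketch, nothing clever):

# Print the uninterruptible-sleep (D state) processes once a second.
while true; do
    date
    ps -eo state,pid,comm | awk '$1 ~ /^D/'
    sleep 1
done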

Where this really starts to bite is when doing intensive IO on files larger than available RAM. There are some impressive throughput figures for file sizes that fit easily in RAM - naturally, the OS has plenty of room to work out its own async IO if it's got room to breathe. Here's a table of the timed tests for the various sizes (attached below as 'Summary of dd timed output for various data sizes'):

... yes, single-processor 3G write throughput is greater than with SMP, as are the 20G writes - though 35MB/s is nothing to write home about. "WD" in the header is a mistake; these are Seagate disks, not Western Digital ones.

Before anyone asks, tweaking the read-ahead (setra=16384), nr_requests=512 or the IO scheduler (deadline vs cfq) in line with 3Ware's tuning hints has no impact on the specific problem - i.e. the collapse in responsiveness of the entire system under certain intensive IO operations. Neither does jumpering the card's slot down to 66MHz from 133MHz, nor does taking LVM out of the picture.
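For completeness, those tweaks were applied along these lines (paths are the standard 2.6 sysfs ones; on kernels without runtime scheduler switching the elevator= boot parameter does the same job):

# 3Ware tuning hints - none of these helped with the responsiveness collapse.
blockdev --setra 16384 /dev/sda                  # read-ahead, in 512-byte sectors
echo 512 > /sys/block/sda/queue/nr_requests      # deeper request queue
echo deadline > /sys/block/sda/queue/scheduler   # deadline instead of cfq
# ... or boot with elevator=deadline to set the scheduler system-wide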

I've read a ton of stuff over the last week, building up ever more interesting Google searches as I learn more about this - the most fruitful being today when, having spent a load of time graphing vmstat output in Excel, I finally Googled "3ware vmstat" and came across this very recent post, which I suspect has hit the nail right on the head:

Too many years of awful 3ware performance. - Discussion@SR

I'll probably add to this thread as and when I discover other things, but in the meanwhile if anyone else has come across this performance/responsiveness cliff and worked out how to bypass it, please get in touch.

On with the vmstat graphs - download all the PDFs below and look at what's going on for yourself: all those processes in 'b', 100% iowait, and at one stage in the 20G write test the machine actually runs out of memory and sendmail gets automatically killed (I'm in init 3, remember).

Oh - and finally, why 20G? Well, 3Ware's own "Benchmarking the 3ware 9000 Controller Using Linux Kernel 2.6" document says to use 40x installed RAM - but their test machine only has 512MB installed. I've got 4GB and haven't the patience to wait for a 160GB file to be written out in million-block bursts separated by minute-long gaps. I reckon 5x installed RAM achieves the same thing (exhausting any cache behaviour).
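The arithmetic behind the 20G figure, for anyone following along (bs=1M throughout):

# 3Ware's suggestion: 40x installed RAM -> 40 x 4GB = 160GB -> count=163840
# What I'm using:      5x installed RAM ->  5 x 4GB =  20GB -> count=20480
dd if=/dev/zero of=[filename] bs=1M count=20480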

Right, now, really, those graphs... (see below)
--
simon

Attachments...
JPG image (172 K) Summary of dd timed output for various data sizes
PDF file (62 K) 3G read and write
PDF file (71 K) 4G read and write
PDF file (92 K) 6G read and write
PDF file (163 K) 20G read and write
Re: 3Ware SATA Raid 9550SX Performance - different blocksizes Simon - 12:52 10/09/07
Results of some vmstat tests using different blocksizes in the read/write of 3GB of data.

Basically:

dd if=/dev/sda of=/dev/null bs=X count=Y followed by dd if=/dev/zero of=/3GBfile bs=X count=Y

... for each of X = 512B, 1K, 2K, 4K, 8K, 16K, 32K, 64K, 128K, 256K and 512K, where Y is the appropriate multiplier to get to 3GB.
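The loop looks something like this (block sizes given in bytes so the count can be computed; /3GBfile as above):

# Read then write 3GB of data at each block size; Y = 3GB / blocksize.
TOTAL=$((3 * 1024 * 1024 * 1024))
for BS in 512 1024 2048 4096 8192 16384 32768 65536 131072 262144 524288; do
    Y=$((TOTAL / BS))
    sync
    dd if=/dev/sda of=/dev/null bs=$BS count=$Y
    sync
    dd if=/dev/zero of=/3GBfile bs=$BS count=$Y
done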

In answer to the question "does changing the blocksize make any difference?" - no, it doesn't.

The attached PDF graphs the results of a whole run - again we've got a fairly solid 80000 blocks/s on reads and very bursty blocks/s on writes, with lots of processes being blocked while writing.
--
simon

Attachments...
PDF file (111 K) 3GB read/write with various block sizes from 512B to 512K
Re: 3Ware SATA Raid 9550SX Performance - different VM settings Simon - 12:30 13/09/07
Based on a comment in this bugzilla thread, here are a couple of iozone + vmstat 1 tests run with:

/proc/sys/vm/dirty_expire_centisecs = 1000 (default is 3000)
/proc/sys/vm/dirty_ratio = 10 (default is 30)

but with no other tuning tweaks applied.
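Applied for the current boot like so (sysctl -w does the same thing; the defaults come back on reboot):

# Expire dirty pages sooner and start writeback at a lower dirty percentage.
echo 1000 > /proc/sys/vm/dirty_expire_centisecs    # default 3000
echo 10 > /proc/sys/vm/dirty_ratio                 # default 30
# equivalently: sysctl -w vm.dirty_expire_centisecs=1000 vm.dirty_ratio=10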

This change reduces the number of processes in 'b' state somewhat.
--
simon

Attachments...
PDF file (118 K) Contrast different VM params
3Ware codeset 9.4.1.3 Simon - 09:49 21/09/07
Note to self: 3w-9xxx.c and 3w-9xxx.h in 9.4.1.3 are identical to those supplied in codeset 9.4.1.2, i.e. driver version 2.26.05.007.

Therefore there's no need for me to bother recompiling the driver/initrd, as it's already what's provided in CentOS 4.5.

The firmware in 9.4.1.3 has been updated from 3.08.02.005 to 3.08.02.007.

None of the other things I might care about (CLI, 3BM etc) have changed.
--
simon

Re: 3Ware SATA Raid 9550SX Performance - io schedulers Simon - 17:30 26/09/07
Now testing with RHEL 5, rather than CentOS 4.5 - in the hope that perhaps kernel 2.6.18 is less prone to the original issue than 2.6.9.

Some LTP disktest runs, with various schedulers. Note how cfq does particularly badly on sequential reads.

3ware driver: 2.26.06.002-2.6.18
sdb is a RAID 1 pair of 250GB Seagates.
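Each block below corresponds to invocations along these lines - the flags are the ones echoed in the START lines, with disktest filling in the bracketed defaults itself; on 2.6.18 the scheduler can also be switched per-device at runtime rather than only via elevator= on the boot line:

echo deadline > /sys/block/sdb/queue/scheduler               # or boot with elevator=deadline
disktest -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -r /dev/sdb   # sequential reads
disktest -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -w /dev/sdb   # sequential writes
disktest -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -r /dev/sdb   # random reads
disktest -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -w /dev/sdb   # random writes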

elevator=deadline:
Sequential reads:
| 2007/09/26-16:19:30 | START | 3065 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -r (-N 488259583) (-c) (-p u)
| 2007/09/26-16:20:00 | STAT | 3065 | v1.2.8 | /dev/sdb | Total read throughput: 45353642.7B/s (43.25MB/s), IOPS 11072.7/s.
Sequential writes:
| 2007/09/26-16:20:00 | START | 3082 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -w (-N 488259583) (-c) (-p u)
| 2007/09/26-16:20:30 | STAT | 3082 | v1.2.8 | /dev/sdb | Total write throughput: 53781186.2B/s (51.29MB/s), IOPS 13130.2/s.
Random reads:
| 2007/09/26-16:20:30 | START | 3091 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -r (-N 488259583) (-c) (-D 100:0)
| 2007/09/26-16:21:00 | STAT | 3091 | v1.2.8 | /dev/sdb | Total read throughput: 545587.2B/s (0.52MB/s), IOPS 133.2/s.
Random writes:
| 2007/09/26-16:21:00 | START | 3098 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -w (-N 488259583) (-c) (-D 0:100)
| 2007/09/26-16:21:44 | STAT | 3098 | v1.2.8 | /dev/sdb | Total write throughput: 795852.8B/s (0.76MB/s), IOPS 194.3/s.

elevator=noop:
Sequential reads:
| 2007/09/26-16:24:02 | START | 3167 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -r (-N 488259583) (-c) (-p u)
| 2007/09/26-16:24:32 | STAT | 3167 | v1.2.8 | /dev/sdb | Total read throughput: 45467374.9B/s (43.36MB/s), IOPS 11100.4/s.
Sequential writes:
| 2007/09/26-16:24:32 | START | 3176 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -w (-N 488259583) (-c) (-p u)
| 2007/09/26-16:25:02 | STAT | 3176 | v1.2.8 | /dev/sdb | Total write throughput: 53825672.5B/s (51.33MB/s), IOPS 13141.0/s.
Random reads:
| 2007/09/26-16:25:03 | START | 3193 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -r (-N 488259583) (-c) (-D 100:0)
| 2007/09/26-16:25:32 | STAT | 3193 | v1.2.8 | /dev/sdb | Total read throughput: 540954.5B/s (0.52MB/s), IOPS 132.1/s.
Random writes:
| 2007/09/26-16:25:32 | START | 3202 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -w (-N 488259583) (-c) (-D 0:100)
| 2007/09/26-16:26:16 | STAT | 3202 | v1.2.8 | /dev/sdb | Total write throughput: 795989.3B/s (0.76MB/s), IOPS 194.3/s.

elevator=anticipatory:
Sequential reads:
| 2007/09/26-16:37:04 | START | 3277 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -r (-N 488259583) (-c) (-p u)
| 2007/09/26-16:37:34 | STAT | 3277 | v1.2.8 | /dev/sdb | Total read throughput: 45414126.9B/s (43.31MB/s), IOPS 11087.4/s.
Sequential writes:
| 2007/09/26-16:37:35 | START | 3284 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -w (-N 488259583) (-c) (-p u)
| 2007/09/26-16:38:04 | STAT | 3284 | v1.2.8 | /dev/sdb | Total write throughput: 53895168.0B/s (51.40MB/s), IOPS 13158.0/s.
Random reads:
| 2007/09/26-16:38:04 | START | 3293 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -r (-N 488259583) (-c) (-D 100:0)
| 2007/09/26-16:38:34 | STAT | 3293 | v1.2.8 | /dev/sdb | Total read throughput: 467080.5B/s (0.45MB/s), IOPS 114.0/s.
Random writes:
| 2007/09/26-16:38:34 | START | 3300 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -w (-N 488259583) (-c) (-D 0:100)
| 2007/09/26-16:39:18 | STAT | 3300 | v1.2.8 | /dev/sdb | Total write throughput: 793122.1B/s (0.76MB/s), IOPS 193.6/s.

elevator=cfq:
Sequential reads:
| 2007/09/26-16:42:18 | START | 3353 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -r (-N 488259583) (-c) (-p u)
| 2007/09/26-16:42:48 | STAT | 3353 | v1.2.8 | /dev/sdb | Total read throughput: 2463470.9B/s (2.35MB/s), IOPS 601.4/s.
Sequential writes:
| 2007/09/26-16:42:48 | START | 3360 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p l -P T -T 30 -w (-N 488259583) (-c) (-p u)
| 2007/09/26-16:43:18 | STAT | 3360 | v1.2.8 | /dev/sdb | Total write throughput: 54572782.9B/s (52.04MB/s), IOPS 13323.4/s.
Random reads:
| 2007/09/26-16:43:19 | START | 3369 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -r (-N 488259583) (-c) (-D 100:0)
| 2007/09/26-16:43:48 | STAT | 3369 | v1.2.8 | /dev/sdb | Total read throughput: 267652.4B/s (0.26MB/s), IOPS 65.3/s.
Random writes:
| 2007/09/26-16:43:48 | START | 3376 | v1.2.8 | /dev/sdb | Start args: -B 4k -h 1 -I BD -K 4 -p r -P T -T 30 -w (-N 488259583) (-c) (-D 0:100)
| 2007/09/26-16:44:31 | STAT | 3376 | v1.2.8 | /dev/sdb | Total write throughput: 793122.1B/s (0.76MB/s), IOPS 193.6/s.


--
simon
Re: 3Ware SATA Raid 9550SX Performance - scheduler comparison Simon - 17:03 02/10/07
The attached PDFs show graphs of the 'bi' and 'bo' data recorded by vmstat -n 1 for the same test under different IO schedulers, on a fresh install of CentOS 5 with the built-in 3w-9xxx driver.

The test was:

dd if=/dev/sda of=/dev/null bs=1M count=4096 & sleep 5; dd if=/dev/zero of=./4G bs=1M count=4096 &

Twin Opteron 2.4GHz, 4GB RAM, 3ware 9550SX RAID 1, built-in driver 2.26.02.007, nr_requests 128 (default), ext3 on LVM (default), data=default (as opposed to 'writeback' or other non-default tweaks). All other filesystem/kernel param settings also at default for the CentOS 5 install - only the io scheduler is being changed.
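Roughly how each run was driven - a sketch, with illustrative log filenames:

# Run the concurrent read/write test once per scheduler, logging vmstat -n 1.
for SCHED in noop anticipatory deadline cfq; do
    echo $SCHED > /sys/block/sda/queue/scheduler
    sync
    vmstat -n 1 > vmstat-$SCHED.log &
    VMSTAT_PID=$!
    dd if=/dev/sda of=/dev/null bs=1M count=4096 &
    READ_PID=$!
    sleep 5
    dd if=/dev/zero of=./4G bs=1M count=4096   # the write runs in the foreground here
    wait $READ_PID                             # let the background read finish
    sync
    kill $VMSTAT_PID
    rm -f ./4G
done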

Also tested the 3ware 9.4.1.3 src-compiled driver and saw identical results.

Edit: Redid tests with 4k blocksize and count 1048576 - very similar graphs.


--
simon
Attachments...
PDF file (30 K) CentOS 5 io scheduler comparisons 1M block size
PDF file (30 K) Same tests but with 1048576 blocks of 4k size instead
JPG image (186 K) Comparison of 1M v 4k bs side by side
Re: 3Ware SATA Raid 9550SX Performance - scheduler comparison Nat Makarevitch - 01:06 17/12/07
Hi,

I had similar problems and awful random I/O performance (6 spindles, RAID5).

I could somewhat alleviate all this (up to ~130 random reads/second).

You may read my notes

--
Nat

Re: 3Ware SATA Raid 9550SX Performance - scheduler comparison Simon - 10:53 17/12/07
Thanks - I eventually tried replacing the 3ware with an Areca, saw much better performance and far less io blocking going on.

At that point, I decided to stop the intensive testing and get on with other priorities.

I'm keeping an eye on kernel bug 7372, in case there's any notable change.

--
simon