Benchmarking 10Gbps Ethernet on Desktop Motherboards and EXT4

posted Dec 31, 2011, 6:06 AM by Unknown user   [ updated Dec 31, 2011, 6:06 AM ]
Info
GOAL: To maximize throughput between two hosts.
NOTES: There is a bug in ext4 that limits the speed on mdadm-based RAID arrays to ~350MiB/s.  This bug does not appear to affect hardware raid controllers.

Specifications
Distribution: Debian Testing
Architecture: 64-bit

Outline
1. Setup
2. Benchmarks
3. Network configuration
4. Disk array configuration used to compensate for EXT4 bug (+72-114MiB/s performance gain)

Setup
1. Intel DP55KG with a 3ware 9650SE-16PML (> 500-600MiB/s sequential writes with EXT4 or XFS)
2. Gigabyte P35-DS4 with software RAID-5 (> 400-600MiB/s sequential writes with XFS, capped at 350-400MiB/s with EXT4)
3. 10GbE AT2 server adapters (Intel states these are PCI-e 2.0, but the BIOS reports them as PCI-e v1.0, 2.5GT/s)
4. A regular Cat6 cable was used between the two hosts, since 10GbE switches currently cost at least $5k-$10k.  From what I read, a regular Cat6 cable can go 33 meters on 10Gbps, whereas Cat 6a can run up to 100 meters.
5. A 10Gbps tuning paper I read recommends an MTU of 9000 for 10GbE; otherwise a single connection may hit a ceiling of about 3Gbps.  I did not test with the default MTU (1500) and went straight to an MTU of 9000.
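
As a rough illustration of why the MTU matters, the sketch below estimates how much of the wire bandwidth is left for TCP payload, and how many full-size frames per second the hosts must handle, at each MTU. The overhead figures are the usual values for untagged Ethernet and for IPv4 and TCP headers without options; the function names are mine, and the model ignores TCP options and everything above the wire. The frame rate is probably the more important number, since per-packet CPU cost is what tends to limit a single connection.

```python
# Per-frame overhead on the wire, in bytes (untagged Ethernet).
ETH_OVERHEAD = 7 + 1 + 14 + 4 + 12  # preamble, SFD, header, FCS, inter-frame gap
IP_TCP_HEADERS = 20 + 20            # IPv4 + TCP, no options (assumption)


def payload_efficiency(mtu):
    """Fraction of wire bandwidth that carries TCP payload at a given MTU."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETH_OVERHEAD)


def frames_per_second(mtu, line_rate_bps=10e9):
    """Full-size frames per second needed to fill the link."""
    return line_rate_bps / 8 / (mtu + ETH_OVERHEAD)


for mtu in (1500, 9000):
    print(f"MTU {mtu}: {payload_efficiency(mtu):.1%} payload, "
          f"{frames_per_second(mtu):,.0f} frames/s at 10Gbps")
```

The payload share only moves from about 95% to about 99%, but the frame rate drops by almost a factor of six.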

Benchmarks
1. iperf - This shows what the network can push, regardless of disk or other I/O channels.
2. nfs - Using cp over NFS.
3. ftp - Using FTP to transfer a single file.

1. iperf benchmark
p34:~# iperf -c 10.0.0.254
------------------------------------------------------------
Client connecting to 10.0.0.254, TCP port 5001
TCP window size: 27.4 KByte (default)
------------------------------------------------------------
[ 3] local 10.0.0.253 port 52791 connected with 10.0.0.254 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 10.9 GBytes 9.39 Gbits/sec
p34:~#
Using nload to watch the speed during the test:
Device eth3 [10.0.0.253] (1/1):
================================================================================
Incoming: Outgoing:
Curr: 1.10 MByte/s Curr: 1120.57 MByte/s
Avg: 0.07 MByte/s Avg: 66.20 MByte/s
Min: 0.66 MByte/s Min: 666.95 MByte/s
Max: 1.10 MByte/s Max: 1120.57 MByte/s
Ttl: 4.40 MByte Ttl: 4474.84 MByte
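
The iperf and nload figures use different units, so a quick conversion helps to confirm they agree. This is a small sketch that assumes iperf reports decimal gigabits (10^9 bits) and that nload's "MByte/s" is 1024-based; the function name is hypothetical.

```python
def gbits_to_mib_per_s(gbits_per_s):
    """Convert decimal Gbit/s to MiB/s (1 MiB = 2**20 bytes)."""
    return gbits_per_s * 1e9 / 8 / 2**20


# 9.39 Gbits/sec is the iperf result above.
print(f"{gbits_to_mib_per_s(9.39):.1f} MiB/s")
```

Under those assumptions the iperf result comes out within a couple of MiB/s of the peak outgoing rate nload shows.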
2. nfs copy benchmark

When I copy a large file over NFS from the Gigabyte (software RAID) host to the Intel motherboard, the transfer runs at about 464MiB/s, which is roughly what the software RAID can read at.
Here is the file (about 48.5GiB):
-rw-r--r-- 1 abc users 52046502432 2010-02-25 16:01 data.bkf

From Gigabyte/SW RAID-5 => Intel/HW RAID-6 (9650SE-16PML):
$ /usr/bin/time cp /nfs/data.bkf .
0.04user 45.51system 1:47.07elapsed 42%CPU (0avgtext+0avgdata 7008maxresident)k
0inputs+0outputs (1major+489minor)pagefaults 0swaps
=> Copying 52046502432 bytes in 1:47.07 (107.07 seconds) is about 486MB/s (464MiB/s).

However, from Intel/HW RAID-6 (9650SE-16PML) => Gigabyte/SW RAID-5
$ /usr/bin/time cp data.bkf /nfs
0.02user 31.54system 4:33.29elapsed 11%CPU (0avgtext+0avgdata 7008maxresident)k
0inputs+0outputs (0major+490minor)pagefaults 0swaps
=> Copying 52046502432 bytes in 4:33.29 (273.29 seconds) is about 190MB/s (182MiB/s).
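
The elapsed field printed by /usr/bin/time is minutes:seconds, so 1:47.07 means 107.07 seconds rather than 1.47 minutes. The sketch below converts that field to seconds and derives the throughput from the exact file size in the listing above; the helper names are mine.

```python
def elapsed_to_seconds(elapsed):
    """Convert GNU time's elapsed field ([h:]m:ss.cc) to seconds."""
    seconds = 0.0
    for part in elapsed.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def throughput_mib_per_s(size_bytes, elapsed):
    """Average throughput in MiB/s for a copy of size_bytes."""
    return size_bytes / elapsed_to_seconds(elapsed) / 2**20


FILE_SIZE = 52046502432  # bytes, from the ls output for data.bkf

for label, elapsed in (("SW RAID-5 => HW RAID-6", "1:47.07"),
                       ("HW RAID-6 => SW RAID-5", "4:33.29")):
    print(f"{label}: {throughput_mib_per_s(FILE_SIZE, elapsed):.0f} MiB/s")
```

This gives roughly 464MiB/s when reading from the software RAID and roughly 182MiB/s when writing to it.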

When running top, I could see md raid 5 and pdflush at or near 100% CPU.
Is the problem scaling with mdadm/raid-5 on the Gigabyte motherboard?

Gigabyte: 8GB Memory & Q6600
Intel DP55KG: 8GB Memory & Core i7 870

From the kernel:
[ 80.291618] ixgbe: eth3 NIC Link is Up 10 Gbps, Flow Control: RX/TX

With XFS, it used to get > 400MiB/s for writes.

With EXT4, only 200-300MiB/s:
(On Gigabyte board)
$ dd if=/dev/zero of=bigfile bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 38.6415 s, 278 MB/s
I submitted a bug report [1] to the Linux Kernel Mailing List (LKML) to let the ext4 developers know that EXT4 appears to be capped at 350MiB/s on mdadm software RAID.

3. FTP benchmark
(Gigabyte->Intel)

---- Connecting data socket to (10.0.0.254) port 51421
---- Data connection established
---> RETR CentOS-5.1-x86_64-bin-DVD.iso
<--- 150-Accepted data connection
<--- 150 4338530.0 kbytes to download
---- Got EOF on data connection
---- Closing data socket
4442654720 bytes transferred in 8 seconds (502.77M/s)
lftp abc@10.0.0.254:/r/1/iso>
CentOS DVD image in 8 seconds!

rsync is much slower:
$ rsync -avu --stats --progress /nfs/CentOS-5.1-x86_64-bin-DVD.iso .
sending incremental file list
CentOS-5.1-x86_64-bin-DVD.iso
4442654720 100% 234.90MB/s 0:00:18 (xfer#1, to-check=0/1)

I am using nobarrier; I guess some more tweaking is required on the Gigabyte motherboard with software RAID.
Network Configuration

For anyone wondering in the future, the only tweak to the network configuration is setting the MTU to 9000 (jumbo frames). Here is my entry in /etc/network/interfaces:
# Intel 10GbE (PCI-e)
auto eth3
iface eth3 inet static
address 10.0.0.253
network 10.0.0.0
netmask 255.255.255.0
mtu 9000
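
To confirm the setting took effect after bringing the interface up, the MTU can be read back from sysfs. This is a minimal sketch that assumes a Linux host exposing /sys/class/net/<iface>/mtu; the function name and the sysfs_root parameter are hypothetical, the latter only there to make the function easy to test.

```python
from pathlib import Path


def read_mtu(iface, sysfs_root="/sys/class/net"):
    """Return the MTU the kernel currently reports for an interface."""
    return int((Path(sysfs_root) / iface / "mtu").read_text().strip())


if __name__ == "__main__":
    iface = "eth3"  # interface name from the config above
    try:
        mtu = read_mtu(iface)
    except OSError:
        print(f"{iface}: interface not found on this host")
    else:
        status = "jumbo frames enabled" if mtu >= 9000 else "default-size frames"
        print(f"{iface}: MTU {mtu} ({status})")
```

Both ends of the link need the same MTU, so it is worth running the check on each host.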
Disk array configuration used to compensate for EXT4 bug on mdadm RAID arrays (+72-114MiB/s performance gain)

The option that gives the boost is 'nodelalloc'. The mount man page says:

       nodelalloc
              Disable delayed allocation. Blocks are allocation when  data  is
              copied from user to page cache.

Before:
/dev/md3        /r/1            ext4    defaults,noatime,nobarrier 0 1

After:
/dev/md3        /r/1            ext4    defaults,noatime,nobarrier,nodelalloc 0 1
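
A small sketch for checking whether a given fstab entry carries the option. It only splits the whitespace-separated fields and the comma-separated option list; the function names are mine. The same parsing should work on lines from /proc/mounts, assuming ext4 lists nodelalloc there, which I have not verified here.

```python
def parse_mount_line(line):
    """Split an fstab-style line into (device, mountpoint, fstype, options)."""
    device, mountpoint, fstype, options = line.split()[:4]
    return device, mountpoint, fstype, options.split(",")


def uses_option(line, option):
    """True if the mount options of the line include the given option."""
    return option in parse_mount_line(line)[3]


before = "/dev/md3        /r/1            ext4    defaults,noatime,nobarrier 0 1"
after = "/dev/md3        /r/1            ext4    defaults,noatime,nobarrier,nodelalloc 0 1"

print(uses_option(before, "nodelalloc"))  # False
print(uses_option(after, "nodelalloc"))   # True
```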

Before:

$ dd if=/dev/zero of=bigfile bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 38.6415 s, 278 MB/s
After:
$ dd if=/dev/zero of=bigfile2 bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 27.3865 s, 392 MB/s
The original tests on kernel 2.6.33 rarely exceeded 350MiB/s with EXT4.  The final test shown above, on kernel 2.6.33.1 with the 'nodelalloc' option, reached 392MB/s as reported by dd, up from 278MB/s without the option.  That appears to be the current ceiling for EXT4 on mdadm RAID arrays, at least at the time of this writing.
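
For reference, the two dd runs can be compared directly from the byte counts and timings they print. This sketch uses decimal megabytes, as dd does; the function name is hypothetical.

```python
def dd_rate_mb_per_s(bytes_copied, seconds):
    """Throughput in decimal MB/s, the unit dd reports."""
    return bytes_copied / seconds / 1e6


# Byte counts and timings copied from the two dd runs above.
before = dd_rate_mb_per_s(10737418240, 38.6415)  # without nodelalloc
after = dd_rate_mb_per_s(10737418240, 27.3865)   # with nodelalloc
gain = after - before

print(f"before: {before:.0f} MB/s, after: {after:.0f} MB/s")
print(f"gain: {gain:.0f} MB/s ({gain / before:.0%})")
```

The gain works out to about 114 MB/s, or roughly 41%. The lower figure of 72 quoted in the outline presumably compares 278 against the earlier 350 ceiling.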

URLs
[1] http://linux.derkeiler.com/Mailing-Lists/Kernel/2010-02/msg10889.html