Zbr's days.


Mon, 02 Jun 2008

As promised, let's look at the foreshadowed, miserable POHMELFS results.

You will rarely see bad benchmark results published for a technology under development, but any such result is actually a _very_ good result for a work-in-progress system that is not yet complete. It shows how new proof-of-concept code compares with an already complete, tuned and optimized system.
Conclusions drawn from such test results lead to much better decisions.

Let's compare iozone read/reread, write/rewrite and random read/write for POHMELFS and NFS with 8 GB test files and different record sizes (from 8 KB to 1 MB) on XFS over a GigE link.
I described the hardware and the local iozone benchmark results in detail previously.

Now it's time for the network tests.
Async NFS in-kernel server results (iozone reports throughput in KB/s; file size and reclen are in KB).

                                                    random  random
      KB  reclen   write rewrite     read   reread    read   write
 8388608       8   60969   57743    39705    97031  464898    5160
 8388608      16   59925   57402    39045    98269  641388    8827
 8388608      32   58094   55263    39075    94654  775064   14389
 8388608      64   58168   57156    40306    98639  868796   22360
 8388608     128   58908   56573    40392   100018  941509   33211
 8388608     256   59444   56446    40842   102503 1030451   41576
 8388608     512   60280   57686    39835    97879 1042570   49858
 8388608    1024   60817   57886    40886    96646  851175   47993
And now the POHMELFS results.
                                                    random  random
      KB  reclen   write rewrite     read   reread    read   write
 8388608       8   70073   64232    12518    14817   40334    5079
 8388608      16   63984   67948    31976    19106   41462    8702
 8388608      32   67250   63440    47506    38657   75908   14357
 8388608      64   69970   66198    41899    29566  136294   21385
 8388608     128   69838   68523    76232    33971  222909   30946
 8388608     256   70012   66439    69125    58223  330886   40685
 8388608     512   70946   68291    76460    58738  428881   51001
 8388608    1024   70985   64958    76317    59561  421973   48531
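
To put the two tables side by side, here is a small sketch that computes the POHMELFS/NFS throughput ratio per record size. The write and read columns are copied from the tables above; the names (RECLEN_KB, NFS, POHMELFS, ratios) are made up for this illustration and are not part of iozone or POHMELFS.

```python
# Sequential write and read throughput in KB/s, copied from the tables above.
# Index i corresponds to record size RECLEN_KB[i].
RECLEN_KB = [8, 16, 32, 64, 128, 256, 512, 1024]

NFS = {
    "write": [60969, 59925, 58094, 58168, 58908, 59444, 60280, 60817],
    "read":  [39705, 39045, 39075, 40306, 40392, 40842, 39835, 40886],
}
POHMELFS = {
    "write": [70073, 63984, 67250, 69970, 69838, 70012, 70946, 70985],
    "read":  [12518, 31976, 47506, 41899, 76232, 69125, 76460, 76317],
}


def ratios(test):
    """Return {record size in KB: POHMELFS throughput / NFS throughput}."""
    return {
        reclen: pohmelfs / nfs
        for reclen, pohmelfs, nfs in zip(RECLEN_KB, POHMELFS[test], NFS[test])
    }


for test in ("write", "read"):
    for reclen, ratio in ratios(test).items():
        # A ratio above 1.0 means POHMELFS was faster than NFS in that run.
        print(f"{test:>5} {reclen:>5} KB: POHMELFS/NFS = {ratio:.2f}")
```

A ratio below 1.0 in the read rows marks the record sizes where the missing readahead hurts.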
Sequential writing is roughly 10-20% faster for POHMELFS (and limited by the underlying filesystem speed), while random writing is essentially the same and is limited by disk speed. But sequential reading is _much_ worse for small requests. The reason is simple: POHMELFS does not support readahead yet, since it has no ->readpages() callback. Any sequential access ends up as a series of ->readpage() calls, each of which waits for its own completion, and that is slow. So currently readahead is not invoked from the reading path.
I could not resist highlighting that large sequential reads (records of 128 KB and up) are 1.5-2 times faster for POHMELFS than for NFS :) and are also limited by the underlying filesystem.
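
Why is one synchronous request per page so slow? A toy model helps: if every request must complete before the next one is sent, each request pays a full network round trip no matter how little data it carries. The sketch below is only that model, not POHMELFS code. The page size, round-trip time and link bandwidth are assumed values picked for illustration, not measurements from this setup, and the model ignores server-side disk and cache effects.

```python
PAGE_KB = 4            # assumption: 4 KB pages
RTT_S = 0.0002         # assumption: 200 us per request round trip
LINK_KB_S = 110_000    # assumption: ~110 MB/s usable on a GigE link


def sync_read_throughput(pages_per_request, rtt_s=RTT_S, link_kb_s=LINK_KB_S):
    """Throughput in KB/s when each request must finish before the next starts.

    pages_per_request=1 models a chain of single-page reads;
    larger values model fetching a batch of pages with one request.
    """
    payload_kb = PAGE_KB * pages_per_request
    # Every request costs one round trip plus the time to move its payload.
    time_per_request_s = rtt_s + payload_kb / link_kb_s
    return payload_kb / time_per_request_s


for batch in (1, 8, 32):
    print(f"{batch:>2} pages per request: {sync_read_throughput(batch):.0f} KB/s")
```

In this model the round trip dominates for single pages, and throughput approaches the link speed only as more pages are fetched per request, which is the effect a batched read path is meant to deliver.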

One can note that the NFS random reading results are actually better than the local filesystem's, and noticeably so. Why does a local filesystem behave worse in random reading than the same filesystem mounted via NFS?
I believe that's because in the network case we actually have double buffering. On the client, the most active pages are in RAM. On the server, readahead has populated pages which are not currently active: the active pages are being read from the client's cache, so the client does not request them from the server and they get evicted from the server's page cache. But those currently inactive server pages will be accessed soon by the client, when it reads the next portion of the random data, and that will be a very fast access to RAM.
So we have a really good caching scheme, where the most actively used pages are in client RAM and have been evicted from the server's page cache, while the server instead populates other, less active pages via readahead.

This reading behaviour is just a result of the not yet complete VFS callback implementation in POHMELFS. With ->readpages() in place it will be faster than NFS even in this benchmark. POHMELFS also has parallel read balancing across multiple servers and simultaneous writing to them, but there are no results yet.
I have already built a mental model of the optimized read and write transactions (based on memory pools for maximum OOM robustness and small memory overhead), so in a day or two it will be implemented in code.

Stay tuned, now it's time for excellent POHMELFS results!
