Zbr's days.

Fri, 10 Oct 2008

How to get back 100 MB/s in several clicks or fixing tbench regression for fun.

It was reported recently that tbench has a long history of regressions, starting at least with the 2.6.23 kernel. I verified that in my test environment tbench 'lost' more than 100 MB/s, dropping from 470 to 355 MB/s for 8 threads, somewhere between 2.6.24 and 2.6.27. The 2.6.26 to 2.6.27 regression on my machines roughly corresponds to a drop from 375 to 355 MB/s.

I spent several days (please do not think that I'm bored and have nothing to do: there are really interesting things to work on, but since I had already started...) on various tests and bisections (unfortunately, bisect cannot always point to the 'right' commit), and found the following problems.

The first is related to the network, as lots of people expected: TSO/GSO over loopback with the tbench workload eats about 5-10 MB/s, since the overhead of creating TSO/GSO frames is not paid back by the gains from optimized super-frame processing. Since it brings a really impressive improvement for big-packet workloads, it was (likely) decided not to add a patch for this; instead, one can disable TSO/GSO via ethtool. The TSO/GSO change went in during the 2.6.27 window, so it plays its part in that release's regression.
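For readers who want to try this, here is a minimal sketch of switching both offloads off on the loopback device from Python. It assumes the `ethtool` binary is installed and that the script runs as root; the helper names `ethtool_offload_cmd` and `set_offload` are made up for illustration and are not part of any tool mentioned here.

```python
import subprocess


def ethtool_offload_cmd(device="lo", enable=False):
    """Build the ethtool command line that switches TSO and GSO on or off."""
    state = "on" if enable else "off"
    # `ethtool -K <dev> tso <state> gso <state>` changes the offload settings.
    return ["ethtool", "-K", device, "tso", state, "gso", state]


def set_offload(device="lo", enable=False):
    """Run the command; needs root and an installed ethtool binary."""
    subprocess.run(ethtool_offload_cmd(device, enable), check=True)


if __name__ == "__main__":
    # Only print the command here; call set_offload() to actually apply it.
    print(" ".join(ethtool_offload_cmd("lo", enable=False)))
```

Re-running tbench before and after the change on the same kernel should show whether the 5-10 MB/s difference is reproducible on a given machine.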

The second part of the 2.6.26-2.6.27 regression (as a reminder, it is about 20 MB/s) is related to the scheduler changes, which another group of people expected. I tracked it down to commit a7be37ac8e1565e00880531f4e2aff421a21c803, which, when reverted, returns 2.6.27 tbench performance to the 365 MB/s mark, the highest for 2.6.26-2.6.27. I also tested the tree stopped at that commit itself (i.e. not 2.6.27) and got 373 MB/s, so other changes in that merge likely ate a couple of megabytes.
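Since a 20 MB/s difference is close to run-to-run noise, one way to keep a bisection honest is to take several tbench runs per kernel and refuse to label borderline results. The sketch below is only an illustration of that idea, not the procedure actually used here. The shape of the tbench summary line in `THROUGHPUT_RE` is an assumption, and the function names and the quarter-of-the-gap margin are arbitrary choices.

```python
import re
import statistics

# Assumed shape of the tbench summary line, e.g. "Throughput 355.12 MB/sec 8 procs".
THROUGHPUT_RE = re.compile(r"Throughput\s+([0-9]+(?:\.[0-9]+)?)\s+MB/sec")


def parse_throughput(output):
    """Return the throughput in MB/s found in tbench output, or None."""
    match = THROUGHPUT_RE.search(output)
    return float(match.group(1)) if match else None


def classify(samples, good, bad):
    """Label a kernel 'good', 'bad' or 'skip' from several tbench runs.

    The median is compared against the midpoint between the known good and
    known bad throughput. Results within a quarter of the good/bad gap from
    the midpoint are skipped rather than guessed.
    """
    median = statistics.median(samples)
    midpoint = (good + bad) / 2.0
    margin = abs(good - bad) / 4.0
    if median >= midpoint + margin:
        return "good"
    if median <= midpoint - margin:
        return "bad"
    return "skip"
```

With 375 MB/s as the good mark and 355 MB/s as the bad one, a kernel whose median lands near 365 MB/s would be skipped, which is exactly the kind of result that makes bisect point at the wrong commit.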

A curious reader may ask: where did we lose the other 100 MB/s? This small issue was not detected (or at least not reported on netdev@ with a provocative enough subject), and it happened to live somewhere in the 2.6.24-2.6.25 changes. I was lucky enough to 'guess' (after just a couple of hundred compilations) that it corresponds to commit 8f4d37ec073c17e2d4aa8851df5837d798606d6f, which is about high-resolution timers. I sent a patch based on a revert of that commit to the mailing lists and developers; unfortunately, it is impossible to revert it cleanly in 2.6.25, not to mention the 2.6.27 tree. That patch brings performance of the 2.6.25 kernel tree to 455 MB/s.

There is still a missing 20 MB/s or so: 2.6.24 reaches 475 MB/s, so the bug likely lives between 2.6.24 and the 8f4d37ec073 commit above, but I left this exercise to Ingo and Peter :)
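To keep the numbers straight, the figures quoted in this post can be tabulated against the 2.6.24 baseline. This is plain arithmetic on values from the text (using 475 MB/s for 2.6.24, although 470 is also mentioned); the names `measured` and `lost_vs_baseline` exist only for this sketch.

```python
# Throughput figures (MB/s, 8 threads) quoted in the post.
measured = {
    "2.6.24": 475,
    "2.6.25 with 8f4d37ec073 reverted": 455,
    "2.6.26": 375,
    "2.6.27": 355,
}


def lost_vs_baseline(name, baseline="2.6.24"):
    """Return how many MB/s the named kernel loses relative to the baseline."""
    return measured[baseline] - measured[name]


for name, mbps in measured.items():
    lost = lost_vs_baseline(name)
    percent = 100.0 * lost / measured["2.6.24"]
    print(f"{name:34s} {mbps:4d} MB/s  lost {lost:3d} MB/s ({percent:.1f}%)")
```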

Sigh, it is past 3 A.M. in Moscow. I think if I were on drugs stronger than Linux kernel hacking, all my organs would have run away from me long ago...

Comments (5)

sftf wrote at 2008-10-10 07:06:

So will the performance come back in future versions? I.e., did they accept your findings and fixes?

Zbr wrote at 2008-10-10 09:08:

I hope Ingo and Peter will find a solution. Although I somewhat fixed the problem, it is still not a good solution, since the reverted commits may have fixed other issues, and the biggest performance drop was actually fixed only for the 2.6.25 kernel, since the patch does not apply cleanly to the current version.

So I just pointed out the problem and provided a 'proof-of-concept' fix, which should be analyzed by the maintainers. I hope that will be done and will result in a solution (whether based on my patches or not) that brings performance back to its highest level.

Zbr wrote at 2008-10-10 09:10:

Link to the original mail (so far no discussion) for the interested reader: http://lkml.org/lkml/2008/10/9/401

Zbr wrote at 2008-10-10 12:14:

Peter Zijlstra suggested turning off hrticks via the scheduler's section in debugfs, which brought 2.6.27 to a previously unreachable (for this kernel version) 382 MB/s (GSO and TSO are also turned off). But it is still slower than .25 without the mentioned commit (455 MB/s) and plain .24 (475 MB/s).
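A minimal sketch of flipping that switch from Python follows. It assumes debugfs is mounted at /sys/kernel/debug and that the scheduler exposes its feature flags in a `sched_features` file there, where writing `NO_HRTICK` clears the flag; the path differs on other kernel versions, so treat `SCHED_FEATURES` as an assumption. The helper names are hypothetical, and writing to the real file needs root.

```python
# Assumed location: debugfs mounted at /sys/kernel/debug.
SCHED_FEATURES = "/sys/kernel/debug/sched_features"


def set_sched_feature(name, enable, path=SCHED_FEATURES):
    """Write FEATURE or NO_FEATURE to the scheduler features file."""
    token = name if enable else "NO_" + name
    with open(path, "w") as features:
        features.write(token)
    return token


def feature_enabled(name, path=SCHED_FEATURES):
    """Report whether the file lists FEATURE rather than NO_FEATURE."""
    with open(path) as features:
        return name in features.read().split()
```

Calling `set_sched_feature("HRTICK", enable=False)` as root would correspond to the suggestion above; `feature_enabled("HRTICK")` can be used to confirm the state before starting a tbench run.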

Zbr wrote at 2008-10-10 17:37:

The whole thread: http://marc.info/?t=122359565500004&r=1&w=2

So far the maximum is 386 MB/s, and the current suggestion (the -tip tree) reaches only 365 MB/s.
