|
|
About
TODO
Blog
RSS
Old blog
Projects
Gallery
Notes
Fri, 29 Feb 2008
Debugging undebuggable.
If something looks undebuggable from the first view, than take a secon one.
Better from different angle. Some problems require third look.
Bits of history of the problem.
Pohmelfs
has extremely large latencies
when syncing local inode to the remote server. This involves sending
a command to the server to create an object with given name and receive back
a response with its real inode information (like inode number and other
fields cached for faster stat() and similar workloads). Pohmelfs
then changes local inode info to match the real data.
Syncing of small tree of 500 files takes about 40 (!) seconds. Well, in Xen
environment where I develop this things local creation of 500 files in single
ext3 directory takes more than 15 seconds, but another 25 is a pure overhead.
That was short description of previous series.
Next, problems of fixing the problems.
First, Xen version used at that testing machine is old enough, so oprofile
does not work. Second, I do not know VFS internals enough (this is my first filesystem,
interested reader can find how I managed to
step
likely on every possible rakes
on that field, some of them were even small kid rakes...) to determine where there
is a possibility to catch that long delays, but since linux filesystem is actually a
not that complex system, but set of callbacks, implementation is not really outstanding,
but knowing in which condition each callback can be invoked and which problems can be
here or there is kind of a magic... Third, remote userspace pohmelfs server was not actually
written by me, instead its bytecode was blown out because of some substances inspiration,
so it can be very much a reason for all the problems, given that it is trivial as
pretty much all my userspace code, even total rewrite will not fix the issue.
So, latency problem in pohmelfs looked really undebuggable. But you know, cup of excellent
tea (from tea-packet) with lemon can fix any problem (or high themperature and substances,
or fair amount of alcohol, everyone has fun the way he likes), so it was first
decided to implement
a simple network kernel module which would connect to remote userspace server and exchange
messages in a similar fasion like pohmelfs does.
Such module was implemented, started and showed excellent performance (about 1 thousand of messages
per second send and received back in test network, which is several orders of magnitude faster
than pohmelfs). So, move back to VFS and pray for inspiration.
Inspiration was met today (thanks Arnaldo, likely it is because I'm getting healthier :).
I always thought that number of subsequent calls for recv() is not a good idea no matter
where: in kernel or userspace, since it takes a socket lock, which in turn can introduce latencies found,
so I eliminated subsequent recvs in pohmelfs code (testing module was written better and does
sending and receiving without such 'fragments'), which resulted in... nothing, results did not changed
at all. So, wrong step, but having subsequent sending calls in a row is not a good idea too,
so I replaced them with allocation and copy, so that there would be only single kernel_sendmsg()
call. As you might expect performance... changed by 30 times. Just by having single send call instead of two
for as much as 500 invokations forced the whole network exchange to behave completely different.
So, to debug problem further I extended testing module and introduced ability to send and receive
data not by single packet but via two fragments: 4 bytes and rest of the packet (60 bytes). Here is a result
table for 1000 of messages sent and received back by testing module:
no fragments: 1.43 seconds
send fragments (4 and 60 bytes): 40.43 seconds
recv fragments (4 and 60 bytes): 1.43 seconds
both fragmentations: 40.43 seconds
It is 30 times difference just for simple application change!
tcpdump on receiving side shows that subsequent fragments sending results in a real message sending
all the time kernel_sendmsg() is invoked, which results on ack for each such message (both 4 and 60
bytes), which completely degrades tcp window and connection just can not recover with such behaviour.
So, all that words were written just to show that even undebuggable from the first view problems can be easily
solved, and that harmless (from the first view again) programming mistakes can result in very interesting results...
Now back to drawing board to think how to improve pohmelfs protocol even more to get the last bits out of the wire.
Btw, interested reader can get my network testing module and userspace from theirs
just created homepage.
/devel/networking :: Link / Comments (4)
Please solve this captcha to be allowed to post (need to reload in a minute): 5 + 21
|
M.S. wrote at 2008-02-29 18:45: