Zbr's days.


Tue, 01 Apr 2008

Fix for the fundamental network/block layer race in sendfile().

Summary of the previous series under this pompous header: when sendfile() returns, the pages it sent can still be queued in the TCP stack or in hardware, so a subsequent write into them will end up corrupting data that is eventually sent. This concerns all ->sendpage() users, namely sendfile() and splice().
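The racy pattern can be shown from userspace in a few lines. This is only a minimal sketch, not the send application from the archive: the helper name send_then_scribble() and its parameters are made up for illustration, and it assumes a Linux system where `sock` is an already connected socket.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: send the first `count` bytes of the file at `path`
 * over the connected socket `sock`, then immediately overwrite the first
 * byte of the file through a shared mapping.
 * Returns the number of bytes sendfile() reported, or -1 on error.
 */
ssize_t send_then_scribble(int sock, const char *path, size_t count,
                           unsigned char byte)
{
    assert(count > 0); /* mmap() rejects a zero-length mapping */

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    unsigned char *map = mmap(NULL, count, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    off_t off = 0;
    ssize_t sent = sendfile(sock, fd, &off, count);

    /*
     * sendfile() has returned, but the page may still be referenced by
     * queued network data, so this store can race with transmission.
     */
    if (sent > 0)
        map[0] = byte;

    munmap(map, count);
    close(fd);
    return sent;
}
```

Whether the modified byte actually reaches the wire depends on timing, so a single call proves nothing; the point is only that nothing in the sendfile() return value tells the caller the page is safe to touch.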

We can safely reuse those pages only after an ACK is received from the remote side, which forces the network stack to release them. My simple extension allows hooking into the data release path and performing any action we want. This is achieved by replacing skb->destructor with a callback registered by the interested user, for example the splice/sendfile code. Splice (the pipe info structure) is in turn extended to hold an atomic counter of the pages in flight (without changing the structure size, because of the alignment padding it has right now). The splice code sleeps once a full pipe info (->nrbufs pages) has been sent and waits until the number of pages in flight, which is decremented in the private splice callback, hits zero.
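The accounting can be modelled in userspace to make the idea concrete. This is a sketch under loud assumptions, not the kernel patch: struct pipe_model and all four functions are hypothetical names, pthreads stand in for the kernel wait queue, and page_released() plays the role of the callback installed in place of skb->destructor.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for the extended pipe info structure. */
struct pipe_model {
    atomic_int in_flight;     /* pages handed to the network, not yet released */
    pthread_mutex_t lock;     /* protects the sleep/wakeup handshake */
    pthread_cond_t drained;   /* signalled when in_flight drops to zero */
};

void pipe_model_init(struct pipe_model *p)
{
    atomic_init(&p->in_flight, 0);
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->drained, NULL);
}

/* Called by the sender for every page it passes to the network stack. */
void page_sent(struct pipe_model *p)
{
    atomic_fetch_add(&p->in_flight, 1);
}

/* Models the release callback: runs when the stack drops a page. */
void page_released(struct pipe_model *p)
{
    int prev = atomic_fetch_sub(&p->in_flight, 1);

    assert(prev > 0); /* a release without a matching send is a bug */
    if (prev == 1) {
        /* Last page gone: wake up anyone sleeping in wait_for_drain(). */
        pthread_mutex_lock(&p->lock);
        pthread_cond_broadcast(&p->drained);
        pthread_mutex_unlock(&p->lock);
    }
}

/* Models the sender sleeping until no pages are in flight. */
void wait_for_drain(struct pipe_model *p)
{
    pthread_mutex_lock(&p->lock);
    while (atomic_load(&p->in_flight) != 0)
        pthread_cond_wait(&p->drained, &p->lock);
    pthread_mutex_unlock(&p->lock);
}
```

In this model the sender would call page_sent() for each page, then wait_for_drain() before returning to the caller, while another thread calls page_released() as acknowledgements arrive. The waiter re-checks the counter under the lock, so a release that happens between the check and the sleep cannot be missed.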

The patch was tested with simple send and recv applications, which can be found in the archive.

One has to run them on different machines, since loopback uses a slightly different scheme (namely, the page is _never_ copied, so when it is received by the 'remote' side it still exists on the 'local' side, and modifications will end up corrupting data).

devfs1# ./recv -a 0.0.0.0 -p 1025 -c 1024
devfs2# ./send -a devfs1 -p 1025 -f /tmp/test -c 1024
In case of failure you will get this:
Connected to devfs1:1025.
/tmp/test/1024 -> devfs1:1025
Data was corrupted: ab.
after a short period of time, where 'ab' above is a hex byte written into the mapped file that has been sent, immediately after sendfile() returns to userspace. The data is supposed to always be zero, and the applications should run forever.
The -c parameter specifies the number of bytes to be sent in each run of sendfile(). It has to be the same on both machines.
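The receiving side of such a test only needs to verify that every chunk is all zeroes. The following is a rough sketch of that check, not the recv application from the archive; first_nonzero() and report_chunk() are hypothetical names, and the message format simply mirrors the output shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Return the index of the first non-zero byte in buf, or -1 if all zero. */
long first_nonzero(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0)
            return (long)i;
    }
    return -1;
}

/*
 * Check one received chunk (one -c sized run in the test above).
 * Prints the offending byte in hex and returns 1 on corruption, 0 otherwise.
 */
int report_chunk(const unsigned char *buf, size_t len)
{
    long pos = first_nonzero(buf, len);

    if (pos < 0)
        return 0;

    printf("Data was corrupted: %02x.\n", buf[pos]);
    return 1;
}
```

A receiver loop would read exactly the agreed number of bytes per run, pass the buffer to report_chunk(), and stop on the first non-zero result.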

This idea was first conceived as soft barriers for distributed storage.
