Tue, 18 Dec 2007
Fundamental race between block layer/IO and networking.
This post is about the impossibility of using the network ->sendpage()
method, which is mostly used to transfer IO-mapped pages, without races,
unless one either turns off the offload capabilities and copies the data
into a new buffer, or uses its own ACKs in the protocol.
In the optimised case (the hardware supports checksum offloading and
scatter/gather), ->sendpage() does not copy the content of the page into a new
buffer; instead it increases the page's reference counter so that the page cannot
be freed. When ->sendpage() returns, there is no guarantee that the data has been
sent, let alone received by the remote side: the packet can still be queued (in the
hardware or in a qdisc), and it can be retransmitted later. There is no way to know
that the data was received until the ACK arrives (let's talk about TCP), and there
is no API to learn that the ACK was received. When the ACK is received, the
corresponding packet is found in the TCP retransmit queue and freed, which drops
the page's reference counter.
If the user expects (and there is actually no other way) that the data can be
processed, for example rewritten, once ->sendpage() returns, then there is
a non-zero probability that the remote side will get the new data instead
of the old, which can lead to state machine breakage and data corruption.
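The hazard can be shown with a small user-space toy model. This is only a sketch, not kernel code: struct page, struct packet and all the function names below are hypothetical stand-ins, and the "transmission" is just a memcpy() that happens some time after queueing. It compares queueing by reference (the ->sendpage() style) with queueing by copy (the send() style).

```c
#include <assert.h>
#include <string.h>

#define PAGE_SZ 16

/* Hypothetical stand-in for a page with a reference counter. */
struct page {
    int  refcount;
    char data[PAGE_SZ];
};

/* Hypothetical stand-in for a packet sitting in a transmit queue. */
struct packet {
    struct page *page;          /* zero-copy path: points at the caller's page */
    char         copy[PAGE_SZ]; /* copying path: private snapshot of the data */
    int          zero_copy;
};

/* Models ->sendpage(): takes a reference, copies nothing, returns at once. */
static void queue_by_reference(struct packet *pkt, struct page *pg)
{
    pg->refcount++;
    pkt->page = pg;
    pkt->zero_copy = 1;
}

/* Models send(): snapshots the data at call time. */
static void queue_by_copy(struct packet *pkt, const struct page *pg)
{
    memcpy(pkt->copy, pg->data, PAGE_SZ);
    pkt->page = NULL;
    pkt->zero_copy = 0;
}

/* Models the real transmission, which happens some time after queueing. */
static void transmit(struct packet *pkt, char *wire)
{
    memcpy(wire, pkt->zero_copy ? pkt->page->data : pkt->copy, PAGE_SZ);
    if (pkt->zero_copy)
        pkt->page->refcount--;  /* models the ACK releasing the reference */
}

/* Queue "old", overwrite it with "new", then transmit. */
static void send_then_overwrite(int zero_copy, char *wire)
{
    struct page pg = { .refcount = 1 };
    struct packet pkt;

    strcpy(pg.data, "old");
    if (zero_copy)
        queue_by_reference(&pkt, &pg);
    else
        queue_by_copy(&pkt, &pg);

    strcpy(pg.data, "new");     /* the caller believes the send is finished */
    transmit(&pkt, wire);
    assert(pg.refcount == 1);   /* the reference was dropped again */
}
```

In this model the reference counter only keeps the page alive; it does nothing to keep its content stable, so the by-reference path puts "new" on the wire while the copying path puts "old" there.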
One can try to use sendfile() and simultaneously write data to the
file: the remote side can get a mix of the old and new data. One can argue that proper
locking around sendfile() and the write will help, but actually it will not.
Consider the case when we send only a single page: after sendfile() has returned,
the data can still be in the queue, so a subsequent write, which no longer races with
sendfile() itself but still races with the actual transmission, will overwrite the
data, and the remote side will get the new data instead of the old.
There are two fixes for this problem. The first is not to use ->sendpage()
(or to use it with a copy of the data into a new buffer, which is essentially how
the usual send() works). The second is to use a protocol-specific acknowledgement
system, so that any subsequent operation on the given data is postponed not until
->sendpage()/sendfile() returns, but until that ACK is received.
Both greatly harm performance.
I would be really glad to find that my conclusions are incorrect.
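The second fix could look roughly like the sketch below. It makes several assumptions of its own: Linux sendfile(), an AF_UNIX socket pair standing in for the TCP connection, a fixed record size, and a one-byte application-level ACK. The names send_and_wait_ack(), recv_and_ack() and ack_demo() are made up for the illustration. The sender rewrites the file only after the receiver has confirmed that it has read the data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MSG_LEN 8     /* fixed record size, an assumption of this sketch */
#define ACK     0x06  /* one-byte application-level acknowledgement */

/* Sender: transmit [off, off+len) of file_fd, then block until the peer's
 * ACK arrives. Only after that may the range be rewritten. */
static int send_and_wait_ack(int sock, int file_fd, off_t off, size_t len)
{
    unsigned char ack;

    assert(len > 0);  /* empty ranges are outside the scope of this sketch */
    while (len > 0) {
        ssize_t n = sendfile(sock, file_fd, &off, len);
        if (n <= 0)
            return -1;
        len -= (size_t)n;
    }
    if (read(sock, &ack, 1) != 1 || ack != ACK)
        return -1;
    return 0;
}

/* Receiver: read exactly len bytes, then acknowledge them. */
static int recv_and_ack(int sock, char *buf, size_t len)
{
    unsigned char ack = ACK;
    size_t got = 0;

    while (got < len) {
        ssize_t n = read(sock, buf + got, len - got);
        if (n <= 0)
            return -1;
        got += (size_t)n;
    }
    return write(sock, &ack, 1) == 1 ? 0 : -1;
}

/* Demo: the child sends "old data", waits for the ACK, then overwrites the
 * file. received gets what the peer saw, on_disk the file content afterwards. */
static int ack_demo(char *received, char *on_disk)
{
    char path[] = "/tmp/ackdemoXXXXXX";
    int sv[2], status, rc, fd = mkstemp(path);
    pid_t pid;

    if (fd < 0 || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    unlink(path);
    if (pwrite(fd, "old data", MSG_LEN, 0) != MSG_LEN)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {  /* child: the sender */
        close(sv[1]);
        if (send_and_wait_ack(sv[0], fd, 0, MSG_LEN) < 0)
            _exit(1);
        _exit(pwrite(fd, "new data", MSG_LEN, 0) == MSG_LEN ? 0 : 1);
    }
    close(sv[0]);    /* parent: the receiver */
    rc = recv_and_ack(sv[1], received, MSG_LEN);
    waitpid(pid, &status, 0);
    close(sv[1]);
    if (rc < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        rc = -1;
    else if (pread(fd, on_disk, MSG_LEN, 0) != MSG_LEN)
        rc = -1;
    close(fd);
    return rc;
}
```

The price is visible right in the sender: every rewrite now waits for a full round trip to the receiver, which is exactly the performance cost mentioned above.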
/devel/fs :: Link / Comments (8)
Climbing evening.
That was a very good, although again somewhat shorter, training session.
Most of it was devoted to a complex route that starts on a
horizontal overhang, which drained my strength very quickly, so that
at the end (after about 3 hours) I was not able to complete even small
parts of it (while I did it quite steadily at the beginning).
The route loads the back and arms especially, so after the training I feel
tired as hell, which is great of course!
It was a very good time there today!
/life :: Link / Comments (0)