Tue, 18 Dec 2007
Fundamental race between block layer/IO and networking.
The title refers to the impossibility of using the network stack's
->sendpage() method, which is used mostly to transfer IO-mapped pages,
without races, unless one either turns off offload capabilities
and copies the data into a new buffer, or uses its own acks in the protocol.
->sendpage() in the optimised case (hardware supports checksum offloading
and scatter/gather) will not copy the content of the page into a new buffer, but instead
will increase the page's reference counter, so that the page cannot be freed. When
->sendpage() returns, this does not guarantee that the data was sent, received by the remote
side or anything else: the packet can be queued (in hardware or in a qdisc) and it can later be retransmitted.
There is no way to know that the data was received until the ACK (let's talk about TCP)
arrives, but there is no API to learn that the ACK was received. When the ACK is received,
the appropriate packet is found in the TCP retransmit queue and freed, which drops the page's
reference counter.
If the user expects (and there is actually no other way) that after
->sendpage() returns the data can be processed (for example rewritten),
then there is a non-zero probability that the remote side will get this new data instead
of the old, which can lead to state machine breakage and data corruption.
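
The sequence described above can be modelled in a few lines of Python. This is
only a toy model, not kernel code: ToyTxQueue, its sendpage() and transmit() are
hypothetical names, the bytearray stands in for a page, and holding a Python
reference stands in for the increased page reference counter.

```python
class ToyTxQueue:
    """Toy stand-in for a qdisc/hardware queue that does not copy data."""

    def __init__(self):
        self.pending = []   # queued "packets": references to pages, not copies
        self.wire = []      # what the remote side eventually receives

    def sendpage(self, page):
        # Zero-copy path: only remember the page and return immediately.
        self.pending.append(page)
        return len(page)

    def transmit(self):
        # Happens some time later; the page content is read only now.
        while self.pending:
            self.wire.append(bytes(self.pending.pop(0)))


page = bytearray(b"old data")
queue = ToyTxQueue()
queue.sendpage(page)     # returns, but nothing has been sent yet
page[:] = b"new data"    # the caller believes the page is free to reuse
queue.transmit()         # the "hardware" finally reads the page
print(queue.wire)        # the remote side sees the new content
```

The caller did nothing wrong from its point of view: it waited for sendpage()
to return before touching the page.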
One can try to use sendfile() and simultaneously write data to the
file - the remote side can get a mix of the old and new data. One can argue that using proper locking
around sendfile() and write() will help, but actually it will not -
consider the case when we send only a single page: after sendfile() has returned,
the data can still be in the queue, so a subsequent write, which no longer races with
sendfile() itself but still races with the actual transmission, will overwrite the data and
the remote side will get the new data instead of the old.
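
A minimal sketch of the single-page case, again as a toy model rather than real
socket code. The names toy_sendfile(), toy_write() and toy_transmit() are
made up for illustration; the lock plays the role of the "proper locking"
around sendfile() and write(), and the calls are made one after another on
purpose, since no overlap is needed to lose the old data.

```python
import threading

file_lock = threading.Lock()       # the "proper locking" around both calls
page_cache = bytearray(b"AAAA")    # the single page backing the toy file
tx_queue = []                      # pages queued for transmission (references)
received = []                      # what the remote side gets


def toy_sendfile():
    with file_lock:
        tx_queue.append(page_cache)    # zero-copy: queue a reference only


def toy_write(data):
    with file_lock:
        page_cache[:len(data)] = data  # overwrite the file content in place


def toy_transmit():
    # Runs outside the lock: the sender has no way to wait for it.
    while tx_queue:
        received.append(bytes(tx_queue.pop(0)))


toy_sendfile()       # completes entirely under the lock
toy_write(b"BBBB")   # does not overlap with toy_sendfile() at all
toy_transmit()       # the queue is drained only now
print(received)      # the old content "AAAA" never reaches the remote side
```

The lock serialises the two calls, but it cannot cover the time the page
spends in the queue after toy_sendfile() has returned.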
There are two fixes for this problem: the first is not to use ->sendpage()
(or to use it with a copy of the data into a new buffer, which is essentially how
the usual send() works), the second is to use a protocol-specific acknowledgement
system, so that any subsequent operation on the given data is postponed not until
->sendpage()/sendfile() returns, but until that ACK is received.
Both greatly harm performance.
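
Both fixes can be sketched in the same toy style. CopyingQueue and AckedSender
are hypothetical classes, the page id and the ack() call stand in for whatever
a protocol-specific acknowledgement would look like, and retransmission is
left out to keep the model short.

```python
class CopyingQueue:
    """Fix 1: copy on send, essentially what the usual send() does."""

    def __init__(self):
        self.pending = []

    def send(self, page):
        self.pending.append(bytes(page))   # private copy: costs a memcpy

    def transmit(self):
        out, self.pending = self.pending, []
        return out


class AckedSender:
    """Fix 2: zero-copy send, but the page stays busy until the peer acks."""

    def __init__(self):
        self.pending = []
        self.unacked = set()

    def sendpage(self, page_id, page):
        self.pending.append((page_id, page))   # reference only
        self.unacked.add(page_id)

    def transmit(self):
        out = [(pid, bytes(p)) for pid, p in self.pending]
        self.pending = []
        return out

    def ack(self, page_id):
        self.unacked.discard(page_id)          # protocol-level ack arrived

    def can_modify(self, page_id):
        return page_id not in self.unacked


# Fix 1: the overwrite no longer affects what was queued.
page = bytearray(b"old data")
copying = CopyingQueue()
copying.send(page)
page[:] = b"new data"
copied = copying.transmit()

# Fix 2: the overwrite is postponed until the ack, not until sendpage() returns.
page2 = bytearray(b"old data")
acked = AckedSender()
acked.sendpage(1, page2)
blocked = not acked.can_modify(1)   # still in flight, must not be touched
sent = acked.transmit()
acked.ack(1)
if acked.can_modify(1):
    page2[:] = b"new data"
```

The first variant pays for an extra copy of every page, the second for a full
round trip before the data can be reused.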
I would be really glad to find that my conclusions are incorrect.
/devel/fs :: Link / Comments (8)
Jens Axboe wrote at 2007-12-18 23:55: