Zbr's days.

Tue, 18 Dec 2007

Fundamental race between block layer/IO and networking.

The title refers to the impossibility of using the network's ->sendpage() method, which is mostly used to transfer IO-mapped pages, without races, unless one either turns off the offload capabilities and copies the data into a new buffer, or uses its own acks in the protocol.

->sendpage() in the optimised case (the hardware supports checksum offloading and scatter/gather) does not copy the content of the page into a new buffer, but instead increases the page's reference counter, so that the page cannot be freed. The return of ->sendpage() does not guarantee that the data was sent, let alone received by the remote side: the packet can be queued (in hardware or in a qdisc) and can later be retransmitted. There is no way to know that the data was received until the ACK (let's talk about TCP) arrives, and there is no API to learn that the ACK was received. When the ACK is received, the appropriate packet is found in the TCP retransmit queue and freed, which drops the page's reference counter.
If the user expects (and there is actually no other way) that the data can be processed (for example rewritten) after ->sendpage() returns, then there is a non-zero probability that the remote side will get the new data instead of the old, which can lead to state machine breakage and data corruption.
One can try to use sendfile() and simultaneously write data to the file: the remote side can get a mix of the old and new data. One can argue that proper locking around sendfile() and write() will help, but actually it will not. Consider the case when we send only a single page: after sendfile() has returned, the data can still be in the queue, so a subsequent write, which no longer races with sendfile() itself but still races with the actual transmission, will overwrite the data, and the remote side will get the new data instead of the old.
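
The race described above can be illustrated with a toy model. This is only a sketch in Python, not kernel code: the ToyQueue class and its sendpage()/transmit() methods are hypothetical names invented for the illustration, and the "queue" simply stands in for a qdisc or the TCP retransmit queue. In zero-copy mode the queue holds a reference to the caller's buffer; in copy mode it snapshots the bytes, roughly as a plain send() would.

```python
class ToyQueue:
    """Hypothetical stand-in for a qdisc / retransmit queue (not a kernel API)."""

    def __init__(self, zero_copy):
        self.zero_copy = zero_copy
        self.pending = []

    def sendpage(self, page):
        # Zero-copy path: keep a reference to the caller's page,
        # which is what bumping the page's reference counter amounts to.
        # Copy path: take a snapshot of the bytes right now.
        self.pending.append(page if self.zero_copy else bytes(page))
        # Returning here says nothing about whether data hit the wire.

    def transmit(self):
        # Later, the "hardware" reads whatever the queued entries hold now.
        out = [bytes(p) for p in self.pending]
        self.pending.clear()
        return out


def run(zero_copy):
    page = bytearray(b"old data")
    queue = ToyQueue(zero_copy)
    queue.sendpage(page)       # returns before anything is transmitted
    page[:] = b"new data"      # the caller rewrites the page
    return queue.transmit()[0]  # what the remote side would see
```

In this model run(True) hands the rewritten bytes to the "wire", while run(False) still delivers the original ones, at the price of a copy.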

There are two fixes for this problem: the first is not to use ->sendpage() (or to use it with a copy of the data into a new buffer, which is essentially how the usual send() works); the second is to use a protocol-specific acknowledgement system, so that any subsequent operation on the given data is postponed not until ->sendpage()/sendfile() returns, but until that ACK is received.
Both greatly harm performance.
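
The second fix could look roughly like the following userspace sketch. It is an assumption-laden illustration, not a real protocol: the one-byte ACK value, the fixed block size and the helper names (recv_exact, send_with_ack, receiver, demo) are all made up here, and Python's sendall() copies the data anyway, so the sketch only shows the ordering, namely that the sender does not touch the buffer until the peer has confirmed receipt.

```python
import socket
import threading

ACK = b"\x06"  # hypothetical one-byte application-level acknowledgement


def recv_exact(sock, n):
    """Read exactly n bytes or fail if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed early")
        buf += chunk
    return bytes(buf)


def receiver(sock, size, rounds, out):
    for _ in range(rounds):
        out.append(recv_exact(sock, size))
        sock.sendall(ACK)  # confirm only after the whole block has arrived


def send_with_ack(sock, buf):
    sock.sendall(buf)
    # Block until the remote side confirms; only then may buf be reused.
    if recv_exact(sock, 1) != ACK:
        raise ConnectionError("unexpected reply")


def demo():
    a, b = socket.socketpair()
    received = []
    peer = threading.Thread(target=receiver, args=(b, 8, 2, received))
    peer.start()
    page = bytearray(b"old data")
    send_with_ack(a, page)
    page[:] = b"new data"  # safe: the first block was acknowledged
    send_with_ack(a, page)
    peer.join()
    a.close()
    b.close()
    return received
```

Every block now costs a full round trip before the buffer can be reused, which is where the performance loss comes from.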

I would be really glad to find that my conclusions are incorrect.

/devel/fs :: Link / Comments (8)


Climbing evening.

That was a very good, although again somewhat shorter, training session: most of it was devoted to a complex route that starts on a horizontal overhang, which drained my strength very quickly, so that by the end (after about 3 hours) I was not able to complete even small parts of it (while doing it quite steadily at the beginning). The route demands a lot from the back and arms especially, so after the training I feel tired as hell, which is great of course!
It was a very good time there today!

/life :: Link / Comments (0)