Thu, 06 Mar 2008
Got 4 seasons of "House M.D."
Only completed half of the 4th season...
And there is a fair number of "South Park" episodes still unwatched.
Work seems to be stopped for a while...
Does anyone know how to watch them all without wasting time?
/life
POHMELFS: was done just wrong!
So, the last several days, devoted mostly to thinking things over and experimenting
with them, led me to the headline conclusion: POHMELFS was done
just wrong!
Its network ping-pong protocol is wrong, its inode resync logic and the overall
need to change inode numbers are wrong, its writeback logic is wrong (btw, why does
the Linux VFS call writeback for an inode after it calls writeback for the inode's pages?
This leads to duplication of the inode number resync code and a fair number of problems),
its userspace server cache is wrong (well, its userspace server is braindamaged,
but that does not prevent it from being wrong too), and most important: it has become complex,
so I frequently have to read my own code multiple times to understand what I meant here or
there.
That just has to be changed (mostly just removed)!
Thinking about all that crap led me to a more philosophical conclusion: any network
protocol which requires a precise acknowledgement for each packet is broken. Period.
TCP is not broken, since it can send acks for multiple packets. TCP can aggregate on both
sides of the connection (which can lead to a huge performance increase, as was observed
in the userspace network stack over netchannels), so it is a stream, not a ping-pong,
although its policy for ack generation is not always the best decision.
Out of curiosity, why were the original ping and traceroute commands not implemented as TCP applications
which would catch ack/rst packets?
So, anything ping-pong-like is just broken. Never ever use that logic at all, since it breaks performance
and the ability to extend. What is more, it breaks the ability to create real duplex communication,
since while you are waiting for an ack you can get data from the other peer for a different command.
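To put rough numbers on the performance part, here is a toy cost model in Python. It is an illustration only, not POHMELFS code: the function names and the assumption that every command costs a fixed transmit time plus (for ping-pong) one full round trip are made up for the sketch.

```python
def ping_pong_time(n_commands, rtt, tx_time):
    """Each command waits for its own ack before the next one is sent."""
    return n_commands * (tx_time + rtt)


def stream_time(n_commands, rtt, tx_time):
    """Commands go out back to back; only one round trip is paid at the end."""
    return n_commands * tx_time + rtt


if __name__ == "__main__":
    # Hypothetical link: 1 ms round trip, 10 us to push one command out.
    n, rtt, tx = 1000, 0.001, 0.00001
    print("ping-pong: %.3f s" % ping_pong_time(n, rtt, tx))
    print("stream:    %.3f s" % stream_time(n, rtt, tx))
```

In this model the ping-pong cost grows with the round-trip time for every command, while the stream pays it once, which is the whole point of the argument above.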
So, the brilliant idea (yes, I sometimes get them from the deep abyss of the mindless) is to convert the POHMELFS
protocol into two real streams: one from client to server and a completely independent one from server to client.
It has zillions of benefits, but let's see how it is going to be implemented and what will be fully broken in the filesystem.
First, there will be no resync logic. At all. Each inode (and its number) on the client will not correspond
to any inode object on the server, so local inodes will never be synced with the server's ones. Instead, the cache of objects
on the server side will be indexed by special keys containing the name, length and other parameters needed for unique number generation.
The client inode number will never be sent to the server, so object creation will go in a single direction only: just send a packet.
If there is an unrecoverable error, the connection can be broken, so subsequent command sending would reconnect or make some
changes. Things like permissions will be guarded by the client; there might be a "no space" problem though.
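A minimal sketch of what such a server-side cache could look like, in Python for brevity. Everything here is an assumption made for illustration: the key layout (name length packed in front of the name, hashed with SHA-1), the `object_key` helper and the `ServerCache` class are hypothetical and not taken from the POHMELFS sources.

```python
import hashlib
import struct


def object_key(path: bytes) -> bytes:
    """Derive a fixed-size key from the name and its length (assumed layout)."""
    header = struct.pack("!I", len(path))  # 4-byte big-endian name length
    return hashlib.sha1(header + path).digest()


class ServerCache:
    """Objects are found by key only; no client inode number is involved."""

    def __init__(self):
        self._objects = {}

    def create(self, path: bytes, data: bytes = b"") -> None:
        # One-way operation: the client just sends the packet, no reply needed.
        self._objects[object_key(path)] = data

    def lookup(self, path: bytes):
        return self._objects.get(object_key(path))
```

The client can compute the same key locally from the name, so nothing has to travel back from the server to keep the two sides in agreement.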
Second, commands which require feedback from the server, like reading directory content, will become completely
asynchronous, so feedback from the server will not be exactly a sync reply to a given command; instead
we can wait until the directory content has been populated and then start providing it back to the VFS.
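The "wait until populated" idea can be sketched in userspace Python with a plain event. The `DirCache` class and its method names are hypothetical; a kernel implementation would use its own wait primitives instead of `threading.Event`.

```python
import threading


class DirCache:
    """Directory entries arrive from the server stream at any time."""

    def __init__(self):
        self._entries = []
        self._complete = threading.Event()

    def on_server_entry(self, name):
        # Called by the receiving side for every entry the server streams in.
        self._entries.append(name)

    def on_server_done(self):
        # The server signalled that the listing is complete.
        self._complete.set()

    def readdir(self, timeout=None):
        # The reader does not wait for a reply to "its" command,
        # only for the cache to be marked as populated.
        if not self._complete.wait(timeout):
            raise TimeoutError("directory content not populated yet")
        return list(self._entries)
```

Here the reader and the receiver never exchange a request/reply pair directly; they only meet through the state of the cache.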
Third, and the main one, there is the possibility of streaming commands both from client and server. Since clients
now do not require a sync ack/reply, commands can be batched for maximum performance, but that is not the main feature;
the really interesting part is the ability to receive a stream of commands from the server, so each of them can be parsed
independently of the original client command state. This allows implementing a cache coherency protocol without major
pain and having a high-performance stream of data from server to client.
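Parsing such a stream needs nothing but self-describing commands. A small Python sketch, assuming a made-up wire format (2-byte command id plus 4-byte payload size, network byte order); the real POHMELFS header is not described here, so `HEADER`, `pack_command` and `CommandStream` are purely illustrative.

```python
import struct

# Hypothetical header: command id (u16) and payload size (u32), big-endian.
HEADER = struct.Struct("!HI")


def pack_command(cmd, payload=b""):
    return HEADER.pack(cmd, len(payload)) + payload


class CommandStream:
    """Splits a byte stream into commands with no per-request state."""

    def __init__(self):
        self._buf = b""

    def feed(self, data):
        """Add received bytes and return all complete (cmd, payload) pairs."""
        self._buf += data
        out = []
        while len(self._buf) >= HEADER.size:
            cmd, size = HEADER.unpack_from(self._buf)
            end = HEADER.size + size
            if len(self._buf) < end:
                break  # payload not fully received yet
            out.append((cmd, self._buf[HEADER.size:end]))
            self._buf = self._buf[end:]
        return out
```

Since each command carries its own id and size, the receiver can handle an invalidation, a data page or a directory entry in whatever order they show up, without knowing which client request (if any) triggered them.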
Then there is ->sendpage()/sendfile(), which are broken without proper acknowledgement,
so to fix the issue I plan to submit a socket extension patch, which will call
an appropriate registered callback when the page reference counter is about to be dropped, which automatically means
the data was received on the remote side. This kind of acknowledgement does not slow the connection down more than
a simple unidirectional bulk transfer, so it is fast.
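The callback idea, modelled very loosely in Python. This is not the kernel API and not the planned patch: `Page`, `Socket`, `sendpage`, `ack_received` and `on_page_done` are hypothetical names, and the assumption that the stack drops its page reference exactly when the peer has acknowledged the data is taken from the description above.

```python
class Page:
    def __init__(self, data):
        self.data = data
        self.refs = 1  # reference held by the filesystem


class Socket:
    def __init__(self, on_page_done):
        self.on_page_done = on_page_done  # registered callback
        self._in_flight = []

    def sendpage(self, page):
        # The stack keeps its own reference until the data is acknowledged.
        page.refs += 1
        self._in_flight.append(page)

    def ack_received(self):
        """The peer acknowledged the oldest in-flight page."""
        page = self._in_flight.pop(0)
        # Fire the callback just before the stack's reference is dropped.
        self.on_page_done(page)
        page.refs -= 1
```

The sender never blocks waiting for a protocol-level reply; it simply learns, page by page, which data is safe to reuse or overwrite.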
So, I started deleting lots of code and implementing the needed bits; the near future will show how broken my approach is.
This raises a question about design vs. evolution... I actually prefer the former, but frequently end up with the
latter (like this decision about the network protocol, which is a design, but only after several evolutionary steps
in the wrong direction). This reminds me of the kernel evolution
topic, which does not actually show anything good for the kernel: there are lots of dead-end evolutionary branches which
believe they are the top of progress, maybe mankind is one of them...
That was a lyrical digression, so back to business!
/devel/fs