Thu, 21 Sep 2006
Intel folks implemented TCP socket splicing.
Here is the initial presentation.
It looks like Linus prefers this approach to receive-side pseudo zero-copy,
although there are other ways too, which allow real zero-copy
into the VFS cache and userspace (to be 100% fair, I must admit
that the only ones I know of are my own two implementations: the
old one and the one based on the network allocator).
There is also an implementation by Alexey Kuznetsov, which does only one copy
and is very similar to Intel's splice work.
The main problem on the receiving side is that received data is almost never aligned
well enough to be used directly in the VFS or userspace. No modern hardware makes it easy
to specify where to put the data, and only a few NICs support header split,
i.e. putting headers into skb->data and the payload into the list of fragments.
This means that most of the time the data must be copied to fill the gaps in the VFS cache,
which completely kills the whole idea.
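To make the alignment problem concrete, here is a small C model (not kernel code). It assumes a linear receive buffer holding a whole IPv4/TCP frame with the timestamp option (14 + 20 + 32 header bytes) and an MSS of 1448 bytes, which is typical for a 1500-byte MTU; all names are made up for illustration. It computes where the payload starts in the buffer and how many segments feed each 4096-byte page-cache page.

```c
#include <assert.h>

#define TOY_PAGE_SIZE 4096u

/* Assumed header sizes: Ethernet, IPv4 without options, TCP with timestamps. */
#define TOY_ETH_HLEN 14u
#define TOY_IP_HLEN  20u
#define TOY_TCP_HLEN 32u

/* Offset of the TCP payload inside a receive buffer when the NIC writes
 * the whole frame linearly (no header split). */
static unsigned payload_offset_linear(void)
{
    return TOY_ETH_HLEN + TOY_IP_HLEN + TOY_TCP_HLEN;
}

/* Number of segments of size mss (stream starting at file offset 0) that
 * overlap page-cache page number 'page'. More than one means the page can
 * only be built by copying pieces out of several receive buffers. */
static unsigned segments_per_page(unsigned mss, unsigned page)
{
    unsigned start, end, first, last;

    assert(mss > 0);
    start = page * TOY_PAGE_SIZE;
    end = start + TOY_PAGE_SIZE;          /* exclusive */
    first = start / mss;
    last = (end - 1) / mss;
    return last - first + 1;
}
```

With 1448-byte segments the first page is made of pieces of three packets and the next one of four, and every payload starts 66 bytes into its buffer, so without header split the page has to be assembled by copying.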
It is very unlikely that vendors will add header split to their hardware,
although some of those closely connected with the Linux networking development team,
like Neterion, might do it as a marketing move.
With simple header split and the ability to specify data alignment,
it is possible to completely eliminate additional copies for any kind
of received data, even when the chunk size is not a power of two,
as was shown in the initial zero-copy implementation.
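A toy model of the header-split case, again in C and purely illustrative: the descriptor layout, the function names and the use of memcpy() to stand in for device DMA are assumptions of this sketch, not any real driver's API. The driver points consecutive descriptors at consecutive 1448-byte slots of one contiguous destination area, so each payload lands in its final place and the non-power-of-two chunk size leaves no gaps.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical receive descriptor for a NIC with header split:
 * headers go to hdr_buf, the payload is DMA'd to data_addr. */
struct toy_rx_desc {
    unsigned char hdr_buf[128];
    unsigned char *data_addr;
    size_t data_len;
};

/* Post descriptors so that chunk i lands at dst + i * chunk. The chunk size
 * need not be a power of two; consecutive chunks tile dst with no gaps.
 * Returns the number of descriptors posted. */
static size_t post_rx_descs(struct toy_rx_desc *ring, size_t ring_len,
                            unsigned char *dst, size_t dst_len, size_t chunk)
{
    size_t n = 0;

    assert(chunk > 0);
    while (n < ring_len && (n + 1) * chunk <= dst_len) {
        ring[n].data_addr = dst + n * chunk;
        ring[n].data_len = chunk;
        n++;
    }
    return n;
}

/* Stand-in for the NIC: split a frame into header and payload and "DMA"
 * each part to the buffers named by the descriptor. These memcpy() calls
 * model device writes, not copies done by the network stack. */
static void toy_nic_receive(struct toy_rx_desc *d, const unsigned char *frame,
                            size_t hdr_len, size_t payload_len)
{
    assert(hdr_len <= sizeof(d->hdr_buf) && payload_len <= d->data_len);
    memcpy(d->hdr_buf, frame, hdr_len);
    memcpy(d->data_addr, frame + hdr_len, payload_len);
}
```

On real hardware a chunk that crosses a page boundary needs either physically contiguous destination pages or a descriptor with more than one data pointer; the model ignores that detail.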
If MMIO copies were not so slow, a cheap card could
outperform modern NICs in server-like workloads.
Let's return to Intel's TCP splice implementation.
Since they use splicing, they need to put the data into pages and provide them as pipe
buffers, so for NICs that do not use a fragment list, every packet requires a page allocation,
its mapping, copying of the data and placing it into the pipe buffer.
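The per-packet work for NICs without fragment lists can be sketched as follows. This is a user-space model with hypothetical structures (toy_pipe, toy_pipe_buf): malloc() stands in for page allocation and there is no mapping step. Each packet costs one page allocation plus one full copy of its payload, i.e. the same copy that copy_*_user() would do, with the allocation on top.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u

/* Hypothetical, simplified pipe buffer: one page plus the valid range. */
struct toy_pipe_buf {
    unsigned char *page;
    unsigned offset;
    unsigned len;
};

struct toy_pipe {
    struct toy_pipe_buf bufs[16];
    unsigned nrbufs;
    size_t bytes_copied;        /* statistics for the example */
    unsigned pages_allocated;
};

/* Move the payload of one linear (non-fragmented) packet into the pipe:
 * allocate a page, copy the data, attach the page as a pipe buffer.
 * Returns 0 on success, -1 if the pipe is full or allocation fails. */
static int toy_linear_skb_to_pipe(struct toy_pipe *pipe,
                                  const unsigned char *payload, unsigned len)
{
    struct toy_pipe_buf *buf;
    unsigned char *page;

    assert(len <= TOY_PAGE_SIZE);
    if (pipe->nrbufs == sizeof(pipe->bufs) / sizeof(pipe->bufs[0]))
        return -1;

    page = malloc(TOY_PAGE_SIZE);           /* per-packet page allocation */
    if (!page)
        return -1;
    pipe->pages_allocated++;

    /* A kernel page may also need mapping (e.g. kmap()) before this copy;
     * the user-space model has no such step. */
    memcpy(page, payload, len);             /* the copy is still there */
    pipe->bytes_copied += len;

    buf = &pipe->bufs[pipe->nrbufs++];
    buf->page = page;
    buf->offset = 0;
    buf->len = len;
    return 0;
}

/* Free every page attached to the pipe. */
static void toy_pipe_release(struct toy_pipe *pipe)
{
    for (unsigned i = 0; i < pipe->nrbufs; i++)
        free(pipe->bufs[i].page);
    pipe->nrbufs = 0;
}
```

Three 1448-byte packets end up in three separate pages, each less than half full, and all 4344 payload bytes are copied once anyway.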
The splice pipe itself is just a wrapper around wake_up(): "putting pages into the pipe"
actually means that a special structure, allocated on the stack and holding pointers
to those pages, is passed to splice_to_pipe(), which just performs some checks
and wakes up the remote side so that it can pick up the provided pages.
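A rough C model of splice_to_pipe() as described above: a descriptor on the caller's stack carries page pointers, the function stores them in the pipe after a capacity check and wakes the reader. The structure and function names are hypothetical. In particular, this model returns a short count when the pipe is full, while a real implementation may instead sleep until the reader makes room.

```c
#include <assert.h>

#define TOY_PIPE_BUFFERS 16u

/* Hypothetical stand-ins for the pipe and the splice descriptor. */
struct toy_splice_pipe {
    void *pages[TOY_PIPE_BUFFERS];
    unsigned nrbufs;
    int reader_sleeping;        /* a reader is blocked waiting for data */
    unsigned wakeups;           /* how many times it had to be woken */
};

struct toy_splice_desc {        /* lives on the caller's stack */
    void **pages;
    unsigned nr_pages;
};

/* Models waking the reader; in the kernel this is a wake_up() on the
 * pipe's wait queue, and the reader must then be scheduled to run. */
static void toy_wake_up(struct toy_splice_pipe *pipe)
{
    if (pipe->reader_sleeping) {
        pipe->reader_sleeping = 0;
        pipe->wakeups++;
    }
}

/* Store the page pointers from the descriptor into the pipe and wake the
 * other side. No payload bytes are touched here.
 * Returns the number of pages accepted. */
static unsigned toy_splice_to_pipe(struct toy_splice_pipe *pipe,
                                   const struct toy_splice_desc *spd)
{
    unsigned done = 0;

    assert(spd->pages != 0 || spd->nr_pages == 0);
    while (done < spd->nr_pages && pipe->nrbufs < TOY_PIPE_BUFFERS)
        pipe->pages[pipe->nrbufs++] = spd->pages[done++];

    if (done)
        toy_wake_up(pipe);
    return done;
}
```

The interesting cost is not in this function at all: the data only gets consumed after the woken reader is scheduled, which is the extra work postponing discussed next.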
One can see that splicing introduces yet another point where work is postponed through
sleeping and waking up, which in some cases can cause major performance degradation.
So TCP splice has two major problems that are inherent in the splice design:
the required allocation, mapping and copying (compared to the single copy done by copy_*_user())
and the additional work postponing. The usual socket code has a lot of optimisations
for the case when the receiving process does not sleep, which improve socket performance considerably
(and make sockets, by design, a bit closer to netchannels),
and splicing removes them completely.
All of the above are probably the reasons for the receive-side splice performance drop
shown by the Intel folks.