Fri, 08 Jun 2007
I'm off on a canoe trip.
If I do not sink (and I will not: to my mind, neither element can kill a seaman), I will be back this Tuesday.
/life :: Link / Comments (0)
Linus and Andrew talk about the current state of the Linux kernel.
Link.
For me the most interesting part was about filesystems: Linux does need a new filesystem,
which must be simple and fast. None of the existing ones satisfies both requirements even partially: each
one is complex and slow in some or all patterns.
I want to change that by developing my own filesystem,
but due to a total lack of time, progress is minimal. Maybe the approach I decided to take will even turn out to be
totally broken, but I want to find that out myself.
We will see...
/devel/other :: Link / Comments (0)
OK, after some work on network splice, it somehow works.
Although the received data is not valid (the file sometimes contains
several repeated chunks, and sometimes earlier pieces of the original file,
likely the result of incorrect handling of page-boundary crossings),
the kernel does not crash.
That forced me to change the page releasing code in fs/splice.c a bit,
since I do not think it is correct that a page can be blindly freed there,
and to clone the skb for each page splice request. That is likely too much overhead,
but on the receive path the fast clone often goes unused, so maybe there is some gain
there.
/devel/other :: Link / Comments (0)
Playing with splice and networking.
Splice is an in-kernel mechanism that allows
zero-copy transfer of pages between different users: it is possible
to 'move' data between userspace (vmsplice) and/or files (file
descriptors). For example, sendfile can be implemented via a splice call
(and it is for some file types). Splicing on the receive side, on the other
hand, is not supported.
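To make the userspace side of this concrete, here is a minimal sketch of a sendfile-like copy built on splice(2). It is only an illustration under my own assumptions, not how the kernel implements sendfile internally: the helper name `splice_copy` is hypothetical, and the data is routed through a pipe because splice(2) requires one end of each transfer to be a pipe.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: move up to `len` bytes from in_fd to out_fd
 * through an intermediate pipe, without copying through a userspace buffer.
 * Returns the number of bytes moved, or -1 on error.
 */
ssize_t splice_copy(int in_fd, int out_fd, size_t len)
{
    int p[2];
    ssize_t total = 0;

    if (pipe(p) < 0)
        return -1;

    while (len > 0) {
        /* source -> pipe: the pipe buffers reference the source pages */
        ssize_t n = splice(in_fd, NULL, p[1], NULL, len, SPLICE_F_MOVE);
        if (n < 0) {
            total = -1;
            break;
        }
        if (n == 0)
            break; /* end of input */
        len -= (size_t)n;

        /* pipe -> destination: drain everything that was just queued */
        while (n > 0) {
            ssize_t w = splice(p[0], NULL, out_fd, NULL, (size_t)n,
                               SPLICE_F_MOVE);
            if (w <= 0) {
                total = -1;
                goto out;
            }
            n -= w;
            total += w;
        }
    }
out:
    close(p[0]);
    close(p[1]);
    return total;
}
```

The destination here can be a regular file or a socket (the sending direction); the receiving direction, socket to pipe, is exactly the part that is missing.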
There were several attempts to implement receiving zero-copy; I recall
at least three: my work,
a patch by Alexey Kuznetsov and work by Intel folks. The latter is very
similar to what Alexey proposed but was more generic, since it was
the first splice work, while Alexey's work and mine were purely receiving
zero-copy (Alexey implemented a single-copy approach for unaligned data,
while I changed the driver to always align data properly).
A couple of days ago Jens Axboe from Oracle posted his variant,
which used SLAB pages (those pages are allocated using the
kmalloc() function and contain network data if the driver does not
use pages as fragments), but it was quite broken, since SLAB pages
do not have reference counting (the only page with a non-zero reference counter
is the first page in the combined set; SLAB uses order-0 and higher-order
pages to store objects), and SLAB never changes the reference count
when storing data in those pages. So it is impossible to just increase
the reference counter of an arbitrary SLAB page, since that will end badly when
the page is reclaimed by SLAB. I tried to fix those issues and eventually
completed reference counting for SLAB pages, based heavily
on Jens' work, but here comes another problem.
While a SLAB page is not freed, it can be reused, and thus the same address inside
the page can store different data at different times. So if the skb that holds a network
packet is freed before splice has finished with the given page, it is possible
that the freed pointer will be returned by a subsequent allocation and the data will be
overwritten by the next packet. When splice finishes its work (for example, dumping the page
to disk), incorrect data will be there.
The right way is to prevent the skb from being freed while the page its data refers to is being used by splice.
It seems simple, but it is not: the same page can contain quite a lot of packets,
so the page must hold a reference to every skb whose data is placed in it, and that
task is not that simple, since there are no unused members in the page structure.
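The reuse problem can be shown with a toy userspace model. This is only a sketch under loud assumptions: `toy_slab`, `toy_skb` and all functions below are hypothetical stand-ins, not kernel structures or APIs. The "page" is an array of fixed-size slots with last-freed-first-reused allocation, and the fix is simplified to splice pinning the skb directly rather than the page tracking every skb.

```c
#include <assert.h>
#include <string.h>

#define SLOTS     4
#define SLOT_SIZE 32

/* Toy stand-in for one SLAB page: fixed-size slots, LIFO reuse. */
struct toy_slab {
    char data[SLOTS][SLOT_SIZE];
    int  free_stack[SLOTS];
    int  nfree;
};

/* Toy stand-in for an skb whose payload lives in one slot of the page. */
struct toy_skb {
    struct toy_slab *slab;
    int slot;
    int users; /* reference count */
};

void toy_slab_init(struct toy_slab *s)
{
    s->nfree = SLOTS;
    for (int i = 0; i < SLOTS; i++)
        s->free_stack[i] = SLOTS - 1 - i; /* slot 0 is handed out first */
}

int toy_skb_alloc(struct toy_skb *skb, struct toy_slab *s, const char *payload)
{
    if (s->nfree == 0)
        return -1;
    skb->slab = s;
    skb->slot = s->free_stack[--s->nfree];
    skb->users = 1;
    strncpy(s->data[skb->slot], payload, SLOT_SIZE - 1);
    s->data[skb->slot][SLOT_SIZE - 1] = '\0';
    return 0;
}

void toy_skb_get(struct toy_skb *skb)
{
    skb->users++;
}

/* Drop one reference; the slot returns to the allocator only at zero. */
void toy_skb_put(struct toy_skb *skb)
{
    assert(skb->users > 0);
    if (--skb->users == 0)
        skb->slab->free_stack[skb->slab->nfree++] = skb->slot;
}

/*
 * What "splice" sees after the stack frees the first packet and a second
 * one arrives. hold_ref selects whether splice pinned the first skb.
 */
const char *splice_view(int hold_ref, struct toy_slab *s)
{
    struct toy_skb first, second;
    const char *spliced;

    toy_slab_init(s);
    toy_skb_alloc(&first, s, "packet 1");
    spliced = s->data[first.slot]; /* splice keeps a pointer into the page */
    if (hold_ref)
        toy_skb_get(&first);
    toy_skb_put(&first);           /* the stack is done with packet 1 */
    toy_skb_alloc(&second, s, "packet 2"); /* may land in the same slot */
    return spliced;
}
```

Without the extra reference, the second allocation reuses the slot and the spliced pointer ends up reading the second packet; with it, the slot stays pinned and the second packet lands elsewhere.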
While I was writing this post, Jens posted a patch that implements exactly the same idea, but with
the introduction of a private field in the splice private structure.
Let's check this out.
/devel/other :: Link / Comments (0)
I was invited to work at Yandex, a small Russian Google.
I declined (as with Google), but they insisted on meeting (not as with Google :),
so I will go to see how they work. Actually, if I ever worked at Google or Yandex,
I would definitely like to create an automatic tracking system on top of
their maps, which would allow putting marks on the map and selecting
the shortest route between the points, taking into account information
about traffic jams and so on. There are such systems all over the world,
but they are heavily limited to specially crafted vectorized maps,
while I would start with plain pixmaps.
But working at such a company (no matter whether it is Yandex, Google, SWSoft
or anyone else) requires devoting much of one's time to it, while I
prefer my own projects (without any gain, though), so that would eventually
end up with the cancellation of my own ideas. So, no, at least right now.
/devel/other :: Link / Comments (0)
OpenBSD hackathon.
Hackroom teardown and
second climbing day.
/devel/other :: Link / Comments (0)