Zbr's days.

Fri, 08 Jun 2007

I'm off on a canoe trip.


If I do not sink (and I will not: to my mind, no element can kill a seaman), I will be back this Tuesday.

/life :: Link / Comments (0)


Linus and Andrew talk about current Linux kernel state.


Link.
For me the most interesting part was the filesystem discussion: Linux does need a new filesystem, one that is simple and fast. None of the existing ones satisfies both requirements even partially; each one is complex and slow in some or all access patterns.
I want to change that by developing my own filesystem, but due to a total lack of time, progress is minimal. Maybe the approach I decided to take will even turn out to be totally broken, but I want to find that out myself.
We will see...

/devel/other :: Link / Comments (0)


Ok, after some work on network splice, it somehow works.


Although the received data is not valid (the file sometimes contains several repeated chunks, and sometimes earlier pieces of the original file, which is likely the result of incorrect handling of page-boundary crossing), the kernel does not crash.
That forced me to change the page releasing code in fs/splice.c a bit, since I do not think it is correct for a page to be blindly freed there, and to clone the skb for each page of a splice request. That is likely too much overhead, but on the receive path the fast clone is frequently unused, so maybe there is some gain there.

/devel/other :: Link / Comments (0)


Playing with splice and networking.


Splice is an in-kernel mechanism that allows zero-copy transfer of pages between different users: it is possible to 'move' data between userspace (vmsplice) and/or files (file descriptors). For example, sendfile can be implemented via a splice call (and it is for some file types). Splicing on the receive side, on the other hand, is not supported.
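To make the userspace side concrete, here is a minimal sketch of moving bytes between two descriptors with splice(2), assuming Linux with glibc. The helper name splice_copy is hypothetical and only for illustration. Since splice needs a pipe on one side of every call, the data goes through an intermediate pipe.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: move up to len bytes from in_fd to out_fd through
 * an intermediate pipe, without copying the data through a userspace
 * buffer. Returns the number of bytes moved, or -1 on error.
 */
ssize_t splice_copy(int in_fd, int out_fd, size_t len)
{
    int p[2];
    ssize_t total = 0;

    if (pipe(p) < 0)
        return -1;

    while (len > 0) {
        /* Source -> pipe. */
        ssize_t n = splice(in_fd, NULL, p[1], NULL, len, SPLICE_F_MOVE);
        if (n < 0) {
            total = -1;
            break;
        }
        if (n == 0)
            break; /* end of input */

        /* Pipe -> destination; may take several calls. */
        ssize_t left = n;
        while (left > 0) {
            ssize_t m = splice(p[0], NULL, out_fd, NULL, left, SPLICE_F_MOVE);
            if (m <= 0) {
                total = -1;
                goto out;
            }
            assert(m <= left);
            left -= m;
        }

        total += n;
        len -= (size_t)n;
    }
out:
    close(p[0]);
    close(p[1]);
    return total;
}
```

Calling splice_copy(in_fd, out_fd, size) with an open file as in_fd and a connected socket as out_fd behaves roughly like sendfile. The opposite direction, socket to file, is the receive-side case discussed here.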
There were several attempts to implement receive-side zero-copy. I recall at least three: my work, a patch by Alexey Kuznetsov, and work by the Intel folks. The latter is very similar to what Alexey proposed but was more generic, since it was first of all splice work, while Alexey's work and mine were purely receive-side zero-copy (Alexey implemented a single-copy approach for unaligned data, while I changed the driver to always align data properly).
A couple of days ago Jens Axboe from Oracle posted his variant, which used SLAB pages (pages allocated with kmalloc() that contain network data if the driver does not use pages as fragments). It was quite broken, though, since SLAB pages do not have reference counting (the only page with a non-zero reference counter is the first page in the combined set; SLAB uses order-0 and higher-order pages to store objects), and SLAB never changes the reference count when storing data in those pages. So it is impossible to just increase the reference counter of an arbitrary SLAB page, since that will end badly when the page is reclaimed by SLAB. I tried to fix those issues and eventually completed reference counting for SLAB pages, based heavily on Jens' work, but here comes another problem.
While a SLAB page is not freed, it can be reused, so the same address inside the page can hold different data at different times. So if the skb that holds a network packet is freed before splice has finished with the page, the freed pointer may be returned by a subsequent allocation, and the data will be overwritten by the next packet. When splice finishes its work (for example, dumping the page to disk), incorrect data will be there.
The right way is to prevent the skb from being freed while the page its data refers to is being used by splice. It seems simple, but it is not: the same page can contain quite a lot of packets, so the page must hold a reference for every skb whose data is placed in it, and that is not an easy task, since there are no unused members in the page structure.
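The reuse problem and the pinning idea can be pictured with a toy userspace model. This is only a sketch under my own assumptions, not kernel code and not what any of the patches do: struct toy_page, toy_alloc, toy_get and toy_put are hypothetical names, and a "page" is just four chunks with a per-chunk reference count. A chunk is handed out for a new packet only when nobody holds it, so data that splice still uses cannot be overwritten.

```c
#include <assert.h>

#define CHUNKS_PER_PAGE 4

/* Toy model of one page that stores several packets (hypothetical). */
struct toy_page {
    /* Per-chunk references: one for the owning skb, one per splice user. */
    int pins[CHUNKS_PER_PAGE];
};

/* Hand out the first chunk nobody holds; returns its index or -1. */
int toy_alloc(struct toy_page *pg)
{
    for (int i = 0; i < CHUNKS_PER_PAGE; i++) {
        if (pg->pins[i] == 0) {
            pg->pins[i] = 1; /* reference owned by the new packet */
            return i;
        }
    }
    return -1;
}

/* Take an extra reference, e.g. when splice starts using the chunk. */
void toy_get(struct toy_page *pg, int chunk)
{
    assert(pg->pins[chunk] > 0);
    pg->pins[chunk]++;
}

/* Drop a reference, e.g. when the skb is freed or splice is done. */
void toy_put(struct toy_page *pg, int chunk)
{
    assert(pg->pins[chunk] > 0);
    pg->pins[chunk]--;
}
```

In this model, if a packet takes chunk 0, splice pins it with toy_get, and the packet is then freed with toy_put, the next toy_alloc returns chunk 1 rather than chunk 0. Chunk 0 becomes reusable only after splice calls toy_put as well. The hard part in the real kernel is where to keep such a counter, given that the page structure has no spare members.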
While I was writing this post, Jens posted a patch that implements exactly the same idea, but by introducing a private field in the splice private structure.
Let's check this out.

/devel/other :: Link / Comments (0)


I was invited to work at Yandex, a small Russian Google.


I declined (as with Google), but they insisted on meeting (unlike Google :), so I will go and see how they work. Actually, if I ever worked at Google or Yandex, I would definitely like to create an automatic tracking system on top of their maps, which would allow putting marks on the map and selecting the shortest route between the points, taking into account information about traffic jams and so on. There are such systems all over the world, but they are heavily limited to specially crafted vectorized maps, while I would start with plain pixmaps.
But working in such a company (no matter whether it is Yandex, Google, SWSoft or anything else) requires devoting much of one's time to it, while I prefer my own projects (without any gain, though), so it would eventually end with my own ideas being cancelled. So, no, at least right now.

/devel/other :: Link / Comments (0)


OpenBSD hackathon.


Hackroom teardown and second climbing day.

/devel/other :: Link / Comments (0)