Zbr's days.
December
Sun Mon Tue Wed Thu Fri Sat
         
           
2006
Months
Dec

About TODO Blog RSS Old blog Projects Gallery Notes

Sat, 30 Dec 2006

Day of uncelebrated birthdays with forest party.


Today we celebrated Yuliana Silich (former Sviridovskaya) and Perec (Pavel Sherstnev) - they decided to make a party in Dolgoprudny forest with fire, vodka and shashlik. And they did not lose - it was great time, although quite hold especially later, but eventually we heated and it had become quite warm. I met old friends which I did not see for several years already, some of them were in the abroad, some of them just did not appear.
Later party continued in Yuliana's house - we drunk as far as I recall upto 3 A.M. with Mephody - we talked about life, work, future plans and the whole situation - eventually we concluded that although I work for small salary, but with extremely cool technologies and this is essentialy work for myself, although he works with interesting ideas in Dell, but he is just a small, although quite significant, screw in the someone other's machine.

That was great time with friends, which smoothly flows into New Year celebration.

Happy New Year!

/life :: Link / Comments (0)


Fri, 29 Dec 2006

Climbing evening.


It was quite new trainig - I climbed on the vertical walls and on negative slope second time for the last three months. My insurance partner was instructor Anya, who helped a lot and even tried to push me up on the final trace - but after four quite complex old traces, combined into series of two in one run, I was too tired to complete on-sight new one on negative slope (start is almost horizontal wall, but next part was quite easy on vertical wall), so I miserably failed several times on start until helped myself with other holds.
It as quite new feelings to climb on vertical walls, but I found that I perform quite good and make some new interesting movings trained 'on the ground' with boulderings.

My next training will be only in a week or so - climbing zone is closed for vacations.

/life :: Link / Comments (0)


New netchannels 'take20' release.


Short changelog:

  • process NAT directly in enqueue callback instead of postponing work to kernel thread
  • fixed obsoleted route update
  • fixed netchannel queueing locking

/devel/networking :: Link / Comments (0)


New kevent 'take30' release.


Short changelog:

  • new private userspace notifications - allows to queue any userspace private event and then mark it as ready using kevent_ctl(KEVENT_READY) command
  • KEVENT_REQ_READY flag - if set kevent will be marked as ready at enqueue time (allows for example to send a signal to process through kevent subsytem)
  • port to 2.6.20-rc2 tree (54abb5fcdae74a811ed440ec6556cabc6b24f404 commit)
  • use struct kmem_cache instead of kmem_cache_t
  • added notificaion type into search key, this allows to have the same id for different types of notifications
Enjoy!

/devel/kevent :: Link / Comments (0)


Thu, 28 Dec 2006

Ireland mug Mephody meeting party.


I organized Mephody and Ira meeting today - we spent a great time in "5 oborotov" restaurant with a lot of old friends - there were Mephody (btw, his name is Alexander Boykov) and his wife Ira, Fedor and Ira Schtrom, Perec, Lyasha with wife Olga, white John (Burnyakov Evgeniy) and me.

As Mephody says, he visits Russia to remove his Europe depression - everything is so damn good and right, that one wants to blow it up - and Russia is completely different - dirty streats, bad weather, all with problems - extremely cool place to live compared to swamp-like table Europe.

/life :: Link / Comments (0)


Wed, 27 Dec 2006

Climbing eveninig.


It was just bloody perfect training - a lot of new small and bigger traverses and boulderings, many old ones, talks with new people, theirs traces... I did not have such training quite for a while alreday - there were no negative moments at all. I think life becomes better and longer with each such evening.
Later I spent several hours preparing to Mephody's meeting - last tickets check, New Year booking and so on.
Now everything is good. I mean EVERYTHING is just bloody good.

/life :: Link / Comments (0)


My vacation timetable.




See how monkey screams "Stop working!". I hear it right now.
I start celebration today evening after climbing training with Schtrom's family (they got theirs keys for new flat), then tomorrow with ireland mug Mephody, Irina and friends. Friday will be short working day with climbing training. Saturday will be devoted to non-celebrated birthdays - I will be in Dolgoprudny forest with friends, fire and vodka. Sunday will be (started) spent in "5 oborotov" new year party with friends.

I will be back to normal life somewhere about Jan 4-5.

Happy New Year!

/life :: Link / Comments (0)


News about kevent.


Jonathan Corbet (LWN editor) has published an article about kevent in LinuxWorld, which has reached me via Linux today news.
As a side note, kevent homepage got the first place in the Google's rank for 'kevent' word.

it becomes popular, and major thing kevent lacks is true AIO, which will be started very soon.

/devel/kevent :: Link / Comments (0)


Mon, 25 Dec 2006

Climbing evening.


It was a good training - all traverses and boulderings were quite old, but I started to combine them, so it was hard traces. I almost completed relife traverse over the whole climbing zone - there is only one place where I do not know how to move (well, I have some ideas on that place, but theirs testing will absolutely end up with failing, which will not be that pleasant since there is only small floor-mat there). Actually I failed to complete the whole traverse, but I did not stop at the middle when ars started to tell me that something is not usual, and they feel itself a bit uncomfortable - probably at the moment I had quite monster view (wry face due to pain, heavy music from head-phones, likely I produced some non-human roars trying to stimulate myself to complete complex or hard movings), since all people in opposite direction quickly moved away...
Anyway, it was good training.
Climbing is quite addictive and I became a victim.

/life :: Link / Comments (0)


New netchannels release is coming...


It is inevitable like New Year.
I've fixed bug, introduced in take19 release and plan to put new version to the wild.
I've also setup new machine with gigabit NIC, so I will test NAT performance and then release new version.
This is scheduled to tomorrow, and now I am, dirty as pig after yesterday flat development, going to climbing zone for my regular (almost daily) dose of good mood and shower.

/devel/networking :: Link / Comments (0)


Sun, 24 Dec 2006

Meanwhile on flat development side.


I've completed my hinged ceiling. It is not 100% ready yet, but about 90-95% - small board is created, almost all ceiling seams are filled with plaster, all parts are combined together. The only things to complete are: dotty lights setup, plaster filling complete and final postprocessing (includion the rest of the ceiling).
I never worked before with stucco cardboard, but ceiling looks really cool - it has two rounded layers with boards, the whole covered area is about 8-10 square meters, distance between layers and main ceiling is about 11 and 23 santimeters accordingly.
Hinged ceiling will contain dotty lights over the area and neon cord in the board.

Next thing is to complete the whole ceiling final plaster postprocessing - it will include three layers of plaster (and one stucco ground already filled) - ground, finish plaster which will form the paint and finall fixative colour wax.
This is quite simple work, but getting into account the fact, that I only perform my flat development each Sunday, this can take some time...

And I'm still in doubts about bathroom - I do not have bricks to complete my idea about small podium for handmade shower cabin, so probably I will fill the bathromm with ceramic tiles and then will setup usual shower cabin - we'll see...

/devel/flat :: Link / Comments (0)


Sat, 23 Dec 2006

New kevent 'take29' release.


Short changelog:

  • optimized af_unix to use socket notifications
  • changed ALWAYS_QUEUE behaviour with poll/select notifications - previously kevent was not queued into poll wait queue when ALWAYS_QUEUE flag is set
  • added KEVENT_POLL_POLLRDHUP definition into ukevent.h header
  • libevent-1.2 patch (Jamal, your request is completed, so I'm waiting two weeks before starting final countdown :)
    All regression tests passed successfully except test_evbuffer(), which crashes on my amd64 linux 2.6 test machine for all types of notifications, probably it was fixed in libevent-1.2a version, I did not check. Patch and README can be found here.
libevent is very BSD specific with a lot of workarounds for epoll, /dev/poll and finally for kevent, especially for signals in the first two, but kevent does not require it - this is another argument in favour of absent unneded sigmask parameter for kevent.

Kevent support for libevent can be further optimized though, current code adds event when appropriate callback is invoked, while kqueue postpones it to dispatching time - they can use it since kqueue() contains additional array of events, which are considered as changes to existing ones. It is possible that kevent addition could be postponed too.

All kernel kevent options must be turned on (namely CONFIG_KEVENT_POLL, CONFIG_KEVENT_SOCKET, CONFIG_KEVENT_PIPE). I did not hack 'configure' to check for supported notification types.
Call me lazy boom slacker.

/devel/kevent :: Link / Comments (0)


Fri, 22 Dec 2006

libevent has been ported to kevent.


In the most simple way - it currently only supports signal notifications and poll for all others, which should be optimized for sockets and pipes.

It is good library indeed, although it is very kqueue() centric.
And it does not contain any documentation about internal structures (i.e. there is no documentation for those who want to add new event dispatching mechanism into library), although there is a good man page for users.
And it is made by OpenBSD developer Niels Provos, this OS, if you do not know, claims that it has the best documentation ever (as a side note: no matter how good documentation is, my work with mbuf in its network stack completely killed _any_ possible wish to work with its kernel in future :) ).
Fortunately there is a source code for kqueue(), so kevent poring was easy.

Porting uncovered interesting 'feature' of one of the kevent flags - KEVENT_REQ_ALWAYS_QUEUE, it was added to workaround applications which can not deal with the fact, that event can be immediately ready and thus will not be queued (like lighttpd), in case of poll/select notifications and this flag being set, kevent was not added into wait queue for given file and thus will never be processed later, which leads to impossibility to work with this flag and poll notifications.

Another issue is that set of regression tests in libevent crashes on every event dispatching mechanism I tested (poll/select and epoll) on my AMD64 test machine, but I have not looked deeply into this issue.

I will fix poll/select notifications and KEVENT_REQ_ALWAYS_QUEUE flag issue and release new kevent version and library port tomorrow. I will also include socket/pipe optimisation into library port (not supported right now).

/devel/kevent :: Link / Comments (0)


Thu, 21 Dec 2006

New netchannels release and scalability tests.


I've release version #19 of the netchannel subsystem, which contains API changes required to run userspace network stack.

This version contains locking bug which leads to crash on netchanel removal when a lot of them were created.

Scalability testing.
Script has created 22k wildcard (only one dimension (source address) is wildcard to save some mem) NAT rules (16k $rand.$rand.$rand.$rand/255.255.248.0 rules and 6k $rand.$rand.$rand.$rand/255.255.255.$rand).

I.e. it is equal to 22k following netfilter rules:

iptables -t nat -I POSTROUTING \
        -s $rand.$rand.$rand.$rand/255.255.248.0 \
	-d $static/$some_wildcard_mask \
	--sport $static_num/$some_wildcard_mask \
	--dport $static_num/$some_wildcard_mask \
	--proto $static_num/$some_wildcard_mask \
	-j SNAT --to-source $static_num
$static_num and $some_wildcard_mask are different on each line, but are the same for different commands.

Raw socket and netchannel send/recv performance over gigabit link is the same as described previously.

I can not test NAT processing speed itself (I do not have enough test machines :).

Userspace trie tests (described here) included 1-2 millions multidimensional wildcards insertions/searches.

/devel/networking :: Link / Comments (0)


Userspace network stack over new netchannels performance.


Sending:

  • 128 bytes writing:
    • netchannels: 27-28 MB/sec, 20-30 % CPU usage
    • sockets: 7-8 MB/sec, 80-90 % CPU usage
  • 4096 bytes writing:
    • netchannels: 27-28 MB/sec, 20-30 % CPU usage
    • sockets: 30-31 MB/sec, 30-40 % CPU usage
Receiving:
  • 128 bytes reading:
    • netchannels: 70-71 MB/sec, 80-90 % CPU usage
    • sockets: 24-25 MB/sec, 80-90 % CPU usage
  • 4096 bytes reading:
    • netchannels: 73-74 MB/sec, 80-90 % CPU usage
    • sockets: 79-80 MB/sec, 80-90 % CPU usage

New version of the userspace network stack is available in archive.

/devel/networking :: Link / Comments (0)


Kevent will celebrate its first birthday (one year) in a two weeks (Jan 6).


I will try to start and get some results about true AIO for that date.

Btw, due to the fact that kevent stalls, its inclusion into lighttpd official tree is postponed/dropped.

Update: I've put a task to patch libevent to support kevent into my schedule - site is currently down, but I hope it will be ready soon, so I can complete this task this week.

/devel/kevent :: Link / Comments (0)


Kevent patchset has been resent again.


This is second resending of 'take28' patchset - the last one will be sent either saturday or monday.

Due to complete lack of feedback (oh, no, Ulrich Drepper said that kevent build fails on his machine, but did not present actual files and did not confirm that it was fixed by 'make prepare' command, so it can not be counted).

Jonathan Corbet has published an excellent article in the Dec 13 LWN edition about kevents.

/devel/kevent :: Link / Comments (0)


Wed, 20 Dec 2006

Climbing eveninig.


It was first time for the last three months I climbed on the walls and not traverses or boulderings. I held a girl from DDS (which is closed, so there are enourmous crowds of people in Skala-city), which actually most of the time were in hinged state, so the whole training was not that active.
I completed three old traces and found that although I can do the things, it was not that easy as I expected, but nevertheless it was good training although easy and short (if number of minutes I spent on the walls are calculated).
Next time unknown girl will ask me to insure her, I should ask if she permit me to climb a bit more than 3-4 traces :)

/life :: Link / Comments (0)


Network address translation (NAT) works over netchannels.


One can add/remove source/destination NAT using connector (Documentation/connector/ in your source tree).
NAT over netchannels uses two aditional netchannels per dataflow (input and output from NAT server point of view) and one main (what administrator sets up using source/destination rules) as I described previously.
It uses the same tries as other netchannels (userspace or others), so only one lookup is performed to transfer packet.
Packet processing happens in process context on behalf of special threads, dedicated specially for netchannel processing, when packet enters netchannel stack and netchannel has been selected, it is possible to schedule its processing either to current thread, or to thread on other CPU.
Netchannel searching is lockless (protected by RCU), skb queueing is proceted by spinlock.

So, brief list of netchannel features:

  • multidimensional wildcards support
  • RCU searching
  • single multidimensional trie for different kinds of dataflows
  • dedicated processing threads with possibility to schedule processing on different CPUs for those netchannel types which are not acked with processing context
  • userspace netchannel backend (allows to receive packets to userspace), which can be used for:
    • high-performance sniffers
    • tun/tap device replacement
    • packet socket replacement (note, that netchannels steal packets from main stack)
    • userspace network stack implementation
    • own protocol stack implementaion (from VPN tunnels to TOE)
  • netfilter netchannel backend (only NAT is supported as the most interesting user, NAT caches appropriate route, so essentially routing becomes part of the netchannel trie)

Some testing was done in 'emulator', i.e. pre-netchannel userspace multidimensional trie implementation.
Results are here.

This version (netchannels.18) can be found in archive.
it has been also sent to netdev@ for review and comments.

/devel/networking :: Link / Comments (0)


Tue, 19 Dec 2006

Netchannels and NAT.


Only because of drink evening and korean stories I have not completed NAT support for netchannels, but I will do it tomorrow or in a day (if climbing training will not be that hard).

/devel/networking :: Link / Comments (0)


Collegue has returned from Korea (Samsung).


First impression - they drink. No, they do drink. Even, they DO DRINK. That's a good news if I will ever visit that country.
He has brought us a box (about 20 small bottles of 0.3 litre) of super-puper known product called 'Soju' or something like that - it is korean vodka, which has only 20.1 degrees compared to russian 40, but nevertheless is quite strong.
We drunk a littel and listened a lot of fun stories about koreans (I'm sorry if it will be a bit unpleasant, but what would you expect from big white monkeys?).
First of all, theirs language. From russian point of view it is fun.
For example they lived in big hotel somewhere on the sea-side (15 minutes from Seoul), and every time they entered a lift (in Samsung) some movie was shown - how many degrees must be in different bows and what are other rules of decencies...
After several such movies they started to use interesting korean phrases as toasts in korean restaurants (well, they did drink a lot there).
So, the most interesting (I managed to recall after couple of bottles of theirs 'Soju') phrases which sounds specially in russian:

  • 'ebazie' - something like 'hello' when you answer a phone call.
  • 'humida' - something like 'to be' (actually it is pure russian pronuncation, used for phrases like 'that thigs', i.e. 'eta humida').
  • 'ebti' - something like 'next door' - does not require a translation for russians, we will forget it in a minute (I specially wrote on paper), since it sounds too fun.
So, imagine several big white monkeys sitting in the korean restaurant, which drink for 'Hello', drink for 'Doors are closing', 'Next door' and 'to be' words, i.e. one of them stands and starts using grave voice:
Let's drink for the next door.

Collegue has told interesting story about sea products and how they eat it.
In Russia we sometimes exit the city and go 'po gribi - po yagodi', i.e. for mushrooms and berries, and we eat berries right in the forest.
Koreans do the same with sea-livers - consider a family, which has a good car(s), which moved to the sea-side from Seoul to get some fresh monsters - after low tide they get alive ratchets and eat them right on the coast. If ratchets want to run away and dig into the sand, they have special small spades - just like russian mushroomers have special knifes.
Another story was about _alive_ octopus being cookied right for them, so even after its skin was removed and it was sliced, its parts still moved (even in your mouth). Or some other alive, which cried when you eat it.

They start to drink thursday evening (at least in Samsung), they drink a lot, but automagically friday morning they are fresh (although each evening they are so tired that can sleep on neighbours shoulder in the bus), then exactly the same happens firday and saturday (including major drinks and parties), but sunday they get themself and go shopping (but can sleep in the shop).

It looks like they live completely different lives than we.
I think I would like to visit Korea.

/life :: Link / Comments (0)


Elton John sings oven instruction.




Excellent. Although I do not like his songs, but it is cool.
Audience is very interesting.

In russian I would say 'Zhest' kak ona est'.

/other :: Link / Comments (0)


Mon, 18 Dec 2006

Climbing evening.


Today we took a film action. Actually not film, but "Usage instruction" TV show - two girls as presenters, operator, some other people...
I hope I was not screened, but eventually they stopped on my traverse way, so I couple of time asked them to move (very civilly indeed).
Among this it was quite good usual training - old traverses and boulderings, one or two new one - I started to try 8b/8c start - obviously I can not climb such complex trace, but I try to make several start holds - quite successfully though.

/life :: Link / Comments (0)


Refrigerator or flatiron?


Who will win if turned on flatiron will be closed in the refrigerator?

One man decided to check and closed faltiron in the refrigerator, after about 30 minutes, when flatiron almost completely defreezed refrigerator, a very strong steam in the inner area cracked something in flatiron and it had died, refrigerator got the situation under own control and iced the door.
Draw?

"Who I am, where I am, what is my address, where is exit?"
One day of DHCP packet's life.

- What is galvanic resistance?
- It is rise of the batteries.

Got from bash.org.ru (in russian).

/other :: Link / Comments (0)


Sun, 17 Dec 2006

Initial work of netchannels NAT.


Source NAT works in one direction - i.e. it is possible to send packets from internal network to the outside using address of NAT machine as a source, but packets are not received yet (actually they are, but no address changes happens).

/devel/networking :: Link / Comments (0)


The first resending of the 'take28' kevent release.


Since I got no feedback about three previous releases and one resending, I started decrementing counter of resendings - when it hits zero, I will stop resending and move kevent into maintenance mode without further pushing it upstream and will concentrate on more interesting work than sending mails into non-feedbackable black hole called linux-kernel@.
Situation with kevent becomes idiotic from fun - a lot of people want that functionlity, kevent works faster than epoll, there are patches for real-life users, implemented tons of functionality which allows kevent to scale to hell, and things stopped.
So, if people for whom it was developed do not care, I do not care even more - I created it, I can kill it.

Thanks a lot for those who supported kevent and tried to push it upstream, but it looks like we wasted our time for nothing :)

/devel/kevent :: Link / Comments (0)


Sat, 16 Dec 2006

2.6.19 kernel acrypto release.


Acrypto allows to handle crypto requests asynchronously in hardware.

Acrypto supports following features:

  • multiple asynchronous crypto device queues
  • crypto session routing (allows to complete single crypto session when several operations (crypto, hmac, anything) are completed)
  • crypto session binding (bind crypto processing to specified device)
  • modular load balancing (one can created load balancer which will get into account for example pid of the calling process)
  • crypto session batching genetically implemented by design (acrypto provides the whole data structure to crypto device, i.e. it is possible to use acrypto as a bridge which routes requests between completely different devices, since it does not differentiate between users, just handles requests)
  • crypto session priority
  • different kinds of crypto operation(RNG, asymmetrical crypto, HMAC and any other)

Combined patchsets include:
  • acrypto core
  • IPsec ESP4 port to acrypto
  • dm-crypt port to acrypto
  • OCF to acrypto bridge, which allows to run OCF device drivers with acrypto (for example ixp4xx), requires OCF installed.

/devel/acrypto :: Link / Comments (0)


Fri, 15 Dec 2006

Climbing evening.


I've bought myself an mp3 player since Skala-city's sound accompaniment does not hold any critics (in my opinion), so I started to climb with own music in the head.
That was great training - likely because of excellent mood and good music - although most of the boulderings I made were old ones, but I started to climb several new ones - quite complex, but very interesting. Eventually I even completed part of the trace instructors failed to do by using parts of the relief they dit not found (it looks like some other master created that trace, not local trainers), but I'm quite sure they will complete it soon too.
I got an extremely super good mood - people likely with great surprise looked at man who climbs alone and smiles itself even when fail.
That was great time.

/life :: Link / Comments (0)


Shit happens.


My boss has read my blog...
And probably is reading right now - if I will be fired soon, that will not be a surprise.
At least I will stop doing uninteresting things :)

/other :: Link / Comments (0)


Thu, 14 Dec 2006

New acrypto release for 2.6.19 kernel tree.


Changes from 2.6.18 are small enough, but my testing machine is not functional yet, so I will start testing tomorrow and then will announce release in mail lists.

/devel/acrypto :: Link / Comments (0)


New kevent 'take28' release.


Short changelog:

  • made kevent default yes in non embedded case
  • added falgs to callback structures - currently used to check if kevent can be requested from kernelspace only (posix timers) or userspace (all others)

With this release I start 3 days resending timeout - i.e. each third day I will send either new version (if something new was requested and agreed to be implemented) or resending with back counter started from three. When back counter hits zero after three resendings I consider there is no interest in subsystem and I will stop further sending. Thanks for understanding and your time.

/devel/kevent :: Link / Comments (0)


Wed, 13 Dec 2006

Climbing evening.


Not bad and quite long training - today it was devoted to bouldering traces.
I ran quite a few, although most were old enough, but also found couple of new ones.
Found why instructors climb in running shoes but not in special climbing shoes most of the time - they say that when new trace is created and they can complete it in running shoes, then likely usual people can complete them in climbing shoes, otherwise we need to be able to climb 7'th cathegory and higher. Well, maybe that is true, most of the time they create simple traces it does not matter in what shoes one can run over it.
Talked with instructor Anya about climbing areas in Turkey - she wants to open climbing tourist agency there, although I'm not sure it will have major profits - there are a lot of good climbable rocks, weather and area are good for that too, but I doubt there are a lot of climbers all over the world who will visit it again and again - people tend to like different places.

/life :: Link / Comments (0)


Avoiding - and fixing - memory fragmentation.


Interesting article in LWN about possible solutions for avoiding and fixing memory fragmentation.
This issue was main goal of the network tree allocator, which among others has ability to perform zero-copy sending and receiving, which I plan to integrate into netchannels.
Main difference is the fact, that network tree allocator combines chunks inside pages including composed pages, while solution proposed by Mel Gorman is to combine pages.

/devel/networking/nta :: Link / Comments (0)


Tue, 12 Dec 2006

Network address translation (NAT) over netchannels.


Netchannels itself work - one can configure either userspace netchannels or kernel ones, which are processed by kernel threads.

NAT is special and thus interesting case - packets must be mangled and resent upto two times - when packet is received from one network (before changes), then it should be changed, sent to likely other network, reply received, changed and sent back to original host.
There are at least two types of rules in such scenario.
One type contains NAT rules which are created by system administrator, another one is per connection rule, which includes real manipulation data. At minimum it contains two routes - to original host and to remote host. Main mangling rule setup by administrator, for example 192.168.0.0/255.255.0.0 convert to look like source is 1.0.0.1, can contains several different networks, which are connected through different network adapters, so input routes are essentially per-connection, the same applies to destination routes.

So, what is the logic behind NAT in case of netchannels or netfilter?
I will describe how netchannel implementation will work.
When packet first time enters network stack and it hits NAT netchannel, it will be queued for processing in special thread, where new netchannel with exact source/destination parameters will be created, including routes for input and output directions (actually they are both output directions for NAT server (or only one direction, if connection is to/from NAT server itself), but one of them is input and other is output from the initial packet point of view, so I will call it that way), then packet will be changed and sent to appropriate destination. When next packet or reply will enter the stack, it will be caught by that new netchannel, since its priority is higher, and eventually packet will be changed and resent in special thread. Each new netchannel, created by NAT system, will have either kernel timer attached or special state machine will be created to track too old netchannels which potentially can be freed (i.e. some kind of garbage collection).
Since each new netchannel, created by NAT system, has always exact parameters without wildcards, it would be possible to put them not into trie, but into special binary tree with faster search time, but such tree must be put into main NAT netchannel to not mess with other rules, which were setup in trie, so actually there will be even more lookups than in case when new netchannels are put directly into the same trie - one lookup for main NAT netchannel and then additional tree search for subnetchannel (lets give such a name to netchannels created by NAT system).
In theory it is possible to not create new subnetchannel for each new connection - system can take an advantage of the fact, that several dataflows (potentially mapped to individual netchannels) can use the same routes, so they could share the same subnetchannel, but in practice there is no way to select set of addresses which contain the same route, only possible to get route by addresses.
This rises a question actually, since I want to use exactly one netchannel for server-like scenarious, when there is only one listening netchannel, which transfers data to/from userspace, which handles protocol itself. I will think about posible good implementation of such a feture, but until it is done, new netchannel must be created for each new connection.

Netfilter NAT work essentially the same.
It is based on famous connection tracking subsystem - each packet, when it enters the stack, will get connection tracking entry attached to it, when packet within the same connection (i.e. source/destination parameters) enters the stack, it will be changed according to that connection tracking entry. Connection tracking entries live in own cache independent on NAT, but called from NAT rule - so we have exactly two lookups which I decided to avoid in described above netchannel scheme. Netfilter/connection tracking lookups are perfomed in fixed-sized hash tables, contained linked lists, which does not scale neither with number of packets, nor with number of rules.

/devel/networking :: Link / Comments (0)


New kevent 'take27' release.


Short changelog:

  • made kevent visible in config only in case of embedded setup
  • added comment about KEVENT_MAX number
  • spell fix
This major patchset has been sent to linux-kernel@ and netdev@ for review.
Let's see where it will end up.

/devel/kevent :: Link / Comments (0)


Mon, 11 Dec 2006

Climbing evening.


It was quite short training, only about two hours, but quite hard - I started new relief traverse part - from the begining to the middle, where previously I failed, now there are only couple of places on the whole trace which I can not complete, but I can not run the whole trace even without that parts - do not have enough power endurance, although trace's rating is not that high (about 6c).
Also several old boulderings and traverses were completed and that was all.
Now I feel pain in virtually every pipece of my body, and strangely enough I like that masochistic feeling. Well, all except pain in knees and head, which I damaged a bit during unsuccessfull failings from the wall, i.e. all muscle pain is realy pleasant - it is not something really strange, but if I would have a psychoanalyst it could be probably a thing to worry about.

/life :: Link / Comments (0)


The last 'take26' release has been resent.


Let's see how 'commit early' linux motto sucks (well, it only works for major known developers) - I have not received any feedback from that major release, which included most (all except sigmask syscall parameter) issues mentioned by other people as long as a lot of additional features.
And kevent (although obsoleted 'take25' release) was removed even from -mm tree.
Maybe I should heat an interest in kevent a bit with this kevent benchmark?..

/devel/kevent :: Link / Comments (0)


Netchannels got wildcards support and more modular design.


I splitted netchannels into core processing unit, which selects netchannel from the trie when packet arrives, queues packet, export add/remove and other operations and other similar core tasks.
Trie layer itself is just an implementation of the multidimensional data structures, used for fast searching of the final destination point (i.e. netchannel), it can be replaced with some other implementation easily as long as new one provides the same three function to add/remove/search operations and obey the same locking rules, which should not be hard, since netchannel layer does not perform any kind of special locking in that operations, except that requirement for search RCU protection and RCU-aware netchannel freeing.
Other layers are just netchannel users - userspace with ability to work with netchannels using syscall interface, NAT implementation (work in progress yet).

Interesting decision was done for some netchannel subsystems which can not have backed process - like NAT, when packets must be processed in kernel only. There is set of special threads, which performs netchannel processing (it is quite simple - they just calls nc_process() callback of each linked ready netchannel), processing thread selection can be performed when new skb has been received - system can select local thread or schedule netchannel processing on the some other CPU bound thread (for example if netchannel is being processed on different CPU, cache thrashing overhead will be much smaller if system will move skb to different CPU, than the whole netchannel to current CPU).

Thus this subsystem only performs RCU protected netchannel selection, spinlock protected skb queueing and awakening of the actuall processing thread (either userspace parked in syscall, or kernel thread).
All the real work is performed in either userspace or by kernel users (currently only NAT).

I plan to complete NAT implementation, perform some testing and release new major update this week.

/devel/networking :: Link / Comments (0)


Sun, 10 Dec 2006

Hinged ceiling setup. Layer 2.


I've completed the heaviest work - weight of the second layer is about 40-50 kg with about 6-7 square meters, the whole ceiling thus is about 60 kg and about 10 squares with two levels. Doing that alone was not that simple task, but it is done already, fortunately without major damages to the neighbour nature.
Now it requires plaster filling, rubbing, board creation and final postprocessing. Likely I will complete that part this week.
After it is done, I can say, that the dirties parts are done and I can start finish processing (although I did not even start bathroom development - did not not get bricks and concrete).

This is phone-made photo of the part of the ceiling completed about midnight.

Second layer of the hinged ceiling

/devel/flat :: Link / Comments (0)


Tue, 05 Dec 2006

Due to personal issues I will be offline for several days.

/life :: Link / Comments (0)


Badness in postponing work.


RCU is not a perfect solution for existing schemas actually. Lets consider synchronous situation when object freeing does not only return memory to the cache, but also perform some additional state machine changes - like putting reference counters and so on - obviously postponing of such work likely will not be very good solution, since when we work with fast path, we want the whole sequence be completed quickly, but not fast first half and postpone other for a long time.
But even pure memory freeing - i.e. returning mempory to the cache, if being postponed to RCU callback, can lead to very noticeble performance degradation.
Thinking about better and faster skb processing for netchannels I created simple patch which just postpones skb freeing (kfree_skbmem(), i.e. pure releasing memory back to skb cache) to RCU callback invoked from __kfree_skb(). This leads to the following performance degradation (receiving of small packets):
kfree_skb() RCU speed degradation

Speed is about 2.5 times slower, although CPU usage is smaller too - likely due to the increased work of RCU tasklet and increased number of context switches.

As a conclusion: using RCU protected lists of skbs for sockets will lead to major performance degradation.
As a second conclusion: RCU is not a panacea, so its usage will be limited in netchannels.

/devel/networking :: Link / Comments (0)


Locking tricks inside trie algorithm.


This tree is never rebalanced by design, its pointers are always the same except the case when node is added or removed, thus it is possible to use RCU protection when traversing or modifying trie structure - just using rcu_dereference() and rcu_assign_pointer() it is possible to create completely lockless network stack.

Let's consider trie modification which leads to node removing (when appropriate reference counter becomes zero).
Trie removal algo deletes pointer from root node to its child using rcu_assign_pointer() and all subsequent subnodes and schedules node freeing through call_rcu() mechanism. Trie traversal, which uses rcu_dereference(), will eventually fail to dereference next pointer or will complete netchannel processing and thus will exit RCU protected section, which will lead to freeing appropriate nodes.

Node addition is essentially the same - set of node insertions (if new trie set of nodes requires allocations and insertions and does not reuse existing inserted nodes) will be 'published' one by one through rcu_assign_pointer(), which can be interrupted between different isnertions and lead to obtaining NULL pointer, which is perfectly fine, since netchannel is not yet added.

RCU callbacks are scheduled for invocation after so called grace period is completed, in practice it means that callback will be invoked some time after context switch happens (surely since context has switched, section is no longer executed, thus it is ended), so we need to protect our execution path from being scheduled away while traversing the trie with rcu_read_lock()/rcu_read_unlock(), which is normal, since searching path is executed in softirq context. So when searching is completed new data is added into netchannels queue and appropriate user is awakened (either userspace process or kernel thread/work queue for things that do not have userspace process, for example netfilter checks, for the latter case netchannel will be put into appropriate per-cpu list, accessible from kernel thread, bound to given CPU).

Since netchannel processing happens in different context than searching, it means that grace period has elapsed, thus netchannel and related nodes can be freed, while netchannel will be accessed from processing context. To solve this problem and increase performance I design special multeplexing schema which will be used in softirq/hardirq fast path and allow better scalability by minimizing number of context switches and data moving between CPU caches.

Addition is simple - while holding mutex, which will prevent simultaneous deletion, it will allocate nodes one by one and publish them using rcu_assign_pointer(), since fast path running in different context uses rcu_dereference() it will either get new value with all above logic, or get NULL pointer, thus returning without processing.
Addition will not be protected by rcu_read_lock()/rcu_read_unlock(), since simultaneous deletion is not allowed due to mutex being held, so sleepeing allocation can be used.

/devel/networking :: Link / Comments (0)


Mon, 04 Dec 2006

Meanwhile on flat development side.


I completed to fill self-leveling floor in kitchen, bathroom and part of the hall - do not believe the advertisement, it does not self leveling, no matter how thick the layer is - one needs to level it during filling. Its nature is about the same as plaster, so it behaves exactly like it, but it is more fluid. I spent about an hour mixing and filling, and then about 3 hours to finally level it - get palette-knife and meter-by-meter stating to cut all non even parts, with some water oiling it becomes quite good.

Thinking some more about hinged ceiling I decided to implement second layer even although there are no long enough hangers - I will use parts of ceiling profiles with cut middle part to implement some kind of 'U' hangers - without middle part it is possible to level ceiling.

/devel/flat :: Link / Comments (0)


Multidimensional trie test has been completed.


It ran for almost two days and about 7.4 milliards of insert/search operations were completed in 1 million series. No crashes, no failed searches or inserts, no bugs. It's a win.
I start to combine netchannels and this algorithm so wildcards get supported.
Thus netchannels become extremely useful functionality for Linux network stack - it allows to create and test different network stacks in userspace or kernelspace, thus creating TOE engines without influence on core stack (since netchannels engine runs before core network protocol hooks are called), it allows to create very fast and scalable network filters and packet manipulation entities like NAT. Can you ever consider Linux system with one million netfilter rules? System with million of sockets is already quite hard to get.
I also have some plans to implement userspace network stack as a library and patch existing web server (likely lighttpd to work with it - it only requires one netchannel (wildcard of course), i.e. only one socket in existing notation, and socket and related structures like inodes and files allocation and freeing and related syscall overhead is quite major limiting factor in existing systems.
Other system usage include packet-socket-like interface - when userspace can select set of IP addressed to work with without main kernel stack - like BGP/OSPF daemons, TUN/TAP interface, VPN proxies and much more.

So there are plenty of interesting ideas, which I plan to sublimate my energy into...

/devel/networking :: Link / Comments (0)


Sun, 03 Dec 2006

Huge development shopping day.


I spent the whole day in Leroy merlin development shop - from early morning to 22-00 without food and drink I selected ceramic tiles, walpapers, floor covering and so on. I can not say I liked it, but it should be done and now I have everything to complete my repair process.
Since there are no standard long hangers I will create only one layer for hinged ceiling. I also bought new finish plaster for decorative ceiling smearing - looking at what it is and how it should be used I can conclude that I cound do the same with usual finish plaster (not stucco, but oil) and it could about 5-10 times cheaper, but let's see what our french friends invented. Also bought transformer and set of ceiling dotty lights - I plan to complete hinged ceiling this week, and it would be great if I could do it tomorrow.
Main thing is of course ceramic tiles - women in shop agreed that they do not like and do not even imagine how grey/blue marbled tiles on the walls (it is actually floor tiles) with blue marbled floor can look like. Then it was decided that it is man's bathroom, where everything is possible. This took about 4-5 hours. Well, let's see what my imagination created.
Wallpapers selection was simpler - I almost did not have power to spend there as much time as I did when selected ceramic, so I bought a lot of wallpapers for room, kitchen and checkroom.
Floor covering was easier - although it is a bit early, but this will force me to faster complete developemnt - various types from black ceramic granite to blue carpet covering.

Major things I missed are bricks and concrete for small podium I want to create for shower cabin, this will stop bathroom development actaully, but I plan to get them next week.

Eventually I bought all that and other stuff, which will be delivered Wednesday, so while waiting for the bus I decided to visit nearly located Auchan gigantic shop - I just wanted to find immersion heater - but I completely lost in the area of the couple of stadiums filled with various stuff - nature brought me to the area where strong alcohol drinks were sold, so I got bottle of rum (they do not have "Captain Morgan", only "Baccardi") and moved away.
While waiting for the bus and drinking some rum I decided that it was not that bad day actually.

/devel/flat :: Link / Comments (0)


Sat, 02 Dec 2006

Multidimensional trie searching algorithm is completed.


I'm back to work on this problem and here are results.

Hardware.
Intel Core Duo 3.4 Ghz (each core runs with 3.7 Ghz) with 2Gb of RAM (userspace application is singlethreaded, so only one core is used).

Test description.
A number of 3-dimensional pseudo-random rules has been inserted/searched/inserted+searched.
More than 80% of rules contain pseudo-random wildcards (wildcard is formed with higher bits from 0 to 31 set without gaps). Each rule is a 32bit range, for example 192.168.0.0/255.255.255.0 produces such a range.
Searching is performed for the non-wildcard rule, i.e. rule with 0xffffffff mask.
Such setup is a powerfull IPv4 packet filter emulation.

Results.
Graph shows number of microseconds needed to insert (including time for memory allocation), search and sum one multidimensional rule depending on total number of rules.

Trie algo speed

This graph shows memory usage for trie implementation. Theoretical maximum limit is calculated by multiplying number of rules, number of directions, size of each rule object (in onde dimension), 32 (number of bits, i.e. number of rule objects to represent the rule) and 0.6 (20% is non-wildcard rules and 80% is wildcards with equal distribution of 1-31 bits, so 0.2 + 0.8*0.5). Each rule contains 28 bytes on x86 of auxiliary information, which is used to build the tree, it can be reduced though.

Trie memory usage

Test run for the case without wildcards shows increased memory usage (since with wildcard existing algorithm allocates nodes only for masked bits, i.e. the biger mask is (0xf0000000 is bigger mask than 0xffff0000 in above notation) the less memory it requires, thus insert time was increased, but still is constant value about 12 microseconds.
Search speed becomes constant and is about 1 microsecond.

I've started following test to run till Monday - system creates and inserts 500.000 rules with random data and prefixes, then 500.000 rules without wildcard, after each insertion it search for the rule, after one million rules has been isnerted, program starts from the beginning (by existing and restarting). I do not put rule deleting test, since there is no memory to hold all rules and all intervals used to create those rules (deleting was successfully tested on smaller amounts).

/devel/networking :: Link / Comments (0)


Fri, 01 Dec 2006

Climbing evening.


It was interesting training - besides usual traverses and boulderings I created new type of exercises. Consider the case when you got big hold, where two arms can be placed and there is some place for feet - yes, while you are in hinged state, you get your feet over the head to arms (another leg can be setup for friction stop), put the feet onto the hold and try to move your body so that you sat on the hold - some kind of lifting your body using arms and leg on hold from the initial place (somewhere over the head) to the place when you sit on the hold. It is not that complex exercises when you use two holds - one for arms and different one for leg, but when there is only obe hold it becomes very interesting - I spent quite big part of the training trying to complete it using different holds - big reliefs were done quite easily, but usual big holds required quite a lot of efforts. That was great time.

Something like that (photo from bouldering championship in Skala-city, man on photo is not me), but with only one hold about 20 santimeters width (like big blue hold rotated to 90 degrees). Final position is when you sit on the hold, where right hand is.

Interesting bouldering

/life :: Link / Comments (0)


First sounds flew out of my new trumpet.


And I would not call them beautiful. Actually couple of times I produced really nice sound, but I can not easily reproduce it, so most of the time (I started about 23-30 and practiced about an hour) it was horrible wheeze and creak, but I alsmost start to figure out how to play notes...
I think it will take looong time to produce something interesting, but I do not hurry.

/life :: Link / Comments (0)


Thu, 30 Nov 2006

New kevent 'take26'release.


Short changelog:

  • use struct timespec as timeout parameter.
  • added high-resolution timer to handle absolute timeouts.
  • added flags to waiting and initialization syscalls.
  • kevent_commit() has new_uidx parameter.
  • kevent_wait() has old_uidx parameter, which, if not equal to u->uidx, results in immediate wakeup (usefull for the case when entries are added asynchronously from kernel (not supported for now)).
  • added interface to mark any event as ready.
  • kevent POSIX timers support.
  • return -ENOSYS if there is no registered event type.
  • provided file descriptor must be checked for fifo type (spotted by Eric Dumazet).
  • signal notifications.
  • documentation update.
  • lighttpd patch updated.
Appropriate applications (ring_buffer.c, signal.c, posix_timer.c) and lighttpd patch can be found in archive.

/devel/kevent :: Link / Comments (0)


Hoax or not? New kevent benchmark.


I've completed a bunch of kevent updates including major kevent interface changes and will release new version soon. I've also created some locking changes, which I call speculative locking (i.e. some fields (ret_flags mostly) can be updated without locks, but it should not harm the system), which can be wrong, but results are impressive.

Hardware setup.
Client: Intel Core Duo 3.4 Ghz (run with 3.7 Ghz each core), 2 GB of RAM, sky2 driven gigabit NIC (Marvell 88E8053).
Server: AMD Athlon 64 3500+ 2.2 Ghz, 1 GB of RAM, r8169 gigabit NIC.

Software.
Both client and server systems have this network timewait socket setup:

# echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse
# echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle
# ulimit -n100000
it is the only changes from default startup.

Client runs Apache benchmark utility, which was started with following command line:
  ab -n80000 -c8000 http://192.168.0.48/
i.e. 80k requests total with 8k requests started concurrently.

Results.
kevent about 7200-7400 requests per second
epoll about 4300-5300 requests per second
With 10k concurrence performance drops to
kevent about 4000 requests per second
epoll about 3800 requests per second

it is possible that I made some errors in that benchmarks though, so I will not publish results to linux-kernel@ or make awailable from kevent homepage.

Update: even without speculative lock changes I got the same outperforming results. Kernel is compiled for SMP setup, all debugging is turned off (in my previous run on that platform I had some debugging options turned on, and performance was about the same and even worse for kevent), preemption model is 'Low-Latency Desktop'.

Update 2: forgot to say, that web server is lighttpd-1.4.13.

/devel/kevent :: Link / Comments (0)


Grange's song on OpenBSD hardware mini-hachathon h2k6.




Here is mp3 version (6.1 Mb, renamed to moscow.mp for those who has corporate filters for mp3 files) of the song from Mongol Shuudan band. This song and some others are available from band's homepage - mostly punk style.

/devel/other :: Link / Comments (0)


My J. Michael TR-300S trumpet has arrived.


I have not tried to extract sounds from it yet - I'm in office and I want to get home without major damages, which are unavoidable if I would try to 'play' right now - there are several people here, and they even try to work, so my first lesson is postponed to the evening.

I will start to rape and crack neighbours ears later today at home - likely no one will suffer too much, building is quite new, so there are only about 20 'alive' flats in the whole building (about 350 apartments).
I think my sounds will be in a harmony with sounds extracted from concrete breaking perforators and corner-grinding machines.

/life :: Link / Comments (0)


Wed, 29 Nov 2006

Perfect. Just perfect.


I'm talking about my climbing training, if you have not understood.
Actually there were nothing special from climbing point of view - the same traverses and bouldering, some progress here and there, bits of mixed traces and so on. But the whole training was extremely good - when interesting people smile to you and you smile to them, when you found new moving on the old trace, when body aching after the holds, when ... - all that small bits create excellent mood and I become to think that I just spent perfect time, and actually most of the time (I would even say overwhelming part of the time) I spent was really good.

/life :: Link / Comments (0)


OpenBSD hardware minihackathon is over.


Grange has really burnt them brains - in the brazilian pub he took away a guitar from singer and started to sing russian song "Moscow" with Sergey Esenin words.
I would say that Grange signs not that bad, and especially when he is drunk, and song itself is very good too, but the whole situation made it even better, so he was called 'the most talented man at h2k6'. It is quite possible. One can even find his records on the homepage (in russian).
So, it was very interesting meeting, as we can see.
Among good mood he also got bunch of various hardware, so I hope he will finally start hacking interesting stuff instead of slacking his ... fifth point.

/devel/other :: Link / Comments (0)


Multidimensional trie implementation.


Although long-running test was completed without errors (actually it is not interesting test at all, since 10 rules is a too small set), after some thoughts I've found a bug in the algorithm - I tried to avoid recursive searching in case of wildcard nodes found traversing trie, but it seems that it is unavoidable.
I also added trivial memory leak detector, which found another bug in algorithm - there is possibility to not free next-dimensional roots in some cases.
So I started to change algorithm itself, and although its theoretical ground is fully completed, practical implementation has a place where bug lives - I know exactly what is the bug and where is that buggy place in the code, what are the reasons, actually I know everything to fix it, but do not know how to make a good fix. Yet.

/devel/networking :: Link / Comments (0)


Tue, 28 Nov 2006

Multidimensional trie implementation has been completed.


It is a first and the most significant step in netchannels wildcard implementation. I tested it with different number of dimensions (1-5) and different number of added/searched/removed rules (upto several millions in each dimension).
Algorithm is simple and fast - my previous estimation was incorrect - currently 1 million 5-dimensional insertions take about 6 seconds, 1 million 1-dimension insertions take about 1.1 seconds on my test machine.
When I add search/remove operation (one-by-one, i.e. add one, search for it and remove it), the whole tuple takes about 15 and 3.2 seconds for 5-dimensional and 1-dimensional case accordingly.

I will (probably) create some graphs describing algorithm and how it will be used with netchannels tomorrow, but if algorithm itself is implemented correctly, it is the most significant part of wildcards support fot netchannels.

I have started following testing case for 5-dimensional case for the night: 10 rules are added, then 10 various values (actually inserted values, but with 0xffffffff mask) are searched for, and then all they are removed, let's see how this will end up tomorrow.

/devel/networking :: Link / Comments (0)


New musical toy.


JMichael trumpet

P.S. Do not scream and laugh, but I can not play trumpet - will start to learn.

/life :: Link / Comments (0)


That is what I call good hacking.


Good hacking evening

Other photos from h2k6 hackathon.

/devel/other :: Link / Comments (0)


Mon, 27 Nov 2006

Climbing evening.


After quite empty day (I did not hack, did not work, did not do anything usefull) I moved to climbing zone and spent there excellent time - there were several old traverses completed, couple of new boulderings. The most interesting part was relief traverse - each day I add more and more holds there, although it is marked not as the most complex trace, I only saw one person who tried to work with it (except me), others somehow do not like that trace. unfortunately in Skala-city there are no relief vertiacl traces, which I'm quite sure would be my favourite ones.

/life :: Link / Comments (0)


New Year is coming.


I've started to think about presents.
First of all, present for myself (well, second one actually) - I think I want to start to make photos, so I decided to get Cannon EOS 400D. Not sure if I will really buy it, but at least I think about it.
The best present for myself would be flat repair complete, but we will see...

So, to touch the perfect, I've subscribed to couple of interesting livejournals of photo masters.
There are indeed several types of photos I like - first of all, it is night photos, then probably building/stuff photos (especially from higher floors or roofs) and panorams.
I have extremely small experience in that area, but will try to learn. Maybe.

/life :: Link / Comments (0)


OpenBSD h2k6 hardware mini-hackathon.


Day first (in russian).
Grange did not forget bottle of "Russian Size", but it was seized in Zurich.

As became known from authoritative sources, Grange tries to add support for IBM ServeRAID to OpenBSD.
It looks roughly like this:
Some hardware

Small photo archive of the meeting.

This algorithm describes what is hackathon and what they are doing there:
OpenBSD hackathon

For comparison: Linux network developers conference NetConf is completely different meeting - hackers sing karaoke songs and seems to not drink at all, I never participated there though so can not say for sure. On picture core linux network hackers (David Miller, Rusty Russell and not-yet-a-netdev-hacker Soyoung Park) select karaoke song.

Linux network hackers select karaoke song

/devel/other :: Link / Comments (0)


Sat, 25 Nov 2006

Initial trie implementations for netchannels wildcard support.


I completed 1-dimensional trie implementation and started its wild testing - random prefix and address mask is created in infinite loop and then random address is being searched there - it has passed more than 123 millions insertion/searches and ate about 1.8Gb of ram - each trie node is 20 bytes in size, which would take 2.29 Gb of RAM if they lived as-is, but due to the nature of tries, which allows to reuse the same entries for several prefixes, memory usage is smaller. 100k insert+search operations took always less than 2.5 seconds on my test machine.

Next step, which I will start after weekend, is multi-dimensional trie implementation, which is actually netchannel wildcard support.

/devel/networking :: Link / Comments (0)


Grange has flown to OpenBSD hardware mini-hackathon.


This small party, also known as 'I-can-drink-more-than-you-can-hack' (there is no real evidence that they spend there time with profit for the world or at least for own health), will be spent in University of Coimbra in Portugal.
Grange will hack there probably Dallas one-wire bus, maybe something else. Actually I do not know, why he is going there except that to become drunk with people which live about tens of thousands kilometers away.
I hope he did not forget to get one bottle of 'Russian Size'.
So, let's wish them nice time there.

/devel/other :: Link / Comments (0)


Fri, 24 Nov 2006

The main development words.


Rusty Russell's "Hanging Out With Smart People".
Hanging Out With Smart People

/devel :: Link / Comments (0)


New books.


I bought yesterday evening two books about history of two interesting states:

  • "History of Russia. From the ancient times to 21 century." A. N. Saharov (Russain Academy of Sciences).
  • "All under heaven. A complete history of China." Rayne Kruger.
Work and other is postponed - I'm going another world.

/life :: Link / Comments (0)


Kevent last words.


Ok, after I was requested to change interfaces again to the state they were in the previous release, I've thrown the last words - I'm waiting for about one week and listen for other developers to comment on the interfaces, if no one will comment, I will gather all the latest suggestions, implement them and make the new release - no interface changes after that day.
And I do not care if there will be some problems and/or some people do not have time to review all my releases or something else - I have time to implement them on per-week basis (I could create new release every day actually), so there is plenty of time to review for those who cares.
If you have something to say about kevent, your opinion on its inclusion/declining or on my behaviour and so on, it is the right time to say.

Thank you.
I'm going to implement netchannels wildcards - it is much more interesting than spending hours on completely void tasks.

/devel/kevent :: Link / Comments (0)


New kevent features scheduled for the next release.


  • struct timespec and possibility to have absolute timeouts in syscalls instead of simple number of nanoseconds.
  • signal mask parameter for syscalls to allow temporal blocking of the signals while syscall is executed (almost, but I'm still trying to show that it is stupid idea with kevents).
  • kevent posix timers support.
  • allow kevent to mark any events as ready from userspace.
  • allow to break any waiting syscalls.
  • various errno return values update.
  • various interface changes, like addition of some new fields, and removal of otheres, which will not affect functionality behind that syscalls.
  • remove some locks in the hottest path - it is indeed needed task. I plan to refactor searching/modifying code so that there were places in the struct ukevent which are never modified by another than caller users (like simultaneous changes in readiness state machine and searching part), so there would be possible to only have one lock in the readiness state machine around ready queue.
Call me a looser, but although I'm strongly against first two entries, I likely will add them to make at least some progress on kevents.

Crap, I spent 2.5 hours sending e-mails into empty discussion. I wish my day would have not 36, but 48 hours...

/devel/kevent :: Link / Comments (0)


Thu, 23 Nov 2006

Kevent posix timers support.


Early morning hack has been sent to mail lists for review.
I have not tested it (except compilation test), and frankly do not want, but if my flow of thoughts is correct, I will create simple application, which will use timer_create() and friends to test kevent posix timers support.
Implementation (and design of POSIX timers API) does not allow to have KEVENT and SIGNAL or THREAD delivery mechanism simultaneously, at least I concluded it after quick look into the code, which is probably is not correct.
Let's see if my patch is good or not.

/devel/kevent :: Link / Comments (0)


Wed, 22 Nov 2006

Climbing evening.


That was quite easy training - I ran several old traverses and couple of bouldering, also created one traverse, which previously contained green holds, but most of them were removed or replaced with time, so it was a new trace, that's all - nothing special this day.
The most pleasant part was sauna and shower which definitely were required by development related dirtiness.
Now I feel mysef just bloody excellent.

/life :: Link / Comments (0)


New features scheduled (implemented) for the next kevent release.


Most of them are implemented in the local tree, I'm just waiting for ack from Ulrich Drepper:

  • new command, which allows to mark any events as ready. It is also possible to just wake up listener of the given kevent queue if number of events requested to be marked as ready is zero. It allows glibc to notify other event processing threads, which are parked in the waiting syscalls like kevent_wait() or kevent_get_events(), that they can process new events in cae of originally awakened thread was terminated.
  • fixed unsufficient parameters check in kevent pipe notifications (spotted by Eric Dumazet).
  • return -ENOSYS if there is no registered event type instead of -EINVAL.
  • add 'flags' parameters to kevent_init() syscall (not implemented yet).

Actually looking at how kevent makes fast progress and how slowly I get positive feedback and how complex are negotiations about features and requirement addons are performed with some people, I seriously doubt there will be enough resources to really complete the task in the way I want and think is the best for future development.
I have several possible ways to solve existing (stagnating) problem with APIs and feature addons:
  • just implement all horrible workarounds and ugliness I'm asked about, which I think will allow quite fast kevent integration into mainline.
  • continuously perform seems to be endless negotiations with mutual insults in linux-kernel@. This will not force kevent inclusion, and likely will not end up with it at all, but there is probability (I think it is somewhere between zero and void), that I can convince people that I am right. Practice shows that it is hardly to convince people, which do not hear what you talk to them, no matter correct it is or not.
  • say that 'I do not care and will not even discuss all requested utterly damaged ideas' or something like that and stop discussion. It can or can not help with kevent integration, but in any case I at least will have enough time for more interesting tasks than empty talks in linux-kernel@. For example netchannels wildcard support and fast NAT implementation, zero-copy support for netchannels, asynchronous IO, threading library, log-structured filesystem
The more I think, the more I like third variant...

/devel/kevent :: Link / Comments (0)


First part (about one fourth) of the hinged ceiling is completed.


Something like that.
I screwed it yesterday evening, and can say that although making hinged ceiling alone is not that simple task, it is quite doable.
Next task is to perform second layer, it will be lower and take more square, but it is task for another Sunday.

Hinged ceiling setup

Will create better photos when it is ready.

/devel/flat :: Link / Comments (0)


Tue, 21 Nov 2006

New kevent 'take25' release.


Short changelog:

  • new (old (new)) ring buffer implementation with kernel and user indexes.
  • added initialization syscall instead of opening /dev/kevent.
  • kevent_commit() syscall to commit ring buffer entries.
  • changed KEVENT_REQ_WAKEUP_ONE flag to KEVENT_REQ_WAKEUP_ALL, kevent wakes only first thread always if that flag is not set.
  • KEVENT_REQ_ALWAYS_QUEUE flag. If set, kevent will be queued into ready queue instead of copying back to userspace when kevent is ready immediately when it is added.
  • lighttpd patch (Hail! Although nothing really outstanding compared to epoll).

  • And special blog edition changelog entry (interesting, do people read my patches to detect this lyrics? Let's see...):
  • small three-line verse (I think I created not bad step for non-english speaking person) about one-line optimisation. Casted by Shakespeare.
I installed slightly used, but still functional (bought on ebay) remote mind reader, and set it up to read Ulrich's alpha brain waves (I hope he agrees that it is a good decision), which took me the whole week. So I think the last ring buffer implementation is what we all wanted. Details in documentation part.

/devel/kevent :: Link / Comments (0)


Hinged ceiling installation.


That is how I spent Sunday.
Weight of that part is about 20 kg, it was challenging to lift and fasten the detail and it is still in a hung state, since I do not have long enough screws. But I will complete it soon, maybe even today.

Hinged ceiling setup

/devel/flat :: Link / Comments (0)


Mon, 20 Nov 2006

New kevent release.


It will be 'take25', but I have not released it yet, since it is in testing mode right now.
This release will include:

  • new (old) ring buffer implementation with head and tail in userspace buffer with two syscalls to wait and to commit events (the latter becomes just an index update).
  • additional feature of the ring buffer - overflow counter to prevent situation when two threads are going to free the same events, but one of them was scheduled away for too long, so ring indexes were wrapped, so when that thread will be awakened, it will free not those events, which it suppose to free.
  • wake-all-threads flag - when it is set all interested threads will be awakened, otherwise only first one.
  • initalization syscall and removed /dev dependency.
  • small three-line verse (I think I created not bad step for non-english speaking person) about on-line optimisation. Casted by Shakespeare.

I wil probably include there userspace notifications.

Did I say that httperf is unfair benchmark?
It definitely is, and actually all existing kevent results are unfair. So...
Apache ab benchmarks shows kevent win, but I do not know how fair it is and how it works at all.
It is also possible, that my epoll based server is broken too.

So, let's state for now, that my benchmarks can not show kevent superiority in socket tests.
PIPE/socket test, created by Eric Dumazet proves kevent works better that epoll.
Johann Borck's tests show that kevent slightly outperforms epoll in web benchamrks.
But not mine.

So I have not gone to climbing zone (and I'm dirty as pig after weekend hinged ceiling setup) and started lighttpd patch to create real-world benchmark.

Here we go.
I've patched lighttpd with kevent support (although it is limited, since lighttpd does not support edge-triggered notifications and immediate events).
Server runs on Xeon 2.4 Ghz with 1 GB of ram and e100 NIC.
Client uses apache ab benchmark, which was started with following parameters:
  ab -n50000 -c5000 http://192.168.4.78/
i.e. 50k requests with 5k concurrent requests. Document size is 699 bytes to not make bandwidth a limiting factor.

Here are the results:
lighttpd_kevent:  4902.17 req/sec
lighttpd_epoll :  4843.71 req/sec
With 6k concurrent requests rates dropped down to:
lighttpd_kevent:   4307.18 req/sec
lighttpd_epoll :   4027.34 req/sec
although it can be just a file/socket rollup.
So essentially kevent socket code behaves like epoll in real-world benchmarks, as long as allows a lot of other interesting features.

/devel/kevent :: Link / Comments (0)


New district and shops.


I live in a new district which is only became to be inhabited.
There is a bus station near the buildings and generally there are a lot of people there and obviously it is a good place for small shop or stall which would have cigaretts, beer and other such stuff.

And what would you think is the first shop opened there?
It is flavour shop!
I love my country.

/life :: Link / Comments (0)


Sat, 18 Nov 2006

New generic wildcard search design.


Well, it is not new, but it is different from what was described before. And although previous variant has higher lookup speed, it requires too much memory, so it is not acceptible.

New one is quite simple - it will be implemented on top of trie structures. Main feature of tries is that they are quite compact (in case of PATRICIA tries) and allow W^d lookup speed in the worst case, where W is width of the interval in bits (i.e. 32 bits in current netchannels implementation, which does not support IPv6 due to my laziness and absence of requirements feedback from the community), and d is number of dimensions (5 currently), which results in very bad number of 33.5 millions operations in the worst case of (2^W)*(2^d) == 2^(W+d) (137.5 milliards) rules.
Actually with netchannels it will be much smaller since only two dimensions use full 32 bit space (source and destination addresses), while others use either 16 bits (source and destination ports) or 8 bits (protocol number), so it will be something about 2 millions operations.

Basic idea behind tries is following: trie is a simple unbalanced tree each node of which represent only one bit (i'th level of the tree represent bits in i-1 position in the searched value), so it's maximum depth is 32 (in PATRICIA trie several bits can be combined into the same node if there are no other rules, which have different bits in that positions). In that case rule 192.168.0.0 - 192.168.0.255 will be presented as 0b11000000101010000000000000000000(192.168.0.0) - 0b11000000101010000000000011111111(192.168.0.255), so actually trie will be stopped at bit 24 (counted from the most significant one) and the rest will present a wildcard (0-255 in the least significant byte in host order) or it will contain additional nodes if there are additional rules which has bits in that range, for example 192.168.0.0-192.168.0.20.
Each direction is a base for own tree.
Each node contains a pointer to the subtree of the next-dimension tree, where its rule starts. So when each node is examined (i.e. each bit is checked) in the first dimension and bit is found to to belong to some rule, next dimension is checked from the bit position specified in the node and this is being done recursively for each dimension. If in some dimension requested bit is not found in the trie (where it should be by the previous dimension rule), it means that there are no rules for requested values.

In the previous memory-hungry design, if rules do not have ranges but only single tuples (i.e. single source/destination IP address and single ports and protocol), or if all rules present the same range (for example all port rules have the same 1-65535 range), it can not lead to the described problem with memory usage. The more range overlaps rulset has, the more memory such algorithm will use - in theory, when each new rule requires interval split, i.e. when each new rule overlaps with all previous and has additional range (for example when each new rule is slightly bigger than previous one), it is O(d*W*(N^d)) where N is number of rules and d is number of directions, W is width in bits of each dimension (actually it can be hidden by O(), since it is a constant for different set of rules).

/devel/networking :: Link / Comments (0)


I wish my day would have 36 hours.


So I could quickly complete flat development, moved my laptop to HP repair (its disk fails after couple of hours of work) and started to work more than currently.

Likely if you do not see entry in the blog, that means that I'm too busy at paid work (or even unavailable at all) to work on my own projects.

/devel/other :: Link / Comments (0)


True asynchronous IO.


All existing AIO - both mainline and kevent based lack major feature - they are not fully asyncronous, i.e. they require synchronous set of steps, some of which can be asynchronous. For example aio_sendfile() requires open of the file descriptor and only then aio_sendfile() call. The same applies to mainline AIO and read/write calls.

My idea is to create real asyncronous IO - i.e. some entity which will describe set of tasks which should be performed asynchronously (from user point of view, although read and write obviously must be done after open and before close), for example syscall which gets as parameter destination socket and local filename (with optional offset and length fields), which will asynchronously from user point of view open a file and transfer requested part to the destination socket and then return opened file descriptor (or it can be closed if requested). Similar mechanism can be done for read/write calls.

This approach as long as asynchronous IO at all requires access to user memory from kernels thread or even interrupt handler (that is where kevent based AIO completes its requests) - it can be done in the way similar to how existing kevent ring buffer implementation and also can use dedicated kernel thread or workqueue to copy data into process memory.

It is very interesting task and should greatly speed up workloads of busy web/ftp and other servers, which can work with a huge number of files and huge number of clients.
I've put it into TODO list.

Someone, please stop the time for several days, so I could create some really good things for the universe.

/devel/other :: Link / Comments (0)


Fri, 17 Nov 2006

Climbing evening.


It was quite easy training, since there were crowds of people, which usually means, that it is quite problematic to climb horizontal traverses. But nevertheless I completed several old traces and started parts of the relief traverse (longer and longer each training, which is a good sign). At the end I also completed couple of start of complex traces on negative slope and found an interesting bouldering (read: lurked in climbing school training which happend near the place where I had a rest), but failed to complete it since was tired already.
Sauna, shower at climbing zone and a little beer with Gabriel Garcia Marquez book at home completed the day - yet another good day has gone...

/life :: Link / Comments (0)


Netchannels wildcard design and implementation.


It was first described here, but I did not present memory usage for that scenario.
After some calculations I found it is about O(d*(N^d)/4 + N) where N is number of netchannels in the whole tree and d is number of dimensions we search in. With 5 dimensions (source/desctination address/port and protocol number) it becomes completely unfeasible.

And after my implementation completed (although it requires some cleaning now) I've found that memory limit on practice.

Presented design allows extremely fast searching, but it is too memory hungry, so I will modify it to not replicate sub-trees of intervals in each interval, but instead to have different kind of trees which will reduce memory consumption down to O(d*N) instead of above O(d*(N^d)) and increased search time.
So, stay tuned...

/devel/networking :: Link / Comments (0)


Wed, 15 Nov 2006

Theatre evening.


I've seen Eugene Grishkovec's 'Po Po' performance created under expression of Edgar Po works.
Performance is a compilation of many novels written by famous author, all of them are easily detectible.
Grishkovec plays with Alexander Cekalo, they call it 'fun story about the death' and it is really so. I would even say that Grishkovec plays second role in that single-actor monologue performance.
There are some opinions that Grishkovec has star fever, but I would not say so - neither during his presentation before the performance, nor during the whole act he showed any signs of it, I managed to get tickets in the first raw, so I saw actors right in a couple of meters. Even after performance, when we met Grishkovec on the street with his girlfriend and asked couple of questions and presented a flower he behaved very good.
I like his performances, they are really different from the mos others, but unfortunately he plays quite rarely in Moscow. I hope I will see his 'Drednouti' ('Dreadnoughts') performance this season.

Later we got couple of beers in 'Vobla' beer restaurant and moved home. I think the whole day indeed succeeded, although it looks I caught a cold in addon to my current sick...

/life :: Link / Comments (0)


Packets in alternative worlds of netchannels.


Do you have an excellent spatial imagination?
Can you consider yourself and easily operate say in 3-dimensional world?
Likely you can, since we live in such a world. Probably we can think about 4-dimensional world too.

But can you describe to someone 5-dimensional world?
Or can you put yourself into one of the N worlds, where N is potentially millions, each of which has number dimensions smaller than previous world by one and upto N subworlds in each dimension? And while you - poor walker - are searching for the shortest way to the destination point in that brothel, someone looks into your universe through the glas and loudly lough counting your times clock.

I've just described couple of moments in a single network packet live, when it enters netchannels subsystem with this wildcards design.
It looks like it will have complex live, but we will see, since I've implemented generic multi-dimensional tree hierarchy and will complete searching algorithm and start testing tomorrow.

And now I want to finish my fantastic narration, since I'm going into a theatre.

/devel/networking :: Link / Comments (0)


Reading S. Minaev "Duhless" (aka "Spirit less").


It is extremely 'trendy' book, here one can find a chapter called 'Office'.

Crap. Complete crap - some kind of cool artistic slightly squeamish flood of mind(less).
One can probably think that it is trendy to read it, that it is cool, it is another view on our usual life and so on...
Maybe it is right, but I think it is a crap created to show 'how different author is from all those stupids'.

There are quite a few fun moments though - probably life of the top management really looks like it is described.
"What about putting the sea or beach or wild nature against a background of a woman".

But why so declamative? Why so depreciatingly? Crap.

/life :: Link / Comments (0)


Tue, 14 Nov 2006

Delayed allocation and fragmentation.


Quotation from Andreas Dilger:

Essentially, delaying the disk allocation until a file is large/complete
(delalloc) and then using a buddy allocator in memory to get contiguous
chunks of disk is better than hard 64K+ cluster because it avoids
internal fragmentation and allows much more optimal/efficient placement
than just a factor of 16 reduction in the block count.
This completely confirms my thoughts about delayed allocation.

But I want to add even more - log-structured file system, i.e. file systems, which allows to write updated block essentialy everywhere on the disk instead of the place where previous one was placed, have self-defragmentation mechanism by design - there is no need to reallocate inodes and move files all over the disk - new version can be put in new location, which can be carefully selected and can be written together with unupdated part thus removing fragmentation. Journaling filesystems require defragmentator for that task.

/devel/fs :: Link / Comments (0)


Why trees are better than binary search.


For netchannels wildcard support I need to create set of non-crossing intervals which will be examined for each packet. It is possible to put them one-by-one and thus form an array of intervals which can be traversed using binary search, but there is a major problem when we will add new interval, which should be placed somewhere inside the existing array - that will force array reallocation and copy. This problem also applies to hash tables and actually all other t