Sat, 30 Dec 2006
Day of uncelebrated birthdays with forest party.
Today we celebrated the birthdays of Yuliana Silich (formerly Sviridovskaya)
and Perec (Pavel Sherstnev): they decided to throw a party in the Dolgoprudny forest
with a fire, vodka and shashlik. And they did not lose: it was a great time,
although it was quite cold, especially later, but eventually we warmed up and
it became quite warm. I met old friends whom I had not seen
for several years already; some of them had been abroad, some of them
just had not shown up.
Later the party continued at Yuliana's house: as far as I recall, Mephody and I drank until 3 A.M.,
talking about life, work, future plans and the whole situation.
Eventually we concluded that although I work for a small salary, I work with extremely
cool technologies and this is essentially work for myself, while
he works with interesting ideas at Dell, but he is just a small,
although quite significant, screw in someone else's machine.
It was a great time with friends, which flows smoothly into the New Year celebration.
Happy New Year!
/life :: Link / Comments (0)
Fri, 29 Dec 2006
Climbing evening.
It was quite a new kind of training: I climbed on vertical walls and on an overhang
for the second time in the last three months. My belay partner was instructor Anya,
who helped a lot and even tried to push me up on the final route, but after four
quite hard old routes, combined into series of two per run,
I was too tired to complete a new overhanging route on-sight
(the start is an almost horizontal wall, but the next part was quite easy, on a vertical wall),
so I miserably failed several times at the start until I helped myself with other holds.
It was quite a new feeling to climb on vertical walls, but I found that I perform
quite well and make some interesting new moves trained 'on the ground' with
bouldering.
My next training will be only in a week or so: the climbing zone is closed for the holidays.
/life :: Link / Comments (0)
New netchannels 'take20' release.
Short changelog:
- process NAT directly in the enqueue callback instead of deferring
work to a kernel thread
- fixed update of obsolete routes
- fixed netchannel queueing locking
/devel/networking :: Link / Comments (0)
New kevent 'take30' release.
Short changelog:
- new private userspace notifications: allow queueing any private userspace
event and then marking it as ready with the kevent_ctl(KEVENT_READY) command
- KEVENT_REQ_READY flag: if set, the kevent will be marked as ready at enqueue time
(this allows, for example, sending a signal to a process through the kevent subsystem)
- port to the 2.6.20-rc2 tree (54abb5fcdae74a811ed440ec6556cabc6b24f404 commit)
- use struct kmem_cache instead of kmem_cache_t
- added the notification type into the search key, which allows the same id to be used for
different types of notifications
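Adding the type to the search key can be pictured with a small C sketch. It assumes a sorted array searched with bsearch(), and the names (struct ev_key, ev_key_cmp(), ev_find()) are hypothetical, not the actual kevent structures: the comparison simply looks at the type before the id, so one id may exist once per notification type.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical search key: the notification type is part of the key, so the
 * same user-supplied id may be used for different notification types. */
struct ev_key {
	uint32_t type;	/* notification type (socket, poll, timer, ...) */
	uint64_t id;	/* user-supplied identifier */
};

/* Orders keys by type first, then by id; usable with qsort() and bsearch(). */
static int ev_key_cmp(const void *a, const void *b)
{
	const struct ev_key *x = a, *y = b;

	if (x->type != y->type)
		return x->type < y->type ? -1 : 1;
	if (x->id != y->id)
		return x->id < y->id ? -1 : 1;
	return 0;
}

/* Finds @key in the sorted array @tab of @n entries; NULL if not present. */
static const struct ev_key *ev_find(const struct ev_key *tab, size_t n,
				    const struct ev_key *key)
{
	assert(key != NULL);
	return bsearch(key, tab, n, sizeof(*tab), ev_key_cmp);
}
```

With the type in the key, id 7 registered for two different notification types yields two distinct entries instead of a collision.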
Enjoy!
/devel/kevent :: Link / Comments (0)
Thu, 28 Dec 2006
Ireland mug Mephody meeting party.
I organized a meeting with Mephody and Ira today: we
spent a great time at the "5 oborotov"
restaurant with a lot of old friends. There were
Mephody (btw, his name is Alexander Boykov) and his wife Ira,
Fedor and Ira Schtrom, Perec, Lyasha with his wife Olga, white John
(Burnyakov Evgeniy) and me.
As Mephody says, he visits Russia to get rid of his European depression:
everything there is so damn good and right that one wants to blow it up,
while Russia is completely different (dirty streets, bad weather, everybody with
problems), an extremely cool place to live compared to swamp-like, stable Europe.
/life :: Link / Comments (0)
Wed, 27 Dec 2006
Climbing evening.
It was just bloody perfect training: a lot of new
small and bigger traverses and boulder problems, many
old ones, talks with new people, their routes...
I had not had such a training session for quite a while already;
there were no negative moments at all. I think life becomes
better and longer with each such evening.
Later I spent several hours preparing for Mephody's meeting:
last ticket checks, New Year booking and so on.
Now everything is good. I mean EVERYTHING is just bloody good.
/life :: Link / Comments (0)
My vacation timetable.

See how the monkey screams "Stop working!". I hear it right now.
I start celebrating this evening after climbing training
with the Schtrom family (they got the keys to their new flat), then tomorrow with
the Ireland mug Mephody, Irina and friends. Friday will be a short working day
with climbing training.
Saturday will be devoted to uncelebrated birthdays: I will be in the Dolgoprudny
forest with friends, fire and vodka. Sunday will be spent (or at least started) at the
"5 oborotov" New Year party with friends.
I will be back to normal life somewhere around Jan 4-5.
Happy New Year!
/life :: Link / Comments (0)
News about kevent.
Jonathan Corbet (LWN editor) has published
an article about kevent in
LinuxWorld, which reached me via
Linux Today news.
As a side note, the kevent homepage
got first place in Google's ranking for the word 'kevent'.
It is becoming popular, and the major thing kevent lacks is
true AIO,
work on which will start very soon.
/devel/kevent :: Link / Comments (0)
Mon, 25 Dec 2006
Climbing evening.
It was good training: all the traverses and boulder problems were quite old,
but I started to combine them, so the routes were hard. I almost
completed the relief traverse over the whole climbing zone: there is
only one place where I do not know how to move (well, I have some ideas
for that place, but testing them will certainly end in a fall,
which will not be that pleasant since there is only a small floor mat there).
Actually I failed to complete the whole traverse, but I did not stop
in the middle when people started to tell me that something was unusual
and that they felt a bit uncomfortable. Probably at that moment
I looked quite monstrous (a face twisted with pain, heavy music from the headphones,
and likely some non-human roars as I tried to push
myself through complex or hard moves), since all the people coming from the opposite
direction quickly moved away...
Anyway, it was good training.
Climbing is quite addictive, and I have become a victim.
/life :: Link / Comments (0)
New netchannels release is coming...
It is inevitable, like New Year.
I've fixed the bug introduced in the take19
release and plan to release the new version into the wild.
I've also set up a new machine with a gigabit NIC, so I will test NAT performance
and then release the new version.
This is scheduled for tomorrow, and now I am, dirty as a pig after yesterday's
flat development, going to the climbing zone for my regular (almost daily)
dose of good mood and a shower.
/devel/networking :: Link / Comments (0)
Sun, 24 Dec 2006
Meanwhile on flat development side.
I've completed my hinged (suspended) ceiling. It is not 100% ready yet,
but about 90-95%: the small board is built,
almost all ceiling seams are filled with plaster, and all
parts are joined together. The only things left to complete
are: spot light setup, finishing the plaster filling, and final
finishing (including the rest of the ceiling).
I had never worked with plasterboard before, but the ceiling looks
really cool: it has two rounded layers with boards, the whole covered
area is about 8-10 square meters, and the distances between the layers and
the main ceiling are about 11 and 23 centimeters respectively.
The hinged ceiling will contain spot lights over the area
and a neon cord in the board.
The next thing is the final plaster finishing of the whole ceiling:
it will include three layers (plus one plaster primer layer already applied):
primer, finish plaster which will serve as the base for paint, and finally a colour fixative wax.
This is quite simple work, but taking into account the fact that I only
work on my flat on Sundays, it can take some time...
And I'm still in doubt about the bathroom: I do not have the bricks to
complete my idea of a small podium for a handmade shower cabin, so probably
I will cover the bathroom with ceramic tiles and then set up a usual shower cabin.
We'll see...
/devel/flat :: Link / Comments (0)
Sat, 23 Dec 2006
New kevent 'take29' release.
Short changelog:
- optimized af_unix to use socket notifications
- changed ALWAYS_QUEUE behaviour with poll/select notifications: previously
the kevent was not queued into the poll wait queue when the ALWAYS_QUEUE flag was set
- added the KEVENT_POLL_POLLRDHUP definition to the ukevent.h header
- libevent-1.2 patch (Jamal, your request is completed, so I'm waiting two weeks
before starting the final countdown :)
All regression tests passed successfully except test_evbuffer(), which
crashes on my amd64 Linux 2.6 test machine for all types of notifications;
probably it was fixed in the libevent-1.2a version, I did not check.
Patch and README can be found here.
libevent is very BSD-specific, with a lot
of workarounds for epoll, /dev/poll and finally for
kevent,
especially for signals in the first two, but kevent does not require them. This is another
argument in favour of not having the unneeded sigmask parameter in kevent.
Kevent support in libevent can be further optimized, though:
the current code adds an event when the appropriate callback is invoked, while the
kqueue
backend postpones it to dispatch time. It can do so because kqueue accepts an
additional array of events which are treated as changes to existing ones (the changelist argument of kevent()). It is
possible that kevent additions could be postponed too.
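The kqueue-style postponing mentioned above could look roughly like the following C sketch. It is only an illustration under assumptions: struct changelist, cl_add() and cl_flush() are hypothetical names, not libevent or kevent code, and the kernel submission itself is left out; cl_flush() just hands the accumulated batch to the caller, who would pass it to the kernel in one call from the dispatch loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CL_MAX 64	/* illustrative capacity */

/* One pending registration: which descriptor and which events are wanted. */
struct change {
	int fd;
	short events;
};

/* Hypothetical changelist: additions are only recorded here and are handed
 * over in one batch when the dispatch loop runs. */
struct changelist {
	struct change ch[CL_MAX];
	size_t n;
};

/* Records a change; returns -1 when full (the caller should flush first). */
static int cl_add(struct changelist *cl, int fd, short events)
{
	if (cl->n == CL_MAX)
		return -1;
	cl->ch[cl->n].fd = fd;
	cl->ch[cl->n].events = events;
	cl->n++;
	return 0;
}

/* Called at dispatch time: copies all pending changes into @out (which would
 * then be submitted to the kernel in a single call), empties the list and
 * returns the number of changes. */
static size_t cl_flush(struct changelist *cl, struct change *out, size_t cap)
{
	size_t n = cl->n;

	assert(cap >= n);
	if (n)
		memcpy(out, cl->ch, n * sizeof(*out));
	cl->n = 0;
	return n;
}
```

The benefit is that several registrations made from callbacks cost one kernel entry per dispatch iteration instead of one each.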
All kernel kevent options must be turned on (namely CONFIG_KEVENT_POLL, CONFIG_KEVENT_SOCKET, CONFIG_KEVENT_PIPE).
I did not hack 'configure' to check for supported notification types.
Call me lazy boom slacker.
/devel/kevent :: Link / Comments (0)
Fri, 22 Dec 2006
libevent has been ported to kevent.
In the simplest way: it currently only supports signal notifications and uses
poll for all others, which should be optimized for sockets and pipes.
It is a good library indeed, although it is very
kqueue()-centric.
And it does not contain any documentation about internal structures (i.e. there is no
documentation for those who want to add a new event dispatching mechanism to the library),
although there is a good man page for users.
And it is made by OpenBSD developer Niels Provos;
this OS, if you do not know, claims to have the best documentation ever
(as a side note: no matter how good the documentation is, my work with mbufs
in its network stack completely killed _any_ possible wish to
work with its kernel in the future :) ).
Fortunately the kqueue() backend source code is available, so
porting to kevent was easy.
Porting uncovered an interesting 'feature' of one of the kevent flags, KEVENT_REQ_ALWAYS_QUEUE.
It was added to work around applications which can not deal with the fact that an
event can be immediately ready and thus will not be queued (like lighttpd).
With poll/select notifications and this flag set, the kevent was not added
into the wait queue of the given file and thus would never be processed later,
which made it impossible to use this flag with poll notifications.
Another issue is that libevent's set of regression tests
crashes with every event dispatching mechanism I tested (poll/select and epoll) on my
AMD64 test machine, but I have not looked deeply into this issue.
I will fix the poll/select notifications and the KEVENT_REQ_ALWAYS_QUEUE flag issue
and release the new kevent version and library port tomorrow. I will also include
the socket/pipe optimisation in the library port (not supported right now).
/devel/kevent :: Link / Comments (0)
Thu, 21 Dec 2006
New netchannels release and scalability tests.
I've released version #19 of the
netchannel subsystem,
which contains the API changes required to run the
userspace network stack.
This version contains a locking bug which leads to a crash on netchannel removal
when a lot of them have been created.
Scalability testing.
A script created 22k wildcard NAT rules (only one dimension, the source address,
is wildcarded, to save some memory):
16k $rand.$rand.$rand.$rand/255.255.248.0 rules and
6k $rand.$rand.$rand.$rand/255.255.255.$rand rules.
I.e. it is equivalent to 22k netfilter rules of the following form:
iptables -t nat -I POSTROUTING \
-s $rand.$rand.$rand.$rand/255.255.248.0 \
-d $static/$some_wildcard_mask \
--sport $static_num/$some_wildcard_mask \
--dport $static_num/$some_wildcard_mask \
--proto $static_num/$some_wildcard_mask \
-j SNAT --to-source $static_num
$static_num and $some_wildcard_mask differ from line to line, but are
the same for all commands.
Raw socket and netchannel send/recv performance over a gigabit link is the same as described
previously.
I can not test the NAT processing speed itself (I do not have enough test
machines :).
Userspace trie tests
(described here)
included 1-2 million multidimensional wildcard insertions/searches.
/devel/networking :: Link / Comments (0)
Performance of the userspace network stack over the new netchannels.
Sending:
- 128 bytes writing:
  - netchannels: 27-28 MB/sec, 20-30 % CPU usage
  - sockets: 7-8 MB/sec, 80-90 % CPU usage
- 4096 bytes writing:
  - netchannels: 27-28 MB/sec, 20-30 % CPU usage
  - sockets: 30-31 MB/sec, 30-40 % CPU usage
Receiving:
- 128 bytes reading:
  - netchannels: 70-71 MB/sec, 80-90 % CPU usage
  - sockets: 24-25 MB/sec, 80-90 % CPU usage
- 4096 bytes reading:
  - netchannels: 73-74 MB/sec, 80-90 % CPU usage
  - sockets: 79-80 MB/sec, 80-90 % CPU usage
A new version of the userspace network stack
is available in the archive.
/devel/networking :: Link / Comments (0)
Kevent will celebrate its first birthday in two weeks (Jan 6).
I will try to start work on
true AIO and get some results by that date.
Btw, since kevent has stalled, its inclusion into the official lighttpd
tree has been postponed/dropped.
Update: I've put a task to patch libevent
to support kevent into my schedule. The site is currently down, but I hope it will be back soon,
so I can complete this task this week.
/devel/kevent :: Link / Comments (0)
Kevent patchset has been resent again.
This is the second resending of the 'take28'
patchset; the last one will be sent on either Saturday or Monday.
The reason is a complete lack of feedback (oh, no, Ulrich Drepper said that the kevent build fails
on his machine, but did not provide the actual files and did not confirm that
it was fixed by the 'make prepare' command, so it can not be counted).
Jonathan Corbet has published an excellent article about kevent
in the Dec 13 LWN edition.
/devel/kevent :: Link / Comments (0)
Wed, 20 Dec 2006
Climbing evening.
It was the first time in the last three months that I climbed
on the walls rather than traverses or boulder problems. I belayed
a girl from DDS (which is closed, so there are enormous crowds of people
in Skala-city), who actually spent most of the time hanging on the rope,
so the whole training was not that active.
I completed three old routes and found that although
I can do it, it was not as easy as I expected,
but nevertheless it was good training, although easy and short (if counted
by the minutes I spent on the walls).
Next time an unknown girl asks me to belay her, I should ask
if she will let me climb a bit more than 3-4 routes :)
/life :: Link / Comments (0)
Network address translation (NAT) works over netchannels.
One can add/remove source/destination NAT using the
connector (Documentation/connector/ in your source tree).
NAT over netchannels uses two additional netchannels per dataflow
(input and output, from the NAT server's point of view)
and one main netchannel (what the administrator sets up using source/destination rules),
as I described previously.
It uses the same tries as other netchannels (userspace or others),
so only one lookup is performed to transfer a packet.
Packet processing happens in process context on behalf of special threads
dedicated to netchannel processing: when a packet enters the
netchannel stack and a netchannel has been selected, its processing
can be scheduled either to the current thread or to a
thread on another CPU.
Netchannel searching is lockless (protected by RCU); skb queueing is protected
by a spinlock.
So, brief list of netchannel features:
- multidimensional wildcards support
- RCU searching
- single multidimensional trie for different kinds of dataflows
- dedicated processing threads with the possibility to schedule
processing on different CPUs for those netchannel types which are not tied to a processing context
- userspace netchannel backend (allows packets to be received in userspace), which can be used for:
  - high-performance sniffers
  - tun/tap device replacement
  - packet socket replacement (note that netchannels steal packets from the main stack)
  - userspace network stack implementation
  - own protocol stack implementation (from VPN tunnels to TOE)
- netfilter netchannel backend (only NAT is supported, as the most interesting user; NAT caches the appropriate route,
so essentially routing becomes part of the netchannel trie)
Some testing was done in the 'emulator', i.e. the pre-netchannel userspace multidimensional trie implementation.
Results are here.
This version (netchannels.18) can be found in the archive.
It has also been sent to netdev@ for review and comments.
/devel/networking :: Link / Comments (0)
Tue, 19 Dec 2006
Netchannels and NAT.
Only because of a drinking evening and Korean stories
I have not completed NAT support for netchannels, but I will do it tomorrow or the day after
(if climbing training is not too hard).
/devel/networking :: Link / Comments (0)
A colleague has returned from Korea (Samsung).
First impression: they drink. No, they do drink. Even more, they DO DRINK.
That's good news if I ever visit that country.
He brought us a box (about 20 small 0.3 litre bottles) of a super-duper
famous product called 'Soju' or something like that: it is Korean vodka,
which has only 20.1 degrees compared to Russian 40, but is nevertheless quite strong.
We drank a little and listened to a lot of funny stories about Koreans
(I'm sorry if this is a bit unpleasant, but what would you expect
from big white monkeys?).
First of all, their language. From the Russian point of view it is funny.
For example, they lived in a big hotel somewhere on the seaside (15 minutes from Seoul),
and every time they entered a lift (at Samsung) some movie was shown about
how many degrees there must be in different bows and what the other rules
of decency are...
After several such movies they started to use interesting Korean phrases as toasts
in Korean restaurants (well, they did drink a lot there).
So, here are the most interesting phrases (that I managed to recall after a couple of bottles of their 'Soju'),
which sound special in Russian:
- 'ebazie': something like 'hello' when you answer a phone call.
- 'humida': something like 'to be' (this is our Russian rendering of the pronunciation; it is used in
phrases like 'that thing', i.e. 'eta humida').
- 'ebti': something like 'next door'; it does not require a translation for Russians,
and we would forget it in a minute (I specially wrote it down on paper), since it sounds too funny.
So, imagine several big white monkeys sitting in a Korean restaurant, drinking
to 'Hello', to 'Doors are closing', to 'Next door' and to 'to be', i.e.
one of them stands up and starts in a grave voice:
Let's drink to the next door.
A colleague told an interesting story about seafood and how they eat it.
In Russia we sometimes leave the city and go 'po gribi - po yagodi', i.e.
for mushrooms and berries, and we eat the berries right in the forest.
Koreans do the same with sea creatures: consider a family which
has a good car (or several) and drives from Seoul to the seaside to get some fresh
monsters; after low tide they collect live shellfish and eat them right on the coast.
If the shellfish try to run away and dig into the sand, people have special small spades,
just like Russian mushroom pickers have special knives.
Another story was about a _live_ octopus being prepared right in front of them,
so even after its skin was removed and it was sliced, its parts still moved
(even in your mouth). Or some other live creature, which cried when you ate it.
They start drinking on Thursday evening (at least at Samsung), they drink a lot, but automagically on Friday morning
they are fresh (although every evening they are so tired that they can fall asleep on their neighbour's shoulder
in the bus); then exactly the same happens on Friday and Saturday (including major drinking
and parties), but on Sunday they pull themselves together and go shopping (though they can fall asleep in the shop).
It looks like they live completely different lives from ours.
I think I would like to visit Korea.
/life :: Link / Comments (0)
Elton John sings oven instructions.
Excellent. Although I do not like his songs, this is cool.
The audience is very interesting.
In Russian I would say 'Zhest' kak ona est'' (roughly: 'hardcore, pure and simple').
/other :: Link / Comments (0)
Mon, 18 Dec 2006
Climbing evening.
Today we got into a film shoot. Actually not a film, but the "Usage instruction"
TV show: two girls as presenters, a camera operator, some other people...
I hope I was not filmed, but eventually they stopped in the path of my traverse, so
a couple of times I asked them to move (very civilly indeed).
Apart from this it was quite a good usual training: old traverses and boulder problems,
one or two new ones. I started to try the start of an 8b/8c route: obviously I can not climb
such a hard route, but I try to make the first several holds, quite successfully though.
/life :: Link / Comments (0)
Refrigerator or flatiron?
Who will win if a turned-on flatiron is shut in a refrigerator?
One man decided to check and shut a flatiron in the refrigerator.
After about 30 minutes, when the flatiron had almost completely defrosted the
refrigerator, very strong steam inside cracked something
in the flatiron and it died; the refrigerator got the situation
under its own control and iced up the door.
Draw?
"Who am I, where am I, what is my address, where is the exit?"
One day in the life of a DHCP packet.
- What is galvanic resistance?
- It is the rise of the batteries.
Taken from bash.org.ru (in Russian).
/other :: Link / Comments (0)
Sun, 17 Dec 2006
Initial work of netchannels NAT.
Source NAT works in one direction: it is possible to send
packets from the internal network to the outside using the address of the NAT machine
as the source, but replies are not received yet (actually they are,
but no address translation happens).
/devel/networking :: Link / Comments (0)
The first resending of the 'take28' kevent release.
Since I got no feedback about the three previous releases and one resending,
I started decrementing a resend counter: when it hits zero, I will
stop resending and move kevent
into maintenance mode without further pushing it upstream, and will
concentrate on more interesting work than sending mails into the
non-feedbackable black hole called linux-kernel@.
The situation with kevent has gone from funny to idiotic: a lot of people
want that functionality, kevent works faster
than epoll, there are patches for real-life users, tons of functionality
has been implemented which allows kevent to scale to hell,
and things have stopped.
So, if the people for whom it was developed do not care, I care even less: I created it, I can kill it.
Thanks a lot to those who supported kevent and tried to push it
upstream, but it looks like we wasted our time for nothing :)
/devel/kevent :: Link / Comments (0)
Sat, 16 Dec 2006
2.6.19 kernel acrypto release.
Acrypto
allows crypto requests to be handled asynchronously in hardware.
Acrypto
supports the following features:
- multiple asynchronous crypto device queues
- crypto session routing (allows a single crypto session to be completed only when
several operations (crypto, HMAC, anything) are completed)
- crypto session binding (binds crypto processing to a specified device)
- modular load balancing (one can create a load balancer which takes
into account, for example, the pid of the calling process)
- crypto session batching, genetically implemented by design (acrypto
provides the whole data structure to the crypto device, i.e. it is
possible to use acrypto as a bridge which routes requests between
completely different devices, since it does not differentiate between
users, it just handles requests)
- crypto session priority
- different kinds of crypto operations (RNG, asymmetric crypto, HMAC and
any other)
Combined patchsets include:
- acrypto core
- IPsec ESP4 port to acrypto
- dm-crypt port to acrypto
- OCF to acrypto bridge, which allows OCF device drivers (for example ixp4xx) to run with acrypto;
requires OCF to be installed.
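Crypto session routing (a session is complete only when all of its operations are) can be pictured as a pending-operation counter. This C sketch uses hypothetical names (struct crypto_session, session_add_op(), session_op_done()) and is not the acrypto API; in real asynchronous code the counter would need to be atomic or lock-protected, since completions can arrive from different devices concurrently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical session: completes only after all of its operations finish.
 * Single-threaded model; real code would need atomics or a lock. */
struct crypto_session {
	unsigned int pending;	/* operations still in flight */
	bool done;		/* set once the last operation completes */
};

/* Queue one more operation (cipher, HMAC, ...) on the session. */
static void session_add_op(struct crypto_session *s)
{
	assert(!s->done);
	s->pending++;
}

/* Called when one operation finishes; returns true when this was the last
 * outstanding operation, i.e. the whole session is now complete. */
static bool session_op_done(struct crypto_session *s)
{
	assert(s->pending > 0);
	if (--s->pending == 0)
		s->done = true;
	return s->done;
}
```

A session with an encryption and an HMAC step, possibly handled by two different devices, is reported complete only after the second completion arrives.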
/devel/acrypto :: Link / Comments (0)
Fri, 15 Dec 2006
Climbing evening.
I've bought myself an mp3 player since Skala-city's
sound accompaniment is beneath any criticism (in my opinion), so I started to climb
with my own music in my head.
That was great training, likely because of an excellent mood and good music.
Most of the boulder problems I did were old ones, but I started to climb
several new ones: quite complex, but very interesting. Eventually I even completed
part of a route the instructors failed to do, by using parts of the relief
they had not found (it looks like some other master created that route, not the local trainers),
but I'm quite sure they will complete it soon too.
I got into an extremely good mood: people, likely with great surprise, looked
at a man who climbs alone and smiles to himself even when he fails.
That was a great time.
/life :: Link / Comments (0)
Shit happens.
My boss has read my blog...
And is probably reading it right now; if I am fired soon,
that will not be a surprise.
At least I will stop doing uninteresting things :)
/other :: Link / Comments (0)
Thu, 14 Dec 2006
New acrypto release for 2.6.19 kernel tree.
Changes from 2.6.18 are small enough, but my testing machine is not functional yet,
so I will start testing tomorrow and then announce the release on the mailing lists.
/devel/acrypto :: Link / Comments (0)
New kevent 'take28' release.
Short changelog:
- made kevent default to yes in the non-embedded case
- added flags to callback structures: currently used to check whether a kevent
can be requested from kernelspace only (POSIX timers) or from
userspace (all others)
With this release I start a 3-day resending timeout: every third day I
will send either a new version (if something new was requested and agreed to
be implemented) or a resend, with a countdown starting from three.
When the countdown hits zero after three resendings, I will consider that there is no
interest in the subsystem and stop further sending.
Thanks for understanding and your time.
/devel/kevent :: Link / Comments (0)
Wed, 13 Dec 2006
Climbing evening.
Not bad and quite long training: today it was devoted to bouldering routes.
I ran quite a few; most were old enough, but I also found a couple of new ones.
I found out why instructors climb in running shoes rather than in special climbing shoes most of the time:
they say that when a new route is created and they can complete it in running shoes, then
usual people can likely complete it in climbing shoes; otherwise one needs to be able to climb
7th category and higher. Well, maybe that is true; most of the time they create simple
routes, for which it does not matter in what shoes one runs over them.
I talked with instructor Anya about climbing areas in Turkey: she wants to open a climbing
tourist agency there, although I'm not sure it will have major profits. There are
a lot of good climbable rocks, and the weather and area are good for that too, but I doubt there
are a lot of climbers all over the world who will visit it again and again: people
tend to like different places.
/life :: Link / Comments (0)
Avoiding - and fixing - memory fragmentation.
An interesting article in LWN about
possible solutions for avoiding and fixing memory fragmentation.
This issue was the main goal of the
network tree allocator,
which, among other things, is able to perform zero-copy sending and receiving, which I plan
to integrate into netchannels.
The main difference is that the network tree allocator combines chunks inside
pages, including compound pages, while the solution proposed by Mel Gorman is to combine pages.
/devel/networking/nta :: Link / Comments (0)
Tue, 12 Dec 2006
Network address translation (NAT) over netchannels.
Netchannels themselves work: one can configure either userspace
netchannels or kernel ones, which are processed by kernel threads.
NAT is a special and thus interesting case: packets must be mangled
and resent up to two times. When a packet is received from one
network (before changes), it should be changed and sent to
a likely different network; then the reply is received, changed and sent back to the original host.
There are at least two types of rules in such a scenario.
One type contains NAT rules which are created by the system administrator; the other
is the per-connection rule, which includes the real manipulation data.
At minimum it contains two routes: to the original host and
to the remote host. The main mangling rule set up by the administrator,
for example 'make 192.168.0.0/255.255.0.0 look like its source is 1.0.0.1',
can cover several different networks, which are connected
through different network adapters, so input routes are essentially per-connection;
the same applies to destination routes.
So, what is the logic behind NAT in the case of netchannels or netfilter?
I will describe how the netchannel implementation will work.
When a packet enters the network stack for the first time and hits the NAT netchannel,
it will be queued for processing in a special thread, where a new netchannel
with exact source/destination parameters will be created, including
routes for the input and output directions (actually they are both output directions for the NAT server
(or only one direction, if the connection is to/from the NAT server itself),
but one of them is input and the other is output from the initial packet's point of view,
so I will call them that way); then the packet will be changed and sent to the appropriate
destination. When the next packet or a reply enters the stack, it will be caught
by that new netchannel, since its priority is higher, and eventually the packet will
be changed and resent in the special thread. Each new netchannel created by the NAT system
will either have a kernel timer attached, or a special state machine will be created
to track too-old netchannels which can potentially be freed (i.e. some kind of
garbage collection).
Since each new netchannel created by the NAT system always has exact parameters
without wildcards, it would be possible to put them not into the trie but into a
special binary tree with faster search time. But such a tree must be put into the
main NAT netchannel to not mess with other rules set up in the trie,
so there would actually be even more lookups than when new netchannels are put
directly into the same trie: one lookup for the main NAT netchannel and then an additional
tree search for the subnetchannel (let's give such a name to netchannels created by the NAT system).
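A deliberately simplified C sketch of the per-connection flow described above: exact entries are checked before the wildcard main rule, a miss that matches the main rule creates a new exact entry, and an idle timeout stands in for the timer/garbage-collection step. Assumptions: only the outgoing direction and source rewriting are modelled, routes are left out, and for brevity the exact entries sit in a flat array inside the main rule (closer to the separate-structure alternative just discussed than to the single shared trie). All names (struct nat_main, nat_lookup(), nat_gc()) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SUB 16	/* illustrative number of per-connection entries */

struct tuple {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Per-connection "subnetchannel": exact parameters, no wildcards. */
struct subchan {
	struct tuple key;		/* original (pre-NAT) parameters */
	uint32_t new_saddr;		/* source address after translation */
	unsigned long last_used;	/* consulted by the garbage collector */
	int used;
};

/* Main NAT rule set up by the administrator: net/mask -> new source. */
struct nat_main {
	uint32_t net, mask, to;
	struct subchan sub[MAX_SUB];
};

static int tuple_eq(const struct tuple *a, const struct tuple *b)
{
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->sport == b->sport && a->dport == b->dport;
}

/* Processes one outgoing packet at time @now. Exact entries are tried first
 * (they take priority over the wildcard rule); a miss that matches the main
 * rule creates a new exact entry. Rewrites pkt->saddr on success and returns
 * the entry, or returns NULL (packet not covered, or table full). */
static struct subchan *nat_lookup(struct nat_main *m, struct tuple *pkt,
				  unsigned long now)
{
	struct subchan *free_slot = NULL;

	assert(m != NULL && pkt != NULL);
	for (size_t i = 0; i < MAX_SUB; i++) {
		struct subchan *s = &m->sub[i];

		if (s->used && tuple_eq(&s->key, pkt)) {
			s->last_used = now;
			pkt->saddr = s->new_saddr;
			return s;
		}
		if (!s->used && !free_slot)
			free_slot = s;
	}

	if ((pkt->saddr & m->mask) != (m->net & m->mask) || !free_slot)
		return NULL;

	free_slot->key = *pkt;		/* remember the untranslated tuple */
	free_slot->new_saddr = m->to;
	free_slot->last_used = now;
	free_slot->used = 1;
	pkt->saddr = m->to;
	return free_slot;
}

/* Garbage collection: frees entries idle for longer than @timeout and
 * returns how many were freed. */
static size_t nat_gc(struct nat_main *m, unsigned long now,
		     unsigned long timeout)
{
	size_t freed = 0;

	for (size_t i = 0; i < MAX_SUB; i++) {
		if (m->sub[i].used && now - m->sub[i].last_used > timeout) {
			m->sub[i].used = 0;
			freed++;
		}
	}
	return freed;
}
```

With the 192.168.0.0/255.255.0.0 to 1.0.0.1 example above, the first packet of a connection creates an exact entry, the following packets hit that entry directly, and an idle connection is eventually reclaimed by nat_gc().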
In theory it is possible to not create a new subnetchannel for each new connection:
the system can take advantage of the fact that several dataflows (potentially mapped
to individual netchannels) can use the same routes, so they could share the same
subnetchannel. But in practice there is no way to select a set of addresses which
share the same route; it is only possible to get the route by address.
This actually raises a question, since I want to use exactly one netchannel for server-like
scenarios, when there is only one listening netchannel, which transfers data
to/from userspace, which handles the protocol itself. I will think about a possible
good implementation of such a feature, but until it is done, a new netchannel must be created
for each new connection.
Netfilter NAT works essentially the same way.
It is based on the famous connection tracking subsystem: each packet, when it enters
the stack, gets a connection tracking entry attached to it; when a packet within the same
connection (i.e. with the same source/destination parameters) enters the stack, it will
be changed according to that connection tracking entry. Connection tracking entries
live in their own cache, independent of NAT, but are looked up from the NAT rule, so we have exactly
two lookups, which I decided to avoid in the netchannel scheme described above. Netfilter/connection tracking
lookups are performed in fixed-size hash tables containing linked lists,
which scale neither with the number of packets nor with the number of rules.
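Why a fixed-size hash table of linked lists does not scale can be shown with a toy C model: once the number of entries exceeds the number of buckets, lookups walk ever longer chains. The bucket count and hash function here are arbitrary assumptions and much simpler than the real connection tracking code; struct ct_table, ct_insert() and ct_find() are hypothetical. With 64 entries in 8 buckets, finding every entry once costs 288 comparisons in total (4.5 on average), versus 64 if every entry had its own bucket.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 8	/* fixed when the table is created */

struct ct_entry {
	uint32_t saddr, daddr;
	struct ct_entry *next;	/* collision chain */
};

struct ct_table {
	struct ct_entry *bucket[NBUCKETS];
};

/* Deliberately trivial hash; only the bucket count matters for this point. */
static unsigned int ct_hash(uint32_t saddr, uint32_t daddr)
{
	return (saddr ^ daddr) % NBUCKETS;
}

/* Inserts at the head of the bucket's chain; returns -1 on allocation failure. */
static int ct_insert(struct ct_table *t, uint32_t saddr, uint32_t daddr)
{
	unsigned int h = ct_hash(saddr, daddr);
	struct ct_entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->saddr = saddr;
	e->daddr = daddr;
	e->next = t->bucket[h];
	t->bucket[h] = e;
	return 0;
}

/* Walks one chain; *steps receives the number of entries compared. */
static struct ct_entry *ct_find(const struct ct_table *t, uint32_t saddr,
				uint32_t daddr, unsigned int *steps)
{
	struct ct_entry *e;

	assert(steps != NULL);
	*steps = 0;
	for (e = t->bucket[ct_hash(saddr, daddr)]; e; e = e->next) {
		(*steps)++;
		if (e->saddr == saddr && e->daddr == daddr)
			return e;
	}
	return NULL;
}
```

Doubling the number of entries while keeping NBUCKETS fixed roughly doubles the average chain walk, which is the scaling problem described above.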
/devel/networking :: Link / Comments (0)
New kevent 'take27' release.
Short changelog:
- made kevent visible in config only in the embedded case
- added a comment about the KEVENT_MAX number
- spelling fix
This major patchset has been sent to linux-kernel@ and netdev@ for review.
Let's see where it ends up.
/devel/kevent :: Link / Comments (0)
Mon, 11 Dec 2006
Climbing evening.
It was a quite short training, only about two hours, but quite hard -
I started a new relief traverse part - from the beginning to the middle,
where I previously failed. Now there are only a couple of places on the
whole trace which I can not complete, but I can not run the whole trace
even without those parts - I do not have enough power endurance, although
the trace's rating is not that high (about 6c).
Also several old boulderings and traverses were completed and that was all.
Now I feel pain in virtually every piece of my body, and strangely enough
I like that masochistic feeling. Well, all except the pain in my knees and head,
which I damaged a bit during unsuccessful falls from the wall;
all the muscle pain is really pleasant - it is not something really strange,
but if I had a psychoanalyst it would probably be a thing to worry about.
/life :: Link / Comments (0)
The last 'take26' release has been resent.
Let's see how the 'commit early' Linux motto sucks (well, it only works for major known developers) - I have not received any feedback on
that major release, which addressed most of the issues raised by other people (all except the sigmask syscall parameter),
as well as a lot of additional features.
And kevent (although the obsolete 'take25' release) was removed even from the -mm tree.
Maybe I should warm up interest in kevent a bit with this
kevent benchmark?..
/devel/kevent :: Link / Comments (0)
Netchannels got wildcards support and more modular design.
I split netchannels
into a core processing unit, which selects a netchannel from the trie when a packet arrives,
queues the packet, exports add/remove and other operations, and performs other similar core tasks.
The trie layer itself is just an implementation of the multidimensional data structure
used for fast searching of the final destination point (i.e. netchannel).
It can easily be replaced with some other implementation as long as the new
one provides the same three functions for add/remove/search operations
and obeys the same locking rules, which should not be hard, since the netchannel layer
does not perform any special locking in those operations, except for the
requirement of RCU protection for search and RCU-aware netchannel freeing.
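The pluggable search layer described above could be expressed as an operations table. The sketch below is hypothetical (struct nc_search_ops, the list backend and the key layout are illustrative, not the actual netchannels code) and leaves out the RCU requirements; it only shows that the core needs nothing beyond add/remove/search callbacks, so even a trivial linear list can stand in for the trie.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup key: one 32-bit value/mask pair per dimension. */
struct nc_key {
	uint32_t val[3];
	uint32_t mask[3];
};

struct netchannel {
	struct nc_key key;
	struct netchannel *next;	/* used only by the list backend below */
};

/* The three operations the core expects from any search backend. */
struct nc_search_ops {
	int (*add)(struct netchannel *nc);
	int (*remove)(struct netchannel *nc);
	struct netchannel *(*search)(const uint32_t addr[3]);
};

/* Trivial backend: a singly linked list with linear wildcard matching. */
static struct netchannel *nc_list;

static int list_add(struct netchannel *nc)
{
	nc->next = nc_list;
	nc_list = nc;
	return 0;
}

static int list_remove(struct netchannel *nc)
{
	struct netchannel **p;

	for (p = &nc_list; *p; p = &(*p)->next) {
		if (*p == nc) {
			*p = nc->next;
			return 0;
		}
	}
	return -1;	/* not found */
}

/* Returns the first matching netchannel (not necessarily the most specific one). */
static struct netchannel *list_search(const uint32_t addr[3])
{
	struct netchannel *nc;
	int d;

	for (nc = nc_list; nc; nc = nc->next) {
		for (d = 0; d < 3; d++)
			if ((addr[d] & nc->key.mask[d]) != nc->key.val[d])
				break;
		if (d == 3)
			return nc;	/* all dimensions matched */
	}
	return NULL;
}

static const struct nc_search_ops nc_list_ops = {
	.add = list_add,
	.remove = list_remove,
	.search = list_search,
};
```

Swapping in a trie then means providing another nc_search_ops instance; the core keeps calling the same three pointers.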
Other layers are just netchannel users - userspace, with the ability to work with netchannels
through the syscall interface, and the NAT implementation (still work in progress).
An interesting decision was made for netchannel subsystems which can not have a
backing process - like NAT, where packets must be processed in the kernel only. There is a
set of special threads which perform netchannel processing (it is quite simple -
they just call the nc_process() callback of each linked ready netchannel).
The processing thread can be selected when a new skb has been received -
the system can select the local thread or schedule netchannel processing on some other
CPU-bound thread (for example, if the netchannel is being processed on a different CPU,
cache thrashing overhead will be much smaller if the system moves the skb to that CPU
than if it moves the whole netchannel to the current CPU).
Thus this subsystem only performs RCU-protected netchannel selection, spinlock-protected
skb queueing and awakening of the actual processing thread (either a userspace process parked in a syscall,
or a kernel thread).
All the real work is performed either in userspace or by kernel users (currently only NAT).
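A hypothetical single-threaded userspace model of the kernel-side path described above: each CPU has a list of ready netchannels, the receive path queues the skb to the CPU that processed the netchannel last (moving the small skb rather than the netchannel's state), and the per-CPU worker calls each ready netchannel's nc_process() callback. NR_CPUS_MODEL, last_cpu, nc_schedule(), nc_worker_run() and drain_process() are illustrative names; the spinlock and the thread wakeup are omitted.

```c
#include <stddef.h>

#define NR_CPUS_MODEL 4	/* hypothetical CPU count */

struct skb_model {
	int len;
	struct skb_model *next;
};

struct knc {
	int last_cpu;			/* CPU that processed this netchannel last, or -1 */
	int queued;			/* already linked into a ready list? */
	struct skb_model *queue;	/* pending skbs (LIFO here for brevity) */
	struct knc *ready_next;
	int (*nc_process)(struct knc *nc);	/* per-user processing callback */
};

/* One ready list per CPU, consumed by that CPU's processing thread. */
static struct knc *ready[NR_CPUS_MODEL];

/* Receive path: queue the skb on the CPU whose cache already holds the netchannel. */
static int nc_schedule(struct knc *nc, struct skb_model *skb, int this_cpu)
{
	int cpu = nc->last_cpu >= 0 ? nc->last_cpu : this_cpu;

	skb->next = nc->queue;
	nc->queue = skb;
	if (!nc->queued) {
		nc->queued = 1;
		nc->ready_next = ready[cpu];
		ready[cpu] = nc;
	}
	return cpu;
}

/* Processing thread body for one CPU: drain its ready list. */
static int nc_worker_run(int cpu)
{
	int processed = 0;

	while (ready[cpu]) {
		struct knc *nc = ready[cpu];

		ready[cpu] = nc->ready_next;
		nc->queued = 0;
		nc->last_cpu = cpu;
		processed += nc->nc_process(nc);
	}
	return processed;
}

/* Example callback: consume all queued skbs and report how many were handled. */
static int drain_process(struct knc *nc)
{
	int n = 0;

	while (nc->queue) {
		nc->queue = nc->queue->next;
		n++;
	}
	return n;
}
```

Once a netchannel has been processed on CPU 2, later skbs arriving on CPU 0 are still queued to CPU 2's list, which is the "move the skb, not the netchannel" choice.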
I plan to complete the NAT implementation, perform some testing and release a new major update this week.
/devel/networking :: Link / Comments (0)
Sun, 10 Dec 2006
Hinged ceiling setup. Layer 2.
I've completed the heaviest work - the second layer weighs about 40-50 kg
and covers about 6-7 square meters, so the whole two-level ceiling is about 60 kg and about 10 square meters.
Doing that alone was not a simple task,
but it is done already, fortunately without major damage to the surroundings.
Now it requires plaster filling, rubbing, board creation and final postprocessing.
I will likely complete that part this week.
After it is done, I can say that the dirtiest parts are done and I can start
the finishing work (although I did not even start bathroom development -
I have not got bricks and concrete yet).
This is a phone-made photo of the part of the ceiling completed around midnight.
/devel/flat :: Link / Comments (0)
Tue, 05 Dec 2006
Due to personal issues I will be offline for several days.
/life :: Link / Comments (0)
Badness in postponing work.
Actually, RCU is not a perfect solution for existing schemes.
Let's consider a synchronous situation where object freeing not only
returns memory to the cache, but also performs some additional state machine changes -
like putting reference counters and so on. Postponing such work
is likely not a very good solution, since on the fast path we want
the whole sequence to complete quickly, not a fast first half with the rest
postponed for a long time.
But even pure memory freeing - i.e. returning memory to the cache - if postponed
to an RCU callback, can lead to very noticeable performance degradation.
Thinking about better and faster skb processing for netchannels, I created a
simple patch which just postpones skb freeing (kfree_skbmem(), i.e. purely
releasing memory back to the skb cache) to an RCU callback scheduled from __kfree_skb().
This leads to the following performance degradation (receiving small packets):

Speed is about 2.5 times lower, although CPU usage is lower too - likely due to the increased
work of the RCU tasklet and the increased number of context switches.
First conclusion: using RCU-protected lists of skbs for sockets will lead to major performance
degradation.
Second conclusion: RCU is not a panacea, so its usage in netchannels will be limited.
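A toy userspace model of why postponing even a plain free hurts (not the kernel's slab allocator or call_rcu(); all names are hypothetical): objects freed through the deferred path are not back on the cache's free list until the modelled grace period ends, so a tight allocate/free loop keeps missing the free list and falls back to the slow path. It models only this cache effect, not the RCU tasklet or context switch costs mentioned above.

```c
#include <stdlib.h>

struct obj {
	struct obj *next;
	char payload[192];	/* arbitrary size, purely illustrative */
};

struct obj_cache {
	struct obj *free_list;		/* fast path: recently freed objects */
	struct obj *pending;		/* deferred frees waiting for the "grace period" */
	unsigned long slow_allocs;	/* times we had to fall back to malloc() */
};

static struct obj *cache_alloc(struct obj_cache *c)
{
	struct obj *o = c->free_list;

	if (o) {
		c->free_list = o->next;	/* fast path: reuse a cached object */
		return o;
	}
	c->slow_allocs++;
	return malloc(sizeof(*o));	/* slow path */
}

/* Immediate free: the object is reusable right away. */
static void cache_free(struct obj_cache *c, struct obj *o)
{
	o->next = c->free_list;
	c->free_list = o;
}

/* Deferred free, modelling call_rcu(): only queued, not yet reusable. */
static void cache_free_deferred(struct obj_cache *c, struct obj *o)
{
	o->next = c->pending;
	c->pending = o;
}

/* End of the modelled grace period: move pending objects back to the cache. */
static void grace_period_end(struct obj_cache *c)
{
	while (c->pending) {
		struct obj *o = c->pending;

		c->pending = o->next;
		cache_free(c, o);
	}
}

/* Receive n packets: allocate, "process", free. Returns total slow-path allocations. */
static unsigned long receive_loop(struct obj_cache *c, int n, int deferred)
{
	for (int i = 0; i < n; i++) {
		struct obj *o = cache_alloc(c);

		if (!o)
			break;
		if (deferred)
			cache_free_deferred(c, o);
		else
			cache_free(c, o);
	}
	return c->slow_allocs;
}
```

With immediate freeing, 1000 packets need a single slow allocation; with deferred freeing and no grace period in between, every one of the 1000 allocations takes the slow path.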
/devel/networking :: Link / Comments (0)
Locking tricks inside trie algorithm.
This tree is never rebalanced by design; its pointers
always stay the same except when a node is added or removed,
thus it is possible to use RCU protection when traversing
or modifying the trie structure - just by using rcu_dereference()
and rcu_assign_pointer() it is possible to create a completely
lockless network stack.
Let's consider a trie modification which leads to node removal (when the appropriate
reference counter becomes zero).
The trie removal algorithm deletes the pointer
from the root node to its child (and to all subsequent subnodes) using rcu_assign_pointer() and
schedules node freeing through the call_rcu() mechanism.
A trie traversal, which uses rcu_dereference(), will either fail
to dereference the next pointer or complete netchannel processing,
and thus will exit the RCU-protected section, which allows the appropriate nodes to be freed.
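A deliberately simplified, single-threaded userspace model of this removal sequence (hypothetical names; real RCU tracks quiescent states per CPU rather than a global reader counter, and C11 atomics only stand in for rcu_assign_pointer()/rcu_dereference()): the writer unlinks the child with a release store, the node goes to a pending list instead of being freed, and reclamation happens only once no reader is inside a read-side section.

```c
#include <stdatomic.h>
#include <stddef.h>

struct tnode {
	_Atomic(struct tnode *) child;
	struct tnode *free_next;
	int freed;	/* demonstration only: set instead of calling free() */
};

static atomic_int readers;		/* models "a reader is between rcu_read_lock/unlock" */
static struct tnode *pending_free;	/* models nodes handed to call_rcu() */

static void model_read_lock(void)   { atomic_fetch_add(&readers, 1); }
static void model_read_unlock(void) { atomic_fetch_sub(&readers, 1); }

/* Writer: unlink parent->child and defer its freeing. */
static void trie_unlink_child(struct tnode *parent)
{
	struct tnode *old = atomic_load_explicit(&parent->child, memory_order_relaxed);

	if (!old)
		return;
	atomic_store_explicit(&parent->child, NULL, memory_order_release);
	old->free_next = pending_free;	/* readers may still hold 'old' */
	pending_free = old;
}

/* Models the end of a grace period: reclaim only if no reader is active. */
static int try_reclaim(void)
{
	int n = 0;

	if (atomic_load(&readers) != 0)
		return 0;
	while (pending_free) {
		struct tnode *victim = pending_free;

		pending_free = victim->free_next;
		victim->freed = 1;	/* free(victim) in real code */
		n++;
	}
	return n;
}
```

A reader that loaded the child pointer before the unlink can keep using it; the node is reclaimed only after that reader leaves its section.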
Node addition is essentially the same - the set of node insertions (if the new
set of trie nodes requires allocations and insertions and does not reuse already
inserted nodes) will be 'published' one by one through rcu_assign_pointer().
A concurrent search may run between two insertions and obtain a
NULL pointer, which is perfectly fine, since the netchannel is not yet added.
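The 'publish one by one' rule maps onto release/acquire ordering. A minimal sketch with C11 atomics standing in for rcu_assign_pointer() (a release store) and rcu_dereference() (here an acquire load; the kernel relies on address dependency ordering instead, and acquire is a stronger substitute). struct pnode, publish_child() and lookup_chain() are hypothetical names; writer serialization is assumed to happen elsewhere.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

struct pnode {
	uint32_t bits;			/* key bits covered by this node */
	void *netchannel;		/* non-NULL only at the final node */
	_Atomic(struct pnode *) next;
};

/* Writer side (assumed serialized): fully initialize the node, then publish it. */
static struct pnode *publish_child(struct pnode *parent, uint32_t bits, void *nc)
{
	struct pnode *n = malloc(sizeof(*n));

	if (!n)
		return NULL;
	n->bits = bits;
	n->netchannel = nc;
	atomic_init(&n->next, NULL);
	/* Release: a reader that sees 'n' also sees its initialized fields. */
	atomic_store_explicit(&parent->next, n, memory_order_release);
	return n;
}

/* Reader side: follow 'depth' links; a NULL link means "not added yet". */
static void *lookup_chain(struct pnode *root, int depth)
{
	struct pnode *n = root;

	while (depth-- > 0) {
		n = atomic_load_explicit(&n->next, memory_order_acquire);
		if (!n)
			return NULL;	/* partially published path: treat as a miss */
	}
	return n->netchannel;
}
```

A search that runs after the first node is published but before the second one simply misses, exactly as described above.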
RCU callbacks are scheduled for invocation after the so-called grace period has completed;
in practice this means that a callback will be invoked some time after every CPU has gone through a context switch
(surely, once the context has switched, the read-side section is no longer being executed, thus it has ended),
so we need to protect our execution path from being scheduled away while traversing
the trie with rcu_read_lock()/rcu_read_unlock(), which is natural here,
since the search path is executed in softirq context. When the search
is completed, new data is added into the netchannel's queue and the appropriate user
is awakened (either a userspace process, or a kernel thread/work queue for things that
do not have a userspace process, for example netfilter checks;
in the latter case the netchannel will be put into the appropriate per-CPU list, accessible
from the kernel thread bound to the given CPU).
Since netchannel processing happens in a different context than searching, the
grace period may have elapsed by then, so the netchannel and related nodes can be freed while the
netchannel is still being accessed from the processing context. To solve this problem
and increase performance I am designing a special multiplexing scheme which will be used in the
softirq/hardirq fast path and allow better scalability by minimizing the
number of context switches and data movement between CPU caches.
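The multiplexing scheme itself is not described here. For comparison only, one widely used way to keep an object alive after leaving the RCU read-side section is to take a reference inside it with an "increment unless zero" operation (the kernel provides atomic_inc_not_zero() and kref_get_unless_zero() for this). This is a different technique from the scheme mentioned above; a C11 sketch with hypothetical names:

```c
#include <stdatomic.h>
#include <stdbool.h>

struct rc_channel {
	atomic_int refcnt;	/* 0 means removal is in progress: do not touch */
};

/* Take a reference only if the object is still live (models atomic_inc_not_zero()). */
static bool rc_get_unless_zero(struct rc_channel *nc)
{
	int old = atomic_load_explicit(&nc->refcnt, memory_order_relaxed);

	while (old != 0) {
		/* On failure 'old' is reloaded with the current value and we retry. */
		if (atomic_compare_exchange_weak_explicit(&nc->refcnt, &old, old + 1,
							  memory_order_acquire,
							  memory_order_relaxed))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the caller dropped the last one and must free. */
static bool rc_put(struct rc_channel *nc)
{
	return atomic_fetch_sub_explicit(&nc->refcnt, 1, memory_order_acq_rel) == 1;
}
```

In a search path the reference would be taken while still under rcu_read_lock(), so the object survives until the processing context drops it. The price is an atomic operation on a shared cache line per packet, which is the kind of cross-CPU traffic the text aims to minimize.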
Addition is simple - while holding a mutex, which prevents simultaneous
deletion, it allocates nodes one by one and publishes them using
rcu_assign_pointer(). Since the fast path, running in a different context, uses
rcu_dereference(), it will either get the new value with all of the above logic,
or get a NULL pointer and return without processing.
Addition will not be protected by rcu_read_lock()/rcu_read_unlock(),
since simultaneous deletion is not possible while the mutex
is held, so sleeping allocations can be used.
/devel/networking :: Link / Comments (0)
Mon, 04 Dec 2006
Meanwhile on flat development side.
I finished pouring the self-leveling floor in the kitchen, bathroom and part of the hall -
do not believe the advertisement, it does not level itself, no matter
how thick the layer is - one needs to level it while pouring. Its nature
is about the same as plaster, so it behaves exactly like it, but it is more
fluid. I spent about an hour mixing and pouring, and then about 3 hours
to finally level it - I took a palette knife and, meter by meter, started to cut all
uneven parts; with some water wetting, it becomes quite good.
Thinking some more about the hinged ceiling, I decided to implement the second layer
even though there are no long enough hangers - I will use parts of ceiling profiles
with the middle part cut out to make some kind of 'U' hangers - without the middle part
it is possible to level the ceiling.
/devel/flat :: Link / Comments (0)
Multidimensional trie test has been completed.
It ran for almost two days, and about 7.4 billion insert/search
operations were completed in series of 1 million. No crashes, no failed
searches or inserts, no bugs. It's a win.
I have started to combine netchannels
and this algorithm, so that wildcards become supported.
Thus netchannels become extremely useful functionality for the Linux network stack -
they allow creating and testing different network stacks in userspace or kernelspace,
thus creating TOE engines without influencing the core stack (since the netchannels engine runs before
core network protocol hooks are called), and they allow creating very fast and scalable network filters
and packet manipulation entities like NAT. Can you even imagine a Linux system with one million
netfilter rules? A system with a million sockets is already quite hard to get.
I also have some plans to implement a userspace network stack
as a library and patch an existing web server (likely lighttpd)
to work with it - it only requires one netchannel (a wildcard one, of course), i.e. only
one socket in existing terminology, while socket and related structure (inodes, files) allocation
and freeing, together with the related syscall overhead, is quite a major limiting factor in existing systems.
Other uses include a packet-socket-like interface, where userspace can select a set of IP addresses
to work with without the main kernel stack - BGP/OSPF daemons, TUN/TAP interfaces, VPN proxies
and much more.
So there are plenty of interesting ideas into which I plan to sublimate my energy...
/devel/networking :: Link / Comments (0)
Sun, 03 Dec 2006
Huge development shopping day.
I spent the whole day in the Leroy Merlin
home improvement shop - from early morning until 22:00, without food or drink, I selected
ceramic tiles, wallpapers, floor covering and so on. I can not say I liked it, but
it had to be done, and now I have everything to complete my renovation.
Since there are no standard long hangers, I will create only one layer for the hinged
ceiling. I also bought a new finishing plaster for decorative ceiling coating - looking
at what it is and how it should be used, I can conclude that I could do the same
with usual finishing plaster (not stucco, but oil-based) and it would be about 5-10 times cheaper,
but let's see what our French friends invented. I also bought a transformer and a set
of ceiling spot lights - I plan to complete the hinged ceiling this week, and it would
be great if I could do it tomorrow.
The main thing is of course the ceramic tiles - the women in the shop agreed that they do not like and
can not even imagine how grey/blue marbled tiles on the walls (they are actually floor tiles)
with a blue marbled floor would look. Then it was decided that it is a man's bathroom, where
everything is possible. This took about 4-5 hours. Well, let's see what my imagination created.
Wallpaper selection was simpler - I barely had the energy to spend as much time there
as I did when selecting ceramics, so I bought a lot of wallpaper for the room, kitchen and dressing room.
Floor covering was easier - although it is a bit early, this will force me to complete
the development faster - various types, from black ceramic granite to blue carpet.
The major things I missed are bricks and concrete for a small podium I want to build for the
shower cabin; this will actually stall bathroom development, but I plan to get them next week.
Eventually I bought all that and other stuff, which will be delivered on Wednesday, and while waiting for
the bus I decided to visit the gigantic Auchan shop located nearby -
I just wanted to find an immersion heater - but I got completely lost in an area the size of a couple of stadiums
filled with various stuff - nature brought me to the area where strong alcoholic drinks were sold,
so I got a bottle of rum (they do not have "Captain Morgan", only "Bacardi") and moved away.
While waiting for the bus and drinking some rum I decided that it was actually not that bad a day.
/devel/flat :: Link / Comments (0)
Sat, 02 Dec 2006
Multidimensional trie searching algorithm is completed.
I'm back working on this problem, and here are the results.
Hardware.
Intel Core Duo 3.4 GHz (each core runs at 3.7 GHz) with 2 GB of RAM
(the userspace application is single-threaded, so only one core is used).
Test description.
A number of 3-dimensional pseudo-random rules were inserted/searched/inserted+searched.
More than 80% of the rules contain pseudo-random wildcards (a wildcard is a mask formed by
setting from 0 to 31 of the higher bits without gaps). Each rule dimension is a 32-bit range;
for example, 192.168.0.0/255.255.255.0 produces such a range.
Searching is performed for a non-wildcard rule, i.e. a rule with the 0xffffffff mask.
Such a setup is a powerful IPv4 packet filter emulation.
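A small reference matcher for the rule format above, the kind of brute-force check one could use to cross-validate trie search results in such a test. It is a sketch, not the trie itself; prefix_mask(), mask_len(), struct rule3 and rule3_match() are hypothetical names. A wildcard is a contiguous mask of the top len bits (192.168.0.0/255.255.255.0 is value 0xc0a80000 with mask 0xffffff00), and a rule matches when every dimension satisfies (addr & mask) == value.

```c
#include <stdint.h>

/* Mask with the top 'len' bits set, len in 0..32 (e.g. 24 -> 0xffffff00). */
static uint32_t prefix_mask(unsigned int len)
{
	/* Shifting a 32-bit value by 32 is undefined, hence the special case. */
	return len == 0 ? 0 : 0xffffffffu << (32 - len);
}

/* Number of leading set bits in a contiguous mask, i.e. the prefix length. */
static unsigned int mask_len(uint32_t mask)
{
	unsigned int n = 0;

	while (mask & 0x80000000u) {
		n++;
		mask <<= 1;
	}
	return n;
}

/* Hypothetical 3-dimensional rule: value/mask per dimension. */
struct rule3 {
	uint32_t val[3];
	uint32_t mask[3];
};

static int rule3_match(const struct rule3 *r, const uint32_t key[3])
{
	for (int d = 0; d < 3; d++)
		if ((key[d] & r->mask[d]) != r->val[d])
			return 0;
	return 1;
}
```

A search key in this test is a non-wildcard rule, i.e. a plain address triple checked against each rule's masks.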
Results.
The graph shows the number of microseconds needed to insert (including memory allocation time),
search, and both insert and search (sum) one multidimensional rule, depending on the total number of rules.

This graph shows memory usage of the trie implementation. The theoretical maximum is calculated
by multiplying the number of rules, the number of directions, the size of each rule object (in one dimension), 32 (the number of bits,
i.e. the number of rule objects needed to represent the rule) and 0.6 (20% are non-wildcard rules
and 80% are wildcards with an equal distribution of 1-31 bits, so 0.2 + 0.8*0.5).
Each rule contains 28 bytes (on x86) of auxiliary information,
which is used to build the tree; it can be reduced though.
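Written out, the estimate above is as follows (the symbols are introduced here only for readability: N_rules is the number of rules, N_dir the number of directions, S_obj the size of one rule object in one dimension):

```latex
M_{\max} = N_{\mathrm{rules}} \cdot N_{\mathrm{dir}} \cdot S_{\mathrm{obj}} \cdot 32 \cdot 0.6,
\qquad
0.6 = 0.2 \cdot 1 + 0.8 \cdot \frac{16}{32}
```

The second equation spells out the 0.6 factor: a non-wildcard rule needs objects for all 32 bits, while a wildcard whose length is evenly distributed over 1-31 bits needs on average (1 + 31)/2 = 16 bits, i.e. half of them.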

A test run for the case without wildcards shows increased memory usage (with wildcards the existing algorithm allocates
nodes only for masked bits, i.e. the bigger the wildcard is (0xf0000000 is a bigger wildcard than 0xffff0000 in the above notation),
the less memory it requires); insert time thus increased, but is still a constant value of about 12 microseconds.
Search time becomes constant at about 1 microsecond.
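Why wider wildcards need less memory can be seen on a one-dimensional binary prefix trie. This is a simplification for illustration, not the multidimensional algorithm itself, and all names are hypothetical. Inserting a prefix walks and allocates at most one node per masked bit, so a /4 rule costs at most 4 nodes while an exact /32 rule costs up to 32, and a lookup touches at most 32 nodes regardless of the number of rules.

```c
#include <stdint.h>
#include <stdlib.h>

struct bnode {
	struct bnode *child[2];
	void *rule;		/* non-NULL if a prefix ends at this node */
};

static unsigned long nodes_allocated;

/* Insert value/len; allocates only the nodes missing on the path. Returns 0 or -1. */
static int btrie_insert(struct bnode *root, uint32_t value, unsigned int len, void *rule)
{
	struct bnode *n = root;

	for (unsigned int i = 0; i < len; i++) {
		unsigned int bit = (value >> (31 - i)) & 1;

		if (!n->child[bit]) {
			n->child[bit] = calloc(1, sizeof(struct bnode));
			if (!n->child[bit])
				return -1;
			nodes_allocated++;
		}
		n = n->child[bit];
	}
	n->rule = rule;
	return 0;
}

/* Longest-prefix match for a full 32-bit (non-wildcard) key. */
static void *btrie_lookup(const struct bnode *root, uint32_t key)
{
	const struct bnode *n = root;
	void *best = root->rule;

	for (unsigned int i = 0; i < 32 && n; i++) {
		n = n->child[(key >> (31 - i)) & 1];
		if (n && n->rule)
			best = n->rule;
	}
	return best;
}
```

Inserting 0xf0000000/4 allocates 4 nodes; a following exact rule under the same prefix reuses those 4 and adds 28 more.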
I've started the following test to run until Monday - the system creates and inserts 500,000 rules with
random data and prefixes, then 500,000 rules without wildcards; after each insertion it searches
for the rule, and after one million rules have been inserted, the program starts from the beginning (by exiting
and restarting). I did not include a rule deletion test, since there is not enough memory to hold all the rules
and all the intervals used to create those rules (deletion was successfully tested on smaller amounts).
/devel/networking :: Link / Comments (0)
Fri, 01 Dec 2006
Climbing evening.
It was an interesting training - besides the usual traverses and boulderings I created
a new type of exercise. Consider the case when you have a big hold where both hands can be placed
and there is some room for a foot - yes, while hanging, you bring your foot
over your head to your hands (the other leg can be set up as a friction stop),
put the foot onto the hold and try to move your body so that you sit on the hold -
some kind of lifting your body using your arms and the leg on the hold from the initial position
(somewhere over the head) to the position where you sit on the hold. It is not that complex an
exercise when you use two holds - one for the hands and a different one for the leg, but when
there is only one hold it becomes very interesting - I spent quite a big part of the training
trying to complete it using different holds - big reliefs were done quite easily, but usual
big holds required quite a lot of effort. That was a great time.
Something like that (photo from the bouldering championship in Skala-city,
the man in the photo is not me), but with only one hold about 20 centimeters wide (like the big blue hold rotated by 90 degrees). The final position
is when you sit on the hold where the right hand is.
/life :: Link / Comments (0)
First sounds flew out of my new trumpet.
And I would not call them beautiful. Actually, a couple of times
I produced a really nice sound, but I can not easily reproduce it,
so most of the time (I started at about 23:30 and practiced for about an hour)
it was a horrible wheeze and creak, but I almost started to figure out
how to play notes...
I think it will take a looong time to produce something interesting, but I
am in no hurry.
/life :: Link / Comments (0)