Zbr's days.

Thu, 31 Aug 2006

Back from Scandinavian trip.


I've just returned from a short four-day vacation in Finland and Sweden.
If you want my brief opinion on how it was, I can say that it was an extremely good time.

Day 1.
Moscow - St. Petersburg train trip.
Since the train took the whole day (from 10:30 to 19:00), I took some beer and a Confucian book with me. There were very interesting neighbours - Shvechikov Alexey Nikolaevich, a professor of the history of religion, and his wife. We had a very interesting discussion about the technical world and history, about Confucianism and various religions, about the "Russian Way" and whether it exists at all, how the intellectual development of a country and a nation happened and where it will end up. He wrote a book, "Religion. History. Science. History of western civilization. Experience of history-methodological research.", about the evolution of civilization and where and how the Russian way was created and changed (I still do not agree with him; I think that at least right now we do not have any such "way" except eating up natural resources and stealing from each other). That book will be available in January.

When we were in St. Petersburg we found an extremely good bookshop - Bukvoed - with a gigantic selection of books, a nice cafe where you can read them, nice music... I definitely recommend it to everyone who will be there - it is situated right next to Revolt square (ploshad' Vosstaniya).

Day 2.
We set off for Helsinki at about 1 A.M. and arrived there early in the morning (about 8:00 or 9:00).
We found that either Monday is still a weekend there or people in Helsinki start work later - there were almost no people on the streets (although there are not many of them at all - about 5.5 million people in the whole country, which is about half the population of Moscow).
I think Finland has built communism (or something very good at least) - all people look very happy, they respect each other and the rules (for example, if you so much as try to cross the street, the vast majority of drivers will stop even if there is no pedestrian crossing). Almost everyone knows English (and the two national languages - Finnish and Swedish). We had a small dinner and paid only for beer and a (very big) main dish - garnish, salad, soup, tea and coffee were all free. I have never seen such restaurants in Moscow, although there are some which provide some kind of buffet table.
Drivers are very calm - no honking, no racers; if there is a 30 km/h speed limit, then everyone will move at that speed. I did not see any traffic jams there.
I saw a police car only three times, and there is no road police there (at least in Helsinki).

It is an extremely good country, but all its advantages are completely destroyed by one main disadvantage (for me at least): I like the Russian type of women much more than the Scandinavian one :), so I will not move there, at least for now.

We travelled to Sweden from the Finnish city of Turku on the gigantic ship "Silja Europe", which is really a small city in the sea - it has just about everything you can imagine.

Day 3.
Sweden. We arrived in Stockholm early in the morning and had a small tour of the city.
I especially liked the small sailing ships moored not far from the city hall.
We visited the "Wasa" museum (which was the original goal of my tour) - a museum devoted entirely to one ship built in the 17th century (it sank on its first voyage), which was extremely interesting.
Then Grange and I put out to sea on the pride of the Swedish Grand Royal Navy - a rented canoe on Jurgarden island ("Animal's island", sorry if I call it wrong) - a voyage which ended successfully in an hour. While we were afloat some tourists took photos of us, naively thinking that we were Stockholm natives.
While walking along Stockholm's streets, we found an interesting bar called "Oliver Twist" (it looks like an English bar in Stockholm), where we had dinner and drank some Swedish beer - it was not bad and had a very interesting taste, but (as with the Finnish one) I do not like it. The bar itself was a really nice place - if you want to find it, you need to go left from the King's palace to the neighbouring island, then go up the stairs over a very busy street, and you will find it somewhere not far from the local church (well, not a very useful description, but I can not remember or even read Swedish street names, although they look simpler than Finnish ones). On the way back we did not take the central pedestrian street (it looks like the place where only tourists walk, like Arbat in Moscow), but another very busy street, where we found that Stockholm's life is exactly the same as in Moscow - busy crowds of people, racers and bad drivers on the roads (although not as many as on Moscow streets) and so on - the usual busy life of a usual busy city, not something new like Helsinki.

There are a lot of cyclists in Helsinki and Stockholm - they are full members of the traffic flow, and there are a lot of special bicycle lanes running in parallel with footpaths and car roads. I've found that cyclists do not stop (or even try to stop) when you cross their lane, which can be quite dangerous. We went back to Finland on the "Silja Festival" ship - it is slightly smaller than "Silja Europe", but is still a city in the sea - a very powerful construction on the water.

Day 4.
We spent several hours in Helsinki, went to the far outskirts of town and found that residential districts are built right in the forest - small buildings of three or four floors, most of which are even lower than the surrounding pine trees.

Then we went back to St. Petersburg and sat in the book cafe, where I bought myself a book about Russian history from the 9th to the 20th century, and then moved on to Moscow, where I am writing this story.

Expect a lot of photos soon!
It was a really good journey to a completely different world. I enjoyed it very much.

/life :: Link / Comments (0)


Sat, 26 Aug 2006

Zero-copy sending and receiving support.


tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
16:47:27.233768 IP10 truncated-ip - 256 bytes missing! 192.168.4.78 > 192.168.0.48: udp
	0x0000:  abab 0578 abab abab ab11 abab c0a8 044e
	0x0010:  c0a8 0030 abab abab abab abab abab abab
	0x0020:  abab abab abab abab abab abab abab abab
	0x0030:  abab abab abab abab abab abab abab abab
	0x0040:  abab abab abab abab abab abab abab abab
	0x0050:  abab
This is a zero-copy sent datagram captured on the receiving side; as you can see, it is perfectly correct (i.e. it contains exactly those IP and higher layers which were filled in userspace on the sending side).

I've also cleaned up zero-copy mapping support a lot, so there should no longer be situations where an allocation is not caught due to mmap troubles (like crossing mappings of different CPUs and so on).
I also moved the notification about new packet arrival in the zero-copy sniffer into the freeing function, since when it is placed in the allocation function userspace can find the new buffer before it has even been filled by the kernel. When a buffer is being freed, it obviously already contains data (except in cases when the allocated object was not used at all).
In general the zero-copy sniffer can not catch data changes which happen somewhere inside the main processing code. For example, an IPsec packet can not be caught decrypted, since the packet itself stays in that transient state only for a very short time after receiving and decryption. In such cases the transient state must be copied, for example using a new allocation (the freeing of which will be caught by the sniffer), a memory copy and an immediate free.

There is still a small problem with freeing, caused by the addition of struct skb_shared_info, but it is not really that complex, so I will postpone it for a while and try to implement a trivial dump analyzer.

Almost forgot: you can find the current patch and userspace utilities in the archive.

/devel/networking/zcs :: Link / Comments (0)


Fri, 25 Aug 2006

Kevent.


I've released 'take14' patchset. Short changelog:

  • do not get lock around user data check in __kevent_search()
  • fail early if there were no registered callbacks for given type of kevent
  • trailing whitespace cleanup

/devel/kevent :: Link / Comments (0)


Wed, 23 Aug 2006

How to get an IP address.


There are some situations when DHCP can not be used, but a board must obtain an IP address - for example, when you have an embedded device which is going to be sold into an environment where there are no DHCP servers, but where you can set up your own applications. One solution is to reinvent the DHCP server, but it is possible that the new environment actually has a DHCP server which should not assign an address to the board (read: no one wants to notify people when some new system has been installed in the network, and to steal an IP address from some pool preallocated long ago).
Briefly, you need to implement something which fits into 500kb of flash and is able to obtain addresses from the outside, and it can not use DHCP.
I decided to use multicast - the board sends multicast notifications, and some userspace application receives them and sends back information with IP addresses. At first glance it looks simple, but let's enter the wonderland of Linux inet devices.
To be able to send data there must be a default route, which can be set up from initrd using rtnetlink. The rtnetlink new route command requires that there exists a so-called inet device (a structure which contains IP address information for the given network device), which must have an IP address, which in turn can be assigned through rtnetlink. But assigning a new address requires that the device in question be turned on, which can not be done through rtnetlink and only works through ioctl().

So, if you want to send and receive some data over multicast (for example to get an IP address) from initrd, you must:

  • bring the interface up by setting the IFF_UP flag using ioctl(SIOCSIFFLAGS), after reading the current flags with ioctl(SIOCGIFFLAGS)
  • set some IP address (DHCP sets 255.255.255.255) using rtnetlink RTM_NEWADDR command
  • set route for given multicast group using rtnetlink RTM_NEWROUTE command
The size of the application which does exactly what is described above, compiled with gcc-3.4.4 for PPC32, fits into 500kb (although there is almost no space left for anything else, unless that "something else" does not work with the network).

That is what I was doing all day today at my paid work (well, not exactly that: I put in tons of printk() calls to determine why I only got "RTNETLINK answers: No such device" and other errors from initrd, while I could easily set up a new route from normal userspace started over NFS, after DHCP address resolution). And of course I participated in the seemingly endless flood in linux-kernel@ about kevents.

/devel/networking :: Link / Comments (0)


Kevent.


Discussion about kevents in linux-kernel@ has come to an interesting point. Here are a couple of quotations:

	Go fuck yourself
 ...
 	In a decent society you would have your nose broken
and the like.

I think it is obvious that it is a highly professional discussion.

I've released 'take13' patchset. Short changelog:
  • remove non-chardev interface for initialization (Christoph Hellwig)
  • use pointer to kevent_mring instead of unsigned longs (Christoph Hellwig)
  • use aligned 64bit type in raw user data (can be used by high-res timer if needed)
  • simplified enqueue/dequeue callbacks and kevent initialization (based on work by Eric Dumazet)
  • use nanoseconds for timeout
  • put number of milliseconds into timer's return data
  • move some definitions into user-visible header
  • removed filenames from comments
Let's see what new words it will bring to the linux-kernel@ readers.

/devel/kevent :: Link / Comments (0)


Tue, 22 Aug 2006

Zero-copy networking.


I've implemented initial zero-copy sending support based on network allocator. Here is tcpdump dump:

tcpdump: listening on lo, link-type EN10MB (Ethernet), capture size 96 bytes
20:55:13.709761 IP0 [|ip]
	0x0000:  0000 0000 0000 0000 0000 0000 0800       ..............
There is a problem though - I mmap data only for the CPU0-bound allocator, but it is possible that an allocation happens on a different CPU, so the data will be incorrect (that is what you see above - there should not be any zeroes).
This problem should be fixed by a proper protocol between the userspace sniffer and the network allocator (what is being used currently can not be called a protocol at all). Since I introduced an ->ioctl() method anyway, I will use appropriate commands there.

/devel/networking/zcs :: Link / Comments (0)


Mon, 21 Aug 2006

Climbing.


Easy training today - I damaged a finger at the previous training, so there was no major progress today - several old traces and a couple of traverses. I even recalled the first complex trace I completed about half a year ago, and completed it quite easily - either it was changed a bit to be simpler, or my progress is not standing still. I also completed the "jumping" trace - I changed its path to make a small dynamic jump (several tens of centimeters high) instead of a loop; although no one else likes it, I definitely prefer it. The trace over rock-crack holds was not completed today - I failed several times, even in the middle - the shoes are not ready yet, the finger is aching, and without serious training I generally do not complete traces of that complexity on a negative (even slight) slope. So, no record yet, but they are coming.

/life :: Link / Comments (0)


Kevent.


I've released 'take12' patchset. Short changelog:

  • include missing headers into patchset
  • some trivial code cleanups (use goto instead of if/else games and so on)
  • some whitespace cleanups
  • check for ready_callback() callback before main loop which should save us some ticks

P.S. Actually, I do not see the kevent integration process coming to its final end (no matter whether that is inclusion or rejection)...

/devel/kevent :: Link / Comments (0)


Zero-copy sniffer.


I've completed an entirely zero-copy sniffer based on the network (formerly tree) allocator and sent the whole patchset to netdev@ for review. One can find it and the userspace utility in the archive.

Design notes.
The network allocator steals pages from the main system allocator and uses them for all network allocations (its benefits are beyond the scope of the zero-copy sniffer description; one can find network allocator features on the project's homepage). It is thus possible to mmap all stolen pages from userspace and to provide userspace with a special structure for each allocated chunk, which includes the offset from the beginning of the node (each node contains a contiguous page-aligned memory region), the node number and other info. Since the network allocator tracks the number of users of each memory region, the last one to complete data processing (for example the userspace sniffer) must commit that area back to the allocator, so NTA relies on correct values being returned from userspace. If a chunk returned from userspace is not valid, it will not be freed; but if userspace does not "free" chunks (by sending info about them back to the kernel), eventually the maximum allowed number of shared free regions is reached and no more data will be sent to (or be allowed to be shared with) userspace.
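
How userspace could turn such a per-chunk structure into a pointer can be sketched as follows. This is only an illustration under assumptions: struct zc_chunk, struct zc_node_map and zc_chunk_ptr() are hypothetical names, and the real structure exported by the patch may have different fields and layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor handed to userspace for each allocated chunk. */
struct zc_chunk {
	uint32_t node;   /* number of the node the chunk lives in */
	uint32_t offset; /* byte offset from the beginning of that node */
	uint32_t size;   /* chunk size in bytes */
};

/* One contiguous, page-aligned node as seen through mmap(). */
struct zc_node_map {
	unsigned char *base;
	size_t len;
};

/* Translate a descriptor into a pointer inside the mapping, or return
 * NULL if the descriptor does not fit into the mapped node. */
void *zc_chunk_ptr(const struct zc_node_map *maps, size_t nr_maps,
		   const struct zc_chunk *c)
{
	const struct zc_node_map *m;

	assert(maps != NULL && c != NULL);
	if (c->node >= nr_maps)
		return NULL;
	m = &maps[c->node];
	if (c->offset > m->len || c->size > m->len - c->offset)
		return NULL;
	return m->base + c->offset;
}
```

After processing the data, the sniffer would send the same descriptor back to the kernel so that the region can be committed back to the allocator.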

Since by default the network tree allocator is used for all network allocations (including unix sockets and netlink), the sniffer will get all that data and must somehow differentiate between them. That task is out of scope for this mail though; a simple solution is just to attach the network allocator to a network device (i.e. call NTA allocation functions from netdev_alloc_skb() only).

I have not run any special performance tests, but a simple "top" command shows much smaller CPU usage for the zero-copy sniffer (although it gets all data from every skb in the machine) compared to tcpdump - 17% vs. 33% maximum on my test machine.
Both sniffers dump received data into /dev/null.
Server side (where sniffers run) runs epoll() based trivial web server, client side runs httperf.
Machines are connected over 100mbit LAN (e100 server NIC, 8169 client NIC).

For zero-copy userspace netchannels I plan to only send to userspace information about allocations which really belong to created netchannel instead of info for each chunk.

Sending zero-copy support is in TODO.

/devel/networking/zcs :: Link / Comments (0)


Sun, 20 Aug 2006

What is the hope and where one can find it?


We generally think that it comes from the future, but if we try to analyze, to deeply think about it we will only find some plans, tasks, wishes there, but is it a hope? No, it is not.
Maybe hope lives in the present? In present everything is clear, one can even see how concrete things are - there is no hope here.
The only time it can come from is the past - even an old family photo album with black and white photos contains hope, since it is enough just to see that life was normal for hope to start burning...

It is possible to rush through cities, airports and hotels, to be sick and have big financial problems, to have no home and yet be happy. Absolutely happy. Because she is waiting for you. Not some abstract waiting, but exactly the person you want to be waiting.
And one can be successful and show considerable promise, but be completely unhappy. Just because no one waits for you...
And you have a phone and it has a number, but no one calls you, because she does not call and thus it seems that no one calls.
But then her number lights up in your brain, and you know that you should not call her, in no way, since it will only be worse, much worse in any case. And just after this thought has come to your mind, you immediately call her. And things become much worse in any case.
Short rings - she is talking with someone, with whom?
Long rings - she does not pick up the phone, she knows your number and does not want to talk...
Or the operator says that the subscriber is not available right now.
Or you hear a voice, her voice, but she does not want this and does not like it, and there are some false answers and absolutely indifferent questions...

Sometimes it is impossible to just sit behind your window, light your lamp and send a signal somewhere into the city about your solitude, hoping that someone will feel it and respond. And you so much want to get out to somewhere where you will not be alone, somewhere where there are no people, since if there are no people, there can be no loneliness. And the bigger the city is, the more people are there, the more you feel your solitude...

    Eugene Grishkovec "The Planet"

/life :: Link / Comments (0)


Sat, 19 Aug 2006

Network usage statistic.


While hacking on the zero-copy sniffer I've discovered another interesting thing about networking in Linux. The first one was the rather large number of allocations/frees on system startup.
Now I see a continuous allocation/freeing flow of 292-byte objects. I think these are netlink messages, but why do they not stop? And who sends them? The initial suspect is kobject_uevent; at least while the kernel is starting and new devices are being discovered, it broadcasts a lot of messages (from 292 to 2340 bytes) about each one.
But it is really broken behaviour to continuously spam userspace.
Have you seen the "Why userspace sucks?" slides by Dave Jones about stupid things userspace can do? I think this can be a continuation...

Well, things are not that bad - it stopped after some period of time, so it was a false alarm on my side.

/devel/networking :: Link / Comments (0)


Zero-copy sniffer. First results.


add@/class/mem/kmsg.ACTION=add.DEVPATH=/class/mem/kmsg.SUBSYSTEM=mem.SEQNUM=105.MAJOR=1.MINOR=11
................................................................................................
................................................................................................
....

add@/devices/system/timer/timer0.ACTION=add.DEVPATH=/devices/system/timer/timer0.SUBSYSTEM=timer.SEQNUM=106
...........................................................................................................
..............................................................................

oc->avl_node_list);...alloc->avl_container_array = kzalloc(sizeof(struct list_head) * AVL_CONTAINER_ARRAY_S
IZE, GFP_KERNEL);..if (!alloc->avl_container_array)...goto err_out_exit;...for (i=0; i<AVL_CONTAINER_ARRAY_
SIZE; ++i)...INIT_LIST_HEAD(&alloc->avl_container_array[i]);...entry = avl_node_entry_alloc(GFP_KERNEL, AVL
_ORDER);..if (!entry)...goto err_out_free_container;...avl_node_entry_commit(entry, cpu);...return 0;..err_
out_free_container:..kfree(alloc->avl_container_array);.err_out_exit:..return -ENOMEM;.}../*. * Initialize 
network allocator.. */.int avl_init(void).{..int err, cpu;...for_each_possible_cpu(cpu) {...err = avl_init_
cpu(cpu);...if (err)....goto err_out;..}...err = avl_init_zc();...printk(KERN_INFO "Network tree allocator 
has been initialized.\n");..return 0;..err_out:..panic("Failed to initialize network allocator.\n");...retu
rn -ENOMEM;.}..............................................................................................
.........................................................................................................

.k........ ................................................................................................
......................................................................................a.....*.c..E....,@.@.
. ...N...0.P.3Epe_;........_...........1......width: 47%;.....padding-right: 3%;.....float: left;.....paddi
ng-bottom: 2em;....}.....content-column-left hr {.....display: none;....}.....content-column-right {...../*
Values for IE/Win; will be overwritten for other browsers */.....width: 47%;.....padding-left: 3%;.....floa
t: left;.....padding-bottom: 2em;....}.....content-columns>.content-column-left, .content-columns>.content-
column-right {...../* Non-IE/Win */....}....img {.....border: 2px solid #fff;.....padding: 2px;.....margin:
2px;....}....a:hover img {.....border: 2px solid #f50;....}..../*]]>*/...</style>..</head>...<body>...<h1>F
edora Core <strong>Test Page</strong></h1>....<div class="content">....<div class="content-middle">.....<p>
This page is used to test the proper operation of the Apache HTTP server after it has been installed. If yo
u can read this page, it means that the Apache HTTP server installed at this site is working properly.</p>.
...</div>....<hr />.....<div class="content-columns">.....<div class="content-column-left">......<h2>If you
are a member of the general public:</h2>.......<p>The fact that you are seeing this page indicates that the
website you just visited is either experiencing problems, or is undergoing routine maintenance.</p>.......<
p>If you would like to let the administrators of this website know that you've seen this page instead of th
e page you expected, you should send them e-mail. In general, mail sent to the name "webmaster"............
...........................................................................................................
...........................................................................................................
..........................
The junk above was obtained from the zero-copy sniffer running with an epoll based web server on my test machine (I manually replaced all "<" symbols with "&lt;" in the dump to not break HTML formatting).
The first two dumps are kobject_uevent messages during startup ('.' means an unprintable symbol, i.e. some binary data), then you can see part of my network tree allocator code being transferred over ssh (decrypted text being sent over a unix socket), and at the end there are some pieces of the default web page (copied from the Fedora Core apache default index.html) and some unknown symbols all over the place.
Binary data at the end of each chunk is added for alignment, binary data at the beginning is a header, and that in the middle corresponds to tabs, line breaks and so on.

It works, although there are issues yet to resolve - for example, the mapping code only maps the initial cache, so userspace can not yet see when it has grown, and the sniffer also does not know how many pages are inside each new cache chunk. I will resolve those issues soon and send the code to netdev@ for review.

/devel/networking/zcs :: Link / Comments (0)


Zero-copy sniffer.


The implementation of the design with an additional bitmask wastes too much space per node, so I decided to create a much simpler solution - attach a tag to each allocated chunk, which contains a canary and a reference counter. The former is just 4 bytes of special data which the freeing function uses to check that the object being freed is valid and that there was no memory corruption. The reference counter is used to mark mapped objects as used, so that freeing does not destroy them. The only thing left to implement is the ->nopage() method for the zero-copy sniffer's underlying char device, so that when the network allocator cache grows the user automatically gets the new pages into the mapping.
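
The tag idea can be illustrated with a small userspace sketch. The names, the canary value and the placement of the tag in front of the chunk are assumptions made for the example (malloc() stands in for the allocator); the post itself does not say where the tag lives.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CHUNK_CANARY 0xc0ffee42u /* hypothetical magic value */

/* Hypothetical tag attached to every allocated chunk. */
struct chunk_tag {
	uint32_t canary; /* detects invalid pointers and memory corruption */
	uint32_t refcnt; /* users of the chunk, including userspace mappings */
};

/* Allocate a chunk with the tag placed right in front of it. */
void *chunk_alloc(size_t size)
{
	struct chunk_tag *tag = malloc(sizeof(*tag) + size);

	if (!tag)
		return NULL;
	tag->canary = CHUNK_CANARY;
	tag->refcnt = 1;
	return tag + 1;
}

/* Take one more reference, e.g. when the chunk is handed to a sniffer. */
void chunk_get(void *ptr)
{
	struct chunk_tag *tag = (struct chunk_tag *)ptr - 1;

	assert(tag->canary == CHUNK_CANARY);
	tag->refcnt++;
}

/* Drop a reference. Returns 1 if the memory was really released, 0 if
 * other users remain, and -1 if the canary check failed, in which case
 * nothing is freed. */
int chunk_put(void *ptr)
{
	struct chunk_tag *tag = (struct chunk_tag *)ptr - 1;

	if (tag->canary != CHUNK_CANARY)
		return -1;
	if (--tag->refcnt)
		return 0;
	tag->canary = 0; /* make a later double free detectable */
	free(tag);
	return 1;
}
```

With this scheme a free issued by the network stack while the sniffer still holds a reference only drops the counter; the memory goes back to the allocator on the last chunk_put().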

/devel/networking/zcs :: Link / Comments (0)


Zero-copy networking.


Initial zero-copy implementation is receiving side for zero-copy sniffer based on network allocator.

/devel/networking/nta :: Link / Comments (0)


ICFPC-2006 contest.


I do not know anything about functional programming (except that it is different from imperative programming), but this reading (in russian) about International Conference on Functional Programming contest is extremely interesting.

/other :: Link / Comments (0)


Fri, 18 Aug 2006

Very good climbing training.


I've completed a couple of interesting traverses and several good old traces, and also tried a trace which I had started at the previous training - the orange trace over rock crack-like holds. Since I was tired and it was the end of the training, I failed somewhere in the second half, but I feel I can complete it. An excellent training finished with campus-board exercises, a tired body and a very good mood.

/life :: Link / Comments (0)


Towards full zero-copy network support.


I've started the zero-copy sniffer implementation, which is quite straightforward - each node contains a bitmask of free/used chunks and a bitmask of chunks mapped to (and used by) userspace. When some area is mapped, marked as being used and about to be freed, the freeing algorithm checks whether it can free it or not, so freeing can actually be postponed (for an arbitrarily long time). Userspace reads from a special char device a set of structures which describe allocated pointers and their sizes, so it can access the raw data. Writing the same structures to that char device marks the appropriate chunks of memory as mmapped but unused, so they can be freed when needed. Mmap itself is not implemented yet.
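
A minimal sketch of such bookkeeping, assuming at most 64 chunks per node. All names are hypothetical, and the third "deferred" mask is an assumption added here to remember postponed frees; the post only mentions the first two masks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical node: one bit per chunk in each mask. */
struct zc_node {
	uint64_t used;     /* chunk is allocated */
	uint64_t mapped;   /* chunk is exported to userspace and in use there */
	uint64_t deferred; /* a free was requested while the chunk was mapped */
};

void node_alloc(struct zc_node *n, unsigned int idx)
{
	assert(idx < 64);
	n->used |= 1ULL << idx;
}

/* The chunk descriptor has been read by userspace from the char device. */
void node_export(struct zc_node *n, unsigned int idx)
{
	assert(idx < 64 && (n->used & (1ULL << idx)));
	n->mapped |= 1ULL << idx;
}

/* Kernel-side free. Returns 1 if the chunk was freed immediately and
 * 0 if freeing had to be postponed because userspace still uses it. */
int node_free(struct zc_node *n, unsigned int idx)
{
	uint64_t bit;

	assert(idx < 64);
	bit = 1ULL << idx;
	if (n->mapped & bit) {
		n->deferred |= bit;
		return 0;
	}
	n->used &= ~bit;
	return 1;
}

/* Userspace wrote the descriptor back: the chunk is no longer used
 * there. Returns 1 if a postponed free was completed by this call. */
int node_release(struct zc_node *n, unsigned int idx)
{
	uint64_t bit;

	assert(idx < 64);
	bit = 1ULL << idx;
	n->mapped &= ~bit;
	if (n->deferred & bit) {
		n->deferred &= ~bit;
		n->used &= ~bit;
		return 1;
	}
	return 0;
}
```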

/devel/networking/nta :: Link / Comments (0)


Thu, 17 Aug 2006

Kevent.


Short changelog for 'take11' patchset:

  • removed non-existent prototypes
  • added helper function for kevent_registered_callbacks
  • fixed 80 lines comments issues
  • added a header shared between userspace and kernelspace instead of embedding everything in one
  • code restructuring to remove forward declarations
  • s o m e w h i t e s p a c e c o d y n g s t y l e c l e a n u p s
  • use vm_insert_page() instead of remap_pfn_range()
What really demotivates me in this process is the absence of a real vision of what should be done in trivial aspects like spaces and enums vs. defines. For example, the initial code contained enums, then it was suggested that I use defines, and now people tell me to use enums again; the same goes for the type of the underlying device (char, misc, syscall) and so on... That is why I hate the linux-kernel@ mailing list (and of course because of its politics and other floods).

/devel/kevent :: Link / Comments (0)


Wed, 16 Aug 2006

Climbing.


My new shoes are almost ready, so today I tried several old interesting traces and one new complex trace (actually it is very old, but I never completed it in the past), but since the shoes were slowly killing my feet, it was quite hard to climb. I think in a couple of trainings I will be ready for new records, so stay tuned...

/life :: Link / Comments (0)


Kevent.


I've released 'take10' patchset. Changes from 'take9' only contain fix for ->nopage() method.

/devel/kevent :: Link / Comments (0)


Network allocator.


I've released second version of network allocator and sent it to mail lists for review. Short changelog:

  • added dynamically grown cache
  • changed some inline issues
  • reduced code size
  • removed AVL tree implementation from the sources
  • changed minimum allocation size to L1 cache line size (some arches require that)
  • removed skb->__tsize parameter
  • added a lot of comments
  • a lot of small cleanups

As usual patch is available in archive.

/devel/networking/nta :: Link / Comments (0)


Tue, 15 Aug 2006

Network allocator.


After some cleanups it is possible to achieve more than 2460 requests per second with trivial epoll based web server on system with network allocator instead of usual kmalloc/SLAB one for network payload data (for reference: system with kmalloc/SLAB allocator can only handle 1600-1800 requests per second).

/devel/networking/nta :: Link / Comments (0)


Mon, 14 Aug 2006

Climbing.


It was an easy training again - the climbing shoes are not broken in yet, although I already feel much better in them. I expect that in a couple of trainings the shoes will be ready for new records and I will finally start new complex traces, since I have not climbed interesting ones for quite a while. Today I only completed three traces (old and interesting, but I want more) and a couple of boulderings and traverses. The end of the training was devoted to campus-board exercises, and I completed more than usual, since neither arms nor legs were tired enough; only my toes feel the pain.

/life :: Link / Comments (0)


Network tree allocator homepage.


I've created one here. It includes design description, benchmarks, TODO items and all related information.

NTA implementation with design notes has been sent for review. This work was supposed to be funded by an external company, but since they disappeared I will release it the way I want.
Patch is available in archive.

/devel/networking/nta :: Link / Comments (0)


Kevent.


I've released 'take9' patchset. Following issues were resolved:

  • mmap release bug fix
  • use module_init() instead of late_initcall()
  • use better structures for timer notifications
In kevent TODO there is a new addon - block device notifications: create, remove and error.

/devel/kevent :: Link / Comments (0)


Network tree allocator.


The weekend was quite productive: I've completed per-CPU support for NTA, so it is fully per-CPU right now except for one case, when freeing happens on a different CPU than the original allocation; in that case I put the chunk into a queue to be freed on the original CPU.

I've also added support for combined pages, so it is now possible to allocate up to 16k on x86 with the network tree allocator.

While hacking on NTA I've decided to completely drop the tree from the allocator, since struct page has enough room to store a pointer to the node. I'm also working on removing the so-called container cache from the network tree allocator (a container is a structure which holds free chunks in a list), so when those tasks are completed I will do the first release. I expect it to be done today.

Ok, I've removed the container cache entirely, so neither allocation nor freeing requires any kind of allocation anymore (sounds really crazy, but it is true).

There is a problem with extensive struct page usage in the network tree allocator - combined pages use the page->private member as a pointer to the head of the combined pages, while it is a spinlock_t for the mapping code, so it is impossible to map combined pages, and mapping destroys combining. I therefore need to create some tricks with page->lru instead of using stock combining.

Here we go: when a chunk of memory is free, it is stored in a special LIFO list. Since it is free, it is possible to dereference it into the list entry itself without any kind of containers around it, and since each chunk is at least 32 bytes long (it should be the L1 cache line size actually), it is possible to store a doubly linked entry there, so removing as well as lookup of that entry takes O(1) (lookup is just a dereferencing of the pointer into a list entry).
Since each page->lru has two pointers unused (well, they are used by the kernel, but since the allocator is not supposed to return its pages to the kernel, it is perfectly ok to overwrite them), I placed there a pointer to the node and the number of the CPU where that page was allocated. So freeing just gets those pointers and checks whether the CPU it runs on differs from the allocation one. In case it is the same CPU, the node is obtained from page->lru and the appropriate neighbour pointers are calculated, which are then dereferenced into struct list_head and removed from the appropriate lists. Neighbouring chunks are combined and thus fragmentation is greatly reduced.
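To make the "free chunk is its own list entry" idea concrete, here is a minimal userspace sketch. It is not the NTA source: the names (struct free_entry, free_list_push() and friends) are made up for illustration, and it models only the list handling, not the page->lru tricks or the per-cpu part.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_SIZE 32	/* minimum chunk size mentioned in the text */

/* Link stored inside the free chunk itself, so no separate
 * container has to be allocated to track it. (Hypothetical name.) */
struct free_entry {
	struct free_entry *next;
	struct free_entry *prev;
};

/* Two pointers must fit into the smallest chunk. */
_Static_assert(sizeof(struct free_entry) <= CHUNK_SIZE,
	       "list entry does not fit into a chunk");

/* The list head is a sentinel entry, the list is circular. */
static void free_list_init(struct free_entry *head)
{
	head->next = head;
	head->prev = head;
}

/* Put a free chunk at the front of the list (LIFO order), O(1). */
static void free_list_push(struct free_entry *head, void *chunk)
{
	struct free_entry *e = chunk;	/* free memory reused as a link */

	e->next = head->next;
	e->prev = head;
	head->next->prev = e;
	head->next = e;
}

/* Unlink a known free chunk, O(1): the chunk pointer is the entry,
 * so no search through the list is needed. */
static void free_list_remove(void *chunk)
{
	struct free_entry *e = chunk;

	e->prev->next = e->next;
	e->next->prev = e->prev;
}

/* Take the most recently freed chunk, or NULL if the list is empty. */
static void *free_list_pop(struct free_entry *head)
{
	struct free_entry *e = head->next;

	if (e == head)
		return NULL;
	free_list_remove(e);
	return e;
}
```

The point of the doubly linked entry is free_list_remove(): when a neighbour of a freed chunk is found by address arithmetic, it can be taken out of its list without walking the list.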

/devel/networking/nta :: Link / Comments (0)


Fri, 11 Aug 2006

Kevent.


I've created 'take8' patchset. It includes:

  • new mmap interface (not tested, waiting for other changes to be acked)
    • use nopage() method to dynamically substitute pages
    • allocate new page for events only when newly added kevent requires it
    • do not use ugly index dereferencing, use structure instead
    • reduced amount of data in the ring (id and flags), maximum 12 pages on x86 per kevent fd

/devel/kevent :: Link / Comments (0)


Network tree allocator.


Scalability issues.
SLAB allocator is essentially per-cpu - memory being freed stays on the CPU which calls kfree() even if it was not originally allocated on that CPU. From one point of view it is bad (the same address must live in allocation and freeing CPUs caches and so on), but from other point it is very good, since allocator becomes lock-free. Since SLAB allocator by design can only contain chunks of memory of predefined size even from completely different pages, it can not perform any kind of fragmentation avoidance.
Network tree allocator was designed to be able to combine neighbour chunks into a region of bigger size, so when freeing happens the allocator will search for neighbours. So if NTA becomes per-cpu, the allocator must search for neighbours not on the freeing CPU, but on the CPU which was used for allocation, and since it is possible to simultaneously free different chunks which were originally allocated on the same CPU, there must exist some locking between them. Since freeing is allowed to change allocation state - i.e. some chunks of free memory can be removed and combined with other chunks - freeing logic must lock part of the allocation logic (so the allocator would not get a chunk which is going to be combined with the one currently being freed), so basically we need to introduce at least two locks - per free list (all free chunks are combined into FIFO lists) and per node (since the same node can contain chunks of different sizes which can be simultaneously freed on different CPUs). Such complex locking can not be cheap, and the worst thing is that each node must contain a lock, which increases its size from 12 to 36 bytes when debugging is turned off and thus does not fit into a single cache line on a lot of arches. The decision to combine chunks only when freeing happens on the same CPU as allocation is not considered, since it is an unlikely condition, so it would lead to a constant increase of fragmentation.
As practice shows this solution is bad, since there is a problem with locking - the allocation path locks the list of free objects, gets a free chunk, drops the free list lock, locks the corresponding node, updates the node's bitmask, drops the node lock; while the freeing path gets the node from the pointer being freed, locks that node, updates its bitmask, locks the list of free objects of one neighbour, searches for that neighbour, drops the lock, locks the list of objects for the next neighbour, searches for that neighbour, drops the lock and finally drops the lock for the node. This approach has a race.

An interesting idea is not to free objects if freeing happens on a different CPU than allocation, and to put the free object into a queue for freeing on the original CPU. When the CPU where allocation originally happened is going to perform its next freeing or allocation, it can combine those batched objects.
In this scheme there is only tiny locking place when object being freed is going to be placed or removed from queue of "semi-free" objects (i.e. queue of objects allocated on different CPU and thus scheduled for freeing there).
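The "semi-free" queue scheme can be modelled in a few lines. This is a single-threaded userspace sketch under my own assumptions: NR_CPUS, struct cpu_cache, nta_free_model() and the other names are hypothetical, local_free() only counts chunks instead of really combining them, and the places where the tiny lock would be taken are marked with comments only.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4	/* hypothetical, for the model only */

struct chunk {
	int alloc_cpu;		/* CPU the chunk was allocated on */
	struct chunk *next;	/* link in the semi-free queue */
};

struct cpu_cache {
	struct chunk *semi_free;	/* chunks freed on other CPUs */
	unsigned long nr_free;		/* chunks really freed locally */
};

static struct cpu_cache caches[NR_CPUS];

/* Stand-in for the real free-and-combine path. It only ever runs
 * on the CPU that owns the chunk, so it needs no locking. */
static void local_free(struct cpu_cache *c, struct chunk *ch)
{
	(void)ch;
	c->nr_free++;
}

/* Process the chunks other CPUs have queued for this CPU. */
static void drain_semi_free(int cpu)
{
	struct cpu_cache *c = &caches[cpu];
	struct chunk *list;

	/* queue lock would be taken here */
	list = c->semi_free;
	c->semi_free = NULL;
	/* queue lock would be dropped here */

	while (list) {
		struct chunk *next = list->next;

		local_free(c, list);
		list = next;
	}
}

/* Free ch while running on this_cpu. */
static void nta_free_model(int this_cpu, struct chunk *ch)
{
	struct cpu_cache *owner = &caches[ch->alloc_cpu];

	if (ch->alloc_cpu == this_cpu) {
		/* Owner CPU: first combine what others left for us. */
		drain_semi_free(this_cpu);
		local_free(owner, ch);
		return;
	}

	/* Foreign CPU: only queue the chunk, do not touch owner's lists. */
	/* queue lock would be taken here */
	ch->next = owner->semi_free;
	owner->semi_free = ch;
	/* queue lock would be dropped here */
}
```

In this model the only shared data is the semi_free pointer, which is why the locked region can stay that small.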

/devel/networking/nta :: Link / Comments (0)


Thu, 10 Aug 2006

Kevent.


After all optimisations made for kevent, I'm pleased to announce new record of 2500 requests per second.
epoll/kqueue and similar techniques show about 1600-1800 requests per second for single-threaded trivial web server. Actually not, it is old record, current one is 2600+ req/s.

New kevent patchset has been released, its name is 'take7' and the following changes were made after Andrew Morton's review:

  • a lot of comments!
  • do not use list poisoning for detection of the fact, that entry is in the list
  • return number of ready kevents even if copy*user() fails
  • strict check for number of kevents in syscall
  • use ARRAY_SIZE for array size calculation
  • changed superblock magic number
  • use SLAB_PANIC instead of direct panic() call
  • changed -E* return values
  • a lot of small cleanups and indent fixes

I've removed all AIO (both network and VFS) stuff and socket notifications from patchset and resent it to linux-kernel@ and netdev@.

/devel/kevent :: Link / Comments (0)


Wed, 09 Aug 2006

Climbing.


There is only one word, which can describe today's training, and that word is pain.

I've bought new climbing shoes for new records, which are 3 sizes smaller than my usual shoes. That is something like a requirement for complex traces, since feet have better contact with the wall and holds, but the next several trainings will not be easy since the shoes are not broken in yet.
So today's training was simple, although I tried several interesting old traces to check how the new shoes can stay on small holds. The shoes showed that they can do it very well. Training was finished with the usual campus-board exercises, which were not that hard, since my arms did not get tired at all.

/life :: Link / Comments (0)


Network tree allocator performance test.


I've run epoll based web server from kevent testbed and got 2301 requests per second, while with usual code it is about 1600-1800 requests per second.
It can be explained by tons of reasons, but this test clearly shows that network tree allocator can behave not worse and maybe better than usual slab one (all debugging options are turned off) for network traffic allocations.

This test (single-threaded web server and httperf as client) has been run without any SMP performance tuning (and I have one gigantic lock right now around all allocations and freeings, but do not think too bad about my mental abilities, it will be completely eliminated after per-cpu tuning is completed (similar to how it is implemented in SLAB-allocator)).

All changes in the core network stack (not including the allocator itself) consist of *kmalloc()/kfree() replacement in *alloc_skb()/skb_release_data() and the addition of a new field into struct sk_buff which holds the total allocation size, since the ->totalsize variable can be changed while the skb is being processed in the kernel.

/devel/networking/nta :: Link / Comments (0)


Network tree allocator.


While thinking some more about generic tree and hash table data representation, I've come to the conclusion that a tree should be the more appropriate choice for structures which can dynamically grow/shrink with time. For example with the network tree allocator it is a trivial task to add new memory into the cache, and it is an easy task to remove pages (but not trivial, since the AVL-tree removal algo is very complex, although fast (and to be 100% honest with the reader, I want to note that I did not implement it for NTA)), so memory hotplug and various OOM conditions can be handled much more nicely than with a table based approach where parts of the table must be relocated.

The same issue comes to mind with recent changes in network hash table manipulations - dynamic table grow/shrink sometimes requires relocation of the whole table, which can be extremely large. As far as I recall there was a discussion about the tree vs. table approach and the latter was selected, but I do not recall any details anymore. Well, maybe it's time to reimplement the thing... At least for the upcoming fast NAT rework I plan to use trees instead of hash tables to store NAT entries, and since most of my work looks to most people like research-only (it is not actually) projects far from reality (only two of them are in the kernel tree), I can create any crazy schemes I like.

/devel/networking/nta :: Link / Comments (0)


Startup system statistic.


For the interested reader: while the kernel is starting (no network drivers, no NFS) more than 1200 skbs with data are allocated (not including skb cloning; the network adapter is loaded as a module later when userspace is ready). When userspace had started and configured the interface, without a network cable plugged in (no fancy stuff like network console), there were more than 50k skb allocations and freeings and it is still counting...

/devel/networking :: Link / Comments (0)


Kevent.


I've released 'take6' patchset. Following things have been implemented:

  • removed compilation warnings about unused variables when lockdep is not turned on
  • do not use internal socket structures, use appropriate (exported) wrappers instead
  • removed default 1 second timeout
  • removed AIO stuff from patchset

/devel/kevent :: Link / Comments (0)


Tue, 08 Aug 2006

Network tree allocator.


I've moved it into the kernel and made all network traffic to be allocated using it. It is not tuned for SMP performance yet (it requires some per-cpu-alike magic), NTA does not support cache grow when there is requirement for that and context allows and there are no interfaces for the zero-copy networking yet, but the most complex part has been implemented already, although there are some bugs there yet.

After I complete SMP tuning I will run some performance tests and start sending and receiving zero-copy network stack implementation.

/devel/networking/nta :: Link / Comments (0)


Kevent.


I've created fifth patchset. It includes:

  • use miscdevice instead of chardev
  • comment fixes
Patchset has been sent to linux-kernel@ and netdev@ and I've asked for inclusion or declining of the kevent subsystem.

/devel/kevent :: Link / Comments (0)


Mon, 07 Aug 2006

Climbing.


It was relaxing climbing today - no complex traces, no negative slope, just several good old ones and one new completed on-sight. I've added a campus-board exercise for passive endurance improvement, let's see how it will help, although I do not perform it 100% cleanly yet.

/life :: Link / Comments (0)


Network tree allocator.


An additional 100 billion allocations have been done for the network tree allocator. It's time to move it into the kernel.

While hacking on NTA I've created a special SLAB-like 3 layer cache for struct avl_container - special structures used to store pointers to free chunks inside specially crafted FIFO lists.

Now there is only following allocation being done using Linux memory allocation primitives:

  • initial storage structures for AVL trees (i.e. pages of data (which will be reused by tree allocator) and tree nodes) and array of lists of free chunks
  • container cache layers (l1 and l2 are pages, l3 is element of the list, which should be allocated very rarely), which are only allocated when appropriate layer is empty

So in run-time there are no allocations from main memory except rare page-sized allocations to refill container cache. As expected after some short period of time container cache stopped to grow.

An interesting note: after switching to the cache allocator from usual malloc()/free() for containers, general allocation speed has increased.

/devel/networking/nta :: Link / Comments (0)


Sun, 06 Aug 2006

Alignment overhead in Linux networking code.


I was involved in a recent discussion about jumbo frames in the e1000 network adapter - it ends up with a 32k allocation for just 9k jumbo frames, which is a great waste of memory. This happens due to the fact that e1000 does not have an MTU at all, it only has a maximum allowed segment size, and it must be a power of two (with some exceptions), so e1000 rounds 9k to 16k, then the network layer adds sizeof(struct skb_shared_info) at the end and the SLAB allocator aligns it to 32k.

I've designed an IMHO very elegant and simple solution to work around this issue - if the difference between the requested allocation size and the aligned size exceeds sizeof(struct skb_shared_info), the latter will be allocated from a cache. This will immediately solve the problem with the PAGE_SIZE allocation which happens in e1000 for the standard 1500 byte MTU on some chips.
I will implement it later today and send to netdev@ for review.
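The arithmetic behind the 32k allocation, and the rule described above, can be checked with a small userspace sketch. Assumptions are mine: the allocator is modelled as pure power-of-two rounding, the shared info size of 160 bytes used in the usage note is a placeholder rather than the real sizeof(struct skb_shared_info), and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next power of two. This models an allocator
 * whose big size classes are powers of two. */
static size_t roundup_pow2(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Memory really consumed when the shared info is appended to the
 * data area and both are allocated as one object. */
static size_t alloc_size_inline(size_t data, size_t shinfo)
{
	return roundup_pow2(data + shinfo);
}

/* The rule from the text, taken literally: if the gap between the
 * requested size and the aligned size exceeds the shared info size,
 * take the shared info from a separate cache instead. */
static int shinfo_from_cache(size_t data, size_t shinfo)
{
	size_t requested = data + shinfo;
	size_t aligned = roundup_pow2(requested);

	return aligned - requested > shinfo;
}
```

With a 9k frame rounded by the driver to 16384 bytes and a placeholder shared info of 160 bytes, alloc_size_inline(16384, 160) gives 32768, while the data area alone stays at 16384, and shinfo_from_cache(16384, 160) says to use the separate cache.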

/devel/networking :: Link / Comments (0)


Network tree allocator.


Let's see how fragmentation problem is being solved in NTA.
For the initial test I've run NTA with a set of pseudo-random sized allocations until the first allocation fails; when it happens I halve the maximum allocation size. Each graph below shows free and used chunks inside each page (there are 4094 pages), green points correspond to free and red ones to used chunks (each one of 32 bytes).
Maximum allocation size is equal to PAGE_SIZE, failed allocation was for 1912 bytes (60 chunks of 32 bytes):
Fragmentation graph

Maximum allocation size is equal to PAGE_SIZE/2 (decreased after allocation failure), failed allocation was for 968 bytes (31 chunks of 32 bytes):
Fragmentation graph

Maximum allocation size is equal to PAGE_SIZE/4 (halved after each of the first and second allocation failures), last failed allocation was for 504 bytes (16 chunks of 32 bytes):
Fragmentation graph

Maximum allocation size is equal to PAGE_SIZE/8 (halved after each of three allocation failures), last failed allocation was for 252 bytes (8 chunks of 32 bytes):
Fragmentation graph

These tests do not show how fragmentation changes with time, after a lot of allocations and freeings are completed, but even the existing results show that the network tree allocator performs very well. Next time I will run the same tests after some pseudo-random allocation and freeing periods.

For comparison I've run the same test with a power-of-2 slab-like allocator (actually it is much simpler, but it has the same ideas as the SLAB allocator and probably can behave even better if we take big-sized chunks into account). The picture does not change when the maximum allocation size is decreased after allocation failures, since most of the overhead and fragmentation comes from rounding to a power of 2.
SLAB-like power-of-two allocator overhead and fragmentation

This SLAB-like power-of-two allocator overhead and fragmentation actually looks different than on the picture, since almost all allocations have fragmentation overhead, so each vertical line actually must contain several red(used)-green(free or fragmentation overhead) pieces, where sum of all pieces of the same colour will be equal to what is shown on the picture. But picture presents that absolute amount of fragmentation overhead is extremely high for power-of-2 allocators. For the real SLAB allocator picture will be better for small-sized chunks (since chunks never share pools with different sized ones, except when they can steal pages when cache is refilled), but much worse for big-sized ones.
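The per-allocation overhead of the two rounding policies is easy to compare by hand. A small sketch, assuming pure power-of-two rounding on one side and 32-byte chunk granularity on the other (function names are mine, and this ignores any per-object metadata either allocator may keep):

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_BYTES 32	/* NTA granularity used in the tests above */

/* Round n up to the next power of two. */
static size_t roundup_pow2(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Bytes wasted when a request is rounded to a power of two. */
static size_t overhead_pow2(size_t n)
{
	return roundup_pow2(n) - n;
}

/* Number of 32-byte chunks needed for a request. */
static size_t chunks_for(size_t n)
{
	return (n + CHUNK_BYTES - 1) / CHUNK_BYTES;
}

/* Bytes wasted when a request is rounded to whole chunks. */
static size_t overhead_chunks(size_t n)
{
	return chunks_for(n) * CHUNK_BYTES - n;
}
```

For the 1912 byte request from the first graph this gives 60 chunks and 8 wasted bytes against 136 wasted bytes for power-of-two rounding; for a request just above a power of two, say 1100 bytes, it is 20 bytes against 948.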

Difference in used and free chunks position on the pictures is due to the fact, that in network tree allocator chunks in page are shown in reverse order (i.e. higher addresses are first).

/devel/networking/nta :: Link / Comments (0)


Sat, 05 Aug 2006

Network tree allocator.


First test stage completed - more than 52 billion allocations of different sizes (currently tested only from 1 to PAGE_SIZE with 32 byte granularity) have been done, so roughly it is correct. To prove correctness further I plan to start a second testing stage, which will include much faster freeing (the test will keep pointers to allocated objects inside an array, whose indexes will be structured to not contain any gaps) and full utilisation of allocated page pools, so if an allocation fails, it will be restarted with a smaller size until all of the allocator's memory is used. That will give interesting statistics of memory usage and fragmentation in the tree allocator. I will also include a periodic dump of bitmasks of free and used objects so it will be possible to visually observe fragmentation issues.

I've found a way to determine if a given address belongs to a PAGE_SIZEd chunk or to a bigger contiguous region - it is done quite simply by looking at page->lru.next and page->private, which can be used to detect compound pages and their order. This solves the problem of converting a freed address into the page which holds the freed area.

/devel/networking/nta :: Link / Comments (0)


Added second photo album from canoe trip to gallery.


One can find Eugene Burnyakov (Wijo) and Alexandra, Alexander Boykov (mephody) and (his wife already) Irina there.
Enjoy!

/life :: Link / Comments (0)


Kevent.


I've released fourth kevent patchset. Changes from the previous 'take3' patchset:

  • removed serializing mutex from kevent_user_wait()
  • moved storage list processing to RCU
  • removed lockdep screaming - all storage locks are initialized in the same function, so it was learned to differentiate between various cases
  • remove kevent from storage if it is marked as broken after callback
  • fixed a typo in mmaped buffer implementation which would end up in wrong index calculation

I've sent it to linux-kernel@ and netdev@ for review. As far as I recall there are no issues which must be fixed or changed anymore.

/devel/kevent :: Link / Comments (0)


Fri, 04 Aug 2006

Excellent climbing day.


Amount of alcohol in my blood became optimal, so it was extremely good training today - a lot of old traces (simple and more complex, on the vertical wall and with negative slope, one trace with quite a big dynamic jump; on another trace I've broken a hold, but not because I'm that strong or heavy, it was already slightly broken, I just completed the process), several traverses, campus-board exercises, sauna, good weather, a not bad supper and I feel just damn good.
Trainings become simpler - I did not try new traces quite for a long time already (although two new traces I finished, but they are not very complex), and almost do not climb on negative slope, which is not very good. So things need to be changed soon.

/life :: Link / Comments (0)


Lockdep and kevent.


I've enabled lockdep and found that it reports [ INFO: inconsistent lock state ] when the system boots. After looking into the lockdep code and its lock initialization, I've come to the following conclusion: lockdep initializes a special key for each type of lock, i.e. it uses a tricky macro which inserts a static variable just before lock initialization inside spin_lock_init(), so if the same function is used to initialize different locks, lockdep will think that all those locks are the same, and if later one of them is taken with BH disabled and others are taken without any irqs disabled, it will fire.
I see exactly that: lockdep fires on kevent_storage_ready(), which is called both in softirq context and in process context without BH being disabled. It happens when inode and socket are going to check their queues, but there is no way the same inode can be used both for a socket and something else. Actually it fires even when lockdep was taught to differentiate between the inode and socket cases - there is a case when a socket is closed and the struct file associated with it is being freed: kevent calls kevent_storage_fini() to flush all pending kevents, which always happens in process context, and lockdep signals that it is possible (since BH are enabled) that the same lock can be obtained by socket code called from BH context. It is not true, since when kevent_storage_fini() is called the appropriate socket is already removed from the socket table and it can not be accessed from softirqs.

The only sane way I see is to reinit lockdep after it has initialized locks, and reinitialization must happen at the high level (i.e. not in kevent_storage_init(), since it does not know what storage is being initialized, and even the same storage owner can be used for different processing (like an inode being used for both socket and VFS notifications)), for example in kevent_socket_enqueue() or kevent_inode_enqueue().
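The effect of the static key inside the init macro can be reproduced in plain userspace C. This is only a model of the mechanism described above, not lockdep code: struct model_lock, model_lock_init() and the three init helpers are hypothetical names.

```c
#include <assert.h>

/* One object of this type identifies one lock class. */
struct lock_class_key {
	char dummy;
};

struct model_lock {
	const struct lock_class_key *key;	/* class the lock belongs to */
};

/* Every textual expansion of this macro gets its own static key,
 * so the class of a lock is decided by the place in the source
 * where it was initialized, not by the lock object itself. */
#define model_lock_init(l)				\
	do {						\
		static struct lock_class_key key_;	\
		(l)->key = &key_;			\
	} while (0)

/* Shared helper: a single expansion, so every lock initialized
 * through it ends up in the same class. */
static void storage_init(struct model_lock *l)
{
	model_lock_init(l);
}

/* Re-initialization at a higher level, where the owner type is
 * known: two expansions, two distinct classes. */
static void socket_storage_init(struct model_lock *l)
{
	model_lock_init(l);
}

static void inode_storage_init(struct model_lock *l)
{
	model_lock_init(l);
}
```

Two locks set up through storage_init() compare equal by key even if one belongs to a socket and the other to an inode, which is the situation where a validator keyed this way would mix up their usage contexts. Re-initializing them in socket_storage_init() and inode_storage_init() separates the classes.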

While testing kevents it looks like the impossible thing happened - I've found two bugs: one was introduced when the mapped buffer was implemented (it is a typo in a define, which ends up in wrong calculation of the offset inside the buffer), and the second one was created when I removed a mutex between copying of events into userspace (after they are marked ready and it was detected) and controlling operations, which could end up in a race between removing and waiting code.
I plan to test some RCU ideas inside kevent code (this actually shuts up lockdep, but there were some problems when I first time tried to use RCU with kevents) and complete lockdep related changes tomorrow.

/devel/kevent :: Link / Comments (0)


Network tree allocator.


I've started stress testing for new allocator.
The first one is quite simple - the system tries to allocate a lot of chunks of random size; when there is no memory or the number of allocated chunks exceeds some threshold (1 million allocations), the system starts to free them one by one from the beginning. It is quite a slow test, since the test's freeing logic (do not confuse it with the freeing logic inside the allocator) runs through the whole array (currently it contains 1 million entries) of allocated chunks and tries to free them all.

/devel/networking/nta :: Link / Comments (0)


Thu, 03 Aug 2006

TODO.


I've created development TODO list, feel free to send ideas and beer.

/devel :: Link / Comments (0)


Kevent.


I've tested all modifications made before and added an optimisation, which is aimed to help when a lot of kevents are being copied from userspace. I also plan to implement initial mapped buffer today and send the whole patchset as take3 version.

Ok, I've completed initial mapped buffer implementation and sent take3 patchset to linux-kernel@ and netdev@ for review.
Mapped buffer implementation is quite simple - when a kevent user queue is created, the system attaches a set of pages to the queue, so the whole queue (not kernel kevents, but user request structures called struct ukevent) can be placed there (maximum allowed queue length is KEVENT_MAX_EVENTS and is equal to 4096 events). Since the size of struct ukevent is 40 bytes on every arch, they do not exactly fill the page, so I use 4 bytes at the beginning of the first one to store the number of ready events placed into the buffer. Events are placed into the buffer when they are queued into the ready queue under kevent_user->ready_lock, so updates are always atomic; the index update happens after the event has been placed into the buffer.
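A possible layout for such a buffer can be sketched as follows. The 40 byte event size, the 4 byte counter and the 4096 event limit come from the text; everything else is my assumption: a 4096 byte page, events that never cross a page boundary, the same number of events in every page, and the names struct slot, slot_for_event() and pages_needed().

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_MODEL		4096u	/* assumed page size */
#define UKEVENT_SIZE		40u	/* sizeof(struct ukevent) per the text */
#define KEVENT_MAX_EVENTS	4096u
#define HEADER_SIZE		4u	/* ready counter in the first page */

/* Assumption: every page holds as many events as the first one,
 * which has 4 bytes less room because of the counter. */
#define EVENTS_PER_PAGE	((PAGE_SIZE_MODEL - HEADER_SIZE) / UKEVENT_SIZE)

struct slot {
	unsigned int page;	/* index of the page in the buffer */
	unsigned int offset;	/* byte offset of the event in that page */
};

/* Where the event with the given index lives in the buffer. */
static struct slot slot_for_event(unsigned int idx)
{
	struct slot s;

	s.page = idx / EVENTS_PER_PAGE;
	s.offset = (idx % EVENTS_PER_PAGE) * UKEVENT_SIZE;
	if (s.page == 0)
		s.offset += HEADER_SIZE;	/* skip the counter */
	return s;
}

/* Pages needed to hold nr_events events. */
static unsigned int pages_needed(unsigned int nr_events)
{
	return (nr_events + EVENTS_PER_PAGE - 1) / EVENTS_PER_PAGE;
}
```

Under these assumptions a page holds 102 events and a full queue of KEVENT_MAX_EVENTS events needs 41 pages. A writer following the ordering from the text would copy the event into its slot first and only then increase the counter.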

/devel/kevent :: Link / Comments (0)


Wed, 02 Aug 2006

Climbing.


Weekend parties stuff still plays in my blood, so today's training was quite hard. I've only completed three old not that complex traces and couple of traverses. All traces which I previously could finish without the rest in between today were completed with falls, since I tired very quickly, so I even have not started man's start and campus-board exercises. But nevertheless I've spent a good time.

/life :: Link / Comments (0)


Kevent.


I've completed most of the issues Zach Brown (zach.brown@oracle.com) mentioned in his review of kevents, main are:

  • split kevent_finish_user() to locked and unlocked variants
  • do not use KEVENT_STAT ifdefs, use inline functions instead
  • use array of callbacks of each type instead of each kevent callback initialization
  • changed name of ukevent guarding lock
  • use only one kevent lock in kevent_user for all hash buckets instead of per-bucket locks
  • do not use kevent_user_ctl structure instead provide needed arguments as syscall parameters
  • various indent cleanups
New patchset has not been tested yet (only booted with new kernel), so I will release it tomorrow after several checks. Since it will not contain the mapped buffer implementation, its name will be 'take2' instead of 'take3'.
David Miller has an opinion, that we can completely disable possibility to get events through syscall, and always get them from mapped buffer, but until glibc guys like kevent and complete support for mapped buffer it is not a real solution, so at least for now I plan to use them in parallel.

/devel/kevent :: Link / Comments (0)


Network tree allocator.


Userspace model has been completed.
I have not run stress tests yet, but it already can allocate set of objects and combine them back when they are freed. Let's see an example:

  • allocate 120 bytes
  • allocate 200 bytes
  • allocate 70 bytes

  • free 120 bytes
  • free 70 bytes
  • free 200 bytes
This ends up in the following sequence:
  • get one page
  • split it to 120 and PAGE_SIZE-120 bytes parts
  • mark PAGE_SIZE-120 bytes part as free and move its container into new list
  • split PAGE_SIZE-120 into 200 and PAGE_SIZE-120-200 bytes parts (second allocation)
  • mark PAGE_SIZE-120-200 bytes part as free and move its container into new list
  • split PAGE_SIZE-120-200 into 70 and PAGE_SIZE-120-200-70 bytes parts (third allocation)
  • mark PAGE_SIZE-120-200-70 bytes part as free and move its container into new list

  • free first chunk (120 bytes) - it was first in the page and it does not have free neighbours (above 200 bytes allocation was done right after this chunk and it is used)
  • allocate new container and add it into the list for 120 bytes (aligned to AVL_MIN_SIZE actually) free objects
  • free third chunk (70 bytes) - it has a big free area at the left (note that I'm talking about little endian here) of PAGE_SIZE-120-200-70 bytes and the second allocated (currently used) chunk of 200 bytes at the right, so this freeing reuses the container from the left chunk and moves it into the list for PAGE_SIZE-120-200-70+70 bytes
  • free 200 bytes chunk - it has both free neighbours: with PAGE_SIZE-120-200-70+70 bytes and 120 bytes. So it will reuse container for 120 bytes, so it results in PAGE_SIZE chunk.
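The same sequence can be replayed on a toy one-page model. This is not the NTA code: there are no containers or free lists here, just a per-chunk "used" map with first-fit allocation, and all names are made up. It is enough to check the sizes: requests are rounded to 32 byte chunks and neighbouring free areas merge back into a full page.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BYTES	4096
#define CHUNK_BYTES	32
#define NR_CHUNKS	(PAGE_BYTES / CHUNK_BYTES)	/* 128 */

/* One byte per chunk for clarity, 1 means the chunk is in use. */
static uint8_t used[NR_CHUNKS];

/* Number of chunks needed for a request of the given size. */
static int chunks_for(int bytes)
{
	return (bytes + CHUNK_BYTES - 1) / CHUNK_BYTES;
}

/* First-fit allocation; returns byte offset in the page or -1. */
static int page_alloc(int bytes)
{
	int need = chunks_for(bytes), run = 0, i;

	for (i = 0; i < NR_CHUNKS; i++) {
		run = used[i] ? 0 : run + 1;
		if (run == need) {
			int start = i - need + 1;

			memset(&used[start], 1, need);
			return start * CHUNK_BYTES;
		}
	}
	return -1;
}

/* Free a previously allocated region; neighbouring free chunks
 * become one contiguous free run automatically. */
static void page_free(int offset, int bytes)
{
	memset(&used[offset / CHUNK_BYTES], 0, chunks_for(bytes));
}

/* Size in bytes of the largest contiguous free area. */
static int largest_free_run(void)
{
	int best = 0, run = 0, i;

	for (i = 0; i < NR_CHUNKS; i++) {
		run = used[i] ? 0 : run + 1;
		if (run > best)
			best = run;
	}
	return best * CHUNK_BYTES;
}
```

Allocating 120, 200 and 70 bytes takes 4, 7 and 3 chunks at offsets 0x0, 0x80 and 0x160. Freeing 120 and then 70 bytes leaves a largest free area of 3744 bytes, and freeing the 200 byte chunk brings it back to 4096.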

For those who like to look at unknown logs and symbols (like I do, especially new dmesgs), here is the debug output from the initial implementation of the network tree allocator, produced by the steps described above:
PAGE_SIZE: 4096, max nodes: 4094, node size: 32.
avl_update_node: node: 0x523800, value: 02554000, ptr: 0x2554000, cpos: 127, size: 128, num: 4.
avl_fill_bits: num: 4, pos: 0, idx: 0, p: f, start: 0, stop: 4, fffffffffffffff0.
avl_update_node: reuse container 0x2555f60 in pos 123 with ptr 0x2554080.
main: allocated ptr: 0x2554000, size: 120.

avl_update_node: node: 0x523800, value: 02554000, ptr: 0x2554080, cpos: 123, size: 224, num: 7.
avl_fill_bits: num: 7, pos: 4, idx: 0, p: 7f0, start: 4, stop: 11, fffffffffffff800.
avl_update_node: reuse container 0x2555f60 in pos 116 with ptr 0x2554160.
main: allocated ptr: 0x2554080, size: 200.

avl_update_node: node: 0x523800, value: 02554000, ptr: 0x2554160, cpos: 116, size: 96, num: 3.
avl_fill_bits: num: 3, pos: 11, idx: 0, p: 3800, start: 11, stop: 14, ffffffffffffc000.
avl_update_node: reuse container 0x2555f60 in pos 113 with ptr 0x25541c0.
main: allocated ptr: 0x2554160, size: 70.

avl_free: ptr: 0x2554000 [02554000], pos: 0, sbits: 4, size: 120.
avl_fill_bits: num: 4, pos: 0, idx: 0, p: f, start: 0, stop: 4, ffffffffffffc00f.
avl_combine: lp: (nil), lbits: 0, lc: (nil), rp: (nil), rbits: 0, rc: (nil), 
	current: ptr: 0x2554000, bits: 4, combined: ptr: 0x2554000, idx: 3, cont: (nil).
avl_combine: Added new container for pointer 0x2554000, size: 128.
main: freed ptr: 0x2554000.

avl_free: ptr: 0x2554160 [02554000], pos: 11, sbits: 3, size: 70.
avl_fill_bits: num: 3, pos: 11, idx: 0, p: 3800, start: 11, stop: 14, fffffffffffff80f.
avl_free: found free left neighbour at 0x25541c0, bits: 114.
avl_combine: lp: 0x25541c0, lbits: 114, lc: 0x2555f60, rp: (nil), rbits: 0, rc: (nil), 
	current: ptr: 0x2554160, bits: 3, combined: ptr: 0x2554160, idx: 116, cont: 0x2555f60.
avl_combine: Using existing container for pointer 0x2554160, size: 3744.
main: freed ptr: 0x2554160.

avl_free: ptr: 0x2554080 [02554000], pos: 4, sbits: 7, size: 200.
avl_fill_bits: num: 7, pos: 4, idx: 0, p: 7f0, start: 4, stop: 11, ffffffffffffffff.
avl_free: found free left neighbour at 0x2554160, bits: 117.
avl_free: found free right neighbour at 0x2554000, bits: 4.
avl_combine: lp: 0x2554160, lbits: 117, lc: 0x2555f60, rp: 0x2554000, rbits: 4, rc: 0x2555f80, 
	current: ptr: 0x2554080, bits: 7, combined: ptr: 0x2554000, idx: 127, cont: 0x2555f80.
avl_combine: Using existing container for pointer 0x2554000, size: 4096.
main: freed ptr: 0x2554080.

Completed.

/devel/networking/nta :: Link / Comments (0)


Tue, 01 Aug 2006

Network tree allocator.


While creating various bitfield operations I've found that several existing Linux kernel ones are way too suboptimal, for example set_bit_string() and __clear_bit_string() on x86_64 (actually I have not looked at other arches). And I'm not talking about assembler optimisations, but about usual C ones.
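To illustrate the kind of plain C optimisation meant here, below is a sketch of setting a run of bits one bit at a time versus one word at a time. Both functions and their names are written for this example only, they are not the kernel's implementations.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)

/* Straightforward version: one bit per loop iteration. */
static void set_bits_naive(unsigned long *map, size_t start, size_t len)
{
	size_t i;

	for (i = start; i < start + len; i++)
		map[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
}

/* Word-at-a-time version: one mask per touched word. */
static void set_bits_words(unsigned long *map, size_t start, size_t len)
{
	size_t end = start + len;	/* one past the last bit */

	while (start < end) {
		size_t word = start / BITS_PER_WORD;
		size_t bit = start % BITS_PER_WORD;
		size_t n = BITS_PER_WORD - bit;	/* bits left in this word */
		unsigned long mask;

		if (n > end - start)
			n = end - start;

		/* A full-word shift would be undefined, handle it apart. */
		if (n == BITS_PER_WORD)
			mask = ~0UL;
		else
			mask = ((1UL << n) - 1) << bit;

		map[word] |= mask;
		start += n;
	}
}
```

For a run of a few hundred bits the second version touches each word once instead of once per bit; clearing works the same way with the mask inverted.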
So right now I'm a bit torn between kevents and friends, tree allocator, slacking and paid work (yep, I need to work not less than 8 hours every day to get some beer and other goodies), but I plan to complete the userspace implementation very soon, since most of the things are already implemented.

/devel/networking/nta :: Link / Comments (0)


Kevent.


I've released second patchset and sent it to netdev@ and linux-kernel@ for review.
It still contains AIO and aio_sendfile() implementation on top of get_block() abstraction, which was decided to postpone for a while (it is simpler right now to generate patchset as a whole, when kevent will be ready for merge, I will generate patchset without AIO stuff).

It does not contain the mapped buffer implementation, since its design is not 100% completed; I will present that implementation in the third patchset.

Changes from previous patchset:

  • rebased against 2.6.18-git tree
  • removed ioctl controlling
  • added new syscall kevent_get_events()
  • use old syscall kevent_ctl() for creation/removing, modification and initial kevent initialization
  • use mutexes instead of semaphores
  • added file descriptor check and return error if provided descriptor does not match kevent file operations
  • various indent fixes
  • removed aio_sendfile() declarations

/devel/kevent :: Link / Comments (0)