Thu, 31 Aug 2006
Back from Scandinavian trip.
I've just returned from a short 4-day vacation in Finland and Sweden.
If you want my brief opinion on how it was, I can say that it was
an extremely good time.
Day 1.
Moscow - St. Petersburg train trip.
Since the train ran the whole day (from 10:30 to 19:00), we took some beer
and a book on Confucianism. We had very interesting neighbours: Shvechikov Alexey Nikolaevich,
a professor of the history of religion, and his wife. We had a very interesting discussion
about the technical world and history, about Confucianism and various religions,
about the "Russian Way" and whether it exists at all, how the intellectual development of the country and the nation
happened and where it will end up. He wrote a book, "Religion. History. Science. History of
Western civilization. Experience of history-methodological research.", about
the evolution of civilization and about where and how the Russian way was created and changed (I still do not agree with him:
I think that, at least right now, we do not have any such "way" except consuming natural resources
and stealing from each other). That book will be available in January.
When we were in St. Petersburg we found an extremely good bookshop, Bukvoed:
a gigantic selection of books, a nice cafe where you can read them, nice music... I definitely recommend
it to everyone who will be there; it is situated right near Uprising Square (ploshad' Vosstaniya).
Day 2.
We set off for Helsinki at about 1 A.M. and arrived early in the morning (about 8:00 or 9:00).
We found that either the weekend was still going on on Monday or people in Helsinki start
work later: there were almost no people on the streets (although there are not many
of them at all, about 5.5 million in the whole country, which is about half
the population of Moscow).
I think Finland has built communism (or something very good, at least): all the people look very happy,
and they respect each other and the rules (for example, if you even try to cross the street, the vast majority
of drivers will stop even if there is no pedestrian crossing). Almost everyone knows English (and the two
national languages, Finnish and Swedish). We had a small dinner and paid only for the beer and the
(very big) main dish; garnish, salad, soup, tea and coffee were all free.
I have never seen such restaurants in Moscow, although some of them do provide a kind of buffet table.
Drivers are very calm: no honking, no racers; if there is a 30 km/h speed limit, then everyone
drives at that speed. I did not see any traffic jams there.
I saw a police car only three times, and there is no road police there (at least in Helsinki).
It is an extremely good country, but all its advantages are completely destroyed by one main disadvantage (for me at least):
I like the Russian type of women much more than the Scandinavian one :), so I will not move there, at least for now.
We moved to Sweden from the Finnish city of Turku on the gigantic ship "Silja Europe", which is really a small
city in the sea: it has just about everything you can imagine.
Day 3.
Sweden. We arrived in Stockholm early in the morning and had a short tour of the city.
I especially liked the small sailing ships moored not far from the City Hall.
We visited the "Wasa" museum (which was the original goal of my trip), a museum devoted entirely
to one ship built in the 17th century (it sank on its maiden voyage); it was extremely interesting.
Then Grange and I put out to sea on
the pride of the Swedish Grand Royal Navy: a canoe rented on Djurgården island ("Animal island", sorry if I got it wrong).
The voyage ended successfully an hour later. While we were afloat, some tourists took photos of us, naively
thinking that we were Stockholm natives.
While walking through Stockholm's streets, we found an interesting bar called "Oliver Twist" (it looks like
an English pub in Stockholm), where we
had dinner and drank some Swedish beer. It was not bad and had a very interesting taste,
but (as with the Finnish one) I do not like it. The bar itself was a really nice place. If you want to find it,
go left from the King's palace to the neighbouring island, then go up over
a very busy street, and you will find it somewhere not far from the local church (well, not a very
helpful description, but I cannot remember or even read Swedish street names, although
they look simpler than Finnish ones). On the way back we did not walk along the central pedestrian street
(it looks like that is really the place where only tourists walk, like the Arbat in Moscow),
but along another very busy street, where we found that life in Stockholm is exactly the same as in Moscow:
busy crowds of people, racers and bad drivers on the roads (although not as many as on Moscow streets)
and so on. The usual busy life of a usual busy city, not something new like Helsinki.
There are a lot of cyclists in Helsinki and Stockholm; they are full members of the traffic flow,
and there are a lot of dedicated bicycle lanes running parallel to footpaths and car roads. I found
that cyclists do not stop (or even try to stop) when you cross their lane, which can be quite
dangerous.
We moved back to Finland on the "Silja Festival" ship. It is slightly smaller than "Silja Europe",
but it is still a city in the sea, a very powerful construction on the water.
Day 4.
We spent several hours in Helsinki, went to the far outskirts of town and found that the residential districts
are built right in the forest: small buildings of three or four floors, most of which are even
shorter than the surrounding pine trees.
Then we went back to St. Petersburg and sat in the book cafe, where I bought
a book about Russian history from the 9th to the 20th century, and then went on to Moscow,
where I am writing this story.
Expect a lot of photos soon!
It was a really good journey to a completely different world. I enjoyed it very much.
/life :: Link / Comments (0)
Sat, 26 Aug 2006
Zero-copy sending and receiving support.
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
16:47:27.233768 IP10 truncated-ip - 256 bytes missing! 192.168.4.78 > 192.168.0.48: udp
0x0000: abab 0578 abab abab ab11 abab c0a8 044e
0x0010: c0a8 0030 abab abab abab abab abab abab
0x0020: abab abab abab abab abab abab abab abab
0x0030: abab abab abab abab abab abab abab abab
0x0040: abab abab abab abab abab abab abab abab
0x0050: abab
This is a datagram sent with zero-copy and captured on the receiving side. As you can see,
it is perfectly correct (i.e. it contains exactly the IP and higher-layer data
that was filled in by userspace on the sending side).
I've also cleaned up the zero-copy mapping support a lot, so there should no longer
be situations where an allocation is not caught because of mmap troubles
(like crossing into a different CPU's mapping and so on).
I also moved the notification about new packet arrival in the zero-copy sniffer into the
freeing function: when it was placed in the allocation function, userspace could
find a new buffer before it had even been filled by the kernel. When a buffer is being freed,
it obviously already contains data (except when the allocated object
was not used at all).
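A userspace model of this ordering may make it clearer. It is only a sketch: the names `zc_desc`, `zc_ring`, `zc_on_alloc()` and `zc_on_free()` are hypothetical and not taken from the patch, and the ring ignores the memory barriers a real shared kernel/userspace ring would need. The allocation hook deliberately publishes nothing; a descriptor reaches the sniffer only from the free hook, when the chunk already holds its final data.

```c
#include <stdint.h>

/* Hypothetical descriptor handed to the userspace sniffer. */
struct zc_desc {
    uint32_t node;    /* node the chunk lives in */
    uint32_t offset;  /* byte offset of the chunk inside that node */
    uint32_t size;    /* bytes of data in the chunk */
};

#define ZC_RING_SIZE 8  /* power of two, so indices wrap with a mask */

/* Single-producer/single-consumer ring shared with the sniffer.
 * Model only: a real shared ring needs memory barriers. */
struct zc_ring {
    struct zc_desc slot[ZC_RING_SIZE];
    unsigned int head;  /* next slot the kernel side writes */
    unsigned int tail;  /* next slot the sniffer reads */
};

static int zc_publish(struct zc_ring *r, const struct zc_desc *d)
{
    if (r->head - r->tail == ZC_RING_SIZE)
        return -1;  /* ring full: the event is dropped */
    r->slot[r->head & (ZC_RING_SIZE - 1)] = *d;
    r->head++;
    return 0;
}

/* Sniffer side: fetch the next descriptor, -1 if there is none. */
int zc_consume(struct zc_ring *r, struct zc_desc *out)
{
    if (r->head == r->tail)
        return -1;
    *out = r->slot[r->tail & (ZC_RING_SIZE - 1)];
    r->tail++;
    return 0;
}

/* Allocation hook: the buffer is still empty, so nothing is published. */
void zc_on_alloc(struct zc_ring *r, const struct zc_desc *d)
{
    (void)r;
    (void)d;
}

/* Free hook: the chunk now holds its final contents, so report it. */
int zc_on_free(struct zc_ring *r, const struct zc_desc *d)
{
    return zc_publish(r, d);
}
```

Publishing on free also means that an allocation which was never filled (the exception mentioned above) still shows up when it is freed, so the consumer has to tolerate chunks without meaningful data.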
In general, the zero-copy sniffer cannot catch data changes that happen somewhere inside the
main processing code. For example, an IPsec packet cannot be caught decrypted,
since the packet stays in that transient state only for a very short time after
receipt and decryption. In such cases the transient state must be copied,
for example by making a new allocation (whose freeing will be caught by the sniffer),
copying the memory into it and freeing it immediately.
There is still a small problem with freeing, caused by the addition of struct skb_shared_info,
but it is not really that complex, so I will postpone it for a while and
try to implement a trivial dump analyzer.
I almost forgot: you can find the current patch and userspace utilities in the archive.
/devel/networking/zcs :: Link / Comments (0)
Fri, 25 Aug 2006
Kevent.
I've released the 'take14' patchset. Short changelog:
- do not take a lock around the user data check in __kevent_search()
- fail early if there were no registered callbacks for given type of kevent
- trailing whitespace cleanup
/devel/kevent :: Link / Comments (0)
Wed, 23 Aug 2006
How to get an IP address.
There are some situations when DHCP cannot be used, but a board must still obtain an IP address,
for example when you have an embedded device that is going to be sold into an environment where
there are no DHCP servers, but where you can set up your own applications.
One solution is to reinvent a DHCP server, but the new environment may actually
have a DHCP server that should not assign an address to the board (read: no one wants
to notify people when a new system has been installed on the network and steals an IP address
from some pool preallocated long ago).
Briefly, you need to implement something that fits into 500kb of flash,
can obtain addresses from the outside, and does not use DHCP.
I decided to use multicast: the board sends multicast notifications, and a userspace application receives
them and sends back information with IP addresses. At first glance
it looks simple, but let's enter the wonderland of Linux inet devices.
To be able to send data there must be a default route, which can be set up from the initrd
using rtnetlink. The rtnetlink new route command requires that there exist
a so-called inet device (a structure that contains IP address information for a given
network device), which must have an IP address, which in turn can be assigned
through rtnetlink. But assigning a new address requires that the device in question
be up, and bringing it up cannot be done through rtnetlink; it only works through ioctl().
So, if you try to send and receive some data over multicast
(for example to get an IP address) from the initrd, you must:
- bring the interface up: read its flags with ioctl(SIOCGIFFLAGS), add IFF_UP and write them back with ioctl(SIOCSIFFLAGS)
- set some IP address (DHCP sets 255.255.255.255) using the rtnetlink RTM_NEWADDR command
- set a route for the given multicast group using the rtnetlink RTM_NEWROUTE command
The application that does exactly what is described above,
compiled with gcc-3.4.4 for PPC32,
fits into 500kb (although there is almost no space left for anything else, unless that
"something else" does not touch the network).
That is what I was doing all day today at my paid job (well, not exactly that: I was putting tons of printk()
calls in to determine why I only get "RTNETLINK answers: No such device" and other errors from the initrd, while I can easily
set up a new route from normal userspace started over NFS (after DHCP address resolution)).
And of course I participated in the seemingly endless flood on linux-kernel@
about kevents.
/devel/networking :: Link / Comments (0)
Kevent.
The discussion about kevents on linux-kernel@ has come to an interesting point. Here are a couple of quotes:
Go fuck yourself
...
In a decent society you would have your nose broken
and the like.
I think it is obvious that this is a highly professional discussion.
I've released the 'take13' patchset. Short changelog:
- remove non-chardev interface for initialization (Christoph Hellwig)
- use pointer to kevent_mring instead of unsigned longs (Christoph Hellwig)
- use aligned 64bit type in raw user data (can be used by high-res timer if needed)
- simplified enqueue/dequeue callbacks and kevent initialization (based on work by Eric Dumazet)
- use nanoseconds for timeout
- put number of milliseconds into timer's return data
- move some definitions into user-visible header
- removed filenames from comments
Let's see what new words it will bring to the linux-kernel@ readers.
/devel/kevent :: Link / Comments (0)
Tue, 22 Aug 2006
Zero-copy networking.
I've implemented initial zero-copy sending support based on the
network allocator.
Here is the tcpdump output:
tcpdump: listening on lo, link-type EN10MB (Ethernet), capture size 96 bytes
20:55:13.709761 IP0 [|ip]
0x0000: 0000 0000 0000 0000 0000 0000 0800 ..............
There is a problem though: I mmap data only for the CPU0-bound allocator, but the
allocation may happen on a different CPU, in which case the data will be wrong (that is what
you see above; there should not be any zeroes).
This problem should be fixed by a proper protocol between the userspace sniffer and the network
allocator (what is currently used cannot be called a protocol at all).
Since I introduced an ->ioctl() method anyway, I will use appropriate commands
there.
/devel/networking/zcs :: Link / Comments (0)
Mon, 21 Aug 2006
Climbing.
Easy training today: I damaged a finger at the previous training,
so there was no major progress today, just several old
routes and a couple of traverses. I even repeated my first complex
route, which I completed about half a year ago, and completed it
quite easily: either it was changed a bit to be simpler,
or my progress is not standing still. I also completed the
"jumping" route: I changed its path to make a small dynamic jump (several
tens of centimetres high) instead of a loop;
although no one likes it, I definitely prefer it.
The route over rock-crack holds was not completed today: I failed
several times, even in the middle. The shoes are not ready yet,
my finger is aching, and generally I do not complete routes of that difficulty
on a negative (even slight) slope without major training.
So, no record yet, but they are coming.
/life :: Link / Comments (0)
Kevent.
I've released the 'take12' patchset. Short changelog:
- include missing headers in the patchset
- some trivial code cleanups (use goto instead of if/else games and so on)
- some whitespace cleanups
- check for the ready_callback() callback before the main loop, which should save us some ticks
P.S. Actually, I do not see the kevent integration process coming to its final end (no matter whether it is inclusion
or rejection)...
/devel/kevent :: Link / Comments (0)
Zero-copy sniffer.
I've completed an entirely zero-copy sniffer based
on the network (formerly tree) allocator.
I've sent the whole patchset to netdev@ for review. The patch
and the userspace utility can be found in the archive.
Design notes.
The network allocator steals pages from the main system allocator and uses them
for all network allocations (its benefits are beyond the scope of this
zero-copy sniffer description; network allocator features can be found
on the project's homepage),
so it is possible to mmap all stolen pages
into userspace and to provide userspace with a special structure for each allocated chunk,
which includes the offset from the beginning of the node (each
node contains a contiguous page-aligned memory region), the node number and
other info. Since the network allocator tracks the number of users of each
memory region, when the last one (for example the userspace sniffer) finishes
processing the data, it must commit that area back to the allocator,
so NTA relies on correct values being returned from userspace. If a chunk returned
from userspace is not valid, it will not be freed; and if
userspace does not "free" chunks (by sending info about them back to the
kernel), eventually the maximum allowed number of shared free regions is
reached and no more data will be sent to userspace (or allowed to
be shared).
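The descriptor handshake described above can be modelled in a few lines of C. Everything here is an assumption made for illustration: the names `nta_chunk_info`, `nta_share_chunk()`, `nta_commit_chunk()` and `nta_chunk_ptr()`, the node size and count, the back-to-back layout of nodes in the mapping, and the cap on shared chunks are all hypothetical. The validity check only bounds-checks a descriptor, whereas the real allocator can also verify that the chunk is actually allocated and shared.

```c
#include <stddef.h>
#include <stdint.h>

#define NTA_NODE_SIZE  (64u * 1024u)  /* hypothetical bytes per node */
#define NTA_NODES      4u             /* hypothetical number of nodes */
#define NTA_MAX_SHARED 2u             /* hypothetical cap on chunks held by userspace */

/* What the sniffer reads for each allocation: a position inside the
 * mmapped nodes instead of a kernel pointer. */
struct nta_chunk_info {
    uint32_t node;    /* node number; each node is contiguous and page aligned */
    uint32_t offset;  /* offset from the beginning of that node */
    uint32_t size;    /* chunk length in bytes */
};

struct nta_share {
    unsigned int shared;  /* chunks handed to userspace and not yet returned */
};

/* Kernel side: may one more chunk be exposed? Stops at the cap, so a
 * sniffer that never returns chunks cannot pin all memory. */
int nta_share_chunk(struct nta_share *s)
{
    if (s->shared >= NTA_MAX_SHARED)
        return -1;
    s->shared++;
    return 0;
}

/* Userspace returns a descriptor. Anything that does not describe a
 * region inside the mapped nodes is rejected and nothing is freed. */
int nta_commit_chunk(struct nta_share *s, const struct nta_chunk_info *ci)
{
    if (ci->node >= NTA_NODES || ci->size == 0 ||
        ci->offset >= NTA_NODE_SIZE ||
        ci->size > NTA_NODE_SIZE - ci->offset)
        return -1;
    if (s->shared == 0)
        return -1;
    s->shared--;
    return 0;
}

/* Userspace: turn a descriptor into a pointer inside its own mapping. */
const unsigned char *nta_chunk_ptr(const unsigned char *map_base,
                                   const struct nta_chunk_info *ci)
{
    return map_base + (size_t)ci->node * NTA_NODE_SIZE + ci->offset;
}
```

Because a descriptor carries a node number and an offset rather than a kernel address, a bogus value coming back from userspace can simply be rejected instead of being freed.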
Since by default the network tree allocator is used for all network
allocations (including unix sockets and netlink), the sniffer will get all
that data and must somehow differentiate between them. That task is
out of the scope of this mail, though; a simple solution is just to
attach the network allocator to a network device (i.e. call NTA allocation
functions only from netdev_alloc_skb()).
I have not run any special performance tests, but a simple "top" command shows
much lower CPU usage for the zero-copy sniffer (although it gets all data
from every skb in the machine) than for tcpdump: 17% vs. 33%
maximum on my test machine.
Both sniffers dump received data into /dev/null.
The server side (where the sniffers run) runs a trivial epoll()-based web server;
the client side runs httperf.
The machines are connected over a 100mbit LAN (e100 server NIC, 8169 client
NIC).
For zero-copy userspace netchannels I plan to send to userspace only
information about allocations that really belong to the created netchannel,
instead of info for each chunk.
Zero-copy sending support is in the TODO.
/devel/networking/zcs :: Link / Comments (0)
Sun, 20 Aug 2006
What is the hope and where one can find it?
We generally think that it comes from the future, but
if we try to analyze it, to think about it deeply, we will
only find plans, tasks and wishes there. But
is that hope? No, it is not.
Maybe hope lives in the present? In the present everything is clear;
one can even see how concrete things are. There is no hope here.
The only time it can come from is the past: even an old family
album with black and white photos contains hope, since
it is enough just to see that life was normal for hope to burn...
It is possible to rush through cities, airports and hotels, to be sick and have
big financial problems, not to have a home, and still be happy. Absolutely happy.
Because she is waiting for you. Not some abstract waiting, but exactly the person
you want to be waiting.
And one can be successful and show considerable promise, but be completely unhappy.
Just because no one is waiting for you...
And you have a phone and it has a number, but no one calls you, because
she does not call, and so it seems that no one calls.
But then her number flares up in your brain, and you know that you should not
call her, by no means, since it will only be worse, much worse in any case.
And right after this thought comes to your mind, you immediately call her.
And things become much worse in any case.
Short beeps: she is talking with someone. With whom?
Long beeps: she does not pick up the phone; she knows your number and does not
want to talk...
Or the operator says that the subscriber is not available right now.
Or you hear a voice, her voice, but she does not want this and does
not like it, and there are some false answers and absolutely indifferent questions...
Sometimes it is impossible just to sit by your window, switch on your lamp
and send a signal somewhere into the city about your loneliness, hoping
that someone will feel it and respond. And you want so much to get away
somewhere you will not be alone, somewhere there are no people,
since if there are no people, there can be no loneliness. And the bigger the city,
the more people there are, the more you feel your loneliness...
Eugene Grishkovec "The Planet"
/life :: Link / Comments (0)
Sat, 19 Aug 2006
Network usage statistics.
While hacking on the zero-copy sniffer
I've discovered another interesting thing about networking in Linux.
The first one
was the rather large number of allocations/frees at system startup.
Now I see a continuous allocation/freeing flow of 292-byte objects.
I think they are netlink messages, but why do they not stop? And who sends them?
The initial suspect is kobject_uevent: at least while the kernel is starting
and new devices are being discovered, it broadcasts a lot of messages (from 292 to 2340 bytes)
about each one.
But continuously spamming userspace is really broken behaviour.
Have you seen the "Why userspace sucks" slides
by Dave Jones about the
stupid things userspace can do? I think this could be a continuation...
Well, things are not that bad: it stopped after some period of time, so it was a false alarm on my part.
/devel/networking :: Link / Comments (0)
Zero-copy sniffer. First results.
add@/class/mem/kmsg.ACTION=add.DEVPATH=/class/mem/kmsg.SUBSYSTEM=mem.SEQNUM=105.MAJOR=1.MINOR=11
................................................................................................
................................................................................................
....
add@/devices/system/timer/timer0.ACTION=add.DEVPATH=/devices/system/timer/timer0.SUBSYSTEM=timer.SEQNUM=106
...........................................................................................................
..............................................................................
oc->avl_node_list);...alloc->avl_container_array = kzalloc(sizeof(struct list_head) * AVL_CONTAINER_ARRAY_S
IZE, GFP_KERNEL);..if (!alloc->avl_container_array)...goto err_out_exit;...for (i=0; i<AVL_CONTAINER_ARRAY_
SIZE; ++i)...INIT_LIST_HEAD(&alloc->avl_container_array[i]);...entry = avl_node_entry_alloc(GFP_KERNEL, AVL
_ORDER);..if (!entry)...goto err_out_free_container;...avl_node_entry_commit(entry, cpu);...return 0;..err_
out_free_container:..kfree(alloc->avl_container_array);.err_out_exit:..return -ENOMEM;.}../*. * Initialize
network allocator.. */.int avl_init(void).{..int err, cpu;...for_each_possible_cpu(cpu) {...err = avl_init_
cpu(cpu);...if (err)....goto err_out;..}...err = avl_init_zc();...printk(KERN_INFO "Network tree allocator
has been initialized.\n");..return 0;..err_out:..panic("Failed to initialize network allocator.\n");...retu
rn -ENOMEM;.}..............................................................................................
.........................................................................................................
.k........ ................................................................................................
......................................................................................a.....*.c..E....,@.@.
. ...N...0.P.3Epe_;........_...........1......width: 47%;.....padding-right: 3%;.....float: left;.....paddi
ng-bottom: 2em;....}.....content-column-left hr {.....display: none;....}.....content-column-right {...../*
Values for IE/Win; will be overwritten for other browsers */.....width: 47%;.....padding-left: 3%;.....floa
t: left;.....padding-bottom: 2em;....}.....content-columns>.content-column-left, .content-columns>.content-
column-right {...../* Non-IE/Win */....}....img {.....border: 2px solid #fff;.....padding: 2px;.....margin:
2px;....}....a:hover img {.....border: 2px solid #f50;....}..../*]]>*/...</style>..</head>...<body>...<h1>F
edora Core <strong>Test Page</strong></h1>....<div class="content">....<div class="content-middle">.....<p>
This page is used to test the proper operation of the Apache HTTP server after it has been installed. If yo
u can read this page, it means that the Apache HTTP server installed at this site is working properly.</p>.
...</div>....<hr />.....<div class="content-columns">.....<div class="content-column-left">......<h2>If you
are a member of the general public:</h2>.......<p>The fact that you are seeing this page indicates that the
website you just visited is either experiencing problems, or is undergoing routine maintenance.</p>.......<
p>If you would like to let the administrators of this website know that you've seen this page instead of th
e page you expected, you should send them e-mail. In general, mail sent to the name "webmaster"............
...........................................................................................................
...........................................................................................................
..........................
The junk above was obtained from the zero-copy sniffer running with an epoll-based web server on my test machine
(I manually replaced every "<" symbol in the dump with its HTML entity so as not to break the HTML formatting).
The first two dumps are kobject_uevent messages during startup ('.' means an unprintable symbol, i.e.
some binary data); then you can see part of my
network tree allocator code being
transferred over ssh (decrypted text being sent over a unix socket),
and at the end there are some pieces of the default web page (copied from Fedora Core Apache's default index.html)
and some unknown symbols all over the place.
Binary data at the end of each chunk is padding added for alignment, binary data at the beginning is the header,
and binary data in the middle corresponds to tabs, line breaks and so on.
It works, although there are issues yet to resolve: for example, the mapping code
only maps the initial cache, so userspace cannot yet see when the cache has grown, and the sniffer also does not know
how many pages are inside each new cache chunk. I will resolve these issues
soon and send the code to netdev@ for review.
/devel/networking/zcs :: Link / Comments (0)
Zero-copy sniffer.
Implementing the design with an additional bitmask wastes too much space per node, so
I decided to create a much simpler solution: attach a tag
to each allocated chunk, which contains a canary and a reference counter.
The former is just 4 bytes of special data used in the freeing function
to check that the object being freed is valid and that there was no
memory corruption. The reference counter is used to mark mapped objects
as used, so freeing does not destroy them. The only thing left to implement
is a ->nopage() method for the zero-copy sniffer's underlying char device,
so that when the network allocator cache grows, userspace automatically
gets the new pages in its mapping.
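A rough userspace sketch of such a tag, with malloc() standing in for the network allocator. The magic value and the names `nta_tag`, `tagged_alloc()`, `tagged_get()` and `tagged_put()` are hypothetical. The tag here is 8 bytes, so on platforms where malloc() guarantees 16-byte alignment the returned pointer is only 8-byte aligned; and unlike the kernel allocator, which keeps its pages, this model hands memory back to malloc(), so it cannot detect a double free.

```c
#include <stdint.h>
#include <stdlib.h>

#define NTA_CANARY 0x4e544131u  /* hypothetical magic value ("NTA1") */

/* Tag placed right in front of every chunk. */
struct nta_tag {
    uint32_t canary;  /* must still equal NTA_CANARY when the chunk is freed */
    uint32_t refcnt;  /* 1 for the owner, +1 while userspace has it mapped */
};

void *tagged_alloc(size_t size)
{
    struct nta_tag *t = malloc(sizeof(*t) + size);

    if (!t)
        return NULL;
    t->canary = NTA_CANARY;
    t->refcnt = 1;  /* the kernel-side owner */
    return t + 1;   /* caller's data starts right after the tag */
}

static struct nta_tag *tag_of(void *ptr)
{
    return (struct nta_tag *)ptr - 1;
}

/* The sniffer takes a reference so the owner's free does not release it. */
void tagged_get(void *ptr)
{
    tag_of(ptr)->refcnt++;
}

/* Returns -1 if the canary is damaged (corruption or a bogus pointer),
 * 0 if another user still holds the chunk, 1 if memory was released. */
int tagged_put(void *ptr)
{
    struct nta_tag *t = tag_of(ptr);

    if (t->canary != NTA_CANARY)
        return -1;
    if (--t->refcnt > 0)
        return 0;
    free(t);
    return 1;
}
```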
/devel/networking/zcs :: Link / Comments (0)
Zero-copy networking.
The initial zero-copy implementation is the receiving side for the
zero-copy sniffer,
based on the network allocator.
/devel/networking/nta :: Link / Comments (0)
ICFPC-2006 contest.
I do not know anything about functional programming (except that
it is different from imperative programming), but this
article (in Russian) about the International Conference on Functional Programming contest
is extremely interesting.
/other :: Link / Comments (0)
Fri, 18 Aug 2006
Very good climbing training.
I completed a couple of interesting traverses and
several good old routes, and also tried a route which
I started at the previous training: the orange route
over rock-crack-like holds. Since I was tired and it was the end of the training,
I failed somewhere in the second half, but I feel I can
complete it. The excellent training finished with
campus-board exercises, a tired body and a very good mood.
/life :: Link / Comments (0)
Towards full zero-copy network support.
I've started the zero-copy sniffer implementation, which
is quite straightforward: each node contains a bitmask
of free/used chunks and a bitmask of chunks mapped (and used)
by userspace.
When some area is mapped, marked as used
and is about to be freed, the freeing algorithm checks whether
it can free it or not, so freeing can actually be postponed
(for an arbitrarily long time). Userspace reads from a special
char device a set of structures that describe allocated
pointers and their sizes, so it can access the raw data.
Writing the same structures back to that char device marks the
corresponding chunks of memory as mmapped but unused,
so they can be freed when needed. Mmap itself is not
implemented yet.
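The bitmask bookkeeping can be sketched like this for a node of 64 chunks. The third mask, `deferred`, is an assumption of this sketch (it remembers that the owner already freed a chunk that userspace still holds); the post describes only the free/used and mapped masks, and all names here are hypothetical.

```c
#include <stdint.h>

/* Hypothetical node with 64 chunks, one bit per chunk in each mask. */
struct zc_node {
    uint64_t used;      /* chunk allocated */
    uint64_t mapped;    /* chunk exposed to userspace and still in use there */
    uint64_t deferred;  /* owner freed it while it was still mapped */
};

/* Owner frees chunk i: release it now unless userspace still holds it.
 * Returns 1 if released, 0 if the free was postponed. */
int zc_node_free(struct zc_node *n, unsigned int i)
{
    uint64_t bit = 1ULL << i;

    if (n->mapped & bit) {
        n->deferred |= bit;  /* may stay postponed for a long time */
        return 0;
    }
    n->used &= ~bit;
    return 1;
}

/* Userspace wrote the chunk's descriptor back: mapped but unused now.
 * Completes a postponed free, if there was one. Returns 1 if released. */
int zc_node_unmap(struct zc_node *n, unsigned int i)
{
    uint64_t bit = 1ULL << i;

    n->mapped &= ~bit;
    if (n->deferred & bit) {
        n->deferred &= ~bit;
        n->used &= ~bit;
        return 1;
    }
    return 0;
}
```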
/devel/networking/nta :: Link / Comments (0)
Thu, 17 Aug 2006
Kevent.
Short changelog for the 'take11' patchset:
- removed non-existent prototypes
- added helper function for kevent_registered_callbacks
- fixed 80-column comment issues
- added a header shared between userspace and kernelspace instead of embedding them in one
- code restructuring to remove forward declarations
- s o m e w h i t e s p a c e c o d y n g s t y l e c l e a n u p s
- use vm_insert_page() instead of remap_pfn_range()
What really demotivates me in this process is the absence of a real vision
of what should be done in trivial aspects like whitespace and enums vs. defines.
For example, the initial code contained enums, then I was told to use defines, and now
people tell me to use enums again; the same goes for the type of the underlying device
(char, misc, syscall) and so on... That is why I hate the linux-kernel@ mailing list (and, of course,
because of its politics and other floods).
/devel/kevent :: Link / Comments (0)
Wed, 16 Aug 2006
Climbing.
My new shoes are almost broken in, so today I tried several old interesting
routes and one new complex route (actually it is very old, but I never
completed it in the past), but since the shoes were slowly killing my feet, it was quite
hard to climb. I think in a couple of trainings I will be ready for new records,
so stay tuned...
/life :: Link / Comments (0)
Kevent.
I've released the 'take10' patchset. The only change from 'take9' is
a fix for the ->nopage() method.
/devel/kevent :: Link / Comments (0)
Network allocator.
I've released the second version of the network allocator and sent it to the mailing lists for review.
Short changelog:
- added a dynamically growing cache
- changed some inlining
- reduced code size
- removed the AVL tree implementation from the sources
- changed the minimum allocation size to the L1 cache line size (some arches require that)
- removed the skb->__tsize parameter
- added a lot of comments
- a lot of small cleanups
As usual, the patch is available in the archive.
/devel/networking/nta :: Link / Comments (0)
Tue, 15 Aug 2006
Network allocator.
After some cleanups it is possible to achieve more than 2460 requests per second
with a trivial epoll-based web server on a system that uses the network allocator instead of
the usual kmalloc/SLAB one for network payload data (for reference: a system with
the kmalloc/SLAB allocator can only handle 1600-1800 requests per second).
/devel/networking/nta :: Link / Comments (0)
Mon, 14 Aug 2006
Climbing.
It was an easy training again: my climbing shoes are not broken in yet,
although I already feel much better in them.
I expect that in a couple of trainings the shoes will be ready
for new records and I will finally start new complex routes, since
I have not climbed interesting ones for quite a while.
Today I only completed three routes (old and interesting,
but I want more) and a couple of boulder problems and traverses.
The end of the training was devoted to campus-board
exercises, and I completed more than usual,
since neither my arms nor my legs were tired enough;
only my toes felt the pain.
/life :: Link / Comments (0)
Network tree allocator homepage.
I've created one here.
It includes a design description, benchmarks, TODO items and all related information.
The NTA implementation with design notes has been sent for review. This work was supposed to be funded by
an external company, but since they disappeared I will release it the way I want.
The patch is available in the archive.
/devel/networking/nta :: Link / Comments (0)
Kevent.
I've released the 'take9' patchset. The following issues were resolved:
- mmap release bug fix
- use module_init() instead of late_initcall()
- use better structures for timer notifications
There is a new addition to the kevent TODO: block device notifications (create, remove
and error).
/devel/kevent :: Link / Comments (0)
Network tree allocator.
The weekend was quite productive: I've completed per-CPU support for
NTA, so it is now fully per-CPU except for one case, when freeing
happens on a different CPU than the original allocation; in that case
I put the chunk into a queue to be freed on the original CPU.
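The cross-CPU case can be modelled as a per-CPU queue of remote frees that the owning CPU drains later. This is a single-threaded sketch with hypothetical names (`nta_free_chunk()`, `nta_drain_remote()`, a fixed `NR_CPUS_MODEL`), and the CPU number is passed in explicitly. A real implementation needs a lock or atomic operations on the remote queue, since several CPUs may push to it at once.

```c
#include <stddef.h>

#define NR_CPUS_MODEL 4  /* hypothetical CPU count for the model */

/* Per-chunk bookkeeping: which CPU allocated it. */
struct chunk {
    int alloc_cpu;
    struct chunk *next;  /* link used while sitting in a list or queue */
};

struct cpu_alloc {
    struct chunk *free_list;     /* touched only by the owning CPU */
    struct chunk *remote_queue;  /* frees coming from other CPUs */
};

static struct cpu_alloc cpus[NR_CPUS_MODEL];

static void local_free(struct cpu_alloc *a, struct chunk *c)
{
    c->next = a->free_list;
    a->free_list = c;
}

/* Free c while running on this_cpu: locally if we own it,
 * otherwise hand it to the owner's remote queue. */
void nta_free_chunk(struct chunk *c, int this_cpu)
{
    if (c->alloc_cpu == this_cpu) {
        local_free(&cpus[this_cpu], c);
    } else {
        struct cpu_alloc *owner = &cpus[c->alloc_cpu];

        c->next = owner->remote_queue;
        owner->remote_queue = c;
    }
}

/* Run on the owning CPU: move queued chunks to its free list.
 * Returns the number of chunks moved. */
int nta_drain_remote(int cpu)
{
    struct cpu_alloc *a = &cpus[cpu];
    struct chunk *c = a->remote_queue;
    int n = 0;

    a->remote_queue = NULL;
    while (c) {
        struct chunk *next = c->next;

        local_free(a, c);
        c = next;
        n++;
    }
    return n;
}
```

The benefit of this shape is that the owner's free list, where neighbour lookups happen, is only ever modified on its own CPU.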
I've also added support for combined pages, so it is now possible to allocate up to 16k
on x86 with the network tree allocator.
While hacking on NTA I decided to drop the tree from the allocator completely, since
struct page has enough room to store a pointer
to the node. I'm also working on removing the so-called container cache
from the network tree allocator (a container is a structure that holds
free chunks in a list), so when these tasks are completed I will make the
first release. I expect it to be done today.
OK, I've removed the container cache entirely, so neither allocation nor freeing
requires any kind of allocation anymore (it sounds really crazy, but it is true).
There is a problem with extensive struct page usage in the network
tree allocator: combined pages use the page->private member as a pointer
to the head of the combined pages, while it is a spinlock_t for the mapping code,
so it is impossible to map combined pages, and mapping destroys the combining;
so I need some tricks with page->lru instead of
stock combining.
Here we go: when a chunk of memory is free, it is stored in a special LIFO list. Since
it is free, it can be dereferenced as a list entry itself, without any kind
of container around it. Since each chunk is at least 32 bytes long (it should actually be the
L1 cache line size), there is room to store a doubly linked list entry in it,
so both removal and lookup of that entry take O(1) (lookup is just a dereference
of the pointer as a list entry).
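A minimal C sketch of a free chunk doubling as its own list entry, using a small hand-written copy of the kernel's list_head helpers (not the kernel header itself). The chunk helper names are hypothetical, and the 32-byte minimum follows the figure in the text.

```c
#include <stddef.h>

/* Minimal doubly linked list in the style of the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

void list_init(struct list_head *h)
{
    h->next = h->prev = h;
}

static void list_add(struct list_head *e, struct list_head *h)
{
    e->next = h->next;
    e->prev = h;
    h->next->prev = e;
    h->next = e;
}

static void list_del(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

#define CHUNK_MIN 32  /* smallest chunk size from the text */
_Static_assert(CHUNK_MIN >= sizeof(struct list_head),
               "a free chunk must be able to hold a list entry");

/* A free chunk is its own list entry: no container is allocated. */
void chunk_release(void *chunk, struct list_head *free_list)
{
    list_add((struct list_head *)chunk, free_list);  /* LIFO: push at head */
}

/* Take the most recently freed chunk, or NULL if the list is empty. */
void *chunk_take(struct list_head *free_list)
{
    struct list_head *e = free_list->next;

    if (e == free_list)
        return NULL;
    list_del(e);
    return e;
}

/* Remove a specific free chunk (e.g. a neighbour being merged) in O(1):
 * its address already is the list entry, so no search is needed. */
void chunk_unlink(void *chunk)
{
    list_del((struct list_head *)chunk);
}
```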
Each page->lru has two pointers which are unused here (well, they are used
by the kernel, but since the allocator is not supposed to return its pages to the kernel,
it is perfectly ok to overwrite them), so I placed there a pointer to the node and
the number of the CPU where that page was allocated. Freeing just reads those values
and checks whether the CPU it runs on differs from the allocation one. If it is the same
CPU, the node is obtained from page->lru and the appropriate neighbour pointers are
calculated, which are then cast to struct list_head and
removed from the appropriate lists. Free neighbours are combined, and thus fragmentation is greatly
reduced.
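The freeing decision can be sketched in userspace as follows. This is a simulation under stated assumptions, not NTA code: fake_page stands in for struct page, page_map and region_base stand in for the kernel's virt_to_page() lookup, and the node pointer and CPU number are stored in the two lru pointers as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define NR_PAGES   16 /* size of the simulated region (assumption) */

struct node; /* allocator node, opaque in this sketch */

/* Hypothetical stand-in for struct page: only the two lru pointers. */
struct fake_page {
	struct { void *next, *prev; } lru;
};

static struct fake_page page_map[NR_PAGES];
static uintptr_t region_base; /* start address of the allocator's memory */

/* Stand-in for virt_to_page(): address -> page descriptor. */
static struct fake_page *virt_to_fake_page(uintptr_t addr)
{
	size_t idx = (addr - region_base) >> PAGE_SHIFT;

	assert(idx < NR_PAGES);
	return &page_map[idx];
}

/* At allocation time: remember node and CPU in the unused lru pointers. */
static void page_set_owner(struct fake_page *p, struct node *node, unsigned int cpu)
{
	p->lru.next = node;
	p->lru.prev = (void *)(uintptr_t)cpu;
}

static struct node *page_node(const struct fake_page *p)
{
	return p->lru.next;
}

static unsigned int page_cpu(const struct fake_page *p)
{
	return (unsigned int)(uintptr_t)p->lru.prev;
}

/* 1: free locally (and combine neighbours); 0: queue for the original CPU. */
static int free_is_local(uintptr_t addr, unsigned int this_cpu)
{
	return page_cpu(virt_to_fake_page(addr)) == this_cpu;
}
```

Any address inside a page resolves to the same descriptor, so the node and owning CPU of an arbitrary chunk are found with a shift and two loads.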
/devel/networking/nta :: Link / Comments (0)
Fri, 11 Aug 2006
Kevent.
I've created the 'take8' patchset. It includes:
- new mmap interface (not tested, waiting for other changes to be acked)
- use nopage() method to dynamically substitute pages
- allocate a new page for events only when a newly added kevent requires it
- do not use ugly index dereferencing, use a structure instead
- reduced amount of data in the ring (id and flags), maximum 12 pages on x86 per kevent fd
/devel/kevent :: Link / Comments (0)
Network tree allocator.
Scalability issues.
The SLAB allocator is essentially per-CPU - memory being freed stays
on the CPU which calls kfree(), even if it was not
originally allocated on that CPU. From one point of view
this is bad (the same address must live in the caches of both the allocating and the freeing
CPUs and so on), but from the other point it is very good, since
the allocator becomes lock-free. Since the SLAB allocator by design
only contains chunks of a predefined size, even when they come from completely
different pages, it can not perform any kind of fragmentation avoidance.
The network tree allocator was designed to combine neighbouring chunks into
regions of bigger size, so when freeing happens the allocator searches
for neighbours. So if NTA becomes per-CPU, the allocator must search for
neighbours not on the freeing CPU, but on the CPU which was used for allocation,
and since it is possible to simultaneously free different chunks which were
originally allocated on the same CPU, there must be some locking between
them. Since freeing can change the allocation state (i.e. some chunks of
free memory can be removed and combined with other chunks), the freeing logic
must lock part of the allocation logic (so the allocator would not get a
chunk which is going to be combined with the one currently being freed),
so basically we need to introduce at least two locks:
one per free list (all free chunks are combined into FIFO lists) and one per node
(since the same node can contain chunks of different sizes which can be
simultaneously freed on different CPUs). Such complex locking can not be cheap,
and the worst thing is that each node must contain a lock, which increases
its size from 12 to 36 bytes when debugging is turned off, so it does not fit
into a single cache line on a lot of arches. Combining chunks only
when freeing happens on the same CPU as the allocation is not an option, since
that is an unlikely condition, so it would lead to a constant increase of fragmentation.
As practice shows, this solution is bad, since there is a problem with locking:
the allocation path locks the list of free objects, gets a free chunk, drops the free list lock,
locks the corresponding node, updates the node's bitmask and drops the node lock, while the freeing
path gets the node from the pointer being freed, locks that node, updates its bitmask,
locks the list of free objects of one neighbour, searches for that neighbour, drops
the lock, locks the list of objects for the next neighbour, searches for that neighbour,
drops the lock and finally drops the node lock. This approach has a race (for example, between
dropping the free list lock and updating the node's bitmask, a chunk is already off its free list
but still marked as free, so a concurrent free can treat it as a free neighbour).
An interesting idea is not to free objects if freeing happens on a different CPU than the allocation,
but to put the freed object into a queue for freeing on the original CPU. When the CPU where
the allocation originally happened is going to perform its next freeing or allocation,
it can combine those batched objects.
In this scheme there is only a tiny locked section, when the object being freed is
placed into or removed from the queue of "semi-free" objects (i.e. the queue of objects allocated on
a different CPU and thus scheduled for freeing there).
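A rough sketch of such a per-CPU "semi-free" queue, assuming a fixed NR_CPUS and a C11 atomic_flag spinlock in place of a kernel spinlock (all names are hypothetical): the remote CPU only pushes the chunk under the lock, and the owner CPU detaches the whole batch under the same lock and combines the chunks afterwards without holding it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_CPUS 4 /* assumption for the sketch */

/* A freed chunk is reused as its own queue entry. */
struct deferred {
	struct deferred *next;
};

struct cpu_queue {
	atomic_flag lock;       /* the only lock on the remote-free path */
	struct deferred *head;  /* chunks freed elsewhere, owned by this CPU */
};

static struct cpu_queue queues[NR_CPUS];

static void queue_init(void)
{
	for (int i = 0; i < NR_CPUS; i++) {
		atomic_flag_clear(&queues[i].lock);
		queues[i].head = NULL;
	}
}

static void q_lock(struct cpu_queue *q)
{
	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		; /* spin */
}

static void q_unlock(struct cpu_queue *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Called on a CPU that did not allocate the chunk. */
static void free_remote(void *chunk, unsigned int owner_cpu)
{
	struct cpu_queue *q;
	struct deferred *d = chunk;

	assert(owner_cpu < NR_CPUS);
	q = &queues[owner_cpu];
	q_lock(q);
	d->next = q->head;
	q->head = d;
	q_unlock(q);
}

/* Called by the owner CPU before its next allocation or free:
 * detach the whole batch in one short critical section. */
static struct deferred *take_deferred(unsigned int cpu)
{
	struct cpu_queue *q;
	struct deferred *d;

	assert(cpu < NR_CPUS);
	q = &queues[cpu];
	q_lock(q);
	d = q->head;
	q->head = NULL;
	q_unlock(q);
	return d;
}
```

The owner would call take_deferred() for its own CPU and run its normal local freeing path (with neighbour combining) on each chunk of the returned list, reading d->next before freeing d.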
/devel/networking/nta :: Link / Comments (0)
Thu, 10 Aug 2006
Kevent.
After all the optimisations made for kevent, I'm pleased to announce a new record of 2500 requests per second,
while epoll/kqueue and similar techniques show about 1600-1800 requests per second for a single-threaded
trivial web server. Actually no, that is the old record; the current one is 2600+ req/s.
A new kevent patchset has been released; its name is 'take7' and the following
changes were made after Andrew Morton's review:
- a lot of comments!
- do not use list poisoning to detect whether an entry is in the list
- return the number of ready kevents even if copy*user() fails
- strict check for the number of kevents in the syscall
- use ARRAY_SIZE for array size calculation
- changed superblock magic number
- use SLAB_PANIC instead of a direct panic() call
- changed -E* return values
- a lot of small cleanups and indent fixes
I've removed all AIO (both network and VFS) stuff and socket notifications
from the patchset and resent it to linux-kernel@ and netdev@.
/devel/kevent :: Link / Comments (0)
Wed, 09 Aug 2006
Climbing.
There is only one word which can describe today's training,
and that word is pain.
I've bought new climbing shoes for new records,
which are 3 sizes smaller than my usual shoes. That is
something like a requirement for complex routes,
since feet have better contact with the wall and holds,
but the next several trainings will not be easy, since the shoes
are not broken in yet.
So today's training was simple, although I tried
several interesting old routes to check how the new shoes
stay on small holds. The shoes showed that they can do it very well.
The training finished with the usual campus-board exercises, which
were not that hard, since my arms were not tired at all.
/life :: Link / Comments (0)
Network tree allocator performance test.
I've run the epoll-based web server from the kevent
testbed and got 2301 requests per second, while with the usual code it is about 1600-1800 requests per second.
This can be explained by tons of reasons, but the test clearly shows that the network tree allocator can behave
no worse, and maybe better, than the usual slab one (all debugging options are turned off) for network traffic allocations.
This test (single-threaded web server and httperf as client) has been run without any SMP performance tuning (and I have one gigantic lock
around all allocations and freeings right now, but do not think too badly about my mental abilities:
it will be completely eliminated after the per-CPU tuning is completed, similar to how it is implemented in the SLAB allocator).
All changes in the core network stack (not including the allocator itself) consist
of the *kmalloc()/kfree() replacement in *alloc_skb()/skb_release_data()
and the addition of a new field into struct sk_buff which holds the total
allocation size, since the ->truesize variable can be changed while
the skb is being processed in the kernel.
/devel/networking/nta :: Link / Comments (0)
Network tree allocator.
While thinking some more about generic tree and hash table
data representations, I've come to the conclusion that a tree is
more appropriate for structures which can dynamically grow and shrink
over time. For example, with the network tree allocator it is a
trivial task to add new memory to the cache, and it is an easy task
to remove pages (but not trivial, since the AVL tree removal algorithm is
very complex, although fast (and to be 100% honest with the reader,
I want to note that I did not implement it for NTA)), so memory hotplug and
various OOM conditions can be handled much more nicely than with a table-based
approach, where parts of the table must be relocated.
The same issue comes to mind with the recent changes in network hash table
manipulation - growing or shrinking a table dynamically sometimes requires relocating the whole table,
which can be extremely large. As far as I recall, there was
a discussion about the tree vs. table approach and the latter was selected,
but I do not recall any details anymore. Well, maybe it's time to reimplement
the thing... At least for the upcoming fast NAT rework I plan to use trees
instead of hash tables to store NAT entries, and since most of my work
looks to most people like research-only projects (they are not actually)
far from reality (only two of them are in the kernel tree),
I can create any crazy schemes I like.
/devel/networking/nta :: Link / Comments (0)
Startup system statistic.
For the interested reader: while the kernel is starting (no network drivers, no NFS),
more than 1200 skbs with data are allocated
(not including skb cloning; the network adapter is
loaded as a module later, when userspace is ready). After userspace has started
and configured the interface, without the network cable plugged in (and no fancy stuff like network console),
there were more than 50k skb allocations and freeings, and it is still counting...
/devel/networking :: Link / Comments (0)
Kevent.
I've released the 'take6' patchset. The following things have been implemented:
- removed compilation warnings about unused variables when lockdep is not turned on
- do not use internal socket structures, use appropriate (exported) wrappers instead
- removed default 1 second timeout
- removed AIO stuff from patchset
/devel/kevent :: Link / Comments (0)
Tue, 08 Aug 2006
Network tree allocator.
I've moved it into the kernel and made all network traffic
allocated using it. It is not tuned for SMP performance yet (it requires some per-CPU-like
magic), NTA does not support growing the cache when it is
required and the context allows it, and there are no interfaces
for zero-copy networking yet, but the most complex part
has already been implemented, although there are still some bugs there.
After I complete the SMP tuning I will run some performance tests
and start on the zero-copy sending and receiving network stack implementation.
/devel/networking/nta :: Link / Comments (0)
Kevent.
I've created the fifth patchset. It includes:
- use miscdevice instead of chardev
- comment fixes
The patchset has been sent to linux-kernel@ and netdev@, and
I've asked for the kevent subsystem to be either included or declined.
/devel/kevent :: Link / Comments (0)
Mon, 07 Aug 2006
Climbing.
It was relaxing climbing today - no complex routes, no negative slope,
just several good old ones and one new route completed on-sight.
I've added a campus-board exercise to improve passive endurance;
let's see how it will help, although I do not perform it 100% cleanly yet.
/life :: Link / Comments (0)
Network tree allocator.
An additional 100 billion allocations have been done with the
network tree allocator. It's time to move it into the kernel.
While hacking on NTA I've created a special SLAB-like 3-layer cache for
struct avl_container - special structures used to store
pointers to free chunks inside specially crafted FIFO lists.
Now only the following allocations are done using the Linux memory allocation
primitives:
- initial storage structures for AVL trees (i.e. pages of data (which will be reused by the tree allocator) and tree nodes) and the array of
lists of free chunks
- container cache layers (l1 and l2 are pages, l3 is an element of the list, which should be allocated very rarely),
which are only allocated when the appropriate layer is empty
So at run time there are no allocations from main memory except rare page-sized allocations
to refill the container cache. As expected, after a short period of time the container cache stopped growing.
Interestingly, after switching from the usual malloc()/free() to the cache allocator for containers,
the general allocation speed has increased.
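A simplified, single-layer sketch of the refill-on-empty idea (the text describes three layers; this only shows the page-at-a-time refill, with malloc() standing in for page allocation and hypothetical names throughout): containers are carved out of a whole page only when the free list is empty, so in steady state get/put never touch the system allocator.

```c
#include <assert.h>
#include <stdlib.h>

#define CACHE_PAGE_SIZE 4096

/* Hypothetical container: points to a free chunk, linked into a list. */
struct container {
	void *ptr;
	struct container *next;
};

struct container_cache {
	struct container *free; /* unused containers */
	size_t pages;           /* pages taken from the system so far */
};

/* Carve one page into containers; only runs when the cache is empty. */
static int cache_refill(struct container_cache *c)
{
	struct container *page = malloc(CACHE_PAGE_SIZE);
	size_t n = CACHE_PAGE_SIZE / sizeof(*page);

	if (!page)
		return -1;
	for (size_t i = 0; i < n; i++) {
		page[i].next = c->free;
		c->free = &page[i];
	}
	c->pages++;
	return 0;
}

static struct container *container_get(struct container_cache *c)
{
	struct container *ct;

	if (!c->free && cache_refill(c))
		return NULL;
	ct = c->free;
	c->free = ct->next;
	return ct;
}

/* Containers go back to the cache, never to the system allocator. */
static void container_put(struct container_cache *c, struct container *ct)
{
	assert(ct != NULL);
	ct->next = c->free;
	c->free = ct;
}
```

A balanced stream of get/put calls keeps the page count constant, which matches the observation that the cache stops growing after a short warm-up.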
/devel/networking/nta :: Link / Comments (0)
Sun, 06 Aug 2006
Alignment overhead in Linux networking code.
I was involved in a recent discussion about jumbo frames in the e1000
network adapter - it ends up with a 32k allocation
for just 9k jumbo frames, which is a great waste of memory.
This happens because e1000 does not have an MTU at all,
it only has a maximum allowed segment size, and it must be a power of two
(with some exceptions), so e1000 rounds 9k up to 16k, then the network layer
adds sizeof(struct skb_shared_info) at the end, and the
SLAB allocator rounds it up to 32k.
I've designed what is IMHO a very elegant and simple solution to work around
this issue - if the difference between the requested allocation size and the aligned
size exceeds sizeof(struct skb_shared_info), the latter will be allocated
from a separate cache. This will immediately solve the problem
with the PAGE_SIZE allocation which happens
in e1000 for the standard 1500 byte MTU with some chips.
I will implement it later today and send it to netdev@ for review.
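The size arithmetic can be checked with a small model. It assumes power-of-two kmalloc() buckets for these sizes, a hypothetical SHINFO_SIZE of 256 bytes for struct skb_shared_info (the real value depends on kernel version, arch and MAX_SKB_FRAGS) and e1000 buffer sizes of 16384 bytes for 9k frames and 2048 bytes for the 1500-byte MTU case; as long as the shared info is non-empty and no bigger than the buffer, the conclusions do not depend on its exact size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size of struct skb_shared_info for this model. */
#define SHINFO_SIZE 256

/* Power-of-two bucket, a model of kmalloc() for large sizes
 * (the small non-power-of-two caches are ignored). */
static size_t pow2_bucket(size_t size)
{
	size_t b = 32;

	assert(size > 0);
	while (b < size)
		b <<= 1;
	return b;
}

/* Data buffer and shared info in one kmalloc() block. */
static size_t combined_alloc(size_t data)
{
	return pow2_bucket(data + SHINFO_SIZE);
}

/* Shared info taken from its own cache instead. */
static size_t split_alloc(size_t data)
{
	return pow2_bucket(data) + SHINFO_SIZE;
}
```

With these assumptions a 16384-byte jumbo buffer costs 32768 bytes when combined but 16384 + SHINFO_SIZE when split, and a 2048-byte buffer jumps to a full 4096-byte page when combined.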
/devel/networking :: Link / Comments (0)
Network tree allocator.
Let's see how the fragmentation problem is being solved in NTA.
For an initial test I've run NTA with a set of pseudo-random sized allocations
until the first allocation fails; when that happens, I halve the maximum allocation
size. Each graph below shows free and used chunks
inside each page (there are 4094 pages): green points correspond
to free and red ones to used chunks (each chunk is 32 bytes).
Maximum allocation size is equal to PAGE_SIZE, the failed allocation was for 1912 bytes
(60 chunks of 32 bytes):

Maximum allocation size is equal to PAGE_SIZE/2 (halved after the first allocation failure),
the failed allocation was for 968 bytes (31 chunks of 32 bytes):

Maximum allocation size is equal to PAGE_SIZE/4 (halved after each of the first two
allocation failures), the last failed allocation was for 504 bytes (16 chunks of 32 bytes):

Maximum allocation size is equal to PAGE_SIZE/8 (halved after each of the three previous
allocation failures), the last failed allocation was for 252 bytes (8 chunks of 32 bytes):

These tests do not show how fragmentation changes over time, when a lot of allocations and
freeings have been completed, but even the existing results show that the network tree
allocator performs very well. Next time I will run the same tests
after some pseudo-random allocation and freeing periods.
For comparison I've run the same test with a power-of-2 slab-like allocator (actually
it is much simpler, but it has the same ideas as the SLAB allocator and
probably can behave even better if we take big-sized chunks into account).
The picture does not change when the maximum allocation size
is decreased after allocation failures, since most of the
overhead and fragmentation comes from power-of-2 rounding.

The overhead and fragmentation of this SLAB-like power-of-two allocator actually look different
from the picture, since almost all allocations have fragmentation overhead,
so each vertical line should actually contain several red (used) and green (free or fragmentation overhead)
pieces, where the sum of all pieces of the same colour equals what is shown
on the picture. But the picture shows that the absolute amount of fragmentation overhead
is extremely high for power-of-2 allocators. For the real SLAB allocator the picture
would be better for small-sized chunks (since chunks never share pools with chunks of different sizes,
except when they can steal pages when a cache is refilled),
but much worse for big-sized ones.
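The chunk counts quoted for the failed allocations are a round-up to 32 bytes, and the same arithmetic shows why power-of-2 rounding wastes so much more per allocation. A small sketch with hypothetical function names (the power-of-2 model ignores the small non-power-of-2 SLAB caches):

```c
#include <assert.h>
#include <stddef.h>

#define NTA_CHUNK 32 /* chunk granularity used in the tests above */

/* Chunks needed with 32-byte granularity (rounding up). */
static size_t nta_chunks(size_t size)
{
	assert(size > 0);
	return (size + NTA_CHUNK - 1) / NTA_CHUNK;
}

/* Bytes wasted when rounding to a multiple of 32: at most 31. */
static size_t nta_waste(size_t size)
{
	return nta_chunks(size) * NTA_CHUNK - size;
}

/* Bytes wasted when rounding up to the next power of two (minimum 32). */
static size_t pow2_waste(size_t size)
{
	size_t b = NTA_CHUNK;

	assert(size > 0);
	while (b < size)
		b <<= 1;
	return b - size;
}
```

For example, a 1100-byte request wastes 20 bytes with 32-byte chunks but 948 bytes when rounded up to 2048.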
The difference in the positions of used and free chunks on the pictures is due to
the fact that in the network tree allocator chunks in a page are shown
in reverse order (i.e. higher addresses come first).
/devel/networking/nta :: Link / Comments (0)
Sat, 05 Aug 2006
Network tree allocator.
The first test stage is completed - more than 52 billion allocations
of different sizes (currently tested only from 1 byte to PAGE_SIZE
with 32 byte granularity) have been done, so it is roughly correct. To prove correctness further
I plan to start the second testing stage, which will include much faster freeing
(the test will keep pointers to allocated objects inside an array whose indexes will be
structured to not contain any gaps) and full utilisation of the allocated page pools,
so if an allocation fails, it will be restarted with a smaller size until all of the allocator's
memory is used. That will give interesting statistics of memory usage and fragmentation
in the tree allocator. I will also include a periodic dump of the bitmasks of free and used
objects, so it would be possible to visually observe fragmentation issues.
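One common way to keep such an array free of gaps (an assumption about how the test could do it, with hypothetical names) is to remove entries by moving the last entry into the hole, which makes freeing a randomly chosen object O(1):

```c
#include <assert.h>
#include <stddef.h>

/* Live allocations without gaps: ptrs[0..count-1] are valid. */
struct live_set {
	void **ptrs;
	size_t count, cap;
};

static int live_add(struct live_set *s, void *p)
{
	if (s->count == s->cap)
		return -1; /* full: the test should start freeing */
	s->ptrs[s->count++] = p;
	return 0;
}

/* Remove entry i in O(1) by moving the last entry into the hole. */
static void *live_take(struct live_set *s, size_t i)
{
	void *p;

	assert(i < s->count);
	p = s->ptrs[i];
	s->ptrs[i] = s->ptrs[--s->count];
	return p;
}
```

A test loop would pick i = rand() % s->count, free live_take(s, i) with the allocator under test, and refill with live_add() until allocation fails.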
I've found a way to determine whether a given address belongs to a PAGE_SIZE
chunk or to a bigger contiguous region - it is quite simple: look at
page->lru.next and page->private, which can be used
to detect compound pages and their order. This solves the
problem of converting a freed address into the page which holds the freed area.
/devel/networking/nta :: Link / Comments (0)
I've added a second photo album from the canoe trip to the gallery.
One can find Eugene Burnyakov (Wijo) and Alexandra, Alexander Boykov (mephody) and
Irina (already his wife) there.
Enjoy!
/life :: Link / Comments (0)
Kevent.
I've released the fourth kevent patchset. Changes from the previous 'take3' patchset:
- removed the serializing mutex from kevent_user_wait()
- moved storage list processing to RCU
- removed lockdep screaming - all storage locks are initialized in the same function, so lockdep was taught
to differentiate between various cases
- remove kevent from storage if it is marked as broken after callback
- fixed a typo in the mmaped buffer implementation which would end up in wrong index calculation
I've sent it to linux-kernel@ and netdev@ for review. As far as I recall, there are
no issues which must be fixed or changed anymore.
/devel/kevent :: Link / Comments (0)
Fri, 04 Aug 2006
Excellent climbing day.
Amount of alchohol in my blood became optimal,
so it was extremely good training today - a lot of old
traces (simple and more complex, on the vertical wall
and with negative slope, one trace with quite big dynamic jump,
on other trace I've broken a hold, but not because I'm that strong
or heavy, it was already slightly broken, I just completed the process),
several traverses, campus-board exercises, sauna, good weather, not bad supper
and I feel myself just damn good.
Trainings become simpler - I did not try new traces quite
for a long time already (although two new traces I finished,
but they are not very complex), and almost do not climb on negative
slope, which is not very good. So things need to be changed soon.
/life :: Link / Comments (0)
Lockdep and kevent.
I've enabled lockdep and found that it reports [ INFO: inconsistent lock state ]
when the system boots. After looking into the lockdep code and its lock initialization,
I've come to the following conclusion: lockdep initializes a special key for each
type of lock, i.e. it uses a tricky macro which inserts a static variable just before
lock initialization inside spin_lock_init(), so if the same function is used
to initialize different locks, lockdep will think that all of those locks are the same,
and if later one of them is taken with BH disabled and the others are taken without
any irqs disabled, it will fire.
I see exactly that: lockdep fires on kevent_storage_ready(), which is called
both in softirq context and in process context without BH disabled. It happens
when an inode and a socket check their queues, but there is no way the same inode
can be used both for a socket and something else. Actually it fires even when lockdep has been
taught to differentiate between the inode and socket cases: when a socket is closed
and the struct file associated with it is being freed, kevent calls
kevent_storage_fini() to flush all pending kevents, which always happens in
process context, and lockdep signals that it is possible (since BHs are enabled) that
the same lock can be taken by socket code called from BH context. This is not true, since
when kevent_storage_fini() is called, the appropriate socket has already been removed from
the socket table and can not be accessed from softirqs.
The only sane way I see is to reinitialize the lockdep class after the locks have been initialized, and
the reinitialization must happen at a higher level, for example in kevent_socket_enqueue()
or kevent_inode_enqueue(), and not in kevent_storage_init(), since
it does not know what storage is being initialized, and even the same
storage owner can be used for different kinds of processing (like an inode being used for
both socket and VFS notifications).
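The lockdep behaviour and the fix can be modelled in plain C. The fake_lock_init() macro mimics how spin_lock_init() defines one static key per expansion site, and fake_lock_set_class() plays the role of the kernel's lockdep_set_class(); the types, the storage_init() helper and the two keys are hypothetical stand-ins:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel's lock class key and lock types. */
struct lock_class_key { char dummy; };

struct fake_lock {
	const struct lock_class_key *key; /* lock class, as lockdep sees it */
};

/* One static key per place where the macro is expanded, not per lock. */
#define fake_lock_init(l)				\
	do {						\
		static struct lock_class_key key_;	\
		(l)->key = &key_;			\
	} while (0)

/* Analogue of lockdep_set_class(): the caller picks the class. */
#define fake_lock_set_class(l, k) ((l)->key = (k))

struct storage {
	struct fake_lock lock;
};

/* Generic helper: every storage initialized here shares one class. */
static void storage_init(struct storage *st)
{
	assert(st != NULL);
	fake_lock_init(&st->lock);
}

/* Higher-level owners know their context and can use distinct classes. */
static struct lock_class_key socket_storage_key, inode_storage_key;
```

In the kernel the equivalent would presumably be calling lockdep_set_class() on the storage lock from the socket and inode enqueue paths, after the generic initialization.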
While testing kevents, a seemingly impossible thing happened - I found two bugs:
one was introduced when the mapped buffer was implemented (a typo in a define, which
ends up in a wrong calculation of the offset inside the buffer), and the second one was created
when I removed the mutex between copying events into userspace (after they are marked ready and
detected) and the controlling operations, which could end up in a race between the removing and
waiting code.
I plan to test some RCU ideas inside the kevent code (this actually shuts up lockdep, but there
were some problems the first time I tried to use RCU with kevents) and complete the
lockdep-related changes tomorrow.
/devel/kevent :: Link / Comments (0)
Network tree allocator.
I've started stress testing the new allocator.
The first test is quite simple - the system tries to allocate a lot of
chunks of random size; when there is no memory or the number of allocated
chunks exceeds some threshold (1 million allocations), the system starts
to free them one by one from the beginning. It is quite a slow test, since
the test's freeing logic (do not confuse it with the freeing logic inside the allocator)
runs through the whole array (currently it contains 1 million entries)
of allocated chunks and tries to free them all.
/devel/networking/nta :: Link / Comments (0)
Thu, 03 Aug 2006
TODO.
I've created a development TODO
list; feel free to send ideas and beer.
/devel :: Link / Comments (0)
Kevent.
I've tested all modifications made before and added an optimisation
aimed at the case when a lot of kevents are being copied
from userspace. I also plan to implement an initial mapped buffer today
and send the whole patchset as the take3 version.
Ok, I've completed the initial mapped buffer implementation and sent the
take3 patchset to linux-kernel@ and netdev@ for review.
The mapped buffer implementation is quite simple - when a kevent user queue
is created, the system attaches a set of pages to the queue, so that the whole
queue (not kernel kevents, but user request structures called struct ukevent)
can be placed there (the maximum allowed queue length is KEVENT_MAX_EVENTS,
which is equal to 4096 events). Since the size of struct ukevent
is 40 bytes on every arch, they do not exactly fill a page,
so I use 4 bytes at the beginning of the first one to store the number of
ready events placed into the buffer. Events are placed into the buffer
when they are queued into the ready queue under kevent_user->ready_lock,
so updates are always atomic, and the index update happens after the event
has been placed into the buffer.
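The layout arithmetic can be sketched as follows, assuming 4096-byte pages, that no event straddles a page boundary and that the 4-byte counter sits at the very start of page 0; this is a model of the description with hypothetical names, not the patchset's code:

```c
#include <assert.h>
#include <stddef.h>

#define MAP_PAGE_SIZE     4096
#define UKEVENT_SIZE      40   /* sizeof(struct ukevent), per the text */
#define KEVENT_MAX_EVENTS 4096
#define COUNTER_SIZE      4    /* ready-event counter at the start of page 0 */

/* Events per page if none straddles a page: 102, leaving 16 spare bytes. */
#define EVENTS_PER_PAGE (MAP_PAGE_SIZE / UKEVENT_SIZE)

_Static_assert(COUNTER_SIZE + EVENTS_PER_PAGE * UKEVENT_SIZE <= MAP_PAGE_SIZE,
	       "counter must fit into the first page's slack");

struct ring_pos {
	size_t page, offset;
};

/* Page and byte offset of event number idx in the mapped buffer. */
static struct ring_pos ring_slot(size_t idx)
{
	struct ring_pos pos;

	assert(idx < KEVENT_MAX_EVENTS);
	pos.page = idx / EVENTS_PER_PAGE;
	pos.offset = (idx % EVENTS_PER_PAGE) * UKEVENT_SIZE;
	if (pos.page == 0)
		pos.offset += COUNTER_SIZE; /* skip the counter in page 0 */
	return pos;
}

/* Pages needed for nr_events events. */
static size_t ring_pages(size_t nr_events)
{
	return (nr_events + EVENTS_PER_PAGE - 1) / EVENTS_PER_PAGE;
}
```

Under these assumptions the 16 spare bytes per page are what leave room for the counter, and a full queue of 4096 events needs 41 pages. Writing the event before updating the counter is what lets a reader trust every slot below the published count.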
/devel/kevent :: Link / Comments (0)
Wed, 02 Aug 2006
Climbing.
Weekend
party stuff still plays in my blood, so today's training was quite hard.
I've only completed three old, not that complex routes and a couple
of traverses. All routes which I could previously finish without a rest in between
were completed with falls today, since I got tired very quickly, so I
did not even start the man's start and campus-board exercises. But nevertheless
I had a good time.
/life :: Link / Comments (0)
Kevent.
I've addressed most of the issues Zach Brown (zach.brown@oracle.com) mentioned
in his review of kevents;
the main ones are:
- split kevent_finish_user() into locked and unlocked variants
- do not use KEVENT_STAT ifdefs, use inline functions instead
- use an array of callbacks of each type instead of initializing callbacks for each kevent
- changed the name of the ukevent guarding lock
- use only one kevent lock in kevent_user for all hash buckets instead of per-bucket locks
- do not use the kevent_user_ctl structure, provide the needed arguments as syscall parameters instead
- various indent cleanups
The new patchset has not been tested yet (only booted with the new kernel), so I will release it tomorrow after several checks.
Since it will not contain the mapped buffer implementation, its name will be 'take2' instead
of 'take3'.
David Miller is of the opinion that we can completely
disable the possibility of getting events through a syscall and always get them from the mapped buffer,
but until the glibc guys like kevent and complete support for the mapped buffer, it is not a real solution,
so at least for now I plan to use them in parallel.
/devel/kevent :: Link / Comments (0)
Network tree allocator.
The userspace model has been completed.
I have not run stress tests yet, but it already can allocate a set of objects and combine them back
when they are freed. Let's see an example:
- allocate 120 bytes
- allocate 200 bytes
- allocate 70 bytes
- free 120 bytes
- free 70 bytes
- free 200 bytes
This ends up in the following sequence:
- get one page
- split it into 120 and PAGE_SIZE-120 byte parts
- mark the PAGE_SIZE-120 byte part as free and move its container into a new list
- split PAGE_SIZE-120 into 200 and PAGE_SIZE-120-200 byte parts (second allocation)
- mark the PAGE_SIZE-120-200 byte part as free and move its container into a new list
- split PAGE_SIZE-120-200 into 70 and PAGE_SIZE-120-200-70 byte parts (third allocation)
- mark the PAGE_SIZE-120-200-70 byte part as free and move its container into a new list
- free the first chunk (120 bytes) - it was first in the page and it does not have free neighbours (the 200 byte allocation
above was done right after this chunk and it is used)
- allocate a new container and add it into the list for 120 byte (actually aligned to
AVL_MIN_SIZE) free objects
- free the third chunk (70 bytes) - it has a big free area of PAGE_SIZE-120-200-70 bytes on the left (note that I'm talking about little endian here)
and the second allocated (currently used) chunk of 200 bytes on the right,
so this freeing reuses the container from the left chunk and moves it into the list for PAGE_SIZE-120-200-70+70 bytes
- free the 200 byte chunk - it has free neighbours on both sides: PAGE_SIZE-120-200-70+70 bytes and 120 bytes.
So it will reuse the container for 120 bytes, which results in a PAGE_SIZE chunk.
For those who like to look at unknown logs and symbols (like I do, especially new dmesgs), here is the debug output from the
initial implementation of the network tree allocator, produced by the steps described above:
PAGE_SIZE: 4096, max nodes: 4094, node size: 32.
avl_update_node: node: 0x523800, value: 02554000, ptr: 0x2554000, cpos: 127, size: 128, num: 4.
avl_fill_bits: num: 4, pos: 0, idx: 0, p: f, start: 0, stop: 4, fffffffffffffff0.
avl_update_node: reuse container 0x2555f60 in pos 123 with ptr 0x2554080.
main: allocated ptr: 0x2554000, size: 120.
avl_update_node: node: 0x523800, value: 02554000, ptr: 0x2554080, cpos: 123, size: 224, num: 7.
avl_fill_bits: num: 7, pos: 4, idx: 0, p: 7f0, start: 4, stop: 11, fffffffffffff800.
avl_update_node: reuse container 0x2555f60 in pos 116 with ptr 0x2554160.
main: allocated ptr: 0x2554080, size: 200.
avl_update_node: node: 0x523800, value: 02554000, ptr: 0x2554160, cpos: 116, size: 96, num: 3.
avl_fill_bits: num: 3, pos: 11, idx: 0, p: 3800, start: 11, stop: 14, ffffffffffffc000.
avl_update_node: reuse container 0x2555f60 in pos 113 with ptr 0x25541c0.
main: allocated ptr: 0x2554160, size: 70.
avl_free: ptr: 0x2554000 [02554000], pos: 0, sbits: 4, size: 120.
avl_fill_bits: num: 4, pos: 0, idx: 0, p: f, start: 0, stop: 4, ffffffffffffc00f.
avl_combine: lp: (nil), lbits: 0, lc: (nil), rp: (nil), rbits: 0, rc: (nil),
current: ptr: 0x2554000, bits: 4, combined: ptr: 0x2554000, idx: 3, cont: (nil).
avl_combine: Added new container for pointer 0x2554000, size: 128.
main: freed ptr: 0x2554000.
avl_free: ptr: 0x2554160 [02554000], pos: 11, sbits: 3, size: 70.
avl_fill_bits: num: 3, pos: 11, idx: 0, p: 3800, start: 11, stop: 14, fffffffffffff80f.
avl_free: found free left neighbour at 0x25541c0, bits: 114.
avl_combine: lp: 0x25541c0, lbits: 114, lc: 0x2555f60, rp: (nil), rbits: 0, rc: (nil),
current: ptr: 0x2554160, bits: 3, combined: ptr: 0x2554160, idx: 116, cont: 0x2555f60.
avl_combine: Using existing container for pointer 0x2554160, size: 3744.
main: freed ptr: 0x2554160.
avl_free: ptr: 0x2554080 [02554000], pos: 4, sbits: 7, size: 200.
avl_fill_bits: num: 7, pos: 4, idx: 0, p: 7f0, start: 4, stop: 11, ffffffffffffffff.
avl_free: found free left neighbour at 0x2554160, bits: 117.
avl_free: found free right neighbour at 0x2554000, bits: 4.
avl_combine: lp: 0x2554160, lbits: 117, lc: 0x2555f60, rp: 0x2554000, rbits: 4, rc: 0x2555f80,
current: ptr: 0x2554080, bits: 7, combined: ptr: 0x2554000, idx: 127, cont: 0x2555f80.
avl_combine: Using existing container for pointer 0x2554000, size: 4096.
main: freed ptr: 0x2554080.
Completed.
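The bitmask part of the log can be reproduced with a much simpler first-fit model: 128 chunks of 32 bytes per page, with 1 meaning free, as in the avl_fill_bits masks. It omits the containers and size-indexed free lists that NTA actually uses, so it illustrates the bit arithmetic only, and all names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define NODE_SIZE 32                  /* chunk size from the log */
#define NR_CHUNKS (4096 / NODE_SIZE)  /* 128 chunks per page */

/* 1 = free, 0 = used; word 0 covers chunks 0..63. */
static uint64_t mask[NR_CHUNKS / 64];

static int chunk_free(int i)
{
	return (mask[i / 64] >> (i % 64)) & 1;
}

static void chunk_set(int i, int free)
{
	if (free)
		mask[i / 64] |= UINT64_C(1) << (i % 64);
	else
		mask[i / 64] &= ~(UINT64_C(1) << (i % 64));
}

static void page_init(void)
{
	for (int i = 0; i < NR_CHUNKS; i++)
		chunk_set(i, 1);
}

/* First fit: index of the first chunk of a long enough free run, or -1. */
static int page_alloc(unsigned int size)
{
	int need, run = 0;

	assert(size > 0);
	need = (int)((size + NODE_SIZE - 1) / NODE_SIZE);
	for (int i = 0; i < NR_CHUNKS; i++) {
		run = chunk_free(i) ? run + 1 : 0;
		if (run == need) {
			int start = i - need + 1;

			for (int j = start; j <= i; j++)
				chunk_set(j, 0);
			return start;
		}
	}
	return -1;
}

/* Freeing only sets bits; free neighbours merge implicitly in the mask. */
static void page_free(int start, unsigned int size)
{
	int n = (int)((size + NODE_SIZE - 1) / NODE_SIZE);

	for (int j = start; j < start + n; j++)
		chunk_set(j, 1);
}

/* Longest free run, i.e. the largest allocation still possible. */
static int longest_free_run(void)
{
	int best = 0, run = 0;

	for (int i = 0; i < NR_CHUNKS; i++) {
		run = chunk_free(i) ? run + 1 : 0;
		if (run > best)
			best = run;
	}
	return best;
}
```

The three allocations land at chunk positions 0, 4 and 11, and the low word of the mask passes through 0xffffffffffffc000 and 0xfffffffffffff80f as in the log; in this model combining needs no extra work, because a run of set bits is already one free region.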
/devel/networking/nta :: Link / Comments (0)
Tue, 01 Aug 2006
Network tree allocator.
While creating various bitfield operations I've found
that several existing Linux kernel ones are way too suboptimal,
for example set_bit_string() and __clear_bit_string()
on x86_64 (actually I have not seen them on other arches). And I'm not talking
about assembler optimisations, but about plain C ones.
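A word-at-a-time way to set or clear a range of bits in plain C is sketched below; it is not the kernel's implementation, and the names are hypothetical. Only the first and last words need partial masks; every word in between is updated with a single read-modify-write instead of one per bit:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Mask of n bits starting at bit 'bit' inside one word. */
static unsigned long word_mask(size_t bit, size_t n)
{
	assert(n > 0 && bit + n <= BITS_PER_LONG);
	if (n == BITS_PER_LONG)
		return ~0UL;
	return ((1UL << n) - 1) << bit;
}

/* Set or clear bits [start, start + len), one word at a time. */
static void bits_range(unsigned long *map, size_t start, size_t len, int set)
{
	size_t end = start + len;

	while (start < end) {
		size_t bit = start % BITS_PER_LONG;
		size_t n = BITS_PER_LONG - bit; /* bits left in this word */
		unsigned long m;

		if (n > end - start)
			n = end - start;
		m = word_mask(bit, n);
		if (set)
			map[start / BITS_PER_LONG] |= m;
		else
			map[start / BITS_PER_LONG] &= ~m;
		start += n;
	}
}

static void set_bits_range(unsigned long *map, size_t start, size_t len)
{
	bits_range(map, start, len, 1);
}

static void clear_bits_range(unsigned long *map, size_t start, size_t len)
{
	bits_range(map, start, len, 0);
}
```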
So right now I'm a bit torn between
kevents and friends, the tree allocator, slacking and paid work (yep, I need to work no less
than 8 hours every day to get some beer and other goodies),
but I plan to complete the userspace implementation very soon, since most
of the things are already implemented.
/devel/networking/nta :: Link / Comments (0)
Kevent.
I've released the second patchset and sent it to netdev@ and linux-kernel@ for review.
It still contains AIO and the aio_sendfile() implementation on top of the get_block()
abstraction, which it was decided to postpone for a while (right now it is simpler to generate the patchset as a whole;
when kevent is ready for merge, I will generate a patchset without the AIO stuff).
It does not contain the mapped buffer implementation, since its design is not 100%
completed; I will present that implementation in the third patchset.
Changes from the previous patchset:
- rebased against 2.6.18-git tree
- removed ioctl controlling
- added new syscall kevent_get_events()
- use old syscall kevent_ctl() for creation/removing, modification and initial kevent initialization
- use mutexes instead of semaphores
- added file descriptor check and return error if provided descriptor does not match kevent file operations
- various indent fixes
- removed aio_sendfile() declarations
/devel/kevent :: Link / Comments (0)