Zbr's days.

Mon, 15 Jan 2007

Extremely powerful climbing evening.


That was great training: a lot of routes, really a lot, most of them combined into pairs or threes with rests in between. I even managed to complete some new routes and some really old ones. And magically I was not that tired; of course my hands grew weaker over time, but that just meant completing the routes became more technical. I completed a route with a horizontal (negative) slope start with only minor problems, which was a bit of a surprise to me (taking into account that it was almost the last one I climbed that day, it can be considered good climbing).
Afterwards the dry sauna almost killed my body: it became so slack that I sat for several minutes without moving just to get myself back into shape.

It was just perfect training. Excellent time.

/life


Initial benchmarking of pthread_create() vs. makecontext().


The benchmark is simple: create a new thread whose function immediately exits, wait in the parent thread for it to terminate, and start again.
One case is plain pthread_create() + pthread_join(); the other is getcontext(), stack allocation (8 MB, as set by my current stack rlimit), makecontext() and swapcontext(), after which the thread function immediately exits and the stack is freed.
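A minimal sketch of the two benchmark loops described above, assuming Linux with glibc. The function names (bench_pthread(), bench_context()), the clock_gettime()-based timing and the fixed 8 MB STACK_SIZE are my assumptions, not the original test programs. The context loop deliberately maps and unmaps a fresh stack on every iteration, i.e. it has no stack cache.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>
#include <ucontext.h>

/* Assumed stack size: 8 MB, matching the stack rlimit mentioned in the text. */
#define STACK_SIZE (8 * 1024 * 1024)

/* Monotonic wall-clock time in microseconds. */
static double now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
}

/* Thread body: exits immediately. */
static void *thread_fn(void *arg)
{
    return arg;
}

/* Average microseconds per pthread_create() + pthread_join() pair, or -1 on error. */
static double bench_pthread(int num)
{
    double start = now_us();

    for (int i = 0; i < num; i++) {
        pthread_t t;

        if (pthread_create(&t, NULL, thread_fn, NULL) != 0)
            return -1.0;
        if (pthread_join(t, NULL) != 0)
            return -1.0;
    }
    return (now_us() - start) / num;
}

static ucontext_t main_ctx, child_ctx;

/* Context body: returns immediately, so uc_link resumes main_ctx. */
static void context_fn(void)
{
}

/* Average microseconds per mmap()/getcontext()/makecontext()/swapcontext()/munmap()
 * cycle, or -1 on error. */
static double bench_context(int num)
{
    double start = now_us();

    for (int i = 0; i < num; i++) {
        /* No stack cache: a fresh 8 MB mapping on every iteration. */
        void *stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (stack == MAP_FAILED)
            return -1.0;

        if (getcontext(&child_ctx) != 0) {
            munmap(stack, STACK_SIZE);
            return -1.0;
        }
        child_ctx.uc_stack.ss_sp = stack;
        child_ctx.uc_stack.ss_size = STACK_SIZE;
        child_ctx.uc_link = &main_ctx;
        makecontext(&child_ctx, context_fn, 0);

        /* Run context_fn(); control comes back here when it returns. */
        if (swapcontext(&main_ctx, &child_ctx) != 0) {
            munmap(stack, STACK_SIZE);
            return -1.0;
        }
        munmap(stack, STACK_SIZE);
    }
    return (now_us() - start) / num;
}
```

A main() that calls bench_pthread(100000) and bench_context(100000) and prints the two averages gives the same kind of numbers as the runs below; build with -pthread.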

Obviously, I expected makecontext() to be much faster, but (time is the number of microseconds needed to create and destroy one thread, i.e. to perform the sequence described above):

$ ./test_pthread 100000
num: 100000, diff: 1402225, time: 14.022250.
$ ./test_context 100000
num: 100000, diff: 1322459, time: 13.224590.
My first impression was that this was impossible: something was completely wrong, or some otherworldly magic was involved.
But when I studied at MIPT, I was told in every physics lab that there is no magic, so I started to think.
The only thing my brain could come up with was to run strace.
So I did, and found the following interesting details.

Pthread case:
...
mmap2(NULL, 8388608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7622000
...
clone(child_stack=0xb7e214c4, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|
	CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|CLONE_DETACHED, 
	parent_tidptr=0xb7e21bf8, {entry_number:6, base_addr:0xb7e21bb0, limit:1048575, seg_32bit:1, 
	contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}, child_tidptr=0xb7e21bf8) = 24426
clone(child_stack=0xb7e214c4, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|
	CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|CLONE_DETACHED, 
	parent_tidptr=0xb7e21bf8, {entry_number:6, base_addr:0xb7e21bb0, limit:1048575, seg_32bit:1, 
	contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}, child_tidptr=0xb7e21bf8) = 24427
...
and so on. Everything looks OK: there is one stack allocation, and that stack is then reused, since it was not freed but put into a cache by the NPTL implementation.
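The stack cache can also be observed without strace: start two short-lived joinable threads one after another and compare the address of a local variable in each. This is a sketch assuming glibc/NPTL; the helper names are hypothetical, and since the cache is an implementation detail, other C libraries (or glibc under memory pressure) may hand out a different stack, so the result is not guaranteed.

```c
#include <pthread.h>
#include <stdint.h>

/* Thread body: stores the address of a local variable living on this thread's stack. */
static void *record_stack_addr(void *out)
{
    int local = 0;
    *(uintptr_t *)out = (uintptr_t)&local;
    return NULL;
}

/* Returns 1 if two consecutive threads ran at the same stack address,
 * 0 if they did not, and -1 on error. */
static int stack_was_reused(void)
{
    uintptr_t first = 0, second = 0;
    pthread_t t;

    if (pthread_create(&t, NULL, record_stack_addr, &first) != 0 ||
        pthread_join(t, NULL) != 0)
        return -1;
    /* After the join, NPTL may keep the first stack in its cache... */
    if (pthread_create(&t, NULL, record_stack_addr, &second) != 0 ||
        pthread_join(t, NULL) != 0)
        return -1;
    /* ...and hand the same stack to the second thread. */
    return first == second;
}
```

On glibc this typically returns 1, matching the single mmap2() in the strace above.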

Here is the ucontext case:
...
mmap2(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7659000
sigprocmask(SIG_BLOCK, NULL, [])        = 0
sigprocmask(SIG_SETMASK, [], [])        = 0
sigprocmask(SIG_SETMASK, [], NULL)      = 0
munmap(0xb7659000, 8392704)             = 0
mmap2(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7659000
sigprocmask(SIG_BLOCK, NULL, [])        = 0
sigprocmask(SIG_SETMASK, [], [])        = 0
sigprocmask(SIG_SETMASK, [], NULL)      = 0
munmap(0xb7659000, 8392704)
...
So mmap()/munmap() was the culprit: my context allocation code did not use a cache of stacks; instead, it allocated and freed a new stack for each new context. After I emulated a stack cache, I got the following strace for the context case:
mmap2(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb75fc000
sigprocmask(SIG_BLOCK, NULL, [])        = 0
sigprocmask(SIG_SETMASK, [], [])        = 0
sigprocmask(SIG_SETMASK, [], NULL)      = 0
sigprocmask(SIG_BLOCK, NULL, [])        = 0
sigprocmask(SIG_SETMASK, [], [])        = 0
sigprocmask(SIG_SETMASK, [], NULL)      = 0
sigprocmask(SIG_BLOCK, NULL, [])        = 0
...
munmap(0xb75fc000, 8392704)             = 0
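The cache emulation itself is not shown here; as one possible sketch, assuming a single-threaded caller, a fixed 8 MB stack size and a small LIFO free list (stack_get(), stack_put(), CACHE_MAX and run_context_once() are hypothetical names), the context code can take stacks from the cache and return them instead of calling munmap() every time:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <ucontext.h>

#define STACK_SIZE (8 * 1024 * 1024)  /* assumed: 8 MB per context */
#define CACHE_MAX 16                  /* hypothetical cap on cached stacks */

static void *stack_cache[CACHE_MAX];
static int n_cached;

/* Take a stack from the cache, or map a new one if the cache is empty. */
static void *stack_get(void)
{
    if (n_cached > 0)
        return stack_cache[--n_cached];

    void *s = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return s == MAP_FAILED ? NULL : s;
}

/* Return a stack to the cache; unmap it only if the cache is full. */
static void stack_put(void *s)
{
    if (n_cached < CACHE_MAX)
        stack_cache[n_cached++] = s;
    else
        munmap(s, STACK_SIZE);
}

/* Release every cached stack, e.g. at program exit. */
static void stack_cache_drain(void)
{
    while (n_cached > 0)
        munmap(stack_cache[--n_cached], STACK_SIZE);
}

static ucontext_t caller_ctx, worker_ctx;

/* Context body: returns immediately, so uc_link resumes caller_ctx. */
static void worker(void)
{
}

/* One create/run/destroy cycle using the cache; returns 0 on success. */
static int run_context_once(void)
{
    void *stack = stack_get();
    if (stack == NULL)
        return -1;

    if (getcontext(&worker_ctx) != 0) {
        stack_put(stack);
        return -1;
    }
    worker_ctx.uc_stack.ss_sp = stack;
    worker_ctx.uc_stack.ss_size = STACK_SIZE;
    worker_ctx.uc_link = &caller_ctx;
    makecontext(&worker_ctx, worker, 0);

    int rc = swapcontext(&caller_ctx, &worker_ctx);
    stack_put(stack);  /* keep the mapping for the next context */
    return rc;
}
```

With this pattern only the first iteration calls mmap() and only stack_cache_drain() calls munmap(), which is the shape of the strace above: one mmap2(), many sigprocmask() calls from getcontext()/swapcontext(), and one munmap() at the end.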
And the following results:
$ ./test_pthread 100000
num: 100000, diff: 1402225, time: 14.022250.
$ ./test_context 100000
num: 100000, diff: 179360, time: 1.793600.
As expected, there is no magic: creating and switching to a userspace context is almost 8 times faster than creating a real thread, and the mmap()/munmap() pair accounts for roughly the same overhead as clone() does in the pthread case. An empty syscall on this machine takes about 0.25 microseconds, so the overhead of the syscall itself is negligible.
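The per-iteration figures follow directly from the printed numbers (diff is the total in microseconds, time = diff / num). A quick check of the speedup, of the mmap()/munmap() cost, and of how it compares with an empty syscall:

```latex
\begin{aligned}
t_{\text{pthread}} &= \frac{1402225\,\mu\text{s}}{100000} \approx 14.02\,\mu\text{s} \\
t_{\text{ctx}}^{\text{no cache}} &= \frac{1322459\,\mu\text{s}}{100000} \approx 13.22\,\mu\text{s} \\
t_{\text{ctx}}^{\text{cache}} &= \frac{179360\,\mu\text{s}}{100000} \approx 1.79\,\mu\text{s} \\
\frac{t_{\text{pthread}}}{t_{\text{ctx}}^{\text{cache}}} &\approx \frac{14.02}{1.79} \approx 7.8 \\
t_{\text{mmap+munmap}} &\approx t_{\text{ctx}}^{\text{no cache}} - t_{\text{ctx}}^{\text{cache}} \approx 13.22 - 1.79 = 11.43\,\mu\text{s} \\
t_{\text{pthread}} - t_{\text{ctx}}^{\text{cache}} &\approx 14.02 - 1.79 = 12.23\,\mu\text{s} \\
\frac{t_{\text{mmap+munmap}}}{0.25\,\mu\text{s}} &\approx \frac{11.43}{0.25} \approx 46
\end{aligned}
```

So the mapping work costs on the order of 46 empty syscalls per iteration, which is why the syscall entry itself does not matter here.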

/devel/threading


Bring me candies from Linux.Conf.Au.


If I were there, I would attend the following presentations:

Non-development activities I would like to take part in:

/devel/other