Mon, 15 Jan 2007
Initial benchmarking of

$ ./test_pthread 100000
num: 100000, diff: 1402225, time: 14.022250.
$ ./test_context 100000
num: 100000, diff: 1322459, time: 13.224590.

Impossible: something was completely wrong, as if magic from another world had been mixed in. That was my first impression. But when I studied at MIPT, I was told in every physics lab that there is no magic, so I started to think. The only thing my brain could come up with was to run strace.

So I did, and found the following interesting details. Pthread case:
...
mmap2(NULL, 8388608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7622000
...
clone(child_stack=0xb7e214c4, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|
CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|CLONE_DETACHED,
parent_tidptr=0xb7e21bf8, {entry_number:6, base_addr:0xb7e21bb0, limit:1048575, seg_32bit:1,
contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}, child_tidptr=0xb7e21bf8) = 24426
clone(child_stack=0xb7e214c4, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|
CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|CLONE_DETACHED,
parent_tidptr=0xb7e21bf8, {entry_number:6, base_addr:0xb7e21bb0, limit:1048575, seg_32bit:1,
contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}, child_tidptr=0xb7e21bf8) = 24427
...
and so on. Everything looks OK: one stack allocation, which was then reused, since the stack was not freed but put into a cache by the NPTL implementation.

Here is the ucontext case:
...
mmap2(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7659000
sigprocmask(SIG_BLOCK, NULL, []) = 0
sigprocmask(SIG_SETMASK, [], []) = 0
sigprocmask(SIG_SETMASK, [], NULL) = 0
munmap(0xb7659000, 8392704) = 0
mmap2(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7659000
sigprocmask(SIG_BLOCK, NULL, []) = 0
sigprocmask(SIG_SETMASK, [], []) = 0
sigprocmask(SIG_SETMASK, [], NULL) = 0
munmap(0xb7659000, 8392704)
...

So mmap()/munmap() was the culprit: my context allocation code did not use a cache of stacks; instead it allocated and freed a new stack for each new context.
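
To make the idea of a stack cache concrete, here is a minimal sketch of one. It is not the code used in the test: the names stack_get() and stack_put(), the cache capacity and the 8 MiB stack size are assumptions chosen for illustration, and the cache is not thread-safe.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#define STACK_SIZE (8u * 1024 * 1024) /* assumed: close to the mappings in the trace */
#define CACHE_SLOTS 16                /* hypothetical cache capacity */

static void *stack_cache[CACHE_SLOTS];
static int cached;

/* Return a cached stack if one is available, otherwise mmap() a new one.
 * Returns NULL if the mapping fails. */
void *stack_get(void)
{
    if (cached > 0)
        return stack_cache[--cached];

    void *p = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Keep the stack for reuse; munmap() it only when the cache is full. */
void stack_put(void *stack)
{
    assert(stack != NULL);
    if (cached < CACHE_SLOTS)
        stack_cache[cached++] = stack;
    else
        munmap(stack, STACK_SIZE);
}
```

With such a cache, a loop that creates and destroys one context at a time pays for a single mmap() at the start instead of an mmap()/munmap() pair per context.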
After I emulated cache usage, I got the following strace for the context case:
mmap2(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb75fc000
sigprocmask(SIG_BLOCK, NULL, []) = 0
sigprocmask(SIG_SETMASK, [], []) = 0
sigprocmask(SIG_SETMASK, [], NULL) = 0
sigprocmask(SIG_BLOCK, NULL, []) = 0
sigprocmask(SIG_SETMASK, [], []) = 0
sigprocmask(SIG_SETMASK, [], NULL) = 0
sigprocmask(SIG_BLOCK, NULL, []) = 0
...
munmap(0xb75fc000, 8392704) = 0

And the following results:
$ ./test_pthread 100000
num: 100000, diff: 1402225, time: 14.022250.
$ ./test_context 100000
num: 100000, diff: 179360, time: 1.793600.

As expected, there is no magic: userspace context switching is about 7.8 times faster than real thread creation (14.02 vs. 1.79), and the mmap()/munmap() syscalls cost about as much as clone() does.
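
For illustration, a loop of the kind that would produce a trace like the cached one can be sketched as follows. This is a guess at the shape of such a test, not the original test_context source: run_contexts(), worker() and the 64 KiB stack are hypothetical. On glibc, getcontext() and swapcontext() save and restore the signal mask, which is presumably where the three sigprocmask() calls per context come from.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t main_ctx; /* context to return to when a worker finishes */
static long runs;           /* number of workers that actually ran */

/* Body of each context: does no work except counting itself. */
static void worker(void)
{
    runs++;
}

/* Create and run `num` contexts one after another on a single reused stack.
 * Returns the number of contexts that ran, or -1 if allocation fails. */
long run_contexts(long num)
{
    assert(num >= 0);

    size_t size = 64 * 1024; /* assumed stack size, enough for worker() */
    char *stack = malloc(size);
    if (stack == NULL)
        return -1;

    runs = 0;
    for (long i = 0; i < num; i++) {
        ucontext_t ctx;

        if (getcontext(&ctx) != 0)
            break;
        ctx.uc_stack.ss_sp = stack;   /* same stack every iteration */
        ctx.uc_stack.ss_size = size;
        ctx.uc_link = &main_ctx;      /* resume here when worker() returns */
        makecontext(&ctx, worker, 0);

        /* Switch to the new context; control comes back via uc_link. */
        if (swapcontext(&main_ctx, &ctx) != 0)
            break;
    }

    free(stack);
    return runs;
}
```

Calling run_contexts(100000) exercises the same number of create/run/finish cycles as the benchmark above, with no per-context stack allocation.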
An empty syscall on this machine takes about 0.25 microseconds, so its overhead is negligible.
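
The post does not say how the empty syscall cost was measured. One common way, sketched here under that caveat, is to time a large number of trivial syscalls and divide. The function name empty_syscall_usec() and the choice of getppid as the "empty" syscall are assumptions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Average cost of one trivial syscall, in microseconds.
 * syscall(SYS_getppid) is used so that the kernel is really entered
 * on every iteration. */
double empty_syscall_usec(long iterations)
{
    struct timespec start, end;

    assert(iterations > 0);

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        syscall(SYS_getppid);
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed_usec = (end.tv_sec - start.tv_sec) * 1e6
                        + (end.tv_nsec - start.tv_nsec) / 1e3;
    return elapsed_usec / iterations;
}
```

The result depends heavily on the CPU and kernel, so the number it returns on a given machine need not match the 0.25 microseconds quoted above.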