Wed, 17 Jan 2007
Climbing evening.
The training was not bad, although it had some negative moments.
I tried new routes that day. Most of them were quite doable, but they
were complex enough, and since I had done several hard boulder problems beforehand,
I failed. The most interesting was the yellow route in the central sector,
on a vertical wall with completely passive holds. I managed to complete
about half of the route on-sight, and then it started to present surprises,
or maybe I was just too tired for on-sight climbing of that complexity.
Anyway, at the top I was completely out of power, and even managed to
get the rope over my head and hurt my shoulder. I quickly fixed my position,
but tore my headphone cord.
/life :: Link / Comments (0)
Breakthrough ideas are not from teams. Hans von Ohain.
Interesting note... I would even say 'ego boosting'. I like it.
/other :: Link / Comments (0)
Threading part of the NTL M-on-N threading library is ready.
Although not without problems: there is no scheduler (well, there is a round-robin one,
which is not what I want), and I have not run any benchmarks to test
SMP scalability and timer signal overhead. The latter is the most
problematic part: although 'top' shows zero CPU
usage for a pool of 100 threads sleeping in an infinite loop,
it is still possible that the actual CPU usage due to
signal delivery overhead is noticeable.
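For reference, the round-robin policy mentioned above can be sketched in a few lines. This is not the NTL code: struct rr_queue, rr_pick_next() and NTL_MAX_THREADS are names made up for the illustration, and the fixed-size array is an assumption to keep it short.

```c
#include <assert.h>

#define NTL_MAX_THREADS 8   /* hypothetical limit, only for this sketch */

struct rr_queue {
    int runnable[NTL_MAX_THREADS];  /* non-zero if the slot holds a runnable thread */
    int last;                       /* slot that ran most recently, -1 if none yet */
};

/*
 * Picks the first runnable slot after `last`, wrapping around the array.
 * Every runnable thread gets its turn in order; no priorities, no time
 * accounting. Returns the chosen slot, or -1 if nothing is runnable.
 */
int rr_pick_next(struct rr_queue *q)
{
    int i;

    assert(q->last >= -1 && q->last < NTL_MAX_THREADS);

    for (i = 1; i <= NTL_MAX_THREADS; i++) {
        int slot = (q->last + i) % NTL_MAX_THREADS;

        if (q->runnable[slot]) {
            q->last = slot;
            return slot;
        }
    }
    return -1;
}
```

The simplicity is the problem: a thread that just woke up after IO waits behind everybody else, which is why this is not the scheduler I want.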
The code does not contain kevent syscall wrappers yet. I will think about dynamic
library loading tricks, which allow 'replacing' syscalls at runtime.
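One such trick, as a sketch only: look up the original libc function with dlsym(RTLD_NEXT, ...) and call it from a wrapper. The names ntl_read() and ntl_read_call_count() are made up here and are not part of NTL. To really replace the call, the wrapper would have to be exported as read() from a shared object loaded with LD_PRELOAD; the sketch keeps a separate name so it can be tried inside a normal program.

```c
#define _GNU_SOURCE          /* needed for RTLD_NEXT on glibc */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical counter: how many reads went through the wrapper. */
static unsigned long ntl_read_calls;

/*
 * Hypothetical wrapper around read(). It resolves the libc implementation
 * once, then forwards every call to it.
 */
ssize_t ntl_read(int fd, void *buf, size_t count)
{
    static ssize_t (*real_read)(int, void *, size_t);

    if (!real_read) {
        /* RTLD_NEXT: the next definition of the symbol after this object. */
        real_read = (ssize_t (*)(int, void *, size_t))dlsym(RTLD_NEXT, "read");
        assert(real_read != NULL);
    }

    /* A threading library could register fd with kevent and yield here. */
    ntl_read_calls++;
    return real_read(fd, buf, count);
}

unsigned long ntl_read_call_count(void)
{
    return ntl_read_calls;
}
```

On older glibc versions this needs -ldl at link time.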
Enough for today - I'm going climbing.
/devel/threading :: Link / Comments (0)
pthread_create() vs. clone().
Have you ever tried to use clone() directly?
I bet you have not, at least with recent kernels.
First, the exported clone() does not correspond to
what the kernel expects; it looks like it is only provided for
compatibility. The manpage for that call is utterly obsolete
and incorrect (apart from the useful flag descriptions, it contains
a _wrong_ description of the parameters, at least for i386).
But I do not look for easy ways - I have the glibc sources and can
dig into them.
That was my first impression, that of a man who in theory can climb Everest,
fly into space and understand the math behind string theory (the latter only if time permits;
it looks like a whole life can be spent there, digging into more and more
new subtheories).
Now I think that all three tasks described above are much, much, much
more solvable than digging into the glibc sources. And those people
say that I described kevent poorly - hey, look into the glibc NPTL
implementation (and I am not even talking about its coding style)
and pray you will never see it again,
or just try to start a new thread using clone().
After about an hour of reverse engineering, trying to make __clone()
work (note that clone() does not work at all, just forget
about this call; only __clone() is correct for i386 and a 2.6 kernel),
I managed to start a new thread. It was a win, except for one very small problem:
it crashed somewhere in the call chain of the provided function.
I want you to know that I do not know the low-level i386 architecture well enough
to easily read and understand asm code (some years ago I managed to
write an asm application which entered protected mode in DOS,
but I no longer remember asm, and actually never understood
gas semantics well enough) like that found in sysdeps/unix/sysv/linux/i386/clone.S,
so I miserably failed to proceed.
Yes, I started to use pthread_create() for SMP scalability.
I do not hear you screaming 'loser', since you would be there too, but those of you
who still lurk here and ironically nod your heads had better point
me to something useful for understanding how modern i386 (or actually any other arch)
starts and works with threads/processes.
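For comparison, here is a minimal sketch of the four-argument clone() wrapper as current glibc man pages describe it. Whether it behaves the same with the glibc and 2.6 kernel discussed above is not verified here. The names run_clone_child(), child_fn(), struct child_arg and the 64 KB stack size are made up for the example. The detail that is easy to get wrong is the stack argument: it must point to the top of the buffer, not to its start.

```c
#define _GNU_SOURCE          /* exposes clone() and CLONE_* in <sched.h> */
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024)   /* arbitrary size for the sketch */

/* Visible to both parent and child because of CLONE_VM. */
struct child_arg {
    int input;
    volatile int output;
};

/* Runs in the new task; it only touches shared memory and returns. */
static int child_fn(void *data)
{
    struct child_arg *arg = data;

    arg->output = arg->input;
    return 0;
}

/*
 * Starts child_fn() with clone(), waits for it and returns the value the
 * child stored. Returns -1 on error, so `value` itself must not be -1.
 */
int run_clone_child(int value)
{
    struct child_arg arg = { value, -1 };
    char *stack = malloc(CHILD_STACK_SIZE);
    pid_t pid;

    assert(value != -1);
    if (!stack)
        return -1;

    /* The stack grows down on x86, so pass the TOP of the buffer. */
    pid = clone(child_fn, stack + CHILD_STACK_SIZE,
                CLONE_VM | CLONE_FS | CLONE_FILES | SIGCHLD, &arg);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    /* SIGCHLD as the exit signal lets plain waitpid() reap the child. */
    if (waitpid(pid, NULL, 0) != pid)
        return -1;   /* child may still be running: leak the stack on purpose */

    free(stack);
    return arg.output;
}
```

This is still far from a thread in the pthread sense: there is no CLONE_THREAD, no TLS setup and no guard page, which is exactly the part NPTL hides.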
/devel/threading :: Link / Comments (0)
Initial implementation of the ntl (new threading library) M-on-N threading library.
Well, I can not find a better prefix than ntl,
which is an extremely non-ordinary abbreviation for 'new threading library'.
Anyway, the current version is very preliminary: it does not contain a scheduler
and does not contain kevent-driven wrappers on top of the usual IO syscalls,
but it already has all the initialization mechanisms, a cache of threads
and all the structures required for scheduling.
There are two major problems uncovered by this initial implementation.
The first one is a scheduling problem. Since NTL does not contain a dedicated scheduling thread,
it is quite hard to schedule functions which do not
make syscalls, for example those which just run a while(1); loop,
eat 100% CPU and never enter the NTL layer. To solve this problem I need
to add a timer and an appropriate signal handler where rescheduling will happen,
which in theory can lead to performance degradation and to a conflict with an alarm
signal registered in the thread function (although that should be fixed with kevent
timer notifications).
Another problem is futex performance.
In the current code there are two locks implemented as semaphores, which
in modern Linux are translated into futexes: the scheduler lock,
which guards the queue of threads, and the stack cache lock, which guards
the list of free thread stacks.
So usual thread creation, empty function and thread exiting in NTL changes
from this
operations to:
mmap2(NULL, 8396800, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7684000
sigprocmask(SIG_BLOCK, NULL, []) = 0
futex(0xb7fdac20, FUTEX_WAKE, 1) = 0
sigprocmask(SIG_SETMASK, [], []) = 0
futex(0xb7fdac20, FUTEX_WAKE, 1) = 0
futex(0xb7fdaaa0, FUTEX_WAKE, 1) = 0
sigprocmask(SIG_SETMASK, [], NULL) = 0
futex(0xb7fdaaa0, FUTEX_WAKE, 1) = 0
...
munmap(0xb7fd7000, 4096) = 0
so we get four additional futex calls, since two locks are taken: one when a stack is unlinked from
and returned to the stack cache, and another when a thread is added to and removed from the scheduler's queue.
Performance differs noticeably (the test case creates a thread which exits immediately,
repeated the requested number of times):
$ ./ntl_test 100000
num: 100000, diff: 388234, speed: 3.882340.
Compared to 1.793600 microseconds without futex calls.
In this situation there is no concurrency at all - it is a synthetic test,
so one _empty_ futex call actually takes about 0.5 microseconds, of which
pure syscall overhead is 50% (this is an Intel Core Duo 3.40GHz (running at 3.7 GHz) test machine).
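The 0.5 microseconds figure follows from the numbers above: 388234 microseconds over 100000 iterations gives 3.88234 per iteration, and the difference from 1.793600 is spread over four futex calls. The same arithmetic as two small helpers (the function names are made up):

```c
#include <assert.h>

/* Microseconds per operation from a total time and an iteration count. */
double usec_per_op(long total_usec, long iterations)
{
    assert(iterations > 0);
    return (double)total_usec / (double)iterations;
}

/* Cost of one extra call, given per-operation times with and without them. */
double usec_per_extra_call(double with_usec, double without_usec, int extra_calls)
{
    assert(extra_calls > 0);
    return (with_usec - without_usec) / extra_calls;
}
```

usec_per_extra_call(3.882340, 1.793600, 4) gives roughly 0.52 microseconds per futex call.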
I can not say if futex performance is slow or fast - but I would like to avoid this cost,
so in practice semaphores should not be used for thread serialization; instead
lightweight locks must be introduced.
In the current code all locks are abstracted and implemented in a separate file, so
lock changes are trivial, but I do not want to introduce per-arch code right now.
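One possible shape of such a lightweight lock, as a sketch: a spinlock built on C11 atomic_flag, which needs no per-arch assembly as long as a C11 compiler is available (that is an assumption, as are the ntl_spin_* names). It never enters the kernel, so no futex call is made.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct ntl_spinlock {
    atomic_flag busy;   /* set while somebody holds the lock */
};

#define NTL_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Tries to take the lock once; returns true on success. */
static inline bool ntl_spin_trylock(struct ntl_spinlock *l)
{
    assert(l != NULL);
    /* test_and_set returns the previous value: false means the lock was free. */
    return !atomic_flag_test_and_set_explicit(&l->busy, memory_order_acquire);
}

/* Busy-waits until the lock is taken. */
static inline void ntl_spin_lock(struct ntl_spinlock *l)
{
    while (!ntl_spin_trylock(l))
        ;   /* spin in user space, no syscall */
}

static inline void ntl_spin_unlock(struct ntl_spinlock *l)
{
    atomic_flag_clear_explicit(&l->busy, memory_order_release);
}
```

The price is that a waiter burns CPU if the holder gets preempted, so this only makes sense for very short critical sections like the queue and stack cache updates.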
/devel/threading :: Link / Comments (0)
Bruce Schneier's facts.
Super!
When Bruce Schneier observes a quantum particle, it remains in the same state until he has finished observing it.
Most people use passwords. Some people use passphrases.
Bruce Schneier uses an epic passpoem, detailing the life and works of seven mythical Norse heroes.
Bruce Schneier writes his books and essays by generating random alphanumeric text of an appropriate length and then decrypting it.
/other :: Link / Comments (0)
New kevent 'take33' release.
It is a minor release which only contains the following changes:
- Updated documentation (aio_sendfile_path()).
- Fixed typo in forward declaration.
/devel/kevent :: Link / Comments (0)