Zbr's days.

Wed, 17 Jan 2007

Climbing evening.


It was not a bad training session, although it had some negative moments. I tried new routes this day. Most of them were quite doable, but they were complex enough, and since I had done several hard boulder problems before them, I failed. The most interesting was the yellow route in the central sector, on a vertical wall with completely passive holds. I managed to complete about half of the route on-sight, and then it started to present surprises, or maybe I was just too tired for on-sight climbing of that complexity. Anyway, at the top I was completely out of power, and even managed to get the rope over my head and hurt my shoulder. I quickly fixed my position, but tore my headphones cord.

/life :: Link / Comments (0)


Breakthrough ideas are not from teams. Hans von Ohain.


Interesting note... I would even say 'ego boosting'. I like it.

/other :: Link / Comments (0)


Threading part of the NTL M-on-N threading library is ready.


Although not without problems: there is no scheduler (well, there is a round-robin one, which is not what I want), and I did not run any benchmarks to test SMP scalability and timer signal overhead. The latter is the most problematic part: although 'top' shows zero CPU usage for a pool of 100 threads sleeping in an infinite loop, it is still possible that the actual CPU usage due to signal delivery overhead is noticeable.

The code does not contain kevent syscall wrappers yet. I will think about dynamic library loading tricks, which allow 'replacing' syscalls at runtime.
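
One common form of such a trick on Linux is symbol interposition: a library (or the program itself) defines a function with the same name as the libc wrapper and forwards to the original found through dlsym(RTLD_NEXT, ...). The sketch below is only an illustration of that idea, not NTL code. The counter ntl_sketch_read_calls is a made-up stand-in for "enter the threading layer here", and it assumes a glibc system with dynamic linking.

```c
#define _GNU_SOURCE             /* needed for RTLD_NEXT */
#include <dlfcn.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical counter: stands in for whatever the threading layer
 * would do before the real syscall wrapper runs. */
static unsigned long ntl_sketch_read_calls;

/* Same name and signature as the libc wrapper, so the dynamic linker
 * resolves read() to this definition when it comes first in the
 * lookup order (for example via LD_PRELOAD). */
ssize_t read(int fd, void *buf, size_t count)
{
    static ssize_t (*real_read)(int, void *, size_t);

    /* Look up the next definition of read() after this object. */
    if (!real_read)
        real_read = (ssize_t (*)(int, void *, size_t))dlsym(RTLD_NEXT, "read");
    if (!real_read)
        return -1;

    ntl_sketch_read_calls++;
    return real_read(fd, buf, count);
}
```

Built as a shared object, such a wrapper would usually be activated with LD_PRELOAD; on older glibc versions linking needs -ldl.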

Enough for today - I'm going climbing.

/devel/threading :: Link / Comments (0)


pthread_create() vs. clone().


Have you ever tried to use clone() directly? I bet you never have, at least not with recent kernels.
First, the exported clone() does not correspond to what the kernel expects; it looks like it is only provided for compatibility. The manpage for that call is utterly obsolete and incorrect (apart from the useful description of flags, it contains a _wrong_ description of the parameters, at least for i386).
But I do not look for easy ways: I have the glibc sources and can dig into them.
That was my first impression, as a man who in theory can climb Everest, fly to space and understand the math behind string theory (the latter only if time permits; it looks like a whole life can be spent there digging into more and more new subtheories).
Now I think that all three tasks described above are much, much, much more solvable than digging in the glibc sources. And those people say that I described kevent poorly. Hey, look into the glibc NPTL implementation (and I am not even talking about its coding style) and pray you will never see it again, or just try to start a new thread using clone().

After about an hour of reverse engineering, trying to make __clone() work (note that clone() does not work at all, just forget about this call; only __clone() is correct for i386 and a 2.6 kernel), I managed to start a new thread. It was a win, except for one very small problem: it crashed somewhere in the call chain of the provided function.
I want you to know that I do not know the low-level i386 architecture well enough to easily read and understand the asm code found in sysdeps/unix/sysv/linux/i386/clone.S (some years ago I managed to write an asm application which entered protected mode in DOS, but I no longer remember asm, and actually never understood gas semantics well enough), so I failed miserably to proceed.

Yes, I started to use pthread_create() for SMP scalability. I do not hear you screaming 'loser', since you would be there too, but those of you who still lurk here and ironically nod your heads had better point me to something useful for understanding how modern i386 (or actually any other arch) starts and works with threads/processes.
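
For comparison, here is a minimal sketch of calling the glibc clone() wrapper the way its current manual page documents it: clone(fn, stack_top, flags, arg). It is not the __clone() experiment described above, and behaviour on a 2007 glibc may well have differed. It assumes a present-day glibc on an architecture where the stack grows down; sketch_child, sketch_run_clone and SKETCH_STACK_SIZE are made-up names.

```c
#define _GNU_SOURCE             /* needed for clone() and CLONE_VM */
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define SKETCH_STACK_SIZE (64 * 1024)   /* arbitrary size for the sketch */

/* Runs in the child. With CLONE_VM the child shares the parent's
 * memory, so this store is visible to the parent. */
static int sketch_child(void *arg)
{
    *(int *)arg = 42;
    return 0;
}

/* Returns the value written by the child, or -1 on failure. */
int sketch_run_clone(void)
{
    int result = 0;
    pid_t pid;
    char *stack = malloc(SKETCH_STACK_SIZE);

    if (!stack)
        return -1;

    /* The stack grows down (assumption), so pass the top of the buffer.
     * SIGCHLD in the flags lets the parent use plain waitpid(). */
    pid = clone(sketch_child, stack + SKETCH_STACK_SIZE,
                CLONE_VM | SIGCHLD, &result);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    /* If waiting fails the child may still be running on the stack,
     * so it is deliberately not freed in that case. */
    if (waitpid(pid, NULL, 0) != pid)
        return -1;

    free(stack);
    return result;
}
```

A real threading library would add more flags (CLONE_FS, CLONE_FILES, CLONE_SIGHAND, CLONE_THREAD and TLS setup), which is where most of the complexity in NPTL comes from.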

/devel/threading :: Link / Comments (0)


Initial implementation of the ntl (new threading library) M-on-N threading library.


Well, I cannot find a better prefix than ntl, which is an extremely non-ordinary abbreviation for 'new threading library'.

Anyway, the current version is very initial: it does not contain a scheduler or kevent-driven wrappers on top of the usual IO syscalls, but it already has all the initialization mechanisms, a cache of threads and all the structures required for scheduling.

There are two major problems uncovered with this initial implementation.
The first one is the scheduling problem. Since NTL does not contain a dedicated scheduling thread, it is quite hard to schedule functions which do not make syscalls, for example those which just run a while(1); loop, eat 100% CPU and never enter the NTL layer. To solve this problem I need to add a timer and an appropriate signal handler where rescheduling will happen, which in theory can lead to performance degradation and to problems with an alarm signal registered in the thread function (although that should be fixed with kevent timer notifications).
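
The timer idea can be sketched with setitimer() and a SIGALRM handler. This is only an illustration and not the NTL code: the handler here just counts ticks where a real scheduler would switch threads, and all sketch_* names are made up. It also shows the conflict mentioned above, since a thread function that installs its own SIGALRM handler would replace this one.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Number of timer ticks seen so far. */
static volatile sig_atomic_t sketch_ticks;

/* Signal handler: a real scheduler would pick the next thread here. */
static void sketch_on_tick(int signo)
{
    (void)signo;
    sketch_ticks++;
}

/* Start a periodic SIGALRM every 'usec' microseconds. Returns 0 or -1. */
int sketch_start_ticker(long usec)
{
    struct sigaction sa;
    struct itimerval it;

    if (usec <= 0)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sketch_on_tick;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;       /* do not break slow syscalls */
    if (sigaction(SIGALRM, &sa, NULL) == -1)
        return -1;

    it.it_interval.tv_sec = usec / 1000000;
    it.it_interval.tv_usec = usec % 1000000;
    it.it_value = it.it_interval;
    return setitimer(ITIMER_REAL, &it, NULL);
}

/* Disarm the timer. Returns 0 or -1. */
int sketch_stop_ticker(void)
{
    struct itimerval it;

    memset(&it, 0, sizeof(it));
    return setitimer(ITIMER_REAL, &it, NULL);
}

/* A busy loop which never makes a syscall, like while(1);, except
 * that it stops once the handler has run 'ticks' times. */
void sketch_spin_until(int ticks)
{
    while (sketch_ticks < ticks)
        ;
}
```

The busy loop terminates only because the signal handler interrupts it, which is exactly the property needed to reschedule code that never enters the library.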

Another problem is futex performance.
In the current code there are two locks implemented as semaphores, which in modern Linux are translated into futexes: the scheduler lock, which guards the queue of threads, and the stack cache lock, which guards the list of free thread stacks.
So the usual thread creation, empty function and thread exit in NTL turn into the following operations:

mmap2(NULL, 8396800, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7684000
sigprocmask(SIG_BLOCK, NULL, [])        = 0
futex(0xb7fdac20, FUTEX_WAKE, 1)        = 0
sigprocmask(SIG_SETMASK, [], [])        = 0
futex(0xb7fdac20, FUTEX_WAKE, 1)        = 0
futex(0xb7fdaaa0, FUTEX_WAKE, 1)        = 0
sigprocmask(SIG_SETMASK, [], NULL)      = 0
futex(0xb7fdaaa0, FUTEX_WAKE, 1)        = 0
...
munmap(0xb7fd7000, 4096)                = 0
So we get four additional futex calls, since two locks are processed: one when a stack is unlinked from and returned to the stack cache, and another when the thread is added to and removed from the scheduler's queue.

Performance differs noticeably (the test case creates a thread which exits immediately, repeated the requested number of times):
$ ./ntl_test 100000
num: 100000, diff: 388234, speed: 3.882340.
That is 3.882340 microseconds per thread, compared to 1.793600 microseconds without futex calls.

In this situation there is no concurrency at all, it is a synthetic test, so one _empty_ futex call takes about 0.5 microseconds ((3.882340 - 1.793600) / 4 is about 0.52), of which pure syscall overhead is 50% (the test machine is an Intel Core Duo 3.40GHz, running at 3.7 GHz).
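
The cost of an uncontended futex call can also be measured directly. The following is a rough sketch, not the ntl_test program: it issues FUTEX_WAKE on a word nobody waits on and averages the time. The function name is made up, and it assumes a Linux system where SYS_futex is defined.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Average cost, in microseconds, of one FUTEX_WAKE call with no
 * waiters. Returns a negative value on error. */
double sketch_futex_wake_cost_usec(long iterations)
{
    static uint32_t word;           /* futex word nobody sleeps on */
    struct timespec start, end;
    long i;

    if (iterations <= 0)
        return -1.0;
    if (clock_gettime(CLOCK_MONOTONIC, &start) == -1)
        return -1.0;

    for (i = 0; i < iterations; i++) {
        /* Wakes at most one waiter; there are none, so it returns 0. */
        if (syscall(SYS_futex, &word, FUTEX_WAKE, 1, NULL, NULL, 0) == -1)
            return -1.0;
    }

    if (clock_gettime(CLOCK_MONOTONIC, &end) == -1)
        return -1.0;

    return ((end.tv_sec - start.tv_sec) * 1e6 +
            (end.tv_nsec - start.tv_nsec) / 1e3) / iterations;
}
```

The result depends heavily on the CPU and kernel, so it is only comparable with the number above when run on similar hardware.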
I cannot say whether futex performance is slow or fast, but I would like to avoid this overhead, so in practice semaphores should not be used for thread serialization; lightweight locks must be introduced instead.
In the current code all locks are abstracted and implemented in a separate file, so lock changes are trivial, but I do not want to introduce per-arch code right now.
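
One possible shape for such a lightweight lock is a userspace spinlock which never enters the kernel on the uncontended path. The sketch below uses C11 atomics, which postdate this post but avoid hand-written per-arch code; it is an illustration with made-up names, not the NTL lock implementation.

```c
#include <stdatomic.h>

/* A minimal test-and-set spinlock. */
struct sketch_spinlock {
    atomic_flag busy;
};

#define SKETCH_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try to take the lock once. Returns 1 on success, 0 if already held. */
static inline int sketch_spin_trylock(struct sketch_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->busy,
                                              memory_order_acquire);
}

/* Spin until the lock is taken. No syscall is made on any path. */
static inline void sketch_spin_lock(struct sketch_spinlock *l)
{
    while (!sketch_spin_trylock(l))
        ;
}

/* Release the lock so that another thread's test-and-set succeeds. */
static inline void sketch_spin_unlock(struct sketch_spinlock *l)
{
    atomic_flag_clear_explicit(&l->busy, memory_order_release);
}
```

Pure spinning wastes CPU when the lock holder is preempted, so a practical version would probably spin for a bounded number of iterations and then fall back to a futex wait.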

/devel/threading :: Link / Comments (0)


Bruce Schneier's facts.


Super!

When Bruce Schneier observes a quantum particle, it remains in the same state until he has finished observing it.

Most people use passwords. Some people use passphrases. Bruce Schneier uses an epic passpoem, detailing the life and works of seven mythical Norse heroes.

Bruce Schneier writes his books and essays by generating random alphanumeric text of an appropriate length and then decrypting it.

/other :: Link / Comments (0)


New kevent 'take33' release.


It is a minor release which only contains the following changes:

  • Updated documentation (aio_sendfile_path()).
  • Fixed typo in forward declaration.

/devel/kevent :: Link / Comments (0)