Zbr's days.


Fri, 07 Dec 2007

Climbing evening.

It was a fairly short and not very hard training session: I arrived a bit later than usual, and spent most of the time on a quite old but very complex start on the horizontal overhang. Meanwhile I talked with the instructor and found out that the start in question is missing one hold that was there originally, which should explain why I fell. I will continue that red route next time. I even want to put a huge piece of paper around another hold, located where the old one was: 'I'm a red hold, I'm just feigning'.

/life :: Link / Comments (0)


Strong checksums in DST rock.

Many thanks to the person who suggested that I implement them, and to Zach Brown, who showed that Castagnoli CRC is a better choice than Adler.

I've debugged a setup where the system failed to mount an XFS filesystem on top of distributed storage. After strong checksums were turned on, the system detected that they were wrong, so some corruption had happened during filesystem setup.
Turning off TSO, RX and TX offload on the e1000 NICs of the machines that form the storage fixed the problem.

Strong checksums rock!
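
For the curious, here is roughly what such a check looks like. This is only a minimal Python sketch of a table-driven Castagnoli CRC (CRC32C), not the DST code itself; the names crc32c() and block_is_intact() are made up for the illustration.

```python
CRC32C_POLY = 0x82F63B78  # Castagnoli polynomial, bit-reflected form


def _make_table():
    """Precompute the CRC of every possible byte value."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ CRC32C_POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data, crc=0):
    """Return the CRC32C of data; pass a previous result as crc to continue it."""
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def block_is_intact(block, expected):
    """Hypothetical receiver-side check: recompute and compare."""
    return crc32c(block) == expected


if __name__ == "__main__":
    block = bytes(range(256)) * 2        # stand-in for a block sent to storage
    checksum = crc32c(block)             # computed by the sender
    damaged = bytearray(block)
    damaged[100] ^= 0x01                 # a single bit flipped on the way
    print(block_is_intact(block, checksum))           # True
    print(block_is_intact(bytes(damaged), checksum))  # False
```

The sender computes the checksum before the data reaches the NIC and the receiver recomputes it after, so anything broken in between shows up as a mismatch.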

/devel/dst :: Link / Comments (3)


Distributed storage and long distances.

I've just completed some tests of the distributed system built on top of ordinary internet links between machines located in Moscow, Russia and London, UK.
A remote target was set up, then an XFS filesystem was created and mounted, and some tests were run.
One of the machines (the main storage server) is located behind at least one NAT firewall.

/devel/dst :: Link / Comments (4)


The return of syslets.

Zach Brown announced a new syslet patchset aimed at simplifying and stabilizing basic async operations. Syslets are a mechanism for performing syscalls asynchronously: when a syscall is about to block, a new thread is started; the old thread stays blocked in the syscall and is scheduled away in favor of the new one, on behalf of which userspace continues its execution.
Version 7 of the patchset was built on top of indirect syscalls; threadlets, userspace function execution and async IO were removed from the patchset for simplicity, and a number of comments and code clarifications were added.
The main goal of syslets right now is to get the fundamental things working right.
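
To give a feeling for the model, here is a rough userspace analogy in Python. It is not the syslet API: real syslets only spawn a thread when the call actually blocks, and it is the original thread that stays in the kernel. In this sketch a helper thread is always started, and submit_async() plus the completion queue are names invented for the example.

```python
import os
import queue
import threading


def submit_async(completions, cookie, func, *args):
    """Run func(*args) on a helper thread and post (cookie, result, error)."""
    def worker():
        try:
            completions.put((cookie, func(*args), None))
        except OSError as err:
            completions.put((cookie, None, err))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def demo():
    read_fd, write_fd = os.pipe()
    completions = queue.Queue()  # stands in for the ring of completion statuses

    # os.read() may block on the empty pipe, but only the helper thread waits.
    submit_async(completions, "pipe-read", os.read, read_fd, 16)

    # The caller keeps running; here it produces the data being waited for.
    os.write(write_fd, b"hello")

    result = completions.get(timeout=5)
    os.close(read_fd)
    os.close(write_fd)
    return result


if __name__ == "__main__":
    print(demo())
```

The caller gets its status back through the queue, tagged with the cookie it passed in, which is the same idea as collecting syslet completions from a ring.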

Asynchronous IO already has a long history: it was implemented as a state machine in KAIO and kevent AIO, and the kernel supports AIO for direct IO operations (userspace requires libaio).
The syslet approach was shown to be much slower than libaio in some cases (libaio is actually synchronous for regular files), but that was attributed to unfairness in the CFS scheduler, which (iirc) was fixed/extended.

My main objection to this is that when you have thousands of actively running applications, the system starts sucking badly; but if it is possible to limit the maximum number of worker threads per user to some reasonable value, things will be just fine. Syslets (and their friendlier user, threadlets) were supported by Linus and Ingo Molnar, so very likely this will be the default way to do asynchronous IO and other operations.
Right now Zach has highlighted the following problems:

  • limitations of the ring buffer of syslet statuses
  • ptrace() problems
  • stale data (for example, when the thread issuing a syslet calls setuid(), another thread, which actually executes the blocked syscall, holds wrong data)
  • problems with sys_clone() and syslets: sys_clone() is actually the mechanism that creates a new thread in syslets, so we get recursion
All of the above problems are technically solvable, and I think it is not that bad to introduce some simple limitations for users, so that the majority of async IO questions are resolved with this mechanism.

/devel/other :: Link / Comments (0)


B(something)-tree vs RB-tree. On-disk allocations.

In the previous article it was shown how btree and rbtree behave when allocations are done in memory. In such conditions btree should suck compared to rbtree, and generally that is true, although in some conditions its insert speed can be even slightly higher than rbtree's.

Now, let's check how they behave when all allocations are performed from disk.
The graph below shows insert speed for both rbtree and btree in such conditions. Each node was allocated at an offset of 1024+sizeof(node) from the previous one, so that readahead, and thus cached disk pages, would not influence the results.
In total, 1 million keys were inserted into the tree.
Search speed is roughly the same as in the in-memory tests, since most of the tree sat in RAM after insertion.
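
The allocation layout is simple arithmetic, sketched here in Python. The 1024-byte gap is from the test described above; the 64-byte node size used in the example run is purely hypothetical, since the real sizeof(node) depends on the tree and the build.

```python
GAP = 1024  # padding in bytes placed between consecutive nodes


def node_offset(index, node_size):
    """File offset of the index-th allocated node (the first node is index 0)."""
    return index * (GAP + node_size)


def file_span(node_count, node_size):
    """Bytes from the start of the first node to the end of the last one."""
    if node_count == 0:
        return 0
    return node_offset(node_count - 1, node_size) + node_size


if __name__ == "__main__":
    node_size = 64  # hypothetical sizeof(node)
    print(node_offset(1, node_size))
    print(file_span(1_000_000, node_size))
```

With that hypothetical 64-byte node and one key per node, as in an rbtree, a million keys are spread over roughly 1.09 GB of file, so neighbouring nodes never share a 1 KB stretch of disk.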

B(something)-tree vs RB-tree. On-disk allocations

The high jump around 220 keys is likely the point where the node size becomes big enough, and the number of nodes small enough, that the whole tree starts to fit in the page cache. In some cases there is no such peak and the graph slowly moves to around 40k insertions per second, which likely happens when some background task is actively using the page cache, flushing the test file's pages from memory.

/devel/fs :: Link / Comments (0)


The most discouragement-resistant hacker out there.

That is what Jonathan Corbet calls me :)

/devel/other :: Link / Comments (0)