Tue, 14 Oct 2008
Moving, moving, moving...
I'm tired, I'm going away.
After about 10 years of helping to administer the 2ka.mipt.ru servers,
where we first learned what security is (after I rooted that machine),
I decided to retire and move my mail system to a different place,
as I did with the web long ago. The 2ka server has too many problems which are
not fixed by the admins, and they are completely unmotivated to fix them. Since I like
my emails, I'm switching.
And since I started this, I decided also to switch blog-engine and domain.
Greet www.ioremap.net
My mail contact is zbr_ioremap.net now, and all 2ka mail will be forwarded there.
This blog will not be updated and will redirect you to the new location in 20 seconds.
See you in the new place, it's cool and will be even better with time.
:: Link / Comments ()
Mon, 13 Oct 2008
Massive documentation update for the distributed storage. New release.
Andrew Morton expressed (somewhat angrily, imho :) the lack of documentation
for the DST
as a review-stopper, so I cleaned up some simple stuff he reported (like
style changes, kcalloc() instead of kzalloc(),
config dependency and other such things) and wrote about 500 lines of code documentation.
Not that much, but it is a bit more than 10% of the whole DST project:
$ git commit -a -m "Documentation update."
Created commit 4886f36: Documentation update.
7 files changed, 476 insertions(+), 18 deletions(-)
$ git-diff-tree -r --stat origin master
warning: refname 'origin' is ambiguous.
drivers/block/Kconfig | 2 +
drivers/block/Makefile | 2 +
drivers/block/dst/Kconfig | 14 +
drivers/block/dst/Makefile | 3 +
drivers/block/dst/crypto.c | 731 +++++++++++++++++++++++++++++
drivers/block/dst/dcore.c | 963 +++++++++++++++++++++++++++++++++++++++
drivers/block/dst/export.c | 662 ++++++++++++++++++++++++++
drivers/block/dst/state.c | 838 ++++++++++++++++++++++++++++++++++
drivers/block/dst/thread_pool.c | 345 ++++++++++++++
drivers/block/dst/trans.c | 335 ++++++++++++++
include/linux/connector.h | 4 +-
include/linux/dst.h | 572 +++++++++++++++++++++++
12 files changed, 4470 insertions(+), 1 deletions(-)
As usual one can grab new release from the
archive or via GIT
tree.
/devel/dst :: Link / Comments ()
Sun, 12 Oct 2008
Back to the roots.
Long ago I knew how hardware looks and actually made
some simple boards. Not with much success, but I never
studied it properly, so I did not expect anything serious
(in those days "Bugs"
was my favourite film; now I know a little bit more and understand
that it was a bit wrong :).
Abr, who recently completed his PhD, gave me (almost a year ago :)
a simple electronics construction kit:
K8055 -
USB interface board with 5 digital input channels and 8 digital output channels,
two analogue inputs and two analogue outputs with 8 bit resolution.

I spent half of the day soldering it today (it was quite a simple task,
since the board was already made and all the parts were marked), and now the board is ready.
Unfortunately I do not have the appropriate USB connector (the square one) to test
it, so it just lies here and reminds me that I actually like such things very much.
Really very much, but I know virtually nothing about ICs, and even in my best days
at university I did not know enough. I would like to start playing with ICs now. Thinking...
/devel/other :: Link / Comments ()
Meanwhile on the apartment development side.
I turned on the heated floor in my bathroom, and it's doing very well (I sometimes
go there just to warm up for a while; I will change the windows next year,
since it is already quite cool).
I also made a kind of chair out of several stools stacked on one another,
since my table
is really high and requires an appropriate chair.
I like my apartment more and more. I have been working every day for many years already,
not distinguishing morning from evening, weekends from working days. For the last 7-8 years I have worked
mostly in the office; I just did not like to work at home and did not know
why.
I think now I know the reason: my home usually was not properly set up and prepared.
Now with my own loft, an egoistically designed environment, custom-made things (like
the table, specially designed for computer-like workload with lots of different
things used in conjunction), which I just love to have and work with,
and with the way my apartment is developing, I like to stay here (am I a misanthrope and egoist? :).
The last couple of weekends I spent here hacking on some projects (great thanks to The One
who created datacenters and the reverse tunnel in openssh), which is a very strange thing, since
I do not remember when I last stayed at home for more than a day without it being related to some sickness.
Things are changing, and that's great!
/devel/flat :: Link / Comments ()
Fri, 10 Oct 2008
How to get back 100 MB/s in several clicks or fixing tbench regression for fun.
It was reported recently that tbench has a long history of regressions,
started at least from 2.6.23 kernel. I verified, that in my test
environment tbench
'lost'
more than 100 MB/s from 470 down to 355 for 8 threads
between at least 2.6.24 and 2.6.27. The 2.6.26-2.6.27 performance regression
on my machines roughly corresponds to a drop from 375 to 355 MB/s.
I spent several days (please do not think that I'm bored and have
nothing to do: there are really interesting things to work with,
but since I already started...) in various tests and bisections (unfortunately
bisect cannot always point to the 'right' commit), and found the following
problems.
First, related to the network, as lots of people expected: TSO/GSO over
loopback with the tbench workload eats about 5-10 MB/s, since the TSO/GSO frame
creation overhead is not paid back by the gains of optimized super-frame
processing. Since it brings a really impressive improvement in big-packet
workloads, it was (likely) decided not to add a patch for this, but
instead one can disable TSO/GSO via ethtool. The change that enabled it was merged in the
2.6.27 window, so it accounts for part of that regression.
The second part of the 26-27 window regression (as a reminder, it is about 20
MB/s) is related to the scheduler changes, which was expected by another
group of people. I tracked it down to the a7be37ac8e1565e00880531f4e2aff421a21c803
commit, which, when reverted, returns 2.6.27 tbench performance to the highest
(for 2.6.26-2.6.27) 365 MB/s mark. I also tested the tree stopped at the above commit itself,
i.e. not 2.6.27, and got 373 MB/s, so likely other changes in that merge
ate a couple of megs.
A curious reader can ask: where did we lose another 100 MB/s? This small
issue was not detected (or at least not reported to netdev@ with a provocative
enough subject), and it happened to live somewhere in the 2.6.24-2.6.25 changes.
I was lucky enough to 'guess' (just after a couple of hundred compilations)
that it corresponds to the 8f4d37ec073c17e2d4aa8851df5837d798606d6f commit about
high-resolution timers. I sent a patch, based on a revert of the above commit,
to the mailing lists and developers; unfortunately it is impossible to cleanly revert
it in 2.6.25, not to mention the 2.6.27 tree. That patch brings
performance of the 2.6.25 kernel tree to 455 MB/s.
There are still about 20 MB/s missing, since 2.6.24 has 475 MB/s, so
likely the bug lives between 2.6.24 and the above 8f4d37ec073 commit, but this exercise
I left to Ingo and Peter :)
Sigh, it is more than 3 A.M. in Moscow, I think if I would be on stronger than Linux
kernel hacking drugs, all my organs would run away from me long ago...
/devel/other :: Link / Comments ()
Thu, 09 Oct 2008
An interesting observation about tbench regression.
For the last several days I bisected the 26-27 kernel at least 4 times,
trying to find out where the bug lives. Unfortunately every time I ended up
with some obscure patches, which do not even touch the x86 arch, let alone the
network. Like avr32 or s390 patches, which do not even show up in my config.
And I do have kernel version changesets between the two major releases, which
have 15 MB/s difference between them. I do not know how exactly bisect works
(I think it is a binary search in the changesets, but for example I
do not know if it enters merge commit or just gets it as one) in particular and git
in general, so I will do the last try to find the problem place in the 26-27 window
manually selecting changesets to try and check the result. If I will not succeed there,
I will move to the 23-24 and 22-23 windows with usual bisect and return
to the current tree later. I think I've found a way to run two tests in parallel,
so things may run a little bit faster... Second machine just started to test 23-24,
which has the biggest drop.
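For reference, the core idea behind bisection on a linear history is a binary search for the first bad changeset. The sketch below is only an illustration under stated assumptions: the history is strictly linear, the `is_bad` predicate and the throughput numbers are hypothetical, and merge handling (which real `git bisect` has to deal with) is ignored entirely.

```python
def first_bad(changesets, is_bad):
    """Return the index of the first bad changeset.

    Assumes changesets[0] is good, changesets[-1] is bad, and that once a
    changeset is bad every later one is bad too (linear history only).
    """
    lo, hi = 0, len(changesets) - 1   # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(changesets[mid]):
            hi = mid                  # first bad one is at mid or earlier
        else:
            lo = mid                  # first bad one is after mid
    return hi


# Hypothetical throughput (MB/s) measured at each changeset, oldest first.
throughput = [375, 374, 376, 375, 358, 356, 355, 355]
idx = first_bad(list(range(len(throughput))), lambda i: throughput[i] < 365)
print(idx)
```

Note that the search trusts every single measurement: if a benchmark result is noisy and lands on the wrong side of the threshold, the search continues in the wrong half and ends at an unrelated changeset.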
/devel/other :: Link / Comments ()
Wed, 08 Oct 2008
Trumpeting in C.
I do not understand.
Just can not get it: why is it so complex for me to reliably
play in second octave. I mean play and not just make a sound.
I can produce sounds up to concert F, maybe G/G# (trumpet G and A/A# accordingly)
in the second octave, probably sometimes higher (battery in my
KORG AW-1
tuner discharged), but can not play a simple trumpet A-A#-C-D-E-F
(concert G-G#-Bb-C-D-Eb) scale. It is just first-to-second octave,
and although I can play each sound, I can not play them in a line.
Weird.
/life/music :: Link / Comments ()
This day has come.
One month of waiting.
One week of real work.
Seven releases of different projects.
One idea.
One implementation.
One project.
This day has come: the new, completely rewritten locking subsystem in
POHMELFS.
The release day!
Following changes were made:
- The new distributed locking subsystem. Locks were prepared to be byte-range,
but since all Linux filesystems lock the whole inode, it was decided to lock the whole
object during writing. Actual messages being sent for locking/cache coherency protocol
are byte-range, but because the whole inode is locked, lock is cached, so range actually
is equal to
inode->i_size. One can simultaneously write into the same page
via different offsets from different clients, and every time the file will be coherent on all
clients which do it and on the server itself.
- Documentation update. Fixed by Adam Langley (agl_imperialviolet.org)
- Add/del/show commands patch from Varun Chandramohan (varunc_linux.vnet.ibm.com)
- Bug fixes and cleanups.
Get the latest version from
archive
or via GIT tree.
Enjoy!
/devel/fs :: Link / Comments ()
Tue, 07 Oct 2008
Valgrind support for netchannels.
Alexandre Lissy (alexandre.lissy_smartjog.com) made a
patch
for the latest to date Valgrind version (3.2.1).
Now one can analyze performance bottlenecks with
netchannels applications
using standard techniques.
/devel/networking :: Link / Comments ()
POHMELFS locking testing.
So far it produced not bad results. But not good either.
I see locking messages and they are in the right order and file content
is not damaged, but clients frequently give up on timeout waiting for
lock to be granted. Since locking release process requires inode to be
unlocked (so it could be found and locked by the thread, which received
network packet), this indeed may take too long on slow media and disks,
since locking has to wait until data is written, for example wait for writeback
completion or page reading, if they were not in the cache yet.
I tested
POHMELFS
locks in Xen domains, where network speed is limited
to 3 MB/s, and writing one million (or ten million, that may be the point)
8-byte entries at different offsets (sequential step of 128 bytes) took more
than 50 seconds, so the default 5-second lock timeout may not be enough.
That's the theory, in practice I need to test different timeouts and actually
run on real machines, but here comes another problem. I have three quite fast
SMP machines with lots of RAM connected over gigabit ethernet, which can be used
however I like. But...
The first one was essentially killed by
tbench regression
testing. They all have long history of problems with disks or SCSI controllers,
now it happens again: the first machine boots only with single 2.6.22 Debian kernel,
anything else (including vanilla 2.6.22) fails to read data from the software raid
partition, although disks are detected correctly.
Another machine is actively used by aforementioned tbench regression testing,
it takes quite long time to boot it and run tests, so things are slow enough.
And the last one is used to control IPMI, since it is the only way I have to reboot them,
so when I managed to freeze all three, I needed to contact people who needed to contact people,
who needed to hard reset machines in datacenter and put them into BIOS, since existing
KVM switches are stupid enough not to respond to keyboard when machine died.
So, I'm a bit forced to spread efforts in several different directions,
but nevertheless there is a little bit of time for the new things:
$ ./elliptics -c ./elliptics.conf
2008-10-07 01:03:39.430198 12778 Logging has been started.
2008-10-07 01:03:39.430559 12778 Successfully initialized 'sha1' hash.
2008-10-07 01:03:39.430641 12778 Node id: b551803fd74ff5590ed38f6ce8a10a2e577b2a9e
2008-10-07 01:03:39.431076 12778 Server is now listening at 127.0.0.1:1025.
$ cat elliptics.conf
#
# This is a simple config file for the elliptics network.
# Note, that spaces are skipped before and after the '=' delimiter.
#
log = /dev/stdout
hash = sha1
id = This is id string
#numeric_id = 1234567890abcdefffffffffffffffffffffffa
root = /tmp
addr = 127.0.0.1:1025:2
#addr = ::1:1025:10
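A config in this format is easy to read mechanically. The following is a minimal sketch of a parser for such `key = value` files, written only from the rules stated in the file's own comment (spaces around `=` are skipped) plus the assumption that lines starting with `#` are comments. It is not the actual elliptics parser, and `parse_config` is a hypothetical name; the meaning of the individual fields in `addr` is not interpreted here.

```python
def parse_config(text):
    """Parse 'key = value' lines; lines starting with '#' are comments."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key/value line; this sketch simply skips it
        # Spaces before and after the '=' delimiter are dropped.
        options[key.strip()] = value.strip()
    return options


sample = """\
# This is a simple config file for the elliptics network.
log = /dev/stdout
hash = sha1
id = This is id string
#numeric_id = 1234567890abcdefffffffffffffffffffffffa
root = /tmp
addr = 127.0.0.1:1025:2
"""
conf = parse_config(sample)
print(conf["id"])
```

Only the first `=` splits the line, so values are kept as raw strings, including any inner spaces.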
That will be an excellent project (maybe even my best one to date :),
which will be used in... More details when things are ready.
I like the idea, so maybe it will give a name for my new site, like
noelliptics.net. Not yet though.
/devel/fs :: Link / Comments ()
Mon, 06 Oct 2008
New distributed storage release.
New DST release contains following changes:
- Keepalive messages to early detect failed nodes, which are sent if there is no traffic between the nodes.
- Listening socket reuses address now, which speeds up stop/start sequence.
- Fixed bug with wrong debug option, which could read uninitialized memory.
- Change module name from
dst.ko to nst.ko, since the former is used by dvb card.
- Whitespace cleanup.
As usual patch is available from
archive
or via GIT tree.
Enjoy!
Asked for inclusion again. Let's make bets on number of comments for the patch :)
/devel/dst :: Link / Comments ()
Sun, 05 Oct 2008
Civilization sometimes visits even me.
 "Cezares Illusion"
That's how my shower cabin and bathroom corner look now.
It took me virtually years to get here, but practically a day to install the cabin,
and the bathroom is still not completed. I need to finish tile gluing and water
hatch installation.
I also need to complete the brick tile gluing (roughly 7 square meters of walls in the kitchen),
which will likely be done together with the bathroom gluing.

There is a lot of work, but actually not as much as it may look at first sight.
I just need a special development mood to start doing it good and fast, which, as a muse,
does not come on demand.
/devel/flat :: Link / Comments ()
Thu, 02 Oct 2008
POHMELFS got new locking subsystem.
I've completed a small rewrite of the distributed locks in
POHMELFS.
They can be byte-range, but since Linux VFS locks the whole inode
during writing, I decided first to implement a simpler approach,
so although clients send byte-range locks, server locks the whole
object.
If there is a simultaneous writing to the object, only one writer is allowed
at a time. Write locks are grabbed at write time, read locks at read time. Writing
is still handled via writeback, so all caching facilities persist. Locks are 'cached',
i.e. if inode was locked and no one else tried to update it, no new lock messages
are sent between server and client. Lock release message (initiated by another client,
who wants to start writing into the same file) forces inode writeback on the current
lock owner.
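The cached-lock behaviour described above can be modelled in a few lines. This is a toy sketch, not POHMELFS code: `LockServer` and its fields are hypothetical names, networking is replaced by a message counter, and writeback is reduced to recording which client was asked to flush.

```python
class LockServer:
    """Toy whole-object lock server with cached ownership."""

    def __init__(self):
        self.owner = {}      # object id -> client currently holding the lock
        self.messages = 0    # lock messages exchanged, for illustration only
        self.flushed = []    # (client, object) pairs forced to write back

    def write(self, client, obj):
        if self.owner.get(obj) == client:
            return           # lock is cached: no new lock message is needed
        self.messages += 1   # lock grab request from the new writer
        previous = self.owner.get(obj)
        if previous is not None:
            # Release request: the current owner has to write its data back.
            self.messages += 1
            self.flushed.append((previous, obj))
        self.owner[obj] = client


server = LockServer()
server.write("A", "file")   # A grabs the lock
server.write("A", "file")   # cached, nothing is sent
server.write("B", "file")   # B asks, A is told to flush and release
print(server.messages, server.flushed)
```

In this model a single writer pays for the lock once, and extra messages appear only when ownership actually moves between clients.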
I've started a testing process, so far quite trivial, but I plan to write a simple
application, which will simultaneously write into the same file from different clients
into different offsets (like first client writes each second byte, second client writes
each third byte and so on) and check the result. If everything is ok, I will release a new
version this weekend and start implementation of the really cool distributed facilities
I plan to have in POHMELFS. It will be first implemented as a library, so that anyone could
use it to create a distributed storage without patching a kernel (but with own API though,
I do not want to mess with FUSE).
/devel/fs :: Link / Comments ()
Wed, 01 Oct 2008
Tbench regression. SLAB vs SLUB.
After I found
a small fix for tbench regression over loopback, I decided to run some tests with it.

As was expected, turning off TSO/GSO does not fix the whole issue: performance increased from 366 MB/s up to 381 MB/s,
which is still less than 398 MB/s for 2.6.26-slub.
Another interesting issue I found, is SLAB vs SLUB difference. The former is always faster (about 5-7 MB/s difference):
366 vs 361 MB/s for 2.6.27-rc7 and 381 vs 374 MB/s when TSO and GSO are turned off. Pekka Enberg suggested to revert
5595cffc8248e4672c5803547445e85e4053c8fc commit, which could result in this performance degradation, but without
this commit SLUB behaves a little bit slower: 372 vs 374 MB/s.
I will try to find out why there is a huge drop between 2.6.23 and 2.6.24 (54 MB/s) next.
/devel/other :: Link / Comments ()
Mon, 29 Sep 2008
First fix for the tbench over loopback regression.
It brought me back about 5% in Xen domain with 256 MB of ram
in 4-clients tbench test:
current: 187 MB/s
patched: 194 MB/s
Patch is rather trivial: it disables TSO and GSO in loopback
and generically on devices which are capable of scatter-gather
(where it was automatically enabled by e5a4a72d4f8 commit, which
I bisected to be guilty). Actually the TSO disablement part provided more gain than GSO on SG devices.
Idea behind patches is clear: we create bigger packet, so we should have
smaller overhead of its processing, but apparently GSO/TSO packet creation
overhead dominates in loopback at least.
All three of my (big) test machines died in various (apparently unbootable) bisections,
so I tested it in a small and very slow Xen domain. Because of that I did not run the
2.6.22 kernel, since git operations and compilation take ages on this 'machine'.
For example, I was only able to perform a dozen or so git checkouts/resets/bisections
and compilations in the whole day.
I've posted patch to the netdev@, let's see the result.
Forgot to mention that I wanted to trade this patch for a DST, POHMELFS or netchannels
patch review the next time I post them :)
/devel/other :: Link / Comments ()
Sun, 28 Sep 2008
Tbench Linux regressions with time.
It was reported, that starting from 2.6.23 Linux kernel has a continuous
network-related regression, which results in more than 20% performance degradation.
I checked it, and got interesting results.
It is better one time to see, than 1000 times to hear it.

Yes, we suck!
I decided to try to fix these issues, and started to bisect 2.6.22->2.6.23 and 2.6.26->2.6.27 on
two identical machines, which have 4 logical CPUs (HT enabled) and 4 GB of RAM.
The result was quite surprising: the second bisection step in 22->23 froze the machine
in the middle of the compilation, and the first bisection step in 26->27 did not boot.
Since I ran it remotely, no progress on this till tomorrow.
/devel/other :: Link / Comments ()
Sat, 27 Sep 2008
POHMELFS cache coherency protocol.
Finally it looks like there are no killer bugs
or noticeably bad features in the distributed storage;
yesterday I pushed a change to drop a wrong debug statement, which could have resulted in a crash, and a couple
of comment cleanups are waiting to be pushed, and likely that's it. It will be the last release,
unless new feature requests come in or bugs are found.
So, I switched back to the POHMELFS
development from DST.
To be really cool in cache coherency collisions, POHMELFS requires a new locking/coherency mechanism, which
I am implementing similarly to the MOESI cache coherency protocol.
This basically means a floating lock for a given object,
which may be owned by only one client at a time, not counting readers: they just receive a message that their
data is not valid anymore.
First, I changed the userspace management of the inode cache: now there is only a single tree of all objects
which were ever opened by any client. When a client disconnects or drops an inode locally, it is removed from the
server's cache as well.
Next, there will be a special command to acquire/release a lock, which is only sent by writers.
When a writer starts its dirty job of damaging shared data, it sends a lock grab message to the server with the
requested range, which in turn is broadcast to the other writers; only a single writer is allowed to own a given area.
Then the server proceeds with its usual tasks of cooking or waiting for IO. Eventually the owner of the lock
decides to release it: for example, after the above message from the server it can flush data to the server
and send a lock release message, or it can do so on its own. The server then checks whether the given area is now free and sends a lock
completion message to the requester. The new owner receives the message, marks the inode as owned and starts writing there.
Any subsequent write, if the inode is marked as owned, does not result in an additional lock message.
So far it looks doable, but I have only completed what is called 'first' above :)
If there are no major problems with the other projects, I plan to complete this part quickly and move forward.
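The "only a single writer may own a given area" rule from the plan above boils down to an interval overlap check on the server. Here is a rough sketch of that bookkeeping; `RangeLocks` and its methods are hypothetical names, ranges are half-open `[start, end)`, and the broadcast and completion messages are left out.

```python
class RangeLocks:
    """Toy server-side table of byte-range write locks."""

    def __init__(self):
        self.held = []  # list of (start, end, client), end is exclusive

    def conflicts(self, start, end, client):
        # Two half-open ranges overlap when each starts before the other ends.
        return [c for (s, e, c) in self.held
                if c != client and s < end and start < e]

    def grab(self, start, end, client):
        blockers = self.conflicts(start, end, client)
        if blockers:
            return blockers  # requester would wait for these owners to release
        self.held.append((start, end, client))
        return []

    def release(self, client):
        self.held = [h for h in self.held if h[2] != client]


locks = RangeLocks()
print(locks.grab(0, 4096, "A"))       # free area, granted
print(locks.grab(4096, 8192, "B"))    # adjacent area, granted
print(locks.grab(1000, 5000, "C"))    # overlaps both owners
locks.release("A")
locks.release("B")
print(locks.grab(1000, 5000, "C"))    # area is free now, granted
```

An empty list means the lock was granted; a non-empty list names the current owners that would first have to flush and release.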
/devel/fs :: Link / Comments ()
Fri, 26 Sep 2008
New failed ipw2100 interrupt and its races.
During my testing I managed to beat following interrupts out of the chip:
[41773.200686] ipw2100: Fatal interrupt. Scheduling firmware restart.
[41773.200707] eth1: Fatal error value: 0x500185B8, address: 0x08004501, inta: 0x40000000
[41773.200810] ipw2100 0000:02:04.0: PCI INT A disabled
[41773.203110] ipw2100: IRQ INTA == 0xFFFFFFFF
[41773.224446] ipw2100: IRQ INTA == 0xFFFFFFFF
[41773.245781] ipw2100: IRQ INTA == 0xFFFFFFFF
[41773.249360] ipw2100 0000:02:04.0: enabling device (0000 -> 0002)
[41773.249384] ipw2100 0000:02:04.0: PCI INT A -> Link[C0C8] -> GSI 11 (level, low) -> IRQ 11
[41773.249426] ipw2100 0000:02:04.0: restoring config space at offset 0x1
(was 0x2900002, writing 0x2900006)
This happens during PCI ipw2100 device disablement in the reset handler,
so when interrupt handler sees that, it bails out. It should be generally ok,
but I found a different thing: there is a race between interrupt handler (handler
itself and related processing tasklet) and
reset code. The latter disables interrupts before starting to turn adapter on,
but interrupt handler can run right now on given cpu and can schedule
the tasklet, so its disablement does not prevent parallel reading and writing of the
various registers.
IRQ processing tasklet does register reading and writing under the lock with interrupts
turned off, but reset tasklet does not protect initialization path against it, so I wonder,
what may happen in this case. Since register reading and writing happens via absolute
addresses (I mean there is no need to write an address register first), this may not be a problem,
but the race still exists and theoretically can harm the system. Similar unguarded accesses exist
in ipw2100_wx_event_work() handler, and also there is unguarded status field setting
in various places in the driver, which can harm the driver's behaviour too.
So maybe I blamed the firmware a little bit too early, although the things I found may
be harmless. I will try to figure this out tomorrow.
/devel/networking/ipw2100 :: Link / Comments ()
Thu, 25 Sep 2008
ipw2100 fatal interrupt: playing with power states.
I was not able to force the card to stop sending or receiving packets
with ping tests, although I was definitely able to generate lots
of fatal interrupts with completely different values and addresses.
Frequently the card generates fatal interrupts with different values on the same
address, like below:
eth1: Fatal error value: 0x50018584, address: 0x61C00000, inta: 0x40000000
eth1: Fatal error value: 0x50018584, address: 0x61C00000, inta: 0x40000000
eth1: Fatal error value: 0x5000CEE4, address: 0x61C00000, inta: 0x40000000
eth1: Fatal error value: 0x50018584, address: 0x61C00000, inta: 0x40000000
eth1: Fatal error value: 0x5000CEE4, address: 0x61C00000, inta: 0x40000000
eth1: Fatal error value: 0x50018584, address: 0x61C00000, inta: 0x40000000
eth1: Fatal error value: 0x50018584, address: 0x61C00000, inta: 0x40000000
eth1: Fatal error value: 0x5000CEE4, address: 0x61C00000, inta: 0x40000000
eth1: Fatal error value: 0x50018584, address: 0x61C00000, inta: 0x40000000
eth1: Fatal error value: 0x50018584, address: 0x61C00000, inta: 0x40000000
They did not follow one after another though.
Different error values likely mean that there is no correlation between
values and addresses, so this information is useless.
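To check such a claim over a long log rather than by eye, the lines can be grouped by address. This is a small sketch assuming the log format shown above; `values_by_address` is a hypothetical helper, and the sample lines are copied from the output above.

```python
import re
from collections import defaultdict

# Matches the "Fatal error value: ..., address: ..." part of a log line.
FATAL = re.compile(
    r"Fatal error value: (0x[0-9A-Fa-f]+), address: (0x[0-9A-Fa-f]+)")


def values_by_address(lines):
    """Map each address to the set of error values reported for it."""
    seen = defaultdict(set)
    for line in lines:
        match = FATAL.search(line)
        if match:
            value, address = match.groups()
            seen[address].add(value)
    return dict(seen)


log = [
    "eth1: Fatal error value: 0x50018584, address: 0x61C00000, inta: 0x40000000",
    "eth1: Fatal error value: 0x5000CEE4, address: 0x61C00000, inta: 0x40000000",
    "eth1: Fatal error value: 0x50018584, address: 0x61C00000, inta: 0x40000000",
]
result = values_by_address(log)
print(result)
```

An address that maps to more than one value is exactly the "different values on the same address" case.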
I added power state changes to the reset function, so now it does something like that:
[ 897.661002] ipw2100: Fatal interrupt. Scheduling firmware restart.
[ 897.661021] eth1: Fatal error value: 0x30016C44, address: 0x601F7C00, inta: 0x40000000
[ 897.664712] ipw2100 0000:02:04.0: PCI INT A disabled
[ 897.712041] ipw2100 0000:02:04.0: enabling device (0000 -> 0002)
[ 897.713549] ipw2100 0000:02:04.0: PCI INT A -> Link[C0C8] -> GSI 11 (level, low) -> IRQ 11
[ 897.713595] ipw2100 0000:02:04.0: restoring config space at offset 0x1
(was 0x2900002, writing 0x2900006)
[ 954.646319] ipw2100: Fatal interrupt. Scheduling firmware restart.
[ 954.646338] eth1: Fatal error value: 0x5000CF10, address: 0x61A00000, inta: 0x40000000
[ 954.646429] ipw2100 0000:02:04.0: PCI INT A disabled
[ 954.692041] ipw2100 0000:02:04.0: enabling device (0000 -> 0002)
[ 954.692063] ipw2100 0000:02:04.0: PCI INT A -> Link[C0C8] -> GSI 11 (level, low) -> IRQ 11
[ 954.692103] ipw2100 0000:02:04.0: restoring config space at offset 0x1
(was 0x2900002, writing 0x2900006)
[ 968.585409] ipw2100: Fatal interrupt. Scheduling firmware restart.
[ 968.585429] eth1: Fatal error value: 0x5000C9D0, address: 0x57E00500, inta: 0x40000000
[ 968.585517] ipw2100 0000:02:04.0: PCI INT A disabled
[ 968.632037] ipw2100 0000:02:04.0: enabling device (0000 -> 0002)
[ 968.632059] ipw2100 0000:02:04.0: PCI INT A -> Link[C0C8] -> GSI 11 (level, low) -> IRQ 11
[ 968.632099] ipw2100 0000:02:04.0: restoring config space at offset 0x1
(was 0x2900002, writing 0x2900006)
[ 972.269514] ipw2100 0000:02:04.0: PCI INT A disabled
[ 972.316041] ipw2100 0000:02:04.0: enabling device (0000 -> 0002)
[ 972.316400] ipw2100 0000:02:04.0: PCI INT A -> Link[C0C8] -> GSI 11 (level, low) -> IRQ 11
[ 972.316446] ipw2100 0000:02:04.0: restoring config space at offset 0x1
(was 0x2900002, writing 0x2900006)
As we can see, fatal interrupts did not disappear, and are actually as frequent as before.
I also got these lines:
[ 2032.560413] ipw2100: exit - failed to send CARD_DISABLE command
[ 2032.560449] ipw2100: exit - failed to send CARD_DISABLE command
[ 2032.560491] ipw2100: exit - failed to send CARD_DISABLE command
[ 2032.560593] ipw2100: exit - failed to send CARD_DISABLE command
One after another, which does not provide me any clue though.
I've started several big torrent downloads/seeds as a heavy load; maybe the card somehow
differentiates between flows, so this test should be heavier than lots
of pings. I first noticed the fatal interrupt problem with this kind of load,
when the card not only stopped working, but also printed some goodbye message.
So far conclusion is not very optimistic: fatal interrupts happen always, no matter
what magic is enabled in the reset, which already tells that firmware is broken.
Hopefully additional reset games with power management will allow card to work,
even with those interrupts. Time will tell.
/devel/networking/ipw2100 :: Link / Comments ()
Wed, 24 Sep 2008
More Kernel Summit photos.

At kernel summit.
Check new ones!
/devel/other :: Link / Comments ()
New DST release.
This is a maintenance release, which contains following changes:
- Use idr to manage minor numbers. Now create/remove/create sequence does not
produce new minor, but uses previous one, which is now freed.
- Added cache name to the node. It is possible to have freed node still
being alive while we register new node with the same name, so its cache name should be different.
- Wait during node removal until there are no pending transactions, so the node is
freed in process context and not in the receiving threads themselves.
- Warn user if there is no security permission config file during
export node initialization. No client will be allowed to connect
without explicit security association.
- Tune default size of the page pool for crypto processing a bit.
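The minor number change in the first item can be illustrated without the kernel. The sketch below is a Python stand-in for the allocation policy only, assuming that the lowest free number is handed out; it does not use or mirror the kernel idr API, and `MinorAllocator` is a hypothetical name.

```python
class MinorAllocator:
    """Hand out the lowest free minor number and allow reuse after free."""

    def __init__(self):
        self.used = set()

    def alloc(self):
        minor = 0
        while minor in self.used:   # linear scan is fine for a toy model
            minor += 1
        self.used.add(minor)
        return minor

    def free(self, minor):
        self.used.discard(minor)


minors = MinorAllocator()
first = minors.alloc()    # node created
minors.free(first)        # node removed
second = minors.alloc()   # node created again, gets the freed number
print(first, second)
```

With a plain incrementing counter the second node would get a new, larger number; with reuse, a create/remove/create sequence ends up with the same minor.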
I want to thank Remy Ritchen (remy.ritchen_gmail.com) for his excellent tests and analysis.
As usual, DST
is available from archive and via
git tree.
/devel/dst :: Link / Comments ()
First ipw2100 testing: fatal interrupt.
I managed to compile a small enough kernel, which boots on
my laptop (do not know how long it took, since I fell asleep),
and managed to trigger the fatal interrupt error after just several seconds
of ping -f 192.168.1.1 -s 8192 on the freshly booted
machine. 192.168.1.1 is my gateway address.
Here is the result with the patch I posted to the mailing lists,
which was not acked, replied to or commented on though (well, I have to admit
that if I had sent it a couple of mails earlier, it could probably have found its
way into the tree, but I still believe that it would not result in anything,
since everyone knows about this bug, it just is not fixed for some reason).
Intel developers (at least those who maintain the driver) continue to keep silent.
[ 613.960164] ipw2100: exit - failed to send CARD_DISABLE command
[ 624.456033] eth1: no IPv6 routers present
[ 690.721534] ipw2100: Fatal interrupt. Scheduling firmware restart.
[ 690.721554] eth1: Fatal error value: 0x5000C97C, address: 0x100E201C, inta: 0x40000000
[ 690.721580] ------------[ cut here ]------------
[ 690.721587] WARNING: at drivers/net/wireless/ipw2100.c:3188
ipw2100_irq_tasklet+0x8fe/0x9b0 [ipw2100]()
[ 690.721736] Pid: 0, comm: swapper Not tainted 2.6.27-rc7-mainline #2
[ 690.721744] [] warn_on_slowpath+0x5f/0x90
[ 690.721763] [] up+0x11/0x40
[ 690.721773] [] release_console_sem+0x190/0x1d0
[ 690.721786] [] enqueue_hrtimer+0x72/0xf0
[ 690.721795] [] printk+0x1b/0x20
[ 690.721805] [] ipw2100_irq_tasklet+0x8fe/0x9b0 [ipw2100]
[ 690.721831] [] hrtick_start_fair+0x157/0x170
[ 690.721844] [] enqueue_hrtimer+0x72/0xf0
[ 690.721855] [] snd_intel8x0_interrupt+0x1d7/0x250 [snd_intel8x0]
[ 690.721875] [] tasklet_action+0x46/0xb0
[ 690.721886] [] __do_softirq+0x75/0xf0
[ 690.721897] [] do_softirq+0x37/0x40
[ 690.721906] [] do_IRQ+0x40/0x70
[ 690.721917] [] getnstimeofday+0x37/0xe0
[ 690.721927] [] common_interrupt+0x23/0x28
[ 690.721937] [] sys_setpgid+0xd8/0x190
[ 690.721955] [] acpi_idle_enter_simple+0x15a/0x1c1 [processor]
[ 690.721980] [] cpuidle_idle_call+0x7b/0xc0
[ 690.721991] [] cpu_idle+0x46/0xe0
[ 690.722000] =======================
[ 690.722006] ---[ end trace 70268f59a00d957c ]---
[ 695.271318] ipw2100: Fatal interrupt. Scheduling firmware restart.
[ 695.271337] eth1: Fatal error value: 0x50014148, address: 0x60207E04, inta: 0x40000000
writing this note and starting over
[ 1520.709136] ipw2100: Fatal interrupt. Scheduling firmware restart.
[ 1520.709156] eth1: Fatal error value: 0x5000C96C, address: 0x538E7E40, inta: 0x40000000
[ 1550.954315] ipw2100: Fatal interrupt. Scheduling firmware restart.
[ 1550.954334] eth1: Fatal error value: 0x5000C99C, address: 0x08418004, inta: 0x40000000
[ 1592.175473] ipw2100: Fatal interrupt. Scheduling firmware restart.
[ 1592.175492] eth1: Fatal error value: 0x50018588, address: 0x57E77A00, inta: 0x40000000
So, these fatal error values and addresses do not tell me anything,
but since they are always different and at different addresses, I think the firmware
just loses its mind and stops responding.
The first line, where ipw2100 fails to send a command, was obtained during
ifdown of the interface. I have never seen it before, but I do not think
it is related.
So, I need to move to the office and want to make some
distributed storage
changes, namely fix a name collision (the kernel already has a DVB card driver whose
module is called dst.ko), and implement a better minor number allocation
scheme for the imported devices: right now, after a node is created and destroyed,
a new one does not get the same number but a continuously increasing one, which looks
confusing and may cause a sysfs initialization error (when the system tries to
register a kobject with an existing name).
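A minimal userspace sketch of the "reuse the lowest free number" idea, assuming a small fixed pool of minors. The names `minor_alloc()`, `minor_free()` and `DST_MAX_MINORS` are hypothetical and are not taken from the DST code; in the kernel one would more likely rely on an existing facility such as a bitmap searched with find_first_zero_bit().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define DST_MAX_MINORS 256	/* hypothetical pool size */
#define BITS_PER_WORD  (sizeof(unsigned long) * CHAR_BIT)

/* One bit per minor number; a set bit means "in use". */
static unsigned long minor_map[(DST_MAX_MINORS + BITS_PER_WORD - 1) / BITS_PER_WORD];

/* Return the lowest free minor number, or -1 if the pool is exhausted. */
int minor_alloc(void)
{
	for (size_t i = 0; i < DST_MAX_MINORS; ++i) {
		unsigned long mask = 1UL << (i % BITS_PER_WORD);

		if (!(minor_map[i / BITS_PER_WORD] & mask)) {
			minor_map[i / BITS_PER_WORD] |= mask;
			return (int)i;
		}
	}
	return -1;
}

/* Give a minor back so that the next allocation can reuse it. */
void minor_free(int minor)
{
	assert(minor >= 0 && minor < DST_MAX_MINORS);
	minor_map[(size_t)minor / BITS_PER_WORD] &=
		~(1UL << ((size_t)minor % BITS_PER_WORD));
}
```

With this scheme a node that is destroyed and created again gets its old number back as long as no lower number is free, instead of an ever-growing one. The sketch has no locking; real code would have to serialize access to the bitmap.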
I will continue the ipw2100 experiments tonight if I do not fall asleep again
because of jetlag. Stay tuned!
/devel/networking/ipw2100 :: Link / Comments ()
Tue, 23 Sep 2008
2008 Linux Kernel Summit photos.
 At kernel summit.
The last couple of photos were taken at the Linux Plumbers Conference (filesystem BoF).
Not all of them got into the gallery though; I need to try to find the missing bits.
I should have taken a real flash instead of using the built-in one, which sometimes mangled the images...
Take a look?
/devel/other :: Link / Comments ()
Mon, 22 Sep 2008
Do you know why mammoths are dead?
I'm sorry, I'm not going to waste time on this if you keep acting
this dishonest; welcome to my mail filter...
If we pretend not to know about a problem, the problem comes and knocks
us off our feet.
/devel/other :: Link / Comments ()
Sun, 21 Sep 2008
Walking in Portland.

Actually I think I like this city much more than when I first saw it.
It is small, but still alive; it has parks and a river with a long, excellent waterfront to
walk along (without access to the water though, since the river is navigable). There are
interesting buildings and lots of places to sit down, like restaurants,
cafes and pubs. I once heard live jazz from the street, but was advised to
visit Austin: the capital of live music in the USA.
So, here are a couple of photos
of Portland I took (without any artistic attempts). Several KS and Plumbers photos are pending.
/life :: Link / Comments ()
Mark IPW2100 driver as broken in linux kernel.
Just sent a patch to
zillions of mailing lists (netdev@, linux-kernel@, linux-wireless@) and
to lots of developers because of its "Fatal interrupt. Scheduling firmware restart."
problem.
Let's see if the Intel folks will do anything.
I also added a couple of jokes about conspiracy theories (like the bug firing because Intel
uses this error to force us to buy a new adapter) to make it a little bit more flammable
and to draw attention. I really hope Intel does not do it intentionally.
/devel/other :: Link / Comments ()
Wed, 17 Sep 2008
A small gift from Gumstix.
Overo board:
Texas Instruments X-Loader 1.4.2 (Sep 10 2008 - 08:47:04)
Reading boot sector
Loading u-boot.bin from mmc
U-Boot 1.3.4 (Sep 10 2008 - 08:47:30)
OMAP3503-GP rev 2, CPU-OPP2 L3-165MHz
Gumstix Overo board + LPDDR/NAND
DRAM: 128 MB
NAND: 256 MiB
*** Warning - bad CRC or NAND, using default environment
In: serial
Out: serial
Err: serial
Hit any key to stop autoboot: 0
reading uImage
2501840 bytes read
## Booting kernel from Legacy Image at 82000000 ...
Image Name: Angstrom/2.6.27-rc6+r27+giteddca
Image Type: ARM Linux Kernel Image (uncompressed)
Data Size: 2501776 Bytes = 2.4 MB
Load Address: 80008000
Entry Point: 80008000
Verifying Checksum ... OK
Loading Kernel Image ... OK
OK
Starting kernel ...
Uncompressing Linux............................................................................
..................................................
Linux version 2.6.27-rc6-omap1 (sakoman@tera) (gcc version 4.2.1) #1 Wed Sep 10 20:32:01 PDT 2008
CPU: ARMv7 Processor [411fc082] revision 2 (ARMv7), cr=00c5387f
Machine: Gumstix Overo
Memory policy: ECC disabled, Data cache writeback
OMAP3430 ES2.2
SRAM: Mapped pa 0x40200000 to va 0xd7000000 size: 0x100000
CPU0: L1 I VIPT cache. Caches unified at level 2, coherent at level 3
CPU0: Level 1 cache is separate instruction and data
CPU0: I cache: 16384 bytes, associativity 4, 64 byte lines, 64 sets,
supports RA
CPU0: D cache: 16384 bytes, associativity 4, 64 byte lines, 64 sets,
supports RA WB WT
CPU0: Level 2 cache is unified
CPU0: unified cache: 262144 bytes, associativity 8, 64 byte lines, 512 sets,
supports WA RA WB WT
Built 1 zonelists in Zone order, mobility grouping on. Total pages: 32512
Kernel command line: setenv bootargs console=ttyS2,115200n8
root=/dev/mmcblk0p2 rw rootfstype=ext3 rootdelay=1
Clocking rate (Crystal/DPLL/ARM core): 26.0/331/500 MHz
GPMC revision 5.0
IRQ: Found an INTC at 0xd8200000 (revision 4.0) with 96 interrupts
Total of 96 interrupts on 1 active controller
OMAP34xx GPIO hardware version 2.5
PID hash table entries: 512 (order: 9, 2048 bytes)
OMAP clockevent source: GPTIMER1 at 32768 Hz
Console: colour dummy device 80x30
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
Memory: 128MB = 128MB total
Memory: 124372KB available (4088K code, 368K data, 916K init)
SLUB: Genslabs=12, HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
Calibrating delay loop... 499.92 BogoMIPS (lpj=1949696)
Mount-cache hash table entries: 512
CPU: Testing write buffer coherency: ok
net_namespace: 488 bytes
NET: Registered protocol family 16
Found NAND on CS0
Registering NAND on CS0
OMAP DMA hardware revision 4.0
USB: No board-specific platform config found
i2c_omap i2c_omap.1: bus 1 rev3.12 at 2600 kHz
i2c_omap i2c_omap.3: bus 3 rev3.12 at 400 kHz
TWL4030: TRY attach Slave TWL4030-ID0 on Adapter OMAP I2C adapter [1]
TWL4030: TRY attach Slave TWL4030-ID1 on Adapter OMAP I2C adapter [1]
TWL4030: TRY attach Slave TWL4030-ID2 on Adapter OMAP I2C adapter [1]
TWL4030: TRY attach Slave TWL4030-ID3 on Adapter OMAP I2C adapter [1]
Initialized TWL4030 USB module
SCSI subsystem initialized
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
musb_hdrc: version 6.0, pio, host, debug=0
musb_hdrc: USB Host mode controller at d80ab000 using PIO, IRQ 92
musb_hdrc musb_hdrc: MUSB HDRC host driver
musb_hdrc musb_hdrc: new USB bus registered, assigned bus number 1
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 1 port detected
usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb1: Product: MUSB HDRC host driver
usb usb1: Manufacturer: Linux 2.6.27-rc6-omap1 musb-hcd
usb usb1: SerialNumber: musb_hdrc
Bluetooth: Core ver 2.13
NET: Registered protocol family 31
Bluetooth: HCI device and connection manager initialized
Bluetooth: HCI socket layer initialized
NET: Registered protocol family 2
IP route cache hash table entries: 1024 (order: 0, 4096 bytes)
TCP established hash table entries: 4096 (order: 3, 32768 bytes)
TCP bind hash table entries: 4096 (order: 2, 16384 bytes)
TCP: Hash tables configured (established 4096 bind 4096)
TCP reno registered
NET: Registered protocol family 1
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
JFFS2 version 2.2. (NAND) (SUMMARY) © 2001-2006 Red Hat, Inc.
msgmni has been set to 243
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
omapfb: configured for panel overo
omapfb: DISPC version 3.0 initialized
Console: switching to colour frame buffer device 128x48
omapfb: Framebuffer initialized. Total vram 1572864 planes 1
omapfb: Pixclock 54000 kHz hfreq 45.1 kHz vfreq 57.7 Hz
Serial: 8250/16550 driver4 ports, IRQ sharing enabled
serial8250.0: ttyS0 at MMIO 0x4806a000 (irq = 72) is a ST16654
serial8250.0: ttyS1 at MMIO 0x4806c000 (irq = 73) is a ST16654
serial8250.0: ttyS2 at MMIO 0x49020000 (irq = 74) is a ST16654
console [ttyS2] enabled
brd: module loaded
loop: module loaded
usbcore: registered new interface driver asix
usbcore: registered new interface driver cdc_ether
usbcore: registered new interface driver usb8xxx
libertas_sdio: Libertas SDIO driver
libertas_sdio: Copyright Pierre Ossman
i2c /dev entries driver
TWL4030 GPIO Demux: IRQ Range 384 to 402, Initialization Success
input: triton2-pwrbutton as /class/input/input0
triton2 power button driver initialized
Driver 'sd' needs updating - please use bus_type methods
omap2-nand driver initializing
NAND device: Manufacturer ID: 0x2c, Chip ID: 0xba (Micron NAND 256MiB 1,8V 16-bit)
cmdlinepart partition parsing not available
Creating 5 MTD partitions on "omap2-nand":
0x00000000-0x00080000 : "xloader"
0x00080000-0x00240000 : "uboot"
0x00240000-0x00280000 : "uboot environment"
0x00280000-0x00680000 : "linux"
0x00680000-0x10000000 : "rootfs"
ehci-omap ehci-omap.0: new USB bus registered, assigned bus number 2
ehci-omap ehci-omap.0: irq 77, io mem 0x48064800
ehci-omap ehci-omap.0: USB 0.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 3 ports detected
usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb2: Product: OMAP-EHCI Host Controller
usb usb2: Manufacturer: Linux 2.6.27-rc6-omap1 ehci_hcd
usb usb2: SerialNumber: ehci-omap.0
Initializing USB Mass Storage driver...
usbcore: registered new interface driver usb-storage
USB Mass Storage support registered.
mice: PS/2 mouse device common for all mice
twl4030_rtc twl4030_rtc: rtc core: registered twl4030_rtc as rtc0
twl4030_rtc twl4030_rtc: Power up reset detected.
twl4030_rtc twl4030_rtc: Enabling TWL4030-RTC.
OMAP Watchdog Timer Rev 0x31: initial timeout 60 sec
Bluetooth: HCI UART driver ver 2.2
Bluetooth: HCI H4 protocol initialized
Bluetooth: HCI BCSP protocol initialized
Bluetooth: Broadcom Blutonium firmware driver ver 1.2
usbcore: registered new interface driver bcm203x
Bluetooth: Digianswer Bluetooth USB driver ver 0.10
usbcore: registered new interface driver bpa10x
mmci-omap mmci-omap.2: No Slots
usbcore: registered new interface driver usbhid
usbhid: v2.6:USB HID core driver
Advanced Linux Sound Architecture Driver Version 1.0.17.
usbcore: registered new interface driver snd-usb-audio
ASoC version 0.13.2
overo SoC init
TWL4030 Audio Codec init
asoc: twl4030 <-> omap-mcbsp-dai mapping ok
ALSA device list:
#0: overo (twl4030)
oprofile: using timer interrupt.
TCP cubic registered
NET: Registered protocol family 17
NET: Registered protocol family 15
Bluetooth: L2CAP ver 2.11
Bluetooth: L2CAP socket layer initialized
Bluetooth: SCO (Voice Link) ver 0.6
Bluetooth: SCO socket layer initialized
Bluetooth: RFCOMM socket layer initialized
Bluetooth: RFCOMM TTY layer initialized
Bluetooth: RFCOMM ver 1.10
Bluetooth: BNEP (Ethernet Emulation) ver 1.3
Bluetooth: BNEP filters: protocol multicast
Bluetooth: HIDP (Human Interface Emulation) ver 1.2
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
ieee80211: 802.11 data/management/control stack, git-1.1.13
ieee80211: Copyright (C) 2004-2005 Intel Corporation
ThumbEE CPU extension supported.
Power Management for TI OMAP3.
SmartReflex driver initialized
VFP support v0.3: implementor 41 architecture 3 part 30 variant c rev 1
twl4030_rtc twl4030_rtc: setting system clock to 2000-01-01 00:00:00 UTC (946684800)
Waiting 1sec before mounting root device...
mmc0: host does not support reading read-only switch. assuming write-enable.
mmc0: new high speed SD card at address 0007
mmcblk0: mmc0:0007 SD02G 1992704KiB
mmcblk0: p1 p2
kjournald starting. Commit interval 5 seconds
EXT3 FS on mmcblk0p2, internal journal
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem).
Freeing init memory: 916K
INIT: version 2.86 booting
Starting udevudevd version 124 started
Remounting root file system...
mount: according to mtab, /proc is already mounted on /proc
NET: Registered protocol family 10
Setting up IP spoofing protection: rp_filter.
Configuring network interfaces... ifconfig: SIOCGIFFLAGS: No such device
eth0 No such device
udhcpc: SIOCGIFINDEX: No such device
Error for wireless request "Set Mode" (8B06) :
SET failed on device wlan0 ; No such device.
Error for wireless request "Set ESSID" (8B1A) :
SET failed on device wlan0 ; No such device.
ifconfig: SIOCGIFFLAGS: No such device
wlan0No such device
udhcpc: SIOCGIFINDEX: No such device
done.
Starting portmap daemon: portmap.
Sat Sep 13 22:36:00 UTC 2008
Turning echo off on /dev/ttyS1
INIT: Entering runlevel: 5
ALSA: Restoring mixer settings...
Starting Dropbear SSH server: dropbear.
Starting advanced power management daemon: No APM support in kernel
(failed.)
Starting system message bus: dbus.
Starting Hardware abstraction layer hald
Starting syslogd/klogd: start-stop-daemon: lseek: Invalid argument
* Starting Avahi mDNS/DNS-SD Daemon: avahi-daemon
[ ok ]
Starting Bluetooth subsystem:
Initialization timed out.
Running ntpdate to synchronize clockError : Temporary failure in name resolution
.
Starting GPE display manager: gpe-dm
.-------.
| | .-.
| | |-----.-----.-----.| | .----..-----.-----.
| | | __ | ---'| '--.| .-'| | |
| | | | | |--- || --'| | | ' | | | |
'---'---'--'--'--. |-----''----''--' '-----'-'-'-'
-' |
'---'
The Angstrom Distribution overo ttyS2
Angstrom 2008.1-test-20080911 overo ttyS2
overo login: root
Welcome to Gumstix Overo
For more information visit:
http://www.gumstix.net/Overo/115.html
login[1458]: root login on `ttyS2'
root@overo:~#

Neat toy! The computer itself is actually the size of a finger (not that thick though),
but it has no power supply or interface connectors, so it is essentially unusable
as a stand-alone board; with an extension motherboard (as in the picture) it becomes very
interesting, with several USB connectors, an HDMI display and audio connectors.
The WiFi/Bluetooth module is based on the wi2wi W2CBW003 Marvell 88W8686 chip. It is pretty
unlikely that Marvell will share documentation (in my experience, if you do not
order more than 1000 chips at once you will not be allowed to enter
its intranet and get access to the needed datasheets), so I will not be able to work
on the wireless driver, but I would gladly implement it otherwise.
/devel/other :: Link / Comments ()
Al Viro uses Apple's Mac.

And an interesting
mix of Russian and English from him.
/devel/other :: Link / Comments ()
Tue, 16 Sep 2008
Intel, BURN IN HELL!
[47477.938968] ipw2100: Fatal interrupt. Scheduling firmware restart.
[47478.808276] ipw2100: Fatal interrupt. Scheduling firmware restart.
[47480.611796] ipw2100: Fatal interrupt. Scheduling firmware restart.
[47483.415218] ipw2100: Fatal interrupt. Scheduling firmware restart.
[47487.154543] ipw2100: Fatal interrupt. Scheduling firmware restart.
There is no wired access at the kernel summit, but Pavel Emelyanov set up a NAT
for me, so I can write this exceptionally informative note.
/devel/other :: Link / Comments ()
Mon, 15 Sep 2008
God bless America! Because no one else will.
Likely I'm just kidding, but...
First, Portland greeted me with excellent weather. Just bloody perfect:
about 25-30 degrees Celsius, no rain or cold winds. Very nice.
Second, the hotel is quite good, with a couple of exceptions though: they do not have
a European-to-American power socket adapter. They do not even have a meter of wire,
so that I could make one myself. They do have an American-to-UK one though.
There are no shops open 24 hours except Starbucks. I found one small food shop though,
which would not
sell me beer until 7 AM; I was wished a good breakfast with 6 bottles of Bud.
In Moscow it was 6 PM. Actually I think I no longer suffer from
the jetlag, although I wake up at 4-5 AM local time.
Portland is quite a small city. I managed to walk around the central district
in several hours. And it is slow, in the sense that there is no
flow of life, no energy, no drive... It is likely a perfect place
to raise children or paint pictures (Portland has excellent nature) though.
I took several photos during the walk around the city (and even listened
to live music at the so-called Portland Saturday Market, where people sell
hand-made stuff; there are a couple of real gems there, I think)
as well as Linux kernel summit
ones, but because of slow internet access I will not publish them yet; expect a gallery
update in a week or so.
/life :: Link / Comments ()
Sat, 13 Sep 2008
New distributed storage release.
This is a maintenance-only release of the
DST, which brings the following changes:
- Fixed memory leak in crypto thread initialization error path. Noticed by Sven Wegener (sven.wegener_stealer.net).
- Unprotected tree access (exceptionally stupid bug, I was made blind by the electronic equipment), and tricky bug_on catch in scsi
code caused by incorrect bio flag initialization in the exporting node. 64bit alignment fix.
Bugs reported by Rémy Ritchen(
- A couple of bogus compilation warnings about uninitialized variables caught by a different compiler.
- Allow both read and write permission, not only read or write, in the security config.
Patch can be found in git tree
or archive.
The trickiest bug is SCSI's BUG_ON(), whose trace did not even contain any DST-related calls.
It was caught at drivers/scsi/scsi_lib.c:1175:
kernel BUG at drivers/scsi/scsi_lib.c:1175!
RIP: 0010:[] [] scsi_setup_fs_cmnd+0x64/0x70
...
[] ? sd_prep_fn+0xa8/0x9b0
[] ? __cfq_slice_expired+0x59/0xb0
[] ? cfq_dispatch_requests+0x8d/0x330
[] ? elv_next_request+0x119/0x250
[] ? scsi_request_fn+0x6b/0x3c0
[] ? generic_unplug_device+0x24/0x30
[] ? blk_unplug_work+0x41/0x80
Which is the following code:
int scsi_setup_fs_cmnd(struct scsi_device *sdev, struct request *req)
{
	struct scsi_cmnd *cmd;
	int ret = scsi_prep_state_check(sdev, req);

	if (ret != BLKPREP_OK)
		return ret;
	/*
	 * Filesystem requests must transfer data.
	 */
	BUG_ON(!req->nr_phys_segments);
This means that the request structure did not contain any segments to process. Originally
I thought it was caused by some tricky elevator steps selecting the wrong request queue,
because all debugging showed that a sync bio (a block IO request with the BIO_RW_SYNC bit set)
is handled differently from the same request without this flag. But experiments with various
flags showed that the bug occurs regardless, just in a completely unpredictable place.
Fortunately I managed to catch it in a debug trap in the block IO merging path, which showed me that
block IO requests with very strange read/write and flags fields were the cause of this error. Looking more
closely at the block queue allocation path, I found that its default initialization is not correct,
and my setup happens before it, so the queue did not contain the right parameters for the maximum request sizes
(hw and phys sectors). This also showed that one block IO request in the export node had the clone and other
local-only fields set, which is very wrong for a bio about to be submitted, and which actually resulted in the observed bug.
Those fields were set by the client bio and should not be transferred to the remote one, so I limited the flag fields
to only show that the bio is uptodate and has the blockable IO bit.
That's the story of how things were hacked today (it's the middle of the night actually, while I'm waiting
for the taxi to the airport), so the
POHMELFS locking algorithm
was not implemented today, and is likely postponed to the next weekend when I return. I got a
group theory book and made some printouts about number theory (after I finished reading Vinogradov's book),
so I will have something to read on all four planes (two in each direction) if I do not fall asleep,
and I will likely not have much time in Portland:
we will need to talk and listen to other people and check the local pubs (people suggested some cocktail places, but I prefer
beer).
See you in Portland.
/devel/dst :: Link / Comments ()
Thu, 11 Sep 2008
POHMELFS development process.
I completed the design (without implementation yet :) of the new
locking (or cache coherency mechanism, it does not really matter)
for the shared objects in
POHMELFS.
It is somewhat close to the MOSI (or even MOESI) cache coherency protocol
used in modern CPUs, although it also differs a bit because of the nature
of the POHMELFS server. It can provide byte-range locking for any object,
but so far I will only implement per-file locking (i.e. the whole file will
be 'locked' or 'owned' when a client performs a write, even if another client
could write to a different location in the same object), and if scalability
is not good enough, it can be extended (that is not very complex). Since
all in-kernel filesystems lock the whole inode when performing a write,
this should not be a big problem.
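To make the comparison with MOSI concrete, here is a rough sketch of textbook MOSI state transitions applied to a whole file, one state per client. This is not the POHMELFS design (which is said above to differ from MOSI); the names `file_obj`, `client_read()` and `client_write()` are hypothetical, and data transfer and writeback are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Textbook MOSI states, tracked per client for one whole file. */
enum cache_state { INVALID, SHARED, OWNED, MODIFIED };

#define MAX_CLIENTS 4	/* hypothetical fixed number of clients */

struct file_obj {
	enum cache_state state[MAX_CLIENTS];
};

/* A client reads the file: it ends up with a valid shared copy. */
void client_read(struct file_obj *f, size_t c)
{
	assert(c < MAX_CLIENTS);
	if (f->state[c] != INVALID)
		return;	/* this client already holds a valid copy */

	/* An exclusive writer keeps its dirty data but must now share it. */
	for (size_t i = 0; i < MAX_CLIENTS; ++i)
		if (f->state[i] == MODIFIED)
			f->state[i] = OWNED;
	f->state[c] = SHARED;
}

/* A client writes: it takes the whole file, all other copies become stale. */
void client_write(struct file_obj *f, size_t c)
{
	assert(c < MAX_CLIENTS);
	/* Real code would first fetch dirty data from a MODIFIED/OWNED peer. */
	for (size_t i = 0; i < MAX_CLIENTS; ++i)
		if (i != c)
			f->state[i] = INVALID;
	f->state[c] = MODIFIED;
}
```

Since the state is kept per file and not per byte range, two clients writing to different offsets still invalidate each other, which is exactly the scalability trade-off described above.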
This approach requires changing the POHMELFS server's directory cache, but I
never liked the existing one, since it looks a bit overengineered.
If things go smoothly, I will complete it tomorrow before flying to the kernel summit
(early Saturday morning), since the idea is really not very complex, and I expect
the implementation to be the same.
Meanwhile, DST
got a fix for an incredibly stupid bug I made. I do not even want to call it a
'bug'; it is more likely a blindness trickily created by the electrons moving things around in my
monitor. They forced me not to see an obvious place to lock access to, which resulted
in nasty oopses. The patch is already in the
git tree.
There is another one though: when some SCSI device is being exported and a client performs a
write, the request somehow has a zero req->nr_phys_segments field (which should
be initialized from the block IO request), which triggers a BUG_ON() in the SCSI
code. I'm working on it right now.
/devel/fs :: Link / Comments ()
Yesterday, all my troubles seemed so far away...
Playing this well-known and not too complex song today.

The trumpet has an excellent sound here imho, although I did not always produce
it. Frequently vibrato got mixed into the music, so it became not very interesting.
Nevertheless, when I succeed (note that I played it for the first time today,
for about 30-40 minutes, so with my trumpet 'experience' it was cool not very often,
maybe one third of the time :), I do like it. Also the 15-24 part was not clear to me,
since I do not remember this part of the original song, so I played it just by
intervals, which worked, but I think did not sound too good.
I also tried to improvise in the blues pentatonic scale.

Not with much success though. I can play the notes and move along the
scale with different steps, but it does not sound very good. Likely because
I do not know the intervals, and what I play does not fit any interesting timing.
Unfortunately the couple of books and articles about improvisation I have use terminology too specific
to music theory, so I frequently just do not understand what they mean.
It turned out to be quite interesting to play some random mix of sounds and select those
sets of 3-7 that sounded cool.
It looks like a blind man trying to walk around, but nevertheless I like it...
Stay tuned, maybe eventually I will record some random 'music' I produce out of my trumpet :)
/life :: Link / Comments ()
Tue, 09 Sep 2008
Userspace network stack git tree is now open.
One can check it via
web interface.
/devel/networking/unetstack :: Link / Comments ()
Trumpeting in C.
I found it quite simple and interesting to transpose from trumpet
pitch to piano pitch, i.e. by raising the sound by two tones. It is not
as complex as I thought before, at least if you already know the melody:
I can play from the score (very slowly and without half-tones though),
but cannot transpose in real time.
I also found that playing pedal tones mixed with the highest register (for me
that is the very end of the second octave) helps to reach the highest
notes, and also keeps the lips from tiring quickly. Actually, after an hour of exercise
today they were almost not tired because of it.
I found some interesting sounds and started to think about music theory.
I do not play very well, but I eventually will; it is a matter of practice.
But I do not know music theory at all, and I want to put my musical thoughts
into sound. Usually I fail, so I want to fill this gap.
I think I need to start with harmony, so I am searching for a good book about it.
/life :: Link / Comments ()
New distributed storage release: "There is no spoon, black and white".
This is a very minor
DST update,
which contains the following changes:
- sector_t compilation warnings removed.
- Debug, init, alloc, whatever cleanups noted by Sven Wegener (sven.wegener_stealer.net).
- Some checkpatch.pl masturbation
- New name: "Linux benevolent dictator said: there is no spoon, black and white"
Actually I fixed only a small part of the crap reported by checkpatch.pl;
in particular I did not fix long lines where the excess is actually a comment added after
some variable, or things like
for (i=0; i<n; ++i) and
struct some_name
{
...
}
where checkpatch.pl wants
for (i = 0; i < n; ++i) and
struct some_name {
...
}
But I tried to remove code lines longer than 80 characters, trailing spaces and
a couple of other warnings.
Now I will concentrate on POHMELFS
locking and then distributed facilities. Stay tuned, new version will be extremely cool in this regard!
/devel/dst :: Link / Comments ()
Mon, 08 Sep 2008
New distributed storage release.
It brings us following changes:
- Permission checks in export node. Read-only connections.
- Remove DST node from the global table not only when it is freed,
but also on demand with node del command.
I think the project is complete. I added an inclusion request
(with a grammar error of course, how else) to the announcement mail.
Check it out!
/devel/dst :: Link / Comments ()
GIT web interface for the projects.
I've put there distributed storage
with tools, pohmelfs
with its server and netchannels.
Enjoy!
/devel/other :: Link / Comments ()
Sun, 07 Sep 2008
New netchannels release.
Network channel
is a peer-to-peer, protocol-agnostic communication channel
between hardware and userspace. It uses a unified cache to store its
channels. All protocol processing happens in process context.
This release brings a reworked (and very simple) unified
storage for all kinds of protocols (a netchannel can be created for any kind
of protocol), completely lockless data processing
(data queueing into the netchannel and its lookup in the global
storage are protected by RCU), and a simplified interface.
Feature list:
- Very high bulk performance with small packets
(check userspace network stack
for more details).
- Completely lockless netchannel processing (packet queueing and netchannel lookup in the global storage are protected by RCU).
- Unified storage for all kinds of protocols: TCP/UDP, IP/IPv6, whatever you decide to implement on top of hardware layer you use.
- No protocol processing. This is pushed to the peer itself. For example to the
userspace network stack.
- Ability to inject packets into the network without root privileges.
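As a rough illustration of what a unified, protocol-agnostic storage can look like, here is a small userspace sketch: a hash table keyed by a protocol number plus opaque address bytes, so the table itself never interprets TCP, UDP or IP fields. All names (`nc_key`, `nc_add()`, `nc_lookup()`) and sizes are hypothetical and do not come from the netchannel code, and the RCU protection mentioned above is deliberately left out, so this version is not safe for concurrent use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NC_KEY_MAX 40	/* hypothetical: enough for an IPv6 address pair plus ports */
#define NC_BUCKETS 64	/* hypothetical fixed table size */

/* Opaque key: the storage never looks inside data[]. */
struct nc_key {
	uint8_t proto;
	uint8_t len;
	uint8_t data[NC_KEY_MAX];
};

struct netchannel {
	struct nc_key key;
	struct netchannel *next;	/* bucket chain */
};

static struct netchannel *nc_table[NC_BUCKETS];

/* FNV-1a over the protocol byte and the used part of the key. */
static uint32_t nc_hash(const struct nc_key *k)
{
	uint32_t h = 2166136261u;

	h = (h ^ k->proto) * 16777619u;
	for (size_t i = 0; i < k->len; ++i)
		h = (h ^ k->data[i]) * 16777619u;
	return h % NC_BUCKETS;
}

/* Link a netchannel into its bucket. */
void nc_add(struct netchannel *nc)
{
	uint32_t b;

	assert(nc->key.len <= NC_KEY_MAX);
	b = nc_hash(&nc->key);
	nc->next = nc_table[b];
	nc_table[b] = nc;
}

/* Find the netchannel with exactly this key, or NULL. */
struct netchannel *nc_lookup(const struct nc_key *k)
{
	struct netchannel *nc;

	assert(k->len <= NC_KEY_MAX);
	for (nc = nc_table[nc_hash(k)]; nc; nc = nc->next)
		if (nc->key.proto == k->proto && nc->key.len == k->len &&
		    memcmp(nc->key.data, k->data, k->len) == 0)
			return nc;
	return NULL;
}
```

Because the key is just bytes, adding a new protocol only means deciding how to pack its addresses into `data[]`; the table needs no changes.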
Userspace network stack
is the main user of the new netchannel subsystem.
The todo list includes:
- Ability to improve receive latencies (queue packets from the hardware interrupt handler and not the software interrupt).
- Automatically scale the netchannel hash table on demand.
/devel/networking :: Link / Comments ()
New userspace network stack release.
Unetstack
is an extremely small and fast TCP/UDP/IP stack implementation on top of the packet socket or
netchannels interface.
This release includes a sync with the new netchannels interface and
drops routing table support: the userspace network stack is designed on
behalf of netchannels, so effectively a single opened object operates
with a single source and destination peer, and there is no need to
introduce unneeded caches, since all needed information can be stored in
the userspace network stack object itself.
/devel/networking/unetstack :: Link / Comments ()