Zbr's days.

Thu, 31 Jan 2008

Nasty dentry abuse or...

... searching for rakes by stepping on them in a dark room. That is how I can describe the process of hunting for obscure bugs in filesystem code.

Preface 1.
The system locks up hard without a single message in dmesg, although all kernel hacking options are enabled in the config. The system responds to ping, but there is no way to log in or to do anything as a local user.

Preface 2.
As I recall, things used to work fine.

Bisecting is not my friend today: a fair number of fixes were added recently, so while I can find a revision where the new bug does not exist, the old bugs there can still kill the system. I therefore decided to manually check every patch I added to git over the last few days. Since I do not know the VFS well enough, there are several things I simply copied from other filesystems (most of them do it the same way), so I started dropping bits of that code from pohmelfs.
Eventually I found that in most filesystems a lookup that fails to find the requested dentry adds a NULL inode to the dentry, either via d_add() or via d_splice_alias(). Both look harmless, except that a dentry with a NULL inode now exists in the dentry cache. Maybe that is fine and there is some other bug in pohmelfs, but after I added this code I started to get those obscure freezes (easily reproducible with almost 100% probability in a certain test), and sometimes a general protection fault happened in VFS code during umount.
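To make the "dentry with a NULL inode" idea concrete, here is a minimal userspace sketch of a name cache that can remember failed lookups. It is a toy model and not kernel code: every name in it (toy_dentry, toy_lookup, toy_backend_lookup and so on) is hypothetical, and toy_cache_add() is only a rough analogue of d_add(). It shows the intended benefit of such entries (a repeated lookup of a missing name does not go to the backend again), not the freeze described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy userspace model, NOT kernel code: all names here are hypothetical. */
struct toy_inode { unsigned long ino; };

struct toy_dentry {
    char name[32];
    struct toy_inode *inode;   /* NULL marks a "negative" entry */
    int in_use;
};

#define TOY_CACHE_SIZE 16
static struct toy_dentry toy_cache[TOY_CACHE_SIZE];
static int toy_backend_calls;  /* how many times the backend was asked */

static struct toy_inode toy_existing = { 42 };

/* Stand-in for an expensive (e.g. network) lookup: only "present" exists. */
static struct toy_inode *toy_backend_lookup(const char *name)
{
    toy_backend_calls++;
    return strcmp(name, "present") == 0 ? &toy_existing : NULL;
}

static struct toy_dentry *toy_cache_find(const char *name)
{
    for (size_t i = 0; i < TOY_CACHE_SIZE; i++)
        if (toy_cache[i].in_use && strcmp(toy_cache[i].name, name) == 0)
            return &toy_cache[i];
    return NULL;
}

/* Rough analogue of d_add(dentry, inode): inode is allowed to be NULL. */
static struct toy_dentry *toy_cache_add(const char *name, struct toy_inode *inode)
{
    assert(strlen(name) < sizeof(toy_cache[0].name));
    for (size_t i = 0; i < TOY_CACHE_SIZE; i++) {
        if (!toy_cache[i].in_use) {
            strcpy(toy_cache[i].name, name);
            toy_cache[i].inode = inode;
            toy_cache[i].in_use = 1;
            return &toy_cache[i];
        }
    }
    return NULL;               /* cache full, entry is simply not cached */
}

/* Look a name up; cache_negative selects whether misses are remembered. */
static struct toy_inode *toy_lookup(const char *name, int cache_negative)
{
    struct toy_dentry *d = toy_cache_find(name);
    if (d)
        return d->inode;       /* may be NULL: cached "does not exist" */

    struct toy_inode *inode = toy_backend_lookup(name);
    if (inode || cache_negative)
        toy_cache_add(name, inode);
    return inode;
}
```

Calling toy_lookup("missing", 1) twice reaches the backend once, while calling it with cache_negative set to 0 reaches the backend every time. The price is that every user of the cache has to cope with entries whose inode pointer is NULL.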

So I just removed the code that adds a NULL inode to the dentry via d_add(), and things are good again. I do not know how frequently this can happen in a local filesystem, but a fact is a fact: after removing this code pohmelfs behaves excellently (modulo its speed).

Edited to add: no, something is still wrong in the system, although I'm not sure whom to blame:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.23-pohmelfs #4
-------------------------------------------------------
bash/4116 is trying to acquire lock:
 (&journal->j_list_lock){--..}, at: [] journal_try_to_free_buffers+0xd4/0x187 [jbd]

but task is already holding lock:
 (inode_lock){--..}, at: [] drop_pagecache+0x48/0xd8

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (inode_lock){--..}:
       [] __lock_acquire+0xa66/0xc48
       [] lock_acquire+0x7a/0x94
       [] _spin_lock+0x38/0x62
       [] __mark_inode_dirty+0xce/0x147
       [] __set_page_dirty+0xd0/0xdf
       [] mark_buffer_dirty+0x8b/0x92
       [] __journal_temp_unlink_buffer+0x174/0x17b [jbd]
       [] __journal_unfile_buffer+0xb/0x15 [jbd]
       [] __journal_refile_buffer+0x6a/0xe3 [jbd]
       [] journal_commit_transaction+0xf46/0x11eb [jbd]
       [] kjournald+0xb5/0x1c1 [jbd]
       [] kthread+0x3b/0x63
       [] kernel_thread_helper+0x7/0x10
       [] 0xffffffff

-> #0 (&journal->j_list_lock){--..}:
       [] __lock_acquire+0x952/0xc48
       [] lock_acquire+0x7a/0x94
       [] _spin_lock+0x38/0x62
       [] journal_try_to_free_buffers+0xd4/0x187 [jbd]
       [] ext3_releasepage+0x68/0x74 [ext3]
       [] try_to_release_page+0x33/0x44
       [] __invalidate_mapping_pages+0x74/0xe0
       [] drop_pagecache+0x70/0xd8
       [] drop_caches_sysctl_handler+0x36/0x4e
       [] proc_sys_write+0x6b/0x85
       [] vfs_write+0x82/0xb8
       [] sys_write+0x3d/0x61
       [] syscall_call+0x7/0xb
       [] 0xffffffff

other info that might help us debug this:

2 locks held by bash/4116:
 #0:  (&type->s_umount_key#11){----}, at: [] drop_pagecache+0x38/0xd8
 #1:  (inode_lock){--..}, at: [] drop_pagecache+0x48/0xd8

stack backtrace:
 [] show_trace_log_lvl+0x1a/0x2f
 [] show_trace+0x12/0x14
 [] dump_stack+0x16/0x18
 [] print_circular_bug_tail+0x5f/0x68
 [] __lock_acquire+0x952/0xc48
 [] lock_acquire+0x7a/0x94
 [] _spin_lock+0x38/0x62
 [] journal_try_to_free_buffers+0xd4/0x187 [jbd]
 [] ext3_releasepage+0x68/0x74 [ext3]
 [] try_to_release_page+0x33/0x44
 [] __invalidate_mapping_pages+0x74/0xe0
 [] drop_pagecache+0x70/0xd8
 [] drop_caches_sysctl_handler+0x36/0x4e
 [] proc_sys_write+0x6b/0x85
 [] vfs_write+0x82/0xb8
 [] sys_write+0x3d/0x61
 [] syscall_call+0x7/0xb
 =======================
Although it does not show any sign of pohmelfs, it can still be related...
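For readers who have not met this kind of report before, the check behind it can be sketched in a few lines of userspace C. This is a toy lock-order graph in the spirit of the report, not the kernel's lock validator; the names toy_dep, toy_reachable and toy_acquire_while_holding are hypothetical. My reading of the trace is that the kjournald path takes inode_lock while j_list_lock is held (chain #1), and the drop_pagecache path then wants j_list_lock while holding inode_lock (chain #0), which closes a cycle.

```c
#include <assert.h>
#include <string.h>

/* Toy lock-order checker, NOT the kernel's implementation. */
#define TOY_MAX_LOCKS 8

enum { TOY_INODE_LOCK = 0, TOY_J_LIST_LOCK = 1 };

/* toy_dep[a][b] != 0 means "b was taken while a was held" (a -> b). */
static int toy_dep[TOY_MAX_LOCKS][TOY_MAX_LOCKS];

/* Depth-first search: is `to` reachable from `from` via recorded edges? */
static int toy_reachable(int from, int to, int *seen)
{
    if (from == to)
        return 1;
    seen[from] = 1;
    for (int next = 0; next < TOY_MAX_LOCKS; next++)
        if (toy_dep[from][next] && !seen[next] &&
            toy_reachable(next, to, seen))
            return 1;
    return 0;
}

/*
 * Record that `taken` is acquired while `held` is held.
 * Returns 1 (and records nothing) if `held` already depends on `taken`,
 * i.e. the new edge would close a cycle; returns 0 otherwise.
 */
static int toy_acquire_while_holding(int held, int taken)
{
    int seen[TOY_MAX_LOCKS];

    assert(held >= 0 && held < TOY_MAX_LOCKS);
    assert(taken >= 0 && taken < TOY_MAX_LOCKS);

    memset(seen, 0, sizeof(seen));
    if (toy_reachable(taken, held, seen))
        return 1;              /* possible circular locking dependency */

    toy_dep[held][taken] = 1;
    return 0;
}
```

Recording toy_acquire_while_holding(TOY_J_LIST_LOCK, TOY_INODE_LOCK) first is accepted; the later toy_acquire_while_holding(TOY_INODE_LOCK, TOY_J_LIST_LOCK) is flagged. The point of such a check is that no deadlock has to actually happen: observing the two orders at different times is enough to report the problem.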

