Fri, 26 Sep 2008
New failed ipw2100 interrupt and its races.
[41773.200686] ipw2100: Fatal interrupt. Scheduling firmware restart.
[41773.200707] eth1: Fatal error value: 0x500185B8, address: 0x08004501, inta: 0x40000000
[41773.200810] ipw2100 0000:02:04.0: PCI INT A disabled
[41773.203110] ipw2100: IRQ INTA == 0xFFFFFFFF
[41773.224446] ipw2100: IRQ INTA == 0xFFFFFFFF
[41773.245781] ipw2100: IRQ INTA == 0xFFFFFFFF
[41773.249360] ipw2100 0000:02:04.0: enabling device (0000 -> 0002)
[41773.249384] ipw2100 0000:02:04.0: PCI INT A -> Link[C0C8] -> GSI 11 (level, low) -> IRQ 11
[41773.249426] ipw2100 0000:02:04.0: restoring config space at offset 0x1 (was 0x2900002, writing 0x2900006)

This happens while the reset handler has the ipw2100 PCI device disabled, so when the interrupt handler sees it, it bails out. That should generally be OK, but I found a different thing: there is a race between the interrupt handler (the handler itself and its processing tasklet) and the reset code. The reset code disables interrupts before it starts turning the adapter on, but the interrupt handler can be running at that very moment on a given CPU and can schedule the tasklet, so disabling interrupts does not prevent parallel reading and writing of the various registers.

The IRQ processing tasklet reads and writes registers under the lock with interrupts turned off, but the reset tasklet does not protect the initialization path against it, so I wonder what may happen in this case. Since registers are read and written at absolute addresses (I mean there is no need to write an address register first), this may not be a problem, but the race still exists and can theoretically harm the system. Similar unguarded accesses exist in the ipw2100_wx_event_work() handler, and the status field is also set without protection in various places in the driver, which can harm the driver's behaviour too.

So maybe I blamed the firmware a little early, although the things I found may be harmless. I will try to figure this out tomorrow.
Thu, 25 Sep 2008
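A note on the 26 Sep entry above: the remark about absolute addresses can be illustrated with a toy model. This is only a sketch and not ipw2100 code; the class names, register offsets and the hand-written interleaving are all made up for illustration. It compares a window-style access (write an address register, then read a data register) with absolute addressing when a second, unguarded context runs between the two steps.

```python
"""Toy model of two register-access styles shared by two contexts.

Hypothetical names and offsets throughout; nothing here is driver code.
"""


class IndirectRegs:
    """Window-style access: latch an address first, then read the data register."""

    def __init__(self, memory):
        self.memory = memory
        self.addr = 0  # address latch shared by every context

    def set_addr(self, addr):
        self.addr = addr

    def read_data(self):
        return self.memory[self.addr]


class DirectRegs:
    """Absolute access: every read names its own address, no shared latch."""

    def __init__(self, memory):
        self.memory = memory

    def read(self, addr):
        return self.memory[addr]


def interleaved_indirect_read(memory, irq_addr, reset_addr):
    """The IRQ path latches its address, the reset path runs, the IRQ path reads."""
    regs = IndirectRegs(memory)
    regs.set_addr(irq_addr)    # IRQ tasklet, step 1
    regs.set_addr(reset_addr)  # reset path runs in between, unguarded
    return regs.read_data()    # IRQ tasklet, step 2: reads the reset path's register


def interleaved_direct_read(memory, irq_addr, reset_addr):
    """The same interleaving with absolute addressing."""
    regs = DirectRegs(memory)
    regs.read(reset_addr)       # reset path access in between
    return regs.read(irq_addr)  # IRQ tasklet still reads its own register


MEMORY = {0x10: "status", 0x20: "control"}
```

In this model the indirect read returns the register the other context selected, while the direct read is unaffected. That only covers a single access: a read-modify-write of one register, or an unguarded update of a shared status field, can still be interleaved even with absolute addressing, which matches the caution in the entry.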
ipw2100 fatal interrupt: playing with power states.
eth1: Fatal error value: 0x50018584, address: 0x61C00000, inta: 0x40000000
eth1: Fatal error value: 0x50018584, address: 0x61C00000, inta: 0x40000000
eth1: Fatal error value: 0x5000CEE4, address: 0x61C00000, inta: 0x40000000
eth1: Fatal error value: 0x50018584, address: 0x61C00000, inta: 0x40000000
eth1: Fatal error value: 0x5000CEE4, address: 0x61C00000, inta: 0x40000000
eth1: Fatal error value: 0x50018584, address: 0x61C00000, inta: 0x40000000
eth1: Fatal error value: 0x50018584, address: 0x61C00000, inta: 0x40000000
eth1: Fatal error value: 0x5000CEE4, address: 0x61C00000, inta: 0x40000000
eth1: Fatal error value: 0x50018584, address: 0x61C00000, inta: 0x40000000
eth1: Fatal error value: 0x50018584, address: 0x61C00000, inta: 0x40000000

They did not follow one after another though. Different error values at the same address likely mean that there is no correlation between values and addresses, so this information is useless. I added power state changes to the reset function, so now a reset looks like this:

[ 897.661002] ipw2100: Fatal interrupt. Scheduling firmware restart.
[ 897.661021] eth1: Fatal error value: 0x30016C44, address: 0x601F7C00, inta: 0x40000000
[ 897.664712] ipw2100 0000:02:04.0: PCI INT A disabled
[ 897.712041] ipw2100 0000:02:04.0: enabling device (0000 -> 0002)
[ 897.713549] ipw2100 0000:02:04.0: PCI INT A -> Link[C0C8] -> GSI 11 (level, low) -> IRQ 11
[ 897.713595] ipw2100 0000:02:04.0: restoring config space at offset 0x1 (was 0x2900002, writing 0x2900006)
[ 954.646319] ipw2100: Fatal interrupt. Scheduling firmware restart.
[ 954.646338] eth1: Fatal error value: 0x5000CF10, address: 0x61A00000, inta: 0x40000000
[ 954.646429] ipw2100 0000:02:04.0: PCI INT A disabled
[ 954.692041] ipw2100 0000:02:04.0: enabling device (0000 -> 0002)
[ 954.692063] ipw2100 0000:02:04.0: PCI INT A -> Link[C0C8] -> GSI 11 (level, low) -> IRQ 11
[ 954.692103] ipw2100 0000:02:04.0: restoring config space at offset 0x1 (was 0x2900002, writing 0x2900006)
[ 968.585409] ipw2100: Fatal interrupt. Scheduling firmware restart.
[ 968.585429] eth1: Fatal error value: 0x5000C9D0, address: 0x57E00500, inta: 0x40000000
[ 968.585517] ipw2100 0000:02:04.0: PCI INT A disabled
[ 968.632037] ipw2100 0000:02:04.0: enabling device (0000 -> 0002)
[ 968.632059] ipw2100 0000:02:04.0: PCI INT A -> Link[C0C8] -> GSI 11 (level, low) -> IRQ 11
[ 968.632099] ipw2100 0000:02:04.0: restoring config space at offset 0x1 (was 0x2900002, writing 0x2900006)
[ 972.269514] ipw2100 0000:02:04.0: PCI INT A disabled
[ 972.316041] ipw2100 0000:02:04.0: enabling device (0000 -> 0002)
[ 972.316400] ipw2100 0000:02:04.0: PCI INT A -> Link[C0C8] -> GSI 11 (level, low) -> IRQ 11
[ 972.316446] ipw2100 0000:02:04.0: restoring config space at offset 0x1 (was 0x2900002, writing 0x2900006)

As we can see, the fatal interrupts did not disappear and are actually as frequent as before. I also got these lines:

[ 2032.560413] ipw2100: exit - failed to send CARD_DISABLE command
[ 2032.560449] ipw2100: exit - failed to send CARD_DISABLE command
[ 2032.560491] ipw2100: exit - failed to send CARD_DISABLE command
[ 2032.560593] ipw2100: exit - failed to send CARD_DISABLE command

They came one after another, which does not give me any clue though. I've started several big torrent downloads/seeds as a heavy load; maybe the card somehow differentiates between flows, so this test should be heavier than lots of pings. I first noticed the fatal interrupt problem under this kind of load, when the card not only stopped working but also printed some goodbye message.
So far the conclusion is not very optimistic: fatal interrupts always happen, no matter what magic is enabled in the reset, which already suggests that the firmware is broken. Hopefully additional reset games with power management will allow the card to work, even with those interrupts. Time will tell.
Wed, 24 Sep 2008
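A note on the 25 Sep entry above: the claim that error values and addresses are uncorrelated can be checked mechanically by grouping the reported values by address. The following is a minimal sketch, assuming the log lines keep the "Fatal error value: ..., address: ..., inta: ..." layout shown in these entries; the function name and the sample selection are mine, the sample lines themselves are copied from the logs above.

```python
import re
from collections import defaultdict

# Matches the driver's fatal error report as it appears in the logs above.
FATAL_RE = re.compile(
    r"Fatal error value: (0x[0-9A-Fa-f]+), "
    r"address: (0x[0-9A-Fa-f]+), "
    r"inta: (0x[0-9A-Fa-f]+)"
)


def values_by_address(log_text):
    """Return {address: set of error values seen at that address}."""
    grouped = defaultdict(set)
    for match in FATAL_RE.finditer(log_text):
        value, address, _inta = match.groups()
        grouped[int(address, 16)].add(int(value, 16))
    return dict(grouped)


SAMPLE = """\
eth1: Fatal error value: 0x50018584, address: 0x61C00000, inta: 0x40000000
eth1: Fatal error value: 0x5000CEE4, address: 0x61C00000, inta: 0x40000000
[ 897.661021] eth1: Fatal error value: 0x30016C44, address: 0x601F7C00, inta: 0x40000000
[ 954.646338] eth1: Fatal error value: 0x5000CF10, address: 0x61A00000, inta: 0x40000000
"""

if __name__ == "__main__":
    for address, values in sorted(values_by_address(SAMPLE).items()):
        listed = ", ".join(f"0x{v:08X}" for v in sorted(values))
        print(f"0x{address:08X}: {listed}")
```

An address that maps to more than one value, as 0x61C00000 does in the sample, is the pattern the entry points at. Feeding the full dmesg output through the same function would show whether it holds across a longer run.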
First ipw2100 testing: fatal interrupt.
[ 613.960164] ipw2100: exit - failed to send CARD_DISABLE command
[ 624.456033] eth1: no IPv6 routers present
[ 690.721534] ipw2100: Fatal interrupt. Scheduling firmware restart.
[ 690.721554] eth1: Fatal error value: 0x5000C97C, address: 0x100E201C, inta: 0x40000000
[ 690.721580] ------------[ cut here ]------------
[ 690.721587] WARNING: at drivers/net/wireless/ipw2100.c:3188 ipw2100_irq_tasklet+0x8fe/0x9b0 [ipw2100]()
[ 690.721736] Pid: 0, comm: swapper Not tainted 2.6.27-rc7-mainline #2
[ 690.721744] [

So, the fatal error value and address numbers do not tell me anything, but since the values are always different at different addresses, I think the firmware just loses its mind and stops responding. The first line, where ipw2100 fails to send a command, was obtained during ifdown of the interface. I had never seen it before, but I do not think it is related.

Now I need to move to the office, and I want to make some distributed storage changes: fix a name collision (the kernel already has a DVB card driver whose module is called dst.ko), and implement a better minor number allocation scheme for the imported devices. Right now, after a node has been created and destroyed, a new one does not get the same number but a continuously increasing one, which looks confusing and may cause a sysfs initialization error (when the system tries to register a kobject with an existing name).

I will continue the ipw2100 experiments tonight if I do not fall asleep again because of jetlag. Stay tuned!
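The minor number scheme mentioned above, where a destroyed node's number is handed to the next node instead of counting upwards forever, can be sketched as a lowest-free-number allocator. This is an illustration of the idea only, not the distributed storage code: the class and method names are hypothetical, and a kernel implementation would be written in C with its own locking.

```python
import heapq


class MinorAllocator:
    """Hand out the lowest free minor number and reuse released ones."""

    def __init__(self):
        self._next = 0   # first number that has never been handed out
        self._free = []  # min-heap of released numbers

    def alloc(self):
        """Return the smallest released number, or a fresh one if none is free."""
        if self._free:
            return heapq.heappop(self._free)
        minor = self._next
        self._next += 1
        return minor

    def release(self, minor):
        """Give a number back; reject numbers that are not currently in use."""
        if minor < 0 or minor >= self._next or minor in self._free:
            raise ValueError(f"minor {minor} is not allocated")
        heapq.heappush(self._free, minor)
```

With this scheme a node that is created, destroyed and created again gets its old number back. One assumption worth stating: release() should only be called once everything registered under the old number (for example its sysfs entry) is gone, otherwise reuse would itself produce a name clash.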