Re: mfi panic on recused on non-recusive mutex MFI I/O lock

Board: FB_stable, posted 2013/04/27 12:34 (message 7 of 9 in this thread)
On Fri, Nov 09, 2012 at 05:06:03PM -0000, Steven Hartland wrote:
|
| ----- Original Message -----
| From: "Steven Hartland"
| ...
| >I've just had another panic, trace below, but it doesn't seem to be related
| >to my changes so I'd appreciate your feedback on them as they are for now.
| >
| >While the lock patch fixes the problems I've seen, it's not clear to me
| >why mfi_tbolt_reset is acquiring the lock and hence requiring
| >mfi_process_fw_state_chg_isr to jump through hoops to ensure locking
| >around queue manipulation is done correctly. Given what it's doing
| >(resetting the entire adapter) I wouldn't be surprised if it should
| >really be acquiring the config lock.
| >
| >Other things I've noticed / questions
| >* Should mfi_abort sleep even if its call to mfi_mapcmd fails?
| >* Should mfi_get_controller_info really ignore the error from mfi_mapcmd?
| >* Do these controllers not support non-512-byte requests? Currently
| >all syspd requests are done assuming 512 byte sectors which the disk may
| >not be. This will either reduce performance or potentially break totally
| >if the firmware isn't translating it under the surface correctly.
| >
| >Anyway the new panic manually transcribed is:-
| >panic: Bad linx elm 0xffffff0069b0fc0 next->prev != elm
| >...
| >mfi_tbolt_get_cmd()
| >mfi_build_mpt_pass_thru()
| >mfi_tbolt_build_mpt_cmd()
| >mfi_tbolt_send_frame()
| >bus_dmamap_load()
| >mfi_mapcmd()
| >mfi_startio()
| >mfi_syspd_strategy()
| >g_disk_start()
| >g_io_schedule_down()
| >g_down_proc_body()
| >fork_exit()
| >fork_trampoline()
| >
| >Looks like mfi_cmd_tbolt_tqh has become corrupt somehow, but as far as I
| >can tell all manip is done using the TAILQ macros and under mfi_io_lock
| >so it's not obvious to me at this time why this is, any ideas?
|
| I've gone through looking for the possible cause of this and while there's
| nothing directly connected to the manip of this queue I've found and fixed
| quite a large number of additional problems which may have been indirectly
| causing this problem.
|
| The biggest change is to use mfi_max_cmds to limit the value stored in
| sc->mfi_max_fw_cmds as this is used extensively throughout the driver
| for allocation and range checks so having this inconsistently set opened up
| a large number of possible overrun errors.
|
| The new patch attached documents all the changes in detail.
|
| I've managed to do one test run so far which failed to reproduce any panics,
| so definitely moving in the right direction :)
|
| The machine has now been collected for repair by the supplier but I'm going
| to try and get them to put it online for more testing over the weekend.
|
| Given the failure rate so far if I can do another 4 runs with no panics I'd
| be happy that the majority of error conditions are working as expected.

Sounds like you have made some good progress. I looked at your prior
locking changes and they look good. Haven't had time to go through the
queue changes yet.

Thanks,

Doug A.
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
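
For readers following the thread: the mfi_max_cmds change Steven describes comes down to keeping one clamped limit and using that same value for both allocation and range checks. Below is a rough userland sketch of that idea only. It is not the actual patch or driver code; every name in it (sketch_softc, sketch_cmd, sketch_clamp_cmds and so on) is hypothetical and exists purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's command and softc structures. */
struct sketch_cmd {
	uint32_t index;		/* slot number inside the command array */
};

struct sketch_softc {
	uint32_t max_fw_cmds;		/* the one limit used everywhere */
	struct sketch_cmd *cmds;	/* array of max_fw_cmds entries */
};

/* Clamp the firmware-reported command count to the tunable limit. */
static uint32_t
sketch_clamp_cmds(uint32_t fw_reported, uint32_t tunable_max)
{
	return (fw_reported < tunable_max ? fw_reported : tunable_max);
}

/* Allocate the command array using the clamped value. */
static int
sketch_init(struct sketch_softc *sc, uint32_t fw_reported,
    uint32_t tunable_max)
{
	uint32_t i;

	sc->max_fw_cmds = sketch_clamp_cmds(fw_reported, tunable_max);
	sc->cmds = NULL;
	if (sc->max_fw_cmds == 0)
		return (-1);
	sc->cmds = calloc(sc->max_fw_cmds, sizeof(*sc->cmds));
	if (sc->cmds == NULL)
		return (-1);
	for (i = 0; i < sc->max_fw_cmds; i++)
		sc->cmds[i].index = i;
	return (0);
}

/*
 * Range check against the same clamped value used for allocation.
 * Checking against the unclamped firmware count instead would accept
 * indexes past the end of the array.
 */
static struct sketch_cmd *
sketch_lookup(struct sketch_softc *sc, uint32_t index)
{
	if (index >= sc->max_fw_cmds)
		return (NULL);
	return (&sc->cmds[index]);
}

static void
sketch_fini(struct sketch_softc *sc)
{
	free(sc->cmds);
	sc->cmds = NULL;
	sc->max_fw_cmds = 0;
}
```

With a firmware count of 1008 and a tunable of 128 (made-up numbers), sketch_init() allocates 128 slots and sketch_lookup() rejects index 128 and above. If the allocation used the clamped value while a check elsewhere used the raw 1008, indexes 128 through 1007 would pass the check and overrun the array, which is the class of error the message refers to.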
Article ID (AID): #1HUrLI0g (FB_stable)