USB/coredump hangs in 8 and 9

Board: FB_stable. Posted: 2011/08/13 04:32.
Re: panic: bufwrite: buffer is not busy??? (originally on freebsd-net)
Re: debugging frequent kernel panics on 8.2-RELEASE (originally on freebsd-stable)
Re: System hang in USB umass module while processing panic (originally on freebsd-usb)

Hello Andriy and Hans,

Sorry for tying in so many discussions on this topic, but I think I have an explanation for the problems we have been reporting* with hanging coredumps on multicore systems on 8.2-RELEASE, and it has implications for Andriy's proposed scheduler patch** and for USB.

In today's 8.X and 9.X branches, nothing that I can find stops the other CPUs when the kernel panics, but many parts of the locking code get disabled (grep on 'panicstr'). The 'bufwrite: buffer is not busy???' panic is caused by the syncer encountering an error. If that happens when it's on the dumping CPU, everything hangs. If it's running on a different CPU, it will be blocked and hidden by the panic_cpu spinlock in panic(), and the dump continues, polling every attached keyboard for a Ctrl-C.

But the new 8.X USB stack relies on multithreading. (The new stack is the variable that broke coredumps for us in the 7.1->8.2 transition, I think.) SVN 224223 fixes a hang that would happen when dumpsys() polls the USB keyboard (IPMI KVM, in our case). That helps, but it only gets as far as usb_process(), where it hangs in a loop around a cv_wait() call. This is easy to reproduce by adding code to the watchdog to break into the debugger if panicstr is set.

I am experimenting with Andriy's patch** to stop the scheduler and it seems to be most of the way there, stopping the CPUs and disabling the rest of locking. There are a few places that still reference panicstr, but that's minor. These are the changes I made to the patch:

* Changed ukbd_do_poll() to return immediately if SCHEDULER_STOPPED() is true, so that we don't hang up in USB. ukbd_yield() locks up in DROP_GIANT(), and if you skip ukbd_yield(), usbd_transfer_poll() locks up trying to drop mutexes.
* Changed the call to spinlock_enter() back to critical_enter(), so that interrupts stay enabled and the hardclock still functions.
* Added code in the beginning of panic() to switch to CPU 0, so that we're able to service the hardclock interrupts and so that watchdog panics get through.

This has worked 100% for me so far, although anyone using a USB keyboard or dump device would still be out of luck.

Thoughts? It seems like stopping all of the other CPUs is the right thing to do on a panic (what are they doing otherwise?). Are the USB issues fixable? If Andriy's patch gets committed it might just involve short-circuiting all of the locking in the polling path, but I haven't gotten that far yet. I bet dumping to NFS will have the same problem.

Thanks,
Andrew

* - http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/155421
** - http://people.freebsd.org/~avg/stop_scheduler_on_panic.8.x.diff

--------------------------------------------------
Andrew Boyer aboyer@averesystems.com

_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
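
Editor's illustration: the first change listed in the message (returning early from ukbd_do_poll() when SCHEDULER_STOPPED() is true) can be pictured with a small userland sketch. This is not the actual patch and not kernel code. The names fake_ukbd_do_poll(), fake_transfer_poll(), scheduler_stopped and blocking_calls are hypothetical stand-ins, and the SCHEDULER_STOPPED() macro here is a local mock of the one the message attributes to Andriy's patch. The only point it makes is that the guard has to come before anything that touches a lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the post-panic state. In the message, the real
 * check is the SCHEDULER_STOPPED() macro from Andriy's patch; this is a mock. */
bool scheduler_stopped = false;
#define SCHEDULER_STOPPED() (scheduler_stopped)

/* Counts how often the simulated lock-dependent path was entered. */
int blocking_calls = 0;

/* Simulated transfer poll. In the scenario described in the message, the real
 * path drops mutexes and hangs once the scheduler is stopped; here, reaching
 * it in that state simply trips an assertion. */
void fake_transfer_poll(void)
{
    assert(!SCHEDULER_STOPPED());
    blocking_calls++;
}

/* Sketch of the guard: return before any lock-dependent work is attempted. */
void fake_ukbd_do_poll(void)
{
    if (SCHEDULER_STOPPED())
        return;
    fake_transfer_poll();
}
```

Calling fake_ukbd_do_poll() before the flag is set reaches the simulated transfer path; calling it afterwards returns without touching it. As the message notes, this avoids the hang but leaves USB keyboards unusable during the dump.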
Article ID (AID): #1EHOt4Bi (FB_stable)