Re: rc and smf

Board: DFBSD_kernel, Time: 21 years ago (2005/02/25 04:01)
Hmm. Well, I have to say that in my opinion a service failure is a critical bug in the application. I usually go in and fix the application software rather than write monitoring programs for it (other than to tell me if it has failed).

Most service-oriented applications fork() on connect (a DNS cache being an exception), and those that have the option of forking or running threaded I usually tell to fork. This greatly narrows the amount of code that actually has to run in the parent's connection-accepting loop, and it works as well as or better than a service monitor. The proxies I wrote at BEST Internet a long time ago all did that, and those applications never failed. Not once. Ever. They handled millions of emails a day.

Insofar as the remaining applications go, I have seen occasional failures, and certainly failures can occur, but they aren't a 'random' occurrence. Some applications are prone to problems; some never die. I have lost older BIND daemons to corruption (not actual segfaults), but I don't think I've had a DNS failure in over two years now, and that is plenty long enough for me to prefer having the system yell and scream at me if it dies rather than restart it and forget. The only times a service has failed on crater.dragonflybsd.org have been when I screwed it up myself, accidentally, or when the hard drive physically crashed. That's it. I certainly don't spend my nights worrying that random services might not be working!

But anyhow, back to service failures... service failures do not always end in a crash. Take BIND, for example. It is far more likely that BIND's cache will become corrupted than that BIND will actually crash. A simple 'detect that it died and restart it' monitor doesn't help you there. What you need is a program that actually goes in and uses the service for real: for a web server, e.g., a program that connects to it every minute and retrieves the most complex CGI-generated page it serves. That's the sort of monitoring we need...
not this simple it-dies-and-we-restart stuff. Service corruption is the far more likely scenario these days.

And please, Dan, stop trying to compare generic UNIX systems to RTOSes and dedicated custom turnkey systems. Those systems run dedicated, heavily maintained software, whereas you are running run-of-the-mill third-party software (as are we). You can hardly expect the same level of reliability from a pot-luck dinner as you can from a carefully prepared meal.

-Matt
Matthew Dillon
<dillon@backplane.com>
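The fork-on-connect model described above can be sketched in Python (POSIX only, since it relies on os.fork()). This is a minimal illustration under stated assumptions, not code from the BEST Internet proxies: the names `serve_forking`, `upper_echo` and `reap_children`, the uppercase-echo "protocol", and the `max_conns` knob are all hypothetical. The point is how little runs in the long-lived parent: accept(), fork(), close(), and reaping exited children.

```python
import os
import socket
from typing import Callable, Optional


def upper_echo(conn: socket.socket) -> None:
    """Hypothetical per-connection handler: read one request, reply in uppercase."""
    data = conn.recv(4096)
    conn.sendall(data.upper())


def reap_children() -> None:
    """Collect exited children without blocking so they don't linger as zombies."""
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children at all
        if pid == 0:
            return  # children exist, but none have exited yet


def serve_forking(
    listener: socket.socket,
    handler: Callable[[socket.socket], None],
    max_conns: Optional[int] = None,  # hypothetical bound; None = serve forever
) -> None:
    """Accept connections and fork one child per connection (sketch)."""
    served = 0
    while max_conns is None or served < max_conns:
        conn, _addr = listener.accept()
        pid = os.fork()
        if pid == 0:
            # Child: it never needs the listening socket.
            listener.close()
            status = 1  # stays 1 if the handler raises
            try:
                handler(conn)
                status = 0
            finally:
                conn.close()
                # _exit() so the child never falls back into the accept loop.
                os._exit(status)
        # Parent: the child owns the connection now; drop our copy.
        conn.close()
        served += 1
        reap_children()
```

In use, something like `serve_forking(socket.create_server(("127.0.0.1", 8025)), upper_echo)` (port chosen arbitrarily) would serve forever. A crash in the handler takes down only the child serving that one connection, while the parent's small accept loop keeps running.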
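A minimal sketch of the kind of functional monitor described above, assuming an HTTP service: rather than checking whether a process exists, it fetches a real page and checks that the response contains content a healthy server would produce. The functions `probe` and `monitor`, the URL, and the expected marker string are hypothetical placeholders; in practice you would point it at the most complex CGI-generated page the server offers and run it every minute.

```python
from __future__ import annotations

import http.client
import sys
import time
import urllib.request


def probe(url: str, expected: bytes, timeout: float = 10.0) -> tuple[bool, str]:
    """Use the service the way a real client would (hypothetical helper).

    Returns (ok, reason). A server that is still running but serving bad
    data, e.g. from a corrupted cache, fails this check even though the
    process never crashed.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read()
    except (OSError, http.client.HTTPException) as exc:
        # URLError, HTTPError (non-2xx replies) and timeouts are OSError
        # subclasses; HTTPException covers malformed HTTP responses.
        return False, f"request failed: {exc}"
    if status != 200:
        return False, f"unexpected HTTP status {status}"
    if expected not in body:
        return False, "response is missing the expected content"
    return True, "ok"


def monitor(url: str, expected: bytes, interval: float = 60.0) -> None:
    """Probe forever and complain loudly on failure instead of restarting."""
    while True:
        ok, reason = probe(url, expected)
        if not ok:
            print(f"ALERT: {url}: {reason}", file=sys.stderr)
        time.sleep(interval)
```

A failed probe here only raises an alert; it deliberately does not restart anything, which matches the preference above for having the system yell rather than restart and forget.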
Article code (AID): #127ZA600 (DFBSD_kernel)