Re: rc and smf

Board: DFBSD_kernel | Posted: 2005/02/25 06:01 | Message 48 of 53 in thread
:Matthew Dillon wrote:
:> Hmm. Well, I have to say that in my opinion a service failure is a
:> critical bug in the application. I usually go in and fix the application
:
:Nobody argues this. Again, this is one of the reasons why people
:supervise in the first place. There's nothing stopping you to add
:an alert feature to a supervisor.
:
:> software rather than write monitoring programs for it (other than to
:> tell me if it has failed). Most service oriented applications fork()
:> on connect (a DNS cache being an exception), and those that have the
:
:Nothing stops the parent process of your forked children to be killed or
:crashed, obviously for some reasons already discussed.

Be killed ... by what? Crashing ... due to what?

The problem here is that you are just throwing out examples without paying any attention to the likelihood that the issue might actually occur under normal (or even exceptional) system operation. It's like you don't trust that a for(i = 0; i < 10; ++i) loop will actually count properly and you want to protect against it possibly not counting properly.

You are saying "what if" instead of "how often". Just because something might POTENTIALLY happen doesn't mean that it WILL happen, or that it will happen often enough to warrant protection, or that it will EVER happen in the particular environment you are trying to protect.

People get hit by lightning all the time, but that doesn't mean we wear a Faraday cage jacket every time we go outside! Hard drives fail all the time, but most consumer systems still ship with just one. And, frankly, it's far more likely that your RAID storage system will fail than many of the things you are pulling out as examples.

I don't bother putting a crash monitor on sendmail and apache because, well, sendmail hasn't actually crashed on me for at least 20 years, and apache hasn't crashed on me since I started using it. Slow down, yes. Get behind on the queues, yes. Have a CGI/backend database failure, absolutely. But the primary connection-accepting server actually crash? Hasn't happened.

If I want my apache server to be robust I write a monitoring program that runs on an entirely DIFFERENT machine, and doesn't just test whether the connection works, but actually goes in and issues a real query that exercises the most complex CGI/database path I can find, and screams bloody hell if that fails.

Dan, we could argue what-ifs all day long, because there are an infinite number of what-if scenarios. It's like pulling a rabbit out of your hat. The problem is that just throwing out these scenarios doesn't actually help anyone running a REAL production server. You are trying to solve problems that you don't have rather than trying to solve the problems that you do have. That's the real issue here.

Now, a lot of people on these lists, including me, have tried to explain this to you, but you don't seem to be getting it. You are still focusing on what-if scenarios that might occur once a decade or not at all instead of solving the REAL problem facing you, which in the case of that mail proxy service is simply configuring the program to limit the number of simultaneous connections it can handle. And if it doesn't have such a configuration option, then it's broken and you should either fix it or replace it with something better.

It's that simple. You don't need overcommit, and you don't necessarily need service monitoring. If the program is otherwise reliable you just need a simple configuration variable.

-Matt
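For readers who want something concrete, the off-box monitor described above (issue a real query through the heaviest CGI/database path and complain loudly if the answer is wrong) might look roughly like the sketch below. It is only an illustration under stated assumptions, not the program Matt actually ran: the function names evaluate, probe and run_check, and the idea of checking for a marker string in the response body, are all hypothetical.

```python
import http.client
import sys
import urllib.error
import urllib.request


def evaluate(status, body, expected):
    """Judge one response: it must be a 200 AND contain the expected marker."""
    if status != 200:
        return False, f"unexpected HTTP status {status}"
    if expected not in body:
        return False, "response did not contain the expected marker"
    return True, "ok"


def probe(url, expected, timeout=10.0):
    """Issue one real request and check the answer, not just the TCP connect."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return evaluate(resp.status, body, expected)
    except urllib.error.HTTPError as exc:
        # urlopen raises for 4xx/5xx responses instead of returning them.
        return False, f"unexpected HTTP status {exc.code}"
    except (urllib.error.URLError, http.client.HTTPException, OSError) as exc:
        # Refused connections, DNS failures, timeouts, truncated replies.
        return False, f"request failed: {exc}"


def run_check(url, expected):
    """Return a shell-style exit code: 0 if healthy, 1 after printing an alert."""
    ok, reason = probe(url, expected)
    if not ok:
        print(f"ALERT: {url}: {reason}", file=sys.stderr)
        return 1
    return 0
```

Run periodically from a different host than the web server, for example by calling sys.exit(run_check(url, marker)) from a cron job, with url pointing at the most complex query the site supports and marker set to a string that only appears when the database lookup actually succeeded. Both values are site-specific and are left to the reader.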
Article ID (AID): #127awb00 (DFBSD_kernel)