[cobalt-users] sendmail bogs down



We had a(nother) sendmail crash this weekend on our Qube3. I'm trying to figure out what's causing these so I can prevent another.

Sendmail apparently went down between 11:45 and midnight on Friday night. I first saw the problem on Sunday, when I looked at my work email (rare) and found that a number of cron jobs had emailed errors: fcheck was choking on the md5sum of something (which happens regularly, always with a different file), a weekly mysqldump backup of several sites on another server had failed, and my script that counts DNSBL rejections reported none(!) for the previous day... not likely.

Also, the load average on the server was up around 6, which is very unusual.

Telnet to port 25 would establish a connection, but there was no sendmail banner.
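The same check could be scripted so a cron job catches the "connects, but no banner" state without anyone running telnet by hand. The following is only a minimal sketch: the function name smtp_banner, the default host and the 10-second timeout are my own assumptions, not anything shipped with the Qube.

```python
import socket


def smtp_banner(host="127.0.0.1", port=25, timeout=10.0):
    """Return the first line the server sends after connecting.

    Returns None when the TCP connection is accepted but no greeting
    arrives within `timeout` seconds (the symptom described above), or
    when the server closes the connection without sending anything.
    A refused connection raises ConnectionRefusedError instead.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        try:
            data = sock.recv(512)
        except socket.timeout:
            # Connected, but the server never sent its greeting.
            return None
    if not data:
        # Server closed the connection without sending a banner.
        return None
    return data.decode("ascii", errors="replace").splitlines()[0]
```

A healthy sendmail should answer with a line starting with 220; a None result would correspond to what telnet showed here.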

ps showed two sendmail processes, but kill [pid] wouldn't shut them down! When I tried /etc/rc.d/init.d/sendmail stop, it would report

stopping mail service: sendmail ERROR!ok

and then trying sendmail start would just say

starting mail service:
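One thing worth looking at before rebooting is the state of the sendmail processes that ignored kill. A plain kill sends SIGTERM, and on Linux a process stuck in uninterruptible sleep (state D, usually waiting on disk or network I/O) will not act on any signal until that wait ends; such processes also count toward the load average. Whether that is what happened here is only a guess. The sketch below reads the state letter from /proc/<pid>/stat; the function names are hypothetical, and it assumes a Linux /proc filesystem.

```python
def parse_proc_state(stat_line):
    """Extract (command name, state letter) from /proc/<pid>/stat text."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so locate the first '(' and the LAST ')'.
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    comm = stat_line[open_paren + 1:close_paren]
    # The state letter is the first field after the closing parenthesis.
    state = stat_line[close_paren + 1:].split()[0]
    return comm, state


# Short explanations for the states that matter when kill has no effect.
STATE_NOTES = {
    "D": "uninterruptible sleep; signals are not acted on until the wait ends",
    "Z": "zombie; already dead, waiting for its parent to reap it",
    "T": "stopped",
}


def describe_process(pid):
    """Return a one-line summary of a process's state (Linux only)."""
    with open(f"/proc/{pid}/stat") as fh:
        comm, state = parse_proc_state(fh.read())
    note = STATE_NOTES.get(state, "other")
    return f"{pid} {comm} state={state} ({note})"
```

Calling describe_process() with each sendmail pid from ps would show whether the processes were blocked in state D rather than merely ignoring the signal.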

In hindsight, I should probably have checked "mailq" at this point to see if there was a problem there. Instead, I rebooted the box, which cleared up all problems. It is now running normally. I scoured the logs (maillog and messages) this morning and didn't find anything interesting - just a dramatic reduction (to zero) of outside connections around midnight, and a depressing amount of incoming spam before that.

I'd like to figure out what is causing this so I can avoid it in the future, or at least recover from it more gracefully than by rebooting the server. I searched the archives for "sendmail shutdown" and "sendmail not responding" but so far I haven't found anything which matches my situation. I have found some posts implying that sendmail will shut down at a certain load average; is it possible that a combination of a hung fcheck process and a big blob of spam (using both sendmail and spamd) could have produced a load average high enough (>15) to shut it down? Also, those messages suggested that Active Monitor would attempt to restart sendmail, and that apparently wasn't happening.
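On the load-average question: sendmail has two load-related options, QueueLA (above which it only queues mail instead of delivering it) and RefuseLA (above which it refuses incoming SMTP connections). The sketch below compares the current 1-minute load against those two thresholds. The values 8 and 12 are the commonly cited defaults and are an assumption here; the real values are whatever this box's sendmail.cf sets. The function names, and treating "at or above" as the trigger, are also my own choices for illustration.

```python
import os

# ASSUMED thresholds: commonly cited sendmail defaults. Check the QueueLA
# and RefuseLA options in the server's own sendmail.cf for the real values.
QUEUE_LA = 8
REFUSE_LA = 12


def classify_load(load1, queue_la=QUEUE_LA, refuse_la=REFUSE_LA):
    """Classify a 1-minute load average against the two thresholds."""
    if load1 >= refuse_la:
        return "refuse"      # sendmail would refuse incoming connections
    if load1 >= queue_la:
        return "queue-only"  # sendmail would queue mail, not deliver it
    return "normal"


def current_status():
    """Return (1-minute load, classification) for this machine."""
    load1, _load5, _load15 = os.getloadavg()
    return load1, classify_load(load1)
```

With these assumed thresholds, the load of about 6 seen on Sunday would classify as "normal", so the load at the time of the failure would have had to be considerably higher for this mechanism to be the cause.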

Any ideas?

Thanks,

pjm