[cobalt-users] Runaway mail processes on a Qube3
- Subject: [cobalt-users] Runaway mail processes on a Qube3
- From: Parker Morse <morse@xxxxxxxxxxx>
- Date: Mon Nov 18 06:56:01 2002
- List-id: Mailing list for users to share thoughts on Sun Cobalt products. <cobalt-users.list.cobalt.com>
Our Qube3 has been exhibiting strange behavior for a few days. I first
noticed it on Friday 11/15, but it may have started Thursday 11/14. First
we had users getting errors when they tried to POP their mail. The errors
were transient and I was unable to find anything unusual in
/var/log/maillog.
(Aside: I DID find the "error: safesasl(/etc/sasldb) failed: Group
readable file" error which is mentioned in the archives:
<http://list.cobalt.com/pipermail/cobalt-users/2002-June/072499.html>,
among others. I ran the fix for that, but it may not be related.)
What I HAVE started seeing is curious "top" output: sendmail and/or procmail
processes taking large chunks of memory and/or CPU time and running for a
very long time. (Right now I'm seeing a procmail process showing 9:34 for
time and a sendmail at 3:41; they'll be higher before I finish this
message.) Load averages get up into the mid-2s.
They seem to die on their own eventually; I haven't seen one run over 10
minutes.
When there is a runaway process and "top" isn't already running, trying to
start it throws an error:
[admin 09:09:58]~$ top
Segmentation fault (core dumped)
[admin 09:13:38]~$ top
Segmentation fault (core dumped)
[admin 09:13:40]~$ top
Segmentation fault (core dumped)
[admin 09:13:42]~$ ps
BUG IN DYNAMIC LINKER ld.so: dl-minimal.c: 69: malloc: Assertion `page != ((void *) -1)' failed!
[root 09:24:57]/home/users/admin$ top
Segmentation fault
[root 09:27:47]/home/users/admin$ top
Segmentation fault
I think this is probably because the runaway processes are hogging system
resources.
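One way to check that guess without "top" or "ps": the malloc assertion in the ld.so message looks like a failed memory allocation, so reading /proc/meminfo directly may show whether RAM and swap are nearly exhausted. The following is only a rough sketch, assuming a Python interpreter is available and can still start under load. The function names are made up for illustration, and "free" here ignores buffers and cache, so it is a crude measure.

```python
import re

# Matches lines such as "MemFree:      6292 kB"; the older "total: used: free:"
# summary lines printed by 2.4-era kernels do not match and are skipped.
_LINE = re.compile(r"^(\w+):\s+(\d+)\s+kB\s*$")


def parse_meminfo(text):
    """Return a dict of field name -> value in kB from /proc/meminfo text."""
    values = {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def free_fraction(values):
    """Fraction of RAM plus swap reported as free (0.0 to 1.0)."""
    total = values["MemTotal"] + values.get("SwapTotal", 0)
    free = values["MemFree"] + values.get("SwapFree", 0)
    return free / total


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as handle:
        return parse_meminfo(handle.read())
```

Calling free_fraction(read_meminfo()) while a runaway process is active and getting a value close to zero would support the resource-exhaustion theory; a comfortable value would point elsewhere.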
I'm using two DNSBLs in sendmail, and procmail calls SpamAssassin's spamd.
I've seen spamd processes pass through "top" pretty quickly, so I doubt
that's the problem. I did see a dramatic drop in DNSBL rejections over the
weekend - perhaps the lookups are timing out?
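The timeout theory could be tested by timing the same kind of lookup sendmail performs: a DNSBL query is an ordinary A-record lookup for the client's IP with its octets reversed, prepended to the list's zone. Below is a minimal sketch, assuming Python is available on the box. The zone name "dnsbl.example.org" is a placeholder, not one of the lists actually in use here, and the function names are hypothetical.

```python
import socket
import time


def dnsbl_query_name(ip, zone):
    """Build the DNSBL lookup name: reversed IPv4 octets followed by the zone."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError("not a dotted-quad IPv4 address: %r" % ip)
    return ".".join(reversed(octets)) + "." + zone


def time_lookup(name, resolver=socket.gethostbyname):
    """Return (elapsed_seconds, result); result is None if the lookup failed."""
    start = time.monotonic()
    try:
        result = resolver(name)
    except OSError:  # socket.gaierror and socket.herror are subclasses
        result = None
    return time.monotonic() - start, result


def report(zones, ip="127.0.0.2"):
    """Print how long each zone takes to answer for the given IP."""
    for zone in zones:
        elapsed, result = time_lookup(dnsbl_query_name(ip, zone))
        print("%-30s %6.2fs  %s" % (zone, elapsed, result or "no answer"))
```

Running report() with the two real zone names in place of the placeholder would show whether answers come back quickly or hang for many seconds. 127.0.0.2 is the conventional test address that most DNSBLs list, so a healthy list should normally return an answer for it; "no answer" after a long delay would be consistent with lookups timing out.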
Also worth noting: I followed Gerald's helpful instructions to upgrade
bind to 8.3.3-REL (the patched one) on Thursday. Could the new bind be
causing this trouble? This machine isn't authoritative for any domain, but
it is one of the DNS servers for our local subnet. I noticed in the archives
that another user with high load averages and sendmail problems had minor
DNS issues as well.
Has anyone seen this on their system? Is there anything else I should be
considering?
Thanks,
pjm