
Re: [cobalt-users] CPU heavily loaded, low on memory, smtp server not responding



> Date: Tue, 22 Jan 2002 16:05:15 -0500
> From: Cobalt List <cobalt@xxxxxxxxxxxxxxxxxxxxxxx>

> I woke this morning to the following scary messages from my cobalt:
> 
> #1. (at 5:07 this morning) "Over the past fifteen minutes, the CPU has been 
> heavily loaded", etc..
> "1 minute load average: 99.50
> 5 minute load average: 97.58
> 15 minute load average: 71.17"

First, to clear up a common misconception:  Load average is *not*
CPU usage.  It is the average number of processes that were
running or waiting to run over the specified interval, and on
Linux that count includes processes blocked on disk IO.  It may
have *nothing* to do with CPU.  (Start ten instances of the disk
benchmark "bonnie" and you'll see what I mean.)

A high load average may be caused by insufficient CPU, blocking
on IO, etc.  Some buggy kernels will run out of <some critical
descriptor>, and block until things free up.
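
To see which kind of waiting is behind a high number, one rough
approach on Linux is to count processes by state: "R" is runnable
(wants CPU), "D" is uninterruptible sleep (usually blocked on
disk IO).  The helper name count_waiting is made up for
illustration; treat this as a sketch, not a diagnostic tool.

```shell
#!/bin/sh
# count_waiting: read process state codes (one per line, as printed
# by "ps -e -o stat=") on stdin and report how many are runnable (R)
# versus in uninterruptible sleep (D, typically waiting on disk).
count_waiting() {
    awk '
        $1 ~ /^R/ { r++ }
        $1 ~ /^D/ { d++ }
        END       { printf "runnable=%d blocked=%d\n", r, d }
    '
}
```

Feed it live data with "ps -e -o stat= | count_waiting".  Many
blocked processes and few runnable ones point at disk rather
than CPU.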

> #2. (at 5:08 this morning) "The SMTP (mail) server appears to
> be down", etc....

When your loadavg becomes too high, Sendmail refuses new
connections.  It's designed to do this.  The exact values are
tunable (QueueLA and RefuseLA in sendmail.cf), but the defaults
(IIRC) are to queue at 8 and refuse new connections at 12.
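
As an illustration of those two thresholds, here is a small
sketch that classifies a load average the way the queue/refuse
values above would.  The function name sendmail_action is
hypothetical, the defaults of 8 and 12 are the from-memory values
quoted above, and treating the thresholds as inclusive is an
assumption; this is not Sendmail's own code.

```shell
#!/bin/sh
# sendmail_action LOAD [QUEUE_LA] [REFUSE_LA]
# Hypothetical helper: prints what a Sendmail using these thresholds
# would be expected to do at the given load average.
# Defaults (8 and 12) are the values recalled in the text above.
sendmail_action() {
    awk -v la="$1" -v q="${2:-8}" -v r="${3:-12}" 'BEGIN {
        if (la + 0 >= r + 0)      print "refuse"
        else if (la + 0 >= q + 0) print "queue"
        else                      print "deliver"
    }'
}
```

With the 1 minute figure from the alert, "sendmail_action 99.50"
prints "refuse", which is consistent with the "SMTP server
appears to be down" message arriving a minute later.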

> #3. (at 5:30 this morning) "Memory on the Cobalt server is
> heavily used", etc.
> Total memory is: 389084 KB
> Used memory is: 371784 KB
> Free memory is: 17300 KB
> Percent used is: 95"
> 
> The box usually runs with a 15 minute load average of less than 0.50 even 
> during peak times.
> So this was a major spike!
> 
> I'm not sure what was going on at that time. The only somewhat
> processor intensive thing running at that time is webalizer 2.

How about disk intensive?  How much swap are you using?  If
you're bogged down waiting for disk, your load average can easily
shoot through the roof.
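
One way to answer the swap question without reading the numbers
by eye is to subtract SwapFree from SwapTotal in /proc/meminfo.
The helper name swap_used_kb is invented for this sketch, and it
assumes a kernel that reports those two fields.

```shell
#!/bin/sh
# swap_used_kb: read /proc/meminfo-style text on stdin and print the
# amount of swap in use, in KB (SwapTotal minus SwapFree).
swap_used_kb() {
    awk '
        $1 == "SwapTotal:" { total = $2 }
        $1 == "SwapFree:"  { free = $2 }
        END { print total - free }
    '
}
```

Run it as "swap_used_kb < /proc/meminfo".  A large figure alone
is not proof of trouble; watching the si/so columns of "vmstat 5"
shows whether the box is actively swapping right now.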

What does the "procinfo" command show?

Do you perhaps have a runaway CGI script?

> I checked my logcheck messages and the only thing I found
> strange was tons of "lame server" and "bad referral" messages.

Someone has you listed as being authoritative for DNS zones for
which you are not.

> I usually get a few of these every hour but this was hundreds
> of them right around the time I was getting memory, cpu and

That's odd.  Maybe coincidence, maybe not.  I've never tried
running BIND with such a high loadavg; I recommend that people
keep DNS on a dedicated machine that does DNS, DNS, and more DNS.

> smtp messages.

Start "top" as root.  Press "M" -- make sure it's uppercase --
to sort by memory use.  What's the biggest RSS there, and to what
process does it belong?  A runaway process could cause this...
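
If you'd rather get the same answer non-interactively (say, to
mail it to yourself), something along these lines should do.
The helper name top_rss is made up here; it only sorts whatever
"RSS PID COMMAND" lines it is given.

```shell
#!/bin/sh
# top_rss [N]: read "RSS PID COMMAND" lines on stdin and print the N
# entries with the largest resident set size (default 5), biggest
# first.
top_rss() {
    sort -rn | head -n "${1:-5}"
}
```

Feed it from ps, which reports RSS in KB:
"ps -e -o rss= -o pid= -o comm= | top_rss".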

Also, try

	ps ax | grep '[h]ttpd' | wc -l

That will tell you, for instance, how many copies of Apache are
running.  (The brackets keep grep from counting itself.)  If you
have MaxClients set too high, you could be thrashing.  (Hence my
inquiry about swap usage.)

Alas, you must catch either of these two in the act.
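
Since nobody is watching at 5 in the morning, one option is to
let the box take its own snapshot when the load climbs.  Below
is a minimal sketch of the trigger only; the function name
load_exceeds and the log path in the note that follows are both
hypothetical.

```shell
#!/bin/sh
# load_exceeds LIMIT [FILE]: succeed (exit 0) if the 1-minute load
# average, i.e. the first field of FILE (default /proc/loadavg), is
# above LIMIT.  Fails if it is not, or if FILE is empty.
load_exceeds() {
    awk -v limit="$1" '
        NR == 1 { over = ($1 + 0 > limit + 0) }
        END     { exit !over }
    ' "${2:-/proc/loadavg}"
}
```

In a script run every minute from cron, a line such as
"load_exceeds 5 && { date; ps ax; } >> /var/tmp/load-snapshot.log"
would record the process list while the spike is still going on.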


Eddy

---------------------------------------------------------------------------
Brotsman & Dreger, Inc. - EverQuick Internet Division
Phone: +1 (316) 794-8922 Wichita/(Inter)national
Phone: +1 (785) 865-5885 Lawrence
---------------------------------------------------------------------------
