
[cobalt-developers] load average above 100



I had a problem.  On my RaqXTR, I noticed the load average at 60 and
rising.  YIKES, I've never heard of a load average that high.  I tried a
'shutdown -r now' and it said it was going down for reboot now but it
didn't.  Then I tried a 'ps -aux' and didn't see anything unusual.  I tried
a 'top' and didn't see anything unusual.  I did not understand what was
happening.  I tried killing processes.  Still, the load average kept
climbing, by about 2 every minute.  The problem got worse.
When the load average reached 130 (really!), I tried
'/etc/rc.d/init.d/halt reboot', and then all the services stopped working.
I could still ping the machine, though.  After 10 minutes I could still
ping it, but it had never rebooted.  The LCD panel said it was rebooting,
but the reboot never completed.  So I shut the machine down by holding the
power button for 5 seconds, then turned it back on.  When it came back up,
telnet, FTP and e-mail did not work.  The web admin interface did, thank
goodness.  When I looked at Active Monitor, it said that RAID was doing
this:  "Server data is being duplicated to the backup hard drive."  When I
went to the control panel area of web admin and tried to turn on Telnet
(which was marked as off), it gave this error:  "Cannot read
/etc/inetd.conf, /etc/inetd.conf is locked".  The RAID rebuild finished
successfully, but the inetd.conf file never unlocked.  I rebooted the
machine from the web admin panel.  That worked fine, except the inetd.conf
file was still locked (or so it said).

I had to get to the machine physically and attach a laptop to it to get a
command prompt.  When I looked in the /etc directory, there was no
inetd.conf file at all!  There was an inetd.conf.master, which I copied
over, and then telnet worked again.  However, there were several corrupted
mail files and several corrupted MySQL database tables, and the DNS files
were a mess.  I had to rebuild these areas.  So messy.  What happened?  How
can I prevent it from happening again?

Dave.