Re: [cobalt-users] Qube2 Crashes without obvious cause
- From: Mike Vanecek <nospam99@xxxxxxxxxxxx>
- Date: Fri Dec 8 10:19:01 2000
- Organization: anonymous
- List-id: Mailing list for users to share thoughts on Cobalt products. <cobalt-users.list.cobalt.com>
On Fri, 08 Dec 2000 17:01:21 +1100, Malcolm McLeary <mmcleary@xxxxxxx> wrote:
:>It would be nice if the box had a hardware watchdog to do a hardware reset
:>if/when it crashed.
I guess one could be programmed into the monitor?
:>> Anything in the
:>> /var/cobalt/ monitor logs (since they get updated every 15 minutes - unless
:>> you have changed crontab like I have to do a monitor every 6 hours)? You can
:>> check memory and disk usage there.
:>
:>Hadn't looked there. It might be interesting to correlate the "stop" time in
:>/var/log/messages with these logs. It might also be interesting to disable
:>monitoring altogether to see if it's a "monitor" which is causing the
:>problem, but wouldn't that cause it to crash every 15 minutes?
I doubt it is the monitor, but those logs may give you a hint as to what was
failing at the time of the lockup. Uhm, anything being logrotated around that
time?
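One way to pin down the "stop" time is to look for the longest silence in
/var/log/messages and then read the monitor logs around that point. Below is a
minimal Python sketch of that idea. It assumes classic syslog lines that start
with a "Dec  8 10:19:01" style timestamp; the function name, the sample lines
and the year argument are made up for illustration (syslog does not record the
year, and this sketch ignores year rollover).

```python
from datetime import datetime, timedelta

def largest_gap(lines, year=2000):
    """Return (gap, last_before, first_after) for the longest silence, or None."""
    stamps = []
    for line in lines:
        # The first 15 characters hold the timestamp, e.g. "Dec  8 10:19:01".
        try:
            stamps.append(datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S"))
        except ValueError:
            continue  # skip lines without a leading timestamp
    best = None
    for before, after in zip(stamps, stamps[1:]):
        gap = after - before
        if best is None or gap > best[0]:
            best = (gap, before, after)
    return best

# Invented sample entries, not taken from a real Qube2 log.
sample = [
    "Dec  8 03:00:01 qube crond[411]: example entry",
    "Dec  8 03:15:02 qube crond[420]: example entry",
    "Dec  8 09:40:10 qube syslogd: example restart entry",
]
gap, before, after = largest_gap(sample)
print("silent from", before, "to", after, "for", gap)
```

On a real box you would pass in the lines of /var/log/messages instead of the
sample list. The entry just before the gap is the last thing logged before the
lockup.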
:>
:>> What changes have been made to the system
:>> or the environment that might suggest a cause?
:>
:>Up until now none ... it's as delivered. The user had been complaining about
:>"File System Full" email messages, so I moved /usr/doc to /home/doc to
:>free up a little space on /.
You did put in a ln -s for /usr/doc, right?
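For reference, the move-then-link step looks roughly like this in Python. It
is only a sketch run against a scratch directory, not against /usr/doc itself;
the function name and the directory names are hypothetical, and it assumes a
Unix-like system where symlinks are available.

```python
import os
import shutil
import tempfile

def relocate_with_symlink(src, dst):
    """Move directory src to dst (dst must not exist yet), then link src -> dst."""
    shutil.move(src, dst)
    os.symlink(dst, src)

# Demonstration in a temporary directory rather than on a live system.
root = tempfile.mkdtemp()
old = os.path.join(root, "usr_doc")    # stands in for /usr/doc
new = os.path.join(root, "home_doc")   # stands in for /home/doc
os.makedirs(old)
with open(os.path.join(old, "README"), "w") as fh:
    fh.write("example\n")

relocate_with_symlink(old, new)
print(os.path.islink(old), os.path.exists(os.path.join(old, "README")))
```

Anything that still looks under the old path keeps working through the link,
while the data itself lives on the larger partition.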
Those messages should not be sent unless the file space hits 80% or 90%.
:>[root /root]# df -m
:>Filesystem  MB-blocks  Used  Available  Capacity  Mounted on
:>/dev/hda1         290   211         79       73%  /
:>/dev/hda3         193    25        168       13%  /var
:>/dev/hda4       18172   577      17595        3%  /home
:>[root /root]#
:>
:>It was running at 78%.
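To check figures like these against a warning level, something along the lines
of the Python sketch below would do. The df text is copied from the output
quoted above; the function name and the default of 80% are assumptions taken
from the thresholds mentioned in this thread, not from the Cobalt monitor's
actual configuration.

```python
DF_OUTPUT = """\
Filesystem MB-blocks Used Available Capacity Mounted on
/dev/hda1 290 211 79 73% /
/dev/hda3 193 25 168 13% /var
/dev/hda4 18172 577 17595 3% /home
"""

def over_threshold(df_text, warn_pct=80):
    """Return {mount_point: percent} for filesystems at or above warn_pct."""
    flagged = {}
    for line in df_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # not a complete data row
        pct = int(fields[4].rstrip("%"))
        if pct >= warn_pct:
            flagged[fields[5]] = pct
    return flagged

print(over_threshold(DF_OUTPUT))       # nothing reaches 80% after the move
print(over_threshold(DF_OUTPUT, 70))   # only / is at or above 70%
```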
Qube2s typically arrive with 78% on the root partition. A couple of other
thoughts. The /var/cobalt logs did not all rotate correctly (I posted a
message about this several months ago). I believe one of the patches fixed
that. Look at both /var/cobalt and /var/log and make sure the logs have a
matching zipped log (if it has been around long enough to have needed zipping).
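A quick way to do that check is sketched below in Python. It assumes rotated
copies are named like messages.1.gz, which is a guess at the naming on the
Qube2 rather than something confirmed here; the function name and the demo
file names are made up. A log that is simply too new to have been rotated will
also show up, so treat the result as a list of things to look at, not a list
of faults.

```python
import os
import re
import tempfile

ROTATED = re.compile(r"^.+\.\d+(\.gz)?$")  # e.g. messages.1 or messages.1.gz

def logs_without_archive(directory):
    """List live log files that have no name.N.gz companion in directory."""
    names = [n for n in os.listdir(directory)
             if os.path.isfile(os.path.join(directory, n))]
    missing = []
    for name in sorted(names):
        if ROTATED.match(name):
            continue  # this file is itself a rotated copy
        pattern = re.escape(name) + r"\.\d+\.gz"
        if not any(re.fullmatch(pattern, other) for other in names):
            missing.append(name)
    return missing

# Demonstration with empty placeholder files in a scratch directory.
demo = tempfile.mkdtemp()
for fname in ("messages", "messages.1.gz", "maillog", "maillog.1"):
    open(os.path.join(demo, fname), "w").close()

print(logs_without_archive(demo))  # maillog was rotated but never zipped
```

Pointing it at /var/cobalt and /var/log in turn would give the candidates to
inspect by hand.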