Re: [cobalt-users] How do I figure out why I crashed/
- From: josh <josh@xxxxxxxxxxxxxxxxxx>
- Date: Wed Dec 10 07:45:01 2003
- List-id: Mailing list for users to share thoughts on Sun Cobalt products. <cobalt-users.list.cobalt.com>
I have been wrestling with this for a while. I suspect the most useful
information is what flashes by on the monitor attached to the
RaQXTR/550 (a RaQXTR with 2GB of RAM and four 30GB disks in RAID0,
running the RaQ550 OS with all patches applied). The messages scrolling
past on the monitor appear to be paging failures and kernel panics. I
suspect a memory problem, but I can't easily test for one: the memtest
package available from Sun reads the memory size from the ROM, which
says the machine has 512MB, and I have no idea how to change that. I am
about to start pulling memory chips until the failures stop.
I suspect a virtual memory problem because with the ROM kernel I have
seen signs of swapping working (i.e. memory gets swapped out), but with
the current kernel on disk used swap always seems to be 0. I have been
going over this again and again, fixing many little things along the
way, but it seems that either the kernel or the memory is bad. (I
suspect both.)
[10:25:56:war-admiral:~]uname -a
Linux war-admiral.saratoga.lib.ny.us 2.4.19C10_V #1 Thu Aug 28 12:24:01 PDT 2003 i686 unknown
[10:26:13:war-admiral:~]sudo /usr/sbin/cmos -c romrev
2.9.34
[10:29:44:war-admiral:~]free
             total       used       free     shared    buffers     cached
Mem:       2065188     359116    1706072          0         64     196128
-/+ buffers/cache:     162924    1902264
Swap:       524536          0     524536
So far I have not found any clues in the log files or noticed anything
in particular. Below is an excerpt from /var/log/messages, starting
from a daily restart and ending with the crash; much has been deleted,
and the parenthetical comments are mine. I can also provide dmesg
output, /var/log/kernel and /var/log/httpd/error if that would help.
Dec 10 04:02:15 war-admiral syslogd 1.3-3: restart.
Dec 10 04:02:16 war-admiral cced(smd)[31841]: client [0:1547] has admin rights
Dec 10 04:02:16 war-admiral cced(smd)[31843]: client [0:31840] has admin rights
Dec 10 04:02:43 war-admiral cced(smd)[31890]: client [0:31889] has admin rights
Dec 10 04:02:43 war-admiral cced(smd)[31898]: client [0:31896] has admin rights
Dec 10 04:02:43 war-admiral cced(smd)[31900]: client [0:31897] has admin rights
Dec 10 04:02:52 war-admiral cced(smd)[32054]: client [0:32053] has admin rights
Dec 10 04:12:07 war-admiral ntpd[1334]: time reset 7.494690 s
Dec 10 04:12:07 war-admiral ntpd[1334]: synchronisation lost
(I get quite a few of these, presumably because I temporarily lose
contact with the peer NTP server, but I'd love to know why the clock
always seems to be about 7 seconds off and how to fix that.)
Dec 10 10:00:05 war-admiral cced(smd)[18615]: client 0:[0:18614]: SET succeeded
Dec 10 10:00:05 war-admiral cced(smd)[18615]: client 0:[0:18614]: SET 16 . Overflow "lastRun" "=" "1071068405" "currentMessage" "=" "[[base-overflow.amStatusGreen]]"
Dec 10 10:00:05 war-admiral swatch[18614]: processing monitor namespace NetWorker
Dec 10 10:00:05 war-admiral cced(smd)[18615]: client 0:[0:18614]: SET succeeded
Dec 10 10:00:05 war-admiral swatch[18614]: turning sysfault light off
Dec 10 10:01:04 war-admiral named[1319]: lame server resolving 'www.fancc.net' (in 'fancc.NET'?): 205.166.226.38#53
Dec 10 10:01:39 war-admiral ntpd[1334]: time reset 7.261551 s
Dec 10 10:01:39 war-admiral ntpd[1334]: synchronisation lost
Dec 10 10:02:36 war-admiral identd[18975]: main: listen
Dec 10 10:02:36 war-admiral inetd[1283]: pid 18975: exit status 1
Dec 10 10:06:18 war-admiral named[1319]: lame server resolving 'webhosts2.equat.com' (in 'equat.com'?): 66.36.242.227#53
Dec 10 10:07:07 war-admiral identd[19222]: main: listen
Dec 10 10:07:07 war-admiral inetd[1283]: pid 19222: exit status 1
(At this point something happened and the system rebooted. Does this
help?)
Dec 10 10:15:17 war-admiral syslogd 1.3-3: restart.
Dec 10 10:15:20 war-admiral cced[273]: Cobalt Configuration Engine (CCE) version 0.80.2
Dec 10 10:15:20 war-admiral cced[273]: Copyright (c) 1999,2000 Cobalt Networks,Inc.
Dec 10 10:15:20 war-admiral cced[273]: starting up (pid 273)
Dec 10 10:15:20 war-admiral logger: cce_construct started
Dec 10 10:15:20 war-admiral logger: ***** cce_construct: /usr/sausalito/constructor/base/system/10_addSystem.pl
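Rather than scrolling through the whole file, one way to see what was
logged just before each reboot is a small script. This is a minimal
sketch, assuming a Python 3 interpreter is available (on the server or
on a workstation the log is copied to); `lines_before_restarts` and
`report` are hypothetical helpers, not part of any Cobalt tool. It
treats syslogd's "restart." line as a boot marker, so the daily restart
(04:02 in the excerpt above) also shows up and has to be ignored by
hand.

```python
import re
from collections import deque

# Matches syslogd's own restart marker, e.g.
# "Dec 10 10:15:17 war-admiral syslogd 1.3-3: restart."
RESTART = re.compile(r"syslogd [\d.\-]+: restart\.")


def lines_before_restarts(lines, context=10):
    """Hypothetical helper: pair each syslogd restart line with the
    `context` log lines written just before it."""
    window = deque(maxlen=context)  # keeps only the most recent lines
    results = []
    for line in lines:
        line = line.rstrip("\n")
        if RESTART.search(line):
            results.append((line, list(window)))
            window.clear()  # start collecting afresh after each restart
        else:
            window.append(line)
    return results


def report(path="/var/log/messages", context=10):
    """Hypothetical helper: print the tail of the log before each restart."""
    with open(path, errors="replace") as f:
        for restart, before in lines_before_restarts(f, context):
            for prev in before:
                print("    " + prev)
            print(">>> " + restart)
            print()
```

For example, `report("/var/log/messages", context=20)` prints the last
20 lines before every restart; any restart that is not the daily one is
a candidate crash, and the lines above it are where a last message from
the kernel would appear if it made it to disk.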
On Wed, Dec 10, 2003 at 01:08:46AM -0500, Dan Kriwitsky wrote:
> >
> > Just wondering -- I keep crashing and don't know why?
>
> Driving lessons?
>
> A little more info would help. Server type for instance? Output of
> /var/log/messages
--
Josh Kuperman
josh@xxxxxxxxxxxxxxxxxx