[cobalt-users] Is this unusual?
- Subject: [cobalt-users] Is this unusual?
- From: Greg Hewitt-Long <cobaltusers@xxxxxxxxxxxxxxxxxxx>
- Date: Mon Jan 19 17:58:01 2004
- List-id: Mailing list for users to share thoughts on Sun Cobalt products. <cobalt-users.list.cobalt.com>
I'm experiencing some system crashes on a box. The machine just locks up,
and if it isn't hard-rebooted, it begins doing strange things, as if it's
totally failing: I can't connect, can't ping, and if the GUI responds at
all, it throws internal server errors everywhere.
I've checked:
partitions - these appear fine, nowhere near full.
top:
6:40pm up 30 min, 1 user, load average: 0.13, 0.17, 0.23
72 processes: 71 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: 47.8% user, 4.7% system, 0.0% nice, 47.4% idle
Mem: 776656K av, 768048K used, 8608K free, 246420K shrd, 558996K buff
Swap: 655812K av, 0K used, 655812K free 94904K cached
PID USER PRI NI SIZE RSS SHARE STAT LIB %CPU %MEM TIME COMMAND
4304 root 3 0 896 896 692 R 0 2.4 0.1 0:00 top
1937 root 1 0 524 524 424 S 0 1.1 0.0 0:05 syslogd
1988 root 3 0 484 484 412 S 0 0.2 0.0 0:00 inetd
2062 root 1 0 1344 1344 1116 S 0 0.2 0.1 0:00 sendmail
1 root 0 0 472 472 400 S 0 0.0 0.0 0:03 init
2 root 0 0 0 0 0 SW 0 0.0 0.0 0:00 kflushd
3 root 0 0 0 0 0 SW 0 0.0 0.0 0:00 kupdate
4 root 0 0 0 0 0 SW 0 0.0 0.0 0:00 kswapd
5 root -20 -20 0 0 0 SW< 0 0.0 0.0 0:00 mdrecoveryd
1946 root 0 0 768 768 384 S 0 0.0 0.0 0:00 klogd
1976 root 0 0 616 616 508 S 0 0.0 0.0 0:00 crond
1994 root 0 0 1060 1060 928 S 0 0.0 0.1 0:00 sshd
2005 root 0 0 5920 5920 5524 S 0 0.0 0.7 0:01 httpd.admsrv
2027 root 0 0 6508 6508 5180 S 0 0.0 0.8 0:00 httpd.admsrv
2044 root 0 0 9692 9692 9468 S 0 0.0 1.2 0:02 httpd
2067 root 0 0 1356 1356 1108 S 0 0.0 0.1 0:00 sendmail
2070 root 0 0 1828 1828 1256 S 0 0.0 0.2 0:00 sendmail
2091 httpd 0 0 10416 10M 9080 S 0 0.0 1.3 0:00 httpd
2092 httpd 0 0 10344 10M 9068 S 0 0.0 1.3 0:00 httpd
2093 httpd 0 0 10384 10M 9068 S 0 0.0 1.3 0:00 httpd
2094 httpd 0 0 10420 10M 9076 S 0 0.0 1.3 0:00 httpd
2095 httpd 0 0 10452 10M 9068 S 0 0.0 1.3 0:00 httpd
2098 root 0 0 7212 7212 1272 S 0 0.0 0.9 0:01 mailscanner
2154 postgres 5 5 1336 1336 940 S N 0 0.0 0.1 0:00 postmaster
2158 httpd 0 0 10056 9.8M 9584 S 0 0.0 1.2 0:00 httpd
2326 root 0 0 664 664 496 S 0 0.0 0.0 0:00 caspd
2327 root 0 0 664 664 496 S 0 0.0 0.0 0:00 caspd
2328 root 0 0 664 664 496 S 0 0.0 0.0 0:00 caspd
2340 httpd 0 0 10092 9.9M 9584 S 0 0.0 1.2 0:00 httpd
2341 root 0 0 6060 6060 3232 S 0 0.0 0.7 0:00 caspeng
Nothing I see is THAT unusual, although all available RAM gets used up
rapidly. I think that's caused by a number of users who keep their email on
the server and are constantly checking it; I see their in.qpopper
processes hang around for a while, using a lot of resources.
chkrootkit doesn't find anything out of the ordinary, although I think I
need to update chkrootkit.
Netstat:
[root admin]# netstat -a --numeric
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 65.103.98.107:25 209.139.49.194:63532 ESTABLISHED
tcp 0 0 66.151.173.187:110 24.173.85.246:1407 TIME_WAIT
tcp 0 0 65.103.98.112:443 200.67.167.87:2922 FIN_WAIT2
tcp 0 0 65.103.98.112:443 200.67.167.87:2921 FIN_WAIT2
tcp 0 0 66.151.173.187:110 66.76.147.107:2039 TIME_WAIT
tcp 0 0 66.151.173.187:110 66.76.147.107:2035 TIME_WAIT
tcp 0 0 66.151.173.187:110 66.76.147.107:2033 TIME_WAIT
tcp 0 0 66.151.173.187:110 66.76.147.107:2031 TIME_WAIT
tcp 0 0 65.103.98.112:25 208.46.240.44:3806 TIME_WAIT
tcp 0 0 66.151.173.187:110 66.76.147.107:2029 TIME_WAIT
tcp 0 0 66.151.173.187:110 66.76.147.107:4439 TIME_WAIT
tcp 0 0 65.103.98.122:80 65.214.36.57:37291 TIME_WAIT
tcp 0 0 65.103.98.104:25 218.107.188.99:4433 ESTABLISHED
tcp 0 0 65.103.98.117:80 204.32.195.37:10252 FIN_WAIT2
tcp 0 10283 65.103.98.109:80 12.148.243.131:56344 FIN_WAIT1
tcp 0 1 65.103.98.121:1074 211.158.86.63:25 SYN_SENT
tcp 0 1 65.103.98.121:1072 64.38.64.91:25 SYN_SENT
tcp 0 1 65.103.98.121:1070 200.87.122.170:25 SYN_SENT
tcp 0 1 65.103.98.121:1060 217.114.167.203:25 SYN_SENT
tcp 0 0 66.151.173.181:25 209.139.49.194:63469 ESTABLISHED
tcp 0 1112 65.103.98.121:22 65.103.96.10:21905 ESTABLISHED
tcp 0 0 0.0.0.0:3001 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:3000 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN
tcp 0 0 65.103.98.125:443 0.0.0.0:* LISTEN
tcp 0 0 65.103.98.112:443 0.0.0.0:* LISTEN
tcp 0 0 65.103.98.114:443 0.0.0.0:* LISTEN
tcp 0 0 66.151.173.186:443 0.0.0.0:* LISTEN
tcp 0 0 65.103.99.172:443 0.0.0.0:* LISTEN
tcp 0 0 65.103.99.132:443 0.0.0.0:* LISTEN
tcp 0 0 65.103.99.134:443 0.0.0.0:* LISTEN
tcp 0 0 65.125.145.41:443 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:25 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:81 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:444 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:143 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:110 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:21 0.0.0.0:* LISTEN
udp 0 0 0.0.0.0:514 0.0.0.0:*
raw 0 0 0.0.0.0:1 0.0.0.0:* 7
raw 0 0 0.0.0.0:6 0.0.0.0:* 7
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags Type State I-Node Path
unix 0 [ ACC ] STREAM LISTENING 2299 /var/lib/mysql/mysql.sock
unix 0 [ ACC ] STREAM LISTENING 2012 /tmp/.s.PGSQL.5432
unix 4 [ ] DGRAM 1397 /dev/log
unix 1 [ W ] STREAM CONNECTED 2717
unix 1 [ ] STREAM CONNECTED 2716
unix 0 [ ] DGRAM 1962
unix 0 [ ] DGRAM 1735
unix 0 [ ] DGRAM 1627
unix 0 [ ] DGRAM 1408
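A dump like the netstat listing above is easier to read when TCP connections are tallied per local port and state; for example, it makes the repeated POP3 (port 110) TIME_WAIT entries from a single client stand out. A minimal sketch, assuming the six-column layout shown above (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); `tally_tcp` is a hypothetical helper, and the sample lines are copied from the listing.

```python
from collections import Counter


def tally_tcp(netstat_text: str) -> Counter:
    """Count TCP connections by (local port, state) in `netstat -a --numeric` output."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Data rows look like: tcp Recv-Q Send-Q local-addr foreign-addr state
        if len(fields) == 6 and fields[0] == "tcp":
            local_port = int(fields[3].rsplit(":", 1)[1])
            counts[(local_port, fields[5])] += 1
    return counts


# A few rows copied from the listing above.
sample = """\
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 66.151.173.187:110 66.76.147.107:2039 TIME_WAIT
tcp 0 0 66.151.173.187:110 66.76.147.107:2035 TIME_WAIT
tcp 0 1 65.103.98.121:1074 211.158.86.63:25 SYN_SENT
tcp 0 0 0.0.0.0:110 0.0.0.0:* LISTEN
"""

for (port, state), n in sorted(tally_tcp(sample).items()):
    print(port, state, n)
```

Pasting the full listing into `sample` gives the same kind of per-port summary for the whole box; header rows and UNIX-socket rows are skipped because they do not match the six-field `tcp` pattern.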
I think I've found the significant kernel log entries (/var/log/kernel).
There are a number of them around the freak-out times, like this:
Jan 19 15:04:11 co05 kernel: Unable to handle kernel paging request at virtual address 00010108
Jan 19 15:04:11 co05 kernel: current->tss.cr3 = 05af5000, %%cr3 = 05af5000
Jan 19 15:04:11 co05 kernel: *pde = 00000000
Jan 19 15:04:11 co05 kernel: Oops: 0000
Jan 19 15:04:11 co05 kernel: CPU: 0
Jan 19 15:04:11 co05 kernel: EIP: 0010:[get_stat+292/708]
Jan 19 15:04:11 co05 kernel: EFLAGS: 00010206
Jan 19 15:04:11 co05 kernel: eax: 00000000 ebx: d2630000 ecx: 00000041 edx: 00000040
Jan 19 15:04:11 co05 kernel: esi: bffff9cc edi: 00010000 ebp: 00000400 esp: d2631f1c
Jan 19 15:04:11 co05 kernel: ds: 0018 es: 0018 ss: 0018
Jan 19 15:04:11 co05 kernel: Process pidof (pid: 21871, process nr: 63, stackpage=d2631000)
Jan 19 15:04:11 co05 kernel: Stack: c0252780 00000400 c41384e0 00010000 52001000 40015000 00000000 bffff9cc
Jan 19 15:04:11 co05 kernel: 400bfa34 00112000 00000006 00000000 00000000 00000000 c014646f 0000556f
Jan 19 15:04:11 co05 kernel: c4e61000 c0146559 c4e61000 0000556f 0000000b ca29e320 ffffffea 00000000
Jan 19 15:04:11 co05 kernel: Call Trace: [get_process_array+71/96] [array_read+209/484] [sys_read+174/196] [system_call+52/56]
Jan 19 15:04:11 co05 kernel: Code: 8b b7 08 01 00 00 89 74 24 1c eb 08 c7 44 24 1c ff ff ff ff
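One detail in the oops can be checked by hand. In kernels of this vintage the "Code:" line dumps the bytes starting at EIP, and the first six (8b b7 08 01 00 00) decode as the x86 instruction mov esi,[edi+0x108]. With edi = 00010000 from the register dump, that load reads address 00010108, exactly the virtual address the kernel could not handle. So get_stat (reached because pidof was reading process information) appears to have followed a pointer holding 0x00010000, which does not look like the kernel addresses elsewhere in the dump (those start with c or d). The sketch below decodes only this one instruction form; it is an illustration rather than a general disassembler, and its names are hypothetical.

```python
# x86 32-bit register numbering used in the ModRM byte.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_load(code: bytes):
    """Decode 'MOV r32, [base + disp32]' (opcode 8B, ModRM mod=10, no SIB byte).

    Returns (destination register, base register, signed displacement).
    Any other encoding raises ValueError; this is not a general disassembler.
    """
    if code[0] != 0x8B:
        raise ValueError("not MOV r32, r/m32 (opcode 8B)")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if mod != 0b10 or rm == 0b100:
        raise ValueError("only [base + disp32] without a SIB byte is handled")
    disp = int.from_bytes(code[2:6], "little", signed=True)
    return REG32[reg], REG32[rm], disp


# First bytes of the "Code:" line and the edi value from the oops above.
code = bytes.fromhex("8b b7 08 01 00 00")
dest, base, disp = decode_mov_load(code)
edi = 0x00010000
fault = (edi + disp) & 0xFFFFFFFF  # 32-bit effective address
print(f"mov {dest}, [{base}{disp:+#x}] -> reads {fault:08x}")
```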
Apart from perhaps running out of swap, any ideas what this might be? Bad
RAM? Just not enough RAM/Swap for the job?
I've now increased the swap file from half a gigabyte to a full gigabyte.
Hopefully this will relieve the problems, although I suspect there's more to
it than just throwing swap at it!
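To test the out-of-swap theory rather than guess, memory and swap figures could be logged every minute or so and lined up against the oops timestamps in /var/log/kernel. A minimal sketch, assuming a Linux /proc/meminfo that includes "MemFree:", "SwapTotal:" and "SwapFree:" lines in kB (2.2 kernels print these below their older byte-count table); the function names and the 60-second interval are illustrative choices, and the sample figures are taken from the top snapshot above.

```python
import time


def parse_meminfo(text: str) -> dict:
    """Parse 'Key:   12345 kB' lines from /proc/meminfo into {key: kB}.

    Lines without a trailing 'kB' (such as the older byte-count table
    printed by 2.2 kernels) are skipped.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and len(fields) == 2 and fields[1] == "kB":
            info[key.strip()] = int(fields[0])
    return info


def swap_used_kb(info: dict) -> int:
    return info["SwapTotal"] - info["SwapFree"]


def log_forever(path="/proc/meminfo", interval_s=60):
    """Print a timestamped MemFree/SwapUsed line every interval_s seconds."""
    while True:
        with open(path) as f:
            info = parse_meminfo(f.read())
        stamp = time.strftime("%b %d %H:%M:%S")
        print(stamp, "MemFree", info.get("MemFree"), "kB",
              "SwapUsed", swap_used_kb(info), "kB", flush=True)
        time.sleep(interval_s)


# Sample in /proc/meminfo style, using figures from the top output above.
sample = """\
        total:    used:    free:  shared: buffers:  cached:
MemTotal:    776656 kB
MemFree:       8608 kB
SwapTotal:   655812 kB
SwapFree:    655812 kB
"""
print(parse_meminfo(sample), swap_used_kb(parse_meminfo(sample)))
```

On the server, calling `log_forever()` from a small script started in the background and redirecting its output to a file would leave a record whose timestamps use the same "Jan 19 15:04:11" style as the kernel log, which makes it easy to see whether swap was actually exhausted when each oops occurred.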
thanks
Greg