RE: [cobalt-users] RaQ3 System Locking Up Daily
- Subject: RE: [cobalt-users] RaQ3 System Locking Up Daily
- From: "Gary M. Root" <groot@xxxxxxxxxxxxxxx>
- Date: Fri Nov 2 18:00:01 2001
- List-id: Mailing list for users to share thoughts on Cobalt products. <cobalt-users.list.cobalt.com>
Here is what I have in the message log:
Oct 26 04:02:35 raq4 syslogd 1.3-3: restart.
Oct 28 07:18:09 raq4 kernel: general protection fault: 0000
Oct 28 07:18:09 raq4 kernel: CPU: 0
Oct 28 07:18:09 raq4 kernel: EIP: 0010:[do_fork+175/2032]
Oct 28 07:18:09 raq4 kernel: EFLAGS: 00010286
Oct 28 07:18:09 raq4 kernel: eax: 00000100 ebx: def4e000 ecx: dfe6f378 edx: ffffffff
Oct 28 07:18:09 raq4 kernel: esi: df7fa570 edi: def4e570 ebp: df7fa000 esp: df7fbf7c
Oct 28 07:18:09 raq4 kernel: ds: 0018 es: 0018 ss: 0018
Oct 28 07:18:09 raq4 kernel: Process httpd (pid: 367, process nr: 12, stackpage=df7fb000)
Oct 28 07:18:09 raq4 kernel: Stack: 00000001 df7fbfbc df7fbfbc d51bc580 00000000 df7fbfa0 df7fa000 fffffff5
Oct 28 07:18:09 raq4 kernel: df7fa000 00000000 00000000 00000000 c01081ee 00000011 bffffc58 df7fbfc4
Oct 28 07:18:09 raq4 kernel: bffffc70 c01092e8 00000000 00000001 401fb80c 00000003 00000001 bffffc70
Oct 28 07:18:09 raq4 kernel: Call Trace: [sys_fork+18/28] [system_call+52/56]
Oct 28 07:18:09 raq4 kernel: Code: 39 02 0f 8d 23 07 00 00 ff 02 8d 76 00 31 c0 81 3d 80 b1 23
The RaQs log a restart every day at 04:02, but somewhere between 26 Oct at 04:02:35 and 28 Oct at 07:18:09 this one died. I think I tried to connect to the machine around the time of the 28 Oct entry and received a partial response (it asked for the admin logon on :81, then died). Since there is no restart entry for 27 Oct, it apparently started having problems before the 04:02 restart that day.
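The same reasoning can be checked mechanically. This is only a rough sketch: it assumes the "syslogd ... restart" line is written once per day, that the log year is 2001 (syslog timestamps carry no year), and the function names are made up for illustration.

```python
from datetime import datetime, timedelta


def parse_stamp(line, year=2001):
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp.

    The year is an assumption, since syslog lines do not record it.
    """
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def restarts_missed_before_fault(lines, year=2001):
    """Find the first general protection fault and the last restart before it.

    Returns (last_restart, fault, missed), where `missed` is the number of
    whole days between the two, i.e. how many daily restart entries should
    have been logged in between but were not. Returns None if no fault
    follows a restart entry.
    """
    last_restart = None
    for line in lines:
        if "syslogd" in line and "restart" in line:
            last_restart = parse_stamp(line, year)
        elif "general protection fault" in line and last_restart is not None:
            fault = parse_stamp(line, year)
            return last_restart, fault, (fault - last_restart) // timedelta(days=1)
    return None


if __name__ == "__main__":
    # Path suggested in the thread; adjust for the machine being checked.
    with open("/var/log/messages") as log:
        print(restarts_missed_before_fault(log))
```

A non-zero third value means the box stopped logging its daily restart at least that many days before the fault was recorded.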
The machine is cool and, although I have had to replace about 10 fans (and counting) on various RaQs, the fans in this one are running fine. Does anyone know what the GPF and the other information mean? I suspect some sort of sporadic hardware failure, as Rik suggests below.
-Gary
-----Original Message-----
From: Rik Thomas [mailto:rikt@xxxxxxxxxxxxxxxx]
Sent: Friday, November 02, 2001 10:05 AM
To: Gary M. Root
Cc: cobalt-users@xxxxxxxxxxxxxxx
Subject: Re: [cobalt-users] RaQ3 System Locking Up Daily
On Thu, 1 Nov 2001, Gary M. Root wrote:
> I've been learning from this group for years now. Thank you! We have 8 RaQ 3's and a few RaQ4r's. The RaQ 3's were configured identically and act as a load-balanced pool of application servers. One of them recently began to lock up completely once a day. If we power down and reboot (with the switch), it comes up fine, but after a day, the LCD is locked up and all services are unreachable. The only "service" that still responds is ping, which is bad, because it tells our load-balancer that the machine is up when in reality all services are down. I searched but only came up with some warm hits. Any ideas about what's going on here?
>
> Thanks,
> Gary
>
I would start by looking at the fans and making sure all are running; it sounds
like you are overheating the server. You may also want to look at
/var/log/messages to see if there are any errors in there that correlate with
the time the server locks up. We once had a similar issue, except that the
server wouldn't even respond to ping, and the cause was the IDE controller.
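On the ping point in the quoted message: since ping keeps answering while the services are down, a health check that insists on a real HTTP response may serve the load-balancer better. A minimal sketch, assuming the pooled services speak HTTP; the function name `http_alive`, the default port, and the timeout are illustrative and not taken from any Cobalt or load-balancer tool.

```python
import http.client


def http_alive(host, port=80, path="/", timeout=5.0):
    """Return True only if the server answers an HTTP HEAD request in time.

    A hung machine may still answer ping (and may even accept the TCP
    connection), so the check waits for an actual HTTP status line.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status < 500
    except (OSError, http.client.HTTPException):
        # Refused, timed out, or garbled response: treat the node as down.
        return False
    finally:
        conn.close()
```

Whether a 4xx answer should count as "up" is a judgment call; here anything below 500 is accepted because it shows the web server itself is still responding.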
--
Rik Thomas
rikt@xxxxxxxxxxxxxxxx http://SmartBackups.com
Is your Website Smart? Automated Website backups. Free 30Day trial!
Ph: 888.845.6856 Fx: 302.672.7315 ICQ: 879956