
RE: [cobalt-users] RaQ3 System Locking Up Daily



Here is what I have in the message log:

Oct 26 04:02:35 raq4 syslogd 1.3-3: restart.
Oct 28 07:18:09 raq4 kernel: general protection fault: 0000
Oct 28 07:18:09 raq4 kernel: CPU:    0
Oct 28 07:18:09 raq4 kernel: EIP:    0010:[do_fork+175/2032]
Oct 28 07:18:09 raq4 kernel: EFLAGS: 00010286
Oct 28 07:18:09 raq4 kernel: eax: 00000100   ebx: def4e000   ecx: dfe6f378   edx: ffffffff
Oct 28 07:18:09 raq4 kernel: esi: df7fa570   edi: def4e570   ebp: df7fa000   esp: df7fbf7c
Oct 28 07:18:09 raq4 kernel: ds: 0018   es: 0018   ss: 0018
Oct 28 07:18:09 raq4 kernel: Process httpd (pid: 367, process nr: 12, stackpage=df7fb000)
Oct 28 07:18:09 raq4 kernel: Stack: 00000001 df7fbfbc df7fbfbc d51bc580 00000000 df7fbfa0 df7fa000 fffffff5
Oct 28 07:18:09 raq4 kernel:        df7fa000 00000000 00000000 00000000 c01081ee 00000011 bffffc58 df7fbfc4
Oct 28 07:18:09 raq4 kernel:        bffffc70 c01092e8 00000000 00000001 401fb80c 00000003 00000001 bffffc70
Oct 28 07:18:09 raq4 kernel: Call Trace: [sys_fork+18/28] [system_call+52/56]
Oct 28 07:18:09 raq4 kernel: Code: 39 02 0f 8d 23 07 00 00 ff 02 8d 76 00 31 c0 81 3d 80 b1 23

The RaQs restart every day at 04:02:35, but somewhere between 26 Oct at 04:02:35 and 28 Oct at 07:18:09, this one died. I think the 28 Oct entry is from when I tried to connect to the machine and received a partial response (it asked for admin logon information on :81, but then died). Since the log has no 27 Oct restart entry at 04:02, it apparently started having problems before then.
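The reasoning above (a missing daily "restart" line marks roughly when the box stopped logging) could be checked mechanically. Here is a minimal sketch in Python, assuming the standard "Mon DD HH:MM:SS host tag ..." syslog layout shown in the excerpt. The function name missing_restart_days and the year default are hypothetical; syslog lines carry no year, so 2001 is taken from the date of this thread.

```python
from datetime import datetime, timedelta

def missing_restart_days(lines, year=2001):
    """Return the dates, between the first and last log entry, that have
    no 'syslogd ... restart.' line. The year is an assumption, because
    syslog timestamps do not include one."""
    restart_days = set()
    seen_days = []
    for line in lines:
        parts = line.split()
        if len(parts) < 5:
            continue
        try:
            stamp = datetime.strptime(
                f"{year} {parts[0]} {parts[1]} {parts[2]}",
                "%Y %b %d %H:%M:%S",
            )
        except ValueError:
            continue  # not a timestamped syslog line
        seen_days.append(stamp.date())
        # parts[3] is the hostname, parts[4] the program tag
        if parts[4] == "syslogd" and line.rstrip().endswith("restart."):
            restart_days.add(stamp.date())
    if not seen_days:
        return []
    day, last = min(seen_days), max(seen_days)
    missing = []
    while day <= last:
        if day not in restart_days:
            missing.append(day)
        day += timedelta(days=1)
    return missing

if __name__ == "__main__":
    sample = [
        "Oct 26 04:02:35 raq4 syslogd 1.3-3: restart.",
        "Oct 28 07:18:09 raq4 kernel: general protection fault: 0000",
    ]
    for day in missing_restart_days(sample):
        print("no syslogd restart logged on", day)
```

Run against the two timestamps in the excerpt above, this reports 27 Oct and 28 Oct as days without a restart entry, which is consistent with the machine having stopped logging some time after 04:02 on 26 Oct.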

The machine is cool and, although we have had to replace about 10 fans (and counting) on various RaQs, the fans in this one are running fine. Does anyone know what the GPF and the other information mean? I suspect some sort of sporadic hardware failure, as Rik suggests below.

-Gary


-----Original Message-----
From: Rik Thomas [mailto:rikt@xxxxxxxxxxxxxxxx]
Sent: Friday, November 02, 2001 10:05 AM
To: Gary M. Root
Cc: cobalt-users@xxxxxxxxxxxxxxx
Subject: Re: [cobalt-users] RaQ3 System Locking Up Daily


On Thu, 1 Nov 2001, Gary M. Root wrote:

> I've been learning from this group for years now. Thank you! We have 8 RaQ 3s and a few RaQ4rs. The RaQ 3s were configured identically and act as a load-balanced pool of application servers. One of them recently began to lock up completely once a day. If we power down and reboot (with the switch), it comes up fine, but after a day, the LCD is locked up and all services are unreachable. The only "service" that does still respond is Ping, which is bad, because it tells our load-balancer that the machine is up when in reality all services are down. I searched but only came up with some warm hits. Any ideas about what's going on here?
> 
> Thanks,
> Gary
> 

I would start by looking at the fans and making sure they are all running;
it sounds like the server is overheating.  You may also want to look at
/var/log/messages to see whether there are any errors in there that
correlate with the time the server locks up.  We once had a similar issue,
except that the server wouldn't even respond to ping, and the cause turned
out to be the IDE controller.
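
One way to do the correlation suggested above is to pull every log entry within a window around the time the lockup was noticed. This is only a rough sketch in Python; the function name entries_near, the 30-minute window, and the year default are assumptions for illustration (syslog timestamps carry no year).

```python
from datetime import datetime, timedelta

def entries_near(lines, when, window_minutes=30, year=2001):
    """Return the syslog lines stamped within window_minutes of 'when'.
    Assumes the usual 'Mon DD HH:MM:SS host ...' line layout."""
    window = timedelta(minutes=window_minutes)
    hits = []
    for line in lines:
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue
        try:
            stamp = datetime.strptime(
                f"{year} {parts[0]} {parts[1]} {parts[2]}",
                "%Y %b %d %H:%M:%S",
            )
        except ValueError:
            continue  # skip lines without a parsable timestamp
        if abs(stamp - when) <= window:
            hits.append(line.rstrip("\n"))
    return hits

if __name__ == "__main__":
    # The path is the one named above; reading it usually requires root.
    with open("/var/log/messages") as log:
        for entry in entries_near(log, datetime(2001, 10, 28, 7, 30)):
            print(entry)
```

The lockup time passed in is a placeholder; substitute whenever the LCD was first seen frozen.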


-- 
Rik Thomas 
rikt@xxxxxxxxxxxxxxxx http://SmartBackups.com
Is your Website Smart? Automated Website backups.  Free 30Day trial!
Ph: 888.845.6856 Fx: 302.672.7315 ICQ: 879956