Re: [cobalt-users] RaQ3i Clusters
- Subject: Re: [cobalt-users] RaQ3i Clusters
- From: Ed Booher Jr <ebooher@xxxxxxxxxxx>
- Date: Wed Dec 6 05:57:01 2000
- Organization: One Call Internet
- List-id: Mailing list for users to share thoughts on Cobalt products. <cobalt-users.list.cobalt.com>
Will,
The primary lost all port-related services. Ports 80 and 23 were
"alive" but were not answering at all. Ports 25 and 110 were
dead and were generating a "connection refused" error. The
machine itself was still humming along, but the oddest thing was
that when I attempted to console into the RaQ via the serial
port, all it gave me was lines of gibberish. I tried VT100 and
ANSI from my terminal program, but neither worked. I was able to
issue a reboot from the LCD of the Primary. It took the Primary
roughly 10 minutes to bring itself back up, and during that
entire time the Secondary never attempted to assume control. I
have the failover control command set to 5 seconds.
After reboot the pair went into full synchronization mode. This
did complete, but I have no way of telling if it was attempting a
synchronization prior to the services failure. I looked through
the logs in /var/log and I didn't see anything in any of them
that would indicate to me that it was either getting ready to
die, or had already died. I did see the reboot command logged in
them, though. There was no e-mail in the root mailbox. I'm at a
loss as to what happened and why.
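
For anyone wanting to tell the two symptoms above apart from another
machine (a port that accepts connections but never answers, versus a
port that refuses them outright), here is a minimal Python sketch. It
is not part of StaQWare or any Cobalt tool. The function names, the
category labels, and the per-port payloads are assumptions made for
illustration, and the host name in the usage comment is hypothetical.

```python
import socket


def probe_port(host, port, payload=b"", timeout=3.0):
    """Classify one TCP port.

    Returns one of:
      'refused'     - the connection was actively refused
      'unreachable' - the connect attempt timed out or failed otherwise
      'silent'      - connected, but nothing came back before the timeout
      'closed'      - connected, but the peer closed or reset the connection
      'responding'  - connected and received at least one byte
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        return "unreachable"
    with sock:
        try:
            if payload:
                # Some protocols (HTTP) only answer after a request is sent.
                sock.sendall(payload)
            data = sock.recv(256)
        except socket.timeout:
            return "silent"
        except OSError:
            return "closed"
        return "responding" if data else "closed"


def check_services(host, probes, timeout=3.0):
    """Probe every port in `probes` (a dict of port -> payload bytes)."""
    return {port: probe_port(host, port, payload, timeout)
            for port, payload in probes.items()}


# Hypothetical usage (host name is a placeholder, not from the post):
# probes = {80: b"HEAD / HTTP/1.0\r\n\r\n", 23: b"", 25: b"", 110: b""}
# print(check_services("raq-primary.example.com", probes))
```

Telnet, SMTP, and POP3 servers normally send something as soon as a
client connects, so an empty payload is enough for ports 23, 25, and
110. HTTP waits for a request, which is why port 80 gets one here.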
Thank you for your time,
Ed Booher
Network Engineer
One Call Internet
http://www.onecall.net/
(800) 876-1300
(317) 843-1300
"If Linux is an 18-wheeler semi, capable of pulling multi-ton
loads cross-country, BeOS is a slick Porsche 911 Turbo."
- Franco Vitaliano / OPEN Magazine -
[Disclaimer - Any and all views, opinions, outlooks,
philosophies, words of wisdom, words of brash stupidity, and
principles outlined in this post are the belief of the Reverend
Eddie W. Booher, Jr. and are not necessarily synonymous with the
views of his employer or religion.]
= Will DeHaan wrote: =
>
> == Ed Booher Jr wrote: ==
> >
> > Does anyone here really and intimately know the StaQWare
> > clustering software for the RaQ3i's?
> > ...
> > My Primary RaQ3i failed
> > Monday morning and the Secondary never failed over.
>
> How did the primary fail? Partial network disconnection? Service
> failure? As far as StaQware is concerned, if the kernel and StaQware
> daemon are operating on the primary, it has not failed.
>
> > It is
> > supposed to fail over within 5 seconds, and even after I issued a
> > forced shutdown / reboot from the LCD of the Primary unit, the
> > Secondary sat there in Secondary mode.
>
> Had the pair completed synchronization prior to the failure on Monday?
> Have you checked admin's email on the primary server if it's
> recoverable? Power off the primary and reboot the secondary using the
> LCD.
>
> -- Will