RE: [cobalt-users] Reoccurring Server Outage/Failure
- Subject: RE: [cobalt-users] Reoccurring Server Outage/Failure
- From: "Jerry Farquhar" <jerry@xxxxxxxxxxx>
- Date: Tue Jan 14 22:19:01 2003
- List-id: Mailing list for users to share thoughts on Sun Cobalt products. <cobalt-users.list.cobalt.com>
Since the problem is intermittent, this sounds like a possible bad port on
the hub/switch. On a whim, you might try telnetting or SSHing from one of your
working servers to the problem server during an outage, to see whether the
problem is internal or external to the Co-Lo, especially if the logs support
the box having been up the whole time. Alternatively, you could simply ask your
Co-Lo to move the server to a different port or, even better, to a port
on a different hub/switch, to see if that clears up the problem.
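If you want that check to run unattended from one of the working boxes, something like the following might do as a rough sketch. It only attempts a TCP connection to a port (SSH's 22, say) and prints a timestamped line whenever the result changes. The function names, the interval, and the number of rounds are placeholders of mine, not anything Cobalt-specific, and it assumes a Python 3 interpreter is available on the neighbouring server.

```python
import socket
import time
from datetime import datetime


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def watch(host, port, interval=60.0, rounds=10):
    """Probe host:port repeatedly and print a line on each state change."""
    last = None
    for _ in range(rounds):
        up = tcp_reachable(host, port)
        if up != last:
            stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
            state = "reachable" if up else "UNREACHABLE"
            print(f"{stamp} {host}:{port} {state}")
            last = up
        time.sleep(interval)
```

Calling `watch("problem-server.example.com", 22)` (hostname made up) from a server on the same LAN and comparing its output with what you see from outside would suggest whether the outage is confined to the one port or sits further upstream.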
It wouldn't be the first time that a hub/switch had a bad port.
Over the last several years I've had a number of LAN customers with
hub/switch problems, and they are not always easy to identify because the
ports don't fail entirely. In some cases a port fluctuates in and out of
operation, sometimes with days or weeks between occurrences.
Jerry
-----Original Message-----
From: cobalt-users-admin@xxxxxxxxxxxxxxx
[mailto:cobalt-users-admin@xxxxxxxxxxxxxxx]On Behalf Of Charlie Clemmer
Sent: Tuesday, January 14, 2003 11:48 PM
To: cobalt-users@xxxxxxxxxxxxxxx
Subject: [cobalt-users] Reoccuring Server Outage/Failure
Trying to troubleshoot, and I don't like the fact that this server is far
far away (hosted by CobaltRacks) and I don't have access to the console
port during these outages.
Both yesterday and again today, my server (a Raq3 lightly loaded ...
hosting about 6-10 domains) has decided to drop off the net for 2-6 hours
at a time. Both days this outage started around 4:30pm. Now, once the
server comes back, uptime reports the server has not been rebooted, and the
logs and everything seem to back that up. From looking at the maillog and
syslog, it appears that when this happens, the only traffic I'm seeing is
the active monitor traffic .. I'm getting the normal polls every 15
minutes, but not a thing is coming in from the local ethernet.
Thinking the worst, I've rerun chkrootkit after each failure, in addition to
its normally scheduled runs, but nothing is turning up there. I've also
got PortSentry, Logcheck, and IPChains all running on this box, and I've
seen no trips on the security front. Without having physical console access
during the time of this outage, what can I go back and check to try and get
a clue as to what's going on? As a reference point, this outage is only
affecting this one server, and my other servers at CobaltRacks are running
fine, so it doesn't appear to be network related, unless it's the specific
ethernet port that I'm connected to.
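One way to pin down the outage windows after the fact is to look for long silences in the logs once the monitor's own entries are filtered out. The sketch below assumes the traditional syslog timestamp prefix ("Jan 14 16:30:01"), an English locale for month names, and that the monitor lines can be recognised by some substring. The name `find_gaps` and the `ignore` parameter are illustrative only, and since the log lines carry no year it has to be supplied.

```python
from datetime import datetime, timedelta


def find_gaps(lines, year, min_gap=timedelta(minutes=30), ignore=()):
    """Return (start, end) pairs where consecutive kept entries are far apart.

    lines   -- iterable of syslog-style lines ("Mon DD HH:MM:SS host ...")
    year    -- year to assume, since classic syslog stamps omit it
    min_gap -- shortest silence worth reporting
    ignore  -- substrings marking lines to skip (e.g. the monitor's polls)
    """
    gaps = []
    prev = None
    for line in lines:
        if any(word in line for word in ignore):
            continue
        try:
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped line; skip it
        if prev is not None and stamp - prev >= min_gap:
            gaps.append((prev, stamp))
        prev = stamp
    return gaps
```

Run over the maillog and syslog for both days, this would show whether the quiet periods start and end at the same times, which is worth having in hand before asking the Co-Lo about the port.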
Any ideas or pointers?
Charlie
_____________________________________
cobalt-users mailing list
cobalt-users@xxxxxxxxxxxxxxx
To subscribe/unsubscribe, or to SEARCH THE ARCHIVES, go to:
http://list.cobalt.com/mailman/listinfo/cobalt-users