
[cobalt-users] Recurring Server Outage/Failure



I'm trying to troubleshoot a recurring outage, and I don't like the fact that this server is far, far away (hosted by CobaltRacks), so I have no access to the console port while the outages are happening.

Both yesterday and again today, my server (a lightly loaded RaQ3 hosting about 6-10 domains) dropped off the net for 2-6 hours at a time. Both days the outage started around 4:30pm. Once the server comes back, uptime shows that it has not been rebooted, and the logs back that up. From the maillog and syslog, it appears that during an outage the only activity is from Active Monitor: I see its normal polls every 15 minutes, but nothing at all is coming in over the local ethernet.

Thinking the worst, I've rerun chkrootkit after each failure, in addition to its normally scheduled runs, and nothing has turned up. I also have PortSentry, Logcheck, and ipchains running on this box, and nothing has tripped on the security front. Without physical console access during an outage, what can I go back and check afterwards to get a clue as to what's going on? For reference, the outage affects only this one server; my other servers at CobaltRacks are running fine, so it doesn't appear to be network related, unless the problem is the specific ethernet port I'm connected to.
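One way to leave a trail to check after the next outage, sketched here under assumptions rather than offered as a known fix: record the kernel's per-interface counters once a minute, so the log shows whether packets stopped arriving or errors climbed around 4:30pm. The sketch assumes a Python interpreter is available on the box, that the interface is eth0, and that /proc/net/dev uses the usual 16-column layout; the function names and the log path /var/log/ifwatch.log are made up for illustration.

```python
import time

def parse_proc_net_dev(text):
    """Parse the contents of /proc/net/dev into {interface: counters}."""
    stats = {}
    for line in text.splitlines()[2:]:      # first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)     # the number may follow ':' with no space
        fields = data.split()
        if len(fields) < 16:                # assumption: 8 receive + 8 transmit columns
            continue
        nums = [int(f) for f in fields[:16]]
        stats[name.strip()] = {
            "rx_packets": nums[1], "rx_errs": nums[2], "rx_drop": nums[3],
            "tx_packets": nums[9], "tx_errs": nums[10], "tx_drop": nums[11],
            "tx_carrier": nums[14],
        }
    return stats

def format_snapshot(stats, iface, stamp):
    """Build one log line for iface, or note that the interface is missing."""
    if iface not in stats:
        return "%s %s MISSING" % (stamp, iface)
    pairs = " ".join("%s=%d" % (k, v) for k, v in sorted(stats[iface].items()))
    return "%s %s %s" % (stamp, iface, pairs)

def watch(iface="eth0", out="/var/log/ifwatch.log", interval=60):
    """Append a counter snapshot every `interval` seconds (hypothetical log path)."""
    while True:
        with open("/proc/net/dev") as src:
            stats = parse_proc_net_dev(src.read())
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        with open(out, "a") as log:
            log.write(format_snapshot(stats, iface, stamp) + "\n")
        time.sleep(interval)
```

Calling watch() from a background process would keep it running. If rx_packets stays flat through an outage while tx_carrier or rx_errs climbs, that would point toward the link or the switch port rather than the server itself, though that reading is a guess and not a diagnosis.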

Any ideas or pointers?

Charlie