
Re: [cobalt-users] Reoccurring Server Outage/Failure



> Trying to troubleshoot, and I don't like the fact that this server is far
> far away (hosted by CobaltRacks) and I don't have access to the console
> port during these outages.
>
> Both yesterday and again today, my server (a Raq3 lightly loaded ...
> hosting about 6-10 domains) has decided to drop off the net for 2-6 hours
> at a time. Both days this outage started around 4:30pm. Now, once the
> server comes back, uptime reports the server has not been rebooted, and the
> logs and everything seem to back that up. From looking at the maillog and
> syslog, it appears that when this happens, the only traffic I'm seeing is
> the active monitor traffic .. I'm getting the normal polls every 15
> minutes, but not a thing is coming in from the local ethernet.

I had a similar fate befall one of my RaQs when I was colo'ing it in a
single rack space without a hardware firewall. Long story short, I was being
DoS'd by SYN flood attacks. Once I rebooted, all was fine. You could see it
happen if you were monitoring netstat (I wrote a short script to do so). The
GRC.com website has a good description of SYN floods if you want to read up.
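
A monitor along those lines can be sketched as follows. This is not the
original script, just a minimal Python illustration: it assumes a Linux
netstat whose `netstat -ant` output puts the connection state in the sixth
column, and the function names, polling interval, and threshold are all
hypothetical. A SYN flood typically shows up as a large and growing number
of connections stuck in the SYN_RECV state.

```python
import subprocess
import time
from collections import Counter


def count_states(netstat_output):
    """Count TCP connections per state in `netstat -ant` style output."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows: proto, recv-q, send-q, local addr, foreign addr, state
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            counts[fields[5]] += 1
    return counts


def watch(interval=10, threshold=100):
    """Poll netstat forever and log the number of half-open connections.

    `interval` (seconds) and `threshold` are illustrative values, not tuned.
    """
    while True:
        result = subprocess.run(
            ["netstat", "-ant"], capture_output=True, text=True
        )
        half_open = count_states(result.stdout)["SYN_RECV"]
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        flag = "  <-- possible SYN flood" if half_open >= threshold else ""
        print(f"{stamp} SYN_RECV={half_open}{flag}")
        time.sleep(interval)
```

Calling `watch()` from a shell session, with its output redirected to a
file, leaves a timestamped record to compare against the outage window
after the fact.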

I basically solved the problem by throwing money at it. I put it behind a
hardware firewall, which helps to prevent these attacks by proxying the
connections and resetting them sooner than the normal timeout would. Since I
moved behind the firewall, the problems have ceased. I have been hit with some
massive floods (25+ Mbps by my calculations) since then, which I can see in
my colo port bandwidth monitoring, but my systems no longer go down - not an
invitation to try. :)

Unfortunately, the old kernel on the RaQs doesn't have some of the built-in
DoS protections that the newer ones do. Apparently they had a rash of these
problems in my data center at one point - the RaQs were the only ones
crashing.

Just my 2 cents, and a different perspective.

Rick