Re: [cobalt-users] RaQ3i Clusters
- Subject: Re: [cobalt-users] RaQ3i Clusters
- From: "Graeme Fowler" <graeme.f@xxxxxxxxxxxxxxx>
- Date: Fri Dec 8 01:37:01 2000
- Organization: WebFusion Internet Solutions
- List-id: Mailing list for users to share thoughts on Cobalt products. <cobalt-users.list.cobalt.com>
Will DeHaan wrote:
> Ed Booher Jr wrote:
> > The primary lost all port related services. Ports 80 and 23 were
> > "alive" but were not answering at all. Ports 25 and 110 were
> > dead and were generating a "connection refused" error. The
> > machine itself was still humming along
<snip>
> So, as far as StaQware was concerned, the master did not fail--the
> kernel and the StaQware daemon kept on humming along, and full network
> connectivity was maintained.
We see this with both RaQs and other Linux boxes which have been
thrashing heavily, i.e. they have run out of physical RAM and spend most
of their time swapping. In my experience this is caused by one of the
following: a genuine overload; mail loops (check your autoresponders!);
'flood'-type attacks (whether on your web server or other services); or,
most likely of all, badly written CGI scripts.
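A rough way to spot this low-memory condition on a Linux box is to read
/proc/meminfo. The sketch below assumes Python 3 and a Linux /proc
filesystem; the function names and the 5% free-RAM / 50% swap-used
thresholds are illustrative assumptions, not anything Cobalt ships. A
single snapshot can't prove a box is thrashing: sustained swap-in and
swap-out activity (the si/so columns of vmstat) is the real symptom.

```python
"""Rough memory-pressure check from /proc/meminfo (Linux only).

Sketch only: the thresholds below are arbitrary assumptions.
"""
import os
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Return the 'Name: <n> kB' lines of /proc/meminfo as {name: kB}."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name: <number> kB" lines; older kernels also print a
        # byte-count table ("total: used: free: ...") that this skips.
        if sep and len(parts) == 2 and parts[0].isdigit() and parts[1] == "kB":
            info[name.strip()] = int(parts[0])
    return info


def available_kb(info: Dict[str, int]) -> int:
    """Estimate usable RAM; MemAvailable only exists on newer kernels."""
    if "MemAvailable" in info:
        return info["MemAvailable"]
    return info.get("MemFree", 0) + info.get("Buffers", 0) + info.get("Cached", 0)


def under_memory_pressure(info: Dict[str, int],
                          min_free_fraction: float = 0.05,
                          max_swap_used_fraction: float = 0.5) -> bool:
    """Hypothetical rule of thumb: little RAM left AND heavy swap use."""
    total = info.get("MemTotal", 0)
    if total == 0:
        return False
    swap_total = info.get("SwapTotal", 0)
    swap_used = swap_total - info.get("SwapFree", 0)
    low_ram = available_kb(info) < total * min_free_fraction
    heavy_swap = swap_total > 0 and swap_used > swap_total * max_swap_used_fraction
    return low_ram and heavy_swap


if __name__ == "__main__":
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        print("available kB:", available_kb(info))
        print("under pressure:", under_memory_pressure(info))
    else:
        print("no /proc/meminfo here; this check is Linux-only")
```

Run it from cron and log the result alongside vmstat output; a box that
keeps reporting pressure is a candidate for hunting down the offending
process before it takes services down.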
It only takes one CGI script which (say) reads a big file or database
entirely into RAM and processes it there to take a server out. Web log
analysers and banner exchange schemes are my favourites... there are a
large number of them out there, and some are terribly bad code.
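To illustrate the difference, here is a hedged Python sketch of a log
analyser that streams an access log one line at a time instead of
slurping the whole file into memory. It assumes Apache-style Common Log
Format lines; the function names are hypothetical and the code is not
taken from any particular analyser. The same principle applies in Perl
or any other CGI language.

```python
from collections import Counter
from typing import Iterable


def count_paths(lines: Iterable[str]) -> Counter:
    """Count requests per path, consuming one log line at a time."""
    hits: Counter = Counter()
    for line in lines:
        quoted = line.split('"')
        if len(quoted) < 3:
            continue  # not a Common Log Format line; skip it
        request = quoted[1].split()  # e.g. ["GET", "/index.html", "HTTP/1.0"]
        if len(request) >= 2:
            hits[request[1]] += 1
    return hits


def count_paths_in_file(path: str) -> Counter:
    # Iterating over the file object yields one line at a time, so memory
    # use grows with the number of distinct paths, not the size of the log.
    # The pattern to avoid is open(path).read().splitlines(), which holds
    # the entire log in RAM at once.
    with open(path, encoding="latin-1") as f:
        return count_paths(f)


if __name__ == "__main__":
    import sys
    for log_path in sys.argv[1:]:
        for request_path, n in count_paths_in_file(log_path).most_common(10):
            print(n, request_path)
```

Memory use is then bounded by the number of distinct paths rather than
by the size of the log; a very busy site might still want to cap or
bucket the counter.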
> This I haven't heard of before. My first guess would be baud rate
> misnegotiation at your term server or the RaQ console, but that's
> probably simplistic. Anyone else seen this console condition?
Yes. Usually restarting my terminal software makes it go away, but there
have been times when that simply doesn't work: the console either spits
out gibberish or doesn't respond at all. On a box where most of the
RAM-sensitive services (e.g. inetd) have died, it's likely that the
mingetty process and/or init have also ended up with corrupt data in
memory and can't restart properly. In that case, a reboot seems the only
option...
Graeme Fowler
Systems Administrator
graeme.f@xxxxxxxxxxxxxxx