
Re: [cobalt-users] help server down (strange)



>Hi all
>[snip]


Do you have 256MB of RAM, or 128MB of RAM plus 128MB of swap?  How many sites are you hosting?

We run our RAQs with a minimum of 512MB of RAM now. I'm about to get some 512MB sticks, if I can, and increase to 768MB initially, with a plan to go to 1024MB in a month or so (Gerald - got any 512MB sticks?).


>I tried to enter the admin GUI; that did not work
>
>I can ftp, mail and telnet.
>
>So I tried to reboot
>
>By telnet, su to root and did /sbin/shutdown -r now


Install SSH from http://www.pkgmaster.com, then disable telnet as soon as possible. Telnet sends passwords in clear text, which makes it a serious security hole - get rid of it.

>
>The server did not show me whether it rebooted; I got the prompt back directly




On our RAQ4 this command works:

/sbin/shutdown -r now


You will get your prompt back, but you should also see a broadcast message like this:


Broadcast message from root (pts/2) Sun Apr 27 10:58:30 2003...

The system is going down for reboot NOW !!

Within 1 minute, your telnet session should terminate or go dead - that's the reboot dropping the connection. If it doesn't, or if you don't get the broadcast message, you have a problem; perhaps you've been hacked.

Once you reboot, open two telnet/ssh sessions and run top in one of them.
Look for processes going nuts and using lots of RAM.
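
If top scrolls by too fast to read, a one-shot listing ranked by memory can help. This is only a rough sketch: the function name biggest_by_rss is made up for illustration, and it assumes your ps accepts the -eo pid,rss,comm form (most procps versions do, but check on your box).

```shell
# Rank "PID RSS COMMAND" lines by resident memory (column 2, in KB),
# largest first, and keep the first N rows (default 10).
# biggest_by_rss is a hypothetical helper name, not a standard tool.
biggest_by_rss() {
    sort -k2,2nr | head -n "${1:-10}"
}

# Live use (assumes ps supports -eo); tail drops the header line:
# ps -eo pid,rss,comm | tail -n +2 | biggest_by_rss 10
```

Run it a few times a minute apart; a process that keeps climbing the list is the one to look at.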

My guess is that you have a rogue bot, or someone on the other end of a browser with IUS - or Idiot User Syndrome - it's common!  The user of your site isn't getting a response quickly enough, so they hit the back button and then forward, or click the link or button again. They'll often repeat this a hundred times!  I've seen it!

IUS users labor under the mistaken belief that requesting the same thing again will make it arrive quicker. Instead they exacerbate the problem, and continue to do so as long as the requests hit the same machine. (The confusion has a cause: large corporate sites using load balancers with a round-robin type algorithm *can* respond faster to a repeat request, because it moves on to the next httpd server in the list.)  A user who has learned that repeating requests gets their data quickly can crash a lesser server with requests, as they have NO CLUE what is on the server end.

Anyway, whether it's a bot or IUS, the problem occurs when your RAQ4 uses up its RAM, then begins to use swap space (disk standing in for RAM), which is MUCH slower than real RAM.  Once that starts, the problem gets worse and worse.
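
To see how far into swap the box has gone, something along these lines may do. It is a sketch, not a tested RAQ4 recipe: swap_used_pct is a name I made up, and it assumes free prints a line of the form "Swap: total used free".

```shell
# Read the output of free on stdin and print swap usage as a whole
# percentage. Prints nothing if no swap is configured (total is 0).
# swap_used_pct is a hypothetical helper name.
swap_used_pct() {
    awk '$1 == "Swap:" && $2 > 0 { printf "%d\n", $3 * 100 / $2 }'
}

# Live use:
# free | swap_used_pct
```

A figure that keeps rising while the site gets slower is a fair sign that the machine is thrashing rather than just busy.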

We've had both IUS and bots cause this on our servers. Blocking the bot/user with ipchains is the first task. After that we ran top and cut/pasted the output into a spreadsheet to pull out the process IDs; in our case, we were able to terminate the processes gracefully using kill -1.

If you can't do that, then simply shut down the httpd processes for 10-15 minutes:
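
The spreadsheet step can probably be done with awk instead. What follows is a sketch under assumptions, not what we actually ran: heavy_pids is a hypothetical helper, 192.0.2.10 is a placeholder address, the 20000 KB threshold is arbitrary, and xargs -r is a GNU extension.

```shell
# From "PID RSS COMMAND" lines on stdin, print the PIDs whose command
# name equals $1 and whose resident memory (KB) is above $2.
# heavy_pids is a hypothetical helper name.
heavy_pids() {
    awk -v name="$1" -v limit="$2" '$3 == name && $2 > limit { print $1 }'
}

# Block the offending address first (placeholder IP, ipchains syntax):
# ipchains -A input -s 192.0.2.10 -j DENY
#
# Then send signal 1 (HUP) to the oversized httpd children:
# ps -eo pid,rss,comm | heavy_pids httpd 20000 | xargs -r kill -1
```

Print the PID list and look it over before piping it into kill, so you know what you are about to signal.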

/etc/rc.d/init.d/httpd stop ; sleep 600 ; /etc/rc.d/init.d/httpd start;
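
One caution with the line above: if your telnet/ssh session drops during the sleep, the shell may be killed and httpd never comes back. A hedged variant is sketched below; pause_service is a made-up name, and whether nohup is enough on your system is an assumption you should check.

```shell
# Stop a service, wait, then start it again.
# $1 is the init script (or any command taking stop/start),
# $2 is the pause in seconds. pause_service is a hypothetical helper.
pause_service() {
    "$1" stop
    sleep "$2"
    "$1" start
}

# Detached use, so a dropped session does not leave httpd stopped:
# nohup sh -c '/etc/rc.d/init.d/httpd stop; sleep 600; /etc/rc.d/init.d/httpd start' >/dev/null 2>&1 &
```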

To get to the underlying problem, you may need to reduce the RAM each process uses, add more RAM, or both.  An interim solution is to block the bot/IP hurting your system.

hth

Greg


>
>Can anyone help me?
>
>How can I do a new reboot? And a good one?
>
>Or how can I restart the DNS server by telnet?
>
>Please some help!
>
>Maurice

-- 
http://www.webyourbusiness.com/
Providers of E-Commerce Software &
Web Design Consultancy and Services.
PH: (970) 266-0195   FAX: (970) 266-0158