
Re: [cobalt-users] [RAQ3] Runaway process control?



>We have a client whose site runs an e-commerce solution which will remain nameless - anyway, their particular store has about 10,000 product lines - this in itself is NOTHING too problematic - however, the problems occur with one of their store sections, namely Semiconductors.  This category has quite a few thousand items in it - which results in an HTML file of around 400kb in size.
>
>Again - this file is not the problem - but when someone searches for a particular product, and the result is on this HUGE file, the browsing public causes a CGI script to read and parse the offending HTML file, in order to highlight the search phrases.
>
>This process can take quite a long time - our research has shown that their viewers are impatient enough to click the back button, click the result again, and repeat the process, and repeat, repeat, repeat - before getting bored.
>
>How is this a problem?   Well... each time some impatient browser does this, the server fires up a CGI process to serve the page - each of these CGI scripts has been known to consume about 60 MB of RAM!
>
>Now - I found no fewer than 24 of these processes today - each using 60 MB of RAM - or rather - 60 MB of SWAP in most cases.  These processes invariably end up sitting around for ages trying to send out data to a port that is no longer waiting for it - this uses CPU and a HUGE amount of RAM - plus it puts the server into SWAP mode pretty darned quickly.
>


I got fed up with waiting for a solution, or even a workaround, so I wrote a script to check for spinning sh00001.pl/sh000001.cgi Perl scripts (the offending Actinic Catalog/Actinic Business e-commerce software scripts).

I'm about to launch it live on the problem server, and I thought I'd share it.  The script basically grabs the process list and greps for the offending process.  I've added the ability to limit the check to processes owned by particular users: we only had issues with one site, so I figured it would be OK to ignore the other sites on the odd occasion that their scripts ran for a while.
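As a variation on the filtering step described above, here is a minimal sketch that does the user and command matching in a single awk pass. It assumes the user name sits in column 3 of the listing, as it does in Linux "ps -elf" output; filter_procs is a hypothetical helper name and is not part of the script posted below. Matching the user against column 3 only, rather than anywhere on the line, avoids catching a watched user name that merely appears in some other user's command line.

```shell
#!/bin/sh
# Sketch only: filter "ps -elf"-style lines by owner and command pattern.
# Assumption: the user name is in column 3 (true for Linux "ps -elf").
# filter_procs is a hypothetical helper, not part of the posted script.
filter_procs() {
    # $1 = egrep-style list of users, e.g. "user1|user2"
    # $2 = pattern to look for anywhere on the line, e.g. "sh000"
    # Lines mentioning awk are dropped so the filter never reports itself,
    # in the same spirit as "grep -v grep".
    awk -v users="^($1)\$" -v pat="$2" \
        '$3 ~ users && $0 ~ pat && $0 !~ /awk/'
}

# Usage:
#   ps -elf | filter_procs "user1|user2" "sh000"
```

If ps prints numeric UIDs instead of names on your system (some versions do this for long user names), the user list would need to hold those numbers instead.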

Next the script waits for a predetermined amount of time (45 seconds in this script, but change the variable and you change the time).  This allows quick processes to terminate on their own.  After the time has elapsed, we kill -1 the processes.  Signal 1 is SIGHUP (hangup), and for these scripts it is about as graceful a request to quit as it gets, so there should be no nasties left over.  Actinic processes do respond to kill -1.
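The grace period described above could also be combined with a check that each process is still there before it is signalled. The following is a sketch under that assumption, not what the posted script does (it signals every PID it collected). signal_survivors is a hypothetical helper name. "kill -0" sends no signal at all; it only reports whether the PID exists and may be signalled by the current user.

```shell
#!/bin/sh
# Sketch only: wait out a grace period, then signal only the PIDs that
# are still alive. signal_survivors is a hypothetical helper, not part
# of the posted script.
signal_survivors() {
    # $1 = grace time in seconds, $2 = signal (e.g. -1), rest = PIDs
    grace=$1
    sig=$2
    shift 2
    sleep "$grace"
    for pid in "$@"
    do
        # kill -0 tests for existence without sending a signal
        if kill -0 "$pid" 2>/dev/null
        then
            kill "$sig" "$pid"
        fi
    done
}

# Usage:
#   signal_survivors 45 -1 1234 5678
```

This does not protect against a PID being reused by an unrelated process during the grace period; with a 45 second wait that is unlikely, but it is not impossible on a busy server.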

I've written the script so that it can be set up to run forever, or configured to run once and quit, so that it can be scheduled using cron or another task scheduler.

I've tested it on a RAQ3.  The cut command may need to be altered on other Unix variants, maybe even on RAQ4s or 550s, as it depends on the position of the process ID in the ps -elf output.
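One way around the column-position problem mentioned above is to read the position of the PID column from the ps header line rather than hard-coding a field number. This is a minimal sketch, assuming the listing starts with a header row containing the word PID (as "ps -elf" output does on Linux); pids_matching is a hypothetical helper name and is not part of the script posted below. Because awk splits on runs of whitespace, it also copes with the variable padding that a single-space cut delimiter is sensitive to.

```shell
#!/bin/sh
# Sketch only: print the PIDs of lines matching a pattern, locating the
# PID column from the header line. Assumes line 1 is a header row that
# contains the word "PID". pids_matching is a hypothetical helper.
pids_matching() {
    # $1 = pattern to look for anywhere on the line, e.g. "sh000"
    awk -v pat="$1" '
        # Header line: remember which field is labelled PID
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "PID") col = i; next }
        # Data lines: skip our own awk process, print the PID field
        col && $0 ~ pat && $0 !~ /awk/ { print $col }
    '
}

# Usage:
#   ps -elf | pids_matching sh000
```

If no PID header is found, the helper prints nothing, which is a safer failure than picking the wrong column and signalling the wrong numbers.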

Enjoy:

#!/bin/sh
#
# cleanup_shproc.sh : Greg Hewitt-Long : Nov-06-2002
# This script is written to be run by cron, or by hand by
# the webserver administration user, or by root.
#
# Purpose: spinning sh00001.pl processes consuming large
# amounts of CPU and RAM can rapidly kill the performance
# of low end web servers (slowish CPU and low amounts of
# free RAM).  This problem is particularly prevalent where
# customers have excessive amounts of products in a single
# category.  Their browsing public will tend to be impatient
# to the point of abandoning search results - leaving these
# processes spinning.

# You should set this script to run via cron either hourly,
# every quarter hour, or some other interval, ONLY if using
# the EXITAFTERRUN="Y"
#

# clients to watch contains the user ID to watch for
# - might as well not bother with customers who are
#   not causing problems.
CLIENTSTOWATCH="user1|user2"
# gracetime is the time to wait after finding the process,
# which allows it time to exit normally
GRACETIME="45"
# timebetweenchecks is only relevant if "EXITAFTERRUN" is NOT "Y"
TIMEBETWEENCHECKS="180"
# KILLSIG is the kill signal to use on these processes -1 causes
# actinic processes to exit gracefully-ish
KILLSIG="-1"
EXITAFTERRUN="Y"
export CLIENTSTOWATCH GRACETIME TIMEBETWEENCHECKS KILLSIG EXITAFTERRUN

# ---- No configuration below here...
PIDLISTFILE="/tmp/PIDLIST.CL.$$" ; export PIDLISTFILE

while :
do
# clear the list of Process IDs and start the run
   PIDLIST="" ; export PIDLIST
   rm -f $PIDLISTFILE 2>>/dev/null
   ps -elf | grep sh000 | grep -v grep  | egrep "$CLIENTSTOWATCH" | while read LINE
   do
#echo "----"
#echo $LINE
# build the list of process IDs into a single string
      PID=`echo "$LINE" | cut -d\  -f5`
#echo "PID=$PID"
#echo "----"
      PIDLIST="$PIDLIST $PID" ; export PIDLIST
#echo "PIDLIST=$PIDLIST"
     echo "$PIDLIST" > $PIDLISTFILE
   done
# grab the file generated in the previous while loop (if any), as vars are
# not persistent outside of their while loops in all shell implementations.
   PIDLIST="`cat $PIDLISTFILE 2>>/dev/null`" ; export PIDLIST
#echo "PIDLISTOUTSITE=$PIDLIST"
   if [ "X" = "X$PIDLIST" ]
   then
# no processes found - it's a wonderful day!
      echo "$0: `date '+%d/%m/%y - %H:%M'`: no processes found"
   else
# wait and see if the process finished without incident, then kill them anyway
      sleep $GRACETIME
      kill $KILLSIG $PIDLIST
   fi
# check if we're running forever, or only once
   if [ "XY" = "X$EXITAFTERRUN" ]
   then
      rm -f $PIDLISTFILE 2>>/dev/null
      exit
   else
      sleep $TIMEBETWEENCHECKS
   fi
done
-- 
http://www.webyourbusiness.com/
Providers of E-Commerce Software &
Web Design Consultancy and Services.
PH: (970) 266-0195   FAX: (970) 266-0158