
Re: [cobalt-users] Number of hits, software running and performance



bert_catsburg wrote:
> 
> People,
> 
> We are in the process of building a large website for a customer.
> We prefer to use the Cobalt boxes for this site.
> 
> The problem is that we don't know how the number of hits relate to the
> size of the machine. So, how many hits of what size can a cobalt have
> per month.

Anywhere from tens of thousands to a gazillion. And size does not matter
unless it is outrageous - size won't affect output performance unless
files get so huge that you have to consider network and interface
bandwidth.

> Are these numbers available?

No. There are hits and hits. A hit to a CGI script doing extensive
uncacheable searches in a huge database with many joins, then processing
the result into graphics, could end up costing a minute or more of CPU
time, while a series of hits to one plain text file of moderate size,
without any processing directives in the server config, would only need
a millisecond or two each.

That is, depending on the type of content, the server could cope with
anything between hundreds of hits per second and a few hits per hour.
For an average corporate server with plain HTML/GIF files, a document
tree of a few hundred documents, and most requests going to a small
group of some ten documents, our RaQ1 servers can handle over 100
requests per second. But at that load, serving server-processed
documents or huge file trees with a very even distribution of hits could
already be critical - for common CGI scripts without careful performance
optimization, something on the order of 10-30 requests/s is a more
realistic limit.
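To make that arithmetic concrete - a hypothetical back-of-envelope
sketch with illustrative numbers, not a benchmark of any actual RaQ -
the per-request CPU cost caps the sustainable request rate:

```python
# Back-of-envelope capacity estimate: if the CPU is the bottleneck,
# the sustainable request rate is bounded by 1 / (CPU time per request).
# All service times below are illustrative assumptions.

def max_requests_per_second(cpu_seconds_per_request):
    """Upper bound on request rate for a single CPU-bound server."""
    return 1.0 / cpu_seconds_per_request

static_file = max_requests_per_second(0.002)  # ~2 ms for a plain text file
cgi_script = max_requests_per_second(0.050)   # ~50 ms for a typical CGI hit
heavy_query = max_requests_per_second(60.0)   # a minute of CPU per request

print(f"static file: ~{static_file:.0f} req/s")
print(f"CGI script:  ~{cgi_script:.0f} req/s")
print(f"heavy query: ~{heavy_query * 3600:.0f} req/hour")
```

Plugging in real service times measured on your own content is the only
way to get numbers you can trust.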

Besides, average requests per month (or even per day) are not a
meaningful figure for a performance estimate - a site publishing e.g.
sports results will get almost all of its hits within a few hours on
just one or two days per week, and spend the rest of its time idle.
Generally, leisure content specific to a narrow band of time zones will
see its peaks in the afternoon, in the evening, and on weekends, while
business-relevant regional sites will see most requests during regular
business hours. In either case, the server might be overloaded to the
point where users won't get through, while the daily average still stays
far below the theoretical performance on evenly distributed requests.
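As a sketch of why averages mislead (the monthly total and peak-hour
fraction below are made-up assumptions, not measurements): the same
monthly hit count can imply wildly different peak request rates.

```python
# Peak vs. average load: a hypothetical sports-results site receiving
# 1,000,000 hits per month, with 80% of them arriving in ~26 peak
# hours per month (a few hours on one or two days per week).

hits_per_month = 1_000_000
seconds_per_month = 30 * 24 * 3600

# Naive average, assuming evenly distributed requests:
avg_rate = hits_per_month / seconds_per_month

# Peak rate, with 80% of traffic squeezed into the peak hours:
peak_fraction = 0.8
peak_seconds = 26 * 3600
peak_rate = (hits_per_month * peak_fraction) / peak_seconds

print(f"average: {avg_rate:.2f} req/s")
print(f"peak:    {peak_rate:.2f} req/s")
```

Under those assumptions the peak rate comes out more than twenty times
the monthly average, so the server has to be sized for the peak, not
the mean.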

In real high-performance serving (certainly not the domain of RaQ
servers - you'll have to spend five- to six-digit figures on suitable
hardware), things like parallel file access performance, IP stack
tuning, and memory management start to matter a great deal, and
fine-tuning the relationship between content, server, and hardware
configuration becomes real work. We spent about three man-months of
tuning to get a BIG multiprocessor server to handle an average of
500-2000 hits per second sustained for six hours per day.

Sevo

-- 
Sevo Stille
sevo@xxxxxxxx