
RE: [cobalt-users] Robots.txt



-----Original Message-----
From: cobalt-users-admin@xxxxxxxxxxxxxxx
[mailto:cobalt-users-admin@xxxxxxxxxxxxxxx]On Behalf Of Jim Popovitch
Sent: 21 June 2001 03:34
To: cobalt-users@xxxxxxxxxxxxxxx
Subject: RE: [cobalt-users] Robots.txt


>That is EXACTLY the desired effect. How could you possibly see that as
>NOT good advice?

>-Jim P.


Because advice like this is taken into the archives and could be
pulled up by anyone searching for solutions regarding "robots.txt".
I don't disagree with your idea per se; what I disagree with is the
lack of additional information, which could affect other users who
might mistakenly take your suggestion to be a proper way of
controlling the spidering of robots. Any advice offered needs
to be qualified so that it can't be misinterpreted by anyone
subsequently searching the archives.

The archives are littered with advice which, although perfect for
one given scenario, is completely incorrect for another. A large
proportion of these pieces of advice clearly state limitations and
potential problems, and so can be taken and analyzed for suitability
for the searcher's individual problem. However, where blanket statements
exist with no limitations and/or consequences expressed, the
possibility of additional problems occurring is introduced for anyone
who follows that piece of advice. The number of "newbies" (and I'm
in this group) who are told to search the archives before posting,
and who expect to find complete advice there, is huge. It is not hard
to imagine one of these looking up robots.txt to see how to use it,
finding your comment about meta refresh and using it without
being informed of all the possible consequences, and subsequently
finding that their site has been banned by various search engines.

Yes, I agree that if you wished to permanently ban a domain from some
(not all) search engines, the use of a meta refresh tag would have 
exactly the desired effect, BUT it is the simple fact that the entire 
domain gets banned, not just the individual page into which the refresh 
statement is placed, which makes this method bad advice in general.

The original posting was ambiguous in its question:
"How can i, for instance forbid a robot to search my pages?"

Did this mean all pages within a site, or certain pages within a site 
(for example registered user areas)?

I don't know of many sites that would not wish to be listed at all;
the vast majority, if they wished to restrict spidering, would
actually only want to restrict certain parts of the site.
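
As a rough illustration of restricting only certain parts of a site,
the sketch below feeds a small robots.txt to Python's standard
urllib.robotparser module and checks which paths a well-behaved robot
would be allowed to fetch. The directory names, the "ExampleBot" agent
name and the example.com host are assumptions made up for the example;
they do not come from the original question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep every robot out of two example
# directories and leave the rest of the site open to spidering.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
Disallow: /cgi-bin/
"""


def allowed(path, agent="ExampleBot"):
    """Return True if the rules above let `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, "http://www.example.com" + path)


if __name__ == "__main__":
    for path in ("/index.html", "/members/profile.html", "/cgi-bin/form.pl"):
        print(path, "allowed" if allowed(path) else "blocked")
```

Note that robots.txt is only a request: it restricts robots that
choose to honour it, and does nothing to stop one that ignores it.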

If a domain is never submitted, and never referenced by an external 
site, it shouldn't get picked up by the spiders anyway.

The meta refresh command *can* be used without risk, provided that
the delay is long enough for the user to intervene, and that the page
text carries a warning that the user is about to be redirected.
AFAIK the delay should be set to no less than 5 seconds, and
preferably to around 10, to allow sufficient time for the user
to reject the redirect.
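
A minimal sketch of such a page, generated from Python so the delay
can be checked before the page is written out. The 5-second floor and
10-second default are simply the figures quoted above (and carry the
same "AFAIK" caveat); the function name, page wording and target URL
are hypothetical.

```python
from html import escape

MIN_DELAY_SECONDS = 5  # lower bound quoted in the post, not a documented rule


def refresh_page(target_url, delay=10):
    """Build a small HTML page that warns the visitor, then redirects."""
    if delay < MIN_DELAY_SECONDS:
        raise ValueError("delay should be at least %d seconds" % MIN_DELAY_SECONDS)
    url = escape(target_url, quote=True)
    return (
        "<html>\n<head>\n"
        '<meta http-equiv="refresh" content="%d; url=%s">\n'
        "<title>Page moved</title>\n</head>\n<body>\n"
        '<p>You will be redirected to <a href="%s">%s</a> in %d seconds.</p>\n'
        "</body>\n</html>\n"
    ) % (delay, url, url, url, delay)


if __name__ == "__main__":
    print(refresh_page("http://www.example.com/new.html"))
```

The visible link gives the user a way to follow or ignore the redirect
by hand, which is the "warning" part of the advice.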

Equally, there is at least one major engine which does recognise 
and follow refresh commands through to the redirect page.


Regards

Si Watts
SiWIS
http://www.siwis.co.uk

Don't just think global, THINK LOCAL!