RE: [cobalt-users] Robots.txt
- Subject: RE: [cobalt-users] Robots.txt
- From: "Simon Watts" <simon@xxxxxxxxxxx>
- Date: Thu Jun 21 01:21:09 2001
- List-id: Mailing list for users to share thoughts on Cobalt products. <cobalt-users.list.cobalt.com>
-----Original Message-----
From: cobalt-users-admin@xxxxxxxxxxxxxxx
[mailto:cobalt-users-admin@xxxxxxxxxxxxxxx]On Behalf Of Jim Popovitch
Sent: 21 June 2001 13:22
To: cobalt-users@xxxxxxxxxxxxxxx
Subject: RE: [cobalt-users] Robots.txt
--- Simon Watts <simon@xxxxxxxxxxx> wrote:
[snip]
>> Equally, there is at least one major engine which does recognise 
>> and follow refresh commands through to the redirect page.
>How ironic!  Simon, you write a whole page on Internet style and
>perfection (as it pertains to Search Engine Results) and then you end
>it with a quip that has no detail whatsoever.  Which search engine? :)
Jim,
As you are so interested: the meta refresh command is detected by
AltaVista, and would most likely trigger a human inspection and
subsequent review of any site using it.
Googlebot is also strongly rumoured to be able to detect and follow
meta refresh commands, although it is not yet 100% clear whether it
can detect JavaScript redirects that use the location.replace
function. This JavaScript equivalent of the meta refresh would almost
certainly be ignored if it were placed in an external .js file, which
is where I would recommend putting any other JavaScript as well if you
are concerned about SE positioning.
Just as an aside, this is the robots.txt file from Google itself!
User-agent: *
Disallow: /search
Disallow: /groups
Disallow: /keyword/
Disallow: /u/
Disallow: /cobrand
Disallow: /custom
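
As a rough sketch of how a robot might apply those rules: each Disallow value is treated as a prefix of the URL path, so anything beginning with that prefix is off limits. The helper name isDisallowed below is hypothetical, and the sketch assumes plain prefix matching only, which is a simplification of what a real crawler does.

```javascript
// Disallow prefixes copied from the robots.txt listing above.
var disallowRules = ["/search", "/groups", "/keyword/", "/u/", "/cobrand", "/custom"];

// Hypothetical helper: true when the URL path starts with any Disallow prefix.
function isDisallowed(path, rules) {
  return rules.some(function (prefix) {
    return path.indexOf(prefix) === 0;
  });
}

console.log(isDisallowed("/search?q=cobalt", disallowRules)); // true
console.log(isDisallowed("/keyword", disallowRules));         // false (rule has a trailing slash)
console.log(isDisallowed("/about.html", disallowRules));      // false
```

Note that a rule without a trailing slash, such as /custom, would also match a longer path like /customers under this reading.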
Back to Carrie's problem with cached pages in Google:
If you do not want Google to cache your page, insert the
following into the <head></head> section:
<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE"> 
This is the official "Google" method of preventing your page
being cached. It would however be ineffective for any other engine 
using page caching.
Alternatively, you could use JavaScript as follows:
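<script language="javascript">
<!-- hide
// If the address contains "cache", this copy is being viewed from a
// search engine cache, so send the visitor to the live page instead.
if (location.href.indexOf("cache") != -1)
{
window.location.href="http://www.domain.com/path/to/page.htm";
}
// end hide --></script>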
This script looks for the word "cache" in the browser address bar
and, if it is found, redirects the viewer to the non-cached version.
This would most likely work for any caching engine (if there are
others) and force viewers to access the page through your htaccess file.
If any other engine does cache pages, but without putting the word
"cache" into the URL of the cached page, you would simply need to
repeat the script using the correct word for said engine.
Having said that, I've yet to come across any other major engine
caching pages.
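
Rather than repeating the script once per engine, the same check could be run over a list of words. This is only a sketch: the isCachedCopy helper and the cacheMarkers list are hypothetical names, "cache" is the only marker taken from the script above, and the redirect target is the same placeholder URL.

```javascript
// Hypothetical marker list: only "cache" comes from the script above.
// Add another engine's word here if one ever turns up.
var cacheMarkers = ["cache"];

// Hypothetical helper: true when the address contains any of the markers.
function isCachedCopy(href, markers) {
  for (var i = 0; i < markers.length; i++) {
    if (href.indexOf(markers[i]) != -1) {
      return true;
    }
  }
  return false;
}

// Only redirect when running in a browser and viewing a cached copy.
if (typeof window !== "undefined" && isCachedCopy(window.location.href, cacheMarkers)) {
  window.location.href = "http://www.domain.com/path/to/page.htm";
}
```

As with the single-word version, the live page's own address must not contain any of the markers, or the page would keep redirecting to itself.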
Regards
Si Watts
SiWIS
http://www.siwis.co.uk
Don't just think global, THINK LOCAL!