RE: [cobalt-users] Robots.txt



> > The only practical answer is to passwd-protect (htaccess) the site(s)
> > or page(s) in question. The spider may find 'em, they may appear on
> > search engines, but accessing them will be prevented by the existence
> > of the password requirement.
>
> Just one note to this - it doesn't work all of the time, either.
> Googlebot (Google's friendly little crawler) will go through
> password-protected pages regardless, and cache them. So if a user
> can't get into the page directly, they can access Google's "view
> cached page" option and see it anyway.
> I'm not sure which other search engines are caching pages now, but
> I do know that Google provides a form somewhere on their site where
> you can tell Googlebot not to cache your pages.

Ah yes, now that you mention it... I *have* seen that reference. I never
made the connection, until you pointed it out, that passwd-protected pages
would be retrieved, cached and therefore viewable. Hmm... I'm sure it won't
be long before others emulate this, if they are not already doing so.
I'm forced to wonder, then, what options remain for those who do not want
content on a public server to be viewed... heh, methinks there are few.
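
For what it's worth, robots.txt itself is purely advisory: a well-behaved
crawler reads it and decides for itself whether to fetch a URL, and nothing
on the server enforces the answer. A minimal sketch of that check, using
Python's standard urllib.robotparser (the rules and the example.com URLs
below are made up for illustration, not taken from any real site):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: ask every crawler to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/index.html",
                "http://example.com/private/page.html"):
        print(url, may_fetch("Googlebot", url))
```

A crawler that never runs a check like this will fetch /private/ anyway,
which is why robots.txt is no substitute for real access control.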
Regards,
-Colin