I have a staging server on the public internet running copies of the production code for a couple of websites. I'd really rather not have the staging sites get indexed.

Is there a way I can modify my httpd.conf on the staging server to block search engine spiders?

Changing robots.txt wouldn't really work, since I use scripts to copy the same code base to both servers. Also, I'd rather not alter the virtual host conf files either, as there are a number of sites and I don't want to have to remember to copy over a certain setting whenever I create a new site.

Could you alias robots.txt on the staging virtualhosts to a restrictive robots.txt hosted in a different location?

To truly stop pages from being indexed, you'll need to hide the sites behind HTTP auth. You can do this in your global Apache config and use a simple .htpasswd file.
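A minimal sketch of what that global config might look like (the realm name and the path to the password file are illustrative, not something from the original answer):

```apache
# Require a login for everything served by this Apache instance
<Location "/">
    AuthType Basic
    AuthName "Staging server"
    AuthUserFile /etc/apache2/.htpasswd
    Require valid-user
</Location>
```

The password file itself can be created with the `htpasswd` utility that ships with Apache, e.g. `htpasswd -c /etc/apache2/.htpasswd someuser` (the `-c` flag creates the file; omit it when adding further users).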

The only downside to this is that you have to type in a username/password the first time you browse to pages on the staging server.

Create a robots.txt file with the following contents:

User-agent: *

Disallow: /

Put that file somewhere on your staging server; your document root is a good place for it (e.g. /var/www/html/robots.txt).

Add the following to your httpd.conf file:

# Exclude all robots

<Location "/robots.txt">
    SetHandler None
</Location>

Alias /robots.txt /path/to/robots.txt

The SetHandler directive probably isn't required, but it might be needed if you're using a handler like mod_python, for instance.

That robots.txt file will now be served for all virtual hosts on your server, overriding any robots.txt file you might have for individual hosts.

(Note: My answer is essentially the same thing that ceejayoz's answer suggests you do, but I had to spend a few extra minutes figuring out all the specifics to get it working. I decided to put this answer here for the sake of others who might stumble upon this question.)

Try Using Apache to stop bad robots. You can find the user agents online, or just allow known browsers instead of trying to block all bots.
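As a rough sketch of that approach (the user-agent strings below are just examples; a real list would come from the article or from your access logs):

```apache
# Tag requests from known crawlers, then deny anything tagged.
SetEnvIfNoCase User-Agent "googlebot" block_bot
SetEnvIfNoCase User-Agent "bingbot"   block_bot

<Location "/">
    <RequireAll>
        Require all granted
        Require not env block_bot
    </RequireAll>
</Location>
```

Keep in mind this only deters crawlers that identify themselves honestly; it's no substitute for HTTP auth if you genuinely need the staging sites kept out of indexes.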

Depending on your deployment scenario, you should look for ways to deploy different robots.txt files to dev/stage/test/prod (or whatever combination you have). Assuming you have different database config files (or whatever's similar) on the different servers, this should follow a similar process (you do have different passwords for your databases, right?).
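A minimal sketch of that idea as a deploy step (the `APP_ENV` variable and the per-environment file names are assumptions, not part of any particular deploy tool):

```shell
#!/bin/sh
# Pick the robots.txt that matches the target environment.
# robots.production.txt would allow crawling; robots.staging.txt disallows everything.
APP_ENV="${APP_ENV:-staging}"

if [ -f "robots.${APP_ENV}.txt" ]; then
    cp "robots.${APP_ENV}.txt" robots.txt
else
    # Fail closed: if no per-environment file exists, block all crawlers.
    printf 'User-agent: *\nDisallow: /\n' > robots.txt
fi
```

The fail-closed default means a misconfigured environment ends up blocked rather than accidentally indexed.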

If you don't have a one-step deployment process in place, this is probably good motivation to get one... there are lots of tools out there for different environments - Capistrano is a pretty good one, well-liked in the Rails/Django world, but by no means the only one.

Failing all that, you could probably set up a global Alias directive in your Apache config that would apply to all virtualhosts and point to a restrictive robots.txt.