I'm using WordPress with custom permalinks, and I want to disallow my posts but leave my category pages available to bots. Here are a couple of examples of what the URLs look like:

Category page: somesite.com/2010/category-title/

Post: somesite.com/2010/category-title/product-title/

So, I'm wondering if there's some kind of regex solution that leaves the page at /category-title/ allowed while disallowing anything one level deeper (the second example).

Any ideas? Thanks! :)

There's no official standards body or RFC for the robots.txt protocol. It was created by consensus in June 1994 by members of the robots mailing list (robots-request@nexor.co.uk). The information specifying the parts that should not be accessed is given in a file called robots.txt in the top-level directory of the website. The robots.txt patterns are matched by simple substring comparisons, so care should be taken to make sure that patterns matching directories have the final '/' character appended; otherwise all files with names starting with that substring will match, rather than just those in the intended directory.
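To see what that substring (really prefix) matching means in practice, here is a minimal sketch using Python's standard-library `urllib.robotparser`, which compares paths the same way. The path `/2010/category-title-two/` is a hypothetical sibling URL invented for the illustration.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt, path):
    """Return True if a generic crawler ("*") may fetch `path` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", path)


# Without the trailing '/', the rule is a plain prefix and also catches
# any other path that merely starts with the same characters.
no_slash = "User-agent: *\nDisallow: /2010/category-title\n"
with_slash = "User-agent: *\nDisallow: /2010/category-title/\n"

# Hypothetical sibling URL that happens to share the prefix.
sibling = "/2010/category-title-two/"

print(can_fetch(no_slash, sibling))    # False: blocked by accident
print(can_fetch(with_slash, sibling))  # True: only the intended directory is blocked
```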

There's no 100% sure way to keep your pages from being found, other than not publishing them at all, of course.

See: http://www.robotstxt.org/robotstxt.html

There's no Allow field in the consensus, and regex matching isn't part of it either.

From the robots consensus:

This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "stuff", and leave the one file in the level above this directory:

User-agent: *
Disallow: /~joe/stuff/

Alternatively, you can explicitly disallow all disallowed pages:

User-agent: *
Disallow: /~joe/junk.html
Disallow: /~joe/foo.html
Disallow: /~joe/bar.html
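Applied to the WordPress case, the explicit-list approach means one Disallow line per post. A minimal Python sketch of generating such a file, assuming you already have the list of post paths (the paths and the `build_robots_txt` helper below are hypothetical), then checking the result with the standard-library parser:

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(post_paths):
    """Build a robots.txt that disallows each listed post path individually."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in post_paths]
    return "\n".join(lines) + "\n"


# Hypothetical post URLs; in practice these would come from your site.
posts = [
    "/2010/category-title/product-title/",
    "/2010/category-title/another-product/",
]
robots_txt = build_robots_txt(posts)
print(robots_txt)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("*", "/2010/category-title/"))                # True: category page stays open
print(parser.can_fetch("*", "/2010/category-title/product-title/"))  # False: post is blocked
```

The drawback is maintenance: every new post needs its own line, so the file has to be regenerated whenever you publish.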

A possible solution:

Use .htaccess to block search robots from the specific folder, the same way you would block bad robots.

See: http://www.askapache.com/htaccess/setenvif.html

Would the following do the trick?

User-agent: *
Disallow: /2010/category-name/*/
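Wildcards are not part of the original consensus, but major crawlers such as Googlebot and Bingbot support `*` (any sequence of characters) and `$` (end of URL) as an extension. The sketch below is a simplified Python model of that matching, written only to show which URLs the proposed rule would catch; the helper names are hypothetical and this is not any crawler's actual implementation.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern with '*' and '$' into a regex."""
    anchored = pattern.endswith("$")  # '$' pins the match to the end of the URL
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then let each '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(pattern, path):
    # re.match anchors at the start, mirroring robots.txt prefix matching.
    return robots_pattern_to_regex(pattern).match(path) is not None


rule = "/2010/category-name/*/"  # the Disallow value proposed above
print(is_blocked(rule, "/2010/category-name/"))               # False
print(is_blocked(rule, "/2010/category-name/product-name/"))  # True
print(is_blocked(rule, "/2010/category-name/product-name"))   # False
```

So, for crawlers that honor wildcards, the rule leaves the category page open and blocks post URLs that end in a slash, like the permalinks in the question. A post URL without the trailing slash would slip through, and any other deeper path under the category (for example a hypothetical /2010/category-name/page/2/) would be blocked too.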

You would need to explicitly allow specific folders under /2010/category-name:

User-agent: *
Disallow: /2010/category-name/
Allow: /2010/category-name/product-name-1/
Allow: /2010/category-name/product-name-2/

But according to this article, the Allow field isn't part of the standard, so some spiders may not support it.
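Even crawlers that do understand Allow can resolve conflicts differently. A minimal Python sketch comparing two strategies on the rules above: the standard library's `urllib.robotparser`, which applies the first matching line in file order, and a simplified longest-match rule of the kind Google documents. The `longest_match_allows` helper is hypothetical, written for this illustration, and ignores wildcards.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /2010/category-name/
Allow: /2010/category-name/product-name-1/
Allow: /2010/category-name/product-name-2/
"""


def first_match_allows(path):
    """Standard-library check: the first matching rule line in file order wins."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch("*", path)


def longest_match_allows(path):
    """Hypothetical helper: the longest matching path wins; Allow wins ties.

    Simplified: plain prefix matching only, no wildcards, one user-agent group.
    """
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for line in RULES.splitlines():
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field not in ("allow", "disallow") or not value:
            continue
        if path.startswith(value):
            is_allow = field == "allow"
            if len(value) > best_len or (len(value) == best_len and is_allow):
                best_len, allowed = len(value), is_allow
    return allowed


for path in ("/2010/category-name/", "/2010/category-name/product-name-1/"):
    print(path, first_match_allows(path), longest_match_allows(path))
```

With first-match evaluation the Allow lines are never reached, because the broader Disallow line matches first; placing the Allow lines before the Disallow line gives the same result under both strategies. Under either strategy, these rules also keep /2010/category-name/ itself disallowed.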

EDIT: I just found another option you can use within each page. This page describes it:

The basic idea is that if you include a tag like:

<meta name="robots" content="noindex">

in your HTML document, that document won't be indexed.

If you do:

<meta name="robots" content="nofollow">

the links in that document won't be parsed by the robot.
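A minimal Python sketch of how a robot might read these tags, using the standard-library `html.parser`. The `RobotsMetaReader` class and `robots_directives` function are hypothetical and only collect the comma-separated directives from robots meta tags. For the question's setup, you would emit noindex on individual post pages only, leaving category pages without it.

```python
from html.parser import HTMLParser


class RobotsMetaReader(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    reader = RobotsMetaReader()
    reader.feed(html)
    reader.close()
    return reader.directives


page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head></html>'
directives = robots_directives(page)
print("noindex" in directives, "nofollow" in directives)  # True True
```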