I built a website using WordPress, and on the very first day it was filled with dummy content until I uploaded my own. Google indexed pages such as:

    www.url.com/?cat=1

Now these pages no longer exist, and to make a removal request Google asks me to block them in robots.txt.

Should I use:
    User-Agent: *
    Disallow: /?cat=

or

    User-Agent: *
    Disallow: /?cat=*
My robots.txt file would look something like this:
    User-agent: *
    Disallow: /cgi-bin
    Disallow: /wp-admin
    Disallow: /wp-includes
    Disallow: /wp-content
    Disallow: /wp-login.php
    Disallow: /wp-register.php
    Disallow: /author
    Disallow: /?cat=

    Sitemap: http://url.com/sitemap.xml.gz
Does this look fine, or could it cause any problems with search engines? Should I use Allow: / together with all of the Disallow: lines?
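For what it's worth, you can sanity-check which URLs a given robots.txt blocks with Python's built-in `urllib.robotparser`. One caveat: it does plain prefix matching and does not understand the `*` wildcard that Google supports, so only the prefix-style rules are tested here (the hostname below is just a placeholder):

```python
from urllib.robotparser import RobotFileParser

# A trimmed-down version of the proposed robots.txt, as a list of lines.
rules = """\
User-agent: *
Disallow: /wp-admin
Disallow: /?cat=
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Prefix matching: anything starting with "/?cat=" or "/wp-admin" is blocked.
print(rp.can_fetch("*", "http://url.com/?cat=1"))      # False (blocked)
print(rp.can_fetch("*", "http://url.com/wp-admin/x"))  # False (blocked)
print(rp.can_fetch("*", "http://url.com/about/"))      # True  (allowed)
```

This confirms that the plain `Disallow: /?cat=` form already covers URLs like `/?cat=1`, with no trailing `*` needed.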
Generally, you should not use robots.txt directives to handle removed content. If a search engine can't crawl a URL, it can't tell whether the content has been removed, and it may continue to index (or even start indexing) those URLs. The best solution is to make sure that your site returns a 404 (or 410) HTTP result code for those URLs; then they'll be dropped automatically over time.
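To illustrate the 404/410 approach, here is a minimal sketch as a Python WSGI app (on a real WordPress site you'd do this in PHP or the server config; the `cat=` query check is just a hypothetical stand-in for "a removed page"):

```python
# Minimal WSGI sketch: answer retired "?cat=" URLs with "410 Gone" so
# crawlers learn the pages were removed on purpose, instead of being
# blocked from ever finding out via robots.txt.
def app(environ, start_response):
    query = environ.get("QUERY_STRING", "")
    if query.startswith("cat="):
        start_response("410 Gone", [("Content-Type", "text/plain")])
        return [b"This category page has been permanently removed."]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello from the live site."]
```

The point is only the status code: a crawler that receives 410 (or 404) for these URLs will drop them from the index on its own.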
If you want to use Google's urgent URL removal tools, you'd have to submit these URLs individually anyway, so you wouldn't gain anything by using a robots.txt disallow.
Personally, I'd go with the following.
To block access to all URLs that include a question mark (?) (more specifically, any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string):
    User-agent: Googlebot
    Disallow: /*?
So I would actually go with:
    User-agent: Googlebot
    Disallow: /*?cat=
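As a rough check of what that wildcard rule would match, Google-style patterns (where `*` matches any run of characters) can be approximated by translating the pattern into a regular expression. This is only a sketch of the matching semantics, not Google's actual implementation, and it ignores the `$` end anchor:

```python
import re

def blocks(pattern: str, path: str) -> bool:
    """Approximate Google-style robots.txt matching: '*' matches any
    characters, and the rule applies if the pattern matches as a
    prefix of the URL path (sketch only; '$' anchors not handled)."""
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.match(regex, path) is not None

# "Disallow: /*?cat=" catches the old category URLs wherever the
# query appears, but leaves normal pages alone.
print(blocks("/*?cat=", "/?cat=1"))        # True  (blocked)
print(blocks("/*?cat=", "/blog/?cat=22"))  # True  (blocked)
print(blocks("/*?cat=", "/about/"))        # False (allowed)
```

Note that `/*?cat=` and `/?cat=` end up blocking the same WordPress URLs in practice here, since the category query normally appears at the site root; the wildcard form just also covers `?cat=` appearing under deeper paths.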