I have control over the HttpServer but not over the ApplicationServer or the Java applications sitting behind it, but I need to block direct access to certain pages on those applications. Precisely, I don't want users automating access to forms by issuing direct GET/POST HTTP requests to the appropriate servlet.

So, I decided to block users based on the value of HTTP_REFERER. After all, if the user is navigating inside the site, it will have an appropriate HTTP_REFERER. Well, that was what I thought.

I implemented a rewrite rule in the .htaccess file that says:

RewriteEngine on 

# Options +FollowSymlinks
RewriteCond %{HTTP_REFERER} !^http://mywebaddress(.cl)?/.* [NC]
RewriteRule (servlet1|servlet2)/.+\?.+ - [F]

I expected to forbid access to users that did not navigate the site but issued direct GET requests to the "servlet1" or "servlet2" servlets using query strings. But my expectations ended abruptly because the regular expression "(servlet1|servlet2)/.+\?.+" did not work at all.

I got really disappointed when I changed that expression to "(servlet1|servlet2)/.+" and it worked so well that my users were blocked no matter whether they navigated the site or not.

So, my question is: how can I accomplish this thing of not allowing "robots" direct access to certain pages if I have no access/privileges/time to modify the application?

I'm not sure if I can solve this in one shot, but we can go back and forth as necessary.

First, I want to repeat back what I think you are saying and make sure I'm clear. You want to disallow requests to servlet1 and servlet2 if the request doesn't have the proper referer and it does have a query string? I'm not sure I understand "(servlet1|servlet2)/.+\?.+" because it looks like you are requiring a file under servlet1 and 2. I think maybe you are combining PATH_INFO (before the "?") with a GET query string (after the "?"). It appears that the PATH_INFO part will work, but the GET query test will not, because RewriteRule's pattern only matches against the URL path, never the query string. I made a quick test on my server using script1.cgi and script2.cgi, and the following rules worked to accomplish what you are asking for. They are obviously edited a little to match my environment:

RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.(com|org) [NC]
RewriteCond %{QUERY_STRING} ^.+$
RewriteRule ^(script1|script2)\.cgi - [F]
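To make the PATH_INFO vs. query-string split concrete, here is how a hypothetical request URL breaks apart (the URL and its parameters are made up for illustration). RewriteRule only ever sees the path part; only `%{QUERY_STRING}` can test what comes after the "?":

```python
from urllib.parse import urlparse

# Hypothetical request URL against the servlet; everything after "?" is the
# query string, which RewriteRule's pattern never sees.
url = "http://mywebaddress/servlet1/extra/path?user=bob&action=save"
parts = urlparse(url)

print(parts.path)   # the part RewriteRule matches: /servlet1/extra/path
print(parts.query)  # the part only %{QUERY_STRING} can test: user=bob&action=save
```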

The above caught wrong-referer requests to script1.cgi and script2.cgi that tried to submit data using a query string. However, you can also submit data using PATH_INFO or by POSTing it. I used this form to protect against all three techniques being used with an incorrect referer:

RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.(com|org) [NC]
RewriteCond %{QUERY_STRING} ^.+$ [OR]
RewriteCond %{REQUEST_METHOD} ^POST$ [OR]
RewriteCond %{PATH_INFO} ^.+$
RewriteRule ^(script1|script2)\.cgi - [F]
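For anyone who wants to sanity-check the logic, here is a rough Python model of what those conditions test together. This is only an illustration of the rule logic, not an Apache API; the function and its parameter names are made up:

```python
import re

# Rough model of the rewrite logic above: forbid a request to script1/script2
# when the referer is wrong AND the request carries data via a query string,
# a POST body, or PATH_INFO.
def is_forbidden(path, referer, query, method, path_info):
    good_referer = re.match(r"^http://(www\.)?example\.(com|org)", referer or "", re.I)
    targets_script = re.search(r"(script1|script2)\.cgi", path)
    carries_data = bool(query) or method == "POST" or bool(path_info)
    return bool(targets_script and not good_referer and carries_data)

print(is_forbidden("/script1.cgi", None, "a=1", "GET", ""))                           # True
print(is_forbidden("/script1.cgi", "http://www.example.com/form", "a=1", "GET", ""))  # False
print(is_forbidden("/script2.cgi", None, "", "GET", ""))                              # False
```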

Based on the example you were trying to get working, I think this is what you want:

RewriteCond %{HTTP_REFERER} !^http://mywebaddress(\.cl)?/.* [NC]
RewriteCond %{QUERY_STRING} ^.+$ [OR]
RewriteCond %{REQUEST_METHOD} ^POST$ [OR]
RewriteCond %{PATH_INFO} ^.+$
RewriteRule (servlet1|servlet2)\b - [F]

Hopefully this at least gets you closer to your goal. Please let us know how it works out; I'm interested in your problem.

(BTW, I agree that referer blocking is poor security, but I also understand that reality forces imperfect and partial solutions sometimes, which you seem to acknowledge already.)

I don't have a solution, but I'm betting that relying on the referrer will never work, because user-agents are free to not send it at all, or to spoof it to something that will let them in.
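For example, a trivial script can claim any referer it wants. Here is a minimal Python sketch of a hypothetical scraper setting the header (no request is actually sent, and the URLs are made up):

```python
from urllib.request import Request

# A client controls every header it sends; a scraper simply sets whatever
# Referer it likes before issuing the request.
req = Request(
    "http://mywebaddress/servlet1/data",
    headers={"Referer": "http://mywebaddress/some-page"},
)
print(req.get_header("Referer"))
```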

You can't tell users and malicious scripts apart by their HTTP request. But you can analyze which users are requesting too many pages in too short a time, and block their IP addresses.
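A minimal sketch of that idea, assuming a simple in-memory sliding window keyed by IP (the class name and thresholds are invented for illustration):

```python
import time
from collections import defaultdict, deque

# Sliding-window rate limiter sketch: deny an IP that makes more than
# `limit` requests within `window` seconds.
class RateLimiter:
    def __init__(self, limit=10, window=60.0):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)

    def allow(self, ip, now=None):
        now = time.time() if now is None else now
        q = self.hits[ip]
        while q and now - q[0] > self.window:  # drop requests outside the window
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True

rl = RateLimiter(limit=3, window=60.0)
print([rl.allow("10.0.0.1", now=t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
```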

JavaScript is another helpful tool to prevent (or at least delay) screen scraping. Most automated scraping tools don't have a JavaScript interpreter, so you can do things like setting hidden fields, etc.
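One sketch of that trick: the page's JavaScript fills in a hidden field, and the server rejects submissions without it. The field name and token value below are invented for illustration, and the server-side check is shown in Python only as a stand-in for whatever the real backend would do:

```python
# Snippet the page would embed; clients without a JavaScript interpreter
# never execute it, so the hidden field stays empty.
JS_SNIPPET = '<script>document.forms[0]["js_token"].value = "human";</script>'

def accept_submission(form_fields):
    # Reject any submission where the JavaScript-populated field is missing.
    return form_fields.get("js_token") == "human"

print(accept_submission({"name": "bob", "js_token": "human"}))  # True
print(accept_submission({"name": "bot"}))                       # False
```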

Edit: Something like this Phil Haack article.