I observed an unusual phenomenon while using the Apache HttpClient library, and I would like to know why it happens. I wrote some sample code to demonstrate it. Consider the following code:

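 import java.io.BufferedReader;
 import java.io.IOException;
 import java.io.InputStreamReader;
 import org.apache.commons.httpclient.*;
 import org.apache.commons.httpclient.methods.GetMethod;
 import org.apache.commons.httpclient.params.HttpMethodParams;

 public class ResponseBodyDemo {
   // Hypothetical value: the original FIREFOX user-agent constant was not shown
   static final String FIREFOX = "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0";

   public static void main(String[] args) {
     //Example URL
     String url = "http://rads.stackoverflow.com/amzn/click/05961580";
     GetMethod get = new GetMethod(url);
     // Retry at most once, and never retry a request that has already been sent
     HttpMethodRetryHandler httpHandler = new DefaultHttpMethodRetryHandler(1, false);
     get.getParams().setParameter(HttpMethodParams.RETRY_HANDLER, httpHandler);
     HttpConnectionManager connectionManager = new SimpleHttpConnectionManager();
     HttpClient client = new HttpClient(connectionManager);
     client.getParams().setParameter("http.useragent", FIREFOX);
     String line;
     StringBuilder stringBuilder = new StringBuilder();
     String toStreamBody = null;
     String toStringBody = null;
     try {
       int statusCode = client.executeMethod(get);
       if (statusCode != HttpStatus.SC_OK) {
         System.err.println("Internet Status: " + HttpStatus.getStatusText(statusCode));
         System.err.println("While getting page: " + url);
       }
       // First read: the whole body as a String
       toStringBody = get.getResponseBodyAsString();
       // Second read: the body as a stream, line by line
       InputStreamReader isr = new InputStreamReader(get.getResponseBodyAsStream());
       BufferedReader rd = new BufferedReader(isr);
       while ((line = rd.readLine()) != null) {
         stringBuilder.append(line).append('\n');
       }
     } catch (IOException ex) {
       System.out.println("Failed to get page: " + url);
     } finally {
       toStreamBody = stringBuilder.toString();
       get.releaseConnection();
     }
   }
 }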

This code prints nothing:

 System.out.println(toStringBody); // ""

This code prints the site:

 System.out.println(toStreamBody); // "Whole Page"

However, it gets even stranger. Replace:




Now we get the error: Failed to get page: http://www.amazon.com/gp/offer-listing/0596158068/ref=dp_olp_used?ie=UTF8

I haven't been able to find any site other than amazon.com that reproduces this behavior, but I assume there are others.

I know that the documentation at http://hc.apache.org/httpclient-3.x/performance.html discourages using getResponseBodyAsString(), but it doesn't say that the page won't load, only that you might be vulnerable to an out-of-memory exception. Is it possible that getResponseBodyAsString() is returning the page before it has finished loading? And why does this only happen with amazon.com?
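As I understand the performance page, its concern is that getResponseBodyAsString() buffers the entire body in memory, and it recommends consuming getResponseBodyAsStream() instead. Below is a minimal sketch of that streaming approach with an upper bound, assuming you pass in the stream returned by get.getResponseBodyAsStream(). The class name CappedBodyReader, the byte limit, and the choice to throw when the limit is exceeded are all hypothetical; the charset is left to the caller because the right value depends on the response.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CappedBodyReader {

    /**
     * Reads the stream in 4 KB chunks and decodes it with the given charset.
     * Throws IOException as soon as more than maxBytes have been read, so an
     * unexpectedly large body is not buffered in full. The caller remains
     * responsible for closing the stream (or releasing the connection).
     */
    public static String readCapped(InputStream in, int maxBytes, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        long total = 0; // long so the running count cannot overflow
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
            if (total > maxBytes) {
                throw new IOException("Response body exceeds " + maxBytes + " bytes");
            }
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for get.getResponseBodyAsStream(); no network needed
        InputStream body = new ByteArrayInputStream("<html>ok</html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readCapped(body, 1024, StandardCharsets.UTF_8)); // prints <html>ok</html>
    }
}
```

Throwing rather than silently truncating is a design choice here: a truncated page could be mistaken for a complete one.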

Have you tested with any other URL?

The URL in the code you provided redirects with a 302 to http://www.amazon.com/dp/05961580/?tag=stackoverfl08-20, which in turn returns 404 (Not Found).

HttpClient doesn't handle redirects: http://hc.apache.org/httpclient-3.x/redirects.html
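If you need the final page, one option is to follow the redirect chain yourself and check the final status before trusting the body. The following is a minimal, JDK-only sketch; RedirectFollower, Fetcher, Response, Outcome, and the hop limit are hypothetical names, and the actual HTTP call is hidden behind Fetcher so the logic runs without a network. With HttpClient 3.x, a Fetcher might (an assumption, not tested against the question's setup) execute a GetMethod after setFollowRedirects(false) and return getStatusCode() together with the value of getResponseHeader("Location"), using null when that header is absent.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

public class RedirectFollower {

    /** One response as far as redirects are concerned: status plus Location header (may be null). */
    public static final class Response {
        public final int status;
        public final String location;

        public Response(int status, String location) {
            this.status = status;
            this.location = location;
        }
    }

    /** Where the redirect chain ended, and with which status code. */
    public static final class Outcome {
        public final String url;
        public final int status;

        public Outcome(String url, int status) {
            this.url = url;
            this.status = status;
        }
    }

    /** Hypothetical hook: performs ONE request to url without following redirects. */
    public interface Fetcher {
        Response fetch(String url);
    }

    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303 || status == 307 || status == 308;
    }

    /** Follows at most maxHops redirects and reports the final URL and status code. */
    public static Outcome follow(String url, Fetcher fetcher, int maxHops) {
        String current = url;
        for (int hop = 0; hop <= maxHops; hop++) {
            Response r = fetcher.fetch(current);
            if (!isRedirect(r.status) || r.location == null) {
                return new Outcome(current, r.status);
            }
            // Location may be relative, so resolve it against the URL that produced it
            current = URI.create(current).resolve(r.location).toString();
        }
        throw new IllegalStateException("More than " + maxHops + " redirects starting at " + url);
    }

    public static void main(String[] args) {
        // Stubbed responses mirroring the chain described above: a 302 to amazon.com, then a 404
        Map<String, Response> fake = new HashMap<>();
        fake.put("http://rads.stackoverflow.com/amzn/click/05961580",
                new Response(302, "http://www.amazon.com/dp/05961580/?tag=stackoverfl08-20"));
        fake.put("http://www.amazon.com/dp/05961580/?tag=stackoverfl08-20", new Response(404, null));

        Outcome o = follow("http://rads.stackoverflow.com/amzn/click/05961580", fake::get, 5);
        System.out.println(o.status + " " + o.url); // prints 404 http://www.amazon.com/dp/05961580/?tag=stackoverfl08-20
    }
}
```

The hop limit guards against redirect loops. In the stubbed run the chain ends with a 404, which matches the diagnosis above: the final response is an error, not the product page.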