I'm experimenting with mechanize and re to find the websites that match a list of stores.

I've been parsing Bing search results to grab the top result's URL. Unfortunately, seemingly regardless of the query, at random times I get an httplib.IncompleteRead error. Although I have a workaround (shown below), I'd like to understand what is happening.

import httplib
import time

def bingSearch(query):  # query is the store's name, e.g. "Bob's Pet Shop"
    while True:
        try:
            bingBrowser.open('http://www.bing.com/search?q="' + query.replace(' ', '+') + '"')
            htmlCode = bingBrowser.response().read()
            break
        except httplib.IncompleteRead:
            # Sleep for a little while and try again (1 second is an arbitrary choice).
            time.sleep(1)

Other relevant info:

  • Sometimes, for a single Bing URL, the program has to open and read that URL multiple times before a read succeeds without an IncompleteRead error.
  • bingBrowser's headers attribute is set to look nice.
  • bingBrowser's robots attribute is set to false.
  • httplib: incomplete read ... I don't know anything about Apache, so I wasn't able to understand the answer to that question, but it might be useful to you. Still, I doubt that I'm having the same problem (why would bing.com be suffering from an Apache error?!)

Edit:

  1. Replaced query.replace(' ','+') + '"' ) with urllib.urlencode(dict(q=query)) per JF Sebastian's suggestion. No change (I understand this wasn't suggested as a solution).
  2. Ran into an inexplicable urllib2.URLError on bingBrowser.open('http://www.bing.com/search?q="' + query.replace(' ','+') + '"' )
  3. Got an xlwt-related "String longer than 65535 characters" error, most likely unrelated.
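
For reference, the urlencode approach from item 1 could look roughly like the sketch below. The helper name build_search_url is made up for illustration, and keeping the double quotes around the store name (for an exact-phrase search, as in the original URL) is an assumption about the intent. The try/except import is only there so the snippet runs on both Python 2 and Python 3.

```python
try:
    from urllib import urlencode  # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3


def build_search_url(store_name):
    """Return a Bing search URL that searches for store_name as a quoted phrase."""
    # Hypothetical helper: wrap the name in double quotes (as the original
    # URL did), then let urlencode escape quotes, spaces, ampersands, etc.
    return 'http://www.bing.com/search?' + urlencode({'q': '"%s"' % store_name})


print(build_search_url("Bob's Pet Shop"))
```

Unlike replace(' ', '+'), this also escapes characters such as & and ', which would otherwise change the meaning of the query string.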

Thanks in advance.

I faced a similar problem. The thing is that you aren't catching all of the exceptions that can arise when connecting to Bing.

You might find an answer here. It works correctly in my case.
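
As a rough illustration of catching more than just IncompleteRead, here is a minimal sketch of a retry wrapper. IncompleteRead is raised when the connection ends before the whole response body has been read, so retrying is a reasonable response to it. The name read_with_retries, the set of exceptions treated as transient, and the default retry count and delay are all assumptions for this example, not something taken from mechanize or from the question.

```python
import socket
import time

try:
    import httplib  # Python 2
    from urllib2 import URLError
except ImportError:
    import http.client as httplib  # Python 3
    from urllib.error import URLError

# Errors treated as transient here. This list is an assumption; adjust as needed.
# httplib.IncompleteRead is a subclass of httplib.HTTPException.
TRANSIENT_ERRORS = (httplib.HTTPException, URLError, socket.error)


def read_with_retries(fetch, retries=5, delay=1.0):
    """Call fetch() until it succeeds or `retries` attempts have failed."""
    for attempt in range(retries):
        try:
            return fetch()
        except TRANSIENT_ERRORS:
            if attempt == retries - 1:
                raise  # out of attempts: let the caller see the last error
            time.sleep(delay)
```

With the code from the question, fetch would be a small function that calls bingBrowser.open(...) and returns bingBrowser.response().read(). Capping the number of attempts avoids the endless loop that while True can turn into if Bing keeps failing.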