I am trying to collect data from a frequently updated blog, so I simply use a timed loop around urllib2.urlopen("http://example.com") to refresh the page every few minutes and gather the information I need.
However, I noticed that I am not getting the newest content this way; it differs from what I see in a browser such as Opera. After comparing the page source in Opera with the page I receive from Python, I found that WordPress Super Cache is what keeps me from getting the newest result.
I get exactly the same cached page even when I spoof the headers in my Python code. So I wonder: is there a way to bypass WordPress Super Cache? And why does Opera not get the cached page at all?
Have you tried changing the URL by adding some harmless data? Something like this:
import time
import urllib2
urllib2.urlopen("http://example.com/?time=%s" % int(time.time()))
That will actually request
http://example.com/?time=1283872559. Most caching systems will bypass the cache if there is a query string or anything else in the URL that they do not expect.
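If the blog URL may already carry a query string, gluing "?time=..." onto it by hand can produce a broken URL. Here is a rough sketch of the same idea that lets the standard library rebuild the query instead. It is written for Python 3, where urllib2 became urllib.request; the names add_cache_buster and fetch_fresh and the "time" parameter are just my own choices for illustration, and whether a given cache really skips such a request depends on how it is configured.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import urlopen


def add_cache_buster(url, param="time", now=None):
    """Return url with a changing query parameter added (hypothetical helper)."""
    # Use the current Unix time unless a fixed value is supplied (handy for testing).
    stamp = int(time.time() if now is None else now)
    parts = urlsplit(url)
    # Keep existing query parameters, dropping any stale copy of our own parameter.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    query.append((param, str(stamp)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def fetch_fresh(url, timeout=10):
    """Fetch url with a cache-busting parameter and return the raw bytes."""
    with urlopen(add_cache_buster(url), timeout=timeout) as response:
        return response.read()
```

In the polling loop you would call fetch_fresh("http://example.com/") every few minutes; each call asks for a slightly different URL, so a cache keyed on the full URL has no stored copy to hand back.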