I want to get some data from a website that isn't mine. To reach that information I have to log in to the site first, which happens via an HTML form. How do I do that kind of authenticated screen scraping in C#?

Extra information:

  • Cookie-based authentication.
  • POST action needed.

You'd build the request as if you had just filled out the form. Assuming it's a POST, for instance, you make a POST request with the correct form data. Now, if you can't log in directly on the same page you need to scrape, you'll have to track whatever cookies are set by your login request, and include them in your scraping request so that you stay logged in.

It could look something like:

    HttpWebRequest http = WebRequest.Create(url) as HttpWebRequest;
    http.KeepAlive = true; // not sure this is needed
    http.Method = "POST";
    http.ContentType = "application/x-www-form-urlencoded";
    // Give the login request a cookie container, otherwise
    // httpResponse.Cookies will come back empty
    http.CookieContainer = new CookieContainer();

    string postData = "FormNameForUserId=" + strUserId + "&FormNameForPassword=" + strPassword;
    byte[] dataBytes = UTF8Encoding.UTF8.GetBytes(postData);
    http.ContentLength = dataBytes.Length;
    using (Stream postStream = http.GetRequestStream())
    {
        postStream.Write(dataBytes, 0, dataBytes.Length);
    }

    HttpWebResponse httpResponse = http.GetResponse() as HttpWebResponse;
    // Probably want to inspect httpResponse.Headers here first

    http = WebRequest.Create(url2) as HttpWebRequest;
    http.CookieContainer = new CookieContainer();
    http.CookieContainer.Add(httpResponse.Cookies);
    HttpWebResponse httpResponse2 = http.GetResponse() as HttpWebResponse;

Maybe.

Use a WebBrowser control. Just feed it the URL of the website, then use the DOM to put the user name and password into the right fields, and finally send a click to the submit button. That way you don't care about anything but the two input fields and the submit button. No cookie handling, no raw HTML parsing, no HTTP sniffing - all of that is done by the browser control.

If you go this way, a few more suggestions:

  1. You can prevent the control from loading add-ons such as Flash - saves you some time.
  2. Once you're logged in, you can get whatever information you need from the DOM - no need to parse raw HTML.
  3. If you want to make the tool more portable in case the site changes in the future, you can replace your explicit DOM manipulation with an injection of JavaScript. The JS can be fetched from an external resource, and when invoked it can do the field population and the submit.
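A rough sketch of the WebBrowser approach (WinForms). The element ids "user", "pass" and "submitBtn" are placeholders - substitute whatever ids or names the target site's login form actually uses:

```csharp
using System;
using System.Windows.Forms;

public class LoginScraper
{
    private readonly WebBrowser browser = new WebBrowser();
    private readonly string userId;
    private readonly string password;

    public LoginScraper(string userId, string password)
    {
        this.userId = userId;
        this.password = password;
    }

    public void Start(string loginUrl)
    {
        browser.ScriptErrorsSuppressed = true; // hide popups from the site's own JS errors
        browser.DocumentCompleted += OnDocumentCompleted;
        browser.Navigate(loginUrl);
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        HtmlDocument doc = browser.Document;
        HtmlElement user = doc.GetElementById("user"); // placeholder id
        HtmlElement pass = doc.GetElementById("pass"); // placeholder id
        if (user == null || pass == null)
        {
            // No login fields: we're past the login page.
            // Read whatever you need out of the DOM here.
            return;
        }
        user.SetAttribute("value", userId);
        pass.SetAttribute("value", password);
        doc.GetElementById("submitBtn").InvokeMember("click"); // placeholder id
    }
}
```

Per suggestion 3 above, the three lines that fill the fields and click could instead be replaced with a single `doc.InvokeScript(...)` call that runs externally supplied JavaScript.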

You should use HttpWebRequest and perform a POST. This link should help you get started. The key point is that you have to look at the HTML form of the page you are trying to post from, to find all of the parameters the form needs in order to submit the POST.
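To illustrate what "all of the parameters" means, hidden fields count too (ASP.NET's __VIEWSTATE is a common one). The form and field names below are made up for the example:

```csharp
// Suppose "view source" on the login page shows something like:
//   <form action="/login.aspx" method="post">
//     <input type="text" name="txtUser">
//     <input type="password" name="txtPass">
//     <input type="hidden" name="__VIEWSTATE" value="...">
//   </form>
// Every one of those inputs, including the hidden one, has to go
// into the body of your POST:
string viewState = GetHiddenFieldFromLoginPage(); // hypothetical helper: scrape it from a prior GET
string postData = "txtUser=" + Uri.EscapeDataString(strUserId)
                + "&txtPass=" + Uri.EscapeDataString(strPassword)
                + "&__VIEWSTATE=" + Uri.EscapeDataString(viewState);
```

Uri.EscapeDataString is worth using here - passwords with characters like "&" or "=" would otherwise corrupt the body.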

http://www.netomatix.com/httppostdata.aspx

http://geekswithblogs.net/rakker/archive/2006/04/21/76044.aspx