I couldn't figure out why some websites returned a status 500 Internal Server Error when they were retrieved using a WebClient in C#. The same pages rendered fine in a browser, and the error only happened once in a while.

I thought it might have something to do with the WebClient class, so I tried using an HttpWebRequest and HttpWebResponse instead, but the result was the same. Then I started Fiddler to construct requests and tried out different HTTP headers. This led me to the problem and the solution.

The problem was that some websites read certain request headers without checking whether they exist. In this case, the Accept-Encoding and Accept-Language headers were missing from my request. The solution is the method below.

// Requires: using System; using System.IO; using System.Net;

/// <summary>
/// Downloads a web page from the Internet and returns the HTML as a string.
/// </summary>
/// <param name="url">The URL to download from.</param>
/// <returns>The HTML, or null if the download fails for any reason.</returns>
public static string DownloadWebPage(Uri url)
{
  try
  {
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);

    // Some sites return status 500 when these headers are missing.
    request.Headers["Accept-Encoding"] = "gzip";
    request.Headers["Accept-Language"] = "en-us";
    request.Credentials = CredentialCache.DefaultNetworkCredentials;

    // Decompress gzip responses transparently before they are read.
    request.AutomaticDecompression = DecompressionMethods.GZip;

    using (WebResponse response = request.GetResponse())
    {
      using (StreamReader reader = new StreamReader(response.GetResponseStream()))
      {
        return reader.ReadToEnd();
      }
    }
  }
  catch (Exception)
  {
    // Any failure (invalid URL, network error, HTTP error status) yields null.
    return null;
  }
}
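Because the method above returns null for every kind of failure, it can hide exactly the sort of status 500 response that started this hunt. Here is a rough sketch of a variant that sets the same two headers but also reports the HTTP status when the server answers with an error. The names DownloadDemo and TryDownload and the example.com URL are made up for illustration, and the sketch assumes an http or https URL.

```csharp
using System;
using System.IO;
using System.Net;

public static class DownloadDemo
{
    // Hypothetical variant of DownloadWebPage: same headers, but the HTTP
    // status is passed back to the caller instead of being swallowed.
    public static string TryDownload(Uri url, out HttpStatusCode? status)
    {
        status = null;
        try
        {
            // Assumes an http/https URL, so the cast to HttpWebRequest succeeds.
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            request.Headers["Accept-Encoding"] = "gzip";
            request.Headers["Accept-Language"] = "en-us";
            request.AutomaticDecompression = DecompressionMethods.GZip;

            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                status = response.StatusCode;
                return reader.ReadToEnd();
            }
        }
        catch (WebException ex)
        {
            // For HTTP error statuses (such as 500) the response is attached
            // to the exception; for pure network failures it is null.
            HttpWebResponse errorResponse = ex.Response as HttpWebResponse;
            if (errorResponse != null)
            {
                status = errorResponse.StatusCode;
            }
            return null;
        }
    }

    public static void Main()
    {
        HttpStatusCode? status;
        string html = TryDownload(new Uri("http://example.com/"), out status);

        Console.WriteLine(html == null
            ? "Download failed, status: " + status
            : "Downloaded " + html.Length + " characters");
    }
}
```

Only WebException is caught here, so other problems (for example a non-HTTP URL) surface as exceptions rather than as a null result. That is a deliberate difference from the original method, not a correction of it.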

This is one of those things that seem obvious once you know the way around it. That still didn't stop me from spending an hour tracking it down. Doh!
