Monday, November 2, 2009

Screen scraping in C#

If you ever need to scrape the contents of an HTML page at a given URL and store the HTML in a string variable, the following code can come in handy.

Note: The WebClient class is part of the System.Net namespace in .NET, and UTF8Encoding is part of System.Text.

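//Requires: using System.Net; and using System.Text;
WebClient client = new WebClient();

//URL to scrape
string strUrl = "http://www.gmail.com/";

//Download the raw bytes of the page
byte[] strHTML = client.DownloadData(strUrl);

//Decode the bytes as UTF-8 text and return the HTML (this snippet belongs inside a method that returns string)
UTF8Encoding objUTF8 = new UTF8Encoding();
return objUTF8.GetString(strHTML);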
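
For reference, here is one way the snippet above might be wrapped into a complete, reusable method. This is only a sketch: the class name PageScraper and the method names GetHtml and DecodeUtf8 are made up for illustration, and it assumes, as the snippet does, that the page is encoded as UTF-8.

```csharp
using System;
using System.Net;
using System.Text;

public static class PageScraper
{
    // Hypothetical helper: downloads the page at the given URL
    // and returns its HTML as a string.
    public static string GetHtml(string url)
    {
        // WebClient implements IDisposable, so a using block
        // releases it as soon as the download finishes.
        using (WebClient client = new WebClient())
        {
            byte[] data = client.DownloadData(url);
            return DecodeUtf8(data);
        }
    }

    // Hypothetical helper: decodes raw bytes as UTF-8 text.
    // Assumes the page really is UTF-8 encoded.
    public static string DecodeUtf8(byte[] data)
    {
        return Encoding.UTF8.GetString(data);
    }
}
```

Calling it would then look like: string html = PageScraper.GetHtml("http://www.gmail.com/"); Keeping the decoding step in its own method makes it easy to swap in a different encoding if the page does not use UTF-8.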
