This has been answered for other languages/platforms, but I could not find a robust solution in C#. Here I am looking for the bare domain name that we use with WHOIS, so I am not interested in subdomains, port, scheme, etc.
Example 1: http://s1.website.co.uk/folder/querystring?key=value => website.co.uk
Example 2: ftp://username:password@website.com => website.com
The result should be the same whenever the WHOIS owner is the same. For example, sub1.abc.com and sub2.abc.com both belong to whoever owns abc.com, and abc.com is what I need to extract from the URL.
I needed the same thing, so I wrote a class that you can copy into your solution. It uses a hard-coded string array of TLDs: http://pastebin.com/raw.php?i=VY3DCNhp
Console.WriteLine(GetDomain.GetDomainFromUrl("http://www.beta.microsoft.com/path/page.htm"));
Console.WriteLine(GetDomain.GetDomainFromUrl("http://www.beta.microsoft.co.uk/path/page.htm"));
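The pastebin class is not reproduced here. As a rough illustration of the same idea (a hard-coded string array of TLDs), the sketch below finds the longest suffix from a small sample array that the host ends with, matching on a label boundary, and then keeps one more label to its left. The class name `SuffixArrayDomain` and the five-entry suffix array are hypothetical; this is not the contents of the linked class.

```csharp
using System;
using System.Linq;

// Hypothetical sketch of the "hard-coded TLD array" approach; not the pastebin class.
public static class SuffixArrayDomain
{
    // Tiny sample of public suffixes; a real list must come from publicsuffix.org.
    private static readonly string[] Suffixes = { "com", "uk", "co.uk", "org", "org.uk" };

    public static string GetDomainFromUrl(string url)
    {
        // Uri.Host drops the scheme, user info, port, path and query string.
        string host = new Uri(url).Host.ToLowerInvariant();

        // Longest suffix that matches on a label boundary (".co.uk" beats ".uk").
        string best = Suffixes
            .Where(s => host.EndsWith("." + s, StringComparison.Ordinal))
            .OrderByDescending(s => s.Length)
            .FirstOrDefault();

        if (best == null)
            return host; // unknown suffix: return the host unchanged

        // Keep exactly one label to the left of the matched suffix.
        string rest = host.Substring(0, host.Length - best.Length - 1);
        int lastDot = rest.LastIndexOf('.');
        return (lastDot < 0 ? rest : rest.Substring(lastDot + 1)) + "." + best;
    }
}
```

With this sample array, `SuffixArrayDomain.GetDomainFromUrl("http://www.beta.microsoft.co.uk/path/page.htm")` returns `microsoft.co.uk`, because `co.uk` is the longest matching suffix. Any host whose suffix is missing from the array is returned unchanged, which is why the array has to be complete.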
As @Pete noted, this is a little complicated, but I'll give it a try.
Note that this code needs a complete list of known TLDs. It can be retrieved from http://publicsuffix.org/. Downloading the list from that site is left as an exercise for the reader.
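using System;
using System.Collections.Generic;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        string[] testCases = { "www.domain.com.ac", "www.domain.ac", "domain.com.ac", "domain.ac", "localdomain", "localdomain.local" };

        foreach (string testCase in testCases)
            Console.WriteLine("{0} => {1}", testCase, UriHelper.GetDomainFromUri(new Uri("http://" + testCase + "/")));

        /* Produces the following results:
           www.domain.com.ac => domain.com.ac
           www.domain.ac => domain.ac
           domain.com.ac => domain.com.ac
           domain.ac => domain.ac
           localdomain => localdomain
           localdomain.local => localdomain.local
        */
    }
}

public static class UriHelper
{
    // Known TLDs / public suffixes. Only the .ac entries are listed here.
    private static readonly HashSet<string> _tlds = new HashSet<string>
    {
        "com.ac", "edu.ac", "gov.ac", "net.ac", "mil.ac", "org.ac", "ac"
        // Complete this list from http://publicsuffix.org/.
    };

    public static string GetDomainFromUri(Uri uri)
    {
        return GetDomainFromHostName(uri.Host);
    }

    public static string GetDomainFromHostName(string hostName)
    {
        string[] hostNameParts = hostName.Split('.');
        if (hostNameParts.Length == 1)
            return hostNameParts[0];

        // Labels in the matched suffix plus the one label in front of it.
        int matchingParts = FindMatchingParts(hostNameParts, 1);
        return GetPartOfHostName(hostNameParts, hostNameParts.Length - matchingParts);
    }

    // Moves right one label at a time until the remaining labels form a known TLD.
    private static int FindMatchingParts(string[] hostNameParts, int offset)
    {
        if (offset == hostNameParts.Length)
            return hostNameParts.Length; // no known TLD: keep the whole host name

        string domain = GetPartOfHostName(hostNameParts, offset);
        if (_tlds.Contains(domain.ToLowerInvariant()))
            return (hostNameParts.Length - offset) + 1;

        return FindMatchingParts(hostNameParts, offset + 1);
    }

    // Joins the labels from offset to the end with dots.
    private static string GetPartOfHostName(string[] hostNameParts, int offset)
    {
        var sb = new StringBuilder();
        for (int i = offset; i < hostNameParts.Length; i++)
        {
            if (sb.Length > 0)
                sb.Append('.');
            sb.Append(hostNameParts[i]);
        }
        string domain = sb.ToString();
        return domain;
    }
}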
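Since loading the list is left as an exercise, here is one possible sketch. It assumes you have already downloaded the list file from publicsuffix.org, and it turns that text into a set of rules that could replace the hard-coded entries in `UriHelper`. `PublicSuffixLoader` is a hypothetical helper. It follows the list's format (one rule per line, `//` comments, each line read only up to the first whitespace) but keeps wildcard (`*.`) and exception (`!`) rules verbatim. An exact-match lookup like `UriHelper`'s does not interpret those rules, so hosts under such suffixes may still be split incorrectly.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of the answer above): parses the text of the
// publicsuffix.org list into a set of suffix rules.
public static class PublicSuffixLoader
{
    public static HashSet<string> Parse(TextReader reader)
    {
        // Case-insensitive, so lookups work regardless of the host's casing.
        var rules = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            line = line.Trim();

            // Skip blank lines and "//" comment lines.
            if (line.Length == 0 || line.StartsWith("//", StringComparison.Ordinal))
                continue;

            // A rule is only read up to the first whitespace on its line.
            int ws = line.IndexOfAny(new[] { ' ', '\t' });
            rules.Add(ws < 0 ? line : line.Substring(0, ws));
        }
        return rules;
    }
}
```

For example, `using (var reader = File.OpenText("public_suffix_list.dat")) { var rules = PublicSuffixLoader.Parse(reader); }` reads a local copy of the list. The file name is an assumption, so use wherever you saved the download.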
The closest you can get is the System.Uri.Host property, which extracts the sub1.abc.com part. Unfortunately, it is hard to know exactly which part of the host is the "top-level" part (e.g. sub1.foo.co.uk versus sub1.abc.com).
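Both points can be seen in a small sketch. `Uri.Host` removes the scheme, port, path and query, but a naive "keep the last two labels" rule (the hypothetical helper `LastTwoLabels` below) only works when the suffix is a single label. It returns `abc.com` for sub1.abc.com, but `co.uk` for sub1.foo.co.uk, which is exactly the ambiguity described above.

```csharp
using System;

public static class HostDemo
{
    // Uri.Host keeps only the host name; scheme, port, path and query
    // string are stripped by the Uri parser.
    public static string HostOf(string url) => new Uri(url).Host;

    // Hypothetical naive rule: keep the last two dot-separated labels.
    // Correct for sub1.abc.com, wrong for hosts under a two-label suffix like co.uk.
    public static string LastTwoLabels(string host)
    {
        string[] labels = host.Split('.');
        return labels.Length <= 2
            ? host
            : labels[labels.Length - 2] + "." + labels[labels.Length - 1];
    }
}
```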
If you want the domain name, you can use Uri.Host in .NET.
If you want to extract URLs from text content, you will need to parse them using a regular expression.
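A minimal sketch of that combination follows. It assumes a deliberately simple pattern: only http, https and ftp URLs, each ending at whitespace, a quote or an angle bracket. The class name `UrlExtractor` and the pattern are illustrative assumptions, not a full RFC 3986 grammar. Each match is validated with `Uri.TryCreate` before `Uri.Host` is read, and the resulting hosts still need a suffix list to reduce them to the registrable domain.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UrlExtractor
{
    // Simplified pattern (an assumption): scheme, "://", then everything up to
    // the next whitespace, quote or angle bracket.
    private static readonly Regex UrlPattern =
        new Regex(@"\b(?:https?|ftp)://[^\s""'<>]+", RegexOptions.IgnoreCase);

    public static List<string> ExtractHosts(string text)
    {
        var hosts = new List<string>();
        foreach (Match m in UrlPattern.Matches(text))
        {
            // Drop sentence punctuation that the pattern swallowed at the end.
            string candidate = m.Value.TrimEnd('.', ',', ';');

            // Let System.Uri validate the candidate and extract the host.
            if (Uri.TryCreate(candidate, UriKind.Absolute, out Uri uri))
                hosts.Add(uri.Host);
        }
        return hosts;
    }
}
```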