I have to go through a big list of URL strings and extract the domain name from each of them.

For instance:

http://www.stackoverflow.com/questions would extract www.stackoverflow.com

I was originally using new URL(theUrlString).getHost(), but the URL object initialization adds a lot of time to the process and seems unnecessary.

Is there a faster way to extract the host name that would be just as reliable?


Edit: My mistake, yes the www. would be included in the domain name in the example above. Also, these URLs may be http or https.

If you want to handle https etc., I suggest you do something like this:

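int slashslash = url.indexOf("//") + 2;

int end = url.indexOf('/', slashslash);

// No '/' after the host (e.g. "https://example.com"): take the rest of the string.
String domain = end >= 0 ? url.substring(slashslash, end) : url.substring(slashslash);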

Note that this includes the www part (just as URL.getHost() would do), which is actually part of the domain name.
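For what it's worth, here is the same idea wrapped up as a small method. This is only a sketch: the class and method names (FastHost, extractHost) are made up for illustration, and it assumes every input looks like scheme://host/..., so any user name or port in the URL is returned as part of the result.

```java
final class FastHost {

    /** Returns the text between "//" and the next '/', or null if there is no "//". */
    static String extractHost(String url) {
        int sep = url.indexOf("//");
        if (sep < 0) {
            return null; // not in scheme://host form
        }
        int start = sep + 2;
        int end = url.indexOf('/', start);
        // No '/' after the host: the host runs to the end of the string.
        return end >= 0 ? url.substring(start, end) : url.substring(start);
    }

    public static void main(String[] args) {
        System.out.println(extractHost("http://www.stackoverflow.com/questions")); // www.stackoverflow.com
        System.out.println(extractHost("https://example.com"));                    // example.com
    }
}
```

It does no allocation beyond the final substring, which is where the speed-up over new URL(...) would come from, but it also does no validation at all.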

You want to be rather careful with implementing a "fast" way of unpicking URLs. There is a lot of potential variability in URLs that could cause a "fast" method to fail. For example:

  • The scheme (protocol) part can be written in any combination of upper and lower case letters, e.g. "http", "Http" and "HTTP" are equivalent.

  • The authority part can optionally include a user name and/or a port number, as in "http://you@example.com:8080/index.html".

  • Since DNS is case insensitive, the hostname part of a URL is also (effectively) case insensitive.

  • It is legal (though highly irregular) to %-encode unreserved characters in the scheme or authority components of a URL. You need to take this into account when matching (or stripping) the scheme, or when interpreting the hostname. A hostname with %-encoded characters is defined to be equivalent to one with the %-encoded sequences decoded.

Now, if you have total control of the process that generates the URLs you are stripping, you can probably ignore these niceties. But if they are harvested from documents or web pages, or entered by humans, you would be well advised to consider what might happen if your code encounters an "unusual" URL.
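To make a few of those points concrete, here is a rough sketch of a string-based extractor that copes with a mixed-case scheme, a user name, a port number, and a mixed-case hostname. The names (CarefulHost, extractHost) are hypothetical, and it deliberately does not decode %-encoded sequences, so treat it as an illustration of the extra work involved rather than a complete parser.

```java
import java.util.Locale;

final class CarefulHost {

    /** Returns the lower-cased host of a scheme://authority/... URL, or null if "://" is missing. */
    static String extractHost(String url) {
        int sep = url.indexOf("://");
        if (sep < 0) {
            return null;
        }
        int start = sep + 3;

        // The authority ends at the first '/', '?' or '#', or at the end of the string.
        int end = url.length();
        for (int i = start; i < url.length(); i++) {
            char c = url.charAt(i);
            if (c == '/' || c == '?' || c == '#') {
                end = i;
                break;
            }
        }
        String authority = url.substring(start, end);

        // Drop "user@" or "user:password@" if present.
        int at = authority.lastIndexOf('@');
        if (at >= 0) {
            authority = authority.substring(at + 1);
        }

        // Drop ":port", leaving bracketed IPv6 literals such as "[::1]" intact.
        if (authority.startsWith("[")) {
            int close = authority.indexOf(']');
            if (close >= 0) {
                authority = authority.substring(0, close + 1);
            }
        } else {
            int colon = authority.indexOf(':');
            if (colon >= 0) {
                authority = authority.substring(0, colon);
            }
        }

        // Hostnames are effectively case insensitive, so normalise to lower case.
        return authority.toLowerCase(Locale.ROOT);
    }
}
```

With this sketch, "HTTP://you@Example.COM:8080/index.html" comes out as "example.com". Each extra rule costs a little time, which is the trade-off against the simple indexOf approach.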

You could write a regexp? http:// is always the same, and then you match everything until you get to the first '/'.
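Something along these lines, perhaps. This is a minimal sketch that assumes only http and https URLs need to be handled; the class name RegexHost and the exact pattern are my own choices, not anything standard.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexHost {

    // Compile once and reuse: recompiling the pattern for every URL would waste time.
    // Group 1 captures everything after "http://" or "https://" up to the first '/'.
    private static final Pattern HOST =
            Pattern.compile("^https?://([^/]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the captured host part, or null if the URL does not match. */
    static String extractHost(String url) {
        Matcher m = HOST.matcher(url);
        return m.find() ? m.group(1) : null;
    }
}
```

Whether this is actually faster than new URL(...).getHost() is something you would have to measure on your own data.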

Assuming that they are all well-formed URLs, but you don't know whether they will be http://, https://, etc.:

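int start = theUrlString.indexOf('/');   // first '/' of "//"

start = theUrlString.indexOf('/', start + 1);   // second '/' of "//"

int finish = theUrlString.indexOf('/', start + 1);   // '/' that ends the host, or -1

if (finish < 0) finish = theUrlString.length();   // no path after the host

String domain = theUrlString.substring(start + 1, finish);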

You could try to use regular expressions.


Here is a question about extracting the domain name with regular expressions in Java:

Regular expression to retrieve domain.tld