I'm using Ruby on Rails 3.0.10 and I need to retrieve the scheme://domain part of a URL without the subdomain part. That is, given the following URL

http://www.sub_domain.domain.com

I need to retrieve

http://www.domain.com

How can I do this (do I need to use a regex)?


@mu is too short correctly pointed out in his/her comment (which made me think...):

You misunderstood me. www.ac.uk is meaningless; the base domain for Oxford is ox.ac.uk. The ac.uk part means "academic United Kingdom" and is, semantically, a single component. A few other countries have similar naming schemes.

So, the updated question is:

How can I iterate over a URL (for example http://www.maths.ox.ac.uk/), as in the following steps, to progressively remove the subdomain parts until only the last one is left?

http://www.maths.ox.ac.uk/ # Step 0 (start)
http://www.ox.ac.uk/       # Step 1
http://www.ac.uk/          # Step 2 (finish)
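A minimal sketch of the iteration described in the steps above, assuming an http(s) URL whose host may start with "www.": it sets "www" aside, drops the leftmost remaining label one step at a time, and stops when two labels are left. The method name `subdomain_steps` is hypothetical. It only generates the candidates; it does not decide which step is the "real" base domain (that still needs knowledge of suffixes like ac.uk).

```ruby
require 'uri'

# Hypothetical helper: returns the URLs for Step 0, Step 1, ... by dropping
# the leftmost label after "www." until only two labels remain.
# The trailing "/" is added to match the example steps.
def subdomain_steps(url)
  uri = URI.parse(url)
  labels = uri.host.split('.')
  labels.shift if labels.first == 'www' # set "www" aside; it is re-added below

  steps = []
  while labels.size >= 2
    steps << "#{uri.scheme}://www.#{labels.join('.')}/"
    labels.shift
  end
  steps
end

subdomain_steps('http://www.maths.ox.ac.uk/').each_with_index do |u, i|
  puts "Step #{i}: #{u}"
end
```

Each candidate could then be checked against whatever rule defines a valid base domain for your application.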

This is a total hack, and I have no idea how useful it would be in the general case, but here you go.

ruby-1.8.7-p352 > uri = URI.parse("http://www.foo.domain.com/")
 => #<URI::HTTP:0x105011840 URL:http://www.foo.domain.com/>
ruby-1.8.7-p352 > uri.scheme + "://" + uri.host.split(/\./)[-2..-1].join(".")
 => "http://domain.com"
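Wrapping the irb one-liner in a method makes it easier to see why it is a hack: it always keeps the last two host labels, so for a multi-label suffix such as ac.uk it returns the suffix itself rather than ox.ac.uk. The method name `last_two_labels_url` is hypothetical.

```ruby
require 'uri'

# Hypothetical wrapper around the "last two labels" one-liner.
# split('.') splits on a literal dot; an unescaped /./ regex would treat
# every character as a separator, because "." matches any character.
def last_two_labels_url(url)
  uri = URI.parse(url)
  uri.scheme + '://' + uri.host.split('.')[-2..-1].join('.')
end

puts last_two_labels_url('http://www.foo.domain.com/') # http://domain.com
puts last_two_labels_url('http://www.maths.ox.ac.uk/') # http://ac.uk (not ox.ac.uk)
```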

If you know the URL ends in .com and follows the format you specified, why not use a regular expression like this:


to parse the domain and the trailing .com. Prefix the result with http://www and you should be good to go.
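The pattern itself is not shown above, so here is a hypothetical stand-in, not the answer's original regex. It assumes exactly the format http://www.<subdomain>.<domain>.com (optionally followed by a path), captures `<domain>`, and rebuilds the URL with the http://www prefix; `COM_URL` and `www_domain_url` are illustrative names.

```ruby
# Hypothetical pattern for http://www.<subdomain>.<domain>.com[/...];
# group 1 captures <domain>. Anything else (other TLDs, extra labels) fails.
COM_URL = %r{\Ahttp://www\.[\w-]+\.([\w-]+)\.com(?:/|\z)}

# Returns "http://www.<domain>.com", or nil when the URL does not fit the format.
def www_domain_url(url)
  m = COM_URL.match(url)
  m && "http://www.#{m[1]}.com"
end

puts www_domain_url('http://www.sub_domain.domain.com') # http://www.domain.com
```

Because the format is hard-coded, a URL like http://www.maths.ox.ac.uk/ simply returns nil.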

There's no "general case" solution for this. Some URLs use a suffix with only one dot (.com or .edu), while others use multiple dots (.co.jp, etc.). You won't be able to solve this with something as simple as a regex.

What you might be able to do is build a list of possible URL suffixes and create a regex for each one. If one matches your input string, use a variation of the above:

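base_regex = '\.[\w-]+'
# list_of_suffixes (e.g. ['.com', '.co.jp']) and url are assumed to be defined
list_of_suffixes.each do |suffix|
  match = Regexp.new(base_regex + Regexp.escape(suffix) + '\z').match(url)
  puts 'http://www' + match[0] if match
end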

Note: this code is off the top of my head and for illustration purposes only (it probably won't run as-is, but you get the idea).
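A self-contained sketch of the suffix-list idea, assuming a tiny hand-written list; `KNOWN_SUFFIXES` and `base_domain_url` are hypothetical names. It tries longer suffixes first, so ac.uk is preferred over uk, and keeps one label in front of the matching suffix. In practice the hard-coded array would be replaced by a maintained copy of the Public Suffix List (the public_suffix gem packages it for Ruby).

```ruby
require 'uri'

# Hypothetical, deliberately tiny suffix list; the real Public Suffix List
# has thousands of entries.
KNOWN_SUFFIXES = %w[com edu jp co.jp uk ac.uk co.uk].freeze

# Returns "http://www.<label>.<suffix>" for the longest known suffix the host
# ends with, or nil if no suffix in the list matches.
def base_domain_url(url)
  host = URI.parse(url).host
  # Longer suffixes first, so "ac.uk" is tried before "uk".
  KNOWN_SUFFIXES.sort_by { |s| -s.length }.each do |suffix|
    # One label (preceded by a dot or the start of the host), then the suffix.
    match = /(?:\A|\.)([\w-]+\.#{Regexp.escape(suffix)})\z/.match(host)
    return "http://www.#{match[1]}" if match
  end
  nil
end

puts base_domain_url('http://www.maths.ox.ac.uk/') # http://www.ox.ac.uk
puts base_domain_url('http://www.foo.domain.com/') # http://www.domain.com
```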