I am trying to write (or just find an existing) PHP method that can take a URL and extract the domain. The catch is that it must hold up under the weight of strange-looking domain names like:
www.champa.kku.ac.th
Looking at that one myself with human eyes, I still guessed it wrong: I thought the domain would be
kku.ac.th, but that gives a DNS error when visiting it.
So does anybody know of a good way to reliably extract the domain from a URL:
http://site.com/hello.php
http://site.com.uk/hello.php
http://subdomain.site.com/hello.php
http://subdomain.site.com.uk/hello.php
http://www.champa.kku.ac.th/hello.php // and this is the one I could not tell
The parse_url function might help here?
In your situation, with those URLs, the following piece of code:
echo parse_url('http://site.com/hello.php', PHP_URL_HOST) . '<br />';
echo parse_url('http://site.com.uk/hello.php', PHP_URL_HOST) . '<br />';
echo parse_url('http://subdomain.site.com/hello.php', PHP_URL_HOST) . '<br />';
echo parse_url('http://subdomain.site.com.uk/hello.php', PHP_URL_HOST) . '<br />';
echo parse_url('http://www.champa.kku.ac.th/hello.php', PHP_URL_HOST) . '<br />';
Gives this output :
site.com
site.com.uk
subdomain.site.com
subdomain.site.com.uk
www.champa.kku.ac.th
PHP has the parse_url() function, which can help you do the basic splitting into protocol, host, port, and so on.
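As a quick illustration of that splitting, here is a minimal sketch. The port, query string and fragment in the URL are made up for the example so that every component shows up.

```php
<?php
// Illustrative URL: the port, query and fragment are invented for this example.
$parts = parse_url('http://www.champa.kku.ac.th:8080/hello.php?x=1#top');

// parse_url() returns an associative array of the components it found.
echo $parts['scheme'], "\n";   // protocol
echo $parts['host'], "\n";     // full host name, sub-domains included
echo $parts['port'], "\n";     // port, as an integer
echo $parts['path'], "\n";     // path after the host
echo $parts['query'], "\n";    // query string without the leading "?"
echo $parts['fragment'], "\n"; // fragment without the leading "#"
```

Note that the host comes back whole. parse_url() does not know where the registered domain ends and the sub-domains begin.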
As for extracting the "right" domain in uncertain cases, this is very hard to tell, because sometimes "two-part TLDs" are set up by the TLD authority (e.g. in the UK) and sometimes they are run by private businesses (e.g. .uk.com). I think you will not get around maintaining a list of top-level domains that have two parts, like
- .co.uk
- .ac.uk
These would then be treated like TLDs (top-level domains), swallowing the second part.
This is the best way of reliably telling apart "two-part TLDs" like .co.uk, as in server1.ibm.co.uk (where the two-part .co.uk must be removed to determine the domain itself), from regular sub-domains (where only a single-part TLD such as .com must be removed).
A good starting point for getting a list of many important "two-part TLDs" may be the domain search at speednames.com (choose "all" in countries).
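The list-based approach described above could be sketched roughly as follows. The function name registrableDomain and the three-entry suffix list are hypothetical and only for illustration. A real list would be far longer, and the ac.th entry is an assumption added so that the example from the question resolves.

```php
<?php
// Hypothetical helper: returns the domain of a URL, treating any entry in
// $twoPartSuffixes as if it were a single TLD.
function registrableDomain(string $url, array $twoPartSuffixes): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // no host component could be parsed
    }

    $host = strtolower($host);
    $labels = explode('.', $host);
    $count = count($labels);
    if ($count < 2) {
        return $host; // single-label host such as "localhost"
    }

    // Does the host end in one of the known two-part suffixes?
    $lastTwo = $labels[$count - 2] . '.' . $labels[$count - 1];
    $suffixLabels = in_array($lastTwo, $twoPartSuffixes, true) ? 2 : 1;

    // Keep the suffix plus one label to its left, if there is one.
    $keep = min($count, $suffixLabels + 1);
    return implode('.', array_slice($labels, -$keep));
}

// Illustrative sample only; this list has to be maintained by hand.
$suffixes = ['co.uk', 'ac.uk', 'ac.th'];

echo registrableDomain('http://subdomain.site.com/hello.php', $suffixes), "\n";
echo registrableDomain('http://server1.ibm.co.uk/', $suffixes), "\n";
echo registrableDomain('http://www.champa.kku.ac.th/hello.php', $suffixes), "\n";
```

With this sample list the three calls print site.com, ibm.co.uk and kku.ac.th. The result is only as good as the list: a two-part suffix that is missing from it gets treated like an ordinary domain name. A maintained source such as the Public Suffix List could replace the hand-written array.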
With Ruby you can use the Domainatrix library / gem:
require 'rubygems'
require 'domainatrix'

s = 'http://www.champa.kku.ac.th/dir1/dir2/file?option1&option2'
url = Domainatrix.parse(s)
url.domain # => "kku"
Useful tool! :-)