I have a list of a million URLs. I need to extract the TLD for each URL and create a separate file for each TLD. For instance, collect all URLs with .com as the TLD and dump them into one file, do the same for the .edu TLD, and so on. Further, within each file I must sort alphabetically by domain name and then by subdomain, etc.

Can anybody give me a jump start on implementing this in Perl?

  1. Use URI to parse the URL,
  2. Use its host method to get the host,
  3. Use Domain::PublicSuffix's get_root_domain to parse the host name.
  4. Use the tld or suffix method to get the real TLD or the pseudo TLD.

use feature qw( say );

use Domain::PublicSuffix qw( );
use URI                  qw( );

my $dps = Domain::PublicSuffix->new();

for (qw(
   http://www.google.com/
   http://www.google.co.uk/
)) {
   my $url = $_;

   # Treat schemeless URLs as absolute URLs with a missing http://.
   $url = "http://$url" if $url !~ m{^\w+:};

   my $host = URI->new($url)->host();
   $host =~ s/\.\z//;  # D::PS doesn't handle "domain.com.".

   $dps->get_root_domain($host)
      or die $dps->error();

   say $dps->tld();     # com  uk
   say $dps->suffix();  # com  co.uk
}
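
To cover the rest of the question (one file per TLD, sorted by domain and then subdomain), here is a minimal sketch that builds on the loop above. The details are my own assumptions, not part of the answer: it reads URLs one per line from STDIN, names the output files urls.<tld>, and sorts on the host with its labels reversed so that entries group by domain first and subdomain second.

use strict;
use warnings;
use feature qw( say );

use Domain::PublicSuffix qw( );
use URI                  qw( );

my $dps = Domain::PublicSuffix->new();

my %urls_by_tld;   # tld => [ [ sort_key, url ], ... ]

while (my $url = <STDIN>) {          # assumes one URL per line on STDIN
   chomp $url;
   $url = "http://$url" if $url !~ m{^\w+:};

   my $host = URI->new($url)->host();
   $host =~ s/\.\z//;

   $dps->get_root_domain($host)
      or next;                       # skip hosts D::PS can't parse

   # Sort key: host labels reversed ("www.google.com" -> "com.google.www"),
   # so files end up ordered by domain, then by subdomain.
   my $key = join '.', reverse split /\./, $host;

   push @{ $urls_by_tld{ $dps->tld() } }, [ $key, $url ];
}

for my $tld (keys %urls_by_tld) {
   open my $fh, '>', "urls.$tld"     # assumed file name, e.g. urls.com, urls.uk
      or die "Can't write urls.$tld: $!";
   say {$fh} $_->[1]
      for sort { $a->[0] cmp $b->[0] } @{ $urls_by_tld{$tld} };
   close $fh;
}

This keeps all URLs in memory before sorting, which should be fine for a million lines; for much larger inputs you would want to write unsorted per-TLD files first and sort each file afterwards.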