I have some code that looks like this:

my $report = ReportGenerator->new; #custom object
my $sth = $dbc->prepare('SELECT * FROM some_table WHERE some_condition'); #DBI statement handle
$sth->execute();
while(my $href = $sth->fetchrow_hashref){
    $report->process_record($href);
}
$sth->finish();
print $report->printReport();

My problem is that each iteration of the loop is extremely slow, and the bottleneck is MySQL. I'm wondering whether there is an easy way to wrap the while loop so that it fetches many records at a time; fetching all the records into memory isn't practical either. I'm not concerned with the efficiency of the code itself (hashref versus arrayref, etc.). Rather, I'd like to fetch, say, 10000 records at a time.

The database has ~5 million records. I am unable to change or upgrade the server.

Thanks

You should use the fetchall_arrayref method, which accepts a $max_rows argument:

while (my $data = $sth->fetchall_arrayref(undef, 10000)) {
  last unless @$data;  # an empty batch means all rows have been fetched
  for my $row ( @{$data} ) {
    $report->process_record($row);
  }
}

You might also look at the RowCacheSize attribute, which hints to your driver how many records should be returned per fetch (drivers that don't implement a local row cache ignore it).
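As a sketch (the DSN, credentials, and table here are placeholders, and per the DBI docs RowCacheSize is only a hint):

```perl
use strict;
use warnings;
use DBI;

# Hypothetical connection details for illustration only.
my $dbc = DBI->connect('dbi:mysql:database=mydb', 'user', 'password',
                       { RaiseError => 1 });

# Hint to the driver: keep roughly 10_000 rows in its local cache
# per underlying fetch. Drivers without a row cache silently ignore this.
$dbc->{RowCacheSize} = 10_000;

my $sth = $dbc->prepare('SELECT * FROM some_table WHERE some_condition');
$sth->execute;

# If the driver supports it, $sth->{RowsInCache} reports how many
# unfetched rows are currently sitting in the cache.
```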

Which part is slow? Is it the call to execute, to fetchrow_hashref, or to process_record? It seems unlikely to me that fetchrow_hashref is the problem. It's far more likely to be the execution of the query itself, or the black box of process_record.

But this is all guesswork. We can't really help much without more information. I suggest you get some real data on the code's performance by using Devel::NYTProf.
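If you go that route, a typical invocation looks like this (the script name is a placeholder, and Devel::NYTProf must be installed from CPAN):

```shell
# Run the script under the profiler; results are written to ./nytprof.out
perl -d:NYTProf report.pl

# Convert the profile into browsable HTML (opens at ./nytprof/index.html)
nytprofhtml
```

The HTML report breaks time down per subroutine and per line, so it will show directly whether the time goes into the fetch calls or into process_record.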

The fastest way to fetch rows as hashes using the DBI is with bind_columns(), like this:

  $sth->execute;
  my %row;
  $sth->bind_columns( \( @row{ @{$sth->{NAME_lc} } } ));
  while ($sth->fetch) {
      print "$row{region}: $row{sales}\n";
  }

That's only appropriate if you're happy for each row to reuse the same hash.

Beyond that, I agree with davorg: don't guess, measure first.

For much more about using the DBI, including performance, see my tutorial slides (from 2007, but still relevant).