I am trying to migrate some seafood species information profiles from a bespoke content management system that uses the latin1 charset to a customised WordPress database (custom post type, with plenty of meta fields) which uses UTF-8.

In addition, the old content management system uses some odd bbCode tags.

Essentially, I am looking for a function that will do the following:

  • Take information from the old database with latin1_swedish_ci collation (and latin1 charset)
  • Convert all the non-standard characters (we have characters from languages including, but not limited to, Croatian, Czech, Spanish, French and German) to HTML entities, for example &aacute; (numeric ones like &#134; are fine too).
  • Convert all the bbCode (see below) to HTML
  • Convert ' and " to HTML entities
  • Return the data with utf-8 charset to my new database

The bbCode tags and their HTML replacements are:

$search = array( '[i]', '[/i]', '[b]', '[/b]', '[pl]', '[/pl]' );
$replace = array( '<i>', '</i>', '<strong>', '</strong>', '', '' );

The function that I have tried so far is:

$search = array( '[i]', '[/i]', '[b]', '[/b]', '[pl]', '[/pl]' );
$replace = array( '<i>', '</i>', '<strong>', '</strong>', '', '' );

function _convert($content) { 
    if(!mb_check_encoding($content, 'UTF-8') 
        OR !($content === mb_convert_encoding(mb_convert_encoding($content, 'UTF-32', 'UTF-8' ), 'UTF-8', 'UTF-32'))) { 

        $content = mb_convert_encoding($content, 'UTF-8'); 

        if (mb_check_encoding($content, 'UTF-8')) { 
            return $content;
        } else { 
            echo "<p>Couldn't convert to UTF-8.</p>";
        } 
    } 
} 

function _clean($content) {
    $content = _convert( $content );
    /* edited out because otherwise all HTML appears as &lt;html&gt; rather than <html> */
    //$content = htmlentities( $content, ENT_QUOTES, "UTF-8" );
    $content = str_replace( $search, $replace, $content );

    return $content;
}

This prevents some fields from being imported into the new database, and it does not convert the bbCode.
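To narrow this down, here is a minimal sketch of two general PHP behaviours that might be related to these symptoms. I have not confirmed either against my real data, and the function names (`clean_with_args`, `convert_like_my_attempt`) are made up for illustration.

```php
<?php
$search  = array('[b]', '[/b]');
$replace = array('<strong>', '</strong>');

// Variables defined outside a function are not visible inside it
// unless they are passed in as arguments (or declared with `global`).
function clean_with_args($content, array $search, array $replace)
{
    return str_replace($search, $replace, $content);
}

// Same shape as my _convert(): the only return sits inside the `if`.
function convert_like_my_attempt($content)
{
    if (!mb_check_encoding($content, 'UTF-8')) {
        // Assumption: the source bytes are ISO-8859-1 (latin1).
        return mb_convert_encoding($content, 'UTF-8', 'ISO-8859-1');
    }
    // No return on this path, so input that is already valid UTF-8
    // (including plain ASCII) comes back as null.
}

echo clean_with_args('[b]x[/b]', $search, $replace), "\n"; // <strong>x</strong>
var_dump(convert_like_my_attempt('plain ascii'));          // NULL
```

If that is what is happening, fields containing only ASCII would be the ones that go missing, and the bbCode would stay untouched because `$search` and `$replace` are undefined inside `_clean()`.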

If I use the following code instead, it mostly works:

$var = str_replace( $search, $replace, htmlentities( $row["var"], ENT_QUOTES, "UTF-8" ) );

However, certain fields that contain what I believe are Czech/Croatian characters do not appear at all.
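One thing I can reproduce in isolation, assuming the rows really come back as raw latin1 bytes: `htmlentities()` returns an empty string when the input is not valid for the charset it is told to expect. The sample string below is invented for the test, not taken from my data.

```php
<?php
// "á" stored as the single latin1 byte 0xE1, which is not valid UTF-8 on its own.
$latin1 = "Leu\xE1n";

// Declared as UTF-8: the invalid byte makes htmlentities() return an empty string
// (no ENT_SUBSTITUTE or ENT_IGNORE flag is set here).
$asUtf8 = htmlentities($latin1, ENT_QUOTES, 'UTF-8');

// Declared as ISO-8859-1: the byte is converted to a named entity.
$asLatin1 = htmlentities($latin1, ENT_QUOTES, 'ISO-8859-1');

var_dump($asUtf8);   // string(0) ""
var_dump($asLatin1); // string(12) "Leu&aacute;n"
```

This would match fields disappearing entirely, although I cannot say for certain that it is the cause for the Czech/Croatian rows.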

Does anybody have recommendations for how I can, in the order given in the list above, correctly convert the data from the old format to the new one?
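To make the question concrete, this is roughly the shape of function I have in mind. It is only a sketch: `legacy_to_html` is a hypothetical name, and it assumes the stored bytes really are ISO-8859-1. If the Czech/Croatian text was saved in some other code page, the source encoding argument would have to change.

```php
<?php
// Hypothetical helper; the name and signature are illustrative only.
function legacy_to_html(string $raw): string
{
    $search  = array('[i]', '[/i]', '[b]', '[/b]', '[pl]', '[/pl]');
    $replace = array('<i>', '</i>', '<strong>', '</strong>', '', '');

    // 1. Convert the latin1 bytes to UTF-8 (assumes the input is ISO-8859-1).
    $utf8 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');

    // 2. Turn every code point from U+0080 upwards into a decimal numeric entity.
    $convmap = array(0x80, 0x10FFFF, 0, 0x1FFFFF);
    $html = mb_encode_numericentity($utf8, $convmap, 'UTF-8');

    // 3. Convert the bbCode tags to HTML.
    $html = str_replace($search, $replace, $html);

    // 4. Encode quotes only, leaving < and > untouched.
    return str_replace(array("'", '"'), array('&#039;', '&quot;'), $html);
}

echo legacy_to_html("[i]Leu\xE1n[/i] \"x\""), "\n";
// <i>Leu&#225;n</i> &quot;x&quot;
```

The result is plain ASCII, so it is also valid UTF-8 for the new database. One caveat I can already see: step 4 would also encode quotes inside any HTML attributes already present in the old content.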