I've got a saved procedure which handles adding nodes inside a tree. Essentially the table structure is

label VARCHAR(1) /* the value of the node which is a character */
parent_id INT /* id of the parent node */

Here's my saved procedure:

    takes a word, and adds every character in to the table 
    where every character is a child of the previous character
    the first character of every word is a child of the root
    /* (parent_id = 0) => child of root */
    DECLARE pid INT DEFAULT 0; /* parent id */
    DECLARE strlength INT;
    DECLARE query_count INT DEFAULT 0;
    DECLARE active_char VARCHAR(1);
    SET strlength = LENGTH(word);
    /* loop through the word */
    WHILE j <= strlength DO
        /* get a single character from word */
        SET active_char = SUBSTR(word,j,1);
        /* if the character doesn't already exist, insert it */
        SELECT COUNT(*) INTO query_count FROM tree
        WHERE parent_id = pid AND label = active_char;
        IF (query_count = 0) THEN
            INSERT INTO tree (label, parent_id)
            VALUES (active_char,pid);
        END IF;
        /* Set the new parent id */
        SELECT id INTO npid FROM tree
        WHERE label = active_char AND parent_id = pid;
        SET pid = npid;
        SET j = j + 1;
END //

I am sure you will find a couple of tweaks I'm able to make to create the process a little more effective however i can't think about something that would considerably lessen the time required.

I am dealing with many different words and this procedure operates a few 100,000 occasions which means lots of card inserts and lots of queries. It requires hrs possibly days(unsure because I threw in the towel waiting and stopped the procedure).

The truth is, I do not think I'm able to perform a bulk place because every place is dependent on the previous place.

I'm wondering if there's a way to produce a virtual table that's saved in primary memory to complete each one of these procedures fast after which just save the end result to the actual table.

Right now, the only real possible solution I'm able to think about would be to build the tree in PHP after which perform a bulk place. I believe this ought to be faster but I am unsure as much as what degree.

Any help could be really appreciated.


Although I am afraid I can not claim that they can used it myself, passing the information lower towards the saved procedure as XML and processing it as being described here would appear an acceptable approach. N.B. MySQL 5.1 or greater.

Some ideas:

  1. "Lowering the time" means you need to show us the schema, indexes, queries, and EXPLAIN (each one of the claims within the procedure).
  2. Use INSERT IGNORE rather than the SELECT/INSERT pair that you are using now.
  3. You don't have to SELECT the ID out - use LAST_INSERT_ID()
  4. Why choose into npid after which copy into pid? Just choose into pid.