I am creating a MySQL database that holds records about special substrings of DNA in types of yeast. My table looks like this:
+--------------+---------+------+-----+---------+-------+
| Field        | Type    | Null | Key | Default | Extra |
+--------------+---------+------+-----+---------+-------+
| species      | text    | YES  | MUL | NULL    |       |
| region       | text    | YES  | MUL | NULL    |       |
| gene         | text    | YES  | MUL | NULL    |       |
| startPos     | int(11) | YES  |     | NULL    |       |
| repeatLength | int(11) | YES  |     | NULL    |       |
| coreLength   | int(11) | YES  |     | NULL    |       |
| sequence     | text    | YES  | MUL | NULL    |       |
+--------------+---------+------+-----+---------+-------+
There are roughly 1.8 million records. In one type of query I want to see how many DNA substrings are associated with each combination of species and region, so I issue this query:
select species, region, count(*) from mytablename group by species, region;
The species and region columns each have only two possible values (conserved/scer for species, and promoter/coding for region), yet this query takes about thirty seconds.
Is this a normal amount of time to expect for this kind of query, given the size of the table? Is it slow because I am using text fields rather than simple integer or boolean values? (I prefer text fields because several non-CS scientists will be using the DB.) Any other ideas and suggestions would be welcome.
Please excuse me if this is a boneheaded question; I'm an SQL neophyte.
P.S. I have also seen this, but the suggested solution does not seem relevant to what I am doing.
EDIT: Converting those fields to VARCHARs reduced the runtime to ~2.5 seconds. Note that I also timed it against ENUMs, which had a similar timing.
Why are all of your string-based columns defined as TEXT? If you read the performance comparison, you'll see that TEXT was ~3x slower than a VARCHAR column using identical indexing: http://forums.mysql.com/read.php?24,105964,105964
If your fields are only ever going to have 2 values, you are much better off making them booleans. You should also make everything NOT NULL unless there is a real reason you need it to be nullable. Also take a look at the ENUM type for a better way to use a finite number of human-readable values for a column.
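As a rough sketch of those column changes: the statement below assumes the table is called mytablename (a placeholder) and that the only values present are the four listed in the question. Existing indexes on these columns (the MUL entries) were presumably created with prefix lengths and may need to be dropped first, so treat this as a starting point, not a tested migration.

```sql
-- Assumes the table is named mytablename (placeholder) and that no row
-- holds NULL or any value outside the four listed in the question.
ALTER TABLE mytablename
    MODIFY species ENUM('conserved', 'scer') NOT NULL,
    MODIFY region  ENUM('promoter', 'coding') NOT NULL;

-- Alternative: plain VARCHARs (the length 16 is an arbitrary choice).
-- ALTER TABLE mytablename
--     MODIFY species VARCHAR(16) NOT NULL,
--     MODIFY region  VARCHAR(16) NOT NULL;
```

ENUM keeps the values human-readable for non-CS users while storing them compactly, which is why it is a reasonable middle ground between TEXT and booleans here.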
As for slowness, the first thing to try is to create indexes on your columns. For the particular query you are showing here, an index on
species, region should make a massive difference:
create index species_region_idx on mytablename (species, region);
should do it.
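To check whether a composite index on species and region is actually picked up, you could run the query through EXPLAIN. This is only a sketch: mytablename is a placeholder, and the exact plan depends on your MySQL version and column types.

```sql
-- mytablename is a placeholder for the real table name.
EXPLAIN
SELECT species, region, COUNT(*)
FROM mytablename
GROUP BY species, region;
```

If the key column of the output names the composite index and the Extra column shows "Using index", the query is being answered from the index alone without reading the table rows.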