I need to get summary data from many rows. The summary fields are counts of how many records have each value in various fields. For instance, for a table with individuals' age, city, job, etc., the summary data would have fields like "countManager" and "countCodeMonkey" for each job, then "countChicago", "countNewYork", etc. for each city.
I understand that the simple way to get this is:
    select count(*) from table group by age, city, job
But this is vertical counting - a new row for every value I want counted. Instead of rows holding the counts, I want fields, because I have other fields to group by, for instance state. So I want my results to look like this:
    | State | countManager | countMonkey |
    | IL    | 3            | 25          |
    | NY    | 5            | 40          |
I am looking at two ways to do this. I have the first one implemented, and it takes twenty minutes to run. I'm wondering whether the alternative would be faster.
The current way looks like this:
    create view managers as
        select state, count(*) as theCount
        from table
        where job = 'Manager'
        group by state;

    create view monkeys as
        select state, count(*) as theCount
        from table
        where job = 'Monkey'
        group by state;

    select managers.state, managers.theCount as managers, monkeys.theCount as monkeys
    from managers
    left join monkeys on managers.state = monkeys.state;
In the real situation, there are about 20 more views, and therefore 20 more joins.
I am considering the following horizontal count method instead:
    select state,
           sum(case when job='Manager' then 1 else 0 end) as managers,
           sum(case when job='Monkey' then 1 else 0 end) as monkeys
    from table
    group by state;
This removes the joins. But I don't know anything about the performance of the "sum case" construct. Will this be faster, comparable, or much slower? Does the engine have to scan the rows multiple times, once for each counted field? Or is it smart enough to compute all the counts in a single pass, checking each value and incrementing the right sum?
I could spend part of a day writing a script to generate a pile of garbage data to test this, but I'd still like to understand how the db engine behaves, which I could not find elsewhere online.
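For anyone wanting to try this without writing a full data-generation script, here is a minimal sketch using Python's built-in sqlite3. The table name `people` and the two states/jobs are placeholders standing in for the question's real schema, and SQLite's optimizer is not necessarily representative of the asker's engine:

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table people (state text, job text)")

# Generate garbage data: random state/job pairs.
random.seed(0)
rows = [(random.choice(["IL", "NY"]), random.choice(["Manager", "Monkey"]))
        for _ in range(10_000)]
conn.executemany("insert into people values (?, ?)", rows)

# Horizontal count: one output row per state, one column per job.
result = conn.execute("""
    select state,
           sum(case when job='Manager' then 1 else 0 end) as managers,
           sum(case when job='Monkey' then 1 else 0 end) as monkeys
    from people
    group by state
    order by state
""").fetchall()
print(result)  # [('IL', m1, k1), ('NY', m2, k2)]
```

Wrapping the two queries in timing calls (e.g. `time.perf_counter()`) on a larger row count gives a rough first impression, though only a test on the real engine and real data volume is conclusive.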
It totally depends on the engine and on how you need to see the data, but your second option should definitely run faster.
Twenty minutes for the first query is absurd though, unless you have literally billions of rows. In that case you should consider archiving data on a monthly/weekly basis, with precomputed aggregates in a table that you can slice and dice as needed.
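The precomputed-aggregate idea in this answer might look roughly like the following sketch (table and column names such as `job_counts_monthly` and `created` are invented for illustration; a real setup would refresh the summary table on a schedule):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table people (state text, job text, created date);
insert into people values
    ('IL', 'Manager', '2012-01-03'),
    ('IL', 'Monkey',  '2012-01-05'),
    ('NY', 'Manager', '2012-02-01');

-- Precomputed monthly aggregate: the expensive group-by runs once per
-- period, and reports then slice this small table instead of the raw rows.
create table job_counts_monthly (month text, state text, job text, theCount int);
insert into job_counts_monthly
    select strftime('%Y-%m', created), state, job, count(*)
    from people
    group by 1, 2, 3;
""")
agg = conn.execute(
    "select * from job_counts_monthly order by month, state, job").fetchall()
print(agg)
```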
If there's no concurrency between your transaction and others, the "sum case" approach is a good choice. Aggregate functions combined with GROUP BY degrade performance. Keep two things in mind: "divide and conquer", and "numeric data is faster than text data".
Create a data warehouse (a single table, or a separate database) to avoid concurrency and speed up processing.
CPUs are great calculators: convert your categorical data (e.g. "Woman") into numeric data (e.g. 62) and improve your mining.
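A minimal sketch of that encoding step (the particular code assignment is arbitrary; any stable mapping from category to a small integer works):

```python
# Map categorical text values to small integer codes before loading the
# warehouse table; integer comparisons and group-bys are cheaper than text.
jobs = ["Manager", "Monkey", "Manager", "Analyst"]

codes = {}
# setdefault assigns the next free code the first time a value is seen.
encoded = [codes.setdefault(job, len(codes)) for job in jobs]

print(encoded)  # [0, 1, 0, 2]
print(codes)    # {'Manager': 0, 'Monkey': 1, 'Analyst': 2}
```

The `codes` dictionary would be persisted as a small lookup table so reports can translate the integers back to labels.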
I won't make a recommendation about database vendors or platform options; these points are just relational algebra and apply everywhere.