After reading through the guidelines from this excellent Nettuts+ article I have develop a table schema that will separate highly volatile data using their company tables exposed to heavy reads and simultaneously lower the amount of tables necessary for the entire database schema, however I am unsure if this is an excellent idea because it does not stick to the rules of normalization and I must hear your advice, this is actually the general idea:


I have four kinds of customers patterned inside a Class Table Inheritance structure, within the primary "user" table I store data present with all of the customers (id, username, password, several flags, ...) together with some TIMESTAMP fields (date_created, date_updated, date_activated, date_lastLogin, ...).

To quote the end #16 in the Nettuts+ article pointed out above:

Example 2: You've got a “last_login” area inside your table. It updates every time a person logs to the website. But every update on the table causes the query cache for your table to become flushed. You are able to put that area into another table to help keep updates for your customers table low.

Now it will get even more difficult, I have to keep an eye on some user statistics like

  • the number of unique occasions a person profile was seen
  • the number of unique occasions a ad from the specific kind of user was clicked on
  • the number of unique occasions a publish from the specific kind of user was seen
  • and so forth...

During my fully stabilized database this accumulates to around eight to ten additional tables, it isn't a great deal but I must keep things simple basically could, so I have develop the next "events" table:

|------|----------------|----------------|---------------------|-----------|
| ID   | TABLE          | EVENT          | DATE                | IP        | 
|------|----------------|----------------|---------------------|-----------|
| 1    | user           | login          | 2010-04-19 00:30:00 | 127.0.0.1 |
|------|----------------|----------------|---------------------|-----------|
| 1    | user           | login          | 2010-04-19 02:30:00 | 127.0.0.1 |
|------|----------------|----------------|---------------------|-----------|
| 2    | user           | created        | 2010-04-19 00:31:00 | 127.0.0.2 |
|------|----------------|----------------|---------------------|-----------|
| 2    | user           | activated      | 2010-04-19 02:34:00 | 127.0.0.2 |
|------|----------------|----------------|---------------------|-----------|
| 2    | user           | approved       | 2010-04-19 09:30:00 | 217.0.0.1 |
|------|----------------|----------------|---------------------|-----------|
| 2    | user           | login          | 2010-04-19 12:00:00 | 127.0.0.2 |
|------|----------------|----------------|---------------------|-----------|
| 15   | user_ads       | created        | 2010-04-19 12:30:00 | 127.0.0.1 |
|------|----------------|----------------|---------------------|-----------|
| 15   | user_ads       | impressed      | 2010-04-19 12:31:00 | 127.0.0.2 |
|------|----------------|----------------|---------------------|-----------|
| 15   | user_ads       | clicked        | 2010-04-19 12:31:01 | 127.0.0.2 |
|------|----------------|----------------|---------------------|-----------|
| 15   | user_ads       | clicked        | 2010-04-19 12:31:02 | 127.0.0.2 |
|------|----------------|----------------|---------------------|-----------|
| 15   | user_ads       | clicked        | 2010-04-19 12:31:03 | 127.0.0.2 |
|------|----------------|----------------|---------------------|-----------|
| 15   | user_ads       | clicked        | 2010-04-19 12:31:04 | 127.0.0.2 |
|------|----------------|----------------|---------------------|-----------|
| 15   | user_ads       | clicked        | 2010-04-19 12:31:05 | 127.0.0.2 |
|------|----------------|----------------|---------------------|-----------|
| 2    | user           | blocked        | 2010-04-20 03:19:00 | 217.0.0.1 |
|------|----------------|----------------|---------------------|-----------|
| 2    | user           | deleted        | 2010-04-20 03:20:00 | 217.0.0.1 |
|------|----------------|----------------|---------------------|-----------|

Essentially the ID refers back to the primary key (id) area within the TABLE table, In my opinion the relaxation ought to be pretty straightforward. One factor that I have arrived at as with this design is will be able to keep an eye on all of the user logins rather than only the 4g iphone, and therefore generate some interesting metrics with this data.

Because of the growing character from the events table I additionally considered making some optimizations, for example:

  • #9: Since there's merely a finite quantity of tables along with a finite (and predetermined) quantity of occasions, the TABLE and EVENTS posts might be setup as ENUMs rather than VARCHARs in order to save some space.
  • #14: Store IPs as UNSIGNED INTs with INET_ATON() rather than VARCHARs.
  • Store DATEs as TIMESTAMPs rather than DATETIMEs.
  • Make use of the ARCHIVE (or even the CSV?) engine rather than InnoDB / MyISAM.
    • Only INSERTs and SELECTs are supported, and information is compressed quickly.

Overall, each event would only consume 14 (uncompressed) bytes that is okay for my traffic I suppose.

Pros:

  • Capability to store more in depth data (for example logins).
  • You don't need to design (and code for) almost twelve additional tables (dates and statistics).
  • Reduces a couple of posts per table and keeps volatile data separated.

Cons:

  • Non-relational (still less bad as EAV):
    • SELECT * FROM events WHERE id = 2 AND table = 'user' ORDER BY date DESC();
  • 6 bytes overhead per event (ID, TABLE and EVENT).

I am more inclined to choose this method because the pros appear to far over-shadow the cons, but I am still a bit reluctant... Am I missing something? What exactly are your ideas about this?

Thanks!


@coolgeek:

One factor which i do slightly in a different way would be to maintain an entity_type table, and employ its ID in the item_type column (inside your situation, the 'TABLE' column). You would like to perform the same factor by having an event_type table.

Simply to be obvious, you mean I ought to add one more table that maps which occasions are permitted inside a table and employ the PK of this table within the occasions table rather than getting a TABLE / EVENT pair?


@ben:

They are all statistics based on existing data, are they not?

The extra tables are mainly associated with statistics however i the information does not already is available, some good examples:

user_ad_stats                          user_post_stats
-------------                          ---------------
user_ad_id (FK)                        user_post_id (FK)
ip                                     ip
date                                   date
type (impressed, clicked)

Basically drop these tables I have not a way to keep an eye on who, what or when, unsure how sights might help here.

To be sure that it must be separate, but more since it is essentially different data. What someone is and what someone does are two different things. I do not think unpredictability is really important.

I have heard it for both and that i could not find anything within the MySQL manual that states that each one is appropriate. Anyway, To be sure along with you that they must be separated tables simply because they represent types of data (using the additional advantage to be more in depth than the usual regular approach).

I believe you are missing the forest for the trees, as they say.

The predicate for the table could be "User ID from IP IP sometimes DATE EVENTed to TABLE" which appears reasonable, but you will find issues.

Things I intended for "less bad as EAV" is the fact that all records consume a linear structure and they're pretty simple to query, there's no hierarchical structure so that all queries can be achieved having a simple SELECT.

Relating to your second statement, I believe you understood me wrong here the Ip isn't always connected using the user. The table structure should read something similar to this:

Ip (IP) did something (EVENT) towards the PK (ID) from the table (TABLE) on date (DATE).

For example, within the last row of my example above it will read that IP 217...1 (some admin), erased the consumer #2 (whose last known IP is 127...2) at 2010-04-20 03:20:00.

You are able to still join, say, user occasions to customers, however, you can't implement a foreign key constraint.

Indeed, that's my primary concern. However I am not totally sure so what can fail with this particular design that could not fail having a traditional relational design. I'm able to place some caveats but as lengthy because the application playing using the database knows what it's doing I suppose there should not be any problems.

Another factor that counts within this argument is the fact that I'll be storing a lot more occasions, and every event will a lot more than double in comparison towards the original design, it can make sense to make use of the ARCHIVE storage engine here, the only real factor could it be does not support FKs (neither UPDATEs or DELETEs).