My requirements are:

  • Must be able to dynamically add user-defined fields, each associated with a data type
  • Must be able to query UDF fields quickly
  • Must be able to do calculations on UDF fields based on data type
  • Must be able to sort UDF fields based on data type

Additional Information:

  • I am primarily looking for performance
  • There are a few million Master records which can have UDF data attached
  • When I last checked, there were over 50 million UDF records in our current database
  • Most of the time, a UDF field is only attached to a few thousand of the Master records, not all of them
  • UDF fields are not joined on or used as keys. They are just data used for queries or reports

Options:

  1. Create a big table with StringValue1, StringValue2... IntValue1, IntValue2,... etc. I don't like this idea, but I would consider it if someone can tell me it is better than the other ideas, and why.

  2. Create a dynamic table which adds a new column on demand, as needed. I also don't like this idea, since I feel performance would be slow unless you indexed every column.

  3. Create a single table containing UDFFieldName, UDFDataType, and Value. When a new UDF field gets added, generate a View which pulls just that data and parses it into whatever type is specified. Items which don't meet the parsing criteria return NULL. (See the sketch after this list.)

  4. Create multiple UDF tables, one per data type. So we'd have tables for UDFStrings, UDFDates, etc. Probably would do the same as #2 and auto-generate a View whenever a new field gets added.

  5. XML DataTypes? I haven't worked with these before but have seen them mentioned. Not sure if they'd give me the results I want, especially with performance.

  6. Something else?
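
For concreteness, here is a minimal sketch of what #3 might look like, assuming SQL Server-style syntax (the RDBMS is not stated here) and made-up names such as UDFValue, MasterID, and OrderQty: all values live in one table as text, and a generated View casts each field back to its declared type, with TRY_CAST returning NULL for rows that fail the parse.

    -- Option 3 sketch (illustrative names, SQL Server-flavored SQL):
    -- one row per Master record per UDF, with the value stored as text.
    CREATE TABLE UDFValue (
        MasterID     INT          NOT NULL,   -- references the Master table
        UDFFieldName VARCHAR(100) NOT NULL,
        UDFDataType  VARCHAR(20)  NOT NULL,   -- e.g. 'string', 'int', 'date'
        Value        VARCHAR(400) NULL,
        PRIMARY KEY (MasterID, UDFFieldName)
    );

    -- Auto-generated View for one UDF, casting the text back to its type.
    -- TRY_CAST yields NULL for values that do not parse as INT.
    CREATE VIEW UDF_OrderQty AS
    SELECT MasterID,
           TRY_CAST(Value AS INT) AS OrderQty
    FROM   UDFValue
    WHERE  UDFFieldName = 'OrderQty';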

If performance is the primary concern, I would go with #6... a table per UDF (really, this is a variant of #2). This answer is specifically tailored to this situation and the description of the data distribution and access patterns given.
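
As a rough illustration (table and column names here are made up, not from the question), each UDF becomes its own narrow, natively typed table keyed by the Master record's primary key, so it only ever holds rows for the Master records that actually carry that field:

    -- Table-per-UDF sketch (illustrative names).
    CREATE TABLE UDF_Color (
        MasterID INT         NOT NULL PRIMARY KEY
            REFERENCES Master (MasterID),
        Color    VARCHAR(20) NOT NULL
    );

    CREATE TABLE UDF_TargetShipDate (
        MasterID       INT  NOT NULL PRIMARY KEY
            REFERENCES Master (MasterID),
        TargetShipDate DATE NOT NULL
    );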

Pros:

  1. Since you indicate that some UDFs have values for only a small portion of the overall data set, a separate table would give you the best performance because that table will be only as large as it needs to be to support the UDF. The same holds true for the related indices.

  2. You also get a speed boost by limiting the amount of data that has to be processed for aggregations or other transformations. Splitting the data out into multiple tables lets you perform some of the aggregation and other statistical analysis on the UDF data, then join that result to the master table via foreign key to get the non-aggregated attributes. (See the query sketch after this list.)

  3. You can use table/column names that reflect what the data really is.

  4. You have complete control to use data types, check constraints, default values, etc. to define the data domains. Don't underestimate the performance hit caused by on-the-fly data type conversion. Such constraints also help RDBMS query optimizers develop more efficient plans.

  5. If you ever need to use foreign keys, built-in declarative referential integrity is rarely out-performed by trigger-based or application-level constraint enforcement.
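
A hedged sketch of the access pattern from Pro #2, again with made-up names such as UDF_Weight and CustomerName: any scan or aggregate touches only the small per-UDF table, and a report joins that narrow table back to the master table by foreign key for the non-aggregated attributes.

    -- The aggregate scans only the small UDF_Weight table.
    SELECT COUNT(*)    AS RecordsWithWeight,
           AVG(Weight) AS AvgWeight
    FROM   UDF_Weight;

    -- A report joins the narrow UDF table back to Master by foreign key
    -- to pull the non-aggregated attributes.
    SELECT m.MasterID, m.CustomerName, w.Weight
    FROM   UDF_Weight AS w
    JOIN   Master     AS m ON m.MasterID = w.MasterID
    WHERE  w.Weight > 100;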

Cons:

  1. This could create a lot of tables. Enforcing schema separation and/or a naming convention would alleviate this.

  2. There is more application code needed to operate the UDF definition and management. I expect this is still less code than would be needed for the original options 1, 3, & 4.

Other Factors:

  1. If there is anything about the nature of the data that would make it sensible for certain UDFs to be grouped, that should be encouraged. That way, those data elements can be combined into a single table. For example, let's say you have UDFs for color, size, and price. The tendency in the data is that most instances of this data look like

      'red', 'large', 45.03

     rather than

      NULL, 'medium', NULL

     In this case, you won't incur a noticeable speed penalty by combining the 3 columns in 1 table, because few values would be NULL, and you avoid creating 2 more tables, which is 2 fewer joins needed when you need to access all 3 columns. (A sketch of such a grouped table follows this list.)

  2. If you hit a performance wall due to a UDF that is heavily populated and frequently used, then that should be considered for inclusion in the master table.

  3. Logical table design can take you to a certain point, but when the record counts get truly massive, you should also start looking at what table partitioning options are provided by your RDBMS of choice.
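
A minimal sketch of the grouped-UDF idea from consideration #1, with illustrative names: because color, size, and price almost always appear together, one table (and one join) serves all three, and NULLs stay rare.

    -- Grouped-UDF sketch for the color/size/price example (illustrative names).
    CREATE TABLE UDF_ItemAttributes (
        MasterID INT           NOT NULL PRIMARY KEY
            REFERENCES Master (MasterID),
        Color    VARCHAR(20)   NULL,
        Size     VARCHAR(20)   NULL,
        Price    DECIMAL(10,2) NULL
    );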