I am storing some very basic information about the "data sources" coming into my application. These data sources can be a document (e.g. PDF), audio (e.g. MP3), or video (e.g. AVI). Say, for example, I am only interested in the filename of the data source. Thus, I have the following table:
DataSource
  Id (PK)
  Filename
For each data source, I also need to store several of its attributes. An example for a PDF would be "number of pages." An example for audio would be "bit rate." An example for video would be "duration." Each data source will have different requirements for the attributes that need to be stored. So, I have modeled the "data source attribute" this way:
DataSourceAttribute
  Id (PK)
  DataSourceId (FK)
  Name
  Value
Thus, I'd have records such as these:
DataSource->Id = 1
DataSource->Filename = 'mydoc.pdf'

DataSource->Id = 2
DataSource->Filename = 'mysong.mp3'

DataSource->Id = 3
DataSource->Filename = 'myvideo.avi'

DataSourceAttribute->Id = 1
DataSourceAttribute->DataSourceId = 1
DataSourceAttribute->Name = 'TotalPages'
DataSourceAttribute->Value = '10'

DataSourceAttribute->Id = 2
DataSourceAttribute->DataSourceId = 2
DataSourceAttribute->Name = 'BitRate'
DataSourceAttribute->Value = '16'

DataSourceAttribute->Id = 3
DataSourceAttribute->DataSourceId = 3
DataSourceAttribute->Name = 'Duration'
DataSourceAttribute->Value = '1:32'
My problem is that this does not seem to scale. For example, say I need to query for all the PDF documents along with their total number of pages:
Filename, TotalPages
'mydoc.pdf', '10'
'myotherdoc.pdf', '23'
...
The JOINs needed to produce the above result are just too costly. How should I address this problem?
It sounds like you want something a little looser than a typical relational database. This seems like a good candidate for something like Lucene or MongoDB. Lucene is an indexing engine that allows any type of document to be stored and indexed. MongoDB sits somewhere between an RDBMS and free-form document storage. JSON in some form or another (MongoDB is a good one) should fit nicely.
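To make the document-style idea concrete, here is a minimal sketch in plain Python, with no MongoDB or Lucene involved. Each data source is a single JSON-style document that carries its own attributes, so listing the PDFs with their page counts needs no join. The field names (`filename`, `type`, `attributes`) and the helper `pdf_page_counts` are assumptions made for illustration, not part of any particular product's API.

```python
import json

# Each data source is one self-contained document; attributes vary by type.
# Field names are illustrative assumptions, not a schema from the question.
data_sources = [
    {"id": 1, "filename": "mydoc.pdf", "type": "pdf", "attributes": {"TotalPages": 10}},
    {"id": 2, "filename": "mysong.mp3", "type": "audio", "attributes": {"BitRate": 16}},
    {"id": 3, "filename": "myvideo.avi", "type": "video", "attributes": {"Duration": "1:32"}},
]


def pdf_page_counts(docs):
    """Return (filename, TotalPages) pairs for every PDF document."""
    return [
        (doc["filename"], doc["attributes"].get("TotalPages"))
        for doc in docs
        if doc["type"] == "pdf"
    ]


if __name__ == "__main__":
    # The documents serialize directly to JSON, the shape a document store would hold.
    print(json.dumps(data_sources[0]))
    print(pdf_page_counts(data_sources))
```

The trade-off is that the attribute set is no longer enforced by the database, so the application has to know which attributes each type of data source is expected to have.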
This might work, but define "too pricey"...
-- Filename is a column of datasource in the schema above, so only
-- the TotalPages attribute needs a join.
select datasource.id,
       datasource.filename,
       d1.id as d1id,
       d1.value as totalpages
from datasource
inner join datasourceattribute d1
    on datasource.id = d1.datasourceid
   and d1.name = 'TotalPages'
where datasource.filename like '%.pdf'
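To judge whether the join really is too pricey, it helps to run it against the sample rows. The following is a minimal sketch using SQLite through Python's standard `sqlite3` module. It assumes, as in the question's schema, that Filename is a column of DataSource and that only TotalPages comes from the attribute table. The helper names `build_db` and `pdf_pages` are hypothetical.

```python
import sqlite3


def build_db():
    """Create an in-memory database holding the sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE DataSource (
            Id INTEGER PRIMARY KEY,
            Filename TEXT NOT NULL
        );
        CREATE TABLE DataSourceAttribute (
            Id INTEGER PRIMARY KEY,
            DataSourceId INTEGER NOT NULL REFERENCES DataSource(Id),
            Name TEXT NOT NULL,
            Value TEXT
        );
        INSERT INTO DataSource VALUES
            (1, 'mydoc.pdf'), (2, 'mysong.mp3'), (3, 'myvideo.avi');
        INSERT INTO DataSourceAttribute VALUES
            (1, 1, 'TotalPages', '10'),
            (2, 2, 'BitRate', '16'),
            (3, 3, 'Duration', '1:32');
        """
    )
    return conn


def pdf_pages(conn):
    """One join per requested attribute: filename plus TotalPages for PDF files."""
    return conn.execute(
        """
        SELECT ds.Filename, a.Value AS TotalPages
        FROM DataSource ds
        INNER JOIN DataSourceAttribute a
            ON a.DataSourceId = ds.Id AND a.Name = 'TotalPages'
        WHERE ds.Filename LIKE '%.pdf'
        ORDER BY ds.Id
        """
    ).fetchall()


if __name__ == "__main__":
    print(pdf_pages(build_db()))
```

Each additional attribute in the result adds one more join against DataSourceAttribute, which is where the cost of this layout tends to show up.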
Scaling is one of the most common problems with EAV (Entity-Attribute-Value) data structures. In short, you have to query for the metadata (i.e. find the attributes) in order to get at the data. However, here is a query you can use to get the data you want:
Select DataSourceId
    , Min( Case When Name = 'TotalPages' Then Value End ) As TotalPages
    , Min( Case When Name = 'BitRate' Then Value End ) As BitRate
    , Min( Case When Name = 'Duration' Then Value End ) As Duration
From DataSourceAttribute
Group By DataSourceId
To improve performance, you will want an index on DataSourceId and possibly Name as well. To get the results you posted, you would do:
Select DataSource.FileName
    , Min( Case When DataSourceAttribute.Name = 'TotalPages' Then Value End ) As TotalPages
    , Min( Case When DataSourceAttribute.Name = 'BitRate' Then Value End ) As BitRate
    , Min( Case When DataSourceAttribute.Name = 'Duration' Then Value End ) As Duration
From DataSourceAttribute
    Join DataSource
        On DataSource.Id = DataSourceAttribute.DataSourceId
Group By DataSource.FileName
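As a rough check of the pivot approach, the sketch below runs it against the question's sample rows in SQLite via Python's standard `sqlite3` module, with a composite index on the attribute table. The index name `IX_DataSourceAttribute_Source_Name` and the helper `pivot_attributes` are made up for illustration, and whether the index actually helps depends on the database engine and data volume.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DataSource (
        Id INTEGER PRIMARY KEY,
        Filename TEXT NOT NULL
    );
    CREATE TABLE DataSourceAttribute (
        Id INTEGER PRIMARY KEY,
        DataSourceId INTEGER NOT NULL REFERENCES DataSource(Id),
        Name TEXT NOT NULL,
        Value TEXT
    );
    -- Hypothetical index name; it covers the join column and the attribute name.
    CREATE INDEX IX_DataSourceAttribute_Source_Name
        ON DataSourceAttribute (DataSourceId, Name);
    INSERT INTO DataSource VALUES
        (1, 'mydoc.pdf'), (2, 'mysong.mp3'), (3, 'myvideo.avi');
    INSERT INTO DataSourceAttribute VALUES
        (1, 1, 'TotalPages', '10'),
        (2, 2, 'BitRate', '16'),
        (3, 3, 'Duration', '1:32');
    """
)


def pivot_attributes(connection):
    """Turn attribute rows into one row per file, one column per attribute."""
    return connection.execute(
        """
        SELECT ds.Filename,
               MIN(CASE WHEN a.Name = 'TotalPages' THEN a.Value END) AS TotalPages,
               MIN(CASE WHEN a.Name = 'BitRate' THEN a.Value END) AS BitRate,
               MIN(CASE WHEN a.Name = 'Duration' THEN a.Value END) AS Duration
        FROM DataSourceAttribute a
        JOIN DataSource ds ON ds.Id = a.DataSourceId
        GROUP BY ds.Filename
        ORDER BY ds.Filename
        """
    ).fetchall()


if __name__ == "__main__":
    for row in pivot_attributes(conn):
        print(row)
```

Attributes a file does not have come back as NULL (`None` in Python), because MIN skips the NULLs produced by the non-matching CASE branches.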