I am working on a project that involves time series statistics, and I need to let users upload their own files containing their own time series (i.e. numbers with dates), for example as a .csv file. The data in their files would then be available at any time for use in our system.
How should I do that? The ideas I have considered:
- Create a table every time a user uploads a file (and store the name of that table somewhere). If I have lots of users uploading lots of data, I might end up with a lot of tables.
- Create one big fat monster table with essentially 3 or 4 columns: the date of the value, the value, the dataset name (and/or the dataset's owner). Everything is uploaded into that table, so when Bob needs his weather data I just select (date, value) where owner = Bob and datasetname = weatherdata.
- In-between solution: one table per user, and all of Bob's datasets go in Bob's table.
- Something else: just save the .csv file somewhere and read it when it's needed.
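A minimal sketch of the second idea (one long table keyed by owner and dataset name), using Python's built-in `sqlite3` module as a stand-in for PostgreSQL. The table name `series_points`, its columns, and the `load_csv` helper are hypothetical; the sketch also assumes uploads have no header row and use ISO 8601 dates, so that sorting the date text sorts chronologically.

```python
import csv
import io
import sqlite3

# "One big table" sketch; SQLite stands in for PostgreSQL.
# Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE series_points (
           owner        TEXT NOT NULL,
           dataset_name TEXT NOT NULL,
           obs_date     TEXT NOT NULL,  -- ISO 8601 text sorts chronologically
           value        REAL NOT NULL,
           PRIMARY KEY (owner, dataset_name, obs_date)
       )"""
)


def load_csv(owner, dataset_name, csv_text):
    """Store every "date,value" line of an upload under one owner/dataset.

    Hypothetical helper; assumes no header row and ISO 8601 dates."""
    rows = [
        (owner, dataset_name, date, float(value))
        for date, value in csv.reader(io.StringIO(csv_text))
    ]
    with conn:  # commit on success, roll back if any row fails
        conn.executemany("INSERT INTO series_points VALUES (?, ?, ?, ?)", rows)
    return len(rows)


loaded = load_csv("Bob", "weatherdata", "2012-01-02,3.5\n2012-01-01,1.0\n2012-01-03,2.25\n")

# "select (date, value) where owner = Bob and datasetname = weatherdata"
bob_weather = conn.execute(
    "SELECT obs_date, value FROM series_points "
    "WHERE owner = ? AND dataset_name = ? ORDER BY obs_date",
    ("Bob", "weatherdata"),
).fetchall()
print(loaded, bob_weather)
```

The composite primary key doubles as the index that makes "all of Bob's weatherdata, in date order" a single range scan.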
I keep reading that it's bad practice to have a variable number of tables (and I believe it). However, my situation is slightly different from other questions I have seen on this site (many people seem to want to create one table per user, when they should create one row per user).
Some more information:
- time series data could have hundreds of thousands of observations, maybe millions
- a priori, stored data shouldn't be modified later. However, I guess it might be useful to let users append new data to their time series.
- a priori, I won't need complicated SQL SELECT statements. I just want to read Bob's weather data, and I'll most likely use it in chronological order, although who knows what tomorrow will bring.
- I'm using PostgreSQL 9.1, if that is of any importance.
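The append-only requirement (stored rows never change, but new dates may be added) and the chronological reads can be sketched as follows, again with `sqlite3` standing in for PostgreSQL and hypothetical table, column, and function names. New rows are inserted only when their date is not already present, using `INSERT ... SELECT ... WHERE NOT EXISTS`; that pattern is also available in PostgreSQL 9.1, which predates `ON CONFLICT` (added in 9.5). It is not race-free if several writers append to the same series at once.

```python
import sqlite3

# Append-only sketch; SQLite stands in for PostgreSQL.
# Table, column and function names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE series_points (
           dataset_id INTEGER NOT NULL,
           obs_date   TEXT    NOT NULL,
           value      REAL    NOT NULL,
           PRIMARY KEY (dataset_id, obs_date)  -- at most one value per date
       )"""
)
conn.executemany(
    "INSERT INTO series_points VALUES (?, ?, ?)",
    [(1, "2012-01-01", 1.0), (1, "2012-01-02", 3.5)],
)


def count_points(dataset_id):
    return conn.execute(
        "SELECT COUNT(*) FROM series_points WHERE dataset_id = ?", (dataset_id,)
    ).fetchone()[0]


def append_points(dataset_id, points):
    """Add only dates the series doesn't have yet; existing rows stay untouched.

    Returns the number of rows actually added."""
    before = count_points(dataset_id)
    with conn:
        conn.executemany(
            "INSERT INTO series_points (dataset_id, obs_date, value) "
            "SELECT ?, ?, ? WHERE NOT EXISTS ("
            "  SELECT 1 FROM series_points WHERE dataset_id = ? AND obs_date = ?)",
            [(dataset_id, d, v, dataset_id, d) for d, v in points],
        )
    return count_points(dataset_id) - before


# 2012-01-02 is already stored, so only the two later dates are appended.
added = append_points(1, [("2012-01-02", 99.0), ("2012-01-03", 2.25), ("2012-01-04", 4.0)])

# Read the series back in chronological order.
series = conn.execute(
    "SELECT obs_date, value FROM series_points WHERE dataset_id = ? ORDER BY obs_date",
    (1,),
).fetchall()
print(added, series)
```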
EDIT: Reading some answers, I realize I may not have explained myself well. I should have said that I'm explicitly working in an SQL environment: I already have a user table, when I write "table" I really mean "relation", my 4 ideas involve foreign keys somewhere, and RDBMS normalization is the paradigm unless something else is better. (None of this means I'm against NoSQL solutions.)
I'd go with the "big fat monster table". This is how relational databases are supposed to work, although you should normalize it (create one table for users, another for data sets, and another for the data points). Having multiple tables with identical schemas is a bad idea from every angle: design, management, security, even querying. Are you sure you'll never want to combine information from two data sets?
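The normalized version of that table (users, data sets, data points) might look roughly like this. It is a sketch using `sqlite3` rather than PostgreSQL, with hypothetical table and column names; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`. The last query illustrates the point about combining data sets: once everything lives in one relation, putting two series side by side is an ordinary join on the date.

```python
import sqlite3

# Normalized sketch; SQLite stands in for PostgreSQL, names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE users (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE datasets (
        id       INTEGER PRIMARY KEY,
        owner_id INTEGER NOT NULL REFERENCES users (id),
        name     TEXT    NOT NULL,
        UNIQUE (owner_id, name)
    );
    CREATE TABLE data_points (
        dataset_id INTEGER NOT NULL REFERENCES datasets (id),
        obs_date   TEXT    NOT NULL,
        value      REAL    NOT NULL,
        PRIMARY KEY (dataset_id, obs_date)
    );
    INSERT INTO users (id, name) VALUES (1, 'Bob');
    INSERT INTO datasets (id, owner_id, name) VALUES (1, 1, 'weatherdata'), (2, 1, 'sales');
    INSERT INTO data_points VALUES
        (1, '2012-01-01', 1.0), (1, '2012-01-02', 3.5),
        (2, '2012-01-02', 10.0), (2, '2012-01-03', 12.0);
    """
)


def get_series(owner, dataset_name):
    """One user's named data set, in chronological order."""
    return conn.execute(
        """SELECT p.obs_date, p.value
           FROM data_points p
           JOIN datasets d ON d.id = p.dataset_id
           JOIN users u ON u.id = d.owner_id
           WHERE u.name = ? AND d.name = ?
           ORDER BY p.obs_date""",
        (owner, dataset_name),
    ).fetchall()


weather = get_series("Bob", "weatherdata")

# Combining two data sets (ids 1 and 2) on the dates they have in common.
combined = conn.execute(
    """SELECT w.obs_date, w.value, s.value
       FROM data_points w
       JOIN data_points s ON s.obs_date = w.obs_date
       WHERE w.dataset_id = 1 AND s.dataset_id = 2
       ORDER BY w.obs_date"""
).fetchall()
print(weather, combined)
```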
If you're really sure that each data set will be completely isolated, then you could also consider not using SQL at all. HDF (Hierarchical Data Format) was literally designed for this exact purpose: efficient storage and retrieval of "scientific data sets", which are very often time series data. "Tables" in HDF are called data sets; they can share definitions, they can be multidimensional (e.g. one dimension for the day, one for the time), and they're much cheaper than SQL tables.
I don't normally try to steer people away from SQL, but unusual situations sometimes demand unusual solutions. If you're going to end up with billions of rows in an SQL table (or more) and you have practically no other data to store, then SQL might not be the best solution for you.
Example T-SQL* for a possible design:
```sql
CREATE TABLE dbo.Datasets (
    ID          int      NOT NULL IDENTITY(1,1),
    OwnerUserID int      NOT NULL,
    Loaded      datetime NOT NULL,
    CONSTRAINT PK_Datasets PRIMARY KEY ( ID ),
    CONSTRAINT FK_Datasets_Users FOREIGN KEY ( OwnerUserID ) REFERENCES dbo.Users ( ID )
);

CREATE TABLE dbo.DatasetValues (
    DatasetID int      NOT NULL,
    Date      datetime NOT NULL,
    Value     int      NOT NULL,
    -- one value per date per data set; also keeps each series stored in date order
    CONSTRAINT PK_DatasetValues PRIMARY KEY ( DatasetID, Date ),
    CONSTRAINT FK_DatasetValues_Datasets FOREIGN KEY ( DatasetID ) REFERENCES dbo.Datasets ( ID )
);
```
The design models the two 'entities' implied in your question: the sets of time series data being loaded, and the time series data within each set.
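To see the two-entity design behave, here is a rough transcription to SQLite via Python's `sqlite3` (a stand-in, not PostgreSQL). Table and column names mirror the T-SQL; the one-column `Users` table and the `load_dataset` helper are hypothetical, `TEXT` replaces `datetime`, and `INTEGER PRIMARY KEY` plays the role of `IDENTITY(1,1)`. Primary keys are declared because a foreign key must reference a unique key. Each upload becomes one `Datasets` row plus its values, inserted atomically, and the foreign key rejects values that point at a data set that does not exist.

```python
import sqlite3

# SQLite stand-in for the Datasets / DatasetValues design.
# Users is a hypothetical minimal parent table.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript(
    """
    CREATE TABLE Users (ID INTEGER PRIMARY KEY);
    CREATE TABLE Datasets (
        ID          INTEGER PRIMARY KEY,  -- auto-assigned, like IDENTITY(1,1)
        OwnerUserID INTEGER NOT NULL REFERENCES Users (ID),
        Loaded      TEXT    NOT NULL
    );
    CREATE TABLE DatasetValues (
        DatasetID INTEGER NOT NULL REFERENCES Datasets (ID),
        Date      TEXT    NOT NULL,
        Value     INTEGER NOT NULL,
        PRIMARY KEY (DatasetID, Date)
    );
    """
)


def load_dataset(owner_id, loaded, values):
    """Create one Datasets row and attach its (date, value) pairs atomically."""
    with conn:
        cur = conn.execute(
            "INSERT INTO Datasets (OwnerUserID, Loaded) VALUES (?, ?)", (owner_id, loaded)
        )
        dataset_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO DatasetValues VALUES (?, ?, ?)",
            [(dataset_id, d, v) for d, v in values],
        )
    return dataset_id


conn.execute("INSERT INTO Users (ID) VALUES (1)")
ds = load_dataset(1, "2012-05-01 10:00:00", [("2012-01-01", 5), ("2012-01-02", 7)])

# The foreign key stops values from referencing a data set that doesn't exist.
try:
    with conn:
        conn.execute("INSERT INTO DatasetValues VALUES (999, '2012-01-01', 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(ds, orphan_rejected)
```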
*For SQL Server. I know you said PostgreSQL 9.1, but I'm sure you can translate it easily enough.