Greetings stackers,

I am trying to design the best database schema for an application that lets users create surveys and present them to the public. There are a number of "standard" demographic fields that most surveys (but not all) will include, like first name, last name, etc. On top of that, users can create an unlimited number of "custom" questions.

The first thing I thought of is something like this:

Survey
  ID
  SurveyName

SurveyQuestions
  SurveyID
  Question

Responses
  SurveyID
  SubmitTime

ResponseAnswers
  SurveyID
  Question
  Answer

But that's going to suck every time I want to query data out. It also seems dangerously close to the Inner Platform Effect.

An improvement would be to include as many fields as I can think of in advance in the Responses table:

Responses
  SurveyID
  SubmitTime
  FirstName
  LastName
  Birthdate
  [...]

Then at least queries for data from these common columns are straightforward, and I can query, say, the average age of everyone who ever answered any survey where they gave their birthdate.

But it seems like this will complicate the code a bit. Now, to see which questions are asked in a survey, I have to check which common response fields are enabled (using, I guess, a bitfield in Survey) AND what's in the SurveyQuestions table. And I have to worry about special cases, like someone trying to create a "custom" question that duplicates a "common" question in the Responses table.

Is this the best I can do? Am I missing something?

The first schema is the better choice of the two. At this point, you shouldn't worry about performance problems. Worry about making a good, flexible, extensible design. There are all sorts of tricks you can use later to cache data and make queries faster. Using a less flexible database schema to solve a performance problem that may never materialize is a bad decision.

Besides, many (perhaps most) survey results are only viewed periodically by a small number of people (event organizers, administrators, etc.), so you won't constantly be querying the database for the results. And even if you were, the performance would be fine. You would probably paginate the results somehow anyway.

The first schema is much more flexible. You can, by default, include questions like name and address, but for anonymous surveys you can simply not create them. If the survey creator wants to view only everyone's answers to three questions out of 500, that's a really simple SQL query. You could set up a cascading delete to automatically remove responses and questions when a survey is deleted. Generating statistics will be much easier with this schema too.

Here is a slightly modified version of the schema you provided. I assume you can figure out which data types go where :-)


    surveys
      survey_id (index)
      title

    questions
      question_id (index, auto increment)
      survey_id (link to surveys->survey_id)
      question

    responses
      response_id (index, auto increment)
      survey_id (link to surveys->survey_id)
      submit_time

    answers
      answer_id (index, auto increment)
      question_id (link to questions->question_id)
      answer
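
As a rough illustration of the points above, here is a minimal sketch of that schema in SQLite via Python's `sqlite3` module. The column types, the sample rows, and the `answers_for` helper are assumptions made for the example, not part of the original schema. It shows the "a few questions out of many" query and a cascading delete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys (and cascades) when this is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Column types are assumptions; the schema above leaves them open.
conn.executescript("""
CREATE TABLE surveys (
    survey_id INTEGER PRIMARY KEY,
    title     TEXT NOT NULL
);
CREATE TABLE questions (
    question_id INTEGER PRIMARY KEY AUTOINCREMENT,
    survey_id   INTEGER NOT NULL REFERENCES surveys(survey_id) ON DELETE CASCADE,
    question    TEXT NOT NULL
);
CREATE TABLE responses (
    response_id INTEGER PRIMARY KEY AUTOINCREMENT,
    survey_id   INTEGER NOT NULL REFERENCES surveys(survey_id) ON DELETE CASCADE,
    submit_time TEXT NOT NULL
);
CREATE TABLE answers (
    answer_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    question_id INTEGER NOT NULL REFERENCES questions(question_id) ON DELETE CASCADE,
    answer      TEXT
);
""")

# Hypothetical sample data.
conn.execute("INSERT INTO surveys (survey_id, title) VALUES (1, 'Event feedback')")
conn.executemany(
    "INSERT INTO questions (question_id, survey_id, question) VALUES (?, 1, ?)",
    [(1, "First name"), (2, "Rating"), (3, "Comments")],
)
conn.executemany(
    "INSERT INTO responses (survey_id, submit_time) VALUES (1, ?)",
    [("2024-01-01T10:00:00",), ("2024-01-01T11:30:00",)],
)
conn.executemany(
    "INSERT INTO answers (question_id, answer) VALUES (?, ?)",
    [(1, "Ann"), (2, "5"), (3, "Great"), (1, "Bob"), (2, "3")],
)
conn.commit()


def answers_for(conn, question_ids):
    """Return (question, answer) pairs for only the selected questions."""
    placeholders = ",".join("?" * len(question_ids))
    sql = (
        "SELECT q.question, a.answer "
        "FROM answers a JOIN questions q ON q.question_id = a.question_id "
        f"WHERE a.question_id IN ({placeholders}) "
        "ORDER BY a.answer_id"
    )
    return conn.execute(sql, list(question_ids)).fetchall()


selected = answers_for(conn, [1, 2])

# Deleting the survey removes its questions and responses, and the
# answers go with the questions.
conn.execute("DELETE FROM surveys WHERE survey_id = 1")
conn.commit()
remaining_answers = conn.execute("SELECT COUNT(*) FROM answers").fetchone()[0]
remaining_responses = conn.execute("SELECT COUNT(*) FROM responses").fetchone()[0]
```

Other database engines spell auto-increment and cascade rules a little differently, so treat the DDL as a starting point only.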

I would suggest you always take a normalized approach to your database schema and only later decide whether you need a different solution for performance reasons. Premature optimization can be dangerous. Premature database de-normalization can be disastrous!

I would suggest that you stick with the original schema and later, if necessary, create a reporting table that is a de-normalized version of the normalized schema.
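
A reporting table of that kind might look like the sketch below, again using SQLite from Python. It assumes each answer row records which response it belongs to (a `response_id` column), and the table name `survey1_report`, the sample rows, and the `rebuild_report` helper are all hypothetical. The normalized tables stay the source of truth; the wide table is simply rebuilt from them whenever needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE questions (question_id INTEGER PRIMARY KEY, survey_id INTEGER, question TEXT);
CREATE TABLE responses (response_id INTEGER PRIMARY KEY, survey_id INTEGER, submit_time TEXT);
-- Assumption: every answer records the response it belongs to.
CREATE TABLE answers (response_id INTEGER, question_id INTEGER, answer TEXT);

INSERT INTO questions VALUES (1, 1, 'First name'), (2, 1, 'Birthdate');
INSERT INTO responses VALUES (10, 1, '2024-01-01'), (11, 1, '2024-01-02');
INSERT INTO answers VALUES (10, 1, 'Ann'), (10, 2, '1990-05-01'), (11, 1, 'Bob');
""")


def rebuild_report(conn):
    """Rebuild a wide, one-row-per-response table for survey 1."""
    conn.executescript("""
    DROP TABLE IF EXISTS survey1_report;
    CREATE TABLE survey1_report AS
    SELECT r.response_id,
           r.submit_time,
           -- Pivot: one column per question, NULL when unanswered.
           MAX(CASE WHEN a.question_id = 1 THEN a.answer END) AS first_name,
           MAX(CASE WHEN a.question_id = 2 THEN a.answer END) AS birthdate
    FROM responses r
    LEFT JOIN answers a ON a.response_id = r.response_id
    WHERE r.survey_id = 1
    GROUP BY r.response_id, r.submit_time;
    """)


rebuild_report(conn)
report = conn.execute(
    "SELECT response_id, first_name, birthdate FROM survey1_report ORDER BY response_id"
).fetchall()
```

Queries such as "average age of respondents" can then run against the wide table, and the table can be dropped or regenerated at any time without touching the real data.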

One change that may help simplify things would be to not link ResponseAnswers to the SurveyID. Instead, create an ID per response and per question, and let your ResponseAnswers table contain the fields ResponseID, QuestionID, and Answer. Although this requires keeping unique identifiers for each unit, it helps keep things a little more normalized. The response answers do not need to be associated with the survey they were answering, only with the specific question they answer and the response they belong to.
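
A minimal sketch of that layout, using SQLite from Python: the table names follow the question, the ResponseID and QuestionID columns follow the suggestion above, and the column types, composite primary key, and sample rows are assumptions for the example. The survey is recovered through the question rather than stored on every answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the references below
conn.executescript("""
CREATE TABLE Survey (ID INTEGER PRIMARY KEY, SurveyName TEXT NOT NULL);
CREATE TABLE SurveyQuestions (
    QuestionID INTEGER PRIMARY KEY,
    SurveyID   INTEGER NOT NULL REFERENCES Survey(ID),
    Question   TEXT NOT NULL
);
CREATE TABLE Responses (
    ResponseID INTEGER PRIMARY KEY,
    SurveyID   INTEGER NOT NULL REFERENCES Survey(ID),
    SubmitTime TEXT NOT NULL
);
CREATE TABLE ResponseAnswers (
    ResponseID INTEGER NOT NULL REFERENCES Responses(ResponseID),
    QuestionID INTEGER NOT NULL REFERENCES SurveyQuestions(QuestionID),
    Answer     TEXT,
    -- Assumption: at most one answer per question per response.
    PRIMARY KEY (ResponseID, QuestionID)
);

INSERT INTO Survey VALUES (1, 'Event feedback');
INSERT INTO SurveyQuestions VALUES (1, 1, 'First name'), (2, 1, 'Rating');
INSERT INTO Responses VALUES (100, 1, '2024-01-01T10:00:00');
INSERT INTO ResponseAnswers VALUES (100, 1, 'Ann'), (100, 2, '5');
""")

# No SurveyID on ResponseAnswers: the survey is reached via the question.
rows = conn.execute("""
    SELECT s.SurveyName, q.Question, ra.Answer
    FROM ResponseAnswers ra
    JOIN SurveyQuestions q ON q.QuestionID = ra.QuestionID
    JOIN Survey s ON s.ID = q.SurveyID
    ORDER BY ra.ResponseID, ra.QuestionID
""").fetchall()

# The composite key rejects a second answer to the same question.
try:
    conn.execute("INSERT INTO ResponseAnswers VALUES (100, 1, 'duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Whether the composite key is appropriate depends on the survey; multi-select questions, for instance, would need a different key.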