What database schema would you use to store e-mail, with as much header information as practical/possible, in a database?

Assume the messages have been fed into a script from the MTA and parsed into the relevant headers/body/attachments.

Would you store the message body whole in the database table, or split any MIME parts apart? What about attachments?

It depends on what you are going to be doing with it. If you are going to need frequent searches against certain parts of it, you will want to break it up in a way that makes sense for your use case. If it's only for something like storage of e-mail for Sarbanes-Oxley compliance, you'd probably be okay storing the whole thing - headers, parts, etc. - as one large text field.

You might want to look at the architecture and the DB schema of "Archiveopteryx".

Suggestion: create a well-defined table for storing e-mail, with a column for each relevant part of a message: sender, header, subject, body. It will be considerably simpler later if you want to query, for instance, by the subject field. In the same table you can define a field to hold the path of the attachment and keep the attached file on the file system, instead of storing it in blob fields.
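As a rough illustration of that suggestion, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names and the find_by_subject helper are made up for the example, and a single attachment_path column only covers one attachment per message.

```python
import sqlite3

# Hypothetical single-table layout; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE email (
        id              INTEGER PRIMARY KEY,
        sender          TEXT NOT NULL,
        subject         TEXT,
        headers         TEXT,  -- full header block as received
        body            TEXT,
        attachment_path TEXT   -- file system path, NULL when there is no attachment
    )
    """
)

conn.executemany(
    "INSERT INTO email (sender, subject, headers, body, attachment_path) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("alice@example.com", "Quarterly report",
         "From: alice@example.com\r\nSubject: Quarterly report\r\n",
         "Report attached.", "/var/mail-attachments/1/report.pdf"),
        ("bob@example.com", "Lunch?",
         "From: bob@example.com\r\nSubject: Lunch?\r\n",
         "Noon works for me.", None),
    ],
)


def find_by_subject(conn, text):
    """Return (id, sender, subject) rows whose subject contains `text`."""
    return conn.execute(
        "SELECT id, sender, subject FROM email WHERE subject LIKE ? ORDER BY id",
        (f"%{text}%",),
    ).fetchall()


matches = find_by_subject(conn, "report")
```

Because the subject lives in its own column, the query does not have to parse the header block at search time.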

Everything depends on what you want to do with the data, but in general I would want to store all of the data and also make sure the semantics interpreted by the MUA are preserved in the db. For example:

  • All headers that are parsed should have their own column.
  • A column should contain the whole headers.
  • The attachments (including body, multipart) should be in a many-to-one table with the email table.
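One possible reading of that layout, sketched with Python's sqlite3 module. The table names (email, part) and every column name are assumptions for the example, not something taken from an existing product.

```python
import sqlite3

# Hypothetical schema: parsed headers get their own columns, the raw header
# block is kept too, and body/attachment parts live in a child table.
SCHEMA = """
CREATE TABLE email (
    id          INTEGER PRIMARY KEY,
    from_addr   TEXT,
    to_addr     TEXT,
    subject     TEXT,
    raw_headers TEXT NOT NULL
);
CREATE TABLE part (
    id           INTEGER PRIMARY KEY,
    email_id     INTEGER NOT NULL REFERENCES email(id),
    position     INTEGER NOT NULL,   -- order of the part within the message
    content_type TEXT NOT NULL,
    filename     TEXT,               -- NULL for inline body parts
    content      BLOB NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

raw_headers = (
    "From: alice@example.com\r\n"
    "To: bob@example.com\r\n"
    "Subject: Photos\r\n"
    "X-Mailer: demo\r\n"
)
cur = conn.execute(
    "INSERT INTO email (from_addr, to_addr, subject, raw_headers) VALUES (?, ?, ?, ?)",
    ("alice@example.com", "bob@example.com", "Photos", raw_headers),
)
email_id = cur.lastrowid

# Many parts point at one email row.
conn.executemany(
    "INSERT INTO part (email_id, position, content_type, filename, content) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (email_id, 0, "text/plain", None, b"See attached."),
        (email_id, 1, "image/jpeg", "beach.jpg", b"\xff\xd8\xff"),  # placeholder bytes
    ],
)

parts = conn.execute(
    "SELECT position, content_type, filename FROM part "
    "WHERE email_id = ? ORDER BY position",
    (email_id,),
).fetchall()
```

Headers that were not given a column of their own (X-Mailer here) are still recoverable from raw_headers.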

You might want to use a schema where the message body and attachment records can be shared between multiple recipients of the message. It isn't uncommon to see email servers where fully 50% of the disk storage is used by duplicate emails.

A simple hash of the body/attachment would be enough to determine whether that record is already in the database. However, you'd still need to keep separate headers.
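A minimal sketch of that idea with Python's hashlib and sqlite3. SHA-256 is my choice of hash here, and the content/message tables and the two helper functions are hypothetical names for the example.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE content (
        sha256 TEXT PRIMARY KEY,   -- hash of the stored bytes
        data   BLOB NOT NULL
    );
    CREATE TABLE message (
        id          INTEGER PRIMARY KEY,
        recipient   TEXT NOT NULL,
        headers     TEXT NOT NULL,  -- kept per message, never shared
        body_sha256 TEXT NOT NULL REFERENCES content(sha256)
    );
    """
)


def store_content(conn, data):
    """Store `data` once; return its hash whether or not it was already present."""
    digest = hashlib.sha256(data).hexdigest()
    # The primary key on sha256 makes a second insert of the same bytes a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO content (sha256, data) VALUES (?, ?)",
        (digest, data),
    )
    return digest


def store_message(conn, recipient, headers, body):
    """Insert one message row that points at the shared body record."""
    digest = store_content(conn, body)
    cur = conn.execute(
        "INSERT INTO message (recipient, headers, body_sha256) VALUES (?, ?, ?)",
        (recipient, headers, digest),
    )
    return cur.lastrowid


body = b"Team meeting moved to 3pm.\r\n"
for rcpt in ("bob@example.com", "carol@example.com"):
    store_message(conn, rcpt, f"To: {rcpt}\r\nSubject: Meeting\r\n", body)

content_rows = conn.execute("SELECT COUNT(*) FROM content").fetchone()[0]
message_rows = conn.execute("SELECT COUNT(*) FROM message").fetchone()[0]
```

Two recipients produce two message rows with their own headers, but only one stored copy of the body.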

If it's already split up, and you can be sure the routine that splits the data is sound, then I would make the tables as granular as possible. You can always piece it back together in your middle tier. If space isn't an issue, you could always store it twice: once split into the appropriate fields, and once in a field that holds the whole thing as one blob, in case putting it back together is difficult.

It's not trivial to parse an e-mail, so consider storing the e-mail as a blob and then parsing it into whatever pieces you need later on.
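A small sketch of the store-first, parse-later approach, using Python's standard email package to do the parsing on read. The raw_email table and the load_parsed helper are hypothetical names, and the sample message is made up.

```python
import sqlite3
from email import message_from_bytes, policy

# A tiny sample message standing in for what the MTA would hand over.
RAW = (
    b"From: alice@example.com\r\n"
    b"To: bob@example.com\r\n"
    b"Subject: Hello\r\n"
    b"\r\n"
    b"Hi Bob\r\n"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_email (id INTEGER PRIMARY KEY, raw BLOB NOT NULL)")
conn.execute("INSERT INTO raw_email (raw) VALUES (?)", (RAW,))


def load_parsed(conn, email_id):
    """Fetch the stored bytes and parse them only when they are needed."""
    row = conn.execute(
        "SELECT raw FROM raw_email WHERE id = ?", (email_id,)
    ).fetchone()
    # policy.default yields an EmailMessage with the modern accessor methods.
    return message_from_bytes(row[0], policy=policy.default)


msg = load_parsed(conn, 1)
subject = str(msg["Subject"])
body_text = msg.get_body(preferencelist=("plain",)).get_content()
```

Keeping the original bytes means a better parser can be swapped in later without losing anything, at the cost of parsing on every read.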


You'll probably want to at least store attachments separately to optimize storage. It's astonishing to see the size and number of attachments (videos, etc.) that many users unhesitatingly attach to emails.

In the case of outgoing emails, you may have multiple emails sending the same attachment. It is more efficient to store just one copy of the attachment, referenced by all emails that share it.

Another reason for storing attachments separately is that it gives you some archiving options later on. Should storage space become an issue, you could go back and remove large attachments older than a given date in order to compact the database.
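A sketch of such a cleanup with Python's sqlite3 module. The attachment table, its columns and both helper functions are assumptions for the example. Here the purge blanks the data but keeps the metadata row, which is one option among several; deleting the row outright would work too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE attachment (
        id          INTEGER PRIMARY KEY,
        filename    TEXT NOT NULL,
        size_bytes  INTEGER NOT NULL,
        received_at TEXT NOT NULL,  -- ISO 8601 date, so string order is date order
        data        BLOB            -- NULL once the attachment has been purged
    )
    """
)


def add_attachment(conn, filename, received_at, data):
    """Store an attachment along with its size and arrival date."""
    conn.execute(
        "INSERT INTO attachment (filename, size_bytes, received_at, data) "
        "VALUES (?, ?, ?, ?)",
        (filename, len(data), received_at, data),
    )


def purge_large_attachments(conn, min_size, before):
    """Drop the bytes of attachments at least `min_size` big and older than `before`."""
    cur = conn.execute(
        "UPDATE attachment SET data = NULL "
        "WHERE data IS NOT NULL AND size_bytes >= ? AND received_at < ?",
        (min_size, before),
    )
    return cur.rowcount  # number of attachments purged


add_attachment(conn, "holiday.mp4", "2015-06-01", b"x" * 5000)  # large and old
add_attachment(conn, "demo.mp4", "2024-02-10", b"x" * 5000)     # large but recent
add_attachment(conn, "notes.txt", "2015-06-02", b"x" * 20)      # old but small

purged = purge_large_attachments(conn, min_size=1000, before="2020-01-01")
kept = conn.execute(
    "SELECT filename FROM attachment WHERE data IS NOT NULL ORDER BY id"
).fetchall()
```

With SQLite in particular, the file on disk typically does not shrink until a VACUUM is run afterwards.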

An important part of database schema design is to determine what kinds of entity you need to model. For this application the entities might be:

  • Messages
  • E-mail addresses
  • Conversation threads (maybe: if you want to do efficient threading)
  • Attachments (maybe: as suggested in other answers)
  • ...

Once you know the entities, you can identify the relationships between them, which can be represented by tables:

  • Messages have a many-many relationship to messages (In-Reply-To and References headers).
  • Messages have a many-many relationship to e-mail addresses (From, To, Cc etc headers).
  • Messages have a many-one relationship with threads.
  • Messages have a many-many relationship with attachments.
  • ...
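The entities and relationships listed above could be sketched as tables roughly like this, again with Python's sqlite3 module. All table, column and function names are hypothetical, and the self-referencing link table only covers references to messages that are themselves stored.

```python
import sqlite3

SCHEMA = """
CREATE TABLE thread (
    id      INTEGER PRIMARY KEY,
    subject TEXT
);
CREATE TABLE message (
    id             INTEGER PRIMARY KEY,
    thread_id      INTEGER REFERENCES thread(id),  -- many messages, one thread
    message_id_hdr TEXT UNIQUE,                    -- value of the Message-ID header
    subject        TEXT
);
CREATE TABLE address (
    id   INTEGER PRIMARY KEY,
    addr TEXT NOT NULL UNIQUE
);
CREATE TABLE attachment (
    id       INTEGER PRIMARY KEY,
    filename TEXT,
    data     BLOB
);
-- Link tables for the many-many relationships.
CREATE TABLE message_reference (
    message_id   INTEGER NOT NULL REFERENCES message(id),
    refers_to_id INTEGER NOT NULL REFERENCES message(id),
    PRIMARY KEY (message_id, refers_to_id)
);
CREATE TABLE message_address (
    message_id INTEGER NOT NULL REFERENCES message(id),
    address_id INTEGER NOT NULL REFERENCES address(id),
    role       TEXT NOT NULL,  -- 'from', 'to', 'cc', ...
    PRIMARY KEY (message_id, address_id, role)
);
CREATE TABLE message_attachment (
    message_id    INTEGER NOT NULL REFERENCES message(id),
    attachment_id INTEGER NOT NULL REFERENCES attachment(id),
    PRIMARY KEY (message_id, attachment_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

conn.execute("INSERT INTO thread (id, subject) VALUES (1, 'Budget')")
conn.executemany(
    "INSERT INTO message (id, thread_id, message_id_hdr, subject) VALUES (?, ?, ?, ?)",
    [(1, 1, "<a@example.com>", "Budget"), (2, 1, "<b@example.com>", "Re: Budget")],
)
conn.execute("INSERT INTO message_reference (message_id, refers_to_id) VALUES (2, 1)")
conn.executemany(
    "INSERT INTO address (id, addr) VALUES (?, ?)",
    [(1, "alice@example.com"), (2, "bob@example.com")],
)
conn.executemany(
    "INSERT INTO message_address (message_id, address_id, role) VALUES (?, ?, ?)",
    [(1, 1, "from"), (1, 2, "to"), (2, 2, "from"), (2, 1, "to")],
)


def subjects_sent_by(conn, addr):
    """Return subjects of messages whose From header contains `addr`."""
    rows = conn.execute(
        "SELECT m.subject FROM message m "
        "JOIN message_address ma ON ma.message_id = m.id "
        "JOIN address a ON a.id = ma.address_id "
        "WHERE a.addr = ? AND ma.role = 'from' ORDER BY m.id",
        (addr,),
    )
    return [row[0] for row in rows]


from_bob = subjects_sent_by(conn, "bob@example.com")
```

Each e-mail address is stored once and linked by role, so "everything from Bob" or "everything Cc'd to Alice" becomes a join rather than a text search over headers.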