Consider an e-commerce application with multiple stores. Each store owner can edit the product catalog of his store.

My current database schema is the following:

item_names: id | name | description | picture | common(BOOL)
items: id | item_name_id | picture | price | description | picture
item_synonyms: id | item_name_id | name | error(BOOL)

Notes: error indicates an incorrect spelling (e.g. "Ericson"). description and picture in the item_names table are "globals" that may optionally be overridden by the "local" description and picture fields in the items table (in case the store owner wants to supply a different picture for an item). common helps separate unique item names ("Jimmy Joe's Cheese Pizza") from common ones ("Cheese Pizza").
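A minimal sketch of the global/local override, assuming SQLite (via Python's sqlite3) and assuming that a NULL in items means "not overridden"; the question does not say how an absent override is stored. The column types, the sample rows, and the helper name effective_items are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_names (
    id INTEGER PRIMARY KEY, name TEXT, description TEXT,
    picture TEXT, common INTEGER
);
-- picture is declared once here; column types are assumptions
CREATE TABLE items (
    id INTEGER PRIMARY KEY,
    item_name_id INTEGER REFERENCES item_names(id),
    picture TEXT, price REAL, description TEXT
);
INSERT INTO item_names VALUES (1, 'Cheese Pizza', 'Global description', 'global.jpg', 1);
INSERT INTO items VALUES (10, 1, NULL, 8.50, NULL);         -- no local override
INSERT INTO items VALUES (11, 1, 'local.jpg', 9.00, NULL);  -- local picture only
""")

def effective_items(conn):
    """Return (item id, name, description, picture), preferring local values."""
    return conn.execute("""
        SELECT i.id, n.name,
               COALESCE(i.description, n.description),
               COALESCE(i.picture, n.picture)
        FROM items AS i
        JOIN item_names AS n ON n.id = i.item_name_id
        ORDER BY i.id
    """).fetchall()

for row in effective_items(conn):
    print(row)
```

COALESCE returns its first non-NULL argument, so the local field wins whenever it is set and the global one is used otherwise.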

I think the upside of this schema is:

Optimized searching & handling synonyms: I can query the item_names & item_synonyms tables using name LIKE %QUERY% and obtain the list of item_name_ids that need to be joined with the items table. (Examples of synonyms: "Sony Ericsson", "Sony Ericson", "X10", "X 10")
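The lookup described here could be sketched as follows, assuming SQLite via Python's sqlite3. The reduced column set, the sample rows, and the function name search_items are assumptions made for illustration; the query does not escape % or _ in the user's input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_names (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE item_synonyms (
    id INTEGER PRIMARY KEY, item_name_id INTEGER, name TEXT, error INTEGER
);
CREATE TABLE items (id INTEGER PRIMARY KEY, item_name_id INTEGER, price REAL);
INSERT INTO item_names VALUES (1, 'Sony Ericsson Xperia X10'), (2, 'Cheese Pizza');
INSERT INTO item_synonyms VALUES (1, 1, 'Sony Ericson', 1), (2, 1, 'X 10', 0);
INSERT INTO items VALUES (10, 1, 499.0), (11, 2, 8.5), (12, 1, 479.0);
""")

def search_items(conn, query):
    """Find items whose canonical name or any synonym contains the query."""
    pattern = f"%{query}%"
    return conn.execute("""
        SELECT i.id, i.price
        FROM items AS i
        WHERE i.item_name_id IN (
            SELECT id FROM item_names WHERE name LIKE ?
            UNION
            SELECT item_name_id FROM item_synonyms WHERE name LIKE ?
        )
        ORDER BY i.id
    """, (pattern, pattern)).fetchall()

print(search_items(conn, "Ericson"))  # matched through the misspelled synonym
print(search_items(conn, "pizza"))    # matched through the canonical name
```

A LIKE pattern with a leading wildcard generally cannot use an ordinary B-tree index, so this form scans both name tables on every search.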

Autocompletion: Again, a simple query against the item_names table. I can avoid using DISTINCT, and it minimizes the number of variations ("Sony Ericsson Xperia™ X10", "Sony Ericsson - Xperia X10", "Xperia X10, Sony Ericsson").
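A sketch of such an autocomplete query, assuming SQLite via Python's sqlite3 and prefix matching; the UNIQUE constraint, the sample names, and the function name suggest are assumptions. Because each name is stored once in item_names, no DISTINCT is needed even when many items share a name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_names (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
INSERT INTO item_names (name) VALUES
    ('Sony Ericsson Xperia X10'), ('Sony Bravia'), ('Cheese Pizza');
""")

def suggest(conn, prefix, limit=5):
    """Return up to `limit` canonical names starting with `prefix`."""
    rows = conn.execute(
        "SELECT name FROM item_names WHERE name LIKE ? ORDER BY name LIMIT ?",
        (prefix + "%", limit),
    )
    return [name for (name,) in rows]

print(suggest(conn, "sony"))
```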

The downside would be:

Overhead: When inserting an item, I query item_names to see whether this name already exists. If not, I create a new entry. When deleting an item, I count the number of entries with the same name. If this is the only item with that name, I delete the entry from the item_names table (just to keep things clean; this accounts for possible erroneous submissions). Updating is a combination of both.
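To make the overhead concrete, here is a minimal sketch of the insert and delete paths, assuming SQLite via Python's sqlite3. The function names add_item and delete_item, the UNIQUE constraint on name, and the reduced column set are assumptions; cleanup of item_synonyms rows is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_names (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE items (
    id INTEGER PRIMARY KEY,
    item_name_id INTEGER REFERENCES item_names(id),
    price REAL
);
""")

def add_item(conn, name, price):
    """Reuse the item_names row for `name` if present, else create it."""
    row = conn.execute(
        "SELECT id FROM item_names WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        name_id = conn.execute(
            "INSERT INTO item_names (name) VALUES (?)", (name,)
        ).lastrowid
    else:
        name_id = row[0]
    return conn.execute(
        "INSERT INTO items (item_name_id, price) VALUES (?, ?)", (name_id, price)
    ).lastrowid

def delete_item(conn, item_id):
    """Delete the item, then drop its name row if no other item uses it."""
    row = conn.execute(
        "SELECT item_name_id FROM items WHERE id = ?", (item_id,)
    ).fetchone()
    if row is None:
        return
    name_id = row[0]
    conn.execute("DELETE FROM items WHERE id = ?", (item_id,))
    remaining = conn.execute(
        "SELECT COUNT(*) FROM items WHERE item_name_id = ?", (name_id,)
    ).fetchone()[0]
    if remaining == 0:
        conn.execute("DELETE FROM item_names WHERE id = ?", (name_id,))
```

Every insert costs one extra lookup and every delete one extra count, which is the overhead weighed above.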

Weird item names: Store owners sometimes use sentences like "Harry Potter 1, 2 Books + CDs + Magic Hat". There's something off about having so much overhead to accommodate cases like this. This is perhaps the prime reason I'm tempted to go for a schema like this:

items: id | name | picture | price | description | picture

(... with item_names and item_synonyms as utility tables that I could query)

  • Is there a better schema you would suggest?
  • Should item names be normalized for autocomplete? Is this probably what Facebook does for "School" and "City" entries?
  • Is the first schema or the second better/optimal for search?

Thanks in advance!

References: (1) Is normalizing a person's name going too far?, (2) Avoiding DISTINCT

EDIT: In the event of two items being entered with similar names, an admin who sees this simply clicks "Make Synonym", which converts one of the names into a synonym of the other. I don't need a way to automatically detect whether an entered name is a synonym of another. I'm hoping the autocomplete will take care of 95% of such cases. As the table set grows in size, the need to "Make Synonym" will decrease. Hope that clears up the confusion.
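One way the "Make Synonym" action could work against the first schema, sketched with SQLite via Python's sqlite3. The function name make_synonym, its parameters, and the choice to re-point existing items and synonyms at the surviving name are assumptions, not something stated above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_names (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE item_synonyms (
    id INTEGER PRIMARY KEY, item_name_id INTEGER, name TEXT, error INTEGER
);
CREATE TABLE items (id INTEGER PRIMARY KEY, item_name_id INTEGER, price REAL);
INSERT INTO item_names VALUES (1, 'Sony Ericsson'), (2, 'Sony Ericson');
INSERT INTO items VALUES (10, 1, 499.0), (11, 2, 479.0);
""")

def make_synonym(conn, keep_id, merge_id, error=False):
    """Turn the name `merge_id` into a synonym of the name `keep_id`."""
    row = conn.execute(
        "SELECT name FROM item_names WHERE id = ?", (merge_id,)
    ).fetchone()
    if row is None:
        raise ValueError(f"no item_names row with id {merge_id}")
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "UPDATE items SET item_name_id = ? WHERE item_name_id = ?",
            (keep_id, merge_id),
        )
        conn.execute(
            "UPDATE item_synonyms SET item_name_id = ? WHERE item_name_id = ?",
            (keep_id, merge_id),
        )
        conn.execute(
            "INSERT INTO item_synonyms (item_name_id, name, error) VALUES (?, ?, ?)",
            (keep_id, row[0], int(error)),
        )
        conn.execute("DELETE FROM item_names WHERE id = ?", (merge_id,))

# The admin marks "Sony Ericson" as a misspelled synonym of "Sony Ericsson".
make_synonym(conn, keep_id=1, merge_id=2, error=True)
```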

UPDATE: For those who want to know what I went ahead with: I went with the second schema but dropped the item_names and item_synonyms tables, in the hope that Solr will let me perform all of the remaining tasks I need:

items: id | name | picture | price | description | picture

Thanks everyone for the help!

If there were more attributes exposed for mapping, I would suggest using a fast search index system. There is no need to set up aliases as the entries are added; the attributes simply get indexed, and each search issued returns matches with a relevance score. Take the top X% as valid matches and display those.
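As a toy illustration of the indexing idea (not how Solr or any real engine scores results), the sketch below builds an in-memory inverted index and scores each entry by the fraction of query tokens it contains. It reads "top X%" as "keep matches whose score is within a given ratio of the best score", which is only one possible interpretation; all names and sample data are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query, keep_ratio=0.5):
    """Return (doc_id, score) pairs scoring at least keep_ratio * best score."""
    tokens = set(tokenize(query))
    hits = defaultdict(int)
    for token in tokens:
        for doc_id in index.get(token, ()):
            hits[doc_id] += 1
    if not hits:
        return []
    # Score = fraction of query tokens found; ties are broken by doc id.
    scored = sorted(
        ((count / len(tokens), doc_id) for doc_id, count in hits.items()),
        key=lambda pair: (-pair[0], pair[1]),
    )
    cutoff = scored[0][0] * keep_ratio
    return [(doc_id, score) for score, doc_id in scored if score >= cutoff]

docs = {1: "Sony Ericsson Xperia X10", 2: "Sony Bravia TV", 3: "Cheese Pizza"}
index = build_index(docs)
print(search(index, "sony xperia x10"))
```

A real engine would add stemming, fuzzy matching, and term weighting on top of this, which is what lets misspellings and word-order variants match without hand-maintained aliases.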

Setting up and maintaining aliases seems like a brute-force, labor-intensive approach that probably won't be able to adapt to the needs of your users.