I've read many blog posts and articles about the pros and cons of Amazon EC2 versus Microsoft Azure (and Google App Engine). However, I'm trying to decide which would better suit my particular case.

I have a data set, which can be thought of as a standard table of the format:

[id]  [title]  [d0]  [d1]  [d2] .. [d63]
0     Name1     .43  -.22   .11     -.81
1     Name2     .23   .65   .62      .41
2     Name3    -.13  -.23   .17      .00
...
N     NameN     .43  -.23   .12      .01

I ultimately want to do something that (regardless of my final chosen stack) would equate to a SQL SELECT statement similar to:

SELECT title FROM [table] WHERE (d0*QueryParameter1) + (d1*QueryParameter1) + (d2*QueryParameter2) + ... + (dN*QueryParameterN) < .5

where QueryParameter1,2,N are parameters supplied at run time, and they change every time the query is run (so caching is out of the question).

My main concern is the speed of the query, so I'd like advice on which cloud stack option would give the fastest query result possible.

I can do this in a few ways:

  • (1) Use SQL Azure, with the query exactly as laid out above. I've tried this approach, and the queries can be pretty slow, which is no surprise since SQL Azure only gives you a single instance. I could spin up multiple instances of SQL Azure and shard the data, but that gets really expensive really quickly.
  • (2) Use Azure Storage Tables. Proponents claim storage tables are faster in general, but would that still hold for my query needs?
  • (3) Use EC2 and spin up multiple instances with MySQL, possibly sharding across new instances (cost increases, though).
  • (4) Use EC2 with MongoDB, which I have read is faster than MySQL. Again, this probably depends on the kind of query.
  • (5) Google App Engine. I'm not really sure how GAE would work with this query structure, but I guess that's why I'm looking for opinions.

I would like to find the best stack combination to optimize my specific need (outlined by the pseudo SQL query above).

Does anyone have experience with this? Which stack option would give the fastest query when the WHERE clause contains many math operators?

Cheers, Brett

Your kind of query, with dynamic coefficients (weights), requires the entire table to be scanned on every query. A SQL database engine won't help you here, because there is really nothing the query optimizer can do: no index can be built for a weighted sum whose weights are only known at run time.
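To make that concrete, here is a minimal Python sketch of what every such query boils down to: one linear pass over all rows, computing a weighted sum per row. The names `scan`, `table` and `weights` are made up for illustration, and only the four d-columns visible in the question's sample are used.

```python
from typing import Iterable, List, Sequence, Tuple

Row = Tuple[str, Sequence[float]]  # (title, [d0, d1, ..., dN])


def scan(rows: Iterable[Row], weights: Sequence[float],
         threshold: float = 0.5) -> List[str]:
    """Return the titles whose weighted sum of d-values is below threshold."""
    matches = []
    for title, values in rows:
        # Same arithmetic as the WHERE clause: d0*w0 + d1*w1 + ... + dN*wN
        score = sum(d * w for d, w in zip(values, weights))
        if score < threshold:
            matches.append(title)
    return matches


# Sample rows taken from the question (hypothetical in-memory copy of the table).
table = [
    ("Name1", [0.43, -0.22, 0.11, -0.81]),
    ("Name2", [0.23, 0.65, 0.62, 0.41]),
    ("Name3", [-0.13, -0.23, 0.17, 0.00]),
]
result = scan(table, weights=[1.0, 1.0, 1.0, 1.0])
```

Since the weights change on every call, every row has to be touched on every call, so the work is proportional to the table size regardless of the storage engine behind it.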

In other words, what you need is not a SQL database, but a "NoSQL" store that really optimizes table/row access for the fastest speed possible. So you shouldn't need to try SQL Azure and MySQL to find out this part of the answer.

Also, each row in your kind of query is completely independent of every other row, so it lends itself to simple parallelism. Your choice of platform should be whichever gives you:

  1. Table/row scans at the fastest speed
  2. The ability to highly parallelize your operation
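Because rows are independent, the scan can be split into chunks and the partial results simply concatenated. A rough Python sketch follows, assuming an in-memory list of rows; `scan_chunk` and `parallel_scan` are hypothetical names, and the thread pool only stands in for separate compute instances (in CPython, threads will not actually speed up this arithmetic).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Row = Tuple[str, Sequence[float]]  # (title, [d0, d1, ..., dN])


def scan_chunk(chunk: Sequence[Row], weights: Sequence[float],
               threshold: float) -> List[str]:
    """Scan one chunk; needs no knowledge of any other chunk."""
    return [
        title
        for title, values in chunk
        if sum(d * w for d, w in zip(values, weights)) < threshold
    ]


def parallel_scan(rows: Sequence[Row], weights: Sequence[float],
                  threshold: float = 0.5, workers: int = 4) -> List[str]:
    # Split the table into roughly equal contiguous chunks, one per worker.
    size = max(1, -(-len(rows) // workers))  # ceiling division
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda c: scan_chunk(c, weights, threshold), chunks)
        # map() yields results in chunk order, so table order is preserved.
        return [title for part in partials for title in part]
```

On a real cloud stack each chunk would live in its own partition and be scanned by its own instance; the merge step stays the same.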

Each platform you mentioned gives you the ability to store huge amounts of blob or table-like data for very fast scan retrieval (e.g. table storage in Azure). Each also gives you the ability to "spin up" multiple instances to process the data in parallel. It depends on which programming environment you are most comfortable in (e.g. Java on Google/Amazon, .NET on Azure). In essence, they all do the same thing.

My personal recommendation is Azure, since you can:

  1. Store massive amounts of data in "table storage", optimized for fast scan retrieval, and partitioned (e.g. over d0 ranges) for optimal parallelism
  2. Dynamically "spin up" as many compute instances as you like to process the data in parallel
  3. Use queueing mechanisms to synchronize the collation of results
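The three points above can be sketched roughly as follows. This is plain Python, not Azure SDK code: the in-process `queue.Queue` stands in for a cloud queue, threads stand in for compute instances, and `partition_key` assumes d0 always lies in [-1, 1], which the sample data suggests but the question does not state. All function names are hypothetical.

```python
import queue
import threading
from typing import Dict, List, Sequence, Tuple

Row = Tuple[str, Sequence[float]]  # (title, [d0, d1, ..., dN])


def partition_key(d0: float, buckets: int = 4) -> int:
    """Map d0 (assumed to lie in [-1, 1]) to a bucket index 0..buckets-1."""
    clamped = min(max(d0, -1.0), 1.0)
    return min(int((clamped + 1.0) / 2.0 * buckets), buckets - 1)


def partition_rows(rows: Sequence[Row], buckets: int = 4) -> Dict[int, List[Row]]:
    """Group rows into partitions by the range their d0 value falls in."""
    parts: Dict[int, List[Row]] = {b: [] for b in range(buckets)}
    for row in rows:
        parts[partition_key(row[1][0], buckets)].append(row)
    return parts


def worker(part: List[Row], weights: Sequence[float], threshold: float,
           results: "queue.Queue[List[str]]") -> None:
    """Scan one partition and post its matches to the shared results queue."""
    hits = [t for t, v in part
            if sum(d * w for d, w in zip(v, weights)) < threshold]
    results.put(hits)


def query(rows: Sequence[Row], weights: Sequence[float],
          threshold: float = 0.5, buckets: int = 4) -> List[str]:
    results: "queue.Queue[List[str]]" = queue.Queue()
    parts = partition_rows(rows, buckets)
    threads = [threading.Thread(target=worker, args=(p, weights, threshold, results))
               for p in parts.values()]
    for t in threads:
        t.start()
    collated: List[str] = []
    for _ in threads:  # exactly one message per partition
        collated.extend(results.get())
    for t in threads:
        t.join()
    return sorted(collated)  # arrival order is arbitrary, so sort for stability
```

The collator only needs to know how many partitions exist; it never needs to know which instance handled which range.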

Azure does what you require in a very "no-frills" way, providing just enough infrastructure to do your job and nothing more.