I am writing a Django ORM enhancement that attempts to cache models and postpone saving them until the end of the transaction. It's almost all done, but I came across an unexpected difficulty in SQL syntax.

I am not a DBA, but from what I understand, databases don't really perform well for lots of small queries; a few bigger queries are much better. For example, it's better to use large batch inserts (say 100 rows at once) than 100 single-row inserts.
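To make the difference concrete, here is a sketch of such a multi-row insert done directly with psycopg2 (which I assume as the driver; the connection string, table and column names are made up):

    import psycopg2

    conn = psycopg2.connect("dbname=mydb")  # hypothetical connection string
    cur = conn.cursor()

    rows = [(i, 'value %d' % i) for i in range(100)]

    # One multi-row INSERT instead of 100 single-row INSERTs.
    placeholders = ", ".join(["(%s, %s)"] * len(rows))
    params = [field for row in rows for field in row]
    cur.execute("INSERT INTO mytable (id, some_col) VALUES " + placeholders, params)
    conn.commit()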

Now, from what I can tell, SQL doesn't really provide any statement to perform a batch update on a table. The term seems to be confusing, so I'll explain what I mean by that: I have an array of arbitrary data, each entry describing a single row in a table. I'd like to update certain rows in the table, each using the data from its corresponding entry in the array. The idea is very similar to a batch insert.

For example: my table could have two columns, "id" and "some_col". The array describing the data for a batch update consists of three entries (1, 'first updated'), (2, 'second updated'), and (3, 'third updated'). Before the update, the table contains the rows (1, 'first'), (2, 'second'), (3, 'third').
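The kind of single-statement batch update I'm after would, I guess, look something like the following (a sketch using PostgreSQL's VALUES lists and the cursor from the sketch above; I'm not sure this is the idiomatic syntax):

    data = [(1, 'first updated'), (2, 'second updated'), (3, 'third updated')]

    # Build a VALUES list and join it against the table in one UPDATE.
    values_sql = ", ".join(["(%s, %s)"] * len(data))
    params = [field for row in data for field in row]

    cur.execute(
        "UPDATE mytable AS t SET some_col = v.some_col "
        "FROM (VALUES " + values_sql + ") AS v (id, some_col) "
        "WHERE t.id = v.id",
        params,
    )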

I found this post:

Why are batch inserts/updates faster? How do batch updates work?

which seems to do what I want, but I can't really figure out the syntax at the end.

I could also delete all the rows that need updating and reinsert them using a batch insert, but I find it hard to believe that this would actually perform any better.

I work with PostgreSQL 8.4, so some stored procedures are also possible here. However, as I plan to open-source the project eventually, any more portable ideas or ways to do the same thing on a different RDBMS are most welcome.

Follow-up question: how can you perform a batch "insert-or-update"/"upsert" statement?
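The best I can come up with so far is a two-statement approach along these lines (a rough sketch with the same made-up names; without extra locking it can presumably race with concurrent writers), but something more elegant or more portable would be welcome:

    def batch_upsert(cur, rows):
        """Update the rows that already exist, then insert the ones that
        don't. Meant to run inside a single transaction on PostgreSQL 8.4,
        which has no native upsert statement."""
        values_sql = ", ".join(["(%s, %s)"] * len(rows))
        params = [field for row in rows for field in row]

        # Step 1: update existing rows from the VALUES list.
        cur.execute(
            "UPDATE mytable AS t SET some_col = v.some_col "
            "FROM (VALUES " + values_sql + ") AS v (id, some_col) "
            "WHERE t.id = v.id",
            params,
        )
        # Step 2: insert the rows whose id was not matched by the update.
        cur.execute(
            "INSERT INTO mytable (id, some_col) "
            "SELECT v.id, v.some_col "
            "FROM (VALUES " + values_sql + ") AS v (id, some_col) "
            "WHERE NOT EXISTS (SELECT 1 FROM mytable t WHERE t.id = v.id)",
            params,
        )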

Test results

I have performed 100 x 10 insert operations spread over 4 different tables (so 1000 inserts in total). I tested on Django 1.3 with a PostgreSQL 8.4 backend.

These are the results:

  • All operations performed through the Django ORM - each pass ~2.45 seconds,
  • The same operations, but done without the Django ORM - each pass ~1.48 seconds,
  • Only insert operations, without querying the database for sequence values ~0.72 seconds,
  • Only insert operations, executed in blocks of 10 (100 blocks in total) ~0.19 seconds,
  • Only insert operations, one big execution block ~0.13 seconds,
  • Only insert operations, about 250 statements per block, ~0.12 seconds.

Conclusion: execute as many operations as possible in a single connection.execute(). Django itself introduces a substantial overhead.

Disclaimer: I didn't introduce any indices apart from the default primary key indices, so the insert operations could possibly run faster because of that.
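For reference, a minimal sketch of the "blocks of statements" variant measured above, done directly with a psycopg2 cursor (the table and column names are the made-up ones from the question; mogrify() does the client-side quoting):

    def insert_in_blocks(cur, rows, block_size=250):
        """Render many INSERT statements client-side and send each block
        to the server in a single cursor.execute() call."""
        for start in range(0, len(rows), block_size):
            block = rows[start:start + block_size]
            statements = [
                cur.mogrify(
                    "INSERT INTO mytable (id, some_col) VALUES (%s, %s)", row
                ).decode()  # mogrify() returns bytes on Python 3
                for row in block
            ]
            # psycopg2 accepts several semicolon-separated statements in
            # one execute(); only the result of the last one is kept.
            cur.execute("; ".join(statements))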

I have used 3 strategies for batch transactional work:

  1. Generate SQL statements on the fly, concatenate them with semicolons, and then submit the statements in one shot. I've done up to 100 inserts this way, and it was quite efficient (done against Postgres).
  2. JDBC has batching capabilities built in, if configured. If you generate transactions, you can flush your JDBC statements so that they transact in one shot. This tactic requires fewer database calls, as the statements are all executed in one batch.
  3. Hibernate also supports JDBC batching along the lines of the previous example, but in this case you call flush() on the Hibernate Session, not on the underlying JDBC connection. It accomplishes the same thing as JDBC batching.

Incidentally, Hibernate also supports a batching strategy in collection fetching. If you annotate a collection with @BatchSize, when fetching associations, Hibernate will use IN instead of =, leading to fewer SELECT statements to load up the collections.
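The same idea can be illustrated outside Hibernate with plain SQL: instead of issuing one SELECT per parent row, collect a batch of parent ids and load all of their children with a single IN query. A rough sketch in Python with a made-up schema:

    def fetch_children_batched(cur, parent_ids, batch_size=25):
        """Load child rows for many parents with one IN query per batch
        instead of one SELECT per parent.
        Made-up schema: child(id, parent_id, name)."""
        children = dict((pid, []) for pid in parent_ids)
        for start in range(0, len(parent_ids), batch_size):
            batch = parent_ids[start:start + batch_size]
            placeholders = ", ".join(["%s"] * len(batch))
            cur.execute(
                "SELECT parent_id, id, name FROM child "
                "WHERE parent_id IN (" + placeholders + ")",
                batch,
            )
            for parent_id, child_id, name in cur.fetchall():
                children[parent_id].append((child_id, name))
        return children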