I have a credit card application that imports large volumes of data daily, several hundred thousand records.
The data comes from different sources. It is read using C#, then bulk inserted into the database.

This data is then processed:

  • different tables are linked
  • new tables are generated
  • data is corrected using complicated algorithms (the totals of certain tables must sum to zero)
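As a minimal sketch of the kind of zero-sum check involved (the table `dbo.LedgerEntries` and its `BatchId`/`Amount` columns are hypothetical, not from the question):

```sql
-- Find batches whose entries do not net to zero, so the
-- correction algorithm knows which ones still need work.
SELECT BatchId, SUM(Amount) AS Imbalance
FROM dbo.LedgerEntries
GROUP BY BatchId
HAVING SUM(Amount) <> 0;
```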

Most of this processing is done in stored procedures.
Although some of the complex processing would be simpler in C#, extracting the data into a dataset and reinjecting it would slow things down substantially.
You might ask why I don't process the data before inserting it into the database, but I don't think it's practical to manipulate hundreds of thousands of records in memory, and SQL's set-based commands help when creating lots of records.
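To illustrate the set-based style being described (table and column names are made up for the sketch), a single `INSERT ... SELECT` creates one summary row per group without any client-side looping:

```sql
-- Create summary records for every batch in one set-based statement,
-- instead of looping over hundreds of thousands of rows in client code.
INSERT INTO dbo.BatchTotals (BatchId, Total, RecordCount)
SELECT BatchId, SUM(Amount), COUNT(*)
FROM dbo.LedgerEntries
GROUP BY BatchId;
```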

This will probably spark the common debate about using stored procedures and their pros and cons. (e.g. How do you unit test stored procedures?)

What I'd like in response is your experience with large volumes of data and how you handled the problem.

I'd use SSIS or DTS (assuming you're talking about MSSQL). They're designed for this purpose and work with stored procedures if you need them.

Another option is to preprocess the data using Perl. Even though it may sound like a weird suggestion, Perl is actually very fast in these situations. I've used it in the past to process vast numbers of records in reasonable time (i.e. days rather than weeks).

Regarding "How do you unit test stored procedures": you unit test them with MbUnit like anything else. Just one bit of advice: the setup and rollback of the data can be tricky; you can either use a database transaction or explicit SQL statements.
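One way to sketch the transaction-based setup and rollback mentioned above (the procedure and table names are invented for illustration):

```sql
BEGIN TRANSACTION;

-- Setup: insert known test rows
INSERT INTO dbo.LedgerEntries (BatchId, Amount)
VALUES (1, 100.00), (1, -100.00);

-- Exercise the stored procedure under test
EXEC dbo.usp_ReconcileBatch @BatchId = 1;

-- Assert: SELECT the results here and compare against expectations

-- Teardown: roll back so the database is left untouched
ROLLBACK TRANSACTION;
```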

I'd have to agree with Skliwz when it comes to doing things in MSSQL. SSIS and DTS are the way to go, but if you're unfamiliar with those technologies they can be cumbersome to work with. However, there's an alternative that allows you to do the processing in C# while still keeping the data inside SQL Server.

If you believe the processing would be simpler in C#, then you may want to consider using a SQL Server Project to create database objects in C#. There are a lot of really powerful things you can do with CLR objects inside SQL Server, and this approach lets you write and unit test the code before it ever touches the database. You can unit test your CLR code inside VS using the standard unit testing frameworks (NUnit, MSTest), and you don't have to write a bunch of setup and teardown scripts that can be hard to manage.

As far as testing your stored procedures goes, I'd honestly look into DbFit for that. Your database doesn't have to be a black hole of untested functionality anymore :)

Where you process data depends greatly on what you're doing. If, for instance, you want to discard data that you don't want in your database, then you would process it in your C# code. However, data to be processed in the database should generally be data that is "implementation agnostic". So if someone else wants to insert data from a Java client, the database should be able to reject bad data. If you put that logic into your C# code, the Java code won't know about it.

Many people object and say "but I'll never use another language with the database!" Even if that's true, you'll still have DBAs or developers working with the database, and they will make mistakes if the logic isn't there. Or your new C# developer will try to shove in data and not know about (or simply ignore) data pre-processors written in C#.

In short, the logic you put in your database should be enough to guarantee that the data is correct without relying on external software.
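As a small, hypothetical example of the kind of database-side guarantee being argued for (the table and column names are my invention), a constraint rejects bad data no matter which client inserts it:

```sql
-- Any client that inserts a non-positive credit limit is rejected
-- by the database itself, whether the caller is C#, Java,
-- or a DBA's ad-hoc script.
ALTER TABLE dbo.CardAccounts
ADD CONSTRAINT CK_CardAccounts_CreditLimit CHECK (CreditLimit > 0);
```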