I have a CSV file that is about 1GB in size and contains about 50 million rows. I'm wondering whether it is better to keep it as a CSV file or to store it in some kind of database. I don't know enough about MySQL to argue why I should use it, or any other database system, over just keeping the data as a CSV file. I'm essentially doing a Breadth-First Search over this dataset, so once I obtain the initial "seed" set from the 50 million rows, I use it as the first values in my queue.


I'd say there are many benefits to using a database over a CSV file for structured data this large, so I'd suggest you learn enough to do so. However, based on your description, you might want to look at non-server, lighter-weight databases such as SQLite, or something similar to JavaDB/Derby. Depending on the structure of your data, a non-relational (NoSQL) database could also work, although you'll obviously need one with some form of Python support.

If you want to explore something graph-oriented (since you mention Breadth-First Search), a graph database might prove helpful.

Are you just going to slurp in everything at once? If so, then CSV is probably the way to go. It's simple and it works.

If you need to do lookups, then something that lets you index the data, like MySQL, would be better.

From your previous questions, it looks like you are doing social-network searches against Facebook friend data, so I presume your data is a set of 'A is-friend-of B' statements and you are looking for the shortest link between two people?

If you have enough memory, I would suggest parsing your CSV file into a dictionary of lists. See "Can this breadth-first search be made faster?"
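As a rough sketch of the dictionary-of-lists idea, assuming each CSV row holds one pair of IDs in its first two columns and that friendship is symmetric (neither is confirmed by the question), something like the following might do. The names `load_graph` and `shortest_path` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict, deque


def load_graph(lines):
    """Build {person: [friends]} from rows shaped like 'a,b' (assumed layout)."""
    graph = defaultdict(list)
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        a, b = row[0], row[1]
        graph[a].append(b)
        graph[b].append(a)  # assumption: 'A is-friend-of B' is symmetric
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of nodes on a shortest path, or None."""
    if start == goal:
        return [start]
    parent = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in parent:
                continue
            parent[neighbor] = node
            if neighbor == goal:
                # Walk the parent links back to the start.
                path = [neighbor]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical sample data standing in for the real file.
sample = io.StringIO("alice,bob\nbob,carol\ncarol,dave\nalice,erin\n")
graph = load_graph(sample)
path = shortest_path(graph, "alice", "dave")
```

For the real file you would pass an open file object (opened with `newline=""`) to `load_graph` instead of the `io.StringIO` sample. Whether 50 million rows fit in memory this way depends on the ID type and the machine, so treat that as something to measure rather than assume.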

If you can't hold all of the data in memory at once, a local-storage database like SQLite is probably your next-best alternative.
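A minimal sketch of the SQLite route, using Python's built-in `sqlite3` module, could look like this. The table name `friends`, the column names `a` and `b`, and the helper functions are all assumptions made for the example, and an in-memory database is used here only so the snippet runs on its own; for real data you would connect to a file on disk.

```python
import sqlite3
from collections import deque


def build_db(conn, edges):
    """Store the edge list and index both columns so neighbor lookups avoid full scans."""
    conn.execute("CREATE TABLE IF NOT EXISTS friends (a TEXT NOT NULL, b TEXT NOT NULL)")
    conn.executemany("INSERT INTO friends (a, b) VALUES (?, ?)", edges)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_friends_a ON friends (a)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_friends_b ON friends (b)")
    conn.commit()


def neighbors(conn, node):
    """Fetch friends in either direction (assumes friendship is symmetric)."""
    rows = conn.execute(
        "SELECT b FROM friends WHERE a = ? UNION SELECT a FROM friends WHERE b = ?",
        (node, node),
    )
    return [row[0] for row in rows]


def bfs_distance(conn, start, goal):
    """Breadth-first search that queries the database once per visited node."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbor in neighbors(conn, node):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Hypothetical sample edges; ":memory:" keeps the example self-contained.
conn = sqlite3.connect(":memory:")
build_db(conn, [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("alice", "erin")])
distance = bfs_distance(conn, "alice", "dave")
```

Note that the visited set and the queue still live in memory here; only the edge list stays on disk. Creating the indexes after the bulk insert, as above, is usually quicker than maintaining them row by row during the load.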

There are several Python modules that might help:

What about a key-value or document store such as MongoDB?