Hi, I'm writing a web crawler in Python to extract news articles from news websites like nytimes.com. I'd like to know what would be a good database to use as a backend for this project.
Thanks in advance!
This could be a great project for a document database such as CouchDB, MongoDB, or SimpleDB.
SimpleDB is a good choice if you're hosting this on Amazon Web Services.
CouchDB is an open source package from the Apache Foundation.
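To make the document-database idea concrete, here is a rough sketch of what one crawled article might look like as a single JSON document. The field names and the placeholder URL are assumptions for illustration, not a schema required by any of these databases. CouchDB and MongoDB both use an `_id` field as the document key, so the article URL is used for that here. Only the standard library `json` module is used, and no database client is involved.

```python
import json

# Hypothetical article record; every field name here is illustrative.
article = {
    "_id": "https://www.nytimes.com/example-article",  # placeholder URL used as the key
    "source": "nytimes.com",
    "title": "Example headline",
    "body": "Full article text goes here.",
    "tags": ["politics", "world"],
}


def to_document(record):
    """Serialize an article record to the JSON text a document store would hold."""
    return json.dumps(record, sort_keys=True)


def from_document(text):
    """Parse stored JSON text back into a Python dict."""
    return json.loads(text)
```

Because each article is self-contained, adding a new field later (an author list, say) would not need a schema migration, which is the main appeal of this style of storage for scraped content.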
I think the database itself will probably be one of the easier aspects of a web crawler like this.
If you expect a high read or write load on the database (for instance, if you plan to run many spiders simultaneously), then you will want to steer towards MySQL; otherwise something like SQLite will probably do just fine.
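For the SQLite route, a minimal sketch using Python's built-in `sqlite3` module might look like the following. The table layout and the helper names `open_store` and `save_article` are assumptions made up for this example. Using the URL as the primary key is one simple way to avoid storing the same article twice when a page is crawled again.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the article database; ':memory:' keeps it in RAM for testing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        " url TEXT PRIMARY KEY,"
        " title TEXT,"
        " body TEXT,"
        " fetched_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def save_article(conn, url, title, body):
    """Store one article; return True if it was new, False if the URL already existed."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            # INSERT OR IGNORE skips rows whose primary key is already present
            "INSERT OR IGNORE INTO articles (url, title, body) VALUES (?, ?, ?)",
            (url, title, body),
        )
    return cur.rowcount == 1
```

A spider would call `save_article(conn, url, title, body)` after parsing each page. With a single writer this should hold up well; with many spiders writing at once, lock contention is the point where a server database such as MySQL starts to make more sense.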