I want to classify some text, which means comparing it with other texts. After representing the texts as vectors, how do I store them (very large lists of float values) in an SQL database so I can use them later?

My idea is to use the pickle module:

import pickle

vector = text_to_vector(text)
present = pickle.dumps(vector)
some_db.save(text_id, present)

# later
present = some_db.get(text_id)
vector = pickle.loads(present)

Would this be fast and effective, given that I have thousands of texts?

You might find that pickle and databases don't work very well together.

Python's pickle is great for serializing Python objects to a format that can later be read back into Python objects by Python. Although it is easy to serialize with pickle, you cannot* query this serialized format, and you cannot* easily read it from another language. Take a look at cPickle, another Python module, for faster pickling (in Python 3, the faster C implementation is used by pickle automatically).
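To see what this means in practice: a pickled list of floats is just an opaque byte string, trivial to round-trip from Python but meaningless to SQL. A quick sketch (the vector values are made up):

```python
import pickle

# A toy "vector" -- in practice this would come from text_to_vector(text)
vector = [0.12, 0.98, 0.33]

blob = pickle.dumps(vector)    # bytes, opaque to any SQL query
restored = pickle.loads(blob)  # back to a Python list

assert restored == vector
print(type(blob))  # <class 'bytes'>
```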

Databases, on the other hand, are ideal for persisting data in a way that is queryable and not language-specific. The price is that it's generally harder to get data into and out of the database. That's why there are special tools like SQLAlchemy, and endless blog-based debates about the benefits/disasters of Object-Relational Mapping software.
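That said, if you decide to keep the pickled bytes in a SQL database anyway, most drivers will accept them as a BLOB. A minimal sketch using the standard-library sqlite3 module (the table and column names here are arbitrary, not from your code):

```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vectors (text_id TEXT PRIMARY KEY, data BLOB)")

vector = [0.5, -1.25, 3.0]  # stand-in for a real text vector
conn.execute("INSERT INTO vectors VALUES (?, ?)",
             ("doc1", pickle.dumps(vector)))
conn.commit()

# later: fetch the BLOB and unpickle it
row = conn.execute("SELECT data FROM vectors WHERE text_id = ?",
                   ("doc1",)).fetchone()
restored = pickle.loads(row[0])
```

Note that the `data` column is still a black box to SQL: you can fetch it by key, but you can't filter or sort on the vector's contents.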

Pickling objects and then shipping them to a database such as MySQL or SQL Server is probably not the way to go. Instead, take a look at shelve, another Python module for database-like persistence of Python objects.
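shelve gives you a persistent, dict-like store keyed by strings, with the pickling handled for you. A minimal sketch (the file path and key are made up for illustration):

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "vectors")

# Store a vector under its text id; shelve pickles the value for us
with shelve.open(path) as db:
    db["text_42"] = [0.1, 0.2, 0.3]

# later: reopen the shelf and read the vector back
with shelve.open(path) as db:
    vector = db["text_42"]
```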

So, to summarize:

  • use pickle or shelve if you just need to save the data for later use by a Python program
  • map objects to a database if you want to persist the data for general use, with the understanding that this takes more effort
  • performance-wise, cPickle will probably beat a database + object/relational mapping

*: at least, not without lots of effort and/or special libraries.