I'm doing some data munging which would be a great deal simpler if I could stick a bunch of dictionaries in an in-memory database, then run simple queries against it.

For instance, something similar to:

people = db([
    {"name": "Joe", "age": 16},
    {"name": "Jane", "favourite_color": "red"},
])
over_16 = db.filter(age__gt=16)
with_favorite_colors = db.filter(favorite_color__exists=True)

There are three confounding factors, though:

  • Some of the values will be Python objects, and serializing them is out of the question (too slow, breaks identity). Of course, I could work around this (eg, by storing all the items in a big list, then serializing their indexes into that list… but that would take a fair bit of fiddling; a rough sketch follows this list).
  • There will be thousands of data items, and I'll be running lookup-heavy operations (like graph traversals) against them, so it must be possible to perform efficient (ie, indexed) queries.
  • As in the example, the data is unstructured, so systems which require me to predefine a schema would be tricky.
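
Roughly what I mean by that workaround (just a sketch with throwaway placeholder objects and a pickle-based store, to show the shape of it):

import pickle

# The real objects live in a plain in-memory list...
objects = [object(), object(), object()]
# ...and only their positions in that list ever get serialized.
serialized = pickle.dumps({"group_a": [0, 2]})
# Deserializing gives back indexes, which map to the very same objects,
# so identity is preserved and the objects themselves never go through pickle.
restored = [objects[i] for i in pickle.loads(serialized)["group_a"]]
assert restored[0] is objects[0]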

So, does such a thing exist? Or will I need to kludge something together?

How about using an in-memory SQLite database via the sqlite3 standard library module, using the special value :memory: for the connection? If you don't want to write your own SQL statements, you can always use an ORM, like SQLAlchemy, to access an in-memory SQLite database.
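
For example, a minimal sketch of the sqlite3 route (the table layout here is just an assumption to make it self-contained; it does mean deciding on columns up front):

import sqlite3

conn = sqlite3.connect(":memory:")  # the whole database lives in RAM
conn.execute("CREATE TABLE people (name TEXT, age INTEGER, favorite_color TEXT)")
conn.execute("CREATE INDEX idx_age ON people (age)")  # make age queries indexed
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Joe", 16, None), ("Jane", None, "red")],
)
over_16 = conn.execute("SELECT name FROM people WHERE age > 16").fetchall()
with_colors = conn.execute(
    "SELECT name FROM people WHERE favorite_color IS NOT NULL").fetchall()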

EDIT: I noticed you mentioned that the values may be Python objects, and that you need to avoid serialization. Requiring arbitrary Python objects to be stored in a database also requires serialization.

May I propose a practical solution if you must keep those two requirements? Why not just use Python dictionaries as indices into your collection of Python dictionaries? It sounds like you'll have idiosyncratic needs for building each of your indices: figure out which values you're going to query on, then write a function to generate an index for each. The possible values for one key in your list of dicts become the keys of an index; the values of the index are lists of dictionaries. Query the index by giving the value you're looking for as the key.

import collections
import itertools

def make_indices(dicts):
    # One index per field we want to query on: each index maps a field value
    # to the list of dicts that contain it.
    color_index = collections.defaultdict(list)
    age_index = collections.defaultdict(list)
    for d in dicts:
        if 'favorite_color' in d:
            color_index[d['favorite_color']].append(d)
        if 'age' in d:
            age_index[d['age']].append(d)
    return color_index, age_index


def make_data_dicts():
    ...


data_dicts = make_data_dicts()
color_index, age_index = make_indices(data_dicts)
# Query for those with a favorite color: simply chain together the values of the color index
with_color_dicts = list(
        itertools.chain.from_iterable(color_index.values()))
# Query for people over 16
over_16 = list(
        itertools.chain.from_iterable(
            v for k, v in age_index.items() if k > 16)
)
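
Point queries then reduce to plain dictionary lookups on an index (assuming, of course, that the data actually contains those values):

# Everyone whose favorite color is red (these are the same dict objects, not copies)
red_fans = color_index.get("red", [])
# Everyone who is exactly 16
sixteen_year_olds = age_index.get(16, [])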