I'm toying with the idea of developing a persistent storage layer, something like a DBMS engine. What would be the benefits of creating a custom binary format over directly cPickling the objects and/or using the shelve module?

Pickling is a two-faced coin.

On one side, you have a very convenient way to store your object. Just four lines of code and you pickle. You get the object back exactly as it was.
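To make the "four lines" concrete, here is a minimal sketch of a pickle round trip through a file. It is written for Python 3, where the plain pickle module picks up the C implementation automatically; the sample data and the temporary file are assumptions made for illustration only.

```python
import os
import pickle
import tempfile

settings = {"name": "demo", "sizes": [1, 2, 3]}  # hypothetical sample data

# A throwaway file, so the sketch cleans up after itself.
fd, path = tempfile.mkstemp(suffix=".pickle")
os.close(fd)
try:
    # These four lines are the whole round trip.
    with open(path, "wb") as f:
        pickle.dump(settings, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)
finally:
    os.remove(path)

print(restored == settings)
```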

On the other side, it can be a compatibility nightmare. You cannot unpickle objects if their classes are not defined in your code exactly as they were defined when pickled. This strongly limits your ability to refactor the code or rearrange things in your modules. Also, not everything can be pickled, and if you are not strict about what gets pickled and the client of your code has full freedom to include any object, sooner or later it will pass something unpicklable to your system, and the system will go boom.
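One way to be strict about what gets pickled is to probe objects before accepting them. The helper below is a hypothetical sketch (the name is_picklable is not part of the pickle module), written for Python 3; it simply attempts the serialization and reports whether it worked.

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if pickle.dumps accepts obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


# Hypothetical objects a client might hand to the storage layer.
candidates = {
    "plain dict": {"a": 1},
    "lock": threading.Lock(),
    "lambda": lambda: 0,
}
for label, obj in candidates.items():
    print(label, is_picklable(obj))
```

Note that this check costs a full serialization pass, so it is better suited to validating input at the boundary than to being called on every write.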

Be careful about its use. There is no better definition of quick and dirty.

One reason to define your own custom binary format could be optimisation. pickle (and shelve, which uses pickle) is a generic serialization framework; it can store almost any Python data. It's easy to use pickle in a lot of situations, but it takes time to inspect all the objects and serialize their data, and the data itself is stored in a generic, verbose format. If you are storing specific, known data, a custom-built serializer can be both faster and more concise.

It takes 37 bytes to pickle an object with a single integer value:

>>> import pickle
>>> class Foo: pass
...
>>> foo = Foo()
>>> foo.x = 3
>>> print repr(pickle.dumps(foo))
"(i__main__\nFoo\np0\n(dp1\nS'x'\np2\nI3\nsb."

Embedded in that data is the name of the property and its type. A custom serializer for Foo (and Foo alone) could dispense with that and just store the number, saving both time and space.
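As a rough sketch of what such a Foo-only serializer might look like, the code below uses the standard struct module. The layout (one little-endian signed 32-bit integer) and the function names dump_foo and load_foo are assumptions for illustration, and the code targets Python 3, so the pickle size it prints will differ from the 37 bytes shown above.

```python
import pickle
import struct


class Foo:
    """Same toy class as above: a single integer attribute, x."""

    def __init__(self, x=0):
        self.x = x


# Assumed fixed layout: one little-endian signed 32-bit integer.
FOO_FORMAT = struct.Struct("<i")


def dump_foo(foo):
    """Serialize a Foo to exactly FOO_FORMAT.size bytes."""
    return FOO_FORMAT.pack(foo.x)


def load_foo(data):
    """Rebuild a Foo from bytes produced by dump_foo."""
    (x,) = FOO_FORMAT.unpack(data)
    return Foo(x)


custom = dump_foo(Foo(3))
# For a size comparison, pickle only the attribute dict.
generic = pickle.dumps({"x": 3})
print(len(custom), len(generic))
```

The saving comes entirely from hard-coding knowledge about Foo into the reader and writer, which is also why this approach stops working the moment Foo gains a second attribute.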

Another reason for a custom serialization framework is that you can easily do custom validation and versioning of data. If you change your object types and need to load an old version of the data, it can be tricky via pickle. Your own code can easily be customized to handle older data formats.
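A common way to get that kind of versioning is to put a small header in front of every record. The sketch below assumes a made-up format (a 4-byte magic string, a 1-byte version, then the payload) and a hypothetical change where version 2 added a second field, y; none of these names come from an existing library.

```python
import struct

MAGIC = b"FOO0"                  # hypothetical file signature
HEADER = struct.Struct("<4sB")   # magic, then a one-byte format version


def dump_v1(x):
    """Old format: a single 32-bit integer."""
    return HEADER.pack(MAGIC, 1) + struct.pack("<i", x)


def dump_v2(x, y):
    """Current format: two 32-bit integers."""
    return HEADER.pack(MAGIC, 2) + struct.pack("<ii", x, y)


def load(data):
    """Validate the header and decode any known version."""
    magic, version = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("not a FOO0 record")
    body = data[HEADER.size:]
    if version == 1:
        (x,) = struct.unpack("<i", body)
        return {"x": x, "y": 0}  # fill in a default for the newer field
    if version == 2:
        x, y = struct.unpack("<ii", body)
        return {"x": x, "y": y}
    raise ValueError("unsupported version: %d" % version)


print(load(dump_v1(7)))
print(load(dump_v2(1, 2)))
```

Because struct.unpack insists on an exact length, a truncated or padded body is rejected as well, which gives a basic form of validation for free.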

In practice, I'd build something using the generic cPickle module and only replace it if profiling indicated it was really necessary. Maintaining a separate serialization framework is a significant amount of work.
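Before replacing anything, a quick measurement along these lines can show whether serialization is actually the bottleneck. This is only a sketch for Python 3: the workload (a list of 1000 integers) is an arbitrary assumption, and the numbers depend heavily on the machine and on the shape of the real data.

```python
import pickle
import struct
import timeit

values = list(range(1000))        # hypothetical workload
packer = struct.Struct("<1000i")  # fixed layout: 1000 32-bit integers

# Time 200 serializations with each approach.
t_pickle = timeit.timeit(lambda: pickle.dumps(values), number=200)
t_struct = timeit.timeit(lambda: packer.pack(*values), number=200)

print("pickle: %.6f s, %d bytes" % (t_pickle, len(pickle.dumps(values))))
print("struct: %.6f s, %d bytes" % (t_struct, len(packer.pack(*values))))
```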

One last resource you might find useful: some synthetic serializer benchmarks. cPickle is pretty fast.

Note that not all objects can be pickled directly: only basic types, or objects that have defined the pickle protocol.
Using your own binary format would potentially allow you to store any kind of object.

Just for the record, the Zope Object DB (ZODB) follows that very same approach, storing objects using the pickle format. You may be interested in looking at their implementation.

The potential advantages of a custom format over a pickle are:

  • you can selectively retrieve individual objects, rather than having to instantiate the entire set of objects
  • you can query subsets of objects by their properties, and only load those objects that match your criteria

Whether these advantages materialize depends on how you design the storage, of course.
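As an illustration of both points in the list, here is a minimal sketch of a fixed-size record format. The layout (a 32-bit id followed by a name padded to 16 bytes) and the function names are assumptions invented for this example, and an in-memory io.BytesIO stands in for a real file.

```python
import io
import struct

# Assumed record layout: 32-bit id, then a name padded to 16 bytes.
RECORD = struct.Struct("<i16s")


def write_records(stream, rows):
    """Append (id, name) pairs as fixed-size records."""
    for rec_id, name in rows:
        stream.write(RECORD.pack(rec_id, name.encode("utf-8")))


def read_record(stream, index):
    """Fetch one record by position without touching the others."""
    stream.seek(index * RECORD.size)
    rec_id, raw = RECORD.unpack(stream.read(RECORD.size))
    return rec_id, raw.rstrip(b"\0").decode("utf-8")


def select_ids(stream, predicate):
    """Scan the records and decode names only for ids that match."""
    stream.seek(0)
    matches = []
    while True:
        chunk = stream.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break
        rec_id, raw = RECORD.unpack(chunk)
        if predicate(rec_id):
            matches.append((rec_id, raw.rstrip(b"\0").decode("utf-8")))
    return matches


store = io.BytesIO()
write_records(store, [(1, "ann"), (2, "bob"), (3, "cy"), (4, "dee")])
print(read_record(store, 2))
print(select_ids(store, lambda rec_id: rec_id % 2 == 0))
```

Fixed-size records make positional lookup a single seek, at the cost of padding and a hard limit on field length; a variable-length format would need a separate offset index to get the same effect.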