I'm wondering what the best practices are for storing graphs in persistent storage, for later analysis, search, clustering, etc.
I see neo4j becoming an option, and I'm curious whether other graph databases are available too. Does anybody have experience with how larger websites store their graph-based data (or any other sites that need to store graph-like models, e.g. RDF)?
What about options like Cassandra or MySQL?
- HyperGraphDB: a general-purpose, extensible, portable, distributed, embeddable, open-source data storage mechanism.
- InfoGrid: a web graph database with many additional software components that make it easy to develop RESTful web applications on a graph foundation.
- vertexdb: a high-performance graph database server that supports automatic garbage collection.
- WebGraph: a framework for studying the web graph. From their page: "It provides simple ways to manage very large graphs, exploiting modern compression techniques."
- Dex: a high-performance library for managing very large graphs or networks.
- The blog post "On Building a Stupidly Fast Graph Database" gives some advice on building a graph database. The approach they use is "memory-mapped I/O, disk-based linear-hashing".
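As a rough illustration of the memory-mapped I/O part of that last item, here is a minimal Python sketch. It is not the blog post's implementation and it does no linear hashing: the fixed-width record layout and the names `EDGE`, `write_edges`, `edge_at` and `neighbors` are assumptions made up for this example. It only shows how edges can be read from a file through `mmap` without loading the whole file into a Python list first.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical on-disk layout: one edge = two little-endian unsigned 32-bit ints.
EDGE = struct.Struct("<II")


def write_edges(path, edges):
    """Write (source, target) pairs as fixed-width binary records."""
    with open(path, "wb") as f:
        for src, dst in edges:
            f.write(EDGE.pack(src, dst))


def edge_at(mm, index):
    """Decode the index-th edge straight from the mapped file."""
    return EDGE.unpack_from(mm, index * EDGE.size)


def neighbors(path, node):
    """Linear scan over the mapped file, collecting the targets of `node`."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            count = len(mm) // EDGE.size
            return [dst for src, dst in (edge_at(mm, i) for i in range(count))
                    if src == node]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "edges.bin")
    write_edges(path, [(1, 2), (1, 3), (2, 3)])
    print(neighbors(path, 1))
```

The scan is still linear in the number of edges; an on-disk index (such as the linear hashing the post mentions) would be needed to avoid that.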
Disclaimer: I'm speaking from the graph analysis perspective.
There are several file formats for storing graph data: GraphML, GXL and many others. But storage usually isn't the problem. Working with the graphs without fully loading them into RAM is the tricky part.
The RDF model is too generic for serious graph analysis. If you don't mind your analysis being slow and programming the algorithms yourself, go for one of the existing graph databases; see Wikipedia on this.
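To make the "without fully loading them into RAM" point a bit more concrete, here is a small sketch, assuming a GraphML file that uses the standard GraphML namespace. It streams the file with `xml.etree.ElementTree.iterparse` and counts node degrees without ever building a graph object. The sample document and the function name `degree_counts` are made up for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from collections import Counter

GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"

# Tiny hypothetical GraphML document, only for demonstration.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="undirected">
    <node id="a"/>
    <node id="b"/>
    <node id="c"/>
    <edge source="a" target="b"/>
    <edge source="a" target="c"/>
  </graph>
</graphml>
"""


def degree_counts(path):
    """Stream a GraphML file and count how many edges touch each node."""
    degrees = Counter()
    for _event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == GRAPHML_NS + "edge":
            degrees[elem.get("source")] += 1
            degrees[elem.get("target")] += 1
        # Drop the element's attributes and children once it has been seen;
        # an empty shell stays attached to its parent.
        elem.clear()
    return degrees


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "sample.graphml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(SAMPLE)
    print(degree_counts(path))
```

This works for single-pass statistics such as degrees. Algorithms that need random access to neighbours (clustering, shortest paths) still need some indexed representation, which is where the tricky part begins.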
You might consider InfiniteGraph, which is launching in beta soon (http://www.infinitegraph.com/)
If this is for commercial use, you will find it is targeted at sites that will have larger graphs. The social networks built custom solutions, which worked for them at the time. But those in-house solutions tend to be more restrictive than using something like InfiniteGraph. Products like Cassandra or MySQL were not designed for this many-to-many problem set. Can you do it? Sure, but it takes a lot of hand-written code, and it is not scalable. Let us know if you have a real project, and we can help you determine your graph needs. Thanks, Warren firstname.lastname@example.org
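For reference, the relational route mentioned above can look roughly like the following. This is a minimal sketch using SQLite from Python's standard library as a stand-in for MySQL; the table names, column names and sample data are hypothetical. It stores the many-to-many relationship in an `edges` table and answers a two-hop query with a self-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE edges (
        src INTEGER NOT NULL REFERENCES nodes(id),
        dst INTEGER NOT NULL REFERENCES nodes(id),
        PRIMARY KEY (src, dst)
    );
    CREATE INDEX edges_by_dst ON edges (dst);
""")
conn.executemany("INSERT INTO nodes VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol"), (4, "dave")])
conn.executemany("INSERT INTO edges VALUES (?, ?)",
                 [(1, 2), (2, 3), (3, 4), (1, 3)])


def two_hop(conn, node_id):
    """Names of nodes reachable in exactly two directed steps from node_id."""
    sql = """
        SELECT DISTINCT n.name
        FROM edges AS e1
        JOIN edges AS e2 ON e2.src = e1.dst
        JOIN nodes AS n ON n.id = e2.dst
        WHERE e1.src = ?
        ORDER BY n.name
    """
    return [row[0] for row in conn.execute(sql, (node_id,))]


if __name__ == "__main__":
    print(two_hop(conn, 1))
```

Each additional hop needs another self-join on `edges` (or a recursive query), which gives a sense of the hand-written work involved in deeper traversals.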