I need to implement a search feature that can quickly run arbitrarily complex queries against XML data. When the user submits a query, all XML files should be searched for possible matches. Customers may have a lot of XML files (tens of thousands or even more), each typically a few kilobytes in size. All of the XML files have almost the same structure.
I have already benchmarked XPath; it is not fast enough for me.
How can this be done most efficiently? Is it possible to create indexes for the contents of the XML files (preserving the semantics of the content, not just plain full-text search)?
Would it help to put the XML data into an (embedded) SQL database and run the queries with SQL?
What other options do I have?
Do not try to reinvent the wheel!
I would import the XML into a database (e.g. SQLite), plus metadata and XML info, and query that.
You can implement a 'drop folder' that is indexed/imported on first run. A folder watcher could then be used to update ONLY new or changed XML files. SQLite can be run in memory for the fastest I/O performance.
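A minimal sketch of the import idea, written in Python with the standard `sqlite3` and `xml.etree.ElementTree` modules. The table layout (`node` with `doc`, `path`, `value`), the helper names, and the sample order documents are all assumptions made for illustration; a real schema would follow the actual structure of your XML files, and the folder watcher is left out.

```python
import sqlite3
import xml.etree.ElementTree as ET


def create_store():
    """Open an in-memory SQLite database with one row per XML value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE node (doc TEXT, path TEXT, value TEXT)")
    # The index keeps the element path, so a lookup stays tied to where
    # the value occurs in the document, not just to the value itself.
    conn.execute("CREATE INDEX idx_node_path_value ON node (path, value)")
    return conn


def import_xml(conn, doc_name, xml_text):
    """Flatten one XML document into (doc, path, value) rows."""
    rows = []

    def walk(elem, prefix):
        path = prefix + "/" + elem.tag
        text = (elem.text or "").strip()
        if text:
            rows.append((doc_name, path, text))
        for attr_name, attr_value in elem.attrib.items():
            rows.append((doc_name, path + "/@" + attr_name, attr_value))
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    conn.executemany("INSERT INTO node VALUES (?, ?, ?)", rows)


def find_docs(conn, path, value):
    """Return the names of documents that hold `value` at `path`."""
    cur = conn.execute(
        "SELECT DISTINCT doc FROM node WHERE path = ? AND value = ? ORDER BY doc",
        (path, value),
    )
    return [row[0] for row in cur]


# Hypothetical sample documents standing in for files in the drop folder.
SAMPLE = {
    "a.xml": "<order id='1'><customer>Alice</customer><total>30</total></order>",
    "b.xml": "<order id='2'><customer>Bob</customer><total>75</total></order>",
}

conn = create_store()
for name, text in SAMPLE.items():
    import_xml(conn, name, text)
conn.commit()

print(find_docs(conn, "/order/customer", "Bob"))
```

Since your files all share nearly the same structure, a table with one column per field would probably query faster than this generic path/value layout; the generic form is used here only because the real structure is not known.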
The fastest approach is to build your own in-memory representation of the data in the XML: convert it to simple objects and simple types, and organize it in the structure that suits your queries best. Index it additionally as appropriate for your case (using Dictionary/SortedDictionary). This approach will be considerably faster than using an SQL database, and using an SQL database will in turn be much faster than querying each XML file. Depending on the complexity of your queries, this can range from a fairly simple thing to do to a very difficult one, in which case you should definitely go with an embedded database.
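To make the in-memory idea concrete, here is a rough sketch in Python rather than .NET. A plain `dict` plays the role of Dictionary, and a sorted list searched with `bisect` stands in for SortedDictionary. The `Order` type, its fields, and the sample documents are hypothetical; substitute whatever your XML actually contains.

```python
import bisect
import xml.etree.ElementTree as ET
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    doc: str
    customer: str
    total: float


def parse_order(doc, xml_text):
    """Convert one XML document into a simple typed object."""
    root = ET.fromstring(xml_text)
    return Order(doc, root.findtext("customer"), float(root.findtext("total")))


class OrderIndex:
    def __init__(self, orders):
        # Hash index for equality lookups (the Dictionary role).
        self.by_customer = defaultdict(list)
        for order in orders:
            self.by_customer[order.customer].append(order)
        # Sorted keys for range lookups (the SortedDictionary role).
        self._sorted = sorted(orders, key=lambda o: o.total)
        self._totals = [o.total for o in self._sorted]

    def with_customer(self, name):
        return self.by_customer.get(name, [])

    def total_between(self, low, high):
        """Orders with low <= total <= high, found by binary search."""
        start = bisect.bisect_left(self._totals, low)
        end = bisect.bisect_right(self._totals, high)
        return self._sorted[start:end]


# Hypothetical sample documents.
SAMPLE = {
    "a.xml": "<order><customer>Alice</customer><total>30</total></order>",
    "b.xml": "<order><customer>Bob</customer><total>75</total></order>",
    "c.xml": "<order><customer>Alice</customer><total>120</total></order>",
}

index = OrderIndex([parse_order(name, text) for name, text in SAMPLE.items()])
print([o.doc for o in index.with_customer("Alice")])
print([o.doc for o in index.total_between(50, 150)])
```

Each additional kind of query needs its own index structure, which is where the effort grows as the queries get more complex.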
SQL Server 2005+ allows creating XML indexes. The queries can then be run on the SQL Server, without loading the XML data on the application side. This feature is available in the free Express edition.
For indexing the contents of the XML: use Lucene (and a .NET-based implementation of it). This will let you quickly retrieve the XML documents that contain some specific values; then you can take a closer look at just those.
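The two-phase pattern (index lookup first, precise check on the few candidates second) can be illustrated without Lucene. The sketch below is NOT the Lucene API: it is a hand-rolled inverted index in plain Python, and the function names and sample documents are made up for the example.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of documents whose element text contains it."""
    index = defaultdict(set)
    for name, xml_text in docs.items():
        for elem in ET.fromstring(xml_text).iter():
            for term in tokenize(elem.text or ""):
                index[term].add(name)
    return index


def candidates(index, term):
    """Phase 1: cheap lookup of documents that mention the term anywhere."""
    return index.get(term.lower(), set())


def search(docs, index, term, tag):
    """Phase 2: re-parse only the candidates and check the term's position."""
    hits = []
    for name in sorted(candidates(index, term)):
        root = ET.fromstring(docs[name])
        if any(term.lower() in tokenize(e.text or "") for e in root.iter(tag)):
            hits.append(name)
    return hits


# Hypothetical sample documents.
DOCS = {
    "a.xml": "<order><customer>Alice</customer><note>call Bob</note></order>",
    "b.xml": "<order><customer>Bob</customer><note>urgent</note></order>",
}

inverted = build_inverted_index(DOCS)
print(sorted(candidates(inverted, "bob")))
print(search(DOCS, inverted, "Bob", "customer"))
```

Both documents mention "Bob", but only one has it in the `customer` element, so the second phase is what restores the structural meaning that a plain full-text lookup loses.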