I've multiple arrays about 100 possible values, ie:
a = (a, b, c, d) a = (a, e) a = (d, f, g)
I wish to FASTLY return which arrays consists of (a b) &lifier&lifier (d e)
within this example, and 1
I believed about bitwise procedures... like representing "abcd" by "1111" "ad" by "1001", and so forth. I Quickly could solve the "OR" with only a bitwise OR, after which see if both of them are non-zero
can anybody think on the better solution? that one is not very pratical because it does not appear to be really escalable
what are the DBMS that may do this rapidly? I attempted with mongodb, however it appears they did not add the "$and" function yet (doc states it's on version 1.9.1, however i are only able to download 1.9., and it is not stable anyway)
I guess this is a "boolean search", much like what the search engines do constantly... so I am speculating there's an easy method (not so quick, but more escalable) than that
Yes, a bitwise solution works quite nicely with this. Yes, some databases include this type of capacity, usually named a bitmapped column (or bitmapped index, depending). The typical advice is to use it to some column which has relatively low cardinality (i.e., a reasonably few possible values, for example sex).
With what sense isn't it scalable? 16 bytes of information per (bit)array is not bad! I am unsure why you'll need a DBMS with this you are able to put binary data inside if you want to (hopefully blocks of arrays), and pull everything to query. Unless of course you are thinking about getting vast amounts of arrays.
For small amounts of elements, bit logic is quickest. But when you begin going far beyond 100 values, then keeping the arrays sorted and doing binary (as well as linear!) search is going to be faster. You will need to benchmark in your system to obtain the exact cutoff point, but when your arrays have ~4 elements each, I generally find linear search faster (counting in the occurrences from the elements you are searching for within the boolean logic along the way), which it beats binary math at comparable point that binary representations also become bigger.
Store your arrays like a trie, e.g.,
a b c d e d f g
Produce a trie in the expression too, e.g.,
a b d e d e b d e
You are able to match the second trie from the former (disregarding any values that are not within the expression, i.e., 'c', 'f', and 'g') to find the solutions. I leave the particulars from the trie representation and also the matching formula your decision.
While you stated the potential values are about 100, however, you have large amount of arrays, I believe a hash table does much better than bit level operation(s).
possess a hash table set using the values in expression, i.e a, b set to at least one and d, e set to two.
for each array a in arrays for each value v in array sum+= ht[v] if sum == 3 print found break
(the above mentioned won't with replicates though!)
the very first for loop could be parallelized, most likely having a map-reduce framework as well as openMP.
(btw the 2nd for may also be parallelized!)
This ought to be faster than creating a little representation of entire elements within the array and doing As well as OR. You essentially benefit using the best situation (for eg a and d are the initial 2 elements!) worst situation being same for techniques (might be the if accomplished for every element be overhead)