My question relates to a topic I have been reading about a little. Basically, my understanding is that in higher dimensions all points end up being very close to one another.
My doubt is whether this means that calculating distances the usual way (Euclidean, for example) is still valid or not. If it were still valid, it seems that when comparing vectors in high dimensions, the two most similar ones wouldn't differ much from a third one, even if that third one were completely unrelated.
Is this correct? And in that case, how can you tell whether you have a match or not?
Basically, the distance measurement is still correct; however, it can become meaningless when you have "real world" data, which is noisy.
The effect we are talking about here is that a large distance between two points in one dimension quickly gets overshadowed by small distances in all the other dimensions. That's why, in the end, all points end up with roughly the same distance. There is a good illustration of this:
Say we want to classify data based on their value in each dimension. We simply divide each dimension once (it has a range of 0..1): values in [0, 0.5) are positive, values in [0.5, 1] are negative. With this rule, any one combination of labels (all positive, for example) covers 0.5^3 = 12.5% of the space in 3 dimensions. In 5 dimensions, it is only 3.1%. In 10 dimensions, it is less than 0.1%.
So in each dimension we still allow half of the overall value range, which is quite a lot. But all of it ends up in less than 0.1% of the total space: the differences between these data points are huge in each dimension, but negligible over the whole space.
You can go further and say that in each dimension you cut off only 10% of the range, so you allow values in [0, 0.9). You still end up with less than 35% of the whole space covered in 10 dimensions. In 50 dimensions, it is 0.5%. As you can see, wide ranges of data in each dimension are crammed into a very small portion of your search space.
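The percentages above are just the per-dimension share raised to the number of dimensions. A minimal sketch to check them yourself (the function name `covered_fraction` is made up for this example):

```python
def covered_fraction(kept_per_dimension: float, dimensions: int) -> float:
    """Share of the unit hypercube that remains when every axis
    keeps only `kept_per_dimension` of its 0..1 range."""
    return kept_per_dimension ** dimensions


# The cases mentioned in the text: half the range, then 90% of the range.
for kept, dims in [(0.5, 3), (0.5, 5), (0.5, 10), (0.9, 10), (0.9, 50)]:
    share = covered_fraction(kept, dims)
    print(f"keep {kept:.0%} per axis, {dims:>2} dimensions: {share:.2%} of the space")
```

Because the shares multiply, even a generous cut per axis shrinks the covered volume exponentially with the number of dimensions.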
That's why you need dimensionality reduction, where you basically disregard differences on less informative axes.
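If you want to see the "all points end up at roughly the same distance" effect directly, here is a rough sketch. It assumes points drawn uniformly at random from the unit hypercube, which is my simplification and not necessarily how real data behaves; `relative_contrast` is a made-up helper name. It measures how much farther the farthest point is than the nearest one, relative to the nearest distance.

```python
import math
import random


def relative_contrast(dimensions: int, n_points: int = 200, seed: int = 0) -> float:
    """(farthest - nearest) / nearest Euclidean distance from one random
    query point to n_points random points (uniform in the unit hypercube)."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dimensions)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dimensions)]
        distances.append(math.dist(query, point))
    nearest, farthest = min(distances), max(distances)
    return (farthest - nearest) / nearest


for d in (2, 10, 100, 1000):
    print(f"{d:>4} dimensions: relative contrast = {relative_contrast(d):.3f}")
```

Under these assumptions the contrast should shrink as the number of dimensions grows, meaning the nearest and the farthest neighbour become hard to tell apart by Euclidean distance alone.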