Suppose I've got a list of users and their GPS coordinates:

User1, 52.99, -41.0
User2, 91.44, -21.4
User3, 5.12, 24.5
...

My objective is: given a set of coordinates,

  1. Out of those users, find the ones within 20 meters. (How would you do a SELECT statement like this?)
  2. For each of those users, get the distance.

As you probably guessed, these coordinates will be retrieved from a cell phone. The phones will update their longitude/latitude every ten seconds, and will also fetch that list of users within 20 meters. It's dynamic.

I'd like to know the best way to do this so that it can scale.

  • Would you store the coordinates in a database and update them every 10 seconds? (If you store them in a database... how would you calculate the distances?)
  • How would you do this so it can scale?

By the way, there's already a formula that can calculate the distance between two coordinates: http://www.johndcook.com/python_longitude_latitude.html. I just need to know the best way to do this technically (trees? a database? what architecture? More specifically... how would you fit the long/lat distance formula into the SELECT statement?)
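For reference, a minimal Python version of such a formula (a sketch using the spherical law of cosines, not the exact code from the linked page) might look like this:

```python
from math import acos, cos, sin, radians

def distance_on_sphere_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in meters.

    Uses the spherical law of cosines with a mean Earth radius of
    6,371,000 m, which is plenty accurate at distances of tens of meters.
    """
    phi1, phi2 = radians(lat1), radians(lat2)
    dlambda = radians(lon2 - lon1)
    cos_central = (sin(phi1) * sin(phi2)
                   + cos(phi1) * cos(phi2) * cos(dlambda))
    # Clamp against floating-point drift before acos:
    central = acos(min(1.0, max(-1.0, cos_central)))
    return 6_371_000 * central

# Two points roughly 0.65 arcseconds apart along a meridian (~20 m):
d = distance_on_sphere_m(52.0, -41.0, 52.00018, -41.0)
```

The SQL answers below effectively inline an equivalent computation into the SELECT statement.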

  1. Create a MyISAM table with a column of datatype Point

  2. Create a SPATIAL index on this column

  3. Convert the GPS coords into UTM (grid) coords and store them in your table

  4. Issue this query:

    SELECT  user_id, GLength(LineString(user_point, @mypoint)) AS distance
    FROM    users
    WHERE   MBRWithin(user_point,
                      LineString(Point(X(@mypoint) - 20, Y(@mypoint) - 20),
                                 Point(X(@mypoint) + 20, Y(@mypoint) + 20)))
            AND GLength(LineString(user_point, @mypoint)) <= 20
    

Note that this query will most probably be run on very volatile data, and you will have to perform additional checks on the timestamps.

Since MySQL cannot merge SPATIAL indexes, it would be better to use some kind of surface tiling technique:

  1. Split the Earth's surface into a number of tiles, say, 1″ × 1″ (that is about 30 meters along the meridian and 30 × COS(lat) along the parallel).

  2. Store the data in a CHAR(14) column: 7 digits of the lat + 7 digits of the lon (14 digits in all). Disable key compression on this column.

  3. Create a composite index on (tile, time)

  4. On the client, calculate all possible tiles your friends may be in. For a 20-meter distance, this is at most 9 tiles, unless you're deep in the North or South. However, you may change the tiling formula to handle those cases.

  5. Issue this query:

    SELECT  *
    FROM    (
            SELECT  tile1
            UNION ALL
            SELECT  tile2
            UNION ALL
            …
            ) tiles
    JOIN    users u
    ON      u.tile  = tiles.tile
            AND u.time >= NOW() 
            AND GLength(LineString(user_point, @mypoint)) <= 20
    

where tile1 etc. are the precalculated tiles.
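To make steps 2 and 4 of the tiling scheme concrete, here is one possible client-side sketch in Python (an illustrative encoding, not the canonical one): encode the 1″ × 1″ tile containing a point as a fixed-width 14-digit key, and enumerate the candidate tiles around a query point:

```python
from math import ceil, cos, radians

def tile_key(lat, lon):
    """Encode the 1-arcsecond tile containing (lat, lon) as a CHAR(14) key:
    7 digits of latitude index + 7 digits of longitude index.
    Indices are offset so they are always non-negative."""
    lat_idx = int((lat + 90.0) * 3600)    # 0 .. 648,000
    lon_idx = int((lon + 180.0) * 3600)   # 0 .. 1,296,000
    return f"{lat_idx:07d}{lon_idx:07d}"

def candidate_tiles(lat, lon, radius_m=20.0):
    """All tiles that may contain a point within radius_m of (lat, lon).
    A 1-arcsecond tile is ~30 m tall, so a 20 m radius spans a 3 x 3 block
    of 9 tiles near the equator; at high latitudes the tiles narrow
    east-west and the block widens."""
    lat_span = ceil(radius_m / 30.0)                                  # north-south
    lon_span = ceil(radius_m / max(1e-9, 30.0 * cos(radians(lat))))   # east-west
    lat_idx = int((lat + 90.0) * 3600)
    lon_idx = int((lon + 180.0) * 3600)
    return [f"{lat_idx + di:07d}{lon_idx + dj:07d}"
            for di in range(-lat_span, lat_span + 1)
            for dj in range(-lon_span, lon_span + 1)]
```

The returned keys are what you would splice into the `SELECT tile1 UNION ALL SELECT tile2 ...` list of the query.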

SQL Server implements this algorithm for its spatial indexes (as opposed to the R-Tree that MySQL uses).

Well, the naive approach would be to do an O(n) pass over every point, compute its distance from the current point, and keep the ones within 20 meters. This is perfectly OK for small datasets (say <= 500 points), but on larger sets it will be quite slow. In SQL, this would be something like:

SELECT point_id, DIST_FORMULA(x, y) AS distance
FROM   points
WHERE  DIST_FORMULA(x, y) < 20

To deal with the inefficiency of the above method, you would have to apply some kind of preprocessing step, most likely space partitioning. That can often dramatically improve performance in nearest-neighbour searches like this. However, in your case, if all the points are updated every ten seconds, you would have to do an Ω(n) pass to update the position of every point in the space-partitioning structure. If you have more than a few queries between each update, it will pay off; otherwise it will just be overhead.
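As a rough illustration of that trade-off, here is a minimal grid-bucket index in Python (a hypothetical sketch; coordinates are treated as planar meters for simplicity). Rebuilding it touches every point, but each query only inspects the neighbouring buckets:

```python
from collections import defaultdict
from math import hypot

CELL = 20.0  # bucket size in meters, matched to the search radius

def build_index(points):
    """O(n) preprocessing: bucket every (id, x, y) point by grid cell.
    This is the pass you must repeat after each bulk position update."""
    index = defaultdict(list)
    for pid, x, y in points:
        index[(int(x // CELL), int(y // CELL))].append((pid, x, y))
    return index

def nearby(index, x, y, radius=20.0):
    """Check only the 3 x 3 block of cells around the query point
    (sufficient because radius <= CELL), instead of scanning all n points."""
    cx, cy = int(x // CELL), int(y // CELL)
    hits = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for pid, px, py in index.get((cx + dx, cy + dy), ()):
                d = hypot(px - x, py - y)
                if d <= radius:
                    hits.append((pid, d))
    return hits
```

If positions change every ten seconds, `build_index` runs once per cycle; the index only pays off when several `nearby` calls happen between rebuilds.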