Let's say I have a database table called "Scrape", set up something like this:
UserID (int)
UserName (varchar)
Wins (int)
Losses (int)
ScrapeDate (datetime)
I want to be able to rank my users by their win/loss ratio. However, every week I'll be scraping new data on the users and adding another entry for each of them to the Scrape table.
How do I query a list of users sorted by wins/losses, using only the most recent entry (ScrapeDate) for each user?
Also, do you think it matters that people will be hitting the site while a scrape may be only partway through?
For example, I could have:
1 - Bob - Wins: 320 - Losses: 110 - ScrapeDate: 7/8/09
1 - Bob - Wins: 360 - Losses: 122 - ScrapeDate: 7/17/09
2 - Frank - Wins: 115 - Losses: 20 - ScrapeDate: 7/8/09
This represents a scrape that has only updated Bob so far and is in the process of updating Frank, whose new row has not been inserted yet. How would you handle this situation as well?
So, my questions are:
- How would you query only the most recent scrape of each user to determine the rankings?
- Do you think it matters that the database could be in the middle of an update (especially if a scrape could take up to a day to complete), so that not all users have been fully updated yet? If so, how would you handle this?
Thanks, and I appreciate the responses you've given me on my related question:
This is what I call the "greatest-n-per-group" problem. It comes up several times each week on StackOverflow.
I solve this kind of problem using an outer join technique:
SELECT s1.*,
       -- 1.0 * forces a fractional result in engines where int / int truncates
       1.0 * s1.wins / s1.losses AS win_loss_ratio
FROM Scrape s1
LEFT OUTER JOIN Scrape s2
  ON (s1.username = s2.username AND s1.ScrapeDate < s2.ScrapeDate)
WHERE s2.username IS NULL
ORDER BY win_loss_ratio DESC;
This returns just one row for each username: the row with the greatest value in the ScrapeDate column. That is what the outer join is for: it tries to match s1 with some other row s2 that has the same username and a greater date. If there is no such row, the outer join returns NULL for all the columns of s2, and so we know that s1 is the row with the greatest date for that username.
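This also works while a partially completed scrape is in progress: a user who has not been re-scraped yet, like Frank in the example, is simply ranked by his previous row.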
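To see the outer join behave as described, here is a minimal sketch that runs it against the three sample rows from the question. It assumes an in-memory SQLite database driven from Python, which is not something the question specifies. The dates are rewritten as ISO strings so that they compare chronologically, and the ratio is multiplied by 1.0 so that SQLite does not truncate the integer division.

```python
import sqlite3

# In-memory SQLite database, used purely for illustration (an assumption;
# the question does not say which database engine is in use).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Scrape ("
    " UserID INTEGER, UserName TEXT, Wins INTEGER,"
    " Losses INTEGER, ScrapeDate TEXT)"
)

# Sample rows from the question. Dates are stored as ISO strings so that
# plain string comparison orders them chronologically.
conn.executemany(
    "INSERT INTO Scrape VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Bob", 320, 110, "2009-07-08"),
        (1, "Bob", 360, 122, "2009-07-17"),
        (2, "Frank", 115, 20, "2009-07-08"),  # not re-scraped yet
    ],
)

# Keep a row s1 only when no row s2 exists for the same user with a later date.
LATEST_PER_USER = """
SELECT s1.UserName, s1.ScrapeDate,
       1.0 * s1.Wins / s1.Losses AS win_loss_ratio
FROM Scrape s1
LEFT OUTER JOIN Scrape s2
  ON s1.UserName = s2.UserName AND s1.ScrapeDate < s2.ScrapeDate
WHERE s2.UserName IS NULL
ORDER BY win_loss_ratio DESC
"""

rows = conn.execute(LATEST_PER_USER).fetchall()
for name, scraped, ratio in rows:
    print(name, scraped, round(ratio, 2))
# Frank 2009-07-08 5.75
# Bob 2009-07-17 2.95
```

Bob is ranked by his newest row and Frank by his only row, which is the mid-scrape situation from the question. A user with zero losses would need separate handling, since the division is undefined for that row.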
This technique isn't necessarily as fast as the CTE and RANKING solutions that other answers have given. You should try both and see which works better for you. The reason I prefer my solution is that it works in almost any flavor of SQL.
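For comparison, a CTE with a ranking function might look roughly like the sketch below. The other answers are not reproduced here, so this form is an assumption rather than their exact queries. It again uses an in-memory SQLite database from Python, and it needs SQLite 3.25 or later for window function support.

```python
import sqlite3

# In-memory SQLite database, used purely for illustration (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Scrape ("
    " UserID INTEGER, UserName TEXT, Wins INTEGER,"
    " Losses INTEGER, ScrapeDate TEXT)"
)
conn.executemany(
    "INSERT INTO Scrape VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Bob", 320, 110, "2009-07-08"),
        (1, "Bob", 360, 122, "2009-07-17"),
        (2, "Frank", 115, 20, "2009-07-08"),
    ],
)

# Number each user's rows from newest to oldest, then keep only row 1.
RANKED = """
WITH ranked AS (
    SELECT UserName, Wins, Losses, ScrapeDate,
           ROW_NUMBER() OVER (
               PARTITION BY UserName ORDER BY ScrapeDate DESC
           ) AS rn
    FROM Scrape
)
SELECT UserName, ScrapeDate, 1.0 * Wins / Losses AS win_loss_ratio
FROM ranked
WHERE rn = 1
ORDER BY win_loss_ratio DESC
"""

ranked_rows = conn.execute(RANKED).fetchall()
for name, scraped, ratio in ranked_rows:
    print(name, scraped, round(ratio, 2))
```

Both forms pick the same row per user on this data. Which one is faster depends on the engine and its indexes, so it is worth timing them on real data.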