Need Help! Potential new Popufur feature: watch suggestions
14 years ago
I'm trying to figure out an algorithm that will allow me to take the current social network of FA and "suggest" people to watch. But I'm running into some brick walls.
For input, I'm using the database of watches, which consists of (A,B) where A watches B. I'm trying to get out of it a list of "people that you probably know". I've tried two different algorithms, and I'm not happy with either result.
Test 1, "FOAF"/Friend of a Friend: This algorithm ranks the users you are not watching by the number of "indirect watchers", i.e. 2-degree separation, where you are watching who is watching X.
In English: who is the most popular out of the people that the people you're watching are watching?
Test 2, "You'll Also Like": This one's a little more complicated, and honestly I'm upset this one didn't work right off the bat. Let's say you're "A". This method goes through each other user in the database, which we'll call B. I calculate the number of users that both A and B watch, let's call that N. Then, for every person that B is watching that A is not, N (divided by the number of people B is watching that A is not) is added to B's "score".
In English: Find users with similar watch lists, and suggest the "missing links", especially so, the more similar the lists.
Both of these algorithms are intrinsically inefficient, but run in polynomial time. In practice, they run fairly quickly. It takes about 7-8 minutes to build the FA watch database (18 million watches and 170k users), and then a matter of seconds (if even that) to build a suggestion list for a single user.
The reason I'm unsatisfied with both of these algorithms is that it's more or less a list of the most popular artists on the site. Naturally, if 10-15% of all users with at least one watch are watching each of the top handful of artists, odds are they're going to be suggested as "people you'd probably also like".
What I'm looking for is a set of results more along the lines of "oh hey you know A and B and C, then you probably know D." and "wow, I do know D! I didn't know they had an account here". Kind of like how Twitter or Facebook do it, but slightly less creepily and with far less input data to work with.
I've looked into how big'uns Amazon and such do it, but since I don't have a rating system to weight different associations between users, it's harder to just use a binary "either you watch them or you don't" system.
If I can figure out a suitable algorithm for suggesting users to watch, I hope to extend it to "suggested art/submissions you might like". (Fun fact: as of right now, there are approximately 91 million favorites among the 172k known users)
So... does anyone have experience in the field of social network analysis and/or search suggestion algorithms? I'm open for ideas; I honestly thought I'd have it down by now! :D
Oh, and yiff yiff murr scritch blowjob, etc. There, now this post is relevant to FA. :)
For input, I'm using the database of watches, which consists of (A,B) where A watches B. I'm trying to get out of it a list of "people that you probably know". I've tried two different algorithms, and I'm not happy with either result.
Test 1, "FOAF"/Friend of a Friend: This algorithm ranks the users you are not watching by the number of "indirect watchers", i.e. 2-degree separation, where you are watching who is watching X.
In English: who is the most popular out of the people that the people you're watching are watching?
Test 2, "You'll Also Like": This one's a little more complicated, and honestly I'm upset this one didn't work right off the bat. Let's say you're "A". This method goes through each other user in the database, which we'll call B. I calculate the number of users that both A and B watch, let's call that N. Then, for every person that B is watching that A is not, N (divided by the number of people B is watching that A is not) is added to B's "score".
In English: Find users with similar watch lists, and suggest the "missing links", especially so, the more similar the lists.
Both of these algorithms are intrinsically inefficient, but run in polynomial time. In practice, they run fairly quickly. It takes about 7-8 minutes to build the FA watch database (18 million watches and 170k users), and then a matter of seconds (if even that) to build a suggestion list for a single user.
The reason I'm unsatisfied with both of these algorithms is that it's more or less a list of the most popular artists on the site. Naturally, if 10-15% of all users with at least one watch are watching each of the top handful of artists, odds are they're going to be suggested as "people you'd probably also like".
What I'm looking for is a set of results more along the lines of "oh hey you know A and B and C, then you probably know D." and "wow, I do know D! I didn't know they had an account here". Kind of like how Twitter or Facebook do it, but slightly less creepily and with far less input data to work with.
I've looked into how big'uns Amazon and such do it, but since I don't have a rating system to weight different associations between users, it's harder to just use a binary "either you watch them or you don't" system.
If I can figure out a suitable algorithm for suggesting users to watch, I hope to extend it to "suggested art/submissions you might like". (Fun fact: as of right now, there are approximately 91 million favorites among the 172k known users)
So... does anyone have experience in the field of social network analysis and/or search suggestion algorithms? I'm open for ideas; I honestly thought I'd have it down by now! :D
Oh, and yiff yiff murr scritch blowjob, etc. There, now this post is relevant to FA. :)
I hope it works out and more people can learn about it!!!
Anyway, I wish I could help with the algorithms. Unfortunately, I know nothing about programming and thus I am absolutely useless to you. But I thought I should at least let you know that your work is definitely highly appreciated. (:
Something like this:
[CODE]
watchweight=5; // Preferred ratio of watches to watched-bys. Someone watched by 10 can watch 50 before losing score.
if (user.watchlist.length > user.watchedby.length*watchweight) {
user.focw=(user.watchedby.length*watchweight) / user.watchlist.length; // Divide lower by higher for "focus weight" ratio
} else {
user.focw=1.0; // Alternatively, we could have this taper off in the other direction, but...
}
// user.focw=sqrt(user.focw); // Since these ratios range from 0 to 1.0, we can easily apply some kind of curve to tweak the result.
[/CODE]
That way, instead of just counting how many users watch the potentially-suggested users, you can add their weightings instead.
Also, instead of the "friend of a friend" strategy, how about something more like "those who watch who I watch also watch these"... Sort of a "forwards-backwards-forwards" approach instead of just "forwards-forwards". This might actually be very similar to your "test 2" except with a slightly different approach, that might yield different results.
I love this sort of thing. I'm gonna experiment with these ideas!