FA API wishlist
13 years ago
For developers, statisticians, and other related nutjobs, I'd like to propose a list of read-only tables or exported files that contain most/all of the data relevant to site submissions. This will hopefully cut down on CPU time wasted generating pages for bots, and possibly promote interest in the site in general.
Table 1: user general info.
Numeric userid, site handle, creation time.
Publicly semi-available data that can be skimmed from a user's main page. If you know their URL, you can get there, but there is no full listing of all FA accounts.
This data can be used to show site growth over time, and possibly interesting trivia about common username patterns.
Table 2: submission data
Numeric userid, numeric submission id, submission timestamp
(Optional extra fields: detailed notes, category listing, maturity rating, tag list)
Semi-public info: a bot could go through all of the submission IDs and collect data that way, but obviously mature stuff wouldn't be available.
This info can demonstrate peak posting times, breakdown of mature/general art posted, and common themes in the descriptions/tags. I'm not looking for a dump of all the uploaded content; just the info surrounding it.
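As a rough sketch of the kind of thing Table 2 would allow, here is how peak posting times and the mature/general breakdown could be computed. It assumes the table is exported as a CSV; the column names (userid, submission_id, posted_at, rating) and the sample rows are made up for illustration and are not anything FA actually provides.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone

# Hypothetical export of Table 2. Column names and rows are placeholders.
# posted_at is assumed to be a Unix timestamp in UTC.
SAMPLE = """userid,submission_id,posted_at,rating
101,5001,1262304000,general
102,5002,1262307600,mature
101,5003,1262390400,general
"""

def posts_per_hour(csv_text):
    """Count submissions by UTC hour of day."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        posted = datetime.fromtimestamp(int(row["posted_at"]), tz=timezone.utc)
        counts[posted.hour] += 1
    return counts

def rating_breakdown(csv_text):
    """Count submissions by maturity rating."""
    return Counter(row["rating"] for row in csv.DictReader(io.StringIO(csv_text)))

if __name__ == "__main__":
    print(dict(posts_per_hour(SAMPLE)))
    print(dict(rating_breakdown(SAMPLE)))
```

The same loop would work on a real export of any size, since it only keeps one counter per hour in memory.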
Table 3: comment data
Submission id, commenter user id, time. (Optionally: comment contents.)
Semi-public, semi-available data: obviously bots can't access comments on mature pieces, and now that FA allows comments to be hidden, some comments may disappear over time or become otherwise inaccessible.
This can be used to determine who comments the most on which artists, the best times of the day/week/month/year to receive comments, and how quickly people post on submissions. If comment content is included, it could be used to detect trolls, etc. (E.g. "DORRY.")
Table 4: favorites data
Submission id, user id (of fan, not artist), time.
Semi-public, semi-available data: again, mature submissions are inaccessible to bots without cookies. Also, only the favorites shown on a user's main page display "when" the favoriting happened; older favorites' times aren't displayed.
This data would be useful to see when people favorite stuff, in terms of time of day/week/year AND with respect to submission time. Could also be used to locate "hidden gems" in the site by relatively unknown artists.
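To make that a bit more concrete, here is a minimal sketch that combines Table 4 with Tables 2 and 5. The row layouts follow the proposed tables, but the values are invented, and the "hidden gem" score (favorites divided by the artist's watcher count plus one) is just one assumed heuristic, not an established measure.

```python
from collections import Counter

# Hypothetical rows; values are made up for illustration.
submissions = {  # submission_id -> (artist_userid, posted_at)
    5001: (101, 1000),
    5002: (102, 2000),
}
favorites = [  # (submission_id, fan_userid, faved_at)
    (5001, 201, 1060),
    (5001, 202, 1600),
    (5002, 201, 2030),
    (5002, 203, 2090),
    (5002, 204, 5000),
]
watcher_counts = {101: 50, 102: 2}  # artist_userid -> number of watchers

def fav_delays(submissions, favorites):
    """Seconds between a submission being posted and each favorite on it."""
    return [faved_at - submissions[sid][1] for sid, _fan, faved_at in favorites]

def hidden_gems(submissions, favorites, watcher_counts, top=1):
    """Rank submissions by favorites per watcher of the artist (assumed heuristic)."""
    fav_counts = Counter(sid for sid, _fan, _faved_at in favorites)

    def score(sid):
        artist = submissions[sid][0]
        # The +1 avoids dividing by zero for artists with no watchers.
        return fav_counts[sid] / (watcher_counts.get(artist, 0) + 1)

    return sorted(fav_counts, key=score, reverse=True)[:top]

if __name__ == "__main__":
    print(fav_delays(submissions, favorites))
    print(hidden_gems(submissions, favorites, watcher_counts))
```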
Table 5: watchers
Userid A, userid B, where A watches B. Possibly also a timestamp, if it's stored. I know user watches are sorted by recency on a user's page, but I'm not sure if that's intentional or just a side effect of DB design.
Public data: this info is completely public already and can be accessed by anyone without having to log in. Currently this data can't be gathered easily because of server limitations, possible throttling (total conjecture on my part), and such.
This'd be fantastic data to have so I could update Popufur, but also it would serve to detail how frequently people favorite stuff from users they're watching.
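For example, with Table 5 as a simple edge list, both questions become a few lines of code. This is only a sketch: the pair layout (A, B) matches the description above, the favorites are assumed to have already been joined with Table 2 to get the artist, and all ids are invented.

```python
from collections import Counter

# Hypothetical data. (A, B) means user A watches user B.
watches = {(201, 101), (202, 101), (201, 102)}

# Favorites reduced to (fan_userid, artist_userid) pairs, assuming a join
# against the submission table has already been done.
fav_pairs = [(201, 101), (202, 102), (203, 101), (201, 102)]

def watcher_counts(watches):
    """Number of watchers per watched user."""
    return Counter(watched for _watcher, watched in watches)

def share_from_watched(watches, fav_pairs):
    """Fraction of favorites given to artists the fan already watches."""
    if not fav_pairs:
        return 0.0
    hits = sum(1 for pair in fav_pairs if pair in watches)
    return hits / len(fav_pairs)

if __name__ == "__main__":
    print(watcher_counts(watches).most_common())
    print(share_from_watched(watches, fav_pairs))
```

Keeping the watches in a set makes each membership check cheap, which matters once the edge list runs into the millions.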
Table 6: activity
Registered users count, guest count, timestamp.
No frills: just the data displayed at the bottom of the page, if it's actually cached somewhere. If not, I've been tracking it for weeks already.
With this, you can see how active the site is by hour of day, day of week, around conventions and holidays, etc, and if site activity and favs/comments activity are correlated!
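A minimal sketch of the hour-of-day view, assuming Table 6 arrives as (timestamp, registered, guests) rows with Unix timestamps in UTC. The sample numbers are invented and are not taken from my own tracking.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean

# Hypothetical samples: (unix_timestamp_utc, registered_online, guests_online)
samples = [
    (1262304000, 1200, 800),   # 2010-01-01 00:00 UTC
    (1262307600, 900, 600),    # 2010-01-01 01:00 UTC
    (1262390400, 1400, 1000),  # 2010-01-02 00:00 UTC
]

def mean_users_by_hour(samples):
    """Average total users online (registered + guests) per UTC hour of day."""
    by_hour = defaultdict(list)
    for ts, registered, guests in samples:
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).hour
        by_hour[hour].append(registered + guests)
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}

if __name__ == "__main__":
    print(mean_users_by_hour(samples))
```

Grouping by weekday instead of hour only needs `.weekday()` in place of `.hour`.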
Anyway, that's all I came up with. Anything else that people would like to see? Like I said, this data would provide data mavens with info they can mostly already get by site scraping, thereby saving everyone headaches. It wouldn't need to be real-time, either. Daily or weekly or monthly snapshots would be even better. XML, JSON, CSV, whatever; just zip that stuff up.
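To show how little a consumer would need on their end, here is a sketch of reading one table out of a zipped CSV snapshot using only the Python standard library. The member name watchers.csv and the column names are assumptions; the snapshot is built in memory here just so the example runs on its own.

```python
import csv
import io
import zipfile

def make_snapshot(rows):
    """Build a zip (as bytes) holding a hypothetical watchers.csv."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["watcher_id", "watched_id"])
    writer.writerows(rows)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("watchers.csv", out.getvalue())
    return buf.getvalue()

def read_snapshot(data, member="watchers.csv"):
    """Read (watcher_id, watched_id) pairs back out of the zipped snapshot."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        with zf.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return [
                (int(row["watcher_id"]), int(row["watched_id"]))
                for row in csv.DictReader(text)
            ]

if __name__ == "__main__":
    snapshot = make_snapshot([(201, 101), (202, 101)])
    print(read_snapshot(snapshot))
```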
I think this could potentially drive value-added development by third-party coders, so I'd like to request some sort of API, if it wouldn't be too much trouble.
Thanks for reading <3
A table containing journal entries and another for comments.
Since they're all public and sequentially numbered, people can already access and gather these.
My only concern with some of these tables is that things like submissions and comments can be deleted. Would having all of that data cached permanently in archives make people uncomfortable?
Yes, this would be a good idea. I mentioned that I am not much of a programmer, but I had thought of making an FA Android app. This would make that easier, I suppose, as opposed to 'scraping' HTML files. Also, this guy
The only thing I'd add maybe is 'Shouts'. Perhaps also profile info text and the stuff like "Shell of choice".
Can you not have a 'bot' with an account to access restricted content? I even ran across somebody who had their whole account hidden from non-registered users, which I didn't even realize was an option (and as an aside, since I am paranoid I might think about enabling it myself).
Anyway, that's that. Haven't spoken to you in a while, and I would like to, but it is a bad habit of mine to keep to myself too much.