Forum - View topic: Custom anime search by rating
|
Author | Message | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Pontifex
Posts: 7 |
|
|||||||||||
First of all: Thank you for making such a wonderful site. I've looked around for years and have not found the wealth of information and ratings available here.
Question: Is there a way to filter anime by ratings, such that I can:
* Find a minimum number of reviews (say 30 or more)
* Find a minimum "Arithmetic mean" rating (say between 7.5 and 10)
* Find a low "Standard Deviation" of the rating (between 0 and 1.7)
(Bonus points for including it in an email subscription / RSS feed, though I'd settle for just having it on a page on the site)
Checked Tips & Tricks. Tried to make my own Google Search. (Found a bug when attempting to post this link) Checked the API page. Any help would be appreciated.
(Bug: When attempting to post the full URL: https://encrypted.google.com/search?q=allintext:%22Arithmetic+mean%22+%227.5..10%22+%22std.+dev.:%22+%220..1.7%22+%22%28TV%29%22+site:www.animenewsnetwork.com#hl=en&safe=off&tbo=d&sclient=psy-ab&q=allintext:%22Arithmetic+mean%22+%227.5..10%22+%22std.+dev.%3A%22+%220..1.7%22+%22%28TV%29%22+site%3Awww.animenewsnetwork.com&oq=allintext:%22Arithmetic+mean%22+%227.5..10%22+%22std.+dev.%3A%22+%220..1.7%22+%22%28TV%29%22+site%3Awww.animenewsnetwork.com&gs_l=serp.12...0.0.0.335018.0.0.0.0.0.0.0.0..0.0.les%3B..0.0...1c..2.serp.xYbA36RCRs4&pbx=1&bav=on.2,or.r_gc.r_pw.r_qf.&bvm=bv.42080656,d.cGE&fp=83f4cc16c74fbd4b&biw=1920&bih=856 The BB code used for the forums would not parse it when I tried to include it in a [url] tag as I did above. I know it contains some garbage, like my session ID and browser ID, etc., but you should be able to post arbitrary links into these tags without having to manually mangle the URL. =/) |
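For anyone following along, the three criteria in the question amount to a simple filter once the numbers are in hand. Below is a minimal Python sketch, assuming the vote count, arithmetic mean and standard deviation per title have already been collected somehow. The `Rating` record, its field names and the sample rows are made up for illustration; they are not ANN data or an ANN API.

```python
from dataclasses import dataclass


@dataclass
class Rating:
    # Hypothetical record; the field names are assumptions, not ANN's schema.
    title: str
    votes: int
    mean: float
    std_dev: float


def filter_ratings(ratings, min_votes=30, mean_range=(7.5, 10.0), std_range=(0.0, 1.7)):
    """Keep titles with enough votes whose mean and std. dev. fall inside the ranges."""
    lo_mean, hi_mean = mean_range
    lo_std, hi_std = std_range
    return [
        r for r in ratings
        if r.votes >= min_votes
        and lo_mean <= r.mean <= hi_mean
        and lo_std <= r.std_dev <= hi_std
    ]


# Made-up sample rows, purely for illustration.
sample = [
    Rating("Title A", votes=120, mean=8.1, std_dev=1.2),
    Rating("Title B", votes=12, mean=9.0, std_dev=0.9),   # too few votes
    Rating("Title C", votes=300, mean=7.9, std_dev=2.3),  # ratings too spread out
    Rating("Title D", votes=45, mean=6.8, std_dev=1.0),   # mean too low
]

matches = filter_ratings(sample)
print([r.title for r in matches])
```

The defaults mirror the thresholds in the post (30 votes, mean 7.5 to 10, standard deviation 0 to 1.7); all three are ordinary keyword arguments, so they can be tightened or loosened per search.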
||||||||||||
Pontifex
Posts: 7 |
|
|||||||||||
So I guess there isn't a way. =(
I'll have to get something working with screen scraping, I suppose. Anyone want to share their thoughts about that? I have a little experience, but not a whole bunch. I'd like to extract the ratings so I don't have to wade through all the anime manually. |
||||||||||||
DerekTheRed
Posts: 3544 Location: ::Points to hand:: |
|
|||||||||||
Well, ANN's encyclopedia entries all have the format: animenewsnetwork.com/encyclopedia/anime.php?id=##### so you could write a program that steps through them all and checks the source code of each web page for your criteria, then returns a list of titles/addresses. But I don't know how resource-intensive that would be on ANN's end; you might get mistaken for a DDoS or something by CloudFlare.
For instance, you could read line by line until you find
(Had to remove the opening angle bracket to make it display in the forum) then skip 11 lines, then your next line has the info you want and you'd just have to extract it.
This method is not going to be very efficient because there is a lot of wasted work, but maybe you can come up with something better. |
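The "find a marker line, skip some lines, read the next one" idea above can be sketched in Python roughly as follows. This is only an illustration under assumptions: the actual marker text did not survive in the post, so `marker` is left as a parameter, the `https://www.` prefix on the URL is assumed, and the page below is fake. Nothing here fetches anything from ANN.

```python
def encyclopedia_url(anime_id):
    """Build an entry URL in the id-based format quoted above (scheme and www. assumed)."""
    return f"https://www.animenewsnetwork.com/encyclopedia/anime.php?id={anime_id}"


def extract_after_marker(page_source, marker, skip=11):
    """Find the first line containing `marker`, skip `skip` lines, return the next line.

    Returns None if the marker is absent or the page ends too early.
    """
    lines = page_source.splitlines()
    for i, line in enumerate(lines):
        if marker in line:
            target = i + skip + 1  # line right after the skipped ones
            return lines[target].strip() if target < len(lines) else None
    return None


# Fake page and placeholder marker, purely to show the mechanics.
fake_page = "\n".join([
    "header",
    "MARKER starts here",
    "filler 1",
    "filler 2",
    "  the value we want  ",
    "footer",
])

print(encyclopedia_url(123))
print(extract_after_marker(fake_page, "MARKER", skip=2))
```

Counting lines like this breaks as soon as the page layout changes, which is part of why the approach is described as wasteful; an HTML parser keyed on element structure would be sturdier, but that goes beyond what the post describes.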
||||||||||||
Pontifex
Posts: 7 |
|
|||||||||||
That's a good start. I'll probably have to mirror the whole thing (like the API page recommends) and then do searches as you said.
Well, HTTrack does that, and I have a bit of experience with it, so I'll start there.
Devs: Don't suppose you guys have any dedicated mirrors I could *ahem* bother for all of the information I'm looking for? Maybe an rsync mirror, if I've been really good? ^__^
All: Any interest in having this information published in searchable form? I have a full plate already, but I could post my progress if there's interest. |
||||||||||||
Pontifex
Posts: 7 |
|
|||||||||||
I had to look it up. Apparently web scraping is frowned upon! (Sometimes)
So reading the Privacy Policy and Copyright Policy, I found that:
And
So seemingly information of a statistical nature (e.g. the numbers on the reviews) is shared with third parties without limitation. And:
Which appears to be ambiguous coupled with this:
So only the forums have explicit attribution to their authors? Individual reviews outside of the forums do not, and are owned by ANN? Unclear, as one would think an opinion expressed by a vote on the voting widget on the anime's page would not be substantially different from a vote / worded post in the forums!
ANN: In short, heads up: I'm using statistical data, which appears to be owned by your users, in a manner that does not violate your privacy policy and is in keeping with your fair use statement. |
||||||||||||
DerekTheRed
Posts: 3544 Location: ::Points to hand:: |
|
|||||||||||
I think you're misunderstanding what they mean by statistical data in the privacy policy...
|
||||||||||||
Pontifex
Posts: 7 |
|
|||||||||||
Could very well be.
(Though to be fair, Bayesian statistics, means and standard deviations of votes are about as statistical as you can get! =D) They at least wanted a heads up, so there's that. |
||||||||||||
Tempest
I Run this place.
ANN Publisher Posts: 10420 Location: Do not message me for support. |
|
|||||||||||
Sorry for not responding to this sooner.
Web scraping is definitely frowned upon. I've never heard of any website not frowning on the practice.
This has nothing to do with scraping. This restricts how we use the data that is provided to us by our users. It does not give third parties any permission to take data from ANN (nor does it restrict such abilities; it simply isn't related). In other words, our privacy policy has nothing to do with your use and shouldn't be used by you as a guideline.
This applies only to the news. You aren't publishing news (nor are you educating, parodying or reviewing), so you aren't covered by the defined fair use clauses.
I'm sorry, but you've completely misunderstood our privacy policy, fair use, and ownership of the content. I never brought any of this up before because none of it was relevant. The important issues are:
1) Do we mind if you scrape the site? Answer: As long as you do it infrequently and do not place a significant burden on the site, that's fine.
2) Can you use the data? Legally, this is a grey zone. Data itself is generally not considered to be protected by copyright; however, collections of data are considered protected by copyright (in the USA and Canada; I'm not certain about other countries). Furthermore, there are precedents that say that scraping an entire data source and republishing it is definitely an infringement of copyright. That would be my legal opinion (I'm no lawyer, but I know a massive amount about copyright, more in fact than many lawyers who do not specialize in IP), as well as our lawyer's (who is an IP specialist).
But ultimately, I'm happy to be much less restrictive with this than what the law allows. If your only purpose is to create a searchable database by rating, and you aren't reproducing the entire encyclopedia, I (and therefore ANN) have no problem with the practice. So if someone uses your data to find a specific anime by rating, and they then want to know who the director of this anime is, they need to go to ANN to find that information.
Is this project publicly accessible, or is it only for yourself?
-t |
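One way to keep a scraper "infrequent" in the sense of point 1 is to enforce a minimum gap between requests. The Python sketch below shows one possible pacing helper. The `Throttle` class and the interval values are assumptions for illustration; the thread does not state what request rate ANN considers acceptable, and no request is actually made here.

```python
import time


class Throttle:
    """Enforce a minimum gap, in seconds, between successive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the pacing can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous wait() call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Demo with a tiny interval so it finishes quickly; a real crawl would
# presumably use a gap of several seconds or more.
throttle = Throttle(min_interval=0.05)
start = time.monotonic()
for page_id in range(3):
    throttle.wait()
    # A real crawler would fetch exactly one page here.
elapsed = time.monotonic() - start
print(f"3 paced iterations took {elapsed:.2f}s")
```

The first call never sleeps; every later call sleeps only for whatever part of the interval has not already elapsed, so time spent processing a page counts toward the gap.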
||||||||||||
Pontifex
Posts: 7 |
|
|||||||||||
Excellent write up, thank you!
I was beginning to despair at getting a response. So to address your points:
1) I'm doing it at the moment with WinHTTrack, whose defaults have bandwidth consumption at ~15kb/s, so it shouldn't be a burden per se. Well, I hope not, anyway; it's been running for a couple of days.
2) Not going to republish the entire site, no. Just wanted to state that explicitly. My idea was to be able to search the ratings in a more fine-grained manner than is currently available / desirable.
I was planning to make something just for myself, to be able to find new anime to try out. If there was interest, I thought it might be nice to make a site similar to Metacritic but with an anime focus. And of course the much-lauded "recommend me an anime" question that gets bandied about on forums and such (found a lot of those while I was doing my due diligence, trying to find something like this already made) would be nice to automate: a recommender system using collaborative filtering. Of course you guys at ANN could probably do that right now: if agent A likes anime A_n, and agent B wants a recommendation on A_n, how "close" are they to each other, so that a reasonable recommendation can be made based on their similar interests? |
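The "how close are two users" idea can be sketched in Python as a very small user-based collaborative filter. This is one simple variant under assumptions, not how ANN or any particular site does it: closeness is taken as 1 / (1 + mean absolute rating gap) over the titles both users rated, and the users, titles and scores are made up.

```python
def similarity(a, b):
    """Closeness of two users: 1 / (1 + mean absolute gap) over titles both rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    mean_gap = sum(abs(a[t] - b[t]) for t in shared) / len(shared)
    return 1.0 / (1.0 + mean_gap)


def recommend(target, others, top_n=3):
    """Rank titles the target has not rated by similarity-weighted average score."""
    weighted = {}
    weights = {}
    for ratings in others.values():
        sim = similarity(target, ratings)
        if sim == 0.0:
            continue  # no overlap with this user, so nothing to learn from them
        for title, score in ratings.items():
            if title in target:
                continue
            weighted[title] = weighted.get(title, 0.0) + sim * score
            weights[title] = weights.get(title, 0.0) + sim
    predicted = {t: weighted[t] / weights[t] for t in weighted}
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Made-up ratings on a 1-10 scale.
me = {"Show A": 9, "Show B": 3}
others = {
    "user1": {"Show A": 9, "Show B": 3, "Show C": 10, "Show D": 2},
    "user2": {"Show A": 2, "Show B": 9, "Show C": 1, "Show D": 9},
}

for title, score in recommend(me, others):
    print(f"{title}: predicted {score:.2f}")
```

Here user1 agrees with `me` on every shared title, so user1's opinions dominate the predictions, while user2's nearly opposite tastes carry little weight. Real recommenders usually also correct for users who rate everything high or low, which this sketch ignores.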
||||||||||||
Pontifex
Posts: 7 |
|
|||||||||||
Just wanted to end this thread with the culmination of my research:
http://lab.rolisoft.net/tvshowtracker.html Not quite what I was looking for, but it does search anime databases, and a recommendation feature is in the works. |
||||||||||||