Custom anime search by rating




Anime News Network Forum Index -> Site-related -> Bugs & Technical Questions
Pontifex



Joined: 20 Aug 2008
Posts: 7

Posted: Wed Feb 06, 2013 9:19 pm
First of all: Thank you for making such a wonderful site. I've looked around for years and have not found the wealth of information and ratings available here.

Question: Is there a way to filter anime by ratings, such that I can:

    * Require a minimum number of reviews (say 30 or more)
    * Require a minimum "Arithmetic mean" rating (say between 7.5 and 10)
    * Limit the "Standard Deviation" of the rating (say between 0 and 1.7)
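
A minimal sketch of the filter described in the list above, assuming the ratings have already been collected somewhere. The `Rating` record, its field names, and the sample titles are hypothetical; the default thresholds are the ones from the list.

```python
from dataclasses import dataclass


@dataclass
class Rating:
    # Hypothetical record; the field names are not taken from ANN.
    title: str
    votes: int
    mean: float
    std_dev: float


def matches(r, min_votes=30, min_mean=7.5, max_std_dev=1.7):
    """True when a title meets all three thresholds from the list."""
    return (r.votes >= min_votes
            and r.mean >= min_mean
            and r.std_dev <= max_std_dev)


def filter_ratings(ratings, **thresholds):
    """Keep only the titles that pass `matches`."""
    return [r for r in ratings if matches(r, **thresholds)]


# Made-up sample data, for illustration only.
sample = [
    Rating("Title A", votes=120, mean=8.1, std_dev=1.4),
    Rating("Title B", votes=12, mean=9.0, std_dev=0.9),    # too few votes
    Rating("Title C", votes=300, mean=7.9, std_dev=2.3),   # too divisive
]
picked = [r.title for r in filter_ratings(sample)]
```

A low standard deviation here means the voters largely agree, which is why it is treated as an upper bound rather than a minimum.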

(Bonus points for including it in an email subscription / RSS feed, though I'd settle for just having it on a page on the site)

Checked Tips & Tricks.
Tried to make my own Google Search. (Found a bug when attempting to post this link)
Checked the API page.

Any help would be appreciated.

(Bug: When attempting to post the full URL: https://encrypted.google.com/search?q=allintext:"Arithmetic+mean"+"7.5..10"+"std.+dev.:"+"0..1.7"+"%28TV%29"+site:www.animenewsnetwork.com#hl=en&safe=off&tbo=d&sclient=psy-ab&q=allintext:"Arithmetic+mean"+"7.5..10"+"std.+dev.:"+"0..1.7"+"%28TV%29"+site:www.animenewsnetwork.com&oq=allintext:"Arithmetic+mean"+"7.5..10"+"std.+dev.:"+"0..1.7"+"%28TV%29"+site:www.animenewsnetwork.com&gs_l=serp.12...0.0.0.335018.0.0.0.0.0.0.0.0..0.0.les;..0.0...1c..2.serp.xYbA36RCRs4&pbx=1&bav=on.2,or.r_gc.r_pw.r_qf.&bvm=bv.42080656,d.cGE&fp=83f4cc16c74fbd4b&biw=1920&bih=856
The forum's BBCode would not parse it when I tried to wrap it in a [url] tag, as I did for the links above. I know it contains some garbage, like my session ID and browser ID, but you should be able to post arbitrary links in these tags without having to manually mangle the URL. =/)
Pontifex



Joined: 20 Aug 2008
Posts: 7

Posted: Sun Feb 10, 2013 12:12 pm
So I guess there isn't a way. =(

I'll have to get something working with screen scraping, I suppose. Anyone want to share their thoughts about that? I have a little experience, but not a whole bunch. I'd like to extract the ratings so I don't have to wade through all the anime manually.
DerekTheRed
Encyclopedia Supporter


Joined: 19 Dec 2007
Posts: 3273
Location: ::Points to hand::

Posted: Sun Feb 10, 2013 4:21 pm
Well, ANN's encyclopedia entries all have the format animenewsnetwork.com/encyclopedia/anime.php?id=#####, so you could write a program that steps through them all and checks the source code of each web page for your criteria, then returns a list of titles/addresses. But I don't know how resource-intensive that would be on ANN's end; you might get mistaken for a DDoS or something by CloudFlare.
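
Stepping through the entry IDs could look roughly like the sketch below. The `https://www.` prefix, the five-second pause, and all function names are assumptions rather than anything ANN documents; the pause is only there to keep the load on the site low.

```python
import time
import urllib.request

# URL pattern from the post; the scheme and "www." prefix are assumed.
BASE_URL = "https://www.animenewsnetwork.com/encyclopedia/anime.php?id={}"


def entry_url(anime_id):
    """Build the encyclopedia URL for one numeric ID."""
    return BASE_URL.format(anime_id)


def fetch(url):
    """Download one page and return its source as text."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(ids, fetch_page=fetch, delay_seconds=5.0, sleep=time.sleep):
    """Yield (id, page source) pairs, pausing between requests."""
    for i, anime_id in enumerate(ids):
        if i:  # no pause before the very first request
            sleep(delay_seconds)
        yield anime_id, fetch_page(entry_url(anime_id))
```

Calling `crawl(range(1, 11))` would walk the first ten IDs. IDs that do not exist will probably raise an HTTP error from `urlopen`, so a real run would want a `try`/`except` around the fetch.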

For instance, you could read line by line until you find
Code:
DIV ID=ratingbox CLASS=ratings-collapsed>

(Had to remove the opening angle bracket to make it display in the forum) then skip 11 lines, then your next line has the info you want and you'd just have to extract it.
Code:
<SPAN><B>Seen</B> in part or in whole by 237 users, rank: #2203 (of 5468)<BR><B>Median rating:</B> Not really good<BR><B>Arithmetic mean:</B> 4.574 (So-so−), std. dev.: 2.9756, rank: #5153 (of 5457)<BR><B>Weighted mean:</B> 4.443 (Not really good+), rank: #5214 (of 5457) <span>(seen all: 4.44 / seen some: 5.43 / won't finish: 0.00)</span><BR><B>Bayesian estimate:</B> 4.668 (So-so−), rank: #4088 (of 4138)<BR></SPAN>


This method is not going to be very efficient because there is a lot of wasted work, but maybe you can come up with something better.
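
One possible alternative to counting lines is to match the ratings text directly with regular expressions, so the result does not depend on how many lines sit in between. This is a sketch based only on the snippet quoted above (abridged in `SAMPLE`); the function and pattern names are made up, and the page markup may differ from what is shown here.

```python
import re

# Patterns derived from the snippet quoted in this post.
SEEN_RE = re.compile(r"in part or in whole by (\d+) users")
MEAN_RE = re.compile(
    r"Arithmetic mean:</B>\s*([\d.]+)\s*\([^)]*\),\s*std\. dev\.:\s*([\d.]+)"
)


def parse_ratings(html):
    """Return user count, mean and std. dev., or None if not found."""
    seen = SEEN_RE.search(html)
    mean = MEAN_RE.search(html)
    if not (seen and mean):
        return None
    return {
        "users": int(seen.group(1)),
        "mean": float(mean.group(1)),
        "std_dev": float(mean.group(2)),
    }


# Abridged copy of the quoted line, with the rating label simplified.
SAMPLE = (
    "<SPAN><B>Seen</B> in part or in whole by 237 users, rank: #2203 (of 5468)<BR>"
    "<B>Arithmetic mean:</B> 4.574 (So-so), std. dev.: 2.9756, "
    "rank: #5153 (of 5457)<BR></SPAN>"
)
result = parse_ratings(SAMPLE)
```

Note that the "users" figure is how many people have seen the title, which is not necessarily the number of votes behind the mean.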
Pontifex



Joined: 20 Aug 2008
Posts: 7

Posted: Mon Feb 11, 2013 1:32 am
That's a good start. I'll probably have to mirror the whole thing (like the API page recommends) and then do searches as you said.

Well, HTTrack does that, and I have a bit of experience with it, so I'll start there.

Devs: Don't suppose you guys have any dedicated mirrors I could *ahem* bother for all of the information I'm looking for? Maybe an rsync mirror, if I've been really good? ^__^

All: Any interest in having this information published in searchable form? I have a full plate already, but I could post my progress if there's interest.
Pontifex



Joined: 20 Aug 2008
Posts: 7

Posted: Thu Feb 14, 2013 3:07 pm
I had to look it up. Apparently web scraping is frowned upon! (Sometimes)

So reading the Privacy Policy and Copyright Policy, I found that:

Quote:
"Statistical Data" is data compiled from other information, personal and otherwise that itself is not personally identifiable. For example the average age of our readers. their geographic dispersement and so on.


And

Quote:
Under no other circumstance do we share your personal information with a third party except in the form of statistical data.


So seemingly information of a statistical nature (e.g. the numbers on the reviews) is shared with third parties without limitation.

And:

Quote:
Reviews

Manufacturers, distributors and retailers can quote short excerpts from our reviews in publicity material and on the product itself provided that the quote is attributed to "AnimeNewsNetwork.com". We reserve the right to reproduce the publicity material as a part of our own publicity materials. This includes single pages from publications, home video box covers and screenshots.

While our permission is not required, a heads up is greately appreciated.


Quote:
In short: 1) Link to the source, 2) use your own words 3) Don't rely on one single source and 4) Have some original content.


Quote:
Unless otherwise stated all material published on AnimeNewsNetwork.com is owned by and copyright Anime News Network Inc.

Third party images, product descriptions and product names are the legal property (copyright and/or trademark) of their respective owner(s) where said companies' or individuals' rights apply.


Which appears to be ambiguous coupled with this:

Quote:
Forum Posts

Any and all posts made on Anime News Network's discussion forums are the property of the original poster. Excerpts from the posts can be used under the fair use clause, but the entire post cannot be reprinted without the original poster's explicit consent.


So only the forums have explicit attribution to their authors? Individual reviews outside of the forums do not and are owned by ANN? Unclear, as one would think an opinion expressed by a vote on the voting widget on the anime's page would be not substantially different than a vote / worded post in the forums!

ANN: But, in short: Heads up, I'm using Statistical Data in a manner that does not violate your privacy policy, that appears to be owned by your users, in a manner in keeping with your fair use statement.
DerekTheRed
Encyclopedia Supporter


Joined: 19 Dec 2007
Posts: 3273
Location: ::Points to hand::

Posted: Thu Feb 14, 2013 6:48 pm
I think you're misunderstanding what they mean by statistical data in the privacy policy...
Pontifex



Joined: 20 Aug 2008
Posts: 7

Posted: Thu Feb 14, 2013 11:02 pm
Could very well be.
(Though to be fair, Bayesian statistics, means and standard deviations of votes are about as statistical as you can get! =D)

They at least wanted a heads up, so there's that.
Tempest
ANN Publisher & CEO


Joined: 29 Dec 2001
Posts: 8534
Location: Do not message me for support.

Posted: Sat Feb 16, 2013 2:04 pm
Sorry for not responding to this sooner.

Pontifex wrote:
I had to look it up. Apparently web scraping is frowned upon! (Sometimes)


Web scraping is definitely frowned upon. I've never heard of any website not frowning on the practice.

Quote:

So reading the Privacy Policy and Copyright Policy, I found that:

Quote:
"Statistical Data" is data compiled from other information, personal and otherwise that itself is not personally identifiable. For example the average age of our readers. their geographic dispersement and so on.


This has nothing to do with scraping. This restricts how we use the data that is provided to us by our users. It does not give third parties any permission to take data from ANN (nor does it restrict such abilities; it simply isn't related). In other words, our privacy policy has nothing to do with your use and shouldn't be used by you as a guideline.

Quote:
In short: 1) Link to the source, 2) use your own words 3) Don't rely on one single source and 4) Have some original content.


This applies only to the news. You aren't publishing news (nor are you educating, parodying or reviewing), so you aren't covered by the defined fair use clauses.

Quote:
Unclear, as one would think an opinion expressed by a vote on the voting widget on the anime's page would be not substantially different than a vote / worded post in the forums!
It's very different. A vote is statistical information. A post is written material.

Quote:

ANN: But, in short: Heads up, I'm using Statistical Data in a manner that does not violate your privacy policy, that appears to be owned by your users, in a manner in keeping with your fair use statement.


I'm sorry, but you've completely misunderstood our privacy policy, fair use, and ownership of the content. I never brought any of this up before because none of it was relevant.

The important issues are

1) Do we mind if you scrape the site
Answer: As long as you do it infrequently and do not place a significant burden on the site, that's fine.

2) Can you use the data
Legally - this is a grey zone. Data itself is generally not considered to be protected by copyright, however collections of data are considered protected by copyright (in the USA and Canada, I'm not certain about other countries). Furthermore, there are precedents that say that scraping an entire data source and republishing it is definitely an infringement of copyright.

That would be my legal opinion (I'm no lawyer, but I know a massive amount about copyright, more in fact than many lawyers who do not specialize in IP), as well as our lawyer's (who is an IP specialist).

But ultimately, I'm happy to be much less restrictive with this than what the law allows. If your only purpose is to create a searchable database by rating, and you aren't reproducing the entire encyclopedia, I (and therefore ANN) have no problem with the practice. So if someone uses your data to find a specific anime by rating, and they then want to know who the director of this anime is, they need to go to ANN to find that information.

Is this project publicly accessible, or is it only for yourself?

-t
Pontifex



Joined: 20 Aug 2008
Posts: 7

Posted: Mon Feb 18, 2013 2:00 pm
Excellent write up, thank you!

I was beginning to despair at getting a response.

So to address your points:

1)

I'm doing it at the moment with WinHTTrack, whose defaults have bandwidth consumption at ~15kb/s, so it shouldn't be a burden per se. Well, I hope not, anyway; it's been running for a couple of days.

2)

Not going to republish the entire site, no. Just wanted to state that explicitly.

My idea was to be able to search the ratings in a more fine-grained manner than is currently available.

Quote:
Is this project publicly accessible, or is it only for yourself?


I was planning to make something just for myself, to be able to find new anime to try out.

If there was interest, I thought it might be nice to make a site similar to Metacritic but with an anime focus.

And of course the much-lauded "recommend me an anime" question that gets bandied about on forums and such (I found a lot of those while doing my due diligence, trying to find something like this already made) would be nice to automate: a recommender system using collaborative filtering.

Of course you guys at ANN could probably do that right now: if agent A likes anime A_n and agent B wants a recommendation on A_n, how "close" are the two agents, so that a reasonable recommendation can be made based on their similar interests?
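
To make the "how close are they" idea concrete, here is a minimal sketch of user-based collaborative filtering. It assumes each user's votes are available as a dict mapping title to a numeric rating; the names and sample numbers are hypothetical, and cosine similarity over shared titles is just one common closeness measure, not something ANN is known to use.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity over the titles both users have rated."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = sqrt(sum(a[t] ** 2 for t in shared))
    norm_b = sqrt(sum(b[t] ** 2 for t in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(target, others, title):
    """Similarity-weighted average of other users' ratings for `title`."""
    num = den = 0.0
    for ratings in others:
        if title not in ratings:
            continue
        weight = cosine_similarity(target, ratings)
        num += weight * ratings[title]
        den += weight
    return num / den if den else None


# Made-up votes: agent_b wants a prediction for "Z".
agent_a = {"X": 8, "Y": 6, "Z": 9}
agent_c = {"X": 2, "Y": 9, "Z": 3}
agent_b = {"X": 8, "Y": 6}
guess = predict(agent_b, [agent_a, agent_c], "Z")
```

In this toy data `agent_b` agrees with `agent_a` on every shared title, so the prediction for "Z" lands closer to `agent_a`'s 9 than to `agent_c`'s 3. Raw cosine similarity on all-positive ratings separates users only weakly, so a real system would likely subtract each user's mean rating first.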
Pontifex



Joined: 20 Aug 2008
Posts: 7

Posted: Mon Mar 11, 2013 7:25 pm
Just wanted to end this thread with the culmination of my research:

http://lab.rolisoft.net/tvshowtracker.html

Not quite what I was looking for, but it does search anime databases, and a recommendation feature is in the works.
All times are GMT - 5 Hours