On 3/7/09 4:35 AM, Yaniv Samuha wrote:
>
>
> Hello,
> I have files containing movie reviews from the OpenLensGroup DB.
> I have a file containing user data (user ID: number, age: number,
> gender: M/F, occupation: number, zip code: number)
> and a file containing movie review data (user ID: number, movie ID: number,
> grade: number)
> [in the movie review file the same user ID can repeat, but with a different
> movie ID in each row]
> In order to do that I tried the KNN algorithm, and for that I want to use the
> WEKA package.
> However, I don't know how to define the ARFF file based on the 2 files I
> already have (user and movie data).
> I don't know which attributes the generic IBK classifier can
> take so that, based on them, it can
> give me a list of movie IDs from the K nearest users.
> I need the accuracy to be based on the movies the users have seen
> (meaning that if a user has seen movies A, B and C
> and I select him for the test data, then if the training data are movies A and B
> and the IBK suggests movie C, the accuracy will increase).
> Do you have an idea on how to do that using WEKA's IBK?
This sounds like a recommender system to me. Weka doesn't support this kind of
application directly, but you can certainly apply its algorithms if you are
willing to write some code. One approach would be to set up a separate learning
problem for each movie (i.e. each movie is considered to be the class in turn
and you predict, for a given user, whether they would want to see this movie
based on the other movies and any other attributes you have available). Another
approach would be to use K-nearest neighbors directly and fill in multiple
movies simultaneously for each test instance. In any case, you would need to
decide how to measure accuracy.
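
As a rough illustration of the second approach (outside Weka, in plain Python), the sketch below finds the k most similar users by cosine similarity over their grades, suggests movies the test user has not rated, and scores the suggestions by how many held-out movies they recover. The function names, the choice of cosine similarity, the similarity-weighted scoring and the toy data are all assumptions made for illustration; none of this is part of Weka or IBk.

```python
from collections import defaultdict
from math import sqrt


def load_ratings(rows):
    """Group (user_id, movie_id, grade) rows into {user: {movie: grade}}."""
    ratings = defaultdict(dict)
    for user_id, movie_id, grade in rows:
        ratings[user_id][movie_id] = float(grade)
    return dict(ratings)


def cosine(a, b):
    """Cosine similarity between two sparse {movie: grade} dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(known, train, k=2, n=3):
    """Suggest up to n unseen movies using the k most similar training users."""
    sims = [(cosine(known, other), uid) for uid, other in train.items()]
    neighbours = sorted(sims, key=lambda s: (-s[0], s[1]))[:k]
    scores = defaultdict(float)
    for sim, uid in neighbours:
        if sim <= 0:
            continue  # ignore users with nothing in common
        for movie, grade in train[uid].items():
            if movie not in known:
                scores[movie] += sim * grade  # similarity-weighted vote
    return sorted(scores, key=lambda m: (-scores[m], m))[:n]


def hit_rate(test_user, train, hidden, k=2, n=3):
    """Fraction of the hidden movies that appear in the recommendations."""
    known = {m: g for m, g in test_user.items() if m not in hidden}
    recs = recommend(known, train, k=k, n=n)
    return len(set(recs) & set(hidden)) / len(hidden)


# Toy data in the (user id, movie id, grade) layout of the review file.
rows = [
    (1, "A", 5), (1, "B", 4), (1, "C", 5),
    (2, "A", 4), (2, "B", 5), (2, "C", 4),
    (3, "D", 5), (3, "E", 4),
    (4, "A", 5), (4, "B", 5), (4, "C", 4),
]
ratings = load_ratings(rows)
test_user = ratings.pop(4)  # user 4 is held out as the test instance
score = hit_rate(test_user, ratings, hidden={"C"})
print(score)
```

Here user 4 is tested with movies A and B visible and movie C hidden, which mirrors the accuracy criterion described in the question. Averaging `hit_rate` over many held-out users would give one possible overall measure; other choices (precision at n, rating error) are equally reasonable.
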
I'd suggest taking a look at the volumes of information that have been written
about the Netflix challenge.
Cheers,
Mark.
--
Mark Hall
Senior Developer/Consultant, Pentaho Open Source Business Intelligence
Citadel International, Suite 340, 5950 Hazeltine National Dr., Orlando, FL
32822, USA
+64 7 847-3537 office, +64 21 399-132 mobile, +1 815 550-8637 fax,
Skype: mark.andrew.hall, Yahoo: mark_andrew_hall