Fwd: WEKA IBK question



by Yaniv Samuha




Hello,
I have files containing movie reviews from the OpenLensGroup DB.
One file contains user data (user ID: number, age: number, gender: M/F, occupation: number, zipcode: number)
and another contains the movie review data (user ID: number, movie ID: number, grade: number).
[In the movie review file the same user ID can repeat, but with a different movie ID in each row.]

I want to try the KNN algorithm on this data, and for that I want to use the WEKA package.
However, I don't know how to define the ARFF file based on the two files I already have (user data and movie data).
I also don't know which attributes the generic IBk classifier can take so that, based on them,
it gives me a list of movie IDs from the K nearest users.
I need the accuracy to be based on the movies the users have seen (meaning that if a user has seen movies A, B, C
and I select him for the test data, with movies A and B as the training data, then if IBk offers movie C the accuracy increases).
Do you have an idea how to do that using WEKA's IBk?
Thanks,
Yaniv. 


_______________________________________________
Wekalist mailing list
Send posts to: Wekalist@...
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Fwd: WEKA IBK question

by Peter Reutemann-3


> I have files containing movie reviews from the OpenLensGroup DB.
> I have a file containing user data (user Id-number, age-number, gender-M/F,
> occupation-number, zipcode-number)
> and a file containing movie reviews DATA (user id-number, movie id-number,
> grade-number)
> [in the movie review file the same user Id can repeat but with different
> movie Id in each row]
>
> In order to do that I tried KNN alg and for that I want to use the WEKA
> package
> However, I don't know how to define the arff file based on the 2 files I
> already have (of user, movie data).

Weka only accepts flat files (apart from the multi-instance
classifiers), i.e., you have to join those two files (as with an SQL
join) and create one file out of them. For details on the ARFF format see
the following wiki article:
  http://weka.wiki.sourceforge.net/ARFF
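
As a rough illustration of that join, here is a minimal Python sketch that
merges user rows and review rows on the user ID and emits one flat ARFF file.
The function name, attribute names, relation name and column order are
assumptions made for the example; only the column lists come from the
original question. IDs are declared numeric because the question describes
them as numbers, which may not be the best choice for a distance-based
classifier.

```python
def join_to_arff(users, ratings, relation="user_movie_ratings"):
    """Inner-join ratings with users on user_id and return ARFF text.

    users:   iterable of (user_id, age, gender, occupation, zipcode)
    ratings: iterable of (user_id, movie_id, grade)
    Names and column order are assumptions for this sketch.
    """
    # Index users by ID so each rating row can look up its user attributes.
    by_id = {u[0]: u[1:] for u in users}

    lines = [
        f"@relation {relation}",
        "",
        "@attribute user_id numeric",
        "@attribute age numeric",
        "@attribute gender {M,F}",
        "@attribute occupation numeric",
        "@attribute zipcode numeric",
        "@attribute movie_id numeric",
        "@attribute grade numeric",
        "",
        "@data",
    ]

    for user_id, movie_id, grade in ratings:
        if user_id not in by_id:
            continue  # inner join: skip ratings without a matching user
        age, gender, occupation, zipcode = by_id[user_id]
        row = (user_id, age, gender, occupation, zipcode, movie_id, grade)
        lines.append(",".join(str(v) for v in row))

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up rows, only to show the shape of the output.
    demo_users = [(1, 24, "M", 3, 85711), (2, 53, "F", 7, 94043)]
    demo_ratings = [(1, 10, 4), (1, 11, 5), (2, 10, 3)]
    print(join_to_arff(demo_users, demo_ratings))
```

Each output row is one user-movie pair with the user's attributes repeated,
which is the flat layout described above.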

> I don't know what are the attributes that the generic IBK classifier can
> take

Assuming that you use 3.6.x or 3.7.x of Weka, bring up the
GenericObjectEditor dialog for IBk (e.g., in the Explorer) and click
on the "Capabilities" button. That tells you what attributes and class
types the classifier can handle.

> and based on them
> give me a list of movie Id 's from the K-nearest users.

The nearest neighbor search that IBk uses works on a whole instance.
If you provide this search with an Instance containing a specific user-movie
relation, then it will return the k nearest other user-movie relations
(but not the movie IDs from the k nearest users!).
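
To make the distinction concrete, here is a small Python sketch of a
row-level nearest neighbor search. It is not Weka's implementation: it uses
plain, unnormalized Euclidean distance over every column, and the row layout
(user_id, age, movie_id, grade) and the function name are assumptions for the
example. The point is only that the result is a list of rows, i.e. user-movie
pairs, not a list of users.

```python
import math


def k_nearest_rows(query, rows, k):
    """Return the k rows closest to `query`, comparing whole rows.

    Every column takes part in the distance, including the IDs, which is
    what "works on a whole instance" implies for a flat user-movie table.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    return sorted(rows, key=lambda r: dist(query, r))[:k]


if __name__ == "__main__":
    # Hypothetical rows: (user_id, age, movie_id, grade)
    rows = [(1, 24, 10, 4), (1, 24, 11, 5), (2, 53, 10, 3), (3, 25, 10, 4)]
    query = (4, 26, 10, 4)
    for row in k_nearest_rows(query, rows, k=2):
        print(row)
```

Getting from these neighboring rows to "movies seen by the nearest users"
would need an extra grouping step per user, which is not part of the search
itself.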

> I need the accuracy to be based on the movies the users have seen (meaning
> that if a user have seen movie A,B,C
> and I select him for the test data then if the train are movies A,B then if
> the IBK offers movie C the accuracy will increase)
> Do you have an Idea on how to do that using the WEKA's IBK?

I leave these questions to someone else on the list.

Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/           Ph. +64 (7) 858-5174


Re: Fwd: WEKA IBK question

by Mark Hall-9


On 3/7/09 4:35 AM, Yaniv Samuha wrote:

>
>
> Hello,
> I have files containing movie reviews from the OpenLensGroup DB.
> I have a file containing user data (user Id-number, age-number,
> gender-M/F, occupation-number, zipcode-number)
> and a file containing movie reviews DATA (user id-number, movie id-number,
> grade-number)
> [in the movie review file the same user Id can repeat but with different
> movie Id in each row]
> In order to do that I tried KNN alg and for that I want to use the WEKA
> package
> However, I don't know how to define the arff file based on the 2 files I
> already have (of user, movie data).
> I don't know what are the attributes that the generic IBK classifier can
> take and based on them
> give me a list of movie Id 's from the K-nearest users.

> I need the accuracy to be based on the movies the users have seen
> (meaning that if a user have seen movie A,B,C
> and I select him for the test data then if the train are movies A,B then
> if the IBK offers movie C the accuracy will increase)
> Do you have an Idea on how to do that using the WEKA's IBK?

This sounds like a recommender system to me. Weka doesn't support this kind of
application directly, but you can certainly apply its algorithms if you are
willing to write some code. One approach would be to set up a separate learning
problem for each movie (i.e. each movie is considered to be the class in turn
and you predict, for a given user, whether they would want to see this movie
based on the other movies and any other attributes you have available). Another
approach would be to use K-nearest neighbors directly and fill in multiple
movies simultaneously for each test instance. In any case, you would need to
decide how to measure accuracy.
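
One way the K-nearest-neighbors variant and an accuracy measure could look,
sketched in plain Python rather than with Weka: each user is represented by
the set of movies they have seen, neighbors are ranked by Jaccard similarity,
and accuracy is a leave-one-movie-out hit rate (hide one movie, recommend
from the rest, count a hit if the hidden movie comes back). The similarity
measure, the voting scheme, the function names and the parameters k and n are
all assumptions chosen for illustration, not something prescribed in this
thread.

```python
from collections import Counter


def jaccard(a, b):
    """Similarity of two movie sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(known, others, k, n):
    """Suggest up to n movies for a user whose known movies are `known`.

    others: dict mapping user_id -> set of movie ids seen by that user.
    The k most similar users vote for every movie they have seen that the
    target user has not.
    """
    neighbours = sorted(
        others, key=lambda u: jaccard(known, others[u]), reverse=True
    )[:k]
    votes = Counter(
        m for u in neighbours for m in others[u] if m not in known
    )
    return [movie for movie, _ in votes.most_common(n)]


def hit_rate(seen_by_user, k, n):
    """Leave-one-movie-out accuracy over all users and all their movies."""
    hits = trials = 0
    for user, movies in seen_by_user.items():
        others = {u: m for u, m in seen_by_user.items() if u != user}
        for held_out in movies:
            known = movies - {held_out}  # "train" on the remaining movies
            trials += 1
            if held_out in recommend(known, others, k, n):
                hits += 1
    return hits / trials if trials else 0.0


if __name__ == "__main__":
    # Made-up viewing data, only to exercise the functions.
    seen = {"u1": {"A", "B", "C"}, "u2": {"A", "B", "C"}, "u3": {"D", "E"}}
    print(recommend({"A", "B"}, {"u2": seen["u2"], "u3": seen["u3"]}, k=1, n=1))
    print(hit_rate(seen, k=1, n=1))
```

This ignores the grades and the user attributes entirely; folding those in,
or switching to one classifier per movie as in the first approach, changes
what "accuracy" should mean and would need its own evaluation.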

I'd suggest taking a look at the volumes of information that have been written
about the Netflix challenge.

Cheers,
Mark.

--
Mark Hall
Senior Developer/Consultant, Pentaho Open Source Business Intelligence
Citadel International, Suite 340, 5950 Hazeltine National Dr., Orlando, FL
32822, USA
+64 7 847-3537 office, +64 21 399-132 mobile, +1 815 550-8637 fax,
Skype: mark.andrew.hall, Yahoo: mark_andrew_hall
Download the latest release today <http://www.sourceforge.net/projects/pentaho>

