Re: Fwd: WEKA IBK question

> I have files containing movie reviews from the OpenLensGroup DB.
> I have a file containing user data (user Id-number, age-number, gender-M/F,
> occupation-number, zipcode-number)
> and a file containing movie reviews DATA (user id-number, movie id-number,
> grade-number)
> [in the movie review file the same user Id can repeat but with different
> movie Id in each row]
>
> In order to do that I tried KNN alg and for that I want to use the WEKA
> package
> However, I don't know how to define the arff file based on the 2 files I
> already have (of user, movie data).

Weka only allows flat files (apart from the multi-instance classifiers), i.e., you have to join those two files (like with an SQL join) and create one file out of them. For details on the ARFF format see the following wiki article:

http://weka.wiki.sourceforge.net/ARFF

> I don't know what are the attributes that the generic IBK classifier can
> take

Assuming that you use 3.6.x or 3.7.x of Weka, bring up the GenericObjectEditor dialog for IBk (e.g., in the Explorer) and click on the "Capabilities" button. That tells you what attribute and class types the classifier can handle.

> and based on them
> give me a list of movie Id 's from the K-nearest users.

The nearest neighbor search that IBk uses works on a whole instance. If you provide this search with an Instance containing a specific user-movie relation, then it will return the k-nearest other user-movie relations (but not the movie IDs from the k-nearest users!).

> I need the accuracy to be based on the movies the users have seen (meaning
> that if a user have seen movie A,B,C
> and I select him for the test data then if the train are movies A,B then if
> the IBK offers movie C the accuracy will increase)
> Do you have an Idea on how to do that using the WEKA's IBK?

I leave these questions to someone else on the list.

Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/
Ph. +64 (7) 858-5174
_______________________________________________
Wekalist mailing list
Send posts to: Wekalist@...
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
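
For illustration, the SQL-style join Peter describes could be sketched roughly as follows. This is not from the thread: it is a minimal Python sketch in which the CSV layout, the sample rows, the column names and the helper `join_to_arff` are all assumptions made for the example. It joins each review row with the matching user row and writes one flat ARFF relation.

```python
import csv
import io

# Hypothetical sample data; the real files and their column order are assumptions.
USERS_CSV = """user_id,age,gender,occupation,zipcode
1,24,M,3,85711
2,53,F,7,94043
"""

RATINGS_CSV = """user_id,movie_id,grade
1,10,5
1,20,3
2,10,4
"""

USER_COLS = ["user_id", "age", "gender", "occupation", "zipcode"]
RATING_COLS = ["movie_id", "grade"]


def join_to_arff(users_text, ratings_text, relation="user_movie"):
    """Inner-join reviews with user data on user_id and return ARFF text."""
    # Index the user rows by id so each review row can look up its user.
    users = {row["user_id"]: row for row in csv.DictReader(io.StringIO(users_text))}
    lines = [
        f"@relation {relation}",
        "",
        "@attribute user_id numeric",
        "@attribute age numeric",
        "@attribute gender {M,F}",
        "@attribute occupation numeric",
        "@attribute zipcode numeric",
        "@attribute movie_id numeric",
        "@attribute grade numeric",
        "",
        "@data",
    ]
    for rating in csv.DictReader(io.StringIO(ratings_text)):
        user = users.get(rating["user_id"])
        if user is None:
            continue  # inner join: skip reviews without a matching user row
        values = [user[c] for c in USER_COLS] + [rating[c] for c in RATING_COLS]
        lines.append(",".join(values))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(join_to_arff(USERS_CSV, RATINGS_CSV))
```

Each user's attributes are repeated once per movie they reviewed, which is what a flat file implies. Whether the IDs and the occupation code should be declared numeric or nominal is a modelling choice that this sketch does not settle.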
Re: Fwd: WEKA IBK question

On 3/7/09 4:35 AM, Yaniv Samuha wrote:
>
> Hello,
> I have files containing movie reviews from the OpenLensGroup DB.
> I have a file containing user data (user Id-number, age-number,
> gender-M/F, occupation-number, zipcode-number)
> and a file containing movie reviews DATA (user id-number, movie id-number,
> grade-number)
> [in the movie review file the same user Id can repeat but with different
> movie Id in each row]
> In order to do that I tried KNN alg and for that I want to use the WEKA
> package
> However, I don't know how to define the arff file based on the 2 files I
> already have (of user, movie data).
> I don't know what are the attributes that the generic IBK classifier can
> take and based on them
> give me a list of movie Id 's from the K-nearest users.
> I need the accuracy to be based on the movies the users have seen
> (meaning that if a user have seen movie A,B,C
> and I select him for the test data then if the train are movies A,B then
> if the IBK offers movie C the accuracy will increase)
> Do you have an Idea on how to do that using the WEKA's IBK?

This sounds like a recommender system to me. Weka doesn't support this kind of application directly, but you can certainly apply its algorithms if you are willing to write some code.

One approach would be to set up a separate learning problem for each movie (i.e. each movie is considered to be the class in turn and you predict, for a given user, whether they would want to see this movie based on the other movies and any other attributes you have available).

Another approach would be to use K-nearest neighbors directly and fill in multiple movies simultaneously for each test instance. In any case, you would need to decide how to measure accuracy. I'd suggest taking a look at the volumes of information that have been written about the Netflix challenge.

Cheers, Mark.
--
Mark Hall
Senior Developer/Consultant, Pentaho Open Source Business Intelligence
Citadel International, Suite 340, 5950 Hazeltine National Dr., Orlando, FL 32822, USA
+64 7 847-3537 office, +64 21 399-132 mobile, +1 815 550-8637 fax
Skype: mark.andrew.hall, Yahoo: mark_andrew_hall
Download the latest release today <http://www.sourceforge.net/projects/pentaho>
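
To make the second approach and the accuracy question a bit more concrete, here is a rough sketch of one way it could look. It is plain Python, not Weka's IBk, and everything in it is an assumption for illustration: the Jaccard similarity on sets of seen movies, the voting scheme, the toy data, and the hypothetical helpers `recommend` and `hit_rate`. It hides one movie per test user, asks the k most similar users for suggestions, and counts a hit when the hidden movie is suggested (the A,B,C scheme from the original question).

```python
from collections import Counter


def jaccard(a, b):
    """Similarity between two sets of movie ids (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(seen, others, k=2, n=3):
    """Suggest up to n unseen movies, voted for by the k most similar users."""
    # Sort by descending similarity; ties are broken by user id for determinism.
    neighbours = sorted(
        others.items(), key=lambda kv: (-jaccard(seen, kv[1]), kv[0])
    )[:k]
    votes = Counter(m for _, movies in neighbours for m in movies if m not in seen)
    ranked = sorted(votes.items(), key=lambda mv: (-mv[1], mv[0]))
    return [movie for movie, _ in ranked[:n]]


def hit_rate(train, held_out, k=2, n=3):
    """Fraction of test users whose hidden movie appears in their suggestions."""
    hits = 0
    for user, hidden in held_out.items():
        others = {u: m for u, m in train.items() if u != user}
        if hidden in recommend(train[user], others, k, n):
            hits += 1
    return hits / len(held_out)


# Hypothetical toy data: u1 has seen A, B and C, but C is hidden for testing.
train = {
    "u1": {"A", "B"},
    "u2": {"A", "B", "C"},
    "u3": {"A", "C", "D"},
    "u4": {"E"},
}
held_out = {"u1": "C"}

if __name__ == "__main__":
    print(recommend(train["u1"], {u: m for u, m in train.items() if u != "u1"}))
    print(hit_rate(train, held_out))
```

The user attributes (age, gender, occupation) are ignored here. Folding them into the distance, or swapping the hit rate for another measure, are among the choices Mark says you would need to make.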