Are we supposed to Mention CLASS in TESTING DATA

Sorry for this almost stupid question, but just to confirm: in the testing file, are we supposed to give the class label after each instance?
_______________________________________________ Wekalist mailing list Send posts to: Wekalist@... List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html |
Re: Are we supposed to Mention CLASS in TESTING DATA

> Sorry for this almost stupid question .. But it's just for confirmation that
> in the testing file, are we supposed to mention the class label after each
> instance??

If your test set doesn't contain any values for the class attribute, then you can't evaluate the performance of a classifier (classifiers aren't psychic yet). Even if you just want to output the predictions, your test set still needs the same structure as the training set. In that case, just use missing values for the class (in ARFF files, missing values are denoted by a question mark).

Cheers, Peter

--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/ Ph. +64 (7) 858-5174
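The advice above (same structure as the training set, with a question mark where the class is unknown) can be sketched as follows. This is only an illustration that builds the ARFF text by hand: the relation name, the attributes, and the values are hypothetical, and the only thing taken from the thread is the "?" convention for missing values.

```java
import java.util.List;

// Toy sketch: builds ARFF text for an unlabeled test set. The relation,
// attributes, and values are hypothetical; only the "?" convention for
// missing values comes from the thread.
class UnlabeledArffSketch {

    // Same header as the (hypothetical) training file, class attribute included.
    static final String HEADER = String.join("\n",
            "@relation weather",
            "@attribute temperature numeric",
            "@attribute humidity numeric",
            "@attribute play {yes,no}",
            "",
            "@data");

    // Renders one data row, putting "?" where the unknown class label goes.
    static String row(double temperature, double humidity) {
        return temperature + "," + humidity + ",?";
    }

    // Header plus one row per instance; each instance is {temperature, humidity}.
    static String build(List<double[]> instances) {
        StringBuilder sb = new StringBuilder(HEADER).append('\n');
        for (double[] inst : instances) {
            sb.append(row(inst[0], inst[1])).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(build(List.of(new double[] {85, 85}, new double[] {64, 65})));
    }
}
```

The class attribute stays declared in the header, so the file keeps the training set's structure; only the values in the data rows are left missing.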
Re: Are we supposed to Mention CLASS in TESTING DATA

While writing a text analysis program that generates test datasets with StringToWordVector and classifies them with a pre-trained classifier, I noticed that it classified even when the test data did not have exactly the same format.

I am now under the impression that Weka matches up the columns with similar names and ignores the ones that don't exist. Maybe they're counted as 'missing' by the classifier? I don't really know! I just know that it works!

Is it just taking the first n attributes and discarding n+1 and on? Incorrectly comparing attributes that aren't the same metric? (If so, is there a way to StringToWordVector a String field into the appropriate attribute set?)

On Sun, Nov 8, 2009 at 12:12 AM, Peter Reutemann <fracpete@...> wrote:
Re: Are we supposed to Mention CLASS in TESTING DATA

Please no top-posting; see the mailing list etiquette for why (http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html).

> Writing a text analysis program and generating test datasets with
> StringToWordVector to classify with a pre-trained classifier, I noticed that
> it classified even without the exact same format.

Use batch filtering (see FAQ "How do I generate compatible train and test sets that get processed with a filter?") or the FilteredClassifier meta-classifier (StringToWordVector and your choice of base-classifier) to ensure compatible datasets.

> I am now under the impression that Weka will match up the columns with
> similar names, and consider the ones that don't exist are ignored. Maybe
> they're counted as 'missing' by the classifier? I don't really know! I just
> know that it works!

Weka *assumes* that the datasets have the same structure. There is no magic happening: Weka's classifiers just use attribute indices when making predictions. If the datasets differ, values from different attributes (at the same location) will be picked. In a nutshell: the results cannot be trusted.

> Is it just taking the first n attributes, and discarding n+1 and on?

That depends on the classifier. A decision tree based classifier might only need 5 attributes out of 1000.

> Incorrectly comparing attributes that aren't the same metric?

Weka's internal data format is just doubles (for nominal attributes, the indices of the labels are stored), nothing more. Simple and fast number comparisons happen internally. Computationally expensive mappings between the original attribute space of the input and a potentially different attribute space when predicting would make it unbearably slow.

> (if so, is
> there a way to StringToWordVector a String field into the appropriate
> attribute set?)

Like I said above, use batch filtering or the FilteredClassifier approach.

Cheers, Peter

--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/ Ph. +64 (7) 858-5174
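The idea behind batch filtering, as described in the reply above, can be illustrated with a toy word-count vectorizer. This is a sketch under stated assumptions, not Weka's StringToWordVector: the class BatchVectorizerSketch and its methods are hypothetical. The attribute list is fixed once from the training documents and then reused unchanged for test documents, so index i always refers to the same word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Toy stand-in for the batch-filtering idea, NOT Weka's StringToWordVector:
// the vocabulary (attribute list) is fixed from the training documents and
// then reused unchanged for every test document.
class BatchVectorizerSketch {
    private final List<String> vocabulary = new ArrayList<>();

    // Lower-cases and splits on whitespace; deliberately simplistic.
    static String[] tokens(String doc) {
        return doc.toLowerCase().split("\\s+");
    }

    // "Training" pass: decide which word sits at which attribute index.
    void fit(List<String> trainingDocs) {
        TreeSet<String> words = new TreeSet<>();
        for (String doc : trainingDocs) {
            words.addAll(Arrays.asList(tokens(doc)));
        }
        vocabulary.clear();
        vocabulary.addAll(words);
    }

    // Applies the frozen vocabulary; words unseen in training are dropped,
    // so every vector has the same length and the same meaning per index.
    double[] transform(String doc) {
        double[] vector = new double[vocabulary.size()];
        for (String token : tokens(doc)) {
            int index = vocabulary.indexOf(token);
            if (index >= 0) {
                vector[index] += 1.0;
            }
        }
        return vector;
    }

    List<String> vocabulary() {
        return vocabulary;
    }

    public static void main(String[] args) {
        BatchVectorizerSketch v = new BatchVectorizerSketch();
        v.fit(List.of("good movie", "bad movie"));
        System.out.println(v.vocabulary());
        System.out.println(Arrays.toString(v.transform("good good film")));
    }
}
```

Had the test document been vectorized on its own, its attributes would have been "film" and "good", and a classifier reading by index would have compared counts of unrelated words. That is the mismatch the reply warns about.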
Re: Are we supposed to Mention CLASS in TESTING DATA

My test set may be huge. Can I serialize filters?

Gosh, that'd be convenient.

On Sun, Nov 8, 2009 at 5:34 AM, Peter Reutemann <fracpete@...> wrote:
> Please no top-posting, see mailing list etiquette why
Re: Are we supposed to Mention CLASS in TESTING DATA

Hint: top-posting is frowned upon, and please delete irrelevant stuff from posts...

> My test set may be huge. Can I serialize filters?

Yes, like most classes in Weka.

BTW, if you use the FilteredClassifier approach (with the classifier and filter that you want to use), then you can just use the original datasets and the FilteredClassifier takes care of the rest. The filter and base-classifier are part of the FilteredClassifier object and will get serialized with it.

Cheers, Peter

--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/ Ph. +64 (7) 858-5174
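The last point, that the filter travels with the wrapper when it is serialized, can be illustrated with plain Java serialization. This is a minimal sketch: ToyFilter, ToyClassifier, and ToyPipeline are hypothetical stand-ins rather than Weka classes, and in Weka itself the FilteredClassifier object would take the wrapper's place.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Toy illustration of "the filter gets serialized with the wrapper".
// ToyFilter, ToyClassifier and ToyPipeline are hypothetical stand-ins,
// not Weka classes; only plain Java serialization is used.
class SerializedPipelineSketch {

    static class ToyFilter implements Serializable {
        private static final long serialVersionUID = 1L;
        final double scale;
        ToyFilter(double scale) { this.scale = scale; }
        double apply(double raw) { return raw * scale; }
    }

    static class ToyClassifier implements Serializable {
        private static final long serialVersionUID = 1L;
        final double threshold;
        ToyClassifier(double threshold) { this.threshold = threshold; }
        String classify(double value) { return value >= threshold ? "yes" : "no"; }
    }

    // Wrapper owning both parts, so one write/read covers filter and model.
    static class ToyPipeline implements Serializable {
        private static final long serialVersionUID = 1L;
        final ToyFilter filter;
        final ToyClassifier classifier;
        ToyPipeline(ToyFilter filter, ToyClassifier classifier) {
            this.filter = filter;
            this.classifier = classifier;
        }
        String predict(double raw) { return classifier.classify(filter.apply(raw)); }
    }

    // Serializes the whole pipeline to bytes (a file stream would work the same way).
    static byte[] save(ToyPipeline pipeline) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(pipeline);
        } catch (IOException e) {
            throw new IllegalStateException("could not serialize pipeline", e);
        }
        return bytes.toByteArray();
    }

    // Restores the pipeline, filter included, from the bytes written by save().
    static ToyPipeline load(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (ToyPipeline) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not deserialize pipeline", e);
        }
    }

    public static void main(String[] args) {
        ToyPipeline original = new ToyPipeline(new ToyFilter(0.5), new ToyClassifier(10));
        ToyPipeline restored = load(save(original));
        System.out.println(restored.predict(30));
    }
}
```

Because the wrapper holds references to both parts, a single write covers the filter's state and the model, and raw test data can be fed to the restored object without a separate filtering step.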