On 7/3/09 4:34 AM, Polczynski, Mark wrote:
>>> I applied OneR to Fisher's iris dataset with 10-fold cross-validation and minBucketSize = 3. Petal Width was the attribute that OneR chose, with splits at 0.8 and 1.65. The output below the rule says (144/150 instances correct). When I look at the dataset, this checks out, with 4 virginica classified as versicolor, and 4 versicolor classified as virginica. But the confusion matrix says that 6 virginica were classified as versicolor, and 6 versicolor as virginica.
>>>
>>> When I repeat this with minBucketSize = 6, the results are the same except the confusion matrix now says 2 virginica as versicolor and 7 versicolor as virginica.
>>>
>>> Why might this be? I'm using Weka 3.7.
>
>> 10-fold cross-validation generates 10 different models, and the confusion
>> matrix reflects their combined predictions. The printed model is built on
>> the full training data *before* cross-validation is performed (you can turn
>> it off in the Explorer in the "More Options" dialog: "Output model"). This
>> model is only printed to give the user an idea of what the classifier does
>> on the data; it doesn't necessarily reflect the 10 CV models.
>>
>> Cheers, Peter
>> --
>> Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
>> http://www.cs.waikato.ac.nz/~fracpete/ Ph. +64 (7) 858-5174
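
Peter's point can be illustrated with a small, self-contained sketch. It is not Weka code: the threshold learner, the toy data, and the simple `i % k` fold assignment are all assumptions made for illustration (Weka uses randomized, stratified folds). It shows that the model printed from the full data and the k models behind the cross-validation confusion matrix are fitted on different data, so their error counts need not agree.

```python
from collections import Counter

def predict(t, x):
    # One-cut rule: "lo" below the threshold, "hi" at or above it.
    return "lo" if x < t else "hi"

def fit_threshold(xs, ys):
    """Return the cut point with the fewest errors on the given data."""
    values = sorted(set(xs))
    # Candidate cuts are midpoints between neighbouring distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    def errors(t):
        return sum(predict(t, x) != y for x, y in zip(xs, ys))
    return min(candidates, key=errors)

def cross_validate(xs, ys, k):
    """Fit k models, test each on its held-out fold, and pool the
    held-out predictions into one confusion matrix."""
    n = len(xs)
    thresholds = []
    confusion = Counter()  # keys are (actual, predicted)
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        t = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        thresholds.append(t)
        for i in test:
            confusion[(ys[i], predict(t, xs[i]))] += 1
    return thresholds, confusion

# Hypothetical toy data with some overlap between the two classes.
xs = [1.0, 1.2, 1.4, 1.5, 1.6, 1.7, 1.8, 2.0, 2.2, 2.4]
ys = ["lo", "lo", "lo", "hi", "lo", "hi", "lo", "hi", "hi", "hi"]

# The "printed model": fitted once on all the data.
full_model = fit_threshold(xs, ys)
resubstitution_errors = sum(predict(full_model, x) != y for x, y in zip(xs, ys))

# The models behind the confusion matrix: one per fold.
fold_models, cv_confusion = cross_validate(xs, ys, k=5)
cv_errors = sum(c for (actual, pred), c in cv_confusion.items() if actual != pred)

print("full-data cut:", full_model, "errors on full data:", resubstitution_errors)
print("per-fold cuts:", fold_models)
print("pooled CV errors:", cv_errors, "confusion:", dict(cv_confusion))
```

Every instance appears exactly once in the pooled confusion matrix, but each prediction comes from a model that never saw that instance, which is why the matrix can disagree with the count printed under the full-data rule.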
>
> *******************************
> Thanks, Peter. Now for a follow-on question. I modified Fisher's iris dataset to have 4 missing values in each of the four attributes. I used OneR with 10-fold cross-validation and minBucketSize = 6. The classifier model says:
>
> Petal Length:
> < 2.45 -> setosa
> < 4.75 -> versicolor
> >= 4.75 -> virginica
> ? -> virginica
>
> So, what does the last line mean? Also, just to verify, is it true that OneR automatically replaces all missing values for an attribute with the average value for the attribute?
The last line indicates the prediction to use when Petal Length is
missing. So OneR does not do global replacement of missing values.
Cheers,
Mark.
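
A rough sketch of how such a rule could be applied follows. It is not Weka's implementation: the thresholds are copied from the rule quoted above, the toy data is made up, and choosing the missing-value prediction as the majority class among training instances with a missing value is an assumption about how the "?" line is derived.

```python
from collections import Counter

def missing_value_class(values, labels):
    """Majority class among instances whose attribute value is missing.

    Assumption: this is how the "?" line of the rule is chosen.
    Returns None when no value is missing.
    """
    counts = Counter(lab for v, lab in zip(values, labels) if v is None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

def predict(petal_length, missing_class):
    """Apply the rule from the thread; None stands for a missing value."""
    if petal_length is None:
        return missing_class  # the "? -> ..." line
    if petal_length < 2.45:
        return "setosa"
    if petal_length < 4.75:
        return "versicolor"
    return "virginica"

# Hypothetical training sample; None marks a missing Petal Length.
values = [1.4, None, 4.5, None, 5.8, None, 6.1]
labels = ["setosa", "virginica", "versicolor", "virginica",
          "virginica", "versicolor", "virginica"]

missing_class = missing_value_class(values, labels)
for v in (1.4, 3.0, 4.75, None):
    print(v, "->", predict(v, missing_class))
```

No value is imputed here: a missing Petal Length gets its own branch of the rule, so the attribute's mean is never computed.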
--
Mark Hall
Senior Developer/Consultant, Pentaho Open Source Business Intelligence
Citadel International, Suite 340, 5950 Hazeltine National Dr., Orlando,
FL 32822, USA
+64 7 847-3537 office, +64 21 399-132 mobile, +1 815 550-8637 fax,
Skype: mark.andrew.hall, Yahoo: mark_andrew_hall