Meaning of confusion matrix for OneR

I applied OneR to Fisher’s iris dataset with 10-fold cross-validation and minBucketSize = 3. Petal Width was the attribute that OneR chose, with splits at 0.8 and 1.65. The output below the rule says (144/150 instances correct). When I look at the dataset, this checks out, with 4 virginica classified as versicolor, and 4 versicolor classified as virginica. But the confusion matrix says that 6 virginica were classified as versicolor, and 6 versicolor as virginica.

When I repeat this with minBucketSize = 6, the results are the same except the confusion matrix now says 2 virginica as versicolor and 7 versicolor as virginica.

Why might this be? I’m using Weka 3.7.

Thanks,
Mark Polczynski

_______________________________________________
Wekalist mailing list
Send posts to: Wekalist@...
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Re: Meaning of confusion matrix for OneR

> I applied OneR to Fisher’s iris dataset with 10-fold cross-validation and minBucketSize = 3. Petal Width was the attribute that OneR chose, with split at 0.8 and 1.65. The output below the rule says (144/150 instances correct). When I look at the dataset, this checks out, with 4 virginica classified as versicolor, and 4 versicolor classified as virginica. But the confusion matrix says that 6 virginica were classified as versicolor, and 6 versicolor as virginica.
>
> When I repeat this with minBucketSize = 6, the results are the same except the confusion matrix now says 2 virginica as versicolor and 7 versicolor as virginica.
>
> Why might this be? I’m using Weka 3.7.

10-fold cross-validation generates 10 different models, and the confusion matrix reflects their combined predictions on the held-out folds. The printed model is built on the full training data *before* cross-validation is performed (you can turn it off in the Explorer in the "More Options" dialog: "Output model"). This model is only printed to give the user an idea of what the classifier does on the data; it doesn't necessarily reflect the 10 CV models.

Cheers, Peter

--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/ Ph. +64 (7) 858-5174
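To make the distinction above concrete, here is a minimal sketch in plain Python. It is not Weka's OneR: `fit_rule` is a hypothetical single-split learner standing in for it, the data are made-up petal-width-like values rather than the iris measurements, and the fold assignment (index modulo k) is an assumption chosen for simplicity. It only shows that the printed model comes from one fit on all the data, while the confusion matrix is accumulated from k separate fits.

```python
from collections import Counter


def fit_rule(rows):
    """Pick the single split on x with the fewest training errors.

    rows: list of (x, label). Returns (threshold, left_label, right_label).
    A simplified stand-in for OneR, not Weka's implementation.
    """
    xs = sorted({x for x, _ in rows})
    best = None
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # candidate split halfway between neighbours
        left = Counter(y for x, y in rows if x < t)
        right = Counter(y for x, y in rows if x >= t)
        left_label = left.most_common(1)[0][0]
        right_label = right.most_common(1)[0][0]
        errors = sum(
            1 for x, y in rows
            if (left_label if x < t else right_label) != y
        )
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    return best[1:]


def predict(rule, x):
    t, left_label, right_label = rule
    return left_label if x < t else right_label


def cross_validate(rows, k):
    """Fit one rule per fold and pool the held-out predictions."""
    matrix = Counter()  # keys are (actual, predicted)
    rules = []
    for fold in range(k):
        train = [r for i, r in enumerate(rows) if i % k != fold]
        test = [r for i, r in enumerate(rows) if i % k == fold]
        rule = fit_rule(train)
        rules.append(rule)
        for x, actual in test:
            matrix[(actual, predict(rule, x))] += 1
    return matrix, rules


# Hypothetical toy data, NOT the iris dataset.
data = [
    (1.0, "versicolor"), (1.2, "versicolor"), (1.3, "versicolor"),
    (1.4, "versicolor"), (1.5, "versicolor"), (1.6, "virginica"),
    (1.7, "versicolor"), (1.8, "virginica"), (1.5, "virginica"),
    (1.9, "virginica"), (2.0, "virginica"), (2.1, "virginica"),
]

# The "printed model": one rule fitted on all the data.
full_rule = fit_rule(data)
full_errors = sum(1 for x, y in data if predict(full_rule, x) != y)

# The confusion matrix: pooled predictions of k different rules.
cv_matrix, cv_rules = cross_validate(data, 4)
cv_errors = sum(n for (a, p), n in cv_matrix.items() if a != p)

print("full-data rule:", full_rule, "errors on training data:", full_errors)
print("per-fold rules:", cv_rules)
print("CV confusion matrix:", dict(cv_matrix), "CV errors:", cv_errors)
```

The error count reported next to the full-data rule and the off-diagonal counts of the cross-validation matrix come from different models evaluated on different data, so there is no reason for them to agree.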
Re: Re: Re: Meaning of confusion matrix for OneR

On 7/3/09 4:34 AM, Polczynski, Mark wrote:
>>> I applied OneR to Fisher’s iris dataset with 10-fold cross-validation and minBucketSize = 3. Petal Width was the attribute that OneR chose, with split at 0.8 and 1.65. The output below the rule says (144/150 instances correct). When I look at the dataset, this checks out, with 4 virginica classified as versicolor, and 4 versicolor classified as virginica. But the confusion matrix says that 6 virginica were classified as versicolor, and 6 versicolor as virginica.
>>>
>>> When I repeat this with minBucketSize = 6, the results are the same except the confusion matrix now says 2 virginica as versicolor and 7 versicolor as virginica.
>>>
>>> Why might this be? I’m using Weka 3.7.
>>
>> 10-fold cross-validation generates 10 different models. The confusion
>> matrix reflects this, the printed model is built on the full training
>> data *before* cross-validation is performed (you can turn it off in
>> the Explorer in the "More Options" dialog: "Output model"). This model
>> is only printed to give the user an idea of what the classifier does
>> on the data, it doesn't necessarily reflect the 10 CV models.
>>
>> Cheers, Peter
>> --
>> Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
>> http://www.cs.waikato.ac.nz/~fracpete/ Ph. +64 (7) 858-5174
>
> *******************************
> Thanks, Peter. Now for a follow-on question. I modified Fisher’s iris dataset to have 4 missing values in each of the four attributes. I used OneR with 10-fold cross-validation and minBucketSize = 6. The classifier model says:
>
> Petal Length:
>     < 2.45  -> setosa
>     < 4.75  -> versicolor
>     >= 4.75 -> virginica
>     ?       -> virginica
>
> So, what does the last line mean? Also, just to verify, is it true that OneR automatically replaces all missing values for an attribute with the average value for the attribute?

[...] missing. So OneR does not do global replacement of missing values.

Cheers,
Mark.

--
Mark Hall
Senior Developer/Consultant, Pentaho Open Source Business Intelligence
Citadel International, Suite 340, 5950 Hazeltine National Dr., Orlando, FL 32822, USA
+64 7 847-3537 office, +64 21 399-132 mobile, +1 815 550-8637 fax, Skype: mark.andrew.hall, Yahoo: mark_andrew_hall
Download the latest release today <http://www.sourceforge.net/projects/pentaho>
Re: Re: Re: Re: Re: Meaning of confusion matrix for OneR

On 4/7/09 4:33 AM, Polczynski, Mark wrote:
> Thanks Mark. Next question: why does OneR choose Virginica when Petal Length is missing? I put in 4 missing values for each attribute. I do notice that Virginica has 2 missing values for Petal Length, whereas Setosa and Versicolor only have 1 missing value for this attribute.

OneR will just use the class label that occurs most frequently when the value of the attribute in question is missing.

Cheers,
Mark.

--
Mark Hall
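The behaviour Mark Hall describes can be sketched in a few lines of Python. This is only an illustration, not Weka's source: the function names are hypothetical, the non-missing petal lengths are made up, and the thresholds 2.45 and 4.75 are taken from the rule quoted earlier in the thread. The missing-value counts per class (1 setosa, 1 versicolor, 2 virginica) follow the figures in the question.

```python
from collections import Counter


def class_for_missing(rows):
    """Most frequent class among rows whose attribute value is missing.

    rows: list of (value, label), with None marking a missing value.
    Returns None if no value is missing.
    """
    counts = Counter(label for value, label in rows if value is None)
    return counts.most_common(1)[0][0] if counts else None


def classify(petal_length, missing_label):
    """Apply the rule quoted in the thread; '?' maps to missing_label."""
    if petal_length is None:
        return missing_label
    if petal_length < 2.45:
        return "setosa"
    if petal_length < 4.75:
        return "versicolor"
    return "virginica"


# Missing-value counts per class as reported in the question;
# the non-missing values are made up for illustration.
rows = [
    (1.4, "setosa"), (None, "setosa"),
    (4.0, "versicolor"), (None, "versicolor"),
    (5.5, "virginica"), (None, "virginica"), (None, "virginica"),
]

missing_label = class_for_missing(rows)
print("class used when Petal Length is missing:", missing_label)
print("prediction for a missing value:", classify(None, missing_label))
```

With two of the four missing Petal Length values belonging to virginica, that class is the most frequent one among the missing-value instances, which matches the `? -> virginica` line in the printed rule.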