Re: RE: Wekalist Digest, Vol 77, Issue 6

>> I am using OneR on a version of the weather dataset (outlook/temp/humid/windy/play) which has numerical attributes for temperature and humidity. Is there a way to see the bin ranges that OneR uses to discretize these numerical attributes?
> If OneR chooses a numeric attribute to base its model on (the classifier uses only a single attribute!), then the output is as follows (UCI dataset "balance-scale"):
>
> left-weight:
>     < 2.5 -> R
>     >= 2.5 -> L
>
> For numeric attributes, the generated rule holds the breakpoints (= borders between bins). The above example has one breakpoint and therefore two bins.

Thank you Peter. I believe I am asking a different question. It seems that in order for OneR to select the single attribute, it must first discretize the numerical attributes into nominal attributes. I'm assuming that it does the discretization using the scheme outlined in the paper by Nevill-Manning, Holmes and Witten. I know that I can specify the minBucketSize in the GenericObjectEditor that OneR will use when discretizing the values.

What I would like to see is the bins, or perhaps the term is "buckets", that OneR put the numerical values into in order to select the single attribute. The goal is to compare these with the nominal attributes in the version of the weather database that has all nominal values, to see how the attribute values for temperature and humidity differ. I would then like to compare this with equal-width and equal-frequency discretization.

Thanks again,
Mark Polczynski

_______________________________________________
Wekalist mailing list
Send posts to: Wekalist@...
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Re: Re: RE: Wekalist Digest, Vol 77, Issue 6

>>> I am using OneR on a version of the weather dataset (outlook/temp/humid/windy/play) which has numerical attributes for temperature and humidity. Is there a way to see the bin ranges that OneR uses to discretize these numerical attributes?
>> If OneR chooses a numeric attribute to base its model on (the classifier uses only a single attribute!), then the output is as follows (UCI dataset "balance-scale"):
>>
>> left-weight:
>>     < 2.5 -> R
>>     >= 2.5 -> L
>>
>> For numeric attributes, the generated rule holds the breakpoints (= borders between bins). The above example has one breakpoint and therefore two bins.
>
> Thank you Peter. I believe I am asking a different question.

Nope, maybe I wasn't clear enough.

> It seems that in order for OneR to select the single attribute, it must discretize the numerical attributes into nominal attributes first.

The breakpoints of a rule represent OneR's form of discretization. It doesn't use any of Weka's discretization filters.

Here is a short summary of how OneR computes the breakpoints for a numeric rule (for a specific numeric attribute), based on what I can gather from the code (method "newNumericRule"):

1. Sort the instances in ascending order of the selected numeric attribute and set the instance index to the start of the sorted list.
2. Reset the class label counter.
3. Increment the instance index and increment the class label counter accordingly.
4. The first class label whose count meets the minBucketSize requirement becomes the candidate for a new breakpoint; if none does yet, go back to 3.
5. Continue counting class labels (i.e., incrementing the instance index) until the class label changes.
6. If the class label of the candidate is the same as that of the previous breakpoint, merge them; otherwise we have a new breakpoint at the given position. In both cases, update the breakpoint value with the current value from the sorted instance list.
7. Go back to 2 and continue going through the instances.

After rules for all attributes apart from the class attribute have been generated, the rule that gets the most class labels right is selected as the only rule used in the model (OneR = one rule).
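The seven steps above can be sketched roughly as follows. This is a minimal Python illustration of the summary only, not Weka's actual code: the function name `one_r_breakpoints` is made up, placing each breakpoint at the midpoint between two neighbouring values is an assumption (it happens to agree with the 2.5 in the balance-scale output), and so is keeping equal attribute values in the same bucket.

```python
from collections import Counter


def one_r_breakpoints(values, labels, min_bucket_size):
    """Hypothetical sketch of the bucket-building steps for one numeric attribute.

    Returns (breakpoints, bucket_labels); there is one breakpoint between
    each pair of neighbouring buckets.
    """
    # Step 1: sort the instances by attribute value, start at the first one.
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    breakpoints, bucket_labels = [], []
    i = 0
    while i < n:
        # Step 2: reset the class label counter.
        counts = Counter()
        # Steps 3-4: advance until one label reaches min_bucket_size
        # (or the data runs out); that label is the candidate.
        while True:
            label = pairs[i][1]
            counts[label] += 1
            i += 1
            if counts[label] >= min_bucket_size or i == n:
                break
        # Step 5: keep going while the class label stays the same.
        while i < n and pairs[i][1] == label:
            i += 1
        # Assumption (not in the summary): do not split equal values.
        while i < n and pairs[i][0] == pairs[i - 1][0]:
            i += 1
        # Step 6: merge with the previous bucket if it has the same label,
        # otherwise keep the previous breakpoint and start a new bucket.
        if bucket_labels and bucket_labels[-1] == label:
            bucket_labels.pop()
            breakpoints.pop()
        bucket_labels.append(label)
        if i < n:
            # Assumption: breakpoint halfway between neighbouring values.
            breakpoints.append((pairs[i - 1][0] + pairs[i][0]) / 2)
        # Step 7: the outer loop returns to step 2.
    return breakpoints, bucket_labels
```

For example, `one_r_breakpoints([1, 1, 2, 2, 3, 3, 4, 4], list("RRRRLLLL"), 2)` yields a single breakpoint at 2.5 with bucket labels R and L. Running a function like this on each numeric attribute is one way to inspect the buckets for attributes that OneR did not end up selecting.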
> I'm assuming that it does discretization using the scheme outlined in the paper by Nevill-Manning, Holmes and Witten. I know that I can specify the minBucketSize in the GenericObjectEditor that OneR will use when discretizing the values. What I would like to see is the bins, or perhaps the term is "buckets", that OneR put the numerical values into in order to select the single attribute. The goal is to compare this with the nominal attributes in the version of the weather database that has all nominal values to see what the difference is between the attribute values for temperature and humidity. I would then like to compare this with equal-width and equal-frequency discretization.

The breakpoints represent the borders between the bins. The example from my previous post has exactly one breakpoint for the chosen attribute "left-weight", which results in two bins:

- bin 1: (-Inf, 2.5)
- bin 2: [2.5, +Inf)

Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/
Ph. +64 (7) 858-5174
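As a small illustration of the breakpoint-to-bin mapping described in this reply, the sketch below turns a sorted list of breakpoints into half-open intervals and looks up the bin for a value. The helper names `bins_from_breakpoints` and `bin_index` are hypothetical and nothing here calls Weka; the only thing taken from the post is that each bin includes its lower border and excludes its upper one.

```python
from bisect import bisect_right


def bins_from_breakpoints(breakpoints):
    """Return (lower, upper) pairs; each bin is [lower, upper), the first starts at -inf."""
    edges = [float("-inf")] + sorted(breakpoints) + [float("inf")]
    return list(zip(edges[:-1], edges[1:]))


def bin_index(value, breakpoints):
    """Zero-based index of the bin a value falls into (a value on a border goes to the upper bin)."""
    return bisect_right(sorted(breakpoints), value)


# The "left-weight" rule from the post: one breakpoint, two bins.
print(bins_from_breakpoints([2.5]))
print(bin_index(2.0, [2.5]), bin_index(2.5, [2.5]))
```

The same two helpers could be fed the borders produced by an equal-width or equal-frequency scheme, which would make the three sets of bins directly comparable.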