Using weka in my own code generates exception

Hello,
I've been using Weka for two days, and one detail is not clear to me. Please have a look at the attached file.

What does work:

Defining an attribute "pos" with some predefined possible values (N and V), creating an instance with this attribute, setting its value to N or V, creating the classifier, and classifying something. That all works fine.

What does not work:

Defining an attribute "pos" WITHOUT predefined possible values, creating an instance with it, and setting the value like this:

    Attribute posAtt1 = dataset.attribute("pos");
    inst1.setDataset(dataset);
    inst1.setValue(posAtt1, posAtt1.addStringValue("N"));
    inst1.setClassValue("1");

When calling classifier.buildClassifier(dataset); an exception is thrown:

    weka.core.UnsupportedAttributeTypeException: weka.classifiers.trees.j48.C45PruneableClassifierTree: Cannot handle string attributes!
        at weka.core.Capabilities.test(Unknown Source)
        at weka.core.Capabilities.test(Unknown Source)
        at weka.core.Capabilities.test(Unknown Source)
        at weka.core.Capabilities.test(Unknown Source)
        at weka.core.Capabilities.testWithFail(Unknown Source)
        at weka.classifiers.trees.j48.C45PruneableClassifierTree.buildClassifier(Unknown Source)
        at weka.classifiers.trees.J48.buildClassifier(Unknown Source)
        at de.uni_leipzig.asv.inflection.test.TestWeka.thisDoesntWork(TestWeka.java:116)
        at de.uni_leipzig.asv.inflection.test.TestWeka.main(TestWeka.java:14)

But why? Aren't string attributes handled internally as numeric values, and doesn't Attribute.addStringValue return an int?
Kind regards
Julian

package de.uni_leipzig.asv.inflection.test;

import weka.classifiers.Classifier;
import weka.classifiers.trees.J48;
import weka.core.Attribute;
import weka.core.FastVector;
import weka.core.Instance;
import weka.core.Instances;

public class TestWeka {

    public static void main(String[] args){
        thisWorks();
        thisDoesntWork();
    }

    public static void thisWorks(){
        FastVector posses = new FastVector(2);
        posses.addElement("N");
        posses.addElement("V");
        Attribute pos = new Attribute("pos", posses);

        FastVector hits = new FastVector(2);
        hits.addElement("1");
        hits.addElement("0");
        Attribute clss = new Attribute("hit", hits);

        FastVector attrs = new FastVector(2);
        attrs.addElement(pos);
        attrs.addElement(clss);

        Instances dataset = new Instances("testset", attrs, 2);
        dataset.setClass(clss);

        Instance inst1 = new Instance(2);
        Attribute posAtt1 = dataset.attribute("pos");
        inst1.setDataset(dataset);
        inst1.setValue(posAtt1, "N");
        inst1.setClassValue("1");
        inst1.setDataset(dataset);

        Instance inst2 = new Instance(2);
        Attribute posAtt2 = dataset.attribute("pos");
        inst2.setDataset(dataset);
        inst2.setValue(posAtt2, "V");
        inst2.setClassValue("0");
        inst2.setDataset(dataset);

        Classifier classifier = new J48();
        try {
            classifier.buildClassifier(dataset);
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

        Instance inst3 = new Instance(2);
        Attribute posAtt3 = dataset.attribute("pos");
        inst3.setDataset(dataset);
        inst3.setValue(posAtt3, "N");
        try {
            double predicted = classifier.classifyInstance(inst3);
            System.out.println(dataset.classAttribute().value((int)predicted));
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

    public static void thisDoesntWork(){
        // maybe the possible values are not known before!
        Attribute pos = new Attribute("pos", (FastVector)null);

        FastVector hits = new FastVector(2);
        hits.addElement("1");
        hits.addElement("0");
        Attribute clss = new Attribute("hit", hits);

        FastVector attrs = new FastVector(2);
        attrs.addElement(pos);
        attrs.addElement(clss);

        Instances dataset = new Instances("testset", attrs, 2);
        dataset.setClass(clss);

        Instance inst1 = new Instance(2);
        Attribute posAtt1 = dataset.attribute("pos");
        inst1.setDataset(dataset);
        inst1.setValue(posAtt1, posAtt1.addStringValue("N"));
        inst1.setClassValue("1");

        Instance inst2 = new Instance(2);
        Attribute posAtt2 = dataset.attribute("pos");
        inst2.setDataset(dataset);
        inst2.setValue(posAtt2, posAtt2.addStringValue("V"));
        inst2.setClassValue("0");

        Classifier classifier = new J48();
        try {
            // here an exception is thrown, output:
            /*
             * weka.core.UnsupportedAttributeTypeException:
             * weka.classifiers.trees.j48.C45PruneableClassifierTree: Cannot handle string attributes!
             *     at weka.core.Capabilities.test(Unknown Source)
             *     at weka.core.Capabilities.test(Unknown Source)
             *     at weka.core.Capabilities.test(Unknown Source)
             *     at weka.core.Capabilities.test(Unknown Source)
             *     at weka.core.Capabilities.testWithFail(Unknown Source)
             *     at weka.classifiers.trees.j48.C45PruneableClassifierTree.buildClassifier(Unknown Source)
             *     at weka.classifiers.trees.J48.buildClassifier(Unknown Source)
             *     at de.uni_leipzig.asv.inflection.test.TestWeka.thisDoesntWork(TestWeka.java:116)
             *     at de.uni_leipzig.asv.inflection.test.TestWeka.main(TestWeka.java:14)
             */
            classifier.buildClassifier(dataset);
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}

_______________________________________________
Wekalist mailing list
Send posts to: Wekalist@...
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Re: Using weka in my own code generates exception

> I've been using weka for two days, and one detail is not clear to me:
>
> Please have a look at the attached file.
>
> What does work:
>
> Defining an attribute "pos" with some predefined possible values (N and
> V). Creating an instance with this attribute and setting value to N or
> V, creating the classifier and classify something. Pretty fine at all.
>
> What does not work:
>
> Defining an attribute "pos" WITHOUT some predefined possible values.
> Creating an instance with it and setting the value like this:
>
> Attribute posAtt1 = dataset.attribute("pos");
> inst1.setDataset(dataset);
> inst1.setValue(posAtt1, posAtt1.addStringValue("N"));
> inst1.setClassValue("1");
>
> when calling classifier.buildClassifier(dataset); there's an exception
> thrown:
>
> weka.core.UnsupportedAttributeTypeException:
> weka.classifiers.trees.j48.C45PruneableClassifierTree: Cannot handle
> string attributes!
> at weka.core.Capabilities.test(Unknown Source)
> at weka.core.Capabilities.test(Unknown Source)
> at weka.core.Capabilities.test(Unknown Source)
> at weka.core.Capabilities.test(Unknown Source)
> at weka.core.Capabilities.testWithFail(Unknown Source)
> at weka.classifiers.trees.j48.C45PruneableClassifierTree.buildClassifier(Unknown Source)
> at weka.classifiers.trees.J48.buildClassifier(Unknown Source)
> at de.uni_leipzig.asv.inflection.test.TestWeka.thisDoesntWork(TestWeka.java:116)
> at de.uni_leipzig.asv.inflection.test.TestWeka.main(TestWeka.java:14)
>
> But why? Aren't the string-attributes internally handled as numeric
> values and isn't returning Attribute.addStringValue an int?

"addStringValue" is only used for STRING attributes. STRING attributes hold textual information from documents, not predefined values as NOMINAL attributes do. Normally, one uses the StringToWordVector filter to convert these STRING attributes into a bag of words, TF/IDF features, etc. Apart from the StringKernel (for SVMs), I don't know of any algorithm in Weka that handles STRING attributes (i.e., documents) natively.

Anyhow, instead of using a STRING attribute, define a NOMINAL one, which J48 will be able to handle.

See the wiki article "Creating an ARFF file" for a code example on how to create all currently supported attribute types in Weka:
http://weka.wikispaces.com/Creating+an+ARFF+file

Cheers, Peter

--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/ Ph. +64 (7) 858-5174
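If the possible values really are not known before the data is read, as the comment in thisDoesntWork() suggests, one option that stays with NOMINAL attributes is to scan the raw values once, collect the distinct labels, and only then define the attribute. The following is a minimal sketch in plain Java with no Weka dependency. The class name NominalLabels and the method collectLabels are hypothetical names chosen for illustration, and the sketch assumes the raw values are available as strings before the dataset header is built.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: gathers the label set for a NOMINAL attribute. */
final class NominalLabels {

    /** Returns the distinct values in order of first appearance. */
    static List<String> collectLabels(List<String> rawValues) {
        // LinkedHashSet drops duplicates but keeps insertion order,
        // so the label order is stable from run to run.
        Set<String> seen = new LinkedHashSet<>(rawValues);
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String> rawPos = Arrays.asList("N", "V", "N", "N", "V");
        List<String> labels = collectLabels(rawPos);
        System.out.println(labels); // prints [N, V]
        // Each label would then be added to the FastVector that is passed to
        // new Attribute("pos", ...), as in thisWorks() from the first post.
    }
}
```

Collecting the labels up front keeps the attribute NOMINAL, so J48 can handle it, at the cost of one extra pass over the raw data.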
Re: Using weka in my own code generates exception

Hi there,

Peter Reutemann wrote:
>
> "addStringValue" is only used for STRING attributes. STRING attributes
> are used to contain textual information from documents, not predefined
> values as NOMINAL attributes. Normally, one uses the
> StringToWordVector filter to convert these STRING attributes into bag
> of words, TF/IDF features, etc. Apart from the StringKernel (for
> SVMs), I don't know of any algorithm in Weka that handles STRING
> attributes (i.e., documents) natively.
>
> Anyhow, instead of using a STRING attribute, define a NOMINAL, which
> J48 will be able to handle.

That was what I thought after my mail yesterday, but wouldn't the StringToNominal filter be helpful? I tried it but didn't get it working.

> See wiki article "Creating an ARFF file" for a code example on how to
> create all currently supported attribute types in Weka:
> http://weka.wikispaces.com/Creating+an+ARFF+file

Thanks, I will have a look at it.

Regards
Julian
Re: Using weka in my own code generates exception

Hi,

Julian Moritz wrote:
> Peter Reutemann wrote:
>> Anyhow, instead of using a STRING attribute, define a NOMINAL, which
>> J48 will be able to handle.
>
> that was what I thought after my mail yesterday, but wouldn't be the
> StringToNominal filter helpful? Tried it but didn't get it working.

I'm now using setAttributeRange and there's no exception anymore. But the filter doesn't seem to do what I expect. Before filtering, my instances look as follows:

    Instance inst1 = new Instance(2);
    inst1.setDataset(dataset);
    Attribute posAtt1 = dataset.attribute("pos");
    inst1.setValue(posAtt1, posAtt1.addStringValue("N"));
    inst1.setClassValue("1");

    Instance inst2 = new Instance(2);
    Attribute posAtt2 = dataset.attribute("pos");
    inst2.setDataset(dataset);
    inst2.setValue(posAtt2, posAtt2.addStringValue("V"));
    inst2.setClassValue("0");

If I apply the filter, I get:

    @relation testset-weka.filters.unsupervised.attribute.StringToNominal-Rfirst

    @attribute pos {}
    @attribute hit {1,0}

    @data

But shouldn't it be @attribute pos {N,V}?

If I classify an instance with the value "V" for the pos attribute, the class is "1", but it should be "0".

Regards
Julian
Re: Using weka in my own code generates exception

Hi,

I think I've solved my problem; I need more coffee. I'll post a solution later. Sorry for bothering.

Regards
Julian
Re: Using weka in my own code generates exception

>> "addStringValue" is only used for STRING attributes. STRING attributes
>> are used to contain textual information from documents, not predefined
>> values as NOMINAL attributes. Normally, one uses the
>> StringToWordVector filter to convert these STRING attributes into bag
>> of words, TF/IDF features, etc. Apart from the StringKernel (for
>> SVMs), I don't know of any algorithm in Weka that handles STRING
>> attributes (i.e., documents) natively.
>>
>> Anyhow, instead of using a STRING attribute, define a NOMINAL, which
>> J48 will be able to handle.
>
> that was what I thought after my mail yesterday, but wouldn't be the
> StringToNominal filter helpful? Tried it but didn't get it working.

If you know your labels and they are fixed, then use a NOMINAL attribute. STRING attributes are used for arbitrary values (that's why STRING attributes are used for document classification). Weka requires the training and test datasets to have exactly the same structure; if you run the StringToNominal filter separately on two datasets with STRING attributes, you will generate incompatible datasets.

[...]

Cheers, Peter

--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/ Ph. +64 (7) 858-5174
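To see why converting two datasets separately can produce incompatible results, it helps to model a nominal attribute as a mapping from label to index, with indexes handed out in order of first appearance. The sketch below is plain Java and only a simplified model under that assumption, not Weka's implementation; LabelIndex and its methods are hypothetical names. It shows that two independently built mappings can assign different indexes to the same label, whereas a mapping built once and reused for both datasets stays consistent.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of a nominal attribute: label -> index. Not Weka code. */
final class LabelIndex {

    /** Assigns indexes 0, 1, 2, ... in order of first appearance. */
    static Map<String, Integer> build(List<String> values) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (String v : values) {
            if (!index.containsKey(v)) {
                index.put(v, index.size());
            }
        }
        return index;
    }

    /** Encodes values with an existing mapping; unknown labels are rejected. */
    static int[] encode(List<String> values, Map<String, Integer> index) {
        int[] encoded = new int[values.size()];
        for (int i = 0; i < encoded.length; i++) {
            Integer idx = index.get(values.get(i));
            if (idx == null) {
                throw new IllegalArgumentException("Unknown label: " + values.get(i));
            }
            encoded[i] = idx;
        }
        return encoded;
    }

    public static void main(String[] args) {
        List<String> train = Arrays.asList("N", "V", "N");
        List<String> test = Arrays.asList("V", "N");

        // Built separately: "V" is index 1 in training but index 0 in test.
        System.out.println(build(train)); // prints {N=0, V=1}
        System.out.println(build(test));  // prints {V=0, N=1}

        // Built once from the training data and reused: indexes agree.
        Map<String, Integer> shared = build(train);
        System.out.println(Arrays.toString(encode(test, shared))); // prints [1, 0]
    }
}
```

In Weka terms, the shared mapping corresponds to defining the NOMINAL attribute once and using the same dataset header for both training and test data, which is the advice given in this reply.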
Re: Using weka in my own code generates exception

>>> Anyhow, instead of using a STRING attribute, define a NOMINAL, which
>>> J48 will be able to handle.
>>
>> that was what I thought after my mail yesterday, but wouldn't be the
>> StringToNominal filter helpful? Tried it but didn't get it working.
>
> I'm using now the setAttributeRange and there's no exception now. But
> the filter doesn't seem to do what I'm expecting. Before filtering my
> instances look as follows:
>
> Instance inst1 = new Instance(2);
> inst1.setDataset(dataset);
> Attribute posAtt1 = dataset.attribute("pos");
> inst1.setValue(posAtt1, posAtt1.addStringValue("N"));
> inst1.setClassValue("1");
>
> Instance inst2 = new Instance(2);
> Attribute posAtt2 = dataset.attribute("pos");
> inst2.setDataset(dataset);
> inst2.setValue(posAtt2, posAtt2.addStringValue("V"));
> inst2.setClassValue("0");
>
> If I apply the filter:
>
> @relation testset-weka.filters.unsupervised.attribute.StringToNominal-Rfirst
>
> @attribute pos {}
> @attribute hit {1,0}
>
> @data
>
> But shouldn't it be @attribute pos {N,V} ?

No, you only *used* the values "1" and "0" (you're setting strings and not numeric indexes!). All others get discarded. Use a NOMINAL attribute with predefined values instead.

> If I classify an instance with the value "V" for the pos-attribute, the
> class is "1", but should be "0".

My advice: forget about using a STRING attribute and use a NOMINAL one.

Cheers, Peter

--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/ Ph. +64 (7) 858-5174