Using weka in my own code generates exception


by Julian Moritz-2

Hello,

I've been using Weka for two days, and one detail is not clear to me.

Please have a look at the attached file.

What does work:

Defining an attribute "pos" with some predefined possible values (N and
V), creating an instance with this attribute and setting its value to N
or V, then creating the classifier and classifying something. This all
works fine.

What does not work:

Defining an attribute "pos" WITHOUT any predefined possible values,
then creating an instance with it and setting the value like this:

Attribute posAtt1 = dataset.attribute("pos");
inst1.setDataset(dataset);
inst1.setValue(posAtt1, posAtt1.addStringValue("N"));
inst1.setClassValue("1");

When I call classifier.buildClassifier(dataset), an exception is
thrown:

weka.core.UnsupportedAttributeTypeException: weka.classifiers.trees.j48.C45PruneableClassifierTree: Cannot handle string attributes!
        at weka.core.Capabilities.test(Unknown Source)
        at weka.core.Capabilities.test(Unknown Source)
        at weka.core.Capabilities.test(Unknown Source)
        at weka.core.Capabilities.test(Unknown Source)
        at weka.core.Capabilities.testWithFail(Unknown Source)
        at weka.classifiers.trees.j48.C45PruneableClassifierTree.buildClassifier(Unknown Source)
        at weka.classifiers.trees.J48.buildClassifier(Unknown Source)
        at de.uni_leipzig.asv.inflection.test.TestWeka.thisDoesntWork(TestWeka.java:116)
        at de.uni_leipzig.asv.inflection.test.TestWeka.main(TestWeka.java:14)

But why? Aren't string attributes handled internally as numeric
values, and doesn't Attribute.addStringValue return an int?

Kind regards
Julian

package de.uni_leipzig.asv.inflection.test;

import weka.classifiers.Classifier;
import weka.classifiers.trees.J48;
import weka.core.Attribute;
import weka.core.FastVector;
import weka.core.Instance;
import weka.core.Instances;

public class TestWeka {

        public static void main(String[] args){
               
                thisWorks();
                thisDoesntWork();
                               
        }
       
        public static void thisWorks(){
               
                FastVector posses = new FastVector(2);
                posses.addElement("N");
                posses.addElement("V");
                               
                Attribute pos = new Attribute("pos", posses);
               
                FastVector hits = new FastVector(2);
                hits.addElement("1");
                hits.addElement("0");
               
                Attribute clss = new Attribute("hit", hits);
               
                FastVector attrs = new FastVector(2);
               
                attrs.addElement(pos);
                attrs.addElement(clss);
               
                Instances dataset = new Instances("testset", attrs, 2);
                dataset.setClass(clss);
               
                Instance inst1 = new Instance(2);
               
                Attribute posAtt1 = dataset.attribute("pos");
                inst1.setDataset(dataset);
                inst1.setValue(posAtt1, "N");
                inst1.setClassValue("1");
                inst1.setDataset(dataset);
               
                Instance inst2 = new Instance(2);
               
                Attribute posAtt2 = dataset.attribute("pos");
                inst2.setDataset(dataset);
                inst2.setValue(posAtt2, "V");
                inst2.setClassValue("0");
                inst2.setDataset(dataset);
                               
                Classifier classifier = new J48();
                try {
                        classifier.buildClassifier(dataset);
                } catch (Exception e) {
                        // TODO Auto-generated catch block
                        e.printStackTrace();
                }
               
               
                Instance inst3 = new Instance(2);
                Attribute posAtt3 = dataset.attribute("pos");
                inst3.setDataset(dataset);
                inst3.setValue(posAtt3, "N");
               
                try {
                        double predicted = classifier.classifyInstance(inst3);
                        System.out.println(dataset.classAttribute().value((int)predicted));
                } catch (Exception e) {
                        // TODO Auto-generated catch block
                        e.printStackTrace();
                }
               
       
               
        }
       
        public static void thisDoesntWork(){
                               
                //maybe the possible values are not known before!
                Attribute pos = new Attribute("pos", (FastVector)null);
               
                FastVector hits = new FastVector(2);
                hits.addElement("1");
                hits.addElement("0");
               
                Attribute clss = new Attribute("hit", hits);
               
                FastVector attrs = new FastVector(2);
               
                attrs.addElement(pos);
                attrs.addElement(clss);
               
                Instances dataset = new Instances("testset", attrs, 2);
                dataset.setClass(clss);
               
                Instance inst1 = new Instance(2);
               
                Attribute posAtt1 = dataset.attribute("pos");
                inst1.setDataset(dataset);
                inst1.setValue(posAtt1, posAtt1.addStringValue("N"));
                inst1.setClassValue("1");
                               
                Instance inst2 = new Instance(2);
               
                Attribute posAtt2 = dataset.attribute("pos");
                inst2.setDataset(dataset);
                inst2.setValue(posAtt2, posAtt2.addStringValue("V"));
                inst2.setClassValue("0");
                                               
                Classifier classifier = new J48();
                try {
                        //here an exception is thrown, output:
                        /*
                         * weka.core.UnsupportedAttributeTypeException: weka.classifiers.trees.j48.C45PruneableClassifierTree: Cannot handle string attributes!
        at weka.core.Capabilities.test(Unknown Source)
        at weka.core.Capabilities.test(Unknown Source)
        at weka.core.Capabilities.test(Unknown Source)
        at weka.core.Capabilities.test(Unknown Source)
        at weka.core.Capabilities.testWithFail(Unknown Source)
        at weka.classifiers.trees.j48.C45PruneableClassifierTree.buildClassifier(Unknown Source)
        at weka.classifiers.trees.J48.buildClassifier(Unknown Source)
        at de.uni_leipzig.asv.inflection.test.TestWeka.thisDoesntWork(TestWeka.java:116)
        at de.uni_leipzig.asv.inflection.test.TestWeka.main(TestWeka.java:14)
                         */
                        classifier.buildClassifier(dataset);
                } catch (Exception e) {
                        // TODO Auto-generated catch block
                        e.printStackTrace();
                }
                               
        }
       
}

_______________________________________________
Wekalist mailing list
Send posts to: Wekalist@...
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Using weka in my own code generates exception

by Peter Reutemann-3

> But why? Aren't string attributes handled internally as numeric
> values, and doesn't Attribute.addStringValue return an int?

"addStringValue" is only used for STRING attributes. STRING attributes
are used to contain textual information from documents, not predefined
values as NOMINAL attributes. Normally, one uses the
StringToWordVector filter to convert these STRING attributes into bag
of words, TF/IDF features, etc. Apart from the StringKernel (for
SVMs), I don't know of any algorithm in Weka that handles STRING
attributes (i.e., documents) natively.
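
On the original question of why the int returned by addStringValue does not help: judging from the exception message, the check that fails looks at the attribute's declared type, not at how its values are stored. The following is a simplified plain-Java model of such a type check. It is not Weka's actual Capabilities code; the enum, the set of supported types, and the method names are all made up for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

public class CapabilityCheckSketch {

    // Hypothetical attribute kinds, loosely mirroring the types named in this thread.
    enum AttributeType { NUMERIC, NOMINAL, STRING }

    // Assumption for this sketch: a tree learner that declares support
    // for numeric and nominal attributes only.
    static final Set<AttributeType> SUPPORTED =
            EnumSet.of(AttributeType.NUMERIC, AttributeType.NOMINAL);

    // The decision depends only on the declared type.
    static boolean canHandle(AttributeType type) {
        return SUPPORTED.contains(type);
    }

    // Reject unsupported types before any training starts.
    static void requireSupported(AttributeType type) {
        if (!canHandle(type)) {
            throw new UnsupportedOperationException(
                    "Cannot handle " + type.name().toLowerCase() + " attributes!");
        }
    }

    public static void main(String[] args) {
        // A value such as "N" may be stored as index 0 for either kind of
        // attribute; the stored number plays no part in the check.
        System.out.println(canHandle(AttributeType.NOMINAL)); // true
        System.out.println(canHandle(AttributeType.STRING));  // false
    }
}
```

In this model, changing how the value is written into the instance has no effect; only changing the declared type from STRING to NOMINAL makes the check pass.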

Anyhow, instead of using a STRING attribute, define a NOMINAL, which
J48 will be able to handle.

See wiki article "Creating an ARFF file" for a code example on how to
create all currently supported attribute types in Weka:
  http://weka.wikispaces.com/Creating+an+ARFF+file

Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/           Ph. +64 (7) 858-5174


Re: Using weka in my own code generates exception

by Julian Moritz-2

Hi there,

Peter Reutemann wrote:

>
> "addStringValue" is only used for STRING attributes. STRING attributes
> are used to contain textual information from documents, not predefined
> values as NOMINAL attributes. Normally, one uses the
> StringToWordVector filter to convert these STRING attributes into bag
> of words, TF/IDF features, etc. Apart from the StringKernel (for
> SVMs), I don't know of any algorithm in Weka that handles STRING
> attributes (i.e., documents) natively.
>
> Anyhow, instead of using a STRING attribute, define a NOMINAL, which
> J48 will be able to handle.

That is what I thought after my mail yesterday, but wouldn't the
StringToNominal filter be helpful? I tried it but couldn't get it working.

>
> See wiki article "Creating an ARFF file" for a code example on how to
> create all currently supported attribute types in Weka:
>   http://weka.wikispaces.com/Creating+an+ARFF+file

Thanks, I will have a look at it.

Regards
Julian


Re: Using weka in my own code generates exception

by Julian Moritz-2

Hi,

Julian Moritz wrote:
> Hi there,
>
> Peter Reutemann wrote:
>> Anyhow, instead of using a STRING attribute, define a NOMINAL, which
>> J48 will be able to handle.
>
> That is what I thought after my mail yesterday, but wouldn't the
> StringToNominal filter be helpful? I tried it but couldn't get it working.
>

I'm now using setAttributeRange, and the exception is gone. But
the filter doesn't seem to do what I expect. Before filtering, my
instances look as follows:

Instance inst1 = new Instance(2);
inst1.setDataset(dataset);
Attribute posAtt1 = dataset.attribute("pos");
inst1.setValue(posAtt1, posAtt1.addStringValue("N"));
inst1.setClassValue("1");

Instance inst2 = new Instance(2);
Attribute posAtt2 = dataset.attribute("pos");
inst2.setDataset(dataset);
inst2.setValue(posAtt2, posAtt2.addStringValue("V"));
inst2.setClassValue("0");

If I apply the filter, I get:

@relation testset-weka.filters.unsupervised.attribute.StringToNominal-Rfirst

@attribute pos {}
@attribute hit {1,0}

@data

But shouldn't it be @attribute pos {N,V}?

If I classify an instance with the value "V" for the pos attribute, the
predicted class is "1", but it should be "0".

Regards
Julian


Re: Using weka in my own code generates exception

by Julian Moritz-2

Hi,

I think I've solved my problem. I need more coffee; I'll post a solution
later. Sorry for bothering you.

Regards
Julian


Re: Using weka in my own code generates exception

by Peter Reutemann-3

>> "addStringValue" is only used for STRING attributes. STRING attributes
>> are used to contain textual information from documents, not predefined
>> values as NOMINAL attributes. Normally, one uses the
>> StringToWordVector filter to convert these STRING attributes into bag
>> of words, TF/IDF features, etc. Apart from the StringKernel (for
>> SVMs), I don't know of any algorithm in Weka that handles STRING
>> attributes (i.e., documents) natively.
>>
>> Anyhow, instead of using a STRING attribute, define a NOMINAL, which
>> J48 will be able to handle.
>
> That is what I thought after my mail yesterday, but wouldn't the
> StringToNominal filter be helpful? I tried it but couldn't get it working.

If you know your labels and they are fixed, then use a NOMINAL
attribute. STRING attributes are used for arbitrary values (that's why
STRING attributes are used for document classification).

Weka requires the training and test datasets to have exactly the same
structure (the same attributes and, for NOMINAL attributes, the same
labels). If you run the StringToNominal filter separately on two
datasets with STRING attributes, you will generate incompatible datasets.
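
A minimal plain-Java sketch of why separate conversions can disagree. It assumes that labels are numbered in the order they are first seen, which is a simplification for illustration and not a description of how StringToNominal is implemented; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class IncompatibleLabelsSketch {

    // Distinct labels in first-seen order (the assumed numbering rule of this sketch).
    static List<String> labelsOf(List<String> values) {
        return new ArrayList<>(new LinkedHashSet<>(values));
    }

    public static void main(String[] args) {
        // Each dataset is converted on its own, so each gets its own label list.
        List<String> trainLabels = labelsOf(Arrays.asList("N", "V", "N"));
        List<String> testLabels = labelsOf(Arrays.asList("V", "N"));

        System.out.println(trainLabels.indexOf("V"));       // 1
        System.out.println(testLabels.indexOf("V"));        // 0
        System.out.println(trainLabels.equals(testLabels)); // false
    }
}
```

Here the same label "V" ends up with a different index in each dataset, so a model trained on one would misread the other.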

[...]


Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/           Ph. +64 (7) 858-5174


Re: Using weka in my own code generates exception

by Peter Reutemann-3

>>> Anyhow, instead of using a STRING attribute, define a NOMINAL, which
>>> J48 will be able to handle.
>>
>> That is what I thought after my mail yesterday, but wouldn't the
>> StringToNominal filter be helpful? I tried it but couldn't get it working.
>>
>
> I'm now using setAttributeRange, and the exception is gone. But
> the filter doesn't seem to do what I expect. Before filtering, my
> instances look as follows:
>
> Instance inst1 = new Instance(2);
> inst1.setDataset(dataset);
> Attribute posAtt1 = dataset.attribute("pos");
> inst1.setValue(posAtt1, posAtt1.addStringValue("N"));
> inst1.setClassValue("1");
>
> Instance inst2 = new Instance(2);
> Attribute posAtt2 = dataset.attribute("pos");
> inst2.setDataset(dataset);
> inst2.setValue(posAtt2, posAtt2.addStringValue("V"));
> inst2.setClassValue("0");
>
> If I apply the filter, I get:
>
> @relation testset-weka.filters.unsupervised.attribute.StringToNominal-Rfirst
>
> @attribute pos {}
> @attribute hit {1,0}
>
> @data
>
> But shouldn't it be @attribute pos {N,V}?

No, you only *used* the values "1" and "0" (you're setting strings and
not numeric indexes!). All others get discarded. Use a NOMINAL
attribute instead with predefined values.
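
If the labels are not known before the data is read, one possible approach (an assumption on my part, not something stated in this thread) is to scan all the data once, collect the distinct labels, and only then define the NOMINAL attribute from that list. The plain-Java sketch below covers the collection step only; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SharedLabelsSketch {

    // Gather every distinct label from all data sources, in first-seen order.
    @SafeVarargs
    static List<String> collectLabels(List<String>... sources) {
        Set<String> labels = new LinkedHashSet<>();
        for (List<String> source : sources) {
            labels.addAll(source);
        }
        return new ArrayList<>(labels);
    }

    public static void main(String[] args) {
        List<String> train = Arrays.asList("N", "V", "N");
        List<String> test = Arrays.asList("V", "A");

        // One shared label list for both datasets.
        List<String> labels = collectLabels(train, test);
        System.out.println(labels); // [N, V, A]

        // Each label would then be added to the FastVector that is passed to
        // new Attribute("pos", ...), as in thisWorks() in the first message.
    }
}
```

Because both datasets are then built from the same attribute definition, they share one header.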

> If I classify an instance with the value "V" for the pos attribute, the
> predicted class is "1", but it should be "0".

My advice: forget about using a STRING attribute and use a NOMINAL one.

Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/           Ph. +64 (7) 858-5174
