Instance level meta-data


Instance level meta-data

by Marco FAQ

Guys,

A newbie question:

I am developing a KDTree-based application and am wondering what the best way is to handle meta-data at the Instance level.

At first I thought of keeping a Map from Instance to meta-data inside the application, but it seems the nearestNeighbour method of KDTree returns a copy of the Instance object instead of the original object, so this approach won't work.

I could add the meta-data as attributes, but I do not know how to tell KDTree not to use them in the distance calculation.

Thanks,
Marco



_______________________________________________
Wekalist mailing list
Send posts to: Wekalist@...
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Instance level meta-data

by Peter Reutemann-3

> I am developing a KDTree based application. Just wondering what would be the
> best way to handle meta-data at Instance level.
>
> First I thought of using a Map of Instance and meta-data inside application,
> but it seems the nearestNeighbour method of KDTree returns a copy of the
> Instance object instead of original object, so this approach won't work.
>
> I could add meta-data as attributes, but I do not know how to let KDTree
> know not to use it in distance calculation.

The EuclideanDistance distance function that KDTree uses allows you
to specify the attribute range on which the distance calculations are
based. That should enable you to store the meta-data as additional
attributes.

Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/           Ph. +64 (7) 858-5174


Re: Instance level meta-data

by Marco FAQ


Thanks for the reply. I tried adding the meta-data as you suggested, but when KDTree returns the nearest neighbour, it wipes out all the attributes not used in the distance calculation from the original Instance object.
 
Any workarounds?

thanks,
Marco

On Tue, Nov 3, 2009 at 12:12 PM, Peter Reutemann <fracpete@...> wrote:
> I am developing a KDTree based application. Just wondering what would be the
> best way to handle meta-data at Instance level.
>
> First I thought of using a Map of Instance and meta-data inside application,
> but it seems the nearestNeighbour method of KDTree returns a copy of the
> Instance object instead of original object, so this approach won't work.
>
> I could add meta-data as attributes, but I do not know how to let KDTree
> know not to use it in distance calculation.

The EuclideanDistance distance function that KDTree uses, allows you
to specify the attribute range to base the distance calculations on.
That should enable you to store the meta-data as additional
attributes.

Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/           Ph. +64 (7) 858-5174


Gradient Boost for 2-class classification problem

by Doan Viet Dung

Hi all,
This is my first post on the Weka list.

I am currently implementing a gradient boosting algorithm with a logistic loss function and a decision tree (regression tree) as the weak learner (e.g., a simple DecisionStump with 2 terminal nodes) using Weka. The generic boosting algorithm is described in the paper "Greedy Function Approximation: A Gradient Boosting Machine" by J. H. Friedman.

I have 2 questions:

1. Is there a class in Weka that already does the same thing?
2. When I build a tree classifier f for a data set { x_i, y_i }, i = 1,..,N, x \in R, y \in {-1,1}, and call f.distributionForInstance(x_i), it returns an array of distribution values d for each x_i. For example, in the two-class case y_i \in {-1,1}, there are two distribution values: d[0] for y = -1 and d[1] for y = 1.
However, when I transform the class y to a numeric y' and build a new tree classifier f' on the modified data set { x, y' }, f'.distributionForInstance(x_i) returns only one value, d[0]. What does this value mean?

Thanks in advance
VDung



Re: Instance level meta-data

by Peter Reutemann-3

Please no top-posting; see the mailing list etiquette for why
(http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html).


> Thanks for the reply. I tried adding meta-data as you suggested, but when
> KDTree returns nearestNeighbour, it wipes out all the attributes not used in
> distance calculation from the original Instance object.

Sorry, but I get the same data format back that I used to initialize
the KDTree instance with. See attached example class and example
output (run on the UCI dataset "anneal").

Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/           Ph. +64 (7) 858-5174

Input data has 39 attributes.
Neighbors data has 39 attributes.

Instance:
'?',C,R,0,0,'?',S,2,0,'?','?','?','?','?',Y,'?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?',SHEET,2,1500,4170,'?',0,'?',2

Neighbors:
1. distance=0.10664893617021276
   '?',C,R,0,0,'?',S,2,0,'?','?','?','?','?',Y,'?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?',SHEET,1.599,1500,4170,'?',0,'?',2
2. distance=0.3191489361702128
   '?',C,R,0,0,'?',S,2,0,'?','?','?','?','?',Y,'?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?',SHEET,0.8,1500,4170,'?',0,'?',2
3. distance=0.7768496558027644
   '?',C,R,0,0,'?',S,2,0,'?','?','?','?','?',Y,'?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?',SHEET,0.8,1320,762,'?',0,'?',2
4. distance=1.005642748774168
   '?',C,R,0,0,'?',S,2,0,'?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?',SHEET,1.6,1500,4170,'?',0,'?',2
5. distance=1.0229176046025408
   '?',C,R,0,0,'?',S,2,0,'?','?',E,'?','?',Y,'?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?',SHEET,1.601,1320,4880,'?',0,'?',3

[KDTreeWithMetaData.java]

import weka.core.*;
import weka.core.converters.ConverterUtils.*;
import weka.core.neighboursearch.*;

import java.util.*;

/**
 * Example class for demonstrating how to use KDTree with meta-data, i.e.,
 * additional attributes that are not used in the distance calculation.
 *
 * @author FracPete (fracpete at waikato dot ac dot nz)
 * @version $Revision$
 */
public class KDTreeWithMetaData {

  /**
   * Expects a dataset as first parameter. The last attribute is used as class attribute
   * and the first attribute will be excluded from the distance calculation.
   *
   * @param args          the commandline arguments
   * @throws Exception    if something goes wrong
   */
  public static void main(String[] args) throws Exception {
    // load data
    Instances data = DataSource.read(args[0]);
    data.setClassIndex(data.numAttributes() - 1);
    System.out.println("Input data has " + data.numAttributes() + " attributes.");

    // initialize KDTree
    EuclideanDistance distfunc = new EuclideanDistance();
    distfunc.setAttributeIndices("2-last");
    KDTree kdtree = new KDTree();
    kdtree.setDistanceFunction(distfunc);
    kdtree.setInstances(data);

    // obtain neighbors for a random instance
    Random rand = data.getRandomNumberGenerator(42);
    Instance inst = data.instance(rand.nextInt(data.numInstances()));
    Instances neighbors = kdtree.kNearestNeighbours(inst, 5);
    double[] distances = kdtree.getDistances();
    System.out.println("Neighbors data has " + neighbors.numAttributes() + " attributes.");
    System.out.println("\nInstance:\n" + inst);
    System.out.println("\nNeighbors:");
    for (int i = 0; i < neighbors.numInstances(); i++)
      System.out.println((i+1) + ". distance=" + distances[i] + "\n   " + neighbors.instance(i) + "");
  }
}
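
A related option, going back to the Map idea from the first post: keep the meta-data outside the data set and key it by an ID column rather than by object identity, so it does not matter whether the search returns a copy. The following is only a minimal plain-Java sketch of that idea. It uses a brute-force search as a stand-in for KDTree, and the class name, method names and sample values are all made up for illustration. With Weka, the ID would presumably be a numeric attribute excluded from the distance via setAttributeIndices, as in the example class above.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: meta-data lives in a Map keyed by an ID column that is excluded from the distance. */
class MetaDataLookupSketch {

  /** Squared Euclidean distance over columns 1..last; column 0 holds the ID and is skipped. */
  static double squaredDistance(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 1; i < a.length; i++) {
      double diff = a[i] - b[i];
      sum += diff * diff;
    }
    return sum;
  }

  /** Brute-force stand-in for a KDTree query: returns the row closest to the query. */
  static double[] nearest(double[][] rows, double[] query) {
    double[] best = null;
    double bestDist = Double.POSITIVE_INFINITY;
    for (double[] row : rows) {
      double d = squaredDistance(row, query);
      if (d < bestDist) {
        bestDist = d;
        best = row;
      }
    }
    return best;
  }

  /** Looks up the meta-data of the nearest row via its ID, not via object identity. */
  static String nearestMetaData(double[][] rows, Map<Integer, String> metaData, double[] query) {
    double[] hit = nearest(rows, query);
    return metaData.get((int) hit[0]);
  }

  public static void main(String[] args) {
    // Column 0 is the ID, the remaining columns are the actual features (made-up values).
    double[][] rows = {
      {0, 1.0, 1.0},
      {1, 5.0, 5.0},
      {2, 9.0, 1.0}
    };
    Map<Integer, String> metaData = new HashMap<>();
    metaData.put(0, "sensor-A");
    metaData.put(1, "sensor-B");
    metaData.put(2, "sensor-C");

    double[] query = {-1, 4.5, 5.5};  // the ID of the query itself is irrelevant
    System.out.println(nearestMetaData(rows, metaData, query));
  }
}
```

Because the lookup goes through the ID value, it keeps working even if the neighbour that comes back is a copy of the stored row.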



Re: Gradient Boost for 2-class classification problem

by Peter Reutemann-3

> I am currently implementing a gradient boosting algorithm with a logistic loss function and a decision tree (regression tree) as the weak learner (e.g., a simple DecisionStump with 2 terminal nodes) using Weka. The generic boosting algorithm is described in the paper "Greedy Function Approximation: A Gradient Boosting Machine" by J. H. Friedman.
>
> I have 2 questions:
>
> 1. Is there a class in Weka that already does the same thing?

I don't think so, but I can be mistaken.

> 2. When I build a tree classifier f for a data set { x_i, y_i }, i = 1,..,N, x \in R, y \in {-1,1}, and call f.distributionForInstance(x_i), it returns an array of distribution values d for each x_i. For example, in the two-class case y_i \in {-1,1}, there are two distribution values: d[0] for y = -1 and d[1] for y = 1.
> However, when I transform the class y to a numeric y' and build a new tree classifier f' on the modified data set { x, y' }, f'.distributionForInstance(x_i) returns only one value, d[0]. What does this value mean?

The distributionForInstance method is only meaningful for nominal class
attributes; for a numeric class attribute it is not possible to return
a class distribution. When it encounters a numeric class attribute, the
default implementation in weka.classifiers.Classifier
(weka.classifiers.AbstractClassifier in later developer versions of
Weka) just returns the value of the classifier's classifyInstance
method. The classifyInstance method returns the index of the chosen
class label for nominal class attributes and the calculated regression
value for numeric ones.

Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/           Ph. +64 (7) 858-5174


Re: Gradient Boost for 2-class classification problem

by Mark Hall-9

On 4/11/09 12:37 PM, Doan Viet Dung wrote:

> Hi all,
> This is my first post on the Weka list.
>
> I am currently implementing a gradient boosting algorithm with a
> logistic loss function and a decision tree (regression tree) as the
> weak learner (e.g., a simple DecisionStump with 2 terminal nodes)
> using Weka. The generic boosting algorithm is described in the paper
> "Greedy Function Approximation: A Gradient Boosting Machine" by
> J. H. Friedman.
>
> I have 2 questions:
>
> 1. Is there a class in Weka that already does the same thing?

weka.classifiers.meta.LogitBoost is closely related I believe.

Cheers,
Mark.


--
Mark Hall
Senior Developer/Consultant, Pentaho Open Source Business Intelligence
Citadel International, Suite 340, 5950 Hazeltine National Dr., Orlando, FL
32822, USA
+64 7 348-7099 office, +64 21 399-132 mobile, +1 815 550-8637 fax,
Skype: mark.andrew.hall, Yahoo: mark_andrew_hall
Download the latest release today <http://www.sourceforge.net/projects/pentaho>


how to read model file of trained classifier

by tgh

Hi
I use Weka 3.6.0. I trained a TAN classifier on my data, which works, and
saved the model of the trained classifier. How can I read this model
file? I want to look at its format and see whether I can modify it.
I use Windows, and a text editor cannot read the model file.

Is there a way to read the model file the way one reads a text file?


Thank you in advance






Re: how to read model file of trained classifier

by Peter Reutemann-3

> I use Weka 3.6.0. I trained a TAN classifier on my data, which works, and
> saved the model of the trained classifier. How can I read this model
> file? I want to look at its format and see whether I can modify it.
> I use Windows, and a text editor cannot read the model file.
>
> Is there a way to read the model file the way one reads a text file?

*All* models in Weka are serialized Java objects, i.e., binary files.
You can only use Weka (or your own Java code) to deserialize them again
and use them for further classifications.
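
Since a model file is a serialized Java object, the way to "read" it is to deserialize it and then inspect or print the resulting object. The sketch below shows the round trip with standard Java serialization only. An ArrayList stands in for the trained classifier, and the class and method names are made up for illustration. For a real model the Weka classes have to be on the classpath, and as far as I know Weka also ships a helper for this (weka.core.SerializationHelper); please check the Javadoc of your version.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;

/** Sketch: a model file is just a serialized Java object that has to be deserialized to be read. */
class ModelFileSketch {

  /** Writes any serializable object to a file using standard Java serialization. */
  static void save(File file, Object model) throws IOException {
    try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
      out.writeObject(model);
    }
  }

  /** Reads the object back; its class must be available on the classpath. */
  static Object load(File file) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
      return in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    // Stand-in for a trained classifier: any Serializable object behaves the same way.
    ArrayList<String> standIn = new ArrayList<>();
    standIn.add("not a real model");

    File file = File.createTempFile("model", ".model");
    file.deleteOnExit();
    save(file, standIn);

    Object restored = load(file);
    // Printing the deserialized object is what gives a human-readable view of it.
    System.out.println(restored.getClass().getName() + ": " + restored);
  }
}
```

Modifying a model would then mean changing the deserialized object in Java and serializing it again, not editing the file in a text editor.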

Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/           Ph. +64 (7) 858-5174


Re: Gradient Boost for 2-class classification problem

by Doan Viet Dung


> The distributionForInstance method is only used for nominal class
> attributes, for numeric attributes it doesn't make much sense as it is
> not possible to return a class distribution. The default
> implementation in the weka.classifiers.Classifier (or later version of
> the developer version of Weka: weka.classifiers.AbstractClassifier)
> just returns the value of the classifyInstance method of the
> classifier when encountering a numeric class attribute. The
> classifyInstance method returns the index of the chosen class label
> for nominal class attributes and for numeric ones the calculated
> regression value.

Thank you for your reply, Peter.
So I understand that for a numeric class the tree returns a regression value. How can I convert this value into class distribution values for a classification problem with 2 labels {-1,1}?

To give more detail, I describe my algorithm below:
In the gradient boosting algorithm with loss function log(1 + exp(-2yF(x))), where y \in {-1,1}, I have to compute a new pseudo-response (using gradient descent): y' = 2y/(1+exp(2yF(x))), where now y' \in R.

Then I fit a tree to the new data {x,y'}_i, i=1,..,N. If I use a tree with 2 nodes R_1 and R_2 (DecisionStump in Weka), how can I know which tree node a given x_i belongs to? Normally I could get this through the class distribution values, but here I have only a regression value.
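
For reference, the pseudo-response above is the negative derivative of the loss with respect to F(x). A minimal plain-Java sketch of the two formulas follows. It does not use Weka, and the class name, method names and sample values are made up for illustration.

```java
/** Sketch of the logistic loss and its pseudo-response as used in gradient boosting. */
class PseudoResponseSketch {

  /** Logistic loss L(y, F) = log(1 + exp(-2 y F)) for y in {-1, +1}. */
  static double loss(int y, double f) {
    return Math.log1p(Math.exp(-2.0 * y * f));
  }

  /** Negative gradient of the loss with respect to F: y' = 2y / (1 + exp(2 y F)). */
  static double pseudoResponse(int y, double f) {
    return 2.0 * y / (1.0 + Math.exp(2.0 * y * f));
  }

  public static void main(String[] args) {
    int[] y = {-1, 1, 1};
    double[] f = {0.0, 0.0, 2.0};  // current ensemble outputs F(x_i), made-up values
    for (int i = 0; i < y.length; i++) {
      System.out.printf("y=%d F=%.1f loss=%.4f y'=%.4f%n",
          y[i], f[i], loss(y[i], f[i]), pseudoResponse(y[i], f[i]));
    }
  }
}
```

With F(x) = 0 the pseudo-response equals the label itself, and it shrinks towards 0 as F(x) moves further in the direction of the correct label.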

Cheers
VDung


--- On Wed, 11/4/09, Peter Reutemann <fracpete@...> wrote:

From: Peter Reutemann <fracpete@...>
Subject: Re: [Wekalist] Gradient Boost for 2-class classification problem
To: "Weka machine learning workbench list." <wekalist@...>
Date: Wednesday, November 4, 2009, 7:32 AM

> I am currently implementing a gradient boosting algorithm with a logistic loss function and a decision tree (regression tree) as the weak learner (e.g., a simple DecisionStump with 2 terminal nodes) using Weka. The generic boosting algorithm is described in the paper "Greedy Function Approximation: A Gradient Boosting Machine" by J. H. Friedman.
>
> I have 2 questions:
>
> 1. Is there a class in Weka that already does the same thing?

I don't think so, but I can be mistaken.

> 2. When I build a tree classifier f for a data set { x_i, y_i }, i = 1,..,N, x \in R, y \in {-1,1}, and call f.distributionForInstance(x_i), it returns an array of distribution values d for each x_i. For example, in the two-class case y_i \in {-1,1}, there are two distribution values: d[0] for y = -1 and d[1] for y = 1.
> However, when I transform the class y to a numeric y' and build a new tree classifier f' on the modified data set { x, y' }, f'.distributionForInstance(x_i) returns only one value, d[0]. What does this value mean?





Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/ Ph. +64 (7) 858-5174


Re: Gradient Boost for 2-class classification problem

by Mark Hall-9

On 5/11/09 12:30 AM, Doan Viet Dung wrote:

>
> The distributionForInstance method is only used for nominal class
> attributes, for numeric attributes it doesn't make much sense as it is
> not possible to return a class distribution. The default
> implementation in the weka.classifiers.Classifier (or later version of
> the developer version of Weka: weka.classifiers.AbstractClassifier)
> just returns the value of the classifyInstance method of the
> classifier when encountering a numeric class attribute. The
> classifyInstance method returns the index of the chosen class label
> for nominal class attributes and for numeric ones the calculated
> regression value.
>
> Thank you for your reply, Peter.
> So I understand that for a numeric class the tree returns a
> regression value. How can I convert this value into class
> distribution values for a classification problem with 2 labels
> {-1,1}?
>
> To give more detail, I describe my algorithm below:
> In the gradient boosting algorithm with loss function log(1 +
> exp(-2yF(x))), where y \in {-1,1}, I have to compute a new
> pseudo-response (using gradient descent): y' = 2y/(1+exp(2yF(x))),
> where now y' \in R.
>
> Then I fit a tree to the new data {x,y'}_i, i=1,..,N. If I use a tree
> with 2 nodes R_1 and R_2 (DecisionStump in Weka), how can I know which
> tree node a given x_i belongs to? Normally I could get this through
> the class distribution values, but here I have only a regression value.

I'd say that if the regression tree predicts a value <= 0 then the class is -1,
otherwise it is 1.
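
A minimal plain-Java sketch of this threshold rule follows, together with one possible way to turn the value into a two-class distribution. The distribution part is an assumption on my side and not something stated in this thread: it treats the value F(x) as half the log-odds of the two classes, which is the minimizer of the loss log(1 + exp(-2yF(x))) quoted above, and in a boosted ensemble it would presumably be applied to the summed output rather than to a single tree. The class and method names are made up for illustration.

```java
/** Sketch: mapping a real-valued output F(x) to a class label and a two-class distribution. */
class BoostedPredictionSketch {

  /** Threshold rule from the reply above: F(x) <= 0 maps to class -1, otherwise +1. */
  static int predictClass(double f) {
    return f <= 0 ? -1 : 1;
  }

  /**
   * Returns {P(y=-1|x), P(y=+1|x)}.
   * Assumption: F(x) = 0.5 * log(P(y=+1|x) / P(y=-1|x)), i.e. half the log-odds.
   */
  static double[] distribution(double f) {
    double pPos = 1.0 / (1.0 + Math.exp(-2.0 * f));
    return new double[] {1.0 - pPos, pPos};
  }

  public static void main(String[] args) {
    double[] outputs = {-1.2, 0.0, 0.7};  // made-up values of F(x)
    for (double f : outputs) {
      double[] d = distribution(f);
      System.out.printf("F=%.1f class=%d d[0]=%.4f d[1]=%.4f%n", f, predictClass(f), d[0], d[1]);
    }
  }
}
```

Under that assumption, d[1] > 0.5 exactly when F(x) > 0, so the distribution agrees with the threshold rule.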

Cheers,
Mark.

