|
|
|
Instance level meta-data
Guys,
A newbie question: I am developing a KDTree-based application and am wondering what the best way to handle meta-data at the Instance level would be.
First I thought of keeping a Map from Instance to meta-data inside the application, but it seems the nearestNeighbour method of KDTree returns a copy of the Instance object instead of the original object, so this approach won't work.
I could add the meta-data as attributes, but I do not know how to tell KDTree not to use them in the distance calculation.
thanks, Marco
_______________________________________________ Wekalist mailing list Send posts to: Wekalist@... List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html |
|
|
|
Re: Instance level meta-data
> I am developing a KDTree based application. Just wondering what would be the
> best way to handle meta-data at Instance level.
>
> First I thought of using a Map of Instance and meta-data inside application,
> but it seems the nearestNeighbour method of KDTree returns a copy of the
> Instance object instead of original object, so this approach won't work.
>
> I could add meta-data as attributes, but I do not know how to let KDTree
> know not to use it in distance calculation.

The EuclideanDistance distance function that KDTree uses allows you to specify the attribute range on which to base the distance calculations. That should enable you to store the meta-data as additional attributes.

Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/ Ph. +64 (7) 858-5174
|
|
|
Re: Instance level meta-data
Thanks for the reply. I tried adding the meta-data as you suggested, but when KDTree returns the nearest neighbour, it wipes out all the attributes not used in the distance calculation from the original Instance object. Any workarounds?
thanks, Marco
On Tue, Nov 3, 2009 at 12:12 PM, Peter Reutemann <fracpete@...> wrote:
|
|
|
Gradient Boost for 2-class classification problem
|
|
|
Re: Instance level meta-data
Please no top-posting, see the mailing list etiquette for why
(http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html).

> Thanks for the reply. I tried adding meta-data as you suggested, but when
> KDTree returns nearestNeighbour, it wipes out all the attributes not used in
> distance calculation from the original Instance object.

Sorry, but I get back the same data format that I used to initialize the KDTree instance with. See the attached example class and example output (run on the UCI dataset "anneal").

Cheers, Peter

Input data has 39 attributes.
Neighbors data has 39 attributes.

Instance:
'?',C,R,0,0,'?',S,2,0,'?','?','?','?','?',Y,'?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?',SHEET,2,1500,4170,'?',0,'?',2

Neighbors:
1. distance=0.10664893617021276
 '?',C,R,0,0,'?',S,2,0,'?','?','?','?','?',Y,'?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?',SHEET,1.599,1500,4170,'?',0,'?',2
2. distance=0.3191489361702128
 '?',C,R,0,0,'?',S,2,0,'?','?','?','?','?',Y,'?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?',SHEET,0.8,1500,4170,'?',0,'?',2
3. distance=0.7768496558027644
 '?',C,R,0,0,'?',S,2,0,'?','?','?','?','?',Y,'?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?',SHEET,0.8,1320,762,'?',0,'?',2
4. distance=1.005642748774168
 '?',C,R,0,0,'?',S,2,0,'?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?',SHEET,1.6,1500,4170,'?',0,'?',2
5. distance=1.0229176046025408
 '?',C,R,0,0,'?',S,2,0,'?','?',E,'?','?',Y,'?','?','?','?','?','?','?','?','?','?','?','?','?','?','?','?',SHEET,1.601,1320,4880,'?',0,'?',3

[KDTreeWithMetaData.java]
import weka.core.*;
import weka.core.converters.ConverterUtils.*;
import weka.core.neighboursearch.*;

import java.util.*;

/**
 * Example class for demonstrating how to use KDTree with meta-data, i.e.,
 * additional attributes that are not used in the distance calculation.
 *
 * @author FracPete (fracpete at waikato dot ac dot nz)
 * @version $Revision$
 */
public class KDTreeWithMetaData {

  /**
   * Expects a dataset as first parameter. The last attribute is used as class attribute
   * and the first attribute will be excluded from the distance calculation.
   *
   * @param args the commandline arguments
   * @throws Exception if something goes wrong
   */
  public static void main(String[] args) throws Exception {
    // load data
    Instances data = DataSource.read(args[0]);
    data.setClassIndex(data.numAttributes() - 1);
    System.out.println("Input data has " + data.numAttributes() + " attributes.");

    // initialize KDTree
    EuclideanDistance distfunc = new EuclideanDistance();
    distfunc.setAttributeIndices("2-last");
    KDTree kdtree = new KDTree();
    kdtree.setDistanceFunction(distfunc);
    kdtree.setInstances(data);

    // obtain neighbors for a random instance
    Random rand = data.getRandomNumberGenerator(42);
    Instance inst = data.instance(rand.nextInt(data.numInstances()));
    Instances neighbors = kdtree.kNearestNeighbours(inst, 5);
    double[] distances = kdtree.getDistances();
    System.out.println("Neighbors data has " + neighbors.numAttributes() + " attributes.");
    System.out.println("\nInstance:\n" + inst);
    System.out.println("\nNeighbors:");
    for (int i = 0; i < neighbors.numInstances(); i++)
      System.out.println((i+1) + ". distance=" + distances[i] + "\n " + neighbors.instance(i) + "");
  }
}
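One possible way to combine the two ideas in this thread is sketched below in plain Java, without Weka, so that it runs on its own. An ID is kept in the first column and left out of the distance (the role that setAttributeIndices("2-last") plays in the example class), and the ID of the returned neighbour is used as the key into an application-side Map. All names here are made up for illustration, the brute-force search is only a stand-in for KDTree, and the distance is not normalized, whereas Weka's EuclideanDistance normalizes by default as far as I know.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone sketch, not Weka code.
class MetaDataLookupSketch {

    // Squared Euclidean distance over columns firstCol..end; earlier columns are skipped.
    static double squaredDistance(double[] a, double[] b, int firstCol) {
        double sum = 0.0;
        for (int i = firstCol; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Brute-force stand-in for a KDTree query; returns a COPY of the nearest row,
    // so object identity cannot be used to find the meta-data afterwards.
    static double[] nearest(double[][] rows, double[] query, int firstCol) {
        double[] best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (double[] row : rows) {
            double d = squaredDistance(row, query, firstCol);
            if (d < bestDist) {
                bestDist = d;
                best = row;
            }
        }
        return best == null ? null : best.clone();
    }

    public static void main(String[] args) {
        // Column 0 is an ID excluded from the distance; columns 1-2 are the features.
        double[][] rows = {
            {100, 1.0, 1.0},
            {200, 5.0, 5.0},
            {300, 9.0, 1.0}
        };
        Map<Integer, String> metaData = new HashMap<>();
        metaData.put(100, "first record");
        metaData.put(200, "second record");
        metaData.put(300, "third record");

        double[] query = {0, 4.5, 5.5};
        double[] hit = nearest(rows, query, 1);   // 1 = skip the ID column
        int id = (int) hit[0];
        System.out.println("nearest id=" + id + ", meta-data=" + metaData.get(id));
    }
}
```

Because the lookup goes through the ID value and not through the object itself, it does not matter that the search hands back a copy.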
|
|
|
Re: Gradient Boost for 2-class classification problem
> Currently, I am implementing a Gradient Boost algorithm with Logistic Loss Function and Decision Tree (Regression Tree) as weak learner (e.g. a simple DecisionStump with 2 terminal nodes) using weka. Such a generic boosting algorithm can be found in the paper "Greedy Function Approximation: A Gradient Boosting Machine", J.H. Friedman.
>
> I have 2 questions:
>
> 1. Is there any similar class in weka that does the same thing?

I don't think so, but I could be mistaken.

> 2. When I build a tree classifier f for a set of data { x_i, y_i }, i = 1,..,N, x \in R, y = {-1,1} and call f.distributionForInstance(x_i), it provides an array of distribution values d for each x_i. For example, in the case of two classes y_i = {-1,1}, we have two distribution values, d[0] for y = -1 and d[1] for y = 1.
> However, when I transform the class y to a numeric y' and build a new tree classifier f' on the modified data { x, y' }, the new f'.distributionForInstance(x_i) gives only one value, d[0]. So what does this value mean?

The distributionForInstance method is only meaningful for nominal class attributes; for numeric class attributes it doesn't make much sense, as it is not possible to return a class distribution. The default implementation in weka.classifiers.Classifier (or, in later versions of the developer version of Weka, weka.classifiers.AbstractClassifier) just returns the value of the classifier's classifyInstance method, as the single value d[0], when it encounters a numeric class attribute. The classifyInstance method returns the index of the chosen class label for nominal class attributes and the calculated regression value for numeric ones.

Cheers, Peter
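The convention described above can be illustrated with a small stand-alone Java sketch. This is not Weka code: the helper is hypothetical and only mimics how the returned array is read, with one entry per class label for a nominal class and a single entry holding the regression value for a numeric class.

```java
// Hypothetical helper, not part of Weka: mimics how the returned array is read.
class DistributionSketch {

    // Nominal class: index of the largest entry, i.e. the chosen class label.
    // Numeric class: the single entry, i.e. the regression value itself.
    static double classify(double[] dist, boolean numericClass) {
        if (numericClass) {
            return dist[0];
        }
        int best = 0;
        for (int i = 1; i < dist.length; i++) {
            if (dist[i] > dist[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] nominal = {0.3, 0.7};  // d[0] for y = -1, d[1] for y = 1
        double[] numeric = {-0.42};     // only d[0]: the predicted value
        System.out.println("nominal class -> label index " + classify(nominal, false));
        System.out.println("numeric class -> regression value " + classify(numeric, true));
    }
}
```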
|
|
|
Re: Gradient Boost for 2-class classification problem
On 4/11/09 12:37 PM, Doan Viet Dung wrote:
> Hi all,
> This is my first post on weka list.
>
> Currently, I am implementing a Gradient Boost algorithm with Logistic
> Loss Function and Decision Tree (Regression Tree) as weak learner (e.g
> for simple 2 terminal nodes DecisionStump) using weka. Such generic
> boosting algorithm can be found in the paper "Greedy Function
> Approximation: A Gradient Boosting Machine, J.H. Friedman"
>
> I have 2 questions:
>
> 1. Is there any similar class in weka does the same thing as I do ?

weka.classifiers.meta.LogitBoost is closely related, I believe.

Cheers, Mark.
--
Mark Hall
Senior Developer/Consultant, Pentaho Open Source Business Intelligence
Citadel International, Suite 340, 5950 Hazeltine National Dr., Orlando, FL 32822, USA
+64 7 348-7099 office, +64 21 399-132 mobile, +1 815 550-8637 fax, Skype: mark.andrew.hall, Yahoo: mark_andrew_hall
|
|
|
how to read model file of trained classifier
Hi,
I use Weka 3.6.0 and trained a TAN classifier on my data, which works. I saved the model of the trained classifier, but how do I read this model file? I want to see its format and whether I can make some modifications to it.
I use Windows, and a text editor cannot read the model file.
How can I read the model file the way I would read a text file?
Thank you in advance
|
|
|
Re: how to read model file of trained classifier
> I use weka3.6.0, and I use data to train TAN classifier, and it works,
> I save the model of trained classifier, but how to read this classifier
> file, I want to read it to see the format of it and to see whether I can
> make some modification for it ,
> I use windows, and the text reader can not read the model file,
>
> If I want to read the model file, to read it, that is ,read it just as read
> a text file, how to do it

*All* models in Weka are serialized Java objects, i.e., binary files. You can only use Weka (or your own Java code) to deserialize a model again and use it for further classifications.

Cheers, Peter
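Since the model file is a serialized Java object, "reading" it means deserializing it in Java and then printing or inspecting the resulting object. A minimal stand-alone sketch using only java.io follows. ToyModel is a made-up stand-in for a trained classifier, and the file is a temporary one. With Weka on the classpath, I believe the same pattern applies with an ObjectInputStream on the saved model file and a cast to weka.classifiers.Classifier, but that part is not shown here.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ModelRoundTripSketch {

    // Made-up stand-in for a trained classifier.
    static class ToyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final double threshold;

        ToyModel(double threshold) {
            this.threshold = threshold;
        }

        @Override
        public String toString() {
            return "ToyModel(threshold=" + threshold + ")";
        }
    }

    // Writes the object in Java's binary serialization format (not human-readable).
    static void save(Object model, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(model);
        }
    }

    // Reads the first serialized object back from the file.
    static Object load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("toy", ".model");
        file.deleteOnExit();
        save(new ToyModel(0.5), file);
        Object restored = load(file);
        // The textual view comes from toString(), not from the file contents.
        System.out.println(restored);
    }
}
```

Changing a model this way means modifying the deserialized object in code and serializing it again; the binary file itself is not meant to be edited by hand.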
|
|
|
Re: Gradient Boost for 2-class classification problem
|
|
|
Re: Gradient Boost for 2-class classification problem
On 5/11/09 12:30 AM, Doan Viet Dung wrote:
> > The distributionForInstance method is only used for nominal class
> > attributes, for numeric attributes it doesn't make much sense as it is
> > not possible to return a class distribution. The default
> > implementation in the weka.classifiers.Classifier (or later version of
> > the developer version of Weka: weka.classifiers.AbstractClassifier)
> > just returns the value of the classifyInstance method of the
> > classifier when encountering a numeric class attribute. The
> > classifyInstance method returns the index of the chosen class label
> > for nominal class attributes and for numeric ones the calculated
> > regression value.
>
> Thank you for your reply Peter.
> So I understand that in case of a numeric class, the tree returns a
> regression value. Then I wonder how I can convert this value to the
> class distribution value for a classification problem (with 2 labels
> {-1,1})?
>
> For more details, I describe my algorithm a bit below:
> In the gradient boosting algorithm with loss function = log(1 +
> exp(-2yF(x))), with y = {-1,1}, I have to compute a new pseudo-response
> (using gradient descent): y' = 2y/(1+exp(2yF(x))), with now y' \in R.
>
> Then I fit a tree to such new data {x,y'}_i, i=1,..,N. Consider using a
> tree with 2 nodes R_1 and R_2 (DecisionStump in weka): how can I know
> which tree node a given data point x_i belongs to? Normally, I can get it
> through the class distribution value, but here I have only a regression one.

I'd say that if the regression tree predicts a value <= 0 then the class is -1, otherwise it is 1.

Cheers, Mark.
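The formulas quoted in this message can be checked with a short stand-alone Java sketch (no Weka involved). It computes the pseudo-response y' = 2y/(1+exp(2yF(x))) and applies the sign rule suggested above. The probability conversion p = 1/(1+exp(-2F(x))) does not come from this thread; as far as I recall it is the estimate that goes with this loss in Friedman's paper, and the sketch assumes that F(x) is the output of the whole boosted model, not of a single tree. The class and method names are made up.

```java
// Hypothetical stand-alone sketch of the formulas discussed in this thread.
class LogisticBoostSketch {

    // Pseudo-response (negative gradient) for the loss log(1 + exp(-2yF)), y in {-1, 1}.
    static double pseudoResponse(int y, double f) {
        return 2.0 * y / (1.0 + Math.exp(2.0 * y * f));
    }

    // Sign rule: a value <= 0 means class -1, otherwise class 1.
    static int predictLabel(double f) {
        return f <= 0.0 ? -1 : 1;
    }

    // Assumed conversion from the model output F to the probability of class 1.
    static double probabilityOfPositive(double f) {
        return 1.0 / (1.0 + Math.exp(-2.0 * f));
    }

    public static void main(String[] args) {
        double[] scores = {-1.5, 0.0, 0.8};
        for (double f : scores) {
            double p = probabilityOfPositive(f);
            System.out.println("F=" + f
                + " label=" + predictLabel(f)
                + " distribution=[" + (1.0 - p) + ", " + p + "]"
                + " pseudo-response(y=1)=" + pseudoResponse(1, f));
        }
    }
}
```

The two-element array printed as "distribution" plays the role of d[0] (for y = -1) and d[1] (for y = 1) from the earlier question.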