
Re: Assuring the completeness of training data set

by Harri Saarikoski


Quoting "J K Rai" <jk.anurag@...>:

>
>
>> > From the discussion so far, it looks like I
>> should use SMOTE or the filter suggested by wirefree (meant
>> for non-numeric classification?). Both seem to add more
>> instances. Can I do with the original dataset,
>>
>> Sure you can. Forget about the adding part for now.
>>
>> For the original dataset, I suggested running the following
>> in the Experimenter:
>> - iterations: 1
>> - folds: 10
>> - classifier selection: select, e.g., the following
>> fast-running classifiers (for our purposes it doesn't
>> matter much which ones we choose, but it is best to have
>> several):
>>    bayes.NaiveBayes
>>    trees.J48
>>    lazy.IBk
>> (all classifiers in their default configurations)
>>
>> After it finishes, go to the results tab and select from those
>> row/column buttons something reading "standard deviation".
>> Results analysis: the stdev could be, e.g., +/- 2..3%. Under
>> the above assumption that trainset and testset are similar,
>> this is your figure of how representative your trainset is
>> (the smaller the deviation, the more representative your
>> trainset).
>
> If I choose a different value in the comparison field of the
> analyser, I get a different stddev in the right-hand window, e.g.
> 0.04 and 0.10 for Mean_absolute_error and Root_mean_squared_error
> respectively, as shown below.
>
> What should I choose in the comparison field for this analysis?
> Harri, I am not clear about the method and the concept behind your
> suggestion. Kindly explain.
Sorry, stdev is a tickbox in the Analyse tab, not a row/column selection.
Tick that and rerun. Then you get:

Tester:     weka.experiment.PairedCorrectedTTester
Analysing:  Percent_correct
Datasets:   1
Resultsets: 1
Confidence: 0.05 (two tailed)
Sorted by:  -
Date:       30.6.2009 7:50


Dataset                   (1) trees.REPTree '-
----------------------------------------------
                           (10)   94.00(5.84) |
----------------------------------------------
(v/ /*)                                      |


Key:
(1) trees.REPTree '-M 2 -V 0.0010 -N 3 -S 1 -L -1' -8562443428621539458

-> 94.00 is the mean Percent_correct over the 10 results, and 5.84 is its stdev
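
For reference, a minimal sketch of the arithmetic behind a "mean(stdev)" cell like 94.00(5.84), in plain Java (Weka's own language). It assumes the Experimenter reports the sample (n-1) standard deviation over the per-fold results; the class FoldSummary and the 10 fold accuracies are hypothetical and made up for illustration, not taken from the REPTree run above:

```java
import java.util.Locale;

public class FoldSummary {

    // Arithmetic mean of the per-fold scores.
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Sample standard deviation (divides by n - 1); assumed, not verified,
    // to match the convention used in the Analyse tab.
    static double sampleStdDev(double[] xs) {
        double m = mean(xs);
        double sumSq = 0.0;
        for (double x : xs) {
            sumSq += (x - m) * (x - m);
        }
        return Math.sqrt(sumSq / (xs.length - 1));
    }

    // Formats "mean(stdev)" with two decimals, mimicking the output layout.
    static String cell(double[] xs) {
        return String.format(Locale.ROOT, "%.2f(%.2f)", mean(xs), sampleStdDev(xs));
    }

    public static void main(String[] args) {
        // Hypothetical Percent_correct values for 10 folds (illustration only).
        double[] foldAccuracy = {90, 100, 90, 100, 90, 100, 90, 100, 90, 100};
        System.out.println("(" + foldAccuracy.length + ")   " + cell(foldAccuracy));
    }
}
```

For these made-up folds it prints `(10)   95.00(5.27)`: the folds alternate 5 points above and below the mean, so the spread is about 5 percentage points.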

(You may want to look at the individual fold results too: select
Fold as one of the Rows there and hit Perform test again.)
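
The LinearRegression runs quoted below report error measures whose stdev is in the units of the target, not in percent, so 0.04 and 0.10 cannot be read directly against the +/- 2..3% figure. One possible way to rescale them, which is our assumption and not part of the suggestion above, is to divide each stdev by its mean (a relative standard deviation). A small Java sketch using the two quoted results, 0.64(0.04) for Mean_absolute_error and 1.06(0.10) for Root_mean_squared_error; the class and method names are hypothetical, not a Weka API:

```java
import java.util.Locale;

public class RelativeSpread {

    // Relative standard deviation in percent: 100 * stdev / |mean|.
    // Hypothetical helper for illustration, not part of Weka.
    static double relativeStdDevPercent(double mean, double stdDev) {
        if (mean == 0.0) {
            throw new IllegalArgumentException("mean must be non-zero");
        }
        return 100.0 * stdDev / Math.abs(mean);
    }

    public static void main(String[] args) {
        // mean(stdev) pairs copied from the Analyse output quoted in this thread.
        String[] measure = {"Mean_absolute_error", "Root_mean_squared_error"};
        double[] mean = {0.64, 1.06};
        double[] stdDev = {0.04, 0.10};

        for (int i = 0; i < measure.length; i++) {
            System.out.println(String.format(Locale.ROOT, "%-24s %.2f(%.2f) -> %.2f%%",
                    measure[i], mean[i], stdDev[i],
                    relativeStdDevPercent(mean[i], stdDev[i])));
        }
    }
}
```

This gives roughly 6.25% for MAE and 9.43% for RMSE. Whether the 2..3% rule of thumb carries over to such relative figures is open, so treat the comparison as indicative only.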

>
> regards,
> Jitendra
> ============================================================
> Tester:     weka.experiment.PairedCorrectedTTester
> Analysing:  Mean_absolute_error
> Datasets:   1
> Resultsets: 1
> Confidence: 0.05 (two tailed)
> Sorted by:  -
> Date:       6/30/09 6:07 AM
>
>
> Dataset                   (1) functions.Linea
> ---------------------------------------------
> L2_MR                     (10)   0.64(0.04) |
> ---------------------------------------------
> (v/ /*)                                     |
>
>
> Key:
> (1) functions.LinearRegression '-S 0 -R 1.0E-8' -3364580862046573747
>
> ==============================================================
>
> Tester:     weka.experiment.PairedCorrectedTTester
> Analysing:  Root_mean_squared_error
> Datasets:   1
> Resultsets: 1
> Confidence: 0.05 (two tailed)
> Sorted by:  -
> Date:       6/30/09 6:08 AM
>
>
> Dataset                   (1) functions.Linea
> ---------------------------------------------
> L2_MR                     (10)   1.06(0.10) |
> ---------------------------------------------
> (v/ /*)                                     |
>
>
> Key:
> (1) functions.LinearRegression '-S 0 -R 1.0E-8' -3364580862046573747
> ======================================================================


--

kind regards,
Harri M.T. Saarikoski
M.A., PhD graduate student
Helsinki University
Finland



_______________________________________________
Wekalist mailing list
Send posts to: Wekalist@...
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
