Re: Assuring the completeness of training data set

by J K Rai


> > From the discussion which went on it looks like that I
> > should use smote or the filter suggested by wirefree (meant
> > for non-numeric classification?). Both seem to add some
> > more instances. Can I do with the original dataset,
>
> sure you can. forget about the adding part for now.
>
> for original dataset I suggested running the following in
> experimenter:
> - iterations: 1
> - folds: 10
> - classifier selection: select e.g. the following
>   fast-running classifiers (doesn't matter much for our
>   purposes which we choose but best have several of them
>   though):
>    bayes.NaiveBayes
>    trees.J48
>    lazy.IBk
>   (all classifiers in their default configurations)
>
> after it finishes, go to results tab and select from those
> row/column buttons something reading "standard deviation"
> -> results analysis: stdev could be e.g. +/- 2..3% which
> under the above assumption of similarity between trainset
> and testset is your figure of representativeness of your
> trainset (the smaller this deviation is, the more
> representative your trainset is)
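
As a rough illustration of what the quoted procedure measures (one run of 10-fold cross-validation, then the spread of the per-fold scores), here is a minimal Python sketch. It is not Weka code: the majority-class "classifier", the toy labels, and all function names are hypothetical stand-ins, and it assumes the spread is reported as the sample standard deviation across folds.

```python
import random
import statistics

def k_fold_indices(n, k, seed=1):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def majority_class(labels):
    """Stand-in 'classifier': always predicts the most common training label."""
    return max(set(labels), key=labels.count)

def cross_validate(labels, k=10, seed=1):
    """Return the per-fold accuracy (in percent) of the stand-in classifier."""
    accuracies = []
    for test_idx in k_fold_indices(len(labels), k, seed):
        held_out = set(test_idx)
        # Train on everything that is not in the current test fold.
        train = [labels[i] for i in range(len(labels)) if i not in held_out]
        guess = majority_class(train)
        correct = sum(1 for i in test_idx if labels[i] == guess)
        accuracies.append(100.0 * correct / len(test_idx))
    return accuracies

# Hypothetical toy labels, for illustration only.
labels = ["a"] * 70 + ["b"] * 30
acc = cross_validate(labels)

# Same "mean(stdev)" layout as an Experimenter result cell.
print(f"{statistics.mean(acc):.2f}({statistics.stdev(acc):.2f})")
```

A small standard deviation here means the folds behave alike, which is the sense in which the quoted text treats it as a figure of representativeness.
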
If I choose a different value in the Comparison field of the Analyse tab, I get a different standard deviation in the right-hand window: for example, 0.04 for Mean_absolute_error and 0.10 for Root_mean_squared_error, as shown below.

What should I choose in the Comparison field for this analysis?
Harri, the method and the concept behind your suggestion are not clear to me. Kindly explain.

regards,
Jitendra
============================================================
Tester:     weka.experiment.PairedCorrectedTTester
Analysing:  Mean_absolute_error
Datasets:   1
Resultsets: 1
Confidence: 0.05 (two tailed)
Sorted by:  -
Date:       6/30/09 6:07 AM


Dataset                   (1) functions.Linea
---------------------------------------------
L2_MR                     (10)   0.64(0.04) |
---------------------------------------------
(v/ /*)                                     |


Key:
(1) functions.LinearRegression '-S 0 -R 1.0E-8' -3364580862046573747

==============================================================

Tester:     weka.experiment.PairedCorrectedTTester
Analysing:  Root_mean_squared_error
Datasets:   1
Resultsets: 1
Confidence: 0.05 (two tailed)
Sorted by:  -
Date:       6/30/09 6:08 AM


Dataset                   (1) functions.Linea
---------------------------------------------
L2_MR                     (10)   1.06(0.10) |
---------------------------------------------
(v/ /*)                                     |


Key:
(1) functions.LinearRegression '-S 0 -R 1.0E-8' -3364580862046573747
======================================================================
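
Each result cell above has the form mean(standard deviation) over the 10 folds, in the units of the selected metric, so the 0.04 and 0.10 are not directly comparable with each other or with a "+/- 2..3%" figure. One possible way to put them on a common scale, offered only as a sketch, is to express each standard deviation as a percentage of its mean. The helper name `relative_spread` is made up for this example; the numbers are copied from the two outputs above.

```python
# (mean, stdev) pairs copied from the two Experimenter outputs above.
results = {
    "Mean_absolute_error": (0.64, 0.04),
    "Root_mean_squared_error": (1.06, 0.10),
}

def relative_spread(mean, stdev):
    """Standard deviation expressed as a percentage of the mean."""
    return 100.0 * stdev / mean

for metric, (mean, stdev) in results.items():
    pct = relative_spread(mean, stdev)
    print(f"{metric}: {mean:.2f}({stdev:.2f}) -> {pct:.1f}% of the mean")
```

Whether a relative spread of this size counts as "representative" is a judgment call that the thread itself does not settle.
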

