constant number of training instances per class


by Emre Akbas-2

Hi, 

I'd like to split my dataset in such a way that there are N (which is a constant number) training instances for each class, and all the rest are left for testing. Can I do this using the GUI? What is the easiest way to achieve this? 

Thanks. 

Emre

_______________________________________________
Wekalist mailing list
Send posts to: Wekalist@...
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: constant number of training instances per class

by Harri Saarikoski-2


2009/10/26 Emre Akbas <eakbas2@...>:
> Hi,
>
> I'd like to split my dataset in such a way that there are N (which is a constant number) training instances for each class, and all the rest are left for testing. Can I do this using the GUI? What is the easiest way to achieve this?


In the Explorer, on the Preprocess tab:

(1) RemoveRange filter
Run it twice: once so that the instances up to N are kept for training, and once so that the instances from N+1 to the end are kept for testing.
Save each result as an ARFF file.
Make sure your dataset is fully order-randomised with respect to class before doing this,
i.e. that the position of an instance in the file does not give away its class.

(2) StratifiedRemoveFolds filter
If your fixed constant can be expressed as a percentage cutoff,
this filter is the much easier and more directly applicable way.
Run it twice, with and without the invert flag, and save both results as above.
This approach also guarantees the rather important property that the two resulting ARFF files
have the same prior class distribution, which is generally recommended
so that the classes are represented in a 'balanced' way.
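
To make the goal of the original question concrete (exactly N training instances per class, everything else for testing), here is a minimal sketch in plain Java. It does not use the Weka API: the class name `PerClassSplit`, the `Split` holder, and the idea of passing class labels as a list of strings are all assumptions made for illustration. It also assumes the rows have already been shuffled, since it simply takes the first N rows it sees for each class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Weka: works on row indices and class labels only.
public class PerClassSplit {

    /** Row indices assigned to each side of the split. */
    public static final class Split {
        public final List<Integer> train = new ArrayList<>();
        public final List<Integer> test = new ArrayList<>();
    }

    /**
     * Puts the first n rows seen for each class label into train
     * and every later row of that class into test.
     * Assumes the rows are already in random order.
     */
    public static Split split(List<String> labels, int n) {
        Split result = new Split();
        Map<String, Integer> taken = new HashMap<>(); // rows taken so far, per class
        for (int i = 0; i < labels.size(); i++) {
            String label = labels.get(i);
            int count = taken.getOrDefault(label, 0);
            if (count < n) {
                result.train.add(i);
                taken.put(label, count + 1);
            } else {
                result.test.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("a", "b", "a", "a", "b", "b", "a");
        Split s = split(labels, 2);
        System.out.println("train rows: " + s.train);
        System.out.println("test rows:  " + s.test);
    }
}
```

If a class has fewer than N rows, all of its rows end up in the training side and none in the test side, so it may be worth checking the class counts first.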


--
-----------------
Harri M.T. Saarikoski
M.A, PhD graduate student
Helsinki University
Finland


Re: constant number of training instances per class

by J K Rai

> Hi,

> I'd like to split my dataset in such a way
> that there are N (which is a constant number) training
> instances for each class, and all the rest are left for
> testing. Can I do this using the GUI? What is the easiest
> way to achieve this?
>
> using explorer -> preprocess tab ->
>
> (1) RemoveRange filter
> two runs: instances till N are for training, from constant
> till end are for testing
> save both as arff, and so on
>
> be sure that your dataset is fully order-randomised per
> classes before doing this
> i.e. that order of instances does not give away what class
> the instance is
 
How can we ensure that the dataset is fully order-randomised? Is there a way or option in Weka to ensure this?

Regards,
Jitendra
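
On the randomisation question: Weka's Preprocess tab offers an unsupervised instance filter called Randomize, which shuffles the instance order using a seed, and in Java code `Instances.randomize(java.util.Random)` does the same kind of shuffle. The underlying idea can be sketched in plain Java without Weka. The class and method names below (`SeededShuffle`, `shuffledIndices`) are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of Weka: produces a reproducible random row order.
public class SeededShuffle {

    /**
     * Returns the row indices 0..size-1 in a random order.
     * The same seed always gives the same order, so the split can be repeated.
     */
    public static List<Integer> shuffledIndices(int size, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        return indices;
    }

    public static void main(String[] args) {
        // Reading the rows in this order removes any dependence on the file order.
        System.out.println(shuffledIndices(10, 42L));
    }
}
```

Fixing the seed matters here: the training and test files are produced in two separate filter runs, and both runs need to see the same shuffled order.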

