> Weka provides the StringToWordVector filter. When it is applied to a string
> using both n-grams and a stopword list, the n-grams are built before the
> stopword list is applied. Since this results in many useless combinations in
> my text categorization task, I would like to know whether the stopword list
> could be applied first, followed by the n-gram tokenization.
The n-gram tokenizer was a contribution, added long after the
StringToWordVector filter came into being. Unfortunately, there is
currently no way to apply the stopword list first.
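
To make the difference between the two orderings concrete, here is a small
sketch in plain Python. It is not Weka code and does not use any Weka API: the
function names, the whitespace tokenization, and the tiny STOPWORDS set are all
assumptions made up for the illustration. It only models "n-grams first, then
stopwords" versus "stopwords first, then n-grams" on a toy sentence.

```python
# Illustration only: plain Python, not Weka code.
# STOPWORDS is a tiny hypothetical list, not Weka's built-in stopword list.
STOPWORDS = {"the", "a", "of", "is", "in"}


def ngrams(tokens, n_min=1, n_max=2):
    """Return all word n-grams with n_min <= n <= n_max, joined by spaces."""
    out = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out


def ngrams_then_stopwords(text):
    """Build n-grams first, then drop any n-gram that is itself a stopword.

    Simplified model of the ordering described in the question.
    """
    return [g for g in ngrams(text.lower().split()) if g not in STOPWORDS]


def stopwords_then_ngrams(text):
    """Drop stopword tokens first, then build n-grams from what is left."""
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    return ngrams(tokens)


if __name__ == "__main__":
    sentence = "the price of oil"
    print(ngrams_then_stopwords(sentence))
    print(stopwords_then_ngrams(sentence))
```

In the first ordering, bigrams such as "the price" and "of oil" survive because
only whole tokens are compared against the list. In the second, the stopwords
are gone before any n-gram is formed. One possible workaround along these lines
would be to pre-filter the text outside Weka before it is loaded, but that is a
suggestion under the assumptions above, not something the filter offers.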
Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/ Ph. +64 (7) 858-5174