stringtowordvector: sequential execution of n-gram and wordlist option

by paul.adriani

Dear sir,

Weka provides the StringToWordVector filter. When it is applied to a string with both n-grams and a stopword list enabled, the n-grams are built before the stopword list is applied. Since this produces many useless combinations in my text categorization task, I would like to know whether the stopword list could be applied first, followed by the n-gram tokenization.

Regards,

Paul

Paul Adriani Ba. sc. en ssc.
Jan van Riebeekstraat 14-3
1057ZX Amsterdam
tel 0644141917

Infocaster BV
paul@...

_______________________________________________
Wekalist mailing list
Send posts to: Wekalist@...
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: stringtowordvector: sequential execution of n-gram and wordlist option

by Peter Reutemann-3

> Weka provides the StringToWordVector filter. When it is applied to a string
> with both n-grams and a stopword list enabled, the n-grams are built before
> the stopword list is applied. Since this produces many useless combinations
> in my text categorization task, I would like to know whether the stopword
> list could be applied first, followed by the n-gram tokenization.

The n-gram tokenizer was a contribution, added long after the
StringToWordVector filter came into being. Unfortunately, there is
currently no way to apply the stopword list first.

Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/           Ph. +64 (7) 858-5174
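
Because the filter itself cannot reorder the two steps, one possible workaround is to strip stopwords from the raw text before it reaches StringToWordVector, and to leave the filter's own stopword option off. The Python sketch below illustrates that ordering only. It does not use Weka, and the function names, the whitespace tokenization, and the tiny stopword set are all assumptions made for illustration.

```python
# Hypothetical stopword set; a real list would be loaded from a file.
STOPWORDS = {"the", "of", "a"}


def remove_stopwords(tokens, stopwords):
    """Drop stopwords (case-insensitive) before any n-grams are built."""
    return [t for t in tokens if t.lower() not in stopwords]


def ngrams(tokens, n_min=1, n_max=2):
    """Return all word n-grams with n_min <= n <= n_max, joined by spaces."""
    result = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            result.append(" ".join(tokens[i:i + n]))
    return result


# Assumption: simple whitespace tokenization stands in for a real tokenizer.
tokens = "The price of the house".split()
cleaned = remove_stopwords(tokens, STOPWORDS)

# The cleaned text is what would be handed on to the n-gram step.
print(" ".join(cleaned))   # price house
print(ngrams(cleaned))     # ['price', 'house', 'price house']
```

Note that removing stopwords first lets n-grams span the gaps they leave: "price house" joins two words that were not adjacent in the original text. Whether that is desirable depends on the task.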
