> At the moment I am performing an experiment in which I have WEKA
> create word vectors. The input sets are sets of
> descriptions of news items. The number of descriptions varies per set from
> 10 items to 200 items.
>
> The smaller sets are all subsets of the larger sets.
>
> My question is why the number of features does not grow with the number
> of items. I would expect that if the number of items grows, the number of
> features grows as well.
You're not providing a lot of information, so I'm only guessing... The
StringToWordVector filter has an option (-W/wordsToKeep) that sets an
upper limit on how many words the generated internal dictionary will
contain in the end (the default is 1000). If you're already at the limit
with your subsets, adding more data doesn't add features; it only means
that less frequent words will get discarded.
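
To illustrate the effect, here is a minimal sketch in plain Java. It does not use WEKA and is not how StringToWordVector is implemented; the class DictionaryCapSketch and the method keptWords are made up for this example, and the real filter treats the limit as approximate rather than exact. It only shows that once a dictionary is capped, a superset of the documents yields the same number of features as the subset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DictionaryCapSketch {

    /**
     * Hypothetical helper: counts word frequencies over all documents and
     * returns at most wordsToKeep words, most frequent first.
     */
    static List<String> keptWords(List<String> docs, int wordsToKeep) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            for (String token : doc.toLowerCase().split("\\s+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        List<String> words = new ArrayList<>(counts.keySet());
        // Most frequent words first; ties broken alphabetically so the
        // result is deterministic.
        words.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        // Everything beyond the cap is discarded.
        return new ArrayList<>(words.subList(0, Math.min(wordsToKeep, words.size())));
    }

    public static void main(String[] args) {
        // Made-up news descriptions; the small set is a subset of the large one.
        List<String> small = Arrays.asList(
                "storm hits coast",
                "election results announced");
        List<String> large = new ArrayList<>(small);
        large.add("markets rally after election");
        large.add("coast guard rescues sailors");

        int cap = 5; // plays the role of -W/wordsToKeep in this sketch
        System.out.println("capped small:   " + keptWords(small, cap).size());
        System.out.println("capped large:   " + keptWords(large, cap).size());
        System.out.println("uncapped small: " + keptWords(small, Integer.MAX_VALUE).size());
        System.out.println("uncapped large: " + keptWords(large, Integer.MAX_VALUE).size());
    }
}
```

With the cap of 5, both sets end up with 5 words; without it, the counts are 6 and 12. If this is what is happening in your experiment, raising the wordsToKeep value should make the feature count grow with the set size again.
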
Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/ Ph. +64 (7) 858-5174