If you are working with text, you can use the StringToWordVector filter
to convert each record into a "bag of words".
It can also stem the words and assign tf-idf weights to them via the
appropriate parameters.
You can choose the sparse vector representation to save space (and memory).
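
To make the bag-of-words and tf-idf idea above concrete, here is a minimal plain-Java sketch. It is not Weka's implementation and does not call Weka at all. The class name TfIdfSketch, the whitespace tokenizer, and the formula tf * ln(N / df) are my own assumptions for illustration; the weighting Weka's filter applies may be defined differently, and no stemming is done here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: a hand-rolled bag of words with tf-idf weights (hypothetical class).
class TfIdfSketch {

    // Count how often each lower-cased, whitespace-separated token occurs in one document.
    static Map<String, Integer> bagOfWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Weight each term in each document as tf * ln(N / df), where N is the
    // number of documents and df the number of documents containing the term.
    static List<Map<String, Double>> tfIdf(List<String> docs) {
        List<Map<String, Integer>> bags = new ArrayList<>();
        Map<String, Integer> docFreq = new HashMap<>();
        for (String doc : docs) {
            Map<String, Integer> bag = bagOfWords(doc);
            bags.add(bag);
            for (String term : bag.keySet()) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        List<Map<String, Double>> weighted = new ArrayList<>();
        for (Map<String, Integer> bag : bags) {
            Map<String, Double> weights = new TreeMap<>();
            for (Map.Entry<String, Integer> e : bag.entrySet()) {
                double idf = Math.log((double) docs.size() / docFreq.get(e.getKey()));
                weights.put(e.getKey(), e.getValue() * idf);
            }
            weighted.add(weights);
        }
        return weighted;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "weka clusters text", "weka filters text", "kmeans clusters data");
        // Print one term-to-weight map per document.
        System.out.println(tfIdf(docs));
    }
}
```

A term that occurs in every document gets weight 0 under this formula, which is the point of the idf part: such terms do not help to tell documents apart.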
I think the easiest way is to try it in the Explorer first, and then copy and
paste the "commands" (the option strings) from the object editor into your
code, since the commands can get pretty long.
If you are dealing with non-textual data, then you should be able to use the
Weka clustering functionality directly; each instance in your ARFF file, for
example, is itself a vector.
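
If you already have your weight vectors in memory, one way to hand them to Weka is to write them out as an ARFF file, one instance per vector. The sketch below is plain Java and only builds the ARFF text; the class name SparseArffSketch, the relation name, and the term names are made up for illustration. It uses the sparse ARFF data format, where each row lists only the non-zero values as "index value" pairs with attribute indices starting at 0. It assumes the term names are simple tokens that need no quoting.

```java
import java.util.Locale;

// Illustration only: serialise term-weight vectors as a sparse ARFF data set.
class SparseArffSketch {

    // Build the ARFF text: one numeric attribute per term, one sparse row per vector.
    static String toSparseArff(String relation, String[] terms, double[][] vectors) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (String term : terms) {
            sb.append("@attribute ").append(term).append(" numeric\n");
        }
        sb.append("\n@data\n");
        for (double[] vector : vectors) {
            sb.append('{');
            boolean first = true;
            for (int i = 0; i < vector.length; i++) {
                if (vector[i] == 0.0) {
                    continue; // zeros are implicit in the sparse format
                }
                if (!first) {
                    sb.append(", ");
                }
                // Locale.ROOT keeps the decimal separator a dot on every machine.
                sb.append(i).append(' ')
                  .append(String.format(Locale.ROOT, "%.4f", vector[i]));
                first = false;
            }
            sb.append("}\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] terms = {"weka", "text", "kmeans"};       // hypothetical vocabulary
        double[][] vectors = {{0.5, 0.0, 1.25}, {0.0, 0.4, 0.0}};
        System.out.print(toSparseArff("docs", terms, vectors));
    }
}
```

Saving that output to a .arff file should give you something you can open in the Explorer and cluster like any other data set.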
For the weighting, you can use the tf-idf measure. Weka also allows
you to assign different weights to different attributes, but I'm not
sure how to do that.
ashwin
Puri, Aseem wrote:
>
> Hi
>
> I am very new to Weka. I want to cluster my data so
> that similar data are grouped together. I am thinking of using K-means.
> As far as I have seen, in Weka we give a file (.arff or .csv) as
> input, and clustering is done on that.
>
> But the input I want to give is a vector. I want to give a
> weight to every term in a weight vector. Can anybody please
> tell me how I should proceed?
>
>
>
> Thanks & Regards
>
> **Aseem Puri**
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Wekalist mailing list
> Send posts to: Wekalist@...
> List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
> List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html