Re: Cluster data by giving vector input

by Peter Reutemann-3

> Actually, I want to group my HBase row data. HBase is a database built on
> top of the Hadoop file system. I created a Lucene index on my HBase table.
> The Lucene analyzer applies stemming and removes stop words, and I have also
> calculated tf-idf for each word in a row. Now I want to group similar
> rows. So I now have a vector for each row, which holds the tf-idf
> weights.
>
> Please tell me how I should proceed so that similar rows are grouped
> together. I would also like to know how to give this input to a
> clustering algorithm.

Instead of generating files, you could just generate a
weka.core.Instances object on-the-fly and feed that into a clusterer.
For an example of how to generate weka.core.Instances objects, see
the wiki article "Creating an ARFF file":
  http://weka.wiki.sourceforge.net/Creating+an+ARFF+file
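
A minimal sketch of this approach, under a few assumptions: Weka 3.7 or
later (where weka.core.DenseInstance is the concrete instance class; older
releases use weka.core.Instance and FastVector instead), dense per-row
tf-idf arrays over a shared term list, and SimpleKMeans as one possible
clusterer. The class name, terms, and values are made up for illustration:

```java
import java.util.ArrayList;

import weka.clusterers.SimpleKMeans;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

// Hypothetical helper class; not part of Weka.
class TfIdfClustering {

    /** Builds an in-memory dataset: one numeric attribute per term, one instance per row. */
    static Instances toInstances(String[] terms, double[][] tfidf) {
        ArrayList<Attribute> attributes = new ArrayList<>();
        for (String term : terms) {
            attributes.add(new Attribute(term)); // single-argument constructor = numeric attribute
        }
        Instances data = new Instances("hbase-rows", attributes, tfidf.length);
        for (double[] row : tfidf) {
            data.add(new DenseInstance(1.0, row)); // weight 1.0, values = tf-idf per term
        }
        return data;
    }

    /** Runs k-means on the dataset and returns the cluster index assigned to each row. */
    static int[] cluster(Instances data, int k) throws Exception {
        SimpleKMeans kMeans = new SimpleKMeans();
        kMeans.setNumClusters(k);
        kMeans.buildClusterer(data);
        int[] assignment = new int[data.numInstances()];
        for (int i = 0; i < data.numInstances(); i++) {
            assignment[i] = kMeans.clusterInstance(data.instance(i));
        }
        return assignment;
    }

    public static void main(String[] args) throws Exception {
        // Made-up example data: three terms, four rows.
        String[] terms = {"hadoop", "index", "cluster"};
        double[][] tfidf = {
            {0.9, 0.8, 0.0},
            {0.8, 0.9, 0.1},
            {0.0, 0.1, 0.9},
            {0.1, 0.0, 0.8},
        };
        Instances data = toInstances(terms, tfidf);
        int[] assignment = cluster(data, 2);
        for (int i = 0; i < assignment.length; i++) {
            System.out.println("row " + i + " -> cluster " + assignment[i]);
        }
    }
}
```

For a large vocabulary where most tf-idf values are zero,
weka.core.SparseInstance may be a better fit than DenseInstance.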

As each of your rows is weighted, you will have to set the weight of
each weka.core.Instance object as well, either by passing the weight
directly to the weka.core.Instance constructor or afterwards using the
setWeight(double) method. See the Javadoc of that class for more
information.
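
Both ways of setting the weight could look roughly like this. It is only
a sketch, again assuming Weka 3.7 or later (DenseInstance as the concrete
class); the class name, attribute names, and numbers are made up. Note
that the weight is a single value per row, while the per-term tf-idf
values go into the attribute array:

```java
import java.util.ArrayList;

import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;

// Hypothetical example class; not part of Weka.
class WeightedRows {

    /** Builds a two-row dataset, showing both ways of attaching an instance weight. */
    static Instances build() {
        ArrayList<Attribute> attributes = new ArrayList<>();
        attributes.add(new Attribute("hadoop"));
        attributes.add(new Attribute("index"));
        Instances data = new Instances("weighted-rows", attributes, 2);

        // Option 1: pass the weight to the constructor.
        Instance first = new DenseInstance(2.0, new double[] {0.9, 0.1});
        data.add(first);

        // Option 2: create the instance, then change its weight.
        // Done before add(), because add() stores a copy of the instance.
        Instance second = new DenseInstance(1.0, new double[] {0.2, 0.7});
        second.setWeight(0.5);
        data.add(second);

        return data;
    }

    public static void main(String[] args) {
        Instances data = build();
        for (int i = 0; i < data.numInstances(); i++) {
            System.out.println(data.instance(i) + " weight=" + data.instance(i).weight());
        }
    }
}
```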

For general API usage, see the FAQ "How do I use WEKA's classes in my own
code?". A link to the FAQs is available from the Weka homepage.

Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/           Ph. +64 (7) 858-5174
