On 6/30/09 5:14 AM, nikita m. wrote:
> I have a dataset with numeric and string attributes and want to use K-means,
> maybe later even X-means. Is there any possibility of using different metrics
> for the attributes? E.g. Euclidean for numeric and edit distance for strings?
> Thanks in advance
Not without changing the code. Note that how the centroids are computed
in K-means matters if you want a guarantee that each centroid update never
increases the within-cluster error under your distance function. The
component-wise mean is the correct choice for squared error (squared
Euclidean distance), while the component-wise median minimizes the
Manhattan distance.
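
To make the mean/median point concrete, here is a minimal sketch in plain
Python (not Weka code). The toy cluster and all function names are made up
for illustration. It computes both candidate centroids for one cluster and
the within-cluster error each gives under both distance functions.

```python
from statistics import mean, median

def sq_euclidean(p, q):
    # Squared Euclidean distance between two numeric points.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def manhattan(p, q):
    # Manhattan (city-block) distance between two numeric points.
    return sum(abs(a - b) for a, b in zip(p, q))

def componentwise(points, agg):
    # Apply an aggregate (mean or median) to each attribute separately.
    return tuple(agg(column) for column in zip(*points))

def within_cluster_error(points, centre, dist):
    # Sum of distances from every point in the cluster to the centre.
    return sum(dist(p, centre) for p in points)

# Hypothetical toy cluster; the third point is an outlier in attribute 2.
cluster = [(1.0, 2.0), (2.0, 2.0), (3.0, 11.0)]

mean_centre = componentwise(cluster, mean)
median_centre = componentwise(cluster, median)

for name, dist in (("squared Euclidean", sq_euclidean), ("Manhattan", manhattan)):
    print(name,
          "| mean centre:", within_cluster_error(cluster, mean_centre, dist),
          "| median centre:", within_cluster_error(cluster, median_centre, dist))
```

On this data the mean gives the lower squared-Euclidean error and the median
gives the lower Manhattan error. If you pair a distance function with the
wrong centroid rule, the update step can increase the error you are trying
to minimize.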
Cheers,
Mark.
--
Mark Hall
Senior Developer/Consultant, Pentaho Open Source Business Intelligence
Citadel International, Suite 340, 5950 Hazeltine National Dr., Orlando,
FL 32822, USA
+64 7 847-3537 office, +64 21 399-132 mobile, +1 815 550-8637 fax,
Skype: mark.andrew.hall, Yahoo: mark_andrew_hall