Using different distances for attributes, is it possible?

I have a dataset with numeric and string attributes and want to use K-means, maybe later even X-means. Is there any possibility to use different metrics for the attributes, e.g. Euclidean for numeric and edit distance for strings? Thanks in advance.

nikita m.
Re: Using different distances for attributes, is it possible?

> I have a dataset with numeric and string attributes and want to use K-means,
> maybe later even X-means. Is there any possibility to use different metrics
> for the attributes, e.g. Euclidean for numeric and edit distance for strings?

No, you can only use a single distance function (though you can specify which attribute ranges to use in the calculation). But you can always implement your own distance function that uses different sub-distance functions for the different attribute types. A distance function has to implement the interface weka.core.DistanceFunction. See the other distance functions for implementation examples.

Cheers, Peter

-- 
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/ Ph. +64 (7) 858-5174

_______________________________________________
Wekalist mailing list
Send posts to: Wekalist@...
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
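Peter's suggestion of a distance function built from sub-distance functions can be sketched as a composite that picks one sub-distance per attribute. This is a minimal illustration under stated assumptions. Rows are plain Object[] arrays. AttributeDistance, NumericSquaredDiff, StringMismatch and CompositeDistance are hypothetical names, not Weka classes. A real implementation would have to implement weka.core.DistanceFunction, which also requires methods such as setInstances(...), and would read the attribute types from the dataset instead of a hand-built list.

```java
import java.util.List;

// Demo entry point (first class, so the single-file launcher runs it).
final class CompositeDistanceDemo {
    public static void main(String[] args) {
        // Hypothetical schema: two numeric attributes followed by one string attribute.
        CompositeDistance d = new CompositeDistance(List.of(
                new NumericSquaredDiff(), new NumericSquaredDiff(), new StringMismatch()));
        System.out.println(d.distance(new Object[] {1.0, 2.0, "red"},
                                      new Object[] {4.0, 6.0, "red"}));
        System.out.println(d.distance(new Object[] {1.0, 2.0, "red"},
                                      new Object[] {4.0, 6.0, "blue"}));
    }
}

// Hypothetical per-attribute sub-distance (NOT part of the Weka API).
interface AttributeDistance {
    double distance(Object a, Object b);
}

// Numeric attributes: squared difference (the composite takes the root at the end).
final class NumericSquaredDiff implements AttributeDistance {
    @Override
    public double distance(Object a, Object b) {
        double diff = (Double) a - (Double) b;
        return diff * diff;
    }
}

// String attributes: 0 if equal, 1 otherwise (a placeholder for an edit distance).
final class StringMismatch implements AttributeDistance {
    @Override
    public double distance(Object a, Object b) {
        return a.equals(b) ? 0.0 : 1.0;
    }
}

// Sums one sub-distance per attribute position and takes the square root,
// so rows with only numeric attributes reduce to plain Euclidean distance.
final class CompositeDistance {
    private final List<AttributeDistance> perAttribute;

    CompositeDistance(List<AttributeDistance> perAttribute) {
        this.perAttribute = perAttribute;
    }

    double distance(Object[] x, Object[] y) {
        if (x.length != perAttribute.size() || y.length != perAttribute.size()) {
            throw new IllegalArgumentException("row length does not match attribute count");
        }
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += perAttribute.get(i).distance(x[i], y[i]);
        }
        return Math.sqrt(sum);
    }
}
```

The 0/1 string mismatch is only a stand-in. Any edit distance can be plugged in as another AttributeDistance without touching the composite.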
Re: Using different distances for attributes, is it possible?

On 6/30/09 5:14 AM, nikita m. wrote:
> I have a dataset with numeric and string attributes and want to use K-means,
> maybe later even X-means. Is there any possibility to use different metrics
> for the attributes, e.g. Euclidean for numeric and edit distance for strings?
> Thanks in advance

Not without changing the code. Note that the way K-means computes the centroids matters if you want to guarantee that the algorithm minimizes the within-cluster error as measured by your distance function. The component-wise mean is the correct choice for squared error (squared Euclidean distance), while the component-wise median minimizes the sum of Manhattan distances.

Cheers, Mark.

-- 
Mark Hall
Senior Developer/Consultant, Pentaho Open Source Business Intelligence
Citadel International, Suite 340, 5950 Hazeltine National Dr., Orlando, FL 32822, USA
+64 7 847-3537 office, +64 21 399-132 mobile, +1 815 550-8637 fax, Skype: mark.andrew.hall, Yahoo: mark_andrew_hall
Download the latest release today <http://www.sourceforge.net/projects/pentaho>
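Mark's point about centroids can be checked numerically on a single numeric attribute. The sketch below computes the mean and the median of one cluster and then evaluates the within-cluster error of each center under squared Euclidean and Manhattan distance. The data values are made up for illustration, and CentroidChoiceDemo is a hypothetical name.

```java
import java.util.Arrays;

// One-attribute illustration: which cluster "center" minimizes which error.
final class CentroidChoiceDemo {
    // Component-wise mean: minimizes the sum of squared differences.
    static double mean(double[] v) {
        double s = 0.0;
        for (double x : v) {
            s += x;
        }
        return s / v.length;
    }

    // Component-wise median: minimizes the sum of absolute differences.
    static double median(double[] v) {
        double[] s = v.clone();
        Arrays.sort(s);
        int n = s.length;
        return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    // Within-cluster error under squared Euclidean distance.
    static double squaredError(double[] v, double center) {
        double e = 0.0;
        for (double x : v) {
            e += (x - center) * (x - center);
        }
        return e;
    }

    // Within-cluster error under Manhattan distance.
    static double absoluteError(double[] v, double center) {
        double e = 0.0;
        for (double x : v) {
            e += Math.abs(x - center);
        }
        return e;
    }

    public static void main(String[] args) {
        double[] cluster = {1.0, 2.0, 3.0, 10.0};
        double m = mean(cluster);
        double med = median(cluster);
        System.out.printf("mean   %.2f: squared=%.2f absolute=%.2f%n",
                m, squaredError(cluster, m), absoluteError(cluster, m));
        System.out.printf("median %.2f: squared=%.2f absolute=%.2f%n",
                med, squaredError(cluster, med), absoluteError(cluster, med));
    }
}
```

For the values 1, 2, 3 and 10, the mean (4.0) gives a squared error of 50, compared with 59 for the median (2.5). The median, in turn, gives an absolute error of 10, compared with 12 for the mean.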
Re: Using different distances for attributes, is it possible?

> No, you can only use a single distance function (though you can
> specify which attribute ranges to use in the calculation).
> A distance function has to implement the interface weka.core.DistanceFunction.

It seems like I have to implement a new distance function. I am using 3.6.1 and already tried to use my own distance function, but KMeans insists on Euclidean or Manhattan distance, and as far as I know, XMeans uses KMeans as well. So what can I do?

- Rewrite the Euclidean distance with some extra functionality for strings?
- Force KMeans to accept my own distance?
- Something else?

What I'm trying to do is use Euclidean distance for numeric attributes and edit distance for string attributes, then combine the values into an overall distance. This distance should then be used by KMeans and/or XMeans.
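The combination nikita describes can be sketched as follows. This sketch assumes numeric differences are scaled by each attribute's range and edit distances by the longer string's length, so that both kinds of attribute contribute values in [0, 1] before they are summed. The class MixedDistance, the equal weighting and this scaling scheme are assumptions made for the sketch; other weightings are equally valid. The edit distance is the standard Levenshtein distance (insert, delete and substitute each cost 1).

```java
// Hypothetical mixed distance (NOT a Weka class): Euclidean over range-scaled
// numeric attributes combined with length-normalized Levenshtein distances.
final class MixedDistance {
    // Standard dynamic-programming Levenshtein distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Edit distance scaled to [0, 1] by the longer string's length.
    static double normalizedEdit(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 0.0 : (double) levenshtein(a, b) / max;
    }

    // numRanges[i] is assumed to be (max - min) of numeric attribute i over the
    // data and must be > 0; it puts each numeric difference on the same scale.
    static double distance(double[] numX, String[] strX,
                           double[] numY, String[] strY, double[] numRanges) {
        double sum = 0.0;
        for (int i = 0; i < numX.length; i++) {
            double d = (numX[i] - numY[i]) / numRanges[i];
            sum += d * d;
        }
        for (int i = 0; i < strX.length; i++) {
            double d = normalizedEdit(strX[i], strY[i]);
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Hypothetical record: one numeric attribute (range 5.0) and one string attribute.
        System.out.println(distance(new double[] {1.0}, new String[] {"kitten"},
                                    new double[] {4.0}, new String[] {"sitting"},
                                    new double[] {5.0}));
    }
}
```

A constant numeric attribute (range 0) would need special handling, for example skipping it, because the sketch divides by the range.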
Re: Using different distances for attributes, is it possible?

>> No, you can only use a single distance function (though you can
>> specify which attribute ranges to use in the calculation).
>
>> A distance function has to implement the interface
>> weka.core.DistanceFunction.
>
> It seems like I have to implement a new distance function. I am using 3.6.1
> and already tried to use my own distance function, but KMeans insists on
> Euclidean or Manhattan distance -

You might want to change the setDistanceFunction method so that your distance function is allowed as well.

> and as far as I know, XMeans uses KMeans as well.

XMeans isn't using KMeans; it's a completely separate implementation. From a quick look at the code, there don't seem to be any restrictions regarding distance functions.

[...]

Cheers, Peter
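Peter's suggestion amounts to relaxing a type check. The sketch below uses hypothetical stand-ins (Distance, Euclidean, Manhattan, MyMixedDistance, ClustererConfig) rather than Weka's real classes, and it does not reproduce the actual 3.6.1 source. It only shows the shape of the change, which is extending the list of accepted distance classes.

```java
// Demo entry point (first class, so the single-file launcher runs it).
final class WhitelistDemo {
    public static void main(String[] args) {
        ClustererConfig cfg = new ClustererConfig();
        cfg.setDistanceFunction(new MyMixedDistance()); // accepted after the change
        System.out.println(cfg.getDistanceFunction().getClass().getSimpleName());
        try {
            cfg.setDistanceFunction((a, b) -> 0.0); // not in the accepted list
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}

// Hypothetical stand-in for a distance interface (NOT weka.core.DistanceFunction).
interface Distance {
    double between(double[] a, double[] b);
}

final class Euclidean implements Distance {
    @Override
    public double between(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            s += (a[i] - b[i]) * (a[i] - b[i]);
        }
        return Math.sqrt(s);
    }
}

final class Manhattan implements Distance {
    @Override
    public double between(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            s += Math.abs(a[i] - b[i]);
        }
        return s;
    }
}

// Placeholder for the user's own function; here it simply reuses Euclidean.
final class MyMixedDistance implements Distance {
    private final Euclidean numeric = new Euclidean();

    @Override
    public double between(double[] a, double[] b) {
        return numeric.between(a, b);
    }
}

// Hypothetical clusterer configuration with a type check on the distance.
final class ClustererConfig {
    private Distance distance = new Euclidean();

    void setDistanceFunction(Distance d) {
        // A check like the one described in the thread accepts only Euclidean
        // and Manhattan; the suggested change adds the custom class as well.
        if (!(d instanceof Euclidean) && !(d instanceof Manhattan)
                && !(d instanceof MyMixedDistance)) {
            throw new IllegalArgumentException(
                    "Only Euclidean, Manhattan and MyMixedDistance are supported.");
        }
        this.distance = d;
    }

    Distance getDistanceFunction() {
        return distance;
    }
}
```

Relaxing the check only lets the new distance through. The centroid update stays the same, which is the caveat Mark raised about means and medians.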
Re: Using different distances for attributes, is it possible?

> You might want to change the setDistanceFunction method, adding your
> distance function to be allowed as well.

OK, I will try this.

> XMeans isn't using KMeans, it's a completely separate implementation.

You are right. I assumed this because I took too 'quick' a look at the code.

> From a quick look at the code, there don't seem to be any restrictions
> regarding distance functions.

I'm glad to read that there are no fundamental restrictions.

Thanks for the help,
nikita